Passion is like genius; a miracle.

Tag: statistics

Data Mining Course Material from PSU

material.html Looks pretty readable and nice.

November 5, 2011
Statistical data matching (or data fusion)

http://epp.eurostat.ec.europa.eu/cache/ITY_PUBLIC/NTTS2001/43.pdf A statistical method for combining data from different sources when exact matching on a key is not feasible. For example, health data and income data from two different sources can be combined even when the SSN is not available because of privacy concerns.

November 5, 2011
Proportion estimation in R

Given a sequence of coin flips coded as 1 (head) and 0 (tail), such as 101011110101, …, we want to estimate the proportion of 1s in the population, i.e., how likely the given coin is to come up heads. Let X be the random variable where 1 means head and 0 means tail. If…
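A minimal sketch of the estimate described above, written in Python rather than R and using only the twelve flips quoted in the excerpt. The normal-approximation (Wald) interval is an assumption of this sketch and may not be the method the full post uses.

```python
import math

# The flips quoted in the excerpt: 1 = head, 0 = tail.
flips = [int(ch) for ch in "101011110101"]

n = len(flips)
p_hat = sum(flips) / n  # sample proportion of heads

# Normal-approximation (Wald) 95% interval. This is an assumption of the
# sketch, and it is rough for a sample this small.
z = 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z * se, p_hat + z * se)

print(n, p_hat, ci)
```

With only twelve flips the interval is wide, which is why a longer sequence is needed before saying much about the coin.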

October 26, 2011
Poisson Process

The Poisson process deals with occurrences that happen with low probability, and it connects the Poisson distribution and the exponential distribution nicely. According to the Wikipedia entry, http://en.wikipedia.org/wiki/Poisson_process: [quote] The basic form of Poisson process, often referred to simply as “the Poisson process”, is a continuous-time counting process {N(t), t ≥ 0} that possesses the following properties: N(0) = 0; Independent increments (the numbers of occurrences counted in disjoint…
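The link between the two distributions can be checked by simulation. The sketch below (Python, standard library only) draws exponential waiting times between events and counts how many events land in a fixed window. The rate, horizon, and number of runs are arbitrary values chosen for illustration.

```python
import random

def simulate_counts(rate, horizon, runs, seed=0):
    """Count events in [0, horizon] when gaps between events are Exponential(rate)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(runs):
        t, n = 0.0, 0
        while True:
            t += rng.expovariate(rate)  # waiting time until the next event
            if t > horizon:
                break
            n += 1
        counts.append(n)
    return counts

counts = simulate_counts(rate=2.0, horizon=5.0, runs=20000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)

# For a Poisson count, mean and variance should both be near rate * horizon.
print(mean, var)
```

If the counts are Poisson distributed, the sample mean and sample variance should both sit close to rate * horizon, which is 10 for these settings.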

October 21, 2011
Sampling with/without replacement

If samples are taken without replacement (i.e., a drawn item is not put back into the bin) from a population, then the number of items of a given kind in the sample follows a hypergeometric distribution. In this case, the draws are not independent of each other. For example, if I take 1 out of the bin containing 1, 1, 2, 2, 3, then there’s no choice but…
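The dependence between draws can be made concrete with the bin from the example. This Python sketch enumerates every ordered pair of draws without replacement and compares the result with the hypergeometric formula. The helper name `hypergeom_pmf` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

bin_items = [1, 1, 2, 2, 3]  # the bin from the post

def hypergeom_pmf(k, N, K, n):
    """P(k successes in n draws without replacement, K successes among N items)."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# Every ordered pair of distinct positions = two draws without replacement.
pairs = list(permutations(range(len(bin_items)), 2))
both_ones = sum(1 for i, j in pairs if bin_items[i] == 1 and bin_items[j] == 1)
first_one = sum(1 for i, j in pairs if bin_items[i] == 1)
second_one = sum(1 for i, j in pairs if bin_items[j] == 1)

p_second_one = Fraction(second_one, len(pairs))        # unconditional
p_second_given_first = Fraction(both_ones, first_one)  # after a 1 was removed
p_two_ones = hypergeom_pmf(2, N=5, K=2, n=2)

print(p_second_one, p_second_given_first, p_two_ones)
```

The conditional probability differs from the unconditional one, so the second draw depends on the first. With replacement the two would be equal.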

October 18, 2011
Hypothesis Testing History

Hypothesis testing was invented by Student (Student’s t-test), followed by Fisher, and then by Neyman and Pearson. “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” by E. L. Lehmann discusses them.

October 11, 2011
Linear Regression in R

Let’s build artificial data: Now, y is 3*x + error (where the errors follow independent normal distributions). Build a model: This shows that yhat = 3.0 * x – 1.168e-05. Let’s check the goodness of fit of this model: Adjusted R-squared says that 100% of the variation is explained by this linear regression, and we can see that t value…
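The post's R code is not shown in this excerpt, so the following is only a rough Python equivalent of the same idea: generate y = 3*x + normal error and recover the slope by ordinary least squares. The x grid, noise level, and seed are assumptions of this sketch, so the numbers will not match the coefficients quoted above.

```python
import random

rng = random.Random(42)
x = [i / 10 for i in range(1, 101)]           # hypothetical predictor values
y = [3 * xi + rng.gauss(0, 0.1) for xi in x]  # y = 3*x + independent normal error

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Closed-form least-squares estimates for a single predictor.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# R-squared: share of the variation in y explained by the fitted line.
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(slope, intercept, r_squared)
```

Because the noise is small relative to the spread of x, the fitted slope should land close to 3 and R-squared close to 1.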

October 7, 2011
ANOVA is Linear Regression.

Why ANOVA and Linear Regression Are the Same Analysis: They’re just the same thing presented differently. Testing whether the means of A, B, and C are different = testing whether a and b are statistically significant in Y=a*A+b*B+C.
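A small Python sketch of this equivalence, on hypothetical toy data. It reads the equation as a regression on 0/1 indicators for groups A and B, with group C as the baseline absorbed into the intercept. That reading is an assumption, since the excerpt does not spell out the coding.

```python
groups = {  # hypothetical toy measurements, not from the linked article
    "A": [5.1, 4.9, 5.6, 5.3],
    "B": [6.2, 6.0, 6.5, 6.1],
    "C": [4.0, 4.3, 3.9, 4.2],
}
means = {g: sum(v) / len(v) for g, v in groups.items()}

# Candidate coefficients for Y = a*A + b*B + c with A, B as 0/1 indicators.
c = means["C"]
a = means["A"] - means["C"]
b = means["B"] - means["C"]

rows = [(1.0 if g == "A" else 0.0, 1.0 if g == "B" else 0.0, y)
        for g, ys in groups.items() for y in ys]
resid = [y - (a * dA + b * dB + c) for dA, dB, y in rows]

# Least-squares normal equations: residuals are orthogonal to each regressor.
# All three sums being zero shows (a, b, c) is the least-squares solution.
normal_eqs = (
    sum(resid),
    sum(r * dA for r, (dA, _, _) in zip(resid, rows)),
    sum(r * dB for r, (_, dB, _) in zip(resid, rows)),
)

# One-way ANOVA F statistic built from the same pieces.
n_total, k = len(rows), len(groups)
grand = sum(y for _, _, y in rows) / n_total
ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
ss_within = sum(r ** 2 for r in resid)  # regression residual SS
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(a, b, c, normal_eqs, f_stat)
```

The regression coefficients are differences of group means, and the regression residual sum of squares is the ANOVA within-group sum of squares, so the overall F test is the same in both presentations.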

October 7, 2011
Generative model VS Discriminative model

bayesian – Generative vs discriminant models – Statistical Analysis – Stack Exchange

October 6, 2011
Logistic Regression for iris in R

See which data points have odds larger than 1. Check out the model. References: 1. http://www.stat.cmu.edu/~cshalizi/490/clustering/clustering01.r

October 4, 2011