Tag: statistics
-
Data Mining Course Material from PSU
material.html Looks pretty readable and nice.
-
Statistical data matching (or data fusion)
http://epp.eurostat.ec.europa.eu/cache/ITY_PUBLIC/NTTS2001/43.pdf A statistical method for combining data from different sources when exact matching on a key is not feasible. For example, combining health data and income data from two different sources when SSNs are not available because of privacy issues.
-
Proportion estimation in R
Given a sequence of 1s (heads of a coin) and 0s (tails of a coin) such as 101011110101, …, we want to estimate the proportion of 1s in the population, i.e., how likely it is to observe heads for the given coin. Let X be the random variable that is 1 for heads and 0 for tails. If…
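The excerpt is cut off before the estimation method is shown, so the following is only a minimal sketch, written in Python rather than R. It assumes the sample proportion as the point estimate and a normal-approximation (Wald) interval around it; the function name `estimate_proportion` and the 1.96 multiplier for a roughly 95% interval are illustrative choices, not taken from the post.

```python
import math

def estimate_proportion(bits, z=1.96):
    """Return (p_hat, lower, upper) using a normal-approximation (Wald) interval."""
    n = len(bits)
    p_hat = sum(bits) / n                      # sample proportion of 1s
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    lower = max(0.0, p_hat - z * se)           # clip to the valid range [0, 1]
    upper = min(1.0, p_hat + z * se)
    return p_hat, lower, upper

# The example sequence from the text: 1 = head, 0 = tail.
flips = [int(c) for c in "101011110101"]
p_hat, lower, upper = estimate_proportion(flips)
print(f"p_hat = {p_hat:.3f}, approx. 95% CI = ({lower:.3f}, {upper:.3f})")
```

With only 12 flips the normal approximation is rough, so the interval should be read as indicative at best.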
-
Poisson Process
A Poisson process models occurrences that are individually rare but arrive at a constant average rate, and it connects the Poisson distribution and the exponential distribution nicely: counts over a fixed interval are Poisson distributed, while the waiting times between occurrences are exponentially distributed. According to the Wikipedia entry, http://en.wikipedia.org/wiki/Poisson_process: [quote] The basic form of Poisson process, often referred to simply as “the Poisson process”, is a continuous-time counting process {N(t), t ≥ 0} that possesses the following properties: N(0) = 0; independent increments (the numbers of occurrences counted in disjoint…
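The link between the two distributions can be checked with a small simulation. The sketch below (Python, standard library only) draws exponential gaps between occurrences and counts how many fall in a fixed window; for a Poisson count, both the mean and the variance should come out close to rate × window length. The rate, window length, trial count, and seed are arbitrary values picked for illustration.

```python
import random

def count_events(rate, horizon, rng):
    """Count occurrences in [0, horizon] when gaps are Exponential(rate)."""
    t = rng.expovariate(rate)        # time of the first occurrence
    count = 0
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)   # add the next independent gap
    return count

rng = random.Random(0)               # fixed seed so the run is repeatable
rate, horizon, trials = 2.0, 5.0, 2000
counts = [count_events(rate, horizon, rng) for _ in range(trials)]

mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
print(f"mean = {mean:.2f}, variance = {var:.2f}, rate * horizon = {rate * horizon}")
```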
-
Sampling with/without replacement
If samples are taken from a population without replacement (i.e., a sampled item is not put back into the bin), then the number of items of a given kind in the sample follows a hypergeometric distribution. In this case, the draws are not independent of each other. For example, if I take 1 out of the bin containing 1, 1, 2, 2, 3, then there’s no choice but…
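To make the difference concrete, here is a small Python sketch that uses the bin from the example (1, 1, 2, 2, 3). The question it asks, "what is the chance that two draws are both 1s?", is an assumption chosen for illustration, since the excerpt is cut off. Without replacement the count of 1s is hypergeometric; with replacement it is binomial.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return Fraction(
        comb(successes, k) * comb(population - successes, draws - k),
        comb(population, draws),
    )

def binom_pmf(k, draws, p):
    """P(exactly k successes) when drawing with replacement."""
    return comb(draws, k) * p**k * (1 - p) ** (draws - k)

bin_items = [1, 1, 2, 2, 3]
N = len(bin_items)          # population size
K = bin_items.count(1)      # number of 1s in the bin

both_ones_without = hypergeom_pmf(2, N, K, 2)    # 1/10
both_ones_with = binom_pmf(2, 2, Fraction(K, N)) # 4/25
print(both_ones_without, both_ones_with)
```

The without-replacement probability is smaller because taking the first 1 leaves only one 1 among the four remaining items, which is exactly the dependence between draws described above.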
-
Hypothesis Testing History
Hypothesis testing was pioneered by Student (Student’s t-test), followed by Fisher, and then by Neyman and Pearson. “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” by E. L. Lehmann discusses them.
-
Linear Regression in R
Let’s build artificial data: Now, y is 3*x + error (where the errors are independent and normally distributed). Build a model: This shows that yhat = 3.0 * x - 1.168e-05. Let’s check the goodness of this model: Adjusted R-squared says that essentially 100% of the variation is explained by this linear regression, and we can see that t value…
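The R code from the post does not appear in this excerpt. As a rough stand-in, the Python sketch below rebuilds the same kind of experiment with the closed-form least-squares formulas. The x grid, the noise standard deviation (0.01), the seed, and the helper name `fit_line` are all assumptions made for illustration, so the fitted numbers will not match the post's output.

```python
import random

rng = random.Random(42)                          # fixed seed for repeatability
x = [i / 10 for i in range(100)]                 # assumed x grid: 0.0 .. 9.9
y = [3 * xi + rng.gauss(0, 0.01) for xi in x]    # y = 3*x + small normal error

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

slope, intercept, r_squared = fit_line(x, y)
print(f"yhat = {slope:.4f} * x + {intercept:.2e}, R^2 = {r_squared:.6f}")
```

Because the noise is tiny relative to the spread of 3*x, the slope lands very close to 3 and R-squared very close to 1, which mirrors the conclusion in the text.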
-
ANOVA is Linear Regression.
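Why ANOVA and Linear Regression Are the Same Analysis. They’re just the same thing presented differently. Testing if the means of A, B, and C are different = testing if a and b are jointly statistically significant in Y=a*A+b*B+C, where A and B are 0/1 group indicators and the constant term C is the mean of the reference group.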
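The equivalence can be checked numerically. The Python sketch below fits Y = a*A + b*B + c by least squares on made-up data for three groups (the numbers are invented purely for illustration), then compares the regression F statistic with the one-way ANOVA F statistic. The tiny `solve` helper is a hypothetical stand-in for a linear algebra library.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Made-up measurements for three groups (illustrative only).
groups = {"A": [5.1, 4.9, 5.6], "B": [6.2, 6.8, 6.5], "C": [4.0, 4.3, 3.9]}

# Design rows: [A indicator, B indicator, constant]; group C is the reference.
X, y = [], []
for name, values in groups.items():
    for v in values:
        X.append([float(name == "A"), float(name == "B"), 1.0])
        y.append(v)

# Normal equations (X'X) beta = X'y.
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
a, b, c = solve(XtX, Xty)

n, k = len(y), len(groups)
means = {g: sum(v) / len(v) for g, v in groups.items()}
grand = sum(y) / n

# Regression F: joint test that a = b = 0.
ss_tot = sum((yi - grand) ** 2 for yi in y)
ss_res = sum((yi - (a * r[0] + b * r[1] + c)) ** 2 for r, yi in zip(X, y))
f_regression = ((ss_tot - ss_res) / (k - 1)) / (ss_res / (n - k))

# One-way ANOVA F: between-group vs within-group variation.
ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
ss_within = sum((vi - means[g]) ** 2 for g, v in groups.items() for vi in v)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
print(f"F (regression) = {f_regression:.4f}, F (ANOVA) = {f_anova:.4f}")
```

The constant c comes out as the mean of group C, while a and b are the differences of the A and B means from it, so "a = b = 0" is the same statement as "all three means are equal".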
-
Generative model VS Discriminative model
bayesian – Generative vs discriminant models – Statistical Analysis – Stack Exchange
-
Logistic Regression for iris in R
See which data points have odds larger than 1. Check out the model. References: 1. http://www.stat.cmu.edu/~cshalizi/490/clustering/clustering01.r