-
Poisson Process
A Poisson process deals with occurrences that happen with low probability, and it nicely connects the Poisson distribution and the exponential distribution. According to the Wikipedia entry, http://en.wikipedia.org/wiki/Poisson_process: [quote] The basic form of Poisson process, often referred to simply as “the Poisson process”, is a continuous-time counting process {N(t), t ≥ 0} that possesses the following properties: N(0) = 0; independent increments (the numbers of occurrences counted in disjoint…
Tags:
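For the Poisson process entry above, the link between the two distributions can be checked with a small simulation. This is a minimal sketch, not part of the original post: it assumes a constant rate, draws exponential gaps between events, and counts how many events land in a fixed window. The function names, the rate of 2.0, the window of 5.0, and the seed are all illustrative choices. If the gaps are exponential with rate `rate`, the count over a window of length `horizon` should be Poisson with mean (and variance) close to `rate * horizon`.

```python
import random


def count_arrivals(rate, horizon, rng):
    """Count events in [0, horizon] when gaps between events are Exponential(rate)."""
    t = rng.expovariate(rate)  # time of the first event
    count = 0
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)  # add the next exponential gap
    return count


def simulate_mean_and_variance(rate=2.0, horizon=5.0, runs=20000, seed=0):
    """Repeat the experiment and return the sample mean and variance of the counts."""
    rng = random.Random(seed)
    counts = [count_arrivals(rate, horizon, rng) for _ in range(runs)]
    mean = sum(counts) / runs
    variance = sum((c - mean) ** 2 for c in counts) / (runs - 1)
    return mean, variance


if __name__ == "__main__":
    mean, variance = simulate_mean_and_variance()
    # Both values should be near rate * horizon = 10 for a Poisson count.
    print(mean, variance)
```

A sample mean and variance that agree with each other is the usual quick sanity check for a Poisson count.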
-
Book on Mining Massive Datasets
http://i.stanford.edu/~ullman/mmds.html It’s a free ebook in PDF. At first, I just shared the link on Twitter without reading it. Later, after reading some chapters, I realized that this book covers algorithms for massive data sets really nicely. Some concepts, for example, combiners (ch. 2), shingling and minhashing (ch. 3), Bloom filters (ch. 4), association rules (ch. 6), etc., are must-know concept…
Tags:
-
Sampling with/without replacement
If samples are taken without replacement (i.e., a sample taken is not put back into the bin) from a population, then the number of samples of a given kind follows a hypergeometric distribution. In this case, the samples are not independent of each other. For example, if I take 1 out of the bin containing 1, 1, 2, 2, 3, then there’s no choice but…
Tags:
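For the sampling entry above, the difference between the two schemes can be made concrete with the bin 1, 1, 2, 2, 3. This is a minimal sketch under my own assumptions: it counts how many 1s appear in two draws, using the hypergeometric formula for draws without replacement and the binomial formula for draws with replacement. The function names and the choice of two draws are illustrative.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing `draws` items without replacement."""
    if k < 0 or k > draws:
        return 0.0
    # comb() returns 0 when asked for more items than are available.
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def binom_pmf(k, draws, p):
    """P(exactly k successes) when drawing `draws` items with replacement."""
    if k < 0 or k > draws:
        return 0.0
    return comb(draws, k) * p ** k * (1 - p) ** (draws - k)


if __name__ == "__main__":
    # Bin 1, 1, 2, 2, 3: five items, two of which are 1s; draw two items.
    for k in range(3):
        without = hypergeom_pmf(k, population=5, successes=2, draws=2)
        with_repl = binom_pmf(k, draws=2, p=2 / 5)
        print(k, without, with_repl)
```

Drawing both 1s is less likely without replacement, because the first 1 removed from the bin lowers the chance of seeing another.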
-
Hypothesis Testing History
Hypothesis testing was developed by Student (Student’s t-test), followed by Fisher, and then by Neyman and Pearson. “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” by E. L. Lehmann discusses them.
Tags:
-
Linear Regression in R
Let’s build artificial data: Now, y is 3*x + error (where the errors are independent and normally distributed). Build a model: This shows that yhat = 3.0 * x - 1.168e-05. Let’s check the goodness of this model: Adjusted R-squared says that 100% of the variation is explained by this linear regression, and we can see that t value…
Tags:
-
ANOVA is Linear Regression.
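Why ANOVA and Linear Regression Are the Same Analysis: They’re just the same thing presented differently. Testing whether the means of groups A, B, and C are different = testing whether a and b are statistically significant in Y = a*A + b*B + C, where A and B are 0/1 indicator variables and the constant C is the mean of the third (reference) group.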
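The coefficient side of this equivalence can be checked numerically. This is a minimal sketch under stated assumptions: A and B are treated as 0/1 indicator variables, the constant is an intercept, the three groups of numbers are made up for illustration, and the least-squares fit is done by solving the normal equations with a small hand-written solver rather than a statistics library. If the equivalence holds, the intercept should equal the mean of group C, and a and b should equal the differences of the A and B means from it.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_dummy_regression(groups):
    """Least-squares fit of Y = a*A + b*B + c with A, B as 0/1 group indicators."""
    rows, ys = [], []
    for label, values in groups.items():
        for y in values:
            rows.append([1.0 if label == "A" else 0.0,
                         1.0 if label == "B" else 0.0,
                         1.0])  # last column is the intercept
            ys.append(y)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def mean(values):
    return sum(values) / len(values)


# Made-up measurements for three groups (illustrative only).
groups = {
    "A": [5.1, 4.9, 5.6, 5.3],
    "B": [6.2, 6.8, 6.5, 6.1],
    "C": [4.0, 4.4, 3.9, 4.1],
}
a, b, c = fit_dummy_regression(groups)

if __name__ == "__main__":
    print(a, mean(groups["A"]) - mean(groups["C"]))
    print(b, mean(groups["B"]) - mean(groups["C"]))
    print(c, mean(groups["C"]))
```

Since a and b are exactly the differences in group means, asking whether they are zero is the same question as asking whether the group means differ.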
Tags:
-
Generative model VS Discriminative model
bayesian – Generative vs discriminant models – Statistical Analysis – Stack Exchange
Tags:
-
Logistic Regression for iris in R
See which data points have odds larger than 1. Check out the model. References: 1. http://www.stat.cmu.edu/~cshalizi/490/clustering/clustering01.r
Tags:
-
Generating unique id
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram Use time + shard id + autoincrement.
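A minimal sketch of the time + shard id + autoincrement idea, packing the three parts into one 64-bit integer. The bit widths (41 for milliseconds, 13 for the shard, 10 for the sequence) follow the layout in the linked post as I recall it, so treat them as an assumption. The epoch value and the function names are hypothetical, chosen only for illustration.

```python
TIME_BITS = 41    # milliseconds since a custom epoch (assumed width)
SHARD_BITS = 13   # logical shard id (assumed width)
SEQ_BITS = 10     # per-shard auto-increment, wrapped to fit (assumed width)

EPOCH_MS = 1_300_000_000_000  # hypothetical custom epoch, in Unix milliseconds


def make_id(now_ms, shard_id, sequence):
    """Pack time, shard id, and sequence into one integer, time in the high bits."""
    elapsed = now_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the representable range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard id outside the representable range")
    seq = sequence % (1 << SEQ_BITS)  # wrap the auto-increment counter
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(unique_id):
    """Recover (timestamp in ms, shard id, wrapped sequence) from a packed id."""
    seq = unique_id & ((1 << SEQ_BITS) - 1)
    shard_id = (unique_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = unique_id >> (SHARD_BITS + SEQ_BITS)
    return elapsed + EPOCH_MS, shard_id, seq


if __name__ == "__main__":
    uid = make_id(EPOCH_MS + 1, shard_id=5, sequence=7)
    print(uid, split_id(uid))
```

Putting the time in the high bits means ids sort roughly by creation time, while the shard id and sequence keep ids generated in the same millisecond distinct.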
Tags:
-
Neural net for iris in R
iris3 looks like this: Merge them into a two-dimensional matrix: Now, ir looks like this: And we have a total of 150 rows, where 1:50 is Setosa, 51:100 is Versicolor, and 101:150 is Virginica. Represent that using class.ind: In this example, we split the data into two parts. One is for training and the other is for testing:…
Tags: