Tag: statistics
PCA tutorial
http://www.snl.salk.edu/~shlens/pca.pdf The easiest explanation of PCA among the several documents I’ve read. It’s written by Jonathon Shlens.
Folded normal distribution – Wikipedia, the free encyclopedia
http://en.m.wikipedia.org/wiki/Folded_normal_distribution If X is a normally distributed random variable, then |X| follows a folded normal distribution. The fold is at zero, but the mean of X can be anywhere. If the fold falls where the CDF of X is 0.5 (that is, the mean of X is 0), the result is called a half-normal distribution.
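A minimal sketch of the half-normal case, using only the Python standard library. It draws X from a zero-mean normal, folds it with abs(), and compares the sample mean with the known half-normal mean, sigma * sqrt(2 / pi). The value of sigma, the sample size, and the seed are illustrative assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

sigma = 2.0   # illustrative standard deviation (an assumption)
n = 200_000   # illustrative sample size (an assumption)

# Draw X ~ Normal(0, sigma) and fold at zero by taking |X|.
folded = [abs(random.gauss(0.0, sigma)) for _ in range(n)]

sample_mean = sum(folded) / n
# Theoretical mean of a half-normal distribution: sigma * sqrt(2 / pi)
theory_mean = sigma * math.sqrt(2.0 / math.pi)

print(f"sample mean of |X|: {sample_mean:.3f}")
print(f"half-normal theory: {theory_mean:.3f}")
```

The two printed numbers should be close; they would drift apart if the mean of X were moved away from 0, which is the general folded normal case.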
Neuralnet for XOR
Let’s use caret to find a good number of hidden nodes. Below, I needed many data rows so that the resampling method (e.g., k-fold CV) has enough data in each fold. (If k=5 or 10, how could we run k-fold CV on just 4 data rows?) We may choose to instantiate trainControl, but…
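The data-size problem can be sketched in plain Python, independent of caret. The four XOR rows cannot fill 5 non-empty folds, so the rows are replicated before splitting. The replication count (25) and the round-robin fold assignment are illustrative assumptions, not what caret does internally.

```python
import random

random.seed(0)

# The four XOR rows: (x1, x2, label)
xor_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

k = 5
enough_rows = len(xor_rows) >= k  # False: 4 rows cannot fill 5 non-empty folds
print("enough rows for k-fold:", enough_rows)

# Replicate the rows (25 copies is an arbitrary illustrative choice).
data = xor_rows * 25
random.shuffle(data)

# Deal the shuffled rows round-robin into k folds.
folds = [data[i::k] for i in range(k)]

for i, held_out in enumerate(folds):
    # Train on every fold except the held-out one.
    train = [row for j, fold in enumerate(folds) if j != i for row in fold]
    print(f"fold {i}: train={len(train)} rows, held out={len(held_out)} rows")
```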
Caret package in R
http://caret.r-forge.r-project.org/Classification_and_Regression_Training.html The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for: • data splitting • pre-processing • model tuning using resampling • variable importance estimation as well as other functionality. There are many different modeling functions…
Permutation Test
A permutation test is a way of getting a p-value using randomization, without assuming a particular distribution for the data. The basic idea is simple. Suppose that we want to see whether y = ax + b + error holds, where x is 0 or 1. In other words, we’re interested in whether the mean of y differs depending…
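A minimal sketch of the idea for the two-group case (x = 0 versus x = 1), using only the standard library. The function name permutation_test, the data values, the number of permutations, and the choice of a two-sided test on the difference in means are all assumptions made for illustration.

```python
import random

def permutation_test(y0, y1, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(y1) / len(y1) - sum(y0) / len(y0))
    pooled = list(y0) + list(y1)
    n0 = len(y0)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any real link between group label and y.
        rng.shuffle(pooled)
        g0, g1 = pooled[:n0], pooled[n0:]
        diff = abs(sum(g1) / len(g1) - sum(g0) / len(g0))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)

# Made-up illustrative data: y values for x = 0 and for x = 1
y_x0 = [4.1, 5.0, 4.8, 5.3, 4.6, 5.1]
y_x1 = [5.9, 6.4, 5.7, 6.8, 6.1, 6.5]
p = permutation_test(y_x0, y_x1)
print(f"p-value: {p:.4f}")
```

The p-value is the share of label shufflings that produce a mean difference at least as large as the observed one, so no distributional assumption about y is needed.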
R^2 without intercept is not what you want
The R documentation for lm gives an example that fits two models, lm.D9 and lm.D90, but the doc does not explain the difference between them. The difference is that lm.D9 has an intercept (like weight = intercept + beta * group) while lm.D90 does not (weight = beta * group). But this is only a small part of the difference. If you look at…
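A minimal sketch of one way the two fits can report very different R^2 values. It assumes group is a two-level factor, so both models predict each group's mean, and it assumes the usual convention (the one R's summary of an lm fit follows) that a no-intercept model is scored against the uncentered total sum of squares. The data are hypothetical toy values, not the data from the R example.

```python
# Hypothetical toy data: response values for two groups.
ctrl = [4.2, 5.6, 4.5, 6.1, 5.2]
trt = [4.8, 4.2, 5.9, 3.6, 4.9]
y = ctrl + trt

mean_ctrl = sum(ctrl) / len(ctrl)
mean_trt = sum(trt) / len(trt)
# With a two-level group factor, both parameterisations predict the group mean.
fitted = [mean_ctrl] * len(ctrl) + [mean_trt] * len(trt)

ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
y_bar = sum(y) / len(y)
ss_tot_centered = sum((yi - y_bar) ** 2 for yi in y)  # baseline: predict the mean
ss_tot_uncentered = sum(yi ** 2 for yi in y)          # baseline: predict zero

r2_with_intercept = 1 - ss_res / ss_tot_centered
r2_without_intercept = 1 - ss_res / ss_tot_uncentered

print(f"R^2 with intercept:    {r2_with_intercept:.3f}")
print(f"R^2 without intercept: {r2_without_intercept:.3f}")
```

The residuals are identical in both cases; only the baseline changes. Comparing against "predict zero" instead of "predict the mean" inflates R^2 whenever the response sits far from zero.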
error != residual
Errors and residuals in statistics – Wikipedia, the free encyclopedia [quote]The error of a sample is the deviation of the sample from the (unobservable) true function value, while the residual of a sample is the difference between the sample and the estimated function value.[/quote]
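The distinction can be sketched with a simulation, where the true function is known only because we chose it. The true function f(x) = 1 + 2x, the noise level, and the seed are assumptions for illustration; the line is fitted by ordinary least squares computed by hand.

```python
import random

random.seed(1)

# Assumed true function (unobservable in practice): f(x) = 1 + 2x
def true_f(x):
    return 1.0 + 2.0 * x

xs = [i / 10 for i in range(50)]
ys = [true_f(x) + random.gauss(0.0, 1.0) for x in xs]

# Ordinary least squares fit of y = a + b*x, computed by hand.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

errors = [y - true_f(x) for x, y in zip(xs, ys)]       # vs. the true function
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # vs. the estimated function

print(f"sum of errors:    {sum(errors):.4f}")     # generally not zero
print(f"sum of residuals: {sum(residuals):.4f}")  # numerically zero for OLS with an intercept
```

The residuals sum to zero by construction of the fit, while the errors do not have to, which is one concrete way to see that the two are different quantities.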
Multiple comparison in R
R: Adjust P-values for Multiple Comparisons Adjust p-values using, e.g., the Bonferroni correction in multiple comparisons.
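A minimal sketch of the Bonferroni correction itself: each p-value is multiplied by the number of tests and capped at 1. The function name bonferroni and the raw p-values are made up for illustration; this is a plain Python rendering of the arithmetic, not the R function.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

raw = [0.01, 0.04, 0.03, 0.5]  # made-up p-values
adjusted = bonferroni(raw)
print(adjusted)
```

With four tests, a raw p-value has to be below 0.0125 to stay under 0.05 after adjustment.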
Scorecard is logistic regression
A scorecard is a table used to compute, for example, a person’s credit score. For example, add 10 if age < 30, add 20 if 30 <= age < 40, add 30 if age >= 40, add 10 if he/she does not own a house, and add 20 if he/she owns a house. The grand sum of this process…
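A minimal sketch of why a scorecard has the same shape as a regression's linear predictor: the table lookup equals a weighted sum of 0/1 indicator features. The function names are hypothetical, and the age bands are read as non-overlapping (under 30, 30 to 39, 40 and over), which is an assumption about the intended table.

```python
def scorecard_points(age, owns_house):
    """Look up points from the table, as a scorecard would."""
    if age < 30:
        points = 10
    elif age < 40:
        points = 20
    else:
        points = 30
    points += 20 if owns_house else 10
    return points

def linear_score(age, owns_house):
    """Same score written as a weighted sum of 0/1 indicator features."""
    features = [age < 30, 30 <= age < 40, age >= 40, not owns_house, owns_house]
    weights = [10, 20, 30, 10, 20]
    return sum(w * int(f) for w, f in zip(weights, features))

for age, owns in [(25, False), (35, True), (50, True)]:
    print(age, owns, scorecard_points(age, owns), linear_score(age, owns))
```

In a fitted logistic regression the weights would be estimated coefficients on the same kind of dummy variables, rather than hand-picked points.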
svm tool
LIBSVM -- A Library for Support Vector Machines includes an introduction for beginners and a Python tool.