Passion is like genius; a miracle.

Tag: statistics

PCA tutorial

http://www.snl.salk.edu/~shlens/pca.pdf The easiest explanation of PCA among the several documents I’ve read. It’s written by Jonathon Shlens.

December 27, 2011
Folded normal distribution – Wikipedia, the free encyclopedia

http://en.m.wikipedia.org/wiki/Folded_normal_distribution If X is a normally distributed random variable, then |X| follows a folded normal distribution. The fold can fall anywhere relative to the mean of X. But if the fold falls exactly at the mean, where the CDF is 0.5 (i.e., X has mean 0), the result is called a half-normal distribution.
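A quick way to convince yourself is to simulate it. The sketch below is my own illustration, not something from the Wikipedia article: it draws normal samples with Python's standard `random` module, takes absolute values, and compares the sample mean of the mean-0 case with the known half-normal mean, sigma * sqrt(2 / pi). The function name, seed, and sample size are arbitrary choices.

```python
import math
import random


def sample_folded_normal(mu, sigma, n, seed=0):
    """Draw n samples of |X| where X ~ Normal(mu, sigma^2)."""
    rng = random.Random(seed)
    return [abs(rng.gauss(mu, sigma)) for _ in range(n)]


# Mean 0: the fold sits at the mean, so this is the half-normal case.
half = sample_folded_normal(0.0, 1.0, 200_000)
half_mean = sum(half) / len(half)

# Known mean of a half-normal with sigma = 1.
expected_half_mean = math.sqrt(2.0 / math.pi)

# Mean 3: still a folded normal, but the fold is far out in the left tail.
shifted = sample_folded_normal(3.0, 1.0, 200_000)
shifted_mean = sum(shifted) / len(shifted)

print(half_mean, expected_half_mean, shifted_mean)
```

With the mean at 3, very little mass gets folded over, so the sample mean stays close to 3.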

December 27, 2011
Neuralnet for XOR

Let’s use caret to find a better number of hidden nodes. In the code below, I needed many data rows so that the resampling method, e.g., k-fold CV, has enough data to work with. (i.e., if k=5 or 10, how can we run k-fold CV using just 4 data rows?) We may choose to instantiate trainControl, but…

December 22, 2011
Caret package in R

http://caret.r-forge.r-project.org/Classification_and_Regression_Training.html The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for: • data splitting • pre-processing • model tuning using resampling • variable importance estimation as well as other functionality. There are many different modeling functions…

December 21, 2011
Permutation Test

A permutation test is a way of getting a p-value by randomization, without assuming a particular distribution for the data. The basic idea is simple. Suppose that we want to see if y = ax + b + error holds, where x is 0 or 1. In other words, we’re interested in whether the mean of y differs depending…
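The idea can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not code from the post: the test statistic is the absolute difference in group means, the labels x are shuffled 10,000 times, and the data values are made up. The function name `permutation_p_value` is hypothetical.

```python
import random


def permutation_p_value(y, x, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference in mean y
    between the x == 1 group and the x == 0 group."""
    rng = random.Random(seed)

    def mean_diff(labels):
        g1 = [yi for yi, xi in zip(y, labels) if xi == 1]
        g0 = [yi for yi, xi in zip(y, labels) if xi == 0]
        return sum(g1) / len(g1) - sum(g0) / len(g0)

    observed = abs(mean_diff(x))
    labels = list(x)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any link between x and y
        # while keeping the group sizes fixed.
        rng.shuffle(labels)
        if abs(mean_diff(labels)) >= observed:
            hits += 1
    # Add-one correction: count the observed labelling as one permutation.
    return (hits + 1) / (n_perm + 1)


# Made-up data: the x == 1 group is clearly shifted upward.
y = [5.1, 4.9, 5.6, 5.8, 6.0, 7.9, 8.3, 8.1, 7.6, 8.8]
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p = permutation_p_value(y, x)
print(p)
```

A small p-value means that very few random relabellings produce a mean difference as large as the observed one.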

December 21, 2011
R^2 without intercept is not what you want

In R, the help page for lm (?lm) gives an example that fits two models, lm.D9 and lm.D90. But the doc does not explain the difference between them. Their difference is that lm.D9 has an intercept (like weight = intercept + beta * group) while lm.D90 does not (weight = beta * group). But this is only a small part of the difference. If you look at…
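One part of the difference can be shown numerically. As far as I know, R's summary.lm measures R^2 against the mean of y when the model has an intercept, and against zero when it does not. The Python sketch below is my own illustration with made-up weights (not the data from the R help page): both models end up predicting the group means, so the fitted values are identical, yet the two R^2 definitions give very different numbers.

```python
def r_squared(y, fitted, centered):
    """R^2 = 1 - RSS / TSS.

    centered=True  : TSS is measured around mean(y) (model with intercept).
    centered=False : TSS is measured around 0 (model without intercept).
    """
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    baseline = sum(y) / len(y) if centered else 0.0
    tss = sum((yi - baseline) ** 2 for yi in y)
    return 1.0 - rss / tss


# Made-up plant weights for a control and a treatment group.
ctl = [4.2, 5.6, 5.2, 6.1, 4.5]
trt = [4.8, 4.2, 4.4, 3.6, 5.9]
y = ctl + trt

# With a single two-level group factor, both parameterizations
# fit each observation with its own group mean.
fitted = [sum(ctl) / len(ctl)] * len(ctl) + [sum(trt) / len(trt)] * len(trt)

r2_with_intercept = r_squared(y, fitted, centered=True)
r2_without_intercept = r_squared(y, fitted, centered=False)
print(r2_with_intercept, r2_without_intercept)
```

The uncentered version looks far better only because it gets credit for explaining that the weights are not zero.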

December 19, 2011
error != residual

Errors and residuals in statistics – Wikipedia, the free encyclopedia [quote]The error of a sample is the deviation of the sample from the (unobservable) true function value, while the residual of a sample is the difference between the sample and the estimated function value.[/quote]
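A tiny simulation makes the distinction concrete. The sketch below is my own illustration: it assumes a true function f(x) = 2x + 1 and a fixed, made-up noise vector, fits a straight line by ordinary least squares, and then computes both quantities. In real data the true function is unknown, so only the residuals can be computed.

```python
def true_f(x):
    """The (normally unobservable) true function; an assumption for this demo."""
    return 2.0 * x + 1.0


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.3, -0.1, 0.4, 0.2, -0.3]  # made-up disturbances
ys = [true_f(x) + e for x, e in zip(xs, noise)]

# Ordinary least squares fit of y = intercept + slope * x.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# Error: deviation from the true function value.
errors = [y - true_f(x) for x, y in zip(xs, ys)]
# Residual: deviation from the estimated function value.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(sum(errors), sum(residuals))
```

The residuals of a least-squares fit with an intercept sum to zero by construction, while the errors here do not.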

December 19, 2011
Multiple comparison in R

R: Adjust P-values for Multiple Comparisons Adjust p-values using, e.g., the Bonferroni correction in multiple comparisons.
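For reference, the Bonferroni correction itself is simple enough to write by hand: multiply each p-value by the number of comparisons and cap the result at 1. The Python sketch below is my own illustration of that rule with made-up p-values; it is not the R implementation.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1, where m is
    the number of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.01, 0.02, 0.03, 0.5]  # made-up raw p-values from 4 tests
adjusted = bonferroni(raw)
print(adjusted)
```

After adjustment, only the first of these four tests would still pass a 0.05 threshold.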

December 4, 2011
Scorecard is logistic regression

A scorecard is a table used to compute, for example, the credit score of a person. For example, add 10 if age < 30, add 20 if 30 <= age < 40, add 30 if age >= 40, add 10 if he/she does not own a house, and add 20 if he/she owns a house. The grand sum of this process…
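Written as code, the table is just a weighted sum of 0/1 indicators, which is the same linear form as the linear predictor of a logistic regression with dummy variables. The Python sketch below is my own illustration: the point values come from the example above, while the age bands (under 30, 30 to 39, 40 and over) and all names are my assumptions.

```python
# Points per attribute; each key is a 0/1 indicator (names are hypothetical).
WEIGHTS = {
    "age_under_30": 10,
    "age_30_to_39": 20,
    "age_40_plus": 30,
    "no_house": 10,
    "owns_house": 20,
}


def indicators(age, owns_house):
    """Turn a person into 0/1 dummy variables, one per scorecard row."""
    return {
        "age_under_30": int(age < 30),
        "age_30_to_39": int(30 <= age < 40),
        "age_40_plus": int(age >= 40),
        "no_house": int(not owns_house),
        "owns_house": int(owns_house),
    }


def score(age, owns_house):
    """Scorecard total: sum of weight * indicator over all rows."""
    dummies = indicators(age, owns_house)
    return sum(WEIGHTS[name] * value for name, value in dummies.items())


print(score(25, False), score(35, True), score(50, True))
```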

November 21, 2011
svm tool

LIBSVM -- A Library for Support Vector Machines includes an introduction for beginners and a Python tool.

November 15, 2011