Passion is like genius; a miracle. – Page 31 – Blog on Software, Statistics, and Quant

Folded normal distribution – Wikipedia, the free encyclopedia

http://en.m.wikipedia.org/wiki/Folded_normal_distribution If X is a normally distributed random variable, then |X| follows a folded normal distribution. The fold at zero can fall anywhere relative to the mean of X. But if it falls exactly at the mean (i.e., X has mean 0, the point where the CDF, not the pdf, is 0.5), the result is called the half-normal distribution.
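A quick simulation can make the folding concrete. The sketch below is only an illustration: the function name `folded_normal_sample`, the sample size, and the seed are all assumptions. It draws normal values, takes absolute values, and compares the mean of the half-normal case (mean 0) with the known value sigma * sqrt(2 / pi).

```python
import math
import random


def folded_normal_sample(mu, sigma, n, seed=0):
    """Draw n values of |X| where X ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [abs(rng.gauss(mu, sigma)) for _ in range(n)]


# Half-normal case: the fold at zero coincides with the mean of X.
half = folded_normal_sample(mu=0.0, sigma=1.0, n=200_000)
half_mean = sum(half) / len(half)

# Known mean of the half-normal distribution with sigma = 1.
expected_half_mean = math.sqrt(2.0 / math.pi)

# General folded normal: the fold is away from the mean of X.
shifted = folded_normal_sample(mu=1.0, sigma=1.0, n=200_000)
shifted_mean = sum(shifted) / len(shifted)

print("half-normal sample mean:", round(half_mean, 3))
print("theoretical mean       :", round(expected_half_mean, 3))
print("folded (mu=1) mean     :", round(shifted_mean, 3))
```

With mean 0 the two halves of the density overlap exactly after folding, which is why this special case gets its own name.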

December 27, 2011

Tags:

statistics
Neuralnet for XOR

Let’s use caret to find a better number of hidden nodes. In the code below, I needed many data rows so that the resampling method, e.g., k-fold CV, has enough data to work with. (If k=5 or 10, how can we run k-fold CV using just 4 data rows?) We may choose to instantiate trainControl, but…
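The data-size problem can be sketched without caret. The following is a plain Python stand-in, not the R code from the post: `replicate`, `k_fold_indices`, and the replication factor of 25 are hypothetical choices for illustration. It shows that 4 XOR rows cannot be split into 5 or 10 folds, while a replicated table can.

```python
import random

# The XOR truth table as (x1, x2, y) rows.
XOR_ROWS = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def replicate(rows, times):
    """Repeat every row `times` times so resampling has enough data."""
    return [row for row in rows for _ in range(times)]


def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint folds of near-equal size."""
    if k > n:
        raise ValueError("cannot build %d folds from %d rows" % (k, n))
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


data = replicate(XOR_ROWS, 25)            # 100 rows instead of 4
folds = k_fold_indices(len(data), k=10)   # 10 folds of 10 rows each

for held_out in folds[:2]:
    train = [i for fold in folds if fold is not held_out for i in fold]
    print("train rows:", len(train), "held-out rows:", len(held_out))
```

Replicating rows does not add information; it only lets the resampling machinery run on a problem whose full input space has four points.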

December 22, 2011

Tags:

statistics
Caret package in R

http://caret.r-forge.r-project.org/Classification_and_Regression_Training.html The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for: • data splitting • pre-processing • model tuning using resampling • variable importance estimation as well as other functionality. There are many different modeling functions…

December 21, 2011

Tags:

statistics
Permutation Test

A permutation test is a way of getting a p-value using randomization, without assuming a particular distribution for the data. The basic idea is simple. Suppose that we want to see if y = ax + b + error holds where x is 0 or 1. In other words, we’re interested in whether the mean of y differs depending…
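As a rough sketch of the idea for the two-group case (x is 0 or 1), the code below shuffles the pooled y values and counts how often a shuffled difference in means is at least as large as the observed one. The data values, the function name `permutation_test`, and the number of permutations are made up for illustration. The "+1" in the estimate is a common convention that keeps the p-value away from exactly zero.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(y0, y1, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(y1) - mean(y0))
    pooled = list(y0) + list(y1)
    n0 = len(y0)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any link between group label and y.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n0:]) - mean(pooled[:n0]))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: y for x = 0 and y for x = 1.
y_x0 = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
y_x1 = [6.2, 6.8, 5.9, 7.1, 6.5, 6.0]

p_value = permutation_test(y_x0, y_x1)
print("permutation p-value:", p_value)
```

No normality assumption is used anywhere: the reference distribution comes entirely from relabeling the observed data.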

December 21, 2011

Tags:

statistics
R^2 without intercept is not what you want

In R, ?lm gives you an example that fits two models, lm.D9 and lm.D90. But the doc does not explain the difference between them. Their difference is that lm.D9 has an intercept (like weight = intercept + beta * group) while lm.D90 does not (weight = beta * group). But this is only a small part of the difference. If you look at…
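One well-known consequence, which may or may not be the point the truncated excerpt goes on to make, is how R^2 is computed. R's summary for a model without an intercept measures the total sum of squares around zero instead of around the mean of y. The sketch below uses made-up weights (not the data from the R help page) and a two-group model, where both parameterizations produce the same fitted values (the group means) yet very different R^2.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical plant weights for two groups.
ctl = [4.2, 5.6, 5.2, 6.1, 4.5]
trt = [4.8, 4.2, 4.4, 3.6, 5.9]
y = ctl + trt

# With a two-level group factor, the model with an intercept and the
# model without one both fit each observation by its group mean.
fitted = [mean(ctl)] * len(ctl) + [mean(trt)] * len(trt)

rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# With an intercept: total sum of squares around the mean of y.
tss_centered = sum((yi - mean(y)) ** 2 for yi in y)
# Without an intercept: total sum of squares around zero.
tss_uncentered = sum(yi ** 2 for yi in y)

r2_centered = 1.0 - rss / tss_centered
r2_uncentered = 1.0 - rss / tss_uncentered

print("R^2 with intercept   :", round(r2_centered, 3))
print("R^2 without intercept:", round(r2_uncentered, 3))
```

The fit is identical in both cases; only the baseline changes, so the larger R^2 of the no-intercept model says nothing about a better model.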

December 19, 2011

Tags:

statistics
error != residual

Errors and residuals in statistics – Wikipedia, the free encyclopedia: "The error of a sample is the deviation of the sample from the (unobservable) true function value, while the residual of a sample is the difference between the sample and the estimated function value."
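A tiny simulation shows the distinction. Here the true mean is known only because the data is simulated; the values 10.0 and 2.0, the sample size, and the seed are arbitrary choices for illustration. Errors are measured against the true mean, residuals against the estimated (sample) mean.

```python
import random

rng = random.Random(0)

true_mean = 10.0  # unobservable in practice; known here by construction
sample = [rng.gauss(true_mean, 2.0) for _ in range(5)]
sample_mean = sum(sample) / len(sample)

# Error: deviation from the true value.
errors = [x - true_mean for x in sample]
# Residual: deviation from the estimated value.
residuals = [x - sample_mean for x in sample]

print("sum of errors   :", sum(errors))
print("sum of residuals:", sum(residuals))
```

The residuals sum to zero (up to floating-point rounding) by construction, while the errors generally do not, so residuals are not independent even when errors are.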

December 19, 2011

Tags:

statistics
Multiple comparison in R

R: Adjust P-values for Multiple Comparisons. Adjust p-values using, e.g., the Bonferroni correction when making multiple comparisons.
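For reference, the Bonferroni adjustment itself is one line: multiply each p-value by the number of comparisons and cap the result at 1. The Python sketch below is a stand-in for illustration, not the R function; the name `bonferroni` and the example p-values are assumptions.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.01, 0.04, 0.5]  # hypothetical raw p-values from 3 tests
adjusted = bonferroni(raw)

for p, q in zip(raw, adjusted):
    print("raw p = %.2f -> adjusted p = %.2f" % (p, q))
```

Comparing the adjusted values with the usual threshold is equivalent to comparing the raw values with threshold / m.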

December 4, 2011

Tags:

statistics
Guessing user profile in social network

http://www.ccs.neu.edu/home/amislove/publications/Inferring-WSDM.pdf Friends share social attributes, e.g., school. Thus, even if you hide your profile, it could be predicted from your friends’ profiles. This is so true… When I was looking for people to follow on Twitter, I started from some engineers I knew of. After some time, I was able to follow many people working…

December 3, 2011

Tags:

software
Scorecard is logistic regression

A scorecard is a table used to compute, for example, the credit score of a person. For example, add 10 if age < 30, add 20 if 30 <= age < 40, add 30 if age >= 40, add 10 if he/she does not own a house, and add 20 if he/she owns a house. The grand sum of this process…
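The connection to logistic regression can be sketched as follows: each scorecard row is a 0/1 indicator variable, each point value acts like a coefficient, and the total score is the resulting weighted sum. The code reads the age bands as under 30, 30 to 39, and 40 or older, which is an assumption. The `offset` and `scale` used to map a score to a probability are purely hypothetical; a real scorecard would obtain them from a fitted model.

```python
import math

# Points per attribute; these play the role of regression coefficients.
POINTS = {
    "age_lt_30": 10,
    "age_30_to_39": 20,
    "age_ge_40": 30,
    "no_house": 10,
    "owns_house": 20,
}


def indicators(age, owns_house):
    """Encode a person as 0/1 dummy variables, one per scorecard row."""
    return {
        "age_lt_30": int(age < 30),
        "age_30_to_39": int(30 <= age < 40),
        "age_ge_40": int(age >= 40),
        "no_house": int(not owns_house),
        "owns_house": int(owns_house),
    }


def score(age, owns_house):
    """Scorecard total: a dot product of indicators and points."""
    x = indicators(age, owns_house)
    return sum(POINTS[name] * x[name] for name in POINTS)


def probability(total, offset=40.0, scale=10.0):
    """Hypothetical logistic link from score to probability."""
    return 1.0 / (1.0 + math.exp(-(total - offset) / scale))


for age, owns in [(25, False), (35, True), (45, True)]:
    s = score(age, owns)
    print(age, owns, "score:", s, "probability:", round(probability(s), 3))
```

Seen this way, building a scorecard amounts to choosing bins for each variable and estimating one coefficient per bin.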

November 21, 2011

Tags:

statistics
svm tool

LIBSVM -- A Library for Support Vector Machines includes an introduction for beginners and a Python tool.

November 15, 2011

Tags:

statistics