• Folded normal distribution – Wikipedia, the free encyclopedia

    http://en.m.wikipedia.org/wiki/Folded_normal_distribution If X is a normally distributed random variable, then |X| follows a folded normal distribution. The fold can be made at any point. If the fold is made at the mean, i.e., where the CDF is 0.5 (a zero-mean normal folded at 0), the result is called a half-normal distribution.
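
A minimal sketch of the folding idea, using only the Python standard library. The parameter values (mu = 1, sigma = 2) and the helper names are assumptions chosen for illustration. It draws normal samples, takes absolute values, and compares the sample mean with the closed-form mean of the folded normal.

```python
import math
import random
from statistics import NormalDist, fmean


def folded_normal_mean(mu: float, sigma: float) -> float:
    """Closed-form mean of |X| for X ~ N(mu, sigma^2)."""
    std_normal_cdf = NormalDist().cdf
    return (
        sigma * math.sqrt(2 / math.pi) * math.exp(-mu**2 / (2 * sigma**2))
        + mu * (1 - 2 * std_normal_cdf(-mu / sigma))
    )


def sample_folded(mu: float, sigma: float, n: int, seed: int = 0) -> list:
    """Draw n normal samples and fold them with abs()."""
    rng = random.Random(seed)
    return [abs(rng.gauss(mu, sigma)) for _ in range(n)]


# Hypothetical parameters, chosen only for illustration.
samples = sample_folded(mu=1.0, sigma=2.0, n=200_000)
empirical = fmean(samples)
theoretical = folded_normal_mean(1.0, 2.0)

# With mu = 0 the fold is at the mean, which is the half-normal case;
# the formula then reduces to sigma * sqrt(2 / pi).
half_normal_mean = folded_normal_mean(0.0, 2.0)

print(f"empirical mean of |X|: {empirical:.3f}")
print(f"theoretical mean:      {theoretical:.3f}")
print(f"half-normal mean:      {half_normal_mean:.3f}")
```

The two means should agree closely for a large sample; the exact gap depends on the seed.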

  • Neuralnet for XOR

    Let’s use caret to find a better number of hidden nodes. Below, I needed many data rows so that the resampling method, e.g., k-fold CV, has enough data to work with. (If k = 5 or 10, how can we run k-fold CV on just 4 data rows?) We may choose to instantiate trainControl, but…

  • Caret package in R

    http://caret.r-forge.r-project.org/Classification_and_Regression_Training.html The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for: • data splitting • pre-processing • model tuning using resampling • variable importance estimation as well as other functionality. There are many different modeling functions…

  • Permutation Test

    A permutation test is a way of getting a p-value by randomization, without assuming a particular distribution for the data. The basic idea is simple. Suppose that we want to see whether y = ax + b + error holds, where x is 0 or 1. In other words, we’re interested in whether the mean of y differs depending…
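
The idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the post's original code: the data are made up, the test statistic is the difference in group means, and the function name `permutation_test` is hypothetical.

```python
import random
from statistics import fmean


def permutation_test(y, x, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    y: numeric outcomes; x: group labels (0 or 1), same length as y.
    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)

    def mean_diff(labels):
        group1 = [yi for yi, li in zip(y, labels) if li == 1]
        group0 = [yi for yi, li in zip(y, labels) if li == 0]
        return fmean(group1) - fmean(group0)

    observed = mean_diff(x)
    labels = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between x and y
        if abs(mean_diff(labels)) >= abs(observed):
            extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data: the x = 1 group is shifted upward by 3.
y = [0.1 * i for i in range(10)] + [3 + 0.1 * i for i in range(10)]
x_effect = [0] * 10 + [1] * 10
x_null = [0, 1] * 10  # labels unrelated to the shift

diff_effect, p_effect = permutation_test(y, x_effect)
diff_null, p_null = permutation_test(y, x_null)
print(f"shifted groups: diff={diff_effect:.2f}, p={p_effect:.4f}")
print(f"mixed labels:   diff={diff_null:.2f}, p={p_null:.4f}")
```

A small p-value means that few random relabelings produce a difference as large as the observed one.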

  • R^2 without intercept is not what you want

    In R, the lm documentation gives an example that fits two models, lm.D9 and lm.D90, but it does not explain the difference between them. The difference is that lm.D9 has an intercept (weight = intercept + beta * group) while lm.D90 does not (weight = beta * group). But this is only a small part of the difference. If you look at…
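
One documented part of the difference is how R's summary.lm computes R^2: with an intercept it compares the residuals against deviations from the mean of y, and without an intercept it compares them against deviations from zero. The sketch below illustrates that effect in Python. The weights are made-up values, and it assumes group is a two-level factor, so both parameterizations fit each observation by its group mean.

```python
from statistics import fmean

# Hypothetical weights for two groups (illustrative values only).
ctl = [4.2, 5.6, 5.2, 6.1, 4.5]
trt = [4.8, 4.2, 4.4, 3.6, 5.9]
y = ctl + trt

# With a two-level factor, the intercept model and the no-intercept model
# produce the same fitted values: the mean of each group.
fitted = [fmean(ctl)] * len(ctl) + [fmean(trt)] * len(trt)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

y_bar = fmean(y)
tss_centered = sum((yi - y_bar) ** 2 for yi in y)  # baseline: mean of y
tss_uncentered = sum(yi**2 for yi in y)            # baseline: zero

r2_with_intercept = 1 - rss / tss_centered
r2_without_intercept = 1 - rss / tss_uncentered

print(f"R^2 against the mean of y: {r2_with_intercept:.3f}")
print(f"R^2 against zero:          {r2_without_intercept:.3f}")
```

The residuals are identical in both cases; only the baseline changes, which is why the no-intercept R^2 looks far better without the fit being any better.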

  • error != residual

    Errors and residuals in statistics – Wikipedia, the free encyclopedia [quote]The error of a sample is the deviation of the sample from the (unobservable) true function value, while the residual of a sample is the difference between the sample and the estimated function value.[/quote]
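
A small simulation makes the distinction concrete. In the sketch below the "true" line (intercept 2, slope 3) and the noise level are assumptions made up for illustration; in real data the errors are unobservable, which is exactly the point of the quote.

```python
import random
from statistics import fmean

rng = random.Random(0)

# Hypothetical true function, known here only because we simulate the data.
TRUE_INTERCEPT, TRUE_SLOPE = 2.0, 3.0

x = [i / 10 for i in range(50)]
errors = [rng.gauss(0, 1) for _ in x]  # deviations from the TRUE line
y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + ei for xi, ei in zip(x, errors)]

# Ordinary least squares fit of y = intercept + slope * x.
x_bar, y_bar = fmean(x), fmean(y)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar

# Residuals are deviations from the ESTIMATED line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"estimated line: y = {intercept:.3f} + {slope:.3f} x")
print(f"sum of errors:    {sum(errors):.3f}")
print(f"sum of residuals: {sum(residuals):.3e}")
```

With an intercept in the model, the residuals sum to zero up to rounding, while the errors generally do not.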

  • Multiple comparison in R

    R: Adjust P-values for Multiple Comparisons. Adjust p-values using, e.g., the Bonferroni correction when making multiple comparisons.
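
For reference, the Bonferroni adjustment itself is tiny: multiply each p-value by the number of comparisons and cap the result at 1. The Python sketch below mirrors what R's p.adjust does with method = "bonferroni"; the function name and the sample p-values are made up for illustration.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: min(1, p * m) for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three comparisons.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)

for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.2f} -> adjusted p = {q:.2f}")
```

Here the second comparison is below 0.05 before adjustment but not after.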

  • Guessing user profile in social network

    http://www.ccs.neu.edu/home/amislove/publications/Inferring-WSDM.pdf Friends share social attributes, e.g., school. Thus, even if you hide your profile, it can be predicted from your friends’ profiles. This is so true… When I was looking for people to follow on Twitter, I started from some engineers I knew of. After some time, I was able to follow many people working…

  • Scorecard is logistic regression

    A scorecard is a table used to compute, for example, a person’s credit score. For example, add 10 if age < 30, add 20 if age < 40, add 30 if age >= 40, add 10 if he/she does not own a house, and add 20 if he/she owns a house. The grand sum of this process…
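
To see why such a table has the same form as a logistic regression, note that the total is a weighted sum of 0/1 indicator variables. The sketch below reads the age rule as three bins (under 30, 30 to 39, 40 and over), which is an assumption. The `offset` and `scale` that map the score to a probability are hypothetical values, not taken from any real scorecard.

```python
import math


def scorecard_points(age: int, owns_house: bool) -> int:
    """Add up points by table lookup, as a scorecard does."""
    if age < 30:
        points = 10
    elif age < 40:
        points = 20
    else:
        points = 30
    points += 20 if owns_house else 10
    return points


# The same table written as coefficients on 0/1 indicator variables.
WEIGHTS = [10, 20, 30, 10, 20]


def indicators(age: int, owns_house: bool) -> list:
    return [
        int(age < 30),
        int(30 <= age < 40),
        int(age >= 40),
        int(not owns_house),
        int(owns_house),
    ]


def linear_score(age: int, owns_house: bool) -> int:
    """Dot product of coefficients and indicators (a linear predictor)."""
    return sum(w * z for w, z in zip(WEIGHTS, indicators(age, owns_house)))


def probability(score: float, offset: float = -4.0, scale: float = 0.1) -> float:
    """Logistic link; offset and scale are hypothetical placeholders."""
    return 1 / (1 + math.exp(-(offset + scale * score)))


for age, owns in [(25, False), (35, True), (50, True)]:
    s = scorecard_points(age, owns)
    print(age, owns, s, linear_score(age, owns), round(probability(s), 3))
```

The table lookup and the dot product give the same total for every input, so the points play the role of regression coefficients.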

  • svm tool

    LIBSVM — A Library for Support Vector Machines includes an introduction for beginners and a Python tool.
