Tag: statistics

  • Rattle for exploration of variables

    I’m reading a book on Rattle. In Rattle, data exploration is easy. To see a pairs (or splom) plot, select the Explore tab, then click Execute. The following is the output for the weather dataset. The diagonal shows histograms, the upper-right panels show correlations as numbers, and the lower-left panels show scatter plots with smoothing lines. To see correlation, check…

  • Random forest for variable selection

    The randomForest package provides importance() to estimate the importance of variables. The example in the reference manual has this: In importance(), type=1 shows how much the mean squared error (for regression) increases when the values of each variable are randomly permuted in the out-of-bag data. Type 2 shows the total decrease in node impurity from splits on the variable, averaged over all trees. To visualize: To get the top three important variables: Thus…
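    The permutation idea behind the type=1 measure can be sketched in a few lines. This is a minimal Python illustration, not the randomForest implementation: the function names, the toy data, and the stand-in `model` are all hypothetical, and the real package works per tree on out-of-bag samples rather than on one fixed model.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model predictions against targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        increases.append(mse(model, permuted, targets) - baseline)
    return increases


# Toy data (assumption): the target depends only on feature 0; feature 1 is noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [3.0 * r[0] for r in rows]


def model(row):
    """Stand-in for a fitted model (hypothetical); it only uses feature 0."""
    return 3.0 * row[0]


scores = permutation_importance(model, rows, targets, n_features=2)
```

    A variable the model relies on gets a large increase in error when shuffled, while an unused variable gets an increase of zero.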

  • ROC graph 101

    Tom Fawcett, ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, HP Labs Technical Reports, 2003. This is a paper on the ROC graph, and I really enjoyed reading it. Though many ‘introduction to machine learning’ books describe the ROC curve, none of them explains it in this much depth. Starting from algorithms to…
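    As a rough illustration of what an ROC graph plots, the sketch below sweeps a threshold over classifier scores and collects (false positive rate, true positive rate) points, then integrates them with the trapezoid rule. It is a minimal Python sketch under my own assumptions (labels are 1 for positive and 0 for negative, function names are hypothetical); it is not code from the paper.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) points obtained by lowering the threshold step by step."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Highest score first, so each step admits one more example as 'positive'.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = []
    tp = fp = 0
    prev_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so ties share one point.
        if score != prev_score:
            points.append((fp / neg, tp / pos))
            prev_score = score
        if label == 1:
            tp += 1
        else:
            fp += 1
    points.append((fp / neg, tp / pos))  # final point is always (1.0, 1.0)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


perfect = roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0])
inverted = roc_points([0.9, 0.8, 0.7, 0.6], [0, 0, 1, 1])
```

    A classifier that ranks every positive above every negative traces the left and top edges of the unit square, and one that ranks them in exactly the wrong order traces the bottom and right edges.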

  • Tweaking Bayes’ theorem

    Tweaking Bayes’ Theorem. This is my own attempt to explain the tweak mentioned in the link above. In the video, the goal is to find the best English text for a given foreign text, which can be written as: For the purpose of finding the English text, ignore Pr(f), i.e.,: What’s pointed out as…
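    The equations did not survive in this excerpt. A plausible reconstruction, assuming the standard formulation where e is a candidate English text, f is the given foreign text, and the hat marks the chosen translation (my notation, not necessarily the post's):

```latex
\hat{e} = \arg\max_{e} \Pr(e \mid f)
        = \arg\max_{e} \frac{\Pr(f \mid e)\,\Pr(e)}{\Pr(f)}
        = \arg\max_{e} \Pr(f \mid e)\,\Pr(e)
```

    Pr(f) can be dropped in the last step because it does not depend on e, so it scales every candidate equally and cannot change which e attains the maximum.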

  • T-test for comparing means

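    Before we get started, open up Gaussian distribution and Chi, t, F distributions if you need a reference on the math. One-sample t-test: if we don’t know the variance of the population (which is usually the case), the test statistic is $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, which follows a t distribution with $n - 1$ degrees of freedom, where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, s is the standard deviation of the samples, and n is the number of samples. As an example,…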
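    The one-sample statistic, the sample mean minus the hypothesized mean, divided by s over the square root of n, can be computed directly. The following is a minimal Python sketch using only the standard library; the function name and the sample values are made up for illustration, and it returns only the statistic and the degrees of freedom (looking up a p-value needs a t-distribution table or a statistics package).

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("sample has zero variance; t is undefined")
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical data: four measurements tested against a mean of 3.0.
t_stat, df = one_sample_t([2.0, 4.0, 6.0, 8.0], mu0=3.0)
```

    The resulting t is then compared with the critical value of the t distribution for df degrees of freedom at the chosen significance level.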

  • Partition plot of decision tree

    This is a slightly modified version of an example in the tree package reference manual. A partition plot draws a diagram representing the leaves of a tree. Then we get a plot like: Reference: http://cran.r-project.org/web/packages/tree/tree.pdf

  • “tree” and “rpart” in R

    https://stat.ethz.ch/pipermail/r-help/2005-May/070922.html https://stat.ethz.ch/pipermail/r-help/2001-July/014175.html Use rpart instead of tree for decision trees in R.

  • Tree-Based Models

    http://www.statmethods.net/advstats/cart.html Covers classification/regression trees, conditional inference trees, and random forests.

  • Strata 2012 Proceedings

    http://strataconf.com/strata2012/public/schedule/proceedings Thank you, O’Reilly and Strata.

  • Automatic machine learning

    Google has an interesting automatic prediction API: https://developers.google.com/prediction/ It has an easy-to-follow hello world example that predicts the language (ENGLISH/SPANISH/FRENCH) of a given sentence: https://developers.google.com/prediction/docs/hello_world In the hello world example, one confusing step was ‘Switching to private mode’. For that, you just need to turn on OAuth 2.0 on the top right of…