Passion is like genius; a miracle. – Page 23 – Blog on Software, Statistics, and Quant

Estimating time for training

Training time becomes an issue, especially when you are using a complicated model like an SVM on a huge data set. When that happens, it’s critical to estimate when the training will end. I’ve written some simple R code that produces this estimate quickly, at the cost of some accuracy. Here’s the graph:
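One way such an estimate could be made is sketched below in Python. This is not the R code from the post: it assumes training time grows roughly as a power law in the number of rows, times the model on a few small subsets, and extrapolates. The function names and the `train` callable are hypothetical.

```python
import math
import time


def time_training(train, data, sizes):
    """Time a hypothetical train(subset) callable on growing prefixes of data."""
    timings = []
    for n in sizes:
        start = time.perf_counter()
        train(data[:n])
        timings.append(time.perf_counter() - start)
    return timings


def fit_power_law(sizes, seconds):
    """Least-squares fit of seconds ~ a * size**b, done in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # exponent of the power law
    a = math.exp(mean_y - b * mean_x)  # multiplicative constant
    return a, b


def estimate_seconds(sizes, seconds, full_size):
    """Extrapolate the fitted power law to the full data size."""
    a, b = fit_power_law(sizes, seconds)
    return a * full_size ** b


if __name__ == "__main__":
    # Synthetic timings standing in for real measurements (assumed quadratic).
    sizes = [100, 200, 400]
    seconds = [1e-6 * n ** 2 for n in sizes]
    print(estimate_seconds(sizes, seconds, 1000))
```

The estimate is cheap because only small subsets are ever trained, but it is only as good as the power-law assumption, which matches the speed-over-accuracy trade-off described above.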

July 20, 2012

Tags:

statistics
Relationships Among Probability Distributions

http://www.johndcook.com/distribution_chart.html A chart of the relationships among probability distributions. It’s really well organized.

July 19, 2012

Tags:

statistics
My .tmux.conf

Somehow I ended up using this instead of screen.

July 14, 2012

Tags:

software
SVM for iris in R

Here’s the output.

July 11, 2012

Tags:

statistics
Installing Rattle on OSX

If you have trouble because of RGtk2 (e.g., it says you don’t have gtk2.8.0 or higher even though you already do), follow this procedure. (Note: Don’t use MacPorts! It won’t work properly with RGtk2.) Instead, install R from r-project.org. This is important: don’t install R from MacPorts. Install R from http://www.r-project.org/.…

July 8, 2012

Tags:

statistics
Random Projection for Dimensionality Reduction

Random projection in dimensionality reduction: Applications to image and text data. This is a really easy way to do dimensionality reduction. Simply multiply the data by a random matrix whose entries are random numbers. If the data is a d×N matrix, where d is a very high dimension and N is the number of data points, and the random matrix is k×d…
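The multiplication described above can be sketched in a few lines of Python. The excerpt does not say which distribution the random entries come from, so this sketch assumes standard normal entries scaled by 1/sqrt(k), a common choice that roughly preserves pairwise distances. The function name `random_projection` is hypothetical.

```python
import math
import random


def random_projection(X, k, seed=0):
    """Project the columns of a d x N matrix X down to k dimensions.

    X is a list of d rows, each holding N values (one column per data point).
    Returns a k x N matrix as a list of k rows.
    """
    rng = random.Random(seed)
    d = len(X)
    n = len(X[0])
    # Assumption: entries drawn from N(0, 1), scaled so distances are
    # preserved in expectation.
    scale = 1.0 / math.sqrt(k)
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    # Matrix product R (k x d) times X (d x N).
    return [
        [sum(R[i][j] * X[j][c] for j in range(d)) for c in range(n)]
        for i in range(k)
    ]


def column_distance(M, c1, c2):
    """Euclidean distance between two columns of a matrix."""
    return math.sqrt(sum((row[c1] - row[c2]) ** 2 for row in M))


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random() for _ in range(3)] for _ in range(500)]  # d=500, N=3
    Y = random_projection(X, k=100)
    print(column_distance(X, 0, 1), column_distance(Y, 0, 1))
```

No training or eigendecomposition is involved, which is what makes the method so easy: the only cost is one matrix product.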

July 4, 2012

Tags:

statistics
Effect Size

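It’s the Effect Size, Stupid http://en.wikipedia.org/wiki/Effect_size When you use statistical significance to test for a difference between two groups, the result is affected not only by the difference between the groups but also by the sample size, because the confidence interval shrinks as the sample size grows. In any case, statistical significance is computed from the probability that a difference would arise by chance when there is actually no difference between the two groups, and this value does not directly measure how large the difference between the groups is.…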
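A small Python sketch can make the sample-size point concrete. It uses Cohen's d, one common effect-size measure (an assumption here, since the excerpt does not name a specific measure), and compares it with the pooled-variance two-sample t statistic. The function names and the toy data are made up for illustration.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic, written in terms of d."""
    na, nb = len(a), len(b)
    return cohens_d(a, b) / math.sqrt(1.0 / na + 1.0 / nb)


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [2, 3, 4, 5, 6]
    # Same values repeated 20 times: same group difference, larger sample.
    a_big, b_big = a * 20, b * 20
    print(cohens_d(a, b), t_statistic(a, b))
    print(cohens_d(a_big, b_big), t_statistic(a_big, b_big))
```

With the larger sample the t statistic grows severalfold in magnitude while d stays in the same range, which is why a significance test alone says little about how big the difference is.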

July 2, 2012

Tags:

statistics
SVM User Guide

A quite readable document on SVMs: A User’s Guide to Support Vector Machines. It’s posted on the site of PyML, a Python-based machine learning toolkit.

June 17, 2012

Tags:

statistics
My nlp-class.org certification

After finishing 8 weeks of homework, lectures, and programming, I finally received my statement of accomplishment from the NLP class on Coursera. Now I’m taking http://ml-class.org/, and I hope more people can take advantage of these special opportunities.

June 13, 2012

Tags:

statistics
Book Review: Machine Learning for Hackers

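I recently read Machine Learning for Hackers, published by O’Reilly. I suspect the book started out with the concept of teaching machine learning to readers who know some R, but as it turned out, only readers who already know almost everything can read it. Specifically, it requires this skill set: 1) You must be familiar with ggplot. 2) You must understand R. 3) You must have some knowledge of machine learning or statistics. Already knowing all the algorithms explained in the book…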

June 13, 2012

Tags:

statistics