-
Reshape Package in R
The reshape package is a powerful tool for reshaping and aggregating data. A nice tutorial can be found at http://www.jstatsoft.org/v21/i12/paper. We're going to use the french_fries dataset, but because it's big, we take a sample first. Melt it into id and measured variables. Then, using cast, reshape the data so that each (variable, repetition) pair becomes a separate variable. In the formula, '...' represents all the other…
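Here is a minimal pure-Python sketch of the melt/cast idea that works without R. The helpers melt() and cast() are hypothetical stand-ins written for illustration and are not the reshape API. The two toy rows borrow only the column names subject, rep, potato and buttery from french_fries, and their values are made up.

```python
# Pure-Python sketch of the melt/cast idea; melt() and cast() are
# hypothetical helpers, not functions from R's reshape package.

def melt(rows, id_vars):
    """Wide -> long: one record per (id values, measured variable)."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_vars}
        for key, value in row.items():
            if key not in id_vars:
                long_rows.append({**ids, "variable": key, "value": value})
    return long_rows


def cast(long_rows, row_vars, col_vars):
    """Long -> wide: one row per row_vars combination and one column per
    col_vars combination, named by joining the values with '_'."""
    table = {}
    for rec in long_rows:
        row_key = tuple(rec[k] for k in row_vars)
        col_name = "_".join(str(rec[k]) for k in col_vars)
        table.setdefault(row_key, dict(zip(row_vars, row_key)))[col_name] = rec["value"]
    return list(table.values())


# Toy rows (made-up values) using a few french_fries column names.
rows = [
    {"subject": 3, "rep": 1, "potato": 1.0, "buttery": 0.5},
    {"subject": 3, "rep": 2, "potato": 2.0, "buttery": 1.5},
]
long_rows = melt(rows, ["subject", "rep"])
wide = cast(long_rows, ["subject"], ["variable", "rep"])
print(wide)
```

Casting with subject as the row variable and (variable, rep) as the column variables yields one row per subject, with columns such as potato_1 and potato_2. That is the (variable, repetition) layout described above.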
-
One-Way ANOVA in R
This is a follow-up on the previous posting, ANOVA IN R. In R, it does not matter whether the groups have different numbers of repeated measurements when running aov(). Given four different systems, a1, a2, a3 and a4, and their measurements, I'll describe how to check whether their means differ, how to get the confidence interval at each…
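Below is a minimal Python sketch of the F statistic that a one-way ANOVA is built on. It is the same quantity R reports in summary(aov(...)). The sketch assumes each system's measurements are a plain list, and it handles unequal group sizes, which matches the point above about aov(). The function name and the toy data are hypothetical. The p-value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.
    Each group is a list of measurements; sizes may differ."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each measurement around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy measurements for four systems (made up); a4 has fewer repetitions.
systems = {
    "a1": [1, 2, 3],
    "a2": [3, 4, 5],
    "a3": [5, 6, 7],
    "a4": [3, 5],
}
f_stat, df1, df2 = one_way_anova_f(list(systems.values()))
print(f"F = {f_stat:.3f} on {df1} and {df2} degrees of freedom")
```

For these toy groups, the between-group sum of squares is 24 and the within-group sum of squares is 8. That gives F = (24/3)/(8/7) = 7 on 3 and 7 degrees of freedom, up to floating-point rounding.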
-
Rescaling Data using R
The reshape package provides a convenient function called rescaler: There are other types of rescaling: range (scale data to 0~1), rank (replace values by their ranks), and robust (use the median and MAD instead of the mean and sd). To just remove the mean without dividing the data by anything, use scale:
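The rescaling types listed above can be sketched in Python as follows. These are illustrative reimplementations and not rescaler itself. The robust variant assumes the MAD is multiplied by 1.4826, which is the default consistency constant of R's mad(). The rank variant averages ties, as R's rank() does by default.

```python
import statistics


def rescale_range(xs):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def rescale_rank(xs):
    """Replace values by 1-based ranks; ties get their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    return ranks


def rescale_robust(xs, constant=1.4826):
    """Center on the median and divide by the scaled MAD (assumed constant)."""
    med = statistics.median(xs)
    mad = constant * statistics.median([abs(x - med) for x in xs])
    return [(x - med) / mad for x in xs]


def center(xs):
    """Subtract the mean only, like R's scale(x, scale = FALSE)."""
    mean = statistics.mean(xs)
    return [x - mean for x in xs]


data = [2, 4, 4, 10]
print(rescale_range(data))
print(rescale_rank(data))
print(rescale_robust(data))
print(center(data))
```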
-
Book Review: GGPLOT2, Elegant Graphics for Data Analysis (Use R!)
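GGPLOT2 is a grammar-based graphics system for R. R's built-in plotting functions usually pack every feature into a single function, which makes plots hard to modify, reuse, or extend. ggplot2, by contrast, rethinks graphics from the ground up and separates a chart into components such as geoms, statistics, scales, coordinate systems, faceting, position, and aesthetics. Each chart is then drawn as a combination of these components. Because of this, each component can easily…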
-
Fork/Join Framework in Java 7
Java 7 introduced an interesting framework called fork/join. It's a lightweight, MapReduce-like parallel programming library. The basic idea is to split the work into multiple small parts, process them, and merge the results into the final return value. I've written sample code that computes the sum of the integers in an ArrayList: The Fork-Join framework is interesting in some…
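The split, process and merge idea can be sketched as a recursive sum. This Python version is only an analogue of the divide-and-conquer shape of a Java RecursiveTask and is not the fork/join framework. It runs sequentially, and the THRESHOLD value is a hypothetical cutoff. In Java, one half would be fork()ed to a ForkJoinPool and later join()ed, while the other half is computed in the current thread.

```python
THRESHOLD = 1_000  # hypothetical cutoff: below this size, sum directly


def divide_and_conquer_sum(values, lo=0, hi=None):
    """Sum values[lo:hi] by splitting the range in half until it is small.
    Runs sequentially; it only mirrors the structure of a fork/join task."""
    if hi is None:
        hi = len(values)
    if hi - lo <= THRESHOLD:
        # Base case: small enough to compute directly
        return sum(values[lo:hi])
    mid = (lo + hi) // 2
    # In a Java RecursiveTask, this half would typically be fork()ed ...
    left = divide_and_conquer_sum(values, lo, mid)
    # ... while this half is computed in the current thread
    right = divide_and_conquer_sum(values, mid, hi)
    # Merge step: in Java, join() the forked result, then combine
    return left + right


numbers = list(range(10_001))
print(divide_and_conquer_sum(numbers))
```

The threshold matters in the real framework too. If it is too small, the overhead of creating tasks dominates. If it is too large, there are too few tasks to keep all cores busy.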
-
Octave Tutorial
GNU Octave is a high-level interpreted language, primarily intended for numerical computations (http://www.gnu.org/software/octave/). Here's a tutorial from Andrew Ng. He mentions that Octave is a great prototyping language for implementing algorithms, and that Octave is superior to NumPy because Python is clunkier than Octave for ML programming purposes.
-
Book Review: R을 이용한 누구나하는 통계 분석 (Statistical Analysis Anyone Can Do with R)
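As the author writes in the preface, R을 이용한 누구나 하는 통계분석 is a well-made R cookbook. Accordingly, statistical methods are presented systematically, and the analysis of the results is explained thoroughly. Personally, I think it is far more valuable than the R Cookbook from O'Reilly, a publisher famous for its cookbooks. In particular, the organization of the methods for comparing means in various settings (paired, two-sample, non-parametric) in the first half of the book…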
-
Scrypt – follow-up on modern password hashing algorithms
This is a follow-up on the previous article, modern password hashing. In case you didn't read it: bcrypt is a deliberately slow hashing algorithm that is not vulnerable to rainbow tables because it has a built-in salt. Also, because it can be slowed down as much as you want, it can't be broken even if computers get…
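scrypt shares the same two properties: a random per-password salt and a tunable cost. Below is a minimal sketch using Python's hashlib.scrypt, which is available when Python is built against OpenSSL 1.1 or later. The cost parameters n=2**14, r=8, p=1, the 16-byte salt and the 32-byte key length are common choices assumed here. They are not values from the original article.

```python
import hashlib
import hmac
import os

# Cost parameters (assumed common choices; raise N to make hashing slower)
N, R, P = 2 ** 14, 8, 1
KEY_LEN = 32


def hash_password(password):
    """Return (salt, derived_key) for a new password."""
    salt = os.urandom(16)  # fresh random salt per password defeats rainbow tables
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=KEY_LEN)
    return salt, key


def verify_password(password, salt, key):
    """Recompute the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=KEY_LEN)
    return hmac.compare_digest(candidate, key)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
print(verify_password("wrong guess", salt, key))
```

In practice, the salt and the cost parameters are stored next to the derived key, so verification can reuse them when N is later raised for new passwords.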
-
Sentiment Analysis Resources
The Sentiment Symposium Tutorial is a nice website with detailed explanations and even some code. "Thumbs up? Sentiment Classification using Machine Learning Techniques" is a paper cited more than 1,700 times. These two are recommended reading on sentiment analysis from nlp-class.org.
-
Kappa for inter-rater agreement
Cohen's kappa coefficient is a statistical measure of inter-rater agreement or inter-annotator agreement for qualitative (categorical) items (see http://en.wikipedia.org/wiki/Cohen's_kappa). Kappa is computed as:

$$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}$$

where $\Pr(a)$ is the observed probability of agreement and $\Pr(e)$ is the probability of agreement by chance, i.e., the chance of agreement assuming the raters are independent. So, the equation is looking at 'prob. of observed…
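The kappa formula translates directly into code. Below is a minimal Python sketch that assumes each rater's labels are given as equally long lists aligned item by item. The chance agreement is estimated from each rater's own label frequencies, under the independence assumption. The toy counts in the usage lines are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items on which the raters agree
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: raters assumed independent, each using
    # their own observed label frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_chance = sum((count_a[lab] / n) * (count_b[lab] / n) for lab in labels)
    return (p_observed - p_chance) / (1 - p_chance)  # undefined if p_chance == 1


# Toy data: 50 items; the raters agree on 20 "yes" and 15 "no" items
rater_a = ["yes"] * 20 + ["no"] * 15 + ["yes"] * 5 + ["no"] * 10
rater_b = ["yes"] * 20 + ["no"] * 15 + ["no"] * 5 + ["yes"] * 10
print(cohens_kappa(rater_a, rater_b))
```

Here the observed agreement is 35/50 = 0.7. Rater A says "yes" 25 times and rater B 30 times, so the chance agreement is 0.5·0.6 + 0.5·0.4 = 0.5 and kappa is (0.7 − 0.5)/(1 − 0.5) = 0.4.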