• Sentiment Analysis Resource

    Sentiment Symposium Tutorial is a nice website with detailed explanations and even some code. “Thumbs up? Sentiment Classification using Machine Learning Techniques” is a paper that has been cited more than 1,700 times. Both are recommended reading from nlp-class.org on sentiment analysis.

  • Kappa for inter-rater agreement

    Cohen’s kappa coefficient is a statistical measure of inter-rater (inter-annotator) agreement for qualitative (categorical) items (see http://en.wikipedia.org/wiki/Cohen's_kappa). Kappa is computed as $\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}$, where $\Pr(a)$ is the observed probability of agreement and $\Pr(e)$ is the probability of agreement by chance, i.e., the probability that the raters agree assuming they rate independently. So, the equation is looking at ‘prob. of observed…
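
    To make the computation concrete, here is a minimal Python sketch of kappa for two raters. The function name, the variable names `p_o` (observed agreement) and `p_e` (chance agreement), and the labels are my own assumptions for illustration, not taken from the post or from any library.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items that received the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, multiply the two raters' marginal
    # rates (this is where independence is assumed), then sum over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators (made up for this example).
rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
rater_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

    Here the raters agree on 8 of 10 items, so the observed agreement is 0.8, while the chance agreement from the marginals is 0.52; kappa rescales the difference by the room left above chance.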

  • Rattle for exploration of variables

    I’m reading a book on rattle. In rattle, data exploration is easy. To see a pairs (or splom) plot, select the Explore tab, then click Execute. The following is the output for the weather dataset. Histograms run along the diagonal, the upper-right panels show the correlations as numbers, and the lower-left panels show scatter plots with smoothing lines. To see correlation, check…

  • Random forest for variable selection

    The randomForest package has importance() to estimate the importance of variables. The example in the reference manual has this: In importance(), type=1 shows the mean increase in mean squared error when each variable’s values are permuted in the out-of-bag data. Type 2 shows the total decrease in node impurity from splits on each variable, averaged over all trees. To visualize: To get the top three important variables: Thus…
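
    The randomForest documentation describes the type=1 measure as computed by permuting a variable's out-of-bag values and seeing how much the prediction error grows. The sketch below illustrates only that idea in plain Python; it is not randomForest. The `predict` function is a hypothetical fixed model standing in for a fitted forest, the data is made up, and the whole dataset is permuted rather than out-of-bag samples.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(predict, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model: depends strongly on x0, weakly on x1, not at all on x2.
def predict(row):
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
targets = [predict(r) for r in rows]  # noise-free targets, for simplicity

importances = permutation_importance(predict, rows, targets)
# Column indices ranked from most to least important.
top = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
print(top[:3])
```

    Shuffling a column the model ignores leaves the error unchanged, so its importance comes out as zero, while columns the model relies on more heavily produce larger increases.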

  • Introduction to Information Retrieval free ebook

    http://nlp.stanford.edu/IR-book/ A really good free IR book. For example, it neatly cleared up my misconception that ‘stopwords are never indexed’.

  • ROC graph 101

    Tom Fawcett, ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, HP Labs Technical Reports, 2003. This is a paper on the ROC graph, and I really enjoyed reading it. Though many ‘introduction to machine learning’ books describe the ROC curve, none of them explains it in this much depth. Starting from algorithms to…
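
    As a rough illustration of how an ROC graph is produced from classifier scores, here is a minimal Python sketch that sweeps a threshold down the ranked scores and then integrates the curve with the trapezoid rule. It is my own sketch, not a transcription of the paper's algorithms, and the labels and scores are made up.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the score threshold.

    `labels` are 1 for positive and 0 for negative examples.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    prev_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so that examples with
        # tied scores are counted together as one threshold step.
        if prev_score is not None and score != prev_score:
            points.append((fp / neg, tp / pos))
        if label == 1:
            tp += 1
        else:
            fp += 1
        prev_score = score
    points.append((fp / neg, tp / pos))  # always ends at (1.0, 1.0)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test set: true labels and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
points = roc_points(labels, scores)
area = auc(points)
print(points)
print(area)
```

    In this toy set, 8 of the 9 positive/negative pairs are ranked correctly, which matches the area of 8/9 under the curve.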

  • SWIRL 2012 – Future of IR

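    http://www.cs.rmit.edu.au/swirl12/discussion.php A page collecting important papers that point out directions for IR. It looks like each participant was asked to recommend three papers as homework. Reading just these, without looking anywhere else, seems worthwhile.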

  • Tweaking Bayes’ theorem

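    Tweaking Bayes’ Theorem. This is my own attempt to explain the tweak mentioned in the above link. In the video, what we want is to find the best English text $e$ for the given foreign text $f$, and it can be written as: $\hat{e} = \arg\max_e \Pr(e \mid f) = \arg\max_e \frac{\Pr(f \mid e)\,\Pr(e)}{\Pr(f)}$. For the purpose of finding the English text, ignore $\Pr(f)$, which does not depend on $e$, i.e.: $\hat{e} = \arg\max_e \Pr(f \mid e)\,\Pr(e)$. What’s pointed out as…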
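
    A tiny Python sketch shows why Pr(f) can be ignored when only the best English text is wanted: dividing every candidate's score by the same constant does not change which one wins. All candidate sentences and probabilities below are hypothetical, and treating the three candidates as the only possible English texts is a toy simplification. The tweak itself is not shown.

```python
# Hypothetical candidate English texts e for one fixed foreign text f.
# language_model[e] plays the role of Pr(e).
language_model = {
    "the house is small": 0.020,
    "small the is house": 0.0001,
    "the home is little": 0.005,
}
# translation_model[e] plays the role of Pr(f | e).
translation_model = {
    "the house is small": 0.30,
    "small the is house": 0.35,
    "the home is little": 0.25,
}


def score(e):
    """Numerator of Bayes' theorem: Pr(f | e) * Pr(e)."""
    return translation_model[e] * language_model[e]


# Toy simplification: pretend these candidates are the only English texts,
# so Pr(f) is the sum of the numerators over them.
pr_f = sum(score(e) for e in language_model)

best_without_pr_f = max(language_model, key=score)
best_with_pr_f = max(language_model, key=lambda e: score(e) / pr_f)
print(best_without_pr_f, best_with_pr_f)
```

    Note that the word-salad candidate has the highest Pr(f | e) in this toy table, but the language model term Pr(e) pushes it down.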

  • Online regular expression testing

    http://regexpal.com/ A nice online tool for testing regular expressions. Still, I wish it could print captured groups.
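
    When captured groups do need inspecting, a few lines of Python can fill the gap. This is a minimal sketch using the standard `re` module; the pattern and text are made up, and Python's regex flavor differs from the JavaScript one the site tests against (the `(?P<name>...)` named-group syntax used here is Python's).

```python
import re


def captured_groups(pattern, text):
    """Return the whole match plus numbered and named groups for every match."""
    results = []
    for match in re.finditer(pattern, text):
        results.append({
            "match": match.group(0),      # the full matched text
            "groups": match.groups(),     # all groups, in order
            "named": match.groupdict(),   # only the named groups
        })
    return results


# Hypothetical pattern and input text.
pattern = r"(?P<year>\d{4})-(\d{2})-(\d{2})"
text = "Posted 2012-05-14, updated 2012-06-01."
results = captured_groups(pattern, text)
for result in results:
    print(result["match"], result["groups"], result["named"])
```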

  • T-test for comparing means

    Before we get started, open up Gaussian distribution and Chi, t, F distributions if you need some reference on the math. One-sample t-test: if we don’t know the variance of the population (that is usually the case), for $H_0: \mu = \mu_0$: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, where $\bar{x}$ is the sample mean, $s$ is the standard deviation of the samples and $n$ is the number of samples. As an example,…
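
    The one-sample statistic can be sketched in a few lines of Python with only the standard library. The function name, the sample, and the hypothesized mean `mu0` are assumptions made up for illustration. The sketch stops at the statistic and its degrees of freedom (n - 1); turning that into a p-value needs the t distribution, from a table or a statistics package.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("t is undefined when all observations are identical")
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical sample, tested against a hypothesized mean of 3.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat, dof = one_sample_t(sample, mu0=3)
print(t_stat, dof)
```

    For this sample the mean is 5 and the standard error is sqrt(4/7), so t comes out as sqrt(7), about 2.65, with 7 degrees of freedom.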
