Passion is like genius; a miracle. – Page 35 – Blog on Software, Statistics, and Quant

How Hotmail got 10x faster

http://windowsteamblog.com/windows_live/b/windowslive/archive/2011/06/30/instant-email-how-we-made-hotmail-10x-faster.aspx The keys are reducing page size, caching, preloading, and asynchronous processing.

July 2, 2011

Tags:

software
Kaggle – data prediction competition

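If topcoder is an algorithm programming contest, Kaggle is a contest where you are given data and have to build a model that explains it well. Some contests run for a few dozen days, and one of those currently listed still has 21 months to go. Unusually, there is prize money too: the one with 21 months left is about hospital data, and the prize is a whopping $3M. I plan to enter one someday myself (though I won't make the rankings).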

June 28, 2011

Tags:

software
Reading : On Chomsky and the Two Cultures of Statistical Learning

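http://norvig.com/chomsky.html Chomsky recently criticized statistics-based machine learning techniques as being like imitating the way bees move without knowing what the movements mean. (As you know, Chomsky argues for a grammar framework called Universal Grammar.) This is Peter Norvig's answer to that. (Also as you know, he is an expert in statistics-based machine learning.) Whichever side is right, it is an interesting read. It also ties in closely with the paper on the two camps of machine learning that I posted a while ago. Recommended!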

May 28, 2011

Tags:

software
Some papers on Google technology

Dremel: Interactive Analysis of Web-Scale Datasets. A tool for analyzing large amounts of data interactively. Nested columnar storage is presented.
Overlapping Experiment Infrastructure: More, Better, Faster Experimentation. The experiment environment at Google.
Bigtable: A Distributed Storage System for Structured Data. The paper on Bigtable. This is surely the best introduction to the topic.

May 27, 2011

Tags:

software
Data Mining Map

http://chem-eng.utoronto.ca/~datamining/dmc/data_mining_map.htm The mining algorithms are classified really well here. It should be very useful for taking a systematic approach when choosing an algorithm to apply.

May 22, 2011

Tags:

software
Unbiased estimators and consistent estimators

http://www.johndcook.com/bias_consistency.html Unbiased: the expected value of the estimator equals the population parameter, so if you estimate theta many times, the estimates average out to the true value. Consistent: the estimator converges to the population parameter as the sample size grows.
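
To make the two definitions concrete, here is a small simulation sketch. The choice of example is my own assumption and is not taken from the linked article: I sample from a standard normal (true variance 1) and compare two variance estimators, one dividing by n and one dividing by n - 1. The function name `simulate` and the dictionary keys are hypothetical names made up for this sketch.

```python
import random

TRUE_VARIANCE = 1.0  # variance of the standard normal we sample from (assumed setup)


def simulate(n, trials, rng):
    """Draw `trials` samples of size n and summarize two variance estimators.

    Returns a dict: estimator name -> (average estimate, average absolute error).
    """
    estimates = {"divide_by_n": [], "divide_by_n_minus_1": []}
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        squared_dev = sum((x - mean) ** 2 for x in sample)
        estimates["divide_by_n"].append(squared_dev / n)
        estimates["divide_by_n_minus_1"].append(squared_dev / (n - 1))

    summary = {}
    for name, values in estimates.items():
        average = sum(values) / trials
        abs_error = sum(abs(v - TRUE_VARIANCE) for v in values) / trials
        summary[name] = (average, abs_error)
    return summary


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (5, 50, 500):
        for name, (average, abs_error) in simulate(n, 1000, rng).items():
            print(f"n={n:4d} {name:20s} average={average:.3f} abs_error={abs_error:.3f}")
```

The average column is about bias: the n - 1 version should hover around 1 at every sample size, while the divide-by-n version should sit below 1 for small n. The absolute error column is about consistency: for both estimators it should shrink as n grows, which is why a biased estimator can still be consistent.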

May 21, 2011

Tags:

software
PhoneGap

http://www.phonegap.com/about/ Wow, what a cool thing this is. You write your code in JavaScript, calling native APIs as you go, and then deploy it to a whole range of mobile phones!

May 20, 2011

Tags:

software
Data Modeling vs Algorithmic Modeling

Statistical Modeling: The Two Cultures. I like review articles, especially because I’m still learning machine learning and statistics. This article discusses why the author thinks statistics has not played much of a role in machine learning. Here are the author’s arguments from the article that I found interesting to read: 1) Standard tests of goodness-of-fit did not reject…

May 4, 2011

Tags:

software
Document Similarity and Containment

On the Resemblance and Containment of Documents. A very popular article on document similarity and containment (cited 528 times according to Google). For similarity, it discusses minhash, which I’ve already posted about here. For containment (document A is contained in B), the authors suggest extracting the shingles that are 0 mod m, i.e., shingles whose remainder is zero when…
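
Here is a rough sketch of how I understand the idea, not the paper's actual implementation. The assumptions are mine: shingles are 3-word windows, MD5 stands in for whatever fingerprint function you prefer, and resemblance and containment are computed as plain set ratios. All function names are hypothetical.

```python
import hashlib


def shingles(text, w=3):
    """Return the set of w-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}


def fingerprint(shingle):
    """Map a shingle to a 64-bit integer; MD5 is an arbitrary stand-in hash."""
    digest = hashlib.md5(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def mod_sketch(shingle_set, m):
    """Keep only the fingerprints that are 0 mod m (roughly 1/m of them)."""
    return {fp for fp in map(fingerprint, shingle_set) if fp % m == 0}


def resemblance(a, b):
    """Size of the intersection over size of the union (Jaccard)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of a's elements that also appear in b."""
    return len(a & b) / len(a) if a else 0.0


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "yesterday " + doc_a + " near the river bank"
    sa, sb = shingles(doc_a), shingles(doc_b)
    print("exact resemblance:", resemblance(sa, sb))
    print("exact containment of A in B:", containment(sa, sb))
    # The sketches are much smaller than the full shingle sets, so on
    # documents this short the estimate is noisy or even empty.
    print("sketch containment (m=4):", containment(mod_sketch(sa, 4), mod_sketch(sb, 4)))
```

Because the sketch keeps about 1/m of the shingles regardless of document length, a short document still contributes a proportional share of its shingles. As far as I can tell, that is why this kind of sampling suits containment better than a fixed-size sample would. With m = 1 the sketch keeps every fingerprint and the estimate matches the exact value.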

April 19, 2011

Tags:

software
Introducing CityHash – Google Open Source Blog

http://google-opensource.blogspot.com/2011/04/introducing-cityhash.html A string hash that is up to twice as fast as existing algorithms.

April 19, 2011

Tags:

software