Passion is like genius; a miracle. – Page 36 – Blog on Software, Statistics, and Quant

Why Concurrency Matters

http://www.gotw.ca/publications/concurrency-ddj.htm Moore's law will hit its limits, and clock speed improvements are already slowing down. CPU designers now look to caches and multicore for performance gains, so programming built on a single thread and a single process will reach its limits. Just as OOP replaced structured programming in the 90s, concurrent programming will replace the existing paradigm. Moreover, the CPU is reaching its performance limits sooner than other hardware (network, disk), and…

April 16, 2011

Tags:

software
Extracting article text from HTML documents

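As Twitter and Facebook take over the role of RSS readers (and will surely go on to replace digg, reddit, Hacker News, and /. as well), and as apps and web apps such as FlipBoard (which organizes and displays website content from shared links), Readability, and Instapaper become hits, one change that some people have already begun to notice is that web page content can now be analyzed and displayed without going through a provided RSS feed. Services like these, which ordinary people(?) do not know well…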

April 1, 2011

Tags:

software
JavaScript ( (__ = !$ + $)[+$] + ({} + $)[_/_] +({} + $)[_/_] )

http://adamcecc.blogspot.com/2011/01/javascript.html A technique for bypassing XSS blacklisting in JavaScript. It seems all kinds of new security holes keep opening up…

January 25, 2011

Tags:

software
Multiple Comparison (or Multiple Testing) Issue

http://en.wikipedia.org/wiki/Multiple_comparisons One practical example of this: suppose you compare the search quality of two search engines multiple times, where one engine is good and the other is bad. If you compare the good engine and the bad engine many times, you’ll observe the bad engine winning a comparison simply through accumulated statistical testing errors, while it loses for 99…
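
The accumulation of testing errors described above can be made concrete with a small calculation. The sketch below assumes the comparisons are independent and that each one has the same chance `alpha` of wrongly declaring the bad engine the winner; the 5% figure and the function name are illustrative assumptions, not values from the post.

```python
def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one false "win" across n independent comparisons.

    alpha is the per-comparison false positive rate (an assumed value).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability")
    if n_comparisons < 0:
        raise ValueError("n_comparisons must be non-negative")
    # P(no false positive in any comparison) = (1 - alpha) ** n
    return 1.0 - (1.0 - alpha) ** n_comparisons


if __name__ == "__main__":
    # Hypothetical per-comparison error rate of 5%.
    for n in (1, 10, 100):
        print(n, round(familywise_error_rate(0.05, n), 3))
```

With these assumed numbers the chance of seeing at least one spurious win grows from 5% for a single comparison to roughly 40% for ten, which is why a single "bad engine wins" result among many comparisons says little on its own.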

December 8, 2010

Tags:

software
German Tank Problem

http://en.wikipedia.org/wiki/German_tank_problem How to estimate the number of products from observed serial numbers. Let $k$ be the number of samples and $m$ be the maximum serial number. Then $\hat{N} = m + \frac{m}{k} - 1$. Because $m$ underestimates $N$, we compensate by adding the average number of missing samples between adjacent observed samples. Also, $-1$ because we don’t count the sample itself when considering missing samples. Given…
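
As a rough sketch of the estimate, write k for the number of observed samples and m for the largest observed serial number; the estimate is then m + m/k - 1. The code assumes serial numbers start at 1 and that the observed serials are distinct; the function name and the sample data are illustrative assumptions.

```python
def estimate_population_size(serials: list[int]) -> float:
    """Estimate the total count from observed serial numbers.

    Uses m + m/k - 1, where k is the number of samples and m is the
    maximum observed serial. Assumes serials are distinct and start at 1.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    # m alone underestimates the total, so add the average gap between
    # observed serials: m/k is the average spacing, and the -1 excludes
    # the observed sample itself from that spacing.
    return m + m / k - 1


if __name__ == "__main__":
    observed = [19, 40, 42, 60]  # hypothetical observed serial numbers
    print(estimate_population_size(observed))  # 60 + 60/4 - 1 = 74.0
```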

November 24, 2010

Tags:

software
Peer-to-Peer System Algorithms

Peer-to-Peer Systems: a really well-written tutorial on the field, and the best among the ones I’ve read so far. It covers applications, membership, distributed state, content distribution, and challenges.

November 9, 2010

Tags:

software
The Special Characters #! Used in URLs

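Recently some websites have started using URL patterns like the following: http://twitter.com/#!/search/twitter For anyone who has been passing these by without knowing, I would like to explain what the #! in this URL means. 1) # The # is a way for the browser to invoke JavaScript without reloading the page. At the same time, it records the URL in the browser history. For example, if you visit http://twitter.com/# and then click a link to http://twitter.com/#!/search/twitter, no request for this URL is sent to the server (because it contains a #). However, when the URL changes like this, url…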
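
The point that everything after the # stays in the browser can be illustrated by splitting the example URL with Python's standard `urllib.parse`. This is only a sketch of what a client would send; the helper names `request_target` and `fragment` are made up for illustration.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query, if any) that goes into the HTTP request.

    The fragment (the part after '#') is deliberately left out, because
    browsers keep it on the client side and never send it to the server.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def fragment(url: str) -> str:
    """Return the client-side fragment, without the leading '#'."""
    return urlsplit(url).fragment


if __name__ == "__main__":
    url = "http://twitter.com/#!/search/twitter"
    print(request_target(url))  # what the server would be asked for
    print(fragment(url))        # what only the page's JavaScript sees
```

Both http://twitter.com/# and http://twitter.com/#!/search/twitter yield the same request target, so moving from one to the other needs no new page load; only the fragment differs.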

October 27, 2010

Tags:

software
Ratio Estimation, Regression Estimation, and Systematic Sampling

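When you have a lot of data, drawing a small sample from it and evaluating the characteristics of that sample is something that comes up often in this line of work(?). Most of you have probably heard plenty about stratified sampling (group data with similar characteristics, then draw random samples from each group to build a sample of the whole dataset, which yields a good sample), so here I will write about some other techniques. * Ratio estimation: for example, suppose you want to know the average dining-out expenditure of all households in Korea…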
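
For reference, a minimal sketch of the textbook ratio estimator: if an auxiliary variable x is measured alongside the target y in the sample, and the population mean of x is known, the population mean of y can be estimated as (sum of y / sum of x) times the population mean of x. The choice of household income as the auxiliary variable and all the numbers below are assumptions for illustration; the excerpt is cut off before it states its own setup.

```python
def ratio_estimate_mean(sample_y: list[float],
                        sample_x: list[float],
                        population_mean_x: float) -> float:
    """Ratio estimate of the population mean of y.

    sample_y: target values in the sample (e.g. dining-out spending).
    sample_x: auxiliary values for the same units (assumed: income).
    population_mean_x: known population mean of the auxiliary variable.
    """
    if not sample_y or len(sample_y) != len(sample_x):
        raise ValueError("samples must be non-empty and of equal length")
    total_x = sum(sample_x)
    if total_x == 0:
        raise ValueError("auxiliary values must not sum to zero")
    ratio = sum(sample_y) / total_x  # estimated y-per-unit-of-x
    return ratio * population_mean_x


if __name__ == "__main__":
    # Hypothetical sample of three households.
    spending = [30.0, 50.0, 40.0]
    income = [300.0, 500.0, 400.0]
    print(ratio_estimate_mean(spending, income, population_mean_x=450.0))
```

The estimator helps when y is roughly proportional to x, since the known population mean of x then corrects for a sample that happens to skew high or low.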

October 24, 2010

Tags:

software
CAP

http://codahale.com/you-cant-sacrifice-partition-tolerance/ Briefly explains the CAP theorem and argues for shifting the focus to yield (lower uptime?) versus harvest (stale data?). I really enjoyed reading it, especially the short summary of Gilbert and Lynch’s paper.

October 8, 2010

Tags:

software
Professor Chae's Lectures on Statistics and Probability Models

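I downloaded these and have not read any of them yet, but they are good material. ^^ I had been wondering how ANOVA works, and it turns out the statistics lectures already devote a whole chapter to ANOVA. Professor Chae's Statistics Lectures, Professor Chae's Probability Models Lectures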

October 3, 2010

Tags:

software