• Minkowski distance

    Minkowski distance is a generalization of Manhattan (or city block) distance, L1, and Euclidean distance, L2. I had seen this equation before but didn't know its name, so I'm writing it down here.
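
As a minimal sketch of the definition, the Minkowski distance of order p between two vectors is the p-th root of the sum of |x_i - y_i|^p. The function name `minkowski_distance` and the sample points are assumptions chosen for illustration; p = 1 gives the Manhattan (L1) distance and p = 2 gives the Euclidean (L2) distance.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    # Sum the p-th powers of the coordinate differences, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# p = 1 is the Manhattan (city block) distance, p = 2 the Euclidean distance.
manhattan = minkowski_distance([0, 0], [3, 4], p=1)
euclidean = minkowski_distance([0, 0], [3, 4], p=2)
```

For the points (0, 0) and (3, 4), the L1 distance is 3 + 4 = 7 and the L2 distance is the familiar hypotenuse, 5.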

  • Java 7 Webcast

    http://www.oracle.com/us/corporate/events/java7/index.html The page also has PDF files on the new features in Java 7. Among them, fork/join is really exciting. I'll definitely use it in my next Java project as much as I can.

  • Modern password hashing algorithms

    I found these good articles on modern algorithms for password hashing. http://www.f-secure.com/weblog/archives/00002095.html suggests the following three: PBKDF2, Bcrypt, and PBMAC. Most notably, they are intended to be slow (so that brute-force attacks take a long time) and to prevent rainbow table attacks. http://www.openwall.com/phpass/ is a PHP implementation that uses bcrypt. Read http://www.openwall.com/articles/PHP-Users-Passwords for an explanation. Here’s another article…
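
To make the "deliberately slow" idea concrete, here is a rough sketch using PBKDF2 from Python's standard library (`hashlib.pbkdf2_hmac`). The helper names `hash_password` and `verify_password`, the 16-byte salt, and the iteration count are assumptions for illustration, not recommendations taken from the linked articles. The random per-password salt is what defeats precomputed rainbow tables, and the iteration count is what makes each brute-force guess expensive.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it to your own hardware.
ITERATIONS = 200_000


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The salt and iteration count have to be stored alongside the digest so that the same derivation can be repeated at login time.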

  • Naive Bayes with expectation maximization

    http://pages.cs.wisc.edu/~jerryzhu/cs769/em.pdf By using the EM algorithm, one can run the naive Bayes algorithm with partially labeled data (or with unlabeled data).

  • Pseudo Sigma

    While reading books on EDA (Exploratory Data Analysis), one of the interesting things I came across was pseudo sigma. It’s a standard-deviation-like measure that is resistant to noise and outliers. Simply put, given the first quartile H1 and the third quartile H3, pseudo sigma is (H3 - H1)/1.35. Why? It’s because H1 = μ - 0.675σ and H3 = μ + 0.675σ if X…
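
The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `pseudo_sigma` is made up here, and the quartiles come from `statistics.quantiles` with the "inclusive" method (Python 3.8+), which is one of several quartile conventions and may differ slightly from the hinges used in EDA books.

```python
import statistics


def pseudo_sigma(data):
    """Interquartile range divided by 1.35, a robust stand-in for sigma."""
    # quantiles(..., n=4) returns the three cut points [Q1, median, Q3].
    h1, _, h3 = statistics.quantiles(data, n=4, method="inclusive")
    return (h3 - h1) / 1.35


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
noisy = [1, 2, 3, 4, 5, 6, 7, 8, 1000]  # same data with one wild outlier

robust_clean = pseudo_sigma(clean)
robust_noisy = pseudo_sigma(noisy)
plain_noisy = statistics.stdev(noisy)
```

With this data the outlier leaves the quartiles untouched, so `robust_clean` and `robust_noisy` are equal, while the ordinary standard deviation `plain_noisy` blows up.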

  • JavaScript Design Patterns

    Essential JavaScript Design Patterns For Beginners by Addy Osmani is a mini-book published online. Let me quote the ‘Revealing Module Pattern’ code snippet, which I find very neat. Compare it with the module pattern. I really enjoyed reading it. In addition to the design pattern article, you may like to read my articles on some JavaScript patterns:…

  • Spearman Correlation

    Spearman’s rank correlation coefficient is a non-parametric measure of statistical dependence between two variables. Unlike the Pearson correlation coefficient, which measures linear dependence, Spearman’s rank correlation makes no such assumption; it only asks whether the relationship is monotonic. It uses the same formula as Pearson to get the coefficient, but applies it to the ranks (positions in descending order) of x and y instead of the raw values.
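
A small sketch of "Pearson on ranks" follows. The helper names `ranks`, `pearson`, and `spearman` are illustrative, and ties are given the average of their positions, which is a common convention but an assumption here. Ranks are assigned in ascending order; using descending order for both variables gives the same coefficient.

```python
import math


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's coefficient: the Pearson formula applied to the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but clearly non-linear
rho = spearman(x, y)
r = pearson(x, y)
```

Because y always increases with x, `rho` comes out as 1 (up to floating-point rounding), while the Pearson coefficient `r` is lower because the relationship is not a straight line.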

  • Parzen Windows

    Parzen windows (wiki) is a non-parametric method for estimating a density from samples. Though the name sounds scary (it did, at least to me), it’s not that complicated an algorithm. It just considers the samples near x when computing p(x). For example, to compute p(x=1), it considers samples near it, e.g., 0, 3, 4 while it ignores samples far from…
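
As a rough one-dimensional sketch, the simplest Parzen estimate uses a rectangular window: count the samples within h/2 of x and divide by n times h. The function name `parzen_density`, the window width h = 8, and the sample values are assumptions picked so that 0, 3, and 4 count as "near" x = 1; smoother kernels such as a Gaussian are also commonly used.

```python
def parzen_density(x, samples, h):
    """Estimate p(x) with a rectangular window of width h centered on x."""
    if h <= 0:
        raise ValueError("window width h must be positive")
    # Only samples inside [x - h/2, x + h/2] contribute to the estimate.
    inside = sum(1 for s in samples if abs(x - s) <= h / 2)
    return inside / (len(samples) * h)


samples = [0, 3, 4, 20, 25]
p_near = parzen_density(1, samples, h=8)   # window [-3, 5] holds 0, 3, 4
p_far = parzen_density(100, samples, h=8)  # no samples nearby
```

Here three of the five samples fall inside the window around x = 1, so the estimate is 3 / (5 * 8), while the estimate far away from every sample is zero.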

  • Google Storage for Developers

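    With the cloud gaining popularity, people talk only about Apple, but Google in fact already has services brimming with potential, such as Google Storage for Developers. Its pricing has already been set, too. One of the key roles of the cloud is that it lets developers and businesses grow their services in a scalable way. Instead of launching a service today and watching the servers die tomorrow when users flood in, you can simply expand capacity at minimal cost. Just a few years ago…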

  • Scalable debugging

    Debugging in the (Very) Large: Ten Years of Implementation and Experience is a paper on the Windows Error Reporting (WER) system. It has interesting features like: 1) Automatic bucketing of error reports based on heuristics on the client and server side; ideally, all reports on one bug are assigned to one bucket. 2) Progressive data collection, from a minimal dump to a full one;…
