Passion is like genius; a miracle. – Page 21 – Blog on Software, Statistics, and Quant

Drawbacks of Returning const Klass& in C++

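The situation I want to discuss is, for example, the following. Should foo() return Klass, or should it return const Klass&? The reason to want a const Klass& return is, of course, performance. However, there are more reasons to return Klass. Reasons to return a value rather than a const reference: if foo() performs a complex computation internally and then returns a Klass, eliminating the copy of Klass has little effect on the total running time. In particular…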

November 25, 2012

Tags:

software
(Free Book) Practical Data Analysis Using R (R을 이용한 데이터 분석 실무)

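It has been a while since I started learning R, and although writing blog posts regularly is good, I felt I should organize the material more carefully. So, after wrestling with LaTeX for the past several months, I have finished a document long enough to be worth publishing. I have posted the book at http://r4pda.co.kr/ under the title ‘R을 이용한 데이터 분석 실무’ (Practical Data Analysis Using R). The idea of this book is that readers who can already program to some degree and understand the concepts behind statistics or machine learning techniques can easily…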

November 24, 2012

Tags:

statistics
My certification on Mathematical Biostatistics Bootcamp

I earned this from a coursera.org course. It is a basic statistics course, but the quiz questions are more difficult than the course content. It’s a nice way to brush up on basic statistical knowledge.

November 20, 2012

Tags:

statistics
Relative Risk, Odds Ratio

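Relative risk is used in cohort studies, and the odds ratio is used in case-control studies. As explained in “Categorical Data Analysis: Why Were the Odds Ratio and Relative Risk Created?”, relative risk is easier to understand, but it is not suitable for designs such as a case-control study, where the outcomes are sampled first and the causes are analyzed afterwards. This is because the overall proportion of outcomes produced by a given cause cannot be known. The link’s…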
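To make the contrast concrete, here is a minimal Python sketch. The 2x2 counts are made up for illustration and the function names are my own, not from the linked article. It computes both measures on a hypothetical cohort, then mimics a case-control design by keeping every case but only 10% of the non-cases: the odds ratio is unchanged, while the relative risk computed from the sampled table no longer matches the cohort value.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)


# Hypothetical cohort: 100 exposed and 100 unexposed subjects (made-up counts).
cohort = (20, 80, 10, 90)
rr_cohort = relative_risk(*cohort)
or_cohort = odds_ratio(*cohort)

# Hypothetical case-control sample: all cases kept, 10% of non-cases kept.
case_control = (20, 8, 10, 9)
rr_sampled = relative_risk(*case_control)
or_sampled = odds_ratio(*case_control)

print(f"cohort:       RR={rr_cohort:.3f}  OR={or_cohort:.3f}")
print(f"case-control: RR={rr_sampled:.3f}  OR={or_sampled:.3f}")
```

The sampling fraction for non-cases cancels in the odds ratio but not in the relative risk, which is why the latter depends on how many controls the researcher chose to recruit.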

November 20, 2012

Tags:

statistics
Book Review: An Introduction to Generalized Linear Models

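An Introduction to Generalized Linear Models is an introductory text on generalized linear models. Whenever logistic regression came up, I took the glm(family="binomial"…) command for granted, yet I never really knew what glm was... Each time glm appeared I would move on with a ‘not my business..’, until I decided I should look at it properly at least once, and so I read this book. My first impression of the book was the flood of matrix expressions. That scared me into assuming it would take several months to read, but in practice I was able to finish much sooner…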

November 5, 2012

Tags:

statistics
My Certification on Computing for Data Analysis

I earned a certificate with distinction from a coursera.org course on data analysis using R. The course has pretty neat lecture slides covering data manipulation and plotting. Even if you’re good at R, it is worth spending time reading the material and taking the quizzes. I learned a lot.

November 2, 2012

Tags:

statistics
Finding Optimal Threshold using ROC Curve for Classification

Let’s assume a two-class (A and B) classification problem. Also assume that a classification algorithm predicts that the given data is class A with probability 0.8. To predict whether it’s class A or B, we need a threshold parameter (i.e., a cutoff). If 0.8 is higher than the threshold, we’ll predict that the data is class A.…
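As a rough illustration of choosing a cutoff from a ROC curve, here is a small Python sketch. The excerpt does not say which criterion is used, so this assumes Youden's J statistic (true positive rate minus false positive rate); the scores and labels are made up, and the function names are my own.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) using each distinct score as a cutoff.

    labels: 1 for class A (positive), 0 for class B.
    A sample is predicted as class A when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def best_threshold(scores, labels):
    """Pick the cutoff maximizing Youden's J = TPR - FPR (an assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]


# Made-up predicted probabilities of class A and the true classes.
scores = [0.95, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

threshold = best_threshold(scores, labels)
print("chosen threshold:", threshold)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot or a cost-weighted rule, can pick a different cutoff on the same data.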

October 13, 2012

Tags:

statistics
Resampling for Confidence Interval

Resampling refers to any of the following methods: bootstrapping (random resampling with replacement), jackknifing (using subsets of the data), permutation tests (also called exact tests, randomization tests, or re-randomization tests), and cross-validation. By drawing many samples from the given sample, one can estimate a confidence interval: Statistics and Data Analysis: Confidence Intervals Based on Resampling. Here’s R code for it: Quick-R: Bootstrapping…
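For readers who prefer Python to the linked R code, here is a minimal sketch of a percentile bootstrap confidence interval. It is one simple variant among several, the data are made up, and `bootstrap_ci` is a name chosen for this example, not a library function.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, same size as the original sample.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
low, high = bootstrap_ci(data)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Passing `stat=statistics.median` gives an interval for the median instead, which is where resampling is most convenient because no closed-form standard error is needed.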

October 10, 2012

Tags:

statistics
The Meaning of Confidence Level

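The confidence level is slightly different from the probability that the true value lies within a particular range. Rather, it is the proportion of times the true value falls within the computed range when the estimation procedure is repeated many times. In other words, it describes the accuracy of the method. For example, suppose we have the scores of a statistics exam taken by 10,000 people. Suppose we randomly sample 100 of them and estimate the mean from the sample as “with a 99% confidence level, the mean is […]”. Here, 99% is called the confidence level, and […] is called the confidence interval…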
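The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration under assumptions of my own: the 10,000 exam scores are generated from a normal distribution with mean 70 and standard deviation 10, and the 99% interval uses the normal approximation (z = 2.576).

```python
import random
import statistics


def coverage(n_trials=1000, sample_size=100, z=2.576, seed=0):
    """Fraction of repeated 99% intervals that contain the population mean."""
    rng = random.Random(seed)
    # Hypothetical population: 10,000 exam scores (assumed normal, mean 70, sd 10).
    population = [rng.gauss(70, 10) for _ in range(10_000)]
    true_mean = statistics.mean(population)

    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(population, sample_size)
        m = statistics.mean(sample)
        se = statistics.stdev(sample) / sample_size ** 0.5
        # Count the trial if the interval m +/- z*se covers the true mean.
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / n_trials


rate = coverage()
print(f"intervals containing the true mean: {rate:.1%}")
```

Each individual interval either contains the true mean or it does not; the 99% describes how often the procedure succeeds across repetitions, so the printed rate should land close to 99%.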

October 9, 2012

Tags:

statistics
maxLik package for optimization

maxLik is an R package for maximum likelihood estimation. For example, the maximum of […] can be found as shown below using the Newton-Raphson method. The estimate was 3.5, and this is correct because […]. Optimization and Mathematical Programming has a list of packages for optimization.
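The function used in the post is not shown in this excerpt, so the following Python sketch uses a different, hypothetical objective purely to illustrate Newton-Raphson maximization: the Poisson log-likelihood of some made-up counts, whose maximizer is known to be the sample mean. This is not maxLik's API; `newton_raphson_max` is a name invented for this example.

```python
def newton_raphson_max(grad, hess, x0, tol=1e-10, max_iter=100):
    """Maximize a smooth 1-D function given its first and second derivatives."""
    x = x0
    for _ in range(max_iter):
        step = grad(x) / hess(x)
        x -= step  # Newton update: x <- x - f'(x) / f''(x)
        if abs(step) < tol:
            break
    return x


# Made-up count data; the Poisson MLE of the rate is the sample mean (3.0 here).
counts = [2, 4, 3, 5, 1]
total, n = sum(counts), len(counts)

# Poisson log-likelihood up to a constant: total * log(lam) - n * lam
def grad(lam):
    return total / lam - n


def hess(lam):
    return -total / lam ** 2


estimate = newton_raphson_max(grad, hess, x0=1.0)
print("estimated rate:", round(estimate, 6))
```

The starting point matters: for this objective the iteration converges from any start between 0 and twice the sample mean, but a start outside that range can step to a negative rate.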

October 9, 2012

Tags:

statistics