Monte Carlo simulations: Sampling from probability density functions

Monte Carlo Method의 정의는 다음과 같습니다.

A numerical modeling procedure that makes use of random numbers to simulate processes that involve an element of chance. In Monte Carlo simulation, a particular experiment is repeated many times with different randomly determined data to allow statistical conclusions to be drawn.
http://amsglossary.allenpress.com/glossary/browse

여기서는 평균 0, 분산 1인 Gaussian Distribution

X \sim N(0,1)

이 있고, 이의 pdf가

p(x)

, cdf가

F(x)

인 경우 이 분포로부터 샘플링을 해보겠습니다. 샘플링 방법은

Y \sim U(0,1)

인 분포로부터 y를 구하고

y=F^{-1}(x)

를 풀면 됩니다. 그리고 그렇게 구한

는 평균 0, 분산 1의 Gaussian Distribution을 따르게 됩니다.

이렇게 되는 이유는 Introduction to Biomedical Optics, Chap4. Transport: Monte Carlo에서도 나오듯이

p(x)

가 빨리 증가하는 구간에서는

F(x)

역시 빨리 증가하고 따라서

가 임의의 값을 골랐을 때 그 값은

F(x)

가 빨리 증가하는 구간 (즉

p(x)

가 큰 구간)이기 때문입니다.

문제는

값은 구하기 쉬우나 (단지 0에서 1사이의 pseudo random number만 구하면 되죠), Gaussian Distribution의 역함수를 우리는 모른다는 것입니다. (누군가는 아실지도.) 그 때 사용하는 코드가 바로 다음과 같습니다. 다음 코드는 Peter J. Acklam이 작성한 코드이며 제가 약간 수정하였습니다.

//
// This function returns an approximation of the inverse cumulative
// standard normal distribution function.  I.e., given P, it returns
// an approximation to the X satisfying P = Pr{Z <= X} where Z is a
// random variable from the standard normal distribution.
//
// The algorithm uses a minimax approximation by rational functions
// and the result has a relative error whose absolute value is less
// than 1.15e-9.
//
// Author:      Peter J. Acklam
// Time-stamp:  2002-06-09 18:45:44 +0200
// E-mail:      jacklam@math.uio.no
// WWW URL:     http://www.math.uio.no/~jacklam
//
// C implementation adapted from Peter's Perl version
double invNormal(const double& percentsToCover)
{
    static const double LOW = 0.02425;
    static const double HIGH = 0.97575;

    // Coefficients in rational approximations.
    static const double a&#91;&#93; = {
        -3.969683028665376e+01, 2.209460984245205e+02,
        -2.759285104469687e+02, 1.383577518672690e+02,
        -3.066479806614716e+01, 2.506628277459239e+00
    };

    static const double b&#91;&#93; = {
        -5.447609879822406e+01, 1.615858368580409e+02,
        -1.556989798598866e+02, 6.680131188771972e+01,
        -1.328068155288572e+01
    };

    static const double c&#91;&#93; = {
        -7.784894002430293e-03, -3.223964580411365e-01,
        -2.400758277161838e+00, -2.549732539343734e+00,
        4.374664141464968e+00, 2.938163982698783e+00
    };

    static const double d&#91;&#93; = {
        7.784695709041462e-03, 3.224671290700398e-01,
        2.445134137142996e+00, 3.754408661907416e+00
    };

    double q, r;
    if(percentsToCover < 0 || percentsToCover > 1)
    {
        throw "Given percent to invNormal is less than zero or larger than one.";
    }
    else if(percentsToCover == 0)
    {
        return -numeric_limits<double>::max();          // minus "infinity"
    }
    else if(percentsToCover == 1)
    {
        return numeric_limits<double>::max();            // "infinity"
    }
    else if(percentsToCover < LOW)
    {
        // Rational approximation for lower region
        q = sqrt(-2*log(percentsToCover));
        return (((((c&#91;0&#93;*q+c&#91;1&#93;)*q+c&#91;2&#93;)*q+c&#91;3&#93;)*q+c&#91;4&#93;)*q+c&#91;5&#93;) /
            ((((d&#91;0&#93;*q+d&#91;1&#93;)*q+d&#91;2&#93;)*q+d&#91;3&#93;)*q+1);
    }
    else if(percentsToCover > HIGH)
    {
        // Rational approximation for upper region
        q  = sqrt(-2*log(1-percentsToCover));
        return -(((((c[0]*q+c[1])*q+c[2])*q+c[3])*q+c[4])*q+c[5]) /
            ((((d[0]*q+d[1])*q+d[2])*q+d[3])*q+1);
    }
    else
    {
        // Rational approximation for central region
        q = percentsToCover - 0.5;
        r = q*q;
        return (((((a[0]*r+a[1])*r+a[2])*r+a[3])*r+a[4])*r+a[5])*q /
            (((((b[0]*r+b[1])*r+b[2])*r+b[3])*r+b[4])*r+1);
    }
}

물론 단순히 어떤 분포로부터 pseudo number generator로 값을 샘플링하는 것이 Monte Carlo Method의 전부는 아닙니다. 실제로는 이렇게 분포로부터 값을 구한다음 어떤 계산을 하고, 여기서부터 또다시 무엇인가를 해나가는데 목적이 있는 것이죠.

* Probability distribution functions (pdf’s) – the physical (or mathematical) system must be described by a set of pdf’s.
* Random number generator – a source of random numbers uniformly distributed on the unit interval must be available.
* Sampling rule – a prescription for sampling from the specified pdf’s, assuming the availability of random numbers on the unit interval, must be given.
* Scoring (or tallying) – the outcomes must be accumulated into overall tallies or scores for the quantities of interest.
* Error estimation – an estimate of the statistical error (variance) as a function of the number of trials and other quantities must be determined.
* Variance reduction techniques – methods for reducing the variance in the estimated solution to reduce the computational time for Monte Carlo simulation
* Parallelization and vectorization – algorithms to allow Monte Carlo methods to be implemented efficiently on advanced computer architectures.
From Major Components of a Monte Carlo Algorithm