Given a sequence of 1s (heads) and 0s (tails) such as 101011110101..., we want to estimate the proportion of 1s in the population, i.e., how likely the given coin is to come up heads.
Let X be a random variable where 1 means heads and 0 means tails. If the probability of observing heads is p, then X ~ Bernoulli(p). If we flip the coin n times and sum the outcomes, then SUM ~ B(n, p). If np >= 5 and n(1-p) >= 5, then by the Central Limit Theorem SUM is approximately N(np, np(1-p)). The sample proportion Y = SUM/n is therefore approximately N(p, p(1-p)/n). Since p is unknown, we plug in the observed proportion p_hat, which gives an approximate 95% confidence interval of p_hat +/- 1.96 * sqrt(p_hat(1-p_hat)/n).
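As a rough sketch of the normal-approximation interval just described, the snippet below computes it by hand in R. The values n = 100 and heads = 87 are illustrative assumptions, not the result of a real experiment.

```r
# Normal-approximation 95% interval for a proportion.
# n and heads are illustrative values chosen for this sketch.
n     <- 100
heads <- 87

p_hat <- heads / n                        # observed proportion
se    <- sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
z     <- qnorm(0.975)                     # about 1.96 for a 95% interval

wald <- c(lower = p_hat - z * se, upper = p_hat + z * se)
print(wald)
```

By construction this interval is symmetric around p_hat.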
To estimate a proportion in R (here the simulated draw happened to give 87 heads out of 100):
> heads <- rbinom(1, size=100, prob = .8)
> prop.test(heads, 100)

	1-sample proportions test with continuity correction

data:  heads out of 100, null probability 0.5
X-squared = 53.29, df = 1, p-value = 2.878e-13
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.7843987 0.9262321
sample estimates:
   p 
0.87 
What’s interesting here is that the interval is not symmetric around the estimate:
> .87-0.7843987
[1] 0.0856013
> 0.9262321 -.87
[1] 0.0562321
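That’s because prop.test does not use the simple p_hat +/- 1.96 * sqrt(p_hat(1-p_hat)/n) interval described above. Instead, it inverts the score test, which gives the Wilson score interval (with a continuity correction by default). That interval is not centered on the sample proportion, so it is asymmetric whenever p_hat is not 0.5. See: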
http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/prop.test.html
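To make the asymmetry concrete, here is a minimal sketch that recomputes the interval by hand using the continuity-corrected Wilson score formula. The helper name wilson_cc is hypothetical (it is not part of base R), and the sketch assumes 0 < x < n so that no edge cases arise.

```r
# Hypothetical helper (not part of base R): Wilson score interval with
# continuity correction for x successes out of n trials.
# Assumes 0 < x < n; edge cases at 0 and n are not handled carefully.
wilson_cc <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  z     <- qnorm((1 + conf.level) / 2)
  z22n  <- z^2 / (2 * n)
  cc    <- 0.5 / n                       # continuity correction
  bound <- function(p_c, sign) {
    (p_c + z22n + sign * z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))) /
      (1 + 2 * z22n)
  }
  c(lower = max(bound(p_hat - cc, -1), 0),
    upper = min(bound(p_hat + cc, +1), 1))
}

ci <- wilson_cc(87, 100)
print(ci)
print(prop.test(87, 100)$conf.int)       # should agree with ci
```

For 87 heads out of 100 this should reproduce the interval printed by prop.test above, with the lower limit further from 0.87 than the upper limit.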
If two or more proportions are involved, we can test whether they are equal (see the prop.test manual in R):
## Data from Fleiss (1981), p. 139.
## H0: The null hypothesis is that the four populations from which
##     the patients were drawn have the same true proportion of smokers.
## A:  The alternative is that this proportion is different in at
##     least one of the populations.
> smokers  <- c( 83, 90, 129, 70 )
> patients <- c( 86, 93, 136, 82 )
> prop.test(smokers, patients)

	4-sample test for equality of proportions without continuity correction

data:  smokers out of patients
X-squared = 12.6004, df = 3, p-value = 0.005585
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4 
0.9651163 0.9677419 0.9485294 0.8536585 
Since the p-value (0.005585) is below 0.05, we reject H0 in favor of A. Note the relationship between prop.test and chisq.test (chi-square testing on a contingency table): http://stats.stackexchange.com/questions/2391/what-is-the-relationship-between-a-chi-square-test-and-test-of-equal-proportions
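The relationship can be checked directly on the smokers data. The sketch below builds the 2 x 4 contingency table of smokers and non-smokers (the name tab is just an illustrative choice) and runs chisq.test on it. Because the table is larger than 2 x 2, neither function applies a continuity correction here.

```r
# Same data as the prop.test example above.
smokers  <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)

# Contingency table: one row of smokers, one row of non-smokers.
tab <- rbind(smokers = smokers, nonsmokers = patients - smokers)

pt <- prop.test(smokers, patients)
ct <- chisq.test(tab)

# Both tests report the same chi-square statistic, df, and p-value.
print(c(prop.test = unname(pt$statistic), chisq.test = unname(ct$statistic)))
print(c(prop.test = pt$p.value, chisq.test = ct$p.value))
```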