## Proportion estimation in R

Given a sequence of 1s (heads) and 0s (tails) such as 101011110101…, we want to estimate the proportion of 1s in the population, i.e., the probability that the given coin lands heads.
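As a minimal sketch, assuming the flips are stored as a string of 0/1 characters like the example above, the point estimate of the proportion of heads is simply the mean of the 0/1 values (the variable names here are illustrative):

```r
# Example sequence from the text; each character is one flip (1 = head, 0 = tail).
flips <- as.integer(strsplit("101011110101", "")[[1]])

n <- length(flips)      # number of flips
n_heads <- sum(flips)   # number of heads
p_hat <- mean(flips)    # sample proportion of heads, i.e. n_heads / n

c(n = n, heads = n_heads, p_hat = p_hat)
```

For this sequence there are 8 heads in 12 flips, so the estimate is 8/12.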

Let X be the random variable for a single flip, where 1 means head and 0 means tail. If the probability of observing a head is p, then X ~ Bernoulli(p). If we flip the coin n times independently and sum the outcomes, then SUM ~ B(n, p). If np >= 5 and n(1-p) >= 5, the Central Limit Theorem gives the approximation SUM ≈ N(np, np(1-p)). Dividing by n, the sample proportion Y = SUM/n is approximately N(p, p(1-p)/n). Since p is unknown, we plug in the observed proportion p̂ = SUM/n, so an approximate 95% confidence interval for p is p̂ +/- 1.96 * sqrt(p̂(1-p̂)/n).
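The normal-approximation (Wald) interval above takes only a few lines of R. This is a sketch: the function name `wald_ci` is hypothetical, and it uses `qnorm(0.975)` (about 1.96) instead of the rounded constant.

```r
# Hypothetical helper: Wald (normal-approximation) confidence interval for p.
wald_ci <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  # Rule of thumb from the text, checked with p_hat standing in for the unknown p.
  if (n * p_hat < 5 || n * (1 - p_hat) < 5) {
    warning("n * p_hat or n * (1 - p_hat) is below 5; the normal approximation may be poor")
  }
  z <- qnorm((1 + conf.level) / 2)     # about 1.96 for a 95% interval
  se <- sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

wald_ci(87, 100)   # roughly 0.804 to 0.936, symmetric around 0.87
```

By construction this interval is symmetric around p̂. For the 12-flip sequence above (8 heads), n(1 - p̂) = 4, so the function warns that the approximation is doubtful.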

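To estimate a proportion in R, use prop.test. Here we simulate 100 flips of a coin with p = 0.8 and test against the default null value p = 0.5: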

```
> heads <- rbinom(1, size = 100, prob = .8)
> # rbinom() is random; in this run it returned 87 heads
> prop.test(heads, 100)

	1-sample proportions test with continuity correction

data:  heads out of 100, null probability 0.5
X-squared = 53.29, df = 1, p-value = 2.878e-13
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.7843987 0.9262321
sample estimates:
   p 
0.87 
```
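Rather than reading the printed output, the pieces can be pulled out of the object that prop.test returns (a standard `htest` list). This sketch fixes the count at 87 heads, the value in the run above, so the result does not depend on the random draw (calling `set.seed()` before `rbinom()` is the other way to make a run reproducible):

```r
# prop.test returns an "htest" object; its components are ordinary list elements.
res <- prop.test(87, 100)   # 87 heads out of 100 flips, null p = 0.5

res$estimate    # sample proportion, 0.87
res$conf.int    # 95% confidence interval (carries a "conf.level" attribute)
res$statistic   # X-squared
res$p.value

# correct = FALSE turns off the continuity correction, which changes the interval.
prop.test(87, 100, correct = FALSE)$conf.int
```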

Interestingly, the interval is not symmetric around the point estimate 0.87:

```
> .87 - 0.7843987
[1] 0.0856013
> 0.9262321 - .87
[1] 0.0562321
```

This is because prop.test does not use the normal-approximation interval described above. For a single proportion it reports the Wilson score interval, with a continuity correction by default (correct = TRUE). See:
http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/prop.test.html
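To see where the asymmetric interval comes from, here is a minimal sketch of the Wilson score interval with continuity correction. The function name `wilson_cc_ci` and its `cc` argument are hypothetical choices of mine; the sketch applies the ordinary Wilson score formula to p̂ - 0.5/n for the lower limit and p̂ + 0.5/n for the upper limit. One assumption to flag: prop.test also caps the correction at |x - n*p0| for the null proportion p0, which this sketch ignores (it makes no difference for 87 heads with p0 = 0.5).

```r
# Hypothetical helper: Wilson score interval with a continuity correction.
# Assumes 0 <= x <= n; the cap on the correction that prop.test applies
# (|x - n * p0|) is deliberately left out.
wilson_cc_ci <- function(x, n, conf.level = 0.95, cc = 0.5) {
  z <- qnorm((1 + conf.level) / 2)   # about 1.96 for a 95% interval
  p_hat <- x / n
  z22n <- z^2 / (2 * n)

  # Plain Wilson score limit for proportion p; sign = -1 (lower) or +1 (upper).
  wilson_limit <- function(p, sign) {
    (p + z22n + sign * z * sqrt(p * (1 - p) / n + z22n / (2 * n))) /
      (1 + 2 * z22n)
  }

  # Shift the estimate by cc / n away from the centre before each limit.
  p_lo <- p_hat - cc / n
  p_hi <- p_hat + cc / n
  lower <- if (p_lo <= 0) 0 else wilson_limit(p_lo, -1)
  upper <- if (p_hi >= 1) 1 else wilson_limit(p_hi, +1)
  c(lower = max(lower, 0), upper = min(upper, 1))
}

wilson_cc_ci(87, 100)
# Compare with the interval prop.test reports for the same data
prop.test(87, 100)$conf.int
```

For 87 heads out of 100 this should reproduce 0.7843987 and 0.9262321. The asymmetry arises because the Wilson interval is centred near (p̂ + z²/(2n)) / (1 + z²/n), a point pulled from p̂ toward 0.5, rather than on p̂ itself.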

When two or more proportions are involved, prop.test can test whether they are all equal. The example below, comparing four proportions, is from the prop.test help page in R:

```
## Data from Fleiss (1981), p. 139.
## H0: The null hypothesis is that the four populations from which
##     the patients were drawn have the same true proportion of smokers.
## A:  The alternative is that this proportion is different in at
##     least one of the populations.
> smokers  <- c( 83, 90, 129, 70 )
> patients <- c( 86, 93, 136, 82 )
> prop.test(smokers, patients)

	4-sample test for equality of proportions without continuity correction

data:  smokers out of patients
X-squared = 12.6004, df = 3, p-value = 0.005585
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4 
0.9651163 0.9677419 0.9485294 0.8536585 
```

Since the p-value (0.005585) is below 0.05, we reject H0 in favor of A: at least one of the populations has a different proportion of smokers. Note that prop.test is closely related to chisq.test (the chi-squared test on a contingency table): http://stats.stackexchange.com/questions/2391/what-is-the-relationship-between-a-chi-square-test-and-test-of-equal-proportions
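One way to see that relationship with the smokers data, as a sketch: arrange the counts as a 4 x 2 table of smokers and non-smokers and pass it to chisq.test. Neither function applies a continuity correction to a table larger than 2 x 2, so the two statistics should agree. The `counts` name is only illustrative.

```r
smokers  <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)

# One row per population: number of smokers and non-smokers.
counts <- cbind(smokers = smokers, non_smokers = patients - smokers)

pt <- prop.test(smokers, patients)
ct <- chisq.test(counts)

# Both should give X-squared = 12.6004 on 3 degrees of freedom.
c(prop.test = unname(pt$statistic), chisq.test = unname(ct$statistic))
c(prop.test = pt$p.value, chisq.test = ct$p.value)
```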
