T-test for comparing means


Before we get started, open up Gaussian distribution and Chi, t, F distributions if you need some reference on the math.

One sample t-test

If we don’t know the variance of the population (which is usually the case), then for X \sim N(\mu, \sigma^2):

  \dfrac{\overline{X}-\mu}{s/\sqrt{n}} \sim t(n-1)

where s is the sample standard deviation and n is the sample size.

As an example, to test whether the mean of “1, 3, 2, 7, 8, 9, 3, 4, 5” is 5, we should first test whether the data are normally distributed:

> x = c(1, 3, 2, 7, 8, 9, 3, 4, 5)
> shapiro.test(x)

	Shapiro-Wilk normality test

data:  x 
W = 0.9409, p-value = 0.5917

As the p-value > 0.05, we cannot reject H0, i.e., we have no evidence against the data following a normal distribution. See Testing Normality for additional ways of testing normality.

Now, to apply t-test:

> t.test(x, mu=5)

	One Sample t-test

data:  x 
t = -0.3592, df = 8, p-value = 0.7287
alternative hypothesis: true mean is not equal to 5 
95 percent confidence interval:
 2.526785 6.806548 
sample estimates:
mean of x 
 4.666667 

As the p-value is 0.7287 > 0.05, H0 is not rejected, meaning that we have no evidence that the true mean differs from 5.
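
If you want to see what t.test is doing, the statistic can be computed by hand; a minimal sketch using the x defined above, whose results should match the t.test output:

> # One-sample t statistic and two-sided p-value by hand
> t_stat = (mean(x) - 5) / (sd(x) / sqrt(length(x)))   # t = -0.3592
> 2 * pt(-abs(t_stat), df=length(x) - 1)               # p-value = 0.7287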

Or to see if the mean of x is larger than 5:

> t.test(x, mu=5, alternative="greater")

	One Sample t-test

data:  x 
t = -0.3592, df = 8, p-value = 0.6356
alternative hypothesis: true mean is greater than 5 
95 percent confidence interval:
 2.941079      Inf 
sample estimates:
mean of x 
 4.666667 

In this case, we cannot conclude that the true mean is greater than 5, as the p-value > 0.05.
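
The one-sided p-value can be obtained from the same statistic; a quick check using the t value and df reported above:

> # P(T >= t) under t with 8 degrees of freedom, for alternative="greater"
> 1 - pt(-0.3592, df=8)                                # p-value = 0.6356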

Independent two sample t-test

Here, we want to know if the means of X_1, X_2, \cdots, X_{n_1} and Y_1, Y_2, \cdots, Y_{n_2} are the same, where X and Y are independent and X \sim N(\mu_1, \sigma_1^2), Y \sim N(\mu_2, \sigma_2^2).

1) If we know \sigma_1 and \sigma_2.
The test statistic is

  \dfrac{\overline{X}-\overline{Y}}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \sim N(0,1)

However, we usually don’t know \sigma_1 and \sigma_2.

2) We don’t know \sigma_1, \sigma_2, but n_1 and n_2 are big enough.
Then the test statistic is:

  \dfrac{\overline{X}-\overline{Y}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim N(0, 1)

Usually 30 is used as the magic number to decide whether the sample size is big enough.
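
Cases 1) and 2) can be sketched in R as below (a minimal sketch assuming hypothetical data vectors x and y that are large enough; for case 1, replace the sample variances with the known \sigma_1^2 and \sigma_2^2):

> # Large-sample z statistic using sample variances
> z = (mean(x) - mean(y)) / sqrt(var(x)/length(x) + var(y)/length(y))
> 2 * pnorm(-abs(z))                                   # two-sided p-value from N(0, 1)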

3) We don’t know \sigma_1, \sigma_2, but \sigma_1=\sigma_2
The test statistic can be written as

  \dfrac{\overline{X}-\overline{Y}-(\mu_1-\mu_2)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)

where S_p is the so-called pooled sample standard deviation, the square root of the pooled sample variance:

  S_p^2=\dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}

(Note: We’re still assuming that X and Y follow a normal distribution. See assumptions of t-test.)

As an example, let’s test if the means are the same for “1, 3, 2, 7, 8, 9, 3, 4, 5” and “1, 2, 4, 3, 2, 5, 6, 7, 8, 2, 3, 5”.

Let’s test if the variances are the same:

> x = c(1, 3, 2, 7, 8, 9, 3, 4, 5)
> y = c(1, 2, 4, 3, 2, 5, 6, 7, 8, 2, 3, 5)
> var.test(x,y)

	F test to compare two variances

data:  x and y 
F = 1.5787, num df = 8, denom df = 11, p-value = 0.4734
alternative hypothesis: true ratio of variances is not equal to 1 
95 percent confidence interval:
 0.4308902 6.6990915 
sample estimates:
ratio of variances 
          1.578704 

As the p-value > 0.05, we cannot reject H0 that their variances are equal.

If we need to test normality, we want to see if \overline{X}-\overline{Y} is normally distributed. In this example, we know that the variance is the same for X and Y. So, when using shapiro.test, we can think of this t-test as a simplified version of ANOVA. Then X_i = \mu_1 + \epsilon_i and Y_j = \mu_2 + \epsilon_j, where \epsilon_i \sim N(0, \sigma_E^2) and \epsilon_j \sim N(0, \sigma_E^2). As \epsilon_i and \epsilon_j are normally distributed with the same mean and variance, we can put them together and test normality. Suppose that we have data “1, 3, 2, 7, 8, 9, 3, 4, 5” and “1, 2, 4, 3, 2, 5, 6, 7, 8, 2, 3, 5”. Then run shapiro.test as below:

> x = c(1, 3, 2, 7, 8, 9, 3, 4, 5)
> y = c(1, 2, 4, 3, 2, 5, 6, 7, 8, 2, 3, 5)
> shapiro.test(c(x-mean(x), y-mean(y)))

	Shapiro-Wilk normality test

data:  c(x - mean(x), y - mean(y)) 
W = 0.9426, p-value = 0.2452

In this case, H0 is not rejected, so we treat the data as normal.

Another way of doing this is using lm:

> f = data.frame(val=c(x, y), klass=c(rep("x", NROW(x)), rep("y", NROW(y))))
> f  
   val klass
1    1     x
2    3     x
3    2     x
4    7     x
5    8     x
6    9     x
7    3     x
8    4     x
9    5     x
10   1     y
11   2     y
12   4     y
13   3     y
14   2     y
15   5     y
16   6     y
17   7     y
18   8     y
19   2     y
20   3     y
21   5     y
> # klass is a factor, so lm fits val = mu + alpha * I(klass == "y") + epsilon;
> # the residuals are the observations with each group mean subtracted.
> shapiro.test(resid(lm(val ~ klass, data=f)))

	Shapiro-Wilk normality test

data:  resid(lm(val ~ klass, data = f)) 
W = 0.9426, p-value = 0.2452

As you can see, using lm gives the same result as subtracting the mean from x and y separately.

If the variances were different, we would run shapiro.test on each of X and Y separately.

Now, t-test:

> t.test(x, y, var.equal=TRUE)

	Two Sample t-test

data:  x and y 
t = 0.6119, df = 19, p-value = 0.5479
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -1.613802  2.947136 
sample estimates:
mean of x mean of y 
 4.666667  4.000000 

Its confidence interval includes zero, which corresponds to a p-value > 0.05, meaning that we cannot reject H0 that their means are the same.
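
As a sanity check, the pooled-variance formula above gives the same statistic when computed by hand (a minimal sketch using the same x and y):

> # Pooled two-sample t statistic by hand
> n1 = length(x); n2 = length(y)
> sp2 = ((n1-1)*var(x) + (n2-1)*var(y)) / (n1 + n2 - 2)   # pooled sample variance
> (mean(x) - mean(y)) / (sqrt(sp2) * sqrt(1/n1 + 1/n2))   # t = 0.6119, as above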

4) If we know that \sigma_1 \neq \sigma_2

We’re still assuming that X and Y follow a normal distribution and that they’re independent. As their variances are not the same, we just use the fact that X - Y \sim N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2).

Because we do not know their variances, use the sample variances:

  \dfrac{\overline{X}-\overline{Y}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t(df)

The R code is the same, except that we use t.test(x, y, var.equal=FALSE) (this is actually the default), which performs Welch’s t-test.
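
If you are curious where df comes from, here is a minimal sketch of the Welch–Satterthwaite approximation used when var.equal=FALSE (with the same x and y as above):

> # Welch–Satterthwaite approximation for the degrees of freedom
> v1 = var(x)/length(x); v2 = var(y)/length(y)
> (v1 + v2)^2 / (v1^2/(length(x)-1) + v2^2/(length(y)-1))  # approximate df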

But one should think really hard about why one wants to compare means in the first place when the variances are different.

Paired sample t-test

I think this is the data that any intelligent engineer will try to get from their experiment. Paired samples have data of this form: (X_1, Y_1),~ (X_2, Y_2),~ \cdots,~ (X_n, Y_n). For example, it could be (old method performance, new method performance) pairs observed on several machines.

If X and Y are normally distributed, D=X-Y follows a normal distribution. Even when that is not the case, the Central Limit Theorem states that the sample mean \overline{D} is approximately normal when n is large. Therefore we can treat \overline{D} as normal.

As we do not know the variance of D, use the sample variance to get:

  \dfrac{\overline{D} - \mu_D}{S_D / \sqrt{n}} \sim t(n-1)

In R (I am assuming X \sim N and Y \sim N; without that assumption, one should run a normality test first, as the number of data points in this example is small):

> x = c(1, 2, 3, 4, 3, 2)
> y = c(5, 3, 2, 3, 1, 7)
> t.test(x, y, paired=TRUE)

	Paired t-test

data:  x and y 
t = -0.8452, df = 5, p-value = 0.4366
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -4.041553  2.041553 
sample estimates:
mean of the differences 
                     -1 

We conclude that we cannot reject H0 that the true means are the same. In human language, “we could not show that their means differ”.
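
Note that the paired test is equivalent to a one-sample t-test on the differences D_i = X_i - Y_i; a quick check with the same x and y:

> # Paired t-test as a one-sample t-test on the differences
> d = x - y
> t.test(d, mu=0)                  # same t, df, and p-value as t.test(x, y, paired=TRUE)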

If the assumptions do not hold

All the methods above rely on some kind of assumption, such as a large sample size or normally distributed data.

If such assumptions look invalid, one could use non-parametric methods such as rank tests. For example, for the paired t-test case above, the Wilcoxon signed rank test:

> x = c(1, 2, 3, 4, 3, 2)
> y = c(5, 3, 2, 3, 1, 7)
> library(BSDA)
> wilcox.test(x, y, paired=TRUE)

	Wilcoxon signed rank test with continuity correction

data:  x and y 
V = 8, p-value = 0.6716
alternative hypothesis: true location shift is not equal to 0 

See Rank Tests for more examples.
