The chi-square test can be used to test for independence, using the formula given on the wiki.
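For reference, the standard Pearson statistic can be written as follows. The symbols (O for observed counts, E for expected counts, R_i and C_j for row and column totals, n for the grand total) are notation chosen here, not taken from the original post.

```latex
% Pearson chi-square statistic over all cells
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}

% Expected count of cell (i, j) under independence in an r x c table
E_{ij} = \frac{R_i \, C_j}{n}, \qquad \mathrm{df} = (r - 1)(c - 1)
```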
Given a table:
> x = c(12, 26, 22)
> names(x) = c("A", "B", "C")
> x
 A  B  C
12 26 22
We want to know whether the counts depend on the class (A, B, C), i.e., whether they deviate from equal frequencies across the classes. This is trivial in R:
> chisq.test(x)
Chi-squared test for given probabilities
data:  x
X-squared = 5.2, df = 2, p-value = 0.07427
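To see where these numbers come from, the same statistic can be computed by hand. This is only a sketch: it assumes the default null hypothesis of chisq.test() for a vector, namely equal probabilities for all classes, and the variable names are chosen for illustration.

```r
# Observed counts from the example above
x <- c(A = 12, B = 26, C = 22)

# Under the null hypothesis every class is equally likely,
# so each expected count is the total divided by the number of classes
expected <- rep(sum(x) / length(x), length(x))

# Pearson statistic: sum of (observed - expected)^2 / expected
stat <- sum((x - expected)^2 / expected)

# Upper-tail probability of a chi-square distribution with k - 1 degrees of freedom
p_value <- pchisq(stat, df = length(x) - 1, lower.tail = FALSE)

stat      # matches X-squared reported by chisq.test(x)
p_value   # matches the reported p-value
```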
For a two-way contingency table, one can pass a matrix:
> x = matrix(c(10, 12, 15, 13, 17, 14), nrow=2, dimnames=list(c("A", "B"), c("1", "2", "3")))
> x
   1  2  3
A 10 15 17
B 12 13 14
> chisq.test(x)
Pearson's Chi-squared test
data:  x
X-squared = 0.5046, df = 2, p-value = 0.777
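The matrix case can be checked by hand in the same way. The sketch below assumes the usual independence model, where each expected count is row total times column total divided by the grand total. The variable names are illustrative.

```r
x <- matrix(c(10, 12, 15, 13, 17, 14), nrow = 2,
            dimnames = list(c("A", "B"), c("1", "2", "3")))

# Expected count in each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(x), colSums(x)) / sum(x)

# Pearson statistic and its degrees of freedom, (rows - 1) * (columns - 1)
stat <- sum((x - expected)^2 / expected)
df <- (nrow(x) - 1) * (ncol(x) - 1)
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# Compare with what chisq.test() computes internally
res <- chisq.test(x)
all.equal(unname(res$statistic), stat)
all.equal(res$expected, expected, check.attributes = FALSE)
```

Note that chisq.test() applies a continuity correction by default only to 2×2 tables, so for this 2×3 table the hand computation and the function should agree.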
Or one can pass two vectors of raw observations, or their contingency table:
> x = c("M", "M", "M", "M", "F", "F", "F", "F")
> y = c(1, 3, 2, 4, 5, 3, 2, 7)
> chisq.test(x, y)
Pearson's Chi-squared test
data:  x and y
X-squared = 4, df = 5, p-value = 0.5494
> chisq.test(table(x, y))
Pearson's Chi-squared test
data:  table(x, y)
X-squared = 4, df = 5, p-value = 0.5494
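With only eight observations spread over twelve cells, the expected counts here are very small, and R normally warns that the chi-squared approximation may be incorrect. A quick way to inspect this, sketched with illustrative variable names:

```r
x <- c("M", "M", "M", "M", "F", "F", "F", "F")
y <- c(1, 3, 2, 4, 5, 3, 2, 7)

# Cross-tabulate the raw observations into a 2 x 6 contingency table
tab <- table(x, y)

# Suppress the small-sample warning here only so the expected counts can be inspected
res <- suppressWarnings(chisq.test(tab))

res$expected                      # expected counts under independence
min_expected <- min(res$expected)
min_expected                      # far below the common rule of thumb of 5
```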
Two things to remember:
– If the sample size is small in a 2×2 contingency table, use fisher.test() for Fisher's exact test. It uses the hypergeometric distribution instead of the chi-square approximation, which gives a more accurate test.
– If the samples are paired (e.g., measurements on the same person before and after a treatment, or approval ratings before and after an election campaign), use mcnemar.test(). Even then, if the number of discordant pairs is small, use binom.test() on the discordant pairs to check whether there is any difference.
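Both cases can be sketched as follows. The counts in the two tables are made up purely for illustration and do not come from the original post. The table layouts and variable names are assumptions too.

```r
# Hypothetical small 2x2 table (made-up counts): group vs. outcome
small <- matrix(c(3, 1, 1, 3), nrow = 2,
                dimnames = list(group = c("treated", "control"),
                                outcome = c("improved", "not improved")))

# Fisher's exact test, based on the hypergeometric distribution
fisher_res <- fisher.test(small)
fisher_res$p.value

# Hypothetical paired data (made-up counts): the same subjects before and after
paired <- matrix(c(10, 2, 8, 5), nrow = 2,
                 dimnames = list(before = c("yes", "no"),
                                 after = c("yes", "no")))

# McNemar's test only looks at the discordant (off-diagonal) pairs
mcnemar_res <- mcnemar.test(paired)
mcnemar_res$statistic

# With few discordant pairs, an exact binomial test on them is an alternative:
# under the null hypothesis, each discordant pair goes either way with probability 0.5
b <- paired["yes", "no"]   # yes before, no after
c_ <- paired["no", "yes"]  # no before, yes after
exact_res <- binom.test(b, b + c_, p = 0.5)
exact_res$p.value
```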
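References
1. 김재희 (Kim Jae-hee), R을 이용한 통계 프로그래밍 기초 [Fundamentals of Statistical Programming Using R], 자유아카데미.
2. 이태림 외 (Lee Tae-rim et al.), 생명과학 자료분석 [Data Analysis for the Life Sciences], 한국방송통신대학교출판부 (Korea National Open University Press).
3. 신준호 (Shin Jun-ho), 빈도분석과 카이자승 검정 [Frequency Analysis and the Chi-Square Test]. http://www.richis.org/html/research/statistics/a05.pdf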
4. Chi-square, Fisher’s exact, and McNemar’s test