In R,
> ?lm
gives you this example:
> ## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
> ## Page 9: Plant Weight Data.
> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
> group <- gl(2, 10, 20, labels=c("Ctl","Trt"))
> weight <- c(ctl, trt)
> lm.D9 <- lm(weight ~ group)
> lm.D90 <- lm(weight ~ group - 1) # omitting intercept
But the doc does not explain the difference between lm.D9 and lm.D90.
Their difference is that lm.D9 has an intercept (weight = intercept + beta * group) while lm.D90 does not (weight = beta * group). But this is only a small part of the difference. If you look at their summaries, the result is surprising:
> summary(lm.D9)

Call:
lm(formula = weight ~ group)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
groupTrt     -0.3710     0.3114  -1.191    0.249
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.07308,    Adjusted R-squared: 0.02158
F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249

> summary(lm.D90)

Call:
lm(formula = weight ~ group - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
groupCtl   5.0320     0.2202   22.85 9.55e-15 ***
groupTrt   4.6610     0.2202   21.16 3.62e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.9818,    Adjusted R-squared: 0.9798
F-statistic: 485.1 on 2 and 18 DF,  p-value: < 2.2e-16
Suddenly the R-squared, the F statistic, and the p-value say that the model is significant in lm.D90 while lm.D9 is not. Why? Is it because we shouldn't have an intercept in the model? No, it's not. It's because the R-squared you're reading is not computing what you expect.
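To see what each parameterization actually estimates (these numbers just restate the summaries above): the no-intercept coefficients are the two group means, and lm.D9's groupTrt is their difference.

> coef(lm.D9)    # (Intercept) = mean(ctl) = 5.032, groupTrt = mean(trt) - mean(ctl) = -0.371
> coef(lm.D90)   # groupCtl = mean(ctl) = 5.032, groupTrt = mean(trt) = 4.661
> all.equal(fitted(lm.D9), fitted(lm.D90))   # TRUE: the fitted models are identical

Same fitted values, yet wildly different R-squared. So the difference must come from how R-squared is computed.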
As you know, R-squared = 1 - (SSR/SST), where SST is the sum of squares of (y - avg(y)) and SSR is the sum of squared residuals (y - yhat). However, when the intercept is forced to zero, R uses the raw sum of squares of y as SST. In other words, the summary is no longer comparing y = ax + b + epsilon against y = mu + epsilon; it is comparing y = ax + epsilon against y = 0 + epsilon, and the zero line is such a poor baseline that almost any fit looks great next to it. See http://www.bios.unc.edu/~truong/b663/pdf/noint.pdf It's not specific to R or SAS; it follows from how you model it, so Excel has this issue too: http://support.microsoft.com/kb/829249
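You can verify this by recomputing both R-squared values by hand; the printed values should match the Multiple R-squared lines in the summaries above. The two fits have identical residuals, so only the SST baseline changes:

> rss <- sum(resid(lm.D9)^2)                 # identical to sum(resid(lm.D90)^2)
> 1 - rss / sum((weight - mean(weight))^2)   # SST about the mean: 0.07308
> 1 - rss / sum(weight^2)                    # SST about zero:     0.9818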
If this is still confusing, remember how you run ANOVA:
> summary(aov(weight ~ group))
            Df Sum Sq Mean Sq F value Pr(>F)
group        1 0.6882 0.68820  1.4191  0.249
Residuals   18 8.7293 0.48496
Its p-value for group is 0.249, which concurs with lm(weight ~ group).
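In fact, aov is just another interface to the same underlying linear model, so you get the identical table from the fit we already have:

> anova(lm.D9)   # same Df, Sum Sq, F value, and Pr(>F) as the aov table above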
Another aspect of this problem lies in the assumption behind our model: y = ax + b + epsilon. This is, at best, an approximation of the real world. For example, what if reality involves another variable z, so that the right model is y = ax + b + cz + epsilon? When we fit y = ax + b + epsilon anyway, the effect of cz is swallowed by b, which shifts by roughly mean(cz). If it were not for the intercept b, this kind of sink could not happen. See p. 43 of Linear Models with R by Julian Faraway.
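A tiny simulation makes the absorption visible. Everything below (x, z, and the coefficients) is invented purely for illustration:

> ## Illustration only: the data and coefficients are made up.
> set.seed(42)
> x <- rnorm(100)
> z <- rnorm(100, mean = 3)                    # lurking variable with nonzero mean
> y <- 1 + 2*x + 0.5*z + rnorm(100, sd = 0.1)  # reality: y = b + ax + cz + epsilon
> coef(lm(y ~ x))                              # intercept lands near 1 + 0.5*mean(z) = 2.5, not 1

The 0.5*z term did not vanish; its average got folded into the estimated intercept.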
So, how do we fix that? To compute the correct F statistic, the one that compares y = ax + epsilon against y = mu + epsilon:
> anova(lm(weight ~ 1), lm(weight ~ group - 1))
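Given the numbers in the ANOVA table above, this should print something like:

Analysis of Variance Table

Model 1: weight ~ 1
Model 2: weight ~ group - 1
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     19 9.4175
2     18 8.7293  1    0.6882 1.4191  0.249

which brings us back to F = 1.4191 and p = 0.249: judged against the right baseline, the no-intercept fit is no more significant than lm(weight ~ group).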