In R,
> ?lm
gives you this example:
> ## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
> ## Page 9: Plant Weight Data.
> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
> group <- gl(2, 10, 20, labels=c("Ctl","Trt"))
> weight <- c(ctl, trt)
> lm.D9 <- lm(weight ~ group)
> lm.D90 <- lm(weight ~ group - 1) # omitting intercept
But the doc does not explain the difference between lm.D9 and lm.D90.
Their difference is that lm.D9 has an intercept (weight = intercept + beta * group) while lm.D90 does not (weight = beta * group). But this is only a small part of the difference. If you look at their summaries, the result is surprising:
> summary(lm.D9)

Call:
lm(formula = weight ~ group)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
groupTrt     -0.3710     0.3114  -1.191    0.249
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.07308,    Adjusted R-squared: 0.02158
F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249

> summary(lm.D90)

Call:
lm(formula = weight ~ group - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
groupCtl   5.0320     0.2202   22.85 9.55e-15 ***
groupTrt   4.6610     0.2202   21.16 3.62e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.9818,    Adjusted R-squared: 0.9798
F-statistic: 485.1 on 2 and 18 DF,  p-value: < 2.2e-16
Suddenly the R-squared, the F statistic, and the p-value say that the model is significant in lm.D90 while lm.D9 is not. Why? Is it because we shouldn't have an intercept in the model? No, it's not. It's because the R-squared you're reading is not computing what you expect.
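To see what each parameterization actually estimates (these numbers just restate the summaries above): the no-intercept coefficients are the two group means, and lm.D9's groupTrt is their difference.

> coef(lm.D9)    # (Intercept) = mean(ctl) = 5.032, groupTrt = mean(trt) - mean(ctl) = -0.371
> coef(lm.D90)   # groupCtl = mean(ctl) = 5.032, groupTrt = mean(trt) = 4.661
> all.equal(fitted(lm.D9), fitted(lm.D90))   # TRUE: the fitted models are identical

Same fitted values, yet wildly different R-squared. So the difference must come from how R-squared is computed.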
As you know, R-squared = 1 - (SSR/SST), where SST is the sum of squares of (y - avg(y)) and SSR is the sum of squared residuals (y - yhat). However, when the intercept is forced to zero, R uses the raw sum of squares of y as SST. In other words, the summary is no longer comparing y = ax + b + epsilon against y = mu + epsilon; it is comparing y = ax + epsilon against y = 0 + epsilon, and the zero line is such a poor baseline that almost any fit looks great next to it. See http://www.bios.unc.edu/~truong/b663/pdf/noint.pdf It's not specific to R or SAS; it follows from how you model it, so Excel has this issue too: http://support.microsoft.com/kb/829249
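You can verify this by recomputing both R-squared values by hand; the printed values should match the Multiple R-squared lines in the summaries above. The two fits have identical residuals, so only the SST baseline changes:

> rss <- sum(resid(lm.D9)^2)                 # identical to sum(resid(lm.D90)^2)
> 1 - rss / sum((weight - mean(weight))^2)   # SST about the mean: 0.07308
> 1 - rss / sum(weight^2)                    # SST about zero:     0.9818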
If this is still confusing, remember how you run ANOVA:
> summary(aov(weight ~ group))
            Df Sum Sq Mean Sq F value Pr(>F)
group        1 0.6882 0.68820  1.4191  0.249
Residuals   18 8.7293 0.48496
Its p-value for group is 0.249, which concurs with lm(weight ~ group).
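In fact, aov is just another interface to the same underlying linear model, so you get the identical table from the fit we already have:

> anova(lm.D9)   # same Df, Sum Sq, F value, and Pr(>F) as the aov table above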
Another aspect of this problem lies in the assumption behind our model: y = ax + b + epsilon. This is, at best, an approximation of the real world. For example, what if reality involves another variable z, so that the right model is y = ax + b + cz + epsilon? When we fit y = ax + b + epsilon anyway, the effect of cz is swallowed by b, which shifts by roughly mean(cz). If it were not for the intercept b, this kind of sink could not happen. See p. 43 of Linear Models with R by Julian Faraway.
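A tiny simulation makes the absorption visible. Everything below (x, z, and the coefficients) is invented purely for illustration:

> ## Illustration only: the data and coefficients are made up.
> set.seed(42)
> x <- rnorm(100)
> z <- rnorm(100, mean = 3)                    # lurking variable with nonzero mean
> y <- 1 + 2*x + 0.5*z + rnorm(100, sd = 0.1)  # reality: y = b + ax + cz + epsilon
> coef(lm(y ~ x))                              # intercept lands near 1 + 0.5*mean(z) = 2.5, not 1

The 0.5*z term did not vanish; its average got folded into the estimated intercept.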
So, how do we fix that? To compute the correct F statistic, the one that compares y = ax + epsilon against y = mu + epsilon:
> anova(lm(weight ~ 1), lm(weight ~ group - 1))
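Given the numbers in the ANOVA table above, this should print something like:

Analysis of Variance Table

Model 1: weight ~ 1
Model 2: weight ~ group - 1
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     19 9.4175
2     18 8.7293  1    0.6882 1.4191  0.249

which brings us back to F = 1.4191 and p = 0.249: judged against the right baseline, the no-intercept fit is no more significant than lm(weight ~ group).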