Logistic Regression for iris in R


> data(iris)
> i = cbind(iris, setosa=ifelse(iris$Species=="setosa", 1, 0))
> i[1:5,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species setosa
1          5.1         3.5          1.4         0.2  setosa      1
2          4.9         3.0          1.4         0.2  setosa      1
3          4.7         3.2          1.3         0.2  setosa      1
4          4.6         3.1          1.5         0.2  setosa      1
5          5.0         3.6          1.4         0.2  setosa      1
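
The same indicator column can also be built in one step with transform (just an equivalent sketch of the cbind call above):

> i = transform(iris, setosa = as.integer(Species == "setosa"))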

> m = glm(setosa ~ Sepal.Width + Sepal.Length + Petal.Width, family=binomial, data=i)
Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 
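
These warnings are expected here: setosa is linearly separable from the other two species, so the maximum likelihood estimates diverge and the fitted probabilities get pushed to exactly 0 or 1 (complete separation). Petal.Width alone shows the gap, since setosa petal widths top out at 0.6 while the other species start at 1.0:

> tapply(i$Petal.Width, i$setosa, range)
$`0`
[1] 1.0 2.5

$`1`
[1] 0.1 0.6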

See which observations have fitted odds greater than 1, i.e. a predicted probability of setosa above 0.5.

> exp(predict(m, i)) > 1
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20 
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
   21    22    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37    38    39    40 
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
   41    42    43    44    45    46    47    48    49    50    51    52    53    54    55    56    57    58    59    60 
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
   61    62    63    64    65    66    67    68    69    70    71    72    73    74    75    76    77    78    79    80 
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
   81    82    83    84    85    86    87    88    89    90    91    92    93    94    95    96    97    98    99   100 
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
  101   102   103   104   105   106   107   108   109   110   111   112   113   114   115   116   117   118   119   120 
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
  121   122   123   124   125   126   127   128   129   130   131   132   133   134   135   136   137   138   139   140 
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
  141   142   143   144   145   146   147   148   149   150 
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
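
Equivalently, predict on the probability scale and cross-tabulate against the actual labels; given the complete separation, this should recover all 50 setosa rows with no errors (a minimal check in base R):

> table(predicted = predict(m, i, type = "response") > 0.5, actual = i$setosa)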

Check out the model.

> summary(m)

Call:
glm(formula = setosa ~ Sepal.Width + Sepal.Length + Petal.Width,
    family = binomial, data = i)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-3.503e-05  -2.100e-08  -2.100e-08   2.100e-08   3.719e-05

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      25.477 171178.555   0.000    1.000
Sepal.Width      19.057  50639.724   0.000    1.000
Sepal.Length     -6.762  45486.041   0.000    1.000
Petal.Width     -59.292  62182.274  -0.001    0.999

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.9095e+02  on 149  degrees of freedom
Residual deviance: 4.1441e-09  on 146  degrees of freedom
AIC: 8

Number of Fisher Scoring iterations: 25
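
The huge standard errors, p-values near 1, a residual deviance of essentially zero, and 25 Fisher scoring iterations are all symptoms of the same separation problem, so the coefficients and their tests should not be read at face value. One common workaround is a bias-reduced (Firth-type) fit; a minimal sketch, assuming the brglm2 package is installed:

> library(brglm2)   # assumed available; not part of base R
> m2 = glm(setosa ~ Sepal.Width + Sepal.Length + Petal.Width,
+          family = binomial, data = i, method = "brglmFit")
> summary(m2)       # bias reduction keeps the estimates finite under separation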

References
1. http://www.stat.cmu.edu/~cshalizi/490/clustering/clustering01.r