> install.packages("nnet")
> library(nnet)
> data(iris3)
iris3 looks like this:
> iris3[1,,]
         Setosa Versicolor Virginica
Sepal L.    5.1        7.0       6.3
Sepal W.    3.5        3.2       3.3
Petal L.    1.4        4.7       6.0
Petal W.    0.2        1.4       2.5
> iris3[1:3,,]
, , Setosa

     Sepal L. Sepal W. Petal L. Petal W.
[1,]      5.1      3.5      1.4      0.2
[2,]      4.9      3.0      1.4      0.2
[3,]      4.7      3.2      1.3      0.2

, , Versicolor

     Sepal L. Sepal W. Petal L. Petal W.
[1,]      7.0      3.2      4.7      1.4
[2,]      6.4      3.2      4.5      1.5
[3,]      6.9      3.1      4.9      1.5

, , Virginica

     Sepal L. Sepal W. Petal L. Petal W.
[1,]      6.3      3.3      6.0      2.5
[2,]      5.8      2.7      5.1      1.9
[3,]      7.1      3.0      5.9      2.1
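So iris3 is a 50 × 4 × 3 array: 50 flowers by 4 measurements by 3 species. A quick check with dim() confirms the shape:

> dim(iris3)
[1] 50  4  3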
Merge them into a two-dimensional matrix:
> ir = rbind(iris3[,,1], iris3[,,2], iris3[,,3])
Now, ir looks like this:
> ir[1:5,]
     Sepal L. Sepal W. Petal L. Petal W.
[1,]      5.1      3.5      1.4      0.2
[2,]      4.9      3.0      1.4      0.2
[3,]      4.7      3.2      1.3      0.2
[4,]      4.6      3.1      1.5      0.2
[5,]      5.0      3.6      1.4      0.2
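As a quick sanity check, dim() shows the merged matrix has 150 rows and 4 columns:

> dim(ir)
[1] 150   4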
And we have a total of 150 rows, where rows 1:50 are Setosa, 51:100 are Versicolor, and 101:150 are Virginica. Represent that as a class indicator matrix using class.ind():
> targets = class.ind(c(rep("s", 50), rep("c", 50), rep("v", 50)))
> targets[c(1, 51, 101),]   # one row from each class
     c s v
[1,] 0 1 0
[2,] 1 0 0
[3,] 0 0 1
In this example, we split the data into two halves: one for training and the other for testing:
> samp = c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
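Note that sample() is random, so the selected rows (and the exact numbers below) will differ between runs. For a reproducible split, one could fix the random seed first; this is a minimal sketch and the seed value 1 is arbitrary:

> set.seed(1)   # arbitrary seed, only to make the split reproducible
> samp = c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))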
Build the neural net:
> ir1 = nnet(ir[samp,], targets[samp,], size=2, rang=0.1, decay=5e-4, maxit=200)
# weights:  19
initial  value 55.922394
iter  10 value 44.575894
iter  20 value 1.119795
iter  30 value 0.595917
iter  40 value 0.481304
iter  50 value 0.457375
iter  60 value 0.446084
iter  70 value 0.432889
iter  80 value 0.428464
iter  90 value 0.427190
iter 100 value 0.426795
iter 110 value 0.426622
iter 120 value 0.426541
iter 130 value 0.426511
iter 140 value 0.426509
iter 150 value 0.426507
iter 160 value 0.426507
final  value 0.426507
converged
We used 2 nodes in the hidden layer (size=2), initial weights drawn from [-0.1, 0.1] (rang=0.1), weight decay of 5e-4 (decay penalizes large weights, which helps avoid overfitting), and at most 200 iterations (maxit=200). Note that we could also have used softmax=TRUE for maximum conditional likelihood fitting.
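For illustration, the softmax variant would be called like this (a sketch only; the object name ir2 is just an example, and its output is omitted):

> ir2 = nnet(ir[samp,], targets[samp,], size=2, rang=0.1, decay=5e-4,
+            maxit=200, softmax=TRUE)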
Then, write a function that builds a table for estimating classification accuracy:
> test.cl = function(true, pred) {
+     true = max.col(true)
+     cres = max.col(pred)
+     table(true, cres)
+ }
where the parameter true is the true class indicator matrix and pred is the prediction from the network. In the function body, max.col() finds, for each row, the column holding the maximum value. For example, we have targets like this:
> targets[1:5,]
     c s v
[1,] 0 1 0
[2,] 0 1 0
[3,] 0 1 0
[4,] 0 1 0
[5,] 0 1 0
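Running max.col() on these rows shows this directly (a quick check):

> max.col(targets[1:5,])
[1] 2 2 2 2 2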
Thus max.col() returns 2 for every one of these rows, since column 2 ("s") holds the largest value. Similarly, predict() gives per-class output values:
> predict(ir1, ir[1:10,])
               c         s           v
 [1,] 0.01612068 0.9842167 0.008921044
 [2,] 0.01712504 0.9833107 0.008876818
 [3,] 0.01648626 0.9838865 0.008904615
 [4,] 0.01774089 0.9827572 0.008851044
 [5,] 0.01606327 0.9842686 0.008923660
 [6,] 0.01626360 0.9840875 0.008914575
 [7,] 0.01662113 0.9837648 0.008898651
 [8,] 0.01652012 0.9838559 0.008903113
 [9,] 0.01823139 0.9823174 0.008831188
[10,] 0.01711277 0.9833218 0.008877341
So max.col() returns, for each row, the number of the column whose value is largest. Given these, it's easy to print the table:
> test.cl(targets[-samp,], predict(ir1, ir[-samp,]))
    cres
true  1  2  3
   1 22  0  3
   2  0 25  0
   3  0  0 25
We can easily see that the first class ("c", i.e., Versicolor) has some errors (3 of its 25 test rows were classified as Virginica), while the other two classes are classified perfectly.
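From such a table, the overall accuracy is the sum of the diagonal divided by the total, here 72/75 = 0.96. A small sketch (the object name conf is only illustrative):

> conf = test.cl(targets[-samp,], predict(ir1, ir[-samp,]))
> sum(diag(conf)) / sum(conf)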
Another way to see this is to draw a plot:
> plot(predict(ir1, ir[-samp,][1:25,])[,2])
where ir[-samp,] is the testing data, ir[-samp,][1:25,] is the testing data for "Setosa" (the first 25 test rows), and predict(ir1, ir[-samp,][1:25,])[,2] is the second ("s") column of the network's output, i.e. its score for classifying each of those rows as "Setosa".
Since a row is classified as "Setosa" only when this value is larger than the values in the other two columns, we should see large values (close to 1) in this plot.
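To compare all three output columns for the Setosa test rows at once, one could also plot them together; this is a sketch using matplot(), with each point labelled by its column's class letter (c, s, v, matching the column order):

> matplot(predict(ir1, ir[-samp,][1:25,]), pch = c("c", "s", "v"),
+         ylab = "network output")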
References
1. R documentation, "Fit Neural Network", http://stat.ethz.ch/R-manual/R-devel/library/nnet/html/nnet.html
2. 구자용, 박헌진, 최대우, 김성수, "Data Mining" (데이터 마이닝), KNOU Press.