Naive Bayes in R

The e1071 package provides the naiveBayes function. It assumes that the predictors are independent given the class, and that metric (numeric) predictors follow a Gaussian distribution within each class.

The package's example uses the iris data set:

> library(e1071)
> data(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Our target variable is Species:

> m <- naiveBayes(Species ~ ., iris)
> m

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 

Conditional probabilities:
            Sepal.Length
Y             [,1]      [,2]
  setosa     5.006 0.3524897
  versicolor 5.936 0.5161711
  virginica  6.588 0.6358796

            Sepal.Width
Y             [,1]      [,2]
  setosa     3.428 0.3790644
  versicolor 2.770 0.3137983
  virginica  2.974 0.3224966

            Petal.Length
Y             [,1]      [,2]
  setosa     1.462 0.1736640
  versicolor 4.260 0.4699110
  virginica  5.552 0.5518947

            Petal.Width
Y             [,1]      [,2]
  setosa     0.246 0.1053856
  versicolor 1.326 0.1977527
  virginica  2.026 0.2746501
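
Note that the two columns printed for each numeric predictor are not probabilities: the first is the per-class mean and the second is the per-class standard deviation. The following is a minimal sketch to check this for Sepal.Length, assuming the fitted object stores these tables in m$tables; by_hand is just an illustrative name:

```r
library(e1071)
data(iris)

m <- naiveBayes(Species ~ ., iris)

# Per-class mean and standard deviation of Sepal.Length, computed directly
by_hand <- cbind(
  tapply(iris$Sepal.Length, iris$Species, mean),
  tapply(iris$Sepal.Length, iris$Species, sd)
)

by_hand                 # computed by hand
m$tables$Sepal.Length   # stored in the model (assumed location)
```

If the assumption holds, both tables should show the same numbers as the Sepal.Length block above.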

To see how it performs on the training data (the 5th column is Species, so iris[, -5] drops it):

> table(predict=predict(m, iris[, -5]), true=iris[,5])
            true
predict      setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47
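
The table above is computed on the same rows the model was trained on, so it probably looks better than it would on unseen data. As a rough sketch, the accuracy can be read off the diagonal, and a simple holdout split gives a less optimistic estimate. The seed, the 100/50 split, and the names conf_train, acc_train, train_idx, m_holdout, conf_test and acc_test are arbitrary choices for illustration, not part of the e1071 example:

```r
library(e1071)
data(iris)

# Accuracy on the training data: correct predictions sit on the diagonal
m <- naiveBayes(Species ~ ., iris)
conf_train <- table(predict = predict(m, iris[, -5]), true = iris[, 5])
acc_train <- sum(diag(conf_train)) / sum(conf_train)  # (50 + 47 + 47) / 150 for the table above

# Holdout estimate: fit on 100 random rows, evaluate on the other 50
set.seed(1)  # arbitrary seed, only for reproducibility
train_idx <- sample(nrow(iris), 100)
m_holdout <- naiveBayes(Species ~ ., iris[train_idx, ])
conf_test <- table(predict = predict(m_holdout, iris[-train_idx, -5]),
                   true = iris[-train_idx, 5])
acc_test <- sum(diag(conf_test)) / sum(conf_test)

acc_train
acc_test
```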

To make predictions, use predict(). By default it returns the predicted class; with type="raw" it returns the posterior probability of each class:

> predict(m, iris[1:10, -5])
 [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
Levels: setosa versicolor virginica

> predict(m, iris[1:10, -5], type="raw")
      setosa   versicolor    virginica
 [1,]      1 2.981309e-18 2.152373e-25
 [2,]      1 3.169312e-17 6.938030e-25
 [3,]      1 2.367113e-18 7.240956e-26
 [4,]      1 3.069606e-17 8.690636e-25
 [5,]      1 1.017337e-18 8.885794e-26
 [6,]      1 2.717732e-14 4.344285e-21
 [7,]      1 2.321639e-17 7.988271e-25
 [8,]      1 1.390751e-17 8.166995e-25
 [9,]      1 1.990156e-17 3.606469e-25
[10,]      1 7.378931e-18 3.615492e-25
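
To make the Gaussian assumption concrete, these probabilities can be approximately reproduced by hand: multiply the class prior by the normal density of each predictor (using the per-class mean and standard deviation), then normalize over the classes. This is a sketch of the idea, not the package's actual code, and it assumes the fitted object exposes m$apriori (class counts), m$tables and m$levels; x, prior, score and posterior are illustrative names:

```r
library(e1071)
data(iris)

m <- naiveBayes(Species ~ ., iris)
x <- iris[1, -5]  # first observation, without Species

# Class priors from the stored class counts (assumed to be in m$apriori)
prior <- setNames(as.numeric(m$apriori), names(m$apriori))
prior <- prior / sum(prior)

# Unnormalized posterior: prior times the product of per-predictor densities
score <- sapply(m$levels, function(cls) {
  dens <- sapply(names(x), function(v) {
    dnorm(x[[v]], mean = m$tables[[v]][cls, 1], sd = m$tables[[v]][cls, 2])
  })
  prior[[cls]] * prod(dens)
})

posterior <- score / sum(score)  # normalize so the classes sum to 1

posterior
predict(m, x, type = "raw")
```

The two results should agree up to floating-point error. For data with many predictors, multiplying densities directly can underflow, so summing log densities is the safer route.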

Comments 2

  1. Shameek Mukherjee wrote:

    Hi Minkoo,

    Many thanks for the post. Some of the conditional probabilities seem to be greater than one. Could you please explain this?

    Many thanks in advance.

    Posted 15 Sep 2013 at 2:26 pm
  2. Minkoo Seo wrote:

    Sorry for the late response. According to http://www-users.cs.york.ac.uk/~jc/teaching/arin/R_practical/#nbayes those tables do not actually hold conditional probabilities but the mean and standard deviation of each predictor. I don’t know why they are titled “conditional probabilities”.

    But in general, to compute the probability, we just need to use predict().

    Posted 24 Sep 2013 at 5:50 pm
