SMOTE for handling class imbalance

The DMwR package, which is described at length in the book Data Mining with R: Learning with Case Studies, has several interesting functions.

Among them, SMOTE is an easy-to-use function for handling class imbalance. To follow the example from the package documentation, first generate a small imbalanced data set:

> library(DMwR)
> data(iris)
> data <- iris[, c(1, 2, 5)]
> data$Species <- factor(ifelse(data$Species == "setosa", "rare", "common"))
> head(data)
  Sepal.Length Sepal.Width Species
1          5.1         3.5    rare
2          4.9         3.0    rare
3          4.7         3.2    rare
4          4.6         3.1    rare
5          5.0         3.6    rare
6          5.4         3.9    rare
> table(data$Species)

common   rare 
   100     50 

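Then, we generate a new data set by 1) adding synthetic examples of the minority class, built by interpolating between each minority case and its k nearest neighbours, and 2) under-sampling the majority class. In the call below, perc.over=600 asks for six synthetic cases per rare case (300 new ones, 350 rare in total), and perc.under=100 keeps as many common cases as synthetic cases were created (300):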

> newData <- SMOTE(Species ~., data, perc.over=600, perc.under=100)
> table(newData$Species)

common   rare 
   300    350 
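
To make the "k-nn and interpolation" step more concrete, here is a minimal base R sketch of the idea for numeric features only. It is not the DMwR implementation: the function smote_sketch and its arguments n_new and k are hypothetical names chosen for illustration, and details such as the handling of nominal attributes are left out.

```r
# Illustrative SMOTE-style oversampling for numeric features.
# smote_sketch, n_new and k are hypothetical names, not part of DMwR.
smote_sketch <- function(X, n_new = 1, k = 5) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(n > k)
  D <- as.matrix(dist(X))  # pairwise Euclidean distances
  synth <- matrix(NA_real_, nrow = n * n_new, ncol = ncol(X),
                  dimnames = list(NULL, colnames(X)))
  row <- 1
  for (i in seq_len(n)) {
    # the k nearest neighbours of case i, excluding i itself
    nn <- order(D[i, ])
    nn <- nn[nn != i][seq_len(k)]
    for (j in seq_len(n_new)) {
      nb <- nn[sample.int(k, 1)]  # pick one neighbour at random
      gap <- runif(1)             # random position on the segment i -> nb
      synth[row, ] <- X[i, ] + gap * (X[nb, ] - X[i, ])
      row <- row + 1
    }
  }
  synth
}

set.seed(1)
rare <- iris[iris$Species == "setosa", c("Sepal.Length", "Sepal.Width")]
new_rare <- smote_sketch(rare, n_new = 6, k = 5)
dim(new_rare)  # 300 2
```

Each synthetic point lies on the line segment between a rare case and one of its neighbours, so the new cases stay inside the region already occupied by the minority class.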

Comments 1

  1. BHR wrote:

    Hi all,
    I have a question about SMOTE in R. I have an unbalanced classification
    problem: class 1 has 4000 samples and class 2 has 9200 samples.

    When I use the code below in R, it gives me an error.
    Code:
    data1 <- read.csv(file="D:NNDB.csv",head=TRUE,sep=",")
    data2 <- na.omit(data1)
    data2$Species<-data2[10]
    newdata<-SMOTE(Species ~ . , data2,perc.over = 8000,perc.under=100)

    error:
    Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr, :
    length of 'dimnames' [2] not equal to array extent
    How can I resolve this error, and how should I set the SMOTE parameters?
    Thanks

    Posted 03 Dec 2014 at 8:50 pm
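
One possible cause of the error in the comment above, offered as a guess rather than a confirmed diagnosis: data2[10] with single brackets returns a one-column data frame, so data2$Species ends up holding a data frame instead of a factor. The sketch below shows the difference using iris as a stand-in for the commenter's file; the column name Label is hypothetical.

```r
# iris stands in for the commenter's data; column 5 plays the role of column 10.
d <- iris
single <- d[5]    # single brackets: a one-column data.frame
double <- d[[5]]  # double brackets: the column itself (here a factor)
class(single)     # "data.frame"
class(double)     # "factor"

# Store the class label as a plain factor before passing the data on
d$Label <- factor(d[[5]])
str(d$Label)
```

Note also that the formula Species ~ . would still use the original tenth column as a predictor unless it is dropped. As for the parameters, going by the example in the post, perc.over=8000 would request 80 synthetic cases for every minority case, which is far more than a 4000 versus 9200 split needs; values closer to perc.over=100 and perc.under=200 would be a more natural starting point.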
