For a classification task where the output is either T or F, if the training set contains many T examples but only a few F examples, the classifier performs poorly at predicting F. This is because the classifier can achieve high accuracy simply by returning T most of the time, so it hardly learns how to classify data as F. For instance, if 95% of the training examples are T, always predicting T already yields 95% accuracy while never recognizing a single F. This paper proposes a solution called one-sided selection and has been cited 634 times according to Google Scholar.
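The following is a minimal sketch of one-sided selection as it is commonly described, not the paper's own code. The idea is to undersample only the majority class (T here) while keeping every minority (F) example. It runs in three steps. First, start a subset C with all minority examples plus one random majority example. Next, add only those majority examples that a 1-nearest-neighbour classifier over that initial C gets wrong. Finally, remove majority examples that form Tomek links, meaning pairs of opposite-class examples that are each other's nearest neighbour. The helper names (`sq_dist`, `nearest`, `one_sided_selection`), the use of squared Euclidean distance, and the plain-Python representation of points as tuples are all illustrative assumptions.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple

# Hypothetical representation: each example is a tuple of numeric features.
Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed metric; enough for 1-NN comparisons)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(pool: Sequence[int], X: Sequence[Point], i: int) -> Optional[int]:
    """Index in `pool` closest to X[i], never returning i itself."""
    best, best_d = None, float("inf")
    for j in pool:
        if j == i:
            continue
        d = sq_dist(X[i], X[j])
        if d < best_d:
            best, best_d = j, d
    return best


def one_sided_selection(
    X: Sequence[Point], y: Sequence[Hashable], majority: Hashable, seed: int = 0
) -> List[int]:
    """Hypothetical helper: sorted indices of the examples one-sided selection keeps."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    if not minority_idx or not majority_idx:
        return list(range(len(X)))  # only one class present: nothing to rebalance

    # Step 1: start from every minority example plus one random majority example.
    C = set(minority_idx)
    C.add(rng.choice(majority_idx))

    # Step 2: classify the remaining majority examples with 1-NN over the
    # initial C; only the misclassified ones (those near the boundary) are added.
    initial = sorted(C)
    for i in majority_idx:
        if i not in C and y[nearest(initial, X, i)] != majority:
            C.add(i)

    # Step 3: drop majority examples that form a Tomek link inside C, i.e. that
    # are mutual nearest neighbours with an example of the other class.
    pool = sorted(C)
    nn = {i: nearest(pool, X, i) for i in pool}
    noisy = {
        i for i in pool
        if y[i] == majority and y[nn[i]] != majority and nn[nn[i]] == i
    }
    return sorted(C - noisy)
```

On a toy one-dimensional set with T at 0, 1, 2, 3 and 9 and F at 10 and 12, the redundant T points far from the boundary collapse to at most one representative. The borderline T point at 9 forms a Tomek link with the F point at 10 and is dropped, while both F points are always kept.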
A paper on imbalanced training sets