A paper on imbalanced training sets

Miroslav Kubat and Stan Matwin, Addressing the Curse of Imbalanced Training Sets: One-Sided Selection, In Proceedings of the Fourteenth International Conference on Machine Learning, 1997

For a classification task where the output is either T or F, if the training set contains far more T examples than F examples, the classifier performs poorly at predicting F. This is because the classifier can achieve high accuracy just by returning T most of the time, so it hardly learns how to classify data as F. This paper proposes a solution called one-sided selection, and it has been cited 634 times according to Google Scholar.
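
To make the problem concrete, here is a minimal sketch in Python. The 95/5 class split and the function name `evaluate_always_true` are assumptions chosen for illustration and do not come from the paper. It scores a trivial baseline that always answers T and shows how its accuracy looks good while it never identifies a single F example.

```python
def evaluate_always_true(labels):
    """Score a baseline that predicts True (T) for every example."""
    predictions = [True] * len(labels)

    # Accuracy: fraction of all examples predicted correctly.
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)

    # Recall on F: fraction of the actual F examples that were predicted as F.
    false_total = sum(1 for y in labels if not y)
    false_found = sum(1 for p, y in zip(predictions, labels) if not y and not p)
    recall_false = false_found / false_total if false_total else 0.0

    return accuracy, recall_false


# Hypothetical imbalanced data set: 95 examples of T and 5 of F.
labels = [True] * 95 + [False] * 5
accuracy, recall_false = evaluate_always_true(labels)
print(f"accuracy={accuracy:.2f}, recall on F={recall_false:.2f}")
```

With this split the baseline is right on 95 of 100 examples, yet its recall on F is zero, which is why accuracy alone is a misleading measure on imbalanced data.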