ROC graph 101

Tom Fawcett, ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, HP Labs Technical Report, 2003.

This is a paper on ROC graphs, and I really enjoyed reading it. Many ‘introduction to machine learning’ books describe the ROC curve, but none that I have read explains it in this much depth.

It starts with algorithms for drawing the graph correctly and efficiently, then explains that the ROC curve, unlike the precision-recall graph, is invariant to class skew. It also shows how to use cross-validation to draw a vertically averaged curve (so that we can find a confidence interval at each false positive rate) and a threshold-averaged curve (which may not be attractive if we’re averaging different models and if the scores are not probabilities).
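To make the "sort once, sweep the threshold" idea concrete, here is a minimal sketch in Python. It is my own illustration rather than the paper's pseudocode: the function names `roc_points` and `auc` and the toy data are assumptions. It sorts examples by score, emits one ROC point each time the score changes (so ties are handled together), and integrates with the trapezoid rule.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low.

    labels are 1 for positive and 0 for negative (an assumption of this sketch).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # One sort, then a single pass over the ranked examples.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    tp = fp = 0
    prev_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so tied scores
        # move the curve in one step instead of an arbitrary order.
        if score != prev_score:
            points.append((fp / neg, tp / pos))
            prev_score = score
        if label == 1:
            tp += 1
        else:
            fp += 1
    points.append((fp / neg, tp / pos))  # always ends at (1.0, 1.0)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]   # toy data, made up for illustration
    labels = [1, 0, 1, 0]
    print(roc_points(scores, labels))
    print(auc(roc_points(scores, labels)))

    # Class skew: repeat every negative example ten times in total.
    # TPR and FPR are each normalized within their own class, so the
    # ROC points stay the same.
    extra = [(s, y) for s, y in zip(scores, labels) if y == 0] * 9
    skew_scores = scores + [s for s, _ in extra]
    skew_labels = labels + [y for _, y in extra]
    print(roc_points(skew_scores, skew_labels) == roc_points(scores, labels))
```

The last check is a small demonstration of the skew invariance mentioned above: precision would drop after adding the extra negatives, but the ROC points do not move.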

The paper goes even further and explains cost-sensitive ROC curves and multi-class ROC graphs (and their AUC). Finally, it describes interpolating between classifiers to get a classifier somewhere between two points on the ROC graph (we can do this by randomly sampling the classifiers' outputs), and it describes conditional classifiers for removing concavities in the ROC graph. Chained classifiers are also discussed (with the note that they violate the assumption that each model in the ROC graph is independent).
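The interpolation trick is easy to sketch. Assuming two classifiers A and B that sit at known ROC points, answering with B's output with probability k (and A's otherwise) gives, in expectation, the point a fraction k of the way along the line from A to B. The names `interpolate` and `expected_point`, and the example numbers, are hypothetical and only for illustration.

```python
import random


def expected_point(point_a, point_b, k):
    """Expected (fpr, tpr) when B is used with probability k, A otherwise."""
    fpr = point_a[0] + k * (point_b[0] - point_a[0])
    tpr = point_a[1] + k * (point_b[1] - point_a[1])
    return (fpr, tpr)


def mixing_weight(point_a, point_b, target_fpr):
    """Probability of using B needed to hit target_fpr (between A's and B's)."""
    return (target_fpr - point_a[0]) / (point_b[0] - point_a[0])


def interpolate(clf_a, clf_b, k, rng=random):
    """Build a randomized classifier that picks B's answer with probability k."""
    def combined(x):
        # rng.random() is uniform on [0, 1), so k=0 always uses A
        # and k=1 always uses B.
        return clf_b(x) if rng.random() < k else clf_a(x)
    return combined


if __name__ == "__main__":
    a = (0.10, 0.20)   # hypothetical ROC point of classifier A
    b = (0.25, 0.60)   # hypothetical ROC point of classifier B
    k = mixing_weight(a, b, target_fpr=0.175)
    print(k, expected_point(a, b, k))
```

Note that the resulting point holds only on average over many decisions; any single prediction still comes from exactly one of the two classifiers.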

I recommend this to everyone who hasn’t studied ROC graphs in detail.