Conformal prediction for dummies

Here I’m writing up the simplest form of the concept so that anyone can get the idea quickly. If you want a rigorous treatment, read the papers or other blog articles; this post isn’t for you.

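Conformal prediction outputs a range for regression and a set of labels for classification. The goal is for that output to contain the correct answer at least (1-\alpha) of the time, on average over new examples, where \alpha is called the significance level. For example, \alpha=10\% means the output should contain the answer at least 90\% of the time.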

Let’s take a regression example with the score function S(X, y) = |\hat{y} - y|, where \hat{y} is the model’s prediction for X and y is the true answer. First, set aside n data points that were not used to train the model (a calibration set) and compute their scores: S(X_1, y_1), S(X_2, y_2), \cdots, S(X_n, y_n). Then take the 1-\alpha quantile of these scores: sort them in ascending order and pick the 90th-percentile score if \alpha=10\%.
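The two steps above (score each calibration point, then take a quantile) can be sketched in a few lines of Python. This is a minimal sketch, and the function names are hypothetical. One detail differs from the plain 90th percentile: it picks the \lceil (n+1)(1-\alpha) \rceil-th smallest score, the small finite-sample correction used in split conformal prediction. That correction is what makes the "at least 1-\alpha" guarantee hold for a finite calibration set, assuming the calibration points and new points are exchangeable (e.g. i.i.d.).

```python
import math


def abs_residual_scores(y_pred, y_true):
    """Score S(X_i, y_i) = |y_hat_i - y_i| for each calibration point."""
    return [abs(p - t) for p, t in zip(y_pred, y_true)]


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score.

    The (n + 1) instead of n is the finite-sample correction. If that
    rank exceeds n, there are too few calibration points for this alpha,
    and the only safe answer is an infinitely wide range.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]
```

For example, with nine calibration scores 1, 2, ..., 9 and \alpha=10\%, the rank is \lceil 10 \times 0.9 \rceil = 9, so the threshold is the largest score, 9.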

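Now we can build a prediction range for a new input X_{n+1} that contains the answer about 90% of the time: \{y \mid S(X_{n+1}, y) \le \text{90th-percentile score}\}. With this score function, that set is simply the interval [\hat{y}_{n+1} - q, \hat{y}_{n+1} + q], where \hat{y}_{n+1} is the model’s prediction for X_{n+1} and q is the 90th-percentile score.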
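To see the 90% claim in action, here is a small end-to-end simulation. Everything in it is an assumption for illustration: the "model" is a hypothetical `predict` function that returns 2x, and the data are synthetic (y = 2x plus standard Gaussian noise). It computes the threshold q on a held-out calibration set, turns it into the interval [\hat{y} - q, \hat{y} + q], and measures how often fresh test points fall inside.

```python
import math
import random


def predict(x):
    """Hypothetical 'trained' model: we pretend it learned y = 2x."""
    return 2.0 * x


def prediction_interval(x_new, q):
    """{y : |predict(x_new) - y| <= q} is just [y_hat - q, y_hat + q]."""
    y_hat = predict(x_new)
    return y_hat - q, y_hat + q


rng = random.Random(0)
alpha = 0.1


def sample(n):
    """Synthetic data (an assumption for the demo): y = 2x + N(0, 1) noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]


# 1. Score a calibration set the model was never trained on.
calib = sample(500)
scores = sorted(abs(predict(x) - y) for x, y in calib)
rank = math.ceil((len(scores) + 1) * (1 - alpha))  # finite-sample corrected rank
q = scores[rank - 1]

# 2. Count how often fresh points land inside their intervals.
test = sample(10_000)
hits = 0
for x, y in test:
    lo, hi = prediction_interval(x, q)
    if lo <= y <= hi:
        hits += 1
coverage = hits / len(test)
print(f"q = {q:.3f}, empirical coverage = {coverage:.3f}")
```

The printed coverage should land near 0.9; the exact value shifts a little with the random seed and the size of the calibration set.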