When there’s a data point that you want to ignore in loss computation, you can use the ignore_class parameter of tf.keras.losses.SparseCategoricalCrossentropy. But the same parameter doesn’t exist in tf.keras.losses.CategoricalCrossentropy. I don’t know why, but it’s troublesome when the need comes up. Even SparseCategoricalCrossentropy’s ignore_class isn’t easy to use, since it requires reserving a label value just to mark what to ignore.
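For reference, here’s a minimal sketch of how ignore_class works in the sparse loss (it’s available from TF 2.10 on; the -1 sentinel follows the example in the TensorFlow docs):

import tensorflow as tf

# Samples whose label equals ignore_class contribute nothing to the loss.
loss = tf.keras.losses.SparseCategoricalCrossentropy(ignore_class=-1)
y_true = [0, -1]  # the second sample is masked out
y_pred = [[0.9, 0.05, 0.05],
          [0.1, 0.8, 0.1]]
print(loss(y_true, y_pred).numpy())  # loss comes from the first sample only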
One trick that works for categorical cross entropy is to use one-hot encoding and set all values of y_true to zero for the samples you want to ignore.
from tensorflow.keras.losses import CategoricalCrossentropy
loss = CategoricalCrossentropy()
print(loss([[1, 0, 0]], [[1.0, 0.0, 0.0]]).numpy())  # a perfect prediction; near-zero loss (floor set by epsilon clipping)
The above code prints:
1.192093e-07
But if all values in y_true are zero, the loss is zero:
print(loss([[0, 0, 0]], [[1.0, 0.0, 0.0]]).numpy())
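To see how this plays out in a batch, here’s a small sketch mixing an ignored row with a real one (the values are just for illustration). One caveat: with the default sum-over-batch-size reduction, the zeroed-out row still counts in the denominator of the batch mean:

import tensorflow as tf

loss = tf.keras.losses.CategoricalCrossentropy()
y_true = [[0.0, 0.0, 0.0],   # ignored sample: all-zero one-hot row
          [1.0, 0.0, 0.0]]   # real sample
y_pred = [[0.5, 0.25, 0.25],
          [0.5, 0.25, 0.25]]
# Per-sample losses are [0, -log(0.5)]; the default reduction divides
# by the full batch size, so this prints -log(0.5) / 2 ≈ 0.3466.
print(loss(y_true, y_pred).numpy())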
It’s obvious if you think about it: Loss = sum(-p(x) * log(q(x))), where p(x) is y_true and q(x) is y_pred. If every p(x) is zero, then Loss = sum(-0 * log(q(x))) = 0. This works for any loss that uses a similar form of equation.
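In practice, tf.one_hot gives you these all-zero rows for free: per the TensorFlow docs, out-of-range indices map to rows filled with off_value (0 by default). A minimal sketch, assuming -1 as the sentinel for ignored samples:

import tensorflow as tf

NUM_CLASSES = 3
labels = tf.constant([0, -1, 2])  # -1 marks the samples to ignore
# Out-of-range indices like -1 become all-zero rows, which then
# contribute zero to the categorical cross-entropy loss.
y_true = tf.one_hot(labels, depth=NUM_CLASSES)
print(y_true.numpy())
# [[1. 0. 0.]
#  [0. 0. 0.]
#  [0. 0. 1.]]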
This is a neat trick, and it seems many people already know about it. But it’s hardly ever mentioned properly, so I’m writing this post to spread the knowledge.