This is my own attempt to explain the tweak mentioned in the link above.
In the video, the goal is to find the best English text e for a given foreign text f, which can be written as:

$$\hat{e} = \arg\max_{e} \Pr(e \mid f) = \arg\max_{e} \frac{\Pr(f \mid e)\,\Pr(e)}{\Pr(f)}$$
Since Pr(f) is the same for every candidate English text e, it can be ignored when searching for e, i.e.:

$$\hat{e} = \arg\max_{e} \Pr(f \mid e)\,\Pr(e)$$
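To make the decision rule concrete, here is a minimal Python sketch of noisy-channel decoding over a fixed list of candidate English sentences. The sentences and their probabilities are made-up numbers for illustration, and the names `CANDIDATES`, `noisy_channel_score` and `decode` are hypothetical. A real system would estimate p(e) from English text and p(f|e) from a parallel corpus, and would search a far larger space of candidates.

```python
import math

# Hypothetical toy data: candidate English sentences e for one foreign sentence f,
# each with an assumed language-model probability p(e) and translation-model probability p(f|e).
CANDIDATES = {
    "the house is small": {"p_e": 0.02, "p_f_given_e": 0.30},
    "the home is small": {"p_e": 0.01, "p_f_given_e": 0.35},
    "small the house is": {"p_e": 0.0001, "p_f_given_e": 0.40},
}

def noisy_channel_score(p_e, p_f_given_e):
    """Log of p(f|e) * p(e); p(f) is left out because it is the same for every candidate e."""
    return math.log(p_f_given_e) + math.log(p_e)

def decode(candidates):
    """Return the candidate e with the highest p(f|e) * p(e)."""
    return max(
        candidates,
        key=lambda e: noisy_channel_score(candidates[e]["p_e"], candidates[e]["p_f_given_e"]),
    )

print(decode(CANDIDATES))  # → the house is small
```

Scoring in log space avoids underflow when sentence probabilities get very small, and it does not change the argmax because log is monotonic.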
What the linked document points out as interesting is the exponent 1.5 on the language model p(e):

$$\hat{e} = \arg\max_{e} \Pr(f \mid e)\,\Pr(e)^{1.5}$$
Here’s my explanation.
Since the probability is raised to the power 1.5, it becomes lower, not higher: x^1.5 < x for 0 < x < 1 (the two are equal at x = 0 and x = 1). I think this tweak might compensate for data scarcity. For p(e), there is plenty of data to build a model, because only English text is needed. On the other hand, p(f|e) requires a parallel corpus (texts written in both English and the foreign language), which is inherently scarce. As a result, p(e)^1.5 * p(f|e) lowers p(e), which is presumably too high compared to p(f|e).
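A minimal Python sketch that checks the arithmetic claim x^1.5 <= x for probabilities and writes the tweaked score in log space, where p(f|e) * p(e)^1.5 becomes log p(f|e) + 1.5 * log p(e). The helper names `power_is_lower` and `weighted_score` and the probabilities used are hypothetical, chosen only for illustration.

```python
import math

def power_is_lower(x, exponent=1.5):
    """Return True if x**exponent <= x; holds for any probability x in [0, 1] when exponent > 1."""
    return x ** exponent <= x

def weighted_score(p_e, p_f_given_e, lm_exponent=1.5):
    """Log of p(f|e) * p(e)**lm_exponent, computed as log p(f|e) + lm_exponent * log p(e)."""
    return math.log(p_f_given_e) + lm_exponent * math.log(p_e)

# Compare x with x**1.5 for a few probabilities (equal only at 0 and 1).
for x in [0.0, 0.1, 0.5, 0.9, 1.0]:
    print(f"x = {x:.1f}  x**1.5 = {x ** 1.5:.4f}  lower or equal: {power_is_lower(x)}")

# Hypothetical probabilities for one candidate translation.
print(weighted_score(p_e=0.02, p_f_given_e=0.30))  # tweaked score, exponent 1.5
print(weighted_score(p_e=0.02, p_f_given_e=0.30, lm_exponent=1.0))  # plain noisy-channel score
```

Setting `lm_exponent=1.0` recovers the untweaked score Pr(f|e) * Pr(e).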