Bleu Score
Bleu stands for Bilingual Evaluation Understudy.
Bleu Score measures how closely a machine-generated translation matches one or more acceptable (reference) translations. It is especially useful when there are multiple acceptable translations.
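The precision of a translation on n-grams, p_n, is calculated by comparing the n-grams that occur in the result with the n-grams in the available acceptable translations:

$$p_n = \frac{\sum_{\text{n-gram} \in MT} \text{count}_{clip}(\text{n-gram})}{\sum_{\text{n-gram} \in MT} \text{count}(\text{n-gram})}$$

count(n-gram) is the number of times the n-gram occurs in the machine-generated translation. count_clip(n-gram) is that count clipped to at most the maximum number of times the n-gram occurs in any one of the acceptable translations.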
For example, consider the following two acceptable translations:
T1: The cat is on the mat
T2: There is a cat on the mat
The following is the machine translation:
MT: The cat the cat on the mat
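| n-gram (bi-gram) | count(n-gram) | count_clip(n-gram) |
| --- | --- | --- |
| the cat | 2 | 1 |
| cat the | 1 | 0 |
| cat on | 1 | 1 |
| on the | 1 | 1 |
| the mat | 1 | 1 |

The bigram precision is therefore the sum of the clipped counts divided by the sum of the counts: (1 + 0 + 1 + 1 + 1) / (2 + 1 + 1 + 1 + 1) = 4/6.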
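The counts in the table can be reproduced with a short script. This is a minimal sketch, assuming lowercased, whitespace-split tokens; the helper names `ngrams` and `clipped_counts` are hypothetical and not part of any library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_counts(candidate, references, n):
    """Return (sum of clipped counts, sum of counts) for n-grams in candidate."""
    cand_counts = Counter(ngrams(candidate, n))

    # For each n-gram, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, c in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], c)

    # Clip each candidate count at the reference maximum (0 if never seen).
    clipped = sum(min(c, max_ref_counts[gram]) for gram, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped, total


refs = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
mt = "the cat the cat on the mat".split()

clipped, total = clipped_counts(mt, refs, 2)
print(f"bigram precision = {clipped}/{total}")  # prints "bigram precision = 4/6"
```

Clipping is what stops a translation from scoring well just by repeating a common phrase: "the cat" appears twice in MT but earns credit only once.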
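So, p_n is the Bleu Score computed on n-grams only. A combined Bleu Score (across different n-grams, typically n = 1 to N with N = 4) is calculated as follows:

$$\text{Bleu} = BP \cdot \exp\left(\frac{1}{N} \sum_{n=1}^{N} \log p_n\right)$$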
where BP is the Brevity Penalty, which penalizes translations that are shorter than the reference translation.
BP = 1; if machine_translation_length > reference_translation_length
BP = exp(1 - reference_translation_length/machine_translation_length); otherwise
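To make the combined score concrete, the following is a minimal sketch rather than a reference implementation. It uses the standard exponential form of the brevity penalty and a uniform geometric mean over n = 1 to 4. The function names are hypothetical, and two choices are assumptions not stated in the text: the reference length is taken from the reference closest in length to the machine translation, and the score is 0 when any p_n is 0 (no smoothing). At least one reference is assumed.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def precision(candidate, references, n):
    """Clipped n-gram precision p_n of candidate against the references."""
    cand = ngram_counts(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    ref_counts = [ngram_counts(ref, n) for ref in references]
    clipped = sum(
        min(c, max(rc[gram] for rc in ref_counts)) for gram, c in cand.items()
    )
    return clipped / total


def brevity_penalty(mt_len, ref_len):
    """Return 1 for longer outputs, exp(1 - ref_len / mt_len) otherwise."""
    if mt_len > ref_len:
        return 1.0
    if mt_len == 0:
        return 0.0
    return math.exp(1 - ref_len / mt_len)


def bleu(candidate, references, max_n=4):
    """Combined score: BP times the geometric mean of p_1 .. p_max_n."""
    precisions = [precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # assumption: no smoothing, so log(0) is avoided by returning 0
    # Assumption: use the reference length closest to the candidate length.
    ref_len = min(
        (len(ref) for ref in references),
        key=lambda length: (abs(length - len(candidate)), length),
    )
    bp = brevity_penalty(len(candidate), ref_len)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


refs = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
mt = "the cat the cat on the mat".split()
print(round(bleu(mt, refs), 3))
```

For this example the precisions are 5/7, 4/6, 2/5 and 1/4, and the machine translation is as long as the closest reference, so the brevity penalty is 1 and the score is just the geometric mean of the four precisions.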