Bleu Score
Bleu stands for Bilingual Evaluation Understudy.
Bleu Score measures how closely a machine-generated translation matches one or more acceptable human translations. It is especially useful when there are multiple acceptable translations.
$$\text{Precision} = \frac{\sum_{\text{n-gram}} \text{count}_{\text{clip}}(\text{n-gram})}{\sum_{\text{n-gram}} \text{count}(\text{n-gram})}$$
The above formula is used to calculate the precision of a translation by comparing the n-grams that occur in the result with the n-grams in the available acceptable translations.
count(n-gram) is the number of times the n-gram occurs in the machine-generated translation. count_clip(n-gram) is that count clipped to the maximum number of times the n-gram occurs in any one of the acceptable translations, i.e. the smaller of the two values.
For example, consider the following two acceptable translations:
T1: The cat is on the mat
T2: There is a cat on the mat
The following is the machine translation:
MT: The cat the cat on the mat
| n-gram (bi-gram) | count(n-gram) | count_clip(n-gram) |
| --- | --- | --- |
| the cat | 2 | 1 |
| cat the | 1 | 0 |
| cat on | 1 | 1 |
| on the | 1 | 1 |
| the mat | 1 | 1 |
$$\text{precision} = \frac{1+0+1+1+1}{2+1+1+1+1} = \frac{4}{6} \approx 0.67$$
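The calculation above can be reproduced with a short Python sketch. The function names `ngrams` and `modified_precision` are illustrative, not taken from any library, and the sketch assumes whitespace tokenization with lowercased text so that "The cat" and "the cat" count as the same bi-gram, as in the table.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))

    # For each n-gram, the highest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, cnt in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], cnt)

    # count_clip = min(count in candidate, max count in a reference).
    clipped = sum(min(cnt, max_ref_counts[gram])
                  for gram, cnt in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

# Assumption: lowercase, whitespace-split tokens.
t1 = "the cat is on the mat".split()
t2 = "there is a cat on the mat".split()
mt = "the cat the cat on the mat".split()

# Bi-gram precision for the example: 4 clipped matches out of 6 bi-grams.
print(modified_precision(mt, [t1, t2], 2))
```

Calling the same function with `n=1` gives the unigram precision, and so on for longer n-grams.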
Here, p_n denotes the Bleu Score computed on n-grams of a single length n only (the example above gives p_2). A combined Bleu Score (across different n-grams) is calculated as follows:
$$\text{Combined Bleu Score} = BP \cdot \exp\left(\frac{1}{4}\sum_{n=1}^{4} \log p_n\right)$$
where BP is the Brevity Penalty, which penalizes translations that are shorter than the reference translation:
BP = 1; if machine_translation_length > reference_translation_length
BP = exp(1 - reference_translation_length/machine_translation_length); otherwise
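As a rough illustration, the combined score can be sketched in Python using the standard BLEU definition: the geometric mean of p_1 through p_4 (the exponential of the average log precision) multiplied by a brevity penalty of exp(1 - reference_length/machine_translation_length) for translations that are not longer than the reference. The function names are hypothetical. Two choices here are assumptions rather than something stated in the text: when several references exist, the one closest in length to the machine translation is used as the reference length, and the score is taken to be 0 when any p_n is 0.

```python
import math
from collections import Counter

def modified_precision(candidate, references, n):
    """Clipped n-gram precision p_n."""
    def counts(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand = counts(candidate)
    max_ref = Counter()
    for ref in references:
        for gram, cnt in counts(ref).items():
            max_ref[gram] = max(max_ref[gram], cnt)
    total = sum(cand.values())
    clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
    return clipped / total if total else 0.0

def brevity_penalty(mt_len, ref_len):
    """1 for translations longer than the reference, exp(1 - r/c) otherwise."""
    if mt_len == 0:
        return 0.0
    if mt_len > ref_len:
        return 1.0
    return math.exp(1 - ref_len / mt_len)

def combined_bleu(candidate, references, max_n=4):
    """BP times the geometric mean of p_1 .. p_max_n."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # assumption: no smoothing, log(0) is treated as score 0
    # Assumption: use the reference whose length is closest to the candidate's.
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    bp = brevity_penalty(len(candidate), ref_len)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

t1 = "the cat is on the mat".split()
t2 = "there is a cat on the mat".split()
mt = "the cat the cat on the mat".split()
print(combined_bleu(mt, [t1, t2]))
```

For the example sentences, the machine translation has the same length as T2, so the brevity penalty is 1 and the result is simply the geometric mean of the four precisions.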