Bleu Score

Bleu stands for Bilingual Evaluation Understudy.

Bleu Score measures how closely a machine-generated translation matches one or more human reference translations. It is especially useful when there are multiple acceptable translations.

$Precision = \frac{\sum_{\text{n-gram}} count_{clip}(\text{n-gram})}{\sum_{\text{n-gram}} count(\text{n-gram})}$

The above formula is used to calculate the precision of a translation by comparing the n-grams that occur in the result with the n-grams in the available acceptable translations.

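$count(n-gram)$ is the number of times the n-gram occurs in the machine-generated translation. $count_{clip}(n-gram)$ is that count clipped at the maximum number of times the n-gram occurs in any single acceptable translation, i.e. $count_{clip} = \min(count, \max_{ref} count_{ref})$. Clipping prevents a translation from being rewarded for repeating a matching n-gram more often than any reference does.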

For example, consider the following two acceptable translations:

T1: The cat is on the mat

T2: There is a cat on the mat

The following is the machine translation:

MT: The cat the cat on the mat

| Bigram | count(n-gram) | count_clip(n-gram) |
| --- | --- | --- |
| the cat | 2 | 1 |
| cat the | 1 | 0 |
| cat on | 1 | 1 |
| on the | 1 | 1 |
| the mat | 1 | 1 |

$precision = \frac{1+0+1+1+1}{2+1+1+1+1} = \frac{4}{6} \approx 0.67$
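The bigram calculation above can be reproduced with a short sketch. The helper names `ngrams` and `modified_precision` are illustrative, not from any library, and the sketch assumes whitespace tokenization and lowercasing, which is enough for this toy example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against several references."""
    # Assumption: lowercase and split on whitespace; real systems tokenize more carefully.
    cand_counts = Counter(ngrams(candidate.lower().split(), n))

    # For each n-gram, the largest count observed in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        ref_counts = Counter(ngrams(ref.lower().split(), n))
        for gram, count in ref_counts.items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # Clip each candidate count at the reference maximum, then normalize.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


refs = ["The cat is on the mat", "There is a cat on the mat"]
mt = "The cat the cat on the mat"
p2 = modified_precision(mt, refs, 2)
print(round(p2, 3))  # 0.667, i.e. 4/6
```

Calling the same function with `n=1` gives the unigram precision for this example, 5/7, because "the" occurs three times in the machine translation but at most twice in any one reference.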

So, $p_n$ is this precision computed on n-grams of a single length $n$ (for example, $p_2$ for bigrams). A combined Bleu Score (across n-gram lengths 1 to 4) is BP times the geometric mean of the $p_n$:

$Combined\, Bleu\, Score = BP * e^{\frac{1}{4} \sum_{n=1}^4 \log p_n}$

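Where BP is the Brevity Penalty, which penalizes translations that are shorter than the reference. Without it, a very short output could score a high precision simply by emitting a few safe n-grams.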

BP = 1; if machine_translation_length > reference_translation_length

BP = exp(1 - reference_translation_length/machine_translation_length); otherwise
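As a minimal sketch of how the pieces fit together, the following assumes the standard formulation from the original BLEU paper: uniform weights over the n-gram precisions (a geometric mean) and a brevity penalty of exp(1 - r/c) when the candidate length c does not exceed the reference length r. The function names are illustrative, and the choice of reference length when several references exist is left to the caller.

```python
import math


def brevity_penalty(candidate_len, reference_len):
    """Return 1 if the candidate is longer than the reference, else exp(1 - r/c)."""
    if candidate_len > reference_len:
        return 1.0
    if candidate_len == 0:
        return 0.0  # assumption: an empty translation scores zero
    return math.exp(1.0 - reference_len / candidate_len)


def combined_bleu(precisions, candidate_len, reference_len):
    """BP times the geometric mean of the n-gram precisions (uniform weights)."""
    if not precisions or any(p == 0 for p in precisions):
        # log(0) is undefined; without smoothing the score is taken as 0.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(candidate_len, reference_len) * math.exp(log_mean)


# Hypothetical precisions p_1..p_4 for a candidate as long as its reference.
score = combined_bleu([0.5, 0.5, 0.5, 0.5], candidate_len=10, reference_len=10)
print(score)
```

Because any single $p_n$ of zero drives the whole score to zero, practical implementations usually apply some form of smoothing on short sentences; that is outside the scope of this sketch.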

