Bleu Score

Bleu stands for Bilingual Evaluation Understudy.

The Bleu Score measures how closely a machine-generated translation matches one or more human reference translations. It is especially useful when there are multiple acceptable translations.

$$Precision = \frac{\sum_{\text{n-gram}} count_{clip}(\text{n-gram})}{\sum_{\text{n-gram}} count(\text{n-gram})}$$

The above formula is used to calculate the precision of a translation by comparing the n-grams that occur in the result with the n-grams in the available acceptable translations.

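$count(\text{n-gram})$ is the number of times the n-gram occurs in the machine-generated translation. $count_{clip}(\text{n-gram})$ is that same count clipped to the maximum number of times the n-gram occurs in any one of the acceptable translations, i.e. the smaller of the two numbers. Clipping prevents a translation from earning credit by repeating a matching n-gram more often than any reference does.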

For example, consider the following two acceptable translations:

T1: The cat is on the mat

T2: There is a cat on the mat

The following is the machine translation:

MT: The cat the cat on the mat

| n-gram (bi-gram) | count(n-gram) | count_clip(n-gram) |
| --- | --- | --- |
| the cat | 2 | 1 |
| cat the | 1 | 0 |
| cat on | 1 | 1 |
| on the | 1 | 1 |
| the mat | 1 | 1 |

$$precision = \frac{1+0+1+1+1}{2+1+1+1+1} = \frac{4}{6} \approx 0.67$$
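The bi-gram precision of this example can be reproduced with a short Python sketch. The helper names (`tokenize`, `ngrams`, `clipped_precision`) are illustrative and not taken from any library, and the lowercase whitespace tokenization is an assumption made so that "The" and "the" count as the same word, as they do in the counts of the example.

```python
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase and split on whitespace; real systems use richer tokenizers.
    return sentence.lower().split()


def ngrams(tokens, n):
    # All contiguous n-grams of the token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    # count(n-gram): occurrences in the machine translation.
    cand_counts = Counter(ngrams(candidate, n))

    # Largest count of each n-gram in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # count_clip(n-gram): candidate count capped by the reference maximum.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


references = [tokenize("The cat is on the mat"),
              tokenize("There is a cat on the mat")]
machine_translation = tokenize("The cat the cat on the mat")

p2 = clipped_precision(machine_translation, references, 2)
print(round(p2, 2))  # 0.67
```

Calling `clipped_precision` with `n` set to 1, 3 or 4 gives the precision for the other n-gram sizes in the same way.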

So, $p_n$ is the precision computed on n-grams of size $n$ only (for example, $p_2$ uses only bi-grams). A combined Bleu Score (across different n-gram sizes) is calculated as follows:

$$Combined\, Bleu\, Score = BP \cdot \exp\left(\frac{1}{4}\sum_{n=1}^{4} \log p_n\right)$$

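Here BP is the Brevity Penalty, which penalizes translations that are shorter than the reference translation. Without it, a very short translation could reach a high precision simply by containing only a few safe words.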

BP = 1; if machine_translation_length > reference_translation_length

BP = exp(1 - reference_translation_length/machine_translation_length); otherwise
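The brevity penalty and the combination step can be sketched in Python as well. This is a minimal sketch that assumes the standard formulation of the metric: the exponential penalty `exp(1 - reference_length / machine_length)` for translations that are not longer than the reference, and a geometric mean of the n-gram precisions with equal weights. The function names and the choice to return 0 when any precision is 0 are illustrative assumptions, and the caller is assumed to decide which reference length to pass in when several references exist.

```python
import math


def brevity_penalty(machine_length, reference_length):
    # Assumption: an empty translation gets the maximum penalty.
    if machine_length == 0:
        return 0.0
    # No penalty when the machine translation is longer than the reference.
    if machine_length > reference_length:
        return 1.0
    # Otherwise the penalty shrinks exponentially as the translation gets shorter.
    return math.exp(1.0 - reference_length / machine_length)


def combined_bleu(precisions, machine_length, reference_length):
    # log(0) is undefined, so a zero precision is treated as a zero score here.
    if not precisions or any(p == 0 for p in precisions):
        return 0.0
    # Average of the log precisions, i.e. the log of their geometric mean.
    mean_log = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(machine_length, reference_length) * math.exp(mean_log)


# Hypothetical precisions p_1..p_4 for a 9-word translation and a 10-word reference.
score = combined_bleu([0.8, 0.6, 0.4, 0.3], machine_length=9, reference_length=10)
print(score)
```

Because the precisions are multiplied together through the geometric mean, one low $p_n$ pulls the whole score down, which is why practical implementations often add smoothing for short sentences.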
