# Bleu Score

Bleu stands for **Bilingual Evaluation Understudy**.

Bleu Score is a measure of how accurate a translation is. It is especially useful when there are multiple acceptable translations.

$$Precision = \frac{\sum_{\text{n-gram}} count_{clip}(\text{n-gram})}{\sum_{\text{n-gram}} count(\text{n-gram})}$$

The above formula is used to calculate the precision of a translation by comparing the n-grams that occur in the result with the n-grams in the available acceptable translations.

$$count_{clip}(\text{n-gram})$$ is the count of the n-gram in the machine-generated translation, clipped to the maximum number of times it occurs in any single acceptable translation.\
$$count(\text{n-gram})$$ is the number of times the n-gram occurs in the machine-generated translation.

For example, consider the following two acceptable translations:

T1: The cat is on the mat\
T2: There is a cat on the mat

The following is the machine translation:

MT: The cat the cat on the mat

| **n-gram (bi-gram)** | **count(n-gram)** | **count\_clip(n-gram)** |
| -------------------- | ----------------- | ----------------------- |
| the cat              | 2                 | 1                       |
| cat the              | 1                 | 0                       |
| cat on               | 1                 | 1                       |
| on the               | 1                 | 1                       |
| the mat              | 1                 | 1                       |

$$precision = \frac{1+0+1+1+1}{2+1+1+1+1} = \frac{4}{6} \approx 0.67$$
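The clipped-count computation above can be sketched in Python. This is a minimal illustration of modified n-gram precision, not a full Bleu implementation; the tokenization (lowercased whitespace split) is an assumption for the sake of the example.

```python
from collections import Counter

def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference translations."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate.lower().split(), n)

    # For each n-gram, the clip is its maximum count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.lower().split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total

mt = "The cat the cat on the mat"
refs = ["The cat is on the mat", "There is a cat on the mat"]
print(modified_precision(mt, refs, 2))  # 4/6 ≈ 0.667, matching the table above
```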

So, $$p_n$$ is the modified n-gram precision computed on n-grams of size n only. A combined Bleu Score (across different n-gram sizes) is calculated as follows:

$$Combined\ Bleu\ Score = BP \cdot e^{\frac{1}{4}\sum_{n=1}^{4} \log p_n}$$

Where BP is the Brevity Penalty, which penalizes translations that are shorter than the reference translations.

BP = 1; if machine\_translation\_length > reference\_translation\_length

BP = e^(1 - reference\_translation\_length / machine\_translation\_length); otherwise
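Putting the pieces together, a sentence-level Bleu Score can be sketched as below. This is an unsmoothed sketch under the same tokenization assumption as before; it returns 0 whenever any $$p_n$$ is 0, and picks the reference length closest to the candidate length for the brevity penalty.

```python
import math
from collections import Counter

def bleu(candidate, references, max_n=4):
    """Combined Bleu: brevity penalty times geometric mean of p_1..p_max_n."""
    cand = candidate.lower().split()

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref.lower().split(), n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values()) or 1
        precisions.append(clipped / total)

    # Geometric mean is 0 if any n-gram precision is 0 (no smoothing here).
    if any(p == 0 for p in precisions):
        return 0.0

    # Brevity penalty: compare candidate length to the closest reference length.
    ref_len = min((len(r.split()) for r in references),
                  key=lambda rl: (abs(rl - len(cand)), rl))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat is on the mat", ["The cat is on the mat"]))  # 1.0 for an exact match
```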
