Implementation for paper BLEU: a Method for Automatic Evaluation of Machine Translation

Last update: Oct 07, 2021

Overview

BLEU Score

Author: Ba Ngoc from ProtonX

BLEU is a popular metric for evaluating machine translation output. Check out the recent Transformer project we published.

I. Usage

from bleu_score import cal_corpus_bleu_score

candidates = ['eating chicken chicken is a eating a eating chicken',
              'eating chicken chicken is not good']
references_list = [['a chicken is eating chicken', 'there is a chicken eating chicken'],
                   ['a chicken is eating chicken', 'there is a chicken eating chicken']]

bleu_score = cal_corpus_bleu_score(candidates, references_list,
                                   weights=(0.25, 0.25, 0.25, 0.25), N=4)

print('Bleu Score: {}'.format(bleu_score))

II. BLEU Score Formula

1. Precision

For each n-gram in the candidates, we count how many times it occurs in the candidates and how many of those occurrences are matched in the references. The precision is the ratio of the matched (clipped) count to the total number of candidate n-grams.

Important: clipping the count means that the number of matches credited to an n-gram cannot exceed the maximum number of times that n-gram occurs in any single reference.

For example, suppose the bigram ('a', 'a') occurs 3 times in a candidate, but at most 2 times in any single reference. We then use the value 2 in the calculation.
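The clipping rule above can be sketched as follows. This is a minimal illustration, not the code from this repository: the helper names `ngrams` and `clipped_precision` and the toy sentences are assumptions chosen to reproduce the ('a', 'a') case, and tokenization is assumed to be a plain whitespace split.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against its references."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    # For every n-gram, keep the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip each candidate count by that maximum, then divide by the total.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


# ('a', 'a') occurs 3 times in the candidate, at most 2 times in one reference.
candidate = 'a a a a'
references = ['a a a', 'a b']
print(clipped_precision(candidate, references, 2))  # 2 / 3, about 0.667
```

Without clipping, all 3 candidate bigrams would count as matches and the precision would be 1.0, which is why repeated words would otherwise be over-rewarded.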

If you have never heard of n-grams: an n-gram is a contiguous sequence of n words in a sentence. Counting n-grams means counting every contiguous word sequence of a pre-set length.

Candidate 1: 'eating chicken chicken is a eating a eating chicken'

-------Unigrams------

eating	3
chicken	3
is	1
a	2

-------Bigrams------

eating chicken	2
chicken chicken	1
chicken is	1
is a	1
a eating	2
eating a	1

We can do the same thing with trigrams and 4-grams.
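The counts for Candidate 1 can be reproduced with a few lines of Python. This is a sketch under the assumption that sentences are tokenized by splitting on whitespace; `count_ngrams` is a hypothetical helper name, not a function from this repository.

```python
from collections import Counter


def count_ngrams(sentence, n):
    """Count every contiguous sequence of n words in a sentence."""
    tokens = sentence.split()
    return Counter(' '.join(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


candidate = 'eating chicken chicken is a eating a eating chicken'

# Print unigram to 4-gram counts, one n-gram per line.
for n in (1, 2, 3, 4):
    print('-------{}-grams------'.format(n))
    for gram, count in count_ngrams(candidate, n).items():
        print('{}\t{}'.format(gram, count))
```

A sentence of 9 words has 9 unigrams, 8 bigrams, 7 trigrams and 6 4-grams, so the counts in each group should sum to those totals.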

2. Sentence brevity penalty

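For each candidate, the effective reference length is the length of the reference whose length is closest to the candidate's.

See the function get_eff_ref_length in utils.py.

The brevity penalty is BP = 1 if c > r, otherwise BP = exp(1 - r / c), where:

c: the total length of all candidates

r: the sum of the effective reference lengths over all candidates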
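A rough sketch of how c, r and the penalty could be computed is given below. It does not show the actual get_eff_ref_length from utils.py: `closest_ref_length` and `brevity_penalty` are hypothetical names, tokenization is assumed to be a whitespace split, and breaking ties in favor of the shorter reference is an assumption. The penalty itself follows the BLEU paper: 1 if c > r, otherwise exp(1 - r / c).

```python
import math


def closest_ref_length(candidate, references):
    """Length of the reference closest in length to the candidate."""
    cand_len = len(candidate.split())
    ref_lens = [len(ref.split()) for ref in references]
    # Closest length wins; ties go to the shorter reference (assumption).
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - cand_len), ref_len))


def brevity_penalty(candidates, references_list):
    """Corpus-level brevity penalty from total lengths c and r."""
    c = sum(len(cand.split()) for cand in candidates)
    r = sum(closest_ref_length(cand, refs)
            for cand, refs in zip(candidates, references_list))
    if c == 0:
        return 0.0  # an empty corpus of candidates gets no credit
    return 1.0 if c > r else math.exp(1 - r / c)


references = ['a chicken is eating chicken', 'there is a chicken eating chicken']

# Candidates longer than their closest references are not penalized.
print(brevity_penalty(['eating chicken chicken is not good'], [references]))

# A 2-word candidate against a 5-word closest reference: exp(1 - 5 / 2).
print(brevity_penalty(['a chicken'], [references]))
```

The penalty only ever lowers the score for candidates that are too short; overly long candidates are already penalized by the precision terms.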

3. BLEU Formula

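BLEU = BP * exp(sum over n = 1..N of w_n * log(p_n)), where p_n is the clipped n-gram precision and:

N: the maximum n-gram order (4 in the usage example)

w: the list of pre-set weights, one per n-gram order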
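The final combination step can be sketched as a weighted geometric mean of the n-gram precisions, scaled by the brevity penalty. This is an illustration under stated assumptions rather than the repository's cal_corpus_bleu_score: `combine_bleu` is a hypothetical name, the precision values below are made-up inputs, and returning 0 when any precision is 0 is one common convention, not necessarily the one used here.

```python
import math


def combine_bleu(precisions, bp, weights=(0.25, 0.25, 0.25, 0.25)):
    """BLEU = bp * exp(sum of w_n * log(p_n)) for n = 1..N."""
    # log(0) is undefined, so a zero precision yields a zero score (assumption).
    if not precisions or min(precisions) <= 0:
        return 0.0
    log_mean = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return bp * math.exp(log_mean)


# Made-up precisions for n = 1..4, with no brevity penalty applied.
print(combine_bleu([0.8, 0.6, 0.4, 0.2], bp=1.0))
```

With uniform weights of 0.25 the exponent is the average log precision, so the score equals the fourth root of the product of the four precisions times the brevity penalty.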

Owner

Ngoc Nguyen Ba
