Measure your KantanMT Engine Quality

Although KantanMT has made Machine Translation incredibly easy for its members to use, it is still a very complex process behind the scenes. Statistical Machine Translation (SMT) uses highly specialised hardware and software to analyse monolingual and bilingual content, build sophisticated data models, and process complex mathematical algorithms.

At KantanMT, we do our best to hide all this complexity from our members, so that they can concentrate on doing what they do best – creating value for their clients.

In addition to providing the hardware and software for building engines, KantanMT provides its members with internationally recognised quality metrics to measure and evaluate the quality of their Machine Translation engines. BLEU Score, Translation Error Rate (TER) and F-Measure allow members to effectively measure and manage their Machine Translation.

Below you will find an overview of these metrics.

BLEU Score:
Human evaluations can take days or even weeks to finish, so a scoring system was developed to automate the evaluation process. This first method is referred to as BLEU Score, which stands for Bilingual Evaluation Understudy. It is an internationally recognised metric and the most widely used measure of MT engine quality. BLEU Score measures matching phrases (n-grams), and is therefore a good measurement of an engine’s ‘Fluency’. Our BLEU metric scores a translation on a scale of 0 to 100%.
The closer the score is to 100%, the more closely the translation correlates with a human translation, so you should aim for a high score.
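To make the idea concrete, here is a minimal sketch of how a BLEU-style score can be computed – modified n-gram precisions combined with a brevity penalty. This is an illustrative simplification, not KantanMT's actual implementation (which, like standard BLEU, operates over a whole test set rather than a single sentence).

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions, scaled by a brevity penalty, on a 0-100 scale."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100 * bp * geo_mean
```

A candidate identical to the reference scores 100; a candidate sharing no n-grams with it scores 0.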

F-Measure Score:
F-Measure is an automated measurement used to determine the precision and recall capabilities of a KantanMT engine. It is used as a general guide to determine the overall quality performance of an engine.
F-Measure combines the recall and precision measurements into a single score, displayed as a percentage on a scale of 0 to 100%. The higher the score, the better, so always aim for a high score.
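The balance between precision and recall can be sketched with a simple word-overlap version of the metric – the harmonic mean of the two. This is an illustration of the general F-Measure formula, not KantanMT's exact computation.

```python
def f_measure(candidate, reference):
    """Word-level F-Measure on a 0-100 scale: the harmonic mean of
    precision (how much of the candidate matches the reference) and
    recall (how much of the reference is covered by the candidate)."""
    cand, ref = set(candidate.split()), set(reference.split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 100 * 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, a low value on either side drags the combined score down, which is why F-Measure works as a single overall quality indicator.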

TER Score:
Post-editing of Machine Translation output requires considerable effort and expense. It is also difficult to predict the time required to post-edit a translation and bring it up to publishable quality. A method was developed to help predict this post-editing effort, called Translation Error Rate (or TER). TER is quick to use, inexpensive to operate, language independent and correlates highly with actual post-editing effort. A TER score is a value in the range of 0 to 100%. A high TER score indicates that a translation will require more post-editing.
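The intuition behind TER is the number of edits needed to turn the MT output into the reference, divided by the reference length. The sketch below uses word-level edit distance (insertions, deletions, substitutions); full TER also counts shifts of whole word blocks, which is omitted here for simplicity.

```python
def ter(hypothesis, reference):
    """Simplified TER on a 0-100 scale: word-level edit distance
    between hypothesis and reference, divided by reference length.
    (Full TER additionally allows block shifts at a cost of one edit.)"""
    hyp, ref = hypothesis.split(), reference.split()
    # Standard Levenshtein dynamic programme over words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 100 * prev[-1] / max(len(ref), 1)
```

A perfect match scores 0 (no edits needed), while replacing one word in a two-word sentence scores 50 – higher numbers mean more post-editing work.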

Click here for more information on Quality Evaluation Metrics

Kevin McCoy, Customer Relationship Manager
