Student Speak: Student at UCL Chats with KantanMT Team

Dissemination of Machine Translation innovation is a major priority for us at KantanMT. We believe that Academic Partnerships have a huge role to play in furthering the scope of research and innovation in the field of Machine Translation, and as such we have partnered with a number of universities to help students use the KantanMT platform in a real-world scenario.

We are always looking for ways to improve the KantanMT platform, and to keep our finger on the pulse of the KantanMT user experience, we asked one of the students using the platform to answer some questions about the platform.

Understanding BLEU for Machine Translation

It can often be challenging to measure the fluency of your Machine Translation engine, and that’s where automatic metrics become a very useful tool for the localization engineer.

BLEU is one of the metrics used in KantanAnalytics for quality evaluation. BLEU Score is quick to use, inexpensive to operate, language independent, and correlates highly with human evaluation. It is the most widely used automated method of determining the quality of machine translation.

How to use BLEU?

  1. To check the fluency of your KantanMT engine, click on the ‘BLEU Scores’ tab. You will now be directed to the ‘BLEU Score’ page.
  2. Place your cursor on the ‘Bleu Scores Chart’ to see the individual fluency score of each segment. A pop-up will now appear on your screen with details of the segment under these headings: ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.
  3. To see the ‘Bleu Scores’ of each segment in a table format, scroll down. You will now see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.
  4. To see an even more in-depth breakdown of a particular ‘Segment’, click on the ‘Triangle’ beside the number of the segment you wish to view.
  5. To download the ‘BLEU Score’ of all segments, click on the ‘Download’ button on the ‘BLEU Score’ page.

This is one of the features provided by Kantan BuildAnalytics to improve an engine’s quality after its initial training. To see other features used by Kantan BuildAnalytics, please click on the link below. To get more information about KantanMT and the services we provide, please contact our support team at info@kantanmt.com.

Essential KPIs for SMT: F-Measure

In our last blog post I discussed some of the Key Performance Indicators (KPIs) used by SMT developers to estimate the performance quality of their KantanMT engines. These KPIs help developers understand what aspects of their SMT engine are performing well and which need improvement.

In this blog I’m going to dive deep into F-Measure, a KPI which can provide insight into the relevancy of your training data, the engine’s overall performance, and the suitability of an SMT engine for a particular domain or content type.

What is F-Measure?

F-Measure is a KPI which measures the precision and recall capabilities of an SMT system. It can also be viewed as a measure of translation accuracy and relevancy.

Bursting Red Balloons

In SMT, we can look at precision as the percentage of retrieved words that are relevant, and recall (sometimes referred to as sensitivity) as the percentage of relevant words that are retrieved.

This is best explained using a thought experiment. Imagine a box containing 10 red balloons and a few green balloons. Suppose we burst 5 balloons at random and 3 of these are red: we can calculate our precision as 3/5 (60%) and our recall as 3/10 (30%).

These two calculations offer a good estimation of the accuracy with which we are able to burst red balloons – the higher these values are, the better the chances that we will burst more red balloons.

So what has this thought experiment got to do with SMT systems?

Precision & Recall

Precision and recall are closely related to the understanding of accuracy. Since SMT systems are based on pattern recognition, it is helpful to see how accurate they are at retrieving words and, more importantly, how relevant this retrieval is.

F-Measure is a calculation of both precision and recall and is expressed as a ratio. If we go back to our balloon bursting experiment, precision was calculated as 60% and recall as 30%. To express these two values as a single ratio, we can use the F-Measure formula as follows:

$$\text{F-Measure} = \frac{\text{precision} \times \text{recall}}{(\text{precision} + \text{recall}) / 2} = \frac{0.6 \times 0.3}{(0.6 + 0.3) / 2} = 0.40$$

Source: Statistical Machine Translation by Philipp Koehn

In simple terms – we’re just not good at bursting red balloons 🙂
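To connect the balloon calculation back to translation, here is a minimal Python sketch of a word-level F-Measure for a single segment. It assumes whitespace tokenization, lowercasing and bag-of-words matching; the function names `f_measure` and `word_scores` are made up for illustration, and this is not a description of how KantanMT computes its scores.

```python
from collections import Counter


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return (precision * recall) / ((precision + recall) / 2)


def word_scores(output, reference):
    """Return (precision, recall, F-Measure) for one segment.

    Assumption: words are matched as a bag of words, so order is ignored
    and a reference word can be matched only as often as it occurs.
    """
    out_words = output.lower().split()
    ref_words = reference.lower().split()
    if not out_words or not ref_words:
        return 0.0, 0.0, 0.0
    overlap = Counter(out_words) & Counter(ref_words)
    correct = sum(overlap.values())
    precision = correct / len(out_words)  # share of output words that are correct
    recall = correct / len(ref_words)     # share of reference words that were produced
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    p, r, f = word_scores("the cat sat on mat", "the cat sat on the mat")
    print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

In this made-up example every output word appears in the reference, so precision is perfect, but one reference word is missing from the output, which lowers recall and pulls the F-Measure down with it.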

F-Measure and SMT engines

Using F-Measure we can get a general sense of the accuracy with which an SMT engine can retrieve words. If we examine the distribution of these scores across a set of reference translations, we can get helpful insights which we can use to improve the training data and boost engine performance.

Here’s an example of an F-Measure distribution:

[Figure: Screenshot of Kantan BuildAnalytics F-Measure distributions]

The overall F-Measure score for this particular SMT engine is 72%. This is a good value, and we can say that this engine is highly accurate at retrieving words for its target language and domain, i.e. it retrieves words with high precision and the words it retrieves are relevant to the target domain.

Also, the distribution of these scores across the reference translation set shows that the majority of these (60% of the total reference translation set) are in the 70-100% range. The distribution graph also shows that approximately 20% of the reference translations score less than 40%. By examining these low-scoring segments we can check to see if words/terminology are missing, and then create additional training material to improve the performance of the engine.
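The kind of analysis described here can be sketched in a few lines of Python. The scores below are invented for illustration (chosen so the proportions mirror the ones quoted above), and the helper names `bucket_shares` and `low_scoring` are hypothetical; they are not part of Kantan BuildAnalytics.

```python
from collections import Counter


def bucket_shares(scores, width=10):
    """Return {bucket_start: percentage of segments} for scores in 0-100."""
    if not scores:
        return {}
    # A score of exactly 100 is folded into the top bucket.
    counts = Counter(min(int(s // width) * width, 100 - width) for s in scores)
    total = len(scores)
    return {start: 100 * counts[start] / total for start in sorted(counts)}


def low_scoring(scores, threshold=40):
    """Return the indices of segments that score below the threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


# Hypothetical per-segment F-Measure scores, as percentages.
scores = [92, 85, 78, 74, 71, 70, 55, 48, 35, 22]

shares = bucket_shares(scores)
high_share = sum(share for start, share in shares.items() if start >= 70)
print(f"Segments scoring 70-100%: {high_share:.0f}%")
print(f"Segments to review (below 40%): {low_scoring(scores)}")
```

The indices returned by `low_scoring` point at the segments worth inspecting for missing words or terminology.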

Closing remarks…

F-Measure is a good starting point for understanding the quality of an SMT engine, but it does have a major shortcoming: while it measures the recall and precision capabilities of an SMT engine, it doesn’t take into account the order in which the words are retrieved.

So, as in the famous sketch with André Previn and Morecambe and Wise, we may know all the notes but not necessarily in the right order.
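A small Python sketch makes the point concrete. It assumes a bag-of-words F-Measure with whitespace tokenization, and it uses a count of shared adjacent word pairs (bigrams) purely as a simple stand-in for an order-aware check; the function names are hypothetical, and this is not BLEU itself, only the kind of n-gram matching that BLEU builds on.

```python
from collections import Counter


def bag_of_words_f_measure(output, reference):
    """F-Measure over word counts only; word positions are ignored."""
    out_words, ref_words = output.split(), reference.split()
    if not out_words or not ref_words:
        return 0.0
    correct = sum((Counter(out_words) & Counter(ref_words)).values())
    if correct == 0:
        return 0.0
    precision = correct / len(out_words)
    recall = correct / len(ref_words)
    return 2 * precision * recall / (precision + recall)


def shared_bigrams(output, reference):
    """Number of adjacent word pairs that appear in both sentences."""
    def bigrams(sentence):
        words = sentence.split()
        return Counter(zip(words, words[1:]))
    return sum((bigrams(output) & bigrams(reference)).values())


reference = "we play all the right notes"
scrambled = "notes right the all play we"

# Same words in the right order, then the same words in the wrong order.
print(bag_of_words_f_measure(reference, reference))
print(bag_of_words_f_measure(scrambled, reference))
# Shared adjacent word pairs for the ordered and the scrambled sentence.
print(shared_bigrams(reference, reference), shared_bigrams(scrambled, reference))
```

Both sentences get the same bag-of-words F-Measure, while the scrambled one shares no adjacent word pairs with the reference. That gap is what an order-aware metric is meant to capture.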

One more thing…
To improve on what the F-Measure score tells us, a metric must become aware of word order, which is sometimes referred to as fluency. In the next post I will look at BLEU (Bilingual Evaluation Understudy) and examine how this metric helps us to further understand the quality of SMT engines.

KantanMT’s new BuildAnalytics technology illustrates the distribution of F-Measure, BLEU, and TER scores across our members’ SMT engines. It also generates a Gap Analysis, highlighting missing words in members’ training data, and provides KantanMT members with a training data rejects report – great information that helps members of KantanMT.com develop a deep understanding of how their SMT engines work, and how to improve their performance.

You can watch a video of Kantan BuildAnalytics here.