Student Speak: Student at UCL Chats with KantanMT Team

Dissemination of Machine Translation innovation is a major priority for us at KantanMT. We believe that Academic Partnerships have a huge role to play in furthering the scope of research and innovation in the field of Machine Translation, and as such we have partnered with a number of universities to help students use the KantanMT platform in a real-world scenario.

We are always looking for ways to improve the KantanMT platform. To keep our finger on the pulse of the user experience, we asked one of the students using the platform to answer some questions about it.


Identifying Translation Gaps and Managing Machine Translation with KantanTimeLine™

What are Gap Analysis and KantanTimeLine?

Gap Analysis identifies and reports any untranslated words in the training data set and allows you to take preventive measures quickly by fine-tuning training data and filling data gaps. The KantanTimeLine™ provides a chronological history of activities for each engine and uses version control for precise management of released and production-ready engines.

Using KantanTimeLine and Gap Analysis:

In KantanBuildAnalytics, click the Gap Analysis tab to see the number of untranslated words that remain in the generated translations. You will be directed to the Gap Analysis page, where you will see a breakdown of any gaps in your training data.

Gap Analysis tab in KantanMT

A table appears with four headings: ‘#’, ‘Unknown Word’, ‘Reference/Source’ and ‘KantanMT Output’. Under those headings you will find details of any untranslated words, their source and the KantanMT output.

KantanMT Gap Analysis Table
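
To give a feel for what such a table contains, here is a minimal sketch in Python. It assumes, purely for illustration, that a source word which reappears unchanged in the MT output was left untranslated. The function name `find_gaps` and the sample segments are hypothetical, and this is not how KantanMT builds its report.

```python
import re

def tokenize(text):
    """Lowercase a segment and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def find_gaps(segments):
    """Return rows of (segment no., unknown word, source, MT output).

    Assumption: a source word that reappears unchanged in the MT output
    was left untranslated. Real systems use richer signals than this.
    """
    rows = []
    for number, (source, output) in enumerate(segments, start=1):
        output_words = set(tokenize(output))
        seen = set()
        for word in tokenize(source):
            # Report each suspect word once per segment.
            if word in output_words and word not in seen:
                seen.add(word)
                rows.append((number, word, source, output))
    return rows

# Hypothetical English-to-French segments, for illustration only.
segments = [
    ("Press the flange bolt", "Appuyez sur le boulon flange"),
    ("Close the window", "Fermez la fenêtre"),
]
gaps = find_gaps(segments)
for row in gaps:
    print(row)
```

Note that this heuristic also flags words that are legitimately identical in both languages, such as product names, so a real report would need to filter those out.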

Click Download to download your Gap Analysis report.

Download Gap Analysis KantanMT

Note: You can also click the Timeline tab to view your profile’s Timeline, which is essentially a record of the changes you have made to your engine.


This is one of the many features provided in KantanBuildAnalytics, which helps Localization Project Managers improve an engine’s quality after its initial training. To see other features of the KantanBuildAnalytics suite, please see the links below.

Contact our team at demo@kantanmt.com to get more information about KantanMT.com or to arrange a platform demonstration.

Understanding BLEU for Machine Translation

KantanMT Whitepaper Improving your MT

It can often be challenging to measure the fluency of your Machine Translation engine, and that’s where automatic metrics become a very useful tool for the localization engineer.

BLEU is one of the metrics used in KantanAnalytics for quality evaluation. BLEU Score is quick to use, inexpensive to operate, language independent, and correlates highly with human evaluation. It is the most widely used automated method of determining the quality of machine translation.

How to use BLEU?

  1. To check the fluency of your KantanMT engine, click on the ‘BLEU Scores’ tab. You will now be directed to the ‘BLEU Score’ page.
  2. Place your cursor on the ‘BLEU Scores Chart’ to see the individual fluency score of each segment. A pop-up will appear on your screen with details of the segment under these headings: ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.
  3. To see the BLEU score of each segment in table format, scroll down. You will now see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.
  4. To see an even more in-depth breakdown of a particular segment, click on the triangle beside the number of the segment you wish to view.
  5. To download the BLEU score of all segments, click on the ‘Download’ button on the ‘BLEU Score’ page.

This is one of the features provided by KantanBuildAnalytics to improve an engine’s quality after its initial training. To see other features of KantanBuildAnalytics, please click on the link below. To get more information about KantanMT and the services we provide, please contact our support team at info@kantanmt.com.

What is KantanBuildAnalytics™?

Regardless of what we do in our professional careers, we all have one thing in common: we want to get more done, be more productive and achieve the results we want… yesterday! For Machine Translation or Localization engineers this means finding the quickest way to get their MT engines ready to translate files.

KantanBuildAnalytics is a feature that solves the problem of how to quickly improve an engine after its initial training with minimum cost and effort. This post will teach you how to use KantanBuildAnalytics to get your KantanMT engines ready to translate faster.

Let’s look at some of the features available in KantanBuildAnalytics:

  • Fluency Analysis – work with segment level BLEU scores to find out how relevant your training data is and how it impacts engine fluency.
  • Recall and Precision Analysis – use segment level F-Measure scores to understand the recall and precision of your MT engines.
  • Post-Editing Estimation – calculate how much editing it will take to prepare a machine translated file for publishing using segment level TER (Translation Error Rate) scores.
  • Gap Analysis – improve your engine quickly by creating terminology (glossary) files. Simply download a list of untranslated words or ‘gaps’ (as an Excel file), add the missing translations, then re-upload the Excel file as new glossary training data.
  • Training Data Reject Reports – see any training data segments that have been rejected from the engine and their reason for rejection in a downloadable Excel file.
  • Timeline – like your Facebook timeline, see your MT engine’s history, with every action taken to improve the engine. It even lets you archive versions so if something goes wrong in the retraining, you can go back to an earlier version.

How to use KantanBuildAnalytics

Log in to your KantanMT account using your email and your password.

You will be directed to the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.

My Client Profiles KantanMT
My Client Profiles Dashboard, KantanMT.com

To use ‘KantanBuildAnalytics’ with a profile other than the ‘Active’ profile, click on the profile you want to use and make sure that it has at least one successfully completed ‘Build’ job.

Then click on the ‘Build Analytics’ tab on the ‘My Client Profiles’ page.

KantanBuildAnalytics
Selecting KantanBuildAnalytics™ on an active KantanMT profile.

This will take you to the ‘KantanBuildAnalytics’ page, where you will see the ‘Summary’ tab. This is selected by default. Your summary tab should give you an overview of the performance and measurement of your KantanMT engine.

And of course, for the Excel lovers, it’s possible to download the full summary report as an Excel spreadsheet, so the engine’s performance information can be analysed to suit your organisation’s specific style requirements. To download the report, click on the ‘Download summary report’ button.

To ‘Deep Tune’ the engine, click on the ‘Deep Tune’ button. Be warned though: this is a thorough tuning of the engine and will take a lot of time. The bigger the MT engine, the longer the tuning process takes.

KantanBuildAnalytics Summary Report
Download KantanBuildAnalytics Summary Report

A ‘Tune Engine’ pop-up window will now appear on your screen. Click on the ‘OK’ button if you want to deep tune, or on ‘Cancel’ if you no longer wish to deep tune the engine.

To see how many segments in the training data were rejected, click on the ‘Rejects Report’ tab. This takes you to the ‘Rejects Report’ page, where you will see a list of segments and the reasons they were rejected.

KantanBuildAnalytics Rejects Report
Generating your KantanBuildAnalytics Rejects Report

To download an Excel version of the rejects report, click on the ‘Download’ button.

To create, test and manage customised preprocessing rules for your training data, click on the ‘Preprocessor Mngt’ button.

These features help MT or Localization Engineers build and develop better performing KantanMT engines. Read more about these features below, or contact a member of our sales team to start using our platform now!

Essential KPIs For Your SMT Engines

There are several key metrics that developers of Statistical Machine Translation (SMT) engines need to pay attention to. These key performance indicators (KPIs) help you to understand which aspects of an engine are performing well and which need improvement. They also provide insight into how you can improve the overall performance of your SMT engine.

SMT KPIs

While the following list is not exhaustive, it is a good starting point in your understanding of SMT performance and how to improve it. You should remember that no single KPI can be analysed individually when determining the quality of an engine. They need to be analysed together, holistically, to give an accurate sense of the overall performance of the SMT engine.

F-Measure

F-Measure is a KPI that is used to determine the recall and precision capabilities of an SMT engine. Put simply, this measures how many words are picked from the SMT engine and how accurate the selection process is. Expressed as a ratio, F-Measure provides a good insight into the language coverage of an SMT engine.

If your F-Measure score is low, it indicates your engine is missing many words – if the score is high, it means that most words have been found in your engine.
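
As a minimal sketch of the idea, the Python below computes word-level precision, recall and F-Measure for a single segment against one reference translation. The function name `f_measure`, the whitespace tokenisation and the sample sentences are assumptions made for illustration, not KantanMT’s implementation.

```python
from collections import Counter

def f_measure(output, reference):
    """Word-level precision, recall and F-Measure for one segment."""
    out_counts = Counter(output.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both.
    overlap = sum((out_counts & ref_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(out_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

reference = "the cat sat on the mat"
output = "the cat sat on mat"          # one word missing
scrambled = "mat the on sat cat the"   # right words, wrong order

print(f_measure(output, reference))
print(f_measure(scrambled, reference))
```

The scrambled segment still gets a perfect score, because this measure only checks which words are present and ignores the order they appear in.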

While most SMT vendors present a single F-Measure value for an SMT engine, more progressive suppliers of SMT systems provide a distribution analysis of F-Measure scores.

Unfortunately, the level of precision and recall doesn’t give us information about the word order of a segment. For this we need to look at another KPI, commonly referred to as BLEU.

BLEU

BLEU (Bilingual Evaluation Understudy) is a KPI that measures the fluency of the translated output of an SMT engine, which means it measures how many words overlap in a given translation when compared to a reference translation. Higher scores are given to segments which contain a greater number of sequential words.

BLEU is a major improvement on F-Measure as it takes word order into account!
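
To make the word-order point concrete, here is a simplified segment-level BLEU sketch in Python. It assumes a single reference, whitespace tokenisation and no smoothing, so any segment with no matching 4-word sequence scores zero. The function names `ngrams` and `bleu` are hypothetical, and production tools handle these details more carefully.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(output, reference, max_n=4):
    """Simplified segment-level BLEU against a single reference."""
    out = output.lower().split()
    ref = reference.lower().split()
    if not out:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        out_ngrams = ngrams(out, n)
        ref_ngrams = ngrams(ref, n)
        total = sum(out_ngrams.values())
        # Clipped matches: an n-gram scores at most as often as the reference has it.
        matched = sum((out_ngrams & ref_ngrams).values())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))
    # Brevity penalty discourages outputs shorter than the reference.
    brevity = 1.0 if len(out) > len(ref) else math.exp(1 - len(ref) / len(out))
    return brevity * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat"
print(bleu("the cat sat on the mat", reference))  # identical segment: 1.0
print(bleu("mat the on sat cat the", reference))  # same words, scrambled: 0.0
print(bleu("the cat sat on a mat", reference))    # one wrong word: between 0 and 1
```

The scrambled segment contains every reference word, yet it shares no two-word sequence with the reference, so its score collapses to zero.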

A high BLEU KPI means that an SMT engine is producing highly fluent translations; a low score means that it’s generally producing garbage. BLEU scores of 60% or higher are normally required before any SMT engine is considered production ready.

BLEU Score is easy to use and understand, is language independent and correlates highly with human evaluation, which is why it is the most widely used KPI for determining the quality of SMT engines.

TER

TER stands for Translation Error Rate and is an important KPI used to predict the most likely post-editing effort required for an SMT engine. It basically counts the number of insertions, deletions and substitutions that are required to transform a translation into a reference translation.

This is essentially what a professional translator would do in order to post-edit a translation to a level of publishable quality. Since post-editing is a time-consuming and costly activity, SMT developers will try to minimize this and aim for low TER scores; less than 40% is a good benchmark for this KPI.
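
The counting described above can be sketched in Python as a word-level edit distance divided by the length of the reference. This is a simplification: the full TER metric also allows whole word sequences to be shifted as a single edit, which is left out here. The function name `edit_rate` and the sample sentences are hypothetical.

```python
def edit_rate(output, reference):
    """Word-level edits (insert, delete, substitute) per reference word."""
    out = output.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edits needed to turn the output words seen so far
    # into the first j reference words.
    previous = list(range(len(ref) + 1))
    for i, out_word in enumerate(out, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = previous[j - 1] + (out_word != ref_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

reference = "the cat sat on the mat"
print(edit_rate("the cat sat on a mat", reference))  # one substitution in six words
print(edit_rate("the cat sat mat", reference))       # two missing words in six
```

Both sample segments come in under the 40% benchmark mentioned above, the first needing one edit and the second needing two.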

One more thing…

While older SMT systems produce these KPIs as single numerical values, a more modern approach is to look at the distribution of these KPIs across an SMT engine. This provides deeper insights and more accurate analysis of how an SMT engine is most likely going to perform in a production environment.
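
A small Python sketch shows why the distribution matters. The two sets of segment-level scores below are invented for illustration: both engines have the same average, but the scores are spread very differently. The function name `score_distribution` and the band width are assumptions.

```python
from collections import Counter
from statistics import mean

def score_distribution(scores, band_width=0.25):
    """Count how many segment scores (0.0 to 1.0) fall into each band."""
    bands = round(1 / band_width)
    counts = Counter()
    for score in scores:
        # A perfect score of 1.0 joins the top band.
        index = min(int(score / band_width), bands - 1)
        counts[index] += 1
    return [(i * band_width, (i + 1) * band_width, counts[i]) for i in range(bands)]

# Hypothetical segment-level BLEU scores for two engines.
engine_a = [0.55, 0.60, 0.62, 0.58, 0.65]
engine_b = [0.10, 0.15, 0.95, 1.00, 0.80]

for name, scores in (("A", engine_a), ("B", engine_b)):
    print(name, "mean:", round(mean(scores), 2))
    for low, high, count in score_distribution(scores):
        print(f"  {low:.2f}-{high:.2f}: {count} segment(s)")
```

Engine A is consistently mid-range, while engine B mixes excellent segments with very poor ones. A single averaged value hides that difference.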

Three Take-aways:

1. Always aim for a High BLEU score, a High F-Measure score and a Low TER score.

2. Look at all three scores together to get a more holistic assessment of your SMT engine.

3. Examine the distribution of these KPIs across your SMT engine – this will help you to make smarter data choices for future customizations.

In our next blog we shall take a detailed look at each of these scores and see how we can work with them to improve SMT systems.

Tony O’Dowd, Founder and Chief Architect, KantanMT

 

Many Languages, One World: Student Essay Contest

The United Nations (UN) are big promoters of multilingualism and this week is no exception. The UN Academic Impact (UNAI) and ELS Educational Services launched a student essay contest to promote international education and multilingualism. Entrants should submit an essay written in one of the six official languages of the UN (Arabic, Chinese, English, French, Russian and Spanish), as long as it’s not their native tongue.

The theme of the contest, ‘Many Languages, One World’, focuses on multilingualism in a globalised world and supports communication between all global citizens. The UN is a global organisation that understands the challenges of making hefty volumes of content available in different languages.

The number of countries where each official UN language is spoken

In 2001, Kofi Annan, UN Secretary-General at the time, suggested there was a linguistic imbalance with the UN having a tendency towards English. The reasons behind the imbalance boiled down to high translation costs and a lack of resources.

UN official languages by number of speakers
Source: Ethnologue Languages of the World (SIL International, 2013)

Ten years later, in 2011, the World Intellectual Property Organization (WIPO), in collaboration with the UN, trained its Moses-based Machine Translation engine using approx. 11 years of translated UN documents (2000 – 2012), which were provided by the UN’s Documentation Division (DD). TAPTA4UN was born: a Statistical Machine Translation (SMT) engine for professional UN translators.

The UN had used Google Translate and Bing Translator to translate their publicly available documents at first, and with good results. But as data from other organisations was added to those engines, the quality of UN translated documents began to decrease.

The TAPTA engine, built with customised UN training data, produced much higher quality Machine Translation output and higher BLEU scores than Google Translate. This paved the way for ‘gText’, a global UN project that grew out of the positive adoption of Machine Translation and is tasked with integrating computer-aided translation (CAT) tools into the document workflow.

KantanMT allows users to build a customised translation engine with training data that will be specific to their needs. KantanMT are continuing to offer a 14-day free trial to new members.