Using F-Measure in Kantan BuildAnalytics

What is F-Measure?

F-Measure is an automated measurement that determines the precision and recall capabilities of a KantanMT engine. F-Measure enables you to determine the quality and performance of your KantanMT engine.

  • To see the accuracy and performance of your engine, click on the ‘F-measure Scores’ tab. You will be directed to the ‘F-measure Scores’ page.

  • Place your cursor on the ‘F-measure Scores Chart’ to see the individual score of each segment. A pop-up will appear on your screen with details of the segment under these headings: ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.

  • To see the ‘F-measure Scores’ of each segment in a table format, scroll down. You will see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.
  • To see an even more in-depth breakdown of a particular ‘Segment’, click on the triangle beside the number of the segment you wish to view.
  • To reuse the engine as Test Data, click on the ‘Reuse as Test Data’ button. When you do so, the ‘Reuse as Test Data’ button will change to ‘Delete Test Data’.
  • To download the ‘F-measure Scores’, ‘BLEU Score’ and ‘TER Scores’ of all segments, click on the ‘Download’ button on the ‘F-measure Scores’, ‘BLEU Score’ or ‘TER Scores’ page.

This is one of the features provided by Kantan BuildAnalytics to improve an engine’s quality after its initial training. To see other features used by Kantan BuildAnalytics, please click on the link below. To get more information about KantanMT and the services we provide, please contact our support team at

Understanding BLEU for Machine Translation


It can often be challenging to measure the fluency of your Machine Translation engine, and that’s where automatic metrics become a very useful tool for the localization engineer.

BLEU is one of the metrics used in KantanAnalytics for quality evaluation. BLEU Score is quick to use, inexpensive to operate, language independent, and correlates highly with human evaluation. It is the most widely used automated method of determining the quality of machine translation.

How to use BLEU?

  1. To check the fluency of your KantanMT engine, click on the ‘BLEU Scores’ tab. You will be directed to the ‘BLEU Score’ page.
  2. Place your cursor on the ‘BLEU Scores Chart’ to see the individual fluency score of each segment. A pop-up will appear on your screen with details of the segment under these headings: ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.
  3. To see the ‘BLEU Scores’ of each segment in a table format, scroll down. You will see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.
  4. To see an even more in-depth breakdown of a particular ‘Segment’, click on the ‘Triangle’ beside the number of the segment you wish to view.
  5. To download the ‘BLEU Score’ of all segments, click on the ‘Download’ button on the ‘BLEU Score’ page.


Why you should use KantanAnalytics

What is KantanAnalytics?

KantanAnalytics generates quality estimation scores for automated translations generated by KantanMT engines. The better the KantanAnalytics scores, the better the quality performance of a KantanMT engine, as it means translations are more accurate and fluent and require less post-editing effort.

KantanAnalytics creates a detailed project management report of all segments within a KantanMT project. This includes segment-by-segment quality estimation scores in addition to other useful project statistics such as word, character, placeholder and tag counts.

KantanAnalytics can help Project Managers make the right decisions, as it predicts the cost and post-editing effort for Machine Translation projects. Prioritizing the right translations through segment-level quality estimation will yield the fastest possible project turnaround.

How to use KantanAnalytics

  1. Log in to your KantanMT account using your email and your password.
  2. You will be directed to the ‘My Client Profiles’ page. You will be in the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
  3. If you wish to view the ‘KantanAnalytics’ of a profile other than the ‘Active’ profile, click on the profile you wish to view, then click on the ‘Client Files’ tab.
  4. You will now be directed to the ‘Client Files’ page.
  5. Click on the ‘Analyse’ tab.
  6. A ‘Launch Job’ pop-up will now appear on your screen saying your job has been launched. Note the ‘Job ID’, then click on the ‘OK’ button on the ‘Launch Job’ pop-up.
  7. You will receive an email notification stating that your job has been launched at the email address you used to register on
  8. You will also receive an email notification when the job has been completed.
  9. Click on the ‘KantanMT’ tab and select ‘My Jobs’ from the drop down menu.
  10. You will now be directed to the ‘My Jobs’ page.
  11. Search for the job using the ‘Job ID’: you can use the ‘Search Bar’, or go to the ‘#’ column and scroll until you find the ‘Job ID’.
  12. Click on the analyse icon beside the job to view the job analysis.
  13. You will now be directed to the ‘KantanAnalytics Report’ page for the job.
  14. To exit out of the ‘KantanAnalytics Report’ page click on the ‘Back to My Jobs’ button.
  15. You will be directed back to the ‘My Jobs’ page.

To download the job’s ‘KantanAnalytics’, click on the download icon.

Additional Information

For more details on KantanAnalytics, please see the video below:

To learn more about KantanAnalytics or get help with KantanMT technologies, please contact us at or visit the KantanMT website. Hear from the Development team on the benefits of KantanAnalytics


SMT Quality Challenge

One of the biggest challenges when customizing Statistical Machine Translation (SMT) is improving the engine after its initial development. While you can build a baseline engine using existing Translation Memories (TM), terminology and monolingual training data assets, the real challenge is going beyond this and achieving even higher levels of quality. More importantly, how can you do this rapidly with minimum cost and effort? A proactive approach to measuring the quality of your training data will greatly assist in doing this.

Kantan BuildAnalytics™ is a new technology that addresses this head-on and helps SMT developers to build engines that are production ready, fast!

What is Kantan BuildAnalytics?

Kantan BuildAnalytics brings a new level of transparency to the SMT building and training process, and KantanMT users can now build higher performing engines for each domain, resulting in reduced post-editing requirements.

How it works…

When you build a KantanMT engine, some of your training data is automatically extracted and kept to one side. This is called a Reference Data Set – and contains both source and target texts. After a KantanMT engine is built, this Reference Data Set is used to calculate a series of automated quality scores – including BLEU (Bilingual Evaluation Understudy), F-Measure and TER.
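The hold-out step described above can be pictured with a short Python sketch. It assumes the training data is a list of (source, target) segment pairs and that the Reference Data Set is a random sample of fixed size; the sample size, the seed and the helper name `split_reference_set` are hypothetical, since the text does not say how KantanMT selects or sizes the Reference Data Set.

```python
import random

def split_reference_set(segment_pairs, reference_size=500, seed=13):
    """Hold out `reference_size` random (source, target) pairs.

    Hypothetical helper: KantanMT's actual selection rule and sample
    size are not documented in this article.
    """
    pairs = list(segment_pairs)
    if not 0 < reference_size < len(pairs):
        raise ValueError("reference_size must be smaller than the corpus")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pairs)
    # The held-out pairs are never used for training, so scores computed
    # on them reflect text the engine has not seen.
    return pairs[reference_size:], pairs[:reference_size]

corpus = [(f"source {i}", f"target {i}") for i in range(2000)]
training, reference = split_reference_set(corpus)
print(len(training), len(reference))  # 1500 500
```

Keeping the reference pairs out of training is what makes the later BLEU, F-Measure and TER scores meaningful: the engine is scored on segments it could not simply memorize.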

This Reference Data Set is also used to perform a Gap Analysis. Gap Analysis is a quick way to determine any missing words in the engine’s phrase-tables. I’ll come back to this later and demonstrate how Gap Analysis can improve the quality performance of KantanMT engines.

But for now, let’s focus on the automated quality scores of BLEU, F-Measure and TER.

BuildAnalytics uses the KantanMT data visualization library to graphically display the distribution of these automated scores based on the Reference Data Set. Since an automated score is calculated for each text segment within the Reference Data Set, this means we get a detailed view of how a KantanMT engine is performing and how it should generate translated output.

By analysing these scores and the Gap Analysis results, and examining the translated output, users of KantanMT are producing higher quality engines because their training data choices are more strategic and refined.


Let’s look at F-Measure first, as this is the most straightforward to understand and visualize. F-Measure scores show how precise a KantanMT engine is when retrieving words, and how many words it can retrieve or recall during translation. This is why it is commonly referred to as a Recall and Precision measurement. Because it combines these two measurements into a single score, it is a good indicator of the engine’s performance and its ability to translate content.
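A minimal sketch of a word-level F-Measure in Python, assuming the standard definition: precision is the share of output words that also appear in the reference, recall is the share of reference words the output reproduces, and F-Measure is their harmonic mean. The function name and the whitespace tokenization are illustrative assumptions; KantanMT's own tokenization and matching rules are not described here.

```python
from collections import Counter

def f_measure(hypothesis: str, reference: str) -> float:
    """Word-level F-Measure: harmonic mean of precision and recall."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Clipped overlap: a word counts at most as often as it occurs in the reference.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)  # how much of the output is correct
    recall = matches / len(ref)     # how much of the reference was retrieved
    return 2 * precision * recall / (precision + recall)

reference = "the cat sat on the mat"
print(round(f_measure("the cat sat on mat", reference), 3))  # 0.909
print(round(f_measure("mat on sat cat the", reference), 3))  # 0.909 (word order ignored)
```

The second call scores exactly the same as the first even though the words are scrambled, which is the limitation the following paragraph describes.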

KantanMT engine F-Measure score distribution

However, while your KantanMT engine may have a high F-Measure score, this doesn’t mean that the words are recalled in the correct translated order. We need another metric to give us an indication of how well the engine translated the text, and BLEU is one of the most widely recognized automated metrics for estimating a text’s fluency.


BLEU is an automatic evaluation metric well known in both the industry and academia, which calculates an estimation of text fluency. Fluency is a measure of the correspondence between a KantanMT engine output and that of a professional translator.

Since the Reference Data Set consists of source texts and their human-translated equivalents, created by a professional translator, the BLEU score can be calculated by comparing the output of a KantanMT engine to this Reference Data Set.
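The comparison can be sketched in Python as unsmoothed sentence-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty. This follows the standard BLEU formulation rather than any documented KantanMT implementation; production BLEU is normally computed over a whole corpus, and sentence-level scores usually need smoothing, which this sketch omits.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0  # hypothesis is shorter than n words
        # Clip each n-gram count by its count in the reference.
        clipped = sum((hyp_counts & ngrams(ref, n)).values())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: punish output that is shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)

reference = "the cat sat on the mat"
print(sentence_bleu("the cat sat on the mat", reference))          # 1.0
print(round(sentence_bleu("the cat sat on mat", reference), 3))    # 0.579
print(sentence_bleu("mat on sat cat the", reference))              # 0.0
```

Unlike the F-Measure, the scrambled output scores zero here because none of its word pairs appear in the reference, which is why BLEU is used as a fluency indicator.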

KantanMT engine BLEU score distribution

In practice, BLEU achieves a high correlation with human judgement of quality and remains one of the most popular automated metrics in use today.


TER stands for Translation Error Rate and is used to estimate the amount of post-editing required to transform a generated translation into its original human translation equivalent. In simple terms, this is a count of the number of insertions, deletions and substitutions required to transform a segment to match its original human translation equivalent.

KantanMT engine TER score distribution

So the lower the score, the fewer edits are required, which means less post-editing too.
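The edit counting described above can be sketched as a word-level edit distance divided by the number of reference words. This is a simplified TER: the standard metric also allows shifts of whole word sequences, which this sketch omits, so it can overstate the edit rate for reordered output. The function names are illustrative.

```python
def word_edit_distance(hyp, ref):
    """Minimum insertions, deletions and substitutions turning hyp into ref."""
    prev = list(range(len(ref) + 1))  # cost of building ref[:j] from nothing
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)   # cost of deleting the first i hyp words
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hyp word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # substitute (free if the words match)
            )
        prev = curr
    return prev[-1]

def simplified_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word; lower means less post-editing."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)

reference = "the cat sat on the mat"
print(simplified_ter("the cat sat on the mat", reference))        # 0.0
print(round(simplified_ter("the cat sat on mat", reference), 3))  # 0.167 (one insertion)
```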

Working with Kantan BuildAnalytics™

BuildAnalytics is a really great way to see all these automated scores in action. It uses KantanMT data visualization technology to graphically present these scores, helping developers of KantanMT engines to fine-tune their training data and maximize their engine’s quality performance.

Let’s take a closer look at how this data visualization can be used to gain insights into an engine and determine if it is a high or low performing engine, and what steps we can take to improve it.

Here are the summary distribution graphs for an engine that contains approx. 3.2m words. It’s a small engine within a technical domain. Its overall scores are:

KantanMT BuildAnalytics Graph

These Summary Graphs show the distribution of scores, grouped into bands (e.g. <40%, 40-54%, etc.), for each automated score. This is very helpful in determining the scores’ overall distribution and how the KantanMT engine is likely to perform.
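Grouping per-segment scores into bands is a simple counting step. The sketch below assumes scores are percentages and takes the band edges as a parameter; only the <40% and 40-54% bands come from the text, so the example lumps everything from 55% upward into one band rather than guessing the remaining edges. The helper names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def band_label(score: float, edges: list[int]) -> str:
    """Map a percentage score to a band label; `edges` are sorted integer lower bounds."""
    idx = bisect_right(edges, score)
    if idx == 0:
        return f"<{edges[0]}%"
    if idx == len(edges):
        return f">={edges[-1]}%"
    return f"{edges[idx - 1]}-{edges[idx] - 1}%"

def summarize(scores, edges):
    """Count how many segment scores fall into each band."""
    return Counter(band_label(s, edges) for s in scores)

# Assumed grouping for illustration: <40%, 40-54%, and everything from 55% up.
segment_scores = [12.0, 38.5, 40.0, 47.2, 54.0, 61.3, 88.0]
print(summarize(segment_scores, [40, 55]))
```

Plotting these counts per metric gives the kind of summary graph described above; the detailed distribution graphs correspond to the raw per-segment scores.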

Here are the detailed distribution graphs for each automated score:

KantanMT distribution graphs

By reviewing both the Summary Graphs and the more detailed Distribution Graphs we can make some observations of how this engine would most likely perform. My observations are included as part of the commentary in the table above.

It’s important to point out that no single score gives an absolute measure of how a KantanMT engine will perform. We need to take a holistic view, reviewing all automated scores together, to get a general sense of the engine’s performance.

Using Kantan BuildAnalytics, users can get a good sense of how a KantanMT engine will perform in a production environment, and with a little practice and experimentation, they can use this knowledge to build higher performing MT engines.

Gap Analysis

I mentioned this concept earlier in the post, so let’s take a closer look at this really helpful new feature. Gap Analysis determines how many untranslated words remain in the generated translations. These missing words, or ‘Gaps’ can quickly be identified and filled by introducing the most relevant training data to your KantanMT engine and re-training it.

The Gap Analysis feature not only lists the gaps, it also presents suitable training data, which can be post-edited and resubmitted as training data to improve the engine’s overall performance. This makes filling the gaps just that little bit easier!
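As a rough illustration of the idea, the Python sketch below treats a "gap" as a source word missing from the engine's phrase-table vocabulary, then picks candidate segments from a pool of additional material that contain those words. The vocabulary set, the candidate pool and both helper names are hypothetical; the text does not describe how Gap Analysis is implemented.

```python
from collections import Counter

def find_gaps(source_segments, phrase_table_vocab):
    """Count source words that have no phrase-table entry (hypothetical rule)."""
    gaps = Counter()
    for segment in source_segments:
        for word in segment.lower().split():
            if word not in phrase_table_vocab:
                gaps[word] += 1
    return gaps

def suggest_training_segments(gaps, candidate_pool, limit=5):
    """Pick candidate segments that contain at least one gap word."""
    gap_words = set(gaps)
    hits = [seg for seg in candidate_pool if gap_words & set(seg.lower().split())]
    return hits[:limit]

vocab = {"open", "the", "valve", "close"}
gaps = find_gaps(["Open the valve", "Close the bypass valve"], vocab)
print(gaps)  # Counter({'bypass': 1})
print(suggest_training_segments(gaps, ["Check the bypass line", "Open the valve"]))
# ['Check the bypass line']
```

In the workflow described above, the suggested segments would then be translated or post-edited by a professional and added to the training data before retraining.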

One more (very important) thing…

Most quality improvements for SMT systems will come from fine-tuning terminology and filling data gaps. Post-editing raw MT output and a focus on minimizing data gaps will significantly improve the quality performance of your KantanMT engines. This cannot be done without the involvement of professional translators. They have the skills, knowledge and linguistic expertise to finesse terminology, identify gaps and choose better training data. While BuildAnalytics helps SMT developers get engines ready for production, ultimately, it’s the professional translator who should have the final say in how production-ready it truly is!

To get the most from your Machine Translation engine, always keep in mind:

  • Measuring and improving training data – high quality training data is the first step to building a successful Machine Translation engine.
  • Take a holistic approach to evaluating performance – automatic evaluation metrics can give a good indicator of how your KantanMT engine will perform, but metrics alone are insufficient for measuring post-editing effort.

Kantan BuildAnalytics is available to Enterprise members of KantanMT, but you can also experience this quality estimation and measurement software by signing up for a free trial on

KantanMT – 2013 Year in Review

KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations by developing and refining cutting-edge technologies that make Machine Translation easier to understand and use.

Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.

Strong Customer Focus…

The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.

KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.

The KantanMT Community…

The KantanMT members’ community now includes top-tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3,400 registered members in December, and in response to this growth, KantanMT introduced two partner programs with the objective of improving the Machine Translation ecosystem.

These are the Developer Partner Program, which supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program, dedicated to strengthening the use of MT technology in the global translation supply chain. KantanMT’s Preferred Suppliers of MT are:

KantanMT’s Progress…

To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approx. 7,000 customized KantanMT engines that have translated more than 500 million words.

As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.

KantanMT’s Core Technologies from 2013…

KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.

  • KantanAnalytics™ – segment-level Quality Estimation (QE) analysis that gives each KantanMT translation a percentage ‘fuzzy match’ score, providing a straightforward method for costing and scheduling translation projects.
  • BuildAnalytics™ – a QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment-level percentage score on a sample of the uploaded training data.
  • KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
  • TotalRecall™ – combines TM and MT technology: TM matches with a ‘fuzzy match’ score of less than 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
  • KantanISR™ – Instant Segment Retraining technology that allows members to correct and retrain their KantanMT engines near-instantaneously.
  • PEX Rule Editor – an advanced pattern-matching technology that allows members to correct repetitive errors, smoothing the post-editing process by reducing post-editing effort, cost and time.
  • Kantan API – critical for the development of software connectors and the smooth integration of KantanMT into existing translation workflows. The success of the MemoQ connector led to the development of subsequent connectors for MemSource and XTM.
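The TotalRecall rule in the list above (TM matches scoring below 85% go to MT) can be sketched as a routing function. The fuzzy score here is `difflib.SequenceMatcher.ratio()` scaled to a percentage, a stand-in chosen for illustration; commercial TM tools use their own fuzzy-match algorithms, and `fake_mt` is a hypothetical placeholder for a call to a KantanMT engine.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 85.0  # per the text: TM matches below 85% go to MT

def best_tm_match(segment, tm):
    """Return (score, target) for the closest TM source, score in percent."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm.items():
        # Stand-in fuzzy score: character similarity ratio (0-1) scaled to percent.
        score = 100 * SequenceMatcher(None, segment, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_score, best_target

def route(segment, tm, machine_translate):
    """Reuse the TM match if it is good enough, otherwise call MT."""
    score, target = best_tm_match(segment, tm)
    if score >= FUZZY_THRESHOLD:
        return "TM", target
    return "MT", machine_translate(segment)

def fake_mt(segment):
    """Hypothetical placeholder for a KantanMT engine call."""
    return f"[MT] {segment}"

tm = {"Open the valve.": "Ouvrez la vanne."}
print(route("Open the valve.", tm, fake_mt))                # ('TM', 'Ouvrez la vanne.')
print(route("Replace the filter cartridge.", tm, fake_mt))  # ('MT', '[MT] Replace the filter cartridge.')
```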

KantanMT sourced and cleaned a range of bi-directional, domain-specific stock engines that consist of approx. six million words across the legal, medical and financial domains, and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian during 2013.

Recognition as Business Innovators…

KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd was presented with the ICT Commercialization award in September.

In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.

KantanMT were silver sponsors at the annual ASLIB ‘Translating and the Computer’ conference, which took place in London in November 2013. In October, Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.

KantanMT have recently published a white paper introducing its cornerstone Quality Estimation technology, KantanAnalytics, and how this technology provides solutions to the biggest industry challenges facing widespread adoption of Machine Translation.

KantanAnalytics WhitePaper December 2013

For more information on how to introduce Machine Translation into your translation workflow contact Niamh Lacy (

#T9n and the Computer

The 35th ASLIB conference opens today, Thursday 28th November, and runs for two days in Paddington, London. The annual ‘Translating and the Computer Conference’ serves to highlight the importance of technology within the translation industry and to showcase new technologies available to localization professionals.

KantanMT was keen to have a look at how technology has shaped the translation industry throughout history so we took a look at some of the translation technology milestones over the last 50 years.

The computer has had a long history, so it’s no surprise that developments in computer technology greatly affect how we communicate. Machine Translation research dates back to the late 1940s, although its development stalled because of negative feedback regarding the accuracy of early MT output. The ALPAC (Automatic Language Processing Advisory Committee) report, published in 1966, prompted researchers to look for alternative methods to automate the translation process.


In terms of modern development, the real evolution of ‘translation and the computer’ began in the 1970s, when more universities started carrying out research and development on automated translation. At this point, the European Coal and Steel Community in Luxembourg and the Federal Armed Forces Translation Agency in Mannheim, Germany, were already making use of text-related glossaries and automatic dictionaries. It was also around this time that translators started to come together to form translation companies/language service providers who not only translated, but also took on project management roles to control the entire translation process.

Developing CAT tools


Translation technology research gained momentum during the early 1980s as commercial content production increased. Companies in Japan, Canada and Europe who were distributing multilingual content to their customers, now needed a more efficient translation process. At this time, translation technology companies began developing and launching Computer Assisted Translation (CAT) technology.

Innovation

Dutch company INK was one of the first to release desktop translation tools for translators. These tools, originally called INK text tools, sparked more research into the area. Trados, a German translation company, started reselling INK text tools, and this led to the research and development of the TED translation editor, an initial version of the translator’s workbench.


The 1990s were an exciting time for the translation industry. Translation activities that were previously kept separate from computer software development were now being carried out together in what was termed localization. The interest in localizing for new markets led to translation companies and language service providers merging both technology and translation services, becoming Localization Service Providers.

Trados launched their CAT tools in 1990 with Multiterm for terminology management, followed by the Translation Memory (TM) software Translators Workbench in 1994. ATRIL (Madrid) launched a TM system in 1993, and STAR (Software, Translation, Artwork, Recording) also released Transit, a TM system, in 1994. The ‘fuzzy match’ feature was also developed at this time and quickly became a standard feature of TM.

Increasingly, translators started taking advantage of CAT tools to translate more productively. This led to downward pressure on prices, making translation services more competitive.

The Future…

As we move forward, technology continues to influence translation. Global internet diffusion has increased the level of global communication and has changed how we communicate. We can now communicate in real time, on any device and through any medium. Technology will continue to develop and become faster and more adaptive to multilingual users, and demand for real-time translation will drive further developments in automated translation solutions.

Find out more about KantanMT’s Quality Estimation Technology, KantanAnalytics.

Crowdsourcing vs. Machine Translation

Crowdsourcing has become more popular with both organizations and companies since the concept was introduced in 2006, and has been adopted by companies who are using this new production model to improve their production capacity while keeping costs low. The web-based business model uses an open call format to reach a wide network of people willing to volunteer their services, for free or for a limited reward, for any activity including translation. The application of translation crowdsourcing models has opened the door to increased demand for multilingual content.

Jeff Howe of Wired magazine defined crowdsourcing as:

“…the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call”.

Crowdsourcing costs equate to approx. 20% of a professional translation. Language Service Providers (LSPs) like Gengo and Moravia have realised the potential of crowdsourcing as part of a viable production model, which they are combining with professional translators and Machine Translation.

The crowdsourcing model is an effective method for translating the surge in User Generated Content (UGC). Erratic fluctuations in demand need a dynamic, flexible and scalable model. Crowdsourcing is definitely a feasible production model for translation services, but it still faces some considerable challenges.

Crowdsourcing Challenges

  • No specialist knowledge – crowdsourcing is difficult for technical texts that require specialised knowledge. It often involves breaking down a text into smaller sections to be sent to each volunteer. A volunteer may not be qualified in the domain, so they end up translating small sections of text out of context, with limited subject knowledge, which leads to lower quality or mistranslations.
  • Quality – translation quality is difficult to manage, and is dependent on the type of translation. There have been some innovative suggestions for measuring quality, including evaluation metrics such as BLEU and Meteor, but these are costly and time consuming to implement and need a reference translation or ‘gold standard’ to benchmark against.
  • Security – crowd management can be a difficult task and the moderator must be able to vet participants and make sure that they follow the privacy rules associated with the platform. Sensitive information that requires translation should not be released to volunteers.
  • Emotional attachment – humans can become emotionally attached to their translations.
  • Terminology and writing style inconsistency – when the project is divided amongst a number of volunteers, the final version’s style needs to be edited and checked for inconsistencies.
  • Motivation – decisions on how to motivate volunteers and keep them motivated can be an ongoing challenge for moderators.

Improvements in the quality of Machine Translation have had an influence on crowdsourcing popularity and the majority of MT post-editing and proofreading tasks fit into crowdsourcing models nicely. Content can be classified into ‘find-fix-verify’ phases and distributed easily among volunteers.

There are some advantages to be gained when pairing MT technology and collaborative crowdsourcing.

Combined MT/Crowdsourcing

Machine Translation will have a pivotal role to play within new translation models, which focus on translating large volumes of data in cost-effective and powerful production models. Merging both Machine Translation and crowdsourcing tasks will create not only fit-for-purpose, but also high quality translations.

  • Quality – as the overall quality of Machine Translation output improves, it is easier for crowdsourcing volunteers with less experience to generate better quality translations. This will in turn increase the demand for crowdsourcing models to be used within LSPs and organizations. MT quality metrics will also make post-editing tasks more straightforward and easier to delegate among volunteers based on their experience.
  • Training data word alignment and engine evaluations can be done through crowd computing, and parallel corpora created by volunteers can be used to train and/or retrain existing SMT engines.
  • Security – customized Machine Translation engines are more secure when dealing with sensitive product or client information. General or publicly available information is more suited to crowdsourcing.
  • Terminology and writing style consistency – writing style and terminology can be controlled and updated through a straightforward process when using MT. This avoids the idiosyncrasies of volunteer writing styles. There is no risk of translator bias when using Machine Translation.
  • Speed – Statistical Machine Translation (SMT) engines can process translations quickly and efficiently. When a high volume of content needs to be translated within a short period of time, it is better to use Machine Translation. Output is guaranteed within a designated time, and crowdsourcing the post-editing tasks speeds up the production process before final checks are carried out by experienced translators or post-editors.
Use of crowdsourcing for software localization. Source: V. Muntes-Mulero and P. Paladini, CA Technologies and M. Solé and J. Manzoor, Universitat Politècnica de Catalunya.

Last chance for a FREE TRIAL for KantanAnalytics™ for all members until November 30th 2013. KantanAnalytics will be available on the Enterprise Plan.

Interview: Working on KantanMT – a Developer’s Perspective

Eduardo Shanahan, CNGL

Eduardo Shanahan, a Senior Software Engineer at CNGL, spent time working on KantanMT during its early days. KantanMT asked Eduardo to talk about what it was like to work with Founder and Chief Architect, Tony O’Dowd, and the rest of the team developing the KantanMT product.

What was your initial impression, when you joined DLab in DCU?

This past year was a different kind of adventure. After more than two decades working with Microsoft products like Visual Studio, it was a big change moving to Dublin City University (DCU) to be part of the Design and Innovation Lab, or DLab as we call it. The work in DLab consists of transforming code written by researchers into industrial-quality products.

One of the first changes was to get a Mac and start deploying code in Linux, with no Visual Studio or even Mono. Instead I worked mostly with Python and NodeJS, and piles of shell scripts. Linux and Python were not new to me, but they did take some getting used to.

This was a completely new environment and a new experience, and I was working in a whole new area. Back then, my relationship with Artificial Intelligence (AI) was informal to say the least, and I wasn’t even aware that something like Statistical Machine Translation (SMT) existed.

How did you get involved with working on KantanMT?

Starting out, I was working on a variety of different projects simultaneously. A few months in, though, I started working full time with a couple of researchers, creating new functionality for Tony and his KantanMT product, which is based on open source Moses technology. Moses uses parallel corpora of aligned source and target texts to train an SMT system. Once the system is trained, search algorithms are applied to find the most suitable translation matches. This translation model can be applied to any language pair.

What were your goals working on the KantanMT project?

Tony is doing a great job deploying KantanMT on Amazon Web Services and creating a set of tools to streamline operations for end users. His request to CNGL was to provide more advanced insight into the translation quality produced by Moses.

To accomplish this, the task was mapped to two successive projects, with different researchers on each. The pace was very intense; we wanted state-of-the-art results that showed up in the applications. Sandipan Dandapat, Assistant Professor in the Department of Computer Science and Engineering, IIT Guwahati, and Aswarth Dara, Research Assistant at CNGL, DCU, worked on adding real value to the KantanMT product during those long weeks, while I rewrote their code time after time until it passed all the tests and then some. Our hard work paid off when KantanWatch™ and KantanAnalytics™ were born.

Each attempt to deliver was an experience in itself; Tony was quick to detect any inconsistencies and wanted to be extra sure he understood all the details and steps of the research and implementation.

In your opinion was the work a success?

The end result is something that has made me proud. The mix of being a scientist and having a real product to implement is a very good combination. The guys at DCU have done a great job on the product base, and DLab is a fantastic research and work environment. The no-nonsense attitude on Tony’s side created a very interesting situation, and it’s something that we can really celebrate after a year of hard work.

The CNGL Centre for Global Intelligent Content

The CNGL Centre for Global Intelligent Content (Dublin City University, Ireland) is supported by Science Foundation Ireland. Through its academic-industry collaborative research, it has not only driven standards in content and localization service integration but is also pioneering advancements in Machine Translation through the development of disruptive, cutting-edge processing technologies. These technologies are revolutionising global content value chains across a number of different industries.

The CNGL research centre draws its talent and expertise from a combined 150 researchers from Trinity College Dublin, Dublin City University, University College Dublin and University of Limerick. The centre also works closely with industry partners to produce disruptive technologies that will have a positive impact both socially and economically.

KantanMT allows users to build a customised translation engine with training data specific to their needs. KantanMT is continuing to offer a 14-day free trial to new members.

KantanMT and MemSource Cloud Connector

Tony O’Dowd, KantanMT’s Founder and Chief Architect

Cloud technology and web-based applications have made a significant impact on the localization industry, levelling the playing field between large and small language service providers (LSPs). LSPs that leverage cloud technology can be more competitive. The ‘content explosion’ has also driven the need for on-demand translation services, and taking advantage of cloud technology is the most strategic option for translating large volumes of content securely and in real time.

David Canek, CEO of MemSource Technologies

Software integration plays an important role in achieving a centralised localization management structure. The MemSource Cloud connector developed to integrate with KantanMT will ensure greater control and productivity in localization and translation workflows.

To acknowledge the connector’s release, I caught up with David Canek, CEO of MemSource Technologies, and Tony O’Dowd, KantanMT’s Founder and Chief Architect, to get their thoughts on the impact of cloud software integration in the localization industry.

MemSource recently developed a connector to integrate KantanMT with MemSource Cloud. Can you explain how the connector works and what it will mean for its users?

[David] Yes, we have developed a connector that lets all of our 10,000 users easily select KantanMT as their preferred MT engine for their MemSource Cloud translation projects. The connector is part of our 3.8 release and is available as of 3 November 2013. The KantanMT integration supports all of our Machine Translation features, including our post-editing features, specifically the post-editing analysis.

[Tony] The team at MemSource have developed a straightforward mechanism to integrate Machine Translation services into their cloud platform. The MemSource community of LSPs and professional translators can easily select KantanMT as their preferred Machine Translation engine. Integration between both platforms using this new KantanMT connector will boost translation productivity, reduce project costs and improve project margins for the MemSource community.

This partnership is a great example of synergy between two related businesses within the translation industry. How do you think integration will create value for clients and the industry?

[David] Machine Translation has become an integral part of the human translation process and so we found it a logical step to integrate an innovative player in the Machine Translation scene, such as KantanMT.

[Tony] KantanMT combines the speed and accuracy of traditional Translation Memory with the speed and cost advantages of Machine Translation in a single seamless platform. The current economic climate indicates that the localization industry can be certain of only two things: margin erosion and price compression will continue to put pressure on LSPs to operate with higher levels of efficiency while lowering overall costs.

MemSource and KantanMT customers will benefit from economies of scale when they integrate Machine Translation directly into their existing translation workflows. KantanMT scales effortlessly with business demands and growth, and KantanMT members will benefit from increased profitability as greater volumes of client data are processed. This helps LSPs achieve higher levels of operational efficiency while also delivering cost savings to their customers.

There is a lot of buzz around “moving to the cloud” in the tech world, particularly for translation and localization services. As a supplier of both cloud and server translation technology, have you noticed any preference for one over the other? Which do your clients prefer, and why?

[David] Our clients, just like us, prefer the cloud version of any technology, including MemSource technology. Therefore, we really focus on providing MemSource Cloud, and it is only a question of time before we discontinue the server version of MemSource.

[Tony] Progressive companies cannot ignore the financial and operational efficiencies the cloud delivers. The cloud helps organisations achieve economies of scale through reduced capital costs, which are often associated with the investment in and maintenance of a technology infrastructure. Combined with new pricing models, such as lower monthly subscription fees that are replacing large upfront software license fees, operating in the cloud ensures a competitive business. This is especially true in the localization industry, where the translation of ‘big data’ from the content explosion has increased the need for on-demand localization and translation services. The cloud’s multi-tenant architecture offers LSPs a flexible solution for efficiently managing large volumes of data.

In your opinion, what will the integration of these technologies mean for the translation industry in the short and longer term?

[David] Machine Translation has become mainstream technology and will soon have the same importance as Translation Memory in the localization industry. We have believed in this vision right from the start of developing MemSource. This is why we have pioneered the post-editing analysis and other features in MemSource that bring Machine Translation to the forefront and seamlessly integrate it with existing technologies such as Translation Memory.

[Tony] In the short term, the technology with the greatest impact on the translation industry will be the availability of high-speed, on-demand Machine Translation services. It will be used as a tool to boost translator productivity, reduce project costs and improve margins. By using the KantanMT connector, LSPs can integrate Machine Translation into their translation workflows quickly and easily, immediately offering improved services to their clients.

Over the longer term, like MemSource, KantanMT believes there will be a continuous push to blend Machine Translation and traditional Translation Memory systems into one seamless service. At KantanMT, we’ve made significant progress on this vision by fusing traditional Translation Memory with advanced Machine Translation in the KantanMT platform, and also through the recent development of a predictive segment quality estimation technology called KantanAnalytics™.

Thank you to both David and Tony, who took time out of their busy schedules to be interviewed.

There are still a couple of weeks left to take advantage of the KantanAnalytics™ feature. KantanAnalytics™ is available for ALL KantanMT members until 30th November. When the offer ends it will become an Enterprise Plan only feature.

For more information about the KantanMT Enterprise Plan, please contact Aidan (

MT Quality Estimation – KantanAnalytics™


The newest addition to the KantanMT technology portfolio is KantanAnalytics™. KantanAnalytics, which has been co-developed with the CNGL Centre for Global Intelligent Content (Dublin City University, Ireland), assigns a quality estimation score to each automated translation generated by a KantanMT engine. Expressed as a percentage, this score predicts how a human translator would likely rate the usefulness of the translation. KantanAnalytics helps Project Managers predict the cost and schedule of Machine Translation projects and creates new business model opportunities for the localization industry.

The commercialisation of Translation Memory technology in the early 1990s revolutionised the localization industry and led to increased productivity and translation performance. It also provided a new pricing model for the industry, one based on the type of translation memory match (referred to as a ‘fuzzy-match’). This pricing structure, which was tied to the fuzzy-match score, became an industry standard and an invaluable tool Project Managers could use to provide an accurate cost analysis of translation projects. It was also used to predict the time needed to complete a project.

The use of KantanAnalytics technology means Project Managers can apply a similar pricing structure when calculating the cost of Machine Translation or Post-Edited Machine Translation (PEMT) projects. Currently, Project Managers and translators use fixed charges, such as hourly rates or a flat fee for a fixed number of words, for Machine Translation and PEMT. This method lacks precision and transparency and is not a sufficient cost calculation method to drive the wide-scale adoption of Machine Translation.

What this means for KantanMT Members

KantanMT Enterprise Members can use a two-pronged approach to measure Machine Translation quality. First, the BLEU, TER and F-measure scores in KantanWatch show the engine’s overall quality level during the training or development stage; then KantanAnalytics is used to analyse the quality of each segment generated by a KantanMT engine.
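To give a feel for what an F-measure score captures, here is a minimal sketch of a simple word-level (unigram) F-measure between an MT output and a reference translation. This is an assumption for illustration only: the exact formula KantanWatch uses is not described here, and `f_measure` is a hypothetical helper. Precision is the share of output words also found in the reference, recall is the share of reference words found in the output, and F is their harmonic mean.

```python
from collections import Counter


def f_measure(mt_output: str, reference: str) -> float:
    """Unigram F1 between an MT output and a reference translation.

    Assumption: lowercased whitespace tokenisation and clipped word
    counts. This is an illustrative metric, not KantanWatch's own code.
    """
    hyp = Counter(mt_output.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it occurs in both.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())  # how much of the output is right
    recall = overlap / sum(ref.values())     # how much of the reference is covered
    return 2 * precision * recall / (precision + recall)
```

For example, `f_measure("the cat sat on the mat", "the cat is on the mat")` shares five of six words in each direction, so precision, recall and F are all 5/6 (about 0.83). Real scorers add proper tokenisation and often stemming or multi-word matching.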

Using KantanAnalytics reports, which are akin to a ‘fuzzy-match’ report, Project Managers can determine the number of segments and the quality of each segment, and estimate how long a project will take to complete and what it should cost.

This quality estimation score is expressed as a percentage – the higher the score, the better the quality and consequently the less effort required to post-edit it.

KantanAnalytics can be deployed quickly by Project Managers, and Enterprise Members can implement a tiered pricing model on Machine Translation jobs, similar to the one used for Translation Memory jobs. This fits well within existing business models, fusing two important industry technologies: Machine Translation and Translation Memory.
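As a sketch of how such a tiered model might work, each segment’s quality estimation score can select a pricing band, and that band’s per-word rate is applied to the segment’s word count, much as fuzzy-match bands discount Translation Memory matches. The band thresholds and rates below, and the names `PRICE_BANDS`, `Segment`, `rate_for` and `quote`, are hypothetical; KantanMT does not prescribe these values.

```python
from dataclasses import dataclass

# Hypothetical pricing bands: (minimum QE score in %, rate per word).
# Ordered from highest to lowest threshold; higher estimated quality
# means less post-editing effort, so a lower rate.
PRICE_BANDS = [
    (95, 0.03),
    (85, 0.05),
    (75, 0.08),
    (0, 0.12),
]


@dataclass
class Segment:
    words: int
    qe_score: float  # quality estimation score, 0-100


def rate_for(score: float) -> float:
    """Return the per-word rate of the first band the score reaches."""
    for minimum, rate in PRICE_BANDS:
        if score >= minimum:
            return rate
    raise ValueError(f"score out of range: {score}")


def quote(segments: list[Segment]) -> float:
    """Total project cost: each segment's word count times its band rate."""
    return sum(s.words * rate_for(s.qe_score) for s in segments)
```

With these assumed rates, a job of a 10-word segment scoring 97%, a 20-word segment scoring 80% and a 5-word segment scoring 40% would be quoted at 10 × 0.03 + 20 × 0.08 + 5 × 0.12 = 2.5. Keeping the bands in one table makes it easy to align them with an existing fuzzy-match price list.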

KantanAnalytics creates the framework for a more accurate, more efficient cost management and deployment of Machine Translation throughout the localization industry.

KantanAnalytics User Interface (UI)

KantanAnalytics Report

Here is a quick look at the new KantanAnalytics interface. The KantanAnalytics report can be viewed in the Project Dashboard on KantanMT or downloaded as a Microsoft Excel file. The report is generated by clicking the graph icon located in the job status column.

The report results are shown when the report is expanded. To expand the report, click on ‘summary’ or ‘file name’. The results are represented in three graphs at the top of the report. In the screenshot below, Total Recall technology shows that 76% of the file for translation generated matches of 85% or higher. The second graph shows that 24% of the document had matches below 85%. The third graph then shows the quality estimation scores in 10% increments. This data is also listed below the graphs in numerical form.

KantanAnalytics Report
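The three summary graphs can be approximated from per-segment scores with a few lines of code. This is a minimal sketch, assuming each segment has a quality estimation score from 0 to 100 and that percentages are counted per segment (the actual report may weight by words instead); the 85% threshold and 10% increments come from the report described above, and `summarise` is a hypothetical helper.

```python
def summarise(scores: list[float], threshold: float = 85.0) -> dict:
    """Summarise per-segment quality estimation scores (0-100).

    Returns the percentage of segments at or above `threshold`, the
    percentage below it, and a count of segments in each 10% band
    (0-9, 10-19, ..., 90-100).
    """
    if not scores:
        raise ValueError("no scores to summarise")
    bands = [0] * 10
    above = 0
    for s in scores:
        if not 0 <= s <= 100:
            raise ValueError(f"score out of range: {s}")
        if s >= threshold:
            above += 1
        # A perfect 100 goes in the top band rather than an 11th one.
        bands[min(int(s // 10), 9)] += 1
    total = len(scores)
    return {
        "at_or_above_pct": 100 * above / total,
        "below_pct": 100 * (total - above) / total,
        "bands": bands,
    }
```

For instance, `summarise([92, 85, 100, 60])` reports 75.0% of segments at or above 85% and 25.0% below, with one segment in the 60-69 band, one in the 80-89 band and two in the 90-100 band.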
KantanMT Analytics will be available to Enterprise Members of the platform from 30th October. To sign up for the Enterprise Plan, or to upgrade to this plan, please email KantanMT’s Success Coach, Kevin McCoy (