Identifying Translation Gaps and Managing Machine Translation with KantanTimeLine™

What are Gap Analysis and KantanTimeLine?

Gap Analysis identifies and reports any untranslated words in the training data set and allows you to take preventive measures quickly by fine-tuning training data and filling data gaps. The KantanTimeLine™ provides a chronological history of activities for each engine and uses version control for precise management of released and production-ready engines.

Using Kantan TimeLine and Gap Analysis:

In KantanBuildAnalytics, click the Gap Analysis tab to see the number of untranslated words that remain in the generated translations. You will be directed to the Gap Analysis page, where you will see a breakdown of any gaps in your training data.

Gap Analysis tab in KantanMT

A table appears with four headings: ‘#’, ‘Unknown Word’, ‘Reference/Source’ and ‘KantanMT Output’. Under those headings you will find details of any untranslated words, their source and the KantanMT output.
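To make the idea of a "gap" concrete, here is a minimal sketch of how rows like the ones in that table could be produced. It is only an illustration: the function name `find_gaps`, the word-splitting rule and the test for "untranslated" (a source word that is missing from the training vocabulary and shows up unchanged in the output) are assumptions, not a description of how KantanMT detects gaps internally.

```python
import re

def find_gaps(segments, known_vocabulary):
    """Return rows of (row number, unknown word, source, output).

    Assumption: a word is a gap when it is absent from the training
    vocabulary and appears unchanged in the machine translation output.
    """
    rows = []
    for source, output in segments:
        output_words = set(re.findall(r"\w+", output.lower()))
        for word in re.findall(r"\w+", source.lower()):
            if word not in known_vocabulary and word in output_words:
                rows.append((len(rows) + 1, word, source, output))
    return rows

# Toy training vocabulary and one translated segment (made-up data).
known = {"the", "printer", "is", "ready"}
segments = [("The printer is offline", "L'imprimante est offline")]

gap_rows = find_gaps(segments, known)
for row in gap_rows:
    print(row)
```

Each row carries the same four pieces of information as the table headings, so a word such as "offline" would be listed together with the segment it came from.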

KantanMT Gap Analysis Table

Click Download to download your Gap Analysis report.

Download Gap Analysis KantanMT

Note: You can also click the Timeline tab to view your profile’s Timeline, which is essentially a record of the changes you have made to your engine.

TimeLine Image

This is one of the many features provided in KantanBuildAnalytics, which helps Localization Project Managers improve an engine’s quality after its initial training. To see other features in the KantanBuildAnalytics suite, please see the links below.

Contact our team at demo@kantanmt.com to get more information about KantanMT.com or to arrange a platform demonstration.

Using F-Measure in Kantan BuildAnalytics

What is F-Measure?

F-Measure is an automated measurement that determines the precision and recall capabilities of a KantanMT engine. It enables you to determine the quality and performance of your KantanMT engine.
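For readers who want to see what precision and recall mean at segment level, the sketch below computes a word-overlap F-measure between a reference translation and an engine output. This is the standard harmonic mean of precision and recall over whitespace-separated words; the exact tokenization and weighting KantanMT uses are not described here, so treat the function `f_measure` as an illustrative assumption rather than the platform's implementation.

```python
from collections import Counter

def f_measure(reference, candidate):
    """Word-overlap F-measure (harmonic mean of precision and recall)."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Words shared by both sides, counted with multiplicity.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())  # share of output words that are correct
    recall = overlap / sum(ref_counts.values())      # share of reference words that were produced
    return 2 * precision * recall / (precision + recall)

score = f_measure("the cat sat on the mat", "the cat sat on a mat")
print(round(score, 3))
```

In this toy example five of the six words match on each side, so precision and recall are both 5/6 and the F-measure is the same value.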

  • To see the accuracy and performance of your engine click on the ‘F-measure Scores’ tab. You will now be directed to the ‘F-measure Scores’ page.

F-Measure tab

  • Place your cursor on the ‘F-measure Scores Chart’ to see the individual score of each segment. A pop-up will now appear on your screen with details of the segment under these headings, ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.

Segment

  • To see the ‘F-measure Scores’ of each segment in a table format, scroll down. You will now see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.
  • To see an even more in-depth breakdown of a particular segment, click on the triangle beside the number of the segment you wish to view.
  • To reuse the engine as test data, click on the ‘Reuse as Test Data’ button. When you do so, the button will change to ‘Delete Test Data’.
  • To download the ‘F-measure Scores’, ‘BLEU Score’ and ‘TER Scores’ of all segments, click on the ‘Download’ button on the ‘F-measure Scores’, ‘BLEU Score’ or ‘TER Scores’ page.

This is one of the features provided by Kantan BuildAnalytics to improve an engine’s quality after its initial training. To see other features used by Kantan BuildAnalytics, please click on the link below. To get more information about KantanMT and the services we provide, please contact our support team at info@kantanmt.com.

What is KantanISR and Why Do I Need It?

KantanISR technology enables KantanMT members to perform instant segment retraining using a pop-up editor. The technology is designed to permit the near-instantaneous submission of post-edited translations into a KantanMT engine so that KantanMT members can submit segments for retraining, hence bypassing the need to completely rebuild the engine.

KantanISR was developed with usability, efficiency and productivity in mind: members simply need to log in to their KantanMT account, go to their main dashboard and submit new training segments using the KantanISR Editor. Adding high quality training data to a KantanMT engine will improve the translation quality of that engine and reduce post-editing requirements.

Using KantanISR

      1. Log in to your KantanMT account using your email and your password.
      2. You will be directed to the ‘My Client Profiles’ page. You will be in the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
      3. If you wish to use KantanISR with a profile other than the ‘Active’ profile, click on that profile, then click on the ‘Training Data’ tab.
      4. You will be directed to the ‘Training Data’ page. Now click on the ‘ISR’ tab.
      5. The ‘KantanISR’ wizard will now pop up on your screen.
      6. Add the source language text in the ‘Source’ text editor fields. Add the corresponding target language text in the ‘Target’ text editor fields.
      7. Click on the ‘Save’ button if you’re happy with your retraining data. If not, click the ‘Cancel’ button.
      8. When you click the ‘Save’ button, a ‘KantanISR successful’ pop-up will appear on your screen. Click the ‘OK’ button and you will be directed back to the ‘Training Data’ page.

Using KantanISR through KantanAPI

Please Note: The KantanAPI is only available to KantanMT members in the Enterprise Plan.

Members can also get the benefit of KantanISR through KantanAPI by using HTTP GET requests. The API expects:

  • A user authorisation token (‘API token’) which can be gotten by clicking on the ‘API’
  • The name of the client profile you wish to use.
  • A source segment and its target segment in the languages specified when the profile was created.
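As a rough illustration of what such a GET request might look like, the sketch below assembles a URL carrying the three inputs listed above. Please note that the endpoint (`https://example.invalid/api/isr`) and the query parameter names (`token`, `profile`, `source`, `target`) are placeholders invented for this example; they are not the real KantanAPI endpoint or parameter names, which you should take from the KantanAPI documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: NOT the real KantanAPI address.
BASE_URL = "https://example.invalid/api/isr"

def build_isr_request(token, profile, source, target):
    """Build a GET URL carrying the API token, profile name and segment pair.

    The parameter names below are assumptions made for illustration only.
    """
    query = urlencode({
        "token": token,      # user authorisation token
        "profile": profile,  # name of the client profile
        "source": source,    # source-language segment
        "target": target,    # corresponding target-language segment
    })
    return f"{BASE_URL}?{query}"

url = build_isr_request("MY-API-TOKEN", "en-fr-legal", "Hello world", "Bonjour le monde")
print(url)
```

Using `urlencode` matters here because segments usually contain spaces, punctuation and accented characters that must be escaped before they can travel in a query string.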

To learn more about KantanISR or get help with KantanMT technologies, please contact us at info@kantanmt.com. Hear from the Development team on why KantanISR increases productivity and efficiency for KantanMT customers.

 

Sue’s Top Tips for Building MT Engines

I’m new to machine translation and one of the things I’ve been doing at KantanMT is learning how to refine training data with a view to building stock engines.

Stock engines are the optional training data provided by KantanMT to improve the performance of your customized MT engine. In this post I’m going to describe the process of building an engine and refining the training data.

The building process on the platform is quite simple. From your dashboard on the website select “My Client Profiles”, where you will find two profiles that have already been set up: a default profile and a sample profile, both of which let you run translation jobs straight away.

To create your own customized profile select ‘New’ at the top of the left-most column. This launches the Client Profile Wizard. Enter the name of your new engine; try to make this something meaningful, or use an easily recognizable naming standard for your profiles. This makes it easier to tell which profile is which when you have more than one.

When you select ‘next’ you will be asked to specify the source and target languages from drop-down menus. The wizard lets you distinguish between different variants of the same language, for example Canadian English or US English. Let’s say we’re translating from Canadian English to Canadian French. If you’re not sure which variant you need, have a quick look at the training data, which will give you the language codes.

The next step gives you an option to select a stock engine from a drop down menu. The stock engines are grouped according to their business area or domain.

You will see a summary of your choices. If you’re happy with them, select ‘create’. Your new engine will be shown in the list of your client profiles. However, while you have created your engine, you haven’t yet built it.

KantanMT Stock Engine Training data
Stock training data available for social and conversational domains on the KantanMT platform.

 

Building Your Engine

Selecting your profile from the list will make it the current active engine.  By selecting the Training Data tab you can upload any additional training data easily by using the drag and drop function. Then select the ‘Build’ option to begin building your engine.

It’s always a good idea to supply as much useful training data as possible. This ‘educates’ the engine in the way your organization typically translates text.

Once the build job has been submitted, you can monitor its progress in the ‘My Jobs’ page.

When the job is completed the BuildAnalytics™ feature is created. This can be accessed by clicking on the database icon to the left of the profile name. BuildAnalytics will give you feedback on the strength of your engine using industry standard scores, as well as details about your engine’s word count. The tabs across the page will give you access to more detail.

The summary tab lets you see the average BLEU, F-Measure and TER scores for the engine, and the pie charts show you a summary of the percentage scores for all segments. For more detail select the respective tabs and use the data to investigate individual segments.

KantanMT BuildAnalytics Feature
KantanBuildAnalytics provides a granular analysis of your MT engine.

 

A Rejects Report is created for every file of Training Data uploaded. You can use this to determine why some of your data is not being used, and improve the uptake rate of your data.

Gap analysis gives you an effective way to improve your engine with relevant glossary or noise lists, which you can upload to future engine builds. By adding these terminology files in either TBX (TermBase eXchange) or XLSX (Microsoft Excel Spreadsheet) formats you will quickly improve the engine’s performance.

The Timeline tab shows you the evolution of your engine over its lifetime. This feature lets you compare the statistics with previous builds, and track all the data you have uploaded. On a couple of occasions, I used the archive feature to revert to a previous build when the engine building process was not going according to plan.

KantanMT Timeline
KantanMT Timeline lets you view your entire engine’s build history.

 

Improving Your Engine

A great way to improve your engine’s performance is to analyze the rejects report for the files with a higher rejection rate. Once you understand the reasons segments are rejected you can begin to address them. For example, an error 104 is caused by a difference in placeholder counts. This can be something as simple as the source language using the % sign where the target language uses the word ‘percent’. In this case a preprocessor rule can be created to fix the problem.
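The ‘percent’ example can be sketched in a few lines of Python. This is only meant to show the reasoning behind the fix, under two assumptions of mine: that the % sign is what gets counted as a placeholder, and that a simple regular expression stands in for a preprocessor rule. It is not written in PEX rule syntax, and the helper names are made up for the illustration.

```python
import re

PLACEHOLDER = re.compile(r"%")  # assumption: '%' is the placeholder being counted

def placeholder_mismatch(source, target):
    """True when source and target contain a different number of placeholders."""
    return len(PLACEHOLDER.findall(source)) != len(PLACEHOLDER.findall(target))

def percent_word_to_sign(text):
    """Stand-in for a preprocessor rule: '5 percent' becomes '5%'."""
    return re.sub(r"(\d+)\s+percent\b", r"\1%", text)

source = "Sales rose by 5%"
target = "Les ventes ont augmenté de 5 percent"

before = placeholder_mismatch(source, target)                       # segment would be rejected
after = placeholder_mismatch(source, percent_word_to_sign(target))  # rule removes the mismatch
print(before, after)
```

Running a check like this over the rejected segments before and after applying a rule is a quick way to see whether the rule actually removes the mismatch.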

KantanMT Rejects Report Error 104
A detailed rejects report shows you the errors in your MT engine.

A PEX rule editor is accessed from the KantanMT drop-down menu. This lets you try out your preprocessor rules and see the effect that they have on the data. I would suggest directly copying and pasting from the rejects report to the test area and applying your PEX rule to ensure you’re precisely targeting the data concerned. You can get instant feedback using this tool.

Once you’re happy with the way the rules work on the rejected data, it’s useful to analyze the rest of the data to see what effect the rules will have. You want to avoid a situation where using a rule resolves 10 rejects, but creates 20 more. Once the rules are refined, copy them to the appropriate files (source.ppx, target.ppx) and upload them with the training data. Remember that the rules will run against the content in the order they are specified.
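Why the order matters is easiest to see with a tiny experiment. The sketch below applies the same two rules in two different orders; the rules are ordinary Python regular expressions that I made up for the demonstration, not the contents of a real source.ppx or target.ppx file.

```python
import re

def apply_rules(text, rules):
    """Apply (pattern, replacement) rules one after another, in list order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# Two illustrative rules: replace the word with the sign, then close the gap before it.
rules_a = [(r"percent", "%"), (r"\s+%", "%")]
rules_b = list(reversed(rules_a))  # same rules, opposite order

result_a = apply_rules("5 percent", rules_a)
result_b = apply_rules("5 percent", rules_b)
print(repr(result_a), repr(result_b))
```

With the first ordering the space before the sign is removed, because the sign exists by the time the second rule runs. With the reversed ordering the space-removal rule runs first, finds nothing to change, and the space is left behind.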

When you rebuild the engine they will be incorporated, and hopefully improve the scores.

Sue’s 3 Tips for Successfully Building MT Engines

  1. Name your profiles clearly – When you are using a number of profiles simultaneously knowing what each one is (Language pair/domain) will make it much easier as you progress through the building process.
  2. Take advantage of BuildAnalytics – Use the insights and Gap analysis features to give you tips on improving your engine. Listening to these tips can really help speed up the engine refinement process.
  3. The PEX Rule Editor is your friend – Don’t be afraid to try out creating and using new PEX rules. If things go south you can always go back to previous versions of your engine.

My internship at KantanMT.com really opened my eyes to the world of language services and machine translation. Before joining the team I knew nothing about MT or the mechanics behind building engines. This was a great experience, and being part of such a smoothly run development team was an added bonus that I will take with me when I return to ITB to finish my course.

About Sue McDermott

Sue is currently studying for a Diploma in Computer Science from ITB (Institute of Technology Blanchardstown). Sue joined KantanMT.com on a three-month internship. She has a degree in English Literature and a background in business systems, and has also been a full-time mum for the last 17 years.

Email info@kantanmt.com if you have any questions or want more information on the KantanMT platform.

Language Industry Interview: KantanMT speaks with Maxim Khalilov, bmmt Technical Lead

This year, both KantanMT and its preferred Machine Translation supplier, bmmt, a progressive Language Service Provider with an MT focus, exhibited side by side at the tekom Trade Fair and tcworld conference in Stuttgart, Germany.

As a member of the KantanMT preferred partner program, bmmt works closely with KantanMT to provide MT services to its clients, which include major players in the automotive industry. KantanMT was able to catch up with Maxim Khalilov, technical lead and ‘MT guru’ to find out more about his take on the industry and what advice he could give to translation buyers planning to invest in MT.

KantanMT: Can you tell me a little about yourself, and how you got involved in the industry?

Maxim Khalilov: It was a long and exciting journey. Many years ago, I graduated from the Technical University in Russia with a major in computer science and economics. After graduating, I worked as a researcher for a couple of years in the sustainable energy field. But even then I knew I still wanted to come back to the IT industry.

In 2005, I started a PhD at Universitat Politècnica de Catalunya (UPC) with a focus on Statistical Machine Translation, which was a very new topic back then. In 2009, after successfully defending my thesis, I moved to Amsterdam where I worked as a post-doctoral researcher at the University of Amsterdam and later as an R&D manager at TAUS.

Since February 2014, I’ve been a team lead at bmmt GmbH, which is a German LSP with a strong focus on machine translation.

I think my previous experience helped me to develop a deep understanding of the MT industry from both academic and technical perspectives.  It also gave me a combination of research and management experience in industry and academia, which I am applying by building a successful MT business at bmmt.

KMT: As a successful entrepreneur, what were the three greatest industry challenges you faced this year?

MK: This year has been a challenging one for us from both technical and management perspectives. We started to build an MT infrastructure around MOSES practically from scratch. MOSES was developed by academia and for academic use, and because of this we immediately noticed that many industrial challenges had not yet been addressed by MOSES developers.

The first challenge we faced was that the standard solution does not offer a solid tag processing mechanism – we had to invest in customizing the MOSES code to make it compatible with what we wanted to achieve.

The second challenge we faced was that many players in the MT market are constantly talking about the lack of reliable, quick and cheap quality evaluation metrics. BLEU-like scores unfortunately are not always applicable for real world projects. Even if they are useful when comparing different iterations of the same engines, they are not useful for cross language or cross client comparison.

Interestingly, the third problem is psychological in nature: post-editors are not always happy to post-edit MT output for many reasons, including of course the quality of MT. However, in many situations the problem is that MT post-editing requires a different skillset in comparison with ‘normal’ translation, and it will take time before translators adapt fully to post-editing tasks.

KMT: Do you believe MT has a say in the future, and what is your view on its development in global markets?

MK: Of course, MT will have a big say in the language services future. We can see now that the MT market is expanding quickly as more and more companies are adopting a combination TM-MT-PE framework as their primary localization solution.

“At the same time, users should not forget that MT has its clear niche”

I don’t think a machine will ever be able to translate poetry, for example, but at the same time it does not need to – MT has proved to be more than useful for the translation of technical documentation, marketing material and other content which represents more than 90% of translators’ daily workload worldwide.

Looking at the near future I see that the integration of MT and other cross language technologies with Big Data technologies will open new horizons for Big Data making it a really global technology.

KMT: How has MT affected or changed your business models?

MK: Our business model is built around MT; it allows us to deliver translations to our customers quicker and cheaper than without MT, while at the same time preserving the same level of quality and guaranteeing data security. We not only position MT as a competitive advantage when it comes to translation, but also as a base technology for future services. My personal belief, which is shared by other bmmt employees, is that MT is a key technology that will make our world different – where translation is available on demand, when and where consumers need it, at a fair price and at its expected quality.

KMT: What advice can you give to translation buyers, interested in machine translation?

MK: MT is still a relatively new technology, but at the same time there are already a number of best practices available for new and existing players in the MT market. In my opinion, the four key points for translation buyers to remember when thinking about adopting machine translation are:

  1. Don’t mix it up with TM – While TMs mostly support human translators by storing previously translated segments, MT translates complete sentences automatically. The main difference lies in new words and phrases, which are not stored in a TM database.
  2. There is more than one way to use MT – MT is flexible: it can be a productivity tool that enables translators to deliver translations faster with the same quality as in the standard translation framework. Or MT can be used for ‘gisting’ without post-editing at all – something that many translation buyers forget about, but which can be useful in many business scenarios. A good example of this type of scenario is the integration of MT into chat widgets for real-time translation.
  3. Don’t worry about quality – Quality Assurance is always included in the translation pipeline and we, like many other LSPs, guarantee a desired level of quality for all translations, independently of how the translations were produced.
  4. Think about time and cost – MT enables translation delivery quicker and cheaper than without MT.

A big ‘thank you’ to Maxim for taking time out of his busy schedule to take part in this interview, and we look forward to hearing more from Maxim during the KantanMT/bmmt joint webinar ‘5 Challenges of Scaling Localization Workflows for the 21st Century’ on Thursday November 20th (4pm GMT, 5pm CET and 8am PST).

Register here for the webinar or to receive a copy of the recording. If you have any questions about the services offered from either bmmt or KantanMT please contact:

Peggy Lindner, bmmt (peggy.lindner@bmmt.eu)

Louise Irwin, KantanMT (louisei@kantanmt.com)

Scalability or Quality – Can we have both?

The ‘quality debate’ is old news and the conversation, which is now heavily influenced by ‘big data’ and ‘cloud computing’, has moved on. Instead, it is focusing on the ability to scale translation jobs quickly and efficiently to meet real-time demands.

Translation buyers expect a system or workflow that provides high quality, fit-for-purpose translations. Because of this, Language Service Providers (LSPs) have worked tirelessly, perfecting their systems and orchestrating the use of Translation Memories (TM) within well-managed workflows, alongside the professionalization of the translation industry. Quality is now a given in the buyer’s eyes.

What is the translation buyers’ biggest challenge?

The translation buyers’ biggest challenge now is scale – scaling their processes, their workflows and supply chains. Of course, the caveat is that they want scale without jeopardizing quality! They need systems that are responsive and transparent, and that scale gracefully in step with their corporate growth and language expansion strategy.

Scale with quality! One without the other is as useless as a wind-farm without wind!

What makes machine translation better than other processes? Looking past the obvious automation of the localization workflow, the one thing that MT offers above all other translation methods is the ability to combine automation and scalability.

KantanMT recognizes this and has developed a number of key technologies to accelerate the speed of on-demand MT engines without compromising quality.

  • KantanAutoScale™ is an additional divide and conquer feature that lets KantanMT users distribute their translation jobs across multiple servers running in the cloud.
  • Engine Optimization technology means KantanMT engines now operate 5-10 times faster, reducing the amount of memory and CPU power needed so MT jobs can be processed faster and more efficiently when using features like KantanAutoScale.
  • API optimization: KantanMT engineers went back to basics, reviewing and refining the system, which enabled users to achieve performance improvements of 50-100% in translation speed. This meant translation jobs that took five hours can now be completed in less than one hour.
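The ‘divide and conquer’ idea behind distributing a job can be sketched as follows: split the segments into batches, hand each batch to a separate worker, and stitch the results back together in the original order. This is a generic illustration using Python threads on one machine, not a description of how KantanAutoScale is implemented; `translate_batch` is a hypothetical stand-in for a real call to an MT server.

```python
from concurrent.futures import ThreadPoolExecutor

def translate_batch(batch):
    """Hypothetical stand-in for sending one batch to an MT server."""
    return [segment.upper() for segment in batch]

def split_into_batches(segments, batch_count):
    """Divide the job into at most batch_count roughly equal batches."""
    if not segments:
        return []
    size = -(-len(segments) // batch_count)  # ceiling division
    return [segments[i:i + size] for i in range(0, len(segments), size)]

def translate_in_parallel(segments, workers=3):
    """Translate batches concurrently and return results in the original order."""
    batches = split_into_batches(segments, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input batches.
        results = list(pool.map(translate_batch, batches))
    return [line for batch in results for line in batch]

translated = translate_in_parallel(["one", "two", "three", "four", "five"])
print(translated)
```

Keeping the output order identical to the input order is the important design point: however the work is spread across workers, each translated segment still has to line up with its source segment.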

Scalability is the key to advancement in machine translation, and considering the speed at which people are creating and digesting content we need to be able to provide true MT scalability to all language pairs for all content.

KantanMT’s Tony O’Dowd and bmmt’s Maxim Khalilov will discuss the scalability challenge and more in a free webinar for translation buyers, ‘5 Challenges of Scaling Localization Workflows in the 21st Century’, on Thursday November 20th at 4pm GMT, 5pm CET, 8am PST.

KantanMT and bmmt webinar presenters Tony O'Dowd and Maxim Khalilov

To hear more about optimizing or improving the scalability of your engine please contact Louise Irwin (louisei@kantanmt.com).

5 Reasons to Read the TAUS Review

Earlier this month, TAUS, a well-known industry think tank and resource centre for the language services industry, launched its quarterly publication, the TAUS Review. The new magazine with a mission is dedicated to:

“Making translation technology more prominent and mainstream throughout the globe to break language barriers and improve worldwide communication.”

KantanMT TAUS Review

KantanMT identified five key reasons that make the review an invaluable asset to any translation and localization professional. It’s thanks to these reasons that KantanMT will distribute the TAUS Review right here on the KantanMT blog.

1. Global Translation Industry news 

TAUS has mobilized writers from across the globe (Africa, the Americas, Asia and Europe) to discuss different trends and technologies in the language services industry. These articles can become a great reference tool for those interested in how language technologies are advancing. In this issue, Andrew Joscelyne reports from Europe, Brian McConnell gives updates from the Americas, Asian trends are covered by Mike Tian-Jian Jiang, and Amlaku Eshetie reports from the southern hemisphere: Africa.

2. Research and Reports 

Recent research in MT is pretty exciting stuff. Those who consider themselves language industry veterans, like Luigi Muzii, remember a time when machine translation predictions were overestimated. But what was once an unrealistic assumption is now changing as “neural networks and big data” are bringing a new frontier to natural language processing. Luigi Muzii gives an overview of the ‘research perspective’, highlighting current trends in research and linking to some interesting ACL winning papers, which introduce MT decoders that do not need linguistic resources.

3. Unique Insights

TAUS Review offers unique insights into the translation industry by incorporating use cases and perspectives from four different personas; the researcher, the journalist, the translator and the language expert, each one with their own different views and opinions on the importance of global communication and breaking down language barriers. In this issue, Jost Zetzsche, Nicholas Ostler, Lane Greene, and Luigi Muzii share their perspectives.

KantanMT especially enjoyed  Jost Zetzsche’s view of making “machine translation translator-centric” where the translator is at the centre of the MT workflow. One of the examples he lists for making this possible, “dynamic improvements in MT systems” is available to KantanMT clients.

4. Language Technology Community 

The opinions and thoughts that come from each contributor are neatly wrapped in one accessible place, and when coupled with the directory of distributors, events and webinars make a very useful resource for any small business or language technology enthusiast. Keep an eye out for some very interesting post-editing and MT quality webinars planned for November.

5. It’s Free! 

Holding true to the concept of sharing information and making translation technology more prominent and mainstream throughout the globe, the review is available quarterly and completely free for its readers, making it accessible to anyone, anywhere regardless of their budget.

Scroll to the end of the page to find the TAUS Review on the KantanMT blog.