All your Burning Questions Answered! How Machine Translation Helps Improve Translation Productivity (Part I)

Part I

We had so many questions during the Q&A in our last webinar session, ‘How to Improve Translation Productivity’ by the KantanMT Professional Services team, that we decided to split the answers into two blog posts. So, if you don’t find your question answered here, check out our blog next week for the remaining answers.

The Internet today is experiencing what is generally referred to as a ‘content explosion’! In this fast-paced world, businesses have to strive harder and do more to stay ahead of the game – especially if they are a global business or have globalization aspirations. One fool-proof way for a business to go global successfully is through effective localization. Yet the huge amount of content available online makes human translation of everything almost impossible. The only viable option in today’s competitive online environment is the use of Machine Translation (MT).

On Wednesday 21st October, Tony O’Dowd, Chief Architect of KantanMT.com, and Louise Faherty, Technical Project Manager at KantanMT, presented a webinar where they showed how Language Service Providers (LSPs), as well as enterprises, can improve the translation productivity of their teams, manage post-editing effort and easily schedule projects with powerful MT engines. Here is a link to the recording of the webinar on YouTube along with a transcript of the Q&A session.

The answers below are not recorded verbatim and minor edits have been made to make the text more readable.

Question: Do you have clients doing Japanese to English MT? What are the results, and how did you get them? (i.e., do you pre-process the Japanese?)

Answer (Tony O’Dowd): English to Japanese Machine Translation (MT) has indeed always posed a challenge in the MT industry. So is it possible to build a high-quality, high-fidelity MT system for this language combination? Well, there have been quite a few developments recently that improve the prospect of building effective engines in this language combination. For example, one of the latest changes we made on the KantanMT platform to improve MT quality is the use of new and improved reordering models, which make translation from English to Japanese and Japanese to English much smoother, so we deliver higher-quality output. In addition to that, higher-quality training data sets are now available for this language pair than a couple of years ago, when I started building English to Japanese engines. Back then it was really challenging. It still requires some effort to build English to Japanese MT engines, but the fact that there’s more content available in these languages makes it slightly easier for us to build high-quality engines.

We are also developing example-based MT for these engines, and so far this is showing encouraging signs of improving quality for this language pair. However, we have not started deploying this development on the platform yet.

KantanMT note: For more insights into how you can prepare high-quality training data, read these tips shared by Tony O’Dowd, and Selçuk Özcan, co-founder of Transistent Language Automation Services during the webinar ‘Tips for Preparing Training Data for High Quality MT.’

Question: Have you got a webinar recorded or scheduled, where we could see how the system works hands-on?

Answer (Tony O’Dowd): If you go on to the KantanMT website, we have video links on the product features pages. So you can actually watch an explanation video while you are looking at the component.

We work in a very visual environment, and we think videos are a great way of explaining how the platform works. And, if you go on to the website, on the bottom left corner of the page, you will find our YouTube channel, which contains videos on all sorts of topics, including how to build your first engine, how to translate your first document and how to improve the output of your engines.

If you click on the Resources menu on our site, you can access a number of tutorials that will talk you through the basics of Statistical Machine Translation Systems. In other words, explore the website and you should find what you need.


Question: Do you provide any Post-Editing recommendations or standards for standardising the PE process? You said translation productivity rose to 8k words per day – this is only PE, correct?

Answer (Tony O’Dowd): I will take the second question first! The 8,000 words per day is the Post-Editing (PE) rate, yes. It is not the raw translation rate. In Machine Translation, everything comes out pretranslated. So this number refers to the Post-Editing effort – like insertions, deletions, substitution of words, and so on that you need to do to get the content to publishable quality.
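To make the idea of post-editing effort a little more concrete, here is a minimal Python sketch that counts the smallest number of word insertions, deletions and substitutions needed to turn raw MT output into its post-edited version. It is only an illustration of the concept and not how the KantanMT platform measures effort; the function name and the sample sentences are hypothetical.

```python
def post_edit_operations(mt_output: str, post_edited: str) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the raw MT output into the post-edited text."""
    source = mt_output.split()
    target = post_edited.split()
    # previous[j] = edits needed to turn the source words read so far into target[:j]
    previous = list(range(len(target) + 1))
    for i, src_word in enumerate(source, start=1):
        current = [i]  # deleting all i source words gives an empty target
        for j, tgt_word in enumerate(target, start=1):
            substitution = previous[j - 1] + (src_word != tgt_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Hypothetical example segment
raw = "the engine translate document quickly"
edited = "the engine translates the document quickly"
edits = post_edit_operations(raw, edited)
print(edits, "edits for", len(edited.split()), "post-edited words")
```

Splitting on whitespace is a simplification that suits space-delimited languages; a language such as Japanese would need a proper tokeniser first.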

Louise Faherty: What we recommend to our clients is that when it comes to PE, they should try to use MT. A lot of translators who are new to using MT will try and translate manually, which is a natural tendency, of course. But what we advise our clients is to copy and paste the translation (MT) in the engine and use the MT. The more you use MT and the more you Post-Edit, the better your engine will become.

Tony O’Dowd: I will add something to Louise Faherty’s comments there. The best example of PE recommendations that I have come across is provided by a group called TAUS. They are at the centre of educating the industry on how to develop proficiency in PE.

Subscribe to TAUS YouTube channel here.

Question: What do ‘PPX’ and ‘PEX’ stand for (as abbreviations)?

Answer (Louise Faherty and Tony O’Dowd): PEX stands for Post-Editing Automation. PEX allows you to take the output of an MT engine and dynamically alter it. When would you need to use PEX? Suppose there is a situation where your engine is repeating the same error over and over again. What you can do in such cases is write a PEX file (developed in the GENTRY programming language). This allows the engine to look for patterns in its output and change them dynamically.

For example, one of our French clients did not want to have a space preceding a colon mark in the output of their MT (because this was one of their typographical standards and repeated throughout the content). So we wrote a PEX rule that forced a stylistic change in the output of the engine. This enabled the client to reduce the number of Post-Edits substantially.
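The kind of rule described above can be sketched with a regular expression. The snippet below is written in Python rather than GENTRY, whose syntax is not shown in this post, so treat it purely as an illustration of the idea; the pattern and function name are our own assumptions.

```python
import re

# Hypothetical rule: drop ordinary, non-breaking or narrow non-breaking
# spaces that appear directly before a colon.
SPACE_BEFORE_COLON = re.compile(r"[ \u00a0\u202f]+:")


def remove_space_before_colon(segment: str) -> str:
    """Apply the stylistic rule to one segment of MT output."""
    return SPACE_BEFORE_COLON.sub(":", segment)


print(remove_space_before_colon("Remarque : appuyez sur le bouton."))
```

Because the rule runs over every segment the engine produces, the post-editor never has to make this correction by hand.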

PPX stands for Preprocessor automation. You can use PPX files to normalise or improve the training data. It is based on our GENTRY programming language, which is available to all our clients for free.

In short then, PPX is for your training data, while PEX is for the actual raw output of your engine.
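As a rough idea of what normalising training data can involve, here is a short Python sketch. It is not a PPX file and does not use GENTRY; the function name and the particular normalisation steps are assumptions chosen for illustration.

```python
import re
import unicodedata


def normalise_segment(segment: str) -> str:
    """Return a cleaned-up copy of one training segment."""
    # Unify composed and decomposed Unicode forms so identical text compares equal.
    text = unicodedata.normalize("NFC", segment)
    # Collapse any run of whitespace (tabs, line breaks, non-breaking spaces)
    # into a single space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalise_segment("  Click\tthe\u00a0 Save  button \n"))
```

Applying the same normalisation to both sides of every training pair helps keep otherwise identical segments from being treated as different ones.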

For more questions and answers, stay tuned for the next part of this post!

Cutting PEMT Times

In our last post, The Rise of PEMT, we discussed what automated post-editing means and why it is becoming more and more popular among Language Service Providers (LSPs). One of the most important things to remember about the post-editing process is that the less of it, the better.

In this post, we are going to look at some of the ways that you can keep your post-editing times to a minimum. This post is based on post-editing guidelines that have been developed by TAUS with, among others, KantanMT’s partners DCU and CNGL. A link to these guidelines is available at the end of this post.

7 steps to reducing your post-editing times

1. Train your KantanMT engine to improve translations
The quality of a KantanMT engine’s output increases as it is re-trained. This means running high-quality training data through it and re-training using post-edited translations. The more you train your KantanMT engine with good training data, the more accurate your engine’s output will be. All of this means less post-editing time.

2. Make sure your training data is high quality
This rule stems directly from the previous point: a KantanMT engine’s accuracy will not improve if it is trained with poor-quality data. Poor-quality training data can be recognised by a number of signs, such as poor writing style, unaligned segments, and data that is not specific to the client’s domain. Keep your training data clean and well-written.
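Some of these checks can be automated before training. The Python sketch below drops segment pairs that are empty, exactly duplicated, or so different in length that they are probably misaligned. The function name and the length-ratio threshold of 3.0 are assumptions for illustration, not KantanMT settings, and the word count assumes space-delimited languages.

```python
def filter_training_pairs(pairs, max_length_ratio=3.0):
    """Keep (source, target) pairs that pass some basic sanity checks."""
    seen = set()
    kept = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # one side of the pair is empty
        src_len, tgt_len = len(source.split()), len(target.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue  # very different lengths often indicate misalignment
        if (source, target) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((source, target))
        kept.append((source, target))
    return kept


# Hypothetical sample data
sample = [
    ("Hello world", "Bonjour le monde"),
    ("", "Vide"),
    ("Hello world", "Bonjour le monde"),
    ("Yes", "Oui bien sûr je suis tout à fait d'accord"),
]
print(filter_training_pairs(sample))
```

Checks like these catch only the mechanical problems; writing style and domain relevance still need a human eye.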

3. Writing style/Pre-editing
It is very important to make sure that documents are well written and grammatically correct before they are translated. That means you should avoid misspellings and ambiguities, and make sure that sentences are grammatically complete. A Machine Translation engine does not correct writing errors, so make sure that these mistakes are corrected before the source text is translated. See our previous blogs, Style Guides in MT and How to Write for MT, for more information on this topic.

4. Terminology management
Ensure that terminology management is integrated “across source text authoring, Machine Translation and TM systems” (TAUS). Terminology management means defining terms and their rules of usage, and implementing these definitions and rules throughout a document. This safeguards a consistent level of accuracy and legibility across translation outputs.
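One simple way to picture a terminology check is a lookup against a term base: whenever a source term appears, the approved target term should appear in the translation. The Python sketch below assumes a hypothetical term base held in a dictionary and uses plain substring matching, which is cruder than what a real terminology tool would do.

```python
def find_terminology_issues(source, target, termbase):
    """Return (source term, expected target term) pairs missing from the target."""
    source_lower, target_lower = source.lower(), target.lower()
    return [
        (src_term, tgt_term)
        for src_term, tgt_term in termbase.items()
        if src_term.lower() in source_lower and tgt_term.lower() not in target_lower
    ]


# Hypothetical English-to-French term base and segment
termbase = {"post-editing": "post-édition", "engine": "moteur"}
issues = find_terminology_issues(
    "Post-editing improves the engine.",
    "La post-édition améliore la machine.",
    termbase,
)
print(issues)
```

Running the same check over source authoring, MT output and TM content is one way to keep term usage consistent across all three.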


5. Set realistic timelines
Make sure that you assess the quality of the raw Machine Translation output before agreeing upon a price and the size of the translation order. Naturally, the poorer the output, the more post-editing time will be required.

6. Decide upon a quality standard of post-editing
For some clients, an understandable document is all that is required. This means that stylistic issues are generally ignored but the meaning of the document is still accurately conveyed. For many clients, however, the content must be perfect, and this requires a degree of post-editing that also corrects stylistic issues. O’Brien et al., quoting Allen, say that the standard of post-editing output is determined by

•    “User Requirements
•    Volume
•    Quality Expectations
•    Turn-Around Time
•    Perishability
•    Text Function”

Remember to agree upon a post-editing standard with your client. The lower the expected standard of output, the less time consuming the post-editing process should be.

7. Use KantanMT’s Post-Editing Automation technology (PEX)
In our last post, The Rise of PEMT, we discussed the benefits for post-editors in using automated post-editing within their workflow. Here is a quick reminder:

A document has been translated by a KantanMT engine but there is a word that begins with a lower case letter which should begin with a capital letter. This mistake has been repeated throughout the document several hundred times. Rather than a post-editor having to manually find and correct each occurrence of this error, KantanMT’s PEX technology can find and correct the mistake with its rule system. You can find out more about PEX by clicking here.

This means that post-editors can save time and turn their attention to fixing more complex stylistic errors. All of this results in faster project completion times and lower costs.
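To illustrate the capitalisation scenario, here is a small Python sketch of such a rule. It is not PEX or GENTRY syntax; the function name and the sample term are hypothetical, and the point is simply that a single pattern can correct every occurrence in one pass and report how many it fixed.

```python
import re


def capitalise_term(text: str, term: str):
    """Capitalise the first letter of every whole-word occurrence of `term`.

    Returns the corrected text and the number of corrections made.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b")
    corrected = term[0].upper() + term[1:]
    return pattern.subn(lambda match: corrected, text)


fixed_text, count = capitalise_term(
    "the internet and internetwork, internet.", "internet"
)
print(fixed_text, count)
```

The word-boundary markers keep the rule from touching longer words that merely contain the term.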

In our next post, we will look at guidelines to achieving both understandable post-editing output and high quality post-editing output.

TAUS Machine Translation Post-Editing Guidelines

You can find out more about KantanMT by visiting KantanMT.com and signing up for our free 14-day trial.