Post-Editing Machine Translation

Statistical Machine Translation (SMT) has many uses – from the translation of User Generated Content (UGC) to Technical Documents, to Manuals and Digital Content. While some use cases may only need a ‘gist’ translation without post-editing, others will need a light to full human post-edit, depending on the usage scenario and the funding available.

Post-editing is the process of ‘fixing’ Machine Translation output to bring it closer to a human translation standard. This is, of course, a very different process from carrying out a full human translation from scratch, and that’s why it’s important to fully train the staff who will carry out this task.

Training will make sure that post-editors fully understand what is expected of them when asked to complete one of the many post-editing type tasks. Research (Vasconcellos – 1986a:145) suggests that post-editing is a honed skill which takes time to develop, so remember your translators may need some time to reach their greatest post-editing productivity levels. KantanMT works with many companies who are post-editing at a rate over 7,000 words per day, compared to an average of 2,000 per day for full human translation.
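To see what those rates can mean for scheduling, here is a minimal sketch that converts a daily word rate into turnaround days. The 7,000 and 2,000 words-per-day figures come from the paragraph above (7,000 is treated as a floor, since the text says "over"). The 70,000-word project size and the function name `turnaround_days` are assumptions made up for this example.

```python
import math

def turnaround_days(project_words: int, words_per_day: int) -> int:
    """Return the whole working days one linguist needs for a project."""
    return math.ceil(project_words / words_per_day)

PROJECT_WORDS = 70_000     # hypothetical project size
POST_EDIT_RATE = 7_000     # words per day, from the text
TRANSLATION_RATE = 2_000   # words per day, from the text

post_edit_days = turnaround_days(PROJECT_WORDS, POST_EDIT_RATE)
translation_days = turnaround_days(PROJECT_WORDS, TRANSLATION_RATE)
speed_up = POST_EDIT_RATE / TRANSLATION_RATE

print(f"Post-editing: {post_edit_days} days")
print(f"Full human translation: {translation_days} days")
print(f"Throughput ratio: {speed_up:.1f}x")
```

Real projects will differ, because the rate a post-editor reaches depends on the quality of the engine output and on how much training they have had.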

Types of Training: The Translation Automation User Society (TAUS) is now holding online training courses for post-editors.


Post-editing Levels

Post-editing quality levels vary greatly and depend largely on the client or end-user. It’s important to get an exact understanding of user expectations and to manage these expectations throughout the project.

Typically, users of Machine Translation will ask for one of the following types of post-editing:

  • Light post-editing
  • Full post-editing

The following diagram gives a general outline of what is involved in both light and full post-editing. Remember, however, that the effort needed to meet a given level of quality will be determined by the output quality your engine is able to produce.

[Diagram: light and full post-editing of machine translation]

Generally, MT users carry out productivity tests before they begin a project. These tests determine how effective MT is for the language pair in a particular domain, and how productively the post-editors can edit the output. Productivity tests will also help you determine the potential Return on Investment of MT and the turnaround time for projects. It is a good idea to repeat the tests periodically to understand how your MT engine is developing and improving. (Source: TAUS)

You might also develop a tailored approach to suit your company’s needs; however, the above diagram offers some nice guidelines to start with. Please note that a well-trained MT engine can produce near-human translations, and a light touch-up might be all that is required. It’s important to examine the quality of the output with post-editors before setting productivity goals and post-editing quality levels.


Post-Editor Skills

In recent years, post-editing skills have become much more of an asset and sometimes a requirement for translators working in the language industry. Machine Translation has grown considerably in popularity and the demand for post-editing services has grown in line with this. TechNavio predicted that the market for Machine Translation will grow at a compound annual growth rate (CAGR) of 18.05% until 2016, and the report attributes a large part of this rise to “the rapidly increasing content volume”.

While the task of post-editing is markedly different to human translation, the skill set needed is almost on par.

According to Johnson and Whitelock (1987), post-editors should:

  • Be expert in the subject area, the text type and the contrastive language
  • Have a perfect command of the target language

It is also widely accepted that post-editors who have a favourable perception of Machine Translation perform better at post-editing tasks than those who do not look favourably on MT.

How to improve Machine Translation output quality

Pre-editing

Pre-editing is the process of adjusting text before it has been Machine Translated. This includes fixing spelling errors, formatting the document correctly and tagging text elements that must not be translated. Using a pre-processing tool like KantanMT’s GENTRY can save a lot of time by automating the correction of repetitive errors throughout the source text.
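The pre-editing steps above can be sketched in a few lines of Python. This is only an illustration of the idea and does not show how GENTRY works: the correction list, the `<dnt id=N/>` placeholder format and the function names `pre_edit` and `restore` are all assumptions made up for this example.

```python
import re

# Hypothetical list of recurring source-text errors and their corrections.
CORRECTIONS = {
    r"\bteh\b": "the",
    r"\brecieve\b": "receive",
}

# Assumed pattern for text that must not be translated: URLs and e-mail addresses.
DO_NOT_TRANSLATE = re.compile(r"https?://\S+|\b[\w.]+@[\w.]+\.\w+\b")

def pre_edit(text: str) -> tuple[str, list[str]]:
    """Fix known misspellings and swap protected spans for placeholders."""
    protected: list[str] = []

    def protect(match: re.Match) -> str:
        protected.append(match.group(0))
        return f"<dnt id={len(protected) - 1}/>"

    # Protect first, so that corrections never alter a URL or an address.
    text = DO_NOT_TRANSLATE.sub(protect, text)
    for pattern, replacement in CORRECTIONS.items():
        text = re.sub(pattern, replacement, text)
    # Collapse runs of spaces as a simple formatting clean-up.
    text = re.sub(r" {2,}", " ", text)
    return text, protected

def restore(text: str, protected: list[str]) -> str:
    """Put the protected spans back after translation."""
    return re.sub(r"<dnt id=(\d+)/>", lambda m: protected[int(m.group(1))], text)

cleaned, protected = pre_edit("You will recieve  teh file at http://example.com/teh today.")
print(cleaned)
print(restore(cleaned, protected))
```

Protecting the URL before applying corrections matters here: the misspelling "teh" inside the URL is left untouched, while the same misspelling in the running text is fixed.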

More pre-editing Steps:

Writing Clear and Concise Sentences: Shorter, unambiguous segments (sentences) are processed much more effectively by MT engines. Also, when pre-editing or writing for MT, make sure that each sentence is grammatically complete (it begins with a capital letter, has at least one main clause, and ends with punctuation).

Using the Active Voice: MT engines work impressively on text that is clear and unambiguous. That’s why using the active voice, which cuts out vagueness and ambiguity, can result in much better MT output.

There are many pre-editing steps you can carry out to produce better MT output. Also, keep writing style in mind when developing content for Machine Translation, to cut the amount of pre-editing required. For tips on writing for MT, see How to Write for Machine Translation.

For more information about any of KantanMT’s post-editing automation tools, please contact: Gina Lawlor, Customer Relationship Manager (ginal@kantanmt.com).

Machine Translation Style Guides


In our last post, How to Write for MT, we showed you how you can improve your Machine Translation output by keeping to a few simple rules when writing content for Machine Translation.

Companies often present these rules to their writers within style guides. A style guide is a set of rules/standards that a company’s writers adhere to when writing and designing documents. Style guides are essential for companies who have a team of writers producing their documentation.

When writing for Machine Translation, a clear and concise style guide can help to cut translation and post-editing times, and perhaps most importantly – cost. In this blog, we are going to look at the benefits of using style guides, and how style guides can be useful when writing for Machine Translation.

Consistent formatting…
Companies can use a style guide to make sure that their documentation is formatted consistently. A style guide can tell writers how to format every feature of a document including the title page, table of contents, typeface, paragraphs, graphics etc. This means that writers just have to worry about the quality of their content. Good quality content, as we have discussed in earlier posts, produces better Machine Translation outputs.

Recycling content…
A style guide is very useful for generating content that can be recycled throughout a company’s documentation. If there is a topic that writers must cover repeatedly, a company style guide can tell writers exactly how to write and structure this topic. Writers can then copy and paste this content into the relevant document segments.

Follow these three steps to build your KantanMT engine:
1. Gather your training data
2. Build the KantanMT engine
3. Translate your clients’ files

The list above is one example: we used the same quote in our last blog, How to Write for MT. Not only does recycling content save time for your writers, but a Machine Translation engine can also translate this content accurately, because it may process it hundreds of times.

Post-editing…
A style guide can save a great deal of time when editing documents post-translation. For example, if there is an error within a segment that a company’s writers have recycled throughout a document, a post-editor can quickly correct the issue using KantanMT’s Post-Editing Automation (PEX) technology. Because the segment has been recycled, the post-editor knows that he/she does not have to manually check each individual use of the segment within the Machine Translation output for errors.
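To make the idea concrete, here is a minimal sketch of applying one correction to every occurrence of a recycled segment. It is not a description of how PEX works: the function name `propagate_fix`, the sample segments and the exact-match rule are assumptions made up for this example.

```python
def propagate_fix(segments: list[str], wrong: str, corrected: str) -> tuple[list[str], int]:
    """Replace every segment equal to `wrong`; return the new list and the number changed."""
    fixed = [corrected if segment == wrong else segment for segment in segments]
    changed = sum(1 for segment in segments if segment == wrong)
    return fixed, changed

# Hypothetical MT output in which a recycled segment carries the same error twice.
mt_output = [
    "Build the engine KantanMT.",
    "Gather your training data.",
    "Build the engine KantanMT.",
]

fixed, changed = propagate_fix(
    mt_output,
    wrong="Build the engine KantanMT.",
    corrected="Build the KantanMT engine.",
)
print(f"Corrected {changed} segments")
```

Exact matching only works because the source segment was recycled word for word, which is the behaviour a style guide encourages.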

Cost savings…
All of the benefits contribute to this final one – and that’s saving money! An effective style guide that writers adhere to can save hours of labour for both writers and post-editors. This means that high quality Machine Translations are ready for publication at a faster and more affordable rate.

So what’s our advice?

Well, if you don’t have a style guide in place – get one in place. Machine Translation is a process and the quality of one step affects the quality of the next. Consistent and well-structured data will produce consistently accurate Machine Translations.

To find out more about Machine Translation and how well-written documents can improve your Machine Translation outputs, go to KantanMT.com and sign up for our 14-day free trial!

How to Write for Machine Translation

Writing documentation that a Machine Translation engine can successfully parse is essential to producing better yet more affordable Machine Translations.

Thankfully, this is something that you can do by following a few simple rules when writing your own documentation. The most important thing to remember is to keep your writing clear and concise. The simpler your writing is, the easier it is for a Machine Translation engine to read it.

In this blog, we will take a closer look at how to produce clear and concise documentation.

Many organisations use “controlled language” to write for translation. Controlled language is much stricter than our everyday writing style. The aim of controlled language is to produce coherent and comprehensible documentation that is easy for a Machine Translation engine to read. Controlled language is particularly useful when writing instructional content. Uwe Muegge, a leading figure in the translation industry, has developed the Clout™ rule set; Clout stands for Controlled Language Optimised for Uniform Translation, and this blog references a number of rules within this set. This blog also references rules within Strunk and White’s The Elements of Style, which are useful for all content types.

Rule 1. Avoid misspellings
The simplest and most basic rule of all! A Machine Translation engine cannot accurately translate a misspelled word. Make sure that you proofread your data before running it through your translation engine.

Rule 2. Keep your sentences short and concise
Avoid conjunctions (and, but, which, etc.) and avoid using more than one clause when possible. Keep your sentences shorter than 25 words. Ensure that each sentence is grammatically complete (it begins with a capital letter, has at least one main clause, and ends with punctuation).
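Parts of Rule 2 are easy to check automatically. The sketch below flags sentences that reach the 25-word limit, do not start with a capital letter, or lack ending punctuation. The function name `check_sentence` and the set of accepted ending marks are assumptions for this example, and the check for a main clause is left out because it needs real grammatical analysis.

```python
MAX_WORDS = 25  # limit taken from Rule 2: sentences should be shorter than this

def check_sentence(sentence: str) -> list[str]:
    """Return the Rule 2 problems found in one sentence (an empty list means none)."""
    problems = []
    stripped = sentence.strip()
    if len(stripped.split()) >= MAX_WORDS:
        problems.append("25 words or more")
    if not stripped[:1].isupper():
        problems.append("does not begin with a capital letter")
    if not stripped.endswith((".", "?", "!")):
        problems.append("no ending punctuation")
    return problems

for sentence in ["Build the KantanMT engine.", "build the engine"]:
    print(sentence, "->", check_sentence(sentence) or "OK")
```

Words are counted by splitting on whitespace, which is a rough measure but good enough to catch sentences that have clearly grown too long.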

Rule 3. Use a simple grammatical structure
Do not overcomplicate the structure of sentences.

Example:
Show that you can organise your thoughts by using a simple sentence structure in your texts. = Correct
You, in your texts, to show that you can organise your thoughts, should use a simple sentence structure. = Incorrect

Rule 4. Use the active voice
The active voice is a direct writing style that cuts out vagueness and ambiguity. It is very difficult for Machine Translation engines to successfully translate vague phrases or those with double meanings.

Example:  
“My first time building a KantanMT engine will always be remembered” = Incorrect

The incorrect phrase is vague because it is unclear who will always remember you building your first KantanMT engine; it could be you, someone else, or the world in general.

“I will always remember building my first KantanMT engine” = Correct


Rule 5. Write phrases that you can recycle
Write a phrase that you can recycle throughout your documentation. A Machine Translation engine can recognise and accurately translate repetitive phrases.

Example:
You could use the following list within several sections of the same document.
“Follow these three steps to build your KantanMT engine:

1. Gather your training data
2. Build the KantanMT engine
3. Translate your clients’ files”
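If you want to see which phrases in a document are already being recycled, counting identical segments is a simple place to start. The sketch below is one way to do that. The function name `recycled_segments` and the sample document are assumptions made up for this example, and only the first sample line comes from the list above.

```python
from collections import Counter

def recycled_segments(segments: list[str], min_count: int = 2) -> dict[str, int]:
    """Return each non-empty segment that appears at least `min_count` times, with its count."""
    counts = Counter(segment.strip() for segment in segments if segment.strip())
    return {segment: n for segment, n in counts.items() if n >= min_count}

# Hypothetical document split into segments, one per line.
document = [
    "Follow these three steps to build your KantanMT engine:",
    "Gather your training data",
    "Upload your glossary",
    "Follow these three steps to build your KantanMT engine:",
]

repeats = recycled_segments(document)
for segment, count in repeats.items():
    print(f"{count}x  {segment}")
```

Segments that are nearly identical but not word for word the same will not be counted together, so a low count can also point to phrases that are worth standardising.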

Rule 6. Remove needless words
Remove words that do not contribute to a sentence’s meaning.

Example:   
KantanMT = Correct
KantanMT Machine Translation = Incorrect
He = Correct
He is a man who = Incorrect

Rule 7. Avoid clichés/colloquial phrases
A Machine Translation engine may not convey the correct meaning of clichés/colloquial phrases and the meaning may not make sense to international users.

Example:
It is easy = Correct
It is a piece of cake = Incorrect
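A short phrase list is enough to catch the most common offenders before the text reaches the engine. In the sketch below, the function name `flag_colloquialisms` and the phrase list are assumptions made up for this example; only the first entry comes from the example above.

```python
# Hypothetical phrase list mapping a colloquial phrase to a plain alternative.
COLLOQUIALISMS = {
    "a piece of cake": "easy",
    "at the end of the day": "ultimately",
}

def flag_colloquialisms(text: str) -> list[tuple[str, str]]:
    """Return (phrase, plain alternative) pairs for each listed phrase found in the text."""
    lowered = text.lower()
    return [(phrase, plain) for phrase, plain in COLLOQUIALISMS.items() if phrase in lowered]

for phrase, plain in flag_colloquialisms("It is a piece of cake"):
    print(f"Consider replacing '{phrase}' with '{plain}'")
```

A list like this only finds the phrases you have already thought of, so it supports a human review and does not replace it.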

Rule 8. Use the definite article
Specify nouns using “the”.

Example:
“Train the KantanMT engine” = Correct
“Train KantanMT engine” = Incorrect

Rule 9. Repeat nouns instead of pronouns
This improves the clarity of sentences.

Example:
“You must build the KantanMT engine before using the KantanMT engine to translate client files” = Correct
“You must build the KantanMT engine before using it to translate client files” = Incorrect

And that’s it! Pretty simple, right? By following these simple but important steps, you will write documentation that is much more Machine Translation friendly. That means less post-editing time, faster outputs, lower costs, and happier clients!

If you want to read a bit more about controlled language, why not check out this report by Microsoft: Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment