Writing documentation that a Machine Translation engine can successfully parse is essential to producing better yet more affordable Machine Translations.
Thankfully, this is something that you can do by following a few simple rules when writing your own documentation. The most important thing to remember is to keep your writing clear and concise. The simpler your writing is, the easier it is for a Machine Translation engine to read it.
In this blog, we will take a closer look at how to produce clear and concise documentation.
Many organisations use “controlled language” to write for translation. Controlled language is much stricter than our everyday writing style. The aim of controlled language is to produce coherent and comprehensible documentation that is easy for a Machine Translation engine to read. Controlled language is particularly useful when writing instructional content. Uwe Muegge, a leading figure in the translation industry, has developed the Clout™ rule set; Clout stands for Controlled Language Optimised for Uniform Translation, and this blog references a number of rules within this set. This blog also references rules within Strunk and White’s The Elements of Style, which are useful for all content types.
Rule 1. Avoid misspellings
The simplest and most basic rule of all! A Machine Translation engine cannot accurately translate a misspelled word. Ensure that you proofread your data before running it through your translation engine.
Rule 2. Keep your sentences short and concise
Where possible, avoid connecting words (and, but, which, etc.) and limit each sentence to one clause. Keep your sentences shorter than 25 words. Ensure that each sentence is grammatically complete (begins with a capital letter, has at least one main clause, and ends with a punctuation mark).
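The checks in Rule 2 that can be automated are the word limit, the capital letter, and the ending punctuation. The following is a minimal sketch only, not part of any KantanMT tooling. The function names are hypothetical, the sentence splitter is deliberately naive, and the sketch does not test for a main clause.

```python
import re

MAX_WORDS = 25  # Rule 2: keep sentences shorter than 25 words


def split_sentences(text):
    """Naively split text wherever '.', '!' or '?' is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def check_sentence(sentence):
    """Return a list of Rule 2 problems found in one non-empty sentence."""
    problems = []
    words = sentence.split()
    if len(words) >= MAX_WORDS:
        problems.append(f"{len(words)} words (limit is under {MAX_WORDS})")
    if not sentence[0].isupper():
        problems.append("does not begin with a capital letter")
    if sentence[-1] not in ".!?":
        problems.append("has no ending punctuation")
    return problems


if __name__ == "__main__":
    sample = "Gather your training data. build the KantanMT engine"
    for sentence in split_sentences(sample):
        print(sentence, "->", check_sentence(sentence))
```

A check like this will miss abbreviations and sentences that open with a quotation mark, so treat the results as prompts for a human proofreader.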
Rule 3. Use a simple grammatical structure
Do not overcomplicate the structure of sentences.
Show that you can organise your thoughts by using a simple sentence structure in your texts. = Correct
You, in your texts, to show that you can organise your thoughts, should use a simple sentence structure. = Incorrect
Rule 4. Use the active voice
The active voice is a direct writing style that cuts out vagueness and ambiguity. It is very difficult for Machine Translation engines to successfully translate vague phrases or those with double meanings.
“My first time building a KantanMT engine will always be remembered” = Incorrect
The incorrect phrase is vague because it is unclear who will always remember you building your first KantanMT engine; it could be you, someone else, or the world in general.
“I will always remember building my first KantanMT engine” = Correct
Rule 5. Write phrases that you can recycle
Write a phrase that you can recycle throughout your documentation. A Machine Translation engine can recognise and accurately translate repeated phrases.
You could use the following list within several sections of the same document.
“Follow these three steps to build your KantanMT engine:
1. Gather your training data
2. Build the KantanMT engine
3. Translate your clients’ files”
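If your documentation is generated from source files, one way to keep a recycled phrase identical everywhere is to store it once and insert it wherever it is needed. The sketch below assumes a plain-text document built in Python. The section titles, the introductory sentences, and the names BUILD_STEPS and render_section are all hypothetical.

```python
# The reusable phrase is defined once, so every section gets identical wording.
BUILD_STEPS = (
    "Follow these three steps to build your KantanMT engine:\n"
    "1. Gather your training data\n"
    "2. Build the KantanMT engine\n"
    "3. Translate your clients' files"
)


def render_section(title, intro):
    """Return a plain-text section that ends with the shared steps."""
    return f"{title}\n{intro}\n{BUILD_STEPS}\n"


# Two hypothetical sections that reuse the same steps word for word.
document = (
    render_section("Getting started", "This section introduces the engine.")
    + "\n"
    + render_section("Quick reference", "This section summarises the workflow.")
)

if __name__ == "__main__":
    print(document)
```

Because both sections pull the steps from the same constant, a later edit to the wording changes every occurrence at once.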
Rule 6. Remove needless words
Remove words that do not contribute to a sentence’s meaning.
KantanMT = Correct
KantanMT Machine Translation = Incorrect
He = Correct
He is a man who = Incorrect
Rule 7. Avoid clichés/colloquial phrases
A Machine Translation engine may not convey the correct meaning of clichés or colloquial phrases, and the meaning may not make sense to international users.
It is easy = Correct
It is a piece of cake = Incorrect
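A simple phrase list can catch the wordy and colloquial phrases described in Rules 6 and 7 before the text reaches the engine. The sketch below is an illustration under stated assumptions: the dictionary holds only the two examples from this blog, and the names PLAIN_ALTERNATIVES and find_flagged_phrases are hypothetical. You would extend the list from your own style guide.

```python
import re

# Hypothetical starter list built from the examples in Rules 6 and 7.
# Keys are phrases to avoid; values are the plainer wording to use instead.
PLAIN_ALTERNATIVES = {
    "he is a man who": "he",
    "a piece of cake": "easy",
}


def find_flagged_phrases(text):
    """Return (phrase, suggestion) pairs for every listed phrase found in text."""
    hits = []
    for phrase, suggestion in PLAIN_ALTERNATIVES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, suggestion))
    return hits


if __name__ == "__main__":
    for phrase, suggestion in find_flagged_phrases("It is a piece of cake."):
        print(f"Replace '{phrase}' with '{suggestion}'")
```

The match ignores letter case, so the phrase is found at the start of a sentence as well as in the middle.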
Rule 8. Use the definite article
Specify nouns using “the”.
“Train the KantanMT engine” = Correct
“Train KantanMT engine” = Incorrect
Rule 9. Repeat nouns instead of using pronouns
Repeating the noun improves the clarity of sentences.
“You must build the KantanMT engine before using the KantanMT engine to translate client files” = Correct
“You must build the KantanMT engine before using it to translate client files” = Incorrect
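A short script can point out pronouns that may need to be replaced with the noun. This is a rough sketch, not a grammar checker. The pronoun list is an assumption limited to a few third-person pronouns that usually refer to things, and the name find_pronouns is hypothetical.

```python
import re

# Assumed list of pronouns to review; adjust it to suit your own content.
PRONOUNS_TO_REVIEW = {"it", "its", "they", "them", "their"}


def find_pronouns(sentence):
    """Return the listed pronouns that appear in the sentence, in order."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [word.lower() for word in words if word.lower() in PRONOUNS_TO_REVIEW]


if __name__ == "__main__":
    incorrect = (
        "You must build the KantanMT engine before using it "
        "to translate client files"
    )
    print(find_pronouns(incorrect))
```

An empty result does not prove that a sentence is clear, and a flagged pronoun is not always wrong, so a writer should still review each case.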
And that’s it! Pretty simple, right? By following these simple but important steps, you will write documentation that is much more Machine Translation friendly. That means less post-editing time, faster outputs, lower costs, and happier clients!
If you want to read a bit more about controlled language, why not check out this report by Microsoft: Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment