The KantanPEX Rule Editor enables KantanMT members to reduce the amount of manual post-editing required for a particular translation by creating, testing and deploying post-editing automation rules on their Machine Translation engines (client profiles).
The editor allows users to evaluate the output of a PEX (Post-Editing Automation) rule on a sample of translated content without needing to upload it to a client profile and run translation jobs. Users can enter up to three pairs of search and replace rules, which are run on the content in the order they are listed, from top to bottom.
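To make the idea of ordered search and replace pairs concrete, here is a minimal sketch in Python. It uses Python's own `re` module rather than the actual PEX rule syntax, and the function name `apply_rules`, the three example rules and the sample sentence are all assumptions made for illustration.

```python
import re

MAX_RULES = 3  # the editor accepts up to three search/replace pairs


def apply_rules(text, rules):
    """Apply (search, replacement) regex pairs to text, first rule first."""
    if len(rules) > MAX_RULES:
        raise ValueError(f"at most {MAX_RULES} rule pairs are allowed")
    for search, replacement in rules:
        text = re.sub(search, replacement, text)
    return text


# Hypothetical rules, listed in the order they should run.
rules = [
    (r"\bkantanmt\b", "KantanMT"),  # fix capitalisation of a product name
    (r" +([,.;:])", r"\1"),         # drop spaces before punctuation
    (r" {2,}", " "),                # collapse repeated spaces
]

sample = "kantanmt is a cloud-based  platform , built for translators ."
result = apply_rules(sample, rules)
print(result)
```

Because each rule works on the output of the previous one, changing the order of the pairs can change the final text, which is why it is worth testing rules together on a sample before uploading them.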
How to use the KantanMT PEX Rule Editor
Log in to your KantanMT account using your email and your password.
You will be directed to the ‘Client Profiles’ tab in the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’ and marked in bold.
To use the ‘PEX-Rule Editor’ with a profile other than the ‘Active’ profile, click on the new profile name to select that profile for use with the ‘Kantan PEX-Rule editor’.
Then click the ‘KantanMT’ tab and select ‘PEX Editor’ from the drop-down menu.
You will be directed to the ‘PEX Editor’ page.
Type the content you wish to test in the ‘Test Content’ box.
Type the content you wish to search for in the ‘PEX Search Rules’ box.
Type what you want the replacement to be in the ‘PEX Replacement Rules’ box and click on the ‘Test PEX Rules’ button to test the PEX-Rules.
The results of your PEX-Rules will now appear in the ‘Output’ box.
Give the rules you have created a name by typing in the ‘Rule Name’ box.
Select the profile you wish to apply the rule(s) to and then click on the ‘Upload Rule’ button.
The KantanMT PEX editor helps reduce the amount of manual post-editing required for a particular translation, thereby reducing project turn-around times and costs. For additional information on PEX rules and the Kantan PEX-Rule editor please click on the links below. For more details about KantanMT localization products and ways of improving work productivity and efficiency please contact us at firstname.lastname@example.org.
I’m new to machine translation and one of the things I’ve been doing at KantanMT is learning how to refine training data with a view to building stock engines.
Stock engines are the optional training data provided by KantanMT to improve the performance of your customized MT engine. In this post I’m going to describe the process of building an engine and refining the training data.
The building process on the platform is quite simple. From your dashboard on the website select “My Client Profiles”, where you will find two profiles that have already been set up: a default profile and a sample profile, both of which let you run translation jobs straight away.
To create your own customized profile select ‘New’ at the top of the left-most column. This launches the client Profile Wizard. Enter the name of your new engine; try to make this something meaningful, or use an easily recognizable standard around how you name your profiles. This makes it easier to recognize which profile is which, when you have more than one profile.
When you select ‘next’ you will be asked to specify the source and target languages from drop-down menus. The wizard lets you distinguish between different variants of the same language, for example Canadian English or US English. Let’s say we’re translating from Canadian English to Canadian French. If you’re not sure which variant you need, have a quick look at the training data, which will give you the language codes.
The next step gives you an option to select a stock engine from a drop down menu. The stock engines are grouped according to their business area or domain.
You will see a summary of your choices. If you’re happy with them, select ‘create’. Your new engine will be shown in the list of your client profiles. However, while you have created your engine, you haven’t yet built it.
Building Your Engine
Selecting your profile from the list will make it the current active engine. By selecting the Training Data tab you can upload any additional training data easily by using the drag and drop function. Then select the ‘Build’ option to begin building your engine.
It’s always a good idea to supply as much useful training data as possible. This ‘educates’ the engine in the way your organization typically translates text.
Once the build job has been submitted, you can monitor its progress in the ‘My Jobs’ page.
When the job is completed the BuildAnalytics™ feature is created. This can be accessed by clicking on the database icon to the left of the profile name. BuildAnalytics will give you feedback on the strength of your engine using industry standard scores, as well as details about your engine’s word count. The tabs across the page will give you access to more detail.
The summary tab lets you see the average BLEU, F-Measure and TER scores for the engine, and the pie charts show you a summary of the percentage scores for all segments. For more detail select the respective tabs and use the data to investigate individual segments.
A Rejects Report is created for every file of Training Data uploaded. You can use this to determine why some of your data is not being used, and improve the uptake rate of your data.
Gap analysis gives you an effective way to improve your engine with relevant glossary or noise lists, which you can upload to future engine builds. By adding these terminology files in either TBX (TermBase eXchange) or XLSX (Microsoft Excel Spreadsheet) formats you will quickly improve the engine’s performance.
The Timeline tab shows you the evolution of your engine over its lifetime. This feature lets you compare the statistics with previous builds, and track all the data you have uploaded. On a couple of occasions, I used the archive feature to revert to a previous build when the engine building process was not going according to plan.
Improving Your Engine
A great way to improve your engine’s performance is to analyze the rejects report for the files with a higher rejection rate. Once you understand the reasons segments are rejected you can begin to address them. For example, an error 104 is caused by a difference in placeholder counts. This can be something as simple as the source language using the % sign where the target language uses the word ‘percent’. In this case a preprocessor rule can be created to fix the problem.
The PEX Rule Editor is accessed from the KantanMT drop-down menu. This lets you try out your preprocessor rules and see the effect that they have on the data. I would suggest directly copying and pasting from the rejects report to the test area and applying your PEX rule to ensure you’re precisely targeting the data concerned. You can get instant feedback using this tool.
Once you’re happy with the way the rules work on the rejected data it’s useful to analyze the rest of the data to see what effect the rules will have. You want to avoid a situation where using a rule resolves 10 rejects, but creates 20 more. Once the rules are refined, copy them to the appropriate files (source.ppx, target.ppx) and upload them with the training data. Remember that the rules will run against the content in the order they are specified.
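As a rough illustration of the placeholder mismatch described above, the sketch below counts percent signs on each side of a segment pair and then applies a preprocessor-style rule to the source. It is written in Python and is not the platform's own check: treating a literal `%` as the placeholder, the function names and the sample sentences are all assumptions.

```python
import re

# Assumed for illustration: the only "placeholder" we count is a literal percent sign.
PLACEHOLDER = re.compile(r"%")


def placeholder_mismatch(source, target):
    """Return True when source and target contain different placeholder counts."""
    return len(PLACEHOLDER.findall(source)) != len(PLACEHOLDER.findall(target))


def preprocess_source(source):
    """Hypothetical preprocessor rule: rewrite '12%' as '12 percent'."""
    return re.sub(r"(\d)\s*%", r"\1 percent", source)


source = "Sales rose by 12% last year."
target = "Les ventes ont augmenté de 12 pour cent l'an dernier."

before = placeholder_mismatch(source, target)                    # counts differ
after = placeholder_mismatch(preprocess_source(source), target)  # counts agree
print(before, after)
```

Running the rule on the source side removes the extra placeholder, so the pair would no longer differ in placeholder count under this simplified check.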
When you rebuild the engine they will be incorporated, and hopefully improve the scores.
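One way to check that a rule does not create more rejects than it resolves is to run it over the whole data set and count both outcomes. The Python sketch below assumes a deliberately simple stand-in for the rejection check (a mismatch in percent-sign counts); `is_rejected`, `rule_impact`, `percent_rule` and the sample pairs are hypothetical names and data, not part of the KantanMT platform.

```python
import re


def is_rejected(source, target):
    """Stand-in rejection check: the two sides contain different numbers of '%'."""
    return source.count("%") != target.count("%")


def rule_impact(pairs, rule):
    """Count segment pairs the rule fixes and pairs it newly breaks."""
    fixed = broken = 0
    for source, target in pairs:
        was_rejected = is_rejected(source, target)
        now_rejected = is_rejected(rule(source), target)
        if was_rejected and not now_rejected:
            fixed += 1
        elif not was_rejected and now_rejected:
            broken += 1
    return fixed, broken


def percent_rule(source):
    """Hypothetical source-side rule: replace '%' with the word 'percent'."""
    return re.sub(r"\s*%", " percent", source)


pairs = [
    ("Up 5% today", "En hausse de 5 pour cent aujourd'hui"),  # rejected, rule fixes it
    ("Save 20% now", "Économisez 20 % maintenant"),           # accepted, rule breaks it
    ("No numbers here", "Pas de chiffres ici"),               # unaffected
]

fixed, broken = rule_impact(pairs, percent_rule)
print(f"fixed={fixed} broken={broken}")
```

If `broken` is close to or larger than `fixed`, the rule is probably too broad and needs a narrower search pattern before it goes into the rule files.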
Sue’s 3 Tips for Successfully Building MT Engines
Name your profiles clearly – When you are using a number of profiles simultaneously knowing what each one is (Language pair/domain) will make it much easier as you progress through the building process.
Take advantage of BuildAnalytics – Use the insights and Gap analysis features to give you tips on improving your engine. Listening to these tips can really help speed up the engine refinement process.
The PEX Rule Editor is your friend – Don’t be afraid to try out creating and using new PEX rules. If things go south you can always go back to previous versions of your engine.
My internship at KantanMT.com really opened my eyes to the world of language services and machine translation. Before joining the team I knew nothing about MT or the mechanics behind building engines. This was a great experience, and being part of such a smoothly run development team was an added bonus that I will take with me when I return to ITB to finish my course.
About Sue McDermott
Sue is currently studying for a Diploma in Computer Science from ITB (Institute of Technology Blanchardstown). Sue joined KantanMT.com on a three month internship. She has a degree in English Literature and a background in business systems, and has also been a full-time mum for the last 17 years.
Email: email@example.com, if you have any questions or want more information on the KantanMT platform.
KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations, by developing and refining cutting edge technologies that make Machine Translation easier to understand and use.
Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.
Strong Customer Focus…
The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.
KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.
The KantanMT Community…
The KantanMT members’ community now includes top tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3400 registered members in December, and in response to this growth, KantanMT introduced two partner programs, with the objective of improving the Machine Translation ecosystem.
They are the Developer Partner Program, which supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program, dedicated to strengthening the use of MT technology in the global translation supply chain. KantanMT’s Preferred Suppliers of MT are:
To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approx. 7,000 customized KantanMT engines that translated more than 500 million words.
As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.
KantanMT’s Core Technologies from 2013…
KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.
KantanAnalytics™ – segment level Quality Estimation (QE) analysis as a percentage ‘fuzzy match’ score on KantanMT translations, provides a straightforward method for costing and scheduling translation projects.
BuildAnalytics™ – QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment level percentage score on a sample of the uploaded training data.
KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
TotalRecall™ – combines TM and MT technology: TM matches with a ‘fuzzy match’ score of less than 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
KantanISR™ – Instant Segment Retraining technology that allows members near instantaneous correction and retraining of their KantanMT engines.
PEX Rule Editor – an advanced pattern matching technology that allows members to correct repetitive errors, making a smoother post-editing process by reducing post-editing effort, cost and times.
Kantan API – critical for the development of software connectors and smooth integration of KantanMT into existing translation workflows. The success of the MemoQ connector led to the development of subsequent connectors for MemSource and XTM.
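The TotalRecall™ routing rule in the list above (TM matches below 85% go to the MT engine) can be pictured with a very small Python sketch. Only the 85% threshold comes from the description; the function name `route_segment`, the sample scores, and the choice to keep a match of exactly 85% in the TM are assumptions for illustration.

```python
FUZZY_THRESHOLD = 85  # percent, as described for TotalRecall


def route_segment(tm_match_score):
    """Return 'MT' for TM matches below the threshold, otherwise 'TM'."""
    return "MT" if tm_match_score < FUZZY_THRESHOLD else "TM"


scores = [100, 92, 85, 84, 40]  # hypothetical fuzzy match scores
routes = [route_segment(score) for score in scores]
print(list(zip(scores, routes)))
```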
KantanMT sourced and cleaned a range of bi-directional domain specific stock engines that consist of approx. six million words across legal, medical and financial domains and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian Languages during 2013.
Recognition as Business Innovators…
KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd was presented with the ICT Commercialization award in September.
In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.
KantanMT were silver sponsors at the annual 2013 ASLIB Conference, themed ‘Translating and the Computer’, which took place in London in November, and in October Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.
KantanMT have recently published a white paper introducing its cornerstone Quality Estimation technology, KantanAnalytics, and how this technology provides solutions to the biggest industry challenges facing widespread adoption of Machine Translation.
Post-editing is a necessary step in the Machine Translation workflow, but the role is still largely misunderstood. Language Service Providers (LSPs) are now experimenting more with the best practices for post-editing in the workflow. The lack of consistent training and reluctance within the industry to accept the importance of the role are linked to the post-editor’s motivation. KantanMT looks at some of the more conventional attitudes towards motivation and their application to post-editing.
What is motivation and what studies have been done so far?
Understanding the concept of motivation has been a hot topic in many areas of organisation theory. Studies in the area really began to kick off with their application in the workplace, opening doors for pioneers to understand how employees could be motivated to do more work, and do better work.
Abraham Maslow and his well-known ‘Hierarchy of Needs’ indicates a person’s motivations are based on their position in the hierarchy pyramid.
Frederick Herzberg’s ‘Two-Factor Theory’, or motivation-hygiene theory, suggests that job satisfiers such as professional acknowledgement, achievements and work responsibility have a positive effect on motivation.
Douglas McGregor used a black and white approach to motivation in his ‘Theory X and Theory Y’. He grouped employees into two categories: those who will only do the minimum and those who will push themselves.
As development of theories continued…
John Adair came up with the ‘fifty-fifty theory’. According to the fifty-fifty theory, motivation is fifty percent the responsibility of the employee and fifty percent outside the employee’s control.
Even more recently, in 2010
Teresa Amabile and Steven Kramer carried out a study on the motivation levels of employees in a variety of settings. Their findings suggest ‘Progress’ is the top performance motivator, identified from an analysis of approx. 12,000 diary entries and daily ratings of motivation and emotions from hundreds of study participants.
To understand post-editor motivation we can combine the top performance motivator, progress, with the fifty-fifty theory.
Progress is a healthy motivator in the post-editing profession. It can help Localization Project Managers understand and encourage post-editor satisfaction and motivation. But while progress can be deemed an external factor, if we apply Adair’s ‘fifty-fifty’ rule, post-editors are also at least fifty percent responsible for their own motivation.
Post-editing as a profession is still only finding its feet. TAUS carried out a study in 2010 on the post-editing practices of global LSPs. The study showed that, while post-editing is becoming a standard activity in the translation workflow, it only accounts for a minor share of LSP business volume. This suggests that post-editors see their role as one of lesser importance because the industry views it as a role of lesser importance.
This attitude in the industry is highlighted by the lack of industry standards for post-editing best practices. Without evaluation practices to train post-editors and improve the post-editing process, post-editors are not making progress. This quite naturally is demotivating for the post-editor.
How to motivate post-editors
The first step in motivating post-editors is to recognise their role as distinct from the role of a translator. The best post-editors are those who are at least bilingual with some form of linguistic training, like a translator. Linguistic training is a major asset for editing Machine Translated output.
TAUS offer a comparison of the translation process versus the post-editing process, highlighting the differences in the post-editing and translation processes.
One process is not more complicated than the other, only different. Translators translate internally, while post-editors make “snap editing decisions” based on client requirements. As LSPs recognise these differences, they can successfully motivate their post-editors by providing them with the most suitable support and work environment.
Progress as a Motivator
Translators make good post-editors. They have the linguistic ability to understand both the source and target texts, and if they enjoy editing or proof-reading, then the post-editing role will suit them. The right training is also important: if post-editors are trained properly they will become more aware of potential improvements to the workflow.
These improvements or ideas can be a great boost to post-editor motivation. If they are implemented, the post-editor can take on more responsibility, which helps improve the translation workflow. For example, if the post-editor is made responsible for updating the language assets used to retrain a Machine Translation system, they can take ownership and become responsible for the output quality rather than just post-editing Machine Translation output in isolation.
Fixing repetitive errors can be frustrating for anyone, not just post-editors. But if they are responsible for the output quality, understand the system and can control the rules used to reduce these repetitive errors, they will experience motivation through progress.
This is only the tip of the iceberg on what motivates post-editors. Each post-editor is different, and how they feel about the role, whether it is just ‘another job’ or a major step in their career, plays a part. The key is to provide proper training and foster an environment where post-editors can make progress by positively contributing to the role.
Translators often take pride and ownership of their translations. Post-editors should also have the opportunity to take pride in their work, as it is their skills and experience that bring it to ‘publishable’ or even ‘fit for purpose’ quality.
Repetitive errors like diacritic marks or capitalisation can be easily fixed using KantanMT’s Post-Editing Automation (PEX) rules. PEX rules allow repetitive errors in a Machine Translation engine to be easily fixed using a ‘find and replace’ tool. These rules can be checked on a sample of the text by using the PEX Rule Editor.
The post-editor can correct repetitive errors during the post-editing process, so the same errors don’t appear in future MT output, giving them responsibility over the Machine Translation engine’s quality.
Post-Editing Machine Translation (PEMT) is an important and necessary step in the Machine Translation process. KantanMT is releasing a new, simple and easy-to-use PEX rule editor, which will make the post-editing process more efficient, saving time, costs and the post-editor’s sanity.
As we have discussed in earlier posts, PEMT is the process of reviewing and editing raw MT output to improve quality. The PEX rule editor is a tool that can help to save time and cut costs. It helps post-editors, since they no longer have to manually correct the same repetitive mistakes in a translated text.
Post-editing can be divided into roughly two categories: light and full post-editing. ‘Light’ post-editing, also called ‘gist’, ‘rapid’ or ‘fast’ post-editing, focuses on transferring the most correct meaning without spending time correcting grammatical and stylistic errors. Correcting textual standards, like word order and coherence, is less important in a light post-edit than in a more thorough ‘full’ or ‘conventional’ post-edit. Full post-edits need the correct meaning to be conveyed, correct grammar, accurate punctuation, and the correct transfer of any formatting such as tags or placeholders.
The client often dictates the type of post-editing required, whether it’s a full post-edit to get it up to ‘publishable quality’ similar to a human translation standard, or a light post-edit, which usually means ‘fit for purpose’. The engine’s quality also plays a part in the post-editing effort; using a high volume of in-domain training data during the build produces higher quality engines, which helps to cut post-editing effort. Other factors such as language combination, domain and text type all contribute to post-editing effort.
Examples of repetitive errors
Some users may experience the following errors in their MT output.
Punctuation mistakes, hyphenation, diacritic marks etc.
Formatting – trailing spaces
SMT engines use pattern matching to identify text described by regular expressions. Regular expressions or ‘regex’ are special text strings that describe patterns. These patterns need no linguistic analysis, so they can be implemented easily across different language pairs. Regular expressions are also important components in developing PEX rules. KantanMT have a list of regular expressions used for both GENTRY Rule files (*.rul) and PEX post-edit files (*.pex).
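To show how regular expressions can describe the kinds of repetitive errors listed above without any linguistic analysis, here is a small Python sketch. The patterns use Python's `re` syntax, which may differ from the syntax accepted in *.rul or *.pex files, and the three patterns and the sample text are assumptions chosen for illustration.

```python
import re

# Each entry pairs a compiled pattern with its replacement.
PATTERNS = [
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),       # formatting: trailing spaces
    (re.compile(r"\bcloud based\b"), "cloud-based"),  # hyphenation
    (re.compile(r"\bcafe\b"), "café"),                # diacritic marks
]


def clean(text):
    """Apply every pattern in turn and return the corrected text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


raw = "A cloud based platform   \nVisit the cafe today \n"
cleaned = clean(raw)
print(repr(cleaned))
```

The `\b` word boundaries keep the rules from touching longer words that merely contain the search string, which is one simple way to avoid over-correcting.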
Post-Editing Automation (PEX)
Repetitive errors can be fixed automatically by uploading PEX rule files. These rule files allow post-editors to spend less time correcting the same repetitive errors by automatically applying PEX constructs to translations generated from a KantanMT engine.
PEX works by incorporating “find and replace” rules. The rules are uploaded as a PEX file and applied while a translation job is being run.
PEX Rule Editor
KantanMT have designed a simple way to create, test and upload post-editing rules to a client profile.
The PEX Rule editor, located in the ‘MykantanMT’ menu, has an easy-to-use interface. Users can copy a sample of the translated text into the upper text box, ‘Test Content’, then enter the rules to be applied in the ‘PEX Search Rules’ box and their corrections in the ‘PEX Replacement Rules’ box. The user can test the new rules by clicking ‘test rules’ and instantly identify any incorrect rules before they are uploaded to the profile.
The introduction of tools to assist in the post-editing process helps remove some of the more repetitive corrections for post-editors. The new PEX Editor feature helps improve the PEMT workflow by ensuring all uploaded rule files are correct, leading to a more effective method for fixing repetitive errors.
So far in this KantanMT blog series on Machine Translation post-editing we have looked at automated post-editing, why it is becoming popular within the localization industry, how you can reduce your post-editing times, and the steps you can take to achieve both understandable or ‘fit for purpose’ and close to human levels of post-editing standards. In this post we are going to focus on perhaps one of the most difficult issues with regard to providing a post-editing service: pricing.
What’s the problem?
The problem, put simply, is that there is no set way for Language Service Providers (LSPs) to price post-editing projects for their clients. That’s because LSPs must contend with a range of variables in the post-editing process, each of which can affect the final cost. Lorena Guerra, writing in 2003, sums up one of the main issues, “Whereas Human Translation is mainly based on the unit “word” as a cost base, in the case of post-editing, as outlined by Spalink et al. the cost base “word” is much harder to justify”. LSPs cannot charge for post-editing a “word” when their post-editors may have just corrected a letter or perhaps even a broader stylistic problem. There are also other items to consider. Here are just a few:
The time it takes to complete the post-editing process
The post-editing standards required by the client
The number of segments requiring higher post-editing quality compared to those requiring a lower post-editing standard
Varying segment lengths
The quality of the raw Machine Translation output
Varying degrees of post-editing effort required for different language pairs
LSPs and their clients must not only set a price, but also agree upon how that price is reached. Establishing a pricing framework that considers all parties is imperative.
Pricing Machine Translation Post-Editing
So how can Language Service Providers develop appropriate frameworks for pricing Machine Translation post-editing? TAUS has recently published a public consultation entitled “Best Practice Guidelines for Pricing MT Post Editing” that features guidelines to help solve this problem. Let’s take a look at the key points. Note: These TAUS guidelines are preliminary and are subject to review while the public consultation is ongoing.
1. Things to Always Remember
TAUS says that no matter what kind of framework you use for pricing Machine Translation post-editing, there are certain things to always keep in mind.
Set a price up-front
Ensure that your framework can provide an estimation of the cost of post-editing a text at the outset; re-evaluate prices when you evaluate or roll out a new version of an engine.
Involve all parties
When building your pricing framework, include all parties involved in your Machine Translation process. This is to ensure that everyone agrees “that the pricing model reflects the effort involved”.
Take the content to be post-edited into account
Consider the variables outlined earlier in this post such as post-editing different language pairs and post-editing to various quality standards. All of these factors need to be assessed as part of your pricing framework.
2. Building a Pricing Model
TAUS recommends combining a number of approaches to build your pricing framework. These are Automated Quality Score (e.g. TER, BLEU, F-Measure), Human Assessment, and Productivity Assessment. TAUS adds that “Productivity Assessment should always be used” regardless of what approach is taken.
Automated quality scores
There are a number of automated measurement tools that can be combined. KantanMT currently deploys BLEU, TER, and F-Measure.
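As a rough illustration of what an automated score measures, the Python sketch below computes a word-level F-Measure between a machine translated sentence and a reference translation. This is one common textbook formulation (the harmonic mean of word precision and recall); it is an assumption that it resembles the score reported on the platform, and the function name and sample sentences are made up for the example.

```python
from collections import Counter


def f_measure(candidate, reference):
    """Word-level F-Measure: harmonic mean of precision and recall."""
    cand_words = candidate.lower().split()
    ref_words = reference.lower().split()
    # Count words shared by both sentences, respecting repeated words.
    overlap = sum((Counter(cand_words) & Counter(ref_words)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_words)
    recall = overlap / len(ref_words)
    return 2 * precision * recall / (precision + recall)


mt_output = "the cat sat on mat"
reference = "the cat sat on the mat"
score = f_measure(mt_output, reference)
print(round(score, 3))
```

Here every word of the MT output appears in the reference (precision 1.0), but the reference has one extra word (recall 5/6), so the score lands a little below 1.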
Human assessment
This involves steps such as human post-editors checking both the quality of raw Machine Translation output and post-edited content.
Post-editing productivity assessment
TAUS defines this as “calculating the difference in speed between translating from scratch and post-editing Machine Translation output”. Speeds may change if you deploy a new engine, so each time a “new ‘production’ ready engine” is rolled out make sure that you perform new productivity assessments.
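The speed comparison in that definition can be worked through with a few lines of Python. This is only a sketch: the function names and the timing figures are hypothetical, and expressing the difference as a relative gain is an assumption rather than something the TAUS guidelines prescribe.

```python
def words_per_hour(words, minutes):
    """Throughput for a task of the given size and duration."""
    return words * 60 / minutes


def productivity_gain(translate_wph, post_edit_wph):
    """Relative speed-up of post-editing over translating from scratch."""
    return (post_edit_wph - translate_wph) / translate_wph


# Hypothetical timings for the same 500-word text.
scratch = words_per_hour(words=500, minutes=60)
post_edit = words_per_hour(words=500, minutes=40)
gain = productivity_gain(scratch, post_edit)
print(f"{scratch:.0f} vs {post_edit:.0f} words/hour, gain {gain:.0%}")
```

Repeating the same measurement after each new production-ready engine is rolled out gives comparable figures over time.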
To find out more about developing a Machine Translation post-editing pricing framework, check out TAUS’s public consultation “Best Practice Guidelines for MT Post-Editing”. Note: The public consultation on these preliminary guidelines closes Tuesday July 30th 2013 and the official guidelines will be published on Tuesday August 6th 2013.
This week, KantanMT has announced the forthcoming release of KantanMT Analytics. This technology, which has been developed in partnership with the CNGL Centre for Global Intelligent Content, provides segment level quality analysis for Machine Translation output.
By attaining a quality score for each segment of a Machine Translated document, post-editors can accurately identify segments that require the most post-editing time and those which already meet the client’s quality standards. This will help KantanMT members to calculate post-editing effort and price.
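To illustrate how segment level scores could feed into an effort and price estimate, the Python sketch below groups segments into bands by score and applies a per-word rate to each band. The band boundaries, labels, rates and sample segments are entirely hypothetical and are not taken from KantanMT Analytics.

```python
# Hypothetical bands: (minimum score, label, price per word).
BANDS = [
    (85, "light touch", 0.02),
    (60, "moderate edit", 0.05),
    (0, "heavy edit", 0.09),
]


def band_for(score):
    """Return the (label, rate) of the first band whose minimum the score meets."""
    for minimum, label, rate in BANDS:
        if score >= minimum:
            return label, rate
    raise ValueError("score must not be negative")


def estimate(segments):
    """segments: list of (word_count, quality_score) tuples."""
    total = 0.0
    words_by_band = {}
    for words, score in segments:
        label, rate = band_for(score)
        total += words * rate
        words_by_band[label] = words_by_band.get(label, 0) + words
    return round(total, 2), words_by_band


segments = [(10, 95), (20, 70), (30, 40)]
cost, breakdown = estimate(segments)
print(cost, breakdown)
```

The breakdown by band shows where most of the post-editing effort is likely to go, while the total gives an up-front price that can be revisited when the engine is rebuilt.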
That brings us to the end of our blog series on Machine Translation post-editing. We hope you have enjoyed taking this “post-editing adventure” with us and are able to put the advice within this blog series to good use. Please feel free to comment on this post or any previous ones. We’d love to hear from you.
In this blog series, we are discussing the area of post-editing. In our earlier posts, ‘The Rise of PEMT’ and ‘Cutting PEMT Times’, we have discussed the meaning of automated post-editing, why its popularity is growing among Language Service Providers (LSPs), and how you can cut your post-editing times.
Machine Translated text can be post-edited to different quality levels. This post is based on post-editing guidelines that have been developed by TAUS with, among others, KantanMT’s partners DCU and CNGL. A link to these guidelines is available at the end of this post.
Post-editing to an understandable level
An understandable level of post-editing is a standard by which the main content of the message is correct and understandable for the user. However, the document’s readability may not be perfect and there may be a number of styling errors. Correct styling, however, is not essential as long as the main message content is understandable.
Follow these rules to post-edit a translated text to an understandable level
Ensure that the meaning of the translated text is the same as the source text and that it is understandable to the user
Read through the document to make sure that there is no missing or excess information
Because the translation is part of the localization process, make sure that the content is not offensive or culturally insensitive
Correct basic spelling errors
Errors that only affect the styling of the document do not need to be changed, so there is no need to correct the following sentence: “Kantanmt is cloud based statistical machine translator platform”. Note: The stylistically correct version is “KantanMT is a cloud-based Statistical Machine Translation platform”
Remember that the fewer post-edits there are the better – use as much of the original Machine Translation output as possible
Don’t restructure sentences to improve the flow if the meaning is comprehensible
Post-editing to a quality standard similar to human translation
TAUS defines this level as being, “comprehensible (i.e. an end-user perfectly understands the content of the message), correct (i.e. it communicates the same meaning as the source text), stylistically fine, though the style may not be as good as that achieved by a native-speaking human translator. Syntax is normal, grammar and punctuation are correct”
Follow these rules to post-edit a translated text to this standard
Ensure that content is grammatically complete and structured logically, and that the meaning of the message is clear to the user
Check the translation of terms that are essential to the document and confirm that the client has asked for any untranslated terms to be left untranslated
Read through the document to make sure that there is no missing or excess information
Because the translation is part of the localization process, make sure that the content is not offensive or culturally insensitive
Remember that the fewer post-edits there are the better – use as much of the original MT output as possible
Correct spelling errors and make sure that the document is correctly punctuated and well formatted
And that’s it! For errors such as misspellings or formatting mistakes, you can use KantanMT’s PEX technology to find and correct any repetitive errors throughout a document. This helps to shorten post-editing times while reducing post-editing costs.
More companies want multilingual content produced cheaply and quickly by Language Service Providers (LSPs); Machine Translation is becoming a more popular choice as a result.
TechNavio predicted that the market for Machine Translation will grow at a compound annual growth rate (CAGR) of 18.05% until 2016, and the report attributes a large part of this rise to “the rapidly increasing content volume”. Of course, while Machine Translation may help to cut costs and turnaround times, its success is ultimately judged on whether it can produce not only correct translations but also content that meets the quality standards of each individual client.
This places the spotlight firmly on the post-editing stage of the Machine Translation process. In this post, we are going to examine the Machine Translation post-editing stage and discuss how automatic post-editing can be incorporated into it.
What is Machine Translation post-editing?
Jeff Allen says the purpose of the post-editing stage, or more specifically the post-editor, is to “edit, modify, and/or correct pre-translated text that has been processed by an MT system from a source language into (a) target language(s)”. The most important thing to take from this is that post-editing is not the same as translation.
The fundamental aim of the post-editing process is to make Machine Translation output understandable or stylistically appropriate (depending on client requirements). Automatic post-editing is when computer technology is used to complete parts of the post-editing process.
Does this mean some stages of the post-editing process can be completely automated?
Not exactly. Automated post-editing is not an entirely mechanised process whereby a machine parses and corrects a document without human intervention. Humans must still proofread translation output and make sure that each client’s standards are met. However, post-editing technologies can automate a number of steps that would have previously required manual intervention and multiple edits by the post-editor.
As Bartolomé Mesa-Lao of Copenhagen Business School in Denmark says, the fewer edits required, the better a post-editor’s productivity. This is one of the main reasons why, in an age where companies want multilingual user content on demand, post-editing technologies are becoming increasingly important to LSPs. If we take an example of using KantanMT’s post-editing technologies as part of the post-editing process, we can see how it works:
A document has been translated by a KantanMT engine but there is a word that begins with a lower case letter which should begin with a capital letter. This mistake has been repeated throughout the document several hundred times. Rather than a post-editor having to manually find and correct each occurrence of this error, KantanMT’s PEX technology can find and correct the mistake using its “find and replace” rules. This means that post-editors can save time and turn their attention to fixing more complex stylistic errors. All of this results in faster project completion times and lower costs.
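To make the “find and replace” idea concrete, here is a minimal sketch in Python. It is not KantanMT’s PEX rule syntax, which this post does not show; it uses Python’s standard re module as a stand-in, and the apply_rules helper, the rule list and the sample text are all hypothetical.

```python
import re

def apply_rules(text, rules):
    """Apply (search, replacement) pairs in order and count the substitutions made."""
    total = 0
    for pattern, replacement in rules:
        text, count = re.subn(pattern, replacement, text)
        total += count
    return text, total

# Hypothetical rule: fix the capitalisation of the standalone word "kantanmt".
# \b keeps the rule from matching inside longer words.
rules = [(r"\bkantanmt\b", "KantanMT")]

# Hypothetical machine-translated output with a repeated capitalisation error.
mt_output = "kantanmt is a platform. Users of kantanmt build engines."

fixed, changes = apply_rules(mt_output, rules)
print(fixed)
print(changes)
```

A single rule corrects every occurrence in one pass, and the substitution count gives the post-editor a quick check that the rule matched as often as expected before attention moves to the more complex stylistic errors.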
In our next post, we will look at some of the best practices you can use to make sure that you keep your post-editing times to a minimum.
You can find out more about Machine Translation and KantanMT by going to KantanMT.com and signing up for our free 14-day trial.