KantanMT blog, Pricing PEMT

Segment-by-segment Machine Translation Quality Estimation (QE) scores are reforming current Language Service Provider (LSP) business models.

Pricing Machine Translation is one of the most widely debated topics within the translation and localization industries. Many agree that there is no ‘black and white’ approach, because a number of variables must always be taken into consideration when costing a project. Industry experts are in agreement that levels of post-editing effort and payment should be calculated through a fair and easily replicated formula. This transparency is the goal KantanMT had in mind during the development of KantanAnalytics™, a “game-changing” technology in the localization industry.

New Business Model

The two greatest challenges facing Localization Project Managers (PMs) are how to cost and how to schedule Machine Translation projects. Experienced PMs can quickly gauge how long a project will take to complete, but there is still an element of guesswork and contingency planning involved. This is intensified when Machine Translation is added. Although Machine Translation is not a new technology, its practical application in a business environment is still in its infancy.

Powerful Machine Translation engines can be easily integrated into an LSP workflow. Measuring Machine Translation quality on a segment-by-segment basis and calculating post-editing effort on those segments allows LSPs to create more streamlined business models.

Studies have shown that post-editing Machine Translation can be more productive than translating a document from scratch. This is especially true when translators or post-editors have broad technical or subject knowledge of the text’s domain. In these cases they can capitalise on that knowledge to achieve higher post-editing productivity.

So, how should a Machine Translation pricing model look?

The development of a technology that can evaluate a translation on a segment-by-segment basis and assign an accurate QE score to a Machine Translated text is critical for the successful integration of this technology into a project’s workflow.

The segment-by-segment breakdown and ‘fuzzy match’ percentage scoring system ensured the commercialisation of Translation Memories in LSP workflows. This system has been adopted as an industry standard for pricing translation jobs where translation memories or Computer Aided Translation (CAT) tools can be implemented. The next natural evolution is to create a similar tiered ‘fuzzy’ matching system for Machine Translation.

Segment-level QE technology is now available in which Machine Translated segments are assigned percentage match values, similar to translation memory match values. Post-editing costs can then be assigned to each match value, in the same way that translation memory matches are costed. The match value also gives a clear indication of how long a project should take to post-edit, based on the quality of the match and the post-editor’s skills and experience.
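
As a rough illustration of how such a tiered model could be costed, here is a minimal Python sketch. The band thresholds, the rate fractions and the names `BANDS`, `Segment`, `rate_fraction` and `price_project` are all hypothetical; they are assumptions made for this example and are not KantanMT or industry figures.

```python
from dataclasses import dataclass

# Hypothetical bands: (minimum QE score, fraction of the full per-word rate).
# Illustrative assumptions only, listed from the highest band to the lowest.
BANDS = [
    (95, 0.10),
    (85, 0.30),
    (75, 0.50),
    (50, 0.70),
    (0, 1.00),
]


@dataclass
class Segment:
    text: str
    qe_score: float  # percentage match value, 0-100


def rate_fraction(qe_score):
    """Return the fraction of the full rate charged for a given QE score."""
    if qe_score > 100:
        raise ValueError("QE score must be between 0 and 100")
    for minimum, fraction in BANDS:
        if qe_score >= minimum:
            return fraction
    raise ValueError("QE score must be between 0 and 100")


def price_project(segments, full_rate_per_word):
    """Sum the post-editing cost of every segment, counting whitespace-separated words."""
    total = 0.0
    for segment in segments:
        words = len(segment.text.split())
        total += words * full_rate_per_word * rate_fraction(segment.qe_score)
    return total
```

A high-scoring segment is charged a small fraction of the full rate, while a low-scoring segment approaches the cost of a full retranslation, which mirrors how translation memory matches are usually banded.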

How can we trust the quality score?

The quality of a Machine Translation engine depends on the quality of the training data used to build it. The engine’s quality can be monitored with BLEU, F-measure and TER scores. These automatic evaluation metrics indicate the engine’s overall quality and, combined with the ‘fuzzy’ match score, give a more accurate picture of how post-editing effort is calculated and how projects should be priced. A number of variables dictate how a pricing model is created and implemented.
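
To give a feel for what two of these metrics measure, the sketch below computes a word-level F-measure and a simplified edit rate against a reference translation. It is an approximation made for illustration: the edit rate is a plain word-level edit distance divided by the reference length and omits the block shifts that full TER allows, and the function names are hypothetical rather than taken from any KantanMT tool.

```python
from collections import Counter


def unigram_f_measure(hypothesis, reference):
    """Harmonic mean of word-level precision and recall against one reference."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp or not ref:
        return 0.0
    # Clipped overlap: each word counts at most as often as it occurs in both texts.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def word_edit_rate(hypothesis, reference):
    """Word-level edit distance divided by reference length (TER without shifts)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # Dynamic-programming edit distance, keeping only the previous row.
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = previous[j - 1] + (0 if hyp_word == ref_word else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)
```

A higher F-measure and a lower edit rate both point to output that should need less post-editing. Both require a reference translation, so they are suited to monitoring an engine on test data rather than scoring new segments.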

Variables to be considered when creating a pricing model

The challenge in measuring PEMT stems from a number of variables, which need to be considered by PMs when creating a pricing model:

  • Intended purpose – does the text require a light, fast or full post-edit?
  • Language pair and direction – Romance languages tend to provide better MT output
  • Quality of the MT system – better quality, domain specific engines produce better results
  • Post-editing effort – degree of editing required – minor edits or full retranslate
  • Post-editor skill and experience – post-editors with extensive domain expertise

Traditional Models

To overcome these challenges, PMs have traditionally opted for hourly or daily rates. However, hourly rates do not provide enough transparency or cost breakdown and can make a project difficult to schedule. These rates must also be calculated to take into consideration the language pair and the translator’s or post-editor’s productivity.

Rates are usually calculated based on the translator’s or post-editor’s average post-editing speed within the specified domain. Day rates can be a good cost indicator for PMs based on the post-editor’s capabilities and experience, but again the cost breakdown is not completely transparent. Difficulties usually occur when a post-editor comes across a part of the text that requires more time or effort to post-edit, because productivity then drops.
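
The arithmetic behind an hourly estimate can be sketched as follows. The function name `estimate_hourly_job`, the speeds and the hourly rate are hypothetical values chosen for illustration, not figures from the source.

```python
def estimate_hourly_job(portions, hourly_rate):
    """Estimate hours and cost for a job billed by the hour.

    portions is a list of (word_count, words_per_hour) pairs, so that a
    difficult part of the text can be given its own, slower speed.
    """
    hours = 0.0
    for word_count, words_per_hour in portions:
        if words_per_hour <= 0:
            raise ValueError("words_per_hour must be positive")
        hours += word_count / words_per_hour
    return hours, hours * hourly_rate


# Estimate based on one average speed for the whole 6,000-word text.
average_case = estimate_hourly_job([(6000, 750)], hourly_rate=40)

# Same text, but 1,500 of the words turn out to need heavy editing.
difficult_case = estimate_hourly_job([(4500, 750), (1500, 250)], hourly_rate=40)
```

With these assumed numbers the average-speed estimate is 8 hours, while the version with a difficult passage takes 12 hours. An hourly quote built on a single average speed hides exactly this kind of variation.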

Opinions in the translation community differ, and PEMT pricing depends on the post-editing circumstances. Some posters on the Proz.com forum suggest that PEMT should be priced at 30-50%, or at a level similar to editing a human translation. Others suggest that Machine Translation output should be priced at around the same level as a 50-74% ‘fuzzy’ match from a translation memory. These are broad, subjective figures that do not take the variables above into consideration.

Scoring the Machine Translated text on a segment-by-segment basis allows PMs to calculate post-editing effort based on the quality of customised Machine Translation engines. PMs can then use these calculations to build an accurate pricing model for the project, one which incorporates all relevant variables. It also makes it possible to distribute post-editing work evenly across translators and post-editors, making the most efficient use of their skills. The benefits of calculating post-editing effort are also seen in scheduling and project turnaround times.
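
One possible way to spread work evenly once every segment has a score is a greedy allocation, sketched below: always hand the heaviest remaining segment to the post-editor with the lightest load. The effort formula (word count scaled by how far the score falls below 100) and the names `estimated_effort` and `distribute` are assumptions made for this example, not a description of how KantanAnalytics works.

```python
import heapq


def estimated_effort(word_count, qe_score):
    # Assumption: effort grows with length and with distance from a perfect score.
    return word_count * (1 - qe_score / 100)


def distribute(segments, editors):
    """Assign (segment_id, word_count, qe_score) tuples to editors.

    Returns a dict of editor -> segment ids and a dict of editor -> total effort.
    """
    if not editors:
        raise ValueError("at least one editor is required")
    # Min-heap of (current load, editor name): the least-loaded editor is popped first.
    heap = [(0.0, name) for name in editors]
    heapq.heapify(heap)
    assignment = {name: [] for name in editors}
    # Place the heaviest segments first so the lighter ones can even out the loads.
    ordered = sorted(segments, key=lambda s: estimated_effort(s[1], s[2]), reverse=True)
    for segment_id, word_count, qe_score in ordered:
        load, name = heapq.heappop(heap)
        assignment[name].append(segment_id)
        heapq.heappush(heap, (load + estimated_effort(word_count, qe_score), name))
    loads = {name: load for load, name in heap}
    return assignment, loads
```

In practice the effort estimate would also need to reflect each post-editor’s speed and domain expertise, which this sketch leaves out.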

KantanAnalytics™ is a segment-by-segment quality estimation scoring technology. When applied to a Machine Translated text, it generates a quality score for each segment, similar to the fuzzy match scoring system used in translation memories.

Sign up for a free trial to experience KantanAnalytics until November 30th 2013. KantanAnalytics will be available on the Enterprise Plan. To sign up or upgrade to this plan, please email KantanMT’s Success Coach, Kevin McCoy (kevinmcc@kantanmt.com).