Moses Use Case: KantanMT.com

January 2015 marks the last month of the Moses Core project. The project started three years ago, in 2012, as a collaborative effort by its members to improve translation processes and to create a competitive translation environment. Over those three years, the translation and MT landscape has changed significantly. This change and the project’s success are in no small part due to the hard work and diligence of the Moses Core project coordinator, TAUS. With TAUS’s kind permission, KantanMT is republishing the MT use case for the KantanMT Community.

COMPANY NAME

KantanMT.com is a registered trademark of Xcelerator Machine Translations Ltd.

TIME IN MT BUSINESS

The platform was launched commercially in Q4 2013; however, we have been rigorously testing KantanMT.com in academic and commercial settings since 2012. In the beginning, the product was offered as a free trial to the KantanMT Community, and their feedback was instrumental in shaping and improving the platform into what it is today.

MOSES EXPERIENCE

The Moses technology has improved immensely over the past 12-18 months. Developer documentation and support materials, while initially very basic, have matured into a more structured, comprehensive and helpful resource. Additionally, the management of software distributions has made it easier to work with, understand and deploy. These are key elements in maintaining and supporting any open-source technology and have made Moses a key technology for the localization industry.

WHY MOSES?

The rise of the global economy and the driving demand for multilingual translation created a gap in the market for a sustainable translation method that could automatically scale to accommodate fluctuating translation needs. The KantanMT Development team was able to utilize the open source Moses decoder to develop a cloud-based Statistical Machine Translation (SMT) platform, where clients could build and manage their own customized MT engines without compromising on the ownership of their data. The flexibility, scalability and security of the Moses toolkit made this possible.

The Moses toolkit offers the most flexibility in implementing an SMT solution for commercial purposes, as it allows the system’s training and decoding process to be modified. This has enabled the KantanMT team to create a high-value product that is dynamic and commercially relevant.

To ensure the product could scale and adapt to user needs, the KantanMT team needed a decoder that could be built and managed on the cloud. The Moses system enabled this functionality.

Parallel language data is required to train an SMT engine. This data is an important resource for companies, and current generic SMT engines do not guarantee the security or safeguard the ownership of these assets. In using the Moses decoder, the KantanMT team created a product that could ensure its clients’ data was kept private, and not repurposed or reused in any way.

Many global companies have large repositories of bilingual data; however, they often do not wish to deploy and maintain their own version of the Moses decoder. The KantanMT Development team was able to develop the sophisticated Moses SMT technology into a package that is easily accessible to companies wishing to translate their content and, over time, achieve localization cost savings.

MT STAFF

The current machine translation development team consists of four people, who maintain the platform and build machine translation engines for clients. Due to significant growth in the company over the past year, KantanMT.com will be hiring more staff over the course of the next few months to build engines for clients.

MT SYSTEM INFRASTRUCTURE

Insource or Outsource Moses/Implementation

Based on research and on the demands of the language services industry and enterprise machine translation buyers, KantanMT has implemented and customized the Moses decoder in-house to create a robust and commercially viable machine translation product that can scale and adapt to our clients’ needs. The original/base KantanAnalytics™ technology was co-developed with the CNGL Centre for Global Intelligent Content, an academic-industry research centre based in Dublin City University, Ireland. However, all other KantanMT.com technologies have been developed in-house by KantanMT’s expert development team.

Number of Engines

As of January 2015, the total number of MT engines built on KantanMT.com by the KantanMT Community is 6,777.

Volumes

As of January 2015, the total number of training words uploaded to the platform by the KantanMT Community has surpassed 50 billion, and the number of translated words on the platform is now more than 600 million.

USE SCENARIO

KantanMT.com Preferred MT Supplier

bmmt GmbH is a German language service provider with a strong focus on machine translation. It needed a machine translation provider that would give the bmmt team full control of its MT training data and MT engine customization process at a low investment point. It also required a system that could correctly handle format-specific tagging and transparent transfer of mark-up information.

In early 2013, bmmt joined the KantanMT Community and began testing different customization processes using client-specific training data. The team initially experienced minor problems with their SDLXLIFF files. However, the KantanMT development team was able to solve these problems quickly by restructuring some of its tokenizers.

The company began deploying production engines in mid-2013. These were showing particularly high Quality Evaluation (QE) scores due to the quality of their training data and resulted in a considerable increase in translation productivity. bmmt MT technicians found that domain specificity is a better basis for predictable output than sheer input size.

bmmt is currently using approximately 20 KantanMT engines in production across technical and automotive domains. These production-ready engines are achieving high quality-metric scores for each language combination.

MARKET POSITIONING

KantanMT.com is one of the market leaders in cloud-based machine translation services. It provides cloud-based SMT services to major global enterprises and software companies wishing to translate large volumes of data. It either works directly with companies to develop and implement a long-term machine translation strategy, or works with a select number of language service providers (through the preferred MT supplier partner program) to supply MT services to large enterprises.

VIEWS ON CURRENT STATE OF MT

Machine translation is now much more widely accepted in the industry than it was just a few years ago. Since KantanMT.com entered the market in its testing phase in 2012, we have seen an enormous change in the attitudes towards and perception of MT in the language community. Access to technology such as smartphones and tablets in non-English speaking nations has driven the global marketplace, and this in turn has increased the need for on-demand translation services, driving demand for MT services. The MosesCore Project has facilitated this demand with an open source solution that made it possible for smaller companies and startups like us to compete against bigger MT providers in solving the problem of language.

“The KantanMT platform sets a new industry benchmark in terms of analytics and development tools used to build and measure the quality of Statistical MT Engines. The KantanMT expert development team has introduced some of the industry’s most exciting and valuable technologies built on the Moses decoder, which are helping language and enterprise clients to translate more efficiently and reduce costs.” KantanMT.com founder and Chief Architect, Tony O’Dowd.

For more information on the Moses Core project or to access the original article, please contact TAUS (moses@taus.net). To find out more about KantanMT.com, contact Louise (info@kantanmt.com).

Language Industry Interview: KantanMT speaks with Maxim Khalilov, bmmt Technical Lead

This year, both KantanMT and its preferred Machine Translation supplier, bmmt, a progressive Language Service Provider with an MT focus, exhibited side by side at the tekom Trade Fair and tcworld conference in Stuttgart, Germany.

As a member of the KantanMT preferred partner program, bmmt works closely with KantanMT to provide MT services to its clients, which include major players in the automotive industry. KantanMT was able to catch up with Maxim Khalilov, technical lead and ‘MT guru’, to find out more about his take on the industry and what advice he could give to translation buyers planning to invest in MT.

KantanMT: Can you tell me a little about yourself and how you got involved in the industry?

Maxim Khalilov: It was a long and exciting journey. Many years ago, I graduated from the Technical University in Russia with a major in computer science and economics. After graduating, I worked as a researcher for a couple of years in the sustainable energy field. But even then I knew I still wanted to come back to the IT industry.

In 2005, I started a PhD at Universitat Politècnica de Catalunya (UPC) with a focus on Statistical Machine Translation, which was a very new topic back then. By 2009, after successfully defending my thesis, I moved to Amsterdam where I worked as a post-doctoral researcher at the University of Amsterdam and later as an R&D manager at TAUS.

Since February 2014, I’ve been a team lead at bmmt GmbH, which is a German LSP with a strong focus on machine translation.

I think my previous experience helped me to develop a deep understanding of the MT industry from both academic and technical perspectives.  It also gave me a combination of research and management experience in industry and academia, which I am applying by building a successful MT business at bmmt.

KMT: As a successful entrepreneur, what were the three greatest industry challenges you faced this year?

MK: This year has been a challenging one for us from both technical and management perspectives. We started to build an MT infrastructure around MOSES practically from scratch. MOSES was developed by academia and for academic use, and because of this we immediately noticed that many industrial challenges had not yet been addressed by MOSES developers.

The first challenge we faced was that the standard solution does not offer a solid tag processing mechanism – we had to invest in customizing the MOSES code to make it compatible with what we wanted to achieve.

The second challenge we faced was the one many players in the MT market are constantly talking about: the lack of reliable, quick and cheap quality evaluation metrics. BLEU-like scores unfortunately are not always applicable for real world projects. Even if they are useful when comparing different iterations of the same engines, they are not useful for cross-language or cross-client comparison.

Interestingly, the third problem is psychological in nature: post-editors are not always happy to post-edit MT output for many reasons, including of course the quality of MT. However, in many situations the problem is that MT post-editing requires a different skillset in comparison with ‘normal’ translation, and it will take time before translators adapt fully to post-editing tasks.

KMT: Do you believe MT has a say in the future, and what is your view on its development in global markets?

MK: Of course, MT will have a big say in the future of language services. We can see now that the MT market is expanding quickly as more and more companies are adopting a combined TM-MT-PE framework as their primary localization solution.

“At the same time, users should not forget that MT has its clear niche”

I don’t think a machine will ever be able to translate poetry, for example, but at the same time it does not need to – MT has proved to be more than useful for the translation of technical documentation, marketing material and other content which represents more than 90% of translators’ daily workload worldwide.

Looking at the near future, I see that the integration of MT and other cross-language technologies with Big Data technologies will open new horizons for Big Data, making it a really global technology.

KMT: How has MT affected or changed your business models?

MK: Our business model is built around MT; it allows us to deliver translations to our customers quicker and cheaper than without MT, while at the same time preserving the same level of quality and guaranteeing data security. We not only position MT as a competitive advantage when it comes to translation, but also as a base technology for future services. My personal belief, which is shared by other bmmt employees, is that MT is a key technology that will make our world different – a world where translation is available on demand, when and where consumers need it, at a fair price and at its expected quality.

KMT: What advice can you give to translation buyers, interested in machine translation?

MK: MT is still a relatively new technology, but at the same time there are already a number of best practices available for new and existing players in the MT market. In my opinion, the four key points for translation buyers to remember when thinking about adopting machine translation are:

  1. Don’t mix it up with TM – While TMs mostly support human translators by storing previously translated segments, MT translates complete sentences automatically. The main difference lies in new words and phrases, which are not stored in a TM database.
  2. There is more than one way to use MT – MT is flexible: it can be a productivity tool that enables translators to deliver translations faster with the same quality as in the standard translation framework. Or MT can be used for ‘gisting’ without post-editing at all – something that many translation buyers forget about, but which can be useful in many business scenarios. A good example of this type of scenario is the integration of MT into chat widgets for real-time translation.
  3. Don’t worry about quality – Quality Assurance is always included in the translation pipeline and we, like many other LSPs, guarantee a desired level of quality for all translations, regardless of how the translations were produced.
  4. Think about time and cost – MT enables quicker and cheaper translation delivery than is possible without it.

A big ‘thank you’ to Maxim for taking time out of his busy schedule to take part in this interview, and we look forward to hearing more from Maxim during the KantanMT/bmmt joint webinar ‘5 Challenges of Scaling Localization Workflows for the 21st Century’ on Thursday November 20th (4pm GMT, 5pm CET and 8am PST).

Register here for the webinar or to receive a copy of the recording. If you have any questions about the services offered by either bmmt or KantanMT, please contact:

Peggy Linder, bmmt (peggy.lindner@bmmt.eu)

Louise Irwin, KantanMT (louisei@kantanmt.com)

5 Reasons to Read the TAUS Review

Earlier this month, TAUS, a well-known industry think tank and resource centre for the language services industry, launched its quarterly publication, the TAUS Review. The new magazine with a mission is dedicated to:

“Making translation technology more prominent and mainstream throughout the globe to break language barriers and improve worldwide communication.”

KantanMT identified five key reasons that make the review an invaluable asset to any translation and localization professional. It’s thanks to these reasons that KantanMT will distribute the TAUS Review right here on the KantanMT blog.

1. Global Translation Industry news 

TAUS has mobilized writers from across the globe (Africa, the Americas, Asia and Europe) to discuss different trends and technologies in the language services industry. These articles can become a great reference tool for those interested in how language technologies are advancing. In this issue, Andrew Joscelyne reports from Europe, Brian McConnell gives updates from the Americas, Asian trends are covered by Mike Tian-Jian Jiang, and Amlaku Eshetie reports from the southern hemisphere: Africa.

2. Research and Reports 

Recent research in MT is pretty exciting stuff. Those who consider themselves language industry veterans, like Luigi Muzii, remember a time when machine translation predictions were overestimated. But what was once an unrealistic assumption is now changing as “neural networks and big data” are bringing a new frontier to natural language processing. Luigi Muzii gives an overview of the ‘research perspective’, highlighting current trends in research and linking to some interesting ACL-winning papers, which introduce MT decoders that do not need linguistic resources.

3. Unique Insights

TAUS Review offers unique insights into the translation industry by incorporating use cases and perspectives from four different personas: the researcher, the journalist, the translator and the language expert, each with their own views and opinions on the importance of global communication and breaking down language barriers. In this issue, Jost Zetzsche, Nicholas Ostler, Lane Greene, and Luigi Muzii share their perspectives.

KantanMT especially enjoyed Jost Zetzsche’s view of making “machine translation translator-centric”, where the translator is at the centre of the MT workflow. One of the examples he lists for making this possible, “dynamic improvements in MT systems”, is available to KantanMT clients.

4. Language Technology Community 

The opinions and thoughts that come from each contributor are neatly wrapped in one accessible place and, when coupled with the directory of distributors, events and webinars, make a very useful resource for any small business or language technology enthusiast. Keep an eye out for some very interesting post-editing and MT quality webinars planned for November.

5. It’s Free! 

Holding true to the concept of sharing information and making translation technology more prominent and mainstream throughout the globe, the review is available quarterly and completely free for its readers, making it accessible to anyone, anywhere regardless of their budget.

Scroll to the end of the page to find the TAUS Review on the KantanMT blog.

TAUS CEO, Jaap van der Meer talks to KantanMT

The translation industry has experienced a great shift in the past number of years, and not many can say they haven’t been affected. The movement to automate translation processes, driven by a remarkable increase in the demand for accessible multilingual content and price pressures on localization professionals can be seen at every level of the translation industry.

TAUS (Translation Automation User Society), a translation industry ‘think tank’, was founded in 2004 as a result of a roundtable held at the Localization World Conference in Seattle, at which some of the biggest IT companies in the world, including Oracle, IBM and Cisco, sat down to discuss the topic of automation and to explore ideas for supporting the movement and those affected by it.

Jaap van der Meer, Founder and CEO of TAUS, talked to KantanMT about the evolution of one of the industry’s most well-known resource centres and the rapidly increasing developments in translation technology. He also shared his opinions and thoughts about the translation profession, which he sees as having no escape from this global move to automation.

For Jaap, TAUS began as an ideology; he wanted to “help the world communicate better and create bigger opportunities for the translation sector”. He notes that the translation sector differs from other industries in that most industries have developed shared approaches, best practices and common metrics to support themselves and others working within them. The lack of these, he says, has created a “huge barrier to efficiency and innovation” in the translation industry, and when we remove these barriers “we create a much bigger opportunity for each individual player in the industry”.

TAUS is synonymous with automated translation, and in particular with machine translation. Yet, while Jaap would suggest that this is only one piece of the puzzle, he does believe that in time “every company that operates internationally will have to start using it.”

Machine translation has experienced incredible growth in recent years, both in terms of technological innovation and wide industry adoption. Indeed, Jaap believes that “the investment that goes into improving MT technology and integrating MT and post-editing into translation workflows will be the one thing that has the biggest effect on the industry” over the next few years. He stresses, however, that this investment needs to feed an entire ecosystem, because MT does not stand alone. “You can’t just dump a machine translation system into an existing environment. You need to change and innovate the whole environment. There’s a lot of evaluation and metrics involved and widespread training needed.”

Another technology that he sees developing in line with machine translation is speech translation, and the convergence of both technologies. Those attending the TAUS annual conference in Vancouver in October will learn more about this as it is the conference theme. So will TAUS offer similar resources for speech translation as with text translation? Well, Jaap admits that although TAUS always tries to be “ahead of the curve”, the process of building such an extensive repository of speech corpora might be too demanding for an industry body of TAUS’ size. The solution? Jaap says they will need to “collaborate with other industry groups and also at a government level” in order to grow in this area.

So, as TAUS continues to expand its services and move into new areas, Jaap’s role begins to grow and diversify. What keeps him driven in his pursuit of language as a utility? “It’s just because I believe in it, if it were just for business, I’d probably do something else.” It is a nice thought, knowing that there are people working to progress an industry and ease the path for all stakeholders involved.

TAUS Translation Technology Showcase – KantanMT

Last week, KantanMT’s Founder and Chief Architect, Tony O’Dowd, took part in the TAUS Translation Technology Showcase series, where he presented two of KantanMT’s most innovative technologies: BuildAnalytics™ and KantanAnalytics™. During the webinar, Tony discussed how Project Managers are using KantanAnalytics to cost, schedule and scope projects involving Machine Translation, and how MT developers are using BuildAnalytics to rapidly improve the performance of their KantanMT engines.

Watch the recording to learn more…

Interested in booking a live personal demo of KantanMT technologies? Get in touch with Gina Lawlor, Customer Relationship Manager, who will happily set this up for you (ginal@kantanmt.com).

KantanMT – 2013 Year in Review

KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations by developing and refining cutting edge technologies that make Machine Translation easier to understand and use.

Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.

Strong Customer Focus…

The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.

KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.

The KantanMT Community…

The KantanMT members’ community now includes top tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3,400 registered members in December, and in response to this growth, KantanMT introduced two partner programs, with the objective of improving the Machine Translation ecosystem.

These are the Developer Partner Program, which supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program, dedicated to strengthening the use of MT technology in the global translation supply chain. KantanMT’s Preferred Suppliers of MT are:

KantanMT’s Progress…

To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approx. 7,000 customized KantanMT engines that translated more than 500 million words.

As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.

KantanMT’s Core Technologies from 2013…

KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.

  • KantanAnalytics™ – segment-level Quality Estimation (QE) analysis, expressed as a percentage ‘fuzzy match’ score on KantanMT translations, providing a straightforward method for costing and scheduling translation projects.
  • BuildAnalytics™ – QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment-level percentage score on a sample of the uploaded training data.
  • KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
  • TotalRecall™ – combines TM and MT technology: TM matches with a ‘fuzzy match’ score of less than 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
  • KantanISR™ – Instant Segment Retraining technology that allows members near-instantaneous correction and retraining of their KantanMT engines.
  • PEX Rule Editor – an advanced pattern matching technology that allows members to correct repetitive errors, making for a smoother post-editing process by reducing post-editing effort, cost and time.
  • Kantan API – critical for the development of software connectors and smooth integration of KantanMT into existing translation workflows. The success of the MemoQ connector led to the development of subsequent connectors for MemSource and XTM.
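
The TotalRecall™ rule described in the list above can be illustrated with a minimal sketch. This is not KantanMT's implementation: the function name route_segment, the shape of the TM lookup result and the mt_translate callback are all hypothetical, and only the 85% threshold comes from the description.

```python
from typing import Callable, Optional, Tuple

# Threshold taken from the TotalRecall description; everything else is assumed.
FUZZY_THRESHOLD = 85.0  # percent


def route_segment(
    source: str,
    tm_match: Optional[Tuple[str, float]],
    mt_translate: Callable[[str], str],
    threshold: float = FUZZY_THRESHOLD,
) -> Tuple[str, str]:
    """Return (translation, origin) for one source segment.

    tm_match is a hypothetical TM lookup result: (target_text, fuzzy_score),
    or None when the TM has no candidate at all.
    """
    if tm_match is not None:
        target, score = tm_match
        # Matches at or above the threshold are kept from the TM.
        if score >= threshold:
            return target, "TM"
    # Matches below the threshold (or missing matches) go to the MT engine.
    return mt_translate(source), "MT"
```

With a stand-in engine such as lambda s: "mt:" + s, a segment with a 90% TM match keeps its TM target, while one with an 84.9% match is sent to the engine.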

KantanMT sourced and cleaned a range of bi-directional, domain-specific stock engines that consist of approx. six million words across legal, medical and financial domains and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian during 2013.

Recognition as Business Innovators…

KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd was presented with the ICT Commercialization award in September.

In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.

KantanMT were silver sponsors at the 2013 annual ASLIB Conference, themed ‘Translating and the Computer’, which took place in London in November. In October, Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.

KantanMT have recently published a white paper introducing their cornerstone Quality Estimation technology, KantanAnalytics, and explaining how this technology provides solutions to the biggest industry challenges facing widespread adoption of Machine Translation.

For more information on how to introduce Machine Translation into your translation workflow contact Niamh Lacy (niamhl@kantanmt.com).

Overcome Challenges of building High Quality MT Engines with Sparse Data

Many of us involved with Machine Translation are familiar with the importance of using high quality parallel data to build and customize good quality MT engines. Building high quality MT engines with sparse data is a challenge faced not only by Language Service Providers (LSPs), but by any company with limited bilingual resources. A more economical alternative to creating large quantities of high quality bilingual data can be found by adding monolingual data in the target language to an MT engine.

Statistical Machine Translation systems use algorithms to find the most probable translations, based on how often patterns occur in the training data, so it makes sense to use large volumes of bilingual training data. The best data to use for training MT engines is usually high quality bilingual data and glossaries, so it’s great if you have access to these language assets.

But what happens when access to high quality parallel data is limited?

Bilingual data is costly and time-consuming to produce in large volumes, so the smart option is to come up with more economical language assets, and monolingual data is one of those economical assets. Using monolingual data to train an engine can dramatically improve MT output fluency, especially in cases where good quality bilingual data is sparse.

More economical…

Many companies lack the necessary resources to develop their own high quality in-domain parallel data. But monolingual data is readily available in large volumes across different domains. This target language content can be found anywhere: websites, blogs, customers and even company-specific documents created for internal use.

Companies with sparse parallel data can really leverage their available language assets with monolingual data to produce better quality engines, producing more fluent output. Even those with access to large volumes of bilingual data can still take advantage of using monolingual data to improve target language fluency.

Target language monolingual data is introduced during the engine training process so the engine learns how to generate fluent output. The positive effects of including monolingual data in the training process have been proven both academically and commercially.  In a study for TAUS, Natalia Korchagina confirmed that using monolingual data when training SMT engines considerably improved the BLEU score for a Russian-French translation system.

Natalia’s study not only “proved the rule” that in-domain monolingual data improves engine quality, but also identified that out-of-domain monolingual data improves quality, though to a lesser extent.

Monolingual data can be particularly useful for improving scores in morphologically rich languages like Czech, Finnish, German and Slovak, as these languages are often syntactically more complicated for Machine Translation.
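
To illustrate why target language monolingual data helps fluency, here is a minimal sketch of a bigram language model with add-one smoothing. It is a toy under stated assumptions, not the Moses or KantanMT training pipeline: the function names and example sentences are invented for illustration, and production SMT systems use much larger n-gram models with better smoothing. The idea is that a model trained only on target language text gives a higher score to word orders it has seen, so the decoder can prefer the more fluent candidate.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their contexts in target-language-only text."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed log-probability of a candidate translation."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


# Hypothetical monolingual training text and two candidate outputs.
model = train_bigram_lm(["the engine was trained", "the engine was deployed"])
fluent = log_prob("the engine was trained", model)
scrambled = log_prob("trained was engine the", model)
```

In this toy setup the candidate in natural word order receives the higher score, which is the effect the monolingual data is meant to have on engine output.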

Success with Monolingual Data…

KantanMT has had considerable success with its clients using monolingual data to improve their engines’ quality. An engine trained with sparse bilingual data (the sparse bilingual data was still greater than the amount of data in Korchagina’s study) in the financial domain showed a significant improvement in the engine’s overall quality metrics when financial monolingual data was added to the engine:

  • BLEU score showed approx. 40% improvement
  • F-Measure score showed approx. 12% improvement
  • TER (Translation Error Rate), where a lower score is better, saw a reduction of approx. 50%
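For readers unfamiliar with these metrics, the sketch below shows simplified versions of two of them. It assumes F-Measure is the harmonic mean of word-level precision and recall, and it approximates TER as word-level edit distance divided by reference length, leaving out the block shifts that full TER allows. This illustrates the general idea only; the example sentences are invented and the code is not KantanMT's scoring implementation.

```python
from collections import Counter


def f_measure(hypothesis, reference):
    """Harmonic mean of unigram precision and recall (higher is better)."""
    hyp, ref = hypothesis.split(), reference.split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def simple_ter(hypothesis, reference):
    """Word-level edit distance over reference length (lower is better).

    Simplification: full TER also counts block shifts, omitted here.
    """
    hyp, ref = hypothesis.split(), reference.split()
    # previous[j] holds the edits needed to turn the hyp words seen so far into ref[:j].
    previous = list(range(len(ref) + 1))
    for i, h_word in enumerate(hyp, start=1):
        current = [i]
        for j, r_word in enumerate(ref, start=1):
            cost = 0 if h_word == r_word else 1
            current.append(min(previous[j] + 1,          # delete h_word
                               current[j - 1] + 1,       # insert r_word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1] / max(len(ref), 1)


reference = "the bank approved the loan today"
mt_output = "the bank approved the loan"
print(round(f_measure(mt_output, reference), 3))   # 0.909
print(round(simple_ter(mt_output, reference), 3))  # 0.167
```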

The support team at KantanMT showed the client how to use monolingual data to their advantage, getting the most out of their engine, and empowering the client to improve and control the accuracy and fluency of their engines.

How will this Benefit LSPs…

Online shopping by users of what can be considered ‘lower density languages’ or languages with limited bilingual resources is driving demand for multilingual website localization. Online shoppers prefer to make purchases in their own language, and more people are going online to shop as global internet capabilities improve. Companies with an online presence and limited language resources are turning to LSPs to produce this multilingual content.

Most LSPs with access to vast amounts of high quality parallel data can still take advantage of monolingual data to help improve target language fluency. But LSPs building and training MT engines for uncommon language pairs or any language pair with sparse bilingual data will benefit the most by using monolingual data.

To learn more about leveraging monolingual data to train your KantanMT engine, send the KantanMT Team an email (info@kantanmt.com) and we can talk you through the process. Alternatively, check out our whitepaper on improving MT engine quality, available from our resources page.

 

 

Motivate Post-Editors

Post-editing is a necessary step in the Machine Translation workflow, but the role is still largely misunderstood. Language Service Providers (LSPs) are now experimenting more with the best practices for post-editing in the workflow. The lack of consistent training and the reluctance within the industry to accept the importance of the role are linked to post-editors’ motivation. KantanMT looks at some of the more conventional attitudes towards motivation and their application to post-editing.

What is motivation and what studies have been done so far?

Understanding the concept of motivation has been a hot topic in many areas of organisation theory. Studies in the area really began to kick off with their application in the workplace, opening doors for pioneers to understand how employees could be motivated to do more work, and do better work.

Motivation Pioneers

  • Abraham Maslow’s well-known ‘Hierarchy of Needs’ indicates that a person’s motivations are based on their position in the hierarchy pyramid.
  • Frederick Herzberg’s ‘Two-Factor Theory’, also known as the motivation-hygiene theory, suggests that job satisfiers such as professional acknowledgement, achievement and work responsibility have a positive effect on motivation.
  • Douglas McGregor used a black and white approach to motivation in his ‘Theory X and Theory Y’. He grouped employees into two categories: those who will only do the minimum and those who will push themselves.

As development of theories continued…

  • John Adair came up with the ‘fifty-fifty theory’. According to the fifty-fifty theory, motivation is fifty percent the responsibility of the employee and fifty percent outside the employee’s control.

Even more recently, in 2010

  • Teresa Amabile and Steven Kramer carried out a study on the motivation levels of employees in a variety of settings. Their findings suggest ‘Progress’ is the top performance motivator, identified from an analysis of approx. 12,000 diary entries and daily ratings of motivation and emotions from hundreds of study participants.

To understand post-editor motivation, we can combine the top performance motivator, progress, with the fifty-fifty theory.

Progress is a healthy motivator in the post-editing profession. It can help Localization Project Managers understand and encourage post-editor satisfaction and motivation. But while progress can be deemed an external factor, if we apply Adair’s ‘fifty-fifty’ rule, post-editors are also at least fifty percent responsible for their own motivation.

Post-editing as a profession is still finding its feet. TAUS carried out a study in 2010 on the post-editing practices of global LSPs. The study showed that, while post-editing is becoming a standard activity in the translation workflow, it only accounts for a minor share of LSP business volume. This suggests that post-editors may see their role as one of lesser importance because the industry views it as a role of lesser importance.

This attitude in the industry is highlighted by the lack of industry standards for post-editing best practices. Without evaluation practices to train post-editors and improve the post-editing process, post-editors are not making progress. This quite naturally is demotivating for the post-editor.

How to motivate post-editors

The first step in motivating post-editors is to recognise their role as distinct from the role of a translator. The best post-editors are those who are at least bilingual with some form of linguistic training, like a translator. Linguistic training is a major asset for editing Machine Translated output.

TAUS offers a comparison of the translation process and the post-editing process, highlighting the differences between the two.

Figure: Translation process of a Translator (TAUS 2010)
Figure: Translation process of a Post-editor (TAUS 2010)

One process is not more complicated than the other, only different. Translators translate internally, while post-editors make “snap editing decisions” based on client requirements. As LSPs recognise these differences, they can successfully motivate their post-editors by providing them with the most suitable support and work environment.

Progress as a Motivator

Translators make good post-editors. They have the linguistic ability to understand both the source and target texts, and if they enjoy editing or proof-reading, the post-editing role will suit them. The right training is also important: if post-editors are trained properly, they will become more aware of potential improvements to the workflow.

These improvements or ideas can be a great boost to post-editor motivation. If they are implemented, the post-editor can take on more responsibility, which helps improve the translation workflow. For example, if the post-editor is made responsible for updating the language assets used to retrain a Machine Translation system, they can take ownership of the output quality rather than just post-editing Machine Translation output in isolation.

Fixing repetitive errors can be frustrating for anyone, not just post-editors. But if they are responsible for the output quality, understand the system and can control the rules used to reduce these repetitive errors, they will experience motivation through progress.

This is only the tip of the iceberg on what motivates post-editors. Each post-editor is different, and how they feel about the role, whether it is just ‘another job’ or a major step in their career, plays a part. The key is to provide proper training and foster an environment where post-editors can make progress by positively contributing to the role.

Translators often take pride in and ownership of their translations. Post-editors should also have the opportunity to take pride in their work, as it is their skills and experience that bring it to ‘publishable’ or even ‘fit for purpose’ quality.

Repetitive errors like diacritic marks or capitalisation can be easily fixed using KantanMT’s Post-Editing Automation (PEX) rules. PEX rules allow repetitive errors in a Machine Translation engine to be easily fixed using a ‘find and replace’ tool. These rules can be checked on a sample of the text by using the PEX Rule Editor.

The post-editor can correct repetitive errors during the post-editing process, so the same errors don’t appear in future MT output, giving them responsibility over the Machine Translation engine’s quality.
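The ‘find and replace’ idea behind rules like these can be sketched in a few lines. The rule format below, a list of search patterns and replacements, is a hypothetical stand-in and not KantanMT’s actual PEX syntax; the `preview` helper loosely mirrors the idea of checking rules on a sample of text before using them.

```python
import re

# Hypothetical rules as (search pattern, replacement); not real PEX syntax.
RULES = [
    (r"\bkantanmt\b", "KantanMT"),  # capitalisation of a product name
    (r"\bcafe\b", "café"),          # missing diacritic
    (r"\s+([,.;:!?])", r"\1"),      # stray space before punctuation
]


def apply_rules(segment, rules=RULES):
    """Apply each find-and-replace rule, in order, to one MT output segment."""
    for pattern, replacement in rules:
        segment = re.sub(pattern, replacement, segment)
    return segment


def preview(sample_segments, rules=RULES):
    """Return (before, after) pairs for the sample segments a rule set changes."""
    changed = []
    for segment in sample_segments:
        fixed = apply_rules(segment, rules)
        if fixed != segment:
            changed.append((segment, fixed))
    return changed


print(apply_rules("The cafe uses kantanmt ."))  # The café uses KantanMT.
```

Because each rule is written once and applied to every segment, a correction made by one post-editor carries over to all later output, which is where the sense of progress comes from.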

Training Data

Building a KantanMT Engine: Training Data

When the decision is made to incorporate a KantanMT engine into a translation model, the next obvious, and most difficult, question to answer is what to use to train the engine. This is often followed by: what are the optimum training data requirements to yield a highly productive engine? And how will I curate my training data?

The engine’s target domain and objectives should be clearly mapped out ahead of the build. If the documents are for a specific client or domain then the relevant in-domain training data should be used to build the engine. This also ensures the best possible translation results.

KantanMT recommends a minimum of 2 million training words for each domain specific engine. Higher quantities of in-domain “unique words” will also improve the potential for building an “intelligent” engine.
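A quick way to check a data set against this guideline is to count its total and unique words before uploading. The sketch below is a rough check only: it assumes words are runs of letters or digits, which may differ from how KantanMT counts training words, and the function name and sample segments are invented for illustration.

```python
import re

RECOMMENDED_MIN_WORDS = 2_000_000  # guideline quoted in the text above


def corpus_stats(segments):
    """Return total and unique word counts for an iterable of text segments."""
    total = 0
    unique = set()
    for segment in segments:
        # Assumption: a "word" is any run of letters, digits or underscores.
        words = re.findall(r"\w+", segment.lower())
        total += len(words)
        unique.update(words)
    return {
        "total_words": total,
        "unique_words": len(unique),
        "meets_minimum": total >= RECOMMENDED_MIN_WORDS,
    }


stats = corpus_stats(["The bank approved the loan.", "The loan was repaid."])
print(stats["total_words"], stats["unique_words"], stats["meets_minimum"])
```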

The quality of the engine is based on the language or translation assets used to build the engine. Studies by TAUS have shown quality is more important than quantity. “Intelligently selected training data” generated higher BLEU scores than an engine built with more generic data. The studies also indicated that a proactive approach to customising or adapting the engine with translation assets led to better quality results.

Translation assets are the best source of suitable training data for building KantanMT engines. They include:

Stock Training Data: KantanMT stock engines are collections of highly cleansed bilingual training data sets. Quality is ensured as each data set shows the source corpora and approximate number of words used to create each stock engine. These can be added to client data to produce much larger and more powerful engines. There are over a hundred different stock engines to choose from, including industry specific sets such as IT, Legal, Medical and Finance.

Stock engines are a good starting point if you have limited TMX (Translation Memory Exchange) files in the required domain, or if you would simply like to build bigger KantanMT engines.

Translation Memory Files: These are the best source of high quality training data since both source and target texts are aligned. Translation Memories used for previous translations in a similar domain will also have been verified for quality. This means the engine’s quality will be representative of the Translation Memory quality. As the old expression goes, “garbage in, garbage out”; conversely, good quality Translation Memory files will yield a good quality Machine Translation engine. The TMX file format is the optimal format for use with KantanMT; however, text files can also be used.

Monolingual Translated Text Files: Monolingual text files are used to create language models for a KantanMT engine. Language models are used for word and phrase selection and have a direct impact on the fluency and recall of KantanMT engines. Translated monolingual training data should be uploaded alongside bilingual training data when building KantanMT engines.

Glossary Files: Terminology or glossary files can also be used as training material. Including a glossary improves terminology consistency and translation quality. Terminology files are uploaded with your ‘files to be translated’ and should be in TBX file format.

KantanISR™: Instant segment retraining technology allows users to input edited segments via the KantanISR editor. The segments then become training data and are stored in the KantanISR cache. The new segments are incorporated into the engine, avoiding the need to rebuild. As corrected data is included, the engine will improve in quality, becoming an even more powerful and productive KantanMT engine.

Figure: KantanISR editor
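Since TMX files are the recommended format for Translation Memory data, it can be useful to inspect the aligned segments before uploading them. The sketch below reads source and target pairs from TMX markup using Python’s standard library. It assumes a simple TMX 1.4-style file with plain `seg` text; the sample content and the function name are invented, and it says nothing about how KantanMT itself ingests TMX files.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Interest rates rose.</seg></tuv>
      <tuv xml:lang="fr"><seg>Les taux d'intérêt ont augmenté.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Yield (source, target) segment pairs from TMX markup.

    Language codes are matched exactly, in lowercase (e.g. "en", "fr").
    """
    root = ET.fromstring(tmx_text)
    for unit in root.iter("tu"):
        segments = {}
        for variant in unit.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (variant.get(XML_LANG) or variant.get("lang") or "").lower()
            seg = variant.find("seg")
            if seg is not None:
                segments[lang] = "".join(seg.itertext()).strip()
        # Skip units that are missing either side of the language pair.
        if source_lang in segments and target_lang in segments:
            yield segments[source_lang], segments[target_lang]


for source, target in read_tmx_pairs(SAMPLE_TMX, "en", "fr"):
    print(source, "|", target)
```

A check like this makes it easy to spot empty or one-sided translation units, which is one practical way of applying the “garbage in, garbage out” principle before an engine is built.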

Building your KantanMT engine can be a very rewarding process. While some time is needed to gather the best data for a domain specific engine, there are many ways to enhance your engine that require little effort.

For more information about preparing training data or engine re-training, please contact Kevin McCoy, KantanMT Success Coach.

tcworld special

The tcworld Conference & tekom Fair starts tomorrow, November 6th in Wiesbaden, Germany. Aidan Collins, KantanMT’s User Engagement Manager will be visiting the conference on Thursday and is looking forward to seeing you all there. To help keep you organised, KantanMT put together a list of professional and expert presentations and workshops relevant for localization professionals. Expert speakers will cover topics on content strategy and design, terminology management, translation, localization, and quality assurance.
Figure: Rhein-Main-Hallen convention centre, Wiesbaden, Germany. Source: GCB German Convention Bureau e. V., 2011.

The fair opens at 9am on Wednesday and finishes at 4pm on Friday in the Rhein-Main-Hallen, Wiesbaden’s biggest convention centre. The centre has more than 20,000m² of conference space, seven exhibition halls and a number of conference and congress rooms. The largest congress hall has the capacity to seat 3,000 people. The exhibition’s size and central location, just half a kilometre from the city centre, make it an excellent option for hosting the technical communication event.

The line-up includes:

Wednesday November 6th 9:00 – 18:00

Content Strategy 08:45 – 09:30: ‘Strategic Video Storytelling’. The Content Wrangler’s Scott Abel, will give the opening keynote speech, a presentation on Content Strategies and the importance of using stories in video production.

International Management 08:45 – 10:30: ‘A Business Model Generation Session’. Diego Bartolome will host a design thinking workshop on technology and languages.

Content Strategy 11:15 – 12:00: ‘The Need for Speed: Preparing for New Requirements’. Content Strategy Consultant, Sarah O’Keefe, will present on the importance of developing content initiatives to improve technical communication workflows.

Language Technology 11:15 – 12:00: ‘Real-time Selection of Best Assets Based on Productivity Analysis’. Anton Voronov, Innovations Director for ABBYY Language Services, will discuss the use of productivity metrics and translator preferences in developing a pricing structure and best practices for Machine Translation deployment.

Language Technology 14:45 – 15:30: ‘Developing “Ideal” Software for Language Industry’. Julia Makoushina and Eugenia Tashkun will co-present on developing the “ideal” Language technology. They will discuss the software possibilities from both the user and developer’s perspective, and how to identify and meet user needs.

Language Technology 16:15 – 17:00: ‘Extracting Translation Relations for Human-readable Dictionaries from Bilingual Text’. Kurt Eberle, Managing Director and Co-founder of Lingenio GmbH, talks about “cross-lingual expression” through the identification and extraction of dictionary entries from source and target texts.

Content Strategy 17:15 – 18:00: ‘The Convergence Era: Translation Becomes a Utility’. TAUS Founder and Director, Jaap van der Meer, will discuss the evolution of the translation industry and what this will mean for content creators.

Thursday November 7th 9:00 – 18:00

Localization 08:30 – 08:45: ‘Welcome Session’. Don De Palma, Founder of Common Sense Advisory, will give a welcome session in Room 12B.

Localization 08:45 – 09:30: ‘Is Your Content Ready to Go Global?’ Localization and Content Strategy Consultant, Kit Brown-Hoekstra, will discuss how localized quality content can be leveraged as a competitive advantage.

Localization 09:45 – 10:30: ‘Rules of Engagement: Successful Partnerships with Translation/Localization Companies’. Aki Ito, Localization Professional, will co-present with Robin Franke, a Technical Product Communication Specialist, on the client-vendor partnership and the role each partner plays when working together.

Language Technology 11:15 – 12:00: ‘Welcome to the Cloud! Terminology as a Service’. Dr. Andrejs Vasiljevs will introduce a cloud based terminology platform and solutions for using multilingual terminological data. The talk will target both language workers and Machine Translation users.

Localization 12:15 – 13:00: ‘Machine Translation in the Mainstream: New Tools, New Gains, New Headaches’. Daniel Grasmick from Lucy Software and Services GmbH will discuss how Machine Translation can be built into the localization lifecycle.

Language Technology 16:00 – 16:45: ‘Terminology in the cloud with MemoQ and TaaS’. CEO of Kilgray Translation Technologies, Istvan Lengyel and Kilgray’s Founder, Gabor Ugray will present on TaaS CAT tool solutions and their integration with memoQ.

Localization 17:15 – 18:00: ‘Closing Session: Summaries and Lessons Learned’. Don De Palma will cover the sessions’ highlights and future localization “trends” and “innovations”.

Friday November 8th 9:00 – 16:00

Localization 08:45 – 09:30: ‘Simplified English and MT: Best Practices for Localization Content Optimization and Simplification’. Alberto Ferreira, Avira Operations, will discuss Machine Translation and automated post-editing integration into the localization workflow.

Localization 09:45 – 10:30: ‘A Unified Model for Document and Translation Quality Assurance’. Dr. Aljoscha Burchardt and Dr. Arle Lommel will address translation quality assurance (QA) with QTLaunchPad, an open source software project funded by the European Commission.

Localization 14:30 – 15:15: ‘UX and Localization: Optimal Design Practices for World-Ready Applications’. Alberto Ferreira will talk about user interface (UI) and web design localization trends based on “platform-independent design principles”. Ferreira will cover topics such as usability testing, visual text layout, cultural adaptation, internationalisation concerns, cost reduction and the time-to-market development cycle.

Localization 15:30 – 16:15: ‘The “International Persona” – Usability and Localization Communication consultant, Henrietta Hartl will discuss the use of an “international persona” in localization usability evaluation.

It will be a busy couple of days with informative presentations and workshops from industry experts and market leaders. KantanMT hopes you all enjoy the conference!

Would you like to learn how Machine Translation can increase your business opportunities? Contact Kevin McCoy, KantanMT’s Machine Translation Success Coach: kevinmcc@kantanmt.com.