Leveraging MT to Improve Productivity

Communication is one of the most important elements of business, and Machine Translation is a flexible tool that can facilitate communication in a wide variety of scenarios. Multinationals and other companies operating globally can take advantage of Machine Translation to achieve productivity gains.

This two-part blog series examines two very different examples of implementing Machine Translation. This first post looks at what multinational organizations should consider before introducing Machine Translation to their business, and the second post will discuss the productivity gains and competitive advantages that can be achieved by Language Service Providers (LSPs) who adopt MT.

What is a multinational and why should it use Machine Translation?

Multinational corporations or global businesses are organizations operating in more than one country or region. The concept of an ‘international company’ has been around for hundreds of years, going back to the trading companies established in the 1600s. Outside political agendas, their main purpose was to trade in spices and other commodities throughout Asia and Europe, exposing traders to different languages and cultures.

Hundreds of years later, global communication is commonplace as more businesses operate internationally. There are no boundaries, and companies with worldwide operations require a constant flow of multilingual communication in order to maintain relationships between global employees, customers and stakeholders.

Multinational organizations typically have two types of content: external and internal. External content is created and released to the public: corporate documents, investor information, Corporate Social Responsibility (CSR) and marketing communications. Internal content, on the other hand, is created for use within the company, usually in the form of email and chat communications, memos and other internal documents.

To Translate or Not to Translate

Organizations without an in-house translation team often outsource the translation of external content to a reputable LSP. This ensures a guaranteed level of quality for the translation, and it also makes the localization process more efficient and cost effective, because language assets in the form of translation memories can be built up over time and leveraged to offset the cost of future translations.

Internal content, however, consists mostly of communications between departments: emails, chats and information on sales and marketing activities. These are usually not translated professionally for a number of reasons:

  • Cost – the volume to be translated can make costs unmanageable
  • Confidentiality – managing sensitive information is more difficult
  • Real-time translation – emails and chat conversations generally require real-time speed

As an example, if a company is headquartered in the United States but operates in both Asia and Europe, it is very likely that more than one language is used in the company’s internal communication.

Multinational companies often select working languages that must be used for internal communications, and department managers are sometimes required to have a certain level of proficiency in the company’s designated working languages, which usually include English.

Large organizations like the United Nations also have official languages. In this case, documents are not published until a translation has been prepared in each official language.

So, what happens when an email with a client’s product specifications and sales information is sent to a group of employees who speak different languages? Some of those readers may have limited knowledge of the language being used: they can understand the communication, but are not familiar enough with the language to write a coherent response. This can result in them responding in their native language. Suddenly, a single conversation thread contains more than one language, with a greater potential for miscommunication.

Why use Machine Translation?

Multinationals with global operations often have issues with the quantity and flow of internal information between departments operating in different languages. If the corporate headquarters uses a different language than its global subsidiaries, corporate documents need to be translated into each language as the internal information moves down the organizational hierarchy.

Machine Translation can provide an instant, understandable ‘gist’ of internal information across a company operating in different languages. Used this way, MT can serve two purposes:

  • Documents that require a professional human translation are easily identified
  • Internal documents can be translated instantly so employees can get an understanding of the content

To understand internal content, employees often use a free online MT service such as Google Translate. While this is useful, it does not take into consideration any proprietary jargon or writing styles specific to the organization, and it also raises the question of confidentiality.

Challenges of MT

Many organizations may be interested in deploying their own MT systems rather than outsourcing translation jobs or asking bilingual employees to do ad hoc translations. Those considering MT have two options: develop their own in-house system or use a cloud-based subscription model.

Implementing any new process has challenges and MT is no exception. Some challenges traditionally associated with implementing MT systems are:

  • High costs
  • Complex technology
  • Long deployment times

How should an MT system be integrated?

Before going ahead with an MT solution, an organization needs to carefully consider what it hopes to achieve from implementing Machine Translation. The company should evaluate the perceived benefits thoroughly and manage expectations about using Machine Translation.

Organizations thinking of implementing MT should ask:

  • What is its purpose? – Will MT be used as a management tool to improve internal communication and productivity, or to make decisions on what documents require professional outside translation? The purpose should be clearly defined at the outset.
  • Do we have enough language assets to build high quality engines? Bilingual language assets are a key ingredient for building MT engines. The quality of the training data will have a direct impact on the MT engine’s output: “garbage in, garbage out”.
  • Should we invest in building our own system or buy a cloud-based subscription service? MT systems can be rule-based (RBMT), statistical (SMT) or hybrid. In-house development of a proprietary MT system requires a heavy technology, HR and training investment, unless those assets are readily available. Cloud-based subscription models do not require such a heavy initial investment and are often more cost effective than developing and managing an in-house MT system.
  • Is the Machine Translation option scalable? How many language combinations will be needed? If each language pair requires its own unique engine, how simple is it to build additional engines with new language combinations? Scalability will be determined by translating capacity and the ability to add new language combinations; this is especially important when entering different language markets or expanding the business to new regions. The MT solution should align itself with the company’s long term goals.
  • How will MT be integrated into everyday workflows? Users need to be able to easily access translation functions through their existing applications, like email or the company intranet system, to make MT accessible and viable.
  • What indirect costs and planning will be involved? RBMT and hybrid systems require qualified linguists or language experts to develop and manage the engines. SMT systems use algorithms to identify probable translations based on how frequently they occur in the training data, so storage capacity is essential for the large volumes of training data required. Cloud options eliminate the need for in-house technology investment, but extra costs might be incurred for going over the subscription plan, similar to exceeding the minutes allowance on a mobile phone plan.
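To make the idea of frequency-based translation concrete, here is a deliberately simplified sketch in Python. It is not how any particular SMT product works: real systems combine phrase tables, language models and much larger corpora. The function names and the tiny English-French sample data are assumptions made up for illustration only.

```python
from collections import Counter, defaultdict


def translation_probabilities(aligned_pairs):
    """Estimate P(target | source) by relative frequency.

    aligned_pairs is an iterable of (source, target) tuples, standing in
    for aligned bilingual training data (a hypothetical, simplified input).
    """
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1

    probs = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        probs[source] = {t: c / total for t, c in targets.items()}
    return probs


def most_probable(probs, source):
    """Return the most frequent translation seen for source, or None."""
    candidates = probs.get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Illustrative sample data (invented for this sketch).
pairs = [
    ("invoice", "facture"),
    ("invoice", "facture"),
    ("invoice", "note"),
    ("meeting", "réunion"),
]
probs = translation_probabilities(pairs)
print(most_probable(probs, "invoice"))
```

The more training data is available for a term, the more reliable these estimates become, which is why the volume and quality of bilingual assets matter so much.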

In carefully answering these questions, any organization planning to implement MT can stay focused on using the most cost-effective solution and achieve productivity gains with less miscommunication and more time savings.

The next part of this blog will look at how LSPs can leverage Machine Translation technology for productivity gains and competitive advantage.

Translation Technology Conferences and Events for 2014

2014 has arrived – and there is no better way to get the ball rolling than by planning what events to attend. Over the next twelve months there is a vast selection of conferences, unconferences, workshops, roundtables, webinars and other events planned around the world.

It was hard to narrow down the list of everything going on, so KantanMT focused on events related to Machine Translation and the Natural Language Processing (NLP) industry, localization, translation technologies and post-editing. Some of the events are more academic, while others are more business-oriented.

Unconferences and Conferences…

We added some ‘unconferences’ to the list. Unlike more formal conferences, unconferences are peer-to-peer interactions on topics chosen by participants at the beginning of a session. Because participants choose the topics, it is much easier to promote an open discussion, and unconferences are a good way for industry professionals to get together in an informal setting and share their own challenges and solutions.

Localization World, one of the biggest industry conferences, has had a great response from holding unconferences alongside its traditional conferences, and the Association of Language Companies (ALC) also endorses the value of unconferences. The next ALC unconference will be held in the early part of February.

Hopefully, this list will be a useful resource in deciding what events and conferences to visit during 2014. You may have registered for some of these events already; if not, now is the time to start filling in your calendar. If you know of a relevant conference or event we missed, please add it to the comment section at the bottom of this post.

2014 Listings

January

Jan 8, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – XTRF and Kilgray’s memoQ

Tomasz Mróz, XTRF Operations Director, will present usage scenarios on integrating XTRF technology into translation workflows, TM integration and faster project turnaround times. István Lengyel, CEO of Kilgray, will also present memoQ, a cloud-based translation technology platform for translation management.


Jan 9, 2014

Webinar: TAUS Dynamic Quality Framework Users Call

The users call is a bi-monthly webinar where TAUS members discuss solutions for measuring Machine Translation quality. Participants include Autodesk, CA Technologies, Cisco, Dell, Digital Linguistics, eBay, EMC and Google. To register for the webinar, members can email memberservices@taus.net.


Jan 15, 2014

Webinar: The Convergence Era: Translation as A Utility (The Content Wrangler, TAUS)

This webinar, hosted by BrightTalk, is a discussion by Jaap van der Meer (TAUS) and Scott Abel (The Content Wrangler) on how translation has become a necessary part of everyday life, in the same way that electricity, water and the internet have become indispensable.


Jan 16, 2014

Meeting/Webinar: L20n: Next Generation Localization Framework for the Web, The International Multilingual Computing User Group (IMUG), San José, California USA

Zbigniew Braniecki, Software Engineer, Mozilla Corporation will speak about L20n, a new localization framework that isolates localization and enables translators to give naturally expressive translations for even the most complex user interfaces. Mozilla is investing in moving its products – Firefox, Firefox OS, and Firefox for Android – to this new architecture.


Jan 23, 2014

Unconference: Localization Unconference, Achievers head office, Toronto, Canada

This unconference is an all-day event starting at 9:30 am and will cover internationalization and localization topics. It is organized by Jenny Reid, Localization Project Manager, BlackBerry; Oleksandr Pysaryuk, Localization Manager, Achievers; and Richard Sikes, Principal Consultant, Localization Flow Technologies.


Jan 30, 2014 (11:00 EST/17:00 CET)

Webinar: Integrating Your Content Platform, Globalization and Localization Association

Anders Holt, European Director and Robert Timms, Technical Director at translate plus will present a webinar on integrating content platforms (CMS, DMS, PIM or e-procurement systems) into the translation workflow. They will discuss the integration methods available and how to get the best results and benefits from integration.


Jan 30-31, 2014

Conference: 2014 CRITT – WCRE Conference, Translation in transition: between cognition, computing and technology, Copenhagen Business School (CBS), Frederiksberg, Denmark

This academic conference presents research from the Centre for Research and Innovation in Translation and Translation Technology (CRITT). The program covers a variety of topics, including translation and cognitive processes, translation and translation theory, and observations about Machine Translation, translation and post-editing.


February

Feb 5, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – Ontram and Across Language Server v6

Christian Weih, Chief Sales Officer from Across Systems presents a TMS platform that integrates all aspects of the translation workflow.


Feb 6-8, 2014

Unconference: ALC Unconference, Association of Language Companies (ALC), Palm Beach Gardens, Florida USA

The Unconference is geared towards language company owners and senior members of staff who get together without any formal presentation structure for more intimate brainstorming and discussion sessions in a casual and relaxed environment.


Feb 6, 2014 (11:00 EST/17:00 CET)

Webinar: Maximizing Translation Efficiency: Best QA Practices for Large Multi-channel Publishing Projects

Jose Sermeno, Product Evangelist at MadCap Software and Peter Argondizzo, Translation and Localization PM at MadTranslations discuss QA best practices that will make projects more efficient.


Feb 24-26, 2014

Conference: ‘Localization in a Shifting Global Economy’, Localization World, Bangkok, Thailand

The first of three Localization World conferences of 2014, Localization World is the leading conference for international business, translation and localization providing opportunities for networking and information exchange.


Feb 26-28, 2014

Conference, workshops: ICC (Intelligent Content Conference) 2014, San José, California USA

ICC focuses on the creation and management of content in different languages on any device. Topics will include content strategy, content marketing, content engineering, structured content, ebooks, mobile, apps, adaptive content, automated translation, terminology management, big data and analytics.


Feb 27, 2014 (11:00 EST/17:00 CET)

Webinar: GALA Translation Project Management with memoQ Server Training session

Daniel Zielinski will explain how the memoQ server can be used for managing translation projects effectively. See the different types of projects and workflows supported, and learn how to set up, prepare, monitor and complete a translation project with the memoQ server.


Feb 27 – Mar 1, 2014

Conference: memoQfest Americas, Kilgray Translation Technologies, Los Angeles, California USA

This three-day event is hosted by Kilgray Translation Technologies and is aimed at freelance language professionals, LSPs and corporate translation users. The conference gives an overview of translation technology and how it can be integrated into businesses.


March

Mar 3-6, 2014

Conference: WritersUA, the conference for Software User Assistance, Palm Springs, California USA

This conference is for those involved in creating user assistance content. There will be a variety of presentations focused on developing content strategies, key technologies and tools that are used to create well-designed interfaces, technical communications and support information.


Mar 5, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – Safaba and KantanMT

The theme of this webinar is the application and influence of MT technologies on global business. Tony O’Dowd, Founder and Chief Architect, presents the KantanMT.com cloud-based platform, introducing some of the KantanMT technologies and usage cases, including KantanWatch, KantanISR, KantanAnalytics, TotalRecall, PEX and GENTRY.

Udi Hershkovich, Vice President of Business Development at Safaba will discuss key business imperatives for businesses and how Enterprise MT removes the language barriers that face global businesses.


Mar 13-14, 2014

Conference: International Conference on Translation and Accessibility in Video Games and Virtual Worlds at Universitat Autònoma de Barcelona, Spain

The conference is a meeting point for academics, professionals and students involved in the game localization industry. The conference aims to foster the interdisciplinary debate in these fields, combine them as academic areas of research and contribute to the development of best practices.


Mar 17-21, 2014

Conference: Game Localization Summit at GDC, IGDA Game Localization SIG, San Francisco, California USA

The Game Localization Summit at GDC is supported and organized by the IGDA Game Localization SIG. It aims to help localization professionals, as well as the entire community of game developers and publishers, understand how to plan and execute game localization and culturalization as part of the development cycle. There are other GDC conferences planned for Europe and China later in the year.


Mar 23-26, 2014

Conference: GALA 2014, Globalization and Localization Association (GALA), Istanbul, Turkey

The annual GALA conference brings together localization industry professionals for networking opportunities and peer-to-peer learning of the latest technologies and emerging trends in localization, language and translation technology.


Mar 28-29, 2014

Conference: The Translation and Localization Conference, Localize.pl, TexteM, KOMTE, Warsaw, Poland

This is an annual international event focusing on the latest technologies and localization industry trends. The conference is suited to LSPs and freelance translators, and covers technical communication and its implications for the translation industry. Topics include big data vs. the translation industry; CAT tools, MT, cloud computing, project management and the human factor; and recruitment and training.


April

Apr 2, 2014 (17:00-18:00 CET)

Webinar: Translation Technology Showcase, TAUS – tauyou and Pangeanic

Diego Bartolome, CEO of tauyou, will discuss the ‘Big Data’ approach to SMT and the importance of clean data for output quality.


Apr 10-11, 2014

Event: TAUS Executive Forum, Oracle Japan, Tokyo, Japan

The executive forum consists of two days of meetings for buyers and providers of language services and technologies. It is an open exchange about language business innovation and translation technology with the theme ‘translation as a utility’. Topics to be covered include translation data, MT showcases, DQF evaluation, translation customer support and integration with CRM systems.


Apr 13-15, 2014

Conference: MadWorld 2014, MadCap Software, Inc., San Diego, California USA

Designed to cater for technical writers, documentation managers and content strategists, this is the top conference for technical communication and content strategy.


Apr 25, 2014

Conference: TCeurope Colloquium, Conseil des Rédacteurs Techniques, Aix-en-Provence, France

Conference themes include the essential core skills of a technical communicator, accessibility and usability, technical communication and social media, multi-authoring and international teamwork, and training technical authors in the internet age.


Apr 26-30, 2014

Conference: EACL-2014, European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden

The conference is available to all ACL members and covers research in computational linguistics, psycholinguistics, speech, information retrieval, multimodal language processing and language issues in emerging domains such as bioinformatics and social media. Workshops and tutorials are held on Saturday and Sunday, April 26-27, while the main conference runs from Monday to Wednesday, April 28-30.


May

May 7, 2014 (17:00-18:00 CET)

Webinar: Translation Technology Showcase, TAUS – TaaS and Interverbum

TaaS and Interverbum present in this month’s Translation Technology Showcase by TAUS.


May 7-9, 2014

Conference: memoQfest International, Kilgray Translation Technologies, Budapest, Hungary

This conference aims to set up a forum where companies, LSPs and translators can discuss workflows and best practices that relate to memoQ or translation technology in general. Attendees will discuss industry trends, attend workshops and exchange information with translators, LSPs and translation end users.


May 7-8, 2014

Workshop: Making the Multilingual Web Work, MultilingualWeb, Madrid, Spain

The workshop is supported by the LIDER project and aims to survey and share information about best practices and standards for promoting multilingualism on the web.


May 8-9, 2014

Conference: Intelligent Content – Life Sciences and Healthcare, the Rockley Group, the Content Wrangler, San Francisco, California USA

The event will showcase examples, standards, methods, strategies and tools needed to help pharmaceutical companies, medical device manufacturers, and healthcare firms deliver the right information, in the right language, on any device. Conference topics include; mhealth, ehealth, digital health, personalized healthcare content and advanced translation technologies.


May 17-18, 2014

Conference: UTIC 2014, Ukrainian Translation Industry Conference, Kiev, Ukraine

Translators, managers, educators and software developers get together for networking opportunities and to discuss future industry trends.


May 18-21, 2014

Conference: Technical Communication Summit 2014, Society for Technical Communication, Phoenix, Arizona USA

The Technical Communication Summit is a source of learning for professional technical communicators giving training on the latest communication techniques, publishing technologies and business trends in the industry.


May 18-21, 2014

Conference: ALC 2014 Annual Conference, Association of Language Companies, Palm Springs, California USA

This conference is a networking event for anyone doing business with LSPs, combining educational content and networking.


May 23, 2014

Roundtable: TAUS Translation Automation Roundtable, TAUS, Moscow, Russia

Hosted by ABBYY Language Services, this roundtable is a meeting for buyers and providers of translation services. Participants will get a good insight into MT technology, customization, implementation requirements and business cases.


May 26-31, 2014

Conference: LREC 2014, the European Language Resources Association, Reykjavík, Iceland

LREC is focused on Language Resources (LRs) and Evaluation for Language Technologies (LTs). The aim of LREC is to give an overview of LRs and LTs and emerging trends, and to promote the exchange of information.


June

Jun 2-3, 2014

Event: TAUS Industry Leaders Forum 2014, Clontarf Castle Hotel, Dublin, Ireland

The theme for this meeting is ‘convergence’ with industry leaders discussing best practices, possible common approaches and shared services to optimize translation efficiencies through a series of short presentations.


Jun 3-4, 2014

Workshop: Localization Project Management Certification – The Localization Institute, Clarion Hotel, Dublin, Ireland

As part of the LPM Certification Program, this two-day project management training workshop will be held alongside Localization World. There is an eight-week self-study part that must be completed before the workshop. It is open to Localization Project Managers with at least three years’ project management experience. Early bird and group registration discounts are available.


Jun 4-6, 2014

Conference: Localization World Dublin, Localization World Ltd., Dublin, Ireland

The second Localization World conference of 2014 will be held in Dublin with the theme of “disruptive innovation” and how this impacts the localization industry and the role of translators. Topics covered at the conference will include advanced localization management, global business, localization core competencies and technology.


Jun 5-6, 2014

Conference: UA Europe 2014, UA Europe, Kraków, Poland

In association with WritersUA, the UA Europe technical communication conference focuses on software user assistance and online Help, and provides information on the latest industry trends, technical developments and best practice in software UA.


Jun 16-18, 2014

Conference: EAMT 2014, European Association for Machine Translation, Dubrovnik, Croatia – 17th Annual Conference of the European Association for Machine Translation

The conference is aimed at anyone interested in MT and translation-related tools and resources. Topics will include MT in multilingual public service (eGovernment etc.), MT for the web, MT embedded in other services, MT evaluation techniques and evaluation results, and more.


August

Aug 23-29, 2014

Conference: COLING 2014, International Committee for Computational Linguistics, Dublin, Ireland

The biennial COLING conference is one of the premier Natural Language Processing conferences in the world. The conference will include full papers, oral presentations, poster presentations, demonstrations, tutorials and workshops on a variety of technical areas in natural language and computation.


September

Sep 25-26, 2014

Workshop: IATIS Regional Workshop, Translator and Interpreter Training, Serbia

This workshop is aimed at promoting translator training, and will address training in areas such as field/domain specialization, technical skills (including pre-/post-editing of MT), revision skills and management skills (soft skills).


October

Oct 4-5, 2014

Conference: MedTranslate 2014, GxP Language Services, Freiburg im Breisgau, Germany


Oct 6-7, 2014

Workshop: Localization Project Management Certification, the Localization Institute, Seattle, Washington USA

As part of the LPM Certification Program, this two-day project management training workshop will be held alongside Localization World.


Oct 19, 2014

Unconference: Localization World Unconference, Seattle

The agenda will be set in the first session, followed by 3-4 break-out sessions on topics the group chooses together. Attendees can submit topics for consideration at VistaTEC’s booth from Wednesday, October 17th.


Oct 27-28, 2014

Conference: TAUS User Conference, TAUS, Vancouver, Canada

The TAUS Annual Conference 2014 will be co-located with the Localization World Conference taking place in the Convention Centre, Vancouver, BC, Canada.


Oct 29-31, 2014

Conference: Localization World Vancouver, Localization World Ltd., Vancouver, Canada

Localization World provides an opportunity for the exchange of information in the language and translation services and technologies market.


November

Nov 3-5, 2014

Conference: 38th Internationalization & Unicode Conference (IUC38), Object Management Group, Santa Clara, California USA

The conference is for internationalization experts, tools vendors, software implementers, and business and program managers who want to discuss the best methods for doing business in international markets. The conference will feature subject areas; cloud computing, upgrading to HTML5, integrating with social networking software, and implementing mobile apps.


Nov 5-8, 2014

Conference: 55th ATA Conference, American Translators Association, Sheraton Hotel Chicago, Illinois USA

A networking event for translators, project managers and industry professionals. The aim of the conference is to promote the professional development of translators and interpreters.


Nov 11-13, 2014

Conference: tcworld – tekom, Stuttgart, Germany

The technical communication conference and trade fair examines different aspects of localization, internationalization and globalization. It is the largest technical communication, authoring and IT management conference in the world, and participating companies offer products, software and services for technical communication.


December

Dec 8-12, 2014

Conference: IEEE GLOBECOM, Austin, Texas USA

The conference is run by the IEEE Communications Society, the second largest of the 38 IEEE societies, and will focus on the latest advancements in broadband, wireless, multimedia, internet, image and voice communications.


Dec 15-18, 2014

Conference: IEEE CloudCom 2014, Nanyang Avenue, Singapore

CloudCom promotes cloud computing platforms. It is co-sponsored by the Institute of Electrical and Electronics Engineers (IEEE) and the Cloud Computing Association. The conference attracts researchers, developers, users, students and practitioners from the fields of big data, systems architecture, services research, virtualization, security and privacy and high performance computing.

KantanMT looks forward to meeting you at some of these conferences over the next year.

KantanMT – 2013 Year in Review

KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations by developing and refining cutting-edge technologies that make Machine Translation easier to understand and use.

Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.

Strong Customer Focus…

The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.

KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.

The KantanMT Community…

The KantanMT members’ community now includes top tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3,400 registered members in December. In response to this growth, KantanMT introduced two partner programs with the objective of improving the Machine Translation ecosystem.

The Developer Partner Program supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program is dedicated to strengthening the use of MT technology in the global translation supply chain.

KantanMT’s Progress…

To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approx. 7,000 customized KantanMT engines, which have translated more than 500 million words.

As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.

KantanMT’s Core Technologies from 2013…

KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.

  • KantanAnalytics™ – segment-level Quality Estimation (QE) analysis, expressed as a percentage ‘fuzzy match’ score on KantanMT translations, which provides a straightforward method for costing and scheduling translation projects.
  • BuildAnalytics™ – a QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment-level percentage score on a sample of the uploaded training data.
  • KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
  • TotalRecall™ – combines TM and MT technology. TM matches with a ‘fuzzy match’ score of less than 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
  • KantanISR™ – Instant Segment Retraining technology that allows members near-instantaneous correction and retraining of their KantanMT engines.
  • PEX Rule Editor – an advanced pattern-matching technology that allows members to correct repetitive errors, which makes post-editing smoother by reducing post-editing effort, cost and time.
  • Kantan API – critical for the development of software connectors and smooth integration of KantanMT into existing translation workflows. The success of the MemoQ connector led to the development of subsequent connectors for MemSource and XTM.
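
As a rough illustration of the TotalRecall™ idea described in the list above, the sketch below routes each segment either to a TM match or to the MT engine, using the 85% threshold mentioned there. It is only a sketch under assumptions: `fuzzy_score` uses Python's `difflib` as a stand-in for a real fuzzy match metric, and `translate_segment`, the `tm` dictionary and the `mt_engine` callable are hypothetical names, not part of the KantanMT API.

```python
from difflib import SequenceMatcher

# The 85% cut-off comes from the TotalRecall description; everything else
# in this sketch is an illustrative assumption.
FUZZY_THRESHOLD = 85.0


def fuzzy_score(a, b):
    """Stand-in similarity score from 0 to 100 (real TM tools use their own metric)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def translate_segment(source, tm, mt_engine):
    """Return (translation, origin) for one source segment.

    tm is a dict of {source segment: target segment} and mt_engine is any
    callable that takes a source string and returns a translation.
    """
    best_target, best_score = None, 0.0
    for tm_source, tm_target in tm.items():
        score = fuzzy_score(source, tm_source)
        if score > best_score:
            best_target, best_score = tm_target, score
    # Good TM matches are reused; anything below the threshold goes to MT.
    if best_target is not None and best_score >= FUZZY_THRESHOLD:
        return best_target, "TM"
    return mt_engine(source), "MT"


if __name__ == "__main__":
    tm = {"Click the Save button.": "Cliquez sur le bouton Enregistrer."}
    fake_mt = lambda text: "<MT output for: " + text + ">"  # placeholder engine
    print(translate_segment("Click the Save button.", tm, fake_mt))
    print(translate_segment("Restart the server now.", tm, fake_mt))
```

In a real workflow the threshold and the similarity metric would come from the TM tool itself; the point here is only the routing decision.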

KantanMT sourced and cleaned a range of bi-directional, domain-specific stock engines that consist of approx. six million words across legal, medical and financial domains, and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian during 2013.

Recognition as Business Innovators…

KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd, was presented with the ICT Commercialization award in September.

In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.

KantanMT were silver sponsors at the annual 2013 ASLIB conference, ‘Translating and the Computer’, which took place in London in November. In October, Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.

KantanMT have recently published a white paper introducing its cornerstone Quality Estimation technology, KantanAnalytics, and how this technology provides solutions to the biggest industry challenges facing widespread adoption of Machine Translation.

KantanAnalytics WhitePaper December 2013

For more information on how to introduce Machine Translation into your translation workflow contact Niamh Lacy (niamhl@kantanmt.com).

Overcome Challenges of building High Quality MT Engines with Sparse Data


Many of us involved with Machine Translation are familiar with the importance of using high quality parallel data to build and customize good quality MT engines. Building high quality MT engines with sparse data is a challenge faced not only by Language Service Providers (LSPs), but by any company with limited bilingual resources. A more economical alternative to creating large quantities of high quality bilingual data is to add monolingual data in the target language to an MT engine.

Statistical Machine Translation systems use algorithms to find the most probable translations, based on how often patterns occur in the training data, so it makes sense to use large volumes of bilingual training data. The best data to use for training MT engines is usually high quality bilingual data and glossaries, so it’s great if you have access to these language assets.

But what happens when access to high quality parallel data is limited?

Bilingual data is costly and time-consuming to produce in large volumes, so the smart option is to come up with more economical language assets, and monolingual data is one of those assets. Using monolingual data to train an engine can dramatically improve MT output fluency, especially in cases where good quality bilingual data is sparse.

More economical…

Many companies lack the necessary resources to develop their own high quality in-domain parallel data. But monolingual data is readily available in large volumes across different domains. This target language content can be found anywhere: websites, blogs, customers and even company-specific documents created for internal use.

Companies with sparse parallel data can really leverage their available language assets with monolingual data to produce better quality engines, producing more fluent output. Even those with access to large volumes of bilingual data can still take advantage of using monolingual data to improve target language fluency.

Target language monolingual data is introduced during the engine training process so the engine learns how to generate fluent output. The positive effects of including monolingual data in the training process have been shown both academically and commercially. In a study for TAUS, Natalia Korchagina confirmed that using monolingual data when training SMT engines considerably improved the BLEU score for a Russian-French translation system.
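
To make the role of target language data more concrete, here is a minimal sketch of a bigram language model, the kind of component SMT systems generally use to judge fluency. It is a toy under stated assumptions: the add-one smoothing, the function names `train_bigram_lm` and `log_prob`, and the sample sentences are all illustrative, and this is not a description of how KantanMT engines are built internally.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count word pairs in monolingual target language sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (previous word, current word)
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the model (add-one smoothing)."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


if __name__ == "__main__":
    # Hypothetical monolingual data in the target language.
    monolingual = [
        "the annual report is available",
        "the report is available online",
        "the annual report was published",
    ]
    model = train_bigram_lm(monolingual)
    candidates = ["the annual report is available", "report the available annual is"]
    # The candidate with the higher score reads more like the monolingual data.
    print(max(candidates, key=lambda c: log_prob(c, model)))
```

The more in-domain target text the model sees, the better it can separate fluent word orders from disfluent ones, which is the intuition behind adding monolingual data when bilingual data is sparse.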

Natalia’s study not only “proved the rule” that in-domain monolingual data improves engine quality, but also identified that out-of-domain monolingual data improves quality, although to a lesser extent.

Monolingual data can be particularly useful for improving scores in morphologically rich languages like Czech, Finnish, German and Slovak, as these languages are often syntactically more complicated for Machine Translation.

Success with Monolingual Data…

KantanMT has had considerable success with its clients using monolingual data to improve their engines’ quality. An engine trained with sparse bilingual data in the financial domain (although still more data than was used in Korchagina’s study) showed a significant improvement in its overall quality metrics when financial monolingual data was added to the engine:

  • BLEU score showed approx. 40% improvement
  • F-Measure score showed approx. 12% improvement
  • TER (Translation Error Rate), where a lower score is better, saw a reduction of approx. 50%
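
For readers who want a feel for what these metrics measure, the sketch below computes a word-level F-measure, a simplified word edit rate and a relative change between two scores. These are deliberately simplified assumptions for illustration: full TER also counts block shifts, BLEU is left out entirely, and the function names are hypothetical rather than the way KantanMT computes its scores.

```python
from collections import Counter


def f_measure(hypothesis, reference):
    """Harmonic mean of word-level precision and recall against one reference."""
    hyp, ref = hypothesis.split(), reference.split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def word_edit_rate(hypothesis, reference):
    """Word-level edit distance divided by reference length (lower is better).

    Simplification: real TER also allows shifts of word blocks.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # delete a hypothesis word
                           cur[j - 1] + 1,           # insert a reference word
                           prev[j - 1] + (h != r)))  # substitute or match
        prev = cur
    return prev[-1] / len(ref)


def relative_change(before, after):
    """Percentage change from one score to another."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    reference = "the quarterly results exceeded expectations"
    print(f_measure("the quarterly result exceeded expectations", reference))
    print(word_edit_rate("the quarterly result exceeded expectations", reference))
    print(relative_change(20, 28))  # a score moving from 20 to 28
```

The percentages quoted in the list above are relative changes of this kind, comparing the engine before and after the monolingual data was added.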

The support team at KantanMT showed the client how to use monolingual data to their advantage, getting the most out of their engine, and empowering the client to improve and control the accuracy and fluency of their engines.

How will this Benefit LSPs…

Online shopping by users of what can be considered ‘lower density languages’ or languages with limited bilingual resources is driving demand for multilingual website localization. Online shoppers prefer to make purchases in their own language, and more people are going online to shop as global internet capabilities improve. Companies with an online presence and limited language resources are turning to LSPs to produce this multilingual content.

Most LSPs with access to vast amounts of high quality parallel data can still take advantage of monolingual data to help improve target language fluency. But LSPs building and training MT engines for uncommon language pairs or any language pair with sparse bilingual data will benefit the most by using monolingual data.

To learn more about leveraging monolingual data to train your KantanMT engine, send the KantanMT Team an email (info@kantanmt.com) and we can talk you through the process. Alternatively, check out our whitepaper on improving MT engine quality, available from our resources page.

 

 

Motivate Post-Editors

Post-editing is a necessary step in the Machine Translation workflow, but the role is still largely misunderstood. Language Service Providers (LSPs) are now experimenting more with best practices for post-editing in the workflow. The lack of consistent training and the reluctance within the industry to accept the importance of the role are both linked to post-editors’ motivation. KantanMT looks at some of the more conventional attitudes towards motivation and their application to post-editing.

What is motivation and what studies have been done so far?

Understanding the concept of motivation has been a hot topic in many areas of organisation theory. Studies in the area really began to kick off with their application in the workplace, opening doors for pioneers to understand how employees could be motivated to do more work, and do better work.

Motivation Pioneers

  • Abraham Maslow and his well-known ‘Hierarchy of Needs’ indicates a person’s motivations are based on their position in the hierarchy pyramid.
  • Frederick Herzberg’s ‘Two-Factor Theory’, or motivation-hygiene theory, suggests that job satisfiers such as professional acknowledgement, achievements and work responsibility have a positive effect on motivation.
  • Douglas McGregor used a black and white approach to motivation in his ‘Theory X and Theory Y’. He grouped employees into two categories: those who will only do the minimum and those who will push themselves.

As development of theories continued…

  • John Adair came up with the ‘fifty-fifty theory’. According to the fifty-fifty theory, motivation is fifty percent the responsibility of the employee and fifty percent outside the employee’s control.

Even more recently, in 2010

  • Teresa Amabile and Steven Kramer carried out a study on the motivation levels of employees in a variety of settings. Their findings suggest ‘Progress’ as the top performance motivator, identified from an analysis of approx. 12,000 diary entries and daily ratings of motivation and emotions from hundreds of study participants.

To understand post-editor motivation, we can combine the top performance motivator, progress, with the fifty-fifty theory.

Progress is a healthy motivator in the post-editing profession, and it can help Localization Project Managers understand and encourage post-editor satisfaction and motivation. But while progress can be deemed an external factor, if we apply Adair’s ‘fifty-fifty’ rule, post-editors are also at least fifty percent responsible for their own motivation.

Post-editing as a profession is still only finding its feet. TAUS carried out a study in 2010 on the post-editing practices of global LSPs. The study showed that, while post-editing is becoming a standard activity in the translation workflow, it only accounts for a minor share of LSP business volume. This suggests that post-editors may see their role as one of lesser importance because the industry views it as a role of lesser importance.

This attitude in the industry is highlighted by the lack of industry standards for post-editing best practices. Without evaluation practices to train post-editors and improve the post-editing process, post-editors are not making progress. This quite naturally is demotivating for the post-editor.

How to motivate post-editors

The first step in motivating post-editors is to recognise their role as distinct from the role of a translator. The best post-editors are those who are at least bilingual with some form of linguistic training, like a translator. Linguistic training is a major asset for editing Machine Translated output.

TAUS offer a comparison of the translation process and the post-editing process that highlights the differences between the two.

Figure: Translation process of a Translator (TAUS 2010)
Figure: Translation process of a Post-editor (TAUS 2010)

One process is not more complicated than the other, only different. Translators translate internally, while post-editors make “snap editing decisions” based on client requirements. As LSPs recognise these differences, they can successfully motivate their post-editors by providing them with the most suitable support and work environment.

Progress as a Motivator

Translators make good post-editors. They have the linguistic ability to understand both the source and target texts, and if they enjoy editing or proof-reading, then the post-editing role will suit them. The right training is also important: if post-editors are trained properly, they will become more aware of potential improvements to the workflow.

These improvements or ideas can be a great boost to post-editor motivation. If they are implemented, the post-editor can take on more responsibility, which helps improve the translation workflow. For example, if the post-editor is made responsible for updating the language assets used to retrain a Machine Translation system, they can take ownership of the output quality rather than just post-editing Machine Translation output in isolation.

Fixing repetitive errors can be frustrating for anyone, not just post-editors. But if they are responsible for the output quality, understand the system and can control the rules used to reduce these repetitive errors, they will experience motivation through progress.

This is only the tip of the iceberg on what motivates post-editors. Each post-editor is different, and how they feel about the role, whether it is just ‘another job’ or a major step in their career, plays a part. The key is to provide proper training and foster an environment where post-editors can make progress by positively contributing to the role.

Translators often take pride in and ownership of their translations. Post-editors should also have the opportunity to take pride in their work, as it is their skills and experience that bring it to ‘publishable’ or even ‘fit for purpose’ quality.

Repetitive errors like diacritic marks or capitalisation can be easily fixed using KantanMT’s Post-Editing Automation (PEX) rules. PEX rules allow repetitive errors in a Machine Translation engine to be easily fixed using a ‘find and replace’ tool. These rules can be checked on a sample of the text by using the PEX Rule Editor.

The post-editor can correct repetitive errors during the post-editing process so that the same errors don’t appear in future MT output, giving them responsibility over the Machine Translation engine’s quality.

Automatic Post-Editing

Post-Editing Machine Translation (PEMT) is an important and necessary step in the Machine Translation process. KantanMT is releasing a new, simple and easy-to-use PEX Rule Editor, which will make the post-editing process more efficient, saving time, costs and the post-editor’s sanity.

As we have discussed in earlier posts, PEMT is the process of reviewing and editing raw MT output to improve quality. The PEX rule editor is a tool that can help to save time and cut costs. It helps post-editors, since they no longer have to manually correct the same repetitive mistakes in a translated text.

Post-editing can be divided into roughly two categories: light and full post-editing. ‘Light’ post-editing, also called ‘gist’, ‘rapid’ or ‘fast’ post-editing, focuses on transferring the most correct meaning without spending time correcting grammatical and stylistic errors. Correcting textual standards, like word order and coherence, is less important in a light post-edit than in a more thorough ‘full’ or ‘conventional’ post-edit. Full post-edits need the correct meaning to be conveyed, with correct grammar, accurate punctuation, and the correct transfer of any formatting such as tags or placeholders.

The client often dictates the type of post-editing required, whether it’s a full post-edit to get the text up to ‘publishable quality’, similar to a human translation standard, or a light post-edit, which usually means ‘fit for purpose’. The engine’s quality also plays a part in the post-editing effort: using a high volume of in-domain training data during the build produces higher quality engines, which helps to cut post-editing effort. Other factors such as language combination, domain and text type all contribute to post-editing effort.

Examples of repetitive errors

Some users may experience the following errors in their MT output.

  • Capitalization
  • Punctuation mistakes, hyphenation, diacritic marks etc.
  • Words added/omitted
  • Formatting – trailing spaces

SMT engines use a process of pattern matching to identify different regular expressions. Regular expressions, or ‘regex’, are special text strings that describe patterns. These patterns need no linguistic analysis, so they can be implemented easily across different language pairs. Regular expressions are also important components in developing PEX rules. KantanMT have a list of regular expressions used for both GENTRY Rule files (*.rul) and PEX post-edit files (*.pex).

Post-Editing Automation (PEX)

Repetitive errors can be fixed automatically by uploading PEX rule files. These rule files allow post-editors to spend less time correcting the same repetitive errors by automatically applying PEX constructs to translations generated from a KantanMT engine.

PEX works by incorporating “find and replace” rules. The rules are uploaded as a PEX file and applied while a translation job is being run.
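
The “find and replace” idea can be sketched in a few lines of Python using the standard `re` module. This is an illustration only: the rule list, the `apply_rules` function and the sample sentence are hypothetical, and the actual syntax of a PEX file is not shown here. The punctuation rule is also language-specific and would not suit every target language.

```python
import re

# Hypothetical (search pattern, replacement) pairs for three of the
# repetitive error types listed earlier.
RULES = [
    (r"[ \t]+$", ""),               # formatting: strip trailing spaces
    (r"\bkantanmt\b", "KantanMT"),  # capitalization of a product name
    (r"[ \t]+([,.])", r"\1"),       # punctuation: no space before , or .
]


def apply_rules(text, rules=RULES):
    """Apply each find-and-replace rule in order and return the corrected text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    raw_mt_output = "Welcome to kantanmt .  "
    print(repr(apply_rules(raw_mt_output)))  # 'Welcome to KantanMT.'
```

Because the rules are applied in order, an earlier rule can change what a later rule sees, so rule order is worth checking on a sample before running a full job.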

PEX Rule Editor

KantanMT have designed a simple way to create, test and upload post-editing rules to a client profile.


The PEX Rule Editor, located in the ‘MykantanMT’ menu, has an easy-to-use interface. Users can copy a sample of the translated text into the upper ‘Test Content’ text box, then enter the rules to be applied in the ‘PEX Search Rules’ box and their corrections in the ‘PEX Replacement Rules’ box. The user can test the new rules by clicking ‘test rules’ and instantly identify any incorrect rules before they are uploaded to the profile.

The introduction of tools to assist in the post-editing process helps remove some of the more repetitive corrections for post-editors. The new PEX Editor feature helps improve the PEMT workflow by ensuring all uploaded rule files are correct, leading to a more effective method for fixing repetitive errors.
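
The same “test before you upload” workflow can be imitated offline. The sketch below is an assumption-laden stand-in for the editor, not its implementation: `check_rules` is a hypothetical helper that takes sample content plus parallel lists of search and replacement rules, applies them with Python's `re.subn`, and reports how often each rule matched or whether it is not a valid regular expression.

```python
import re


def check_rules(sample, search_rules, replacement_rules):
    """Try each rule on sample text; return (corrected text, per-rule report).

    Each report entry is (search pattern, status, number of replacements).
    """
    report = []
    text = sample
    for search, replacement in zip(search_rules, replacement_rules):
        try:
            text, hits = re.subn(search, replacement, text)
            report.append((search, "ok", hits))
        except re.error as exc:
            # A malformed rule is reported instead of stopping the whole check.
            report.append((search, "invalid: " + str(exc), 0))
    return text, report


if __name__ == "__main__":
    sample = "the  engine was re trained"
    search = [r" {2,}", r"re trained", r"("]   # the last rule is deliberately broken
    replace = [" ", "retrained", ""]
    corrected, report = check_rules(sample, search, replace)
    print(corrected)
    for entry in report:
        print(entry)
```

A rule that reports zero replacements on a representative sample is also worth a second look, since it may never fire on real output.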

Conference and Event Guide – December 2013

Things are winding down as we get closer to the end of the year, but there are still some great events and webinars coming up during the month of December that we can look forward to.

Here are some recommendations from KantanMT to keep you busy in the lead up to the festive season.

Listings

Dec 02 – Dec 05, 2013
Event: IEEE CloudCom 2013, Bristol, United Kingdom

Held in association with Hewlett-Packard Laboratories (HP Labs), the conference is open to researchers, developers, users, students and practitioners from the fields of big data, systems architecture, services research, virtualization, security and high performance computing.


Dec 04, 2013
Event: LANGUAGES & BUSINESS Forum – Hotel InterContinental Berlin

The forum highlights key issues in language education, particularly in the workplace, and the new technologies that are becoming a key part of the process. The event will promote international networking and has four main themes: Corporate Training, Pre-Experience Learners, Intercultural Communication and Online Learning.


Dec 05, 2013
Webinar: Effective Post-Editing in Human and Machine Translation Workflows

Stephen Doherty and Federico Gaspari of CNGL (Centre for Next Generation Localisation) will give an overview of post-editing and different post-editing scenarios, from ‘gist’ to ‘full’ post-edits. They will also give advice on different post-editing strategies and how they differ for Machine Translation systems.


Dec 07 – Dec 09, 2013
Event: 6th Language and Technology Conference, Poznan, Poland

The conference will address the challenges of Human Language Technologies (HLT) in computer science and linguistics. The event covers a wide range of topics including electronic language resources and tools, formalisation of natural languages, parsing and other forms of NL processing.


Dec 09 – Dec 13, 2013
Event: IEEE GLOBECOM 2013 – Power of Global Communications, Atlanta, Georgia USA

The conference, run by the IEEE Communications Society, which is the second largest of the 38 IEEE technical societies, will focus on the latest advancements in broadband, wireless, multimedia, internet, image and voice communications. Some of the topics referring to localization will be presented on 10th December and include Localization Schemes, Localization and Link Layer Issues, and Detection, Estimation and Localization.


Dec 10 – Dec 11, 2013
Event: Game QA & Localization 2013, San Francisco, California USA

This event brings together QA and Localisation Managers, Directors and VPs from game developers around the world to discuss key game localization industry challenges. The London event in June 2013 was a huge success, as more than 120 senior QA and localization professionals from developers, publishers and 3rd party suppliers of all sizes and platforms came to learn, benchmark and network.


Dec 11 – Dec 15, 2013
Event: International Conference on Language and Translation, Thailand, Vietnam and Cambodia

The Association of Asian Translation Industry (AATI) is holding an International Conference on Language and Translation, or “Translator Day”, in three countries: Thailand on December 11, 2013, Vietnam on December 13, 2013, and Cambodia on December 15, 2013. The events provide translators, interpreters, translation agencies, foreign language centres, NGOs, FDI-financed enterprises and other translation purchasers with opportunities to meet.


Dec 12, 2013
Webinar: LSP Partnerships & Reseller Programs 16:00 GMT (11:00 EST/17:00 CET)

This webinar, which is hosted by GALA and presented by Terena Bell, covers how to open up new revenue streams by introducing reseller programs to current business models. The webinar is aimed at world trade associations, language schools, and other non-translation companies wishing to offer their clients translation, interpreting, or localization services.


Dec 13 – Dec 14, 2013
Event: The Twelfth Workshop on Treebanks and Linguistic Theories (TLT12), Sofia, Bulgaria

The workshops, hosted by BulTreeBank Group, serve to promote new and ongoing high-quality work related to syntactically-annotated corpora such as treebanks. Treebanks are important resources for Natural Language Processing applications including Machine Translation and information extraction. The workshops will focus on different aspects of treebanking: descriptive, theoretical, formal and computational.


Are you planning to go to any events during December? KantanMT would like to hear about your thoughts on what makes a good event in the localization industry.