KantanMT – 2013 Year in Review

KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations by developing and refining cutting-edge technologies that make Machine Translation easier to understand and use.

Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.

Strong Customer Focus…

The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager; Kevin McCoy, Customer Relationship Manager and MT Success Coach; and Gina Lawlor, Customer Relationship Co-ordinator.

KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.

The KantanMT Community…

The KantanMT members’ community now includes top-tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3,400 registered members in December. In response to this growth, KantanMT introduced two partner programs with the objective of improving the Machine Translation ecosystem.

These are the Developer Partner Program, which supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program, which is dedicated to strengthening the use of MT technology in the global translation supply chain.

KantanMT’s Progress…

To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approximately 7,000 customized KantanMT engines, which have translated more than 500 million words.

As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.

KantanMT’s Core Technologies from 2013…

KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.

  • KantanAnalytics™ – segment-level Quality Estimation (QE) analysis, expressed as a percentage ‘fuzzy match’ score on KantanMT translations, which provides a straightforward method for costing and scheduling translation projects.
  • BuildAnalytics™ – QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment level percentage score on a sample of the uploaded training data.
  • KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
  • TotalRecall™ – combines TM and MT technology. TM matches with a ‘fuzzy match’ score of less than 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
  • KantanISR™ – Instant Segment Retraining technology that allows members to correct and retrain their KantanMT engines almost instantaneously.
  • PEX Rule Editor – an advanced pattern-matching technology that allows members to correct repetitive errors, which smooths the post-editing process by reducing post-editing effort, cost and time.
  • Kantan API – critical for the development of software connectors and smooth integration of KantanMT into existing translation workflows. The success of the MemoQ connector led to the development of subsequent connectors for MemSource and XTM.
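The TM-plus-MT routing described for TotalRecall™ can be illustrated with a minimal sketch. This is not KantanMT's implementation: the function names are hypothetical, the similarity measure is Python's `difflib.SequenceMatcher` standing in for a real fuzzy-match metric, and `mt_engine` is any callable supplied by the caller. Only the 85% cut-off comes from the description above.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple

FUZZY_THRESHOLD = 0.85  # the 85% cut-off mentioned in the TotalRecall description


def fuzzy_score(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; real TM tools use their own metrics."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_tm_match(source: str, tm: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Return the target of the closest TM source segment and its score."""
    best_target, best_score = None, 0.0
    for tm_source, tm_target in tm.items():
        score = fuzzy_score(source, tm_source)
        if score > best_score:
            best_target, best_score = tm_target, score
    return best_target, best_score


def translate(source: str, tm: Dict[str, str],
              mt_engine: Callable[[str], str]) -> Tuple[str, str]:
    """Use the TM match when it scores 85% or more, otherwise fall back to MT."""
    target, score = best_tm_match(source, tm)
    if target is not None and score >= FUZZY_THRESHOLD:
        return target, "TM"
    return mt_engine(source), "MT"
```

A caller would pass a dictionary of source-to-target segments as `tm` and a function that wraps the MT engine as `mt_engine`; the second element of the result records which technology produced the translation.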

KantanMT sourced and cleaned a range of bi-directional, domain-specific stock engines that consist of approx. six million words across legal, medical and financial domains, and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian during 2013.

Recognition as Business Innovators…

KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect Tony O’Dowd was presented with the ICT Commercialization award in September.

In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.

KantanMT were silver sponsors of the 2013 ASLIB conference, ‘Translating and the Computer’, which took place in London in November. In October, Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.

KantanMT have recently published a white paper introducing their cornerstone Quality Estimation technology, KantanAnalytics, and explaining how it addresses the biggest challenges facing widespread adoption of Machine Translation.

KantanAnalytics WhitePaper December 2013

For more information on how to introduce Machine Translation into your translation workflow contact Niamh Lacy (niamhl@kantanmt.com).

MT Lingo

MT technology can be overwhelming for those new to the industry, and getting to grips with the jargon can be a daunting task even for some of the most industry-savvy gurus. KantanMT put together a list of some acronyms, popular buzzwords and numeronyms (abbreviations that use numbers), so that you can keep up with the MT professionals or just brush up on your tech vocabulary.

Numeronyms:

  • L10n – Localization/Localisation is the process of adapting and translating a product or service so that it is culturally acceptable for a specific country or region.
  • I18n – Internationalization/Internationalisation is a process implemented in the planning stages of a product or application. It ensures the infrastructure (coding) suits future translations or localizations. Common Internationalization preparation for software products includes supporting international character sets like Unicode, and ensuring there is enough space in the User Interface (UI) for translated text, for example when moving from a language like English, with single-byte character codes, to the multiple-byte character codes used in Chinese and Japanese Kanji.
  • G11n – Globalization/Globalisation refers to the internationalization and localization preparations for products and services to be released in global markets. It usually incorporates ‘sim-ship’, or simultaneous shipment to different regions.
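The numeronyms above follow a simple rule: keep the first and last letters and replace everything in between with a count. A small sketch of that rule is shown here, together with a UTF-8 byte count that illustrates the single-byte versus multiple-byte point made under I18n. The function names and the sample strings are illustrative only.

```python
def numeronym(word: str) -> str:
    """First letter + number of omitted middle letters + last letter."""
    if len(word) < 4:
        return word  # too short to abbreviate usefully
    return f"{word[0]}{len(word) - 2}{word[-1]}".lower()


def utf8_length(text: str) -> int:
    """Number of bytes needed to store the text in UTF-8."""
    return len(text.encode("utf-8"))


for term in ("Localization", "Internationalization", "Globalization"):
    print(term, "->", numeronym(term))

# English letters take one byte each in UTF-8, while these Chinese
# characters take three bytes each.
print(utf8_length("Save"), utf8_length("保存"))
```

Because only the letter count matters, the British spellings (Localisation, Internationalisation, Globalisation) produce the same numeronyms.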

Acronyms:

  • MT – Machine Translation or Automated Translation is translation carried out by computer. A piece of text in a natural language such as English is translated by computer software into another language such as French. Cloud MT is Machine Translation delivered from the cloud. There are different types of MT systems available.
  • RBMT – Rule-Based Machine Translation is a system that uses a list of syntactic, grammatical and translation rules to generate the most appropriate translations.
  • SMT – Statistical Machine Translation systems are data driven and have a statistical modelling architecture using algorithms to find the most probable match between source and target segments.
  • API – Application Programming Interface is an interface that allows communication and interoperability between two applications or software programs.
  • LSP – Language Service Provider, sometimes referred to as a Localization Service Provider, is a service provider that carries out the translation and localization of different types of content for specific countries or locales.
  • TM – Translation Memory is a database of aligned source and target translations called segments. Segments can be words, sentences or paragraphs. TMs can be integrated with CAT tools and they help speed up the translation process. TM files can be used as training data to train SMT engines.
  • SPE – Statistical Post-editing is when Machine Translation output that has been post-edited is re-used as training data and fed back into the SMT engine or used to train a new engine.

Popular Buzzwords:

  • Normalization is the checking and cleaning up of a Translation Memory so it can be included as training data for an SMT engine. Things to identify and correct are tags, mistranslations, sentence mismatches and stylistic features like upper and lower case inconsistencies.
  • CAT tools – Computer-Aided Translation tools/ Computer-assisted Translation tools are used by humans to support the translation process by managing MT, TM and glossaries.
  • Glossaries are vocabulary lists of specialised terminology, usually specific to an industry or organisation. These files can be uploaded as additional training data to an SMT engine.
  • Bilingual corpus/ Bi-text database is a large text document with source and target languages. If the corpus is aligned it can be used as training data for an SMT engine.
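The Normalization step described above can be sketched as a small clean-up pass over aligned segment pairs. This is a simplified illustration, not a KantanMT tool: the function names are hypothetical, tags are assumed to look like `<...>` markup, and a crude character-length ratio (with an assumed limit of 3.0) stands in for real sentence-mismatch detection. Mistranslations cannot be caught this way and still need human review.

```python
import re
from typing import Iterable, List, Tuple

TAG_RE = re.compile(r"<[^>]+>")   # assumed tag format: anything in angle brackets
SPACE_RE = re.compile(r"\s+")


def clean_segment(text: str) -> str:
    """Strip inline tags and collapse runs of whitespace."""
    text = TAG_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def normalize_tm(pairs: Iterable[Tuple[str, str]],
                 max_length_ratio: float = 3.0) -> List[Tuple[str, str]]:
    """Return cleaned (source, target) pairs suitable for use as training data."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = clean_segment(source), clean_segment(target)
        if not source or not target:
            continue  # one side is empty after cleaning
        ratio = max(len(source), len(target)) / min(len(source), len(target))
        if ratio > max_length_ratio:
            continue  # lengths differ so much that a mismatch is likely
        if (source, target) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned
```

The length-ratio limit would need tuning per language pair, since character counts differ widely between, for example, English and Chinese.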

If you have come across any new terms or interesting words in the language and localization industry, KantanMT would love to hear about them. Just pop them into the comment box below.

A Truly Global Internet


The internet became truly multilingual yesterday, as the Internet Corporation for Assigned Names and Numbers (ICANN) announced the release of four new generic top-level domains (gTLDs). These new gTLDs are internet domain suffixes written in language-specific scripts, and the four new suffixes represent some of the world’s most widely spoken languages. Their selection for release by ICANN was a strategic decision.

After Latin script, Chinese is the second most widely used script, with approx. 1,340 million users; Arabic holds the number three position with 380 million users; and Cyrillic is number five, used by approx. 250 million people. The four domain names released yesterday are:

  1. 游戏 (game) – Chinese
  2. شبكة (web) – Arabic
  3. Онлайн (online) – Cyrillic
  4. Сайт (site) – Cyrillic

The president of ICANN’s Generic Domains Division, Akram Atallah, indicated this was just the start of a “global society” coming together. The purpose of the New Generic Top-Level Domain Program is to create a “globally-inclusive Internet”, improving ecommerce and internet globalisation.

Ripples will be felt in the localization industry, with increased demand for real-time translation of user generated content (UGC). Translation technologies are constantly being developed, adapted to markets and fine-tuned. A leading example of this is the development of Machine Translation, and these improvements are best seen in the quality assessment (QA) of Machine Translation output.

Machine Translation quality has been subjected to scrutiny for decades. This is also changing. Commercial use of Machine Translation is growing, especially in certain industries. Computational capabilities and the availability of vast amounts of multilingual and monolingual training data have played a significant role in the adoption rate of Machine Translation in both the public and private sectors.

Next week, KantanMT will release a technology that addresses the challenge of Machine Translation quality estimation (QE). KantanAnalytics is a revolutionary product that carries out quality analysis at segment level.

Increased demand for real-time, high-quality translated content will be seen in the near future as internationalised domain names (IDNs) bring people and communities together. This is one of the first steps in expanding the current 22 English-language-dominated domain names by a further 1,400 new multilingual names.

IDNs are domain names registered in non-Latin scripts or other non-ASCII characters, such as Chinese. IDNs are already available as second-level domains and as country code top-level domains (ccTLDs) tied to specific countries. For example, in Ireland a ccTLD will end in “.ie”. These are different from gTLDs, which belong to a core group of restricted domain names such as .com, .net and .org.
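Behind the scenes, the DNS still works with ASCII, so an IDN label is converted to an ASCII-compatible form that starts with `xn--` before it is looked up. The sketch below shows that round trip using Python's built-in `idna` codec. Two caveats: the helper function names are made up for this example, and the built-in codec implements the older IDNA 2003 rules, so registries and browsers that follow newer rules may treat some edge cases differently.

```python
def to_ascii_label(label: str) -> str:
    """Convert a Unicode domain label to its ASCII-compatible (xn--) form."""
    return label.encode("idna").decode("ascii")


def to_unicode_label(ascii_label: str) -> str:
    """Convert an ASCII-compatible label back to Unicode."""
    return ascii_label.encode("ascii").decode("idna")


for label in ("游戏", "сайт", "онлайн", "com"):
    ascii_form = to_ascii_label(label)
    # Plain ASCII labels such as "com" pass through unchanged.
    print(label, "->", ascii_form, "->", to_unicode_label(ascii_form))
```

The codec also lowercases labels as part of its preparation step, which is why the Cyrillic examples here are written in lower case.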

Watch out for the KantanAnalytics release next week. KantanMT are continuing to offer a 14-day free trial to new members.

Many Languages, One World: Student Essay Contest

The United Nations (UN) are big promoters of multilingualism and this week is no exception. The UN Academic Impact (UNAI) and ELS Educational Services launched a student essay contest to promote international education and multilingualism. Entrants should submit an essay written in one of the six official languages of the UN (Arabic, Chinese, English, French, Russian and Spanish), provided it is not their native tongue.

The theme of the contest, “Many Languages, One World”, focuses on multilingualism in a globalised world and supports communication between all global citizens. The UN is a global organisation that understands the challenges in making hefty volumes of content available in different languages.

The number of countries where each official UN language is spoken

In 2001, Kofi Annan, UN Secretary-General at the time, suggested there was a linguistic imbalance with the UN having a tendency towards English. The reasons behind the imbalance boiled down to high translation costs and a lack of resources.

UN official languages by number of speakers
Source: Ethnologue Languages of the World (SIL International, 2013)

Ten years later, in 2011, the World Intellectual Property Organization (WIPO), in collaboration with the UN, trained its Moses-based Machine Translation engine using approx. 11 years of translated UN documents (2000 – 2012), which were provided by the UN’s Documentation Division (DD). Tapta4Un was born: a Statistical Machine Translation (SMT) engine for professional UN translators.

The UN had at first used Google Translate and Bing Translator to translate its publicly available documents, with good results. But as data from other organisations was added to those engines, the quality of UN translated documents began to decrease.

The TAPTA engine, built with customised UN training data, produced much higher quality Machine Translation output and higher BLEU scores than Google Translate. This positive adoption of Machine Translation paved the way for ‘gText’, a global UN project tasked with integrating computer-aided translation (CAT) tools into the document workflow.

KantanMT allows users to build a customised translation engine with training data that is specific to their needs. KantanMT are continuing to offer a 14-day free trial to new members.

 

KantanMT now Supports Chinese

As the American market for translation slows down, the market in Asia continues to grow. According to the Common Sense Advisory Board, Asia makes up 12.88% of the global market share for translation services and provides a multitude of opportunities for growth.
As Asia’s biggest economy, China has an important role to play in this. Although there is widespread talk of decline within the Chinese economy, it is still by far the fastest-growing nation of the last decade.

Despite its ageing population of 1.3 billion, China is set to create an increasing amount of translation business in the coming years. The Chinese government is now focusing on creating more export opportunities for indigenous businesses that are responding to growing demand from the US and Europe. Chinese exports grew by 14.1% in December 2012 compared to one year earlier. In February 2013, ‘Bloomberg’ discussed how China has now surpassed the US as the world’s biggest trading nation.

In response to the demand for Chinese Machine Translation, KantanMT has recently introduced Chinese language capabilities on its cloud-based platform. Members can manage Chinese Machine Translation engines using the same process as for all other languages on KantanMT.com: simply upload Chinese training data, build a KantanMT engine, and then translate client files. KantanMT encourages members to use KantanWatch™ to track the quality improvements of their engines over time, helping them to significantly improve engine performance.
