Welcome to our second post in the ‘5 Questions’ series, which will give you a deeper insight into the people at KantanMT.
KantanAPI enables KantanMT clients to interact with KantanMT as an on-demand web service. It also provides a number of different services including translation, file upload and retrieval and job launches.
With the KantanAPI you not only have the opportunity to integrate KantanMT into your workflow systems but also the ability to receive on-demand translations from your KantanMT engines. All these services make the experience with Machine Translation as seamless as possible.
To access the KantanMT API you will first need your ‘API token’. This token can be found in the ‘API’ tab on the ‘My Client Profiles’ page of your KantanMT account.
Once you have your token you can use the API in a number of ways.
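As a rough illustration of how a token is typically attached to a REST call, the sketch below only builds a request and never sends it. The endpoint `BASE_URL` and the parameter names `token` and `text` are placeholders invented for this example, not the documented KantanAPI interface, so please take the real values from the API technical documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder endpoint; NOT the real KantanAPI address.
BASE_URL = "https://api.example.invalid/translate"


def build_translate_request(api_token: str, text: str) -> Request:
    """Build (but do not send) a POST request carrying the token and text."""
    # The parameter names "token" and "text" are assumptions for illustration.
    body = urlencode({"token": api_token, "text": text}).encode("utf-8")
    return Request(
        BASE_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_translate_request("YOUR_API_TOKEN", "Hello world")
print(request.get_method(), request.full_url)
```

Keeping the token in one helper like this makes it easier to swap in the real endpoint and parameter names once you have them.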
For more details on implementing your API solution via the REST interface, please see the full API technical documentation at the following link:
Log in to your KantanMT account using your email and password.
You will be directed to the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
If you wish to use the ‘KantanAPI’ with a profile other than the ‘Active’ profile, click on the profile you wish to use, then click on the ‘API’ tab.
You will be directed to the ‘API Settings’ page. Now click on the ‘Launch API’ button.
A ‘Launch API’ pop-up will now appear on your screen asking you ‘Are you sure you want to launch the API?’ Click ‘OK’.
The ‘API Status’ will now change from ‘offline’ to ‘initialising’, and the ‘Launch API’ button will change to ‘Launching API’.
When your KantanAPI launches, the ‘API Status’ will change from ‘initialising’ to ‘running’, the ‘Launching API’ button will change to ‘Shutdown API’ and you should now be able to click on the ‘Translate’ button.
Type the text you wish to translate in the text box and click on the ‘Translate’ button.
The translated text will now appear in the ‘Translated Text’ box. If you wish to make any changes to the translated text simply place the cursor inside the ‘Translated Text’ box and make the changes. Save these changes by clicking the ‘Retrain Engine’ button.
Test if your engine was successfully retrained by clicking the ‘Translate’ button. The retrained text will now appear in the ‘Translated Text’ box.
If you don’t wish to retrain your engine and you are happy with the translated text in the ‘Translated Text’ box, you may continue translating other text or shut down your KantanAPI by clicking the ‘Shutdown API’ button.
When you click the ‘Shutdown API’ button, a pop-up will appear asking you ‘Are you sure you want to shut down the API?’ Click ‘OK’.
The ‘Shutdown API’ button will now change to ‘Terminating API’, the ‘API Status’ will change from ‘running’ to ‘terminating’ and you will no longer be able to click on the ‘Translate’ or ‘Retrain Engine’ buttons.
You will now be directed back to the initial screen on the API Settings page.
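The launch and shutdown sequence described in the steps above can be summarised as a small state table. This is only a reader's sketch of the statuses and button labels named in the walkthrough. The final transition back to ‘offline’ is an assumption based on returning to the initial screen, and the function names are made up for the example.

```python
# Status transitions as described in the walkthrough.
TRANSITIONS = {
    "offline": "initialising",   # after clicking 'Launch API' and confirming
    "initialising": "running",   # the launch completes
    "running": "terminating",    # after clicking 'Shutdown API' and confirming
    "terminating": "offline",    # assumed: shutdown completes, initial screen
}

# Label shown on the launch/shutdown button in each status.
BUTTON_LABEL = {
    "offline": "Launch API",
    "initialising": "Launching API",
    "running": "Shutdown API",
    "terminating": "Terminating API",
}


def next_status(status: str) -> str:
    """Return the status that follows the given one."""
    return TRANSITIONS[status]


def can_translate(status: str) -> bool:
    """'Translate' and 'Retrain Engine' are only usable while running."""
    return status == "running"


status = "offline"
for _ in range(4):
    print(f"{status:13} button={BUTTON_LABEL[status]!r} translate={can_translate(status)}")
    status = next_status(status)
```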
KantanAPI™ is one of the various machine translation services offered by KantanMT to improve productivity for our clients and also enable them to be more efficient. For more information on KantanAPI or any KantanMT products please contact us at firstname.lastname@example.org.
For more details on the KantanMT API please see the following links and the video below:
KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations, by developing and refining cutting edge technologies that make Machine Translation easier to understand and use.
Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.
Strong Customer Focus…
The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.
KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.
The KantanMT Community…
The KantanMT members’ community now includes top tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3400 registered members in December, and in response to this growth, KantanMT introduced two partner programs, with the objective of improving the Machine Translation ecosystem.
These are the Developer Partner Program, which supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program, which is dedicated to strengthening the use of MT technology in the global translation supply chain. KantanMT’s Preferred Suppliers of MT are:
To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approx. 7,000 customized KantanMT engines that translated more than 500 million words.
As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.
KantanMT’s Core Technologies from 2013…
KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.
KantanMT sourced and cleaned a range of bi-directional domain specific stock engines that consist of approx. six million words across legal, medical and financial domains and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian Languages during 2013.
Recognition as Business Innovators…
KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd was presented with the ICT Commercialization award in September.
In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.
KantanMT were silver sponsors at the annual 2013 ASLIB Conference, ‘Translating and the Computer’, which took place in London in November, and in October Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.
KantanMT have recently published a white paper introducing its cornerstone Quality Estimation technology, KantanAnalytics, and how this technology provides solutions to the biggest industry challenges facing widespread adoption of Machine Translation.
For more information on how to introduce Machine Translation into your translation workflow contact Niamh Lacy (email@example.com).
Many of us involved with Machine Translation are familiar with the importance of using high quality parallel data to build and customize good quality MT engines. Building high quality MT engines with sparse data is a challenge faced not only by Language Service Providers (LSPs), but by any company with limited bilingual resources. A more economical alternative to creating large quantities of high quality bilingual data can be found by adding monolingual data in the target language to an MT engine.
Statistical Machine Translation systems use algorithms to find the most probable translations, based on how often patterns occur in the training data, so it makes sense to use large volumes of bilingual training data. The best data to use for training MT engines is usually high quality bilingual data and glossaries, so it’s great if you have access to these language assets.
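To make "how often patterns occur" concrete, here is a toy sketch that estimates the probability of a target phrase given a source phrase by simple relative frequency. The phrase pairs are invented, and real SMT engines combine this kind of estimate with many other features, so treat it as an illustration rather than a description of the KantanMT engine.

```python
from collections import Counter, defaultdict

# Invented toy phrase pairs (source, target), as if extracted from aligned data.
phrase_pairs = [
    ("the house", "la maison"),
    ("the house", "la maison"),
    ("the house", "le domicile"),
    ("the car", "la voiture"),
]


def translation_probabilities(pairs):
    """Estimate p(target | source) as count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return table


def most_probable(table, source):
    """Pick the target phrase with the highest estimated probability."""
    options = table[source]
    return max(options, key=options.get)


table = translation_probabilities(phrase_pairs)
print(most_probable(table, "the house"))
```

With more bilingual data the counts become more reliable, which is why large volumes of training data matter.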
But what happens when access to high quality parallel data is limited?
Bilingual data is costly and time-consuming to produce in large volumes, so the smart option is to come up with more economical language assets, and monolingual data is one of them. Using monolingual data to train an engine can improve MT output fluency dramatically, especially where good quality bilingual data is sparse.
Many companies lack the necessary resources to develop their own high quality in-domain parallel data. Monolingual data, however, is readily available in large volumes across different domains. This target language content can be found anywhere: websites, blogs, customers and even company-specific documents created for internal use.
Companies with sparse parallel data can really leverage their available language assets with monolingual data to produce better quality engines, producing more fluent output. Even those with access to large volumes of bilingual data can still take advantage of using monolingual data to improve target language fluency.
Target language monolingual data is introduced during the engine training process so the engine learns how to generate fluent output. The positive effects of including monolingual data in the training process have been proven both academically and commercially. In a study for TAUS, Natalia Korchagina confirmed that using monolingual data when training SMT engines considerably improved the BLEU score for a Russian-French translation system.
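In typical SMT systems, target language monolingual text is used to train a language model that scores candidate outputs for fluency. The sketch below is a tiny bigram model with add-one smoothing. The corpus and sentences are invented, and it is a simplified stand-in, not the model KantanMT actually uses.

```python
import math
from collections import Counter

# Invented target-language monolingual sentences.
corpus = [
    "the bank approved the loan",
    "the bank approved the request",
    "the loan was approved",
]


def tokens(sentence):
    """Split on whitespace and add sentence boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


unigrams = Counter()  # counts of each token used as a bigram context
bigrams = Counter()   # counts of adjacent token pairs
for sentence in corpus:
    toks = tokens(sentence)
    unigrams.update(toks[:-1])
    bigrams.update(zip(toks, toks[1:]))

vocab_size = len({tok for s in corpus for tok in tokens(s)})


def log_prob(sentence):
    """Add-one smoothed bigram log-probability of a sentence."""
    toks = tokens(sentence)
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


fluent = "the bank approved the loan"
scrambled = "loan the approved bank the"
print(log_prob(fluent) > log_prob(scrambled))
```

The word order seen in the monolingual text scores higher than the scrambled one, which is the sense in which extra target language data helps an engine prefer fluent output.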
Natalia’s study not only “proved the rule” that in-domain monolingual data improves engine quality, it also showed that out-of-domain monolingual data improves quality, but to a lesser extent.
Monolingual data can be particularly useful for improving scores in morphologically rich languages like Czech, Finnish, German and Slovak, as these languages are often syntactically more complicated for Machine Translation.
Success with Monolingual Data…
KantanMT has had considerable success with its clients using monolingual data to improve their engines’ quality. An engine trained with sparse bilingual data (the sparse bilingual data was still greater than the amount of data in Korchagina’s study) in the financial domain showed a significant improvement in the engine’s overall quality metrics when financial monolingual data was added to the engine:
The support team at KantanMT showed the client how to use monolingual data to their advantage, getting the most out of their engine, and empowering the client to improve and control the accuracy and fluency of their engines.
How will this Benefit LSPs…
Online shopping by users of what can be considered ‘lower density languages’ or languages with limited bilingual resources is driving demand for multilingual website localization. Online shoppers prefer to make purchases in their own language, and more people are going online to shop as global internet capabilities improve. Companies with an online presence and limited language resources are turning to LSPs to produce this multilingual content.
Most LSPs with access to vast amounts of high quality parallel data can still take advantage of monolingual data to help improve target language fluency. But LSPs building and training MT engines for uncommon language pairs or any language pair with sparse bilingual data will benefit the most by using monolingual data.
To learn more about leveraging monolingual data to train your KantanMT engine, send the KantanMT Team an email (firstname.lastname@example.org) and we can talk you through the process. Alternatively, check out our whitepaper on improving MT engine quality, available from our resources page.
Post-Editing Machine Translation (PEMT) is an important and necessary step in the Machine Translation process. KantanMT is releasing a new, simple and easy to use PEX rule editor, which will make the post-editing process more efficient, saving time, costs and the post-editor’s sanity.
As we have discussed in earlier posts, PEMT is the process of reviewing and editing raw MT output to improve quality. The PEX rule editor is a tool that can help to save time and cut costs. It helps post-editors, since they no longer have to manually correct the same repetitive mistakes in a translated text.
Post-editing can be divided into roughly two categories: light and full post-editing. ‘Light’ post-editing, also called ‘gist’, ‘rapid’ or ‘fast’ post-editing, focuses on transferring the most correct meaning without spending time correcting grammatical and stylistic errors. Correcting textual standards, like word order and coherence, is less important in a light post-edit than in a more thorough ‘full’ or ‘conventional’ post-edit. Full post-edits need to convey the correct meaning with correct grammar, accurate punctuation, and the correct transfer of any formatting such as tags or placeholders.
The client often dictates the type of post-editing required, whether it’s a full post-edit to bring the text up to ‘publishable quality’, similar to a human translation standard, or a light post-edit, which usually means ‘fit for purpose’. The engine’s quality also plays a part in the post-editing effort: using a high volume of in-domain training data during the build produces higher quality engines, which helps to cut post-editing effort. Other factors such as language combination, domain and text type also contribute to post-editing effort.
Some users may experience the following errors in their MT output.
SMT engines use a process of pattern matching to identify different regular expressions. Regular expressions or ‘regex’ are special text strings that describe patterns. These patterns need no linguistic analysis, so they can be implemented easily across different language pairs. Regular expressions are also important components in developing PEX rules. KantanMT have a list of regular expressions used for both GENTRY Rule files (*.rul) and PEX post-edit files (*.pex).
Repetitive errors can be fixed automatically by uploading PEX rule files. These rule files allow post-editors to spend less time correcting the same repetitive errors by automatically applying PEX constructs to translations generated from a KantanMT engine.
PEX works by incorporating “find and replace” rules. The rules are uploaded as a PEX file and applied while a translation job is being run.
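The “find and replace” idea can be sketched with ordinary regular expressions. The rules and the sample sentence below are invented for illustration, and the actual PEX file syntax is not shown here and may differ. The sketch only demonstrates how an ordered list of search and replacement patterns removes repetitive errors from raw output.

```python
import re

# Invented example rules: (search pattern, replacement), applied in order.
rules = [
    (r"\s+([,.;:!?])", r"\1"),       # remove stray space before punctuation
    (r"\b(\w+)\s+\1\b", r"\1"),      # collapse an immediately repeated word
    (r"\bKantan MT\b", "KantanMT"),  # normalise a product name
]


def apply_rules(text, rules):
    """Apply each search/replace rule to the text, one after another."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


raw = "Kantan MT makes the the process easier ."
print(apply_rules(raw, rules))
```

Because the rules run in sequence, their order matters. Testing them on a sample of translated text before uploading helps catch a rule that undoes or interferes with another.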
KantanMT have designed a simple way to create, test and upload post-editing rules to a client profile.
The PEX Rule editor, located in the ‘MykantanMT’ menu, has an easy to use interface. Users can copy a sample of the translated text into the upper text box ‘Test Content’ then input the rules to be applied in the ‘PEX Search Rules’ and their corrections to the ‘PEX Replacement Rules’ box. The user can test the new rules by clicking ‘test rules’ and instantly identify any incorrect rules, before they are uploaded to the profile.
The introduction of tools to assist in the post-editing process helps remove some of the more repetitive corrections for post-editors. The new PEX Editor feature helps improve the PEMT workflow by ensuring all uploaded rule files are correct leading to a more effective method for fixing repetitive errors.
Things are winding down as we are getting closer to the end of the year, but there are still some great events and webinars coming up during the month of December that we can look forward to.
Here are some recommendations from KantanMT to keep you busy in the lead up to the festive season.
Dec 02 – Dec 05, 2013
Event: IEEE CloudCom 2013, Bristol, United Kingdom
Held in association with Hewlett-Packard Laboratories (HP Labs), the conference is open to researchers, developers, users, students and practitioners from the fields of big data, systems architecture, services research, virtualization, security and high performance computing.
Dec 04, 2013
Event: LANGUAGES & BUSINESS Forum – Hotel InterContinental Berlin
The forum highlights key issues in language education, particularly in the workplace, and the new technologies that are becoming a key part of the process. The event will promote international networking and has four main themes: Corporate Training, Pre-Experience Learners, Intercultural Communication and Online Learning.
Dec 05, 2013
Webinar: Effective Post-Editing in Human and Machine Translation Workflows
Stephen Doherty and Federico Gaspari, CNGL (Centre for Next Generation Localisation) will give an overview of post-editing and different post-editing scenarios from ‘gist’ to ‘full’ post-edits. They will also give advice on different post-editing strategies and how they differ for Machine Translation systems.
Dec 07 – Dec 09, 2013
Event: 6th Language and Technology Conference, Poznan, Poland
The conference will address the challenges of Human Language Technologies (HLT) in computer science and linguistics. The event covers a wide range of topics including electronic language resources and tools, formalisation of natural languages, parsing and other forms of NL processing.
Dec 09 – Dec 13, 2013
Event: IEEE GLOBECOM 2013 – Power of Global Communications, Atlanta, Georgia USA
The conference, which is the second largest of the 38 IEEE technical societies, will focus on the latest advancements in broadband, wireless, multimedia, internet, image and voice communications. Topics relating to localization will be presented on 10th December and include Localization Schemes, Localization and Link Layer Issues, and Detection, Estimation and Localization.
Dec 10 – Dec 11, 2013
Event: Game QA & Localization 2013, San Francisco, California USA
This event brings together QA and Localisation Managers, Directors and VPs from game developers around the world to discuss key game localization industry challenges. The event in London, June 2013 was a huge success, as more than 120 senior QA and localization professionals from developers, publishers and 3rd party suppliers of all sizes and platforms came to learn, benchmark and network.
Dec 11 – Dec 15, 2013
Event: International Conference on Language and Translation, Thailand, Vietnam and Cambodia
The Association of Asian Translation Industry (AATI) is holding an International Conference on Language and Translation or “Translator Day” in three countries: Thailand on December 11, 2013, Vietnam on December 13, 2013, and Cambodia on December 15, 2013. The events provide translators, interpreters, translation agencies, foreign language centres, NGOs, FDI financed enterprises and other translation purchasers with opportunities to meet.
Dec 12, 2013
Webinar: LSP Partnerships & Reseller Programs 16:00 GMT (11:00 EST/17:00 CET)
This webinar, which is hosted by GALA and presented by Terena Bell, covers how to open up new revenue streams by introducing reseller programs to current business models. The webinar is aimed at world trade associations, language schools, and other non-translation companies wishing to offer their clients translation, interpreting, or localization services.
Dec 13 – Dec 14 2013
Event: The Twelfth Workshop on Treebanks and Linguistic Theories (TLT12), Sofia (Bulgaria)
The workshops, hosted by BulTreeBank Group, serve to promote new and ongoing high-quality work related to syntactically-annotated corpora such as treebanks. Treebanks are important resources for Natural Language Processing applications including Machine Translation and information extraction. The workshops will focus on different aspects of treebanking: descriptive, theoretical, formal and computational.
Are you planning to go to any events during December? KantanMT would like to hear about your thoughts on what makes a good event in the localization industry.
The 35th ASLIB conference opens today, Thursday 28th November and runs for two days in Paddington, London. The annual ‘Translating and the Computer Conference’ serves to highlight the importance of technology within the translation industry and to showcase new technologies available to localization professionals.
KantanMT was keen to have a look at how technology has shaped the translation industry throughout history so we took a look at some of the translation technology milestones over the last 50 years.
The computer has had a long history, so it’s no surprise that developments in computer technology greatly affect how we communicate. Machine Translation research dates back to the early 1940s, although its development was stalled because of negative feedback regarding the accuracy of early MT output. The ALPAC (Automatic Language Processing Advisory Committee) report, published in 1966, prompted researchers to look for alternative methods to automate the translation process.
In terms of modern development, the real evolution of ‘translation and the computer’ began in the 1970s, when more universities started carrying out research and development on automated translation. At this point, the European Coal and Steel Community in Luxembourg and the Federal Armed Forces Translation Agency in Mannheim, Germany were already making use of text related glossaries and automatic dictionaries. It was also around this time that translators started to come together to form translation companies/language service providers, who not only translated but also took on project management roles to control the entire translation process.
Translation technology research gained momentum during the early 1980s as commercial content production increased. Companies in Japan, Canada and Europe who were distributing multilingual content to their customers, now needed a more efficient translation process. At this time, translation technology companies began developing and launching Computer Assisted Translation (CAT) technology.
Dutch company INK was one of the first to release desktop translation tools for translators. These tools, originally called INK text tools, sparked more research into the area. Trados, a German translation company, started reselling INK text tools and this led to the research and development of the TED translation editor, an initial version of the translator’s workbench.
The 1990s were an exciting time for the translation industry. Translation activities that were previously kept separate from computer software development were now being carried out together in what was termed localization. The interest in localizing for new markets led to translation companies and language service providers merging both technology and translation services, becoming Localization Service Providers.
Trados launched their CAT tools in 1990 with Multiterm for terminology management, followed by the Translation Memory (TM) software Translator’s Workbench in 1994. ATRIL, in Madrid, launched a TM system in 1993 and STAR (Software, Translation, Artwork, Recording) also released Transit, a TM system, in 1994. The ‘fuzzy match’ feature was also developed at this time and quickly became a standard feature of TM.
Increasingly, translators started taking advantage of CAT tools to translate more productively. This led to downward pressure on price, making translation services more competitive.
As we move forward, technology continues to influence translation. Global internet diffusion has increased the level of global communication and has changed how we communicate. We can now communicate in real-time, on any device and through any medium. Technology will continue to develop, becoming faster and more adaptive to multi-language users, and demand for real-time translation will drive further developments in automated translation solutions.
There are some great events and webinars coming up over the next month and KantanMT put together a list of some noteworthy dates to add to the calendar.
KantanMT’s Aidan Collins, User Engagement Manager, will be attending tcworld on Thursday 7th November in Wiesbaden, Germany. Then towards the end of the month, Aidan will head to London, and present at the 35th ASLIB Translating and the Computer Conference. KantanMT are also a silver sponsor for this year’s ASLIB conference.
Nov 04 – 05, 2013
Workshop: Translation Project Management, Wiesbaden, Germany.
Angelika Zerfaß and Martin Beuster will be presenting a Translation Project Management (PM) and Localization PM workshop. This is geared towards current and future Project Managers in the localization and translation industry.
Nov 06 – 08, 2013
Event: tcworld 2013 – tekom trade fair, Rhein-Main-Hallen, Wiesbaden, Germany.
This is the largest global event for technical communication. Participating companies offer industrial, software and services for technical communication with a regional focus on Germany, Austria and Switzerland. The conference will cover topics on localization, internationalization, and globalization, management of technical communication, mobile documentation and content strategies. Contact: tekom, email@example.com
To set up a meeting with Aidan Collins, User Engagement Manager, email him directly at firstname.lastname@example.org or call him on +353 86 823 1767.
Nov 06 – 09, 2013
Event: 54th ATA Conference, San Antonio, Texas USA.
This is a great networking event for translators, project managers and industry professionals. The aim of the conference is to promote the professional development of translators and interpreters. There will be approx. 175 educational sessions in varying languages, specializations and levels. Contact: American Translators Association, email@example.com
Nov 15 – 16, 2013 (Expolingua International Fair, Nov 15 – 17)
Event: InDialog: Mapping the Field of Community Interpreting, Expolingua International Fair Berlin, Germany
This conference focuses on interpreting services and is aimed at government representatives, policy makers, service providers and anyone involved in the interpreting service workflow. InDialog is taking place in conjunction with the 26th EXPOLINGUA International Fair for Languages and Cultures. Contact: ICWE GmbH, firstname.lastname@example.org
To set up a meeting with Aidan or Niamh, email Niamhl@kantanmt.com or call her directly on +353 877526320
The internet became truly multilingual yesterday, as the Internet Corporation for Assigned Names and Numbers (ICANN), announced the release of four new generic top-level domains (gTLDs). gTLDs are internet domain names with language-specific scripts and the four new suffixes represent some of the world’s most widely spoken languages. Their selection for release by the ICANN was a strategic decision.
After Latin script, Chinese is the second most widely used script with approx. 1340 million users, Arabic holds the number three position with 380 million users, and Cyrillic is number five, used by approx. 250 million people. The four domain names released yesterday are:
The president of ICANN’s Generic Domains Division, Akram Atallah, indicated this was just the start of a “global society” coming together. The purpose of the New Generic Top Level Domain Program is to create a “globally-inclusive Internet”, improving ecommerce and internet globalisation.
Ripples will be felt in the localization industry with increased demand for real-time translation of user generated content (UGC). Translation technologies are constantly being developed, adapted to markets and fine-tuned. A leading example of this is the development of Machine Translation, and these improvements are best seen in the quality assessment (QA) of Machine Translation.
Machine Translation quality has been subjected to scrutiny for decades. This is also changing. Commercial use of Machine Translation is growing, especially in certain industries. Computational capabilities and the availability of vast amounts of multi and monolingual training data have played a significant role in the adoption rate of Machine Translation in both the public and private sectors.
Increased demand for real-time high quality translated content will be seen in the near future as internationalised domain names (IDNs) bring people and communities together. This is one of the first steps in expanding the current 22 English-language-dominated domain names by a further 1,400 new multilingual names.
IDNs are domain names registered in non-Latin scripts or non-ASCII characters, like Chinese. IDNs are already available as second-level domains and country code top-level domains (ccTLDs) tied to specific countries. For example, in Ireland a ccTLD will end in “.ie”. These are different from gTLDs, which belong to a core group of restricted domain names such as .com, .net and .org.
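Behind the scenes, non-Latin labels are carried in the domain name system in an ASCII-compatible form with an `xn--` prefix. The short sketch below uses Python's built-in `idna` codec to show the conversion in both directions. The label is just an example chosen for illustration and is not one of the newly released domains.

```python
# A non-Latin label; the label itself is only an example.
label = "例子"

# Python's built-in "idna" codec converts a Unicode label to its
# ASCII-compatible ("xn--" prefixed) form and back again.
ascii_form = label.encode("idna")
round_trip = ascii_form.decode("idna")

print(ascii_form)
print(round_trip == label)
```

Browsers and registries show users the native script, while the ASCII form is what is actually looked up.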
The “five percent gamble”, a new buzz phrase, implemented by the digital information industry, assumes most of the world’s population can be reached by supporting just five percent of the world’s 6,000 + languages.
This ‘gamble’ discussed by Thomas Petzold and Han-Teng Liao, social technology analysts, came about through calculating the return on investment for internationalisation and localization activities. It was also a major stepping stone for driving our multi-lingual internet.
English, considered to be the original language of the internet and the global lingua franca, was predicted to overshadow other languages as the internet phenomenon exploded. However, the expected English language hegemony was disrupted as the internet became more accessible to other language users.
It is through these other language users that the internet transitioned from a monolingual to a multilingual infrastructure. Businesses looking to enter European markets localised through FIGS (French, Italian, German and Spanish), the big four for Europe, and CJK (Chinese, Japanese and Korean) language support became necessary for penetrating Asian markets.
Together with English, these seven languages formed the top of a global language hierarchy. But as the global marketplace is evolving this hierarchy is shifting. We are seeing a much higher demand for localised products for BRIC (Brazil, Russia, India and China) regions, especially as purchasing power for those areas increases.
Research from the Common Sense Advisory shows 90% of online purchasers can be reached using only 13 languages. These languages include: English, Japanese, German, Spanish, French, Simplified Chinese, Italian, Portuguese, Dutch, Korean, Arabic, Russian, and Swedish. Another interesting fact identified from the research, showed 72.1% of online buyers preferred browsing and buying from websites in their native language.
Byte Level Research, one of the first companies to undertake an extensive analysis of how websites are designed and shared globally, produces an annual web globalisation report. According to the 2012 report, websites supporting 10 languages are just “not global enough”. The average number of languages supported by companies in the 2012 web globalisation report was 32. The Common Sense Advisory suggests a 16 language minimum is needed just to be competitive.
The five percent gamble by companies like Google, which supports approximately 345 different languages, and Wikipedia, which supports 285 language editions, has had a knock-on effect in shaping the future of languages and turning the internet into an “international platform”.
What this means for businesses and organisations in the foreseeable future is a huge jump in the demand for translation services across varying language combinations. Implementing machine translation will be the only viable way to achieve this.
Did you attend Localization World, Santa Clara last week? Check out KantanMT’s Facebook page for photos from our booth!