KantanMT was recently announced as a finalist in three out of eight categories of the Irish Software Awards 2016 (ISA 2016): ‘Emerging Company of the Year’, ‘Technology Innovation of the Year’ and ‘Outstanding Achievement in International Growth’. This is very exciting news for us, and our success has only been made possible thanks to our brilliantly supportive clients and partners. The announcement led us to walk down memory lane and think about the things we did right over the past couple of years.
We would like to share some basic principles that we followed as a company, which helped us succeed and made us one of the most recognisable brands, not only within the translation and localization industry, but also within the wider Software Service scene.
If you are in start-up mode, these pointers will help you achieve full commercial exploitation within the span of a year.
KantanAPI enables KantanMT clients to interact with KantanMT as an on-demand web service. It also provides a number of different services, including translation, file upload and retrieval, and job launches.
With the KantanAPI you not only have the opportunity to integrate KantanMT into your workflow systems but also the ability to receive on-demand translations from your KantanMT engines. All these services make the experience with Machine Translation as seamless as possible.
Please Note: The API is only available to KantanMT members on the Enterprise Plan.
To access the KantanMT API you will first need your ‘API token’. This token can be found in the ‘API’ tab on the ‘My Client Profiles’ page of your KantanMT account.
Once you have your token, you can use the API in a number of ways:
Using the API tab on the ‘My Client Profiles’ page in the KantanMT Web interface
Using the REST interface via HTTP GET or POST requests
Using one of our various connectors, which are built using our KantanAPI
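As a rough illustration of the REST option, the Python sketch below builds (but does not send) an HTTP GET and an HTTP POST translation request with the standard library. It is a minimal sketch under stated assumptions: the endpoint URL and the `token` and `text` parameter names are hypothetical placeholders, not the documented KantanAPI interface, so the real endpoint and parameters must come from the API technical documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: the real endpoint, parameter names and token
# placement are defined in the KantanMT API technical documentation.
BASE_URL = "https://api.example.com/translate"


def build_get_request(token: str, text: str) -> Request:
    """Build, but do not send, an HTTP GET translation request."""
    query = urlencode({"token": token, "text": text})  # spaces become '+'
    return Request(f"{BASE_URL}?{query}", method="GET")


def build_post_request(token: str, text: str) -> Request:
    """Build, but do not send, an HTTP POST request with a form-encoded body."""
    body = urlencode({"token": token, "text": text}).encode("utf-8")
    return Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


get_req = build_get_request("YOUR_API_TOKEN", "Hello world")
print(get_req.get_method(), get_req.full_url)
# GET https://api.example.com/translate?token=YOUR_API_TOKEN&text=Hello+world
post_req = build_post_request("YOUR_API_TOKEN", "Hello world")
print(post_req.get_method(), post_req.data)
# POST b'token=YOUR_API_TOKEN&text=Hello+world'
```

Passing either request to `urllib.request.urlopen` would send it. A POST body keeps long or sensitive text out of the URL, which is usually the safer choice when both methods are accepted.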
For more details on implementing your API solution via the REST interface, please see the full API technical documentation at the following link:
Log in to your KantanMT account using your email and password.
You will be directed to the ‘My Client Profiles’ page. You will be in the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
If you wish to use the ‘KantanAPI’ with a profile other than the ‘Active’ profile, click on the profile you wish to use, then click on the ‘API’ tab.
You will be directed to the ‘API Settings’ page. Now click on the ‘Launch API’ button.
A ‘Launch API’ pop-up will now appear on your screen asking you ‘Are you sure you want to launch the API?’ Click ‘OK’.
The ‘API Status’ will now change from ‘offline’ to ‘initialising’, and the ‘Launch API’ button will change to ‘Launching API’.
When your KantanAPI launches, the ‘API Status’ will change from ‘initialising’ to ‘running’, the ‘Launching API’ button will change to ‘Shutdown API’, and you should now be able to click on the ‘Translate’ button.
Type the text you wish to translate in the text box and click on the ‘Translate’ button.
The translated text will now appear in the ‘Translated Text’ box. If you wish to make any changes to the translated text simply place the cursor inside the ‘Translated Text’ box and make the changes. Save these changes by clicking the ‘Retrain Engine’ button.
Test whether your engine was successfully retrained by clicking the ‘Translate’ button. The updated translation will now appear in the ‘Translated Text’ box.
If you don’t wish to retrain your engine and you are happy with the text in the ‘Translated Text’ box, you may continue translating other text or shut down your KantanAPI by clicking the ‘Shutdown API’ button.
When you click the ‘Shutdown API’ button, a pop-up will appear asking ‘Are you sure you want to shut down the API?’ Click ‘OK’.
The ‘Shutdown API’ button will now change to ‘Terminating API’, the ‘API Status’ will change from ‘running’ to ‘terminating’, and you will no longer be able to click the ‘Translate’ or ‘Retrain Engine’ buttons.
You will now be directed back to the initial screen on the API Settings page.
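The status changes described in these steps behave like a small state machine. The Python sketch below models them purely as an illustration: the `ApiStatus` values mirror the labels shown on the API Settings page, while the return from ‘terminating’ to ‘offline’ is an assumption based on the page going back to its initial screen. None of these names are part of KantanAPI itself.

```python
from enum import Enum


class ApiStatus(Enum):
    OFFLINE = "offline"
    INITIALISING = "initialising"
    RUNNING = "running"
    TERMINATING = "terminating"


# Allowed transitions, following the steps above. TERMINATING -> OFFLINE is an
# assumption: the walkthrough only says you return to the initial screen.
TRANSITIONS = {
    ApiStatus.OFFLINE: {ApiStatus.INITIALISING},  # 'Launch API' confirmed
    ApiStatus.INITIALISING: {ApiStatus.RUNNING},  # launch completes
    ApiStatus.RUNNING: {ApiStatus.TERMINATING},   # 'Shutdown API' confirmed
    ApiStatus.TERMINATING: {ApiStatus.OFFLINE},   # assumed end state
}


def advance(current: ApiStatus, target: ApiStatus) -> ApiStatus:
    """Move to the target status, rejecting transitions the walkthrough does not describe."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def can_translate(status: ApiStatus) -> bool:
    """'Translate' and 'Retrain Engine' are only clickable while the API is running."""
    return status is ApiStatus.RUNNING


status = ApiStatus.OFFLINE
for step in (ApiStatus.INITIALISING, ApiStatus.RUNNING):
    status = advance(status, step)
print(status.value, can_translate(status))  # running True
```

Rejecting undescribed transitions (for example jumping straight from ‘offline’ to ‘running’) makes it easy to see why the ‘Translate’ button stays disabled until the launch has finished.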
KantanAPI™ is one of the various machine translation services offered by KantanMT to improve productivity for our clients and also enable them to be more efficient. For more information on KantanAPI or any KantanMT products please contact us at email@example.com.
For more details on the KantanMT API please see the following links and the video below:
Ease of use and simplicity are always on the minds of our developers, hence the making of KantanTimeLine™. KantanTimeLine enables KantanMT clients to view the life cycle of their KantanMT engines. This empowers our clients, as they can find exactly what is negatively or positively affecting the quality of their engines. Through KantanTimeLine, clients can keep track of things such as training data uploads, translation jobs, engine tuning, templates, build jobs and so on.
How to use KantanTimeLine™
Log in to your KantanMT account using your email and password.
You will be directed to the ‘My Client Profiles’ page. You will be in the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
If you wish to use ‘KantanTimeLine’ with a profile other than the ‘Active’ profile, click on the profile whose ‘KantanTimeLine’ you wish to view.
Click on the ‘TimeLine’ tab.
You will now be directed to the ‘TimeLine’ page for your chosen profile.
To restore an archived build, select the build you wish to restore from the ‘Archives’ drop-down menu and click on the ‘Restore’ button.
To delete an archived build, click on the ‘Delete’ button.
To archive a build, click on the ‘Archive’ button of the build you wish to archive.
To view or edit the description of a build, click on the ‘Yellow Notepad’ icon.
To filter the timeline, click on the ‘Filter’ drop-down menu and select the filter you wish to use.
Additional Information and Support
KantanTimeLine™ is one of the many products offered by KantanMT to make the integration of Machine Translation into the workflow of our clients seamless. For more information on TimeLine or any KantanMT products please contact us at firstname.lastname@example.org.
TimeLine can also be used in KantanBuildAnalytics. To learn how TimeLine is incorporated into KantanBuildAnalytics please click on the link below or contact us at email@example.com.
KantanMT Founder and Chief Architect Tony O’Dowd was recently featured in one of Ireland’s major national newspapers, The Irish Times.
The author of the article, Olive Keogh, is a business journalist who specialises in writing about innovative Irish enterprises and start-ups. With Olive’s kind permission, we are republishing the Irish Times article.
“It’s not widely known at home, but Ireland has developed an international reputation for research in statistical machine translation. Trinity, DCU and UL are all recognised worldwide and 120 PhD students have graduated here with skills in the field in the last five years. That’s more than in any other country in Europe,” says Tony O’Dowd, the man behind KantanMT, a new scalable, high-speed machine translation system based on the Moses decoder and Amazon Web Services cloud computing infrastructure.
O’Dowd has spent almost 30 years in the software localization sector with companies such as Lotus Development Corporation and Symantec. Xcelerator, the company behind KantanMT, is O’Dowd’s second start-up, but he was also involved in the formation of FIT, a training organisation set up in 1998 to provide IT skills and training for the long-term unemployed.
Economics of the Cloud
“We are leveraging the Moses MT decoder and multiple streams of research from the Centre for Global Intelligent Content to make statistical machine translation (SMT) technology available to the masses,” he says.
“Traditional SMT systems are slow, expensive to deploy, time-consuming to customise and complex to manage. In short, not for the faint-hearted. I wanted to harness the economics of the cloud to solve these problems. Using hundreds of high-powered cloud-based servers to convert training data into data models also accelerated the process of customisation and the development of SMT engines.”
O’Dowd points out that in addition to the cost factor, traditional SMT solutions can produce translations of dubious quality. By focusing on advanced natural language processes and data processing algorithms, KantanMT also addresses these quality issues.
“Because of the costs involved, SMT tends to be used by large organisations with big budgets and plenty of people available to work on the system. The KantanMT platform removes this expense and complexity and makes it a far more practical and usable tool for businesses both big and small. Our clients can customise, improve and deploy their own engines in a matter of days,” O’Dowd says.
O’Dowd took his first steps as an entrepreneur in 2000 when he set up Alchemy Software Development. It quickly became a leading player in the software localization sector with over 27,000 licences in use worldwide. This success didn’t go unnoticed. The company was sold to the largest privately owned localization service provider, Translations.com, in March 2007.
Prior to setting up Alchemy O’Dowd was technology manager for Symantec Corporation Ireland and responsible for establishing the organisation’s Asian localization hub in Japan. He was also executive vice-president of Corel Corporation and spent three years as a lecturer in Trinity College Dublin teaching microprocessor design and assembly language programming.
O’Dowd began working on the idea for KantanMT in 2011 while on a year “off” to retrain himself in cloud-based technologies. He employed an MBA student to do detailed research into the barriers preventing companies from using SMT, and says the major leap forward in computing and storage capacity provided by the cloud enabled him to build a platform for SMT systems that would have been inconceivable without it.
Xcelerator recently raised €1.1 million in seed funding from venture capital company Delta Partners and the Enterprise Ireland High Potential Start Up fund. Early versions of KantanMT were given away free to kill competition and grab market share, but first revenues (based on a usage pricing model) began flowing this time last year, and O’Dowd says it is now profitable. A second round of funding is planned for later this year.
The company currently employs 11 people in its offices in Dublin and Galway, but this is expected to rise to 20-25 by the end of 2015. Its focus is the export market and its biggest customers are independent software vendors from industries such as ecommerce, finance and electronics. The company also provides MT services to the language industry.
School of Hard Knocks
“Starting your first business is definitely daunting as everything is new and you’re travelling down every road for the first time,” O’Dowd says.
“Next time around there is a lot of commonality and because you’ve learned by engaging with the school of hard knocks, you’re better at anticipating the problems and meeting the challenges. You also have a better network of contacts, you’re less frazzled when things don’t go right and you can actually grow the business faster and at a higher level. You also get a better hearing from the funding community as they view you as a safe pair of hands.”
KantanMT is based in the Invent Building at DCU and O’Dowd says the resources and expertise provided by the Invent team were instrumental in getting KantanMT.com off the ground.
“KantanMT.com is the fastest growing SMT platform in the localization industry today. So far over 80.5 billion words have been uploaded to the platform as training data and more than 750 million words have been translated by our clients. When you consider this has all happened in the last nine months, the company is rapidly becoming one of the biggest translation hubs in the market,” O’Dowd says.
KantanMT is delighted to republish, with permission, a post on machine translation technology and internet security that was recently written by Joseph Wojowski. Joseph Wojowski is the Director of Operations at Foreign Credits and Chief Technology Officer at Morningstar Global Translations LLC.
Machine Translation Technology and Internet Security
An issue that seems to have been raised once in the industry and never addressed again is the data collection methods used by Microsoft, Google, Yahoo!, Skype, and Apple, as well as the revelations, thanks to Edward Snowden, of PRISM data collection from those same companies. More and more, the industry appears to be moving towards full Machine Translation integration and usage, and with interesting, if alarming, findings being reported on Machine Translation’s usage when integrated into translation environments, the fact remains that Google Translate, Microsoft Bing Translator, and other publicly available machine translation interfaces and APIs store every single word, phrase, segment, and sentence that is sent to them.
Terms and Conditions
What exactly are you agreeing to when you send translation segments through the Google Translate or Bing Translator website or API?
1 – Google Terms and Conditions
Essentially, in using Google’s services, you are agreeing to permit them to store the segment and use it to create more accurate translations in the future; they can also publish, display, and distribute the content.
“When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content.” (Google Terms of Service – 14 April 2014, accessed on 8 December 2014)
Oh, and did I mention that in using the service, the user bears all liability for “LOST PROFITS, REVENUES, OR DATA, FINANCIAL LOSSES OR INDIRECT, SPECIAL, CONSEQUENTIAL, EXEMPLARY, OR PUNITIVE DAMAGES.” (Google Terms of Service – 14 April 2014, accessed on 8 December 2014)
So if it is discovered that a client’s confidential content is also located on Google’s servers because of a negligent translator, that translator is liable for losses and Google relinquishes liability for distributing what should have been kept confidential.
Alright, that’s a lot of legal wording, not the best news, and a lot to take in if this is the first time you’re hearing about this. What about Microsoft Bing Translator?
2 – Microsoft Services Agreement (correction made to content – see below)
In writing their services agreement, Microsoft got very tricky. They start out positively by stating that you own your own content.
“Except for material that we license to you that may be incorporated into your own content (such as clip art), we do not claim ownership of the content you provide on the services. Your content remains your content, and you are responsible for it. We do not control, verify, pay for, or endorse the content that you and others make available on the services.” (Microsoft Services Agreement – effective 19 October 2012, accessed on 8 December 2014)
Bing! Bing! Bing! Bing! Bing! We have a winner! Right? Hold your horses, don’t install the Bing API yet. It continues on in stating,
“When you transmit or upload Content to the Services, you’re giving Microsoft the worldwide right, without charge, to use Content as necessary: to provide the Services to you, to protect you, and to improve Microsoft products and services.” (Microsoft Services Agreement – effective 19 October 2012, accessed on 8 December 2014)
So again with Bing, while they originally state that you own the content you submit to their services, they also state that in doing so, you are giving them the right to use the information as they see fit and (more specifically) to improve the translation engine.
How do these terms affect the translation industry, then?
The problem arises whenever translators work with documents that contain confidential or restricted-access information. Aside from their use of webmail hosted by Microsoft, Google, Apple, etc. (which also poses a confidentiality problem), translators who send the contents of documents through free, public machine translation engines, whether through the website or the API, are leaking information they agreed to keep confidential in the Non-Disclosure Agreement (if established) with the LSP: a clear and blatant breach of confidentiality.
But I’m a professional translator and have been for years, I don’t use MT and no self-respecting professional translator would.
Well, yes and no; that mode of thinking contains a conflict. In theory, yes, a professional translator should know better than to blindly use Machine Translation because of its inaccurate and often unusable output. A professional translator, however, should also recognize that with advancements in MT technology, Machine Translation can be a very powerful tool in the translator’s toolbox and can, at times, greatly aid in the translation of certain documents.
The current state of MT use echoes the latter more than the former. In research conducted by Common Sense Advisory in 2013, 64% of the 239 people who responded to the survey reported that colleagues frequently use free Machine Translation engines; 62% of those sampled were concerned about free MT usage.
In the November/December 2014 issue of the ATA Chronicle, Jost Zetzsche relayed information on how users were using the cloud-based translation tool MemSource. Of particular interest are the Machine Translation numbers relayed to him by David Canek, Founder of MemSource. Of its roughly 30,000 users, 46.2% (about 13,860 translators) were using Machine Translation; of those, 98% were using Google Translate or a variant of the Bing Translator API. And of still greater alarm, a large percentage of Bing Translator users chose to employ the “Microsoft with Feedback” option, which sends the finalized target segment back to Microsoft (a financially appealing option since, when it is selected, use of the API costs nothing).
As you can imagine, while I was reading that article, I was yelling at all 13.9 thousand of them through the magazine. How many of them were using Google or Bing MT with documents that should not have been sent to either Google or Microsoft? How many of these users knew to shut off the API for such documents – how many did?
There’s no way to be certain how much confidential information may have been leaked due to translator negligence (in the best scenario, perhaps none), but it’s clear that the potential is very great.
On the other hand, in creating a tool as dynamic and ever-changing as a machine translation engine, the only way to train it and make it better is to use it, a sentiment that is echoed throughout the industry by developers of MT tools and something that can be seen in the output of Google Translate over the past several years.
So what options are there for me to have an MT solution for my customers without risking a breach in confidentiality?
There are numerous non-public MT engines available – including Apertium, a developing open-source MT platform – however, none of them are as widely used (and therefore, as well-trained) as Google Translate or Bing Translator (yes, I realize that I just spent over 1,000 words talking about the risk involved in using Google Translate or Bing Translator).
So, is there another way? How can you gain the leverage of arguably the best-trained MT Engines available while keeping confidential information confidential?
There are companies that have foreseen this problem and addressed it. Without pitching any particular product, here’s how it works. The tool acts as an MT API, but before any segments are sent across your firewall to Google, it replaces all names, proper nouns, locations, positions, and numbers with an independent, anonymous token or placeholder. After the translated segment has returned from Google and is safely within the confines of your firewall, the potentially confidential material replaces the tokens, leaving you with the MT-translated segment. On top of that, it also allows customized tokenization rules to further anonymize sensitive data such as formulae, terminology, processes, etc.
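The placeholder approach described above can be sketched in Python. This is a simplified illustration, not the product the author refers to: the protected names come from a caller-supplied list (a real tool would need entity recognition to find names and locations automatically), numbers are caught with a regular expression, and `fake_translate` is a hypothetical stand-in for the external MT call. A real engine may also reorder or alter placeholders, which this sketch does not handle.

```python
from __future__ import annotations

import re


def mask(segment: str, protected_terms: list[str]) -> tuple[str, dict[str, str]]:
    """Swap protected terms and numbers for anonymous placeholders."""
    mapping: dict[str, str] = {}
    # Longest terms first, so "Acme Corp" wins over a shorter "Acme".
    terms = sorted((t for t in protected_terms if t), key=len, reverse=True)
    alternatives = [re.escape(t) for t in terms]
    alternatives.append(r"\d+(?:[.,]\d+)*")  # integers, decimals, 4,500 etc.
    # One combined pass, so digits inside placeholders are never re-masked.
    pattern = re.compile("|".join(alternatives))

    def to_placeholder(match: re.Match) -> str:
        token = f"__TOK{len(mapping)}__"
        mapping[token] = match.group(0)
        return token

    return pattern.sub(to_placeholder, segment), mapping


def unmask(segment: str, mapping: dict[str, str]) -> str:
    """Put the original values back once the translation is inside the firewall."""
    for token, value in mapping.items():
        segment = segment.replace(token, value)
    return segment


def fake_translate(segment: str) -> str:
    """Hypothetical stand-in for the external MT service; it only sees masked text."""
    return "[MT] " + segment


source = "Acme Corp will pay John Smith 4,500 euros on 12 March."
masked, mapping = mask(source, ["Acme Corp", "John Smith"])
print(masked)    # __TOK0__ will pay __TOK1__ __TOK2__ euros on __TOK3__ March.
restored = unmask(fake_translate(masked), mapping)
print(restored)  # [MT] Acme Corp will pay John Smith 4,500 euros on 12 March.
```

Only the masked string crosses the firewall; the `mapping` dictionary, which holds the confidential values, never leaves the local machine.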
While the purpose of this article was not to prevent translators from using MT, it is intended to get translators thinking about its use and increase awareness of the inherent risks and solution options available.
— Correction —
As I have been informed, the information in the original post is not as exact as it could be: there is a Microsoft Translator Privacy Agreement that more specifically addresses use of the Microsoft Translator. Apparently, with Translator, Microsoft takes a sample of no more than 10% of “randomly selected, non-consecutive sentences from the text” submitted. Unused text is deleted within 48 hours after translation is provided.
If the user purchases a data subscription of at least 250 million characters per month (tiers are also available at 500 million, 635 million, and one billion characters), he or she is then able to opt out of logging.
There is also Microsoft Translator Hub which allows the user to personalize the translation engine where “The Hub retains and uses submitted documents in full in order to provide your personalized translation system and to improve the Translator service.” And it should be noted that, “After you remove a document from your Hub account we may continue to use it for improving the Translator service.”
So let’s analyze this development. Up to 10% of the full text submitted is sampled, and unused text is deleted within 48 hours of the service being provided. The sampled text is still potentially from a sensitive document and still warrants awareness of the issue.
If you use the Translator Hub, it uses the full document to train the engine, and even after you remove the document from your Hub, Microsoft may continue to use it to improve the Translator service.
Now break out the calculators and slide rules, kids, it’s time to do some math.
In order to opt out of logging, you need to purchase a data subscription of 250 million characters per month or more (the 250 million character level costs $2,055.00/month). Even if every word were 50 characters long, that would be 5 million words per month, and over a 31-day month a post-editor would have to process 161,290 words per day, working every single day. It’s physically impossible for a post-editor to process 161,290 words in a day, and even processing that many in a month is impractical (working 8 hours a day for 20 days a month, 161,290 words per month would be 8,064.5 words per day). So we can safely assume that no freelance translator can afford to buy in at the 250 million character/month level, especially when, even in the busiest month, a single translator comes nowhere near being able to edit the number of words necessary to make it a financially sound expense.
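The arithmetic in the paragraph above can be checked with a few lines of Python, using the same figures: a 250-million-character monthly tier, the deliberately generous 50 characters per word, a 31-day month worked every day, and a 20-day working month.

```python
CHARS_PER_MONTH = 250_000_000  # smallest tier that allows opting out of logging
CHARS_PER_WORD = 50            # the author's deliberately generous assumption
DAYS_IN_MONTH = 31             # working every single day
WORKING_DAYS = 20              # 8-hour days, 20 days a month

words_per_month = CHARS_PER_MONTH // CHARS_PER_WORD
words_per_day_nonstop = words_per_month / DAYS_IN_MONTH
# Even one day's quota (about 161,290 words) spread over a working month is heavy.
words_per_working_day = round(words_per_day_nonstop) / WORKING_DAYS

print(f"{words_per_month:,} words per month")                 # 5,000,000 words per month
print(f"{words_per_day_nonstop:,.0f} words per day, 31 days")  # 161,290 words per day, 31 days
print(f"{words_per_working_day:,.1f} words per working day")   # 8,064.5 words per working day
```

A more realistic average word length would shrink the characters per word and push the word counts, and therefore the post-editing load, far higher.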
In the end, I still come to the same conclusion, we need to be more cognizant of what we send through free, public, and semi-public Machine Translation engines and educate ourselves on the risks associated with their use and the safer, more secure solutions available when working with confidential or restricted-access information.
The KantanMT team would like to thank Joseph Wojowski for allowing us to republish his very interesting and topical post on machine translation security. You can view the original post here.
At KantanMT, the security, integrity and privacy of our customers’ data are a top priority. We believe this is vital to their business operations and to our own success. Therefore, we use a multilayered approach to protect and encrypt this information. The KantanMT Data Privacy statement ensures that no client data is re-published, re-tasked or re-purposed, and that all client data is fully encrypted during storage and transmission.
The ‘quality debate’ is old news, and the conversation, which is now heavily influenced by ‘big data’ and ‘cloud computing’, has moved on. It now focuses on the ability to scale translation jobs quickly and efficiently to meet real-time demands.
Translation buyers expect a system or workflow that provides high-quality, fit-for-purpose translations. Because of this, Language Service Providers (LSPs) have worked tirelessly to perfect their systems, orchestrating the use of Translation Memories (TM) within well-managed workflows and professionalizing the translation industry. As a result, quality is now a given in the buyers’ eyes.
What is the translation buyers’ biggest challenge?
Translation buyers’ biggest challenge now is scale: scaling their processes, workflows and supply chains. Of course, the caveat is that they want scale without jeopardizing quality! They need systems that are responsive and transparent, and that scale gracefully in step with their corporate growth and language expansion strategy.
Scale with quality! One without the other is as useless as a wind-farm without wind!
What makes machine translation better than other processes? Looking past the obvious automation of the localization workflow, the one thing MT can do better than any other translation method is combine automation with scalability.
KantanMT recognizes this and has developed a number of key technologies to accelerate the speed of on-demand MT engines without compromising quality.
KantanAutoScale™ is an additional divide and conquer feature that lets KantanMT users distribute their translation jobs across multiple servers running in the cloud.
Engine Optimization technology means KantanMT engines now operate 5-10 times faster and use less memory and CPU power, so MT jobs can be processed faster and more efficiently when using features like KantanAutoScale.
With API optimization, KantanMT engineers went back to basics, reviewing and refining the system, which enabled users to achieve 50-100% improvements in translation speed. This meant translation jobs that took five hours can now be completed in less than one hour.
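The post does not describe KantanAutoScale’s internals, but the general divide-and-conquer idea can be sketched in Python: split a job’s segments into roughly equal chunks, translate the chunks in parallel, and reassemble the results in their original order. This is an illustration under stated assumptions: threads stand in for cloud servers, and `translate_chunk` is a hypothetical placeholder for a call to an MT engine.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def split_job(segments: list[str], n_parts: int) -> list[list[str]]:
    """Split segments into n_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(segments), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(segments[start:end])
        start = end
    return parts


def translate_chunk(chunk: list[str]) -> list[str]:
    """Hypothetical stand-in for one cloud server running an MT engine."""
    return [f"<mt>{s}</mt>" for s in chunk]


def translate_job(segments: list[str], n_servers: int = 4) -> list[str]:
    parts = split_job(segments, n_servers)
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        # pool.map yields results in input order, so segment order is preserved.
        return [seg for part in pool.map(translate_chunk, parts) for seg in part]


job = [f"sentence {i}" for i in range(10)]
print([len(p) for p in split_job(job, 3)])  # [4, 3, 3]
print(translate_job(job, n_servers=3)[:2])  # ['<mt>sentence 0</mt>', '<mt>sentence 1</mt>']
```

Keeping chunks contiguous makes reassembly trivial; in a real cloud deployment the chunks would go to separate machines rather than threads, but the split, translate and merge steps stay the same.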
Scalability is the key to advancement in machine translation, and considering the speed at which people are creating and digesting content we need to be able to provide true MT scalability to all language pairs for all content.
Communication is one of the most important elements of business, and Machine Translation is a flexible tool that can be used to facilitate communication in a wide variety of scenarios and situations. Multinationals and other companies operating globally can take advantage of Machine Translation to achieve productivity gains.
This two part blog series examines two very different examples of implementing Machine Translation. This first post will look at what multinational organizations should consider before introducing Machine Translation to their business, and the second post will discuss the productivity gains and competitive advantages that can be achieved by Language Service Providers (LSPs) who adopt MT.
What is a multinational and why should it use Machine Translation?
Multinational corporations or global businesses are organizations operating in more than one country or region. The concept of an ‘international company’ has been around for hundreds of years, going back to the trading companies established in the 1700s. Political agendas aside, their main purpose was to trade in spices and other commodities throughout Asia and Europe, exposing traders to different languages and cultures.
Hundreds of years later, global communication is commonplace as more businesses operate internationally. There are no boundaries, and companies with worldwide operations require a constant flow of multilingual communication in order to maintain relationships between global employees, customers and stakeholders.
Multinational organizations typically have two types of content: external and internal. External content is created and released to the public, such as corporate documents, investor information, Corporate Social Responsibility (CSR) reports and marketing communications. Internal content, on the other hand, is created for use within the company; it usually takes the form of email and chat communications, memos and other internal documents.
To Translate or Not to Translate
Organizations without an in-house translation team often outsource the translation of external content to a reputable LSP. This ensures a guaranteed level of translation quality, and it also makes the localization process more efficient and cost-effective because, over time, language assets in the form of translation memories can be built up and leveraged to offset the cost of future translations.
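As a rough illustration of how translation memories are leveraged, the Python sketch below looks up the closest previously translated segment for a new sentence. It uses the standard library’s `difflib.SequenceMatcher` similarity ratio as a simplified stand-in for the fuzzy-match scoring used in commercial translation tools; the TM contents and the 0.75 threshold are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> approved translation.
TM = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "The file could not be opened.": "Impossible d'ouvrir le fichier.",
}


def best_match(segment: str, tm: dict[str, str], threshold: float = 0.75):
    """Return (score, source, target) for the closest TM entry, or None below threshold."""
    best = None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio()  # 1.0 = exact match
        if best is None or score > best[0]:
            best = (score, source, target)
    return best if best is not None and best[0] >= threshold else None


print(best_match("Click the Save button.", TM))  # exact match, score 1.0
print(best_match("Click the Save icon.", TM))    # fuzzy match against the Save entry
print(best_match("Restart the computer.", TM))   # None: nothing close enough to reuse
```

Exact and high fuzzy matches can be reused or lightly edited instead of translated from scratch, which is where the cost saving on future translations comes from.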
Internal content, however, is mostly comprised of communications between departments; emails, chats and information on sales and marketing activities. These are usually not translated professionally for a number of reasons:
Cost – the volume to be translated can make costs unmanageable
Confidentiality – managing sensitive information is more difficult
Real-time translation – emails and chat conversations generally require real-time speed
As an example, if a company is headquartered in the United States but operates in both Asia and Europe, there is a very high possibility that more than one language is used in the company’s internal communication.
Multinational companies often select working languages that must be used for internal communications and department managers are sometimes required to have a certain level of proficiency in the company’s designated working languages, which usually includes English.
Large organizations like the United Nations also have official languages. In this case, documents are not published until a translation has been prepared in each official language.
So, what happens when an email with a client’s product specifications and sales information is sent to a group of employees who speak different languages? Some of those readers may have limited knowledge of the language used: they can understand the message but are not familiar enough with the language to write a coherent response. As a result, they may respond in their native language. Suddenly, a single conversation thread contains more than one language, with a greater potential for miscommunication.
Why use Machine Translation?
Multinationals with global operations often have issues with the quantity and flow of internal information between departments operating in different languages. If the corporate headquarters uses a different language than its global subsidiaries, corporate documents need to be translated into each language as the internal information moves down the organizational hierarchy.
Machine Translation is a solution that can provide an instant, understandable ‘gist’ of internal information across a company operating in different languages and the use of MT can serve two purposes:
Documents that require a professional human translation are easily identified
Internal documents can be translated instantly so employees can get an understanding of the content
In order to understand internal content, employees might use a free, public MT solution such as Google Translate. While this is useful, it does not take into consideration any proprietary jargon or writing styles specific to the organization, and it also raises the question of confidentiality.
Challenges of MT
Many organizations may be interested in deploying their own MT systems rather than outsourcing translation jobs or asking bilingual employees to do ad hoc translations. Those considering MT have two options: develop their own in-house system or use a cloud-based subscription model.
Implementing any new process has challenges and MT is no exception. Some challenges traditionally associated with implementing MT systems are:
Long deployment times
How should an MT system be integrated?
Before going ahead with an MT solution, an organization needs to consider carefully what it hopes to achieve by implementing Machine Translation. The company should evaluate all the perceived benefits thoroughly and manage expectations about what Machine Translation can deliver.
Organizations thinking of implementing MT should ask:
What is its purpose? – Will MT be used as a management tool to improve internal communication and productivity, or to make decisions on what documents require professional outside translation? The purpose should be clearly defined at the outset.
Do we have enough language assets to build high-quality engines? Bilingual language assets are a key ingredient for building MT engines, and the quality of the training data has a direct impact on the engines’ output: “garbage in, garbage out”.
Should we invest in building our own system or buy a cloud-based subscription service? MT systems can be rule-based (RBMT), statistical (SMT) or hybrid. In-house development of a proprietary MT system requires heavy investment in technology, human resources and training, unless those assets are already available. Cloud-based subscription models do not require such a heavy initial investment and are often more cost-effective than developing and managing an in-house MT system.
Is the Machine Translation option scalable? How many language combinations will be needed? If each language pair requires its own engine, how simple is it to build additional engines for new language combinations? Scalability is determined by translation capacity and the ability to add new language combinations, which is especially important when entering new language markets or expanding the business into new regions. The MT solution should align with the company’s long-term goals.
How will MT be integrated into everyday workflows? Users need to be able to easily access translation functions through their existing applications like email or the company intranet system to make it accessible and viable.
What indirect costs and planning will be involved? RBMT and hybrid systems require qualified linguists or language experts to develop and manage the engines. SMT systems use algorithms to identify probable translations based on frequency statistics learned from the training data, so storage capacity for the large volumes of training data they require is essential. Cloud options eliminate the need for in-house technology investment, but extra costs may be incurred for exceeding the subscription plan’s allowance, much like going over the minutes allowance on a mobile phone plan.
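The scalability and indirect-cost questions above often reduce to simple arithmetic. As a rough sketch, assuming one engine per translation direction (the post does not say whether one engine covers both directions of a pair) and a plan priced as a flat monthly fee plus a per-word charge beyond the allowance, the hypothetical helpers below show how engine counts grow with the number of languages and how overage affects a monthly bill. The function names, the pricing model and all figures are illustrative assumptions, not vendor numbers.

```python
# Hypothetical planning helpers; every number here is an illustrative assumption.

def engines_needed(languages: int, hub_and_spoke: bool = True) -> int:
    """Count one-directional engines, assuming one engine per language direction.

    hub_and_spoke=True: each subsidiary language is paired only with the
    headquarters language, in both directions. Otherwise every language is
    paired with every other language.
    """
    if languages < 2:
        return 0
    if hub_and_spoke:
        # (languages - 1) subsidiary languages, two directions each
        return 2 * (languages - 1)
    # every ordered pair of distinct languages
    return languages * (languages - 1)


def monthly_cost(words: int, base_fee: float, allowance: int,
                 overage_per_word: float) -> float:
    """Flat subscription fee plus a per-word charge for words over the allowance."""
    overage_words = max(0, words - allowance)
    return base_fee + overage_words * overage_per_word


if __name__ == "__main__":
    for n in (3, 6, 10):
        print(n, engines_needed(n), engines_needed(n, hub_and_spoke=False))
    # 3 4 6
    # 6 10 30
    # 10 18 90
    print(monthly_cost(words=1_200_000, base_fee=500.0,
                       allowance=1_000_000, overage_per_word=0.001))
```

Pairing every language only with the headquarters language keeps growth linear (2(n - 1) engines for n languages), while pairing every language with every other grows quadratically (n(n - 1)), which matters when planning expansion into new markets.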
In carefully answering these questions, any organization planning to implement MT can stay focused on using the most cost-effective solution and achieve productivity gains with less miscommunication and more time savings.
The next part of this blog will look at how LSPs can leverage Machine Translation technology for productivity gains and competitive advantage.
KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations by developing and refining cutting-edge technologies that make Machine Translation easier to understand and use.
Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.
Strong Customer Focus…
The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.
KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.
The KantanMT Community…
The KantanMT members’ community now includes top-tier Language Service Providers (LSPs), multinationals and smaller organizations. During 2013, the community grew from 400 members in January to 3,400 registered members in December. In response to this growth, KantanMT introduced two partner programs with the objective of improving the Machine Translation ecosystem.
These are the Developer Partner Program, which supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program, which is dedicated to strengthening the use of MT technology in the global translation supply chain. KantanMT’s Preferred Suppliers of MT are:
To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approximately 7,000 customized KantanMT engines, which have translated more than 500 million words.
As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.
KantanMT’s Core Technologies from 2013…
KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.
KantanAnalytics™ – segment-level Quality Estimation (QE) analysis, expressed as a percentage ‘fuzzy match’ score on KantanMT translations, which provides a straightforward method for costing and scheduling translation projects.
BuildAnalytics™ – QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment level percentage score on a sample of the uploaded training data.
KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
TotalRecall™ – combines TM and MT technology: TM matches with a ‘fuzzy match’ score below 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
KantanISR™ – Instant Segment Retraining technology that allows members to correct and retrain their KantanMT engines near-instantaneously.
PEX Rule Editor – an advanced pattern matching technology that allows members to correct repetitive errors, making post-editing smoother by reducing post-editing effort, cost and time.
Kantan API – critical for the development of software connectors and smooth integration of KantanMT into existing translation workflows. The success of the MemoQ connector led to the development of subsequent connectors for MemSource and XTM.
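TotalRecall and KantanAnalytics both rely on a percentage ‘fuzzy match’ score. The Python sketch below illustrates the two ideas under stated assumptions: it uses the standard library’s `difflib.SequenceMatcher` as a stand-in similarity measure (commercial TM tools use their own fuzzy-match formulas, and KantanAnalytics derives its score from Quality Estimation rather than string similarity), sends a segment to MT when its best TM match scores below the 85% threshold described above, and prices segments with a hypothetical set of rate bands. None of the function names or rates come from KantanMT.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


# Stand-in similarity measure; real TM tools compute fuzzy matches differently.
def fuzzy_score(a: str, b: str) -> float:
    """Return the similarity of two segments as a percentage (0-100)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def best_tm_match(source: str, tm: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Find the TM entry whose source side is most similar to `source`."""
    best_target: Optional[str] = None
    best_score = 0.0
    for tm_source, tm_target in tm.items():
        score = fuzzy_score(source, tm_source)
        if score > best_score:
            best_target, best_score = tm_target, score
    return best_target, best_score


def route(source: str, tm: Dict[str, str], threshold: float = 85.0) -> str:
    """TotalRecall-style routing: keep TM matches at or above the threshold,
    send everything else to the MT engine."""
    target, score = best_tm_match(source, tm)
    return "TM" if target is not None and score >= threshold else "MT"


# Hypothetical rate bands: (minimum score, fraction of the full per-word rate).
RATE_BANDS = [(95.0, 0.3), (85.0, 0.6), (75.0, 0.8), (0.0, 1.0)]


def segment_cost(words: int, score: float, full_rate: float) -> float:
    """Price a segment from its percentage score using the bands above."""
    for min_score, fraction in RATE_BANDS:
        if score >= min_score:
            return words * full_rate * fraction
    return words * full_rate  # negative scores fall back to the full rate


if __name__ == "__main__":
    tm = {"Click Save to store the file.": "Cliquez sur Enregistrer pour stocker le fichier."}
    print(route("Click Save to store the file.", tm))  # TM
    print(route("The weather is lovely today.", tm))   # MT
    print(segment_cost(words=12, score=90.0, full_rate=0.10))
```

Scoring and pricing are kept as separate functions so that a better score (for example, a real QE estimate) can replace the string-similarity stand-in without changing the costing logic.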
KantanMT sourced and cleaned a range of bidirectional, domain-specific stock engines, containing approximately six million words across the legal, medical and financial domains, and made them available to its members. During 2013, KantanMT also added support for Traditional and Simplified Chinese, Japanese, Thai and Croatian.
Recognition as Business Innovators…
KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd was presented with the ICT Commercialization award in September.
In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.
KantanMT were silver sponsors at the annual ASLIB ‘Translating and the Computer’ conference, which took place in London in November 2013. In October, Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.
KantanMT have recently published a white paper introducing its cornerstone Quality Estimation technology, KantanAnalytics, and explaining how this technology addresses the biggest challenges facing the widespread adoption of Machine Translation.
Post-editing is a necessary step in the Machine Translation workflow, but the role is still largely misunderstood. Language Service Providers (LSPs) are now experimenting more with best practices for post-editing in the workflow. The lack of consistent training, and the industry’s reluctance to accept the importance of the role, are linked to post-editors’ motivation. KantanMT looks at some of the more conventional approaches to motivation and how they apply to post-editing.
What is motivation and what studies have been done so far?
Understanding the concept of motivation has been a hot topic in many areas of organisation theory. Studies in the area really began to kick off with their application in the workplace, opening doors for pioneers to understand how employees could be motivated to do more work, and do better work.
Abraham Maslow’s well-known ‘Hierarchy of Needs’ suggests that a person’s motivations depend on their position in the hierarchy pyramid.
Frederick Herzberg’s ‘Two-Factor Theory’, also known as the motivation-hygiene theory, suggests that job satisfiers such as professional acknowledgement, achievement and work responsibility have a positive effect on motivation.
Douglas McGregor took a black-and-white approach to motivation in his ‘Theory X and Theory Y’, grouping employees into two categories: those who will only do the minimum and those who will push themselves.
As development of theories continued…
John Adair came up with the ‘fifty-fifty theory’, according to which motivation is fifty percent the responsibility of the employee and fifty percent outside the employee’s control.
Even more recently, in 2010, Teresa Amabile and Steven Kramer carried out a study on the motivation levels of employees in a variety of settings. Their findings suggest ‘progress’ as the top performance motivator, identified from an analysis of approximately 12,000 diary entries and daily ratings of motivation and emotions from hundreds of study participants.
To understand post-editor motivation, we can combine the top performance motivator, progress, with the fifty-fifty theory.
Progress is a healthy motivator in the post-editing profession; it can help Localization Project Managers understand and encourage post-editor satisfaction and motivation. But while progress can be deemed an external factor, if we apply Adair’s ‘fifty-fifty’ rule, post-editors are also at least fifty percent responsible for their own motivation.
Post-editing as a profession is still finding its feet. In 2010, TAUS carried out a study on the post-editing practices of global LSPs, which showed that while post-editing is becoming a standard activity in the translation workflow, it accounts for only a minor share of LSP business volume. This suggests that post-editors may see their role as being of lesser importance because the industry views it that way.
This attitude is highlighted by the lack of industry standards for post-editing best practices. Without evaluation practices to train post-editors and improve the post-editing process, post-editors cannot make progress, which is naturally demotivating.
How to motivate post-editors
The first step in motivating post-editors is to recognise their role as distinct from that of a translator. The best post-editors are at least bilingual and have some form of linguistic training, like a translator; linguistic training is a major asset when editing Machine Translated output.
TAUS offers a comparison of the translation and post-editing processes that highlights the differences between them.
One process is not more complicated than the other, only different. Translators translate internally, while post-editors make “snap editing decisions” based on client requirements. When LSPs recognise these differences, they can motivate their post-editors by providing the most suitable support and work environment.
Progress as a Motivator
Translators make good post-editors: they have the linguistic ability to understand both the source and target texts, and if they enjoy editing or proofreading, the post-editing role will suit them. The right training is also important; properly trained post-editors become more aware of potential improvements to the workflow.
These improvements or ideas can be a great boost to post-editor motivation. If they are implemented, the post-editor can take on more responsibility, which helps improve the translation workflow. For example, if the post-editor is made responsible for updating the language assets used to retrain a Machine Translation system, they can take ownership of the output quality rather than just post-editing Machine Translation output in isolation.
Fixing repetitive errors can be frustrating for anyone, not just post-editors. But if post-editors are responsible for output quality, understand the system and can control the rules used to reduce these repetitive errors, they will experience motivation through progress.
This is only the tip of the iceberg of what motivates post-editors. Each post-editor is different, and how they feel about the role, whether it is just ‘another job’ or a major step in their career, plays a part. The key is to provide proper training and to foster an environment where post-editors can make progress by contributing positively to the role.
Translators often take pride in, and ownership of, their translations. Post-editors should also have the opportunity to take pride in their work, as it is their skills and experience that bring the output up to ‘publishable’ or ‘fit for purpose’ quality.
Repetitive errors like diacritic marks or capitalisation can be easily fixed using KantanMT’s Post-Editing Automation (PEX) rules. PEX rules allow repetitive errors in a Machine Translation engine to be easily fixed using a ‘find and replace’ tool. These rules can be checked on a sample of the text by using the PEX Rule Editor.
The post-editor can correct repetitive errors during the post-editing process so that the same errors don’t appear in future MT output, giving them responsibility for the Machine Translation engine’s quality.
Post-Editing Machine Translation (PEMT) is an important and necessary step in the Machine Translation process. KantanMT is releasing a new, simple and easy-to-use PEX rule editor, which will make the post-editing process more efficient, saving time, costs and the post-editors’ sanity.
As we have discussed in earlier posts, PEMT is the process of reviewing and editing raw MT output to improve quality. The PEX rule editor is a tool that can help to save time and cut costs. It helps post-editors, since they no longer have to manually correct the same repetitive mistakes in a translated text.
Post-editing can be divided into roughly two categories: light and full post-editing. ‘Light’ post-editing, also called ‘gist’, ‘rapid’ or ‘fast’ post-editing, focuses on transferring the correct meaning without spending time correcting grammatical and stylistic errors. Textual standards like word order and coherence are less important in a light post-edit than in a more thorough ‘full’ or ‘conventional’ post-edit. A full post-edit requires the correct meaning to be conveyed, with correct grammar, accurate punctuation and correct transfer of any formatting such as tags or placeholders.
The client often dictates the type of post-editing required: either a full post-edit to bring the text up to ‘publishable quality’, similar to a human translation, or a light post-edit, which usually means ‘fit for purpose’. The engine’s quality also affects post-editing effort; using a high volume of in-domain training data during the build produces higher-quality engines, which helps cut post-editing effort. Other factors such as language combination, domain and text type also contribute to post-editing effort.
Examples of repetitive errors
Some users may experience the following errors in their MT output.
Punctuation mistakes, hyphenation, diacritic marks etc.
Formatting – trailing spaces
SMT systems use pattern matching to identify text described by regular expressions. Regular expressions, or ‘regex’, are special text strings that describe patterns; because these patterns need no linguistic analysis, they can be implemented easily across different language pairs. Regular expressions are also important components of PEX rules, and KantanMT have a list of regular expressions used for both GENTRY rule files (*.rul) and PEX post-edit files (*.pex).
Post-Editing Automation (PEX)
Repetitive errors can be fixed automatically by uploading PEX rule files. These rule files allow post-editors to spend less time correcting the same repetitive errors by automatically applying PEX constructs to translations generated from a KantanMT engine.
PEX works by incorporating “find and replace” rules. The rules are uploaded as a PEX file and applied while a translation job is being run.
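To make the find-and-replace idea concrete, here is a minimal Python sketch of ordered rules applied to raw MT output, targeting the kinds of repetitive errors listed earlier (trailing spaces, doubled punctuation, hyphenation). It uses Python’s `re` module; the rule list and the `apply_rules` helper are hypothetical, and since this post does not describe the actual *.pex file syntax or the regex flavour KantanMT uses, the rules are written as plain Python pairs rather than as a PEX file.

```python
import re

# Hypothetical post-editing rules as (search pattern, replacement) pairs in
# Python regex syntax. This is an illustration, not the KantanMT *.pex format.
RULES = [
    (r"[ \t]+$", ""),          # trailing spaces or tabs at the end of a segment
    (r" {2,}", " "),           # runs of spaces collapsed to a single space
    (r",{2,}", ","),           # doubled commas such as ",,"
    (r"\be-mail\b", "email"),  # hyphenation fix required by a hypothetical style guide
]


def apply_rules(segment: str, rules=RULES) -> str:
    """Apply each find-and-replace rule to a translated segment, in order."""
    for pattern, replacement in rules:
        segment = re.sub(pattern, replacement, segment)
    return segment


if __name__ == "__main__":
    raw_mt = "Send the e-mail  today,, please.   "
    print(repr(apply_rules(raw_mt)))  # 'Send the email today, please.'
```

Because the rules run in order, each one sees the output of the previous rule. The first three rules are purely formal and would work for any language pair; the hyphenation rule encodes a client style preference, which is the kind of rule a post-editor would maintain.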
PEX Rule Editor
KantanMT have designed a simple way to create, test and upload post-editing rules to a client profile.
The PEX Rule editor, located in the ‘MykantanMT’ menu, has an easy-to-use interface. Users can copy a sample of the translated text into the upper ‘Test Content’ text box, then enter the rules to be applied in the ‘PEX Search Rules’ box and their corrections in the ‘PEX Replacement Rules’ box. The user can then test the new rules by clicking ‘test rules’ and instantly identify any incorrect rules before they are uploaded to the profile.
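Testing rules before upload means a malformed rule is caught on a small sample rather than during a translation job. The sketch below loosely mirrors the editor’s three boxes with a hypothetical `check_rules` function: it pairs search and replacement rules line by line, reports any rule that Python’s `re` module rejects, and returns a preview of the sample text with the valid rules applied. How the real editor validates rules is not described in this post, so treat this as an assumption-based illustration.

```python
import re
from typing import List, Tuple


def check_rules(test_content: str,
                search_rules: List[str],
                replacement_rules: List[str]) -> Tuple[str, List[str]]:
    """Preview rules on sample text and collect any problems.

    Hypothetical helper: search_rules[i] is paired with replacement_rules[i],
    as if each line of the two rule boxes formed one rule.
    """
    problems: List[str] = []
    if len(search_rules) != len(replacement_rules):
        problems.append(
            f"{len(search_rules)} search rules but "
            f"{len(replacement_rules)} replacement rules"
        )
        return test_content, problems

    preview = test_content
    for number, (search, replacement) in enumerate(
            zip(search_rules, replacement_rules), start=1):
        try:
            preview = re.sub(search, replacement, preview)
        except re.error as exc:
            # Invalid pattern or bad group reference: skip the rule and report it.
            problems.append(f"rule {number}: {exc}")
    return preview, problems


if __name__ == "__main__":
    sample = "The file was saved ,and closed."
    preview, problems = check_rules(sample, [r" ,", r"(unclosed"], [", ", "x"])
    print(preview)   # The file was saved, and closed.
    print(problems)  # one entry, reporting that rule 2 is invalid
```

In this sketch a failing rule is skipped rather than aborting the whole test, so the preview shows the effect of the valid rules while the problem list tells the user which rules to fix before uploading.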
The introduction of tools to assist in the post-editing process helps remove some of the more repetitive corrections for post-editors. The new PEX Editor feature helps improve the PEMT workflow by helping to ensure that uploaded rule files are correct, leading to a more effective method for fixing repetitive errors.