In Business Intelligence, self-service is an approach to data analytics that plays a vital and extremely beneficial role within an enterprise because it allows for immediate decision-making. No-wait decision-making is an enormous contributor to a company’s bottom line. Self-service allows business users to retrieve, interact with and collaborate on company information without having to rely on IT assistance.
Thanks to self-service, IT personnel can focus their energy on larger-scale responsibilities that benefit the entire enterprise, such as setting up the data warehouse and data marts underpinning the BI system, while other team members can work more strategically and efficiently. Quality data preparation tools are imperative to the independence of business users, and integrated tools allow users to operate on, analyze, change and calculate data sets quickly, using GUIs to switch between data prep and visualization screens with just one click.
BI is becoming more intertwined with self-service, which should be no surprise, since self-service streamlines data analysis and keeps companies responsive, efficient and agile. Self-service data-discovery tools allow decision-makers to tap into the information they require, which enhances their success. Self-service also helps a company reduce administrative burdens, shorten timelines and uncover deeper insights.
Self-service acts as an enterprise’s coveted ally for several reasons:
If the post-Black Friday sales numbers are anything to go by, there’s no longer any question that the face of eCommerce is changing, and with it, brick-and-mortar retailers have started rethinking their business strategy. As this news piece about Scotland experiencing a major dip in shoppers goes on to show, demand for online shopping will increase substantially in 2016. This in turn means that the need for content localization and translation for eTailers (online retailers) will be even more pressing during the coming year. As the often-quoted Common Sense Advisory report points out, 72.4% of consumers are more likely to buy from a site that is in their native language. Indeed, localization is no longer a good-to-have feature; it is now a must-have for all eCommerce businesses that aim to sell their products globally.
Chris Bishop, Managing Director of Microsoft Research, Cambridge, UK points out that “by 2026 we will have ubiquitous, human-quality translation among all European languages, thereby eliminating the language barrier throughout Europe.” Bishop’s prediction does not sound far off the mark at all when we take into account the fact that in the past ten years, Machine Translation (MT) has improved by leaps and bounds. Early MT was rules-based (RBMT) and required sets of linguistic rules, and it worked moderately well within a prescribed domain. However, this was resource intensive and cost prohibitive for many.
The turning point for using MT in business came with the advent of the Internet, the SaaS model and the open source development model for software. These changes in technology helped build the foundation for Statistical Machine Translation (SMT) research, and subsequently the open source development of the Moses decoder. Moses enabled researchers and private companies to commercialise Statistical MT and develop it into the custom solutions available today. In 2016 and beyond, further research in Natural Language Processing (NLP), deep learning and machine learning will contribute directly to immense improvements in Custom MT.
The KantanMT Business Team has published a new white paper that provides an in-depth understanding of how eTailers will be affected by Machine Translation in 2016. It also discusses how Custom Machine Translation (CMT), compared with generic MT systems, will emerge as the clear winner in solving eTailing localization issues in the coming year.
Here are some of the highlights of how MT will evolve in 2016 for eTailers:
eTailers will use either CMT alone or CMT combined with Human Post-Editing to reach new markets ahead of their competitors
With increased multilingual customer demand for products, content translation will find support in auto scaling
Custom Machine Translation will be used more widely as eCommerce customers expand globally
Machine Translation is no longer a luxury. It is an essential component, a Tier 1 application that supports global business. The purpose of this paper is to highlight how Machine Translation, and more importantly Custom Machine Translation technology, has come of age in terms of quality, speed and scalability. During 2016 and beyond, eTailers need to ensure that they review their globalization strategies to reflect these advances in technology, so they can maximise their global growth potential.
Machine translation applications have skyrocketed, and we as consumers demand content to be readily available in our native language. We make purchases online quickly, and expect those purchases delivered to our doors regardless of language and shipping destination.
Common Sense Advisory identified that three quarters of online consumers prefer to buy in their own languages. This is significant for online business, and as such companies are aware that a localized product or service available online means a much greater customer pool, which in turn leads to more sales and a bigger return for stakeholders.
There is one big ‘wall’ still standing between more sales revenues and happy customers, and that is ‘multilingual support’. Traditional multilingual support requires a heavy investment in translation and localization workflows, not to mention a plethora of specialists needed to provide linguistic support.
However, ‘big data’, computing capabilities and the cloud are creating unique possibilities to avoid such heavy investments, and companies that choose to embrace these new opportunities are reaping the rewards.
KantanMT’s Founder and Chief Architect, Tony O’Dowd, and Deepan Patel, Machine Translation Solutions Architect at Milengo Ltd., discuss the opportunities offered by implementing a cloud-based machine translation solution. They examine Milengo’s experience using KantanMT to optimise its translation supply chain and illustrate, with examples, how the leading translation company uses KantanMT.com to achieve excellent results in ongoing MT projects for some of the world’s major companies.
Manage User Expectations: Clear communication with the client about the process, workflow and expected results will ensure trust and confidence in the project. Even without a pilot test, Milengo still managed to localize a web shop with 780,000 Danish words to Swedish in 17 days.
Think to Scale: The localization process must always be scalable. Each example, covering software documentation (Interactive Intelligence), ecommerce (Netthandelen) and automotive parts data, required an automated solution that could be scaled.
Customise It: MT customisation can fulfil a wide variety of localization needs. Not only is it more cost efficient (Netthandelen achieved 62% cost savings), it also enables quick engine retraining, which improves the engine’s ability to generate higher quality translations.
To learn how you can generate meaningful business intelligence that lets you manage and improve the ROI from Machine translation, contact us for a free consultation and/or personalised platform demonstration.
January 2015 marks the last month of the Moses Core project. The project started three years ago in 2012, as a collaborative effort by its members to improve translation processes and to create a competitive translation environment. Over those three years, the translation and MT landscape has changed significantly. This change and the project’s success are in no small part due to the hard work and diligence of the Moses Core project coordinator, TAUS. With TAUS’s kind permission, KantanMT is republishing the MT use case for the KantanMT Community.
KantanMT.com is a registered trademark of Xcelerator Machine Translations Ltd.
TIME IN MT BUSINESS
The platform was launched commercially in Q4 2013; however, we have been rigorously testing KantanMT.com in academic and commercial settings since 2012. In the beginning, the product was offered as a free trial to the KantanMT Community, and their feedback was instrumental in shaping and improving the platform into what it is today.
The Moses technology has improved immensely over the past 12-18 months. Developer documentation and support materials, while initially very basic, have matured into a more structured, comprehensive and helpful resource. Additionally, the management of software distributions has made it easier to work with, understand and deploy. These are key elements in maintaining and supporting any open-source technology and have made Moses a key technology for the localization industry.
The rise of the global economy and the driving demand for multilingual translation created a gap in the market for a sustainable translation method that could automatically scale to accommodate fluctuating translation needs. The KantanMT Development team was able to utilize the open source Moses decoder to develop a cloud-based Statistical Machine Translation (SMT) platform, where clients could build and manage their own customized MT engines without compromising on the ownership of their data. The flexibility, scalability and security of the Moses toolkit made this possible.
The Moses toolkit offers the most flexibility in implementing an SMT solution for commercial purposes, as it allows the system’s training and decoding process to be modified. This has enabled the KantanMT team to create a high-value product that is dynamic and commercially relevant.
To ensure the product could scale and adapt to user needs the KantanMT team needed a decoder that could be built and managed on the cloud. The Moses system enabled this functionality.
Parallel language data is required to train an SMT engine. This data is an important resource for companies, and current generic SMT engines do not guarantee the security or safeguard the ownership of these assets. In using the Moses decoder, the KantanMT team created a product that could ensure its clients’ data was kept private, and not repurposed or reused in any way.
Many global companies have large repositories of bilingual data; however, they often do not wish to deploy and maintain their own version of the Moses decoder. The KantanMT Development team was able to develop the sophisticated Moses SMT technology into a package that is easily accessible to companies wishing to translate their content and, over time, achieve localization cost savings.
The current machine translation development team consists of four people, who maintain the platform and build machine translation engines for clients. Due to significant growth in the company over the past year, KantanMT.com will be hiring more staff over the course of the next few months to build engines for clients.
MT SYSTEM INFRASTRUCTURE
Insource or Outsource Moses/Implementation
Based on research and the demands of the language services industry and enterprise machine translation buyers, KantanMT has implemented and customized the Moses decoder in-house to create a robust and commercially viable machine translation product that can scale and adapt to our clients’ needs. The original/base KantanAnalytics™ technology was co-developed with the CNGL Centre for Global Intelligent Content, an academic-industry research centre based in Dublin City University, Ireland. However, all other KantanMT.com technologies have been developed by an in-house expert development team.
Number of Engines
As of January 2015, the KantanMT Community has built a total of 6,777 MT engines on KantanMT.com.
As of January 2015, the total number of training words uploaded to the platform by the KantanMT Community has surpassed 50 billion, and the number of translated words on the platform is now more than 600 million.
bmmt GmbH is a German language service provider with a strong focus on machine translation. It needed a Machine Translation provider that would give the bmmt team full control of their Machine Translation training data and MT engine customization process at a low investment point. They also required a system that could correctly handle format-specific tagging and transparent transfer of mark-up information.
In early 2013, bmmt joined the KantanMT Community and began testing different customization processes using client specific training data. The team initially experienced minor problems with their SDLXLIFF files. However, the KantanMT development team were able to quickly solve this problem by restructuring some of its tokenizers.
The company began deploying production engines in mid-2013. These were showing particularly high Quality Evaluation (QE) scores due to the quality of their training data and resulted in a considerable increase in translation productivity. bmmt MT technicians found that domain specificity is a better basis for predictable output than sheer input size.
bmmt is currently using approximately 20 KantanMT engines in production across technical and automotive domains. These production ready engines are experiencing high quality metric scores for each language combination.
KantanMT.com is one of the market leaders of cloud-based machine translation services. It provides cloud-based SMT services to major global enterprises and software companies wishing to translate large volumes of data. It works directly with companies to develop and implement a long term machine translation strategy, or it works with a select number of language service providers (preferred MT supplier partner program) to supply MT services to large enterprises.
VIEWS ON CURRENT STATE OF MT
Machine translation is now much more widely accepted in the industry than it was just a few years ago. Since KantanMT.com entered the market in its testing phase in 2012, we have seen an enormous change in the attitudes towards and perception of MT in the language community. Access to technology such as smartphones and tablets in non-English speaking nations has driven the global marketplace, and this in turn has increased the need for on-demand translation services, driving demand for MT services. The MosesCore Project has facilitated this demand with an open source solution that made it possible for smaller companies and startups like us to compete against bigger MT providers to solve the problem of language.
“The KantanMT platform sets a new industry benchmark in terms of analytics and development tools used to build and measure the quality of Statistical MT Engines. The KantanMT expert development team has introduced some of the industry’s most exciting and valuable technologies built on the Moses decoder, which are helping language and enterprise clients to translate more efficiently and reduce costs.” KantanMT.com founder and Chief Architect, Tony O’Dowd.
KantanMT is delighted to republish, with permission, a post on machine translation technology and internet security that was recently written by Joseph Wojowski. Joseph Wojowski is the Director of Operations at Foreign Credits and Chief Technology Officer at Morningstar Global Translations LLC.
Machine Translation Technology and Internet Security
An issue that seems to have been brought up once in the industry and never addressed again is the data collection methods used by Microsoft, Google, Yahoo!, Skype, and Apple, as well as the revelations, thanks to Edward Snowden, of PRISM data collection from those same companies. More and more, it appears that the industry is moving closer to full Machine Translation integration and usage. With interesting, if alarming, findings being reported on Machine Translation’s usage when integrated into translation environments, the fact remains that Google Translate, Microsoft Bing Translator, and other publicly available machine translation interfaces and APIs store every single word, phrase, segment, and sentence that is sent to them.
Terms and Conditions
What exactly are you agreeing to when you send translation segments through the Google Translate or Bing Translator website or API?
1 – Google Terms and Conditions
Essentially, in using Google’s services, you are agreeing to permit them to store the segment and use it to create more accurate translations in the future. They can also publish, display, and distribute the content.
“When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content.” (Google Terms of Service – 14 April 2014, accessed on 8 December 2014)
Oh, and did I mention that in using the service, the user is bearing all liability for “LOST PROFITS, REVENUES, OR DATA, FINANCIAL LOSSES OR INDIRECT, SPECIAL, CONSEQUENTIAL, EXEMPLARY, OR PUNITIVE DAMAGES.” (Google Terms of Service – 14 April 2014, accessed on 8 December 2014)
So if it is discovered that a client’s confidential content is also located on Google’s servers because of a negligent translator, that translator is liable for losses and Google relinquishes liability for distributing what should have been kept confidential.
Alright, that’s a lot of legal wording, not the best news, and a lot to take in if this is the first time you’re hearing about this. What about Microsoft Bing Translator?
2 – Microsoft Services Agreement (correction made to content – see below)
In writing their services agreement, Microsoft got very tricky. They start out positively by stating that you own your own content.
“Except for material that we license to you that may be incorporated into your own content (such as clip art), we do not claim ownership of the content you provide on the services. Your content remains your content, and you are responsible for it. We do not control, verify, pay for, or endorse the content that you and others make available on the services.” (Microsoft Services Agreement – effective 19 October 2012, accessed on 8 December 2014)
Bing! Bing! Bing! Bing! Bing! We have a winner! Right? Hold your horses, don’t install the Bing API yet. It continues on in stating,
“When you transmit or upload Content to the Services, you’re giving Microsoft the worldwide right, without charge, to use Content as necessary: to provide the Services to you, to protect you, and to improve Microsoft products and services.” (Microsoft Services Agreement – effective 19 October 2012, accessed on 8 December 2014)
So again with Bing, while they originally state that you own the content you submit to their services, they also state that in doing so, you are giving them the right to use the information as they see fit and (more specifically) to improve the translation engine.
How do these terms affect the translation industry, then?
The problem arises whenever translators are working with documents that contain confidential or restricted-access information. Aside from the translator’s use of webmail hosted by Microsoft, Google, Apple, etc., which also poses a problem with confidentiality, the contents of documents sent through free, public machine translation engines, whether through the website or the API, leak the information the translator agreed to keep confidential in the Non-Disclosure Agreement (if established) with the LSP: a clear and blatant breach of confidentiality.
But I’m a professional translator and have been for years, I don’t use MT and no self-respecting professional translator would.
Well, yes and no; a conflict arises from that mode of thinking. In theory, yes, a professional translator should know better than to blindly use Machine Translation because of its inaccurate and often unusable output. A professional translator, however, should also recognize that with advancements in MT technology, Machine Translation can be a very powerful tool in the translator’s toolbox and can, at times, greatly aid in the translation of certain documents.
The current state of the use of MT more echoes the latter than the former. In 2013 research conducted by Common Sense Advisory, 64% of the 239 people who responded to the survey reported that colleagues frequently use free Machine Translation Engines; 62% of those sampled were concerned about free MT usage.
In the November/December 2014 issue of the ATA Chronicle, Jost Zetzsche relayed information on how users were using the cloud-based translation tool MemSource. Of particular interest are the Machine Translation numbers relayed to him by David Canek, Founder of MemSource. 46.2% of its roughly 30,000 users (about 13,860 translators) were using Machine Translation; of those, 98% were using Google Translate or a variant of the Bing Translator API. Of still greater alarm, a large percentage of users using Bing Translator chose to employ the “Microsoft with Feedback” option, which sends the finalized target segment back to Microsoft (a financially appealing option since, when selected, use of the API costs nothing).
As you can imagine, while I was reading that article, I was yelling at all 13.9 thousand of them through the magazine. How many of them were using Google or Bing MT with documents that should not have been sent to either Google or Microsoft? How many of these users knew to shut off the API for such documents – how many did?
There’s no way to be certain how much confidential information may have been leaked due to translator negligence. In the best scenario perhaps none, but it’s clear that the potential is very great.
On the other hand, in creating a tool as dynamic and ever-changing as a machine translation engine, the only way to train it and make it better is to use it, a sentiment that is echoed throughout the industry by developers of MT tools and something that can be seen in the output of Google translate over the past several years.
So what options are there for me to have an MT solution for my customers without risking a breach in confidentiality?
There are numerous non-public MT engines available – including Apertium, a developing open-source MT platform – however, none of them are as widely used (and therefore, as well-trained) as Google Translate or Bing Translator (yes, I realize that I just spent over 1,000 words talking about the risk involved in using Google Translate or Bing Translator).
So, is there another way? How can you gain the leverage of arguably the best-trained MT Engines available while keeping confidential information confidential?
There are companies that have foreseen this problem and addressed it. Without pitching their product, here’s how it works: the tool acts as an MT API, but before any segments are sent across your firewall to Google, it replaces all names, proper nouns, locations, positions, and numbers with an independent, anonymous token or placeholder. After the translated segment has returned from Google and is safely within the confines of your firewall, the potentially confidential material is substituted back in place of the tokens, leaving you with the MT-translated segment. On top of that, it also allows for customized tokenization rules to further anonymize sensitive data such as formulae, terminology, processes, etc.
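As a rough illustration of the placeholder approach just described, here is a minimal Python sketch. It is not any vendor’s implementation: the function names (`mask`, `unmask`, `fake_translate`), the `__TOKn__` token format and the number pattern are all assumptions made up for this example, and a real tool would need proper detection of names and locations rather than a hand-supplied list of sensitive terms.

```python
import re

# Assumed pattern for numbers such as 42, 1,250 or 1,250.50.
NUMBER = r"\d+(?:[.,]\d+)*"


def mask(segment, sensitive_terms):
    """Replace sensitive terms and numbers with anonymous tokens.

    Returns the masked segment and a token -> original text mapping
    that never leaves the local machine.
    """
    mapping = {}
    # Longest terms first, so "Acme Corp" is preferred over "Acme".
    terms = sorted(sensitive_terms, key=len, reverse=True)
    alternatives = [re.escape(term) for term in terms] + [NUMBER]
    pattern = re.compile("|".join(alternatives))

    def substitute(match):
        token = f"__TOK{len(mapping)}__"
        mapping[token] = match.group(0)
        return token

    return pattern.sub(substitute, segment), mapping


def unmask(segment, mapping):
    """Put the original text back in place of each token."""
    for token, original in mapping.items():
        segment = segment.replace(token, original)
    return segment


def fake_translate(text):
    """Hypothetical stand-in for the call to an external MT service."""
    return "[mt] " + text


source = "Acme Corp owes John Smith 1,250.50 euros"
masked, mapping = mask(source, ["Acme Corp", "John Smith"])
# Only `masked` would cross the firewall.
restored = unmask(fake_translate(masked), mapping)
print(masked)
print(restored)
```

The sketch assumes the MT service passes the tokens through unchanged; in practice that would have to be verified for each engine and language pair.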
While the purpose of this article was not to prevent translators from using MT, it is intended to get translators thinking about its use and increase awareness of the inherent risks and solution options available.
— Correction —
As I have been informed, the information in the original post is not as exact as it could be: there is a Microsoft Translator Privacy Agreement that more specifically addresses use of the Microsoft Translator. Apparently, with Translator, they take a sample of no more than 10% of “randomly selected, non-consecutive sentences from the text” submitted. Unused text is deleted within 48 hours after translation is provided.
If the user subscribes to one of their data subscriptions with a maximum of 250 million characters per month (also available at levels of 500 million, 635 million, and one billion), he or she is then able to opt out of logging.
There is also Microsoft Translator Hub which allows the user to personalize the translation engine where “The Hub retains and uses submitted documents in full in order to provide your personalized translation system and to improve the Translator service.” And it should be noted that, “After you remove a document from your Hub account we may continue to use it for improving the Translator service.”
So let’s analyze this development. 10% of the full text submitted is sampled and unused text is deleted within 48 hours of its service to the user. The text is still potentially from a sensitive document and still warrants awareness of the issue.
If you use the Translator Hub, it uses the full document to train the engine, and even after you remove the document from your Hub, they may continue to use it to improve the Translator service.
Now break out the calculators and slide rules, kids, it’s time to do some math.
In order to opt out of logging, you need to purchase a data subscription of 250 million characters per month or more (the 250 million character level costs $2,055.00/month). If every word were 50 characters long, that would be 5 million words per month (where a month is 31 days), and a post-editor would have to process 161,290 words per day (working every single day of this 31-day month). It’s physically impossible for a post-editor to process 161,290 words in a day; even spread over a month, 161,290 words would mean 8,064.5 words per day (working 8 hours a day for 20 days a month). So we can safely assume that no freelance translator can afford to buy in at the 250 million character/month level, especially when even in the busiest month, a single translator comes nowhere near being able to edit the number of words necessary to make it a financially sound expense.
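For anyone who would rather check the math than trust it, the figures above can be reproduced in a few lines of Python. The variable names are just labels chosen for this sketch; the numbers are the ones quoted in the post, including its deliberately generous assumption of 50 characters per word.

```python
CHARS_PER_MONTH = 250_000_000  # smallest subscription level that allows opt-out
CHARS_PER_WORD = 50            # the post's generous assumption
DAYS_IN_MONTH = 31
WORKING_DAYS = 20

# Words covered by the subscription each month.
words_per_month = CHARS_PER_MONTH // CHARS_PER_WORD

# Daily load if the post-editor worked all 31 days.
words_per_day = round(words_per_month / DAYS_IN_MONTH)

# The same daily figure treated as a whole month's work over 20 working days.
words_per_working_day = words_per_day / WORKING_DAYS

print(words_per_month)        # 5000000
print(words_per_day)          # 161290
print(words_per_working_day)  # 8064.5
```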
In the end, I still come to the same conclusion: we need to be more cognizant of what we send through free, public, and semi-public Machine Translation engines and educate ourselves on the risks associated with their use and the safer, more secure solutions available when working with confidential or restricted-access information.
The KantanMT team would like to thank Joseph Wojowski for allowing us to republish his very interesting and topical post on machine translation security. You can view the original post here.
At KantanMT, security, integrity and the privacy of our customers’ data is a top priority. We believe this is vital to their business operations and to our own success. Therefore, we use a multilayered approach to protect and encrypt this information. The KantanMT Data Privacy statement ensures that no client data is re-published, re-tasked or re-purposed and will also be fully encrypted during storage and transmission.
The ‘quality debate’ is old news and the conversation, which is now heavily influenced by ‘big data’ and ‘cloud computing’ has moved on. Instead it is focusing on the ability to scale translation jobs quickly and efficiently to meet real-time demands.
Translation buyers expect a system or workflow that provides high quality, fit-for-purpose translations. It’s because of this that Language Service Providers (LSPs) have worked tirelessly, perfecting their systems and orchestrating the use of Translation Memories (TM) within well-managed workflows, alongside the professionalization of the translation industry. Quality is now a given in the buyer’s eyes.
What is the translation buyers’ biggest challenge?
The translation buyer’s biggest challenge now is scale: scaling their processes, their workflows and supply chains. Of course, the caveat is that they want scale without jeopardizing quality! They need systems that are responsive and transparent, and that scale gracefully in step with their corporate growth and language expansion strategy.
Scale with quality! One without the other is as useless as a wind-farm without wind!
What makes machine translation better than other processes? Looking past the obvious automation of the localization workflow, the one thing that MT can do better than any other translation method is combine automation with scalability.
KantanMT recognizes this and has developed a number of key technologies to accelerate the speed of on-demand MT engines without compromising quality.
KantanAutoScale™ is an additional divide and conquer feature that lets KantanMT users distribute their translation jobs across multiple servers running in the cloud.
Engine Optimization technology means KantanMT engines now operate 5-10 times faster, reducing the amount of memory and CPU power needed, so MT jobs can be processed faster and more efficiently when using features like KantanAutoScale.
API optimization: KantanMT engineers went back to basics, reviewing and refining the system, which enabled users to achieve improvements of 50-100% in translation speed. This meant translation jobs that took five hours can now be completed in less than one hour.
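To make the divide and conquer idea behind distributing a job across multiple servers more concrete, here is a small Python sketch. It is only an illustration of the general technique and says nothing about how KantanAutoScale is actually built: `translate_chunk` is a hypothetical stand-in for one MT server, and local threads stand in for cloud machines.

```python
from concurrent.futures import ThreadPoolExecutor


def translate_chunk(chunk):
    """Hypothetical stand-in for sending one chunk to one MT server."""
    return [f"[mt] {segment}" for segment in chunk]


def split(segments, n_chunks):
    """Divide the job into at most n_chunks contiguous pieces."""
    size = -(-len(segments) // n_chunks)  # ceiling division
    return [segments[i:i + size] for i in range(0, len(segments), size)]


def translate_job(segments, n_workers=4):
    """Translate chunks in parallel and reassemble them in order."""
    if not segments:
        return []
    chunks = split(segments, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in the order the chunks were submitted,
        # so the translated job keeps the original segment order.
        results = list(pool.map(translate_chunk, chunks))
    return [segment for chunk in results for segment in chunk]


print(translate_job(["one", "two", "three", "four", "five"], n_workers=2))
```

The design point is that the job is only as fast as its slowest chunk, so even splitting matters as much as the number of workers.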
Scalability is the key to advancement in machine translation, and considering the speed at which people are creating and digesting content we need to be able to provide true MT scalability to all language pairs for all content.
Are you a Language Service Provider (LSP)? If you are, then you’ve probably had a client request for Machine Translation, or you are a progressive company that wants to list the latest technologies on its website, and you’ve heard about the productivity gains others are getting from MT and want a part of it.
Are you an Enterprise? Then you’re probably under pressure to reduce translation costs and/or introduce new languages online and offline.
Either way, you will travel down a similar path when introducing Machine Translation into your company, and many people and departments will be touched: translators, engineers, project managers, managers, solutions engineers, finance, legal, etc. That’s why it’s important to be clear about what to expect from the process and to put measures in place to ensure that it all runs smoothly.
During the investigating period you will research in both print and online publications to get an idea of which types of Machine Translation solutions are available and which might suit your business needs. You’ll read about rule-based MT, statistical MT, and possibly hybrid too, learning that each option has its pros and its cons.
After you narrow down your preferences to a small group of providers, you should look for peer based reviews of the Machine Translation products – this could be in the form of case studies, webinars, video testimonials, forums, or face to face at conferences. Doing this will give you a better understanding of how the product works in a real life scenario and might bring to light issues you should be aware of, that only users will tell you.
At this point you will want to contact some Machine Translation sales companies to get a better overview of the product and service they offer and to see if the pricing and support meets your needs. Ask about automatic post-editing tools and analytics as these are key!
It’s important to see the Machine Translation system in action so you know if it’s manageable for your team. Most providers will offer you a demo after or during your first meeting, but don’t be afraid to ask if they don’t.
So, after you’ve seen a few systems in action, you may be in a situation where you are choosing between a custom-built MT system and a platform which allows you to develop, manage and deploy your own MT systems. Have you tried a pilot? This is a great way to see whether your data works better with one system or another.
Things to think about when you start using Machine Translation (statistical):
– Do I have sufficient data to get started?
Realistically, you need some good data to start off with. While MT providers (like us) offer to manufacture data, it is never as good as your own. You’ll need a TM (of roughly 2-5 million words), some terminology and some monolingual data (which can be easily generated).
– Does the MT provider have a developer friendly API?
Whether you’re intending to use Machine Translation as a pre-translation tool or to translate content on-demand you’ll want to ask the MT provider about its API capabilities. The speed and flexibility of an API will vary greatly so remember to ask about this!
– Have you discussed post-editing with your translators?
Some translators are less open to Machine Translation than others, and rightly so: there are some questionable free MT systems available online today! While many have had bad experiences, some just don’t like the thought of a computer doing their work and others would prefer to translate from scratch. It’s important to talk to your translators before, during and after deciding to introduce Machine Translation into your company so they are part of the process and more in tune with the technology. Translators and linguists play an integral part in the development of Machine Translation engines and will be a great help when deciding which MT solution to choose.
This is not an exhaustive list of items to think about and discuss with your team, but it should give you some idea of the process you will likely go through when evaluating Machine Translation systems. If you would like more information about the buying process, or would like to book a demo of KantanMT.com, please contact Niamh (firstname.lastname@example.org).