How Do You Avoid Bottlenecks and Save Money at the Same Time?


If you had a very valuable and talented member of staff, would you send them on an errand to collect something from a supplier, a task that might tie them up for hours? Or, if you had a second option whereby the supplier calls you when the product is ready, allowing your staff member to make a timely collection, which option would you take? Most people, I would venture, would choose the second option. However, many would favour having both options. Flexibility is always a good thing to have as part of any workflow.

With KantanMT’s announcement that its Software Development Kit (SDK) has been enhanced by the addition of a new Asynchronous Interface, the community of KantanMT users has been given this very flexibility. This development in the SDK provides a high-speed, high-volume asynchronous programming interface into both Statistical and Neural MT engines. This option complements the existing synchronous interface.

For the uninitiated, the synchronous or asynchronous nature of an Application Programming Interface (API) is a function of the time it takes from the initial request for a service to the return of processed data. In the case of synchronous APIs, the expectation is that processed data will be returned immediately. The API requests data, then pauses and waits for the processed data to be returned. This ties up the hardware while it awaits the result of its request – hence my analogy above of tying up a valuable staff member while he/she waits for the supplier to produce the requested product.

An asynchronous API, by contrast, sends its request on the basis that a resource may not be immediately available, and that the requested service may therefore not be responded to straight away. The caller receives a callback when the required service has completed. The asynchronous method avoids the server (and an engineer) sitting idle while it awaits a response, allowing that server to be employed on other tasks.

Asynchronous processing is used where the user does not want to halt processing while an external request is being dealt with. A synchronous API blocks the caller until it returns, tying up resources until the action has been completed. An asynchronous API, on the other hand, does not block the caller; it typically takes a callback that is executed once the work is completed. Asynchronous requests are useful for keeping an application responsive rather than tying up resources while waiting on a request.
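To make the contrast concrete, here is a minimal sketch of the two calling patterns in Python. The translation calls are simulated stand-ins, not KantanMT’s actual SDK endpoints; the point is simply that the synchronous caller waits out each request, while the asynchronous caller queues the work and is notified via a callback.

```python
# A minimal sketch of a blocking (synchronous) call versus a non-blocking
# (asynchronous) call with a callback. The function names below are
# hypothetical stand-ins, not the actual KantanMT SDK API.
import asyncio
import time

def translate_sync(segment: str) -> str:
    """Blocking call: the caller waits until the engine returns."""
    time.sleep(2)                      # simulates engine processing time
    return f"[translated] {segment}"

async def translate_async(segment: str, callback) -> None:
    """Non-blocking call: the caller is free; a callback fires when work is done."""
    await asyncio.sleep(2)             # simulates queued processing on the server
    callback(f"[translated] {segment}")

async def main() -> None:
    # Synchronous: each request ties up the caller for the full processing time.
    start = time.perf_counter()
    for seg in ("Hello", "World"):
        print(translate_sync(seg))
    print(f"sync total: {time.perf_counter() - start:.1f}s")

    # Asynchronous: requests are queued; the caller keeps working and is
    # notified via callback when each translation is ready.
    start = time.perf_counter()
    await asyncio.gather(*(translate_async(seg, print) for seg in ("Hello", "World")))
    print(f"async total: {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```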

The buying of hardware is a big capital investment for many companies, particularly start-ups. And staying on top of the biggest and the best is also an endless drain on company resources. Therefore, server time must be optimised to the nth degree. A tied-up server equates to a wasted resource – hence my analogy of the valuable staff member twiddling his/her thumbs while they queue for a collection.

The provision by KantanMT of the asynchronous API function allows users to reduce spending by optimising server time, reducing the need for investment in multiple servers. KantanMT is confident that this addition to its SDK will be welcomed by many of its clients – users who, while requiring high-volume translation, do not need it done in real time. The Asynchronous Interface will provide a solution to any possible bottleneck in server capacity by queuing non-urgent translation requests to be processed when capacity is available. This, KantanMT is confident, will allow users to avoid spending on costly hardware, reducing outlays and keeping TCO (Total Cost of Ownership) to a minimum for the KantanMT Community. A bypass around a bottleneck is always a good thing to have, especially when it equates to less spending.

Aidan Collins is a language industry veteran. He is Marketing Manager at KantanMT.

KantanMT’s TNN Translation Shown to Produce a Striking Boost in Quality


As the localization industry strives at a fast pace to integrate Machine Translation into mainstream workflows to increase productivity, reduce cost and gain a competitive advantage, it’s worthwhile taking time to consider which type of Neural MT provides the best results in terms of translation quality and cost.

This is a question that has been occupying our minds here at KantanMT and eBay over the past several months. The fact is, Neural MT comes in many variants – with the different models available yielding remarkably different quality results.

Overview of Neural Network Types

 The main models of Neural MT are:

  • Recurrent Neural Networks (RNNs) – these have been designed to recognize sequential characteristics of data and use the detected patterns to predict the next most likely sequence. Training happens in both forward and backward directions; hence the descriptor recurrent. RNNs have been the predominant neural network of choice for most MT providers.

Fig 1: RNN (Image Courtesy of Jeremy Jordon)

  • Convolutional Neural Networks (CNNs) – these are the main type of network used in computer image processing (e.g., facial recognition and image searching) but can also be used for machine translation. The model exploits the 2D structure of the input data. The training process is simplified, and CNNs require less computational overhead to compute models.

Fig 2: CNN (Image Courtesy of Jeremy Jordon)

  • Transformer Neural Networks (TNNs) – the predominant approach to MT has been based on recurrent/convolutional neural networks connecting the encoder and decoder through an attention mechanism. The Transformer Neural Network model, however, uses only the attention mechanism (i.e., the contextual characteristics of the input data), avoiding the recurrence and convolution structures of the other models entirely. This simplifies the training process and reduces the computational requirements for TNN modelling (a minimal sketch of the attention computation follows the figure below).

Fig 3: TNN (Image Courtesy of “The Illustrated Transformer” by Jay Alammar)
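For readers who want to see what “only attention” means in practice, the following is a minimal sketch of the scaled dot-product attention that Transformers are built from, written in plain NumPy. The shapes and random values are purely illustrative; a real TNN stacks many such layers with learned query/key/value projections.

```python
# A minimal sketch of scaled dot-product attention, the core operation of a
# Transformer, using plain NumPy. Values are illustrative only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query mixes the values V, weighted by how well it matches each key K."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # weighted sum of values

# Toy example: 3 source tokens, embedding size 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)     # -> (3, 4)
```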

The eBay NMT Experiment

To determine which model yields the best translation outcomes, eBay and KantanMT collaborated and set up a controlled experiment using the KantanMT platform, which supports all three types of Neural Models.

The language arc English => Italian was chosen, and the domain defined as eBay’s Customer Support content. Each Kantan model variant was trained on identical training data sets which consisted of:

  • eBay’s in-domain translation memory
  • eBay’s glossaries and lists of brand names
  • Supplementary KantanLibrary training corpora

The Test Reference Set was created by the eBay MT Linguistic Team by sampling the eBay Translation Memory to mirror its segment length distribution (e.g., 10% short segments, 30% medium and 60% long).
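As an illustration, a test set with that kind of length profile can be drawn by bucketing the translation memory by segment length and sampling each bucket in proportion. The sketch below assumes simple word-count thresholds and a sample size of 400 segments; the actual eBay selection criteria are not described beyond the 10/30/60 split.

```python
# A sketch of building a test set that mirrors a translation memory's segment
# length distribution (10% short / 30% medium / 60% long, per the article).
# The length thresholds and sample size are assumptions for illustration.
import random

def length_bucket(segment: str) -> str:
    words = len(segment.split())
    if words <= 5:
        return "short"
    if words <= 15:
        return "medium"
    return "long"

def stratified_sample(tm_segments, size=400):
    mix = {"short": 0.10, "medium": 0.30, "long": 0.60}
    buckets = {"short": [], "medium": [], "long": []}
    for seg in tm_segments:
        buckets[length_bucket(seg)].append(seg)
    sample = []
    for bucket, share in mix.items():
        k = min(int(size * share), len(buckets[bucket]))
        sample.extend(random.sample(buckets[bucket], k))
    random.shuffle(sample)          # randomise order before evaluation
    return sample
```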

To provide a comprehensive comparison and ranking of the performance of different models, the translation outputs from the following systems were included in our joint experiment:

  • Kantan TNN (Transformer Neural Network, customized)
  • Kantan CNN (Convolutional Neural Network, customized)
  • Kantan RNN (Recurrent Neural Network, customized)
  • Bing Translate (Transformer Neural Network, generic)
  • Google Translate (Transformer Neural Network, generic)

Human Translation (HT) was also included in this comparison and ranking to determine how neural machine translation outputs compare to translations provided by Professional Translators.

The evaluator was an eBay Italian MT language specialist with domain expertise and experience in ranking and assessing the quality of machine translation outputs.

The following Key Performance Indicators (KPIs) were chosen to determine the comparative fluency and adequacy of each system:

  • Fluency – determines whether the translation follows common grammatical rules and contains the expected word collocations. This KPI measures whether the machine translation segment is formed in the same way a human translation would be
  • Adequacy – measures how much of the meaning is expressed in the machine translation segment, i.e., whether it contains as much of the meaning as if it had been translated by a human

Each KPI was rated on a 5-star scale, with 1 star being the lowest rating (i.e., No Fluency) and 5 stars being the highest rating (i.e., Human-Level Fluency).

KantanLQR was used to manage the assessment, randomise and anonymise the Test Reference Set, score the translation outputs, and collate the feedback from the eBay MT linguist.
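The bookkeeping behind such an evaluation looks roughly like the sketch below: outputs are shuffled and anonymised so the evaluator cannot tell which system produced which segment, star ratings are collected per KPI, and the scores are collated back to the originating systems. This is only an illustration of the workflow, not KantanLQR’s actual implementation, and the entries and ratings shown are simulated.

```python
# A sketch of blind scoring and collation: anonymised outputs, 1-5 star
# Fluency and Adequacy ratings, and per-system averages. Data is simulated.
import random
from collections import defaultdict
from statistics import mean

outputs = [  # (system, segment_id, translation) - hypothetical entries
    ("Kantan TNN", 1, "..."), ("Kantan RNN", 1, "..."), ("Google Translate", 1, "..."),
]

# Anonymise: the evaluator sees only a shuffled list of blind IDs.
random.shuffle(outputs)
blind = {f"item-{i}": entry for i, entry in enumerate(outputs)}

# The evaluator returns 1-5 star ratings per blind item (simulated here).
ratings = {bid: {"fluency": random.randint(1, 5), "adequacy": random.randint(1, 5)}
           for bid in blind}

# Collate back to systems and report the mean score per KPI.
per_system = defaultdict(lambda: {"fluency": [], "adequacy": []})
for bid, (system, _, _) in blind.items():
    for kpi, stars in ratings[bid].items():
        per_system[system][kpi].append(stars)

for system, kpis in per_system.items():
    print(system, {kpi: round(mean(v), 2) for kpi, v in kpis.items()})
```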

The Results


Our Conclusions

The Custom Kantan Transformer Neural Network (Kantan TNN) performed the best in terms of Fluency and Adequacy. It outperformed RNNs by 9 percentage points on Fluency (which is statistically significant) and by 11 percentage points on Adequacy. While there is still some way to go to achieve near-human-level quality (as depicted by the HT graphs), Transformer Neural Networks provide significant improvements in MT quality in terms of Fluency and Adequacy, and they offer the best bang for your buck in terms of training time and process simplification.

Since this blog was first published, comparative analysis has also been carried out for English=>German, English=>Spanish and English=>French language combinations and in all cases Kantan TNNs out-performed CNNs, RNNs, Google and Bing Translate.

Shaping the Path to Neural Machine Translation: Interview with Tony O’Dowd

What is Neural Machine Translation (NMT) all about?

Neural Machine Translation is an approach to machine translation that uses large neural networks to produce translations that are more natural sounding and achieve greater levels of fluency. These networks are trained on sequences (or sentences), which means they solve many of the syntactical and grammatical errors previously associated with Phrase-Based Statistical Machine Translation.

With the emergence of relatively cheap, super-powerful computers, coupled with practically unlimited storage capacity due to the emergence of “the Cloud”, we can now compute these complex NMT models in several hours.

These highly efficient models can then provide fast and fluent translations, and at an economically advantageous price. Today, over 90% of the daily traffic on the KantanMT platform is processed by our NMT services. This gives you an indication of the high regard our customers have for the efficacy of our NMT platform.

A few facts about KantanMT

Why is there so much hype around NMT?

Simply because it provides a demonstrable improvement in machine translation outputs, resolving many of the problems associated with the statistical machine translation (SMT) model. Effectively, we have in the last two years solved many of the translation shortcomings of the SMT system; deficiencies that we’ve been trying to resolve for the last two decades! So, you can imagine how excited we are to be able to move with such speed compared to the pace of development we were lumbered with when working with SMT.

An interesting fact about NMT is that we don’t actually train these networks on “whole words” – we train them on “word pieces” (a sketch of this segmentation follows below). And even if we don’t have parallel training data for a language combination, we can build a zero-shot network that is capable of producing translations between those languages! It’s completely amazing what we can do once we set up the deep learning approach and throw super-computers at the problem. In Deep Learning (DL) we use a highly sophisticated, multi-layered pattern of ‘neurons’ to process huge chunks of data, looking to refine the information contained within that data. The DL process can take an abstract jungle of information (word pieces), as is contained within data, and using the power of super-computation refine that data into clearly understood language.
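As a rough illustration of “word pieces”, the sketch below segments words against a small subword vocabulary using greedy longest-match. Real NMT systems learn this vocabulary from the training corpus (for example with byte-pair encoding or SentencePiece); the toy vocabulary here is made up.

```python
# A minimal sketch of "word pieces": greedy longest-match segmentation against
# a small subword vocabulary. The vocabulary below is a made-up toy example.
VOCAB = {"trans", "lat", "ion", "un", "break", "able", "log", "ist", "ics"}

def word_pieces(word: str, vocab=VOCAB):
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):      # try the longest piece first
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece if start == 0 else "##" + piece)
                start = end
                break
        else:
            return [word]                            # fall back to the whole word
    return pieces

print(word_pieces("translation"))    # ['trans', '##lat', '##ion']
print(word_pieces("unbreakable"))    # ['un', '##break', '##able']
```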

Can you imagine how good Neural MT will be in a further two years? Will Moore’s Law of exponential technological growth apply to Neural Machine Translation too? I believe it will. It will be amazing to see then how powerful NMT will be. It is certainly something that excites us here at KantanMT.com.

Which languages have made the greatest progress for NMT?

Any language that has a deep and complex grammatical structure can now be efficiently modelled using Deep Learning and Neural Networks. For example, take the grammatical characteristics of the humble German verb – under normal circumstances it needs to be positioned at the end of a sentence. That would seem a straightforward enough challenge. However, SMT struggled to position the German verb accurately. To overcome this, we at KantanMT.com used advanced part-of-speech reordering approaches to improve this accuracy. This was a very complex, time-consuming and computationally intensive approach. However, NMT (because we train the engines on full sentences) almost always correctly positions that elusive German verb. This methodology also allows us to meet the challenges of languages such as Hungarian and Finnish. These are now well within our capabilities, allowing us to produce very good translation outputs using NMT.

Where do you see the translation industry in the next 5 years?

What an exciting time to be in the Localization Industry! We are on the cusp of a massive explosion in Artificial Intelligence (AI), which will impact all facets of the localization industry’s workflow and processes.

The industry will use automated translation technology to process even more content, into more languages, and faster than ever before. Translators should not fear, as they will be the main beneficiaries of this transformation. As this technological evolution grows, translators will be able to produce more words per day and, consequently, significantly improve their income levels. I envisage a scenario whereby translation from scratch will be viewed as old-school and passé. The translation model will change in the same way as Computer Aided Translation (CAT) transformed the industry for the better. In the new NMT paradigm, the post-editing of a constantly improving machine translation output will be seen and accepted as the modern, progressive way of working. And the industry will be the better for it.

AI will also enable better job matching and candidate selection – translators will be selected based on their relevant skill sets, domain knowledge and previous job performance. This is not to be feared, as essentially this is the way we choose our dentists and doctors today. AI will become a driver for greater competition and increased professionalism in our industry.

I also see AI becoming part of the project management workflow system, and of the project management role. PM systems will be expected to handle real-time translation workflows, combining automated translation and “human touch” post-editing to provide almost instantaneous results.

On the quality side, translation errors and problems will be identified by AI checkers and automatically routed for automatic recovery and fixing. The time between job arrival and completion will be reduced, in some cases, to seconds. These “micro-jobs” will be driven by the requirement for new content to be translated in effectively “real time”. This fast turnaround will be required for content such as blogs, wikis, live user forums, reviews, internal corporate content, help chat lines, etc.

What should we expect from KantanMT in the next few months?

We’re working on a new type of Neural Network that will provide even better translation outputs than before, with a significantly reduced training time. These new networks are already in testing with one of the largest eCommerce companies; so, stay tuned for further news of this major step forward in the evolution of NMT.

Additionally, we have figured out a way of measuring the quality of an automatically generated translation. This Quality Estimation Score system was developed by KantanMT.com for the European Commission. The good news is, we shall be open-sourcing this technology in early 2019.

You’re also going to see a new, improved version of KantanLQR that will support multilingual quality projects. It will give you the means to measure how individual language arcs are performing across your enterprise.

This article first appeared on http://www.argosmultilingual.com in October 2018: http://www.argosmultilingual.com/blog/shaping-the-path-to-neural-machine-translation-with-tony-odowd

Deep Learning – Is it Simply a Chip Off the Old Block?

Today’s blog is aimed at helping the novice understand the technology that is Deep Learning (DL). To do this, I will need to discuss in depth Linear Algebra, Statistics, Probability Theory and Multivariate Calculus. Only joking! Nothing would turn novice readers off more than trying to hack our way through the above complex disciplines. We’ll leave that for the nerds. Today’s blog – like my last on Machine Learning – will try to use an analogy to help explain what is, without doubt, a very multifaceted, intricate subject to fully master.

For myself, the more I read about Deep Learning, and the more I spoke to the engineering masterminds at KantanMT, the more I realised that the discipline of using a Deep Learning model bore a similarity to sculpting. Let me expand: I don’t know to whom this quote is attributed, but for me it certainly describes the methods of Deep Learning:

“The sculptor produces the beautiful statue by chipping away such parts of the marble block as are not needed – it is a process of elimination.”

Indeed, I think it was no less than Michelangelo who, when asked about sculpting, said that the angel lay within the marble block; it was simply his job to release it. Michelangelo’s minimalist explanation, and the above quotation, encapsulate in its simplest form what the Deep Learning progression involves. The engineer is the sculptor. The marble block represents the huge block of dense data to be processed. The act of processing the data is the chipping away of unwanted information by neural networks. The act of fine-tuning the deep learning neural engine represents the technique of the sculptor carefully finessing the shape of the emerging form into a recognisable figure.

In both the role of sculptor and engineer there is a vision of what the ‘fine-tuning’ activity should produce. I am confident that if you as a novice accept this simple analogy, you are going some way towards grasping the very fundamentals of the Deep Learning process.


As a concept, Deep Learning is less than two decades old. The origin of the expression is attributed to Igor Aizenberg, Professor and Chair of the Department of Computer Science at Manhattan College, New York. Aizenberg studies, amongst other things, complex-valued neural networks.  He came up with the concept of an Artificial Neural Network system based on that of the human neural network – the network of the human brain.

The ‘Deep’ element of the concept refers to a multi-layered processing network of neuron filters. The equivalent process in the human brain is that of information flowing through neurons connected by synapses. In the machine equivalent, artificial neurons are used to fine-tune and refine data as it is passed through the ‘engine’. The process of Deep Learning also learns from experience and can adjust its processes accordingly. In sculpting, it is the equivalent of the experienced sculptor chipping and refining the marble to release Michelangelo’s hidden angel.
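To give the sculpting analogy a concrete, if greatly simplified, form: the sketch below passes a block of raw input numbers through a stack of layers of artificial neurons, each one a weighted filter followed by a non-linearity. The weights here are random placeholders; in a real system, training adjusts them from experience, which is the ‘chipping and refining’ described above.

```python
# A minimal sketch of a multi-layered network of artificial 'neurons': each layer
# applies a linear filter followed by a non-linearity, progressively refining the
# raw input. Weights are random placeholders; training would adjust them.
import numpy as np

rng = np.random.default_rng(42)

def layer(x, n_out):
    """One layer of neurons: weighted sum of inputs, then a ReLU non-linearity."""
    W = rng.normal(scale=0.5, size=(x.shape[-1], n_out))
    b = np.zeros(n_out)
    return np.maximum(0.0, x @ W + b)

x = rng.normal(size=(1, 16))          # a raw, 'unchipped' block of input features
for n_out in (12, 8, 4):              # three successively narrower layers
    x = layer(x, n_out)
print(x)                              # the refined representation after the 'deep' stack
```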

Jeff Dean, a Senior Fellow at Google’s ‘System and Information Group’ – the group behind many of Google’s highly sophisticated machine learning technologies – said:

“When you hear the term ‘Deep Learning’ just think of a large neural net. Deep refers to the number of layers typically, and so this is kind of the popular term that’s been adopted by the press.”

For many novices there is confusion around the terms Machine Learning (ML), Artificial Intelligence (AI) and Deep Learning (DL). There need not be, as the division is quite simple: Artificial Intelligence is the catch-all term that covers Machine Learning and Deep Learning. Machine Learning is an over-arching term for the training of computers, using algorithms, to parse data, learn from it and make informed decisions based on the accrued learning. Examples of machine learning in action are Netflix showing you what you might want to watch next, or Amazon suggesting books you might want to buy. These suggestions are the outcome of these companies using ML technology to monitor and build preference profiles based on your buying patterns.


Deep Learning is a subset of ML. It uses a highly sophisticated, multi-layered pattern of ‘neurons’ to process huge chunks of data, looking to refine the information contained within that data. It takes an abstract jungle of information, as is contained within data, and refines it into clearly understood concepts. The data used can be clean, or not clean. Cleaning data is the process of refining the raw information to remove anything clearly irrelevant; clean data can be processed more quickly than data that has not been cleaned. Think of it as the human brain blocking out extraneous information as it processes what is relevant and discards what is irrelevant – something the human brain does every minute of every day.
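In practice, that cleaning step often looks like the short sketch below: dropping empty, duplicate and implausible segments before the data reaches the engine. The specific filtering rules are illustrative assumptions, not a prescribed KantanMT pipeline.

```python
# A sketch of 'cleaning' training data before it is fed to a deep learning engine:
# dropping empty, duplicate and implausibly long segments so the network spends
# its capacity on the signal. The filtering rules are illustrative assumptions.
def clean_corpus(segments):
    seen, cleaned = set(), []
    for seg in segments:
        seg = seg.strip()
        if not seg:                         # drop empty lines
            continue
        if seg.lower() in seen:             # drop exact duplicates
            continue
        if len(seg.split()) > 200:          # drop implausibly long segments
            continue
        seen.add(seg.lower())
        cleaned.append(seg)
    return cleaned

raw = ["Hello world", "hello world", "", "Hello world ", "A valid sentence."]
print(clean_corpus(raw))                    # ['Hello world', 'A valid sentence.']
```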

But why has Deep Learning suddenly taken off so spectacularly? It is because Artificial Neural Networks (ANNs) can now be trained to a high level of accuracy when fed huge amounts of data. ANNs can synthesise complex non-linear processes with a high degree of accuracy. DL is also becoming predominant because of the following boosters:

  • The emergence of Big Data
  • The increase in computational power
  • The emergence of The Cloud
  • The affordable availability of GPU and TPU
  • The development of DL models using open source code

Today it is estimated that Big Data provides 2.5 quintillion bytes of information per day. Now, if you are like me, you’ll never have heard of the measure quintillion. Well, apparently, it is a billion billion – a one followed by 18 zeros. Not that that helps bring it into finer focus!

According to IBM:

“90% of the data in the world today has been created in the last two years. This data comes from everywhere: sensors used to gather shopper information, posts to social media sites, digital pictures and videos, purchase transaction, and cell phone GPS signals to name a few. This data is big data.”

It is safe to say that the amount of data available will only increase over the coming years. Institutions such as the European Union, the United Nations, the World Bank, the World Health Organisation, Social Media companies etc make huge volumes of data available daily, and in multilingual form. The importance of this resource of massive data is underlined by Andrew Ng, Chief Scientist at Baidu, China’s major search engine, who said:

“The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms.”

The advent of Cloud Computing has allowed even small companies to have virtually unlimited storage space and access to fantastically powerful computational resources. Processors with the power of the tensor processing unit (TPU) are available via Cloud computing. Some examples of Cloud computing sources would be Amazon Web Services, IBM’s SmartCloud or Google’s Cloud.

TPUs were developed by Google specifically to deal with the demands of ANNs. Previously, graphics processing units (GPUs) had reduced the machine learning process from weeks to hours; TPUs have speeded that process up even further. Without this level of computing power, it is unlikely Deep Learning would be a viable technology.

Finally, Intel is reportedly developing a device called a Neural Stick which they claim will allow companies to bypass the Cloud to do their processing at a local level (i.e. non-Cloud level). This will be a boost to those companies who baulk at the security implications of processing data in a remote location. It will also increase the speed of processing as all the crunching will be done at the local level. Intel say it is their intent to make DL work “everywhere and on every device”. If they succeed, Deep Learning will expand to a huge degree. Interesting times lie ahead for Artificial Intelligence.

Aidan Collins is a language industry veteran. He is Marketing Manager at KantanMT. This article first appeared in Multilingual in the December 2017 edition: https://multilingual.com/all-articles/?art_id=2592

 

 

Forget Nostradamus – Here’s Tony O’Dowd’s IT Predictions for 2019


It is that time of year when Janus-faced we look over the year just passed and towards the year about to start to get a sense of how much progress we have made, and what progress may lie before us. It would be true to say that over the last 12 months Artificial Intelligence (AI) has become a norm in our lives and is now part of the vernacular. People have now accepted that their lives interact multiple times a day with AI, and that such technology is becoming ubiquitous within their lives.

And looking forward to 2019, what does it hold for us? Well, if we pay heed to the predictions of Nostradamus, we should brace ourselves for flooding, wars and a strike by a meteor. Nothing there to bring cheer but, assuming we survive all of that, what does 2019 hold for technology trends? Well, this is what we at KantanMT are predicting for the new year ahead:

Artificial Intelligence


We’ve seen an explosion in the use of Artificial Intelligence in the delivery of Neural Machine Translation during 2018; expect this to continue into 2019 and beyond. AI is the catch-all term that covers Machine Learning and Deep Learning. Machine Learning is an over-arching term for the training of computers, using algorithms, to parse data, learn from it and make informed decisions based on the accrued learning. Examples of machine learning in action are Netflix showing you what you might want to watch next, or Amazon suggesting books you might want to buy.

Within the localisation industry, the use of AI in the form of Machine Translation (in several forms) has significantly improved translation quality outputs, speeded up translation of huge quantities of data and reduced the price of translation to make it economically viable.

AI refers to computer systems built to mimic human intelligence (i.e. imitating human neural abilities) and to perform tasks such as image recognition, parsing speech forms, discerning patterns from complex data sets, and informing accurate decision making. What’s more, AI can do these tasks faster, cheaper and more accurately than humans. Although AI has been around since the 1950s, it can be truly said that it has now come of age. This maturity has been propelled by the ever-increasing computational power now available in the Cloud.

According to Forbes, five out of six people use AI technology each day. These services include such things as navigation apps, streaming services (Amazon Alexa, Netflix etc), smartphone personal assistants, dating apps and even smart home devices (e.g. remote-activated home security systems). Additionally, AI is used in recommendation engines used by eCommerce sites (Amazon, Netflix etc), to schedule trains, to predict maintenance cycles and for other mission-critical business tasks.

For the localisation industry, AI will become a highly integrated component of Machine Translation (MT) systems. The role of the human translator will continue evolving to that of an editor of MT texts, rather than a translator of raw texts. In addition, pricing models will continue to move from the traditional price per word based on word volumes to pricing on a time-measured rate. MT will become an integral part of the standard workflow. The reality of real-time translation – driven by such technology as the Internet of Things (IoT) – will see project managers/editors managing workflows for customers who need a constant flow of updated information. MT will become part of the translation process just as much as CAT did in the past. And, as ever, evolving technology will bring with it a desire for speedier and more cost-effective solutions.

Machine Learning


Machine Learning (ML) will continue to grow as a tool used by most localisation departments as the requirement for the speedy translations of large datasets continues to be a driver in the industry.

ML is a subset of Artificial Intelligence: with ML, computers are automated to learn to do something that they were not initially programmed to do. So, ML is an over-arching term for the training of computers to use smart algorithms to automate actions, to parse complex data and to learn patterns from that data, thus enabling the computer to make informed decisions based on this accrued knowledge. Machine Learning can be broadly broken down into two types of learning – supervised and unsupervised learning.

For supervised machine learning, the training data is pre-labelled and consists of an aligned input data set and desired output data set. For example, an input data set could be a translation memory. An ML algorithm analyses the training data and maps how to convert future inputs to match the learned, desired output data sets.

Unsupervised machine learning is like supervised machine learning; however, the input data sets are not pre-classified or labelled. The goal of unsupervised machine learning is to find hidden structures in the unlabelled data.
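A toy contrast of the two settings, using scikit-learn, is sketched below. The data points are made up: the supervised model is given labelled input/output pairs, while the clustering step is handed the same points with the labels withheld and has to find the hidden structure on its own.

```python
# A toy contrast of supervised vs unsupervised learning with scikit-learn.
# Supervised: labelled (input, output) pairs. Unsupervised: the same data with
# the labels withheld, where structure must be discovered. The data is made up.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: features labelled with the desired output (here, class 0 or 1).
X = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.95, 0.15]]))         # learns the mapping -> [0]

# Unsupervised: labels withheld; the algorithm finds the two hidden groups itself.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters)                             # e.g. [0 0 1 1] - two structures found
```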

So how does this impact the localisation industry? Well, suppose you want to build a translation system to translate from Zulu to French, without any Zulu-French training data? The answer is, you can combine both supervised and unsupervised approaches to achieve this. You can use an English-Zulu data set in combination with an English-French data set and using unsupervised machine learning, the system can learn how to translate from Zulu into French.

This approach is commonly referred to as ‘Zero-Shot’ machine learning – expect to hear more about this in 2019 for machine translation systems for long-tail languages.
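One common way to set this up, shown in the sketch below, is to tag every source sentence with the desired target language and train a single shared model on the combined corpora; at inference time, a Zulu sentence tagged for French requests a direction the model never saw explicitly. The tagging convention shown is a generic multilingual-NMT technique, not a description of KantanMT’s internal implementation, and the example pairs are made up.

```python
# A sketch of preparing data for 'zero-shot' translation in the spirit of the
# Zulu->French example above: tag each source sentence with the desired target
# language and train one shared model on the combined corpora. The tagging
# scheme and example pairs are illustrative assumptions.
def tag_corpus(pairs, target_lang):
    """Prepend a target-language token to every source sentence."""
    return [(f"<2{target_lang}> {src}", tgt) for src, tgt in pairs]

zulu_english = [("Sawubona", "Hello")]       # Zulu -> English examples
english_french = [("Hello", "Bonjour")]      # English -> French examples

# One shared training set: the model learns 'translate into the tagged language'.
training_data = tag_corpus(zulu_english, "en") + tag_corpus(english_french, "fr")
print(training_data)

# At inference time, a Zulu source tagged '<2fr>' requests French output, even
# though no Zulu->French pairs were ever seen during training (zero-shot).
zero_shot_request = "<2fr> Sawubona"
```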

Blockchain


I know what you’re thinking – why have we put Blockchain into this blog? Sure, isn’t that technology used only for Cryptocurrencies such as Bitcoin? Well you’re correct; while blockchain is most widely known as the technology behind Cryptocurrencies, it offers security that is useful in many other ways.

In simple terms, blockchain can be described as data you can add to, but not take away from or change. These ‘blocks’ of data can be ‘chained’ together to create incredibly secure data repositories. Not being able to change any previous block is what makes it so secure.
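The append-only property is easy to see in a toy hash chain: each block records a hash of the previous one, so altering any earlier block is immediately detectable. This is only an illustration of the principle, not a production ledger design.

```python
# A minimal sketch of the 'add to, but never change' property: each block stores
# a hash of the previous block, so tampering with any earlier block breaks the
# chain. A toy illustration, not a production ledger.
import hashlib
import json

def make_block(data, prev_hash):
    block = {"data": data, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def verify(chain):
    for prev, block in zip(chain, chain[1:]):
        expected = dict(block)
        expected.pop("hash")
        recomputed = hashlib.sha256(json.dumps(expected, sort_keys=True).encode()).hexdigest()
        if block["prev_hash"] != prev["hash"] or block["hash"] != recomputed:
            return False
    return True

chain = [make_block("genesis", "0")]
chain.append(make_block("segment 1 -> 'Bonjour'", chain[-1]["hash"]))
chain.append(make_block("segment 2 -> 'Merci'", chain[-1]["hash"]))
print(verify(chain))            # True

chain[1]["data"] = "tampered"   # any change to an earlier block...
print(verify(chain))            # ...is detected: False
```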

This enhanced security is why blockchain is used for cryptocurrencies. It is also why it will play a significant role in localisation, where it will be used to protect information such as a client’s financial details, and to protect and preserve translation memories – especially TMs used in distributed translation workflow scenarios.

Edge Computing

Cloud computing has now become mainstream: most global companies now rely on this centralised hosting structure for machine learning and powerful computational power. The Cloud market is dominated by just a few gigantic companies such as Amazon, Microsoft, Google and IBM. However, now that we’ve been using Cloud Computing for some time, companies have realised that accessing all data from a central repository introduces latency, which slows down the delivery of services and can, in turn, increase costs. The “round trip” made by Cloud-based data is seen by many of today’s companies as a hindrance to their business growth.

Technology stands still for no man, and so, for many, the Cloud has reached its peak as a service for some technologies. The Cloud will continue to be used to analyse and process huge swathes of data, but the advent of the Internet of Things (e.g. connected security systems, electronic appliances, vending machines, automated lighting etc.), where data processing needs to be high speed, if not real time, demands a different model. So, the logical and necessary next move is to move this data processing to the Edge. The Edge simply means that data processing moves from a centralised, far-away location to a geographical site closer to the data source. The advent of powerful computer chips that allow such processing to be done locally has expedited this move to the Edge. Indeed, many of today’s Cloud setups automatically look to place the processing of data at the optimum Edge site for that data’s requirements.

So, Edge Computing solves the latency problem by simply moving the data processing closer to home. Closer to home means less time spent uploading and downloading data. Instead of the centralised storage model, which has hitherto driven AI, companies are moving their data into the “local community” to be processed. This move will undoubtedly make data access much faster and facilitate the growing demand for real-time computing.

How will this impact localisation? Well, in 2019 we can expect to see the Edge model used in domain-adapted machine translation systems, and in distributed translation workflows that are designed to meet the increasing demand for data distribution in real time.

Summary


We are on the verge of an explosion in the use of AI. An explosion of the very thing that drives many of the vital cogs within the localisation business. This change will redefine many key roles and bring about the reconfiguration and automation of everyday workflow tasks. The inevitable growth of AI, and the implementation of such things as machine learning, will fundamentally re-shape how companies manage translation workflows; the very engine of their work process. Real-time translations will become the norm, where it is required.

We also predict that changes will happen at a human level; for example, the role of the translator will change from that of translator of raw text to that of editor of huge volumes of high-quality MT-produced text. We also believe this will be a beneficial change, allowing translators to increase their capacity and so increase their income. In 2019, we predict that the overall transformation effected by the advent of AI at all levels of the industry will bring with it an increased velocity of production, an improved efficiency in the delivery of translations, and a reduction in the cost of translating huge volumes of data.

We hope you all have a very successful 2019!

Tony O’Dowd CEO, KantanMT

This article first appeared in Multilingual in January 2019: https://multilingual.com/localization-tech-predictions-2019/

 
