We saw in my previous blog that the first idea for Statistical Machine Translation was introduced by Warren Weaver as far back as 1947. As with many innovations, the idea was way ahead of the technological ability to make it happen. But the growth of super-powerful computers, the emergence of the Cloud as a home for that computing power, and its provision to LSPs at an affordable price allowed SMT to become a commonplace solution for translating between languages.

But as with all technological breakthroughs, once the tipping point is reached things grow exponentially. While SMT could provide a working translation model, albeit in a limited capacity, scientists began to look at the next iteration of machine translation: Neural Machine Translation.

A generic neural network is a real or virtual computer system designed to mirror the brain in its ability to “learn” to analyse bodies of complex data. Neural networks, particularly those with multiple hidden layers (hence the term ‘deep’: the more layers, the deeper the network), are remarkably good at learning input-to-output mappings.

Such systems are composed of input nodes, hidden nodes and output nodes that can be thought of as loosely analogous to biological neurons, albeit greatly simplified, and the connections between nodes can be thought of as reflecting, in some way, the synaptic connections between neurons in the human brain. As stated, the ‘deep’ element of the concept refers to a multi-layered processing network of these neuron-like filters.

In the human process, instructions and information flow from the brain through neurons connected by synapses. In the machine translation equivalent, artificial neurons are used to fine-tune and refine input data as it is passed through the translation ‘engine’, with a view to achieving a predicted output. In addition, the process known as Deep Learning learns from experience, just as the human brain does, and can adjust and remember what it has learned. Consequently, the more an engine is used, the more refined and powerful it becomes. The process is, in effect, a lifelong programme of research and refinement.
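To make this concrete, here is a minimal sketch in Python; it is deliberately tiny and is not how KantanMT or any production engine is built. It shows a network with one hidden layer that ‘learns from experience’ by nudging its weights a little after every example it sees.

```python
import numpy as np

# A toy network: 4 input nodes -> 8 hidden nodes -> 3 output nodes.
# All numbers here are illustrative; a real NMT engine has millions of weights.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 8))   # input -> hidden connections
W2 = rng.normal(scale=0.1, size=(8, 3))   # hidden -> output connections

def forward(x):
    """Pass an input through the hidden layer to produce output probabilities."""
    h = np.tanh(x @ W1)                               # hidden activations
    logits = h @ W2                                   # raw output scores
    probs = np.exp(logits) / np.exp(logits).sum()     # softmax prediction
    return h, probs

def train_step(x, target_index, lr=0.1):
    """One learning step: compare the prediction with the expected output and
    shift the weights so the same mistake is smaller next time."""
    global W1, W2
    h, probs = forward(x)
    grad_logits = probs.copy()
    grad_logits[target_index] -= 1.0                  # cross-entropy gradient
    grad_W2 = np.outer(h, grad_logits)
    grad_h = W2 @ grad_logits
    grad_W1 = np.outer(x, grad_h * (1 - h ** 2))      # back through tanh
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

x = rng.normal(size=4)             # a made-up input example
for _ in range(100):               # 'experience': repeated exposure refines the weights
    train_step(x, target_index=2)
print(forward(x)[1])               # probability of the expected output rises towards 1
```

Each pass through the loop adjusts the connections slightly, which is the essence of the ‘refinement through use’ described above.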

So, in translation, the language engine uses deep neural networks to essentially predict the next word in a sequence. The Neural Network is built from a bi-lingual corpus and is trained on full sentences. (These are referred to as sequences.) This is important, as NMT models work with whole sentences, whereas SMT only works with phrases. This shift to training Neural Networks on sequences means that we eradicate almost all of the syntactical and grammatical errors frequently found in SMT outputs.
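As a rough illustration of what ‘predicting the next word in a sequence’ means, the toy sketch below uses a hand-written table of made-up probabilities in place of a trained network, and simply picks the most probable next word one step at a time until the sentence is complete.

```python
# Made-up conditional probabilities standing in for a trained NMT decoder.
# A real engine computes these with its neural network, conditioned on the
# full source sentence and the words produced so far.
toy_model = {
    (): {"le": 0.7, "chat": 0.2, "<eos>": 0.1},
    ("le",): {"chat": 0.8, "tapis": 0.1, "<eos>": 0.1},
    ("le", "chat"): {"dort": 0.9, "<eos>": 0.1},
    ("le", "chat", "dort"): {"<eos>": 0.9, "sur": 0.1},
}

def next_word_distribution(source_sentence, partial_translation):
    """Return a probability for every candidate next word (toy stand-in)."""
    return toy_model.get(tuple(partial_translation), {"<eos>": 1.0})

def greedy_translate(source_sentence, max_len=10):
    """Build the translation one word at a time, always taking the most
    probable next word, until the end-of-sentence marker is predicted."""
    translation = []
    for _ in range(max_len):
        probs = next_word_distribution(source_sentence, translation)
        best = max(probs, key=probs.get)
        if best == "<eos>":
            break
        translation.append(best)
    return translation

print(greedy_translate("the cat sleeps"))   # -> ['le', 'chat', 'dort']
```

Because the model conditions each prediction on the whole source sentence and everything produced so far, the output hangs together as a sentence rather than as a string of independently translated phrases.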

For example, NMT systems make significantly fewer errors when compared to SMT systems:

[Figure: NMT v SMT. Bars below zero show the percentage decrease in each error type for NMT systems compared to SMT systems.]

Therefore, NMT provides a demonstrable improvement in machine translation output, resolving many of the problems associated with the statistical machine translation model. Effectively, in the last two years it has fixed shortcomings that SMT users had been trying to resolve for the previous two decades, and NMT-based R&D continues to move at a speed that far outpaces the slow development of SMT.

Amazingly, Google managed to condense the development of its NMT solution into just four years. That compares with the 15 years it spent developing and refining statistical machine translation. This quantum leap in output quality for NMT systems is the result of training the translation models on ‘sequences’ (i.e. full sentences) rather than words, and of using deep learning to train and refine the neural networks on an ongoing basis.

So, a Neural Network is a mathematical model that seeks to imitate how the human brain functions. Obviously, scientists can only do this in a modest way: the human brain consists of approximately 100 billion neurons, each highly connected to other neurons, creating super-complex relationships. In a Neural Network for translation, a neuron is a ‘word’ within a sequence (sentence), and each word is highly connected to the other words in the same sequence, creating a complex web of grammatical and syntactical relationships.

[Figure: NMT]

By creating these neural models with lots of sequences, we can build up an elaborate network of word relationships. We can then use this network to predict translations within the context of the training data used to build the translation model.

To determine if a model is good at predicting translations (relative to the training data), we use a metric called perplexity. Perplexity measures how ‘surprised’ the model is by reference translations it has not seen before, so we want to build networks with as low a perplexity as possible. If we achieve this, we can predict good-quality translations that contain fewer syntactical and grammatical errors than SMT systems produce.
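For the curious, perplexity is simply the exponential of the average negative log-probability the model assigns to the correct words of unseen reference translations. The short sketch below, using made-up probabilities, shows why a confident model scores a low perplexity and an uncertain one a high perplexity.

```python
import math

def perplexity(word_probabilities):
    """Perplexity = exp of the average negative log-probability the model
    assigned to each correct word in a held-out test set. Lower is better."""
    n = len(word_probabilities)
    avg_neg_log_prob = -sum(math.log(p) for p in word_probabilities) / n
    return math.exp(avg_neg_log_prob)

# Made-up probabilities two models assigned to the correct words of a test sentence.
confident_model = [0.6, 0.5, 0.7, 0.4]
uncertain_model = [0.1, 0.05, 0.2, 0.1]

print(perplexity(confident_model))   # ~1.9  -> low perplexity, good predictions
print(perplexity(uncertain_model))   # ~10   -> high perplexity, poor predictions
```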

These highly efficient models can then provide fast and fluent translations at an economically affordable price. Today, over 90% of the daily traffic on the KantanMT platform is processed by our NMT services. Slator.com has reported that the number of LSPs now using NMT has quadrupled since 2018. This gives an indication of the high regard customers have for the efficacy of the NMT platform. The power of deep learning, the availability of huge amounts of data and the ever-increasing processing power of computers make it inevitable that this branch of artificial intelligence will be with us for the long haul.

One of the concomitant effects of the introduction and widespread use of viable machine translation solutions within LSPs is the inevitable reorganisation of the translation process, and the realignment of existing jobs to fit the newest technology trends. I will look at this in my next blog and try to predict how job roles might evolve over the coming years.

Aidan Collins, Marketing Manager at KantanMT.