So, What’s the Big Deal About Big Data Then?

It started with the flu: not just any old strain of flu, but a virulent virus known as H1N1. The year was 2009 and a public health epidemic was rocking the USA. The authorities were sure that if things were not controlled it could well become a pandemic (i.e. a country- or worldwide epidemic). The problem was that the authorities were chasing the spread of the disease: by the time they had identified an area where it might appear, it was already too late. The challenge, then, was how to get ahead of the spreading disease – how to treat it like a forest fire and surround it with a fire line that would stop it spreading. It seemed an impossible task. Enter Big Data and Google. No company in the world could do what Google could do in 2009: it handled 3 billion search terms every day and had the huge processing power to match. (Source: Big Data, Viktor Mayer-Schonberger & Kenneth Cukier).

Then someone had the brilliant idea of harnessing this intelligence to try to identify the path the spreading epidemic might be taking, and as close to real time as was possible under the circumstances. The idea was to identify search terms that people would use if they were worried about catching the flu or were beginning to feel its symptoms. The computational might of Google allowed them to identify 45 search terms that would set off an alarm and, more importantly, to pinpoint the areas where the people doing the searching were located. Suddenly, instead of chasing a spreading flu, the health authorities were able to identify probable hot spots and quickly deploy health professionals to treat the population ahead of any arriving flu bug. They were at last able to throw down a fire line and contain the spread. The pandemic would not happen – this time. This was the first time Big Data had been harnessed in such a fashion. A new industry was born, and it has continued its relentless growth right up to the present day. And it’s not about to disappear anytime soon.

So, what in a few words is Big Data? Well, the Big refers to the volume. Digital data has existed since the first bytes were input into a computer, and many computers have held very large volumes of it. Think of all the photos you have on hard drives, all the documents you have created, all the emails you have written, all the Tweets you have tweeted, all the Facebook messages you have posted, and you are beginning to get an idea of the personal data you have created and continue to create, daily. Now multiply that by the number of people in the world doing the same thing. Here’s a figure for the daily active users of Facebook – 1.56 billion. (Source: Facebook DAU, March 2019). Add Twitter, Instagram, LinkedIn etc and you see where we are going with the numbers. Colossal! For me the numbers are incomprehensible, but I will share them with you: are you strapped in? According to IBM there exist today 2.7 zettabytes of data online. A zettabyte is 1 billion terabytes – are you any clearer? To say that is a helluva lot of information would be something of an understatement.

That is one humongous potential source of business intelligence, one which could render an enormous variety of information at comparatively rapid speed. Using the right equipment, the right methodology and specially trained data scientists, that information mother lode can be sliced, diced and parsed to uncover very valuable insights. And every global company worth its salt wants to do just that. The reason? It is estimated that today’s online spending is $50 trillion per annum. Any company wanting even a fraction of that needs the ability to identify markets, trends, sentiments and opportunities, and to do it damn quick. Have your grey cells exploded yet? Well, here’s one last figure for you: it is estimated that the volume of online business data doubles every 1.2 years! (Source: Users Bring Real Value to Big Data Machine Translation, Wired).

Yet what made all this possible? Why did it become a phenomenon now, in the 21st century? The answer is the incredible explosion in the computational power of computers, the development of fibre optic cable which allowed data to travel at fantastic speeds, and the birth of the Cloud – seemingly unlimited storage somewhere out there. Tie these elements together and you get a completely different computing paradigm than hitherto existed. In addition to these technological rockets you had a change in mindset as to how Big Data could be used. It was as though a treasure chest (or Pandora’s box?) had been opened and made available to those with the savvy to exploit this global gift. The key word is global, as that is the key challenge. If a company truly wants to grab a slice of that $50 trillion market it needs to be able to use online data and interrogate its own data collection, all of which exists in a multitude of languages. That is the reality of a global market. Only one third of those online are English speakers; it is estimated the other two thirds are covered by 45 other languages.

Enter Machine Translation – a technology that has existed since the 1960s and that has seen huge growth and refinement in the last 10 years. If a company wants to be truly global it must develop a way of handling huge volumes of data in multilingual formats and at fantastic speed – some even need it in real time. The sheer volume of what needs to be translated – to some degree or other, and not always perfectly – is, as we have seen, phenomenal. Machine Translation is the tool of choice for doing that. Not every translation needs to be 100% perfect – companies like law firms often need only “gist” translations. Other companies employing online chatlines need similar services: a translation with a quality that gets the message across. However, there are those who need perfection (if that exists – see my blog above on this). That too is being supplied more and more by MT AND Human Translators (HT). The latter will never disappear as a vital part of the translation equation. MT and HT are an essential partnership.

MT algorithms can crunch huge volumes of data at tremendous speeds and to an increasingly high quality level. But we still need HT to bring the product to the accepted quality level (which can vary from customer to customer). The industry is giving shape to this new partnership translation paradigm. HT will always be a vital part of the translation/localisation industry; MT does not threaten translators’ hard-earned status. I predict that the earning power of translators will rise in the near future. In theory, and I believe in practice, translators will have the option to access work online 24/7/365 – only sleep will prevent them from accessing work when and where they want it. Things are developing rapidly behind the scenes and I believe that soon the sky’s the limit for all translators out there.

Aidan Collins, Marketing Manager at KantanMT.

The Roar of NMT Engines is Growing, Says WWW.Slator.com

I am not a great fan of Formula 1 racing. For me it is a lot of going around in circles interspersed with some moments of confused drama. But the one part of it I do enjoy is the moment when all of those powerful machines are lined up on the starting grid. The highly tuned turbo-charged engines are roaring ready to spring forward at tremendous speeds when unleashed. It is wonderful to behold such a superb collection of state-of-the-art technology, brilliantly designed and programmed to achieve the pinnacle of success.

Well, according to the http://www.slator.com 2019 NMT industry report the domain of turbo-driven engines is not only to be found in Formula 1 racing. Apparently, developers of neural machine translation solutions are also “turbo-charging” their engines as they attempt to capture huge swathes of business in an ever-expanding market.


The report, published earlier this year and titled “Neural Machine Translation Report: Deploying NMT in Operations”, says that there are now a dozen global tech companies “aggressively” pursuing enhanced solutions in machine translation and natural language processing. The list of companies pouring millions into high-tech research reads like a Who’s Who of the IT world: Microsoft, Amazon, Salesforce, eBay and Facebook are just some of those investing heavily in developing the technology.

For some – such as Microsoft and Amazon – the rationale behind their initial investment was self-serving, driven by internal needs to handle billions of words of their own translation projects. However, these global companies now see the advantage of taking the same self-built technology and monetizing it as a product they can sell to smaller companies and make available to individual users around the globe.

In parallel to this corporate push is a rapidly expanding level of research into the technology across many colleges in the USA, China and Europe. A chart in the report clearly illustrates this phenomenal growth, revealing a notable rise in papers on NMT published in 2018. That year saw the publication of 391 papers, almost double the 2017 figure and a six-fold increase on the 2016 numbers. Asia is at the forefront of the development of the technology.

Indeed, some Chinese academics controversially argue that the whole NMT industry was started by them and is now being propelled by their research. Undoubtedly, the exponential growth of the technology has been remarkable, growing from only a handful of champions just five years ago to a plethora of them today. The report talks of the current development spurt as the “Third Wave” of MT technology. It suggests that with the huge investment by so many global companies – and growing levels of academic research – this is a technology with a promising future.

As with the growth of any technology, there is the beginning of a diversification process as companies seek unique ways to monetize their products in an effort to recoup some of their investment. The industry is being driven by the global goliaths whose researchers were initially tasked with creating a solution for internal needs; these solutions are now seen as an opportunity to develop another line of revenue. To harness this growing potential, many of the tools companies – those behind products such as Catalyst, memoQ, Déjà Vu, Across, Wordfast etc – have created APIs so that their products can interface smoothly with these NMT solutions.

Development efforts have also gone in many directions depending on the perceived needs of different markets. Some developers have concentrated their energies on creating more and more language pairs; Google, for example, supports 50 pairs bidirectionally. Microsoft supports 41 pairs but uniquely allows its users to upload data if they do not have bilingual data. Amazon takes the lead by supplying up to 127 language pairs but is quite restrictive in how its users employ its engines.

One thing all of these technology suppliers have in common is a menu of charges, although each menu differs as to when the charges kick in and how much is free. The report briefly highlights how many LSPs are struggling with how they should charge their customers for NMT services. Most have opted to go with the traditional per-word method. However, the report suggests that this might evolve into another method of charging as the service matures.

One thing that does jump out of the report is the number and scale of the Asian companies who are serious players in this field. In fact, the Chief Scientist at Chinese search giant Baidu, Andrew Ng, claims that “Neural machine translation was a technology first pioneered and developed and shipped in China” and that US companies only came in “well after Baidu” – a comment that has stung many of the developers in the USA. The report seems to suggest there is a bit of a technology race between the development giants of East and West.

And as in the case of the huge sharks of the ocean, there are “pilot fish” companies swimming around the huge NMT companies seeking to feed off their efforts. This sub-industry is made up of companies who have seen an opportunity for supplying data-hungry developers with the billions of words they need from different market-types. Other enterprising companies have developed training courses aimed at addressing the growing demand for qualified post-editors. There has also been an increase in the number of “boutique” suppliers. These are smaller companies who offer the NMT technology to businesses who have neither the capital nor time to invest in developing their own NMT solution.

The report quotes several CEOs who argue that the technology will only become truly efficient when a comprehensive, qualitative form of integrated testing technology is available. The report interviews CEOs across a range of user companies and gives interesting feedback on their experiences in using NMT. Much of the feedback is upbeat and gives optimism that the technology is going in the right direction. A few of the CEOs have commented that after some initial reluctance many translators now see that NMT is something that can help expand their earning power and are beginning to feel comfortable about where they fit in the new L10n workflow.

The report is an easy read. It is not heavy on jargon, and it gives an interesting insight into the industry. Although the authors themselves don’t declare that machine translation technology is the future, taken in the round it is clear that NMT has become established, is making huge strides, and is expanding from translating text to also translating voice. Clearly, this is an exciting time to be involved in the world of neural machine translation.

Aidan Collins, Marketing Manager at KantanMT.

The Roles They are A-Changin’ – Good News is on the Way for Translators

When I started in L10n in 1990 the term L10n did not exist. That came later, after people got tired of typing out the full word localisation (or localization). That’s the thing about L10n: time is always of the essence and ways to speed things up are always on the agenda. Over its history, L10n has always been driven by the imperative of dealing with projects more quickly and more cheaply while still guaranteeing a high quality standard. In recent years, it is the challenge of dealing with huge volumes of data that has driven the latest technological revolution now reshaping the L10n world. But more of that anon.

In earlier years, projects consisted of three basic inter-related elements: documentation (user guides etc), translation (text from software, help files and documents) and engineering (at that time more problematic and cumbersome, and so often restricted in scope). In those early years, engineers had nowhere near the plethora of tools they now have at their disposal. Some larger software companies developed proprietary internal tools to help with software L10n, but these were restricted to internal staff, who did the heavy lifting on the engineering front.

Only when Catalyst came along (courtesy of the owner of KantanMT!) did engineering within L10n companies throw off its shackles (so to speak) and blossom and grow as a discipline. Almost in parallel with that product came the release of Trados. At first scorned by purist translators, in an almost Luddite-like fashion, it soon became an integral part of the L10n process. Slowly, translators saw their roles evolving to include the creation of glossaries and translation memories, and the management and coordination of these valuable translation memory assets.

In desktop publishing the PC market share was pretty much dominated by Ventura, later bought from its original owners and shipped globally as Xerox Ventura. It was an effective workhorse and it dominated the market for the early part of the 1990s. Then came competitors like FrameMaker, which was Ventura on steroids, and then PageMaker and QuarkXPress for the PC, and the online publishing world took off, with a boost in its ability to handle and integrate complex graphics into documents and to marry graphics, text and sound into one product. The role of the desktop publisher went from someone churning out flat, grey pages to one where the skills of graphical engineering were required.

The history of localisation is marked by key technology milestones such as the introduction of Trados and Catalyst, and the expansion of documentation to become complex and online. Every milestone has also seen the evolution of the roles of L10n practitioners, all the way from sales (selling new product offers), to engineering (will we build, or deliver translated strings only?), to DTP (will we go with the bells and whistles, or flat text?), to Project Management (how many more plates do they expect me to spin – cue workflow systems?), to finance (what the heck do we charge for translating strings? The same as text?), to the IT guys (more hardware please!), to the CEO (how do I finance and control this speed of expansion?). One truism is that the L10n industry is always in a state of evolution, and that is still the case.

Today, the technology driving things is machine translation (in its different flavours). MT is a technology that lay dormant because of the inability of companies to deal with its technical complexity and to fund the sophisticated machinery needed to make it mainstream. Then came a thing called the ‘Cloud’ (and now its cousin, the ‘Edge’), mixed with two other potent ingredients: powerful, economical hardware, and huge swathes of data for translation. Add to that already heady potion the reality that global companies now want to speak to their customers in their own language, almost in real time, and you have an industry that has just been loaded with rocket fuel and pointed on an upward trajectory.


This technology milestone is also in the process of moulding and driving new roles and, interestingly, a new earning paradigm for translators. The latest MT drive is something that is causing translators in particular (and maybe quite a few CEOs) to fret about the viability of their profession. The good news is that any rumours of the pending extinction of human translators are just that – rumours without substance. I doubt that in my lifetime (admittedly not as long as it used to be) we will see translation happening without supremely qualified translators somewhere in the mix. Indeed, I make this prediction: translation, and the role of translators, is about to become more lucrative – if they choose to make it so.

And there won’t be long to wait for my prediction to become a reality. The days of translators having non-earning, down-time while they await the delivery (maybe next week, or maybe the week after or …) of the next major project, the one that will pay the mortgage, is over. I’ll go as far as to predict that translators will soon be in a position to earn money 24x7x365, should they choose. I believe because of MT and the Cloud their halcyon earning days await them.

And I can assure you this; my grey head is not in the clouds on this one.

Watch this space for further developments!

Aidan Collins, Marketing Manager at KantanMT.

A Short Introduction to the Neural Machine Translation (NMT) Model

We saw in my previous blog that the first idea for Statistical Machine Translation was introduced by Warren Weaver as far back as 1947. As with many innovations, the idea was way ahead of the technological ability to make it happen. But the growth of super-powerful computers, the introduction of the Cloud as a repository for these super-computers, and their provision to LSPs at an affordable price allowed the theory of SMT as a solution for the translation of languages to become commonplace.

But as with all technological breakthroughs, once the tipping point is reached things grow exponentially. While SMT could provide a translation model – albeit in a limited capacity – scientists began to look at the next iteration of machine translation: Neural Machine Translation.

A generic neural network is a real or virtual computer system that is designed to mirror the brain in its ability to “learn” to analyse bodies of complex data. Neural networks, particularly those with multiple hidden layers (hence the term ‘deep’ – the more layers, the deeper) are remarkably good at learning input->output mappings.

Such systems are composed of input nodes, hidden nodes, and output nodes that can be thought of as loosely analogous to biological neurons, albeit greatly simplified, and the connections between nodes can be thought of as in some way reflecting synapse connections between neurons in the human brain. As stated, the ‘deep’ element of the concept refers to a multi-layered processing network of neuron filters.

In the human process, instructions and information flow from the brain through neurons connected by synapses. In the machine translation equivalent, artificial neurons are used to fine-tune and refine input data as it is passed through the translation ‘engine’ with a view to achieving a predicted output. In addition, the process called Deep Learning learns – as does the human brain – from experience, and can adjust and remember processes accordingly. Consequently, the more an engine is used the more refined and powerful it becomes. The process becomes a life-long R&D effort.

So, in translation, the language engine uses deep neural networks essentially to predict the next word in a sequence. The Neural Network is built using a bilingual corpus and is trained on full sentences (these are referred to as sequences). This is important, as NMT fully understands what a sentence is, whereas SMT only understands what a phrase is. This shift to training Neural Networks on sequences means that we eradicate almost all of the syntactical and grammatical errors frequently found in SMT outputs.

For example, NMT systems make significantly fewer errors when compared to SMT systems:

NMT v SMT

(NOTE: each bar below 0 in the graph shows the percentage decrease in that error type for NMT systems compared to SMT systems.)

Therefore, NMT provides a demonstrable improvement in machine translation outputs, resolving many of the problems associated with the statistical machine translation (SMT) model. Effectively, in the last two years it has solved many of the translation shortcomings of the SMT system – deficiencies that SMT users had been trying to resolve for the last two decades. NMT-based R&D has been moving at a speed that far outpaces the slow pace of SMT development.

Amazingly, Google managed to condense into four years the development of its NMT solution; that compares with the 15 years it spent developing and refining statistical machine translation (SMT). This quantum leap in output quality for NMT systems is the result of being able to train the translation models on ‘sequences’ (i.e. full sentences) rather than words, and of the use of deep learning to train and refine the neural networks on an ongoing basis.

So, a Neural Network is a mathematical model seeking to imitate how the human brain functions. Obviously, scientists can only do this in a modest way. The human brain consists of approximately 100 billion neurons, and these are highly connected to other neurons creating super-complex relationships. In a Neural Network a neuron is a ‘word’ within a sequence (sentence), and each word is highly connected to words within the same sequence creating a complex web of grammatical and syntactical relationships.


By creating these neural models with lots of sequences, we can build up an elaborate network of word relationships. We can use this network to predict translations within the context of the training data used to build the translation model.
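To make “predicting the next word in a sequence” a little more concrete, here is a minimal, self-contained sketch in Python. It is purely illustrative – a tiny feed-forward network with an invented vocabulary and random weights, not the architecture of any production NMT engine – but it shows the basic flow from an input word, through hidden nodes, to a probability for every possible next word.

```python
import numpy as np

# Toy vocabulary: the model will predict which of these words follows a context word.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
embed_dim, hidden_dim, vocab_size = 8, 16, len(vocab)

# Randomly initialised parameters (in a real NMT engine these are learned from
# a bilingual corpus of full sentences, i.e. "sequences").
E  = rng.normal(size=(vocab_size, embed_dim))   # word embeddings (input layer)
W1 = rng.normal(size=(embed_dim, hidden_dim))   # input  -> hidden weights
W2 = rng.normal(size=(hidden_dim, vocab_size))  # hidden -> output weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def next_word_distribution(context_word):
    """Return a probability for every vocabulary word following `context_word`."""
    x = E[word_to_id[context_word]]      # input nodes: the context word's embedding
    h = np.tanh(x @ W1)                  # hidden nodes: learned intermediate features
    return softmax(h @ W2)               # output nodes: probability of each next word

probs = next_word_distribution("cat")
for word, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"P({word!r} | 'cat') = {p:.3f}")
```

With random weights the probabilities are meaningless; training on millions of sentence pairs is what shapes the weights so that, for example, “sat” becomes far more probable after “cat” than “mat” does.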

To determine if a model is good at predicting translations (relative to the training data), we use a mathematical measure called Perplexity. We want to build networks with as low a Perplexity as possible. If we achieve this, we can produce good quality translations that contain fewer syntactical and grammatical errors than SMT systems.
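As a rough illustration of what a Perplexity number means (a simplified sketch, not the exact formulation used by any particular toolkit): it is the exponential of the average negative log-probability the model assigned to the words it was asked to predict, so confident, correct predictions drive the number down.

```python
import math

def perplexity(predicted_probs):
    """Perplexity over the probabilities a model assigned to the correct next words.

    predicted_probs: one probability per predicted word, e.g. [0.4, 0.25, 0.9, ...]
    """
    n = len(predicted_probs)
    avg_neg_log = -sum(math.log(p) for p in predicted_probs) / n
    return math.exp(avg_neg_log)

# A confident model (high probabilities on the right words) => low perplexity.
print(perplexity([0.9, 0.8, 0.85, 0.9]))   # ~1.16
# An unsure model (spread-out probabilities)               => high perplexity.
print(perplexity([0.2, 0.1, 0.15, 0.2]))   # ~6.4
```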

These highly efficient models can then provide fast and fluent translations at economically affordable prices. Today, over 90% of the daily traffic on the KantanMT platform is processed by our NMT services. Slator.com has reported that the number of LSPs now using NMT has quadrupled since 2018. This gives an indication of the high regard customers have for the efficacy of the NMT platform. The power of deep learning, the availability of huge amounts of data and the ever-increasing processing power of computers make it inevitable that this branch of artificial intelligence will be with us for the long haul.

One of the concomitant effects of the introduction and widespread use of viable machine translation solutions within LSPs is the inevitable re-organising of the translation process, and the realignment of existing jobs to fit with the newest technology trends. I will look at this in my next blog and try to predict how job roles might evolve over the coming years.

Aidan Collins, Marketing Manager at KantanMT.

A Short Introduction to the Statistical Machine Translation Model

The first ideas of Statistical Machine Translation were introduced by Warren Weaver as far back as 1947. He explained that language had an inherent logic that could be treated in the same way as any logical mathematical challenge. He contended that logical deduction could be used to identify “conclusions” in the target (untranslated) language based on what already existed in the source (translated) language. With the advent of the cloud, and the affordability of powerful computers, the theory of Statistical Machine Translation (SMT) became a practical option.

Therefore, SMT is a machine translation paradigm where translations are generated based on statistical models, whose parameters are derived from the analysis of bilingual text corpora (text bodies) – a source text of translated material and a target text of untranslated material.

Statistical machine translation starts with a very large data set of approved previous translations. This is known as a corpus (corpora is plural) of texts that is then used to automatically deduce a statistical model of translation. This model is then applied to untranslated target texts to make a probability-driven match to suggest a reasonable translation.

Where are these large data sets sourced? Well, over many years most large global organisations – for example the EU, UN, World Bank, World Health Organisation etc – have developed enormous domain-specific corpora in multiple source/target language combinations. Many of these originated as human-translated texts. These are then made accessible to machine translation users/developers to further refine and use for MT purposes. The process is an evolutionary model, with each existing corpus being refined, added to and updated on an ongoing basis.

This SMT paradigm is based on what’s known as probabilistic mathematical theory. Such theory gives the chances (probability) of something occurring depending on the different variables likely to influence the event. For example, when tossing a die, the probability of it landing on a particular number is 1/6, so a gambler has a one in six chance of landing on their chosen number. The probability of two dice both showing that number is 1/6 x 1/6 = 1/36. In poker, the professional gambler keeps mental track of the cards being played and, using probability theory, decides whether a gamble has a good chance of winning based on the odds they have calculated in their head.

In SMT, the MT engineer builds a Translation Model by recording in a table the frequency of phrases appearing in the training corpus. This table stores each phrase and the number of times it repeats over the entirety of the training corpus. The more frequently a phrase pair is repeated in the training corpus, the more probable it is that the target translation is correct. Each phrase (stored in the Phrase Table) can range from one to five words in length. This phrase table is referred to as the Translation Model.
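Here is a minimal sketch of how such a phrase table might be assembled. The aligned phrase pairs below are invented for illustration; a real SMT pipeline extracts them automatically, via word alignment, from millions of segments.

```python
from collections import Counter

# Toy aligned phrase pairs extracted from a (fictional) English->German training corpus.
aligned_phrase_pairs = [
    ("the house", "das Haus"),
    ("the house", "das Haus"),
    ("the house", "das Gebäude"),
    ("is small", "ist klein"),
    ("is small", "ist klein"),
]

# Count how often each source phrase was seen with each target phrase.
pair_counts = Counter(aligned_phrase_pairs)
source_counts = Counter(src for src, _ in aligned_phrase_pairs)

# The "Translation Model": for each source phrase, the relative frequency of each
# candidate translation -- more frequent pairs get a higher probability.
phrase_table = {
    (src, tgt): count / source_counts[src]
    for (src, tgt), count in pair_counts.items()
}

for (src, tgt), prob in sorted(phrase_table.items()):
    print(f"P({tgt!r} | {src!r}) = {prob:.2f}")
```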

Consequently, the MT engineer is using a probability model to hit on the right source/target translation combination. The process is evolutionary as the corpus is refined and adjusted after each translation run to eliminate/adjust any anomalies. The more frequently the corpus is used, the more perfected it becomes. The development of the corpus quality is a continuous organic R&D process of a highly valuable translation asset.

Additionally, the MT engineer builds a secondary model using the target translation data. This model helps determine the order in which the engineer needs to assemble phrases (from the Phrase Table) in order to optimise translation Fluency, i.e. to give the translated text its natural language flow. Fluency ensures that literal translations (where the words are all there, but the sense of the sentence is not) are replaced by a more natural-sounding translation.

In order to translate a source sentence, the MT engineer goes through the following decode process (as outlined in the diagram below, and sketched in code after the steps):

SMT Model

  1. He/she breaks the source language sentence into phrases. (You can see the phrases as individual grey blocks in line two of the diagram.)
  2. He/she then looks up each of these phrases in the Phrase Table/Translation Model and generates the target language translations. (You can see this in line three of the diagram.)
  3. The engineer then uses the Phrase Table/Translation Model to re-order these phrases to optimise translation Fluency. (You can see this reordering in line four of the diagram.)
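A deliberately simplified sketch of those three steps is shown below. The phrase table, the scoring function and the exhaustive re-ordering are all toy stand-ins; a real decoder searches over many possible segmentations and orderings and scores them with a statistical language model.

```python
from itertools import permutations

# Toy phrase table (Translation Model): source phrase -> best target phrase.
phrase_table = {
    "das haus": "the house",
    "ist": "is",
    "sehr klein": "very small",
}

# Toy target-side "fluency" score standing in for the Language Model:
# higher means the word order reads more naturally in the target language.
def fluency_score(words):
    text = " ".join(words)
    preferred = ["the house is very small", "the house is", "is very small"]
    return sum(1 for p in preferred if p in text)

def decode(source_sentence):
    # Step 1: break the source sentence into known phrases (greedy longest match).
    tokens = source_sentence.lower().split()
    phrases, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in phrase_table:
                phrases.append(candidate)
                i = j
                break
        else:
            phrases.append(tokens[i])  # unknown word: pass through untranslated
            i += 1

    # Step 2: look up each phrase in the Translation Model.
    translated = [phrase_table.get(p, p) for p in phrases]

    # Step 3: re-order the translated phrases to maximise fluency.
    best = max(permutations(translated), key=fluency_score)
    return " ".join(best)

print(decode("Das Haus ist sehr klein"))  # -> "the house is very small"
```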

In my next blog, I will give a short introduction to the increasingly popular MT model Neural Machine Translation.

Aidan Collins, Marketing Manager at KantanMT.

NMT has arrived – can we now agree on what “quality” means?

In a blog I wrote in May of 2018 I commented that when it comes to IT nothing develops faster than new technology (“In Technology, Things Never Slow Down – Quite the Opposite”). It seems that once a new technology is born, its exponential growth becomes unstoppable. And that – I would argue – is exactly where we are with machine translation. We are watching an amazing exponential growth, with Neural Machine Translation (NMT) now the new, hot kid on the block.

Undoubtedly, NMT is now mainstream and LSPs are rapidly adopting it as a viable translation technology option. Machine Translation has followed the exponential growth path of other technological innovations. Amazingly, Google managed to condense into four years the development of its NMT solution; that compares with the 15 years it spent developing and refining statistical machine translation (SMT). In November 2016, Google gave up on developing SMT and unveiled its newly developed NMT system (GNMT). Google announced that its new NMT technology was already able to translate eight language pairs; within 12 months GNMT was supporting another 90 language pairs. The exponential growth was astonishing: the new kid on the MT block had arrived and was soon centre stage.

A Transformer Neural Network (Image Courtesy of: “The Illustrated Transformer” by Jay Alammar)

In tandem with this rapid NMT development, the emergence of relatively cheap, super-powerful computers – coupled with practically unlimited storage capacity thanks to the emergence of the “Cloud” and its more recent iteration, the “Edge” – opened the way for most language companies to adopt complex NMT technology as a standard customer service offering. The way has been created for the language industry to use automated translation technology to process even more content, into more languages, and faster than ever before.

In 2018, a www.Slator.com report revealed that in just 12 months the number of LSPs offering NMT as a service had quadrupled, from a low base of five to 20. Perhaps more pertinent than the numbers is the status of the LSPs adopting the technology as their go-to language solution for bulk translation projects. Significantly too, NMT is now the standard solution of major global entities in both the private (e.g. Amazon, Google, Microsoft) and public (e.g. the EU) sectors.

So, as LSPs move speedily to integrate Machine Translation into their service offerings and workflows, it’s perhaps a good time to look at the ongoing debate about “quality” in NMT production. Quality has always been a hot topic, fiercely debated within the industry. Whole conferences, often in exotic locations, have been given over to passionate debates trying to define the “gold standard” for quality. But the debate has one inherent flaw: there is no consensus on what constitutes “quality”.

“Central to these issues is the acceptance that there is no longer a single ‘gold standard’ measure of quality, such that the situation in which MT is deployed needs to be borne in mind, especially with respect to the expected ‘shelf-life’ of the translation itself.”

Source: Andy Way, “Quality expectations of machine translation”

The conundrum is what exactly constitutes “quality”, how it is measured, and who decides what universal qualitative measure is acceptable. As with all empirical matters, it is not surprising that this vigorous debate is still ongoing, fuelled by the growing use of NMT. Slator.com’s 2018 NMT report shone a light on the debate and gave the views of different LSP industry players on this complicated subject.

The current discussion primarily focuses on the efficacy of technical quality testing using standards such as the BLEU score. BLEU is not seen by many as a suitable measuring standard; its use simply continued because its existence pre-dated the emergence of NMT. Again, to quote Andy Way:

“When it comes to NMT, however, the improvements over predecessor MT— not to mention the differences in design (i.e. NMT usually runs on character-level encoder-decoder systems) — makes BLEU even less suited to quantifying output quality.”

Indeed, there are those in the industry who would declare that BLEU has now become obsolete and should be replaced by measures such as BEER, chrF and characTER (see Slator.com). These metrics all measure quality at a more granular level, unlike BLEU, which measures at sentence level. Then again, there are those who argue that the ultimate assessment of quality can only be done by a human: a linguistic, subject-matter expert. Underlying this belief is the assertion that there needs to be an accepted gold standard within the industry – an “absolute” standard that can only be achieved by a human. And it is this “absolutist” stance that is the Achilles heel of their argument.
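To make the discussion less abstract, here is a deliberately simplified, BLEU-style score in Python – clipped n-gram precision combined with a brevity penalty. It is a sketch of the underlying idea only, not the official BLEU definition with its full smoothing and tokenisation details.

```python
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def simple_bleu(hypothesis, reference, max_n=2):
    """A toy BLEU-like score: clipped n-gram precision x brevity penalty."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())   # clipped n-gram matches
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    # Brevity penalty: punish hypotheses that are shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    return bp * geo_mean

reference = "the contract must be signed by both parties"
print(simple_bleu("the contract must be signed by both parties", reference))  # 1.0
print(simple_bleu("contract should be signed by all parties", reference))     # lower
```

Metrics such as chrF and characTER apply the same matching idea at character level rather than word level, which is part of the argument that they suit NMT output better.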

The advent of machine translation and post-editing has focused attention on the very nature of quality: Is it proximity to a “gold standard” of perfection or is it characteristic of a product that simply serves its purpose well enough to satisfy the needs of the consumer? In other words, is quality something that should be measured and judged in absolute terms or in relative terms?

Source: The Thorny Problem of Translation and Interpreting Quality: Geoffrey S. Koby, Kent State University, Kent, Ohio, USA; Isabel Lacruz, Kent State University, Kent, Ohio, USA

The same academics state that the “absolute” stance asserts that some requirements are always understood, absolute and constant, and can therefore remain unstated; whereas the “relative” stance asserts that, in general, best practice is to explicitly state all requirements as specifications, because they can vary from project to project. At both ends of the specifications axis, some degree of accuracy and fluency is required. The difference is whether the minimum levels depend on audience, purpose or both.


It is the last sentence above that defines the dichotomy in the debate. Absolutists fly the gold-standard flag and are unforgiving of anything less. The relative stance says no – it is the intended audience that defines the required quality standard. Their argument can best be explained in the following two examples:

  1. You are a production manager in a law firm. You have been told you need a million words of Discovery Files translated from German into English within two weeks. Only then can the firm begin to understand the evidence placed before it. As a production manager you could contact an LSP and ask them to provide an army of expert legal translators who can deal with the language pair. Good luck with that – and with the cost. Or you could ask the question: what are we trying to achieve with this translation?

The answer comes back that the lawyers need a “gist” of what the files contain; at that point they will decide which files are relevant and have them fully translated. The audience is not demanding a gold standard; they want a basic quality that will allow them to decide on the next translation move. This is undoubtedly a job for NMT, in that this technology will satisfy the quality required, will do it within the narrow time frame, and will do it without bankrupting the firm. Case closed: NMT wins hands down.

  2. You are the production manager in a company that supplies mission-critical technology – perhaps a medical firm that supplies an IVD product that must be implanted successfully to ensure the survival of patients. The product’s instructions for use cannot contain any ambiguity or error. There need to be zero errors (although, arguably, is that even possible?). Certainly, this is not a case where you would use NMT alone to produce a raw (unedited) translation. However, you could use both NMT (because of the volumes and time pressures) and linguists with the subject-matter expertise. In all seriousness, no one would currently argue that NMT should be used on its own here; there would be a need for strong human input. It’s a no-brainer!

So those who argue that there is a single gold standard are wrong. The standard is decided by the audience of the translation. If a company wants real-time translations it should be willing to accept a lesser standard of quality. More and more global companies are looking for this service, so much so that LSPs are now embedding NMT into their workflows, with some PMs also finding their job spec changing as they become initiators and monitors of a constant flow of words being translated in real time and displayed online. Alongside them you also have PMs working with traditional human translators and/or combined human and machine translation.

The translation paradigm has changed dramatically in just a short number of years. At the end of the day, companies with an eye to their bottom lines will specify their quality standard. For them, this will be dictated by the audience at which the translation is aimed. If it is an internal document, large in volume and time sensitive, the customer will likely lower their quality needs and accept raw NMT. If the audience is more important, they might opt for both NMT and human editing.

Other companies, as outlined above, will declare they have zero tolerance for errors and demand the human-only track. One thing is clear: it is no longer a binary choice of human versus machine – the new paradigm sees the need for both. The production model is evolving, as too are the job functions at all levels within LSPs. From finance through to production, all will need to fully understand how the paradigm has shifted and restructure accordingly – to snooze is to lose.

Aidan Collins, Marketing Manager at KantanMT

How do You Avoid the Bottlenecks and Save Money at the Same Time?


If you had a very valuable and talented member of staff, would you send him/her on an errand to collect something from a supplier – a task that might tie them up for hours? Or, if you had a second option whereby you could ask the supplier to call you when the product is ready for collection, allowing your staff member to make a timely collection, which option would you take? Most people, I would venture, would choose the second. However, many would favour having both options. Flexibility is always a good thing to have as part of any workflow.

With KantanMT’s announcement that its Software Development Kit (SDK) has been enhanced by the addition of a new Asynchronous Interface the community of KantanMT users has been given this very flexibility. This development in the SDK provides a high speed, high volume asynchronous programme interface in to both Statistical and Neural MT engines. This option complements the already existing synchronous interface option.

For the uninitiated, the synchronous or asynchronous nature of an Application Programming Interface (API) is a function of the time it takes from the initial request for a service to the return of processed data. In the case of synchronous APIs, the expectation is an immediate return of processed data: the caller requests data and then pauses and waits for the processed data to be returned. This has the impact of tying up that hardware while it awaits a result to its request – hence my analogy above of tying up a valuable resource while he/she waits for the supplier to hand over the requested product.

In the case of asynchronous APIs, the caller sends its request on the basis that a resource may not be immediately available and, therefore, the requested service may not be immediately responded to. The caller will receive a callback when the required service is available. The asynchronous method avoids the server (and an engineer?) pausing and lying dormant while it awaits a response, thus allowing that server to be employed on other tasks.

Asynchronous processing is used where the user doesn’t want to halt processing while an external request is being dealt with. A synchronous API will block the caller until it returns, tying up resources until the action has been completed. An asynchronous API, on the other hand, will not block the caller and typically will require a callback that is executed once the work is completed. Asynchronous requests are useful for maintaining functionality in an application rather than tying up resources while waiting on a request.
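In code, the difference between the two interaction styles looks roughly like the sketch below. The endpoint names, job fields and polling interval are invented for illustration – they are not KantanMT’s actual SDK calls – but the blocking-versus-non-blocking pattern is the point.

```python
import time
import requests  # a generic HTTP client; a real SDK wraps details such as authentication

API = "https://api.example-mt-provider.com"   # hypothetical endpoint

def translate_sync(text):
    """Synchronous style: the caller blocks until the translation comes back."""
    response = requests.post(f"{API}/translate", json={"text": text})
    return response.json()["translation"]      # nothing else happens while we wait

def translate_async(texts):
    """Asynchronous style: submit a job, keep working, collect the result later."""
    job = requests.post(f"{API}/jobs", json={"texts": texts}).json()
    job_id = job["id"]
    while True:
        status = requests.get(f"{API}/jobs/{job_id}").json()
        if status["state"] == "done":
            return status["translations"]
        # The server is still queuing/processing; do other useful work instead
        # of tying up a thread in a blocked call.
        do_other_work()
        time.sleep(30)

def do_other_work():
    pass  # e.g. prepare the next batch, run QA checks, update dashboards
```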

The buying of hardware is a big capital investment for many companies, particularly start-ups. And staying on top of the biggest and the best is also an endless drain on company resources. Therefore, server time must be optimised to the nth degree. A tied-up server equates to a wasted resource – hence my analogy of the valuable staff member twiddling his/her thumbs while they queue for a collection.

The provision by KantanMT of the asynchronous API function allows users to reduce spending by optimising server time, lessening the need for investment in multiple servers. KantanMT is confident that this addition to its SDK will be welcomed by many of its clients: users who, while requiring high-volume translations, do not need them done in real time. The Asynchronous Interface will provide a solution to any possible bottleneck in server capacity by queuing non-urgent translation requests to be processed when capacity is available. This, KantanMT is confident, will ensure users can avoid spending on costly hardware, reducing outlays and keeping TCO (Total Cost of Ownership) to a minimum for the KantanMT Community. Having a bypass around a bottleneck is always a good thing, especially when it equates to less spending.

Aidan Collins is a language industry veteran. He is Marketing Manager at KantanMT.

KantanMT’s TNN Translation Shown to Produce a Striking Boost in Quality


As the localization industry strives at a fast pace to integrate Machine Translation into mainstream workflows to increase productivity, reduce cost and gain a competitive advantage, it’s worthwhile taking time to consider which type of Neural MT provides the best results in terms of translation quality and cost.

This is a question that has been occupying our minds here at KantanMT and eBay over the past several months. The fact is, Neural MT comes in many variants – with the different models available yielding remarkably different quality results.

Overview of Neural Network Types

 The main models of Neural MT are:

  • Recurrent Neural Networks (RNNs) – these have been designed to recognize sequential characteristics of data and use the detected patterns to predict the next most likely sequence. Training happens in both a forward and a backward direction; hence the use of the descriptor recurrent. RNNs have been the predominant neural network of choice for most MT providers.

Fig 1: A Recurrent Neural Network (Image Courtesy of Jeremy Jordon)

  • Convolutional Neural Networks (CNNs) – these are the main type of network used in computer image processing (e.g., facial recognition and image searching) but can also be used for machine translation purposes. The model exploits the 2D structure of input data. The training process is simplified, and CNNs require less computational overhead to compute models.

Fig 2: A Convolutional Neural Network (Image Courtesy of Jeremy Jordon)

  • Transformer Neural Networks (TNNs) – the predominant approach to MT has been based on recurrent/convolutional neural network models connecting the encoder and decoder through an attention mechanism. The Transformer Neural Network model, however, uses only the attention mechanism (e.g., contextual characteristics of input data), completely avoiding the recurrence and convolution structures of the other models. This has the effect of simplifying the training process and reducing the computational requirements for TNN modelling. (A minimal numerical sketch of the attention computation follows the figure below.)

Fig 3: A Transformer Neural Network (Image Courtesy of “The Illustrated Transformer” by Jay Alammar)
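For readers curious about what “only the attention mechanism” means in practice, here is a minimal numerical sketch of scaled dot-product attention, the core operation inside a Transformer. The sizes and values are toy ones; real models stack many such layers with multiple attention heads.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each output position is a weighted mix of the values V, where the weights
    come from how well its query matches every key -- this is the 'attention'."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of every query to every key
    weights = softmax(scores, axis=-1)     # rows sum to 1: how much to attend where
    return weights @ V, weights

rng = np.random.default_rng(42)
seq_len, d_model = 4, 8                    # a 4-word "sentence", 8-dim representations
Q = rng.normal(size=(seq_len, d_model))    # queries
K = rng.normal(size=(seq_len, d_model))    # keys
V = rng.normal(size=(seq_len, d_model))    # values

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # which positions each word attends to
print(output.shape)       # (4, 8): one contextualised vector per word
```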

The eBay NMT Experiment

To determine which model yields the best translation outcomes, eBay and KantanMT collaborated and set up a controlled experiment using the KantanMT platform, which supports all three types of Neural Models.

The language arc English => Italian was chosen, and the domain defined as eBay’s Customer Support content. Each Kantan model variant was trained on identical training data sets which consisted of:

  • eBay’s in-domain translation memory
  • eBay’s glossaries and lists of brand names
  • Supplementary KantanLibrary training corpora

The Test Reference Set was created by the eBay MT Linguistic Team by sampling the eBay Translation Memory to mirror its segment length distribution (e.g., 10% short segments, 30% medium and 60% long).

To provide a comprehensive comparison and ranking of the performance of different models, the translation outputs from the following systems were included in our joint experiment:

  • Kantan TNN (Transformer Neural Network, customized)
  • Kantan CNN (Convolutional Neural Network, customized)
  • Kantan RNN (Recurrent Neural Network, customized)
  • Bing Translate (Transformer Neural Network, generic)
  • Google Translate (Transformer Neural Network, generic)

Human Translation (HT) was also included in this comparison and ranking to determine how neural machine translation outputs compare to translations provided by Professional Translators.

The evaluator was an eBay Italian MT language specialist with domain expertise and experience in ranking and assessing the quality of machine translation outputs.

The following Key Performance Indicators (KPIs) were chosen to determine the comparative fluency and adequacy of each system:

  • Fluency = Fluency determines whether the translation follows common grammatical rules and contains expected word collocations. This KPI measures whether the machine translation segment is formed in the same way a human translation would be
  • Adequacy = Adequacy measures how much meaning is expressed in the machine translation segment. It measures whether the machine translation segment contains as much of the meaning as if it had been translated by a human

Each KPI was rated on a 5-star scale, with 1 star being the lowest rating (i.e., No Fluency) and 5 stars being the highest rating (i.e., Human-Level Fluency).

KantanLQR was used to manage the assessment, randomise and anonymise the Test Reference Set, score the translation outputs, and collate the feedback from the eBay MT linguist.
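Purely by way of illustration – the scores below are invented, and the aggregation shown is an assumption about one reasonable approach rather than a description of KantanLQR’s internals – star ratings of this kind can be rolled up into the percentage figures used to compare systems:

```python
# Hypothetical 5-star Fluency ratings for a handful of segments per system.
ratings = {
    "Kantan TNN":        [5, 4, 5, 4, 5],
    "Kantan RNN":        [4, 4, 3, 4, 4],
    "Human Translation": [5, 5, 5, 5, 4],
}

def percent_score(stars):
    """Map average stars (1-5) onto a 0-100% scale: 1 star -> 0%, 5 stars -> 100%."""
    avg = sum(stars) / len(stars)
    return (avg - 1) / 4 * 100

for system, stars in ratings.items():
    print(f"{system:>17}: {percent_score(stars):.0f}%")
# Differences between these percentages are the "percentage points" quoted in results.
```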

The Results

Results

Our Conclusions

The Custom Kantan Transformer Neural Network (Kantan TNN) performed the best in terms of Fluency and Adequacy. It outperformed RNNs in terms of Fluency by 9 percentage points (which is statistically significant), and 11 percentage points in terms of Adequacy. While there is still some way to go to achieve near-human-level quality (as depicted by the HT graphs), Transformer Neural Networks provide significant improvements in MT quality in terms of Fluency and Adequacy, and they offer the best-bang-for-your-buck in terms of training time and process simplification.

Since this blog was first published, comparative analysis has also been carried out for English=>German, English=>Spanish and English=>French language combinations and in all cases Kantan TNNs out-performed CNNs, RNNs, Google and Bing Translate.

Shaping the Path to Neural Machine Translation: Interview with Tony O’Dowd

What is Neural Machine Translation (NMT) all about?

Neural Machine Translation is an approach to machine translation that uses large neural networks to produce translations that are more natural sounding and achieve greater levels of fluency. These networks are trained on sequences (or sentences), which means they solve many of the syntactical and grammatical errors previously associated with Phrase-Based Statistical Machine Translation.

With the emergence of relatively cheap, super-powerful computers, coupled with practically unlimited storage capacity due to the emergence of “the Cloud”, we can now compute these complex NMT models in several hours.

These highly efficient models can then provide fast and fluent translations, and at an economically advantageous price. Today, over 90% of the daily traffic on the KantanMT platform is processed by our NMT services. This gives you an indication of the high regard our customers have for the efficacy of our NMT platform.

A few facts about KantanMT

Why is there so much hype around NMT?

Simply because it provides a demonstrable improvement in machine translation outputs, resolving many of the problems associated with the statistical machine translation (SMT) model. Effectively, we have in the last two years solved many of the translation shortcomings of the SMT system; deficiencies that we’ve been trying to resolve for the last two decades! So, you can imagine how excited we are to be able to move with such speed compared to the pace of development we were lumbered with when working with SMT.

An interesting factoid about NMT is that we don’t actually train the networks using “whole words”; we in fact train them using “word pieces”. And even if we don’t have parallel training data for a language combination, we can build a zero-shot network that will be capable of producing translations in these languages! It’s completely amazing what we can do once we set up the deep learning approach and throw super-computers at the problem. In Deep Learning (DL) we use a highly sophisticated, multi-layered pattern of ‘neurons’ to process huge chunks of data, looking to refine the information contained within that data. The DL process can take an abstract jungle of information (word pieces), as contained within the data, and, using the power of super-computation, refine it into clearly understood language.
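As a rough illustration of what training on “word pieces” rather than whole words means, here is a greedy longest-match segmenter over a made-up subword vocabulary. Real systems learn their vocabularies with algorithms such as BPE or SentencePiece; the vocabulary and splitting rule below are assumptions for the example, not KantanMT’s actual scheme.

```python
# Invented subword vocabulary; real ones hold tens of thousands of learned pieces.
subword_vocab = {"un", "break", "able", "translat", "ion", "s"}

def wordpiece_split(word, vocab):
    """Greedily split a word into the longest known pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                pieces.append(piece)
                i = j
                break
        else:
            pieces.append(word[i])  # fall back to a single character
            i += 1
    return pieces

print(wordpiece_split("unbreakable", subword_vocab))   # ['un', 'break', 'able']
print(wordpiece_split("translations", subword_vocab))  # ['translat', 'ion', 's']
```

Because rare or unseen words can always be decomposed into known pieces (or, at worst, single characters), the network never hits a hard out-of-vocabulary wall.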

Can you imagine how good Neural MT will be in a further two years? Will Moore’s Law of exponential technological growth apply to Neural Machine Translation too? I believe it will. It will be amazing to see then how powerful NMT will be. It is certainly something that excites us here at KantanMT.com.

Which languages have made the greatest progress for NMT?

Any language that has a deep and complex grammatical structure can now be efficiently modelled using Deep Learning and Neural Networks. For example, take the grammatical characteristics of the humble German verb – under normal circumstances it needs to be positioned at the end of a sentence. That would seem a straightforward enough challenge. However, SMT struggled to position the German verb accurately. To overcome this, we at KantanMT.com used advanced part-of-speech reordering approaches to improve this accuracy. This was a very complex, time-consuming and computationally intensive approach. However, NMT (because we train the engines on full sentences) almost always correctly positions that elusive German verb. This methodology also allows us to meet the challenges of languages such as Hungarian and Finnish. These are now well within our capabilities, allowing us to produce very good translation outputs using NMT.

Where do you see the translation industry in the next 5 years?

What an exciting time to be in the Localization Industry! We are on the cusp of a massive explosion in Artificial Intelligence (AI), which will impact all facets of the localization industry’s workflow and processes.

The industry will use automated translation technology to process even more content, in to more languages, and faster than ever before. Translators should not fear, as they will be the main beneficiary of this transformation. As this technological evolution grows, translators will be able to produce more words per day and consequently, significantly improve their income levels. I envisage a scenario whereby translation from scratch will be viewed as old-school and passé. The translation model will change in the same way as Computer Aided Translation (CAT) transformed the industry for the better. In the new NMT paradigm, the post-editing of a constantly improving machine translation output will be seen and accepted as the modern, progressive way of working. And the industry will be the better for it.

AI will also enable better job matching and candidate selection – so translators will be selected based on their relevant skill sets, domain knowledge and previous job performance. This is not to be feared, as essentially this is the way we choose our dentists and doctors today. AI will become a driver for greater competition and increased professionalism in our industry.

I also see AI becoming part of the project management workflow system, and of the project management role. PM systems will be expected to handle real-time translation workflows – systems that will combine automated translation and “human touch” post-editing to provide almost instantaneous results.

On the quality side, translation errors and problems will be identified by AI checkers and automatically routed for automatic recovery and fixing. The time between job arrival and completion will be reduced, in some cases to seconds. These “micro-jobs” will be driven by the requirement for new content to be translated effectively in “real time”. This fast system will be required for content such as blogs, wikis, live user forums, reviews, internal corporate content, help chat lines etc.

What should we expect from KantanMT in the next few months?

We’re working on a new type of Neural Network that will provide even better translation outputs than before, with a significantly reduced training time. These new networks are already in testing with one of the largest eCommerce companies; so, stay tuned for further news of this major step forward in the evolution of NMT.

Additionally, we have figured out a way of measuring the quality of an automatically generated translation. This Quality Estimation Score system was developed by KantanMT.com for the European Commission. The good news is, we shall be open-sourcing this technology in early 2019.

You’re also going to also see a new, improved version of KantanLQR that will support multi-lingual quality projects. It will give you the means to measure how individual language arcs are performing across your enterprise.

This article first appeared on http://www.argosmulitlingual.com in October 2018: http://www.argosmultilingual.com/blog/shaping-the-path-to-neural-machine-translation-with-tony-odowd

Deep Learning – Is it Simply a Chip Off the Old Block?

Today’s blog is aimed at helping the novice understand the technology that is Deep Learning (DL). To do this, I will need to discuss in-depth Linear Algebra, Statistics, Probability Theory and Multivariate Calculus. Only Joking! Nothing would turn the novice readers off than trying to hack our way through the above complex disciplines. We’ll leave that for the nerds. Today’s blog – like my last on Machine Learning – will try and use an analogy to help explain what is without doubt a very multifaceted, intricate subject to fully master.

For myself, the more I read about Deep Learning, and the more I spoke to the engineering masterminds at KantanMT, the more I realised that the discipline of using a Deep Learning model bears a similarity to sculpting. Let me expand. I don’t know to whom this quote is attributed, but for me it certainly describes the methods of Deep Learning:

“The sculptor produces the beautiful statue by chipping away such parts of the marble block as are not needed – it is a process of elimination.”

Indeed, I think it was no less than Michelangelo who, when asked about sculpting, said that the angel lay within the marble block; it was simply his job to release it. Michelangelo’s minimalist explanation, and the above quotation, encapsulate in the simplest form what the Deep Learning process involves. The engineer is the sculptor. The marble block represents the huge block of dense data to be processed. The act of processing the data is the chipping away of unwanted information by neural networks. The act of fine-tuning the deep learning neural engine represents the technique of the sculptor carefully finessing the shape of the emerging form into a recognisable figure.

In both the roles of sculptor and engineer there is a vision of what the ‘fine-tuning’ activity should produce. I am confident that if you, as a novice, accept this simple analogy, you are going some way towards grasping the fundamentals of the Deep Learning process.

As a concept, Deep Learning is less than two decades old. The origin of the expression is attributed to Igor Aizenberg, Professor and Chair of the Department of Computer Science at Manhattan College, New York. Aizenberg studies, amongst other things, complex-valued neural networks. He came up with the concept of an Artificial Neural Network system based on that of the human neural network – the network of the human brain.

The ‘Deep’ element of the concept refers to a multi-layered processing network of neuron filters. The equivalent process in the human brain is that of information flowing through neurons connected by synapses. In the machine equivalent, artificial neurons are used to fine-tune and refine data as it passes through the ‘engine’. A Deep Learning system also learns from experience and can adjust its processing accordingly. In sculpting terms, it is the equivalent of the experienced sculptor chipping and refining the marble to release Michelangelo’s hidden angel.
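
For readers who like to see the idea in code, here is a minimal sketch in Python and NumPy of data flowing through a small stack of artificial neuron layers. The network, its random weights and its sizes are invented purely for illustration; a real NMT engine is vastly larger and is trained rather than left with random weights.

    import numpy as np

    # A toy "deep" network: data flows through successive layers of artificial
    # neurons, each layer refining the representation it receives from the last.
    rng = np.random.default_rng(0)

    def relu(x):
        return np.maximum(0, x)  # a simple neuron activation function

    # Three layers of weights (input -> hidden -> hidden -> output)
    layers = [rng.normal(size=(4, 8)),
              rng.normal(size=(8, 8)),
              rng.normal(size=(8, 2))]

    def forward(x):
        for w in layers[:-1]:
            x = relu(x @ w)    # each hidden layer filters and refines the signal
        return x @ layers[-1]  # the final layer produces the output

    sample = rng.normal(size=(1, 4))  # one input example with 4 features
    print(forward(sample))            # the network's 2-value output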

Jeff Dean, a Senior Fellow in Google’s Systems and Infrastructure Group – the group behind many of Google’s highly sophisticated machine learning technologies – said:

“When you hear the term ‘Deep Learning’ just think of a large neural net. Deep refers to the number of layers typically, and so this is kind of the popular term that’s been adopted by the press.”

For many novices there is confusion around the terms Machine Learning (ML), Artificial Intelligence (AI) and Deep Learning (DL). There need not be, as the division is quite simple: Artificial Intelligence is the catch-all term covering both Machine Learning and Deep Learning. Machine Learning is an over-arching term for the training of computers, using algorithms, to parse data, learn from it and make informed decisions based on the accrued learning. Examples of machine learning in action are Netflix showing you what you might want to watch next, or Amazon suggesting books you might want to buy. These suggestions are the outcome of those companies using ML technology to monitor and build preference profiles based on your viewing and buying patterns.
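
As a toy illustration of that idea, the short Python sketch below uses the scikit-learn library and entirely invented viewing data to “learn” a user’s preferences from past behaviour and then make a suggestion about an unseen title.

    # Toy example: an algorithm parses past behaviour, learns from it,
    # and makes an informed suggestion. All data here is invented.
    from sklearn.linear_model import LogisticRegression

    # Features per title: [is_drama, is_documentary, runtime_hours]
    watched = [[1, 0, 2.0], [1, 0, 1.5], [0, 1, 1.0], [1, 0, 2.5]]
    liked   = [1, 1, 0, 1]   # 1 = the user finished it, 0 = the user gave up

    model = LogisticRegression().fit(watched, liked)

    new_title = [[1, 0, 1.8]]              # an unseen drama
    print(model.predict(new_title))        # [1] -> worth recommending
    print(model.predict_proba(new_title))  # the model's confidence

Real recommendation systems use far richer signals and far more data, but the principle is the same: learn from past behaviour and predict future preference.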

Deep Learning is a subset of ML. It uses a highly sophisticated, multi-layered pattern of ‘neurons’ to process huge chunks of data, looking to refine the information contained within that data. It takes an abstract jungle of information, as contained within raw data, and refines it into clearly understood concepts. The data used can be clean or unclean. Cleaning data is the process of refining the raw information to remove anything that is clearly irrelevant, and clean data can be processed more quickly than data that has not been cleaned. Think of it as the human brain blocking out extraneous information as it processes what is relevant and discards what is irrelevant – something the human brain does every minute of every day.
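
Here is a small, invented example of what cleaning might look like in practice, using the pandas library: duplicate rows and rows with missing text are dropped, and a column irrelevant to the task is discarded before the data ever reaches the model.

    import pandas as pd

    # Made-up raw data: one duplicate row, one row with missing text,
    # and a column ("page_views") that is irrelevant to the task.
    raw = pd.DataFrame({
        "sentence":   ["Hello world", "Hello world", None, "Deep learning"],
        "language":   ["en", "en", "en", "en"],
        "page_views": [120, 120, 45, 300],
    })

    clean = (
        raw.drop_duplicates()              # remove repeated rows
           .dropna(subset=["sentence"])    # remove rows with missing text
           [["sentence", "language"]]      # keep only the relevant columns
    )
    print(clean)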

But why has Deep Learning suddenly taken off so spectacularly? It is because Artificial Neural Networks (ANNs) can now be trained to a high level of accuracy when fed huge amounts of data; ANNs can model complex non-linear processes with a high degree of accuracy (the short sketch after the list below illustrates the idea). DL is also becoming predominant because of the following boosters:

  • The emergence of Big Data
  • The increase in computational power
  • The emergence of The Cloud
  • The affordable availability of GPU and TPU
  • The development of DL models using open source code
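
Here is the sketch promised above: a tiny neural network, with scikit-learn’s MLPRegressor standing in for a full deep learning framework, learning to approximate a non-linear process (in this case the sine function) purely from example data. The sizes and settings are arbitrary choices for illustration.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Generate examples of a non-linear "process": y = sin(x)
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 1))
    y = np.sin(X).ravel()

    # A small two-hidden-layer network learns the relationship from data alone
    net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000, random_state=0)
    net.fit(X, y)

    test = np.array([[1.0], [2.0]])
    print(net.predict(test))      # approximately sin(1.0) and sin(2.0)
    print(np.sin(test).ravel())   # the true values, for comparison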

Today it is estimated that Big Data generates 2.5 quintillion bytes of information per day. Now, if you are like me, you will never have heard of the measure quintillion. Well, it is a billion billion: a 1 followed by 18 zeros. Not that that necessarily helps give it any finer focus!
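
A quick back-of-the-envelope calculation in Python may help put the figure in some kind of focus:

    # Back-of-the-envelope: what does 2.5 quintillion bytes per day mean?
    bytes_per_day = 2.5e18            # 2.5 quintillion = 2.5 x 10^18 bytes
    gigabyte = 1e9
    exabyte = 1e18

    print(bytes_per_day / exabyte)    # 2.5 exabytes per day
    print(bytes_per_day / gigabyte)   # 2.5 billion gigabytes per day

That is roughly 2.5 billion gigabytes of new data every single day.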

According to IBM:

“90% of the data in the world today has been created in the last two years. This data comes from everywhere: sensors used to gather shopper information, posts to social media sites, digital pictures and videos, purchase transaction, and cell phone GPS signals to name a few. This data is big data.”

It is safe to say that the amount of data available will only increase over the coming years. Institutions such as the European Union, the United Nations, the World Bank, the World Health Organisation, Social Media companies etc make huge volumes of data available daily, and in multilingual form. The importance of this resource of massive data is underlined by Andrew Ng, Chief Scientist at Baidu, China’s major search engine, who said:

“The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms.”

The advent of Cloud Computing has allowed even small companies to have virtually unlimited storage space and access to fantastically powerful computational resources. Processors as powerful as the tensor processing unit (TPU) are available via Cloud computing. Some examples of Cloud computing sources would be Amazon Web Services, IBM’s SmartCloud or Google Cloud.

TPUs were developed by Google specifically to deal with the demands of ANNs. Previously, graphics processing units (GPUs) had reduced the machine learning training process from weeks to hours; TPUs have speeded that process up even further. Without this level of computing power, it is unlikely Deep Learning would be a viable technology.
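
By way of illustration, this is roughly how a modern training script checks for an accelerator and falls back to the CPU when none is available. The sketch assumes the PyTorch library; TPU access typically requires an additional library and is not shown.

    import torch

    # Pick a GPU if one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Training will run on: {device}")

    # Move a batch of data and a layer onto the chosen device
    x = torch.randn(8, 16).to(device)
    layer = torch.nn.Linear(16, 4).to(device)
    print(layer(x).shape)   # torch.Size([8, 4])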

Finally, Intel is reportedly developing a device called a Neural Stick, which they claim will allow companies to bypass the Cloud and do their processing at a local level (i.e. a non-Cloud level). This will be a boost for those companies who baulk at the security implications of processing data in a remote location. It will also increase the speed of processing, as all the crunching will be done locally. Intel say it is their intent to make DL work “everywhere and on every device”. If they succeed, Deep Learning will expand to a huge degree. Interesting times lie ahead for Artificial Intelligence.

Aidan Collins is a language industry veteran. He is Marketing Manager at KantanMT. This article first appeared in Multilingual in the December 2017 edition: https://multilingual.com/all-articles/?art_id=2592