It started with the flu: not just any old strain of flu, but a virulent virus known as H1N1. The year was 2009 and a public health epidemic was rocking the USA. The authorities were sure that, if it was not controlled, it could well become a pandemic (i.e. a country- or world-wide epidemic). The problem was that the authorities were chasing the spread of the disease: by the time they had identified an area where it might appear, it was already too late. The challenge was how to get ahead of the spreading disease, how to treat it like a forest fire and surround it with a fire line that would stop it spreading. It seemed an impossible task. Enter Big Data and Google. No company in the world could do what Google could do in 2009: it handled 3 billion search terms every day and had the huge processing power needed to do it. (Source: Big Data, Viktor Mayer-Schonberger & Kenneth Cukier).

Then someone had the brilliant idea of harnessing this intelligence to identify the path the epidemic might be taking, as close to real time as was possible under the circumstances. The idea was to identify the search terms people would use if they were worried about catching the flu or were beginning to feel its symptoms. Google's computational might allowed the team to identify 45 search terms that would set off an alarm and, more importantly, to pinpoint the areas where the people doing the searching were located. Suddenly, instead of chasing a spreading flu, the health authorities were able to identify probable hot spots and quickly deploy health professionals to treat the population ahead of any arriving flu bug. They were at last able to throw down a fire line and contain the spread. The pandemic would not happen – this time. This was the first time Big Data had been harnessed in such a fashion. A new industry was born, and it has continued its relentless growth right up to the present day. And it's not about to disappear anytime soon.
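
For the technically curious, here is a minimal sketch (in Python) of the kind of counting involved: tally flu-related searches per region and flag any region that crosses a threshold. The terms, the search log and the threshold are invented purely for illustration; they are not the 45 terms or the model Google actually used.

```python
# Illustrative sketch only: count flu-related search queries per region and
# flag probable hot spots. All data and thresholds below are hypothetical.
from collections import Counter

FLU_TERMS = {"flu symptoms", "fever and cough", "flu medicine"}  # placeholder terms
ALERT_THRESHOLD = 2  # hypothetical number of matching searches per region

search_log = [
    ("Texas", "flu symptoms"),
    ("Texas", "fever and cough"),
    ("Ohio", "cheap flights"),
    ("Ohio", "flu medicine"),
]

# Count only the searches that match a flu-related term, grouped by region.
hits_per_region = Counter(region for region, query in search_log if query in FLU_TERMS)

for region, hits in hits_per_region.items():
    if hits >= ALERT_THRESHOLD:
        print(f"Probable hot spot: {region} ({hits} flu-related searches)")
```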

So, what, in a few words, is Big Data? Well, the Big refers to the volume. Digital data has existed since the first bytes were input into a computer, and many computers held very large volumes of it. Think of all the photos you have on hard drives, all the documents you have created, all the emails you have written, all the tweets you have tweeted and all the Facebook messages you have posted, and you begin to get an idea of the personal data you have created and continue to create, daily. Now multiply that by the number of people in the world doing the same thing. Here's a figure for the active users of Facebook per day: 1.56 billion. (Source: Facebook DAU, March 2019). Add to that Twitter, Instagram, LinkedIn etc. and you see where we are going with the numbers. Colossal! For me the numbers are incomprehensible, but I will share them with you: are you strapped in? Well, according to IBM there exist today 2.7 zettabytes of data online. A zettabyte is 1 billion terabytes – are you any clearer? To say that is a helluva lot of information would be something of an understatement.
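
If that still doesn't mean much, here is a quick back-of-the-envelope conversion, using only the figures quoted above, to put it in terabytes:

```python
# Back-of-the-envelope conversion of the IBM figure quoted above.
ZETTABYTE_IN_TERABYTES = 10**9   # 1 zettabyte = 1 billion terabytes
online_data_zb = 2.7             # figure attributed to IBM in the text

print(f"{online_data_zb} ZB = {online_data_zb * ZETTABYTE_IN_TERABYTES:,.0f} TB")
# -> 2.7 ZB = 2,700,000,000 TB
```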

That is one humongous potential source of business intelligence, capable of yielding an enormous variety of information at comparatively rapid speed. With the right equipment, methodology and specially trained data scientists, that information mother lode can be sliced, diced and parsed to uncover very valuable insights. And every global company worth its salt wants to do just that. The reason? It is estimated that today's online spending is $50 trillion per annum. Any company wanting even a fraction of that needs the ability to identify markets, trends, sentiments and opportunities, and to do it damn quick. Have your grey cells exploded yet? Well, here's one last figure for you: it is estimated that the volume of online business data doubles every 1.2 years! (Source: Users Bring Real Value to Big Data Machine Translation, Wired).
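
To see what that doubling rate means in practice, here is a small compound-growth sketch. The starting volume is arbitrary; only the 1.2-year doubling period comes from the figure quoted above.

```python
# Project forward what "doubling every 1.2 years" implies.
DOUBLING_PERIOD_YEARS = 1.2

def projected_volume(start_volume: float, years: float) -> float:
    """Volume after `years`, assuming it doubles every DOUBLING_PERIOD_YEARS."""
    return start_volume * 2 ** (years / DOUBLING_PERIOD_YEARS)

for years in (1.2, 6, 12):
    print(f"After {years:>4} years: {projected_volume(1.0, years):,.1f}x the original volume")
# After  1.2 years: 2.0x, after 6 years: 32.0x, after 12 years: 1,024.0x
```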

Yet what made all this possible? Why did it become a phenomenon now, in the 21st century? The answer is the incredible explosion in the computational power of computers, the development of fibre-optic cable, which allowed data to travel at fantastic speeds, and the birth of the Cloud: seemingly unlimited storage somewhere out there. Tie these elements together and you get a completely different computing paradigm from anything that existed before. In addition to these technological rockets, there was a change in mindset about how Big Data could be used. It was as though a treasure chest (or Pandora's box?) had been opened and made available to those with the savvy to exploit this global gift. The key word is global, because that is the key challenge. If a company truly wants to grab a slice of that $50 trillion market, it needs to be able to use online data and interrogate its own data collection, all of which exists in a multitude of languages. That is the reality of a global market. Only one third of those online are English speakers; it is estimated the other two thirds are covered by 45 other languages.

Enter Machine Translation (MT), a technology that has existed since the 1960s and has seen huge growth and refinement in the last 10 years. If a company wants to be truly global, it must develop a way of handling huge volumes of data in multilingual formats and at fantastic speed; some even need it in real time. The sheer volume of what needs to be translated, to some degree or other and not always perfectly, is, as we have seen, phenomenal. Machine Translation is the tool of choice for doing that. Not every translation needs to be 100% perfect: clients such as law firms often need only "gist" translations. Companies running online chatlines need similar services, a translation whose quality gets the message across. However, there are those who need perfection (if that exists – see my blog above on this). That, too, is increasingly being supplied by MT AND Human Translators (HT). The latter will never disappear as a vital part of the translation equation. MT and HT are an essential partnership.

MT algorithms can crunch huge volumes of data at tremendous speed and to an increasingly high level of quality. But we still need HT to bring the product up to the accepted quality level (which can vary from customer to customer). The industry is giving shape to this new partnership translation paradigm. HT will always be a vital part of the translation/localisation industry; MT does not threaten their hard-earned status. I predict that the earning power of translators will rise in the near future. In theory, and I believe in practice too, translators will have the option to access work online 24/7/365. Only sleep will prevent them from accessing work when and where they want it. Things are developing rapidly behind the scenes and I believe that soon the sky's the limit for all translators out there.
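
To make that partnership concrete, here is a minimal sketch of how such a workflow might be wired up: every job goes through MT first, and only jobs that need publication quality are routed on to a human post-editor. The function names and quality labels are hypothetical placeholders, not any real MT vendor's API.

```python
# Sketch of an MT + HT routing workflow, with placeholder functions standing
# in for a real MT engine and a human post-editing step.
from dataclasses import dataclass

@dataclass
class TranslationJob:
    text: str
    required_quality: str  # "gist" (e.g. discovery documents, chat) or "publication"

def machine_translate(text: str) -> str:
    return f"<raw MT of: {text}>"          # stand-in for a real MT engine call

def human_post_edit(mt_output: str) -> str:
    return f"<post-edited: {mt_output}>"   # stand-in for the human translator's pass

def translate(job: TranslationJob) -> str:
    draft = machine_translate(job.text)
    if job.required_quality == "gist":
        return draft                        # fast, "good enough" output
    return human_post_edit(draft)           # MT speed plus human quality

print(translate(TranslationJob("Bonjour le monde", "gist")))
print(translate(TranslationJob("Bonjour le monde", "publication")))
```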

Aidan Collins, Marketing Manager at KantanMT.