In Technology, Things Never Slow Down – Quite the Opposite

I am of an age where I can recall how intra- and extra-office communications worked before email. Both were served by what is now called snail mail. External communications got a stamp and were posted at the end of every day. Internal communications involved a dedicated person travelling around offices handing out memos in large brown envelopes tied with string. If you were on the recipients’ list, the office postal clerk placed the brown envelope in your In-tray for you to peruse at your leisure.

Once you had read the contents and added a note of comment, the envelope was placed in your Out-tray, and you ticked off your name to show that you had read it. The brown envelope was then moved on by the clerk, who spent his day walking the corridors to carry out this vital task. You know what? The process worked, albeit at the pace of a crocked snail. But that’s how the world was back then. People neither expected nor demanded that things be addressed immediately.

Then some office I.T. genius spotted a new technological advancement that was sweeping the world. It was called Email. The technology was duly introduced, and we all received training on how this new-fangled invention worked. The old brown envelopes disappeared and the postal clerk put on a lot of weight from lack of exercise. But for worse (or for better?), the pace of work in the office was ramped up immeasurably. Suddenly messages were arriving in your electronic In-tray, and the expectation grew that a message should be answered immediately, if not sooner. Decision-making became a nanosecond exercise.

Indeed, people sitting only feet from you would “ping” (that was a new word for us) an email to you, rather than simply shout across the office or talk to you at the watercooler. The introduction of this new technology changed both the face and the pace of every office. It shifted office life into an overdrive from which it has never really decelerated. I tell you this “All of Our Yesterdays” anecdote to show how new technology usually brings change that speeds processes up. Seldom does new technology aim to slow things down.

This speeding up is driven by the constant evolution and improvement in the capacity of computers to crunch and process data. As hardware gains computational power through ever more capable processing chips, that power is used to process and spit out huge corpora of data at breathtaking speeds. But even this power is not proving sufficient, as companies hunger for faster and cheaper ways to process growing volumes of data at near real-time speeds. Research is already at an advanced stage into replacing the silicon chip with a new technology, the carbon nanotube. And on and on it will go.

Neural Machine Translation (NMT), too, has been evolving at a breakneck pace. Tony O’Dowd recently commented in the Slator 2018 Neural Translation Report that:

“It’s fair to say that the [language] industry has condensed 15 years of statistical research into three years of NMT research, and produced systems that will outperform the best SMT can offer.”

In short, NMT development has moved at five times the pace of Statistical Machine Translation (SMT) research: 15 years of progress condensed into three. And developments in industry bear this out. Google replaced a system it had developed over 12 years with a new NMT system it developed in just over 18 months. With these developments come improved outcomes and capabilities. The rapid evolution of NMT has been fuelled by the huge amount of time and effort that many industry giants are putting into research. This factor, combined with the development of faster, more affordable hardware, has helped meet the ongoing demand for more speed and computational power. Google is working with a start-up company called Nervana Systems, which is developing the Nervana Engine, an ASIC processor designed to increase current processing speeds by a factor of 10. Not surprisingly, Nervana Systems was bought by Intel in 2016.

It is no surprise that NMT, a model inspired by the workings of the human brain, is hungry for speedy processing of huge corpora of complex data. And it is a sobering thought that, by some estimates, the average human brain processes data at 30 times the speed of the best supercomputers. Fortunately, with advances in Deep Learning, NMT requires only a fraction of the memory needed for traditional SMT. Whereas email was demanded because the world needed to speed up intra- and extra-office communications, the development of NMT is being driven by several forces: the proliferation of mobile devices and in-home control systems; the rise of social media and the demand for real-time communications; the growth of e-commerce as a market opportunity for companies; and the growth of Big Data, with its insatiable appetite to crunch and understand huge amounts of data now, in multiple languages and at an affordable cost.

The adoption of NMT by behemoths such as Google has given this language solution the seal of approval as a technology worthy of investment and research. And, as is the way in industry, once one giant adopts a system, the other equally powerful players feel the need to develop their own. Facebook, too, has joined this race. Indeed, the top companies in the world, including Microsoft, Google, Amazon, eBay and Facebook, to name but a few, have ongoing investment and research in NMT. Given the R&D spending power of these companies, it is no wonder that NMT development has gathered such pace. In fact, NMT is expected to surpass all other MT models and to grow to a market size of $46 billion by 2023.

The objective of NMT development is no small one. In essence, it is to build a system that will allow people anywhere in the world to connect with anyone and understand anything in their own language. Add to that the need for quality and speed, and you can see the mountain NMT has to climb. Yet it has been climbing successfully, and that objective is getting closer. Google, for example, supports 103 languages, translates 100 billion words per day (you read that right!) and communicates with the 92 percent of its users who are outside the USA.

Those are staggering figures. But if companies want to grow their brands, open up fertile new markets and keep their shareholders happy, these are the levels they must reach to keep pace with developments in NMT. And we are not referring only to the written word: more and more of the demand is for the spoken word, with the growth of voice-activated technology and household “gadgets” such as Amazon’s Alexa, Google Home and Apple’s HomePod (and that list is growing). The future of NMT is further being cemented by its adoption in key industries such as Military & Defence, IT, Electronics, Automotive and Healthcare, to name just a few.

NMT has now been taken up by all serious language service providers (LSPs). The debate is ongoing as to how this will impact the current LSP model. Undoubtedly, the role of the human translator is evolving from translator to editor. Pricing models are shifting from the traditional per-word rate based on word volumes to time-based rates. An expert at eBay has predicted that the traditional translator will evolve to become “… data curators of corpora for MT.” Our own Tony O’Dowd gives a bleaker assessment of the human translator’s prospects when he says, “... the traditional approach to translation is dead (or in its twilight zone)”. But one thing seems certain: NMT, like email, is not going to go away. Speed is of the essence; that is the eternal watchword of technology.

Aidan Collins is a language industry veteran. He works in the marketing department at KantanMT.

Meeting the Challenges of Bahasa Indonesia

Indonesia is a huge country with a diverse landscape and a vast cultural tapestry. Along with this rich cultural mixture, it has more than 700 native regional languages and dialects, spoken across islands populated by more than 250 million people. Bahasa Indonesia, the official language, serves as the lingua franca across all of the hundreds of languages and dialects present in the country.

Bahasa Indonesia is a flourishing language. Shaped by Indonesia’s history, including its colonial past, Bahasa Indonesia developed by inheriting words from Sanskrit and Dutch. Because of the country’s very large population, the majority of whom speak Bahasa Indonesia, it is one of the most widely spoken languages in the world.

East Asian countries have treated Bahasa Indonesia as an active commercial language. Because of Indonesia’s strong economy, other East Asian countries have found it necessary to understand Bahasa Indonesia in order to trade with the country. Yet unfortunately, many companies in the wider world still consider Bahasa Indonesia a minority language, with the result that research and development for the language is under-resourced.

Perhaps as a by-product of this under-resourcing, too little training data has been created for Bahasa Indonesia to qualify as a candidate for either Statistical Machine Translation (SMT) or Neural Machine Translation (NMT).

As a taught subject, Bahasa Indonesia is very dynamic. Students learn not only its grammar and sentence structure, but also its use in proverbs, poems and essay writing. It can be said to be one of the most difficult subjects for a student to learn. This dynamism makes it difficult to find the ‘right’ equivalent during the translation process. As mentioned above, Bahasa Indonesia has inherited words from Sanskrit and Dutch, and many of its concepts and definitions have a deep historical background behind their ‘meaning’.

In recent years, it has become even more difficult and complicated to translate Bahasa Indonesia into other languages, and vice versa. The overwhelming power of English as a global language, and advances in education and technology that mostly come from English-speaking countries, have greatly challenged the development of Bahasa Indonesia. This is particularly true of the coining of new words, which in turn makes creating, improving and training better data for SMT even more difficult.

Today, when the English and Bahasa Indonesia language pair is processed through machine translation, the output needs a lot of fixing. This is because many English words have no full equivalent in Bahasa Indonesia. Adding to this complication, the younger generations in Indonesia tend to mix English and Bahasa Indonesia in most of their conversations and writing.

As a result, Bahasa Indonesia lacks the linguistic purity needed to get the best out of SMT. Even human translators will keep some of the original English words, because they are more commonly used than their Bahasa Indonesia equivalents. Consequently, many translations into Bahasa Indonesia end up as hybrids of the two languages.

This lack of linguistic purity means that a lot of preparatory work is required before Bahasa Indonesia texts are suitable for SMT. Many SMT products that handle the English and Bahasa Indonesia language pair translate inconsistently, producing many incorrectly translated texts. The quality of these texts is such that they are misleading and of little use to the intended readers.

Even with English and Chinese increasingly taught as mandatory subjects in the Indonesian education system, Bahasa Indonesia has been able to hold its place as the official language. It has adapted to the imposition of other languages and cultures by evolving. Yet ironically, it is the lack of linguistic purity created by adapting words and concepts from other languages that has proven a challenge for SMT.

If SMT is to become a solution for Bahasa Indonesia, a lot of work will be needed to create a suitable, usable body of training data. Only this level of effort will allow Bahasa Indonesia to benefit from SMT and be recognised as a world language.

Janet Siska, a fourth-generation Chinese Indonesian, was born and grew up in Jakarta, Indonesia. At the age of 15, Janet moved to the USA. In 2015, she graduated with a BSc in Biochemistry/Chemistry from California State Polytechnic University, Pomona. Janet is currently attending Dublin City University, where she is studying for a Master of Science in Translation Technology. Janet is fluent in Bahasa Indonesia, English and Korean, and has a working knowledge of Chinese and German.

Get the Best from Neural MT with Quality Data

In this post, Pat Nagle, our Project Manager at KantanMT, speaks about Neural MT and the importance of using high-quality data when training MT engines. He delves deep into the various ways KantanMT data can be used to get the best translation output.

KantanMT Embraces Change to Grow with the New Era of Machine Translation


KantanMT recently launched a new interface. In this blog post, Laura Casanellas, Product Manager at KantanMT, explores the reasons behind the change and talks about the new functionality that has been added.


Academic Use of Machine Translation with Universitat Autònoma de Barcelona

Our Academic Partner Universitat Autònoma de Barcelona (UAB) used the KantanMT platform for numerous projects and courses at the university. In this blog post, we caught up with Professor Olga Torres-Hostench, who describes her experience of using our custom MT platform for her course.


Continuing My Journey to Becoming a Gaeilgeoir

An article by Riccardo Superbo, our Client Solutions Engineer in the Professional Services Team at KantanMT.

In my last post, I talked about the reasons that motivated me to start learning Irish. In this second instalment, I would like to highlight some interesting aspects of the current situation in Ireland: how the locals feel about their national language, and how they react to foreigners learning it.

6 Ways to Integrate MT in your Work Environment

A recent KantanMT user survey showed that, while all our clients enjoy using our custom MT platform, some of our users are less aware of the KantanMT productivity-enhancing tools and features that let them access KantanMT translations within their work environment.

Translations from your custom KantanMT engines can be accessed directly within Microsoft programs, on a webpage, and in various other ways. In this post, we will tell you about the 6 coolest ways to get your KantanMT translations without even having to open the platform.