The Rosetta Stone, a paradigm of translation. Photo: Hans Hillewaert, Wikimedia (CC)
This article was written by José Pichel Andrés and originally published in Spanish in the online journal El Español. It has been translated into English by Carlos Collantes from the Professional Services team at KantanMT. The article has been edited slightly for readability, but we have made every attempt to retain the original flavour of José’s article.
Researchers today are redefining Machine Translation. Though it is still far from completely satisfactory, it is developing rapidly thanks to new approaches such as Neural Networks.
Today, Machine Translation has become more useful and provokes less ridicule. Machine Translation systems are readily available on the Internet and allow us to understand, with varying degrees of success, content in a wide range of languages. Nowadays, one can even have website content translated automatically and browse sites without being aware that they have been machine translated with no human intervention. Nevertheless, these tools are still far from perfect, and many researchers, mainly linguists and computer scientists, focus their work on narrowing the gap between machine-translated output and human translation.
Mikel Forcada, a researcher from the University of Alicante, is the founder of Apertium, a Spanish translation platform that has proven successful worldwide. The project is open-source software that any developer can improve and adapt. As a result, it is already available for 40 language pairs.
The initiative began with a savings bank that needed automatic translation between Castilian and Catalan to facilitate its office work. The task was eventually assigned to the University of Alicante. The impressive results led the researchers to launch the first version of an open-source automatic translation platform in 2005. The platform offered reliable instant translations between the official languages of Spain. It is updated and refined constantly and forms the basis for other translation systems.
Of course, Apertium also allows us to translate from English into Spanish, but it specialises in minority languages, where systems like Google or Bing cannot match it on quality. To understand why, it is important to understand how Machine Translation works.
Rule-Based Machine Translation (RBMT) or Statistical Machine Translation (SMT)?
When programming an automated translator, a choice has to be made between two basic options: Rule-Based Machine Translation (RBMT) or Statistical Machine Translation (SMT). RBMT uses a series of linguistic rules for the source and target languages. However, developing such a system takes time and, as more rules are introduced, more complications arise. For this reason, today's most popular translation systems use SMT, which relies on the analysis of vast amounts of data to infer the most likely patterns for a translation.
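To make the contrast concrete, here is a minimal sketch in Python. It is a toy illustration only, not how Apertium or any commercial SMT engine is actually implemented: the lexicon, the single reordering rule, the example corpus and the function names are all invented for this example.

```python
from collections import Counter

# RBMT side: a hand-written bilingual lexicon plus one linguistic rule.
# All entries are invented for illustration.
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}
ADJECTIVES = {"red"}


def rule_based_translate(sentence):
    """Translate English to Spanish using the lexicon and one reordering rule."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Rule: English "adjective noun" becomes Spanish "noun adjective".
        if words[i] in ADJECTIVES and i + 1 < len(words):
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Words missing from the lexicon are passed through unchanged.
    return " ".join(LEXICON.get(word, word) for word in reordered)


# SMT side: no rules, only translations observed in a parallel corpus.
# A real system would use millions of sentence pairs, not three.
OBSERVED = [
    ("the red car", "el coche rojo"),
    ("the red car", "el coche rojo"),
    ("the red car", "el carro rojo"),
]


def statistical_translate(sentence):
    """Return the translation seen most often for this sentence, if any."""
    counts = Counter(tgt for src, tgt in OBSERVED if src == sentence.lower())
    if not counts:
        return None  # no data, so nothing to base a translation on
    return counts.most_common(1)[0][0]


print(rule_based_translate("the red car"))     # el coche rojo
print(statistical_translate("the red car"))    # el coche rojo
print(statistical_translate("the green car"))  # None
```

The rule-based function works with no corpus at all, but every word and rule has to be written by hand. The statistical function needs no linguistic knowledge, but it has nothing to offer for input it has never seen.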
This reliance on data is the big problem with minority languages. On the Internet there are millions of texts in English or Spanish, but very few in languages like Sardinian, Maltese or Asturian. However, these languages are available in Apertium, as it operates through a rule-based system.
Interestingly, the two most commonly used languages on the platform are the two varieties of Norwegian. “We discovered students in Norway cheating en masse on their homework, as they have to translate texts between the two languages,” says Mikel Forcada.
Apart from the issues SMT has with minority languages, the most popular automatic translators have serious problems with specialised language. SMT always tends to favour the most general choice. This is why, according to Forcada, technical texts from English to Spanish (and vice versa) are better translated by Apertium’s RBMT system.
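The tendency to favour the most general choice can be sketched with a few invented counts. In the Python snippet below, the corpora, the counts and the function name are hypothetical and chosen only to illustrate the effect; the English word "driver" is usually "conductor" in everyday Spanish text but "controlador" in software documentation.

```python
from collections import Counter

# Invented counts: how often each Spanish rendering of "driver" appears
# in a general-purpose corpus and in a small software-documentation corpus.
GENERAL_CORPUS = [("driver", "conductor")] * 8 + [("driver", "controlador")] * 2
DOMAIN_CORPUS = [("driver", "controlador")] * 5


def most_likely(word, corpus):
    """Pick the rendering of `word` seen most often in `corpus`."""
    counts = Counter(target for source, target in corpus if source == word)
    return counts.most_common(1)[0][0] if counts else None


print(most_likely("driver", GENERAL_CORPUS))  # conductor
print(most_likely("driver", DOMAIN_CORPUS))   # controlador
```

With general data the everyday sense wins, even when the text being translated is a software manual. The same selection logic gives the technical term once the counts come from text in the relevant domain.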
The need for specialisation has led innovative technology companies to offer Custom Machine Translation systems. One such company is KantanMT, an Irish start-up that focuses its research on this field in collaboration with specialists from ADAPT, Dublin City University.
“Our clients use their own files in two different languages to create databases with which they can train their own engines,” explains Carlos Collantes, a Spanish employee at the company. “We grant them access to the platform and we teach them how to use it.”
With this service in the cloud, clients enjoy a custom service, which is adapted to the language and terminology they use in their domain. Many clients even publish the translations of their products directly on their websites, while others choose to post-edit the translations with the help of human translators.
KantanLabs™, the company’s R&D department, focuses on the development of Artificial Neural Networks (ANNs). Based on the biological functioning of the nervous system, these systems try to mimic human learning patterns. They are being used in many fields of computing, and they have begun to contribute to the field of Machine Translation as well.
“Just as our neurons receive stimuli and provide a response, Artificial Neural Networks learn how to react from examples,” explains Collantes. The technology is so cutting-edge that there are specialised competitions to see who can produce the best translations with the aid of Neural Networks.
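The idea of learning "from examples" can be illustrated with the simplest possible artificial neuron, a perceptron. The Python sketch below is not a translation system and does not reflect KantanLabs' actual research; the function names and the training data (the logical AND of two inputs) are chosen purely for illustration.

```python
def train_perceptron(examples, epochs=20, learning_rate=1):
    """Adjust two weights and a bias until the neuron reproduces the examples."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            # The neuron "fires" (outputs 1) if the weighted stimulus is positive.
            stimulus = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if stimulus > 0 else 0
            # Learn from the mistake: nudge weights towards the right answer.
            error = target - output
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the trained neuron to new inputs."""
    stimulus = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if stimulus > 0 else 0


# Examples of the behaviour we want: output 1 only when both inputs are 1.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_EXAMPLES)
print([predict(weights, bias, inputs) for inputs, _ in AND_EXAMPLES])
# [0, 0, 0, 1]
```

Nobody tells the neuron the rule; it is only shown inputs and the expected answers, and it corrects itself after each mistake. Neural translation systems apply the same principle at a vastly larger scale, with many layers of such units and sentence pairs as examples.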
Engine training is automated in KantanMT.com, where clients can translate up to 250 million words at a rate of six million words per hour. Constant post-editing and engine retraining help improve the quality of the engine and generate high-quality translations. What is more, KantanMT has another great advantage over the freely available online options: security and privacy.
A Question of Creativity
Will this process of refining machine translation quality ever reach a stage where translators become dispensable? “There are some technical texts, for example, legal or financial documents, which often consist of repetitive language and are thus suitable for Machine Translation; but creative or literary translations would always need post-editing by human translators. I do not think the work of a translator would ever be completely replaced,” Collantes said.
Mikel Forcada is of a similar opinion. “As humans, we have a context and a cultural background that is difficult to imitate, and only humans can discern what is most suitable in a given scenario,” he says.
However, the immense technological advances of recent years have come about for two reasons: greater availability of data and increased computing power. Both of these factors will continue to improve. This is why experts refer to the concept of technological singularity, the hypothesis that artificial intelligence will surpass human intelligence. “Some claim this moment will arrive in 2025. Maybe then machines will understand contexts better than we do,” Forcada sums up.
We will have to wait and see.
To find out more about the Neural Machine Translation research at KantanLabs, email email@example.com.
About José Pichel Andrés
José is a journalist who specialises in science, technology and innovation. He creates written and audio-visual content for the Ibero-American Agency for Science and Technology Dissemination, DiCYT, which belongs to the 3CIN Foundation. He also contributes to the science section of the Spanish newspaper El Español and the journal bez.es. José often organises science outreach projects and has broad experience in teaching and research on science journalism. He is a member of the Spanish Science Journalism Association (Asociación Española de Comunicación Científica – AECC).