KantanMT has an ongoing Academic Partnership with the Centre for Multidisciplinary and Intercultural Inquiry (CMII) at University College London to accelerate research and learning in the field of Machine Translation (MT). The postgraduate students of the department were able to use the KantanMT platform to update or gain new skills in Translation Technology. With the help of the KantanMT platform, the students learnt how to build and customise their own Statistical Machine Translation (SMT) systems in a real-world scenario.
The module is designed to cover a variety of topics concerning the use of computer systems to present, manipulate, extract or translate information expressed in natural language. Technology covered in the module includes Machine Translation and terminology extraction. Throughout the module, students acquire an understanding of the technology behind language engineering applications, and learn to use and evaluate different tools.
Dr Mark Shuttleworth, UCL
As someone who has been teaching translation technology since the late 1990s (and occasionally even using it ‘for real’), my experience of working with KantanMT with my students at UCL during the last academic year was a considerable eye-opener. What it enables small companies, and even individual translators, to accomplish is something that could not have been imagined even as recently as the end of the last century.
The project that involved the use of KantanMT.com formed part of my module entitled ‘Understanding and using Translation Technology II’. This is a Masters-level course delivered to a group of twelve students that included native speakers of English, Chinese, German, Italian, Russian and Thai. Prior to the start of the MA, participants’ experience of MT was in most cases probably limited to Google Translate, and in all likelihood few people were aware of the possibility offered by KantanMT of creating and training a personalised translation engine from scratch.
Some of the tasks that the students worked through in the module included the following:
- choosing a subject area to focus on
- creating an MT engine that would specialise in it
- launching it with an appropriate stock engine
- creating a test suite of texts
- training the engine by adding resources to improve the quality of the output
- monitoring the BLEU score as these resources were added.
Students were in general very impressed with the software. One student commented that complicated processes had been successfully compressed into one single user-friendly platform that was “self-explanatory, requiring little to no instruction on how to operate” and another praised the platform’s “high level of user autonomy.”
One of the greatest challenges my students faced was to locate the bilingual aligned data that would transform a new engine into a useful tool. Significant amounts of parallel data are available to download from the internet free of charge, although such resources are restricted to certain subject domains and the same data inevitably ends up being used by everyone.
A further, perhaps more intellectual, challenge proved to be that of understanding the impact that uploading a resource might have on the BLEU score, as this was not always cumulative: adding a resource did not necessarily raise the score. In spite of this, after a certain amount of trial and error many of my students were achieving BLEU scores in the 40s, with the highest being 51%. However, these scores were on the basis of first-time experimentation with resources that were almost exclusively generic rather than subject-specific in nature.
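For readers unfamiliar with the metric, BLEU compares machine output against reference translations by counting overlapping word sequences (n-grams) and penalising output that is too short. The following is a minimal Python sketch of that standard calculation, assuming whitespace tokenisation and a single reference per segment. The function names `ngrams` and `corpus_bleu` are illustrative only; this is not KantanMT's implementation, and scores from the platform may be computed differently.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions (illustrative, not taken from KantanMT): one reference
    per segment, whitespace tokenisation, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # n-grams produced by the engine, per order
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection keeps the minimum count of each n-gram,
            # so a repeated word cannot be credited more often than it
            # appears in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with no matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the reference is penalised.
    if hyp_len >= ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    engine_output = ["the cat sat on the mat"]
    reference = ["the cat sat on a mat"]
    print(round(corpus_bleu(engine_output, reference), 1))
```

Because the score depends on how closely the engine's output matches the particular test texts, training data that pulls the engine's vocabulary or phrasing away from those texts can lower the score even though more data has been added.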
Next time, I plan to build in more collaboration, with students sharing their own subject-specific resources.
Without a doubt, the significance of what they had been doing was grasped by everyone. One student predicts that the future is likely to see the proliferation of the new job of “machine translation engine trainer”. To give her the last word:
“If you do not have any knowledge or awareness of translation technology, you will lose an important advantage in your professional translation career.”
Note from the KantanMT team:
We are delighted that Mark will continue to use KantanMT to teach translation technology to his students in 2016. Read more about the course in the module description: Understanding and using Translation Technology II (CMII).
About University College London and Translation at UCL
UCL is one of the world’s leading universities, founded in London in 1826 to open up education to all on equal terms, and to bring the benefits of learning to society. UCL’s ethos is informed by academic excellence and research that addresses real-world problems. It is home to some 27,000 students, from over 150 nationalities.
The College enjoys an international reputation for the quality of the translation research and teaching undertaken by members of staff, including in the fields of literary translation, theatre translation, translation technology and audiovisual translation. It has a long history of expertise in Translation Studies, recently strengthened by the incorporation of the Centre for Translation Studies team, formerly at Imperial College London. Translation Studies at UCL has a solid trajectory in the teaching of translation technology, offering PhD and Masters students the opportunity to gain vital experience in this rapidly developing field, within which Machine Translation plays a very important role.
Graduate teaching and research at the Centre for Multidisciplinary and Intercultural Inquiry (CMII) is interdisciplinary, intercultural and international. The CMII encourages innovative approaches and draws on expertise at UCL from diverse fields in the arts and humanities, whilst also drawing on the extraordinary resources of London. Its programmes include the highly successful MA in Translation Theory and Practice. Our MA and research students study subjects ranging from literature, art, film, history, politics and economics to geography and philosophy. The CMII is situated within UCL’s Institute of Advanced Studies, a space for critical thinking and engaged enquiry within and across conventional disciplinary and departmental boundaries.