Master’s student Ewa Nitoń of University College London submitted her thesis as part of the MSc degree in Scientific, Technical and Medical Translation with Translation Technology. The following guest article is a reflection on her research concerning the application of Machine Translation in a medical context. Ewa was supervised by Teaching Fellow and Lecturer Dr. Emmanouela Patiniotaki and she used KantanMT.com for her MSc research.
Dissemination of Machine Translation innovation is a major priority for us at KantanMT. We believe that Academic Partnerships have a huge role to play in furthering the scope of research and innovation in the field of Machine Translation, and as such we have partnered with a number of universities to help students use the KantanMT platform in a real-world scenario.
We are always looking for ways to improve the KantanMT platform. To keep our finger on the pulse of the user experience, we asked one of the students using the platform to answer some questions about it.
As a leading Custom Machine Translation company, we at KantanMT believe that Academic Partnerships have a huge role to play in furthering the scope of research and innovation in the field of Machine Translation.
The students from our Partner Universities go on to have very successful careers in the language industry.
KantanMT has an ongoing Academic Partnership with the Centre for Multidisciplinary and Intercultural Inquiry (CMII) at University College London to accelerate research and learning in the field of Machine Translation (MT). The postgraduate students of the department were able to use the KantanMT platform to update their skills or gain new ones in Translation Technology. With the help of the KantanMT platform, the students learnt how to build and customise their own Statistical Machine Translation (SMT) systems in a real-world scenario.
Master’s student Rafaella Athanasiadi of University College London submitted her thesis as part of the MSc degree in Scientific, Technical and Medical Translation with Translation Technology. Rafaella was supervised by Teaching Fellow and Lecturer Dr. Emmanouela Patiniotaki and she used KantanMT.com for her research. This guest blog post looks at some of her conclusions on Machine Translation and the Localization Industry.
As Hutchins & Somers (c1992:1) argue, “the mechanization of translation has been one of humanity’s oldest dreams.” During the 20th century, the translation process changed radically. Instead of spending endless hours in libraries to find the translation of a word, the translator now sits at the centre of dozens of assistive tools. To name just a few, there are many translation software packages, terminology extraction tools, project management components, and machine translation systems that translators can choose from while translating.
However, shifting the focus to audiovisual translation, it can be observed that far fewer radical changes took place in that area, at least not until the introduction of machine translation systems in various projects (such as the MUSA and SUMAT projects) that developed machine translation engines to optimise the subtitling process. Still, the results of such projects do not seem to be satisfactory enough to give subtitling software developers and subtitlers the confidence to implement these engines in the subtitling process.
Based on my personal research, which focused primarily on the European setting, it seems that at the moment only the freeware SRT Translator incorporates machine translation while also offering the features that subtitling software usually includes (i.e. uploading multimedia files and timecoding subtitles). Nonetheless, SRT Translator, which is not widely known among subtitlers, uses solely Google Translate by default, which is a general-domain machine translation engine and, one could argue, not suitable for the purposes of audiovisual translation. The quality of the Google Translate output was tested by translating 35 subtitles of a comedy series. The output was incomprehensible and misleading in many cases.
Even though no further records of traditional subtitling software that incorporates machine translation could be found, there are many online translation platforms that allow users to upload and translate subtitles. Taking into consideration the European market, these can be translation software like MemoQ, SDL Trados Studio and Wordfast that offers the ability to load subtitle files and in some cases link them to the audiovisual content they are connected to; free online tools for translators like Google Translator Toolkit (GTT); or professional and private platforms like Transifex and XTM International that are used by companies and offered to their dedicated network of translators. Nonetheless, in order to enable machine translation in all the above applications, API keys must be purchased. GTT is an exception since it can be used for free at any time and only requires a Gmail account.
The fact that subscription fees have to be paid along with the costs of API keys for each machine translation engine provider puts the usability of these platforms in question, since costs may outweigh subtitlers’ profits. Furthermore, these platforms cannot accommodate subtitlers’ needs; for instance, it is not always possible to upload and play multimedia files while translating the subtitles, nor are synchronization features offered for timecoding the subtitles to the audio track. Transifex, however, is an exception since this localization platform offers users the option to upload multimedia files in the translation editor while translating the subtitles.
According to Macklovitch (2000:1), a translation memory is considered to be “a particular type of translation support tool that maintains a database of source and target language sentence pairs, and automatically retrieves the translation of those sentences in a new text which occur in the database.” Even though machine translation engines were developed through different projects to reduce subtitling time as much as possible, no attempts to integrate a translation memory tool in subtitling software for optimizing subtitling were found during this research, at least in a European, Asian and Australian setting. As Smith (2013) argues, “traditionally subtitling has fallen outside the scope of translation memory packages, perhaps as it was thought to be too creative a process to benefit from the features such software offers.” However, as Diaz-Cintas (2015:638) discusses, “DVD bonus material, scientific and technical documentaries, edutainment programmes, and corporate videos tend to contain the high level of lexical repetition that makes it worthwhile for translation companies to employ assisted translation and memory tools in the subtitling process.”
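To make the definition of a translation memory quoted above more concrete, the following is a minimal sketch in Python of a store of source and target sentence pairs with exact and fuzzy retrieval. It is an illustration only: the class name `TranslationMemory`, the 0.75 similarity threshold and the example sentences are assumptions made for this sketch, and it does not reflect how any of the tools discussed in this post are implemented.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory (hypothetical): source/target sentence pairs."""

    def __init__(self):
        self._pairs = {}  # maps a source sentence to its stored translation

    def add(self, source, target):
        """Store one source/target sentence pair."""
        self._pairs[source] = target

    def lookup(self, sentence, threshold=0.75):
        """Return (translation, score) for the best match, or None.

        An identical source sentence scores 1.0. Otherwise the most
        similar stored sentence is returned if its similarity ratio
        reaches the (assumed) threshold.
        """
        if sentence in self._pairs:
            return self._pairs[sentence], 1.0
        best = None
        for source, target in self._pairs.items():
            score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (target, score)
        return best


# Usage with invented subtitle lines (English to French).
tm = TranslationMemory()
tm.add("Press the red button.", "Appuyez sur le bouton rouge.")

exact = tm.lookup("Press the red button.")    # identical sentence
fuzzy = tm.lookup("Press the green button.")  # similar sentence, needs editing
miss = tm.lookup("Good morning.")             # nothing similar stored
```

A fuzzy match returns the stored translation of a similar sentence, which the subtitler would then still have to edit. This is why the high level of lexical repetition mentioned by Diaz-Cintas matters: the more often sentences recur, the more often such a lookup returns something usable.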
Even if such tools have not been integrated in subtitling software, translation memory components are used for subtitling purposes in cloud-based platforms such as GTT, Transifex and XTM International, as well as in the translation software MemoQ, SDL Trados Studio, Wordfast Pro and Transit NXT, by simply creating a translation memory before or while translating. It should be noted that, among the tools discussed in this research, Transit NXT is the only translation software that can accommodate the needs of subtitlers to a high level. Apart from the addition of specialized filters to load subtitles (which also exist in MemoQ, SDL Trados Studio and Wordfast Pro), subtitlers can upload multimedia files, translate subtitles while a translation memory component is active and also synchronise their subtitles with the Transit translation editor (Smith, 2013).
Figure 1: The translation editor of Transit NXT by Smith (2013)
OOONA, a company founded in 2012, has taken a very interesting approach to subtitling by developing a cloud-based toolkit that is built exclusively to accommodate the needs of subtitlers. When asked the following question within the context of the MSc thesis,
Considering that other cloud-based translation platforms like GTT, Transifex and XTM International offer the option of uploading a TM or a terminology management component, do you think that it is important to offer it on a subtitling platform as well?
the representative of OOONA (Alex Yoffe) replied that not only will the company implement translation memory and terminology management components in the next phase of enhancing their platform, but that they also consider these components to be very important for the subtitling process. In addition, Yoffe (2015) stated that OOONA intends to “add the option of using MT engines. Translators will be able to choose between Microsoft’s, Google’s, or customisable MT engines.” Therefore, it seems that OOONA will become a very powerful tool in the near future, with features that will optimise the subtitling process and reshape the way that subtitling has been carried out until now. The fact that Screen Systems, Cavena and EZTitles have partnered with OOONA is an indicator of how much potential there is in this toolkit.
As can be argued based on the above, there is a lack of subtitling software with incorporated translation memory tools. Therefore, this issue was further researched by means of an online questionnaire that was disseminated to subtitling companies and freelance subtitlers. In addition, two companies that develop subtitling software, Screen Subtitling Systems and EZTitles, were asked to present their views on this topic. In both cases, their willingness to optimise the subtitling process in a semi-automated or a fully-automated way was apparent from their answers. The former company was in favour of a combination of machine translation tools with translation memory tools, whereas the latter leaned towards a subtitling system with integrated translation memory and terminology management tools.
Nonetheless, the optimisation of the subtitling process has to coincide with the needs and preferences of subtitlers. Based on the respondents’ answers, it is clear that translation memory tools in subtitling software are desired by subtitlers. In response to the question,
Which tool would you prefer to have in a subtitling software? An integrated translation memory (TM) or machine translation (MT)?
more than half of the respondents (56.8%) chose TM. Interestingly, the answer Both received the second highest percentage (20.5%), which suggests that subtitlers want as many assistive tools as possible.
One of the main conclusions drawn from this research was that machine translation engines need to be customised to produce good quality output, and this can be achieved through customisable engines like KantanMT and Milengo. Moreover, translation memory tools are sought by subtitlers in subtitling software, while cloud-based platforms seem to dominate the translation industry today. Following this trend, subtitling software providers partner with online services and tools like the OOONA toolkit.
Based on the outcomes of this research, it could be said that we are certainly experiencing a new era in subtitling, since traditional PC-based subtitling software is now transforming into flexible and accessible platforms to enhance the subtitling experience as much as possible. It is only a matter of time before it becomes clear which tool and platform will dominate the subtitling industry, but one thing is certain: the technologies of the future will bring many changes to the traditional way of subtitling.
Diaz-Cintas, J. (2015). Technological Strides in Subtitling. In: S. Chan, ed. Routledge Encyclopedia of Translation Technology. London: Routledge, pp. 632-643.
Hutchins, J. W. & Somers, H. L. (c1992). An introduction to machine translation. London: Academic Press.
Macklovitch, E. (2000). Two Types of Translation Memory. In Proceedings of the ASLIB Conference on Translating and the Computer (Vol. 22).
Smith, S. (2013). New Subtitling Feature in Transit NXT. 11 November 2013. [Online]. Available from: http://www.star-uk.co.uk/blog/subtitling/working-with-subtitles-in-transit-nxt/. [Accessed 01 Sept. 2015].
Yoffe, A. (2015). MT and TM tools in subtitling. [Interview]. 13 August 2015.
 Relevant data are available in Appendix 1 of the MSc thesis.
University College London
MT has often been seen as a threat to translators. In many instances, professionals and academics have interpreted it as an attempt to lower translators’ rates, attack the profession, replace humans with machines, and undermine quality. It cannot be denied that MT can have a negative impact on employability, challenge the viability of the profession in its traditional form (i.e. pure human translation), as well as compromise the quality of the translation output and the idea of translation as a high-profile task performed by skilled professionals. However, MT is a logical consequence of technological advancement and, as such, it should be utilised rather than demonised.
The task of introducing MT in educational translation programmes as a tool for translators, rather than as a feature of a CAT tool that could be ignored, requires careful planning and should be based on certain principles. These principles are necessary to clarify the grounds on which such a solution is introduced to students, the usefulness of the material presented to them, the possible applications and, most importantly, the way in which MT systems should be handled. Teaching MT is different from teaching any other CAT tool in the sense that an MT system intervenes in the mental process of the translator. However, since CAT tools also include MT plug-ins and support APIs from various MT systems, it is better to guide translation students or trainees towards a suggested way of using MT in the most effective manner (based on professional experience), rather than unintentionally expose them to a new technological advancement without any preparation.
MT has become a widely used tool in everyday life, mainly through applications installed on various devices and free MT systems on the Web, chiefly for information and communication purposes. This exposure to and familiarity with freely available MT engines brings people closer to the idea of MT, making it a tool in itself and perhaps resulting in future translators adopting a more welcoming approach to this technology. Thus, the risks of not defining the place of MT within the profession are many. The existence of two sides (those already exposed to MT engines for different purposes, e.g. instant website translation, and those who see MT as a risk to the profession in itself) makes a discussion of MT within educational contexts a prerequisite before actually using such a system in professional contexts.
In order to set the grounds for introducing MT to translation students and teaching them how to include it effectively in the translation process, it is important to: a) discuss the history and nature of MT systems as well as relevant research in the field, the different types of MT and the purposes they serve; b) discuss the usefulness of MT systems within the field of translation; c) establish connections with Linguistics and the ways in which language is handled by MT systems, syntactically, grammatically and semantically; d) define MT problems and focus on the translator’s role in customisation, pre-editing, post-editing and retraining of systems; e) practise using MT systems with tasks and assignments designed with scenarios in mind that can and do exist within the translation industry.
With regard to the principles for such an attempt, the following non-exhaustive list is proposed:
- MT systems should not be seen as replacements, but rather as tools in translation training and the profession in general.
- They can be integrated or stand-alone and, as such, they can have limitations but also offer customisation options.
- They require training and practice.
- MT systems are gradually becoming a reality in the profession and knowing how to handle them best can be an asset, especially for new translators.
- MT systems become more effective when their purposes are better defined based on usage, as is the case with domain-specific engines.
- Metrics are a good way to evaluate a system, yet each translation job is different and several parameters (including client requests, data quality, purpose of translation, etc.) need to be considered when assessing those metrics.
- MT systems are (in most cases) constantly updated engines, and translators ought to be informed about their structure, content and source of training data.
- The field of MT is very fruitful for conducting research and for carrying out case-studies within Translation and other fields.
- MT solutions should be used wisely and critically. Pre-translation with MT output should be very carefully considered and always in relation to the data a system has been trained with. Substituting raw MT output for the translator’s own work on empty segments can result in non-genuine and very homogeneous translation performance (a loss of identity among different translation sources).
Based on the above, students can be guided through what can be seen as a “healthy” implementation of MT systems in translator training. Instead of being abruptly faced with the reality of pre-translated documents that require post-editing, the need to adjust to a company’s MT practices, or even the need to develop their own systems as freelancers to improve their effectiveness in terms of time and cohesion, they can be prepared gradually with the use of existing solutions that let them experience the advantages and disadvantages of MT systems and also test how these could be included in their list of translation tools. After all, the choice of translation tools, when it is not dictated by vendors or companies, is a personal one and, like any other choice in this context, it is based on preference and convenience. Finally, knowledge of a system’s behaviour from the inside is always an advantage for its most effective use.
KantanMT has been used at UCL this year to teach students at master’s level. It is ideal for teaching purposes due to its highly customisable nature. Students had the chance to build domain-specific engines (mainly technical), train them with data they collected from other tools, translate documents, apply rules and perform post-editing tasks. They worked with the metrics to understand how systems perform based on the data fed into them, which was very useful as it is often hard for people with a linguistic background to realise that a perfectly well-written set of data may not suffice for good training. Students also learned that the relations between the system structure, the source and target languages, and the content of the translation file play a crucial role in the effectiveness of an MT system. All of this lines up with the teaching approach explained above and supports the attempt to foster a positive view of MT among translators themselves, based on the idea of MT as a tool that should be approached very critically and managed very carefully in order for it to be effective in the translation process.
It is important to note that students had the opportunity to express their thoughts on the system through an online questionnaire and also through their assignments. The aim of the module was not to persuade them of the usefulness of MT, but rather to provide guidance and to help them understand its place in the industry and its advantages and disadvantages, as well as to train them for the successful handling of MT systems in a professional environment. The system will also be used in professional translators’ training and online courses in the next academic year, in the hope of familiarising participants with a side of Translation Technology that is often left out of teaching contexts.
With thanks to Rocío Baños Pinero.