An Evaluation of SMT in Medical Context

Master’s student Ewa Nitoń of University College London submitted her thesis as part of the MSc degree in Scientific, Technical and Medical Translation with Translation Technology. The following guest article reflects on her research into the application of Machine Translation in a medical context. Ewa was supervised by Teaching Fellow and Lecturer Dr. Emmanouela Patiniotaki, and she used KantanMT.com for her MSc research.

According to Williams and Chesterman (2011:14), “[…] technology has become an integral part of the translation profession […]”, yet many aspects of this area have not been widely researched. Sophisticated Translation Software (TS) and Machine Translation (MT) systems are well-known tools in the translation industry. Today, a wide range of professional software helps translators work more efficiently and consistently. Thanks to the integrated translation environments offered by such software, professional translators can combine traditional translation tools with MT engines to obtain more translation suggestions when translating specialised content.

For that reason, in my MSc Research Thesis “A Comparative Study of Semi-Automated and Fully-Automated Machine Translation Output: Evaluating the Translation Quality of a Post Mortem Examination and a Toxicology Report from English into Polish,” the main focus was the investigation and assessment of the quality of Polish target texts produced through TS and Statistical Machine Translation (SMT), using English medical source material, namely a post mortem examination analysis and a toxicology report.

As the main area of this research, I chose to investigate the quality of the Polish MT output obtained with the chosen translation technology. The study also made it possible to examine the effectiveness of TS and SMT and to see how they cope with source texts (STs) from the subfields of toxicology and forensic medicine. I found this subject worth investigating because I am particularly interested in translation technology, machine translation and medical translation.

In this thesis, it was important to establish a way to measure the quality of the Polish MT output produced by TS and SMT engines, and to assess how well the tools performed in semi-automated and fully-automated translation scenarios. For the semi-automated process, I therefore analysed the performance of the SDL Language Cloud SMT engine used in SDL Trados Studio 2015 Freelance edition, as well as the performance of a medical KantanMT SMT engine used in MemoQ 2015 Pro edition.

For the fully-automated process, the analysis considered the performance of KantanMT and Google Translate, the SMT engines chosen for this research. The first step was to analyse the English source content and build the English-Polish medical SMT engine on the KantanMT cloud platform. Once both stages were complete, it was possible to measure translation quality. I divided this process into two phases. The first was a manual assessment using the quality parameters adapted from Krings (2001:264-267), presented in Table 1 below.

Quality Parameters

1. Lexis/Terminology:

Incorrect translation of term/word
Translatable text left in English
Untranslatable text translated into target language

2. Morphology:

Incorrect verbal form
Lack of agreement (masculine/feminine; singular/plural)
Incorrect gender
Incorrect plural form

3. Syntax:

Incorrect word order
Incorrect use of prepositions

4. Punctuation/Spelling:

Incorrect use of punctuation
Incorrect use of capitalization
Spelling errors

Table 1 Quality parameters adapted from Krings (2001:264-267).
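To keep the manual assessment consistent across engines, each error found in the Polish output can be logged against one of the categories in Table 1 and then counted per category. The following Python sketch shows one way to do this. The `TAXONOMY` dictionary simply encodes Table 1, while the `tally_errors` helper and the annotation format (one `(category, error_type)` pair per error) are hypothetical and are not part of the thesis or of any translation tool.

```python
from collections import Counter

# Error categories and error types from Table 1 (adapted from Krings 2001).
TAXONOMY = {
    "Lexis/Terminology": {
        "Incorrect translation of term/word",
        "Translatable text left in English",
        "Untranslatable text translated into target language",
    },
    "Morphology": {
        "Incorrect verbal form",
        "Lack of agreement",
        "Incorrect gender",
        "Incorrect plural form",
    },
    "Syntax": {
        "Incorrect word order",
        "Incorrect use of prepositions",
    },
    "Punctuation/Spelling": {
        "Incorrect use of punctuation",
        "Incorrect use of capitalization",
        "Spelling errors",
    },
}


def tally_errors(annotations):
    """Count errors per top-level category from (category, error_type) pairs.

    Hypothetical helper: any pair not listed in TAXONOMY raises ValueError,
    so typos in an annotation sheet are caught instead of silently counted.
    """
    counts = Counter()
    for category, error_type in annotations:
        if error_type not in TAXONOMY.get(category, set()):
            raise ValueError(f"Unknown error type: {category!r} / {error_type!r}")
        counts[category] += 1
    return counts


# Illustrative annotations only, not data from the thesis.
example = [
    ("Morphology", "Incorrect gender"),
    ("Morphology", "Lack of agreement"),
    ("Punctuation/Spelling", "Incorrect use of capitalization"),
]
print(tally_errors(example))
```

Counting per category matches the four headings of Table 1, while keeping the full error type in each annotation still allows a finer breakdown later if needed.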

In the second phase of this research, the quality of the Polish MT output was assessed through automated evaluation, which measures quality at the levels of adequacy, fluency and informativeness (Depraetere, 2011). The main metric used to measure translation quality automatically was the BiLingual Evaluation Understudy (BLEU) score. In addition, the KantanMT cloud-based platform reports F-measure and TER scores alongside BLEU, together with a word count; these are recalculated automatically each time the engine is trained on new training data.
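BLEU compares MT output with a human reference translation by counting matching word n-grams (usually up to 4-grams), clipping each count by how often that n-gram occurs in the reference, and combining the resulting precisions as a geometric mean multiplied by a brevity penalty for output shorter than the reference. The Python sketch below is a minimal, unsmoothed corpus-level version with one reference per segment, assuming simple whitespace tokenisation. It is not the implementation used by KantanMT or any other tool, whose tokenisation and smoothing choices may produce different numbers, and the Polish example sentence is invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU with a single reference per segment.

    hypotheses, references: lists of token lists, aligned segment by segment.
    Returns a score between 0.0 and 1.0.
    """
    matches = [0] * max_n  # clipped n-gram matches for each order n
    totals = [0] * max_n   # number of hypothesis n-grams for each order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only output shorter than the reference is penalised.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


# Invented example: "the autopsy revealed the presence of ethanol in the blood".
reference = "sekcja zwłok wykazała obecność etanolu we krwi".split()
hypothesis = "sekcja zwłok wykazała obecność".split()
print(round(100 * corpus_bleu([hypothesis], [reference]), 2))
```

Multiplying the result by 100 gives the percentage form used in Table 3. Without smoothing, very short segments often score 0, which is one reason BLEU is normally reported at corpus level; for published comparisons, an established implementation such as sacreBLEU is preferable because it standardises tokenisation.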

The findings showed that producing and checking the Polish output took longer in the semi-automated process than in the fully-automated one, because the human translator was far more involved when working with TS and SMT engines. The Automated Translation (AT) results only suggested possible Polish translations; it was up to the human translator to produce accurate Polish output. In the fully-automated process, by contrast, the Polish output was produced very quickly, and although its quality was satisfactory compared with the semi-automated translation, post-editing was still necessary to reach professional quality. Table 2 compares the results against the quality parameters described above and shows how well the Polish language was preserved in each process.

Quality Parameter

Semi-automated process: SDL Language Cloud, KantanMT

Fully-automated process: KantanMT, Google Translate

Specialised Terminology

(incorrect translation of term/word)

In the semi-automated process, the main issues were the abbreviations used throughout most of the source content, for instance chemical abbreviations in the toxicology report, and specialised terminology such as the names of techniques used in toxicology or the proper medical names for human anatomy.

In the fully-automated process, KantanMT offered more specialised suggestions than Google Translate, because the medical engine was customised to produce medical output only, whereas Google Translate is trained on corpora from many different fields and is therefore less likely to provide terminology that fits the desired context.

Morphology and Syntax

(lack of agreement e.g. masculine/feminine; singular/plural)

In the semi-automated process, neither SMT engine handled Polish grammar well in the proposed AT results, particularly verb conjugation. For instance, the engines produced masculine verb forms where feminine forms were required. Both engines also had problems with Polish word order. It is therefore worth highlighting that human intervention was most visible in assembling the specialised terminology into coherent, consistent sentences and paragraphs that are grammatically and stylistically appropriate.

In the fully-automated process, Google Translate performed better than the KantanMT engine in terms of grammar and syntax. The medical engine had not been trained on data covering general Polish grammar, which is why it did not perform well here. Nevertheless, most of the output from both engines contained incorrect verb forms, and lack of agreement was observed in both short and complex sentences: for example, verbs were conjugated in the plural masculine form for singular feminine nouns. The major issues in the fully-automated Polish output were incorrect noun gender and lack of agreement within sentences.

Spelling and Punctuation

(spelling errors)

In the semi-automated process, all AT results were taken into account during translation, but most of them were amended to meet target-language (TL) standards. Moreover, both TS tools ran automatic QA checks during translation, so the human translator excluded this aspect from the manual evaluation of the Polish output.

In the fully-automated process, the medical KantanMT engine, which was trained only on medical data, made no spelling mistakes and followed punctuation rules, but it capitalised a word in the middle of a sentence. This suggests that the engine needs further rounds of training before it reflects basic Polish syntax and grammar. Google Translate, on the other hand, had no issues with spelling or with applying Polish punctuation rules.

Table 2 Comparison of results between specialised and general SMT engines.

As Table 3 below shows, the user-built KantanMT medical engine achieved a significantly higher BLEU score than the general Google Translate engine. This suggests that the KantanMT medical engine should produce reasonably fluent Polish output requiring minimal post-editing, whereas Google Translate scored below 50%, which indicates that its Polish output would need substantial post-editing effort and that it may be more efficient to translate the content from scratch than to use an SMT engine with such a low score.

Engine	BLEU score
Domain-specific SMT engine (KantanMT)	61%
General SMT engine (Google Translate)	33.59%

Table 3 Comparison of calculated BLEU scores.

These results suggest that a customised SMT engine handles specialised terminology better than general engines such as SDL Language Cloud or Google Translate. The general engines, however, produced better syntax than the medical KantanMT engine in short Polish sentences, although they struggled with more complex sentences. In all cases, post-editing would be required to improve the quality of the target texts (TTs).

However, because the medical SMT engine achieved the desired scores, its Polish output should require minimal post-editing and retranslation, whereas the TTs from Google Translate and SDL Language Cloud would require a significant amount of work, and some parts would need to be translated from scratch because they do not fit the desired context. Overall, the Polish output generated for this research indicates that SMT systems are not yet developed enough to produce good-quality translation for the English-Polish language pair.

KantanMT Notes: As Ewa points out, it is important to customise KantanMT engines fully before translating, as this helps ensure that the translated text is fluent and of high quality. Additional monolingual data helps to improve the fluency of the target-language output.

When building an engine, it is important to balance the scores against the quantity of training data. We have noticed that an engine trained on a low word count can still achieve high BLEU scores, but this does not always indicate good performance. It is therefore important to train your engine on a substantial amount of data (glossaries, bilingual, monolingual and even stock data) before it is used for translation.

It is worth mentioning that Polish has a very rich vocabulary and grammar, which a machine does not understand, so human involvement is crucial, especially for highly specialised content such as medical texts. For the medical KantanMT engine to achieve the best possible scores, more time is needed to prepare post-editing rules, refine the specialised English-Polish medical data (e.g. more translation units in the English-Polish bilingual parallel corpus and a better-selected English-Polish medical glossary) and improve the engine's handling of Polish grammar and syntax. This was not possible within the time frame of this research, and it shows that SMT engines still need improvement to achieve more accurate results.

As for the Google Translate and SDL Language Cloud MT engines, they are too general and may therefore not capture the medical context well enough to provide an accurate rendition into Polish. Finally, no machine translation system can fully replace a human in the translation process and achieve fully professional, well-written translations, especially when translating from English into Polish.

References:

Krings, H. (2001). Repairing Texts: Empirical Investigations of Machine Translation Post-Editing Processes. Kent, Ohio: The Kent State University Press.

Depraetere, I. (ed.) (2011). Perspectives on Translation Quality. Text, Translation, Computational Processing. Berlin: De Gruyter Mouton.

Williams, J. and Chesterman, A. (2011). The Map. Manchester, U.K.: St. Jerome Publishing.

About Ewa Nitoń

Ewa Nitoń, a Master’s student at University College London, is a freelance English-Polish-English translator specialising in Medicine, Life Science & Technology. She also runs Polyglot’s Supplement Reader. You can connect with Ewa on LinkedIn and Proz.com.