Giulia Mattoni, an Italian Translation Technology student from DCU, talks about her experience evaluating Machine Translation for player support content localization. Giulia’s fascinating perspective illustrates why this area needs further research, and how she used KantanMT to evaluate MT and post-editing for this type of content.
What made you decide to evaluate MT and Post-editing in game localization?
During my Master’s in Translation Technology I studied Machine Translation developments extensively, along with its potential and most recent deployments. Among the domains in which MT is applied, Game Localization appeared to be one of the least discussed, providing me with ample scope for research.
Although it is true that ‘transcreation’ is a keyword in game localization, some game assets contain repetitive strings, which could be translated faster with the aid of Machine Translation. Therefore, the objective of my research was to demonstrate that leveraging and properly implementing Machine Translation technology would be beneficial in game localization and lead to a reduction in the translator’s workload. Ideally, this could be achieved while maintaining the same high quality as translating with traditional methods.
Can you tell us a little about how you built your KantanMT engine?
Initially, the training data used to build the engine was provided by the industrial partner [Company A]* in Dublin, a technical service provider to the Video Game Industry. This data consisted of two Translation Memories (TMs) and a glossary relating to [Game]. The training data needed to undergo extensive pre-processing to be suitable for building the engine.
This process mainly involved eliminating duplicates, deleting external and inline tags, correcting misalignments, splitting long segments, and deleting empty entries. As cleaning the data resulted in a considerable reduction in the TM size, I needed to include more training data. So, to increase the word count and build a good-quality MT engine, I used the stock training data from KantanLibrary™, which I found essential to improving my engine’s quality.
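The cleaning steps described here can be sketched in a few lines of Python. This is only an illustration: the tag pattern, the 40-word length threshold, and the function name are assumptions for the sketch, not the exact rules used in the thesis.

```python
import re

TAG_RE = re.compile(r"</?[^<>]+>")  # strips XML-style external and inline tags
MAX_WORDS = 40  # illustrative cutoff for segments "too long" to train on

def clean_tm(pairs):
    """Clean a list of (source, target) TM entries: strip tags,
    drop empty entries and duplicates, and skip over-long segments."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src = TAG_RE.sub("", src).strip()
        tgt = TAG_RE.sub("", tgt).strip()
        if not src or not tgt:            # delete empty entries
            continue
        if (src, tgt) in seen:            # eliminate duplicates
            continue
        if len(src.split()) > MAX_WORDS:  # long segments would need splitting
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned
```

In practice misaligned pairs also have to be fixed by hand or with alignment heuristics; the sketch simply shows why aggressive cleaning shrinks a TM so noticeably.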
The first TM added included content from the support portals of different applications and software solutions, while the second was a cleansed TM from the European Commission’s DGT. Although not gaming-related, these TMs were considered the most pertinent because of the technical nature of their content, in line with the player support texts included in the research.
As a further intervention, monolingual training data was added: I downloaded and cleaned monolingual texts from the official [Game] web page. Finally, after checking KantanMT’s Gap Analysis feature, I uploaded a file containing non-translatable assets.
What type of texts did you use for the translation tasks and why?
The English source texts, made available by [Company B], were player support assets from [Game]. These comprised procedures and instructions given by the game’s support team in response to queries on how to solve problems, articles explaining new features of the game, and reports on solved bugs.
MT is particularly efficient at handling the repetitive vocabulary, structured content and predictable syntax of repetitive texts, which aids the translation of context-free, repetitive strings in video games. Considering this, and in light of the characteristics of player support texts, i.e. short segments and repetitive content, this asset type was considered suitable for machine translation.
Can you describe your evaluation techniques?
In order to assess the quality of the MT output, I used both quantitative and qualitative methods. First, I considered temporal effort by measuring and comparing the time needed for human translation of the source sample text and for post-editing of the machine-translated version.
Secondly, a qualitative evaluation comprising error annotation, direct judgments and rating was performed using KantanLQR, a highly adaptable language quality review tool that helps streamline and automate the review process.
Participants were asked to select the type of error encountered in the target text from a drop-down menu, to provide insight into the nature of the errors, and to rate each segment’s adequacy and fluency on a scale of 1 to 5. Finally, a post-task questionnaire was used to collect the reviewers’ feedback.
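A review set up this way boils down, per segment, to two 1–5 scores plus a list of error tags. A minimal sketch of how such results might be aggregated — the field names are hypothetical, not KantanLQR’s actual export format:

```python
from statistics import mean

def summarise_ratings(ratings):
    """Aggregate reviewer judgments for one segment: average adequacy
    and fluency (each on a 1-5 scale) and a tally of annotated error types."""
    summary = {
        "adequacy": mean(r["adequacy"] for r in ratings),
        "fluency": mean(r["fluency"] for r in ratings),
        "errors": {},
    }
    for r in ratings:
        for err in r.get("errors", []):
            summary["errors"][err] = summary["errors"].get(err, 0) + 1
    return summary
```

Averaging adequacy and fluency separately matters because a segment can read fluently while mistranslating the source, and the error tally shows which error categories dominate.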
What were your conclusions and were there any surprises?
Temporal data suggested that post-editing the raw machine-translated output of player support assets was faster than translating from scratch: in terms of time saving, post-editing produced a productivity gain of 59.7%. However, the results of the LQR process showed that improvements in the MT engine were needed.
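A productivity gain like this is commonly computed as the relative time saved against translating from scratch; a sketch under that assumption, with made-up round timings rather than the study’s actual measurements:

```python
def productivity_gain(t_translation, t_postediting):
    """Percentage of from-scratch translation time saved by post-editing."""
    return (t_translation - t_postediting) / t_translation * 100

# Made-up illustrative timings: a 59.7% gain corresponds to post-editing
# taking only 40.3% as long as translating the same text from scratch.
print(round(productivity_gain(100.0, 40.3), 1))  # → 59.7
```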
I wouldn’t say this result was a surprise, since the need for further in-domain monolingual data, terminology, and especially larger translation memories relating strictly to [Game] was evident while training the MT engine. This type of data would have produced a better engine than the bilingual training data taken from support portals for applications and software solutions and from the DGT archives.
I believe this would have significantly reduced the number of errors produced by the engine. It was also predictable that participants would perceive post-editing as more cognitively demanding than either human translation or TM-aided translation, especially because the latter is the usual working method in [Company A]. The participant perspectives gathered showed the need for a gradual phasing-in of the methodology for users.
How did you find using KantanMT for your thesis research?
Although I was already familiar with KantanMT after using it for a project in DCU, I would thoroughly endorse the platform due to its user-friendly and intuitive interface and its strong technical performance.
However, because my training data was game-related and full of tags, building the engine was very challenging. With the adept guidance of the Professional Services Team at KantanMT, though, I could work around this issue very effectively. Similarly, long sentences needed to be split and shortened, because by default the engine rejects any segments that are too long. I realise that this is essential to the process, because shorter segments are more effective in training the engine and thereby produce better translations.
It was mainly thanks to the amazing team at KantanMT that I could overcome any issue I came across while building and deploying the MT engines! The technical team was extremely supportive, ready to help and suggest solutions at any time. Finally, I found the idea of building the engine from scratch more satisfactory than using a ready-to-use generic MT engine available online. In conclusion, I can say that my experience with KantanMT was definitely positive!
KantanMT Note: Thank you Giulia, we are glad you had a good experience using our platform. We always recommend that training data is thoroughly cleansed before it is used for translation projects: all tags should be removed, all entities replaced, duplicates eliminated, terminology and capitalization kept consistent, and properly segmented translation units used to train the engine. Using long segments to train the engine leads to poor-quality translation, which is why, to optimise engine quality, the KantanMT platform has a default length limit for each training segment.
About Giulia Mattoni
Born in Ancona, Italy, Giulia holds a BA in Modern and Contemporary Literature from the University of Macerata and an MSc in Translation Technology from Dublin City University. She also studied at the University of Alicante in Spain as part of the Erasmus Study Programme. She gained experience in machine translation through her Master’s thesis, which was conducted in collaboration with two industrial partners, a game localization company and KantanMT, and consisted of evaluating the feasibility of integrating statistical machine translation and post-editing into the game localization industry.
She became passionate about terminology after completing a project in collaboration with the World Intellectual Property Organisation (WIPO) on organic electronics, and pursued this interest with a traineeship in the Terminology Coordination Unit of the European Parliament in Luxembourg. Her working languages are Italian, English and Spanish, and she is currently learning French. She likes travelling and writing, and is passionate about ancient cultures.