Giulia Mattoni, an Italian Translation Technology student at DCU, talks about her experience using Machine Translation to evaluate player support content localization. Giulia's fascinating perspective illustrates why this area needs further research, and how she used KantanMT to evaluate MT and post-editing for this type of content.
Nikos Katris submitted his thesis, 'Evaluation of Two Statistical Machine Translation Systems within a Greek-English Cross-Language Information Retrieval Architecture', to the University of Limerick in October 2015. In his research, he compared the results of KantanMT with the Moses system for information retrieval.
Nikos was supervised by Dr Richard Sutcliffe at the University of Limerick's Department of Computer Science and Information Systems (CSIS) in the College of Science and Engineering. Nikos kindly agreed to discuss his research in an interview. The University of Limerick and the Localisation Research Centre are KantanMT's academic partners.
For our fourth post in the '5 Questions' series, we are very excited to introduce you to Louise Faherty, Technical Project Manager of the Professional Services team at KantanMT. This series of interviews aims to give you a deeper insight into the people at KantanMT.
KantanMT.com was used in the course 'Machine Translation and Post-editing', which was taught for the first time in the 'Degree in Modern Languages Applied to Translation' at UAH. English and Spanish were the main languages used during this course.
We caught up with Professor Cristina Toledo Báez, and in this post she describes her experience of using KantanMT during the course.
Dissemination of Machine Translation innovation is a major priority for us at KantanMT. We believe that Academic Partnerships have a huge role to play in furthering the scope of research and innovation in the field of Machine Translation, and as such we have partnered with a number of universities to help students use the KantanMT platform in a real-world scenario.
We are always looking for ways to improve the KantanMT platform, and to keep our finger on the pulse of the KantanMT user experience, we asked one of the students using the platform to answer some questions about the platform.
Welcome to Part II of the Q&A blog on How Machine Translation Helps Improve Translation Productivity. In case you missed the first part of our post, here’s a link to quickly have a look at what was covered.
Tony O’Dowd, Chief Architect of KantanMT.com, and Louise Faherty, Technical Project Manager, presented a webinar in which they showed how LSPs (as well as enterprises) can improve the translation productivity of their language teams, manage post-editing effort estimations and easily schedule projects with powerful MT engines. For this section, we are accompanied by Brian Coyle, Chief Commercial Officer at KantanMT, who joined the team in October 2015 to strengthen KantanMT’s strategic vision.
We have provided a link to the slides used during the webinar below, along with a transcript of the Q&A session.
Please note that the answers below are not recorded verbatim and minor edits have been made to make the text more accessible.
Question: We are a mid-sized LSP and we would like to know what benefits we would enjoy if we chose to work with KantanMT over building our own system from scratch. The latter would be cheaper, wouldn’t it?
Answer (Brian): Tony and Louise have mentioned a lot of features available in KantanMT – indeed, the platform is very feature-rich and provides a great user experience. But beyond that, what really underpins KantanMT is its access to massive computing power, which is what Statistical Machine Translation requires in order to perform efficiently and quickly. KantanMT has a unique architecture that provides instant, on-demand access at scale.
As Louise Faherty mentioned, we are currently translating half a billion words per month and we have 760 servers deployed currently. So if you were trying to develop something yourself, it would be hard to reach this level of proficiency in your MT. Whilst no single LSP would probably need this total number of servers, to give you an idea of the cost involved, that kind of server deployment in a self-build environment would cost in the region of €25m.
We also offer 99.99% uptime with triple data-centre disaster recovery. It would be very difficult and costly to build this kind of performance yourself. Also, with this kind of performance at your clients’ disposal, you can offer customised MT for mission-critical web-based applications such as eCommerce sites.
Finally, a lot of planning, thought, development hours and research has gone into creating what we believe is the best user interface and platform for MT, with the best functionality set and extreme ease of integration in the marketplace. So it would be difficult for you to start on your own and build a system as robust and high-quality as KantanMT.com.
Question: Could you also establish KantanNER rules to convert prices on an eCommerce website?
Answer (Louise Faherty): Yes, absolutely! With KantanNER, you can also establish rules to convert prices and so on. The only limitation is that the exchange rate will of course fluctuate. There could be options for calculating that information dynamically – otherwise you would be looking at a fixed equation to convert those prices.
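To illustrate the fixed-equation approach Louise mentions, here is a minimal Python sketch. It is purely illustrative – the function name and the rate are hypothetical, and this is not KantanNER rule syntax. A dynamic variant would fetch the current rate from a rates service at translation time instead of hard-coding it.

```python
# Illustrative only: a fixed-rate EUR-to-USD price conversion, the kind of
# "fixed equation" a conversion rule might encode. The rate is a placeholder.
def convert_price(amount_eur: float, eur_to_usd: float = 1.10) -> str:
    """Convert a EUR price to a USD display string using a fixed rate."""
    return f"${amount_eur * eur_to_usd:.2f}"

print(convert_price(20.0))  # "$22.00" at the assumed 1.10 rate
```

The trade-off in the answer above is exactly this: a fixed rate is simple but goes stale, while a dynamic lookup stays accurate at the cost of an external dependency.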
Question: My client does not want us to use MT because they have had a bad experience in the past with Bing Translate – what would convince them to use KantanMT? How will the output be different?
Answer (Tony O’Dowd): One of the things you have to recognise about the KantanMT platform is that you are using it to build customised machine translation engines. So you are not creating generic engines (Bing Translate and Google Translate are generic engines). You are building customised engines trained on the previous translations and glossaries that your clients have provided. You will also be using some of our stock engines that are relevant to your client’s domain.
So when you combine that, you get an engine that will mimic the translation style of your client. Indeed, instead of a generic translation engine, you are using an engine designed to mirror the terminology and stylistic requirements of your client. If you can achieve this through Machine Translation, you will see that there is far less requirement for post-editing – this is one of the main things that drives translators away from generic, broad-based systems and towards customised ones. Clients and LSPs have tested both generic systems and customisable engines, and found that cloud-based customisable MT adds value that is not available on free, non-customisable MT platforms.
End of Q&A session
The KantanMT Professional Services Team would once again like to thank you for all your questions during the webinar and for sending in your questions by email.
Have more burning questions? Or maybe you would like to see the brilliant platform translate in a live environment? No problem! Just send an email to firstname.lastname@example.org and we will take care of the rest.
Want to stay informed about our new webinars? You can bookmark this page, or even better – sign up for our newsletter and ensure that you never miss a post!
Master’s student Rafaella Athanasiadi of University College London submitted her thesis as part of the MSc degree in Scientific, Technical and Medical Translation with Translation Technology. Rafaella was supervised by Teaching Fellow and Lecturer Dr. Emmanouela Patiniotaki, and she used KantanMT.com for her research. This guest blog post looks at some of her conclusions on Machine Translation and the localization industry.
As Hutchins & Somers (c1992:1) argue, “the mechanization of translation has been one of humanity’s oldest dreams.” During the 20th century, the translation process changed radically. From spending endless hours in libraries to find the translation of a word, the translator has been placed at the centre of dozens of assistive tools. To name just a few, today there are many translation software packages, terminology extraction tools, project management components and machine translation systems that translators can choose from while translating.
However, shifting the focus to audiovisual translation, it can be observed that not many radical changes took place in that area, at least not until the introduction of machine translation systems in various projects (such as the MUSA and SUMAT projects) that developed machine translation engines to optimise the subtitling process. Still, the results of such projects do not seem satisfactory enough to inspire confidence in the implementation of these engines in the subtitling process, among subtitling software developers and subtitlers alike.
Based on my personal research, which focused primarily on the European setting, it seems that at the moment only the freeware SRT Translator incorporates machine translation while also offering the features that subtitling software usually incorporates (i.e. uploading multimedia files and timecoding subtitles). Nonetheless, SRT Translator, which is not widely known among subtitlers, uses only Google Translate, a general-domain machine translation engine that, one could argue, is not suitable for the purposes of audiovisual translation. The quality of Google Translate’s output was tested by translating 35 subtitles of a comedy series; the output was incomprehensible and misleading in many cases.
Even though no further records of traditional subtitling software that incorporates machine translation could be found, there are many online translation platforms that allow users to upload and translate subtitles. Taking the European market into consideration, these can be translation software like MemoQ, SDL Trados Studio and Wordfast, which offer the ability to load subtitle files and in some cases link them to the corresponding audiovisual content; open-source tools for translators like Google Translator Toolkit (GTT); or professional, private platforms like Transifex and XTM International, which are used by companies and offered to their dedicated networks of translators. Nonetheless, in order to enable machine translation in all the above applications, API keys must be purchased. GTT is an exception, since it can be used for free at any time and only requires a Gmail account.
The fact that subscription fees have to be paid along with the costs of API keys for each machine translation engine provider puts their usability in question, since costs may outweigh subtitlers’ profits. Furthermore, these platforms cannot fully accommodate subtitlers’ needs; for instance, the option to upload and play multimedia files while translating the subtitles is not always available, nor are synchronization features for timecoding the subtitles to the audio track always offered. Transifex, however, is an exception, since this localization platform offers users the option to upload multimedia files in the translation editor while translating the subtitles.
According to Macklovitch (2000:1), a translation memory is considered to be “a particular type of translation support tool that maintains a database of source and target language sentence pairs, and automatically retrieves the translation of those sentences in a new text which occur in the database.” Even though machine translation engines were developed through different projects to reduce subtitling time to the least possible degree, no attempts to integrate a translation memory tool into subtitling software for optimizing subtitling were traced during this research, at least in European, Asian and Australian settings. As Smith (2013) argues, “traditionally subtitling has fallen outside the scope of translation memory packages, perhaps as it was thought to be too creative a process to benefit from the features such software offers.” However, as Diaz-Cintas (2015:638) discusses, “DVD bonus material, scientific and technical documentaries, edutainment programmes, and corporate videos tend to contain the high level of lexical repetition that makes it worthwhile for translation companies to employ assisted translation and memory tools in the subtitling process.”
Even if such tools have not been integrated into subtitling software, translation memory components are used for subtitling purposes in cloud-based platforms such as GTT, Transifex and XTM International, as well as in translation software such as MemoQ, SDL Trados Studio, Wordfast Pro and Transit NXT, by simply creating a translation memory before or while translating. It should be noted that, among the tools discussed in this research, Transit NXT is the only translation software that can accommodate the needs of subtitlers to a high level. Apart from specialized filters for loading subtitles (which also exist in MemoQ, SDL Trados Studio and Wordfast Pro), subtitlers can upload multimedia files, translate subtitles while a translation memory component is active and synchronise their subtitles within the Transit translation editor (Smith, 2013).
Figure 1: The translation editor of Transit NXT by Smith (2013)
OOONA, a company founded in 2012, has taken a very interesting approach to subtitling by developing a unique cloud-based toolkit built exclusively to accommodate the needs of subtitlers. When asked the following question within the context of the MSc thesis,
Considering that other cloud-based translation platforms like GTT, Transifex and XTM International offer the option of uploading a TM or a terminology management component, do you think that it is important to offer it on a subtitling platform as well?
the representative of OOONA (Alex Yoffe) replied that not only will the company implement translation memory and terminology management components in the next phase of enhancing their platform, but that they also consider these components to be very important for the subtitling process. In addition, Yoffe (2015) argued that OOONA intends to “add the option of using MT engines. Translators will be able to choose between Microsoft’s, Google’s, or customisable MT engines.” Therefore, it seems that OOONA will become a very powerful tool in the near future, with features that will optimise the subtitling process to the maximum and reshape the way subtitling has been carried out until now. The fact that Screen Systems, Cavena and EZTitles have partnered with OOONA is an indicator of how much potential there is in this toolkit.
As can be argued based on the above, there is a lack of subtitling software with incorporated translation memory tools. This issue was therefore further researched through an online questionnaire disseminated to subtitling companies and freelance subtitlers. In addition, two companies that develop subtitling software, Screen Subtitling Systems and EZTitles, were asked to present their views on this topic. In both cases, their willingness to optimise the subtitling process in a semi-automated or fully automated way was apparent from their answers. The former company was in favour of a combination of machine translation tools with translation memory tools, whereas the latter leaned towards a subtitling system with integrated translation memory and terminology management tools.
Nonetheless, the optimisation of the subtitling process has to coincide with the needs and preferences of subtitlers. Based on the respondents’ answers, it is clear that translation memory tools in subtitling software are desired by subtitlers. In response to the question,
Which tool would you prefer to have in a subtitling software? An integrated translation memory (TM) or machine translation (MT)?
more than half of the respondents (56.8%) chose TM. Interestingly, the answer ‘Both’ received the second-highest percentage (20.5%), which indicates that subtitlers want as many assistive tools as possible.
One of the main conclusions drawn from this research was that machine translation engines need to be customised to produce good-quality output, and this can be achieved through customisable engines like KantanMT and Milengo. Moreover, translation memory tools are sought by subtitlers in subtitling software, while cloud-based platforms seem to be taking over the translation industry today. Following this trend, subtitling software providers are partnering with online services and tools like the OOONA toolkit.
Based on the outcomes of this research, it could be said that we are experiencing a new era in subtitling, since traditional PC-based subtitling software is now transforming into flexible and accessible platforms that enhance the subtitling experience as much as possible. It is only a matter of time before we see which tools and platforms will rule the subtitling industry, but one thing is for sure: the technologies of the future will bring many changes to the traditional way of subtitling.
Diaz-Cintas, J., 2015. Technological Strides in Subtitling. In: S. Chan, ed. Routledge Encyclopedia of Translation Technology. London: Routledge, pp. 632-643.
Hutchins, J. W. & Somers, H. L. (c1992). An introduction to machine translation. London: Academic Press.
Macklovitch, E. (2000). Two Types of Translation Memory. In Proceedings of the ASLIB Conference on Translating and the Computer (Vol. 22).
Smith, S. (2013). New Subtitling Feature in Transit NXT. 11 November 2013. [Online]. Available from: http://www.star-uk.co.uk/blog/subtitling/working-with-subtitles-in-transit-nxt/. [Accessed 1 Sept. 2015].
Yoffe, A. (2015). MT and TM tools in subtitling. [Interview]. 13 August 2015.
Relevant data are available in Appendix 1 of the MSc thesis.
What are Gap Analysis and KantanTimeLine?
Gap Analysis identifies and reports any untranslated words in the training data set, allowing you to take preventive measures quickly by fine-tuning training data and filling data gaps. The KantanTimeLine™ provides a chronological history of activities for each engine and uses version control for precise management of released and production-ready engines.
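The idea behind Gap Analysis can be sketched in a few lines of Python. This is a rough illustration under the assumption that untranslated (out-of-vocabulary) words pass through to the MT output verbatim; it is not KantanMT’s actual implementation, and the example sentences are invented.

```python
def find_gaps(source: str, mt_output: str) -> list[str]:
    """Flag source words that appear unchanged in the MT output --
    a rough proxy for untranslated (out-of-vocabulary) words."""
    output_words = set(mt_output.lower().split())
    return [w for w in source.split() if w.lower() in output_words]

# Example: the engine left the German "Fehlermeldung" untranslated.
print(find_gaps("Die Fehlermeldung erscheint",
                "The Fehlermeldung appears"))  # ['Fehlermeldung']
```

In practice such flagged words point directly at gaps in the training data: adding segment pairs that cover them is the “preventive measure” described above.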
Using KantanTimeLine and Gap Analysis:
In KantanBuildAnalytics, click the Gap Analysis tab to see the number of untranslated words that remain in the generated translations. You will be directed to the Gap Analysis page, where you will see a breakdown of any gaps in your training data.
A table appears with four headings: ‘#’, ‘Unknown Word’, ‘Reference/Source’ and ‘KantanMT Output’. Under these headings you will find details of any untranslated words, their source and the KantanMT output.
Click Download to download your Gap Analysis report.
Note: You can also click the Timeline tab to view your profile’s Timeline, which is essentially a record of the changes you have made to your engine.
This is one of the many features provided in KantanBuildAnalytics that help Localization Project Managers improve an engine’s quality after its initial training. To see other features in the KantanBuildAnalytics suite, please see the links below.
- BLEU in KantanBuildAnalytics
- F-Measure in KantanBuildAnalytics
- KantanMT Timeline
- TER in KantanBuildAnalytics
What is F-Measure?
F-Measure is an automated measurement that determines the precision and recall capabilities of a KantanMT engine. It enables you to determine the quality and performance of your KantanMT engine.
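For reference, F-Measure is the harmonic mean of precision and recall. The sketch below computes a word-level F-Measure between a reference translation and MT output; this is a common simplification for illustration, not necessarily KantanMT’s exact scoring.

```python
from collections import Counter

def f_measure(reference: str, hypothesis: str) -> float:
    """Word-level F-Measure: the harmonic mean of precision and recall,
    based on the word overlap between reference and MT output."""
    ref, hyp = Counter(reference.split()), Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())  # words matched (with counts)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())  # matched / output length
    recall = overlap / sum(ref.values())     # matched / reference length
    return 2 * precision * recall / (precision + recall)

print(round(f_measure("the cat sat on the mat", "the cat sat on a mat"), 2))  # 0.83
```

Intuitively, precision penalises words the engine outputs that the reference lacks, while recall penalises reference words the engine misses; the harmonic mean rewards engines that do well on both.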
- To see the accuracy and performance of your engine, click on the ‘F-measure Scores’ tab. You will be directed to the ‘F-measure Scores’ page.
- Place your cursor on the ‘F-measure Scores’ chart to see the individual score of each segment. A pop-up will appear on your screen with details of the segment under the headings ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.
- To see the ‘F-measure Scores’ of each segment in table format, scroll down. You will see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.
- To see an even more in-depth breakdown of a particular segment, click on the triangle beside the number of the segment you wish to view.
- To reuse the data as test data, click on the ‘Reuse as Test Data’ button. When you do so, the ‘Reuse as Test Data’ button will change to ‘Delete Test Data’.
- To download the ‘F-measure Scores’, ‘BLEU Score’ and ‘TER Scores’ of all segments, click on the ‘Download’ button on the ‘F-measure Scores’, ‘BLEU Score’ or ‘TER Scores’ page.
This is one of the features provided by KantanBuildAnalytics to improve an engine’s quality after its initial training. To see other features of KantanBuildAnalytics, please click on the link below. For more information about KantanMT and the services we provide, please contact our support team at email@example.com.
KantanISR technology enables KantanMT members to perform instant segment retraining using a pop-up editor. The technology is designed to permit the near-instantaneous submission of post-edited translations into a KantanMT engine so that KantanMT members can submit segments for retraining, hence bypassing the need to completely rebuild the engine.
KantanISR was developed with usability, efficiency and productivity in mind: members simply log in to their KantanMT account, go to their main dashboard and submit new training segments using the KantanISR Editor. Adding high-quality training data to a KantanMT engine improves that engine’s translation quality and reduces post-editing requirements.
- Log in to your KantanMT account using your email and password.
- You will be directed to the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
- If you wish to use KantanISR with a profile other than the ‘Active’ profile, click on the profile you wish to use, then click on the ‘Training Data’ tab.
- You will be directed to the ‘Training Data’ page. Now click on the ‘ISR’ tab.
- The ‘KantanISR’ wizard will now pop up on your screen.
- Add the source language text in the ‘Source’ text editor fields. Add the corresponding target language text in the ‘Target’ text editor fields.
- Then click on the ‘Save’ button if you’re happy with your retraining data. If not, click the ‘Cancel’ button.
- When you click the ‘Save’ button, a ‘KantanISR successful’ pop-up will appear on your screen. Click the ‘OK’ button and you will be directed back to the ‘Training Data’ page.
Using KantanISR through KantanAPI
Please Note: The KantanAPI is only available to KantanMT members in the Enterprise Plan.
Members can also benefit from KantanISR through the KantanAPI by using HTTP GET requests. The API expects:
- A user authorisation token (‘API token’), which can be obtained by clicking on the ‘API’ tab.
- The name of the client profile you wish to use.
- A source segment and its target segment, in the languages specified when the profile was created.
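Under those assumptions, a GET request URL carrying the three items above might be assembled as in the sketch below. This is a hedged illustration only: the endpoint host, path and parameter names (`auth`, `profile`, `source`, `target`) are placeholders invented for this example, not KantanMT’s documented API – consult the KantanAPI documentation for the real values.

```python
import urllib.parse

# Hypothetical sketch: the host, path and parameter names are placeholders,
# not the documented KantanAPI. Only the three required items are modelled.
def build_isr_url(token: str, profile: str, source: str, target: str) -> str:
    """Assemble an HTTP GET URL carrying the authorisation token, the
    client profile name, and a source/target segment pair."""
    params = urllib.parse.urlencode({
        "auth": token,       # user authorisation token ('API token')
        "profile": profile,  # name of the client profile to use
        "source": source,    # post-edited source segment
        "target": target,    # its target-language translation
    })
    return f"https://api.example.com/isr?{params}"  # placeholder host

url = build_isr_url("TOKEN123", "demo-engine", "Hello", "Bonjour")
# The resulting URL would then be fetched with any HTTP client,
# e.g. urllib.request.urlopen(url), to submit the segment for retraining.
```

Note that `urlencode` percent-escapes the segment text, which matters for real sentences containing spaces and punctuation.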
To learn more about KantanISR or to get help with KantanMT technologies, please contact us at firstname.lastname@example.org. Hear from the development team on why KantanISR increases productivity and efficiency for KantanMT customers.