All your Burning Questions Answered! How Machine Translation Helps Improve Translation Productivity (Part I)

We had so many questions during the Q&A in our last webinar session, ‘How to Improve Translation Productivity’ by the KantanMT Professional Services team, that we decided to split the answers into two blog posts. If you don’t find your question answered here, check out our blog next week for the remaining answers.

The Internet today is experiencing what is generally referred to as a ‘content explosion’. In this fast-paced world, businesses have to strive harder and do more to stay ahead of the game, especially if they are a global business or have globalization aspirations. One reliable way for a business to go global successfully is through effective localization. Yet the huge amount of content available online makes human translation of everything almost impossible. The only viable option in today’s competitive online environment, then, is Machine Translation (MT).

On Wednesday 21st October, Tony O’Dowd, Chief Architect of KantanMT.com, and Louise Faherty, Technical Project Manager at KantanMT, presented a webinar where they showed how Language Service Providers (LSPs), as well as enterprises, can improve the translation productivity of their teams, manage post-editing effort and easily schedule projects with powerful MT engines. Here is a link to the recording of the webinar on YouTube along with a transcript of the Q&A session.

The answers below are not recorded verbatim and minor edits have been made to make the text more readable.

Question: Do you have clients doing Japanese to English MT? What are the results, and how did you get them? (i.e., do you pre-process the Japanese?)

Answer (Tony O’Dowd): English to Japanese Machine Translation (MT) has indeed always posed a challenge in the MT industry. So is it possible to build a high-quality, high-fidelity MT system for this language combination? Well, there have been quite a few developments recently that improve the prospects of building effective engines for this language combination. For example, one of the latest changes we made on the KantanMT platform to improve MT quality is the use of new and improved reordering models, which make translation from English to Japanese and from Japanese to English much smoother, so we deliver higher quality output. In addition, higher quality training data sets are now available for this language pair than a couple of years ago, when I started building English to Japanese engines. Back then it was really challenging. It still requires some effort to build English to Japanese MT engines, but the fact that there’s more content available in these languages makes it slightly easier for us to build high-quality engines.

We are also developing example-based MT for these engines, and so far this is showing encouraging signs of improving quality for this language pair. However, we have not started deploying this development on the platform yet.

KantanMT note: For more insights into how you can prepare high-quality training data, read these tips shared by Tony O’Dowd, and Selçuk Özcan, co-founder of Transistent Language Automation Services during the webinar ‘Tips for Preparing Training Data for High Quality MT.’

Question: Have you got a webinar recorded or scheduled, where we could see how the system works hands-on?

Answer (Tony O’Dowd): If you go on to the KantanMT website, we have video links on the product features pages. So you can actually watch an explanation video while you are looking at the component.

We work in a very visual environment, and we think videos are a great way of explaining how the platform works. If you go to the website, on the bottom left corner of the page you will find our YouTube channel, which contains videos on all sorts of topics, including how to build your first engine, how to translate your first document and how to improve the output of your engines.

If you click on the Resources menu on our site, you can access a number of tutorials that will talk you through the basics of Statistical Machine Translation Systems. In other words, explore the website and you should find what you need.

KantanMT note: Some other useful links for resources are listed below:

Question: Do you provide any Post-Editing recommendations or standards for standardising the PE process? You said translation productivity rose to 8k words per day – this is only PE, correct?

Answer (Tony O’Dowd): I will take the second question first! The 8,000 words per day is the Post-Editing (PE) rate, yes. It is not the raw translation rate. In Machine Translation, everything comes out pretranslated. So this number refers to the Post-Editing effort – like insertions, deletions, substitution of words, and so on that you need to do to get the content to publishable quality.
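
KantanMT note: The insertions, deletions and substitutions mentioned above can be counted automatically. The snippet below is a minimal sketch in Python, not KantanMT code: it computes the word-level edit distance between a raw MT segment and its post-edited version. The function name and the whitespace tokenisation are our own assumptions for illustration.

```python
def word_edit_distance(mt_output: str, post_edited: str) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the raw MT output into the post-edited text."""
    src = mt_output.split()
    tgt = post_edited.split()

    # previous[j] = edits needed to turn the words processed so far into tgt[:j]
    previous = list(range(len(tgt) + 1))
    for i, src_word in enumerate(src, start=1):
        current = [i]  # turning i source words into nothing costs i deletions
        for j, tgt_word in enumerate(tgt, start=1):
            cost = 0 if src_word == tgt_word else 1
            current.append(min(
                previous[j] + 1,         # delete src_word
                current[j - 1] + 1,      # insert tgt_word
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(word_edit_distance("the cat sat on mat", "the cat sat on the mat"))
```

A lower count per segment means less post-editing effort for that segment.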

Louise Faherty: What we recommend to our clients is that when it comes to PE, they should try to use MT. A lot of translators who are new to using MT will try and translate manually, which is a natural tendency, of course. But what we advise our clients is to copy and paste the translation (MT) in the engine and use the MT. The more you use MT and the more you Post-Edit, the better your engine will become.

Tony O’Dowd: I will add something to Louise Faherty’s comments there. The best example of PE recommendations that I have come across is provided by a group called TAUS. They play a central role in educating the industry on how to develop proficiency in PE.

Subscribe to TAUS YouTube channel here.

Question: What do ‘PPX’ and ‘PEX’ stand for (as abbreviations)?

Answer (Louise Faherty and Tony O’Dowd): PEX stands for Post-Editing Automation. PEX allows you to take the output of an MT engine and dynamically alter it. When would you need to use PEX? Suppose your engine is repeating the same error over and over again. In such cases you can write a PEX file (written in the GENTRY programming language). The PEX file looks for patterns in the output of the engine and changes them dynamically.

For example, one of our French clients did not want to have a space preceding a colon mark in the output of their MT (because this was one of their typographical standards and repeated throughout the content). So we wrote a PEX rule that forced a stylistic change in the output of the engine. This enabled the client to reduce the number of Post-Edits substantially.
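
KantanMT note: To give a feel for what such a rule does, here is a minimal sketch in Python rather than GENTRY (GENTRY syntax is not shown in this post). It uses a regular expression to remove any space, including a no-break space, that directly precedes a colon. The function name and the exact pattern are illustrative assumptions, not the rule we wrote for the client.

```python
import re

# Illustrative pattern: one or more spaces, tabs, no-break spaces (U+00A0)
# or narrow no-break spaces (U+202F) immediately followed by a colon.
SPACE_BEFORE_COLON = re.compile(r"[ \t\u00a0\u202f]+:")


def remove_space_before_colon(segment: str) -> str:
    """Apply the stylistic fix to one segment of raw MT output."""
    return SPACE_BEFORE_COLON.sub(":", segment)


if __name__ == "__main__":
    print(remove_space_before_colon("Remarque : cliquez sur le bouton"))
```

Because the rule runs on every segment the engine produces, the post-editor no longer has to make the same correction by hand each time.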

PPX stands for Preprocessor automation. You can use PPX files to normalise or improve the training data. It is based on our GENTRY programming language, which is available to all our clients for free.

In short then, PPX is for your training data, while PEX is for the actual raw output of your engine.

For more questions and answers, stay tuned for the next part of this post!

Understanding and Improving your KantanMT Engine with KantanTimeLine™

Ease of use and simplicity are always on the minds of our developers, hence the making of KantanTimeLine™. KantanTimeLine enables KantanMT clients to view the life cycle of their KantanMT engine. This empowers our clients, as they are able to find exactly what is negatively or positively affecting the quality of their engines. Clients can keep track of things such as Training Data uploads, Translation jobs, Engine Tuning, templates, Build jobs and so on through the KantanTimeLine.

How to use KantanTimeLine™

Log in to your KantanMT account using your email and your password.

You will be directed to the ‘My Client Profiles’ page. You will be in the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.

If you wish to use ‘KantanTimeLine’ with a profile other than the ‘Active’ profile, click on the profile whose ‘KantanTimeLine’ you wish to view.

Click on the ‘TimeLine’ tab.

You will now be directed to the ‘TimeLine’ page for your chosen profile.

To restore an archived Build, select the Build you wish to restore from the ‘Archives’ drop-down menu and click on the ‘Restore’ button.

To delete an archived Build, click on the ‘Delete’ button.

To archive a Build, click on the ‘Archive’ button of the Build you wish to archive.

To view or edit the description of a Build, click on the ‘Yellow Notepad’ icon.

To filter the timeline, click on the ‘Filter’ drop-down menu and select the filter you wish to use.

Additional Information and Support

KantanTimeLine™ is one of the many products offered by KantanMT to make the integration of Machine Translation into the workflow of our clients seamless. For more information on TimeLine or any KantanMT products, please contact us at info@kantanmt.com.

TimeLine can also be used in KantanBuildAnalytics. To learn how TimeLine is incorporated into KantanBuildAnalytics, please click on the link below or contact us at info@kantanmt.com.

Using F-Measure in Kantan BuildAnalytics

What is F-Measure?

F-Measure is an automated measurement that determines the precision and recall capabilities of a KantanMT engine. F-Measure enables you to determine the quality and performance of your KantanMT engine.
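
As a rough illustration of how precision and recall combine into a single score, the sketch below computes the standard F1 score, 2PR / (P + R), for one segment by comparing the words in the MT output with the words in the reference translation. This is a simplified sketch under our own assumptions (lowercased, whitespace-separated words, a hypothetical function name); it is not a description of how Kantan BuildAnalytics calculates its scores internally.

```python
from collections import Counter


def f_measure(mt_output: str, reference: str) -> float:
    """Word-level F1 score between an MT segment and its reference."""
    out_words = mt_output.lower().split()
    ref_words = reference.lower().split()
    if not out_words or not ref_words:
        return 0.0

    # Count words that appear in both, respecting repeated words.
    overlap = sum((Counter(out_words) & Counter(ref_words)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(out_words)  # share of MT words that are correct
    recall = overlap / len(ref_words)     # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f_measure("the cat sat on a mat", "the cat sat on the mat"))
```

A score close to 1 means the output and the reference share most of their words; a score close to 0 means they share very few.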

  • To see the accuracy and performance of your engine, click on the ‘F-measure Scores’ tab. You will now be directed to the ‘F-measure Scores’ page.

  • Place your cursor on the ‘F-measure Scores Chart’ to see the individual score of each segment. A pop-up will appear on your screen with details of the segment under these headings: ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.

  • To see the ‘F-measure Scores’ of each segment in a table format, scroll down. You will now see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.
  • To see an even more in-depth breakdown of a particular ‘Segment’, click on the Triangle beside the number of the segment you wish to view.
  • To reuse the engine as Test Data, click on the ‘Reuse as Test Data’ button. When you do so, the ‘Reuse as Test Data’ button will change to ‘Delete Test Data’.
  • To download the ‘F-measure Scores’, ‘BLEU Score’ and ‘TER Scores’ of all segments, click on the ‘Download’ button on either the ‘F-measure Scores’, ‘BLEU Score’ or ‘TER Scores’ page.

This is one of the features provided by Kantan BuildAnalytics to improve an engine’s quality after its initial training. To see other features of Kantan BuildAnalytics, please click on the link below. To get more information about KantanMT and the services we provide, please contact our support team at info@kantanmt.com.

What is KantanISR and Why do I need it?

KantanISR technology enables KantanMT members to perform instant segment retraining using a pop-up editor. The technology is designed to permit the near-instantaneous submission of post-edited translations into a KantanMT engine so that KantanMT members can submit segments for retraining, hence bypassing the need to completely rebuild the engine.

KantanISR was developed with usability, efficiency and productivity in mind: members simply need to log in to their KantanMT account, go to their main dashboard and submit new training segments using the KantanISR Editor. Adding high-quality training data to a KantanMT engine in this way will improve the translation quality of that engine and reduce post-editing requirements.

Using KantanISR

      1. Log in to your KantanMT account using your email and your password.
      2. You will be directed to the ‘My Client Profiles’ page. You will be in the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
      3. If you wish to use ‘KantanISR’ with a profile other than the ‘Active’ profile, click on the profile you wish to use ‘KantanISR’ with, then click on the ‘Training Data’ tab.
      4. You will be directed to the ‘Training Data’ page. Now click on the ‘ISR’ tab.
      5. The ‘KantanISR’ wizard will now pop up on your screen.
      6. Add the source language text in the ‘Source’ text editor fields. Add the corresponding target language text in the ‘Target’ text editor fields.
      7. Click on the ‘Save’ button if you’re happy with your retraining data. If not, click the ‘Cancel’ button.
      8. When you click the ‘Save’ button, a ‘KantanISR successful’ pop-up will appear on your screen. Click the ‘OK’ button and you will be directed back to the ‘Training Data’ page.

Using KantanISR through KantanAPI

Please Note: The KantanAPI is only available to KantanMT members in the Enterprise Plan.

Members can also get the benefit of KantanISR through KantanAPI by using HTTP GET requests. The API expects:

  • A user authorisation token (‘API token’) which can be obtained by clicking on the ‘API’
  • The name of the client profile you wish to use.
  • A source segment and its target segment in the languages specified when the profile was created.
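
To show how the three inputs listed above could fit into a single GET request, here is a minimal Python sketch that only builds the request URL. The endpoint address and the query parameter names (‘token’, ‘profile’, ‘source’, ‘target’) are placeholders we made up for illustration; they are not the real KantanAPI address or parameter names, so please check the KantanAPI documentation for the actual values.

```python
from urllib.parse import urlencode

# Placeholder only: NOT the real KantanAPI endpoint.
HYPOTHETICAL_ENDPOINT = "https://api.example.invalid/isr"


def build_isr_request_url(api_token: str, profile_name: str,
                          source_segment: str, target_segment: str) -> str:
    """Build a GET URL carrying the token, profile and segment pair.

    The parameter names below are assumptions for illustration.
    """
    params = {
        "token": api_token,
        "profile": profile_name,
        "source": source_segment,
        "target": target_segment,
    }
    # urlencode escapes spaces and special characters in the segments.
    return f"{HYPOTHETICAL_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_isr_request_url("TOKEN", "my-profile",
                                "Hello world", "Bonjour le monde"))
```

The point of the sketch is the URL encoding: segments often contain spaces, accents and punctuation, so they should be escaped before being placed in a GET request.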

To learn more about KantanISR or get help with KantanMT technologies, please contact us at info@kantanmt.com. Hear from the Development team on why KantanISR increases productivity and efficiency for KantanMT customers.