Student Speak: First Time Using Machine Translation

Elodie Vermant, a Swansea University student studying for an MA in Professional Translation, shares her experience of using Machine Translation for the first time at Swansea University.

The MLTM11 Translation Technologies Module is taught by Dr. Maria Fernandez-Parra, Lecturer in Languages, Translation and Communication at Swansea University. Read more experiences from her students.


All your Burning Questions Answered! How Machine Translation Helps Improve Translation Productivity (Part I)


We had so many questions during the Q&A session of our last webinar, ‘How to Improve Translation Productivity’, presented by the KantanMT Professional Services team, that we decided to split the answers into two blog posts. So, if you don’t find your question answered here, check out our blog next week for the remaining answers.

The internet today is experiencing what is generally referred to as a ‘content explosion’! In this fast-paced world, businesses have to strive harder and do more to stay ahead of the game – especially if they are a global business or have globalization aspirations. One fool-proof way in which a business can successfully go global is through effective localization. Yet the huge amount of content available online makes human translation of everything almost impossible. The only viable option in today’s competitive online environment is Machine Translation (MT).

On Wednesday 21st October, Tony O’Dowd, Chief Architect of KantanMT.com, and Louise Faherty, Technical Project Manager at KantanMT, presented a webinar where they showed how Language Service Providers (LSPs), as well as enterprises, can improve the translation productivity of their teams, manage post-editing effort and easily schedule projects with powerful MT engines. Here is a link to the recording of the webinar on YouTube, along with a transcript of the Q&A session.

The answers below are not recorded verbatim and minor edits have been made to make the text more readable.

Question: Do you have clients doing Japanese to English MT? What are the results, and how did you get them? (i.e., do you pre-process the Japanese?)

Answer (Tony O’Dowd): English to Japanese Machine Translation (MT) has indeed always posed a challenge in the MT industry. So is it possible to build a high-quality, high-fidelity MT system for this language combination? Well, there have been quite a few recent developments that improve the prospect of building effective engines for it. For example, one of the latest changes we made on the KantanMT platform was to introduce new and improved reordering models, which make translation from English to Japanese and Japanese to English much smoother, so we deliver higher-quality output. In addition, higher-quality training data sets are now available for this language pair than a couple of years ago, when I started building English to Japanese engines. Back then it was really challenging. It still requires some effort to build English to Japanese MT engines, but the fact that there’s more content available in these languages makes it slightly easier for us to build high-quality engines.

We are also developing example-based MT for these engines, and so far this is showing encouraging signs of improving quality for this language pair. However, we have not deployed this development on the platform yet.

KantanMT note: For more insights into how you can prepare high-quality training data, read these tips shared by Tony O’Dowd and Selçuk Özcan, co-founder of Transistent Language Automation Services, during the webinar ‘Tips for Preparing Training Data for High Quality MT.’

Question: Have you got a webinar recorded or scheduled, where we could see how the system works hands-on?

Answer (Tony O’Dowd): If you go on to the KantanMT website, we have video links on the product features pages. So you can actually watch an explanation video while you are looking at the component.

We work in a very visual environment, and we think videos are a great way of explaining how the platform works. If you go to the website, in the bottom left corner of the page you will find our YouTube channel, which contains videos on all sorts of topics, including how to build your first engine, how to translate your first document and how to improve the output of your engines.

If you click on the Resources menu on our site, you can access a number of tutorials that will talk you through the basics of Statistical Machine Translation Systems. In other words, explore the website and you should find what you need.

KantanMT note: Some other useful links for resources are listed below:

Question: Do you provide any Post-Editing recommendations or standards for standardising the PE process? You said translation productivity rose to 8k words per day – this is only PE, correct?

Answer (Tony O’Dowd): I will take the second question first! The 8,000 words per day is the Post-Editing (PE) rate, yes. It is not the raw translation rate. In Machine Translation, everything comes out pre-translated, so this number refers to the Post-Editing effort – the insertions, deletions, substitutions of words and so on that you need to perform to get the content to publishable quality.

Louise Faherty: What we recommend to our clients is that when it comes to PE, they should actually use the MT output. A lot of translators who are new to MT will try to translate manually from scratch, which is a natural tendency, of course. But what we advise our clients to do is take the MT output as their starting point and post-edit it. The more you use MT and the more you post-edit, the better your engine will become.

Tony O’Dowd: I will add something to Louise Faherty’s comments there. The best example of PE recommendations that I have come across is provided by a group called TAUS. They are at the forefront of educating the industry on how to develop proficiency in PE.

Subscribe to the TAUS YouTube channel here.

Question: What do ‘PPX’ and ‘PEX’ stand for (as abbreviations)?

Answer (Louise Faherty and Tony O’Dowd): PEX stands for Post-Editing Automation. PEX allows you to take the output of an MT engine and dynamically alter it. When would you need to use PEX? Suppose your engine is repeating the same error over and over again. What you can do in such cases is write a PEX file (developed in the GENTRY programming language), which looks for patterns in the engine’s output and changes them dynamically in the output.

For example, one of our French clients did not want a space preceding a colon in the output of their MT (this was one of their typographical standards, and the pattern was repeated throughout the content). So we wrote a PEX rule that forced this stylistic change in the output of the engine. This enabled the client to reduce the number of Post-Edits substantially.
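
PEX rules themselves are written in GENTRY, whose syntax is not shown in this post; purely as an illustration, a rule like the French colon example above could be sketched in Python as a regular-expression replacement (the function name and sample text are invented):

```python
import re

def strip_space_before_colon(mt_output: str) -> str:
    # Remove any whitespace (including non-breaking spaces) before a colon,
    # matching the client's typographical standard described above.
    return re.sub(r"\s+:", ":", mt_output)

print(strip_space_before_colon("Remarque : voir la page 10"))
# -> "Remarque: voir la page 10"
```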

PPX stands for Preprocessor Automation. You can use PPX files to normalise or improve the training data. Like PEX, it is based on our GENTRY programming language, which is available to all our clients for free.

In short then, PPX is for your training data, while PEX is for the actual raw output of your engine.
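
As a rough sketch of the kind of clean-up a PPX pre-processing pass might perform on training data (again in Python rather than GENTRY, with invented helper names):

```python
def normalise_segment(text: str) -> str:
    # Unify curly quotes and collapse runs of whitespace.
    text = text.replace("“", '"').replace("”", '"')
    return " ".join(text.split())

def clean_corpus(pairs):
    # Drop any training pair where either side is empty after normalisation.
    for source, target in pairs:
        source, target = normalise_segment(source), normalise_segment(target)
        if source and target:
            yield source, target

corpus = [("A  “test”  segment", "Un  segment  de  «test»")]
print(list(clean_corpus(corpus)))
```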

For more questions and answers, stay tuned for the next part of this post!

Create, Test and Deploy Post-Editing Automation Rules with KantanMT PEX Rule Editor

The KantanPEX Rule Editor enables members of KantanMT to reduce the amount of manual post-editing required for a particular translation by creating, testing and deploying post-editing automation rules on their Machine Translation engines (client profiles).

The editor allows users to evaluate the output of a PEX (Post-Editing Automation) rule on a sample of translated content without needing to upload it to a client profile and run translation jobs. Users can enter up to three pairs of search and replace rules, which are run in order, from first to last, on the content.
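
Because the rules run in sequence, each rule sees the output of the one before it, so their order matters. A minimal sketch of this behaviour (Python stands in for the editor’s server-side GENTRY rules; the rules themselves are hypothetical):

```python
import re

# Hypothetical search/replace pairs, applied first to last as in the editor.
rules = [
    (r"\bprogramme\b", "program"),    # rule 1 runs first...
    (r"\bprogram\b", "application"),  # ...so rule 2 sees rule 1's output
]

def apply_rules(text: str) -> str:
    for search, replace in rules:
        text = re.sub(search, replace, text)
    return text

print(apply_rules("Launch the programme"))  # -> "Launch the application"
```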

How to use the KantanMT PEX Rule Editor

Log in to your KantanMT account using your email and password.

You will be directed to the ‘Client Profiles’ tab on the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’ and marked in bold.


To use the ‘PEX Rule Editor’ with a profile other than the ‘Active’ profile, click on the name of the profile you want to use with the editor.

Then click the ‘KantanMT’ tab and select ‘PEX Editor’ from the drop-down menu.


You will be directed to the ‘PEX Editor’ page.

Type the content you wish to test in the ‘Test Content’ box.


Type the content you wish to search for in the ‘PEX Search Rules’ box.


Type what you want the replacement to be in the ‘PEX Replacement Rules’ box and click on the ‘Test PEX Rules’ button to test the PEX-Rules.


The results of your PEX-Rules will now appear in the ‘Output’ box.


Give the rules you have created a name by typing in the ‘Rule Name’ box.


Select the profile you wish to apply the rules to, and then click on the ‘Upload Rule’ button.


Additional Information

The KantanMT PEX Editor helps reduce the amount of manual post-editing required for a particular translation, thereby reducing project turnaround times and costs. For additional information on PEX rules and the Kantan PEX Rule Editor, please click on the links below. For more details about KantanMT localization products and ways of improving work productivity and efficiency, please contact us at info@kantanmt.com.


What is Translation Error Rate (TER)?

Translation Error Rate (TER) is a method used by Machine Translation specialists to determine the amount of Post-Editing required for machine translation jobs. This automatic metric measures the number of edit actions required to bring a translated segment in line with one of the reference translations. It’s quick to use, language independent and corresponds well with post-editing effort. When tuning your KantanMT engine, we recommend a maximum score of 30%. A lower score means less post-editing is required!
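
For a feel of how the metric works, here is a simplified word-level TER calculation. (The full metric also counts shift operations and can average over several references; this sketch assumes a single reference and basic edit operations only.)

```python
def word_edit_distance(hypothesis: str, reference: str) -> int:
    # Classic dynamic-programming edit distance over word tokens.
    h, r = hypothesis.split(), reference.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(h)][len(r)]

def ter(hypothesis: str, reference: str) -> float:
    # Edits needed to match the reference, divided by reference length.
    return word_edit_distance(hypothesis, reference) / len(reference.split())

# One substitution against a six-word reference gives TER ≈ 0.17 (17%),
# comfortably under the 30% maximum recommended above.
print(ter("the cat sat on a mat", "the cat sat on the mat"))
```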

How to use TER in KantanBuildAnalytics™

  • The TER scores for your engine are displayed in the KantanBuildAnalytics™ feature. You can get a quick overview or snapshot in the ‘Summary’ tab, but for a more in-depth analysis, and to calculate the amount of post-editing required for the engine’s MT output, select the ‘TER Score’ tab, which takes you to the ‘TER Scores’ page.


  • Place your cursor on the ‘TER Scores Chart’ to see the ‘Translation Error Rate’ for each segment. If you hold the cursor over a segment, a pop-up will appear on your screen with details of that segment under these headings: ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.


  • To see a breakdown of the ‘TER Scores’ for each segment in a table format, scroll down. You will see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.


  • To see an even more in-depth breakdown of a particular segment, click on the triangle beside its number.


  • To download the ‘TER Scores’ of all segments, click on the ‘Download’ button on the ‘TER Scores’ page.

This is one of the many features included in KantanBuildAnalytics, which can help the Localization Project Manager improve an engine’s quality after its initial training. To see other features used in KantanBuildAnalytics please see the links below.

Contact our team to get more information about KantanMT.com or to arrange a platform demonstration, demo@kantanmt.com.

Tips for Training Post-editors

A good quality Machine Translation engine relies on the quality of the bilingual data used to train it. For most MT users, this bilingual data can be translated by humans, or it can be fully post-edited MT output. In both cases, the quality of the data will influence the engine’s quality.
Selçuk Özcan, Transistent’s Co-founder, discusses the differences and gives some tips for successful post-editing. He has given KantanMT permission to publish this post, which was originally published in Dragosfer and on the GALA Blog website.

We have entered a new age, and a new technology has come into play: Machine Translation (MT). It’s globally accepted that MT systems dramatically increase productivity but it’s a hard struggle to integrate this technology into your production process. Apart from handling the engine building and optimizing procedures, you have to transform your traditional workflow:


The traditional roles of the linguists (translators, editors, reviewers etc.) are reconstructed and converged to find a suitable place in this new, innovative workflow. The emerging role is called ‘post-editing’ and the linguists assigned to it are called ‘post-editors’. You may want to recruit some willing linguists for this role, or persuade your staff to adopt a different point of view. But whatever the case may be, some training sessions are a must.

What is covered in training sessions?

1. Basic concepts of MT systems

Post-editors should have a notion of the dynamics of MT systems. It is important to focus on the system that is utilized (RBMT/SMT/Hybrid). For the widely used SMT systems, it’s necessary for them to know:

  • how the systems behave
  • the functions of the Translation Model and Language Model*
  • the relationship between the input (a given set of data) and the output (raw MT output)
  • what changes in different domains

* It’s not essential to give detailed information about these topics, but touching on them will make a difference in determining the candidates’ level of technical background. Some of the candidates may then be included in the testing team.
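
To make the Translation Model / Language Model distinction concrete, here is a toy illustration (not any particular system’s internals; all weights and probabilities are invented) of how an SMT decoder combines the two: the Translation Model scores adequacy, the Language Model scores fluency, and the best-scoring hypothesis wins.

```python
import math

def score(tm_prob: float, lm_prob: float,
          tm_weight: float = 0.6, lm_weight: float = 0.4) -> float:
    # Log-linear combination of Translation Model and Language Model scores.
    return tm_weight * math.log(tm_prob) + lm_weight * math.log(lm_prob)

# Two candidate translations: (P(f|e) from the TM, P(e) from the LM).
candidates = {
    "the house is red": (0.30, 0.020),
    "red is the house": (0.30, 0.002),
}
best = max(candidates, key=lambda h: score(*candidates[h]))
print(best)  # the Language Model prefers the fluent word order
```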

2. The characteristics of raw MT output

Post-editors should know the factors affecting MT output. Furthermore, the difference between working with fuzzy matches from TM systems and with SMT output has to be covered during a proper training session. Let’s try to figure out what should be conveyed:

  • The MT process is not the ‘T’ of the TEP (Translation, Editing, Proofreading) workflow, and raw MT output is not the target text expected as the output of the ‘T’ step.
  • In the earlier stages of an SMT engine, the output quality varies depending on the project’s dynamics, and errors are not identical. As the system improves, the quality level becomes more even and consistent within the same domain.
  • There may be some word or phrase gaps in the system’s pattern mappings. (Detecting these gaps is one of the main responsibilities of the testing team, but a successful post-editor must be informed about the possible gaps.)

3. Quality issues

This topic has two aspects: defining the required target (end product) quality, and evaluating and estimating the output quality. The first gives you the final destination; the second tells you where you are.

The required quality level is defined according to the project requirements, but it mostly depends on the target audience and the intended usage of the target text. This seems similar to the procedure in the TEP workflow. However, it’s slightly different: the engine improvement plan should also be considered while defining the target quality level. Basically, this parameter is classified into two groups: publishable and understandable quality.

The evaluation and estimation aspect is a little more complicated. The most challenging factor is standardizing measurement metrics. Besides, the tools and systems used to evaluate and estimate the quality level have some more complex features. If you successfully establish your quality system, though, adversities become easier to cope with.

It’s the post-editors’ duty to grasp the dynamics of MT quality evaluation, and the distinction between MT and HT quality evaluation procedures. Thus, they are expected to be aware of the likely error patterns. It will be more convenient to utilize error categorization with well-trained staff (QE staff and post-editors).

4. Post-editing Technique

The fourth and last topic is the key to success. It covers the appropriate method and principles, as well as the perspective post-editors usually need to acquire. The post-editing technique is formed using the materials prepared for the previous topics and the data obtained from the above-mentioned procedures, and it is defined separately for almost every individual customized engine.

The core rule for this topic is that the post-editing technique, as a concept, must be clearly differentiated from traditional editing and/or review techniques. Post-editors should be capable of:

  • reading and analyzing the source text, raw MT output and categorized and/or annotated errors as a whole.
  • making changes where necessary.
  • considering the post-edited data as part of the data set to be used in engine improvement, and performing their work accordingly.
  • applying the rules defined for the quality expectation levels.

As briefly described in topic #3, the distance between the measured output quality and the required target quality may be seen as the post-edit distance. It roughly defines the post-editor’s tolerance and the extent of the work he/she will perform. Another criterion allowing us to define the technique and the performance is the target quality group: if the target text is expected to be of publishable quality, the job is a full post-edit; otherwise it is a light post-edit. Light and full post-editing techniques can be briefly defined as above, but the distinction is not always so clear, and the concepts of under- and over-editing also come into play. You may want to include more detail about these concepts in the post-editor training sessions; enriching the training materials with some examples would be a great idea!
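
As a rough sketch of the post-edit distance idea (difflib’s similarity ratio stands in for a proper metric such as TER/HTER, and the 0.3 threshold is illustrative, not a standard value):

```python
import difflib

def post_edit_distance(raw_mt: str, post_edited: str) -> float:
    # 0.0 = the post-editor changed nothing, 1.0 = fully rewritten.
    matcher = difflib.SequenceMatcher(None, raw_mt.split(), post_edited.split())
    return 1.0 - matcher.ratio()

def suggested_effort(distance: float) -> str:
    # Illustrative cut-off between the two techniques described above.
    return "light post-edit" if distance < 0.3 else "full post-edit"

d = post_edit_distance("the engine produce raw output",
                       "the engine produces raw output")
print(round(d, 2), suggested_effort(d))  # -> 0.2 light post-edit
```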

About Selçuk Özcan

Selçuk Özcan has more than 5 years’ experience in the language industry and is a co-founder of Transistent Language Automation Services. He holds degrees in Mechanical Engineering and Translation Studies and has a keen interest in linguistics, NLP, language automation procedures, agile management and technology integration. Selçuk is mainly responsible for building high quality production models including Quality Estimation and deploying the ‘train the trainers’ model. He also teaches Computer-aided Translation and Total Quality Management at the Istanbul Yeni Yuzyil University, Translation & Interpreting Department.



Read More about KantanMT’s Partnership with Transistent in the official News Release, or if you are interested in joining the KantanMT Partner Program, contact Louise (info@kantanmt.com) for more details on how to get involved. 


Translation Quality: How to Deal with It?

KantanMT started the New Year on a high note with the addition of the Turkish Language Service Provider, Transistent, to the KantanMT Preferred MT Supplier partner program.

Selçuk Özcan, Transistent’s Co-founder, has given KantanMT permission to publish his blog post on Translation Quality. This post was originally published in Dragosfer and on the Transistent Blog.


Literally, the word quality has several meanings, one of them being “a high level of value or excellence” according to Merriam-Webster’s dictionary. How should one deal with this idea of “excellence” when the issue at hand is translation quality? What is required, it seems, is a more pragmatic and objective answer to this question.

This brings us to the question “how could an approach be objective?” Certainly, the issue should be assessed through empirical findings. But how? We are basically in need of an assessment procedure with standardized metrics. Here we encounter another issue: the standardization of translation quality. From now on, we need to associate these concepts with the context itself in order to make them clear.

[Figure: the three sets of factors affecting translation quality – the source text’s monolingual issues, the target text’s monolingual issues and the bilingual issues]

As is widely known, three sets of factors affect the quality of the translation process in general. Basically, analyzing the source text’s monolingual issues, the target text’s monolingual issues and the bilingual issues defines the quality of the work done. Nevertheless, the procedure should be based on the requirements of the domain, the audience and the linguistic structure of both languages (source and target); and at each step, this key question should be considered: ‘Does the TT serve the intended purpose?’

We still have not dealt with the standardization and quality of acceptable TTs. The concept of “acceptable translation” has always been discussed throughout the history of translation studies, and no one is able to precisely explain the requirements. A further study on dynamic QA models would need to go into the details. There are various QA approaches and models. For most of them, an acceptable translation falls somewhere between bad and good quality, depending on the domain and target audience. The quality level is measured through the translation error rates developed to assess MT output (BLEU, F-Measure and TER), and there are four commonly accepted quality levels: bad, acceptable, good and excellent.

The formula is simple: the more errors a TT contains, the worse its quality is considered to be. However, the errors should be correlated with the context and many other factors, such as importance for the client, the expectations of the audience and so on. These factors define each error’s severity as minor, major or critical. A robust QA model should be based upon accurate error categorization so that reliable results may be obtained.
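
A minimal sketch of such severity-weighted error counting (the 1/5/10 weights follow a commonly cited LISA-style convention and are an assumption, not values from this post):

```python
# Assumed severity weights, in the spirit of LISA-style QA models.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def weighted_error_score(error_counts: dict, word_count: int) -> float:
    # Weighted errors per 1,000 words: lower means better quality.
    penalty = sum(SEVERITY_WEIGHTS[sev] * n for sev, n in error_counts.items())
    return penalty / word_count * 1000

print(weighted_error_score({"minor": 4, "major": 1, "critical": 0}, 1500))
# -> 6.0
```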

We have tried to briefly describe the concept of QA modeling. Now, let’s see what’s going on in practice. There are three publicly available QA models which have inspired many software developers in their QA tool development. One of them is the LISA (Localization Industry Standards Association) QA Model. The LISA Model is very well known in the localization and translation industry, and many company-specific QA models have been derived from it.

The second is the J2450 standard generated by SAE (the Society of Automotive Engineers), and the last is the EN 15038 standard, approved by CEN (Comité Européen de Normalisation) in 2006. All of the above-mentioned models are static QA models; one should create one’s own framework in compliance with the demands of each project. Nowadays, many institutes (the EU Commission and TAUS among them) are working on dynamic QA models. These models enable the creation of different metrics for different translation/localization projects.



Language Industry Interview: KantanMT speaks with Maxim Khalilov, bmmt Technical Lead

This year, both KantanMT and its preferred Machine Translation supplier, bmmt, a progressive Language Service Provider with an MT focus, exhibited side by side at the tekom Trade Fair and tcworld conference in Stuttgart, Germany.

As a member of the KantanMT preferred partner program, bmmt works closely with KantanMT to provide MT services to its clients, which include major players in the automotive industry. KantanMT was able to catch up with Maxim Khalilov, technical lead and ‘MT guru’, to find out more about his take on the industry and what advice he could give to translation buyers planning to invest in MT.

KantanMT: Can you tell me a little about yourself and how you got involved in the industry?

Maxim Khalilov: It was a long and exciting journey. Many years ago, I graduated from the Technical University in Russia with a major in computer science and economics. After graduating, I worked as a researcher for a couple of years in the sustainable energy field. But even then I knew I wanted to come back to the IT industry.

In 2005, I started a PhD at the Universitat Politecnica de Catalunya (UPC) with a focus on Statistical Machine Translation, which was a very new topic back then. In 2009, after successfully defending my thesis, I moved to Amsterdam, where I worked as a post-doctoral researcher at the University of Amsterdam and later as an R&D manager at TAUS.

Since February 2014, I’ve been a team lead at bmmt GmbH, a German LSP with a strong focus on machine translation.

I think my previous experience helped me to develop a deep understanding of the MT industry from both academic and technical perspectives.  It also gave me a combination of research and management experience in industry and academia, which I am applying by building a successful MT business at bmmt.

KMT: As a successful entrepreneur, what were the three greatest industry challenges you faced this year?

MK: This year has been a challenging one for us from both technical and management perspectives. We started to build an MT infrastructure around MOSES practically from scratch. MOSES was developed by academia and for academic use, and because of this we immediately noticed that many industrial challenges had not yet been addressed by MOSES developers.

The first challenge we faced was that the standard solution does not offer a solid tag-processing mechanism – we had to invest in customizing the MOSES code to make it compatible with what we wanted to achieve.

The second challenge, which many players in the MT market are constantly talking about, is the lack of reliable, quick and cheap quality evaluation metrics. BLEU-like scores unfortunately are not always applicable to real-world projects: even if they are useful when comparing different iterations of the same engine, they are not useful for cross-language or cross-client comparison.

Interestingly, the third problem has a psychological nature: Post-Editors are not always happy to post-edit MT output, for many reasons, including of course the quality of the MT. However, in many situations the problem is that MT post-editing requires a different skillset compared with ‘normal’ translation, and it will take time before translators adapt fully to post-editing tasks.

KMT: Do you believe MT has a say in the future, and what is your view on its development in global markets?

MK: Of course, MT will have a big say in the future of language services. We can see now that the MT market is expanding quickly, as more and more companies are adopting a combined TM-MT-PE framework as their primary localization solution.

“At the same time, users should not forget that MT has its clear niche”

I don’t think a machine will ever be able to translate poetry, for example, but at the same time it does not need to – MT has proved to be more than useful for the translation of technical documentation, marketing material and other content which represents more than 90% of translators’ daily load worldwide.

Looking at the near future, I see that the integration of MT and other cross-language technologies with Big Data technologies will open new horizons for Big Data, making it a truly global technology.

KMT: How has MT affected or changed your business models?

MK: Our business model is built around MT; it allows us to deliver translations to our customers quicker and cheaper than without MT, while preserving the same level of quality and guaranteeing data security. We not only position MT as a competitive advantage when it comes to translation, but also as a base technology for future services. My personal belief, which is shared by other bmmt employees, is that MT is a key technology that will make our world different – a world where translation is available on demand, when and where consumers need it, at a fair price and at the expected quality.

KMT: What advice can you give to translation buyers, interested in machine translation?

MK: MT is still a relatively new technology, but at the same time there are already a number of best practices available for new and existing players in the MT market. In my opinion, the four key points for translation buyers to remember when thinking about adopting machine translation are:

  1. Don’t mix it up with TM – While TMs mostly support human translators by storing previously translated segments, MT translates complete sentences automatically; the main difference lies in the new words and phrases that are not stored in a TM database.
  2. There is more than one way to use MT – MT is flexible. It can be a productivity tool that enables translators to deliver translations faster with the same quality as in the standard translation framework. Or it can be used for ‘gisting’ without any post-editing at all – something that many translation buyers forget about, but which can be useful in many business scenarios. A good example of this type of scenario is the integration of MT into chat widgets for real-time translation.
  3. Don’t worry about quality – Quality Assurance is always included in the translation pipeline, and we, like many other LSPs, guarantee a desired level of quality for all translations, independently of how they were produced.
  4. Think about time and cost – MT enables quicker and cheaper translation delivery.

A big ‘thank you’ to Maxim for taking time out of his busy schedule to take part in this interview, and we look forward to hearing more from Maxim during the KantanMT/bmmt joint webinar ‘5 Challenges of Scaling Localization Workflows for the 21st Century’ on Thursday November 20th (4pm GMT, 5pm CET and 8am PST).


Register here for the webinar or to receive a copy of the recording. If you have any questions about the services offered from either bmmt or KantanMT please contact:

Peggy Lindner, bmmt (peggy.lindner@bmmt.eu)

Louise Irwin, KantanMT (louisei@kantanmt.com)