Student Speak: First Time Using Machine Translation

Elodie Vermant, a Swansea University student studying for an MA in Professional Translation, shares her experience of using Machine Translation for the first time at Swansea University.

The MLTM11 Translation Technologies module is taught by Dr. Maria Fernandez-Parra, Lecturer in Languages, Translation and Communication at Swansea University. Read more experiences from her students.


All your Burning Questions Answered! How Machine Translation Helps Improve Translation Productivity (Part I)


We had so many questions during the Q&A of our last webinar session, ‘How to Improve Translation Productivity’ by the KantanMT Professional Services team, that we decided to split the answers into two blog posts. So, if you don’t find your questions answered here, check out our blog next week for the remaining answers.

The internet today is experiencing what is generally referred to as a ‘content explosion’! In this fast-paced world, businesses have to strive harder and do more to stay ahead of the game – especially if they are a global business or have globalization aspirations. One fool-proof way in which a business can successfully go global is through effective localization. Yet the huge amount of content available online makes human translation of everything almost impossible. The only viable option in today’s competitive online environment is Machine Translation (MT).

On Wednesday 21st October, Tony O’Dowd, Chief Architect of KantanMT.com, and Louise Faherty, Technical Project Manager at KantanMT, presented a webinar where they showed how Language Service Providers (LSPs), as well as enterprises, can improve the translation productivity of their teams, manage post-editing effort and easily schedule projects with powerful MT engines. Here is a link to the recording of the webinar on YouTube, along with a transcript of the Q&A session.

The answers below are not recorded verbatim and minor edits have been made to make the text more readable.

Question: Do you have clients doing Japanese to English MT? What are the results, and how did you get them? (i.e., do you pre-process the Japanese?)

Answer (Tony O’Dowd): English to Japanese Machine Translation (MT) has indeed always posed a challenge in the MT industry. So is it possible to build a high-quality, high-fidelity MT system for this language combination? Well, there have been quite a few recent developments that improve the prospect of building effective engines for it. For example, one of the latest changes we made on the KantanMT platform was to introduce new and improved reordering models, which make translation from English to Japanese and Japanese to English much smoother, so we deliver higher-quality output. In addition, higher-quality training data sets are now available for this language pair than a couple of years ago, when I started building English to Japanese engines. Back then it was really challenging. It still requires some effort to build English to Japanese MT engines, but the fact that there is more content available in these languages makes it slightly easier for us to build high-quality engines.

We are also developing example-based MT for these engines, and so far this is showing encouraging signs of improving quality for this language pair. However, we have not deployed this development on the platform yet.

KantanMT note: For more insights into how you can prepare high-quality training data, read these tips shared by Tony O’Dowd and Selçuk Özcan, co-founder of Transistent Language Automation Services, during the webinar ‘Tips for Preparing Training Data for High Quality MT.’

Question: Have you got a webinar recorded or scheduled, where we could see how the system works hands-on?

Answer (Tony O’Dowd): If you go on to the KantanMT website, we have video links on the product features pages. So you can actually watch an explanation video while you are looking at the component.

We work in a very visual environment, and we think videos are a great way of explaining how the platform works. If you go to the website, you will find our YouTube channel in the bottom left corner of the page; it contains videos on all sorts of topics, including how to build your first engine, how to translate your first document and how to improve the output of your engines.

If you click on the Resources menu on our site, you can access a number of tutorials that will talk you through the basics of Statistical Machine Translation Systems. In other words, explore the website and you should find what you need.

KantanMT note: Some other useful links for resources are listed below:

Question: Do you provide any Post-Editing recommendations or standards for standardising the PE process? You said translation productivity rose to 8k words per day – this is only PE, correct?

Answer (Tony O’Dowd): I will take the second question first! The 8,000 words per day is the Post-Editing (PE) rate, yes; it is not the raw translation rate. In Machine Translation, everything comes out pre-translated, so this number refers to the Post-Editing effort – the insertions, deletions, substitutions of words and so on that you need to make to get the content to publishable quality.

Louise Faherty: What we recommend to our clients is that when it comes to PE, they should actually use the MT output. A lot of translators who are new to MT will try to translate manually from scratch, which is a natural tendency, of course. But what we advise our clients is to start from the machine-translated text and post-edit it. The more you use MT and the more you post-edit, the better your engine will become.

Tony O’Dowd: I will add something to Louise Faherty’s comments there. The best example of PE recommendations that I have come across is provided by a group called TAUS. They are at the forefront of educating the industry on how to develop proficiency in PE.

Subscribe to TAUS YouTube channel here.

Question: What do ‘PPX’ and ‘PEX’ stand for (as abbreviations)?

Answer (Louise Faherty and Tony O’Dowd): PEX stands for Post-Editing Automation. PEX allows you to take the output of an MT engine and dynamically alter it. When would you need to use PEX? Suppose your engine is repeating the same error over and over again. In such cases you can write a PEX file (developed in the GENTRY programming language), which allows the platform to look for patterns in the engine’s output and dynamically change them.

For example, one of our French clients did not want to have a space preceding a colon mark in the output of their MT (because this was one of their typographical standards and repeated throughout the content). So we wrote a PEX rule that forced a stylistic change in the output of the engine. This enabled the client to reduce the number of Post-Edits substantially.
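GENTRY itself is proprietary, so the sketch below only illustrates the *effect* of a rule like the French colon example in ordinary Python; the function name and pattern are hypothetical, not KantanMT syntax.

```python
import re

# Illustrative stand-in for a PEX rule: delete any whitespace that
# precedes a colon in the raw MT output, matching the client's
# typographical standard described above.
def apply_pex_rule(mt_output: str) -> str:
    return re.sub(r"\s+:", ":", mt_output)

print(apply_pex_rule("Remarque : voir la page 5"))
# Remarque: voir la page 5
```

Because the rule runs automatically on every segment, the post-editor never has to make this repetitive correction by hand.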

PPX stands for Preprocessor Automation. You can use PPX files to normalise or improve the training data. It is based on our GENTRY programming language, which is available to all our clients for free.

In short then, PPX is for your training data, while PEX is for the actual raw output of your engine.
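PPX files are likewise written in GENTRY, but the kind of ordered search-and-replace normalisation they apply to training data can be pictured with a short Python sketch; the rules and names here are invented for illustration only.

```python
import re

# Hypothetical PPX-style pre-processing passes, applied in order
# to each training segment before the engine is built.
PPX_RULES = [
    (r"\s+", " "),              # collapse runs of whitespace
    (r"[“”]", '"'),             # normalise curly quotes
    (r"\s+([,.;:!?])", r"\1"),  # remove space before punctuation
]

def preprocess(segment: str) -> str:
    segment = segment.strip()
    for pattern, repl in PPX_RULES:
        segment = re.sub(pattern, repl, segment)
    return segment

print(preprocess('The  engine   works “well” .'))
# The engine works "well".
```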

For more questions and answers, stay tuned for the next part of this post!

Create, Test and Deploy Post-Editing Automation Rules with KantanMT PEX Rule Editor

The KantanPEX Rule Editor enables members of KantanMT to reduce the amount of manual post-editing required for a particular translation by creating, testing and deploying post-editing automation rules on their Machine Translation engines (client profiles).

The editor allows users to evaluate the output of a PEX (Post-Editing Automation) rule on a sample of translated content without needing to upload it to a client profile and run translation jobs. Users can enter up to three pairs of search and replace rules, which will be run in order, from top to bottom, on your content.

How to use the KantanMT PEX Rule Editor

Log in to your KantanMT account using your email and your password.

You will be directed to the ‘Client Profiles’ tab on the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’ and marked in bold.


To use the PEX Rule Editor with a profile other than the ‘Active’ profile, click on the new profile name to select that profile for use with the Kantan PEX Rule Editor.

Then click the ‘KantanMT’ tab and select ‘PEX Editor’ from the drop-down menu.


You will be directed to the ‘PEX Editor’ page.

Type the content you wish to test in the ‘Test Content’ box.


Type the content you wish to search for in the ‘PEX Search Rules’ box.


Type what you want the replacement to be in the ‘PEX Replacement Rules’ box and click on the ‘Test PEX Rules’ button to test the PEX-Rules.


The results of your PEX-Rules will now appear in the ‘Output’ box.


Give the rules you have created a name by typing in the ‘Rule Name’ box.


Select the profile you wish to apply the rule(s) to, and then click on the ‘Upload Rule’ button.


Additional Information

The KantanMT PEX Editor helps reduce the amount of manual post-editing required for a particular translation, hence reducing project turnaround times and costs. For additional information on PEX rules and the Kantan PEX Rule Editor, please click on the links below. For more details about KantanMT localization products and ways of improving work productivity and efficiency, please contact us at info@kantanmt.com.

 

What is Translation Error Rate (TER)?

Translation Error Rate (TER) is a metric used by Machine Translation specialists to determine the amount of post-editing required for machine translation jobs. It automatically measures the number of edit actions required to bring a translated segment in line with one of the reference translations. It is quick to use, language independent and corresponds well with post-editing effort. When tuning your KantanMT engine, we recommend a maximum score of 30%; a lower score means less post-editing is required!
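As a rough illustration of how such a score is computed, the sketch below approximates TER as word-level edit distance divided by reference length. Note that the official TER metric also counts block shifts, which this simplified version omits.

```python
def ter_approx(hypothesis: str, reference: str) -> float:
    """Approximate TER: word-level edits (insertions, deletions,
    substitutions) divided by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # classic dynamic-programming edit distance over words
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(hyp)][len(ref)] / len(ref)

score = ter_approx("the cat sat on mat", "the cat sat on the mat")
print(f"TER ≈ {score:.0%}")  # one insertion over six reference words ≈ 17%
```

A segment scoring 17% would comfortably meet the recommended 30% maximum, suggesting relatively light post-editing effort.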

How to use TER in KantanBuildAnalytics™

  • The TER scores for your engine are displayed in the KantanBuildAnalytics™ feature. You can get a quick overview or snapshot in the summary tab, but for a more in-depth analysis, and to calculate the amount of post-editing required for the engine’s MT output, select the ‘TER Score’ tab, which takes you to the ‘TER Scores’ page.


  • Place your cursor on the ‘TER Scores Chart’ to see the ‘Translation Error Rate’ for each segment. If you hold the cursor over a segment, a pop-up will appear on your screen with details of that segment under the headings ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.


  • To see a breakdown of the ‘TER Scores’ for each segment in table format, scroll down. You will see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.


  • To see an even more in-depth breakdown of a particular segment, click on the triangle beside each number.


  • To download the ‘TER Scores’ of all segments, click on the ‘Download’ button on the ‘TER Scores’ page.

This is one of the many features included in KantanBuildAnalytics, which can help the Localization Project Manager improve an engine’s quality after its initial training. To see other features of KantanBuildAnalytics, please see the links below.

Contact our team to get more information about KantanMT.com or to arrange a platform demonstration, demo@kantanmt.com.

Tips for Training Post-editors

A good quality Machine Translation engine relies on the quality of the bilingual data used to train it. For most MT users, this bilingual data can be translated by humans, or it can be fully post-edited MT output. In both cases, the quality of the data will influence the engine’s quality.
Selçuk Özcan, Transistent’s co-founder, discusses the differences and gives some tips for successful post-editing. He has given KantanMT permission to publish his blog post, which was originally published in Dragosfer and on the GALA Blog website.

We have entered a new age, and a new technology has come into play: Machine Translation (MT). It’s globally accepted that MT systems dramatically increase productivity but it’s a hard struggle to integrate this technology into your production process. Apart from handling the engine building and optimizing procedures, you have to transform your traditional workflow:


The traditional roles of the linguists (translators, editors, reviewers etc.) are reconstructed and converge into a suitable place in this new, innovative workflow. The emerging task is called ‘post-editing’ and the linguists assigned to it are called ‘post-editors’. You may want to recruit some willing linguists for this role, or persuade your staff to adopt a different point of view. But whatever the case may be, some training sessions are a must.

What is covered in the training sessions?

1. Basic concepts of MT systems

Post-editors should have a notion of the dynamics of MT systems. It is important to focus on the type of system that is utilized (RBMT/SMT/Hybrid). For the widely used SMT systems, it’s necessary for them to know:

  • how the systems behave
  • the functions of the Translation Model and Language Model*
  • input (given set of data) and output (raw MT output) relationship
  • what changes in different domains

* It’s not essential to give detailed information about these topics, but touching on them will make a difference in determining the level of technical background of the candidates. Some of the candidates may be included in the testing team.
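To make the roles of the two models in the list above concrete, the toy sketch below shows the log-linear way an SMT decoder typically combines a translation-model score (adequacy) with a language-model score (fluency). All probabilities and weights here are invented for the example; a real system learns them from the training data.

```python
import math

# Toy illustration of Translation Model + Language Model scoring.
def smt_score(p_translation: float, p_language: float,
              w_tm: float = 1.0, w_lm: float = 1.0) -> float:
    # log-linear combination: higher (less negative) is better
    return w_tm * math.log(p_translation) + w_lm * math.log(p_language)

# Two hypothetical candidate outputs for the same source sentence:
fluent    = smt_score(p_translation=0.020, p_language=0.30)
disfluent = smt_score(p_translation=0.025, p_language=0.01)
print(fluent > disfluent)  # True: the LM favours the more fluent candidate
```

The point for post-editors is that the output is a trade-off between the two models, which is why fluent-sounding but inaccurate segments can and do occur.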

2. The characteristics of raw MT output

Post-editors should know the factors affecting MT output. The difference between working with fuzzy matches from TM systems and with raw SMT output also has to be covered in a proper training session. Let’s try to figure out what should be conveyed:

  • The MT process is not the ‘T’ of the TEP (translation, editing, proofreading) workflow, and raw MT output is not the target text expected as the output of the ‘T’ process.
  • In the earlier stages of an SMT engine, the output quality varies depending on the project’s dynamics, and errors are not identical. As the system improves, the quality level becomes more even and consistent within the same domain.
  • There may be some word or phrase gaps in the system’s pattern mappings. (Detecting these gaps is one of the main responsibilities of the testing team, but a successful post-editor must be informed about the possible gaps.)

3. Quality issues

This topic has two aspects: defining the required target (end product) quality, and the evaluation and estimation of output quality. The first one gives you the final destination, and the second one tells you where you are.

The required quality level is defined according to the project requirements, but it mostly depends on the target audience and the intended use of the target text. This seems similar to the procedure in a TEP workflow; however, it is slightly different, as the engine improvement plan should also be considered while defining the target quality level. Basically, this parameter is classified into two groups: publishable and understandable quality.

Evaluation and estimation aspect is a little bit more complicated. The most challenging factor is standardizing measurement metrics. Besides, the tools and systems used to evaluate and estimate the quality level have some more complex features. If you successfully establish your quality system, then adversities become easier to cope with.

It’s the post-editors’ duty to apprehend the dynamics of MT quality evaluation, and the distinction between MT and HT quality evaluation procedures. Thus, they are supposed to be aware of the expected error patterns. It will be more convenient to utilize error categorization with well-trained staff (QE staff and post-editors).

4. Post-editing Technique

The fourth and last topic is the key to success. It covers the appropriate method and principles, as well as the perspective post-editors usually acquire. The post-editing technique is formed using the materials prepared for the previous topics and the data obtained from the above-mentioned procedures, and it is defined separately for almost every individual customized engine.

The core rule for this topic is that the post-editing technique, as a concept, must be clearly differentiated from traditional editing and/or review technique(s). Post-editors should be capable of:

  • reading and analyzing the source text, raw MT output and categorized and/or annotated errors as a whole.
  • making changes where necessary.
  • considering the post-edited data as part of the data set to be used in engine improvement, and performing his/her work accordingly.
  • applying the rules defined for the quality expectation levels.

As briefly described in topic #3, the distance between the measured output quality and the required target quality may be seen as the post-editing distance. It roughly defines the post-editor’s tolerance and the extent to which he/she will perform the work. Another criterion allowing us to define the technique and the performance is the target quality group: if the target text is expected to be of publishable quality, it’s called full post-editing; otherwise, light post-editing. Light and full post-editing techniques can be briefly defined as above, but the distinction is not always so clear. Besides, under- and over-editing concepts are likely to be included in the above-mentioned issues. You may want to include some more details about these concepts in the post-editor training sessions; enriching the training materials with some examples would be a great idea!

About Selçuk Özcan

Selçuk Özcan has more than 5 years’ experience in the language industry and is a co-founder of Transistent Language Automation Services. He holds degrees in Mechanical Engineering and Translation Studies and has a keen interest in linguistics, NLP, language automation procedures, agile management and technology integration. Selçuk is mainly responsible for building high quality production models including Quality Estimation and deploying the ‘train the trainers’ model. He also teaches Computer-aided Translation and Total Quality Management at the Istanbul Yeni Yuzyil University, Translation & Interpreting Department.


 

Read More about KantanMT’s Partnership with Transistent in the official News Release, or if you are interested in joining the KantanMT Partner Program, contact Louise (info@kantanmt.com) for more details on how to get involved. 

 

 

Translation Quality: How to Deal with It?

KantanMT started the New Year on a high note with the addition of the Turkish Language Service Provider Transistent to the KantanMT Preferred MT Supplier partner program.

Selçuk Özcan, Transistent’s Co-founder has given KantanMT permission to publish his blog post on Translation Quality. This post was originally published in Dragosfer and the Transistent Blog.

 

 

Literally, the word quality has several meanings, one of them being “a high level of value or excellence” according to Merriam-Webster’s dictionary. How should one deal with this idea of “excellence” when the issue at hand is translation quality? What is required, it seems, is a more pragmatic and objective answer to the above question.

This brings us to the question “how could an approach be objective?” Certainly, the issue should be assessed through empirical findings. But how? We are basically in need of an assessment procedure with standardized metrics. Here we encounter another issue: the standardization of translation quality. From now on, we need to associate these concepts with the context itself in order to make them clear.

Figure: the factors affecting translation quality – source-text monolingual issues, target-text monolingual issues and bilingual issues.

As is widely known, three sets of factors have an effect on the quality of the translation process in general. Basically, analyzing the source text’s monolingual issues, the target text’s monolingual issues and the bilingual issues defines the quality of the work done. Nevertheless, the procedure should be based on the requirements of the domain, the audience and the linguistic structure of both languages (source and target); and in each step, this key question should be considered: ‘Does the TT serve the intended purpose?’

We still have not dealt with the standardization and quality of acceptable TTs. The concept of “acceptable translation” has always been discussed throughout the history of translation studies, and no one is able to precisely explain its requirements. However, a further study on dynamic QA models needs to go into the details. There are various QA approaches and models. For most of them, an acceptable translation falls somewhere between bad and good quality, depending on the domain and target audience. The quality level is measured through the translation error rates developed to assess MT output (BLEU, F-Measure and TER), and there are four commonly accepted quality levels: bad, acceptable, good and excellent.
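As an illustration only, a project might map an automatic error-rate score onto these four levels with thresholds of its own choosing. The cut-offs in the sketch below are hypothetical, not industry standards; in practice each project defines its own per domain and audience.

```python
# Hypothetical mapping from an error-rate score (e.g. TER, lower is
# better) to the four quality levels named above. Thresholds are
# invented for illustration.
def quality_band(ter_score: float) -> str:
    if ter_score <= 0.10:
        return "excellent"
    if ter_score <= 0.30:
        return "good"
    if ter_score <= 0.50:
        return "acceptable"
    return "bad"

print(quality_band(0.25))  # good
```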

The formula is simple: a TT containing more errors is considered to be of worse quality. However, the errors should be correlated with the context and many other factors, such as importance for the client, the expectations of the audience and so on. These factors define each error’s severity as minor, major or critical. A robust QA model should be based upon accurate error categorization so that reliable results may be obtained.

We have tried to briefly describe the concept of QA modeling. Now, let’s see what goes on in practice. There are three publicly available QA models which have inspired many software developers in their QA tool development processes. One of them is the LISA (Localization Industry Standards Association) QA Model. The LISA Model is very well known in the localization and translation industry, and many company-specific QA models have been derived from it.

The second one is the J2450 standard generated by SAE (Society of Automotive Engineers), and the last one is the EN 15038 standard, approved by CEN (Comité Européen de Normalisation) in 2006. All of the above-mentioned models are static QA models; one should create his/her own framework in compliance with the demands of the projects. Nowadays, many institutes (the EU Commission and TAUS among them) have been working on dynamic QA models, which enable creating different metrics for different translation/localization projects.



 

Language Industry Interview: KantanMT speaks with Maxim Khalilov, bmmt Technical Lead

This year, both KantanMT and its preferred Machine Translation supplier, bmmt, a progressive Language Service Provider with an MT focus, exhibited side by side at the tekom Trade Fair and tcworld conference in Stuttgart, Germany.

As a member of the KantanMT preferred partner program, bmmt works closely with KantanMT to provide MT services to its clients, which include major players in the automotive industry. KantanMT was able to catch up with Maxim Khalilov, technical lead and ‘MT guru’ to find out more about his take on the industry and what advice he could give to translation buyers planning to invest in MT.

KantanMT: Can you tell me a little about yourself and how you got involved in the industry?

Maxim Khalilov: It was a long and exciting journey. Many years ago, I graduated from the Technical University in Russia with a major in computer science and economics. After graduating, I worked as a researcher for a couple of years in the sustainable energy field. But even then, I knew I still wanted to come back to the IT industry.

In 2005, I started a PhD at Universitat Politecnica de Catalunya (UPC) with a focus on Statistical Machine Translation, which was a very new topic back then. In 2009, after successfully defending my thesis, I moved to Amsterdam, where I worked as a post-doctoral researcher at the University of Amsterdam and later as an R&D manager at TAUS.

Since February 2014, I’ve been a team lead at bmmt GmbH, which is a German LSP with a strong focus on machine translation.

I think my previous experience helped me to develop a deep understanding of the MT industry from both academic and technical perspectives. It also gave me a combination of research and management experience in industry and academia, which I am applying by building a successful MT business at bmmt.

KMT: As a successful entrepreneur, what were the three greatest industry challenges you faced this year?

MK: This year has been a challenging one for us from both technical and management perspectives. We started to build an MT infrastructure around MOSES practically from scratch. MOSES was developed by academia and for academic use, and because of this we immediately noticed that many industrial challenges had not yet been addressed by MOSES developers.

The first challenge we faced was that the standard solution does not offer a solid tag-processing mechanism – we had to invest in customizing the MOSES code to make it compatible with what we wanted to achieve.

The second challenge is that many players in the MT market are constantly talking about the lack of reliable, quick and cheap quality evaluation metrics. Unfortunately, BLEU-like scores are not always applicable to real-world projects: even if they are useful when comparing different iterations of the same engine, they are not useful for cross-language or cross-client comparison.

Interestingly, the third problem is psychological in nature: post-editors are not always happy to post-edit MT output, for many reasons, including of course the quality of the MT. However, in many situations the problem is that MT post-editing requires a different skillset compared with ‘normal’ translation, and it will take time before translators adapt fully to post-editing tasks.

KMT: Do you believe MT has a say in the future, and what is your view on its development in global markets?

MK: Of course, MT will have a big say in the future of language services. We can see now that the MT market is expanding quickly as more and more companies adopt a combined TM-MT-PE framework as their primary localization solution.

“At the same time, users should not forget that MT has its clear niche”

I don’t think a machine will ever be able to translate poetry, for example, but at the same time it does not need to – MT has proved to be more than useful for the translation of technical documentation, marketing material and other content, which represents more than 90% of translators’ daily load worldwide.

Looking at the near future, I see that the integration of MT and other cross-language technologies with Big Data technologies will open new horizons for Big Data, making it a truly global technology.

KMT: How has MT affected or changed your business models?

MK: Our business model is built around MT; it allows us to deliver translations to our customers more quickly and cheaply than we could without MT, while preserving the same level of quality and guaranteeing data security. We not only position MT as a competitive advantage when it comes to translation, but also as a base technology for future services. My personal belief, which is shared by other bmmt employees, is that MT is a key technology that will make our world different – a world where translation is available on demand, when and where consumers need it, at a fair price and at the expected quality.

KMT: What advice can you give to translation buyers, interested in machine translation?

MK: MT is still a relatively new technology, but at the same time there are already a number of best practices available for new and existing players in the MT market. In my opinion, the four key points for translation buyers to remember when thinking about adopting machine translation are:

  1. Don’t mix it up with TM – While a TM mostly supports human translators by storing previously translated segments, MT translates complete sentences automatically. The main difference lies in the new words and phrases that are not stored in a TM database.
  2. There is more than one way to use MT – MT is flexible, it can be a productivity tool that enables translators to deliver translations faster with the same quality as in the standard translation framework. Or MT can be used for ‘gisting’ without post-editing at all – something that many translation buyers forget about, but, which can be useful in many business scenarios. A good example of this type of scenario is in the integration of MT into chat widgets for real-time translation.
  3. Don’t worry about quality – Quality Assurance is always included in the translation pipeline, and we, like many other LSPs, guarantee a desired level of quality for all translations, independently of how they were produced.
  4. Think about time and cost – MT enables you to deliver translations more quickly and cheaply than you could without it.
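Point 1 above can be made concrete with a small sketch of the TM-MT part of such a framework: an exact TM match is reused as-is, and only unseen segments fall through to the MT engine. All names and data here are illustrative, not a real API.

```python
# Hypothetical TM-then-MT lookup. A TM only returns segments it has
# seen before; MT produces a translation for any input.
translation_memory = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
}

def machine_translate(segment: str) -> str:
    # stand-in for a call to a real MT engine
    return f"[MT] {segment}"

def translate(segment: str) -> str:
    # exact TM match first; fall back to MT for unseen segments
    return translation_memory.get(segment) or machine_translate(segment)

print(translate("Press the power button."))  # TM hit
print(translate("Hold the reset button."))   # MT fallback
```

In a full TM-MT-PE workflow, the MT fallback output would then go to a post-editor rather than straight to the client.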

A big ‘thank you’ to Maxim for taking time out of his busy schedule to take part in this interview, and we look forward to hearing more from Maxim during the KantanMT/bmmt joint webinar ‘5 Challenges of Scaling Localization Workflows for the 21st Century’ on Thursday November 20th (4pm GMT, 5pm CET and 8am PST).


Register here for the webinar or to receive a copy of the recording. If you have any questions about the services offered from either bmmt or KantanMT please contact:

Peggy Linder, bmmt (peggy.lindner@bmmt.eu)

Louise Irwin, KantanMT (louisei@kantanmt.com)

Post-Editing Machine Translation

Statistical Machine Translation (SMT) has many uses – from the translation of User Generated Content (UGC) to Technical Documents, to Manuals and Digital Content. While some use cases may only need a ‘gist’ translation without post-editing, others will need a light to full human post-edit, depending on the usage scenario and the funding available.

Post-editing is the process of ‘fixing’ Machine Translation output to bring it closer to a human translation standard. This, of course, is a very different process from carrying out a full human translation from scratch, and that’s why it’s important to give full training to staff who will carry out this task.

Training will make sure that post-editors fully understand what is expected of them when asked to complete one of the many types of post-editing task. Research (Vasconcellos, 1986a:145) suggests that post-editing is a honed skill which takes time to develop, so remember that your translators may need some time to reach their greatest post-editing productivity levels. KantanMT works with many companies who are post-editing at a rate of over 7,000 words per day, compared to an average of 2,000 words per day for full human translation.
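The arithmetic behind those rates is worth spelling out; the project size below is hypothetical, and the daily rates are the figures quoted above.

```python
# Throughput comparison for a hypothetical 100,000-word project,
# using the rates cited above: 2,000 words/day for full human
# translation (HT) vs 7,000 words/day for post-editing (PE).
project_words = 100_000
ht_rate, pe_rate = 2_000, 7_000  # words per day

ht_days = project_words / ht_rate  # 50.0 days
pe_days = project_words / pe_rate  # ~14.3 days
print(f"HT: {ht_days:.1f} days, post-editing: {pe_days:.1f} days")
print(f"Throughput gain: {pe_rate / ht_rate:.1f}x")  # 3.5x
```

Gains of this order are why productivity tests, discussed below, are usually run before committing to MT for a project.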

Types of Training: The Translation Automation User Society (TAUS) is now holding online training courses for post-editors.

post-editing

Post-editing Levels

Post-editing quality levels vary greatly and depend largely on the client or end user. It’s important to get an exact understanding of user expectations and to manage these expectations throughout the project.

Typically, users of Machine Translation will ask for one of the following types of post-editing:

  • Light post-editing
  • Full post-editing

The following diagram gives a general outline of what is involved in both light and full post-editing. Remember, however, that the effort needed to meet a given level of quality will be determined by the output quality your engine is able to produce.

post-editing machine translation

Generally, MT users carry out productivity tests before they begin a project. These determine the effectiveness of MT for the language pair in a particular domain, and their post-editors’ ability to edit the output with a high level of productivity. Productivity tests will help you determine the potential Return on Investment of MT and the turnaround time for projects. It is also a good idea to carry out productivity tests periodically to understand how your MT engine is developing and improving. (Source: TAUS)
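The arithmetic behind such a productivity test is straightforward. A toy sketch (the project size, rates and team size below are hypothetical figures, not KantanMT data) might look like this:

```python
# Toy illustration of the numbers a post-editing productivity test yields:
# relative speed-up over full human translation, and project turnaround.

def productivity_gain(ht_words_per_day: float, pe_words_per_day: float) -> float:
    """Relative speed-up of post-editing over full human translation."""
    return pe_words_per_day / ht_words_per_day

def turnaround_days(project_words: int, words_per_day: float, translators: int) -> float:
    """Calendar days needed to complete a project."""
    return project_words / (words_per_day * translators)

ht_rate = 2000   # full human translation, words per day
pe_rate = 7000   # post-editing MT output, words per day

print(f"Speed-up: {productivity_gain(ht_rate, pe_rate):.1f}x")          # 3.5x
print(f"HT turnaround: {turnaround_days(100000, ht_rate, 2):.1f} days") # 25.0 days
print(f"PE turnaround: {turnaround_days(100000, pe_rate, 2):.1f} days") # 7.1 days
```

Running the same calculation after each periodic productivity test shows whether engine retraining is actually translating into shorter turnaround times.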

You might also develop a tailored approach to suit your company’s needs; however, the diagram above offers some useful guidelines to start with. Please note that a well-trained MT engine can produce near-human translations, and a light touch-up might be all that is required. It’s important to examine the quality of the output with post-editors before setting productivity goals and post-editing quality levels.

PEX Automatic Post-editing

Post-Editor Skills

In recent years, post-editing skills have become much more of an asset and sometimes a requirement for translators working in the language industry. Machine Translation has grown considerably in popularity and the demand for post-editing services has grown in line with this. TechNavio predicted that the market for Machine Translation will grow at a compound annual growth rate (CAGR) of 18.05% until 2016, and the report attributes a large part of this rise to “the rapidly increasing content volume”.

While the task of post-editing is markedly different from human translation, the skill set needed is almost on a par.

According to Johnson and Whitelock (1987), post-editors should be:

  • Be expert in the subject area, the text type and the contrastive language
  • Have a perfect command of the target language

It is also widely accepted that post-editors who have a favourable perception of Machine Translation perform better at post-editing tasks than those who do not look favourably on MT.

How to improve Machine Translation output quality

Pre-editing

Pre-editing is the process of adjusting text before it is machine translated. This includes fixing spelling errors, formatting the document correctly and tagging text elements that must not be translated. Using a pre-processing tool like KantanMT’s GENTRY can save a lot of time by automating the correction of repetitive errors throughout the source text.

More pre-editing Steps:

Writing Clear and Concise Sentences: Shorter, unambiguous segments (sentences) are processed much more effectively by MT engines. Also, when pre-editing or writing for MT, make sure that each sentence is grammatically complete (begins with a capital letter, has at least one main clause, and ends with punctuation).
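As a rough illustration of automated pre-editing (a generic sketch, not KantanMT’s GENTRY, whose rule format is its own), a simple script might apply regex-based corrections for known repetitive errors and flag segments that fail the completeness checks described above:

```python
import re

# Hypothetical correction rules for repetitive source-text errors,
# applied before the text is sent to the MT engine.
CORRECTIONS = [
    (re.compile(r"\bteh\b"), "the"),   # common typo
    (re.compile(r"\s{2,}"), " "),      # collapse runs of whitespace
]

def pre_edit(segment: str) -> str:
    """Apply each correction rule in turn to one source segment."""
    for pattern, replacement in CORRECTIONS:
        segment = pattern.sub(replacement, segment)
    return segment.strip()

def is_complete(segment: str) -> bool:
    """Crude check: starts with a capital letter and ends with punctuation."""
    return bool(re.match(r"^[A-Z]", segment)) and segment.rstrip().endswith((".", "!", "?"))

segment = "teh  user clicks Save"
fixed = pre_edit(segment)
print(fixed)                # "the user clicks Save"
print(is_complete(fixed))   # False – flag this segment for manual pre-editing
```

A real pre-editing pipeline would carry a much larger, domain-specific rule set, but the principle is the same: cheap automated fixes first, human attention only where a segment is flagged.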

Using the Active Voice: MT engines work impressively on text that is clear and unambiguous; that’s why using the active voice, which cuts out vagueness and ambiguity, can result in much better MT output.

There are many pre-editing steps you can carry out to produce better MT output. Also, keep in mind writing styles when developing content for Machine Translation to cut the amount of pre-editing required. Get tips on writing for MT here.

For more information about any of KantanMT’s post-editing automation tools, please contact: Gina Lawlor, Customer Relationship Manager (ginal@kantanmt.com).

Translation Technology Conferences and Events for 2014

2014 has arrived – and there is no better way to get the ball rolling than by planning which events to attend. Over the next twelve months there is a vast selection of conferences, unconferences, workshops, roundtables, webinars and other events planned around the world.

It was hard to narrow down the list of everything going on, so KantanMT focused on events related to Machine Translation and the Natural Language Processing (NLP) industry, localization, translation technologies and post-editing. Some of the events are more academic, while others are more business oriented.

Unconferences and Conferences…

We added some ‘unconferences’ to the list. Unlike more formal conferences, unconferences are peer-to-peer discussions on topics chosen by participants at the beginning of a session. Because participants choose the topics, it is much easier to promote an open discussion, and unconferences are a good way for industry professionals to get together in an informal setting to share their own challenges and solutions.

Localization World, one of the biggest industry conferences, has had a great response from holding unconferences alongside its traditional conferences, and the Association of Language Companies (ALC) also endorses the value of unconferences. The next ALC unconference will be held in early February.

Hopefully, this list will be a useful resource in deciding which events and conferences to visit during 2014. You may have registered for some of these events already; if not, now is the time to start filling in your calendar. If you know of a relevant conference or event we missed, please add it to the comment section at the bottom of this post.

2014 Listings

January

Jan 8, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – XTRF and Kilgray’s memoQ

Tomasz Mróz, XTRF Operations Director will present usage scenarios on integrating XTRF technology into the translation workflows, TM integration and faster project turnaround times. István Lengyel, CEO of Kilgray will also be presenting on memoQ, a cloud-based translation technology platform for translation management.


Jan 9, 2014

Webinar:  TAUS Dynamic Quality Framework Users Call

The users call is a bi-monthly webinar where TAUS members discuss solutions for measuring Machine Translation quality. Participants include Autodesk, CA Technologies, Cisco, Dell, Digital Linguistics, eBay, EMC and Google. To register for the webinar, members can email memberservices@taus.net.


Jan 15, 2014

Webinar: The Convergence Era: Translation as A Utility (The Content Wrangler, TAUS)

This webinar, hosted by BrightTalk, is a discussion between Jaap van der Meer (TAUS) and Scott Abel (The Content Wrangler) on how translation has become a necessary part of everyday life, in the same way that electricity, water and the internet have become indispensable.


Jan 16, 2014

Meeting/Webinar: L20n: Next Generation Localization Framework for the Web, The International Multilingual Computing User Group (IMUG), San José, California USA

Zbigniew Braniecki, Software Engineer, Mozilla Corporation will speak about L20n, a new localization framework that isolates localization and enables translators to give naturally expressive translations for even the most complex user interfaces. Mozilla is investing in moving its products – Firefox, Firefox OS, and Firefox for Android – to this new architecture.


Jan 23, 2014

Unconference: Localization Unconference, Achievers Head office Toronto, Canada

This unconference is an all-day event starting at 09:30am and will cover internationalization and localization topics. It is organized by Jenny Reid, Localization Project Manager, BlackBerry; Oleksandr Pysaryuk, Localization Manager, Achievers; and Richard Sikes, Principal Consultant, Localization Flow Technologies.


Jan 30, 2014 (11:00 EST/17:00 CET)

Webinar: Integrating Your Content Platform, Globalization and Localization Association

Anders Holt, European Director, and Robert Timms, Technical Director at translate plus, will present a webinar on integrating content management platforms (CMS, DMS, PIM or e-procurement systems) into the translation workflow. They will discuss the available integration methods and how to get the best results and benefits from integration.


Jan 30-31, 2014

Conference: 2014 CRITT – WCRE Conference, Translation in transition: between cognition, computing and technology, Copenhagen Business School (CBS), Frederiksberg, Denmark

This academic conference presents research from the Centre for Research and Innovation in Translation and Translation Technology (CRITT). The program covers a variety of topics, including translation and cognitive processes, translation theory, Machine Translation, and post-editing.


February

Feb 5, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – Ontram and Across Language Server v6

Christian Weih, Chief Sales Officer from Across Systems presents a TMS platform that integrates all aspects of the translation workflow.


Feb 6-8, 2014

Unconference: ALC Unconference, (Association of Language Companies), Palm Beach Gardens, Florida USA

The Unconference is geared towards language company owners and senior members of staff who get together without any formal presentation structure for more intimate brainstorming and discussion sessions in a casual and relaxed environment.


Feb 6, 2014 (11:00 EST/17:00 CET)

Webinar: Maximizing Translation Efficiency: Best QA Practices for Large Multi-channel Publishing Projects

Jose Sermeno, Product Evangelist at MadCap Software and Peter Argondizzo, Translation and Localization PM at MadTranslations discuss QA best practices that will make projects more efficient.


Feb 24-26, 2014

Conference: ‘Localization in a Shifting Global Economy’ Localization World, Bangkok Thailand

The first of three Localization World conferences of 2014, Localization World is the leading conference for international business, translation and localization providing opportunities for networking and information exchange.


Feb 26-28, 2014

Conference, workshops:  ICC (Intelligent Content Conference) 2014, San José, California USA

ICC focuses on the creation and management of content in different languages on any device. Topics will include content strategy, content marketing, content engineering, structured content, ebooks, mobile, apps, adaptive content, automated translation, terminology management, big data and analytics.


Feb 27, 2014 (11:00 EST/17:00 CET)

Webinar: GALA Translation Project Management with memoQ Server Training session

Daniel Zielinski will explain how the memoQ server can be used for managing translation projects effectively. See the different types of projects and workflows supported, and learn how to set up, prepare, monitor and complete a translation project with the memoQ server.


Feb 27 – Mar 1, 2014

Conference: memoQfest Americas, Kilgray Translation Technologies, Los Angeles, California USA

This three day event is hosted by Kilgray Translation Technologies and is aimed at freelance language professionals, LSPs and corporate translation users. The conference gives an overview of translation technology and how it can be integrated into businesses.


March

Mar 3-6, 2014

Conference: WritersUA, the conference for Software User Assistance, Palm Springs, California USA

This conference is for those involved in creating user assistance content. There will be a variety of presentations focused on developing content strategies, key technologies and tools that are used to create well-designed interfaces, technical communications and support information.


Mar 5, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – Safaba and KantanMT

The theme of this webinar is the application and influence of MT technologies on global business. Tony O’Dowd, Founder and Chief Architect, presents the KantanMT.com cloud-based platform, introducing some of the KantanMT technologies and use cases, including KantanWatch, KantanISR, KantanAnalytics, TotalRecall, PEX and GENTRY.

Udi Hershkovich, Vice President of Business Development at Safaba will discuss key business imperatives for businesses and how Enterprise MT removes the language barriers that face global businesses.


Mar 13-14, 2014

Conference: International Conference on Translation and Accessibility in Video Games and Virtual Worlds at Universitat Autònoma de Barcelona, Spain

The conference is a meeting point for academics, professionals and students involved in the game localization industry. The conference aims to foster the interdisciplinary debate in these fields, combine them as academic areas of research and contribute to the development of best practices.


Mar 17-21, 2014

Conference: Game Localization Summit at GDC, IGDA Game Localization SIG, San Francisco, California USA

The Game Localization Summit at GDC is supported and organized by the IGDA Game Localization SIG, and is aimed at helping localization professionals, as well as the entire community of game developers and publishers, understand how to plan and execute game localization and culturalization as part of the development cycle. Other GDC conferences are planned for Europe and China later in the year.


Mar 23-26, 2014

Conference: GALA 2014, Globalization and Localization Association (GALA), Istanbul, Turkey

The annual GALA conference brings together localization industry professionals for networking opportunities and peer-to-peer learning of the latest technologies and emerging trends in localization, language and translation technology.


Mar 28-29, 2014

Conference: The Translation and Localization Conference, Localize.pl, TexteM, KOMTE, Warsaw, Poland

This is an annual international event focusing on the latest technologies and localization industry trends. The conference is suited to LSPs and freelance translators, and covers technical communication and its implications for the translation industry; big data vs. the translation industry; CAT tools, MT and cloud computing; project management and the human factor; and recruitment and training.


April

Apr 2, 2014 (17:00-18:00 CET)

Webinar: Translation Technology Showcase, TAUS – tauyou and Pangeanic

Diego Bartolome, CEO tauyou will discuss the ‘Big Data’ approach to SMT and the importance of clean data on output quality.


Apr 10-11, 2014

Event: TAUS Executive Forum, Oracle Japan, Tokyo, Japan

The executive forum consists of two days of meetings for buyers and providers of language services and technologies. It is an open exchange about language business innovation and translation technology with the theme ‘translation as a utility’. Topics to be covered include translation data, MT showcases, DQF evaluation, translation customer support and integration with CRM systems.


Apr 13-15, 2014

Conference: MadWorld 2014, MadCap Software, Inc., San Diego, California USA

Designed for technical writers, documentation managers and content strategists, this is a leading conference for technical communication and content strategy.


Apr 25, 2014

Conference: TCeurope Colloquium, Conseil des Rédacteurs Techniques, Aix-en-Provence, France

Conference themes include the essential core skills of a technical communicator; accessibility and usability; technical communication and social media; multi-authoring and international teamwork; and training technical authors in the internet age.


Apr 26-30, 2014

Conference: EACL-2014, European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden

Open to all ACL members, the conference covers research in computational linguistics, psycholinguistics, speech, information retrieval, multimodal language processing and language issues in emerging domains such as bioinformatics and social media. Workshops and tutorials are held on Saturday and Sunday, April 26th-27th, while the main conference runs from Monday to Wednesday, April 28th-30th.


May

May 7, 2014 (17:00-18:00 CET)

Webinar: Translation Technology Showcase, TAUS – TaaS and Interverbum

TaaS and Interverbum present in this month’s Translation Technology Showcase by TAUS.


May 7-9, 2014

Conference: memoQfest International, Kilgray Translation Technologies, Budapest, Hungary

This conference aims to set up a forum where companies, LSPs and translators can discuss workflows and best practices that relate to memoQ or translation technology in general. Attendees will discuss industry trends, attend workshops and exchange information with translators, LSPs and translation end users.


May 7-8, 2014

Workshop: Making the Multilingual Web Work, MultilingualWeb, Madrid, Spain

The workshop is supported by the LIDER project and aims to survey and share information about best practices and standards for promoting multilingualism on the web.


May 8-9, 2014

Conference: Intelligent Content – Life Sciences and Healthcare, the Rockley Group, the Content Wrangler, San Francisco, California USA

The event will showcase examples, standards, methods, strategies and tools needed to help pharmaceutical companies, medical device manufacturers, and healthcare firms deliver the right information, in the right language, on any device. Conference topics include; mhealth, ehealth, digital health, personalized healthcare content and advanced translation technologies.


May 17-18, 2014

Conference: UTIC 2014, Ukrainian Translation Industry Conference, Kiev, Ukraine

Translators, managers, educators and software developers get together for networking opportunities and to discuss future industry trends.


May 18-21, 2014

Conference: Technical Communication Summit 2014, Society for Technical Communication, Phoenix, Arizona USA

The Technical Communication Summit is a source of learning for professional technical communicators giving training on the latest communication techniques, publishing technologies and business trends in the industry.


May 18-21, 2014

Conference: ALC 2014 Annual Conference, Association of Language Companies, Palm Springs, California USA

This conference is a networking event for anyone doing business with LSPs, combining educational content and networking.


May 23, 2014

Roundtable: TAUS Translation Automation Roundtable, TAUS, Moscow, Russia

Hosted by ABBYY Language Services, this is a meeting for buyers and providers of translation services. The participants will get a good insight into MT technology, customization, implementation requirements and business cases.


May 26-31, 2014

Conference: LREC 2014, the European Language Resource Association, Reykjavík, Iceland

LREC is focused on Language Resources (LRs) and Evaluation for Language Technologies (LT). The aim of LREC is to give an overview of LR and LTs, emerging trends and the exchange of information.


June

June 2-3, 2014

Event: TAUS Industry Leaders Forum 2014, Clontarf Castle Hotel, Dublin

The theme for this meeting is ‘convergence’ with industry leaders discussing best practices, possible common approaches and shared services to optimize translation efficiencies through a series of short presentations.


Jun 3-4, 2014

Workshop: Localization Project Management Certification – The Localization Institute, Clarion Hotel, Dublin, Ireland

As part of the LPM Certification Program, this two-day project management training workshop will be held alongside Localization World. There is an eight-week self-study component that must be completed before the workshop. It is open to localization project managers with at least three years of project management experience. Early bird and group registration discounts are available.


Jun 4-6, 2014

Conference: Localization World Dublin, Localization World Ltd., Dublin, Ireland

The second Localization World conference of 2014 will be held in Dublin with the theme of “disruptive innovation” and how this impacts the localization industry and the role of translators. Topics covered at the conference will include advanced localization management, global business, localization core competencies and technology.


Jun 5-6, 2014

Conference: UA Europe 2014, UA Europe, Kraków, Poland

In association with Writers UA, the UA Europe technical communication conference focuses on software user assistance and online Help, and provides information on the latest industry trends, technical developments, and best practice in software UA.


Jun 16-18, 2014

Conference: EAMT 2014, European Association for Machine Translation, Dubrovnik, Croatia – 17th Annual Conference of the European Association for Machine Translation

The conference is aimed at anyone interested in MT and translation-related tools and resources. Topics will include MT in multilingual public services (eGovernment etc.), MT for the web, MT embedded in other services, MT evaluation techniques and evaluation results, and more.


August

Aug 23-29, 2014

Conference: COLING 2014, International Committee for Computational Linguistics, Dublin, Ireland

The biennial COLING conference is one of the premier Natural Language Processing conferences in the world. The conference will include full papers, oral presentations, poster presentations, demonstrations, tutorials, and workshops on a variety of technical areas in natural language and computation.


September

Sep 25-26, 2014

Workshop: IATIS Regional Workshop, Translator and Interpreter Training, Serbia

This conference is aimed at promoting translator training, and will address training in areas such as field/domain specialization, technical skills (including pre-/post-editing of MT), revision skills and management skills (soft skills).


October

Oct 4-5, 2014

Conference: MedTranslate 2014, GxP Language Services, Freiburg im Breisgau, Germany


Oct 6-7, 2014

Workshop: Localization Project Management Certification, the Localization Institute, Seattle, Washington USA

As part of the LPM Certification Program, this two-day project management training workshop will be held alongside Localization World.


Oct 19, 2014

Unconference: Localization World Unconference, Seattle

The agenda will be set in the first session, followed by 3-4 breakout sessions on topics the group chooses together. Topics for consideration can be submitted at VistaTEC’s booth from Wednesday, October 17th.


Oct 27-28, 2014

Conference: TAUS User Conference, TAUS, Vancouver, Canada

The TAUS Annual Conference 2014 will be co-located with the Localization World Conference taking place in the Convention Centre, Vancouver, BC, Canada.


Oct 29-31, 2014

Conference: Localization World Vancouver, Localization World Ltd., Vancouver, Canada

Localization World provides an opportunity for the exchange of information in the language and translation services and technologies market.


November

Nov 3-5, 2014

Conference: 38th Internationalization & Unicode Conference (IUC38), Object Management Group, Santa Clara, California USA

The conference is for internationalization experts, tools vendors, software implementers, and business and program managers who want to discuss the best methods for doing business in international markets. Subject areas will include cloud computing, upgrading to HTML5, integrating with social networking software, and implementing mobile apps.


Nov 5-8, 2014

Conference: 55th ATA Conference, American Translators Association, Sheraton Hotel Chicago, Illinois USA

A networking event for translators, project managers and industry professionals. The aim of the conference is to promote the professional development of translators and interpreters.


Nov 11-13, 2014

Conference:  tcworld – tekom, Stuttgart, Germany

The technical communication conference and trade fair examines different aspects of localization, internationalization and globalization. It is the largest technical communication, authoring and IT management conference in the world, and participating companies offer software and services for technical communication.


December

Dec 8-12 2014

Conference: IEEE GLOBECOM, Austin Texas USA

Organized by the IEEE Communications Society, the second largest of the 38 IEEE societies, the conference will focus on the latest advancements in broadband, wireless, multimedia, internet, image and voice communications.


Dec 15-18 2014

Conference: IEEE CloudCom 2014, Nanyang Avenue, Singapore

CloudCom promotes cloud computing platforms. It is co-sponsored by the Institute of Electrical and Electronics Engineers (IEEE) and the Cloud Computing Association. The conference attracts researchers, developers, users, students and practitioners from the fields of big data, systems architecture, services research, virtualization, security and privacy and high performance computing.

KantanMT looks forward to meeting you at some of these conferences over the next year.

KantanMT – 2013 Year in Review

KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise, officially launched in June 2013. The KantanMT team is delighted to have surpassed expectations by developing and refining cutting-edge technologies that make Machine Translation easier to understand and use.

Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.

Strong Customer Focus…

The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.

KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.

The KantanMT Community…

The KantanMT member community now includes top-tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3,400 registered members in December, and in response to this growth, KantanMT introduced two partner programs with the objective of improving the Machine Translation ecosystem.

The Developer Partner Program supports organizations interested in developing integrated technology solutions, while the Preferred Supplier of MT Program is dedicated to strengthening the use of MT technology in the global translation supply chain. KantanMT’s Preferred Suppliers of MT are:

KantanMT’s Progress…

To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approx. 7,000 customized KantanMT engines that have translated more than 500 million words.

As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.

KantanMT’s Core Technologies from 2013…

KantanMT has been busy continuously developing and releasing new technologies that help clients build robust business models for integrating Machine Translation into existing workflows.

  • KantanAnalytics™ – segment-level Quality Estimation (QE) analysis, expressed as a percentage ‘fuzzy match’ score on KantanMT translations, which provides a straightforward method for costing and scheduling translation projects.
  • BuildAnalytics™ – a QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment-level percentage score on a sample of the uploaded training data.
  • KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
  • TotalRecall™ – combines TM and MT technology; TM matches with a ‘fuzzy match’ score of less than 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
  • KantanISR™ – Instant Segment Retraining technology that allows members near-instantaneous correction and retraining of their KantanMT engines.
  • PEX Rule Editor – an advanced pattern-matching technology that allows members to correct repetitive errors, smoothing the post-editing process by reducing post-editing effort, cost and time.
  • Kantan API – critical for the development of software connectors and smooth integration of KantanMT into existing translation workflows. The success of the memoQ connector led to the development of subsequent connectors for MemSource and XTM.
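The TM/MT routing behind a TotalRecall-style workflow can be pictured as a simple decision rule. This is a schematic sketch under stated assumptions (the function names and the stand-in MT call are hypothetical), not the actual KantanMT implementation:

```python
# Schematic TM/MT routing: strong Translation Memory matches are reused,
# weaker ones are sent to the customized MT engine for post-editing.
FUZZY_THRESHOLD = 85  # percent match score below which MT is used

def route_segment(fuzzy_score, tm_translation, mt_translate, source):
    """Return (origin, translation) for one source segment."""
    if fuzzy_score >= FUZZY_THRESHOLD:
        return "TM", tm_translation          # high-confidence TM match
    return "MT", mt_translate(source)        # fall back to the MT engine

# Hypothetical usage with a stand-in MT function:
fake_mt = lambda s: f"<machine translation of: {s}>"
print(route_segment(92, "Guardar los cambios", fake_mt, "Save changes"))  # TM hit
print(route_segment(60, "Guardar", fake_mt, "Save changes"))              # routed to MT
```

The threshold itself is a business decision: lowering it sends more segments to MT and shifts effort from translation to post-editing.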

KantanMT sourced and cleaned a range of bi-directional, domain-specific stock engines consisting of approx. six million words across the legal, medical and financial domains, and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian during 2013.

Recognition as Business Innovators…

KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd was presented with the ICT Commercialization award in September.

In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.

KantanMT was a silver sponsor at the annual 2013 ASLIB ‘Translating and the Computer’ conference, which took place in London in November, and in October Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.

KantanMT has recently published a white paper introducing its cornerstone Quality Estimation technology, KantanAnalytics, and showing how this technology addresses the biggest industry challenges facing widespread adoption of Machine Translation.

KantanAnalytics WhitePaper December 2013

For more information on how to introduce Machine Translation into your translation workflow contact Niamh Lacy (niamhl@kantanmt.com).