More Questions Answered on How MT Helps Improve Translation Productivity (Part II)


Welcome to Part II of the Q&A blog on How Machine Translation Helps Improve Translation Productivity. In case you missed the first part of our post, here’s a link so you can quickly have a look at what was covered.

Tony O’Dowd, Chief Architect of KantanMT.com, and Louise Faherty, Technical Project Manager, presented a webinar where they showed how LSPs (as well as enterprises) can improve the translation productivity of the language team, manage post-editing effort estimations and easily schedule projects with powerful MT engines. For this section, we are accompanied by Brian Coyle, Chief Commercial Officer at KantanMT, who joined the team in October 2015 to strengthen KantanMT’s strategic vision.

We have provided a link to the slides used during the webinar below, along with a transcript of the Q&A session.

Please note that the answers below are not recorded verbatim and minor edits have been made to make the text more accessible.

Question: We are a mid-sized LSP and we would like to know what benefits we would enjoy if we chose to work with KantanMT over building our own system from scratch. The latter would be cheaper, wouldn’t it?

Answer (Brian): Tony and Louise have mentioned a lot of the features available in KantanMT – indeed, the platform is very feature-rich and provides a great user experience. But on top of that, what really underpins KantanMT is its access to massive computing power, which is what Statistical Machine Translation requires in order to perform efficiently and quickly. KantanMT has a unique architecture that provides instant, on-demand access to that power at scale.

As Louise Faherty mentioned, we are currently translating half a billion words per month and we have 760 servers deployed. So if you were trying to develop something yourself, it would be hard to reach this level of proficiency in your MT. Whilst no single LSP would probably need this total number of servers, to give you an idea of the cost involved, that kind of server deployment in a self-build environment would cost in the region of €25m.

We also offer 99.99% uptime with triple data-centre disaster recovery. It would be very difficult and costly to build this kind of performance yourself. Also, with this kind of performance at your clients’ disposal, you can offer customised MT for mission-critical web-based applications such as eCommerce sites.

Finally, a lot of planning, thought, development hours and research has gone into creating what we believe is the best user interface and platform for MT, with the best functionality set and extreme ease of integration in the marketplace. So it would be difficult to start on your own and build a system as robust and high-quality as KantanMT.com.

Question: Could you also establish KantanNER rules to convert prices on eCommerce websites?

Answer (Louise Faherty): Yes, absolutely! With KantanNER, you can establish rules to convert prices and so on. The only limitation is that exchange rates will, of course, fluctuate. There could be options for calculating that information dynamically – otherwise you would be looking at a fixed equation to convert those prices.
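
To make the idea concrete, here is a minimal sketch in Python of what such a rule does conceptually. This is not KantanNER syntax – the function names and the exchange rates are purely illustrative – but it shows the difference between a fixed conversion equation and a dynamically calculated one.

```python
import re

# Illustrative sketch only -- NOT KantanNER rule syntax. It mimics the
# idea described above: recognise a price entity and rewrite it, either
# with a fixed rate or with one looked up dynamically.

FIXED_RATE = 1.10  # hypothetical EUR->USD rate, frozen when the rule was written

def fetch_live_rate() -> float:
    """Stand-in for a dynamic lookup (e.g. an exchange-rate service)."""
    return 1.08  # hard-coded so the sketch stays self-contained

def convert_prices(text: str, dynamic: bool = False) -> str:
    rate = fetch_live_rate() if dynamic else FIXED_RATE
    # Match price entities such as "€49.99" and rewrite them in USD.
    return re.sub(
        r"€(\d+(?:\.\d{2})?)",
        lambda m: f"${float(m.group(1)) * rate:.2f}",
        text,
    )

print(convert_prices("Now only €49.99!"))                # fixed equation
print(convert_prices("Now only €49.99!", dynamic=True))  # dynamic rate
```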

Question: My client does not want us to use MT because they have had a bad experience in the past with Bing Translate – what would convince them to use KantanMT? How will the output be different?

Answer (Tony O’Dowd): One of the things that you have to recognise in terms of using the KantanMT platform is that you are using it to build customised machine translation engines. So you are not going to create generic engines (Bing Translate and Google Translate are generic engines). You would be building customised engines that are trained on the previous translations and glossaries that your clients have provided. You would also be using some of our stock engines that are relevant to your client’s domain.

So when you combine that, you get an engine that will mimic the translation style of your client. Indeed, instead of a generic translation engine, you are using an engine that is designed to mirror the terminology and stylistic requirements of your client. If you can achieve this through Machine Translation, there is far less need for post-editing – and this is one of the most important things that drives translators away from generic, broad-based systems and towards customised ones. Clients and LSPs have tested the generic systems as well as customisable engines and found that cloud-based customisable MT adds value that is not available on free, non-customisable MT platforms.

End of Q&A session

The KantanMT Professional Services Team would once again like to thank you for all your questions during the webinar and for sending in your questions by email.

Have more burning questions? Or maybe you would like to see the brilliant platform translate in a live environment? No problem! Just send an email to demo@kantanmt.com and we will take care of the rest.

Want to stay informed about our new webinars? You can bookmark this page, or even better – sign up for our newsletter and ensure that you never miss a post!


Using F-Measure in Kantan BuildAnalytics

What is F-Measure?

F-Measure is an automated measurement that determines the precision and recall capabilities of a KantanMT engine, enabling you to gauge the quality and performance of your engine.

  • To see the accuracy and performance of your engine, click on the ‘F-measure Scores’ tab. You will now be directed to the ‘F-measure Scores’ page.


  • Place your cursor on the ‘F-measure Scores Chart’ to see the individual score of each segment. A pop-up will now appear on your screen with details of the segment under the headings ‘Segment no.’, ‘Score’, ‘Source’, ‘Reference/Target’ and ‘KantanMT Output’.


  • To see the ‘F-measure Scores’ of each segment in a table format, scroll down. You will now see a table with the headings ‘No’, ‘Source’, ‘Reference/Target’, ‘KantanMT Output’ and ‘Score’.
  • To see an even more in-depth breakdown of a particular segment, click on the triangle beside the number of the segment you wish to view.
  • To reuse the segments as test data, click on the ‘Reuse as Test Data’ button. When you do so, the button will change to ‘Delete Test Data’.
  • To download the ‘F-measure Scores’, ‘BLEU Score’ and ‘TER Scores’ of all segments, click on the ‘Download’ button on either the ‘F-measure Scores’, ‘BLEU Score’ or ‘TER Scores’ page.

This is one of the features provided by Kantan BuildAnalytics to improve an engine’s quality after its initial training. To see other features provided by Kantan BuildAnalytics, please click on the link below. To get more information about KantanMT and the services we provide, please contact our support team at info@kantanmt.com.

Tips for Training Post-editors

A good quality Machine Translation engine relies on the quality of the bilingual data used to train it. For most MT users, this bilingual data can be translated by humans, or it can be fully post-edited MT output. In both cases, the quality of the data will influence the engine’s quality.
Selçuk Özcan, Transistent’s Co-founder, discusses the differences and gives some tips for successful post-editing. He has given KantanMT permission to publish his blog post, which was originally published in Dragosfer and on the GALA Blog website.

We have entered a new age, and a new technology has come into play: Machine Translation (MT). It’s globally accepted that MT systems dramatically increase productivity, but it’s a hard struggle to integrate this technology into your production process. Apart from handling the engine building and optimizing procedures, you have to transform your traditional workflow.


The traditional roles of linguists (translators, editors, reviewers etc.) are being reshaped to find a suitable place in this new, innovative workflow. The emerging role is called ‘post-editing’ and the linguists assigned to this role are called ‘post-editors’. You may want to recruit some willing linguists for this role, or persuade your staff to adopt a different point of view. But whatever the case may be, some training sessions are a must.

What should be covered in training sessions?

1. Basic concepts of MT systems

Post-editors should have a notion of the dynamics of MT systems. It is important to focus on the type of system that is utilized (RBMT/SMT/Hybrid). For the widely used SMT systems, post-editors need to know:

  • how the systems behave
  • the functions of the Translation Model and Language Model* (see the sketch at the end of this section)
  • input (given set of data) and output (raw MT output) relationship
  • what changes in different domains

* It’s not essential to give detailed information about these topics, but touching on them will help determine the candidates’ level of technical background. Some of the candidates may be included in the testing team.
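
For candidates with a technical bent, a toy example can make the Translation Model / Language Model split tangible. The sketch below uses plain Python with invented probabilities – it is not how KantanMT is implemented – and shows how an SMT decoder combines the two models: the Translation Model scores adequacy, the Language Model scores fluency.

```python
import math

# Toy sketch (not KantanMT internals): an SMT decoder scores a candidate
# translation by combining the Translation Model (adequacy: does the
# target say what the source says?) with the Language Model (fluency:
# does the target read naturally?). All probabilities are invented.

translation_model = {  # P(target_word | source_word)
    ("haus", "house"): 0.7,
    ("haus", "home"):  0.3,
}
language_model = {  # P(word | previous_word)
    ("the", "house"): 0.010,
    ("the", "home"):  0.002,
}

def score(source: str, target: str, prev_word: str) -> float:
    # Log-probabilities are added, mirroring the noisy-channel product
    # P(f|e) * P(e) that underlies SMT.
    return math.log(translation_model[(source, target)]) + \
           math.log(language_model[(prev_word, target)])

# "house" wins: better adequacy and much better fluency after "the".
for candidate in ("house", "home"):
    print(candidate, round(score("haus", candidate, "the"), 3))
```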

2. The characteristics of raw MT output

Post-editors should know the factors affecting MT output. In addition, the difference between working with fuzzy TM matches and with SMT output has to be covered during a proper training session. Let’s try to figure out what should be conveyed:

  • The MT process is not the ‘T’ of the TEP (Translate-Edit-Proof) workflow, and raw MT output is not the target text expected as the output of the ‘T’ step.
  • In the earlier stages of an SMT engine, the output quality varies depending on the project’s dynamics and errors are not identical. As the system improves, the quality level becomes more even and consistent within the same domain.
  • There may be some word or phrase gaps in the system’s pattern mappings. (Detecting these gaps is one of the main responsibilities of the testing team, but a successful post-editor must be informed about the possible gaps.)

3. Quality issues

This topic has two aspects: defining the required target (end product) quality, and the evaluation and estimation of output quality. The first gives you the final destination; the second tells you where you are.

The required quality level is defined according to the project requirements, but it mostly depends on the target audience and the intended usage of the target text. This seems similar to the procedure in the TEP workflow. However, it’s slightly different: the engine improvement plan should also be considered while defining the target quality level. Basically, this parameter is classified into two groups: publishable and understandable quality.

The evaluation and estimation aspect is a little more complicated. The most challenging factor is standardizing the measurement metrics. Besides, the tools and systems used to evaluate and estimate the quality level have some rather complex features. If you successfully establish your quality system, such adversities become easier to cope with.

It’s the post-editors’ duty to apprehend the dynamics of MT quality evaluation, and the distinction between MT and HT quality evaluation procedures. Thus, they are supposed to be aware of the expected error patterns. Error categorization will be much more convenient to utilize with well-trained staff (QE staff and post-editors).

4. Post-editing Technique

The fourth and last topic is the key to success. It covers the appropriate method and principles, as well as the perspective post-editors need to acquire. The post-editing technique is formed using the materials prepared for the previous topics and the data obtained from the above mentioned procedures, and it is defined separately for almost every individual customized engine.

The core rule for this topic is that the post-editing technique, as a concept, must be clearly differentiated from traditional editing and/or review technique(s). Post-editors should be capable of:

  • reading and analyzing the source text, raw MT output and categorized and/or annotated errors as a whole.
  • making changes where necessary.
  • considering the post-edited data as part of the data set to be used in engine improvement, and performing his/her work accordingly.
  • applying the rules defined for the quality expectation levels.

As briefly described in topic #3, the distance between the measured output quality and the required target quality may be seen as the post-edit distance. It roughly defines the post-editor’s tolerance and the extent of the work he/she will perform. Another criterion that allows us to define the technique and the performance is the target quality group: if the target text is expected to be of publishable quality, a full post-edit is required; otherwise, a light post-edit will do. Light and full post-editing can be briefly defined this way, but the distinction is not always so clear. Besides, the concepts of under- and over-editing should be added to the above mentioned issues. You may want to include some more details about these concepts in the post-editor training sessions; enriching the training materials with some examples would be a great idea!
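
As an illustration of the post-edit distance idea, here is a minimal sketch, assuming a word-level edit distance as the measure (the same idea underlies the TER metric). The segments are invented; a production system would normalise and tokenise the text properly.

```python
# Rough sketch of "post-edit distance": how far raw MT output is from
# the final post-edited text, approximated with a word-level
# Levenshtein distance. TER is a standard metric built on this idea.

def edit_distance(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming Levenshtein over word tokens.
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,         # deletion
                dp[j - 1] + 1,     # insertion
                prev + (wa != wb)  # substitution (free if words match)
            )
    return dp[-1]

raw_mt    = "the house blue is big".split()
post_edit = "the blue house is big".split()
print(edit_distance(raw_mt, post_edit) / len(post_edit))  # 0.4, TER-style
```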

About Selçuk Özcan

Selçuk Özcan has more than 5 years’ experience in the language industry and is a co-founder of Transistent Language Automation Services. He holds degrees in Mechanical Engineering and Translation Studies and has a keen interest in linguistics, NLP, language automation procedures, agile management and technology integration. Selçuk is mainly responsible for building high quality production models including Quality Estimation and deploying the ‘train the trainers’ model. He also teaches Computer-aided Translation and Total Quality Management at the Istanbul Yeni Yuzyil University, Translation & Interpreting Department.

Read More about KantanMT’s Partnership with Transistent in the official News Release, or if you are interested in joining the KantanMT Partner Program, contact Louise (info@kantanmt.com) for more details on how to get involved.

Translation Quality: How to Deal with It?

KantanMT started the New Year on a high note with the addition of the Turkish Language Service Provider, Transistent, to the KantanMT Preferred MT Supplier partner program.

Selçuk Özcan, Transistent’s Co-founder, has given KantanMT permission to publish his blog post on Translation Quality. This post was originally published in Dragosfer and on the Transistent Blog.

Literally, the word quality has several meanings, one of them being “a high level of value or excellence” according to Merriam-Webster’s dictionary. How should one deal with this idea of “excellence” when the issue at hand is translation quality? What is required, it seems, is a more pragmatic and objective answer to that question.

This brings us to the question “how can an approach be objective?” Certainly, the issue should be assessed through empirical findings. But how? We are basically in need of an assessment procedure with standardized metrics. Here we encounter another issue: the standardization of translation quality. From now on, we need to associate these concepts with the context itself in order to make them clear.

(Diagram: monolingual issues in the source and target texts, and the bilingual issues between them)

As is widely known, three sets of factors affect the quality of the translation process in general: the source text’s monolingual issues, the target text’s monolingual issues and the bilingual issues between them. Analyzing these defines the quality of the work done. Nevertheless, the procedure should be based on the requirements of the domain, the audience and the linguistic structure of both languages (source and target); and at each step this key question should be considered: ‘Does the TT serve the intended purpose?’

We still have not dealt with the standardization and quality of acceptable TTs. The concept of “acceptable translation” has always been discussed throughout the history of translation studies, and no one is able to precisely explain the requirements; a further study on dynamic QA models would need to go into such details. There are various QA approaches and models. For most of them, an acceptable translation falls somewhere between bad and good quality, depending on the domain and target audience. The quality level is measured through the translation error rates and metrics developed to assess MT output (BLEU, F-Measure and TER), and there are four commonly accepted quality levels: bad, acceptable, good and excellent.

The formula is simple: the more errors a TT contains, the worse its quality is considered to be. However, the errors should be correlated with the context and many other factors, such as their importance for the client, the expectations of the audience and so on. These factors define an error’s severity as minor, major or critical. A robust QA model should be based upon accurate error categorization so that reliable results may be obtained.
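
A minimal sketch of such severity-weighted scoring is shown below. The weights and the pass threshold are hypothetical – real QA models (LISA-derived ones, for instance) calibrate them per client and content type.

```python
# Severity-weighted error scoring, in the spirit of LISA-style QA
# models. Weights and threshold are hypothetical examples.

SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def quality_score(errors: list[str], word_count: int) -> float:
    """Weighted error points per 1,000 words: lower is better."""
    penalty = sum(SEVERITY_WEIGHTS[e] for e in errors)
    return penalty / word_count * 1000

errors = ["minor", "minor", "major"]  # as categorised by a reviewer
score = quality_score(errors, word_count=850)
print(round(score, 2), "PASS" if score <= 10 else "FAIL")  # 8.24 PASS
```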

We have tried to briefly describe the concept of QA modeling. Now, let’s see what goes on in practice. There are three publicly available QA models which have inspired many software developers in their QA tool development processes. One of them is the LISA (Localization Industry Standards Association) QA Model. The LISA Model is very well known in the localization and translation industry, and many company-specific QA models have been derived from it.

The second is the J2450 standard, created by SAE (Society of Automotive Engineers), and the last is the EN 15038 standard, approved by CEN (Comité Européen de Normalisation) in 2006. All of the above mentioned models are static QA models: one has to create one’s own framework in compliance with the demands of each project. Nowadays, many institutions (the EU Commission and TAUS among them) are working on dynamic QA models, which enable the creation of different metrics for different translation/localization projects.


Essential KPIs for SMT: F-Measure

In our last blog post, I discussed some of the Key Performance Indicators (KPIs) used by SMT developers to estimate the quality of their KantanMT engines. These KPIs help developers understand which aspects of their SMT engine are performing well and which need improvement.

In this blog post, I’m going to dive deep into F-Measure, a KPI which can provide insight into the relevancy of your training data, the engine’s overall performance, and the suitability of an SMT engine for a particular domain or content type.

What is F-Measure?

F-Measure is a KPI which measures the precision and recall capabilities of an SMT system. It can also be viewed as a measure of translation accuracy and relevancy.

Bursting Red Balloons

In SMT, we can look at precision as the percentage of retrieved words that are relevant, and recall (sometimes referred to as sensitivity) as the percentage of relevant words that are retrieved.

This is best explained using a thought experiment. Imagine a box containing 10 red balloons and a few green balloons. Suppose we burst 5 balloons at random and 3 of these are red – we can calculate our precision as 3/5 (60%) and our recall as 3/10 (30%).

These two values offer a good estimation of how accurately we burst red balloons – the higher they are, the better our chances of bursting more red balloons.

So what has this thought experiment got to do with SMT systems?

Precision & Recall

Precision and recall are closely related to the notion of accuracy. Since SMT systems are based on pattern recognition, it is helpful to see how accurate they are at retrieving words and, more importantly, how relevant this retrieval is.

F-Measure is a combined calculation of precision and recall, expressed as a single score.
If we go back to our balloon-bursting experiment, precision was calculated as 60% and recall as 30%. To combine these two values, we can use the F-Measure formula as follows:

F-Measure = 2 × (precision × recall) / (precision + recall) = 2 × (0.60 × 0.30) / (0.60 + 0.30) = 0.40

Source: Statistical Machine Translation by Philipp Koehn

In simple terms – we’re just not good at bursting red balloons 🙂
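
For readers who prefer code to formulas, here is the same calculation in a few lines of Python, reproducing the balloon numbers above:

```python
# The harmonic mean of precision and recall, reproducing the balloon
# example: precision 3/5, recall 3/10.

def f_measure(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

precision = 3 / 5   # 3 of the 5 burst balloons were red
recall    = 3 / 10  # 3 of the 10 red balloons were burst
print(round(f_measure(precision, recall), 2))  # -> 0.4
```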

F-Measure and SMT engines

Using F-Measure we can get a general sense of the accuracy with which an SMT engine can retrieve words. If we examine the distribution of these scores across a set of reference translations, we can gain helpful insights which we can use to improve the training data and boost engine performance.

Here’s an example of an F-Measure distribution:

(Screenshot: Kantan BuildAnalytics F-Measure distribution)

The overall F-Measure score for this particular SMT engine is 72%. This is a good value, and we can say that this engine is highly accurate at retrieving words for its target language and domain, i.e. it has high precision in word retrieval and the retrieved words are relevant to the target domain.

Also, the distribution of these scores across the reference translation set shows that the majority (60% of the total reference translation set) fall in the 70-100% range. The distribution graph also shows that approximately 20% of the reference translations score less than 40%. By examining these segments we can check whether words or terminology are missing, and then create additional training material to improve the performance of the engine.
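
A sketch of how such a distribution can be computed from per-segment scores is shown below. The scores are invented and the bucket boundaries are arbitrary – BuildAnalytics draws the real chart for you.

```python
from collections import Counter

# Bucket per-segment F-Measure scores into ranges, similar in spirit to
# the BuildAnalytics distribution chart. Scores here are invented.

segment_scores = [0.82, 0.91, 0.35, 0.74, 0.55, 0.88, 0.29, 0.77]

def bucket(score: float) -> str:
    if score >= 0.7:
        return "70-100%"
    if score >= 0.4:
        return "40-70%"
    return "0-40%"

dist = Counter(bucket(s) for s in segment_scores)
for rng in ("0-40%", "40-70%", "70-100%"):
    print(f"{rng:>8}: {dist[rng] / len(segment_scores):.0%}")

# Segments in the lowest bucket are the ones to inspect for missing
# words or terminology, then target with additional training data.
```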

Closing remarks…

F-Measure is a good starting point for understanding the quality of an SMT engine, but it does have a major shortcoming: while it measures the recall and precision capabilities of an SMT engine, it doesn’t take into account the order in which the words are retrieved.

So, as in the famous sketch with Andre Previn and Morecambe and Wise, we may know all the notes, but not necessarily in the right order.


One more thing…
To go beyond the F-Measure score, our quality measurement must become aware of word order, which is sometimes referred to as fluency. In the next post I will look at BLEU (Bilingual Evaluation Understudy) and examine how this metric helps us to further understand the quality of SMT engines.

KantanMT’s new BuildAnalytics technology illustrates the distribution of F-Measure, BLEU and TER scores across our members’ SMT engines. It also generates a Gap Analysis, highlighting missing words in members’ training data, and provides KantanMT members with training data rejects reports – great information that helps members of KantanMT.com develop a deep understanding of how their SMT engines work, and how to improve their performance.

You can watch a video of Kantan BuildAnalytics here>>