Giulia Mattoni, an Italian Translation Technology student at DCU, talks about her experience using Machine Translation for player support content localization. Giulia’s fascinating perspective illustrates why this area needs further research, and how she used KantanMT to evaluate MT and post-editing for this type of content. Continue reading
For our fourth post in the ‘5 Questions’ series, we are very excited to introduce you to Louise Faherty, Technical Project Manager of the Professional Services team at KantanMT. This series of interviews aims to give you a deeper insight into the people at KantanMT. Continue reading
KantanMT.com was used in the course ‘Machine Translation and Post-editing,’ which was taught for the first time in the ‘Degree in Modern Languages Applied to Translation’ in UAH. English and Spanish were the main languages used during this course.
The KantanPEX Rule Editor enables members of KantanMT to reduce the amount of manual post-editing required for a particular translation by creating, testing and deploying post-editing automation rules on their Machine Translation engines (client profiles).
The editor allows users to evaluate the output of a PEX (Post-Editing Automation) rule on a sample of translated content without needing to upload it to a client profile and run translation jobs. Users can enter up to three pairs of search and replace rules, which will be run in sequence, from top to bottom, on their content.
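Conceptually, the rules behave like ordered regular-expression substitutions, each applied to the output of the previous one. A minimal Python sketch of that behaviour (the rules below are hypothetical examples, not KantanMT’s actual rule syntax or implementation):

```python
import re

# Hypothetical PEX-style rules: (search pattern, replacement),
# applied in order, each working on the output of the previous rule.
rules = [
    (r"\bcolour\b", "color"),    # normalise spelling
    (r"\s+([,.;:])", r"\1"),     # remove stray space before punctuation
    (r"(\d+)\s*€", r"EUR \1"),   # reformat currency notation
]

def apply_pex_rules(text, rules):
    """Run each search/replace rule in sequence over the text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

sample = "The colour costs 25 € , approximately ."
print(apply_pex_rules(sample, rules))
# → "The color costs EUR 25, approximately."
```

Because the rules run in order, a later rule can match text already modified by an earlier one, which is why the sequence in which rules are entered matters.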
How to use the KantanMT PEX Rule Editor
Log in to your KantanMT account using your email and password.
You will be directed to the ‘Client Profiles’ tab in the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’ and marked in bold.
To use the ‘PEX-Rule Editor’ with a profile other than the ‘Active’ one, click on that profile’s name to select it.
Then click the ‘KantanMT’ tab and select ‘PEX Editor’ from the drop-down menu.
You will be directed to the ‘PEX Editor’ page.
Type the content you wish to test in the ‘Test Content’ box.
Type the content you wish to search for in the ‘PEX Search Rules’ box.
Type what you want the replacement to be in the ‘PEX Replacement Rules’ box, then click the ‘Test PEX Rules’ button.
The results of your PEX-Rules will now appear in the ‘Output’ box.
Give the rules you have created a name by typing in the ‘Rule Name’ box.
Select the profile you wish to apply the rule(s) to, then click the ‘Upload Rule’ button.
The KantanMT PEX editor helps reduce the amount of manual post-editing required for a particular translation, thereby reducing project turnaround times and costs. For additional information on PEX rules and the Kantan PEX-Rule editor, please click on the links below. For more details about KantanMT localization products and ways of improving productivity and efficiency, please contact us at firstname.lastname@example.org.
It’s a fact: entering new markets is the key to increasing profits, and the first item on any company’s internationalization checklist should be making sure it communicates product information in a way its target customers can understand.
Building on the 2006 research, CSA’s updated 2014 survey of three thousand global respondents reinforced the earlier results, showing that 55% of consumers only buy from websites in their native language. This figure jumps dramatically to 80% for buyers whose English language ability is limited.
When it comes to selling internationally, tapping into new revenue streams demands translated content. But, what happens when you have thousands of product descriptions that need to be localized into a plethora of languages?
This is where the fun begins for localization teams with well-established traditional translation workflows in place. Their existing method seems fine… but when it’s time to scale up, cracks in the process begin to appear.
The translation workflow works best when it matches the scale and velocity of the content created, whether that is product descriptions, manuals or online help documentation.
The challenging part –
How do you translate product descriptions with velocity and at scale?
We have heard a great many arguments for and against machine translation, and one of the best-known arguments against it is: “the quality is rubbish; sentences translated by machine translation are garbled and incomprehensible”. Those of us in the language technology field hear this frequently and often shudder in disbelief at how these conclusions have been reached.
Generic or free machine translation systems in most cases do not produce great results. Expecting such a system to produce publishable-quality MT, or using it as a benchmark for all MT systems, is akin to extracting blood from a stone. Achieving good MT output takes time, care and the ability to customise the MT system properly.
Any company that is serious about breaking into international markets should also be serious about its MT strategy: a customised MT solution tailored to its needs, rather than simply opting for a cheap and/or supposedly free option.
Why is MT customisation so important?
Statistical machine translation is based on machine learning and pattern recognition. Segments containing multi-word phrases, or n-grams as they are known, are matched using probability algorithms that select the most probable translation. Generic or free MT systems have typically been built on a broad mix of content styles and types, which makes it much harder for the system to identify the most likely, or even relevant, matches.
When the MT system is customised for content from a single domain, such as product descriptions for a specific category (e.g. home and garden, fashion or electronic devices), the consistent syntax, style and phraseology mean that any generated MT match has a higher probability of being close to the desired output, resulting in a much more accurate translation.
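To make the domain effect concrete, here is a toy sketch of how a phrase table assigns probabilities to candidate translations (the entries and numbers are invented for illustration; a real SMT decoder combines translation-model and language-model scores over whole sentences, not single lookups):

```python
# Toy phrase tables: a source phrase maps to candidate translations
# with probabilities, and the decoder picks the most probable one.
# A domain-tuned table concentrates probability mass on in-domain senses.
generic_table = {
    "mouse": [("mouse (animal)", 0.6), ("mouse (device)", 0.4)],
}
electronics_table = {
    "mouse": [("mouse (device)", 0.95), ("mouse (animal)", 0.05)],
}

def best_translation(phrase, table):
    """Pick the candidate translation with the highest probability."""
    return max(table[phrase], key=lambda cand: cand[1])[0]

print(best_translation("mouse", generic_table))      # → mouse (animal)
print(best_translation("mouse", electronics_table))  # → mouse (device)
```

The same source word yields different top candidates depending on which data the engine was trained on, which is exactly why a customised engine outperforms a generic one on in-domain content.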
How important is saving costs?
Of course machine translation can save costs – if done properly, significant savings can be made. But saving costs is often not the end goal of a serious MT strategy. The real gains come from increasing productivity without compromising quality. Why translate 2,000 words a day when you can machine translate and post-edit 8,000 words with no loss of quality? It really can be done! See an example first hand (Netthandelen’s case study PDF download).
When it comes to eCommerce and selling hundreds of products online, the words to be translated are counted in billions, not thousands. Without MT, traditional localization budgets would keep climbing, so MT is really the only practical solution. But if MT is treated as a way to save money by cutting corners, it is doomed to fail from the outset.
It will fail because it is not sustainable: the effort and costs required to fix bad-quality MT output are too great, and if the content is published unfixed, the result will be angry customers who shop elsewhere – and they will, as the choice available now is greater than ever before!
- Generic free MT will not generate the same quality as customised MT
- Investing in a robust MT strategy will save time, costs and headaches in the long run
- Keep focus on communicating with the customer, in their language and your eCommerce business will thrive
Email email@example.com if you have questions or want to learn more about how Machine Translation works for product descriptions.
We have entered a new age, and a new technology has come into play: Machine Translation (MT). It’s globally accepted that MT systems dramatically increase productivity but it’s a hard struggle to integrate this technology into your production process. Apart from handling the engine building and optimizing procedures, you have to transform your traditional workflow:
The traditional roles of linguists (translators, editors, reviewers, etc.) are reconstructed and converge to find a suitable place in this new, innovative workflow. The emerging role is called ‘post-editing’ and the linguists assigned to it are called ‘post-editors’. You may want to recruit some willing linguists for this role, or persuade your staff to adopt a different point of view. Whatever the case may be, some training sessions are a must.
What is covered in the training sessions?
1. Basic concepts of MT systems
Post-editors should have a notion of the dynamics of MT systems. It is important to focus on the type of system in use (RBMT/SMT/hybrid). For the widely used SMT systems, they need to know:
- how the systems behave
- the functions of the Translation Model and Language Model*
- the relationship between input (the given data set) and output (raw MT output)
- what changes in different domains
* It is not essential to give detailed information on these topics, but touching on them will help you gauge each candidate’s technical background. Some candidates may then be included in the testing team.
2. The characteristics of raw MT output
Post-editors should know the factors affecting MT output, and the difference between working with fuzzy-match TM systems and with SMT systems has to be covered during a proper training session. The key points to convey:
- The MT process is not the ‘T’ of the TEP workflow, and raw MT output is not the target text expected from the ‘T’ process.
- In the earlier stages of an SMT engine, output quality varies depending on the project’s dynamics and errors are not identical. As the system improves, the quality level becomes more even and consistent within the same domain.
- There may be word or phrase gaps in the system’s pattern mappings. (Detecting these gaps is one of the main responsibilities of the testing team, but a successful post-editor must be aware of the possible gaps.)
3. Quality issues
This topic has two aspects: defining the required target (end-product) quality, and evaluating and estimating output quality. The first gives you the final destination; the second tells you where you are.
The required quality level is defined according to the project requirements, but it mostly depends on the target audience and the intended use of the target text. This seems similar to the procedure in a TEP workflow, but it is slightly different: the engine improvement plan should also be considered when defining the target quality level. Basically, this parameter falls into two groups: publishable and understandable quality.
The evaluation and estimation aspect is a little more complicated. The most challenging factor is standardizing the measurement metrics, and the tools and systems used to evaluate and estimate quality have some complex features of their own. But once you successfully establish your quality system, these adversities become easier to cope with.
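To give a flavour of what automatic quality metrics measure, here is a deliberately simplified sketch of clipped unigram precision, the building block of metrics like BLEU (real BLEU uses clipped n-gram precisions up to 4-grams plus a brevity penalty; this is illustrative only, with an invented example sentence pair):

```python
from collections import Counter

def unigram_precision(hypothesis, reference):
    """Clipped unigram precision: the fraction of hypothesis words that
    also appear in the reference, with counts clipped so a repeated word
    cannot be credited more times than it occurs in the reference."""
    hyp_words = hypothesis.split()
    ref_counts = Counter(reference.split())
    hyp_counts = Counter(hyp_words)
    overlap = sum(min(count, ref_counts[word])
                  for word, count in hyp_counts.items())
    return overlap / len(hyp_words)

mt_output = "the cat is on the mat"
reference = "there is a cat on the mat"
print(round(unigram_precision(mt_output, reference), 3))  # → 0.833
```

Metrics like this only approximate quality, which is why standardizing them, and pairing them with trained human evaluators, is such a central part of the quality system.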
It is the post-editors’ duty to understand the dynamics of MT quality evaluation, and the distinction between MT and HT quality evaluation procedures. They are therefore expected to be aware of the likely error patterns. Error categorization will be far easier to apply with well-trained staff (QE staff and post-editors).
4. Post-editing Technique
The fourth and last topic is the key to success. It covers the appropriate method and principles, as well as the perspective post-editors usually acquire. The post-editing technique is formed using the materials prepared for the previous topics and the data obtained from the above-mentioned procedures, and it is defined separately for almost every individually customized engine.
The core rule here is that the post-editing technique, as a concept, must be clearly differentiated from traditional editing and/or review technique(s). Post-editors must be capable of:
- reading and analyzing the source text, raw MT output and categorized and/or annotated errors as a whole.
- making changes where necessary.
- considering the post-edited data as a part of data set to be used in engine improvement, and performing his/her work accordingly.
- applying the rules defined for the quality expectation levels.
As briefly described in topic #3, the distance between the measured output quality and the required target quality may be seen as the post-edit distance. It roughly defines the post-editor’s tolerance and the extent of the work he or she will perform. Another criterion for defining the technique and the performance is the target quality group: if the target text is expected to be of publishable quality, the task is called full post-editing; otherwise, light post-editing. Light and full post-editing can be briefly defined this way, but the distinction is not always so clear, and the concepts of under- and over-editing also come into play. You may want to include more details about these concepts in the post-editor training sessions; enriching the training materials with some examples would be a great idea!
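Post-edit distance is often approximated as the edit distance between the raw MT output and the post-edited text. A minimal word-level Levenshtein sketch (illustrative only; industry metrics such as TER additionally account for word-order shifts):

```python
def word_edit_distance(raw_mt, post_edited):
    """Word-level Levenshtein distance: the minimum number of word
    insertions, deletions and substitutions needed to turn the raw
    MT output into the post-edited text."""
    a, b = raw_mt.split(), post_edited.split()
    # dp[i][j] = edits to turn the first i words of a into the first j of b
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1]

raw = "the cat sat in mat"
edited = "the cat sat on the mat"
print(word_edit_distance(raw, edited))  # → 2 (one substitution, one insertion)
```

A small distance suggests light post-editing was sufficient; a large distance signals either a full post-edit or a segment the engine handles poorly, which is useful feedback for the engine improvement plan.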
About Selçuk Özcan
Selçuk Özcan has more than 5 years’ experience in the language industry and is a co-founder of Transistent Language Automation Services. He holds degrees in Mechanical Engineering and Translation Studies and has a keen interest in linguistics, NLP, language automation procedures, agile management and technology integration. Selçuk is mainly responsible for building high quality production models including Quality Estimation and deploying the ‘train the trainers’ model. He also teaches Computer-aided Translation and Total Quality Management at the Istanbul Yeni Yuzyil University, Translation & Interpreting Department.
Read More about KantanMT’s Partnership with Transistent in the official News Release, or if you are interested in joining the KantanMT Partner Program, contact Louise (firstname.lastname@example.org) for more details on how to get involved.