Here at KantanMT, over the month of November, some of our team members got extremely serious about their facial hair. And why wouldn’t they? It’s Movember, a time to kick cancer and raise awareness and money by growing out some serious ‘staches! Carlos Collantes, Dimitar Shterionov, Seosamh O Cinneide, and Brian Murray from KantanMT grew some gentlemanly moustaches, and to show off our results, we got together for some group pictures.
Today is Friday the 13th, known by many as the unluckiest day of the year (as most of Jason Voorhees’s victims would probably agree). Indeed, in the Anglo-Saxon world and in some other parts of the globe, Friday the 13th still has the potential to paralyse and invoke irrational dread. The Stress Management Center and Phobia Institute in Asheville, North Carolina, announced that an estimated 17 to 21 million people in the US are affected by this day – so much so that they would go out of their way to avoid having to face it – thus making it the most feared day in history.
Ease of use and simplicity are always on the minds of our developers, hence the making of KantanTimeLine™. KantanTimeLine enables KantanMT clients to view the life cycle of their KantanMT engine. This empowers our clients, as they can find exactly what is negatively or positively affecting the quality of their engines. Through KantanTimeLine, clients can keep track of Training Data uploads, Translation jobs, Engine Tuning, templates, Build jobs and so on.
How to use KantanTimeLine™
Log in to your KantanMT account using your email and your password.
You will be directed to the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
If you wish to use ‘KantanTimeLine’ with a profile other than the ‘Active’ profile, click on the profile whose ‘KantanTimeLine’ you wish to view.
Click on the ‘TimeLine’ tab.
You will now be directed to the ‘TimeLine’ page for your chosen profile.
To restore an archived Build, select the Build you wish to restore from the ‘Archives’ drop-down menu and click on the ‘Restore’ button.
To delete an archived Build, click on the ‘Delete’ button.
To archive a Build, click on the ‘Archive’ button of the Build you wish to archive.
To view or edit the description of a Build, click on the ‘Yellow Notepad’ icon.
To filter the timeline, click on the ‘Filter’ drop-down menu and select the filter you wish to use.
Additional Information and Support
KantanTimeLine™ is one of the many products offered by KantanMT to make the integration of Machine Translation into our clients’ workflows seamless. For more information on TimeLine or any KantanMT products, please contact us at email@example.com.
TimeLine can also be used in KantanBuildAnalytics. To learn how TimeLine is incorporated into KantanBuildAnalytics, please click on the link below or contact us at firstname.lastname@example.org.
KantanISR technology enables KantanMT members to perform instant segment retraining using a pop-up editor. The technology is designed to permit the near-instantaneous submission of post-edited translations into a KantanMT engine so that KantanMT members can submit segments for retraining, hence bypassing the need to completely rebuild the engine.
KantanISR was developed with usability, efficiency and productivity in mind: members simply need to log in to their KantanMT account, go to their main dashboard and submit new training segments using the KantanISR Editor. Adding high quality training data to a KantanMT engine in this way will improve the translation quality of that engine and reduce post-editing requirements.
- Log in to your KantanMT account using your email and your password.
- You will be directed to the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
- If you wish to use ‘KantanISR’ with a profile other than the ‘Active’ profile, click on the profile you wish to use ‘KantanISR’ with, then click on the ‘Training Data’ tab.
- You will be directed to the ‘Training Data’ page. Now click on the ‘ISR’ tab.
- The ‘KantanISR’ wizard will now pop up on your screen.
- Add the source language text in the ‘Source’ text editor fields. Add the corresponding target language text in the ‘Target’ text editor fields.
- Click on the ‘Save’ button if you’re happy with your retraining data. If not, click the ‘Cancel’ button.
- When you click the ‘Save’ button, a ‘KantanISR successful’ pop-up will appear on your screen. Click the ‘OK’ button and you will be directed back to the ‘Training Data’ page.
Using KantanISR through KantanAPI
Please Note: The KantanAPI is only available to KantanMT members in the Enterprise Plan.
Members can also get the benefit of KantanISR through KantanAPI by using HTTP GET requests. The API expects:
- A user authorisation token (‘API token’) which can be gotten by clicking on the ‘API’
- The name of the client profile you wish to use.
- A source segment and its target segment in the languages specified when profile was created.
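A request carrying these three pieces of information could be assembled along the following lines. This is only a sketch: the base URL and the query parameter names (`auth`, `profile`, `source`, `target`) are placeholders invented for illustration, not the documented KantanAPI interface, so please check the API documentation for the real values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the real KantanAPI URL is not given in this post.
BASE_URL = "https://api.example.com/isr"


def build_isr_request(api_token, profile, source, target):
    """Build the URL for an HTTP GET request that submits one segment pair.

    All query parameter names here are assumptions made for this sketch.
    """
    query = urlencode({
        "auth": api_token,    # user authorisation token ('API token')
        "profile": profile,   # name of the client profile to retrain
        "source": source,     # source-language segment
        "target": target,     # post-edited target-language segment
    })
    return f"{BASE_URL}?{query}"


# Example values are made up; the profile name is hypothetical.
url = build_isr_request("my-token", "LegalENFR", "Hello world", "Bonjour le monde")
print(url)
```

Building the query string with `urlencode` rather than by hand means spaces and accented characters in the segments are escaped correctly before the request is sent.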
To learn more about KantanISR or get help with KantanMT technologies, please contact us at email@example.com. Hear from the Development team on why KantanISR increases productivity and efficiency for KantanMT customers.
Statistical Machine Translation (SMT) has many uses – from the translation of User Generated Content (UGC) to Technical Documents, to Manuals and Digital Content. While some use cases may only need a ‘gist’ translation without post-editing, others will need a light to full human post-edit, depending on the usage scenario and the funding available.
Post-editing is the process of ‘fixing’ Machine Translation output to bring it closer to a human translation standard. This, of course, is a very different process from carrying out a full human translation from scratch, and that’s why it’s important to give full training to staff who will carry out this task.
Training will make sure that post-editors fully understand what is expected of them when asked to complete one of the many post-editing type tasks. Research (Vasconcellos – 1986a:145) suggests that post-editing is a honed skill which takes time to develop, so remember your translators may need some time to reach their greatest post-editing productivity levels. KantanMT works with many companies who are post-editing at a rate over 7,000 words per day, compared to an average of 2,000 per day for full human translation.
Types of Training: The Translation Automation User Society (TAUS) is now holding online training courses for post-editors.
Post-editing quality levels vary greatly and will depend largely on the client or end-user. It’s important to get an exact understanding of user expectations and manage these expectations throughout the project.
Typically, users of Machine Translation will ask for one of the following types of post-editing:
- Light post-editing
- Full post-editing
The following diagram gives a general outline of what is involved in both light and full post-editing. Remember, however, that the effort to meet certain levels of quality will be determined by the output quality your engine is able to produce.
Generally, MT users would carry out productivity tests before they begin a project. This determines the effectiveness of MT for the language pair in a particular domain, and their post-editors’ ability to edit the output with a high level of productivity. Productivity tests will help you determine the potential Return on Investment of MT and the turnaround time for projects. It is also a good idea to carry out productivity tests periodically to understand how your MT engine is developing and improving. (Source: TAUS)
You might also develop a tailored approach to suit your company’s needs; however, the above diagram offers some nice guidelines to start with. Please note that a well-trained MT engine can produce near human translations and a light touch up might be all that is required. It’s important to examine the quality of the output with post-editors before setting productivity goals and post-editing quality levels.
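To make the turnaround side of a productivity test concrete, here is a rough calculation using the two daily rates quoted earlier (7,000 words per day post-editing, 2,000 words per day for full human translation). The project size is a made-up figure, and the sketch assumes a single linguist working at a constant rate, which real projects rarely match.

```python
def turnaround_days(word_count, words_per_day):
    """Return the number of working days needed, rounded up to whole days."""
    return -(-word_count // words_per_day)  # ceiling division on integers


HUMAN_RATE = 2000      # words per day, full human translation (figure quoted earlier)
POST_EDIT_RATE = 7000  # words per day, post-editing (figure quoted earlier)
PROJECT_WORDS = 50000  # hypothetical project size, chosen for illustration

speed_up = POST_EDIT_RATE / HUMAN_RATE
human_days = turnaround_days(PROJECT_WORDS, HUMAN_RATE)
post_edit_days = turnaround_days(PROJECT_WORDS, POST_EDIT_RATE)

print(f"Speed-up: {speed_up:.1f}x")
print(f"Human translation: {human_days} days")
print(f"Post-editing: {post_edit_days} days")
```

With these inputs the post-editing route is 3.5 times faster, taking 8 working days instead of 25. Your own productivity tests would replace the two rates with measured values for your language pair and domain.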
In recent years, post-editing skills have become much more of an asset and sometimes a requirement for translators working in the language industry. Machine Translation has grown considerably in popularity and the demand for post-editing services has grown in line with this. TechNavio predicted that the market for Machine Translation will grow at a compound annual growth rate (CAGR) of 18.05% until 2016, and the report attributes a large part of this rise to “the rapidly increasing content volume”.
While the task of post-editing is markedly different to human translation, the skill set needed is almost on par.
According to Johnson and Whitelock (1987), post-editors should be:
- Expert in the subject area, the text type and the contrastive language.
- In perfect command of the target language.
It is also widely accepted that post-editors who have a favourable perception of Machine Translation perform better at post-editing tasks than those who do not look favourably on MT.
How to improve Machine Translation output quality
Pre-editing is the process of adjusting text before it has been Machine Translated. This includes fixing spelling errors, formatting the document correctly and tagging text elements that must not be translated. Using a pre-processing tool like KantanMT’s GENTRY can save a lot of time by automating the correction of repetitive errors throughout the source text.
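As a rough illustration of what automated pre-editing can look like, the sketch below fixes a couple of known spelling errors, tidies stray whitespace and tags terms that must not be translated. It is not how GENTRY works internally; the correction table, the protected terms and the `<dnt>` marker are all invented for this example.

```python
import re

# Hypothetical correction table: recurring source-text errors and their fixes.
CORRECTIONS = {
    "teh": "the",
    "recieve": "receive",
}

# Hypothetical terms that must not be translated (product names, for example).
DO_NOT_TRANSLATE = ["KantanMT", "GENTRY"]


def pre_edit(text):
    """Fix known spelling errors, tidy whitespace and tag protected terms."""
    for wrong, right in CORRECTIONS.items():
        # \b keeps the replacement to whole words only.
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    # Collapse runs of spaces and tabs left over from formatting.
    text = re.sub(r"[ \t]+", " ", text).strip()
    for term in DO_NOT_TRANSLATE:
        # <dnt> is an invented marker; a real MT system defines its own tag.
        text = re.sub(rf"\b{re.escape(term)}\b", f"<dnt>{term}</dnt>", text)
    return text


print(pre_edit("You will recieve  teh report from KantanMT."))
```

Because the same table is applied to every segment, an error that appears hundreds of times in the source only has to be described once.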
More pre-editing Steps:
Writing Clear and Concise Sentences: Shorter, unambiguous segments (sentences) are processed much more effectively by MT engines. Also, when pre-editing or writing for MT, make sure that each sentence is grammatically complete (begins with a capital letter, has at least one main clause, and ends with punctuation).
Using the Active Voice: MT engines work impressively on text that is clear and unambiguous. That’s why using the active voice, which cuts out vagueness and ambiguity, can result in much better MT output.
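Some of these checks are simple enough to automate before text is sent for translation. The sketch below flags sentences that do not start with a capital letter, lack ending punctuation, run long, or look passive. The 25-word limit and the passive-voice pattern are assumptions chosen for illustration, and the script cannot tell whether a sentence has a main clause, so treat its output as hints for a human pre-editor.

```python
import re

MAX_WORDS = 25  # assumed threshold for a "short" segment; tune to taste


def check_sentence(sentence):
    """Return a list of warnings for one source sentence."""
    warnings = []
    stripped = sentence.strip()
    if not stripped:
        return ["empty sentence"]
    if not stripped[0].isupper():
        warnings.append("does not begin with a capital letter")
    if stripped[-1] not in ".!?":
        warnings.append("missing ending punctuation")
    if len(stripped.split()) > MAX_WORDS:
        warnings.append(f"longer than {MAX_WORDS} words")
    # Crude passive-voice hint: a form of "to be" followed by a word ending in -ed.
    if re.search(r"\b(is|are|was|were|been|being|be)\s+\w+ed\b", stripped, re.IGNORECASE):
        warnings.append("possible passive voice")
    return warnings


print(check_sentence("the file was uploaded by the user"))
print(check_sentence("The user uploaded the file."))
```

The first sentence triggers three warnings (capitalisation, punctuation, passive voice), while the active rewrite passes cleanly.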
There are many pre-editing steps you can carry out to produce better MT output. Also, keep in mind writing styles when developing content for Machine Translation to cut the amount of pre-editing required. Get tips on writing for MT here.
For more information about any of KantanMT’s post-editing automation tools, please contact: Gina Lawlor, Customer Relationship Manager (firstname.lastname@example.org).
A commonly asked question within the localization industry is which is better: Rule Based or Statistical Machine Translation systems? While both approaches have merits and advantages, the question in my mind is which offers the best future potential and best value for LSPs who are considering a future offering which includes an element of Machine Translation.
According to Don DePalma and his team at Common Sense Advisory, if you’re an LSP and haven’t been asked to provide an RFQ (Request for Quotation) that includes an element of Machine Translation, then you’re rapidly becoming the exception!
So as a successful LSP entrepreneur, which is the best wagon to hitch your horses to: Rule Based or Statistical Machine Translation?
First of all, what is Machine Translation?
Machine translation (MT) is automated translation or “translation carried out by a computer” – as defined in the Oxford English dictionary. It is the process by which computer software is used to translate a text from one natural language to another.
Machine Translation systems have been in development since the 1950s; however, the technology required to develop successful MT systems was not up to par at the time, and so research was largely put to the side. But in the last 15 years, as computational resources have become more mainstream and the internet has opened up a wider multilingual and global community, interest in Machine Translation has been renewed.
There are three different types of Machine Translation systems available today. These are Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT) and hybrid systems – a combination of RBMT and SMT.
Rule-Based Machine Translation Technology
Rule-based machine translation relies on countless built-in linguistic rules and gigantic bilingual dictionaries for each language pair. An RBMT system works by parsing text and creating a transitional representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules. RBMT uses this complex rule set to transfer the grammatical structure of the source language into the target language.
In most cases, there are two steps: an initial investment that significantly increases the quality at a limited cost, and an ongoing investment to increase quality incrementally. While rule-based MT brings companies to a reasonable quality threshold, the quality improvement process is generally long and expensive. This has been a contributing factor to the slow adoption and usage of MT in the localization industry.
Surely, there must be a better approach!
Statistical Machine Translation Technology
Statistical Machine Translation (SMT) utilizes statistical translation models generated from the analysis of monolingual and bilingual content. Essentially this approach uses computing power to build sophisticated data models to translate one source language into another. This makes the use of SMT a far simpler option, and a significant factor in the broader adoption of statistical machine translation technology in the localization industry.
Building SMT models is a relatively quick and simple process. Using current systems, users can upload training material and have an MT engine generated in a matter of hours. While it is generally thought that a minimum of two million words is required to train an engine for a specific domain, it is possible to reach an acceptable quality threshold with much less. The technology relies on bilingual corpora such as translation memories and glossaries for the system to learn the language patterns, and monolingual data is used to improve the fluency of the output, as the engine has more text examples to choose from. SMT engines will prove to have a higher output quality if trained using domain-specific training data, for example from the medical, financial or technical domains.
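The division of labour between bilingual and monolingual data can be shown with a toy example. The sketch below follows one common textbook formulation of SMT, the noisy-channel model, in which a candidate translation is scored by combining a translation model (learned from bilingual data) with a language model (learned from monolingual data). The candidates and probabilities are invented for illustration, and nothing here describes how a KantanMT engine scores output internally.

```python
import math

# Toy numbers invented for illustration; a real engine estimates these from
# bilingual corpora (translation model) and monolingual text (language model).
# Imagined source sentence: German "das Haus ist klein".
TRANSLATION_MODEL = {  # P(source | candidate): how well the meaning matches
    "the house is small": 0.40,
    "the home is little": 0.35,
    "small is the house": 0.40,
}
LANGUAGE_MODEL = {  # P(candidate): how fluent the candidate is in the target language
    "the house is small": 0.020,
    "the home is little": 0.004,
    "small is the house": 0.001,
}


def best_translation(candidates):
    """Pick the candidate maximising log P(source|candidate) + log P(candidate)."""
    def score(candidate):
        return math.log(TRANSLATION_MODEL[candidate]) + math.log(LANGUAGE_MODEL[candidate])
    return max(candidates, key=score)


print(best_translation(list(TRANSLATION_MODEL)))
```

Two candidates tie on the translation model, and it is the language model that prefers "the house is small" over the awkward "small is the house". That is the sense in which extra monolingual data improves fluency.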
SMT technology is CPU intensive and requires an extensive hardware configuration to run translation models for acceptable performance levels. However, the introduction of cloud services, and the increasing availability of bilingual corpora are having a dramatic effect on the popularity of SMT systems, which is leading to a higher adoption rate in the language services industry.
RBMT vs. SMT
- RBMT can achieve good results but the training and development costs are very high for a good quality system. In terms of investment, the customization cycle needed to reach the quality threshold can be long and costly.
- RBMT systems can be built with much less data than SMT systems, instead using dictionaries and language rules to translate. This sometimes results in a lack of fluency.
- Language is constantly changing, which means rules must be managed and updated where necessary in RBMT systems.
- SMT systems can be built in much less time and do not require linguistic experts to apply language rules to the system.
- SMT models require state-of-the-art computer processing power and storage capacity to build and manage large translation models.
- SMT systems can mimic the style of the training data to generate output based on the frequency of patterns allowing them to produce more fluent output.
Statistical Machine Translation technology is growing in acceptance and is by far the clear leader of the two technologies. The increasing availability of cloud-based computing is providing a solution to the high computer processing power and storage capacity required to run SMT technology effectively, making SMT a game changer for the localization industry.
Training data for SMT engines is becoming more widely available, thanks to the internet and the increasing volumes of multilingual content being created by both companies and private internet users. High quality aligned bilingual corpora are still expensive and time consuming to create but, once created, become a valuable asset to any organization implementing SMT technology, with translations benefiting from economies of scale over time.
Tony O’Dowd, Founder and Chief Architect, KantanMT.com
Welcome back to the second part of this blog series, which examines ‘innovation as strategy’. Please feel free to comment and share.
The primary goal of an “Innovation Strategy”, as defined by Porter, is to leapfrog competitors via the introduction of a completely new, or notably better product or service. The best example I can think of is Apple and its introduction of the Apple iPod.
I was part of the Sony Walkman generation (I even had a Sony Discman!). But when Apple released the iPod – well it was a no brainer, Sony was ditched and I happily joined the hip new iPod generation!
In the 90’s LSPs were viewed as innovative if they were using Translation Memory (TM) technologies such as TRADOS and Alchemy CATALYST. Today this is no longer the case. TM technologies are now considered as standard, and are an expected part of the process. Translation Memories are no longer differentiators!
As Machine Translation becomes more accessible, both in terms of cost and ease of use, progressive mid-sized LSPs are increasingly eager to integrate this technology into their workflows.
Easy access to affordable MT has given many Language Service Providers (LSPs) the opportunity to become innovative, inching ahead of competitors. It has also given them the opportunity to offer the same Machine Translation services that in the past were only provided by large LSPs.
The technological playing field is now being levelled. Ignoring an Innovation Strategy that includes the introduction of Machine Translation may well leave some LSPs on the side-line in future project negotiations, as they compete with more progressive LSPs who have adopted the latest technologies.
Have you tried Machine Translation on KantanMT.com? It’s easy, and free to get started. Sign up for your 14 day free trial today and start translating within hours.
Watch out for KantanMT’s post on differentiation strategies.
Tony O’Dowd, Founder and Chief Architect, KantanMT.com
Welcome. This is a four part blog series which will examine Porter’s core strategies for competitive advantage. During the series we will look at how these strategies can be applied to companies working in the translation industry.
Michael Porter, Harvard Business School, explains that competitive advantage occurs when an organisation “acquires or develops an attribute or combination of attributes that allows it to outperform its competitors.”
Expanding on this concept, in his book “Competitive Strategy” (1980, a book which was voted the ninth most influential management book of the 20th century) – and again in “Competitive Advantage” (1985, a book I read during my years in college) – he outlined four core strategies companies should embrace in order to create a clear and superior competitive advantage in their markets.
I thought it would be interesting to see how Machine Translation – as a growing service differentiator in the LSP world – would fit into Porter’s four strategies, and to examine if it ticks all of the Competitive Advantages check boxes!
Cost Leadership Strategy
Porter defines “Cost Leadership” as offering products or services at the lowest possible cost in the industry. The emphasis here is on cost rather than price; cost is what you purchase your products/services at and well, price is what you sell these on at – hopefully obtaining a nice profit in the process, helping your company grow and thrive. I guess in a nutshell, it’s all about avoiding operating at a loss by optimising this cost/price ratio.
But the devil of achieving that cost/price optimisation is in the detail of efficiently running a day-to-day innovative business, and of running a business that develops an attribute, or attributes, that differentiates it from its competitors. Successful companies that embrace Porter’s Cost concept must by necessity strategically vary their Cost attributes through the product/service they offer. A good example is Walmart, which offers key items at deep discounts while selling other products at less aggressive discounts. These are different sides of the same cost/price coin and, taken holistically, can be a very successful strategy. Walmart has successfully beaten off all of its major competitors in the US domestic market for decades by pursuing this particular Cost Leadership strategy.
So what’s the take-out here for Localization Service Providers (LSPs) on cost/price? Well, for the majority of translation quotes, the per-word translation-costs represent the lion’s share of the total project costs: in many cases this is as much as 85%. So while some LSPs may focus on containing the costs of their support services (such as engineering, project management, review and edit etc.), the really successful ones realise that it is by focusing on the translation-costs – that 85% of cost area – that they can gain most competitive advantage.
This reality has been manifesting itself as a significant and wholehearted move by many LSPs. Many are now moving towards Translation Automation as a cost saver. Clearly, for an LSP to embrace a “Cost Leadership Strategy”, it must be relentless in pursuing a translation automation strategy. Only by developing such a strategy will an LSP give itself the strong differentiating cost attribute that allows it to outperform its competitors.
Machine Translation is a key component of any translation automation strategy, and its use can positively impact on the translation-cost component of any localization project. For instance, one of our KantanMT members reported a 37% reduction in translation costs as a result of integrating MT into their automated translation workflow.
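To see what such a saving could mean for a whole project, the two figures above can be combined in a quick back-of-the-envelope calculation. This assumes, purely for illustration, that the 37% reduction applies to a project where translation makes up the full 85% of costs and that support costs stay unchanged; the project budget is a made-up number.

```python
TRANSLATION_SHARE = 0.85      # translation as a share of total project cost (upper figure quoted above)
TRANSLATION_REDUCTION = 0.37  # reduction in translation cost reported by one member

project_cost = 10_000.0       # hypothetical project budget, any currency

translation_cost = project_cost * TRANSLATION_SHARE
other_cost = project_cost - translation_cost  # support services, assumed unchanged
new_total = translation_cost * (1 - TRANSLATION_REDUCTION) + other_cost
overall_saving = 1 - new_total / project_cost

print(f"New project cost: {new_total:.2f}")
print(f"Overall saving: {overall_saving:.0%}")
```

Under those assumptions the total project cost falls by roughly 31%, which shows why the translation component is the area where cost savings matter most.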
…Read more about Porter’s strategies in Friday’s blog.
Tony O’Dowd, Founder and Chief Architect