In my last post, I talked about the reasons that motivated me to start learning Irish. In this second instalment, I would like to highlight some interesting aspects of the current situation in Ireland: how the locals feel about their national language, and how they react to foreigners learning it.
Following the launch of KantanNeural™ engines as part of our KantanFleet™ repository of pre-built MT engines, we received a great deal of interest in the product, along with a number of questions. To address them, we asked Tony O’Dowd, CEO and Chief Architect of KantanMT.com, about the Neural Machine Translation engines on KantanMT, their features and benefits, and the impetus behind launching KantanNeural.
Master’s student Ewa Nitoń of University College London submitted her thesis as part of the MSc degree in Scientific, Technical and Medical Translation with Translation Technology. The following guest article is a reflection on her research into the application of Machine Translation in a medical context. Ewa was supervised by Teaching Fellow and Lecturer Dr. Emmanouela Patiniotaki, and she used KantanMT.com for her MSc research.
Kirti Vashee, a well-known Machine Translation veteran and independent MT consultant, is currently writing a series on expert MT systems on his blog, eMpTy Pages. Kirti's in-depth posts and interviews not only highlight the MT buyer’s expectations, but also stress what expert MT developers are doing differently.
In his blog, Kirti introduces the reader to “competent MT technology alternatives available in the market today.” To date he has covered tauyou, Iconic and KantanMT. As Kirti points out, our client base consists of Language Service Providers as well as multinational enterprises. What makes KantanMT attractive to both of these diverse client groups is its highly customisable, bespoke solution, which can be tailored to the requirements of each client. Our clients can easily build their own Custom Machine Translation (CMT) engines, or they can have our Professional Services team build them.
Following the announcement of a direct collaboration between KantanLabs and the ADAPT Centre for Digital Content Technology, we got in touch with Professor Andy Way of the School of Computing at Dublin City University and the ADAPT Centre to ask him about innovations in the field of automated translation, as well as his thoughts on the engagement between KantanLabs and ADAPT.
A question commonly asked within the localization industry is which is better: Rule-Based or Statistical Machine Translation systems? While both approaches have their merits, the question in my mind is which offers the greater future potential and the better value for LSPs (Language Service Providers) that are considering a service offering which includes an element of Machine Translation.
According to Don DePalma and his team at Common Sense Advisory, if you’re an LSP and haven’t yet received an RFQ (Request for Quotation) that includes an element of Machine Translation, then you’re rapidly becoming the exception!
So, as a successful LSP entrepreneur, which wagon should you hitch your horses to: Rule-Based or Statistical Machine Translation?
First of all, what is Machine Translation?
Machine translation (MT) is automated translation, or “translation carried out by a computer”, as defined in the Oxford English Dictionary. It is the process by which computer software is used to translate a text from one natural language to another.
Machine Translation systems have been in development since the 1950s; however, the technology required to build successful MT systems was not yet up to the task, and research was largely set aside. Over the last 15 years, as computational resources have become more mainstream and the internet has opened up a wider multilingual, global community, interest in Machine Translation has been renewed.
Three types of Machine Translation system are available today: Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT), and hybrid systems that combine RBMT and SMT.
Rule-Based Machine Translation Technology
Rule-based machine translation relies on a large number of built-in linguistic rules and extensive bilingual dictionaries for each language pair. An RBMT system parses the source text and builds an intermediate (transitional) representation of it, applies its rule set to transfer the grammatical structure of the source language into that of the target language, and then generates the target-language text. This process requires extensive lexicons with morphological, syntactic, and semantic information, as well as large sets of rules.
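To make the parse, transfer and generate steps concrete, here is a minimal Python sketch of a transfer-based translator. It is a hypothetical illustration, not how any particular RBMT product works: the five-word English-to-Spanish lexicon, the part-of-speech tags and the single reordering rule (English places adjectives before nouns, Spanish usually after) are all assumptions chosen for brevity.

```python
# Toy transfer-based rule-based translator (English -> Spanish).
# The lexicon, tags and single reordering rule are hypothetical;
# real RBMT systems use far larger lexicons and rule sets.

# Lexicon: source word -> (target word, part-of-speech tag)
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "black": ("negro", "ADJ"),
    "car": ("coche", "NOUN"),
    "cat": ("gato", "NOUN"),
}


def analyse(sentence):
    """Analysis: tag each source word; unknown words are tagged UNK."""
    return [(word, LEXICON.get(word, (word, "UNK"))[1])
            for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: apply the rule ADJ NOUN -> NOUN ADJ."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "ADJ" and result[i + 1][1] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return result


def generate(tagged):
    """Generation: look up each word's target form; pass unknown words through."""
    return " ".join(LEXICON[word][0] if word in LEXICON else word
                    for word, _ in tagged)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("the red car"))
print(translate("the black cat"))
```

Even this toy shows where the cost comes from: it ignores gender agreement (a feminine noun such as "casa" would need "roja", not "rojo"), and every such gap means another hand-written rule to add and maintain.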
In most cases, customizing an RBMT system involves two stages of investment: an initial investment that significantly increases quality at limited cost, and an ongoing investment to improve quality incrementally. While rule-based MT brings companies to a reasonable quality threshold, the subsequent quality improvement process is generally long and expensive. This has been a contributing factor in the slow adoption of MT in the localization industry.
Surely, there must be a better approach!
Statistical Machine Translation Technology
Statistical Machine Translation (SMT) uses statistical translation models generated from the analysis of monolingual and bilingual content. Essentially, this approach uses computing power to build sophisticated data models that translate from a source language into a target language. Because these models are learned from data rather than written by hand as linguistic rules, SMT is a far simpler option, and this has been a significant factor in the broader adoption of statistical machine translation technology in the localization industry.
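As a rough illustration of how translation probabilities can be learned from bilingual content alone, the sketch below implements a simplified form of IBM Model 1, one of the classic word-alignment models that statistical systems build on, trained with expectation maximization (EM). It is not how any specific SMT engine is implemented: the three-sentence German-English corpus is a toy, the function name `train_ibm_model1` is our own, and the usual NULL source word is omitted for brevity.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate t[(f, e)] = P(f | e) with EM (simplified IBM Model 1).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Simplification: no NULL source word.
    """
    target_vocab = {f for _, tgt in corpus for f in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each target word among the source words it may align to.
        for src, tgt in corpus:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: re-estimate probabilities from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Toy parallel corpus: German source, English target.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus, iterations=10)

# Print the most probable English word for each German word.
for e in ["das", "buch", "ein", "haus"]:
    candidates = {f: p for (f, src), p in t.items() if src == e}
    best = max(candidates, key=candidates.get)
    print(e, "->", best, round(candidates[best], 3))
```

No dictionary is supplied; the pairings emerge from co-occurrence statistics alone. After a few iterations, the most probable pairs come out as das/the, buch/book, ein/a and haus/house.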
Building SMT models is a relatively quick and simple process. With current systems, users can upload training material and have an MT engine generated in a matter of hours. While it is generally thought that a minimum of two million words is required to train an engine for a specific domain, it is possible to reach an acceptable quality threshold with much less. The technology relies on bilingual corpora, such as translation memories and glossaries, from which the system learns language patterns; monolingual data is used to improve the fluency of the output, as the engine has more text examples to choose from. SMT engines produce higher-quality output when trained on domain-specific data, for example from the medical, financial or technical domains.
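The role of monolingual data can be sketched with a language model, the component that SMT systems typically use to score fluency. The minimal example below, assuming a tiny hypothetical target-language corpus and add-one (Laplace) smoothing for simplicity, trains a bigram model and compares two orderings of the same words.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Train an add-one smoothed bigram language model on monolingual text.

    Returns a function giving the log-probability of a sentence.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(tokens[:-1])               # words used as context
        bigram_counts.update(zip(tokens, tokens[1:]))    # adjacent word pairs
    # Possible next words: every training word plus the end marker.
    vocab_size = len({w for s in sentences for w in s.split()}) + 1

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(prev, word)] + 1)
                     / (context_counts[prev] + vocab_size))
            for prev, word in zip(tokens, tokens[1:])
        )

    return log_prob


# Hypothetical in-domain monolingual text in the target language.
monolingual = [
    "the red car is fast",
    "the red house is old",
    "a red car stopped",
]
lm = train_bigram_lm(monolingual)

# Same words, two orders: the more fluent one gets the higher score.
for candidate in ["the red car", "the car red"]:
    print(candidate, round(lm(candidate), 2))
```

Production systems typically use higher-order n-grams and better smoothing methods such as Kneser-Ney, but the principle is the same: word sequences that occur frequently in the monolingual data score higher, which is also why in-domain data helps.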
SMT technology is CPU-intensive and requires substantial hardware to run translation models at acceptable performance levels. However, the introduction of cloud services and the increasing availability of bilingual corpora are having a dramatic effect on the popularity of SMT systems, leading to a higher adoption rate in the language services industry.
RBMT vs. SMT
- RBMT can achieve good results, but the development and customization costs for a good-quality system are very high. In terms of investment, the customization cycle needed to reach the quality threshold can be long and costly.
- RBMT systems can be built with much less data than SMT systems, instead using dictionaries and language rules to translate. This sometimes results in a lack of fluency.
- Language is constantly changing, which means rules must be managed and updated where necessary in RBMT systems.
- SMT systems can be built in much less time and do not require linguistic experts to apply language rules to the system.
- SMT models require state-of-the-art computer processing power and storage capacity to build and manage large translation models.
- SMT systems mimic the style of their training data, generating output based on the frequency of patterns, which allows them to produce more fluent output.
Statistical Machine Translation technology is gaining acceptance and is by far the clear leader of the two technologies. The increasing availability of cloud-based computing provides a solution to the high processing power and storage capacity required to run SMT technology effectively, making SMT a game changer for the localization industry.
Training data for SMT engines is becoming more widely available, thanks to the internet and the increasing volume of multilingual content created by both companies and private internet users. High-quality aligned bilingual corpora are still expensive and time-consuming to create, but once created they become a valuable asset to any organization implementing SMT technology, with translations benefiting from economies of scale over time.
Tony O’Dowd, Founder and Chief Architect, KantanMT.com