
RBMT vs SMT


A commonly asked question within the localization industry is which is better: Rule-Based or Statistical Machine Translation systems. While both approaches have their merits, the question in my mind is which offers the best future potential and best value for LSPs considering a future offering that includes an element of Machine Translation?

According to Don DePalma and his team at Common Sense Advisory, if you’re an LSP and haven’t been asked to provide an RFQ (Request for Quotation) that includes an element of Machine Translation, then you’re rapidly becoming the exception!

So as a successful LSP entrepreneur, which is the best wagon to hitch your horses to: Rule Based or Statistical Machine Translation?

First of all, what is Machine Translation?

Machine translation (MT) is automated translation, or "translation carried out by a computer", as defined in the Oxford English Dictionary. It is the process by which computer software is used to translate a text from one natural language to another.

Machine Translation systems have been in development since the 1950s; however, the technology required to build successful MT systems was not yet up to the task, and research was largely set aside. In the last 15 years, as computational resources have become more mainstream and the internet has opened up a wider multilingual and global community, interest in Machine Translation has been renewed.

There are three different types of Machine Translation systems available today. These are Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT) and hybrid systems – a combination of RBMT and SMT.

Rule-Based Machine Translation Technology

Rule-based machine translation relies on countless built-in linguistic rules and gigantic bilingual dictionaries for each language pair. An RBMT system works by parsing the source text and creating a transitional representation, from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, together with large sets of rules that transfer the grammatical structure of the source language into the target language.
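The parse, transfer, and generate steps described above can be sketched in miniature. The lexicon, part-of-speech tags, and the single reordering rule below are all hypothetical toy stand-ins: a real RBMT system would use full morphological analysis and hundreds of structural rules per language pair.

```python
# Toy sketch of the RBMT pipeline: parse -> structural transfer -> generation.
# Illustrative English->Spanish entries only; not a real lexicon.
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}
POS = {"the": "DET", "red": "ADJ", "car": "NOUN"}

def translate(sentence: str) -> str:
    tokens = sentence.lower().split()
    # Step 1: "parse" the source text (here, just tag each token)
    tagged = [(tok, POS[tok]) for tok in tokens]
    # Step 2: structural transfer rule: English ADJ NOUN -> Spanish NOUN ADJ
    transferred, i = [], 0
    while i < len(tagged):
        if i + 1 < len(tagged) and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            transferred.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            transferred.append(tagged[i])
            i += 1
    # Step 3: generate target text via the bilingual dictionary
    return " ".join(LEXICON[tok] for tok, _ in transferred)

print(translate("the red car"))  # -> "el coche rojo"
```

Even this toy version hints at the cost problem: every new word needs a dictionary entry, and every new construction needs a hand-written rule.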

In most cases, there are two steps: an initial investment that significantly increases the quality at a limited cost, and an ongoing investment to increase quality incrementally. While rule-based MT brings companies to a reasonable quality threshold, the quality improvement process is generally long and expensive. This has been a contributing factor to the slow adoption and usage of MT in the localization industry.

Surely, there must be a better approach!

Statistical Machine Translation Technology

Statistical Machine Translation (SMT) utilizes statistical translation models generated from the analysis of monolingual and bilingual content. Essentially, this approach uses computing power to build sophisticated data models that translate text from one language into another. This makes SMT a far simpler option to deploy, and that simplicity has been a significant factor in the broader adoption of statistical machine translation technology in the localization industry.

Building SMT models is a relatively quick and simple process. Using current systems, users can upload training material and have an MT engine generated in a matter of hours. While it is generally thought that a minimum of two million words is required to train an engine for a specific domain, it is possible to reach an acceptable quality threshold with much less. The technology relies on bilingual corpora such as translation memories and glossaries for the system to learn the language patterns, while monolingual data is used to improve the fluency of the output, as the engine has more text examples to choose from. SMT engines will produce higher-quality output if trained using domain-specific training data, such as medical, financial or technical material.

SMT technology is CPU intensive and requires an extensive hardware configuration to run translation models for acceptable performance levels. However, the introduction of cloud services, and the increasing availability of bilingual corpora are having a dramatic effect on the popularity of SMT systems, which is leading to a higher adoption rate in the language services industry.

RBMT vs. SMT

The Verdict

Statistical Machine Translation technology is growing in acceptance and is, of the two technologies, the clear leader. The increasing availability of cloud-based computing provides a solution to the high processing power and storage capacity required to run SMT technology effectively, making SMT a game changer for the localization industry.

Training data for SMT engines is becoming more widely available, thanks to the internet and the increasing volumes of multilingual content being created by both companies and private internet users. High-quality aligned bilingual corpora are still expensive and time-consuming to create but, once created, become a valuable asset to any organization implementing SMT technology, with translations benefiting from economies of scale over time.

Tony O’Dowd, Founder and Chief Architect, KantanMT.com
