Interview with Nikos on Cross-Language Information Retrieval


Nikos Katris submitted his thesis, ‘Evaluation of Two Statistical Machine Translation Systems within a Greek-English Cross-Language Information Retrieval Architecture’, to the University of Limerick in October 2015. In his research he compared the results of KantanMT with those of the Moses system for information retrieval.

Nikos was supervised by Dr Richard Sutcliffe at the University of Limerick’s College of Science and Engineering Department of Computer Science and Information Systems (CSIS). Nikos kindly agreed to discuss his research in an interview. The University of Limerick and the Localisation Research Centre are KantanMT’s academic partners.

What’s New in eCommerce in 2016? More Localization and Better Machine Translation


If the post-Black Friday sales numbers are anything to go by, there is no longer any question that the face of eCommerce is changing, and with it, brick-and-mortar retailers have started rethinking their business strategy. As this news piece about Scotland experiencing a major dip in shoppers goes to show, demand for online shopping will increase substantially in 2016. This in turn means that the need for content localization and translation for eTailers (online retailers) will be even more pressing in the coming year. As the often-quoted Common Sense Advisory report points out, 72.4% of consumers are more likely to buy from a site that is in their native language. Indeed, localization is no longer a nice-to-have feature – it is now a must-have for all eCommerce businesses that aim to sell their products globally.

Chris Bishop, Managing Director of Microsoft Research, Cambridge, UK, predicts that “by 2026 we will have ubiquitous, human-quality translation among all European languages, thereby eliminating the language barrier throughout Europe.” Bishop’s prediction does not sound far off the mark when we consider that in the past ten years, Machine Translation (MT) has improved by leaps and bounds. Early MT was rules-based (RBMT): it required sets of linguistic rules and worked moderately well within a prescribed domain. However, it was resource-intensive and cost-prohibitive for many.


The turning point for using MT in business came with the advent of the Internet, the SaaS model and the open source development model for software. These changes in technology helped build the foundation for Statistical Machine Translation (SMT) research, and subsequently the open source development of the Moses decoder. Moses enabled researchers and private companies to commercialise Statistical MT and develop it into the custom solutions it is today. 2016 and beyond will see further research in Natural Language Processing (NLP), deep learning and machine learning, contributing directly to immense improvements in the field of Custom MT.

The KantanMT Business Team has published a new white paper that provides an in-depth understanding of how eTailers will be affected by Machine Translation in 2016. It also discusses how Custom Machine Translation (CMT), compared with generic MT systems, will emerge as the clear winner in solving eTailing localization issues in the coming year.

Here are some highlights of how MT will evolve in 2016 for eTailers:

  • eTailers will use either CMT alone or CMT combined with human post-editing to reach new markets ahead of their competitors
  • With increased multilingual customer demand for products, content translation will find support in auto-scaling
  • Custom Machine Translation will be used more widely as eCommerce customers expand globally


Summary:

Machine Translation is no longer a luxury. It is an essential component, a Tier 1 application that supports global business. The purpose of this paper is to highlight how Machine Translation and, more importantly, Custom Machine Translation technology have come of age in terms of quality, speed and scalability. In 2016 and beyond, eTailers need to review their globalization strategies to reflect these advances in technology, so that they can maximise their global growth potential.

Download the KantanMT white paper on eCommerce today!


Machines to the Rescue Once Again in Medicine: How can Machine Translation Contribute to Healthcare?

A few weeks ago we mentioned how the Machine Translation market is expected to reach USD 983.3 million by 2022. In yet another global industry forecast, it was announced that the Natural Language Processing (NLP) market for the Healthcare and Life Sciences industry is projected to grow to USD 2.67 billion, more than doubling the current value of USD 1.10 billion. In this post, we will discuss the present state of the art of Machine Translation in medicine, healthcare and clinical practice, and also delve into other recent innovations in technology that can enhance the language industry within Healthcare.

“My Doctor can surely tell me what’s wrong with me in my language?”

The short answer is – no, a healthcare provider may not always be able to communicate in your local language, especially if you are travelling or in another country. About 1 in 50 patient visits to the doctor will require an interpreter, so translated content or verbal translation is essential for the contemporary Healthcare and Life Sciences industry. For a more detailed and enlightening view of this issue, read this article on machine learning for medicine and this study on healthcare interpreting.


The fact of the matter is that the needs of the Healthcare industry go beyond point-of-care treatment by doctors and extend to medical documents, web medical help, insurance claim forms, patient records, educational materials, studies and papers, warnings and IVR scripts, to name just a few. This evidently increases the challenges faced by the Healthcare industry. One of our major partners, the CNGL Centre for Global Intelligent Content, carried out extensive research in medical care to create a search system that allows users to access biomedical data from a variety of different sources. You can read more about the project in their blog.

So what’s the answer?

Why, Natural Language Processing (NLP) and Machine Translation (MT), of course! NLP solutions for the Healthcare industry can be broadly categorised into rule-based, statistical and hybrid solutions. These categories mirror those of Machine Translation and work on the same ground rules. Rule-based NLP technologies work from a set of rules provided by humans. Statistical NLP solutions incorporate technologies such as machine learning and derive a solution from patterns learned in language data. Hybrid NLP combines rule-based and statistical technologies.

How can Statistical Machine Translation (SMT) Help?
Machine Translation can help the Healthcare industry by automatically translating text or speech from a source language into a target language. Statistical Machine Translation (SMT) translates a given string in the source language into a string in the target language. Simply put, SMT systems like KantanMT select, from among all possible target strings, the one with the highest probability. Modern SMT is based on the intuition that a better way to compute these probabilities is by considering the behaviour of phrases, or sequences of words. In addition to the translation model, SMT systems use a language model, which is usually framed as a probability distribution over strings that attempts to reflect how likely a string is to occur in a particular language.
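
As a rough illustration of this selection step, here is a minimal Python sketch. It is not how KantanMT or Moses is implemented: the candidate strings, the probabilities and the function names are all invented for the example, and a real decoder searches a far larger space of phrase-based hypotheses. It only shows how a translation-model score and a language-model score can be combined to pick the most probable target string.

```python
import math

# Hypothetical translation-model scores, P(source | candidate target).
# These numbers are made up for illustration.
CANDIDATES = {
    "the patient has a fever": 0.40,
    "patient the fever has a": 0.45,
}

# Hypothetical bigram language model for the target language.
BIGRAMS = {
    ("<s>", "the"): 0.5,
    ("the", "patient"): 0.2,
    ("patient", "has"): 0.3,
    ("has", "a"): 0.4,
    ("a", "fever"): 0.1,
    ("fever", "</s>"): 0.5,
}


def language_model_logprob(sentence, bigram_probs, floor=1e-4):
    """Log-probability of a sentence under a toy bigram model.

    Unseen bigrams get a small floor probability instead of zero.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log(bigram_probs.get((prev, word), floor))
        for prev, word in zip(tokens, tokens[1:])
    )


def best_translation(candidates, bigram_probs):
    """Pick the candidate maximising log P(source|target) + log P(target)."""
    def score(item):
        target, tm_prob = item
        return math.log(tm_prob) + language_model_logprob(target, bigram_probs)

    return max(candidates.items(), key=score)[0]


if __name__ == "__main__":
    print(best_translation(CANDIDATES, BIGRAMS))
```

In this toy setup the scrambled candidate has the slightly higher translation-model score, but the language model penalises its unlikely word order, so the fluent candidate is selected.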

Building an SMT system requires substantial computational resources and a large volume of parallel text, aligned at the sentence level, between the source and target languages. Building these corpora can be a challenging task, especially in the Healthcare industry, where a huge variation in Named Entities is possible. While we will discuss the challenges with MT in Healthcare in a little more detail in the next section, it is sufficient to note here that SMT quality depends largely on the language pair and the specific domain being translated. As such, though the need for Machine Translated content in Healthcare cannot be denied, its credibility and increased usage in the vertical can only be expedited with more robust training data for the engine to “learn” from. KantanMT is a cloud-based, Customised SMT system, which lends itself well to this sort of machine learning or training.

To learn more about how Customised Machine Translation (CMT) by KantanMT can help you, ask for a demo today by sending an email to demo@kantanmt.com.


Addressing potential challenges and pitfalls

First things first: Machine Translation as it stands today cannot perform without the help of human translators. So why should the Healthcare industry still use MT, or indeed, why does the Research and Markets study estimate a rise in the use of MT in the industry?

Simple answer: Content explosion! The Healthcare and Life Sciences industry as it stands today cannot cater to an increasingly globalised world that requires medical help without the aid of MT – indeed, it is simply not feasible. With that said, we will quickly discuss the potential pitfalls of using MT and their solutions, before rounding off with a look at the potential future of this industry.

  • Machine translation may lead to misunderstanding in Healthcare when translations are inaccurate. As such, if MT is being used, Healthcare experts must be ready to mitigate any misunderstanding through regular feedback. This feedback and the corrected translations can in turn be used to train the MT engines to translate better for the domain.
  • Back-translation, which involves cutting and pasting translated text back into the translator, might help estimate the accuracy and appropriateness of the translation. It is a best practice that should be carried out often in the Healthcare industry to avoid potentially risky situations.
  • The risk of misunderstanding increases for patients with low literacy and limited levels of health education. Once again, in such cases, it is important that a trained human translator post-edits the MT output.
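
The back-translation check mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two word-by-word "translators" are toy stand-ins invented for the example (a real workflow would call an actual MT engine), and the word-overlap ratio from the standard library's `difflib` is just one simple way to compare the original with its round trip.

```python
from difflib import SequenceMatcher

# Toy word-by-word dictionaries, invented for illustration only.
EN_TO_ES = {"take": "tome", "two": "dos", "tablets": "tabletas",
            "daily": "diariamente"}
ES_TO_EN = {"tome": "take", "dos": "two", "tabletas": "tablets",
            "diariamente": "every day"}


def toy_translate(text, dictionary):
    """Stand-in for an MT engine: replaces each word via a lookup table."""
    return " ".join(dictionary.get(word, word) for word in text.lower().split())


def round_trip_similarity(source_text, translate_forward, translate_back):
    """Translate forward and back, then compare word sequences (0.0 to 1.0)."""
    forward = translate_forward(source_text)
    back = translate_back(forward)
    ratio = SequenceMatcher(
        None, source_text.lower().split(), back.lower().split()
    ).ratio()
    return back, ratio


if __name__ == "__main__":
    back, ratio = round_trip_similarity(
        "Take two tablets daily",
        lambda text: toy_translate(text, EN_TO_ES),
        lambda text: toy_translate(text, ES_TO_EN),
    )
    print(back, round(ratio, 2))
```

A low score flags text that a human translator should review. A high score is not proof that the forward translation is correct, since errors can cancel out on the way back, so this kind of check supplements expert review rather than replacing it.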


Final words

Even though MT is already being used extensively in the Healthcare industry and clinical settings, medical organisations must be extremely cautious about the application of the translated content. Machine Translation needs to be incorporated in the Healthcare industry, but raw MT output can’t be utilised as the final product. An expert translator should review the content before patients can benefit from the translation.

Because of the boom in content in the Healthcare industry (be it research materials or clinical content), MT is rapidly emerging as an accessible supplement to communication in the area. However, the performance of the engine remains imperfect and can vary greatly between language pairs. The call to action for the Healthcare industry right now, then, is to ensure that there is enough good-quality legacy training data for engines to get “smarter”, and to create a data pool that can help MT content be more relevant, precise and useful to the vertical.

To talk more about how Customised Machine Translation (CMT) by KantanMT can help you, ask for a demo today by sending an email to demo@kantanmt.com.


Resources:

“Machine Learning for Medicine – Idibon.” Idibon. N.p., 29 May 2013. Web. 20 Oct. 2015.

“Machine Translation in Medicine. A Quality Analysis of Statistical Machine Translation in the Medical Domain.” N.p., n.d. Web. 20 Oct. 2015.

“Natural Language Processing Market for Health Care and Life Sciences Industry by Type, Region – Global Forecast to 2020.” N.p., n.d. Web. 20 Oct. 2015.

“The MT Industry Is Evolving: At KantanMT, We Are Growing Too!” Web log post. KantanMT Blog. N.p., n.d. Web.

Randhawa, Gurdeeshpal, Mariella Ferreyra, Rukhsana Ahmed, Omar Ezzat, and Kevin Pottie. “Using Machine Translation in Clinical Practice.” Canadian Family Physician. College of Family Physicians of Canada, Apr. 2013. Web. 10 Oct. 2015.