A few weeks ago we mentioned that the Machine Translation market is expected to reach USD 983.3 million by 2022. In yet another global industry forecast, it was announced that the Natural Language Processing (NLP) market for the Healthcare and Life Sciences Industry is projected to grow to USD 2.67 billion, more than doubling its current value of USD 1.10 billion. In this post, we will discuss the present state of the art of Machine Translation in Medicine, Healthcare and Clinical Practices, and at the same time delve into other recent innovations in technology that can enhance the language industry within Healthcare.
“My Doctor can surely tell me what’s wrong with me in my language?”
The short answer is – no, a healthcare provider may not always be able to communicate in your local language – especially if you are travelling or in another country. About 1 in 50 patient visits to the doctor will require an interpreter, and thus translated content or verbal translation is essential for the contemporary Healthcare and Life Sciences Industry. For a more detailed and enlightening view of this issue, read this article on Machine learning for medicine and this study on healthcare interpreting.
The fact of the matter is that the needs of the Healthcare industry go beyond point-of-care healthcare by doctors and extend to medical documents, web medical help, insurance claims forms, patient records, educational materials, studies and papers, warnings and IVR scripts, to name just a few. This evidently increases the challenges faced by the Healthcare industry. One of our major partners, the CNGL Centre for Global Intelligent Content, carried out extensive research in medical care to create a search system that allows users to access biomedical data from a variety of different sources. You can read more about the project in their blog.
So what’s the answer?
Why, Natural Language Processing (NLP) and Machine Translation (MT), of course! NLP solutions for the Healthcare Industry can be broadly categorised into rule-based, statistical, and hybrid solutions. These categories mirror those of Machine Translation and work on the same ground rules. Rule-based NLP technologies work from a set of rules provided by humans. Statistical NLP solutions incorporate technologies such as machine learning and use statistical relationships observed in language data to derive a solution. Hybrid NLP is the combination of both rule-based and statistical NLP technologies.
How can Statistical Machine Translation (SMT) Help?
Machine Translation can help the Healthcare Industry by automatically translating text or speech from one source language into another target language. Statistical Machine Translation (SMT) translates a given string in the source text into a string in the target language. Simply put, among all possible target strings, an SMT system like KantanMT selects the string with the highest probability of being a match. Modern SMT is based on the intuition that a better way to compute these probabilities is to consider the behaviour of phrases, or sequences of words, rather than single words. In addition to the translation model, SMT systems use a language model, which is usually framed as a probability distribution over strings that attempts to reflect how likely a string is to occur in a particular language.
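To make the idea of "selecting the string with the highest probability" concrete, here is a minimal Python sketch. It assumes the common noisy-channel formulation, in which each candidate is scored by multiplying a translation model probability by a language model probability. The candidate strings and all the probabilities are invented for illustration, and the function names are hypothetical; this is not a description of how KantanMT or any particular engine is implemented.

```python
import math

# Hypothetical candidate translations for the French source
# "le patient a de la fièvre". All numbers below are made up;
# a real system estimates them from parallel and monolingual corpora.

# Translation model: how well each candidate explains the source string.
TRANSLATION_MODEL = {
    "the patient has a fever": 0.40,
    "the patient has fever": 0.35,
    "patient the fever has": 0.25,
}

# Language model: how likely each candidate is as a target-language string.
LANGUAGE_MODEL = {
    "the patient has a fever": 0.05,
    "the patient has fever": 0.01,
    "patient the fever has": 0.0001,
}


def score(candidate):
    """Log-probability of a candidate: translation model times language model."""
    return math.log(TRANSLATION_MODEL[candidate]) + math.log(LANGUAGE_MODEL[candidate])


def best_translation(candidates):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=score)


if __name__ == "__main__":
    print(best_translation(list(TRANSLATION_MODEL)))
```

Working in log space avoids multiplying many very small numbers together, which matters once candidates are built from long sequences of phrases. Note how the language model penalises the scrambled word order even though its translation model score is not much lower.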
Building an SMT system requires substantial computational resources and a large volume of parallel corpora, aligned at the sentence level between the source and target languages. Building these corpora can often be a challenging task, especially in the Healthcare industry, where a huge variation in Named Entities is possible. While we will discuss the challenges with MT in Healthcare in a little more detail in the next section, it is sufficient to note here that SMT quality depends largely on the language pair and the specific domain being translated. As such, though the need for Machine Translated content in Healthcare cannot be denied, its credibility and increased usage in the vertical can only be expedited with more robust training data for the engine to “learn” from. KantanMT is a cloud-based, Customised SMT system, which inherently lends itself to this sort of machine learning or training.
To know more about how the Customised Machine Translation (CMT) by KantanMT can help you, ask for a demo today and shoot a mail to firstname.lastname@example.org.
Addressing potential challenges and pitfalls
First things first: Machine Translation as it stands today cannot perform without the help of human translators. So why should the Healthcare industry still use MT, or indeed, why does the Research and Markets study estimate a rise in the use of MT in the industry?
Simple answer: Content explosion! The Healthcare and Life Sciences Industry as it stands today cannot cater to an increasingly globalised world that requires medical help without the aid of MT – indeed, it is simply not feasible. Having mentioned that, we will quickly discuss the potential pitfalls and solutions of using MT, before rounding off with a look at the potential future of this industry.
Inaccurate machine translations may lead to misunderstandings in Healthcare. As such, if MT is being used, Healthcare experts must be ready to mitigate any misunderstanding through regular feedback. This feedback and the corrected translations can in turn be used to train the MT engines to translate better for the domain.
Back-translation, which involves feeding the translated text back into the translator to return it to the source language, might help estimate the accuracy and appropriateness of the translation. This is a best practice that should be carried out often in the Healthcare industry to avoid potentially risky situations.
The risk of misunderstanding increases when a patient has low literacy and a limited level of health education. Once again, in such cases, it is important that a trained human translator post-edits the MT output.
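The back-translation check described above can be sketched in a few lines of Python. This is only an illustration: the two dictionaries are hypothetical stand-ins for a real MT engine, the sentences are invented, and the 0.9 threshold is an arbitrary assumption. Word overlap is measured with the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical stand-ins for an MT engine in each direction.
# A real workflow would call an actual translation system here.
EN_TO_ES = {
    "take one tablet twice a day": "tome una tableta dos veces al día",
    "do not drive after taking this medicine": "no conduzca después de tomar este medicamento",
}
ES_TO_EN = {
    "tome una tableta dos veces al día": "take one tablet twice a day",
    "no conduzca después de tomar este medicamento": "do not drive after this drug",
}


def round_trip_similarity(source, forward, backward):
    """Translate forward, translate back, and compare word sequences (0.0 to 1.0)."""
    back_translation = backward[forward[source]]
    source_words = source.lower().split()
    back_words = back_translation.lower().split()
    return SequenceMatcher(None, source_words, back_words).ratio()


def needs_review(source, forward, backward, threshold=0.9):
    """Flag a sentence for human review when the round trip drifts too far."""
    return round_trip_similarity(source, forward, backward) < threshold


if __name__ == "__main__":
    for sentence in EN_TO_ES:
        flag = needs_review(sentence, EN_TO_ES, ES_TO_EN)
        print(f"{sentence!r}: review needed = {flag}")
```

A high round-trip score does not prove that the translation is correct, since errors can cancel out on the way back, so a check like this can only help decide which content a human translator looks at first.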
Even though MT is already being used extensively in the Healthcare industry and clinical settings, medical organisations must be extremely cautious about the application of the translated content. Machine Translation needs to be incorporated in the Healthcare industry, but raw MT output can’t be utilised as the final product. An expert translator should review the content before patients can benefit from the translation.
Because of the boom in content in the Healthcare industry (be it research materials or clinical content), MT is rapidly emerging as an accessible supplement to communication in the area. However, the performance of the engine remains imperfect and can vary greatly between language pairs. The call to action for the Healthcare industry right now, then, is to ensure that there is enough good quality legacy training data for engines to get “smarter” and to create a data pool that can help MT content be more relevant, precise and useful to the vertical.
To talk more about how the Customised Machine Translation (CMT) by KantanMT can help you, ask for a demo today and shoot a mail to email@example.com.
“Machine Learning for Medicine – Idibon.” Idibon. N.p., 29 May 2013. Web. 20 Oct. 2015.
“Machine Translation in Medicine. A Quality Analysis of Statistical Machine Translation in the Medical Domain.” Machine Translation in Medicine. A Quality Analysis of Statistical Machine Translation in the Medical Domain. N.p., n.d. Web. 20 Oct. 2015.
“Natural Language Processing Market for Health Care and Life Sciences Industry by Type, Region – Global Forecast to 2020.” Natural Language Processing Market for Health Care and Life Sciences Industry by Type, Region – Global Forecast to 2020. N.p., n.d. Web. 20 Oct. 2015.
“The MT Industry Is Evolving: At KantanMT, We Are Growing Too!” Web log post. KantanMT Blog. N.p., n.d. Web.
Randhawa, Gurdeeshpal, Mariella Ferreyra, Rukhsana Ahmed, Omar Ezzat, and Kevin Pottie. “Using Machine Translation in Clinical Practice.” Canadian Family Physician. College of Family Physicians of Canada, Apr. 2013. Web. 10 Oct. 2015.
KantanMT Founder and Chief Architect, Tony O’Dowd, was recently featured in one of Ireland’s major national newspapers, The Irish Times.
The author of the news article, Olive Keogh, is a business journalist who specialises in writing about innovative Irish enterprises and startups. With Olive’s kind permission, we are republishing the Irish Times article.
“It’s not widely known at home but Ireland has developed an international reputation for research in statistical machine translation. Trinity, DCU and UL are all recognised worldwide and 120 PhD students have graduated here with skills in the field in the last five years. That’s more than in any other country in Europe,” says Tony O’Dowd, the man behind KantanMT, a new scalable, high-speed machine translation system based on the Moses decoder and the Amazon Web Services and Cloud Computing infrastructure.
O’Dowd has spent almost 30 years in the software localization sector with companies such as Lotus Development Corporation and Symantec. Xcelerator, the company behind KantanMT, is O’Dowd’s second start-up, but he was also involved in the formation of FIT, a training organisation set up in 1998 to provide IT skills and training for the long-term unemployed.
Economics of the Cloud
“We are leveraging the Moses MT decoder and multiple streams of research from the Centre for Global Intelligent Content to make statistical machine translation (SMT) technology available to the masses,” he says.
“Traditional SMT systems are slow, expensive to deploy, time-consuming to customise and complex to manage. In short, not for the faint-hearted. I wanted to harness the economics of the cloud to solve these problems. Using hundreds of high-powered cloud-based servers to convert training data into data models also accelerated the process of customisation and the development of SMT engines.”
O’Dowd points out that in addition to the cost factor, traditional SMT solutions can produce translations of dubious quality. By focusing on advanced natural language processes and data processing algorithms, KantanMT also addresses these quality issues.
“Because of the costs involved, SMT tends to be used by large organisations with big budgets and plenty of people available to work on the system. The KantanMT platform removes this expense and complexity and makes it a far more practical and usable tool for businesses both big and small. Our clients can customise, improve and deploy their own engines in a matter of days,” O’Dowd says.
O’Dowd took his first steps as an entrepreneur in 2000 when he set up Alchemy Software Development. It quickly became a leading player in the software localization sector with over 27,000 licences in use worldwide. This success didn’t go unnoticed. The company was sold to the largest privately owned localization service provider, Translations.com, in March 2007.
Prior to setting up Alchemy, O’Dowd was technology manager for Symantec Corporation Ireland and responsible for establishing the organisation’s Asian localization hub in Japan. He was also executive vice-president of Corel Corporation and spent three years as a lecturer in Trinity College Dublin teaching microprocessor design and assembly language programming.
O’Dowd began working on the idea for KantanMT in 2011 while on a year “off” to retrain himself on cloud-based technologies. He employed an MBA student to do detailed research into the barriers preventing companies using SMT and says the major leap forward in computing and storage capacity provided by the cloud enabled him to build a platform for SMT systems that would have been inconceivable without it.
Xcelerator recently raised €1.1 million in seed funding from venture capital company Delta Partners and the Enterprise Ireland High Potential Start Up fund. Early versions of KantanMT were given away free to kill competition and grab market share but first revenues (based on a usage pricing model) began flowing this time last year and O’Dowd says it is now profitable. A second round of funding is planned for later this year.
The company currently employs 11 people in its offices in Dublin and Galway, but this is expected to rise to 20-25 by the end of 2015. Its focus is the export market and its biggest customers are independent software vendors from industries such as ecommerce, finance and electronics. The company also provides MT services to the language industry.
School of Hard Knocks
“Starting your first business is definitely daunting as everything is new and you’re travelling down every road for the first time,” O’Dowd says.
“Next time around there is a lot of commonality and because you’ve learned by engaging with the school of hard knocks, you’re better at anticipating the problems and meeting the challenges. You also have a better network of contacts, you’re less frazzled when things don’t go right and you can actually grow the business faster and at a higher level. You also get a better hearing from the funding community as they view you as a safe pair of hands.”
KantanMT is based in the Invent Building at DCU and O’Dowd says the resources and expertise provided by the Invent team were instrumental in getting KantanMT.com off the ground.
“KantanMT.com is the fastest growing SMT platform in the localization industry today. So far over 80.5 billion words have been uploaded to the platform as training data and more than 750 million words have been translated by our clients. When you consider this has all happened in the last nine months, the company is rapidly becoming one of the biggest translation hubs in the market,” O’Dowd says.
KantanMT took the time out to interview Steve Götz, Design & Innovation Lab Director at CNGL to find out a little more about his role in CNGL, his future outlook for the localization industry, and some of the projects he is working on.
Originally from the US, Steve holds degrees in Computer Science, Biomolecular Science, and an MBA in Technology and Innovation Strategy. After completing his master’s degree at Said Business School, University of Oxford, Steve joined the CNGL Centre for Global Intelligent Content in 2008, where he was drawn to the unique business and research ecosystem supporting Irish entrepreneurs.
Steve took on the role of Director within the CNGL Design & Innovation Lab (d.Lab), where he leads designers, developers and researchers in developing innovative products for established industry partners like Cisco, Intel and McAfee, and startups like KantanMT, Scream Technologies, and Emizar.
What is CNGL?
Steve: The CNGL Centre for Global Intelligent Content is an academia-industry partnership supported by Science Foundation Ireland, and CNGL researchers are focused on developing new ‘disruptive’ technologies that will have a positive impact on both society and industry. Its goal is to bridge the gap between research and products by fostering a unique ecosystem between academic research and the commercial industry. Researchers come from CNGL’s four academic partners, all Irish universities: Dublin City University, Trinity College Dublin, University College Dublin and the University of Limerick.
CNGL has 17 industry partners. These range from large companies like Symantec and Microsoft, which have a strong global foothold in the technology industry, to commercial startups like KantanMT that have the potential to become strong industry leaders.
Can you explain the Commercial/Research Ecosystem?
Steve: The ecosystem created and maintained through CNGL provides solutions to the industry in a number of different ways. Large established companies approach CNGL when they need a business or technology solution. They want to find a solution through research that they can either implement themselves or outsource.
When an entrepreneur approaches CNGL with a business idea, CNGL works with the entrepreneur to develop a plan and product roadmap to turn their idea into a business realization. In cases where the business idea is compatible with the research, or a larger company wishes to outsource, the large company then becomes a first reference customer for the new startup.
How was KantanMT a part of this ecosystem?
Steve: KantanMT became part of this ecosystem when entrepreneur Tony O’Dowd approached CNGL with his business idea – an automated Statistical Machine Translation (SMT) service that operates on the cloud. This was a good fit with the type of research undertaken by CNGL: natural language parsing, text analytics, machine learning and predictive analytics.
Through market analysis Tony used the Minimum Viable Product (MVP) strategy to prove there was a need in the market for his idea. Together Tony and I worked out a suitable product roadmap and plan of what was to be done to get the lean startup up and running.
As Tony developed the KantanMT platform infrastructure, CNGL researchers were also working with the same timeline to develop an analytics feature that would fit into the platform adding value to the product and Tony’s business idea. When the platform was ready, the KantanWatch™ technology just dropped into its place on the platform.
The cooperation and open communication channels between Tony and the CNGL research team meant a viable product was ready to deploy in the market in a very short time frame, making the best possible use of two critical resources: funding and time.
KantanWatch technology, now licensed to KantanMT, is an analytics feature that gives members the ability to monitor the performance of their cloud-based customized KantanMT engines. The key point of KantanWatch is to highlight areas where quality improvements to the engine can be made, and to track the engine’s progress over time. This opened up a new flexibility for MT users not offered by existing MT vendors.
How do you see the localization industry developing over the next few years?
Steve: The future of the localization industry will most likely be driven by disruptive innovation. The industry is worth approx. $9 billion and this number is set to increase. As with other industries, changes in the localization industry may come from investment outside the industry.
An example of a successful ‘disruptive innovation’ that changed how we use mobile phones came from the iPhone. Apple used the iPhone to shift the focus of the mobile phone from its traditional call and text functions to a more interactive user experience. This disruptive innovation had a monumental impact on how we use and interact with mobile phones and it came from a computer manufacturer rather than a mobile phone or service provider.
The same may happen with the localization industry, where the disruption might not come from the industry itself, but instead from outside traditional channels. Within the language services industry, companies like Gengo are playing a part in changing and shaping the industry by successfully introducing crowdsourcing models, and other Language Service Providers (LSPs) are beginning to follow suit. They are adapting to the demand for real-time translations that technology and the web are driving.
The concept of the ‘Digital native translator’ is becoming more popular and this concept is being fuelled by the need for not only real-time translations, but also the developments of micro blogging. Micro blogging gained popularity through Tumblr, Twitter, and Facebook where social media users post updates and share small or ‘micro’ pieces of information with their friends and the wider online community.
These developments in technology, and in how information is created and consumed, have not only made an impact on our everyday lives but have also been instrumental in disaster relief and crisis management. This was evident during the Japanese earthquake in March 2011, when the Pacific Disaster Centre in Hawaii posted information about the earthquake on Twitter before it was reported by CNN.
Social media and text messages proved the most reliable methods of communication. This was also the case during the Haiti Earthquake in January 2010, where social media and SMS were used to identify people trapped or injured.
One of the presentations by Julie Dugdale, University of Grenoble 2, Bartel Van de Walle, Tilburg University, and Corinna Koeppinghoff, Tilburg University was a study on ‘Social Media and SMS in the Haiti Earthquake’. While the study found social media and SMS communications useful in the aftermath of the earthquake, it also highlighted a couple of issues. One of those issues was “the difficulties of processing information in a non-standard format from different sources and in various languages”.
Providing real-time translations is an area that will develop in the future and CNGL have been working on projects that may help facilitate this.
What other CNGL projects are there?
Steve: One project CNGL is currently working on is called Kanjingo, which is a real-time, mobile, and micro-crowdsourcing platform for hyper-local social translation.
The platform is a tool that translators can use to translate and post-edit small chunks of text or microposts from social media such as Twitter on their mobile device. This translation and post-editing service facilitated by Kanjingo is an example of how CNGL research contributes to society by providing a medium for valuable real-time translations to a variety of users including disaster relief organizations.
A product like Kanjingo has a lot of potential, not only in disaster situations but in any situation where small chunks of text require real-time translations, like news reporting or international events. Language is always changing and the way we communicate through social media is very different to standard writing styles. The Kanjingo platform will be able to generate high quality ‘social media speak’ bilingual language assets that can be incorporated into existing CAT tools, for example to train MT engines.
For more information on the CNGL Centre for Global Intelligent Content please go to www.cngl.ie
KantanMT recently announced the forthcoming release of KantanAnalytics™, a tool that provides segment level quality analysis for Machine Translation output. KantanMT has developed this new technology in partnership with the CNGL Centre for Global Intelligent Content, which is also based at Dublin City University.
KantanAnalytics measures the quality of the translations generated by KantanMT engines. The measurement provides a quality score for each segment translated through a KantanMT engine. This means that Language Service Providers (LSPs) will be able to:
accurately identify segments that require the most post-editing effort
accurately identify segments that match the client’s quality standards
better predict project completion times
offer more accurate pricing to their clients and set a price during the early stages of the project
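As a rough illustration of how segment-level scores could support the tasks listed above, the following Python sketch splits segments into those that meet a quality bar and those that need post-editing, then derives a simple cost estimate. Everything here is an assumption made for the example: the 0 to 100 score scale, the threshold, the per-word rate and all names are hypothetical, and none of it reflects the actual output format or scoring of KantanAnalytics.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    word_count: int
    quality: float  # hypothetical score from 0 to 100, higher is better


def triage(segments, accept_threshold=85.0):
    """Split segments into those meeting the quality bar and those needing post-editing.

    Segments needing work are returned lowest score first, so the ones
    likely to require the most post-editing effort appear at the top.
    """
    ready = [s for s in segments if s.quality >= accept_threshold]
    to_edit = sorted(
        (s for s in segments if s.quality < accept_threshold),
        key=lambda s: s.quality,
    )
    return ready, to_edit


def estimate_cost(to_edit, rate_per_word=0.05):
    """Estimate post-editing cost from the word count of flagged segments."""
    return round(sum(s.word_count for s in to_edit) * rate_per_word, 2)


if __name__ == "__main__":
    # Invented sample data for demonstration only.
    segments = [
        Segment("Store below 25 degrees.", 10, 92.0),
        Segment("Consult your doctor before use.", 20, 60.0),
        Segment("Keep out of reach of children.", 8, 40.0),
    ]
    ready, to_edit = triage(segments)
    print(len(ready), "segments ready;", len(to_edit), "to post-edit")
    print("Estimated post-editing cost:", estimate_cost(to_edit))
```

Because the estimate depends only on scores available as soon as the engine has produced its output, a calculation of this kind could be run at the start of a project rather than after post-editing has begun.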
KantanAnalytics is being rolled out to a sample of KantanMT members this month, July 2013. It will be made available to all members of the KantanMT platform in September 2013.
The CNGL Centre for Global Intelligent Content
CNGL was established in 2007 as a collaborative academia-industry research centre aiming to break new ground in digital intelligent content and to “revolutionise the global content value chain for enterprises, communities, and individuals” (CNGL, 2013).
CNGL says that it intends to “pioneer development of advanced content processing technologies for content creation, multilingual discovery, translation and localization, personalisation, and multimodal interaction across global markets”. It adds that “these technologies will revolutionise the integration and unification of multilingual, multi-modal and multimedia content and interactions, and drive innovation across the global content value chain” (CNGL, 2013).