5 Global Companies Localizing Right


Globalization is no longer a modern phenomenon. With accelerating technological advancements in every sphere including communication, manufacturing and transport, even Globalization 2.0 is a somewhat dated concept. So what’s next?

Industry Leader Meets Academia: KantanMT Interviews Prof. Andy Way on MT’s Future and More


Following the announcement of a direct collaboration between KantanLabs and the ADAPT Centre for Digital Content Technology, we got in touch with Professor Andy Way from the School of Computing in Dublin City University and the ADAPT Centre to ask him about innovations in the field of automated translation, as well as his thoughts on the engagement between KantanLabs and ADAPT.

Machine Translation Trend: Translation Cycles Instead of One-Off Projects

KantanMT recently published a white paper on what global companies can expect to see in 2016 for Machine Translation (MT). The MT industry is rapidly changing and moulding itself to the technical needs and globalization requirements of the present day. Our white paper puts forward six major MT trends that all businesses need to heed in order to stay relevant and ahead of their competitors.


Improving workflow integration and efficiency with KantanAPI

What is the KantanAPI?

KantanAPI enables KantanMT clients to interact with KantanMT as an on-demand web service. It also provides a number of different services including translation, file upload and retrieval and job launches.

With the KantanAPI you can not only integrate KantanMT into your workflow systems but also receive on-demand translations from your KantanMT engines. All these services make the experience with Machine Translation as seamless as possible.

Accessing KantanAPI

Please Note: The API is only available to KantanMT members in the Enterprise Plan.

To access the KantanMT API you will first need your ‘API token’. This token can be found in the ‘API’ tab on the ‘My Client Profiles’ page of your KantanMT account.

Once you have your token you can use the API in a number of ways:

  1. Using the API tab on the ‘My Client Profiles’ page in the KantanMT Web interface
  2. Using the REST interface via HTTP GET or POST requests
  3. Using one of our various connectors, which are built using our KantanAPI
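As a rough illustration of option 2, the sketch below only assembles a GET request URL that carries the API token. The base URL and the parameter names (`auth`, `profile`, `segment`) are placeholders invented for this example and are not taken from KantanMT; the real endpoint and parameters are defined in the API technical documentation.

```python
from urllib.parse import urlencode

# Hypothetical values: the real base URL and parameter names are defined
# in the KantanAPI technical documentation, not in this sketch.
BASE_URL = "https://api.example.com/translate"


def build_get_url(api_token: str, profile: str, text: str) -> str:
    """Return a GET URL carrying the token, the profile name and the text."""
    query = urlencode({"auth": api_token, "profile": profile, "segment": text})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Replace the placeholders with your own token and profile name.
    print(build_get_url("YOUR_API_TOKEN", "my-profile", "Hello world"))
```

`urlencode` takes care of escaping spaces and special characters, which matters as soon as the text to translate contains punctuation.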

For more details on implementing your API solution via the REST interface, please see the full API technical documentation at the following link:

How to use KantanAPI?

Log in to your KantanMT account using your email and your password.

You will be directed to the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.

To use the ‘KantanAPI’ with a profile other than the ‘Active’ one, click on that profile first. Then click on the ‘API’ tab.

API tab

You will be directed to the ‘API Settings’ page. Now click on the ‘Launch API’ button.

Launching API

A ‘Launch API’ pop-up will now appear on your screen asking you ‘Are you sure you want to launch the API?’ Click ‘OK’.

launch Pop-up alert

The ‘API Status’ will now change from ‘offline’ to ‘initialising’, and the ‘Launch API’ button will change to ‘Launching API’.

Launching API

When your KantanAPI launches, the ‘API Status’ will change from ‘initialising’ to ‘running’, the ‘Launching API’ button will change to ‘Shutdown API’, and you should now be able to click on the ‘Translate’ button.

API running

Type the text you wish to translate in the text box and click on the ‘Translate’ button.

Translating

The translated text will now appear in the ‘Translated Text’ box. If you wish to make any changes to the translated text simply place the cursor inside the ‘Translated Text’ box and make the changes. Save these changes by clicking the ‘Retrain Engine’ button.

Retrain Engine

Test if your engine was successfully retrained by clicking the ‘Translate’ button. The retrained text will now appear in the ‘Translated Text’ box.

If you don’t wish to retrain your engine and you are happy with the translated text in the ‘Translated Text’ box, you may continue translating other text or shut down your KantanAPI by clicking the ‘Shutdown API’ button.

When you click the ‘Shutdown API’ button, a pop-up will appear asking ‘Are you sure you want to shut down the API?’ Click ‘OK’.

Shutdown Pop-up alert

The ‘Shutdown API’ button will now change to ‘Terminating API’, the ‘API status’ will change from ‘running’ to ‘terminating’, and you will no longer be able to click on the ‘Translate’ or ‘Retrain Engine’ buttons.

Terminating API

You will now be directed back to the initial screen on the API Settings page.
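The status changes described in this walkthrough can be summarized as a small state machine. The sketch below is only a model of what the interface shows: the status names come from the steps above, the return to ‘offline’ after termination is an assumption based on being sent back to the initial screen, and none of this is KantanMT code.

```python
# Status names are taken from the walkthrough; the transition table is an
# illustrative model, not part of the KantanAPI.
TRANSITIONS = {
    "offline": "initialising",   # after clicking 'Launch API'
    "initialising": "running",   # the launch has completed
    "running": "terminating",    # after clicking 'Shutdown API'
    "terminating": "offline",    # assumed: back to the initial screen
}


def next_status(status: str) -> str:
    """Return the status that follows the given one in the cycle."""
    return TRANSITIONS[status]


def can_translate(status: str) -> bool:
    """The 'Translate' button is only usable while the API is running."""
    return status == "running"


if __name__ == "__main__":
    status = "offline"
    for _ in range(4):
        print(status, "-> translate enabled:", can_translate(status))
        status = next_status(status)
```

If you script against the API, waiting for the ‘running’ status before sending text avoids requests made while the service is still initialising.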

API settings page

 

Additional Support

KantanAPI™ is one of the various machine translation services offered by KantanMT to improve productivity for our clients and enable them to be more efficient. For more information on KantanAPI or any KantanMT products please contact us at info@kantanmt.com.

For more details on the KantanMT API please see the following links and the video below:

Sue’s Top Tips for Building MT Engines

I’m new to machine translation, and one of the things I’ve been doing at KantanMT is learning how to refine training data with a view to building stock engines.

Stock engines are the optional training data provided by KantanMT to improve the performance of your customized MT engine. In this post I’m going to describe the process of building an engine and refining the training data.

The building process on the platform is quite simple. From your dashboard on the website, select “My Client Profiles”, where you will find two profiles that have already been set up: a default profile and a sample profile, both of which let you run translation jobs straight away.

To create your own customized profile, select ‘New’ at the top of the left-most column. This launches the Client Profile Wizard. Enter the name of your new engine; try to make this something meaningful, or follow an easily recognizable naming standard for your profiles. This makes it easier to tell your profiles apart when you have more than one.

When you select ‘next’, you will be asked to specify the source and target languages from drop-down menus. The wizard lets you distinguish between different variants of the same language, for example Canadian English or US English. Let’s say we’re translating from Canadian English to Canadian French. If you’re not sure which variant you need, have a quick look at the training data, which will give you the language codes.

The next step gives you an option to select a stock engine from a drop down menu. The stock engines are grouped according to their business area or domain.

You will see a summary of your choices. If you’re happy with them, select ‘create’. Your new engine will be shown in the list of your client profiles. However, while you have created your engine, you haven’t yet built it.

KantanMT Stock Engine Training data
Stock training data available for social and conversational domains on the KantanMT platform.

 

Building Your Engine

Selecting your profile from the list will make it the current active engine.  By selecting the Training Data tab you can upload any additional training data easily by using the drag and drop function. Then select the ‘Build’ option to begin building your engine.

It’s always a good idea to supply as much useful training data as possible. This ‘educates’ the engine in the way your organization typically translates text.

Once the build job has been submitted, you can monitor its progress in the ‘My Jobs’ page.

When the job is completed the BuildAnalytics™ feature is created. This can be accessed by clicking on the database icon to the left of the profile name. BuildAnalytics will give you feedback on the strength of your engine using industry standard scores, as well as details about your engine’s word count. The tabs across the page will give you access to more detail.

The summary tab lets you see the average BLEU, F-Measure and TER scores for the engine, and the pie charts show you a summary of the percentage scores for all segments. For more detail select the respective tabs and use the data to investigate individual segments.

KantanMT BuildAnalytics Feature
KantanBuildAnalytics provides a granular analysis of your MT engine.

 

A Rejects Report is created for every file of Training Data uploaded. You can use this to determine why some of your data is not being used, and improve the uptake rate of your data.

Gap analysis gives you an effective way to improve your engine with relevant glossary or noise lists, which you can upload to future engine builds. By adding these terminology files in either TBX (TermBase eXchange) or XLSX (Microsoft Excel Spreadsheet) format, you will quickly improve the engine’s performance.

The Timeline tab shows you the evolution of your engine over its lifetime. This feature lets you compare the statistics with previous builds, and track all the data you have uploaded. On a couple of occasions, I used the archive feature to revert to a previous build when the engine building process was not going according to plan.

KantanMT Timeline
KantanMT Timeline lets you view your entire engine’s build history.

 

Improving Your Engine

A great way to improve your engine’s performance is to analyze the rejects report for the files with a higher rejection rate. Once you understand the reasons segments are rejected, you can begin to address them. For example, an error 104 is caused by a difference in placeholder counts. This can be something as simple as the source language using the % sign where the target language uses the word ‘percent’. In this case a preprocessor rule can be created to fix the problem.
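To make the error 104 example concrete, here is a minimal sketch of a placeholder count check. It assumes, purely for illustration, that the only placeholder of interest is the % sign; how the platform itself defines and counts placeholders is not covered in this post, and the function names are my own.

```python
def placeholder_count(segment: str, placeholder: str = "%") -> int:
    """Count how often the placeholder occurs in one segment."""
    return segment.count(placeholder)


def has_placeholder_mismatch(source: str, target: str, placeholder: str = "%") -> bool:
    """True when source and target contain the placeholder a different number of times."""
    return placeholder_count(source, placeholder) != placeholder_count(target, placeholder)


if __name__ == "__main__":
    # The source uses the % sign, the target spells the word out: a mismatch.
    print(has_placeholder_mismatch("Save 20% today", "Save 20 percent today"))
    # Both sides use the % sign: the counts agree.
    print(has_placeholder_mismatch("Save 20% today", "Économisez 20% aujourd'hui"))
```

Running a check like this over a file before uploading it gives a quick estimate of how many segments are likely to be rejected for this reason.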

KantanMT Rejects Report Error 104
A detailed rejects report shows you the errors in your MT engine.

A PEX rule editor is accessed from the KantanMT drop down menu. This lets you try out your preprocessor rules, and see the effect that they have in the data. I would suggest directly copying and pasting from the rejects report to the test area and applying your PEX rule to ensure you’re precisely targeting the data concerned. You can get instant feedback using this tool.

Once you’re happy with the way the rules work on the rejected data it’s useful to analyze the rest of the data to see what effect the rules will have.  You want to avoid a situation where using a rule resolves 10 rejects, but creates 20 more. Once the rules are refined copy them to the appropriate files (source.ppx, target.ppx) and upload with the training data. Remember that the rules will run against the content in the order they are specified.

When you rebuild the engine they will be incorporated, and hopefully improve the scores.
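The point about rule order is easy to see with a couple of search-and-replace rules. The sketch below uses plain Python regular expressions rather than the actual PEX rule syntax, which is not shown in this post; the two rules are made up for the example.

```python
import re


def apply_rules(text: str, rules: list) -> str:
    """Apply (pattern, replacement) rules one after another, in list order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


# Hypothetical rules: rewrite '20%' as '20 percent', then split 'percent'.
RULES_A = [(r"(\d+)\s*%", r"\1 percent"), (r"percent", "per cent")]
# The same rules in the opposite order.
RULES_B = list(reversed(RULES_A))

if __name__ == "__main__":
    print(apply_rules("20% off", RULES_A))  # both rules take effect
    print(apply_rules("20% off", RULES_B))  # the 'percent' rule runs too early
```

With the first ordering the second rule sees the output of the first; with the reversed ordering it runs before there is anything for it to match, so the two results differ.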

Sue’s 3 Tips for Successfully Building MT Engines

  1. Name your profiles clearly – When you are using a number of profiles simultaneously knowing what each one is (Language pair/domain) will make it much easier as you progress through the building process.
  2. Take advantage of BuildAnalytics – Use the insights and Gap analysis features to give you tips on improving your engine. Listening to these tips can really help speed up the engine refinement process.
  3. The PEX Rule Editor is your friend – Don’t be afraid to try out creating and using new PEX rules, if things go south you can always go back to previous versions of your engine.

My internship at KantanMT.com really opened my eyes to the world of language services and machine translation. Before joining the team I knew nothing about MT or the mechanics behind building engines. This was a great experience, and being part of such a smoothly run development team was an added bonus that I will take with me when I return to ITB to finish my course.

About Sue McDermott

Sue is currently studying for a Diploma in Computer Science from ITB (Institute of Technology Blanchardstown). Sue joined KantanMT.com on a three-month internship. She has a degree in English Literature and a background in business systems, and has also been a full-time mum for the last 17 years.

Email: info@kantanmt.com, if you have any questions or want more information on the KantanMT platform.

Language Industry Interview: KantanMT speaks with Maxim Khalilov, bmmt Technical Lead

This year, both KantanMT and its preferred Machine Translation supplier, bmmt, a progressive Language Service Provider with an MT focus, exhibited side by side at the tekom Trade Fair and tcworld conference in Stuttgart, Germany.

As a member of the KantanMT preferred partner program, bmmt works closely with KantanMT to provide MT services to its clients, which include major players in the automotive industry. KantanMT was able to catch up with Maxim Khalilov, technical lead and ‘MT guru’ to find out more about his take on the industry and what advice he could give to translation buyers planning to invest in MT.

KantanMT: Can you tell me a little about yourself and, how you got involved in the industry?

Maxim Khalilov: It was a long and exciting journey. Many years ago, I graduated from the Technical University in Russia with a major in computer science and economics. After graduating, I worked as a researcher for a couple of years in the sustainable energy field. But, even then I knew I still wanted to come back to IT Industry.

In 2005, I started a PhD at Universitat Politecnica de Catalunya (UPC) with a focus on Statistical Machine Translation, which was a very new topic back then. By 2009, after successfully defending my thesis, I moved to Amsterdam where I worked as a post-doctoral researcher at the University of Amsterdam and later as an R&D manager at TAUS.

Since February 2014, I’ve been a team lead at bmmt GmbH, which is a German LSP with strong focus on machine translation.

I think my previous experience helped me to develop a deep understanding of the MT industry from both academic and technical perspectives.  It also gave me a combination of research and management experience in industry and academia, which I am applying by building a successful MT business at bmmt.

KMT: As a successful entrepreneur, what were the three greatest industry challenges you faced this year?

MK: This year has been a challenging one for us from both technical and management perspectives. We started to build an MT infrastructure around MOSES practically from scratch. MOSES was developed by academia and for academic use, and because of this we immediately noticed that many industrial challenges had not yet been addressed by MOSES developers.

The first challenge we faced was that the standard solution does not offer a solid tag processing mechanism – we had to invest into a customization of the MOSES code to make it compatible with what we wanted to achieve.

The second challenge we faced was that many players in the MT market are constantly talking about the lack of reliable, quick and cheap quality evaluation metrics. BLEU-like scores unfortunately are not always applicable for real world projects. Even if they are useful when comparing different iterations of the same engines, they are not useful for cross language or cross client comparison.

Interestingly, the third problem is psychological in nature: post-editors are not always happy to post-edit MT output, for many reasons, including of course the quality of MT. However, in many situations the problem is that MT post-editing requires a different skillset from ‘normal’ translation, and it will take time before translators adapt fully to post-editing tasks.

KMT: Do you believe MT has a say in the future, and what is your view on its development in global markets?

MK: Of course, MT will have a big say in the language services future. We can see now that the MT market is expanding quickly as more and more companies are adopting a combination TM-MT-PE framework as their primary localization solution.

“At the same time, users should not forget that MT has its clear niche”

I don’t think a machine will ever be able to translate poetry, for example, but at the same time it does not need to – MT has proved to be more than useful for the translation of technical documentation, marketing material and other content, which represents more than 90% of translators’ daily workload worldwide.

Looking at the near future I see that the integration of MT and other cross language technologies with Big Data technologies will open new horizons for Big Data making it a really global technology.

KMT: How has MT affected or changed your business models?

MK: Our business model is built around MT; it allows us to deliver translations to our customers quicker and cheaper than without MT, while at the same time preserving the same level of quality and guaranteeing data security. We not only position MT as a competitive advantage when it comes to translation, but also as a base technology for future services. My personal belief, which is shared by other bmmt employees is that MT is a key technology that will make our world different – where translation is available on demand, when and where consumers need it, at a fair price and at its expected quality.

KMT: What advice can you give to translation buyers, interested in machine translation?

MK: MT is still a relatively new technology, but at the same time there are already a number of best practices available for new and existing players in the MT market. In my opinion, the four key points for translation buyers to remember when thinking about adopting machine translation are:

  1. Don’t mix it up with TM – While TMs mostly support human translators by storing previously translated segments, MT translates complete sentences automatically. The main difference lies in new words and phrases, which are not stored in a TM database.
  2. There is more than one way to use MT – MT is flexible: it can be a productivity tool that enables translators to deliver translations faster with the same quality as in the standard translation framework. Or MT can be used for ‘gisting’ without post-editing at all, something that many translation buyers forget about but which can be useful in many business scenarios. A good example of this type of scenario is the integration of MT into chat widgets for real-time translation.
  3. Don’t worry about quality – Quality Assurance is always included in the translation pipeline, and we, like many other LSPs, guarantee a desired level of quality for all translations, regardless of how they were produced.
  4. Think about time and cost – MT enables translation delivery quicker and cheaper than without MT.

A big ‘thank you’ to Maxim for taking time out of his busy schedule to take part in this interview, and we look forward to hearing more from Maxim during the KantanMT/bmmt joint webinar ‘5 Challenges of Scaling Localization Workflows for the 21st Century’ on Thursday November 20th (4pm GMT, 5pm CET and 8am PST).


Register here for the webinar or to receive a copy of the recording. If you have any questions about the services offered from either bmmt or KantanMT please contact:

Peggy Linder, bmmt (peggy.lindner@bmmt.eu)

Louise Irwin, KantanMT (louisei@kantanmt.com)

SMT Quality Challenge

One of the biggest challenges when customizing Statistical Machine Translation (SMT) is improving the engine after its initial development. While you can build a baseline engine using existing Translation Memories (TM), terminology and monolingual training data assets, the real challenge is going beyond this and achieving even higher levels of quality. More importantly, how can you do this rapidly with minimum cost and effort? A proactive approach to measuring the quality of your training data will greatly assist in doing this.

Kantan BuildAnalytics™ is a new technology that addresses this head-on and helps SMT developers to build engines that are production ready, fast!

What is Kantan BuildAnalytics?

Kantan BuildAnalytics brings a new level of transparency to the SMT building and training process, and KantanMT users can now build higher performing engines for each domain, resulting in less post-editing requirements.

How it works…

When you build a KantanMT engine, some of your training data is automatically extracted and kept to one side. This is called a Reference Data Set – and contains both source and target texts. After a KantanMT engine is built, this Reference Data Set is used to calculate a series of automated quality scores – including BLEU (Bilingual Evaluation Understudy), F-Measure and TER.
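The idea of keeping part of the training data to one side can be sketched as a simple hold-out split. How KantanMT actually selects the Reference Data Set is not described here, so the random selection, the function name and the `holdout_size` parameter below are all assumptions made for illustration.

```python
import random


def split_reference_set(pairs: list, holdout_size: int, seed: int = 0):
    """Split (source, target) pairs into training data and a held-out reference set.

    Random selection is an assumption of this sketch, not a description
    of how the platform chooses its Reference Data Set.
    """
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    reference = shuffled[:holdout_size]
    training = shuffled[holdout_size:]
    return training, reference


if __name__ == "__main__":
    data = [(f"source {i}", f"target {i}") for i in range(10)]
    training, reference = split_reference_set(data, holdout_size=2)
    print(len(training), "training pairs,", len(reference), "reference pairs")
```

Because the reference pairs are never seen during the build, scores calculated on them indicate how the engine handles new text rather than text it has memorized.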

This Reference Data Set is also used to perform a Gap Analysis. Gap Analysis is a quick way to determine any missing words in the engine’s phrase-tables. I’ll come back to this later and demonstrate how Gap Analysis can improve the quality performance of KantanMT engines.

But for now, let’s focus on the automated quality scores of BLEU, F-Measure and TER.

BuildAnalytics uses the KantanMT data visualization library to graphically display the distribution of these automated scores based on the Reference Data Set. Since an automated score is calculated for each text segment within the Reference Data Set, this means we get a detailed view of how a KantanMT engine is performing and how it should generate translated output.

By analysing these scores and the Gap Analysis results, and examining the translated output, users of KantanMT are producing higher quality engines because their training data choices are more strategic and refined.

F-Measure

Let’s look at F-Measure first, as this is the most straightforward to understand and visualize. F-Measure scores show how precise a KantanMT engine is when retrieving words, and how many words it can retrieve or recall during translation. This is why it is commonly referred to as a Recall and Precision measurement. By combining these two measurements into a single score, it gives a good indication of the engine’s performance and its ability to translate content.
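For readers who like to see the arithmetic, here is a minimal word-level F-Measure for a single segment. It uses the usual definition (the harmonic mean of precision and recall over matching words); whitespace tokenization and the function name are simplifications of my own, and the platform’s exact calculation may differ.

```python
from collections import Counter


def f_measure(candidate: str, reference: str) -> float:
    """Word-level F-Measure of a candidate translation against a reference."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())        # words found in both, with counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())    # share of output words that are right
    recall = overlap / sum(ref.values())        # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f_measure("the cat", "the cat sat"))
    # Word order is ignored, so a scrambled sentence still scores perfectly.
    print(f_measure("sat cat the", "the cat sat"))
```

The second call shows the limitation of this kind of score: it rewards having the right words even when they appear in the wrong order.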

KantanMT F-Measure
KantanMT engine F-Measure score distribution

However, while your KantanMT engine may have a high F-Measure score, it doesn’t mean that these words are recalled in the correct order. We need another metric to indicate how well the engine translated the text, and BLEU is one of the most widely recognized automated metrics for estimating a text’s fluency.

BLEU

BLEU is an automatic evaluation metric well known in both the industry and academia, which calculates an estimation of text fluency. Fluency is a measure of the correspondence between a KantanMT engine output and that of a professional translator.

Since the Reference Data Set consists of both source and human translated equivalents, which were created by a professional translator, BLEU score can be calculated by comparing the output of a KantanMT engine to this Reference Data Set.
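The comparison can be sketched with a simplified, single-segment BLEU: clipped n-gram precision for n = 1 to 4, combined with a brevity penalty. This is an illustrative sketch only. Standard BLEU is computed over a whole corpus, production tools add tokenization and smoothing, and nothing here is KantanMT’s implementation.

```python
import math
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-segment BLEU against one reference translation."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        matched = sum((cand_counts & ref_counts).values())  # clipped matches
        total = sum(cand_counts.values())
        if matched == 0 or total == 0:
            return 0.0  # no smoothing: any empty n-gram level gives zero
        log_precisions.append(math.log(matched / total))
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("mat the on sat cat the", reference))
```

Unlike F-Measure, the longer n-grams make this score sensitive to word order, which is why the scrambled sentence scores far lower than the exact match.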

KantanMT BLEU score
KantanMT engine BLEU score distribution

In practice, BLEU achieves a high correlation with human judgement of quality and remains one of the most popular automated metrics in use today.

TER

TER stands for Translation Error Rate and is used to estimate the amount of post-editing required to transform a generated translation into its original human translation equivalent. In simple terms this is a count of the number of insertions, deletions and substitutions required to transform a segment to match its original human translation equivalent.

KantanMT TER score
KantanMT engine TER score distribution

So the lower this score, the less transformation required which means the less post-editing required too.
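The counting described above can be sketched as a word-level edit distance divided by the length of the reference. Note that the full TER metric also counts shifts of word blocks as edits; this simplified version leaves shifts out and covers only insertions, deletions and substitutions, so treat it as an approximation.

```python
def edit_distance(hyp: list, ref: list) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def simple_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word (shift operations are not modelled)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    print(simple_ter("the dog sat", "the cat sat"))  # one substitution
    print(simple_ter("the cat", "the cat sat"))      # one missing word
```

A score of 0 means the output already matches the reference, and the score grows with every edit a post-editor would have to make.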

Working with Kantan BuildAnalytics™

BuildAnalytics is a really great way to see all these automated scores in action. It uses KantanMT data visualization technology to graphically present these scores, helping developers of KantanMT engines to fine-tune their training data and maximize their engine’s quality performance.

Let’s take a closer look at how this data visualization can be used to gain insights into an engine and determine if it is a high or low performing engine, and what steps we can take to improve it.

Here are the summary distribution graphs for an engine that contains approx. 3.2m words. It’s a small engine within a technical domain. Its overall scores are:

KantanMT BuildAnalytics Graph

These Summary Graphs show the distribution of scores, grouped into bands (i.e. <40%, 40-54% etc.), for each automated score. This is very helpful in determining the scores’ overall distribution, and how the KantanMT engine is likely to be performing.

Here are the detailed distribution graphs for each automated score:

KantanMT distribution graphs

By reviewing both the Summary Graphs and the more detailed Distribution Graphs we can make some observations of how this engine would most likely perform. My observations are included as part of the commentary in the table above.

It’s important to point out that no one individual score gives an absolute of how a KantanMT engine will perform. We need to take a holistic view on how to determine a general sense of the performance of the engine by reviewing all automated scores together.

Using Kantan BuildAnalytics users can get a good sense of how a KantanMT engine will perform in a production environment and with a little practice and experimentation, they can use this knowledge to build higher performing MT engines.

Gap Analysis

I mentioned this concept earlier in the post, so let’s take a closer look at this really helpful new feature. Gap Analysis determines how many untranslated words remain in the generated translations. These missing words, or ‘Gaps’ can quickly be identified and filled by introducing the most relevant training data to your KantanMT engine and re-training it.

The Gap Analysis feature not only lists the gaps, it also presents suitable training data, which can be post-edited and resubmitted as training data to improve the engine’s overall performance. This makes filling the gaps just that little bit easier!
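Conceptually, finding gaps comes down to listing the source words the engine has no entry for. The sketch below models the engine’s phrase-table as a plain set of known words, which is a deliberate simplification; the function name and the sample data are invented for the example and do not reflect how the platform performs its analysis.

```python
def find_gaps(source_segments: list, known_vocabulary: set) -> list:
    """List source words missing from the known vocabulary, in order of first appearance."""
    vocab = {word.lower() for word in known_vocabulary}
    gaps = []
    for segment in source_segments:
        for word in segment.split():
            token = word.strip(".,;:!?()\"'").lower()  # drop surrounding punctuation
            if token and token not in vocab and token not in gaps:
                gaps.append(token)
    return gaps


if __name__ == "__main__":
    segments = ["Tighten the flange bolt.", "Replace the gasket"]
    vocabulary = {"tighten", "the", "bolt", "replace"}
    print(find_gaps(segments, vocabulary))
```

Each word in the resulting list is a candidate for a glossary entry or for additional training data that contains it.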

One more (very important) thing…

Most quality improvements for SMT systems will be created by fine tuning terminology and filling data gaps. Post-editing raw-MT output and a focus on minimizing data gaps will significantly improve the quality performance of your KantanMT engines. This cannot be done without the involvement of professional translators. They have the skills, knowledge and linguistic expertise to finesse terminology, identify gaps and choose better training data. While BuildAnalytics helps SMT developers get engines ready for production, ultimately, it’s the professional translator that should have the final say in how production-ready it truly is!

To get the most from your Machine Translation engine, always keep in mind:

  • Measuring and improving training data – high quality training data is the first step to building a successful Machine Translation engine.
  • Take a holistic approach to evaluating performance – automatic evaluation metrics can give a good indicator of how your KantanMT engine will perform, but metrics alone are insufficient for measuring post-editing effort.

Kantan BuildAnalytics is available to Enterprise members of KantanMT, but you can also experience this quality estimation and measurement software by signing up for a free trial on KantanMT.com.

Leveraging MT to Improve Productivity

Communication is one of the most important elements of business, and Machine Translation is a flexible tool that can be used to facilitate communication in a wide variety of scenarios and situations. Multinationals and other companies operating globally can take advantage of Machine Translation to achieve productivity gains.

This two part blog series examines two very different examples of implementing Machine Translation. This first post will look at what multinational organizations should consider before introducing Machine Translation to their business, and the second post will discuss the productivity gains and competitive advantages that can be achieved by Language Service Providers (LSPs) who adopt MT.

What is a multinational and why should it use Machine Translation?

Multinational corporations or global businesses are organizations operating in more than one country or region. The concept of an ‘international company’ has been around for hundreds of years, going back to the trading companies, which were established in the 1700s. Outside political agendas, their main purpose was to trade in spices and other commodities throughout Asia and Europe exposing traders to different languages and cultures.

Hundreds of years later, global communication is commonplace as more businesses operate internationally. There are no boundaries, and companies with worldwide operations require a constant flow of multilingual communication in order to maintain relationships between global employees, customers and stakeholders.

Multinational organizations typically have two types of content: external and internal. External content is created and released to the public: corporate documents, investor information, Corporate Social Responsibility (CSR) and marketing communications. Internal content, on the other hand, is created for use within the company, usually in the form of email and chat communications, memos and other internal documents.

To Translate or not to translate

Organizations without an in-house translation team often outsource the translation of external content to a reputable LSP. This ensures a guaranteed level of quality for the translation, and it also means that the process of localization is more efficient and cost effective. This is because, over time, language assets in the form of translation memories can be built up and leveraged to offset the cost of future translations.

Internal content, however, mostly consists of communications between departments: emails, chats and information on sales and marketing activities. These are usually not translated professionally for a number of reasons:

  • Cost – the volume to be translated can make costs unmanageable
  • Confidentiality – managing sensitive information is more difficult
  • Real-time translation – emails and chat conversations generally require real-time speed

As an example, if a company is headquartered in the United States, but operates in both Asia and Europe there is a very high possibility that more than one language is used in the company’s internal communication.

Multinational companies often select working languages that must be used for internal communications and department managers are sometimes required to have a certain level of proficiency in the company’s designated working languages, which usually includes English.

Large organizations like the United Nations also have official languages. In this case, documents are not published until a translation has been prepared in each official language.

So, what happens when an email with a client’s product specifications and sales information is sent to a group of employees who speak different languages? Some of those readers may have limited knowledge of the language being used: they may understand the communication but not be familiar enough with the language to write a coherent response. This can result in them responding in their native language. Suddenly, a single conversation thread contains more than one language, with a greater potential for miscommunication.

Why use Machine Translation?

Multinationals with global operations often have issues with the quantity and flow of internal information between departments operating in different languages. If the corporate headquarters uses a different language than its global subsidiaries, corporate documents need to be translated into each language as the internal information moves down the organizational hierarchy.

Machine Translation is a solution that can provide an instant, understandable ‘gist’ of internal information across a company operating in different languages. The use of MT can serve two purposes:

  • Documents that require a professional human translation are easily identified
  • Internal documents can be translated instantly so employees can get an understanding of the content

To understand internal content, employees often use a free online MT service such as Google Translate. While this is useful, it does not take into consideration any proprietary jargon or writing styles specific to the organization, and it also raises the question of confidentiality.

Challenges of MT

Many organizations may be interested in taking steps to deploy their own MT systems rather than outsourcing translation jobs or asking bilinguals in the company to do ad hoc translations. Those considering MT have two options: develop their own in-house system or use a cloud-based subscription model.

Implementing any new process has challenges and MT is no exception. Some challenges traditionally associated with implementing MT systems are:

  • High costs
  • Complex technology
  • Long deployment times

How should an MT system be integrated?

Before going ahead with an MT solution, an organization needs to carefully consider what it hopes to achieve from implementing Machine Translation. The company should evaluate all the perceived benefits thoroughly, including managing any and all expectations about using Machine Translation.

Organizations thinking of implementing MT should ask:

  • What is its purpose? – Will MT be used as a management tool to improve internal communication and productivity, or to make decisions on what documents require professional outside translation? The purpose should be clearly defined at the outset.
  • Do we have enough language assets to build high quality engines? Bilingual language assets are a key ingredient for building MT engines. The quality of the training data will have a direct impact on the MT engine’s output: “garbage in, garbage out”.
  • Should we invest in building our own system or buy a cloud-based subscription service? MT systems can be rule-based (RBMT), statistical (SMT) or hybrid. In-house development of a proprietary MT system requires a heavy technology, HR and training investment, unless those assets are readily available. Cloud-based subscription models do not require such a heavy initial investment and are often more cost effective than developing and managing an in-house MT system.
  • Is the Machine Translation option scalable? How many language combinations will be needed? If each language pair requires its own unique engine, how simple is it to build additional engines with new language combinations? Scalability will be determined by translating capacity and the ability to add new language combinations. This is especially important when entering different language markets or expanding the business to new regions. The MT solution should align itself with the company’s long term goals.
  • How will MT be integrated into everyday workflows? Users need to be able to easily access translation functions through their existing applications like email or the company intranet system to make it accessible and viable.
  • What indirect costs and planning will be involved? RBMT and hybrid systems require qualified linguists or language experts to develop and manage the engines. SMT systems use algorithms to identify probable translations based on how frequently they occur in the training data, so storage capacity is essential for the large volumes of training data required. Cloud options eliminate the need for in-house technology investment, but extra costs might be incurred for going over the subscription plan’s limits, similar to the minutes allowance with mobile phone usage.
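
To make the frequency idea behind SMT more concrete, here is a deliberately tiny sketch in Python. It is a toy illustration only: the training pairs are made up, the function names are hypothetical, and real SMT systems rely on far more than raw counts (word alignment, language models, tuning). It simply counts how often a source phrase was paired with each target phrase and picks the most frequent one.

```python
from collections import Counter, defaultdict


def build_phrase_table(aligned_pairs):
    """Count how often each source phrase is paired with each target phrase."""
    table = defaultdict(Counter)
    for source, target in aligned_pairs:
        table[source][target] += 1
    return table


def most_probable(table, source):
    """Return (target, relative frequency) for the most frequent pairing.

    Returns None when the source phrase never appeared in the training data.
    """
    counts = table.get(source)
    if not counts:
        return None
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


# Hypothetical, hand-made training data that is already phrase-aligned.
TRAINING = [
    ("thank you", "merci"),
    ("thank you", "merci"),
    ("thank you", "je vous remercie"),
    ("invoice", "facture"),
]

table = build_phrase_table(TRAINING)
best = most_probable(table, "thank you")
missing = most_probable(table, "purchase order")
```

Even in this toy version, the quality and volume of the bilingual data decide the outcome: a phrase that never appears in the training data has no candidate translation at all.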

In carefully answering these questions, any organization planning to implement MT can stay focused on using the most cost-effective solution and achieve productivity gains with less miscommunication and more time savings.

The next part of this blog will look at how LSPs can leverage Machine Translation technology for productivity gains and competitive advantage.

Translation Technology Conferences and Events for 2014

2014 has arrived – and there is no better way to get the ball rolling than by planning what events to attend. Over the next twelve months there is a vast selection of conferences, unconferences, workshops, roundtables, webinars and other events planned around the world.

It was hard to narrow the list of everything going on, so KantanMT tried to focus on events that were related to Machine Translation and the Natural Language Processing (NLP) industry, localization, translation technologies and post-editing. Some of the events are more academic, while others are more business orientated.

Unconferences and Conferences…

We added some ‘unconferences’ to the list. These are the opposite of conferences: peer-to-peer interactions on topics chosen by participants at the beginning of a session. Because participants choose the topics, it is much easier to promote an open discussion, and unconferences are a good way for industry professionals to get together in an informal setting and share their own challenges and solutions.

Localization World, one of the biggest industry conferences, has had a great response from holding unconferences alongside its traditional conferences, and the Association of Language Companies (ALC) also endorses the value of unconferences. The next ALC unconference will be held in the early part of February.

Hopefully, this list will be a useful resource in deciding what events and conferences to visit during 2014. You may have registered for some of these events already. If not, now is the time to start filling in your calendar. If you know of a relevant conference or event we missed, please add it to the comment section at the bottom of this post.

2014 Listings

January

Jan 8, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – XTRF and Kilgray’s memoQ

Tomasz Mróz, XTRF Operations Director will present usage scenarios on integrating XTRF technology into the translation workflows, TM integration and faster project turnaround times. István Lengyel, CEO of Kilgray will also be presenting on memoQ, a cloud-based translation technology platform for translation management.


Jan 9, 2014

Webinar:  TAUS Dynamic Quality Framework Users Call

The users call is a bi-monthly webinar where TAUS members discuss solutions for measuring Machine Translation quality. Participants include Autodesk, CA Technologies, Cisco, Dell, Digital Linguistics, eBay, EMC and Google. To register for the webinar, members can email memberservices@taus.net


Jan 15, 2014

Webinar: The Convergence Era: Translation as A Utility (The Content Wrangler, TAUS)

This webinar, hosted by BrightTalk, is a discussion by Jaap van der Meer (TAUS) and Scott Abel (The Content Wrangler) on how translation has become a necessary part of everyday life, in the same way that electricity, water and the internet have become indispensable.


Jan 16, 2014

Meeting/Webinar: L20n: Next Generation Localization Framework for the Web, The International Multilingual Computing User Group (IMUG), San José, California USA

Zbigniew Braniecki, Software Engineer, Mozilla Corporation will speak about L20n, a new localization framework that isolates localization and enables translators to give naturally expressive translations for even the most complex user interfaces. Mozilla is investing in moving its products – Firefox, Firefox OS, and Firefox for Android – to this new architecture.


Jan 23, 2014

Unconference: Localization Unconference, Achievers Head office Toronto, Canada

This unconference is an all-day event starting at 09:30am and will cover internationalization and localization topics. It is organized by Jenny Reid, Localization Project Manager, BlackBerry; Oleksandr Pysaryuk, Localization Manager, Achievers; and Richard Sikes, Principal Consultant, Localization Flow Technologies.


Jan 30, 2014 (11:00 EST/17:00 CET)

Webinar: Integrating Your Content Platform, Globalization and Localization Association

Anders Holt, European Director and Robert Timms, Technical Director at translate plus will present a webinar on integrating content management platforms (CMS, DMS, PIM or e-procurement systems) into the translation workflow. They will discuss the integration methods available and how to get the best results and benefits from integration.


Jan 30-31, 2014

Conference: 2014 CRITT – WCRE Conference, Translation in transition: between cognition, computing and technology, Copenhagen Business School (CBS), Frederiksberg, Denmark

This academic conference presents research from the Centre for Research and Innovation in Translation and Translation Technology (CRITT). The program covers a variety of topics, including translation and cognitive processes, translation and translation theory, and observations about Machine Translation, translation and post-editing.


February

Feb 5, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – Ontram and Across Language Server v6

Christian Weih, Chief Sales Officer from Across Systems presents a TMS platform that integrates all aspects of the translation workflow.


Feb 6-8, 2014

Unconference: ALC Unconference, (Association of Language Companies), Palm Beach Gardens, Florida USA

The Unconference is geared towards language company owners and senior members of staff who get together without any formal presentation structure for more intimate brainstorming and discussion sessions in a casual and relaxed environment.


Feb 6, 2014 (11:00 EST/17:00 CET)

Webinar: Maximizing Translation Efficiency: Best QA Practices for Large Multi-channel Publishing Projects

Jose Sermeno, Product Evangelist at MadCap Software and Peter Argondizzo, Translation and Localization PM at MadTranslations discuss QA best practices that will make projects more efficient.


Feb 24-26, 2014

Conference: ‘Localization in a Shifting Global Economy’ Localization World, Bangkok Thailand

The first of three Localization World conferences of 2014, Localization World is the leading conference for international business, translation and localization providing opportunities for networking and information exchange.


Feb 26-28, 2014

Conference, workshops:  ICC (Intelligent Content Conference) 2014, San José, California USA

ICC focuses on the creation and management of content in different languages on any device. Topics will include content strategy, content marketing, content engineering, structured content, ebooks, mobile, apps, adaptive content, automated translation, terminology management, big data and analytics.


Feb 27, 2014 (11:00 EST/17:00 CET)

Webinar: GALA Translation Project Management with memoQ Server Training session

Daniel Zielinski will explain how the memoQ server can be used for managing translation projects effectively. See the different types of projects and workflows supported, and learn how to set up, prepare, monitor and complete a translation project with the memoQ server.


Feb 27 – Mar 1, 2014

Conference: memoQfest Americas, Kilgray Translation Technologies, Los Angeles, California USA

This three day event is hosted by Kilgray Translation Technologies and is aimed at freelance language professionals, LSPs and corporate translation users. The conference gives an overview of translation technology and how it can be integrated into businesses.


March

Mar 3-6, 2014

Conference: WritersUA, the conference for Software User Assistance, Palm Springs, California USA

This conference is for those involved in creating user assistance content. There will be a variety of presentations focused on developing content strategies, key technologies and tools that are used to create well-designed interfaces, technical communications and support information.


Mar 5, 2014 (17:00-18:00 CET)

Webinar: TAUS Translation Technology Showcase – Safaba and KantanMT

The theme of this webinar is the application and influence of MT technologies on global business. Tony O’Dowd, Founder and Chief Architect, presents the KantanMT.com cloud-based platform, introducing some of the KantanMT technologies and use cases, including KantanWatch, KantanISR, KantanAnalytics, TotalRecall, PEX and GENTRY.

Udi Hershkovich, Vice President of Business Development at Safaba will discuss key business imperatives for businesses and how Enterprise MT removes the language barriers that face global businesses.


Mar 13-14, 2014

Conference: International Conference on Translation and Accessibility in Video Games and Virtual Worlds at Universitat Autònoma de Barcelona, Spain

The conference is a meeting point for academics, professionals and students involved in the game localization industry. The conference aims to foster the interdisciplinary debate in these fields, combine them as academic areas of research and contribute to the development of best practices.


Mar 17-21, 2014

Conference: Game Localization Summit at GDC, IGDA Game Localization SIG, San Francisco, California USA

The game Localization Summit at GDC is supported and organized by the IGDA Game Localization SIG, and it is aimed at helping localization professionals as well as the entire community of game developers and publishers understand how to plan and execute game localization and culturalization as a part of the development cycle. There are other GDC conferences planned for Europe and China later in the year.


Mar 23-26, 2014

Conference: GALA 2014, Globalization and Localization Association (GALA), Istanbul, Turkey

The annual GALA conference brings together localization industry professionals for networking opportunities and peer-to-peer learning of the latest technologies and emerging trends in localization, language and translation technology.


Mar 28-29, 2014

Conference: The Translation and Localization Conference, Localize.pl, TexteM, KOMTE, Warsaw, Poland

This is an annual international event focusing on the latest technologies and localization industry trends. The conference is suited to LSPs and freelance translators, and covers technical communication and its implications for the translation industry. Topics include big data vs. the translation industry; CAT tools, MT, cloud computing, project management and the human factor; and recruitment and training.


April

Apr 2, 2014 (17:00-18:00 CET)

Webinar: Translation Technology Showcase, TAUS – tauyou and Pangeanic

Diego Bartolome, CEO tauyou will discuss the ‘Big Data’ approach to SMT and the importance of clean data on output quality.


Apr 10-11, 2014

Event: TAUS Executive Forum, Oracle Japan, Tokyo, Japan

The executive forum consists of two days of meetings for buyers and providers of language services and technologies. It is an open exchange about language business innovation and translation technology with the theme ‘translation as a utility’. Topics to be covered include translation data, MT showcases, DQF evaluation, translation customer support and integration with CRM systems.


Apr 13-15, 2014

Conference: MadWorld 2014, MadCap Software, Inc., San Diego, California USA

Designed to cater for technical writers, documentation managers and content strategists. This is the top conference for technical communication and content strategy.


Apr 25, 2014

Conference: TCeurope Colloquium, Conseil des Rédacteurs Techniques, Aix-en-Provence, France

Conference themes include the essential core skills of a technical communicator, accessibility and usability, technical communication and social media, multi‐authoring and international teamwork, and training technical authors in the internet age.


Apr 26-30, 2014

Conference: EACL-2014, European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden

The conference is available to all ACL members and covers research in computational linguistics, psycholinguistics, speech, information retrieval, multimodal language processing and language issues in emerging domains such as bioinformatics and social media. Workshops and tutorials are held on Saturday and Sunday, April 26th-27th, while the main conference runs from Monday to Wednesday, April 28th-30th.


May

May 7, 2014 (17:00-18:00 CET)

Webinar: Translation Technology Showcase, TAUS – TaaS and Interverbum

TaaS and Interverbum present in this month’s Translation Technology Showcase by TAUS.


May 7-9, 2014

Conference: memoQfest International, Kilgray Translation Technologies, Budapest, Hungary

This conference aims to set up a forum where companies, LSPs and translators can discuss workflows and best practices that relate to memoQ or translation technology in general. Attendees will discuss industry trends, attend workshops and exchange information with translators, LSPs, and translation end users.


May 7-8, 2014

Workshop: Making the Multilingual Web Work, MultilingualWeb, Madrid, Spain

The workshop is supported by the LIDER project and aims to survey and share information about best practices and standards for promoting multilingualism on the web.


May 8-9, 2014

Conference: Intelligent Content – Life Sciences and Healthcare, the Rockley Group, the Content Wrangler, San Francisco, California USA

The event will showcase examples, standards, methods, strategies and tools needed to help pharmaceutical companies, medical device manufacturers, and healthcare firms deliver the right information, in the right language, on any device. Conference topics include mhealth, ehealth, digital health, personalized healthcare content and advanced translation technologies.


May 17-18, 2014

Conference: UTIC 2014, Ukrainian Translation Industry Conference, Kiev, Ukraine

Translators, managers, educators and software developers get together for networking opportunities and to discuss future industry trends.


May 18-21, 2014

Conference: Technical Communication Summit 2014, Society for Technical Communication, Phoenix, Arizona USA

The Technical Communication Summit is a source of learning for professional technical communicators giving training on the latest communication techniques, publishing technologies and business trends in the industry.


May 18-21, 2014

Conference: ALC 2014 Annual Conference, Association of Language Companies, Palm Springs, California USA

This conference is a networking event for anyone doing business with LSPs, combining educational content and networking.


May 23, 2014

Roundtable: TAUS Translation Automation Roundtable, TAUS, Moscow, Russia

Hosted by ABBYY Language Services, this roundtable is a meeting for buyers and providers of translation services. The participants will get a good insight into MT technology, customization, implementation requirements and business cases.


May 26-31, 2014

Conference: LREC 2014, the European Language Resource Association, Reykjavík, Iceland

LREC is focused on Language Resources (LRs) and Evaluation for Language Technologies (LT). The aim of LREC is to give an overview of LR and LTs, emerging trends and the exchange of information.


June

Jun 2-3, 2014

Event: TAUS Industry Leaders Forum 2014, Clontarf Castle Hotel, Dublin

The theme for this meeting is ‘convergence’ with industry leaders discussing best practices, possible common approaches and shared services to optimize translation efficiencies through a series of short presentations.


Jun 3-4, 2014

Workshop: Localization Project Management Certification – The Localization Institute, Clarion Hotel, Dublin, Ireland

As part of the LPM Certification Program, this two-day project management training workshop will be held alongside Localization World. There is an eight-week self-study part that must be completed before the workshop. It is open to Localization Project Managers with at least three years’ project management experience. Early bird and group registration discounts are available.


Jun 4-6, 2014

Conference: Localization World Dublin, Localization World Ltd., Dublin, Ireland

The second Localization World conference of 2014 will be held in Dublin with the theme of “disruptive innovation” and how this impacts the localization industry and the role of translators. Topics covered at the conference will include advanced localization management, global business, localization core competencies and technology.


Jun 5-6, 2014

Conference: UA Europe 2014, UA Europe, Kraków, Poland

In association with Writers UA, the UA Europe technical communication conference focuses on software user assistance and online Help, and provides information on the latest industry trends, technical developments, and best practice in software UA.


Jun 16-18, 2014

Conference: EAMT 2014, European Association for Machine Translation, Dubrovnik, Croatia – 17th Annual Conference of the European Association for Machine Translation

The conference is aimed at anyone interested in MT and translation-related tools and resources. Topics will include MT in multilingual public service (eGovernment etc.), MT for the web, MT embedded in other services, MT evaluation techniques and evaluation results, and more.


August

Aug 23-29, 2014

Conference: COLING 2014, International Committee for Computational Linguistics, Dublin, Ireland

The biennial COLING conference is one of the premier Natural Language Processing conferences in the world. The conference will include full papers, oral presentations, poster presentations, demonstrations, tutorials, and workshops on a variety of technical areas on natural language and computation.


September

Sep 25-26, 2014

Workshop: IATIS Regional Workshop, Translator and Interpreter Training, Serbia

This conference is aimed at promoting translator training, and will address training in areas such as field/domain specialization, technical skills (including pre-/post-editing of MT), revision skills and management skills (soft skills).


October

Oct 4-5, 2014

Conference: MedTranslate 2014, GxP Language Services, Freiburg im Breisgau, Germany


Oct 6-7, 2014

Workshop: Localization Project Management Certification, the Localization Institute, Seattle, Washington USA

As part of the LPM Certification Program, this two-day project management training workshop will be held alongside Localization World.


Oct 19, 2014

Unconference: Localization World Unconference, Seattle

The agenda will be set in the first session, followed by 3-4 break-out sessions on topics the group chooses together. Attendees can submit topics for consideration at VistaTEC’s booth from Wednesday, October 17th.


Oct 27-28, 2014

Conference: TAUS User Conference, TAUS, Vancouver, Canada

The TAUS Annual Conference 2014 will be co-located with the Localization World Conference taking place in the Convention Centre, Vancouver, BC, Canada.


Oct 29-31, 2014

Conference: Localization World Vancouver, Localization World Ltd., Vancouver, Canada

Localization World provides an opportunity for the exchange of information in the language and translation services and technologies market.


November

Nov 3-5, 2014

Conference: 38th Internationalization & Unicode Conference (IUC38), Object Management Group, Santa Clara, California USA

The conference is for internationalization experts, tools vendors, software implementers, and business and program managers who want to discuss the best methods for doing business in international markets. Subject areas will include cloud computing, upgrading to HTML5, integrating with social networking software, and implementing mobile apps.


Nov 5-8, 2014

Conference: 55th ATA Conference, American Translators Association, Sheraton Hotel Chicago, Illinois USA

A networking event for translators, project managers and industry professionals. The aim of the conference is to promote the professional development of translators and interpreters.


Nov 11-13, 2014

Conference:  tcworld – tekom, Stuttgart, Germany

The technical communication conference and trade fair examines different aspects of localization, internationalization and globalization. It is the largest technical communication, authoring and IT management conference in the world and participating companies offer industrial, software and services for technical communication.


December

Dec 8-12, 2014

Conference: IEEE GLOBECOM, Austin Texas USA

The flagship conference of the IEEE Communications Society, the second largest of the 38 IEEE societies, will focus on the latest advancements in broadband, wireless, multimedia, internet, image and voice communications.


Dec 15-18, 2014

Conference: IEEE CloudCom 2014, Nanyang Avenue, Singapore

CloudCom promotes cloud computing platforms. It is co-sponsored by the Institute of Electrical and Electronics Engineers (IEEE) and the Cloud Computing Association. The conference attracts researchers, developers, users, students and practitioners from the fields of big data, systems architecture, services research, virtualization, security and privacy and high performance computing.

KantanMT looks forward to meeting you at some of these conferences over the next year.

KantanMT – 2013 Year in Review

KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations by developing and refining cutting edge technologies that make Machine Translation easier to understand and use.

Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.

Strong Customer Focus…

The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.

KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.

The KantanMT Community…

The KantanMT members’ community now includes top tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3400 registered members in December, and in response to this growth, KantanMT introduced two partner programs, with the objective of improving the Machine Translation ecosystem.

These are the Developer Partner Program, which supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program, which is dedicated to strengthening the use of MT technology in the global translation supply chain. KantanMT’s Preferred Suppliers of MT are:

KantanMT’s Progress…

To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approx. 7,000 customized KantanMT engines that translated more than 500 million words.

As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.

KantanMT’s Core Technologies from 2013…

KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.

  • KantanAnalytics™ – segment level Quality Estimation (QE) analysis as a percentage ‘fuzzy match’ score on KantanMT translations, provides a straightforward method for costing and scheduling translation projects.
  • BuildAnalytics™ – QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment level percentage score on a sample of the uploaded training data.
  • KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
  • TotalRecall™ – combines TM and MT technology. TM matches with a ‘fuzzy match’ score of less than 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
  • KantanISR™ – Instant Segment Retraining technology that allows members near instantaneous correction and retraining of their KantanMT engines.
  • PEX Rule Editor – an advanced pattern matching technology that allows members to correct repetitive errors, making a smoother post-editing process by reducing post-editing effort, cost and times.
  • Kantan API – critical for the development of software connectors and smooth integration of KantanMT into existing translation workflows. The success of the memoQ connector led to the development of subsequent connectors for MemSource and XTM.
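
The TotalRecall routing rule described in the list above can be sketched in a few lines of Python. This is an illustrative sketch, not KantanMT’s implementation: the `TMMatch` type, the `tm_lookup` and `mt_translate` callables and the sample data are all hypothetical stand-ins, and sending segments with no TM match at all to MT is an assumption. Only the 85% threshold comes from the description.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

FUZZY_THRESHOLD = 85.0  # percent; the cut-off quoted in the description above


@dataclass
class TMMatch:
    target: str   # translation stored in the translation memory
    score: float  # fuzzy match score, 0-100


def translate_segment(
    source: str,
    tm_lookup: Callable[[str], Optional[TMMatch]],
    mt_translate: Callable[[str], str],
    threshold: float = FUZZY_THRESHOLD,
) -> Tuple[str, str]:
    """Return (translation, origin), where origin is "TM" or "MT"."""
    match = tm_lookup(source)
    if match is not None and match.score >= threshold:
        return match.target, "TM"
    # Below the threshold (or, by assumption, no match at all): use the MT engine.
    return mt_translate(source), "MT"


# Hypothetical stand-ins for a real TM and a real MT engine.
FAKE_TM = {
    "Save the file": TMMatch("Enregistrer le fichier", 100.0),
    "Save the files": TMMatch("Enregistrer le fichier", 78.0),
}


def fake_tm_lookup(source: str) -> Optional[TMMatch]:
    return FAKE_TM.get(source)


def fake_mt(source: str) -> str:
    return f"[MT] {source}"


exact = translate_segment("Save the file", fake_tm_lookup, fake_mt)
fuzzy = translate_segment("Save the files", fake_tm_lookup, fake_mt)
unseen = translate_segment("Close the window", fake_tm_lookup, fake_mt)
```

Keeping the threshold as a parameter makes it easy to experiment with stricter or looser cut-offs without touching the routing logic.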

KantanMT sourced and cleaned a range of bi-directional domain specific stock engines that consist of approx. six million words across legal, medical and financial domains and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian languages during 2013.

Recognition as Business Innovators…

KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd, was presented with the ICT Commercialization award in September.

In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.

KantanMT were silver sponsors of the 2013 ASLIB conference ‘Translating and the Computer’, which took place in London in November, and in October Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.

KantanMT have recently published a white paper introducing its cornerstone Quality Estimation technology, KantanAnalytics, and how this technology provides solutions to the biggest industry challenges facing widespread adoption of Machine Translation.

KantanAnalytics WhitePaper December 2013

For more information on how to introduce Machine Translation into your translation workflow contact Niamh Lacy (niamhl@kantanmt.com).