The innovative Machine Translation features released by KantanMT, along with our contribution towards improving automated translation workflows, have earned us a reputation as thought leaders in the industry. A few months back, we released a white paper on what global companies can expect to see in 2016 for Machine Translation (MT).
We had so many questions during the Q&A in our last webinar session ‘How to Improve Translation Productivity‘ by the KantanMT Professional services team, that we decided to split the answers into two blog posts. So, if you don’t find your questions answered here, check out our blog next week for the remaining answers.
The Internet today is experiencing what is generally referred to as a ‘content explosion!’ In this fast-paced world, businesses have to strive harder and do more to stay ahead of the game – especially if they are a global business or if they have globalization aspirations. One fool-proof way in which a business can successfully go global is through effective localization. Yet, the huge amount of content available online makes human translation of everything almost impossible. The only viable option in today’s competitive online environment is the use of Machine Translation (MT).
On Wednesday 21st October, Tony O’Dowd, Chief Architect of KantanMT.com and Louise Faherty, Technical Project Manager at KantanMT presented a webinar where they showed how Language Service Providers (LSPs) (as well as enterprises) can improve the translation productivity of the team, manage post-editing effort and easily schedule projects with powerful MT engines. Here is a link to the recording of the webinar on YouTube along with a transcript of the Q&A session.
The answers below are not recorded verbatim and minor edits have been made to make the text more readable.
Question: Do you have clients doing Japanese to English MT? What are the results, and how did you get them? (i.e., do you pre-process the Japanese?)
Answer (Tony O’Dowd): English to Japanese Machine Translation (MT) has indeed always posed a challenge in the MT industry. So is it possible to build a high quality, high fidelity MT system for this language combination? Well, there have been quite a few developments recently that improve the prospect of building effective engines in this language combination. For example, one of the latest changes we made on the KantanMT platform to improve MT quality is the use of new and improved reordering models, which make translation from English to Japanese and Japanese to English much smoother, so we deliver higher quality output. In addition to that, higher quality training data sets are now available for this language pair, compared to a couple of years ago, when I started building English to Japanese engines. Back then it was really challenging. It still requires some effort to build English to Japanese MT engines, but the fact that there’s more content available in these languages makes it slightly easier for us to build high-quality engines.
We are also developing example-based MT for these engines, and so far this is showing encouraging signs of improving quality for this language pair. However, we have not started deploying this development on the platform yet.
KantanMT note: For more insights into how you can prepare high-quality training data, read these tips shared by Tony O’Dowd, and Selçuk Özcan, co-founder of Transistent Language Automation Services during the webinar ‘Tips for Preparing Training Data for High Quality MT.’
Question: Have you got a webinar recorded or scheduled, where we could see how the system works hands-on?
Answer (Tony O’Dowd): If you go on to the KantanMT website, we have video links on the product features pages. So you can actually watch an explanation video while you are looking at the component.
We work in a very visual environment, and we think videos are a great way of explaining how the platform works. And, if you go on to the website, on the bottom left corner of the page, you will find our YouTube channel, which contains videos on all sorts of topics, including how to build your first engine, how to translate your first document and how to improve the output of your engines.
If you click on the Resources menu on our site, you can access a number of tutorials that will talk you through the basics of Statistical Machine Translation Systems. In other words, explore the website and you should find what you need.
KantanMT note: Some other useful links for resources are listed below:
- The KantanMT blog is full of helpful tips, tricks, information and guides on using MT effectively
- You can access KantanMT company slides on our SlideShare page
- Read our client success stories, KantanMT Case Studies
- Find answers in our FAQs
- See specs of our products on our product sheets section
- Read our whitepapers and view past webinars KantanMT webinars
- Check out our help section for help on Getting Started, File Parsing, Post-Editing and Preprocessors
Question: Do you provide any Post-Editing recommendations or standards for standardising the PE process? You said translation productivity rose to 8k words per day – this is only PE, correct?
Answer (Tony O’Dowd): I will take the second question first! The 8,000 words per day is the Post-Editing (PE) rate, yes. It is not the raw translation rate. In Machine Translation, everything comes out pretranslated. So this number refers to the Post-Editing effort – like insertions, deletions, substitution of words, and so on that you need to do to get the content to publishable quality.
Louise Faherty: What we recommend to our clients is that when it comes to PE, they should try to use MT. A lot of translators who are new to using MT will try and translate manually, which is a natural tendency, of course. But what we advise our clients is to copy and paste the translation (MT) in the engine and use the MT. The more you use MT and the more you Post-Edit, the better your engine will become.
Tony O’Dowd: I will add something to Louise Faherty’s comments there. The best example of PE recommendations that I have come across is provided by a group called TAUS. They play a pivotal role in educating the industry on how to develop proficiency in PE.
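To make the idea of Post-Editing effort concrete, the insertions, deletions and substitutions mentioned above can be counted with a word-level edit distance. The sketch below is only an illustration of that counting, not the metric KantanMT uses; the function name `word_edit_distance` is an assumption made for this example.

```python
def word_edit_distance(mt_output: str, post_edited: str) -> int:
    """Count the word insertions, deletions and substitutions needed
    to turn the raw MT output into the post-edited text."""
    source = mt_output.split()
    target = post_edited.split()
    # previous[j] is the distance between the source words processed
    # so far and the first j target words.
    previous = list(range(len(target) + 1))
    for i, src_word in enumerate(source, start=1):
        current = [i]
        for j, tgt_word in enumerate(target, start=1):
            cost = 0 if src_word == tgt_word else 1
            current.append(min(
                previous[j] + 1,         # delete src_word
                current[j - 1] + 1,      # insert tgt_word
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


edits = word_edit_distance("the cat sat on mat", "the cat sat on the mat")
print(edits)
```

A lower count means the post-editor had less to change, which is the kind of effort the 8,000 words per day figure reflects.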
Question: What do ‘PPX’ and ‘PEX’ stand for (as abbreviations)?
Answer (Louise Faherty and Tony O’Dowd): PEX stands for Post-Editing Automation. PEX allows you to take the output of an MT engine and dynamically alter it. When would you need to use PEX? Suppose there is a situation where your engine is repeating the same error over and over again. What you can do in such cases is write a PEX file (developed in the GENTRY programming language). This allows the engine to look for patterns in its output and change them dynamically.
For example, one of our French clients did not want to have a space preceding a colon mark in the output of their MT (because this was one of their typographical standards and repeated throughout the content). So we wrote a PEX rule that forced a stylistic change in the output of the engine. This enabled the client to reduce the number of Post-Edits substantially.
PPX stands for Preprocessor automation. You can use PPX files to normalise or improve the training data. It is based on our GENTRY programming language, which is available to all our clients for free.
In short then, PPX is for your training data, while PEX is for the actual raw output of your engine.
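The colon example above is essentially a pattern-based search and replace on the engine output. GENTRY syntax is not shown in this post, so the sketch below uses a Python regular expression instead; the pattern and the function name `apply_rule` are assumptions for illustration only.

```python
import re

# Hypothetical rule: drop any space (regular, no-break or narrow
# no-break) that sits directly before a colon.
SPACE_BEFORE_COLON = re.compile(r"[ \u00a0\u202f]+:")


def apply_rule(mt_output: str) -> str:
    """Apply the stylistic rule to one segment of raw MT output."""
    return SPACE_BEFORE_COLON.sub(":", mt_output)


print(apply_rule("Remarque : voir la page 3"))
```

Because the rule runs on every segment the engine produces, a recurring stylistic error is fixed once rather than by hand in each segment.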
For more questions and answers, stay tuned for the next part of this post!
If you are in the language service industry, you are undoubtedly on the lookout for ways in which you can improve the productivity of your team – more translated words in less time – that’s what drives your clients as well as you. Automated Machine Translation (MT) seems to be the logical step forward in today’s world of content explosion and tightening deadlines. However, for most Language Service Providers (LSPs), the challenge lies in the actual implementation of this sophisticated technology.
For this reason, it is important that no matter what translation management tools you use, they should be integrated with a powerful MT engine that is reliable, scalable, flexible, and can be trained and re-trained constantly for maximum efficiency and quick turnaround times.
In today’s fast-paced world of content explosion on the Internet, the need for translating this organically growing content with the help of machines has become inevitable. While post-editing the machine translated content will always be required, choosing the right MT features will ensure that translators do not spend countless frustrating hours on those edits.
In this Kantanwebinar, the KantanMT Professional Services Team, Tony O’Dowd and Louise Faherty (Quinn), will show how you can improve the translation productivity of your team, and manage effort estimations and project deadlines better with a powerful MT engine.
During this webinar you will learn:
- About translation challenges (co-ordinating and managing translation projects)
- Why Machine Translation is necessary to stay competitive
- How KantanMT.com can be integrated with other Translation Management Systems
Did you know there are about 7,000 languages in the world and most of them are spoken in Asia and Africa? Did you also know that one language dies about every 14 days? This was news to me before I started working at KantanMT.
Now that I have your attention, I’ll introduce myself. My name is Faith Isichei and I am a student going into my third year of a four-year B.Sc honours degree in Enterprise Computing at Dublin City University.
During my second year of college, I knew I really wanted to get an internship in an organisation that would nurture and bolster my ambition of gaining knowledge about the different areas of computing.
With hopes of doing my Masters straight after my undergraduate degree, I wanted to know more about the diverse IT sectors available. I sought to find an Internship!
Soon, I started spending countless hours researching all sorts of IT companies; Ireland is an IT hub so the options were endless. I knew what areas of IT interested me – Cloud Computing, Data Analytics, Information Systems and IT Support and lastly, Machine Translation.
However, after endless emails and countless rejections based on my age and lack of experience, I was ready to give up. Also, with summer exams around the corner and the never-ending continuous assessments, my hope of having a summer internship was starting to become nothing but a mere fantasy.
KantanMT – a miracle that ticked IT boxes
Nonetheless, when I was ready to give up I received what I deemed to be a miracle – an email from Tony O’Dowd, the Founder and Chief Architect of KantanMT.com. Not only was he offering me the opportunity to go for an interview, but also his business ticked several of the boxes of IT sectors where my interests lay. KantanMT offered me the chance of working in a small but rapidly growing company, where I felt I could make a big difference as an intern. For the following couple of days I lived and breathed everything that was Machine Translation.
Success! That email and interview brought about the birth of my summer internship at KantanMT where my role focused on three key areas, Quality Assessment (Q.A), Site Reliability Engineering (SRE) and Customer Support. I couldn’t have been any happier. To give you some background on what the company does, KantanMT.com is a cloud-based implementation of Moses Statistical Machine Translation (SMT) technology. The platform leverages the power and flexibility of the cloud, effortlessly scaling to generate a high quality, low-cost Machine Translation solution.
I really don’t know where to start. Do I talk about the lovely staff or the feeling of being part of a team that helps and cares for one another like a family? Or, the fact that the job came with a considerate and thoughtful boss who motivates and carries everyone along effortlessly? Or, the serenity of the flexible working environment?
My Internship at Xcelerator Machine Translations Ltd. a.k.a KantanMT came with various benefits. Benefits in the form of skills and life experience that I will take back to the classroom/computer lab.
Time management & communication skills
Two very important skills for working in a team and with clients. When clients need technical support at KantanMT, they need the problem fixed, immediately! Communication with my team was vital for ensuring I could get clients’ problems fixed quickly. Needless to say, this experience did wonders for my time management and communication skills.
Industry and subject matter expertise
The internship opened my eyes and mind to the subject of cloud computing and Statistical Machine Translation. I’m delighted to say that over the last three months I’ve become more persevering, versatile, efficient and competent in these areas.
Celebrating major milestones
As each day went by the company grew and changed, so too did my skills, my knowledge and my circle of friends. In a short few months, I saw new team members come on board and major company milestones surpassed.
In one month, the KantanMT community translated more than half a billion words and nothing can compare to the excitement and happiness that comes from knowing I was a part of that achievement.
Now I can look back at my summer and say, I really made the most of it.
Whatever you are looking for, be it a workplace where you can put your existing skills and knowledge into action, a place to learn new skills to improve yourself in any way possible, or somewhere with a “we are all in this together” or “there’s no I in team” sort of spirit, you will find these things readily available for you at KantanMT. Today is my last day and I think I can honestly say my internship at KantanMT has set the bar high for what I want in my future professional career.
Faith Isichei is studying for a BSc degree in Enterprise Computing at Dublin City University. Her internship at KantanMT focuses on Quality Assessment (Q.A), Site Reliability Engineering (SRE) and Customer Support.
Faith selected KantanMT for her summer internship, so she could increase her knowledge and understanding of ‘Machine Translation’ while gaining experience in a new, but high-growth, start-up company. Faith’s background in Enterprise Computing will help in her technical support role for KantanMT.com. You can contact Faith at FaithIsichei@hotmail.com or at FaithIsichei1995@gmail.com .
The KantanPEX Rule Editor enables members of KantanMT to reduce the amount of manual post-editing required for a particular translation by creating, testing and deploying post-editing automation rules on their Machine Translation engines (client profiles).
The editor allows users to evaluate the output of a PEX (Post-Editing Automation) rule on a sample of translated content without needing to upload it to a client profile and run translation jobs. Users can enter up to three pairs of search and replace rules, which will be run in descending order on your content.
How to use the KantanMT PEX Rule Editor
Log in to your KantanMT account using your email and your password.
You will be directed to the ‘Client Profiles’ tab in the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’ and marked in bold.
To use the ‘PEX-Rule Editor’ with a profile other than the ‘Active’ profile, click on the new profile name to select that profile for use with the ‘Kantan PEX-Rule editor’.
Then click the ‘KantanMT’ tab and select ‘PEX Editor’ from the drop-down menu.
You will be directed to the ‘PEX Editor’ page.
Type the content you wish to test on, in the ‘Test Content’ box.
Type the content you wish to search for in the ‘PEX Search Rules’ box.
Type what you want the replacement to be in the ‘PEX Replacement Rules’ box and click on the ‘Test PEX Rules’ button to test the PEX-Rules.
The results of your PEX-Rules will now appear in the ‘Output’ box.
Give the rules you have created a name by typing in the ‘Rule Name’ box.
Select the profile you wish to apply this rule(s) to and then click on the ‘Upload Rule’ button.
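The test step in the editor can be pictured as running a short list of search and replace pairs, one after another, over the test content. The sketch below makes two assumptions that are not confirmed in this post: that search rules behave like regular expressions, and that "descending order" means top to bottom as listed. The names `run_rules` and `MAX_RULES` are hypothetical.

```python
import re

MAX_RULES = 3  # the editor accepts up to three search/replace pairs


def run_rules(test_content, rules):
    """Apply each (search, replacement) pair in the order given and
    return what would appear in the Output box."""
    if len(rules) > MAX_RULES:
        raise ValueError(f"at most {MAX_RULES} rule pairs are allowed")
    output = test_content
    for search, replacement in rules:
        # Each rule sees the result of the rules before it.
        output = re.sub(search, replacement, output)
    return output


rules = [(r"\s+:", ":"), (r"colour", "color")]
print(run_rules("The colour : red", rules))
```

Order matters because each rule works on the result of the previous one, so a later rule can match text that an earlier rule produced.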
KantanMT PEX editor helps reduce the amount of manual post-editing required for a particular translation, hence, reducing project turn-around times and costs. For additional information on PEX-RULES and the Kantan PEX-Rule editor please click on the links below. For more details about KantanMT localization products and ways of improving work productivity and efficiency please contact us at firstname.lastname@example.org.
What is the KantanAPI?
KantanAPI enables KantanMT clients to interact with KantanMT as an on-demand web service. It also provides a number of different services including translation, file upload and retrieval and job launches.
With the KantanAPI you not only have the opportunity to integrate KantanMT into your workflow systems but also the ability to receive on-demand translations from your KantanMT engines. All these services make the experience with Machine Translation as seamless as possible.
To access the KantanMT API you will first need your ‘API token’. This token can be found in the ‘API’ tab on the ‘My Client Profiles’ page of your KantanMT account.
Once you have your token you can use the API in a number of ways:
- Using the API tab on the ‘My Client Profiles’ page in the KantanMT Web interface
- Using the REST interface via HTTP GET or POST requests
- Using one of our various connectors, which are built using our KantanAPI
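As a rough picture of the REST option, the sketch below builds (but does not send) a GET and a POST request that carry an API token and the text to translate. The endpoint URL and the parameter names `token`, `profile` and `text` are placeholders invented for this example; the real values must be taken from the API technical documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, not the real KantanAPI address.
BASE_URL = "https://api.example.invalid/translate"


def build_get_request(api_token, profile, text):
    """Build a GET request with the parameters in the query string."""
    query = urlencode({"token": api_token, "profile": profile, "text": text})
    return Request(f"{BASE_URL}?{query}", method="GET")


def build_post_request(api_token, profile, text):
    """Build a POST request with the same parameters in the body."""
    body = urlencode({"token": api_token, "profile": profile, "text": text})
    return Request(BASE_URL, data=body.encode("utf-8"), method="POST")


request = build_get_request("YOUR_API_TOKEN", "demo-profile", "Hello world")
print(request.get_method(), request.full_url)
```

Passing the finished request to `urllib.request.urlopen` would send it. GET keeps short requests easy to test in a browser, while POST is usually the better fit for longer text because the content travels in the request body.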
For more details on implementing your API solution via the REST interface, please see the full API technical documentation at the following link:
How to use KantanAPI?
Log in to your KantanMT account using your email and your password.
You will be directed to the ‘Client Profiles’ section of the ‘My Client Profiles’ page. The last profile you were working on will be ‘Active’.
If you wish to use the ‘KantanAPI’ with a profile other than the ‘Active’ profile, click on the profile you wish to use, then click on the ‘API’ tab.
You will be directed to the ‘API Settings’ page. Now click on the ‘Launch API’ button.
A ‘Launch API’ pop-up will now appear on your screen asking you ‘Are you sure you want to launch the API?’ Click ‘OK’.
The ‘API Status’ will now change from ‘offline’ to ‘initialising’, and the ‘Launch API’ button will change to ‘Launching API’.
When your KantanAPI launches, the ‘API Status’ will change from ‘initialising’ to ‘running’, the ‘Launching API’ button will change to ‘Shutdown API’ and you should now be able to click on the ‘Translate’ button.
Type the text you wish to translate in the text box and click on the ‘Translate’ button.
The translated text will now appear in the ‘Translated Text’ box. If you wish to make any changes to the translated text simply place the cursor inside the ‘Translated Text’ box and make the changes. Save these changes by clicking the ‘Retrain Engine’ button.
Test if your engine was successfully retrained by clicking the ‘Translate’ button. The retrained text will now appear in the ‘Translated Text’ box.
If you don’t wish to retrain your engine and you are happy with the translated text in the ‘Translated Text’ box, you may continue translating other text or shut down your KantanAPI by clicking the ‘Shutdown API’ button.
When you click the ‘Shutdown API’ button, a pop-up will appear asking you ‘Are you sure you want to shut down the API?’ Click ‘OK’.
The ‘Shutdown API’ button will now change to ‘Terminating API’, the ‘API Status’ will change from ‘running’ to ‘terminating’ and you will no longer be able to click on the ‘Translate’ or ‘Retrain Engine’ buttons.
You will now be directed back to the initial screen on the API Settings page.
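The status changes described in these steps form a small cycle, which can be sketched as a state machine. The four status names come from the steps above; the final move from ‘terminating’ back to ‘offline’ is an assumption based on being returned to the initial screen, and the names `ApiStatus`, `advance` and `can_translate` are made up for this example.

```python
from enum import Enum


class ApiStatus(Enum):
    OFFLINE = "offline"
    INITIALISING = "initialising"
    RUNNING = "running"
    TERMINATING = "terminating"


# Each status and the one that follows it in the launch/shutdown cycle.
# TERMINATING -> OFFLINE is assumed, not stated in the steps.
NEXT_STATUS = {
    ApiStatus.OFFLINE: ApiStatus.INITIALISING,
    ApiStatus.INITIALISING: ApiStatus.RUNNING,
    ApiStatus.RUNNING: ApiStatus.TERMINATING,
    ApiStatus.TERMINATING: ApiStatus.OFFLINE,
}


def advance(status):
    """Return the status that follows the given one."""
    return NEXT_STATUS[status]


def can_translate(status):
    """The 'Translate' button is only usable while the API is running."""
    return status is ApiStatus.RUNNING


status = ApiStatus.OFFLINE
for _ in range(4):
    status = advance(status)
    print(status.value, can_translate(status))
```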
KantanAPI™ is one of the various machine translation services offered by KantanMT to improve productivity for our clients and also enable them to be more efficient. For more information on KantanAPI or any KantanMT products please contact us at email@example.com.
For more details on the KantanMT API please see the following links and the video below:
What is Gap Analysis and Kantan TimeLine?
Gap Analysis identifies and reports any untranslated words in the training data set and allows you to take preventive measures quickly by fine-tuning training data and filling data gaps. The KantanTimeLine™ provides a chronological history of activities for each engine and uses version control for precise management of released and production-ready engines.
Using Kantan TimeLine and Gap Analysis:
In KantanBuildAnalytics, click the Gap Analysis tab to see the number of untranslated words that remain in the generated translations. You will be directed to the Gap Analysis page, where you will see a breakdown of any gaps in your training data.
A table appears with four headings: ‘#’, Unknown Word, Reference/Source and KantanMT Output. Under those headings you will find details of any untranslated words, their source and the KantanMT Output.
Click Download to download your Gap Analysis report.
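One way to picture what such a report contains is a simple check for source words that the engine has never seen in training and that therefore pass through to the output unchanged. The sketch below is a simplified illustration under that assumption, not KantanMT's implementation; the function name `gap_report` is hypothetical, and only the column names are taken from the table described above.

```python
def gap_report(training_vocabulary, translations):
    """List source words missing from the training vocabulary that
    also appear unchanged in the MT output.

    training_vocabulary: set of lower-cased words seen in training.
    translations: iterable of (source, mt_output) pairs.
    """
    rows = []
    for source, mt_output in translations:
        output_words = set(mt_output.lower().split())
        for word in source.lower().split():
            # An unknown word is assumed to be copied as-is to the output.
            if word not in training_vocabulary and word in output_words:
                rows.append({
                    "#": len(rows) + 1,
                    "Unknown Word": word,
                    "Reference/Source": source,
                    "KantanMT Output": mt_output,
                })
    return rows


vocabulary = {"the", "printer", "is"}
pairs = [("The printer is offline", "L'imprimante est offline")]
for row in gap_report(vocabulary, pairs):
    print(row)
```

Words flagged this way point to the data gaps mentioned above: adding training segments that contain them is the usual way to close the gap.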
Note: You can also click the Timeline tab to view your profile’s Timeline, which is essentially a record of the changes you have made on your engine.
This is one of the many features provided in KantanBuildAnalytics, which aids Localization Project Managers in improving an engine’s quality after its initial training. To see other features used in KantanBuildAnalytics suite please see the links below.
- BLEU in BuildAnalytics
- F-Measure in Kantan-BuildAnalytics
- KantanMT Timeline
- TER in Kantan BuildAnalytics