Conference and Event Guide – December 2013

Things are winding down as we get closer to the end of the year, but there are still some great events and webinars coming up in December to look forward to.

Here are some recommendations from KantanMT to keep you busy in the lead up to the festive season.

Listings

Dec 02 – Dec 05, 2013
Event: IEEE CloudCom 2013, Bristol, United Kingdom

Held in association with Hewlett-Packard Laboratories (HP Labs), the conference is open to researchers, developers, users, students and practitioners from the fields of big data, systems architecture, services research, virtualization, security and high performance computing.


Dec 04, 2013
Event: LANGUAGES & BUSINESS Forum – Hotel InterContinental Berlin

The forum highlights key issues in language education, particularly in the workplace, and the new technologies that are becoming a key part of the process. The event will promote international networking and has four main themes: Corporate Training, Pre-Experience Learners, Intercultural Communication and Online Learning.


Dec 05, 2013
Webinar: Effective Post-Editing in Human and Machine Translation Workflows

Stephen Doherty and Federico Gaspari of CNGL (Centre for Next Generation Localisation) will give an overview of post-editing and of different post-editing scenarios, from ‘gist’ to ‘full’ post-edits. They will also give advice on different post-editing strategies and how they differ for Machine Translation systems.


Dec 07 – Dec 09, 2013
Event: 6th Language and Technology Conference, Poznan, Poland

The conference will address the challenges of Human Language Technologies (HLT) in computer science and linguistics. The event covers a wide range of topics, including electronic language resources and tools, formalisation of natural languages, parsing and other forms of natural language (NL) processing.


Dec 09 – Dec 13, 2013
Event: IEEE GLOBECOM 2013 – Power of Global Communications, Atlanta, Georgia USA

The conference, organized by the IEEE Communications Society (the second largest of the 38 IEEE technical societies), will focus on the latest advancements in broadband, wireless, multimedia, internet, image and voice communications. The localization-related sessions take place on 10 December and include Localization Schemes, Localization and Link Layer Issues, and Detection, Estimation and Localization.


Dec 10 – Dec 11, 2013
Event: Game QA & Localization 2013, San Francisco, California USA

This event brings together QA and localization managers, directors and VPs from game developers around the world to discuss key challenges in the game localization industry. The June 2013 event in London was a huge success: more than 120 senior QA and localization professionals from developers, publishers and third-party suppliers of all sizes and platforms came to learn, benchmark and network.


Dec 11 – Dec 15, 2013
Event: International Conference on Language and Translation, Thailand, Vietnam and Cambodia

The Association of Asian Translation Industry (AATI) is holding an International Conference on Language and Translation, or “Translator Day”, in three countries: Thailand on December 11, 2013, Vietnam on December 13, 2013, and Cambodia on December 15, 2013. The events give translators, interpreters, translation agencies, foreign language centres, NGOs, FDI-financed enterprises and other translation purchasers opportunities to meet.


Dec 12, 2013
Webinar: LSP Partnerships & Reseller Programs 16:00 GMT (11:00 EST/17:00 CET)

This webinar, hosted by GALA and presented by Terena Bell, covers how to open up new revenue streams by introducing reseller programs into current business models. The webinar is aimed at world trade associations, language schools and other non-translation companies wishing to offer their clients translation, interpreting or localization services.


Dec 13 – Dec 14, 2013
Event: The Twelfth Workshop on Treebanks and Linguistic Theories (TLT12), Sofia, Bulgaria

The workshops, hosted by the BulTreeBank Group, promote new and ongoing high-quality work on syntactically annotated corpora such as treebanks. Treebanks are important resources for Natural Language Processing applications, including Machine Translation and information extraction. The workshops will focus on different aspects of treebanking: descriptive, theoretical, formal and computational.


Are you planning to go to any events during December? KantanMT would like to hear your thoughts on what makes a good event in the localization industry.

Crowdsourcing vs. Machine Translation

Crowdsourcing has become more popular with both organizations and companies since the concept was introduced in 2006. Companies have adopted this new production model to improve their production capacity while keeping costs low. The web-based business model uses an open call format to reach a wide network of people who are willing to volunteer their services, for free or for a limited reward, for any activity, including translation. The application of translation crowdsourcing models has opened the door to increased demand for multilingual content.

Jeff Howe of Wired magazine defined crowdsourcing as:

“…the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call”.

Crowdsourcing costs approximately 20% of the price of a professional translation. Language Service Providers (LSPs) like Gengo and Moravia have realised the potential of crowdsourcing as part of a viable production model, which they combine with professional translators and Machine Translation.

The crowdsourcing model is an effective method for translating the surge in User Generated Content (UGC). Erratic fluctuations in demand call for a dynamic, flexible and scalable model. Crowdsourcing is definitely a feasible production model for translation services, but it still faces some considerable challenges.

Crowdsourcing Challenges

  • No specialist knowledge – crowdsourcing is difficult for technical texts that require specialised knowledge. It often involves breaking a text down into smaller sections, each of which is sent to a different volunteer. A volunteer may not be qualified in the relevant domain, and so ends up translating small sections of text out of context and with limited subject knowledge, which leads to lower quality or mistranslations.
  • Quality – translation quality is difficult to manage and depends on the type of translation. There have been some innovative suggestions for measuring quality, including evaluation metrics such as BLEU and Meteor, but these are costly and time-consuming to implement and need a reference translation or ‘gold standard’ to benchmark against.
  • Security – crowd management can be a difficult task and the moderator must be able to vet participants and make sure that they follow the privacy rules associated with the platform. Sensitive information that requires translation should not be released to volunteers.
  • Emotional attachment – humans can become emotionally attached to their translations.
  • Terminology and writing style inconsistency – when the project is divided amongst a number of volunteers, the final version’s style needs to be edited and checked for inconsistencies.
  • Motivation – decisions on how to motivate volunteers and keep them motivated can be an ongoing challenge for moderators.
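The Quality point above mentions BLEU and its need for a reference translation. To show why, here is a minimal sentence-level BLEU sketch in Python. It follows the usual recipe: clipped n-gram precision for n = 1 to 4, a geometric mean and a brevity penalty. It also simplifies in ways that are assumptions of this sketch, namely whitespace tokenization, a single reference and no smoothing, so its scores will not exactly match established BLEU toolkits.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference translation.

    Simplifications (assumptions of this sketch): whitespace tokenization,
    one reference, no smoothing, so any zero n-gram precision gives 0.0.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Without a reference there is nothing to score against.
print(round(sentence_bleu("the cat sat on the mat today", "the cat sat on the mat"), 3))  # prints 0.809
```

The metric is only as good as the reference it is given. This is why the text notes that a ‘gold standard’ translation must exist before the score can be computed.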

Improvements in the quality of Machine Translation have contributed to the popularity of crowdsourcing, and most MT post-editing and proofreading tasks fit neatly into crowdsourcing models. Content can be divided into ‘find-fix-verify’ phases and distributed easily among volunteers.
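As a rough illustration of how MT output might be split into ‘find-fix-verify’ phases and shared among volunteers, here is a Python sketch. The three stages follow the general Find-Fix-Verify crowdsourcing pattern. The `Segment` data model, the round-robin assignment and the majority-vote acceptance rule are hypothetical choices made for this example and do not describe any particular platform.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Segment:
    """One MT-translated segment in the crowd workflow (hypothetical data model)."""
    seg_id: int
    mt_output: str
    fixes: list = field(default_factory=list)
    final: str = ""


def assign_round_robin(items, volunteers):
    """Spread work items evenly across volunteers: {volunteer: [items]}."""
    plan = {v: [] for v in volunteers}
    for item, volunteer in zip(items, cycle(volunteers)):
        plan[volunteer].append(item)
    return plan


def find(segments, flagged_ids):
    """FIND: keep only segments a volunteer flagged as needing post-editing."""
    return [s for s in segments if s.seg_id in flagged_ids]


def fix(flagged, proposals):
    """FIX: attach candidate corrections proposed by (other) volunteers."""
    for seg in flagged:
        seg.fixes.extend(proposals.get(seg.seg_id, []))


def verify(segments, votes):
    """VERIFY: accept the most-voted proposed fix; otherwise keep the MT output."""
    for seg in segments:
        valid = [v for v in votes.get(seg.seg_id, []) if v in seg.fixes]
        seg.final = Counter(valid).most_common(1)[0][0] if valid else seg.mt_output


# Example run with two segments, only one of which is flagged.
segments = [Segment(1, "Kantanmt is cloud based platform"), Segment(2, "Thank you")]
flagged = find(segments, flagged_ids={1})
fix(flagged, {1: ["KantanMT is a cloud-based platform"]})
verify(segments, {1: ["KantanMT is a cloud-based platform"] * 2})
print([s.final for s in segments])
# ['KantanMT is a cloud-based platform', 'Thank you']
```

Each phase is small and self-contained. That makes the phases easy to hand to different volunteers, which is the property the paragraph above relies on.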

There are some advantages to be gained when pairing MT technology and collaborative crowdsourcing.

Combined MT/Crowdsourcing

Machine Translation will play a pivotal role in new translation models that focus on translating large volumes of data cost-effectively and at scale. Combining Machine Translation and crowdsourcing tasks will produce translations that are not only fit for purpose but also of high quality.

  • Quality – as the overall quality of Machine Translation output improves, it is easier for crowdsourcing volunteers with less experience to generate better quality translations. This will in turn increase the demand for crowdsourcing models to be used within LSPs and organizations. MT quality metrics will also make post-editing tasks more straightforward and easier to delegate among volunteers based on their experience.
  • Training data – word alignment and engine evaluations can be carried out through crowd computing, and parallel corpora created by volunteers can be used to train and/or retrain existing SMT engines.
  • Security – customized Machine Translation engines are more secure when dealing with sensitive product or client information. General or publicly available information is more suited to crowdsourcing.
  • Terminology and writing style consistency – writing style and terminology can be controlled and updated through a straightforward process when using MT. This avoids the idiosyncrasies of volunteer writing styles. There is no risk of translator bias when using Machine Translation.
  • Speed – Statistical Machine Translation (SMT) engines can process translations quickly and efficiently. When a high volume of content needs to be translated within a short period of time, it is better to use Machine Translation. Output is guaranteed within a designated time, and crowdsourcing the post-editing tasks speeds up the production process before final checks are carried out by experienced translators or post-editors.
Use of crowdsourcing for software localization. Source: V. Muntes-Mulero and P. Paladini, CA Technologies and M. Solé and J. Manzoor, Universitat Politècnica de Catalunya.

Last chance: all members can take a FREE TRIAL of KantanAnalytics™ until November 30th, 2013. After that, KantanAnalytics will be available on the Enterprise Plan.

tcworld special

The tcworld Conference & tekom Fair starts tomorrow, November 6th, in Wiesbaden, Germany. Aidan Collins, KantanMT’s User Engagement Manager, will be visiting the conference on Thursday and is looking forward to seeing you all there. To help keep you organised, KantanMT has put together a list of professional and expert presentations and workshops relevant to localization professionals. Expert speakers will cover content strategy and design, terminology management, translation, localization and quality assurance.
Rhein-Main-Hallen convention centre, Wiesbaden, Germany. Source: GCB German Convention Bureau e.V., 2011.

The fair opens at 9am on Wednesday and finishes at 4pm on Friday in the Rhein-Main-Hallen, Wiesbaden’s biggest convention centre. The centre has more than 20,000 m² of conference space, seven exhibition halls and a number of conference and congress rooms. The largest congress hall can seat 3,000 people. The venue’s size and central location, just half a kilometre from the city centre, make it an excellent choice for hosting the technical communication event. The line-up includes:

Wednesday November 6th 9:00 – 18:00

Content Strategy 08:45 – 09:30: ‘Strategic Video Storytelling’. The Content Wrangler’s Scott Abel will give the opening keynote speech, a presentation on content strategies and the importance of using stories in video production.

International Management 08:45 – 10:30: ‘A Business Model Generation Session’. Diego Bartolome will host a design thinking workshop on technology and languages.

Content Strategy 11:15 – 12:00: ‘The Need for Speed: Preparing for New Requirements’. Content Strategy Consultant, Sarah O’Keefe, will present on the importance of developing content initiatives to improve technical communication workflows.

Language Technology 11:15 – 12:00: ‘Real-time Selection of Best Assets Based on Productivity Analysis’. Anton Voronov, Innovations Director for ABBYY Language Services will discuss the use of productivity metrics and translator preferences in developing a pricing structure and best practices for Machine Translation deployment.

Language Technology 14:45 – 15:30: ‘Developing “Ideal” Software for Language Industry’. Julia Makoushina and Eugenia Tashkun will co-present on developing the “ideal” language technology. They will discuss the software possibilities from both the user’s and the developer’s perspective, and how to identify and meet user needs.

Language Technology 16:15 – 17:00: ‘Extracting Translation Relations for Human-readable Dictionaries from Bilingual Text’. Kurt Eberle, Managing Director and Co-founder of Lingenio GmbH, talks about “cross-lingual expression” through the identification and extraction of dictionary entries from source and target texts.

Content Strategy 17:15 – 18:00: ‘The Convergence Era: Translation Becomes a Utility’. TAUS Founder and Director, Jaap van der Meer, will discuss the evolution of the translation industry and what this will mean for content creators.

Thursday November 7th 9:00 – 18:00

Localization 08:30 – 08:45: ‘Welcome Session’. Don De Palma, founder of Common Sense Advisory, will host a welcome session in Room 12B.

Localization 08:45 – 09:30: ‘Is Your Content Ready to Go Global?’ Localization and Content Strategy Consultant Kit Brown-Hoekstra will discuss how high-quality localized content can be leveraged as a competitive advantage.

Localization 09:45 – 10:30: ‘Rules of Engagement: Successful Partnerships with Translation/Localization Companies’. Aki Ito, Localization Professional, will co-present with Robin Franke, a Technical Product Communication Specialist, on the client-vendor partnership and the role each partner plays when working together.

Language Technology 11:15 – 12:00: ‘Welcome to the Cloud! Terminology as a Service’. Dr. Andrejs Vasiljevs will introduce a cloud-based terminology platform and solutions for using multilingual terminological data. The talk will target both language workers and Machine Translation users.

Localization 12:15 – 13:00: ‘Machine Translation in the Mainstream: New Tools, New Gains, New Headaches’. Daniel Grasmick from Lucy Software and Services GmbH will discuss how Machine Translation can be built into the localization lifecycle.

Language Technology 16:00 – 16:45: ‘Terminology in the cloud with MemoQ and TaaS’. Kilgray Translation Technologies CEO Istvan Lengyel and Kilgray founder Gabor Ugray will present on TaaS CAT tool solutions and their integration with memoQ.

Localization 17:15 – 18:00: ‘Closing Session: Summaries and Lessons Learned’. Don De Palma will cover the session highlights and future localization “trends” and “innovations”.

Friday November 8th 9:00 – 16:00

Localization 08:45 – 09:30: ‘Simplified English and MT: Best Practices for Localization Content Optimization and Simplification’. Alberto Ferreira of Avira Operations will discuss integrating Machine Translation and automated post-editing into the localization workflow.

Localization 09:45 – 10:30: ‘A Unified Model for Document and Translation Quality Assurance’. Dr. Aljoscha Burchardt and Dr. Arle Lommel will address translation quality assurance (QA) with QTLaunchPad, an open source software project funded by the European Commission.

Localization 14:30 – 15:15: ‘UX and Localization: Optimal Design Practices for World-Ready Applications’. Alberto Ferreira will talk about user interface (UI) and web design localization trends based on “platform-independent design principles”. Ferreira will cover topics such as usability testing, visual text layout, cultural adaptation, internationalisation concerns, cost reduction and the time-to-market development cycle.

Localization 15:30 – 16:15: ‘The “International Persona”’. Usability and Localization Communication consultant Henrietta Hartl will discuss the use of an “international persona” in localization usability evaluation.

It will be a busy three days, with informative presentations and workshops from industry experts and market leaders. KantanMT hopes you all enjoy the conference!

Would you like to learn how Machine Translation can increase your business opportunities? Contact Kevin McCoy, KantanMT’s Machine Translation Success Coach: kevinmcc@kantanmt.com.

Conference and Event Guide – November 2013

There are some great events and webinars coming up over the next month, and KantanMT has put together a list of noteworthy dates to add to your calendar.

KantanMT’s Aidan Collins, User Engagement Manager, will be attending tcworld on Thursday 7th November in Wiesbaden, Germany. Then, towards the end of the month, Aidan will head to London to present at the 35th ASLIB Translating and the Computer Conference. KantanMT is also a silver sponsor of this year’s ASLIB conference.

Listings

Nov 04 – 05, 2013
Workshop: Translation Project Management, Wiesbaden, Germany.
Angelika Zerfaß and Martin Beuster will be presenting a Translation Project Management (PM) and Localization PM workshop. This is geared towards current and future Project Managers in the localization and translation industry.


Nov 06 – 08, 2013
Event: tcworld 2013 – tekom trade fair, Rhein-Main-Hallen, Wiesbaden, Germany.
This is the largest global event for technical communication. Participating companies offer industrial products, software and services for technical communication, with a regional focus on Germany, Austria and Switzerland. The conference will cover localization, internationalization and globalization, management of technical communication, mobile documentation and content strategies. Contact: tekom, info@tekom.de

To set up a meeting with Aidan Collins, User Engagement Manager, email him directly at aidanc@kantanmt.com or call him on +353 86 823 1767.


Nov 06 – 09, 2013
Event: 54th ATA Conference, San Antonio, Texas USA.
This is a great networking event for translators, project managers and industry professionals. The aim of the conference is to promote the professional development of translators and interpreters. There will be approx. 175 educational sessions in varying languages, specializations and levels. Contact: American Translators Association, ata@atanet.org


Nov 11, 2013
Webinar: MemoQ – Getting Started guide, online.
An introductory webinar for translators who want to use MemoQ. Participants will learn how to create projects, translate in the MemoQ editor and manage translation memories.


Nov 13, 2013
Webinar: Editing for Localization, online.
Katherine (Kit) Brown-Hoekstra is targeting Senior Technical Communicators and Content Managers with a webinar on editing for Localization.


Nov 15 – 16, 2013 (Expolingua International Fair, Nov 15 – 17)
Event: InDialog: Mapping the Field of Community Interpreting, Expolingua International Fair, Berlin, Germany
This conference focuses on community interpreting services and is aimed at government representatives, policy makers, service providers and anyone involved in the interpreting service workflow. InDialog is taking place in conjunction with the 26th EXPOLINGUA International Fair for Languages and Cultures. Contact: ICWE GmbH, info@indialog-conference.com


Nov 20, 2013
Webinar: The Convergence Era: Translation As A Utility

Jaap van der Meer (TAUS) talks with Scott Abel (The Content Wrangler) about the future of translation and the evolution of the translation industry. They will look at the opportunities and challenges that the need for real-time translation creates for content publishers.


Nov 22, 2013
Event: think! India, The Metropolitan Hotel & Spa, Delhi
think! India is a one-day event with a regional focus on how to succeed in India’s expanding localization industry. The event is coordinated by GALA, the Globalization and Localization Association, and is part of a series of regional events that bring language service providers (LSPs) together.


Nov 28 – 29, 2013
Event: 35th Translating and the Computer Conference, Paddington, London
This event covers technology and its influence on the localization and translation industry. It aims to bring together translators, researchers and students in the translation and localization field. It is also a great event for catching up on the latest computer-aided translation (CAT) tools. KantanMT is sponsoring this event. Niamh Lacy and Aidan Collins will both be there to answer any questions about KantanMT’s technology.

To set up a meeting with Aidan or Niamh, email Niamhl@kantanmt.com or call Niamh directly on +353 877526320.


KantanMT @ Web Summit 2013

Dublin’s Web Summit kicks off today, and KantanMT is very excited that Tony O’Dowd, Founder and Chief Architect, is competing against 50 other businesses in the first round of the PITCH competition today at 1:35pm. Tony will be presenting on PITCH STAGE 3.

Tomorrow, KantanMT is exhibiting in the ALPHA village. Stop by the KantanMT exhibit and say ‘Hi’ to Tony, Eric and Niamh. They will be able to tell you all about the platform and new technologies, and show you some demos.

In the following video, Tony gives an introduction to what KantanMT is about and what technologies have been developed so far. It is a great overview of KantanMT’s technology and team.

If you’re attending the Web Summit either today or tomorrow and want to set up a meeting with Tony, Niamh or Eric, you can call Niamh on +353 877526320, or email Niamhl@kantanmt.com.

Enjoy Web Summit 2013!

 

Dublin Web Summit 2013

KantanMT has two days out at the Dublin Web Summit 2013

The Dublin Web Summit 2013 is just a week away, and everyone in the technology world is gearing up for it. The summit runs for two days, 30–31 October, at the Royal Dublin Society (RDS), and is Europe’s biggest tech conference.

Ireland, a prime location for this event, is home to the European headquarters of many leading tech companies, and 40% of the summit’s attendees are based in Ireland. The conference has been running successfully since 2010, growing from 500 attendees at the first event to an expected 10,000 this year, which shows just how popular it has become.

KantanMT will be one of the 22 Irish companies exhibiting at the summit as part of the ALPHA program. KantanMT was selected as one of the 600 start-ups participating from all over the world.

The ALPHA program is designed specifically to give new start-ups a chance to get their name out and build networks with potential investors and partners. More than 200 potential investors will be at the summit. A snapshot of the impressive investor line-up includes Atomico, Accel Partners, Goldman Sachs, Andreessen Horowitz and Google Ventures. The ALPHA program gives participating technology companies the opportunity for one-on-one meetings with these investors.

KantanMT is also involved in another great event, the PITCH competition. KantanMT’s Founder and Chief Architect, Tony O’Dowd, is competing against 50 other start-ups in the BETA category at PITCH during the conference.

The PITCH competition, presented by Box, gives start-ups the opportunity to pitch in front of the attending media and investors. The competition runs over both days on three stages, and the winning company will be named the best high-potential start-up of 2013 and receive a prize worth €500,000. The judges are a mix of high-profile entrepreneurs, venture capitalists (VCs), angel investors and tech journalists.

If you’re attending the Web Summit next week, contact Niamh at Niamhl@kantanmt.com to set up a meeting.

Or stop by the KantanMT exhibit and meet Tony, Eric and Niamh to hear all about the platform, watch some demos and see for yourself how KantanMT is making waves in the global translation industry.

Cloud Security

Business communities are now embracing cloud migration, because cloud technologies suit the modern “more for less” business approach. It is all about paying for what you use, without the big overheads and without the big carbon footprint.

CompTIA, a non-profit trade organization, recently published its fourth annual study on cloud computing, based on a sample of approximately 900 IT respondents. The study identified an increase in the number of cloud-based IT systems and a 10% increase since last year in the number of businesses using cloud computing services. Businesses are demonstrating greater confidence in cloud models such as Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).

Benefits driving business to the cloud:

  • Integration – a mix of technologies integrated into the workflow
  • IT Investment – moves from fixed expenditure to operating expenditure
  • Scalability – to manage fluctuations in demand
  • Upgrade and patch flexibility – automated systems leave time for more important IT activities
  • Speed – turnaround times become hours rather than weeks
  • Flexibility and user control – personal devices can be used to log on to secure IT networks from any location
  • Big Data – easily manages the sheer volume of data generated
  • Security – can be left to the experts

Data security was a major concern for cloud migration, especially during the early stages of cloud computing. The cloud has now been around for a few years, and its security has been tested. Cloud environments are just as secure as, if not more secure than, traditional IT structures, because cloud hosting services benefit from economies of scale, specialised staff and a strong security focus. These benefits are passed on to cloud users, who can leave security to the experts.

Data security is at the centre of an ecosystem of security services: monitoring, encryption, cloud archiving and data recovery. Organizations like the Cloud Security Alliance (CSA) endorse best practices to set industry standards, which are continually updated in line with the requirements of governing bodies.

KantanMT has also integrated the cloud into its software architecture, with four layers of security protecting client data.

SPF for data, four levels of security equals four times more resistance against cyber attacks.

The multi-tenant architecture gives each member full functional access to the platform while keeping their data completely isolated from other members. Each member has their own password-protected account, and all data is encrypted using the Advanced Encryption Standard (AES) with 256-bit keys. KantanMT uses secure Amazon servers to host its data services.

Basically, when data is uploaded to the KantanMT platform, it is stored on secure Amazon servers and protected behind Amazon firewalls. Accessing the data requires a pass key that has two parts: one part is held by the client and the other by the cloud service provider (CSP). Access requires both parts of the key, so the CSP alone cannot access the data.
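The two-part pass key described above resembles 2-of-2 secret splitting. The Python sketch below shows the general idea using XOR splitting. Each share on its own looks like random bytes, and the key can only be rebuilt when both shares are combined. This is an illustration of the concept under our own assumptions. It does not describe how KantanMT or AWS actually derive, split or store keys, and the function names are hypothetical.

```python
from __future__ import annotations

import secrets


def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two shares; both are needed to rebuild it (hypothetical helper).

    share_a is uniformly random and share_b = key XOR share_a, so either
    share alone reveals nothing about the key (the one-time-pad argument).
    """
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine both shares into the original key (hypothetical helper)."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must be the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


# Example: a 256-bit key, the key size used by AES-256.
data_key = secrets.token_bytes(32)
client_share, provider_share = split_key(data_key)  # one share each for client and CSP
assert combine_shares(client_share, provider_share) == data_key
```

XOR splitting is the simplest scheme in which neither party can act alone. Schemes that need only k of n shares, such as Shamir’s secret sharing, generalise the same idea.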

KantanMT selected Amazon Web Services (AWS) to host its cloud services because of AWS’s excellent data security practices and compliance with IT security standards. AWS is competing with IBM for a $600 million cloud contract with the CIA. Although IBM’s solution was more cost-effective, the CIA considered AWS to be “a superior technical solution”. Other notable AWS clients include Intuit Inc., SAP AG, Spotify, Sage, Pfizer and Thomson Reuters.

More detailed information on how KantanMT manages security will be published in our data security whitepaper coming out in the next few months.

機械翻訳 KantanMT Supports Japanese

This week, KantanMT announced the introduction of a Japanese tokenizer and detokenizer on the KantanMT platform. This means that members can now build Machine Translation engines with Japanese as either the source or the target language. To celebrate the release of KantanMT Japanese, we are going to share a few facts and figures about Japan, the Japanese language and Japan’s Machine Translation industry.
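Written Japanese does not separate words with spaces. A tokenizer therefore has to decide where words begin and end before an SMT engine can align them, and a detokenizer has to rejoin the words afterwards. The Python sketch below shows one classic and deliberately simple approach, greedy longest-match segmentation against a dictionary. The tiny word list is a hypothetical example, and this is not KantanMT’s implementation. Production tokenizers, such as morphological analysers like MeCab, use large dictionaries and statistical models instead.

```python
from __future__ import annotations


def tokenize_ja(text: str, dictionary: set[str], max_len: int = 8) -> list[str]:
    """Greedy longest-match (maximal munch) segmentation.

    At each position, take the longest dictionary word that matches; if none
    matches, fall back to a single character so the loop always advances.
    """
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


def detokenize_ja(tokens: list[str]) -> str:
    """Japanese is written without spaces, so here detokenizing is concatenation."""
    return "".join(tokens)


# Hypothetical mini-dictionary, for illustration only.
DICTIONARY = {"機械", "翻訳", "機械翻訳", "は", "便利", "です"}

tokens = tokenize_ja("機械翻訳は便利です", DICTIONARY)
print(tokens)  # ['機械翻訳', 'は', '便利', 'です']
assert detokenize_ja(tokens) == "機械翻訳は便利です"
```

Greedy matching prefers 機械翻訳 (“machine translation”) over the two shorter words 機械 + 翻訳 because it always tries the longest candidate first. A real detokenizer also has to restore the spaces around embedded Latin-script words, which this sketch ignores.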

Oh and by the way, the title of this post means “Machine Translation”!!

The Japanese Language
Japanese is known as one of the world’s most difficult languages. Not too difficult to speak, but tough to read and write.

Construction
Japanese syntax is very different from English:

  • Japanese sentences follow a subject-object-verb (SOV) or object-subject-verb (OSV) order, unlike the English subject–verb–object (SVO) structure. The verb always comes at the end of a sentence.
  • The indefinite and definite articles (‘a’ and ‘the’) are not commonly used
  • Japanese is written in three scripts – Hiragana, Katakana and Kanji
  • The singular and plural of a word are the same
  • 5 vowels and 11 consonants produce the 48 sounds of the language
  • There are no “L” and “R” sounds in Japanese

There is some good news, however: just like English nouns, Japanese nouns do not have genders!

Some other facts about Japanese…

  • There are approximately 130 million Japanese speakers in the world today. Most of them are in Japan, of course, but there are also people who speak Japanese as their first language in the USA and South America. Japanese is the second most common language spoken in Brazil.
  • The literacy rate in Japan is almost 100%.
  • There are thousands of foreign loan words in the Japanese language. These are called gairaigo (外来語) and come mostly from English and other European languages. These words are typically written in the Katakana script.
  • English is the only foreign language taught in public Japanese schools.


Japan and Machine Translation
Now that we know some more about the Japanese language, we’re going to turn our attention to the history of Japan’s Machine Translation Industry.

In 1955, the first Japanese Machine Translation research programme began at Kyushu University. Until the mid-60s, the other major Machine Translation research bodies in Japan were the Electrotechnical Laboratory in Tokyo and Kyoto University. Research on the first English to Japanese Machine Translation system began at the Electrotechnical Laboratory in Tokyo in 1957.

John Hutchins (n.d.) says that English to Japanese was the primary research focus of the period. However, it was very difficult to analyse written Japanese because of the “lack of any indication of word boundaries” (Hutchins, n.d., p. 1). Hutchins goes on to say that there were also very few general-purpose computers in Japan with “sufficient storage capacity for Machine Translation needs” (Hutchins, n.d., p. 1). He adds that this directed early Japanese Machine Translation research towards “the investigation of special purpose machines and perhaps the emphasis on theoretical studies” (Hutchins, n.d., p. 2).

Japan a Leader in MT…

Japan became a leading player in the Machine Translation field during the 1980s. In 1982, the state launched a four-year Machine Translation programme that led to a huge increase in the number of English to Japanese Machine Translation projects in the Japanese manufacturing industry. The decade also saw Fujitsu launch its Atlas Japanese to English Machine Translation engine, and the first-ever Machine Translation Summit was held in Tokyo in 1987.

You can find out more about early Japanese Machine Translation projects by reading the TAUS timeline and John Hutchins’s Projects and groups in Japan, China, and Mexico (1956-1966).

The Japanese language has also been part of some of the major Machine Translation projects of past decades. For example, in 1991 NEC showcased INTERTALKER, an “automatic speech to speech system combining speech recognition, PiVOT MT, and speech synthesis for English, Japanese, French, and Spanish” (TAUS, 2013). In 1992, the C-STAR consortium demonstrated the first phone translation between Japanese, English and German. Then in 1993, the eight-year, German state-supported Verbmobil project began. Verbmobil aimed to produce “portable systems for face-to-face English-language business negotiations in German and Japanese” (Wired, 2000).

By introducing a Japanese tokenizer and detokenizer, KantanMT is adding a new page to the history of Machine Translation and the Japanese language. We also want to play a part in the continued expansion of your company, and with KantanMT, the door to Japanese markets is now open!

If you want to find out more about KantanMT, visit KantanMT.com and sign up for our free 14-day trial.

Featured Image Source: http://www.csuci.edu/cia/countries/japan.htm

PEMT Standards

In this blog series, we are discussing post-editing. In our earlier posts, ‘The Rise of PEMT’ and ‘Cutting PEMT Times’, we discussed what automated post-editing means, why it is growing in popularity among Language Service Providers (LSPs), and how you can cut your post-editing times.

Machine Translated text can be post-edited to different quality levels. This post is based on post-editing guidelines that have been developed by TAUS with, among others, KantanMT’s partners DCU and CNGL. A link to these guidelines is available at the end of this post.

Post-editing to an understandable level
An understandable level of post-editing is a standard at which the main content of the message is correct and understandable for the user. At this level, the document’s readability may not be perfect and there may be a number of styling errors; correct styling is not essential as long as the main message is understandable.

Follow these rules to post-edit a translated text to an understandable level:

  • Ensure that the meaning of the translated text is the same as the source text and that it is understandable to the user
  • Read through the document to make sure that there is no missing or excess information
  • Because the translation is part of the localization process, make sure that the content is not offensive or culturally insensitive
  • Correct basic spelling errors
  • Errors that only affect the styling of the document do not need to be changed, so there is no need to correct a sentence such as “Kantanmt is cloud based statistical machine translator platform”. Note: the stylistically correct version is “KantanMT is a cloud-based Statistical Machine Translation platform”.
  • Remember that the fewer post-edits there are the better – use as much of the original Machine Translation output as possible
  • Don’t restructure sentences to improve the flow if the meaning is comprehensible


Post-editing to a quality standard similar to human translation
TAUS defines this level as being “comprehensible (i.e. an end-user perfectly understands the content of the message), correct (i.e. it communicates the same meaning as the source text), stylistically fine, though the style may not be as good as that achieved by a native-speaking human translator. Syntax is normal, grammar and punctuation are correct”.

Follow these rules to post-edit a translated text to this standard:

  • Ensure that content is grammatically complete and structured logically, and that the meaning of the message is clear to the user
  • Check the translation of terms that are essential to the document, and make sure that any terms left untranslated are ones the client has asked to keep untranslated
  • Read through the document to make sure that there is no missing or excess information
  • Because the translation is part of the localization process, make sure that the content is not offensive or culturally insensitive
  • Remember that the fewer post-edits there are the better – use as much of the original MT output as possible
  • Correct spelling errors and make sure that the document is correctly punctuated and well formatted

And that’s it! For errors such as misspellings or formatting mistakes, you can use KantanMT’s PEX technology to find and correct repetitive errors throughout a document. This helps to speed up post-editing and reduce post-editing costs.

TAUS Machine Translation Post-Editing Guidelines

You can find out more about KantanMT by visiting KantanMT.com and signing up for our free 14-day trial.

The History of Machine Translation Pt. 2

KantanMT is presenting a two-part blog series on the history of Machine Translation to give our readers a better understanding of the industry and where KantanMT fits within the grand scheme of things. In our last post, The History of MT Pt. 1, KantanMT presented the key stages in the history of MT from 1945 to 1979. In this post, KantanMT highlights the major developments from 1980 to the present day.

Again, thanks to the folks at TAUS for providing such a great timeline on their website to help us in writing this post.

The 80s and 90s…
While the EC’s Machine Translation project EUROTRA continues, Japan launches a state-supported Machine Translation research programme in 1982, and Japanese manufacturing sees a surge in the number of English-Japanese MT projects as a result. In 1984, Trados is founded in Stuttgart and becomes the first company to roll out translation memory technology (MultiTerm in 1992 and Translator’s Workbench in 1994). In the same year, IBM begins research on using “slot” grammars for Machine Translation. In 1987 the first ever Machine Translation Summit is held in Tokyo, Japan, and at a conference in 1988 IBM reports on its experiments in Statistical Machine Translation (SMT) using the Canadian Hansard corpus. The feasibility of SMT becomes a major research topic, representing a break from traditional Rule-Based methods.

The 1990s begin with Michael Blekhman establishing the first university course on Machine Translation at Kharkov State University. In 1996, Systran offers free translation of small text segments on the internet. iTranslator, the first commercial internet Machine Translation service, is launched by Lernout & Hauspie in 1998, while in Dublin, ALPNET launches one of the localization industry’s first language technology integration services.


The 00s and beyond…
In 2001, the US National Institute of Standards and Technology (NIST) introduces its Open Machine Translation (OpenMT) evaluation, which aims to help improve Machine Translation technologies. Language Weaver is established in California in 2002 to produce Statistical Machine Translation systems, and in 2003 the ISI team wins DARPA’s speed MT competition with, you guessed it, a Statistical Machine Translation engine. In 2004, TAUS is established and the state-funded OpenTrad project is rolled out in Spain to develop Machine Translation engines for the languages of Spain.

In 2006, the European Commission launches EuroMatrix, which aims to develop Machine Translation engines for European language pairs, and in 2007 Moses, the open-source Statistical Machine Translation system, is launched and incorporated into EuroMatrix. 2008 sees NEC introduce text/SMS translation for mobile phones, and in 2009 AppTek combines Statistical Machine Translation with traditional Rule-Based models to produce a hybrid MT system.

Cloud technologies also develop on a large scale after the turn of the century, and in 2012 KantanMT is launched as a cloud-based Statistical Machine Translation platform.

By providing a Statistical Machine Translation service in the cloud, KantanMT is drawing on developments throughout the rich history of the Machine Translation industry and carrying the torch into the future.

Find out more about Machine Translation and KantanMT by going to KantanMT.com and signing up for our free 14-day trial.