As a team with an unbridled passion for innovation in the Machine Translation industry, we were not surprised by Monday’s news that Reverie Technologies, a Bengaluru-based startup, had secured a $4M investment. The news highlights once again that, in the ever-changing world of retail marketing and globalization, any business planning to bring its products to global markets needs to localize its content for an enhanced user experience. Localization in turn drives global revenues and increases brand equity in existing and new markets.
When we heard that the GALA (Globalization and Localization Association) conference was going to be held in Istanbul this year, we have to admit we were very excited! Istanbul is a wonderful city that bridges Europe and Asia and offers the visitor a spectacular array of smells, tastes, culture and architecture. The beautiful Blue Mosque (Sultan Ahmed Mosque), Ayasofya and Topkapi Palace offer great insight into a society that is rich in cultural history, and the shopping around Taksim Square and the bazaars (Grand Bazaar and Spice Bazaar) is fun for all.
Aside from the cultural attractions, which are driving a booming tourism industry, Turkey’s industry structure has changed a lot in the last 15 years. The economy has been growing steadily since 2001, and its export market has shifted from a focus on textiles to increased production in the automotive, construction, and electronics industries. Turkey’s exports to Iraq have also seen a major increase, reaching $10.8 billion in 2012.
Unsurprisingly, Turkey’s translation industry has also developed considerably over the last few years – and today Istanbul is home to a growing community of LSPs (Language Service Providers) including the following:
Urban Translation – Urban Translations is a full-service agency that prides itself on quality and excellence. The company launched in Turkey in 2006 and opened a second office in Barcelona, Spain in 2010. They mainly work with Turkish, Greek, Arabic, Persian, Turkic languages, Spanish, Catalan and Galician.
Dragoman – Dragoman is a translation technology powerhouse whose progressive approach to translation and localization has built it a strong foothold in the market. The Dragoman group has three business units: Interpretation, Translation and Language Training.
Loc.PRO – Loc.PRO provides complete translation and localization services in Turkish, Arabic and Greek including linguistic (translation, localization and terminology management), testing and DTP services.
Apart from Turkish vendors, the GALA conference was of course packed with language professionals from all over the world, many of whom contributed to the conference by giving expert presentations and live product demos.
One of the highlights of the event was the conference welcome and keynote address by Fikret Orman, President of Beşiktaş J.K., who spoke to a room full of football-jersey-clad language professionals about the Turkish team’s story.
The conference was a huge success and a great opportunity to discuss language topics, generate ideas, meet new friends and touch base with old ones.
The team at KantanMT are looking forward to Seville already – will we see you there?
For more information about KantanMT please go to www.kantanmt.com
KantanMT had an exciting year as it transitioned from a publicly funded business idea into a commercial enterprise that was officially launched in June 2013. The KantanMT team are delighted to have surpassed expectations by developing and refining cutting-edge technologies that make Machine Translation easier to understand and use.
Here are some of the highlights for 2013, as KantanMT looks back on an exceptional year.
Strong Customer Focus…
The year started on a high note, with the opening of a second office in Galway, Ireland, and KantanMT kept the forward momentum going as the year progressed. The Galway office is focused on customer service, product education and Customer Relationship Management (CRM), and is home to Aidan Collins, User Engagement Manager, Kevin McCoy, Customer Relationship Manager and MT Success Coach, and Gina Lawlor, Customer Relationship co-ordinator.
KantanMT officially launched the KantanMT Statistical Machine Translation (SMT) platform as a commercial entity in June 2013. The platform was tested pre-launch by both industry and academic professionals, and was presented at the European OPTIMALE (Optimizing Professional Translator Training in a Multilingual Europe) workshop in Brussels. OPTIMALE is an academic network of 70 partners from 32 European countries, and the organization aims to promote professional translator training as the translation industry merges with the internet and translation automation.
The KantanMT Community…
The KantanMT members’ community now includes top-tier Language Service Providers (LSPs), multinationals and smaller organizations. In 2013, the community grew from 400 members in January to 3,400 registered members in December. In response to this growth, KantanMT introduced two partner programs with the objective of improving the Machine Translation ecosystem.
These are the Developer Partner Program, which supports organizations interested in developing integrated technology solutions, and the Preferred Supplier of MT Program, which is dedicated to strengthening the use of MT technology in the global translation supply chain. KantanMT’s Preferred Suppliers of MT are:
To date, the most popular target languages on the KantanMT platform are French, Spanish and Brazilian Portuguese. Members have uploaded more than 67 billion training words and built approx. 7,000 customized KantanMT engines, which have translated more than 500 million words.
As usage of the platform increased, KantanMT focused on developing new technologies to improve the translation process, including a mobile application for iOS and Android that allows users to get access to their KantanMT engines on the go.
KantanMT’s Core Technologies from 2013…
KantanMT have been kept busy continuously developing and releasing new technologies to help clients build robust business models to integrate Machine Translation into existing workflows.
- KantanAnalytics™ – segment-level Quality Estimation (QE) analysis, expressed as a percentage ‘fuzzy match’ score on KantanMT translations, which provides a straightforward method for costing and scheduling translation projects.
- BuildAnalytics™ – QE feature designed to measure the suitability of the uploaded training data. The technology generates a segment level percentage score on a sample of the uploaded training data.
- KantanWatch™ – makes monitoring the performance of KantanMT engines more transparent.
- TotalRecall™ – combines TM and MT technology: segments whose TM ‘fuzzy match’ score is less than 85% are automatically put through the customized MT engine, giving users the benefits of both technologies.
- KantanISR™ – Instant Segment Retraining technology that allows members near-instantaneous correction and retraining of their KantanMT engines.
- PEX Rule Editor – an advanced pattern-matching technology that allows members to correct repetitive errors, making the post-editing process smoother by reducing post-editing effort, cost and time.
- Kantan API – critical for the development of software connectors and smooth integration of KantanMT into existing translation workflows. The success of the MemoQ connector led to the development of subsequent connectors for MemSource and XTM.
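The TM-or-MT routing described for TotalRecall™ above can be pictured with a small sketch. This is only an illustration under stated assumptions, not KantanMT’s implementation: the function names are hypothetical, the similarity score uses Python’s standard `difflib` as a stand-in for a real fuzzy match algorithm, and the MT engine is passed in as a plain function.

```python
from difflib import SequenceMatcher

# Threshold taken from the text: TM matches scoring below 85% go to MT.
FUZZY_THRESHOLD = 85.0


def fuzzy_score(source, tm_source):
    """Stand-in fuzzy match score (0-100); an assumption, not KantanMT's metric."""
    return 100.0 * SequenceMatcher(None, source, tm_source).ratio()


def best_tm_match(source, tm):
    """Return (score, target) for the closest TM entry, or (0.0, None) if the TM is empty."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm.items():
        score = fuzzy_score(source, tm_source)
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_score, best_target


def translate(source, tm, mt_engine, threshold=FUZZY_THRESHOLD):
    """Use the TM when the best match reaches the threshold, otherwise fall back to MT."""
    score, target = best_tm_match(source, tm)
    if target is not None and score >= threshold:
        return ("TM", target)
    return ("MT", mt_engine(source))


if __name__ == "__main__":
    # Hypothetical one-entry translation memory and a fake MT engine.
    tm = {"Press the start button.": "Appuyez sur le bouton de démarrage."}
    fake_mt = lambda text: "[MT] " + text
    print(translate("Press the start button.", tm, fake_mt))
    print(translate("The weather is lovely in Galway today.", tm, fake_mt))
```

The first call finds an exact TM match and returns the stored translation; the second has no close match, so the segment is handed to the MT engine.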
KantanMT sourced and cleaned a range of bi-directional, domain-specific stock engines, consisting of approx. six million words across the legal, medical and financial domains, and made them available to its members. KantanMT also developed support for Traditional and Simplified Chinese, Japanese, Thai and Croatian during 2013.
Recognition as Business Innovators…
KantanMT received awards for business innovation and entrepreneurship throughout the year. Founder and Chief Architect, Tony O’Dowd, was presented with the ICT Commercialization award in September.
In October, KantanMT was shortlisted for the PITCH start-up competition and participated in the ALPHA Program for start-ups at Dublin’s Web Summit, the largest tech conference in Europe. Earlier in the year KantanMT was also shortlisted for the Vodafone Start-up of the Year awards.
KantanMT were silver sponsors of the 2013 ASLIB conference, ‘Translating and the Computer’, which took place in London in November. In October, Tony O’Dowd presented at the TAUS Machine Translation Showcase at Localization World in Silicon Valley.
KantanMT recently published a white paper introducing its cornerstone Quality Estimation technology, KantanAnalytics, and explaining how this technology addresses the biggest industry challenges facing widespread adoption of Machine Translation.
For more information on how to introduce Machine Translation into your translation workflow contact Niamh Lacy (firstname.lastname@example.org).
The internet became truly multilingual yesterday, as the Internet Corporation for Assigned Names and Numbers (ICANN) announced the release of four new generic top-level domains (gTLDs). The new gTLDs are written in non-Latin scripts, and the four suffixes represent some of the world’s most widely spoken languages. Their selection for release by ICANN was a strategic decision.
After Latin, Chinese is the second most widely used script, with approx. 1,340 million users; Arabic holds the number three position with 380 million users; and Cyrillic is number five, used by approx. 250 million people. The four domain names released yesterday are:
游戏 (game) – Chinese
شبكة (web) – Arabic
Онлайн (online) – Cyrillic
Сайт (site) – Cyrillic
The president of ICANN’s Generic Domains Division, Akram Atallah, indicated this was just the start of a “global society” coming together. The purpose of the New Generic Top-Level Domain Program is to create a “globally-inclusive Internet”, improving ecommerce and internet globalisation.
Ripples will be felt in the localization industry, with increased demand for real-time translation of user generated content (UGC). Translation technologies are constantly being developed, adapted to markets and fine-tuned. A leading example of this is the development of Machine Translation, and the improvements are best seen in the quality assessment (QA) of Machine Translation.
Machine Translation quality has been subjected to scrutiny for decades. This is also changing. Commercial use of Machine Translation is growing, especially in certain industries. Computational capabilities and the availability of vast amounts of multi and monolingual training data have played a significant role in the adoption rate of Machine Translation in both the public and private sectors.
Next week, KantanMT will release a technology that addresses the challenge of Machine Translation quality estimation (QE). KantanAnalytics is a revolutionary product that carries out quality analysis at segment level.
Demand for real-time, high quality translated content will increase in the near future as internationalised domain names (IDNs) bring people and communities together. Yesterday’s release is one of the first steps in expanding the current 22 English-language-dominated domain names by a further 1,400 new multilingual names.
IDNs are domain names registered in non-Latin scripts, like Chinese, or containing other non-ASCII characters. IDNs are already available as second-level domains and as country code top-level domains (ccTLDs), which are tied to specific countries. For example, in Ireland a ccTLD will end in “.ie”. These are different from gTLDs, which belong to a core group of restricted domain names such as .com, .net and .org.
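Behind the scenes, the DNS still works with ASCII, so an IDN label is stored in an ASCII-compatible form (Punycode) that starts with the prefix “xn--”. The sketch below shows the conversion for the four new suffixes using the `idna` codec in Python’s standard library, which implements the older IDNA 2003 rules; the helper names `to_ascii` and `to_unicode` are ours, chosen for illustration.

```python
def to_ascii(label):
    """Convert a Unicode domain label to its ASCII-compatible (Punycode) form."""
    return label.encode("idna").decode("ascii")


def to_unicode(ascii_label):
    """Convert an ASCII-compatible label back to its Unicode form."""
    return ascii_label.encode("ascii").decode("idna")


if __name__ == "__main__":
    # The four suffixes from the announcement, written in lowercase.
    for label in ["游戏", "شبكة", "онлайн", "сайт"]:
        encoded = to_ascii(label)
        # Each non-ASCII label becomes an "xn--" string that round-trips.
        print(label, "->", encoded, "->", to_unicode(encoded))

    # A plain ASCII label such as "com" is left unchanged.
    print(to_ascii("com"))
```

Browsers and registries perform this conversion automatically, which is why users only ever see the native-script form.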
Watch out for the KantanAnalytics release next week. KantanMT are continuing to offer a 14-day free trial to new members.
The “five percent gamble”, a new buzz phrase adopted by the digital information industry, assumes that most of the world’s population can be reached by supporting just five percent of the world’s 6,000+ languages.
This ‘gamble’, discussed by social technology analysts Thomas Petzold and Han-Teng Liao, came about through calculating the return on investment for internationalisation and localization activities. It was also a major stepping stone in driving our multilingual internet.
English, considered to be the original language of the internet and the global lingua franca, was predicted to overshadow other languages as the internet phenomenon exploded. However, the expected English language hegemony was disrupted as the internet became more accessible to users of other languages.
It is through these users that the internet transitioned from a monolingual to a multilingual infrastructure. Businesses looking to enter European markets localised into FIGS (French, Italian, German and Spanish), the big four for Europe, while CJK (Chinese, Japanese and Korean) language support became necessary for penetrating Asian markets.
Together with English, these seven languages formed the top of a global language hierarchy. But as the global marketplace is evolving this hierarchy is shifting. We are seeing a much higher demand for localised products for BRIC (Brazil, Russia, India and China) regions, especially as purchasing power for those areas increases.
Research from the Common Sense Advisory shows that 90% of online purchasers can be reached using only 13 languages: English, Japanese, German, Spanish, French, Simplified Chinese, Italian, Portuguese, Dutch, Korean, Arabic, Russian, and Swedish. The research also showed that 72.1% of online buyers preferred browsing and buying from websites in their native language.
Byte Level Research, one of the first companies to undertake an extensive analysis of how websites are designed and shared globally, produces an annual web globalisation report. According to the 2012 report, websites supporting 10 languages are just “not global enough”. The average number of languages supported by companies in the 2012 report was 32. The Common Sense Advisory suggests that a minimum of 16 languages is needed just to be competitive.
The five percent gamble by companies like Google, which supports approximately 345 different languages, and Wikipedia, which supports 285 language editions, has had a knock-on effect in shaping the future of languages and turning the internet into an “international platform”.
What this means for businesses and organisations in the foreseeable future is a huge jump in the demand for translation services across varying language combinations. Implementing machine translation will be the only viable way to achieve this.
Did you attend Localization World, Santa Clara last week? Check out KantanMT’s Facebook page for photos from our booth!
This week, KantanMT announced the introduction of a Japanese tokenizer and detokenizer to its KantanMT platform. This means that members can now build Machine Translation engines with Japanese as either the source or target language. To celebrate the release of KantanMT Japanese, we are going to give you a few facts and figures about Japan, the language, and Japan’s Machine Translation industry.
Oh and by the way, the title of this post means “Machine Translation”!!
The Japanese Language
Japanese is known as one of the world’s most difficult languages. Not too difficult to speak, but tough to read and write.
Japanese syntax is very different from English:
- Japanese sentence structure follows a subject-object-verb (SOV) or object-subject-verb (OSV) order, unlike the English subject-verb-object (SVO) structure. The verb always comes at the end of a sentence.
- The indefinite and definite articles (‘a’ and ‘the’) are not commonly used
- Japanese is written in three scripts – Hiragana, Katakana, and Kanji
- The singular and plural of a word are the same
- 5 vowels and 11 consonants produce the 48 sounds of the language
- There are no “L” and “R” sounds in Japanese
There is some good news, however: nouns do not have genders in Japanese, just like English!
Some other facts about Japanese…
- There are approx. 130 million people speaking Japanese in the world today. Most of these are in Japan of course, but there are also people speaking Japanese as their first language in the USA and South America. Japanese is the second most common language spoken in Brazil.
- The literacy rate in Japan is almost 100%.
- There are thousands of foreign loan words in the Japanese language. These are called gairaigo (外来語) and come mostly from English and other European languages. These words are always written in the Katakana script.
- English is the only foreign language taught in public Japanese schools.
Japan and Machine Translation
Now that we know some more about the Japanese language, we’re going to turn our attention to the history of Japan’s Machine Translation Industry.
In 1955, the first Japanese research programme began at Kyushu University, and the other major Machine Translation research bodies in Japan up until the mid-60s were The Electrotechnical Laboratory in Tokyo and Kyoto University. It was at the Electrotechnical Laboratory in Tokyo that research on the first English to Japanese Machine Translation system began in 1957.
John Hutchins (n.d.) says that English to Japanese was the primary research focus of the period. However, it was very difficult to analyse written Japanese because of the “lack of any indication of word boundaries” (Hutchins, n.d., p. 1). Hutchins goes on to say that there were also very few general purpose computers in Japan with “sufficient storage capacity for Machine Translation needs” (Hutchins, n.d., p. 1). He adds that this directed early Japanese Machine Translation research towards “the investigation of special purpose machines and perhaps the emphasis on theoretical studies” (Hutchins, n.d., p. 2).
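The word boundary problem Hutchins describes is easy to see in a few lines of code. The sketch below is purely illustrative and is not KantanMT’s tokenizer: it uses a toy, hand-made lexicon and greedy longest-match segmentation (a classic dictionary-based technique) to split a Japanese sentence, then joins the tokens back together as a trivial “detokenizer”. All names and the lexicon are assumptions made for the example.

```python
def max_match(text, lexicon):
    """Greedy longest-match segmentation against a lexicon.

    Characters not covered by any lexicon entry become single-character tokens.
    """
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def detokenize(tokens):
    """Japanese is written without spaces, so tokens are simply concatenated."""
    return "".join(tokens)


if __name__ == "__main__":
    english = "I like machine translation"
    japanese = "私は機械翻訳が好きです"  # "I like machine translation"

    # Whitespace splitting works for English but not for Japanese.
    print(english.split())
    print(japanese.split())

    # Toy lexicon, hand-picked for this sentence only.
    lexicon = {"私", "は", "機械", "翻訳", "機械翻訳", "が", "好き", "です"}
    tokens = max_match(japanese, lexicon)
    print(tokens)
    print(detokenize(tokens) == japanese)
```

Splitting on whitespace returns the whole Japanese sentence as a single token, which is why a dedicated tokenizer is needed before an engine can be trained. Real systems rely on far larger dictionaries or statistical models than this toy lexicon.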
Japan a Leader in MT…
Japan became a leading player in the Machine Translation field during the 1980s. In 1982, the state launched a four-year Machine Translation programme that resulted in a huge increase in the number of English to Japanese Machine Translation projects within the Japanese manufacturing industry. The decade also saw Fujitsu launch its Atlas Japanese to English Machine Translation engine, and the first ever Machine Translation summit was held in Tokyo in 1987.
You can find out more about early Japanese Machine Translation projects by reading the TAUS timeline and John Hutchins’s Projects and groups in Japan, China, and Mexico (1956-1966).
The Japanese language itself has also been involved in some of the major Machine Translation projects of the past decades. For example, in 1991 NEC showcased INTERTALKER, which was an “automatic speech to speech system combining speech recognition, PiVOT MT, and speech synthesis for English, Japanese, French, and Spanish” (TAUS, 2013). In 1992, C-Star demonstrated the first phone translation between Japanese, English, and German. Then in 1993, the eight-year German state-supported project Verbmobil began. Verbmobil aimed to produce “portable systems for face-to-face English-language business negotiations in German and Japanese” (Wired, 2000).
By introducing a Japanese tokenizer and detokenizer, KantanMT is adding a new page to the history of Machine Translation and the Japanese language. We also want to play a part in the continued expansion of your company, and with KantanMT, the door to Japanese markets is now open!
Featured Image Source: http://www.csuci.edu/cia/countries/japan.htm
In our last post, The US and MT, we looked at the Georgetown-IBM experiment in 1954. In this post, we are going to turn our attention to France, and the work of one of the major figures in the history of Machine Translation: Bernard Vauquois.
Vauquois was one of the world’s leading Machine Translation researchers from 1960 until his death in 1985. Vauquois’s original interest was in mathematics and astronomy. However, in 1960 he went to Grenoble to set up a curriculum in computer science and formal languages, and a Machine Translation research lab. Vauquois had developed a keen interest in computer science while it was becoming increasingly popular in 1950s France.
Around the time that he became Professor of Computer Science at Grenoble, Vauquois took leadership of CETA (Centre d’Étude pour la Traduction Automatique) and began working on ways to address problems with the “first-generation” approach to Machine Translation.
In a dedication to Vauquois, Christian Boitet says that Vauquois “assessed the potential of the new, grammar-based methods of formal language theory, and proposed a new approach, based on ‘pivot’ representation, and on the use of (declarative) rule systems to transform a sentence sequentially from one level of representation to another”. This was after Vauquois’s predecessor at CETA, M. Sestier, had come to believe that the problems facing Machine Translation were too central to overcome. Under Vauquois’s stewardship, CETA built the first large second-generation Machine Translation system of the sixties.
Rather than using the traditional declarative and interlingual approach to building and deploying Machine Translation systems, Boitet says that Vauquois “used heuristic programming techniques, implemented as procedural grammars written in SLLPs (Specialised Languages for Linguistic Programming)” to produce a programming environment for “building and using Machine Translation Systems”.
In 1969, Vauquois became chairman of the ICCL (International Committee on Computational Linguistics). His position as a leading figure in the field was cemented in the seventies, when he pioneered “multilevel structural descriptors”, which were to be applied to translation units longer than sentences, such as paragraphs and pages. This idea was a bedrock for the French National Machine Translation project, which started in the 1980s, and for GETA (Groupe d’Étude pour la Traduction Automatique), the successor to CETA. Vauquois was also an initiator of EUROTRA, a project funded by the European Commission from 1978 to 1992. The aim of EUROTRA was to produce a high-spec Machine Translation system for the then-member languages of the European Community.
Vauquois’s next major addition to the field of Machine Translation was the “static grammar” model. This, as Vauquois himself says, involves “defining the mapping between the strings of words of a language and their structural organisation, given that with transducers there are many ways of obtaining the same result using different strategies”.
“Static Grammar” was also Vauquois’s final addition. In his 25-year career in computational linguistics, he became a global figure who collaborated with researchers in countries such as the USA, Russia, and China. The sub-title to Boitet’s dedication sums up Vauquois’s contribution to Machine Translation in this quarter century: “Pioneer of Machine Translation”.
Click here to read Christian Boitet’s full dedication to Bernard Vauquois.
To celebrate Independence Day and Bastille Day, we here in the KantanMT blogging workshop thought that we would use this opportunity to pay homage to the early contributions made by both American and French pioneers to the development of Machine Translation. In this first post, we are going to focus on America and one of the most important developments in the history of Machine Translation: the Georgetown-IBM Experiment.
Background to the Experiment…
Funnily enough, it all began with the Frenchman Léon Dostert, who was Director of Georgetown University’s Institute of Languages and Linguistics. Dostert had previously worked as an interpreter for Eisenhower and as a liaison officer with Charles de Gaulle. Dostert also developed the translation system for the Nuremberg Trials. After attending the first ever conference on Machine Translation in 1952, an inspired Dostert decided to check out the feasibility of this new technology in a practical experiment. Dostert contacted the founder of IBM, Thomas J. Watson, who agreed to support Dostert’s work. They established a team of both IT and linguistic specialists, and the experiment was ready to begin.
The Experiment…
12 machines, collectively known as the IBM type 701 electronic data processor, would translate 250 lexical items with six rules. The source language was Russian and the target was English. Why? Well, Russia was the biggest military threat to the US at the time, and a machine that could translate Russian content to English would help the US to keep tabs on the Soviets. Watson said, “I see this as an instrument that will be helpful in working out the problems (of world peace), we must do everything possible to get the people of the world to understand each other as quickly as possible”. Most of the sentences that were translated related to organic chemistry, to show different uses of nouns and verbs. W. John Hutchins, in his report The Georgetown Experiment-Demonstrated in January 1954, gives some examples:
- They prepare TNT
- They prepare TNT out of coal
- TNT is prepared out of coal
Paul Garvin, Associate Professor at the Institute, said that one of the major shortcomings of the experiment was that it was so limited – remember, the experiment only consisted of 250 lexical items and six rules. But he defended its relevance, saying that the engine did have to make selection and arrangement decisions while translating the content.
While many people around the world have felt some sort of effect from the global recession, the language industry seems to have largely escaped it, with growth projections of 13% for the coming year. The language industry currently turns over $35 billion per year and employs over 200,000 people in the US alone.
As globalisation continues to put pressure on firms to localise offerings and communications, there is increasing opportunity for business development, particularly in emerging markets.
The growth of the Triple A markets (Asia, Africa and Arab) is a major contributing factor to the industry’s expansion. The CEO of the Globalization and Localization Association (GALA), Hans Fenstermacher, suggests that the rapid spread of the internet, coupled with the projected economic growth in Asia, Sub-Saharan Africa and the Middle East, is accelerating the demand for translation and localisation services in these regions.
Africa is made up of 53 countries, with over 2,000 languages and dialects. It is a multicultural landscape rich in resources and the business world is starting to take note.
Africa is experiencing its longest income boom in over 30 years, going from stagnation to above 5% GDP growth on average. This growth has led to a growing middle class and an increase in demand for consumer goods. African governments are trying to encourage consumption by introducing strategies that will reduce transaction costs.
The IMF forecasts that seven of the world’s 10 fastest-growing economies will be African. Nations like Ethiopia, Mozambique, Tanzania, Congo, Ghana, Zambia, and Nigeria are expected to expand by more than 6 percent per annum until 2015.
According to the GSM Association, Africa is the fastest growing region for mobiles in the world, with an estimated 700 million SIM cards in use. Growing internet usage has created increased consumer demand for targeted communication, as a McKinsey report highlights.
40% of Africans from non-English speaking countries such as Angola, Algeria and Senegal said that localised content was the key change that they wanted to see in the internet.