KantanMT Japanese Tokenizer

This week, KantanMT announced the introduction of a Japanese tokenizer and detokenizer to its KantanMT platform. This means that members can now build Machine Translation engines with Japanese as either the source or the target language. To celebrate the release of KantanMT Japanese, we are going to give you a few facts and figures about Japan, the Japanese language, and Japan’s Machine Translation industry.

Oh, and by the way, the title of this post means “Machine Translation”!

The Japanese Language
Japanese is known as one of the world’s most difficult languages. It is not too difficult to speak, but it is tough to read and write.

Japanese syntax is very different to English

  • Japanese sentences typically follow a subject-object-verb (SOV) order (object-subject-verb, OSV, is also possible), unlike the English subject-verb-object (SVO) structure. The verb normally comes at the end of the sentence
  • Japanese has no equivalent of the indefinite and definite articles (‘a’ and ‘the’)
  • Japanese is written in three scripts: Hiragana and Katakana (both syllabaries) and Kanji (characters borrowed from Chinese)
  • Nouns generally have the same form in the singular and the plural
  • 5 vowels and 11 consonants produce the 48 sounds of the language
  • Japanese does not distinguish between “L” and “R” sounds

There is some good news, however: just like English, Japanese nouns do not have genders!

Some other facts about Japanese…

  • There are approximately 130 million people speaking Japanese in the world today. Most of them are in Japan, of course, but there are also people who speak Japanese as their first language in the USA and South America. Japanese is the second most common language spoken in Brazil.
  • The literacy rate in Japan is almost 100%.
  • There are thousands of foreign loan words in the Japanese language. These are called gairaigo (外来語) and come mostly from English and other European languages. These words are usually written in Katakana.
  • English is the only foreign language taught in public Japanese schools.


Japan and Machine Translation
Now that we know some more about the Japanese language, we’re going to turn our attention to the history of Japan’s Machine Translation industry.

In 1955, Japan’s first Machine Translation research programme began at Kyushu University. Up until the mid-1960s, the other major Machine Translation research bodies in Japan were the Electrotechnical Laboratory in Tokyo and Kyoto University. It was at the Electrotechnical Laboratory that research on the first English-to-Japanese Machine Translation system began in 1957.

John Hutchins (n.d.) says that English to Japanese was the primary research focus of the period. However, it was very difficult to analyse written Japanese because of the “lack of any indication of word boundaries” (Hutchins, n.d., p. 1): Japanese is written without spaces between words. Hutchins goes on to say that there were also very few general-purpose computers in Japan with “sufficient storage capacity for Machine Translation needs” (Hutchins, n.d., p. 1). He adds that this directed early Japanese Machine Translation research towards “the investigation of special purpose machines and perhaps the emphasis on theoretical studies” (Hutchins, n.d., p. 2).
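That missing word boundary is exactly the problem a Japanese tokenizer has to solve before any translation can happen, and a detokenizer reverses it on the output side. To make the idea concrete, here is a minimal sketch of greedy longest-match (“maximum matching”) segmentation. It is not KantanMT’s implementation: the tiny dictionary and the `tokenize`/`detokenize` function names are hypothetical, and real Japanese tokenizers rely on large lexicons and statistical models rather than a handful of words.

```python
# Hypothetical toy dictionary for illustration only. It contains both
# "機械" (machine) and "翻訳" (translation) as well as the compound
# "機械翻訳" (machine translation) to show that the longest match wins.
TOY_DICTIONARY = {"機械", "翻訳", "機械翻訳", "は", "難しい", "です"}


def tokenize(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Split unspaced text into tokens using greedy longest-match lookup."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Fall back to a single character if nothing matches.
            if candidate in dictionary or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens


def detokenize(tokens: list[str]) -> str:
    """Rebuild Japanese output, which is written without spaces."""
    return "".join(tokens)
```

For example, `tokenize("機械翻訳は難しいです", TOY_DICTIONARY)` returns `["機械翻訳", "は", "難しい", "です"]` (“Machine translation is difficult”), and `detokenize` joins those tokens back into the original string. Greedy matching is simple but can mis-segment ambiguous text, which is one reason production tokenizers use more sophisticated methods.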

Japan, a Leader in MT…

Japan became a leading player in the Machine Translation field during the 1980s. In 1982, the state launched a four-year Machine Translation programme that resulted in a huge increase in the number of English-to-Japanese Machine Translation projects within the Japanese manufacturing industry. The decade also saw Fujitsu launch its Atlas Japanese-to-English Machine Translation engine, and the first ever Machine Translation Summit was held in Tokyo in 1987.

You can find out more about early Japanese Machine Translation projects by reading the TAUS timeline and John Hutchins’s Projects and groups in Japan, China, and Mexico (1956-1966).

The Japanese language itself has also been involved in some of the major Machine Translation projects of the past decades. For example, in 1991 NEC showcased INTERTALKER, which was an “automatic speech to speech system combining speech recognition, PiVOT MT, and speech synthesis for English, Japanese, French, and Spanish” (TAUS, 2013). In 1992, the C-STAR consortium demonstrated the first phone translation between Japanese, English, and German. Then in 1993, the eight-year German state-supported project Verbmobil began. Verbmobil aimed to produce “portable systems for face-to-face English-language business negotiations in German and Japanese” (Wired, 2000).

By introducing a Japanese tokenizer and detokenizer, KantanMT is adding a new page to the history of Machine Translation and the Japanese language. We also want to play a part in the continued expansion of your company, and with KantanMT, the door to Japanese markets is now open!

If you want to find out more about KantanMT, visit KantanMT.com and sign up for our free 14-day trial.

Featured Image Source: http://www.csuci.edu/cia/countries/japan.htm