Indonesia has a huge, diverse landscape and a vast cultural tapestry. Along with this rich cultural mixture, more than 700 native regional languages and dialects are spoken across its islands, which are home to more than 260 million people. Bahasa Indonesia, as the official language, serves as the lingua franca across the hundreds of languages and dialects present in the country.

Bahasa Indonesia is a flourishing language. Shaped by Indonesia's colonial history, it developed by inheriting words from Sanskrit and Dutch. Because the population is very large and the majority speak Bahasa Indonesia, it is one of the most widely spoken languages in the world.

Bahasa Indonesia is treated as an active commercial language by the East Asian countries. Because of Indonesia's strong economic market, other East Asian countries have found it necessary to understand Bahasa Indonesia in order to trade with the country. Unfortunately, many companies in the wider world still consider Bahasa Indonesia a minority language, which leaves research and development for the language under-resourced.

Perhaps as a by-product of this under-resourcing, too little statistical data has been created for Bahasa Indonesia to qualify as a candidate for either Statistical Machine Translation (SMT) or Neural Machine Translation (NMT).

As a taught subject, Bahasa Indonesia is a very dynamic language. It is taught not only through its grammar and sentence structure, but also through its use in proverbs, poems, and essay writing. It can be said that Bahasa Indonesia is one of the most difficult subjects to learn as a student. The dynamism of the language makes it difficult to find the ‘right’ equivalent during the translation process. As mentioned above, Bahasa Indonesia draws on Sanskrit and Dutch, and many of its concepts and definitions have a deep historical background behind their ‘meaning’.

In recent years it has become even more difficult and complicated to translate Bahasa Indonesia into other languages, and vice versa. The overwhelming power of English as a global language, together with advances in education and technology that mostly come from English-speaking countries, has greatly challenged the development of Bahasa Indonesia. This is particularly so in the evolution of new words, which has the knock-on effect of making it even harder to create, improve and train on better statistical data for SMT.

Today, when the Bahasa Indonesia and English language pair is processed through machine translation, the output needs a lot of fixes. This is because many English words cannot be fully translated into Bahasa Indonesia. In addition to this complication, the younger generations in Indonesia tend to intersperse English and Bahasa Indonesia in most of their conversations and writing.

As a result, Bahasa Indonesia lacks the purity of language needed for optimum use of SMT. Even human translators will keep some of the original English words because they are more commonly used than their Bahasa Indonesia equivalents. Many translations into Bahasa Indonesia therefore become a hybrid of the two languages.

This lack of linguistic purity means a lot of preparatory work is required before Bahasa Indonesia texts are suitable for SMT. Many of the SMT products that handle the Bahasa Indonesia and English language pair are inconsistent in their translations, producing a lot of incorrectly translated text. The quality of this output is such that it is misleading, and of little use to the intended readers.

Even with the growing use of English and Chinese as mandatory subjects in the Indonesian education system, Bahasa Indonesia has been able to hold its place as the official language. It has adapted to the imposition of other languages and cultures by evolving. Yet ironically, it is the lack of linguistic purity created by adopting words and concepts from other languages that has proven a challenge for SMT.

If SMT is to become a solution for Bahasa Indonesia, a lot of work will need to be done to create a suitable body of statistical data and make it usable. Only this level of work will allow Bahasa Indonesia to use SMT and be recognised as a world language.

Janet Siska, a fourth-generation Chinese, was born and grew up in Jakarta, Indonesia. At the age of 15, Janet moved to the USA. In 2015, she graduated with a BSc in Biochemistry/Chemistry from California Polytechnic University, Pomona. Janet is currently attending Dublin City University, where she is studying for a Master of Science in Translation Technology. She is fluent in Bahasa Indonesia, English, and Korean, and has a working knowledge of Chinese and German.