Machine Translation Terminology – MT technology can be overwhelming for those new to the industry, and getting to grips with the jargon can be a daunting task even for some of the most industry-savvy gurus. KantanMT has put together a list of acronyms, popular buzzwords and numeronyms (abbreviations that use numbers), so that you can keep up with the MT professionals or just brush up on your tech vocabulary.
- L10n – Localization/Localisation is the process of adapting and translating a product or service so that it is culturally acceptable for a specific country or region.
- I18n – Internationalization/Internationalisation is a process implemented in the planning stages of a product or application. It ensures the infrastructure (coding) suits future translations or localizations. Common Internationalization preparations for software products include supporting international character sets like Unicode, ensuring there is enough space in the User Interface (UI) for translated text, and making sure the software can handle the move from languages like English, which use single-byte character codes, to the multi-byte character codes used for Chinese and Japanese Kanji.
- G11n – Globalization/Globalisation refers to the internationalization and localization preparations for products and services to be released in global markets. It usually incorporates ‘sim-ship’, or simultaneous shipment, to different regions.
- MT – Machine Translation or Automated Translation is translation carried out by a computer. A piece of text in a natural language such as English is translated by computer software into another language such as French. Cloud MT is Machine Translation based in the cloud. There are different types of MT systems available.
- RBMT – Rule-Based Machine Translation is a system that uses a list of syntactic, grammatical and translation rules to generate the most appropriate translations.
- SMT – Statistical Machine Translation systems are data-driven and have a statistical modelling architecture that uses algorithms to find the most probable match between source and target segments.
- API – Application Programming Interface is an interface that allows communication and interoperability between two applications or software programs.
- LSP – Language Service Provider, sometimes referred to as a Localization Service Provider, is a service provider that carries out the translation and localization of different types of content for specific countries or locales.
- TM – Translation Memory is a database of aligned source and target translations called segments. Segments can be words, sentences or paragraphs. TMs can be integrated with CAT tools and they help speed up the translation process. TM files can be used as training data to train SMT engines.
- SPE – Statistical Post-editing is when Machine Translation output that has been post-edited is re-used as training data and fed back into the SMT engine or used to train a new engine.
- Normalization is the process of checking and cleaning up a Translation Memory so it can be included as training data for an SMT engine. Things to identify and correct are tags, mistranslations, sentence mismatches and stylistic features like upper and lower case inconsistencies.
- CAT tools – Computer-Aided Translation tools/Computer-Assisted Translation tools are used by humans to support the translation process by managing MT, TM and glossaries.
- Glossaries are vocabulary lists of specialised terminology, usually specific to an industry or organisation. These files can be uploaded as additional training data to an SMT engine.
- Bilingual corpus/Bi-text database is a large text document containing the same content in both the source and target languages. If the corpus is aligned it can be used as training data for an SMT engine.
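To make the Translation Memory and Normalization entries above a little more concrete, here is a minimal Python sketch. It assumes a TM is simply a list of aligned (source, target) segment pairs, and the names `clean`, `normalize_tm` and `max_ratio` are hypothetical, made up for illustration. It is not how KantanMT or any particular SMT engine does it, just one rough way to strip tags, drop empty or badly mismatched pairs, and remove duplicates that differ only in upper and lower case.

```python
import re

# Assumption: inline tags look like <b>...</b>; real TM formats vary.
TAG_RE = re.compile(r"<[^>]+>")


def clean(text):
    """Strip inline tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", text).split())


def normalize_tm(pairs, max_ratio=3.0):
    """Return cleaned (source, target) pairs suitable as training data.

    Drops pairs with an empty side, pairs whose lengths differ by more
    than max_ratio (a crude sentence-mismatch check), and pairs that
    duplicate an earlier pair apart from upper/lower case.
    """
    kept = []
    seen = set()
    for source, target in pairs:
        src, tgt = clean(source), clean(target)
        if not src or not tgt:
            continue  # one side is missing
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue  # lengths too different, likely misaligned
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # same segment, only the case differs
        seen.add(key)
        kept.append((src, tgt))
    return kept


# A tiny, made-up English-French TM for illustration only.
tm = [
    ("<b>Save</b> file", "<b>Enregistrer</b> le fichier"),
    ("SAVE FILE", "Enregistrer le fichier"),
    ("Cancel", ""),
    ("OK", "Cliquez ici pour ouvrir le panneau des préférences"),
]
cleaned = normalize_tm(tm)
print(cleaned)
```

Only the first pair survives here: the second is a case-only duplicate, the third has no target, and the fourth fails the length check. A length ratio cannot catch genuine mistranslations, so those would still need a human reviewer.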
If you know any new terms or interesting words that you have come across in the language and localization industry, KantanMT would love to hear about them. Just pop them into the comment box below.