In 2013 KantanMT was invited to make a presentation at Meta Forum Berlin, an international conference convened to discuss powerful language technologies for the multilingual information society, the data value chain, and the information marketplace. The two special themes of that year’s conference were Big Data Text Analytics and Multilingual Web Services for a Multilingual Europe.

Our presentation (which is still available on YouTube) focused on how machine translation was beginning to redefine what customers accepted as quality in translation. In our industry quality had always been an absolute. It was either 10 out of 10 perfection or it was considered sub-quality work. We suggested that there was now an emerging paradigm where quality was becoming what the customer deemed it to be. The previous quality paradigm assumed, usually correctly, that the translated product was aimed at the customer end-user. Translations targeted at customers had to be perfect lest they mislead users about a product’s operation, or lest a poor translation damage a company’s image.

We gave what we believed was a perfect example of this new quality spectrum: the need for what was then called “gist translations”. To illustrate this, we described a situation where a large law firm (several had been in contact to discuss this possible service) had boxes full of documents for “discovery” purposes. Discovery is the formal process of exchanging information between the parties about the witnesses and evidence they will present at trial.

Such discovery inevitably involves large quantities of documents. In some cases, many of these documents might need translation. We argued that this was a scenario that lent itself to an AI-based machine learning solution. In this case, the customer needed to know the “gist” of what was in the documents so that they could then choose which ones to have translated by qualified legal translators. This, we said, was an example where quality was whatever the customer needed. It did not require the perfect translation quality paradigm of old. What we were talking about as a possibility at that conference is what we today call e-Discovery, also known as Computer-Assisted Review (CAR) or Technology-Assisted Review (TAR).

CAR has become a rapidly growing machine learning market. The practice involves the use of software to help companies to interrogate documents or evaluate their content for relevant information. In the legal world, e-Discovery is now a vital tool in helping human decision-makers glean significant data from a vast volume of documents, sometimes in multiple formats and languages.  

As with many of the innovations in the machine learning and machine translation industries, the dynamic driving its development has been the emergence of Big Data. Alongside it came seemingly limitless Cloud storage, high-speed data transfer over fibre optic, and the ability of even modest-sized companies to employ highly complex computer hardware. In addition, the use of CAR has now been accepted by US courts as a legitimate way to e-Discover relevant electronically stored information. This has cleared the path for the technology to be used across the US legal industry and beyond. It has quickly become the go-to solution for retrieving the informational needle from a vast electronic haystack.

Initially, CAR was seen as something of a black box. People were suspicious of trusting such serious work to machines. Machines, they argued, make mistakes. So too do humans, came the counterargument. The difference, said its proponents, is that CAR can manage huge data volumes at high speed and do so cost-effectively. Human reviewers could not manage the same volumes at a similar speed or cost. The solution that evolved was to combine Artificial Intelligence (AI)-based machines with human reviewers: the same hybrid model used in KantanMT’s machine translation.

The process of CAR is simple in concept, albeit more complex in practice. Firstly, CAR can only be used on Rich Text Documents (RTF). RTF is richer than plain text content. It supports text formatting, such as bold, italics, and underlining, as well as different fonts, font sizes, and coloured text. RTF documents can also include page formatting options, such as custom page margins, line spacing, and tab widths. In this sense, RTF includes well-known formats such as Microsoft Word, Excel, PowerPoint, email formats and other similar software products.

To analyse these documents, a machine is trained using AI-based algorithms to identify keywords, phrases, or sentiments within the text. This is carried out at an incredibly high speed. The reviewed documents are then categorised as “responsive” or “non-responsive”. The responsive documents can then be given to a human reviewer for a more refined search. Where necessary, a human engineer can fine-tune the algorithms further to give more accurate search results.
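The categorisation step described above can be pictured with a deliberately simplified sketch. It is not how any particular CAR product works: real systems learn from documents that human reviewers have already labelled, whereas this example simply counts matches against a fixed keyword list. The keyword set, the threshold, and the sample documents are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical search terms a review team might supply (an assumption,
# not taken from any real matter or product).
KEYWORDS = {"contract", "indemnity", "breach"}


def score(text: str) -> int:
    """Count how many distinct keywords appear in the text, ignoring case."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(KEYWORDS & words)


def categorise(text: str, threshold: int = 1) -> str:
    """Label a document 'responsive' if it matches at least `threshold` keywords."""
    return "responsive" if score(text) >= threshold else "non-responsive"


# Two made-up documents standing in for a much larger collection.
documents = {
    "memo.txt": "The supplier is in breach of the contract signed in May.",
    "lunch.txt": "Team lunch is moved to Friday at noon.",
}

labels = {name: categorise(body) for name, body in documents.items()}

for name, label in labels.items():
    print(f"{name}: {label}")
```

Raising the threshold, or changing the keyword list, is a crude stand-in for the fine-tuning a human engineer would do; only the documents labelled responsive would go forward to a human reviewer.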

Big Data has been the mother of many inventions. The reality is that every enterprise today has access to enormous volumes of information, which arrive at high speed. So large are the volumes that traditional desktop database solutions cannot handle the task. Machine learning and machine translation are tools that now make such volumes of information manageable, and profitable.

According to a report from February 2021: “…the global eDiscovery Market was estimated at USD 12.61 Billion in 2019 and is expected to reach USD 24.12 Billion by 2026. The global eDiscovery Market is expected to grow at a compound annual growth rate (CAGR) of 9.7% from 2019 to 2026”.
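The quoted growth rate can be checked against the two market figures. The short sketch below uses the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1, with the values from the quotation; the function and variable names are our own.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1 / years) - 1


# Figures taken from the quotation above (USD billions).
start_value = 12.61   # 2019 estimate
end_value = 24.12     # 2026 forecast
years = 2026 - 2019   # seven compounding periods

rate = cagr(start_value, end_value, years)

# Going the other way: grow the 2019 figure at the quoted 9.7% for seven years.
projected = start_value * (1 + 0.097) ** years

print(f"Implied CAGR: {rate:.1%}")
print(f"2026 value at 9.7% growth: USD {projected:.2f} billion")
```

The implied rate rounds to 9.7%, so the figures in the quotation are consistent with one another, allowing for rounding.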

Last year it was estimated that as many as 50% of all US companies used CAR. The analytics technology is increasingly finding new supporters across a range of industries, including government, legal, banking, financial services and insurance (BFSI), energy and utilities, healthcare, travel and hospitality, transportation and logistics, IT and telecom, media and entertainment, and others.

As seen from the numbers above, the use of CAR by companies is growing rapidly. It is accepted that interrogating such volumes of information is a challenge that can only be handled through the use of AI-based smart analytics algorithms operating on state-of-the-art machine learning platforms. Little did we suspect back in 2013 at the Meta Forum conference that our example of using machine translation as a tool of e-Discovery was in fact a prescient take on the growth of a whole new industry.

Aidan Collins, Marketing Manager, KantanMT