Google Translate receives machine learning boost
By Edwin Yapp July 18, 2017
- Launches neural machine translation in 30 languages, including Malay
- Still a ways to go to achieve better translation; dependent on NMT training
IF YOU’VE ever been stuck in a country where you can’t speak the language, chances are that you’ve turned to Google’s Translate app and used it to get you through some tough ‘lost in translation’ situations.
But how often have you experienced inaccurate results from the Translate app, especially if you’ve compared them with those of human translators? Probably quite a few times.
However, according to Google Inc product manager Julie Cattiau, things are set to improve as the Mountain View, California-based search giant rolls out what it claims is a better translation algorithm, based on neural machine translation (NMT) rather than phrase-based machine translation (PBMT).
Cattiau said PBMT applies a statistical approach to translation by looking for word patterns that repeat. The method scans documents for words used in matching contexts, marks them down as equivalents, and completes the translation based on the statistical likelihood that they mean one and the same thing.
“For example, if a certain Mandarin character is used a number of times to mean ‘chicken’ in English, in all likelihood the correct translation for that character is chicken,” she explained. “This process is then replicated across words used in the language pair [Mandarin/English] and PBMT builds the translation model this way.”
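The word-statistics idea Cattiau describes can be sketched in a few lines of Python. This is a toy illustration with a made-up parallel corpus, not Google’s actual PBMT pipeline, which learns far larger alignment and probability models:

```python
from collections import Counter, defaultdict

# Toy parallel corpus: (source sentence, human translation) pairs.
# All data here is invented for illustration.
corpus = [
    ("le chat dort", "the cat sleeps"),
    ("le chien dort", "the dog sleeps"),
    ("le chat mange", "the cat eats"),
    ("le chien mange", "the dog eats"),
]

# Count how often each source word co-occurs with each target word.
cooccur = defaultdict(Counter)
tgt_count = Counter()
for src, tgt in corpus:
    tgt_count.update(tgt.split())
    for s in src.split():
        for t in tgt.split():
            cooccur[s][t] += 1

def most_likely(word):
    # Normalise co-occurrence counts by target-word frequency so that
    # very common words like "the" do not dominate the statistics.
    return max(cooccur[word], key=lambda t: cooccur[word][t] / tgt_count[t])

print(most_likely("chat"))   # -> "cat"
print(most_likely("dort"))   # -> "sleeps"
```

Because “chat” appears alongside “cat” in every sentence where it occurs, the statistics single out “cat” as its most likely translation, which is the PBMT intuition in miniature.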
Cattiau said PBMT uses data that was previously translated by humans, mainly professionals who have translated books, official documents and/or press articles. This translated data has been used as a basis for Google’s translation models for about 10 years now.
“When a user types words into Google Translate, the engine behind the app actually breaks the sentence into small parts, translates each part, then stitches all the small parts back together to get the desired translation,” she explained.
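The split-translate-stitch process can be sketched as follows. The phrase table here is hypothetical and hard-coded; a real PBMT system learns its phrase entries and their weights from parallel text:

```python
# Hypothetical phrase table for illustration only.
phrase_table = {
    "good morning": "bonjour",
    "how are": "comment allez",
    "you": "vous",
}

def translate(sentence):
    """Greedily match the longest known phrase at each position,
    translate it, and stitch the pieces back together."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest span first, shrinking until a phrase matches.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in phrase_table:
                out.append(phrase_table[phrase])
                i = j
                break
        else:
            out.append(words[i])  # unknown word: pass through unchanged
            i += 1
    return " ".join(out)

print(translate("good morning how are you"))  # -> "bonjour comment allez vous"
```

Because each part is translated independently, the stitched result can come out stilted, which is exactly the unnaturalness Cattiau describes next.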
She acknowledged that this is a tedious process and that the results often do not sound very natural to humans, but added that it was the best known way to translate languages to date.
“This has been the state of the art in translation for the last decade,” she said. “But with the advent of NMT, things are set to change.”
In comes NMT
Cattiau said that with NMT, linguists and engineers are able to look at sentences as a continuous whole and translate them as one unit without breaking them down into small parts, which was how PBMT was done.
She said NMT is based on a series of data, which can comprise pictures, words or general language input, fed into a series of ‘neural nets’: mathematical algorithms that do a great deal of processing in order to produce an output.
Each input can be a query, for instance, something a user is asking Google to translate, while the output is the same query but in a different language, she explained.
“The key part of NMT is to correctly train the neural nets. We need to tell our neural net models, this is how we say a certain word in a language. We give these neural nets a billion examples that have previously been translated and get the neural nets to recognise these words as accurately as possible.”
Cattiau said this means the quality of data directly impacts the output and so it’s very important to train the neural nets to get the accurate translation. But once these neural nets are properly trained, they are able to give much better translation results, she added.
“The impact of this is that our translations sound more natural and they are now more grammatically correct, and it’s getting closer to how humans can translate,” she claimed.
In fact, Cattiau claimed that Google has managed to improve translation quality more with a single NMT update than it was able to achieve with PBMT in the past 10 years combined.
As much as Google has made progress with NMT, there are still challenges facing the translation process, Cattiau conceded.
Because NMT depends on input that is derived from available documents professionally translated previously by humans, the output from NMT is only as good as its input.
“The challenge is that for certain languages, such as Chinese [Mandarin], there aren’t a lot of documents that are available that have been translated previously by professionals,” she argued.
“In those cases, it is difficult to train neural nets. But for languages that have more professional data available, such as western languages [Spanish, French, German], our translation is more accurate.”
Asked how accurately Google Translate will be able to do its job when slang, idioms or colloquial phrases are used, Cattiau acknowledged that the app does not take into account such variations, as there is very little training data for the neural nets to learn such phrases.
“However, there is a crowdsourced Google Translate Community – users who are native speakers, who have helped us translate such sentences. We depend on this input to feed back into the neural nets to train the model.
“But even then, we aren’t doing well enough, so there is still some way to go but it is work in progress,” she said.
Google’s NMT is now available for translation in 30 languages, including from Malay into English and vice versa. Google Translate is available on iOS, Android and the Web.