Top 5 Challenges in Neural Machine Translation
Author: Andreea Balaoiu
In the previous posts of our NMT insights series, What is Neural Machine Translation and ISO 18587:2017 Standardizing Machine Translation Post-Editing, we opened the path towards a better understanding of the Neural Machine Translation concept and its controversies, showcasing its framework, applications, and scalability.
NMT, or ‘Neural Machine Translation’, is a cutting-edge AI-powered translation method. It relies on artificial neural networks, loosely inspired by the human brain, which organize data into layers of interconnected nodes, and on an end-to-end automated approach that leverages bilingual corpora and deep learning.
Building on those posts, we continue the series with the main challenges in Neural Machine Translation, detailing the five most important trends and roadblocks to NMT automation. These challenges continue to distinguish and redefine NMT’s role as a professional Translation and Localization tool in relation to the human factors influencing it.
The paper “What has changed with neural machine translation?” offers a critical review of the human factors that affect NMT trends in automation and translation/localization productivity. Its authors, Ragni and Vieira, point out that “from (human) translators’ perspective, changes brought about by the neural paradigm are not as much to do with workflows, but rather with the NMT editing process and its specifics”.
NMT Challenge 1: Beam Search
In our article What is Neural Machine Translation, we defined the elements that make NMT work: components trained jointly, in an end-to-end approach, to maximize translation quality while reducing turnaround times compared with classic Machine Translation approaches.
Compared to PBMT (or Phrase-based Machine Translation), which has a separate linguistic reordering model, NMT relies solely on a single sequence model that predicts one word at a time. The sequence modelling is done using the ‘Encoder-Decoder’ approach, where an encoder neural network reads and encodes a source sentence into a fixed-length vector and a decoder outputs the corresponding translation.
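To make the ‘Encoder-Decoder’ idea concrete, here is a minimal sketch in PyTorch. Every detail (the GRU layers, the dimensions, the class names) is an illustrative assumption, not a description of any particular production NMT system:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Reads the source sentence and compresses it into a fixed-length vector."""
    def __init__(self, vocab_size: int, emb_dim: int = 64, hid_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        _, hidden = self.rnn(self.embed(src))  # final hidden state = sentence vector
        return hidden

class Decoder(nn.Module):
    """Predicts the translation one word at a time, as described above."""
    def __init__(self, vocab_size: int, emb_dim: int = 64, hid_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token: torch.Tensor, hidden: torch.Tensor):
        output, hidden = self.rnn(self.embed(prev_token), hidden)
        return self.out(output), hidden  # scores over the next target word

# Usage sketch: encode a 5-token source sentence, then take one decoding step.
enc, dec = Encoder(vocab_size=100), Decoder(vocab_size=100)
hidden = enc(torch.randint(0, 100, (1, 5)))
logits, hidden = dec(torch.tensor([[1]]), hidden)
```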
The first challenge NMT faces is beam search, which is closely tied to this sequential model. As explained in Beam Search Strategies for Neural Machine Translation, “new sentences are translated by a simple beam search decoder that finds a translation that approximately maximizes the probability of a trained NMT model. The beam search strategy generates the translation word-by-word from left to right while keeping a fixed number (beam) of active candidates at each time step.”
However, the challenge of employing beam search in NMT is that it does not guarantee an optimal translation, albeit far more efficient than an exhaustive search (Lecture 8 on Machine Translation, Sequence-to-sequence and Attention, by Abigail See, Stanford educational resources).
Wider beams (larger beam values) generally yield better results, but only up to a point: an NMT system can still commit to a poor initial word that later steps cannot recover from, and overly wide beams have even been reported to degrade translation quality.
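To illustrate the mechanics described above, here is a minimal, framework-free sketch of beam search; `step_probs` is a hypothetical stand-in for the trained model’s next-word distribution, not a real API:

```python
import math
from typing import Callable, Dict, List, Tuple

def beam_search(step_probs: Callable[[List[int]], Dict[int, float]],
                bos: int, eos: int,
                beam_size: int = 4, max_len: int = 50) -> List[int]:
    # Each hypothesis is a (token sequence, cumulative log-probability) pair.
    beams: List[Tuple[List[int], float]] = [([bos], 0.0)]
    finished: List[Tuple[List[int], float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in step_probs(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the beam_size best hypotheses; everything else is pruned.
        # This pruning is exactly why the search is approximate, not exact:
        # a prefix that scores badly now but would have led to the best full
        # translation is discarded and can never be recovered.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])[0]

# Toy next-word model over the vocabulary {0: <s>, 1: "hi", 2: </s>}.
toy = lambda prefix: {2: 1.0} if prefix[-1] == 1 else {1: 0.6, 2: 0.4}
print(beam_search(toy, bos=0, eos=2, beam_size=2))  # [0, 1, 2]
```

(For brevity the sketch compares finished hypotheses by raw log-probability; real decoders typically length-normalize the scores.)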
NMT Challenge 2: Alignment
Just as any successful translation demands clear and accurate textual alignment between source and target, word alignment between NMT input and output is equally important, and even more challenging.
As pointed out in On the Word Alignment from Neural Machine Translation (2019), “neural machine translation (NMT) captures word alignment through its attention mechanism, however, this attention mechanism may almost fail to capture word alignment for some specific NMT models.”
The main challenge around NMT alignment lies in the contextual framework: the NMT attention model may choose alignments that do not correspond to intuitive ones, sometimes requiring additional guided alignment training.
Additionally, as Koehn & Knowles argue, “while alignment is a latent variable that is used to obtain probability distributions over words or phrases, arguably the attention model has a broader role. For instance, when translating a verb, attention may also be paid to its subject and object since these may disambiguate it.”
Let’s take a closer look at the Chinese example showcased in On the Word Alignment from Neural Machine Translation, where the verb (faces) linked to the subject (Basescu) is clearly misaligned in relation to the reference sentence, resulting in an out-of-context mistranslation:
bā xiè sī gǔ dāng xuǎn luó mǎ ní yà zǒng tǒng chóu zǔ zhèng fǔ miàn lín tiǎo zhàn
巴谢斯古 当选 罗马尼亚 总统 筹组 政府 面临 挑战
R (reference sentence): Basescu elected Romanian president, faces challenge of forming government
T (translated sentence): Romanian president elected to form government
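To see how such alignments are read off a model in practice, here is a small sketch of the common heuristic: align each target word to the source word that received the highest attention weight while that target word was emitted. The attention matrix below is a toy assumption, not output from a real system:

```python
import numpy as np

def alignment_from_attention(attn: np.ndarray) -> list:
    # attn[t, s] holds the attention weight the decoder placed on source
    # word s while emitting target word t (each row sums to 1). Taking the
    # argmax turns the soft attention into a hard alignment; as the paper
    # above notes, this can fail when attention is spread over context
    # words such as a verb's subject or object.
    return [(t, int(attn[t].argmax())) for t in range(attn.shape[0])]

# Toy 2-target x 3-source attention matrix: target word 1 attends mostly
# to source word 2, so it is aligned to it.
attn = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]])
print(alignment_from_attention(attn))  # [(0, 0), (1, 2)]
```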
NMT Challenge 3: Long sentences
As stated in Long Sentence Preprocessing in Neural Machine Translation, “to build an efficient neural machine translation (NMT) system, it is essential to have an accurate and massive bilingual corpus for training, and ensure the continuous improvement of the methods and techniques used in the translation system. Despite multiple advantages, one challenging issue for current neural network translation systems is long sentence processing.”
One of the key differences between existing NMT models is how they incorporate information on word positions in the input.
Koehn & Knowles argue that “NMT systems have lower translation quality on very long sentences but do comparably better up to a sentence length of about 60 words,” and that “the introduction of the attention model remedied this problem somewhat. (…) While overall NMT is better than SMT, the SMT system outperforms NMT on sentences of length 60 and higher. Quality for the two systems is relatively close, except for the very long sentences (80 and more tokens). The quality of the NMT system is dramatically lower for these since it produces too short translations (length ratio 0.859, opposed to 1.024).”
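A common preprocessing workaround is to split overly long sentences before translation and re-join the translated parts afterwards. The sketch below is a deliberately crude fixed-window illustration of that idea; practical systems split on clause or punctuation boundaries instead:

```python
def split_long_sentence(tokens: list, max_len: int = 60) -> list:
    # Chop a tokenized sentence into chunks of at most max_len words,
    # guarding against the quality drop reported above for sentences
    # beyond roughly 60 tokens. Purely illustrative.
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

sentence = ["w%d" % i for i in range(150)]  # a hypothetical 150-token sentence
print([len(chunk) for chunk in split_long_sentence(sentence)])  # [60, 60, 30]
```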
NMT Challenge 4: Rare Words
NMT systems can also operate at sub-word level, performing slightly better than SMT (Statistical Machine Translation, which operates on statistics drawn from bilingual text corpora) on low-frequency, rare words and highly inflected word categories (e.g., verbs).
However, both SMT and NMT perform poorly on words observed only a single time in the training corpus, worse even than on completely unobserved words.
Koehn & Knowles state that the most common rare word categories challenging both NMT and SMT are “named entities” (including entity and location names), which can often pass unchanged through the encoder-decoder filters (a notable example is the surname “Elabdellaoui”, broken into “E@@ lab@@ d@@ ell@@ a@@ oui” by byte pair encoding), and nouns, which are usually compounds.
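The ‘E@@ lab@@ d@@ ell@@ a@@ oui’ notation comes from byte pair encoding (BPE), which breaks rare words into subword units seen during training, with ‘@@’ marking pieces that continue the same word. The sketch below mimics the effect with a greedy longest-match rule over a toy vocabulary chosen to reproduce the example; real BPE instead applies learned merge operations iteratively:

```python
def segment(word: str, subwords: set) -> str:
    # Greedily take the longest known subword at each position, falling
    # back to a single character when nothing matches, then append the
    # '@@' continuation marker to every piece except the last.
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in subwords or end == start + 1:
                pieces.append(word[start:end])
                start = end
                break
    return " ".join([p + "@@" for p in pieces[:-1]] + pieces[-1:])

# Toy subword vocabulary (an assumption for illustration only).
vocab = {"E", "lab", "d", "ell", "a", "oui"}
print(segment("Elabdellaoui", vocab))  # E@@ lab@@ d@@ ell@@ a@@ oui
```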
NMT Challenge 5: Mismatched Domains
According to Luong & Manning, a crucial step in developing machine translation for specific use cases is accurate domain adaptation, which requires training on large amounts of both general and in-domain data.
At the same time, Koehn & Knowles state that “while the in-domain NMT and SMT systems are similar (NMT is better for IT and Subtitles, SMT is better for Law, Medical, and Religious domains: Koran), the out-of-domain performance for the NMT systems is worse in almost all cases, sometimes dramatically so.”
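The remedy usually associated with Luong & Manning’s work is continued training: first train on large general-domain data, then fine-tune the same model on the small in-domain corpus, typically at a lower learning rate. Below is a minimal sketch; `model` and the two corpora are hypothetical placeholders, and the model’s forward pass is assumed to return the training loss:

```python
import torch

def train(model, corpus, lr: float, epochs: int) -> None:
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for src, tgt in corpus:
            optimizer.zero_grad()
            loss = model(src, tgt)  # assumed to return the NLL training loss
            loss.backward()
            optimizer.step()

# Stage 1: general-domain training from scratch (larger learning rate).
# train(model, general_corpus, lr=1e-3, epochs=10)
# Stage 2: continued training on in-domain data only (smaller learning rate).
# train(model, medical_corpus, lr=1e-4, epochs=2)
```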
A relevant example is a sequence of NMT translations of the German source sentence “Schaue um dich herum” (“Look around you.”) across corpora from multiple domains, resulting in often nonsensical and contextually inaccurate output, as can be observed below:
German Source: Schaue um dich herum.
English subtitle reference: Look around you.
Law Domain:
NMT: Sughum gravecorn.
SMT: In order to implement dich Schaue.
Medical Domain:
NMT: EMEA / MB / 049 / 01-EN-Final Work programme for 2002
SMT: Schaue by dich around.
IT Domain:
NMT: Switches to paused.
SMT: To Schaue by itself. \t \t
Religious Domain (Koran):
NMT: Take heed of your own souls.
SMT: And you see.
The Bottom Line
Despite the skyrocketing technological progress of NMT and related AI-driven Translation and Localization avenues, Neural Machine Translation has yet to overcome its roadblocks and become “bulletproof”; the five challenges discussed here are just some of the many at hand.
Robust linguistic behaviour, consistent training and testing on both in-domain and out-of-domain data, accurate alignment, and better rare-word handling may yet prove the solutions that deliver the Neural Machine Translation optimization our cross-linguistic Translation and Localization future so longs for.
At AD VERBUM, we take great pride in our streamlined service portfolio, which brings together the best of the Human and AI-driven Translation worlds. We constantly strive to strengthen these digital translation synapses and to guide you on your Translation and Localization journey, drawing on in-depth knowledge and expertise in a wide range of fields and harnessing state-of-the-art translation technology and processes.
Get in touch and discover our wealth of Translation and Localization services today, and let us be your voice for Global Success.