ISO 18587:2017: Standardizing Machine Translation Post-Editing

Author: Janis Locmelis

Machine Translation (MT) has been around for decades, but its use has spread significantly only in recent years, due in large part to the advent of Neural Machine Translation (NMT). Although advances in Artificial Intelligence (AI) and NMT keep pushing the industry toward more natural-sounding output, raw MT output remains far from perfect and requires human post-editing (PE) for high-quality results. But what about standardizing the quality of post-edited MT output? That’s where ISO 18587:2017 comes in.

Key Concepts of Machine Translation and Neural Machine Translation

Before we dive deeper, let’s look at the basic concepts of Translation, Machine Translation, and Neural Machine Translation. There are many definitions of translation out there. One was proposed by Basil Hatim and Jeremy Munday (2004) in ‘Translation: An Advanced Resource Book’, where translation is defined as the process of transferring a written text from a source language to a target language.

A slight adjustment to this definition gives us an explanation for Machine Translation, which is the process of conversion of a text from a source language to a target language by a computer.

Machine Translation is automatic and a user who has access to a Machine Translation tool, such as Google Translate, has the ability to obtain a ‘raw’ machine-translated text instantly. ‘Raw’ is the term used for the output obtained from a Machine Translation tool before it is reviewed or revised by a human. And this brings us to Neural Machine Translation, the latest major technological development in the field of Machine Translation.

Neural Machine Translation (NMT) involves the use of neural networks to achieve translations closer to those a human translator would produce. NMT is loosely similar to the way human translators work in that both improve through exposure to examples: an NMT engine is trained on existing translations, much as a translator learns from experience.

The Rise of Machine Translation

Have you ever used Google Translate to translate an email or a word or text you don’t understand into your native language? Have you ever read a comment on Facebook automatically translated from a language you don’t understand? If the answer to either of those questions is yes, you have used MT. MT has become common in the everyday lives of Language Service Providers (LSPs), linguists, and even people who do not knowingly associate their work or hobbies with translation. A study conducted in Spain shows that almost half (47.3% to be exact) of Spanish LSPs use MT. This number has likely increased since 2016, when the study was performed.

One of the reasons why Machine Translation is experiencing such improvements is the contribution of tech giants like Microsoft and Google.

There are several types of MT, including Neural Machine Translation, Statistical Machine Translation (SMT), Rule-based Machine Translation (RBMT), and Phrase-based Machine Translation (PBMT), which is itself a form of SMT. It is Neural Machine Translation that has shown the best results and helped push Machine Translation forward the most.

A study has shown that the use of PBMT leads to an 18% increase in translation productivity, while NMT provides a 36% boost and reduces the number of keystrokes linguists need to make during post-editing by 42%. Read our blog post (What is Neural Machine Translation) to learn more about Neural Machine Translation and MT in general.
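To make these percentages more concrete, here is a minimal Python sketch. It assumes the quoted gains can be read as an increase in words handled per hour, and the baseline of 400 words per hour is a hypothetical figure chosen only for illustration; neither assumption comes from the study itself.

```python
# Hypothetical baseline: words a linguist handles per hour without MT.
# This figure is an assumption for illustration, not taken from the study.
BASELINE_WORDS_PER_HOUR = 400

# Productivity gains quoted in the text above.
PRODUCTIVITY_GAIN = {"PBMT": 0.18, "NMT": 0.36}


def words_per_hour(engine: str, baseline: float = BASELINE_WORDS_PER_HOUR) -> float:
    """Throughput after applying the quoted productivity gain."""
    return baseline * (1 + PRODUCTIVITY_GAIN[engine])


def hours_for(words: int, engine: str) -> float:
    """Hours needed to process a text of the given length."""
    return words / words_per_hour(engine)


for engine in PRODUCTIVITY_GAIN:
    print(
        engine,
        round(words_per_hour(engine)), "words/hour,",
        round(hours_for(10_000, engine), 1), "hours for 10,000 words",
    )
```

With a different baseline the absolute numbers change, but the relative gap between PBMT and NMT stays the same.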

Sounds like Machine Translation

‘Sounds like Machine Translation’ is something that, not too long ago, you could hear people say about substandard translations. The days when this was the first thing that came to mind when you thought about MT are gone. Machine Translation – and Neural Machine Translation specifically – is quickly gaining ground and becoming the talk of the town among LSPs, industry professionals, and clients looking for translation services – and for good reason.

LSPs are embracing MT, as it helps them:
• Boost productivity
• Decrease turnaround times for their translation projects
• Reduce expenses compared to human translation or the traditional full TEP workflow
• Meet the increasing demand from clients for MT post-editing services

Standardization of Translation Quality

The idea behind standardization, and ISO standards in particular, is quite straightforward. It is about identifying the best way to do something, putting the best industry practices together in a single internationally recognized standards document, and awarding certificates of compliance to those that meet these standards. In addition, the certificate recipients are audited regularly to make sure that they continue to meet the requirements.

Human Translation Quality Standard ISO 17100:2015

The idea of achieving and agreeing on common standards for translation quality is not new. Until recently, 2017 to be exact, ISO 17100:2015 was the key quality standard for translation services, focusing on human translation and the core processes and requirements for the resources involved. This standard clearly states that it concerns human translation only and is in no way applicable to the quality of machine translations, whether raw or post-edited, or the quality of machine translation systems.

Machine Translation (MT) Post-Editing

The first point to make is that Machine Translation is highly dependent on the technology used, known in the industry as the Machine Translation Engine. The quality of the engine, in turn, depends on the technology driving it and the amount of parallel data, called parallel texts or parallel corpora, that serves as the data source for the MT engine.

It is undeniable that Machine Translation has seen significant growth in recent years. However, ‘raw’ MT output needs to be revised by a linguist, in a process called post-editing, if the client requesting the translation needs the text to be very accurate, fully consistent, fluent, and in line with any existing translation assets, such as the client’s Translation Memories (TMs) and glossaries (termbases).

Although it depends on the text type and the degree of creativity used in the source text, raw MT output is likely to flow well and might appear, especially at first glance, to be a very good representation of the source.

However, without any further revision, known as MT post-editing, it is possible that the finer nuances of meaning will be lost, the resulting text will contain some inaccuracies with abbreviations, acronyms, and placeholders, and the text will not be fully consistent with itself (especially for larger texts) and with existing translation assets, such as TMs or glossaries.

Scope of the ISO 18587:2017 Machine Translation (MT) Post-Editing Quality Standard

It is the rapid improvement of MT engines that has encouraged the International Organization for Standardization (ISO) to focus on the post-editing of Machine Translation output, rather than on the MT systems as such.

ISO 18587:2017 describes the processes involved in Machine Translation post-editing by a linguist, known as a post-editor, and establishes the competencies required for post-editors. Post-editing is the process during which a linguist revises (reads and corrects) raw MT output.

Types of Machine Translation Post-Editing

ISO 18587:2017 singles out two distinct types of MT post-editing:

• Full Machine Translation Post-Editing
• Light Machine Translation Post-Editing

During full MT post-editing, the linguist ensures that the resulting translated text is comparable to the quality provided by a human translator. This approach provides the highest translation quality.

During light MT post-editing, the linguist reviews the MT output and makes light changes to produce a comprehensible text in the target language. In this scenario, the post-editor does not attempt to produce a text comparable to the quality provided by a human translator. This provides entry-level translation quality in a text that gives you the main idea of the source while being free from major issues.

Full or Light Machine Translation Post-Editing: Which One to Use?

The answer is: it depends.

Are you looking for a way to speed up the translation process, while retaining quality and achieving an end result comparable to that produced in the standard human translation process? If yes, you are looking for full MT post-editing.

Or are you looking for a translation that will not be published for your customers and that will just give you an idea of what the text is about? If yes, you are probably looking for light MT post-editing.
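The two questions above can be sketched as a small decision helper in Python. This is only an illustration of the reasoning in this article: the function name and its parameters are hypothetical and are not defined by ISO 18587:2017.

```python
from enum import Enum


class PostEditingLevel(Enum):
    LIGHT = "light"
    FULL = "full"


def choose_level(for_publication: bool, needs_human_quality: bool) -> PostEditingLevel:
    """Pick a post-editing level from two yes/no questions.

    Both parameter names are hypothetical and used only for this sketch.
    """
    if for_publication or needs_human_quality:
        # The result must be comparable to a human translation.
        return PostEditingLevel.FULL
    # Internal, gist-only use: light post-editing is probably enough.
    return PostEditingLevel.LIGHT


# An internal document that only needs to be understood:
print(choose_level(for_publication=False, needs_human_quality=False).value)
# A customer-facing text:
print(choose_level(for_publication=True, needs_human_quality=True).value)
```

In practice the decision usually involves more factors, such as text type and budget, so treat this as a starting point rather than a rule.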

Features of light and full Machine Translation (MT) Post-Editing

The post-editor's task is to review the MT output, checking it for accuracy, consistency, readability, and other parameters, and to correct any errors. It is important to note that the quality of raw MT output depends on several factors, including:

• MT engine and the quality and size of the underlying bilingual corpora
• Language combination
• Text type
• Style of language used in the source text

Post-editors working on light MT post-editing assignments will generally focus on the following 4 areas:

• Making sure there is no added or omitted content in the target text as compared to the source
• Correcting any major mistakes in the MT output
• Rearranging sentences to clarify or bring out the correct meaning
• Sticking to the raw MT output where possible and only correcting the most glaring issues

What this essentially means is that the end result will represent the main idea of the source text, with all of the source content present and reflected in the target text. However, the resulting text will likely be lacking in terms of the style of language, consistency of terminology, and the finer nuances of meaning.

Post-editors working on full MT post-editing assignments need to focus on a wide range of aspects, including:

• Making sure all the information is there, meaning that the translated text is free from any omissions or additions compared to the source
• Correcting any major or minor mistakes in the text
• Adjusting sentence structures to ensure that the meaning is clear and correct
• Ensuring that the text is correct in terms of the target language grammar, syntax, and semantics requirements
• Using correct and consistent industry- and/or client-specific terminology
• Making sure the text is correct in terms of spelling and punctuation
• Ensuring that the style of language is appropriate for the text type, target audience, and, if applicable, consistent with any specific style guides
• Adjusting any placeholders and the formatting of the target text to match the source and the conventions of the target language
• Revising the target text to ensure it complies with any additional client- or project-specific instructions or guidelines

In short, this list means that the post-editor is expected to do what he or she would normally do while translating or revising a human translation.
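For teams that track post-editing work in a tool, the two lists above could be kept as simple checklists. The Python sketch below is one possible way to do that; the wording of the checks is shortened from this article, and the `remaining_checks` helper is a hypothetical name, not something prescribed by ISO 18587:2017.

```python
from typing import Iterable, List

# Checklists condensed from the light and full post-editing lists above.
CHECKLISTS = {
    "light": [
        "no added or omitted content",
        "major mistakes corrected",
        "sentences rearranged where the meaning was unclear",
        "raw MT output kept where possible",
    ],
    "full": [
        "no added or omitted content",
        "major and minor mistakes corrected",
        "sentence structures adjusted for clear, correct meaning",
        "grammar, syntax, and semantics correct",
        "terminology correct and consistent",
        "spelling and punctuation correct",
        "style appropriate and consistent with style guides",
        "placeholders and formatting adjusted",
        "client- or project-specific instructions followed",
    ],
}


def remaining_checks(level: str, completed: Iterable[str]) -> List[str]:
    """Return the checks for the given level that are not yet completed."""
    done = set(completed)
    return [check for check in CHECKLISTS[level] if check not in done]


# A post-editor who has only verified completeness so far:
for check in remaining_checks("light", ["no added or omitted content"]):
    print("still to do:", check)
```

Keeping the checks as plain data makes it easy to add client-specific items without changing the helper.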

The Bottom Line

Machine Translation has experienced significant growth in the last few years, in part owing to the contributions of Google and Microsoft. There are several types of Machine Translation, but Neural Machine Translation is the most promising method delivering the best results.

Without any revision (post-editing), raw machine-translated texts are not perfect. Therefore, they need to be post-edited using either light or full MT post-editing. The ISO standard outlining the key principles of the quality of translations obtained through MT post-editing is ISO 18587:2017.

The choice between light and full MT post-editing depends on the expectations for the end result. Light post-editing is suitable if you need a text that conveys the main idea of the source without major translation errors. Full post-editing is the way to go if you need a high-quality text that is fully correct: it reflects the source, uses the correct terminology and style of the target language, contains no linguistic or technical errors (e.g., relating to formatting and placeholders), and is consistent with existing translation resources.