In this article, we cover best practices for translating a PDF document with quality in mind, in the most cost-effective way.
Translating a PDF document is a simple task if done right. If done wrong, however, it can easily turn into an expensive, time-consuming headache for everyone involved in the process. Over the years, we have encountered countless cases of companies and individuals facing unnecessary costs and problems with the translation of PDF documents, all of which could easily have been avoided with the right information.
Below, we explain how to make the right choices when translating a PDF document, point out the most common mistakes, show how to save costs, and describe how to approach the whole PDF translation process with the quality and consistency of the final result in mind.
Portable Document Format (PDF), according to its creator Adobe, "is a file format used to present and exchange documents reliably, independent of software, hardware, or operating system. PDF is now an open standard maintained by the International Organization for Standardization (ISO)".
PDF’s popularity can be credited to its universality. It is useful to virtually any modern business, and some businesses even rely on it as their core format. But this universality comes at a cost, which lies in how PDF works.
When a document contains a lot of text, images, multiple fonts and graphics, its source file keeps growing in size. Eventually, the file becomes impractical to distribute and starts causing various compatibility issues.
PDF is designed to address these issues in the following ways:
The source file is compressed, with as little loss of quality as possible, using multiple compression methods.
The source formatting of the document is kept intact.
The document is standardized, making it ideal for further distribution and for use in publishing software.
The format is supported across a wide range of applications and operating systems.
The result of this process is what makes PDF so popular: you get a “flattened” file that is highly compatible and, at first glance, looks as good as the source while being only a fraction of its initial size. However, this is a one-way process, and it comes at a cost: once a document has been converted to PDF, it cannot be reliably turned back into its source and, for practical purposes, is no longer editable.
The best way to translate a PDF is to work on its source file. A PDF always represents the finalized version of a document, so any translation project involving a PDF should ideally start with the source file at hand.
Source files can come from a wide range of software applications, as almost any program nowadays lets you save a document as a PDF. Some of the most common programs used to create PDFs are Adobe’s software suite, Microsoft Office applications, OpenOffice and LibreOffice. Having the source file brings the following advantages:
Quality – When the source is available, the translated file will match the quality of the original document as closely as possible.
Consistency – This matters even more when the goal is to scale up and translate into multiple language pairs. For example, text translated from one language into another often takes up more space, so the layout has to be adjusted in the original authoring software to make everything fit. Without the source file, achieving the look and feel of the original is nearly impossible for most complex documents.
Fonts – Fonts are key elements of any document. In essence, a font is a set of displayable or printable characters in a specific style and size, and there are hundreds of thousands of them. Some are free, some are unique, some are custom-made for a specific purpose, some support only one language, and they all share one thing in common: no translation agency can have them all. The source file, on the other hand, identifies exactly which fonts the document uses, so they can be obtained from the client or the designer.
Images – Any document can contain images of various sizes, which can lose up to 95% of their initial “digital” size when compressed into a PDF. PDF most commonly compresses images with one of two methods: JPEG (DCT), which is lossy and discards image detail to save space, and Flate (the deflate algorithm also used in ZIP and PNG files), which is lossless. Images are also often downsampled to a lower resolution in the process. This comes at a cost: once an image has been compressed with a lossy method or shrunk, the discarded detail cannot be restored. Depending on the type of image, attempts to scale it back up will most likely result in a pixelated or blurred image, making it impossible to match the quality of the original, untranslated file. A source document in Adobe InDesign format (.indd), for example, keeps the source images intact, allowing them to be reused in the translated document without any loss of quality.
Compatibility – Translation agencies use a wide variety of tools, such as Computer-Assisted Translation (CAT) tools, Translation Memories (TM) and many other types of software, that increase the speed, quality and consistency of translation while reducing costs for their clients. When the source files aren’t available, PDFs first have to be converted into a format these tools support.
Cost – As the points above show, source files make your translation project cheaper from the start: many of the steps in the Desktop Publishing (DTP) file preparation process can simply be skipped.
Time – Time is of the essence, and having the source files saves it. Each additional editing step takes time as the project passes from one specialist or department to another. Add potential time zone differences, and a project that would normally take a couple of days can turn into a couple of weeks.
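The difference described under Images, between lossless compression and detail that is thrown away, can be illustrated with a toy Python sketch. It uses the standard-library `zlib` module, which implements the deflate algorithm that PDF’s Flate filter is based on, and a deliberately naive downsample/upsample of a hypothetical single row of grayscale pixels. The helper names are made up for illustration; this is not how any particular PDF producer processes images.

```python
import zlib


def flate_round_trip(data: bytes) -> bytes:
    """Compress with deflate (the basis of PDF's Flate filter) and decompress again."""
    return zlib.decompress(zlib.compress(data, 9))


def downsample(pixels: list[int], factor: int) -> list[int]:
    """Shrink by keeping every `factor`-th pixel and discarding the rest."""
    return pixels[::factor]


def upsample(pixels: list[int], factor: int) -> list[int]:
    """Enlarge by repeating each pixel `factor` times (nearest-neighbour)."""
    return [p for p in pixels for _ in range(factor)]


# A hypothetical one-row grayscale "image" with fine, alternating detail.
row = [0, 255, 10, 245, 20, 235, 30, 225]

# Lossless: Flate gives back exactly the original bytes.
print(flate_round_trip(bytes(row)) == bytes(row))  # True

# Lossy: halving the resolution and scaling back up does not restore the detail.
restored = upsample(downsample(row, 2), 2)
print(restored)  # [0, 0, 10, 10, 20, 20, 30, 30]
```

The round trip through deflate is exact, while the resized row keeps only half of the original values: the alternating detail is gone for good, which is why an image taken from a PDF cannot simply be enlarged back to source quality.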
We fully understand that the source file is sometimes unavailable for various reasons. Perhaps you work for a very large company where tracking down the source file across departments is difficult. Or perhaps you outsourced the design of your documents to a marketing agency that is hesitant to share the source files with you (this is actually quite common; we recommend always asking for the source files when the agency delivers the project to you).
Such scenarios are common. When the source is not available, other methods have to be used to extract the content of the PDF and convert it into an editable format. Which methods are used, and how good the results are, depends heavily on the type and quality of the PDF itself: is it marketing material, plain text, a technical spreadsheet or a simple scanned document? Each of these PDF types needs its own approach and conversion method.
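To give a sense of what extracting the content involves, here is a toy Python sketch using only the standard library. It assumes a tiny, hand-written page content stream: in a real PDF, page text is stored as drawing operators (BT and ET mark a text object, Td moves the text position, Tj paints a string), and the stream is usually Flate-compressed. The sketch decompresses the stream, pulls out the text fragments, and applies a rough heuristic: a page with no text objects at all is probably a scanned image and will need OCR. The names `extract_fragments` and `has_text_objects` are hypothetical; real conversion tools also have to handle font encodings, TJ arrays, hex strings and reading order.

```python
import re
import zlib

# A hypothetical, hand-written content stream: one sentence split into two
# separately positioned fragments, as layout software often produces.
content = b"""BT
/F1 12 Tf
72 700 Td
(Translating a PDF ) Tj
0 -14 Td
(is easier with the source file.) Tj
ET"""

# A hypothetical image-only page (e.g. a scan): it just draws image /Im1.
scan_content = b"q 612 0 0 792 0 0 cm /Im1 Do Q"

# Page streams are typically stored Flate-compressed inside the file.
stored = zlib.compress(content)
scanned = zlib.compress(scan_content)


def extract_fragments(stream: bytes) -> list[str]:
    """Decompress a content stream and return the literal strings painted by Tj.

    Toy only: ignores TJ arrays, escaped parentheses, hex strings and font encodings.
    """
    raw = zlib.decompress(stream)
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", raw)]


def has_text_objects(stream: bytes) -> bool:
    """Rough heuristic: a page without any BT text object is likely image-only."""
    return re.search(rb"\bBT\b", zlib.decompress(stream)) is not None


print(extract_fragments(stored))  # ['Translating a PDF ', 'is easier with the source file.']
print(has_text_objects(stored))   # True
print(has_text_objects(scanned))  # False: a candidate for OCR
```

Even in this simple case, the text comes out as positioned fragments rather than paragraphs, which is one reason conversion often takes several stages before the content is ready for CAT tools.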
Conversion tools – PDF conversion tools vary in complexity, depth, the algorithms they use and the quality of their output. All of them, however, serve the same purpose: converting a PDF from a non-editable format into an editable one. In many cases, achieving the desired result requires several conversion stages using multiple software tools.
OCR – When text in a document is presented in a non-standard way (at an angle, within an image, with a graphical effect applied, or as part of a scanned document), it often cannot be selected, because it is no longer stored as text. In those cases, OCR (Optical Character Recognition) has to be used. OCR is technology, available as software and sometimes as dedicated hardware, designed to recognize printed or handwritten text in images. It analyzes the light and dark areas of a document for recognizable letter and number patterns and compares them with the patterns it knows. The accuracy of this method therefore depends heavily on the quality of the file: a sharp, clean scan gives far better results than a blurred or faded one.
Remaking by hand – Finally, when nothing else works, it comes down to recreating the document by hand. Have you ever tried imitating someone else’s handwriting? If you have, you know that your version, however similar, will differ to some degree from the original no matter how good you are. The same applies here: this method depends heavily on skill, is extremely time-consuming and costly, and at best produces a nearly identical copy.
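The pattern matching described under OCR can be sketched as a toy in Python: binarize a tiny grayscale glyph into light and dark cells, then pick the stored template with the fewest mismatched cells. The 3x5 templates, pixel values and function names are invented for illustration. Production OCR engines use far more sophisticated methods, such as feature extraction and machine-learning models, so this only shows the principle and why scan quality matters.

```python
# Hypothetical 3x5 bitmap templates (the "known patterns"): '#' = dark, '.' = light.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def binarize(gray: list[list[int]], threshold: int = 128) -> list[str]:
    """Turn grayscale values (0 = black, 255 = white) into dark/light cells."""
    return ["".join("#" if v < threshold else "." for v in row) for row in gray]


def mismatches(a: list[str], b: list[str]) -> int:
    """Count the cells where two equally sized bitmaps differ."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def recognize(gray: list[list[int]]) -> tuple[str, int]:
    """Return the best-matching character and its number of mismatched cells."""
    glyph = binarize(gray)
    best = min(TEMPLATES, key=lambda ch: mismatches(glyph, TEMPLATES[ch]))
    return best, mismatches(glyph, TEMPLATES[best])


D, W = 20, 230  # dark ink and light paper on a 0-255 grayscale
clean_t = [[D, D, D], [W, D, W], [W, D, W], [W, D, W], [W, D, W]]

# A decent scan: one stray dark speck, but the glyph is still closest to "T".
noisy_t = [row[:] for row in clean_t]
noisy_t[2][2] = 100
print(recognize(noisy_t))  # ('T', 1)

# A badly faded scan: the strokes are too light to pass the threshold, so every
# cell comes out light. "L" and "T" tie at 7 mismatches and min() keeps the
# first one it meets, so the letter is misread.
faded_t = [[150 if v == D else v for v in row] for row in clean_t]
print(recognize(faded_t))  # ('L', 7)
```

The same matching logic that reads the slightly noisy glyph correctly fails on the faded one, which mirrors the point above: OCR results are only as good as the file being recognized.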
PDFs are here to stay, and so are the issues related to translating them. That is why we recommend a simple habit: always keep a document’s source file, just in case. You never know when the need to translate it may arise, and when it does, you will be ready.
Simply keeping the source file of a document you wish to translate brings multiple benefits: lower costs, better quality and less wasted time, while easily avoiding all the issues that come with not having one at hand.
When obtaining the source file is impossible, other conversion options are still available to produce a translation that is as close as possible to the original.
©2002-2023, AD VERBUM Ltd.