Founder at OCR Craft
Over 15 Years of Experience in the Translation Industry

Here we’ll discuss the potential pros and cons associated with using automatic PDF converters for CAT- or MT-based translations.

Conversion-related problems can surface as late as the post-translation stage, sometimes right before the project is due for delivery. Although today’s translation and localization technologies (CAT tools) seem to keep up with the times, they still cannot solve some of the persistent issues that torment translators who need to process PDFs and other scanned documents:

Broken Target Document Layout in Autoconverted PDFs

After generating the target file, you may still run into formatting issues, which you will need to fix yourself or hand over to someone else. If you decide to deal with them yourself, you may be in for a tough time: correcting someone else’s work can be harder and take longer than recreating the document from scratch. Besides, unless you are at least an “advanced” MS Word user, you may as well give up now. Even if you have enough time to reformat the project document, the client may still come back with complaints or ask you to redo the project altogether. Keep in mind that any shift, cut, or paste within the converted document may cause the entire structure to collapse. Let me explain why.

Here are some of the potential issues that you may experience when translating a file converted by an automatic PDF converter:

  • Section breaks added on each page:

To recreate the layout of the source PDF document, an automatic PDF converter may use a lot of section breaks, each with its own header and footer. A section break at the end of a page, as you may know, prevents the text from flowing freely from that page to the next one. Since the number of characters will almost certainly change in translation (most probably, increase), any overflowing lines will move to the next page, and no other lines will follow them there, because the text after the section break cannot flow back to fill the gap. In other words, many pages in the target file will end up having empty paragraphs after each added section break. Remember that a properly created, native Word document should not have any empty paragraphs.

Removing any of the added section breaks from the translated text will affect the flow of the entire page, including the positions of its header and footer. It may also change the page size and orientation. Furthermore, removing section breaks may affect pages formatted as text columns: PDF converters often use two or more text columns, rather than a table, to recreate the original layout. Note that any page formatted as text columns will have a column break in addition to the section break on that page. As with other pages in the translated file, such a page will end up having empty paragraphs after the column break because the text expands during translation.

To restore, or at least approximate, the original text flow, you will need to either reduce the font size or adjust the paragraph spacing settings on each page of the file. Needless to say, this takes time and effort.
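Before importing a converted file into a CAT tool, it can help to gauge how many of these breaks it contains. The following is a minimal sketch, assuming the converter produced a .docx file (a ZIP archive whose main text is stored in word/document.xml). It uses only the Python standard library; the function names are hypothetical, and the empty-paragraph check is a rough heuristic rather than an exact test.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx file.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_layout_breaks(document_xml):
    """Count section breaks, column breaks, and empty paragraphs (heuristic)."""
    root = ET.fromstring(document_xml)
    counts = {"section_breaks": 0, "column_breaks": 0, "empty_paragraphs": 0}
    for para in root.iter(f"{{{W}}}p"):
        # A sectPr inside the paragraph properties ends a section at this paragraph.
        # (The sectPr that sits directly in the body describes the last section
        # and is not a break, so it is not counted here.)
        ends_section = para.find(f"{{{W}}}pPr/{{{W}}}sectPr") is not None
        if ends_section:
            counts["section_breaks"] += 1

        has_break = False
        for br in para.iter(f"{{{W}}}br"):
            has_break = True
            if br.get(f"{{{W}}}type") == "column":
                counts["column_breaks"] += 1

        # Heuristic: no visible text, no break, no drawing, and not a section end.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        has_drawing = para.find(f".//{{{W}}}drawing") is not None
        if not text.strip() and not (ends_section or has_break or has_drawing):
            counts["empty_paragraphs"] += 1
    return counts


def count_layout_breaks_in_docx(path):
    """Hypothetical helper: read document.xml from a .docx file and count breaks."""
    with zipfile.ZipFile(path) as archive:
        return count_layout_breaks(archive.read("word/document.xml"))
```

A section-break count close to the page count is a hint that the converter pinned every page in place, so the text will not reflow when it expands in translation.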

  • Frames used for text formatting:

Automatic PDF converters may use a special layout element called a frame (which is different from a text box) to mirror the positions of the original images, tables, and other text. If any of the source text ends up inside a frame, you may not be able to see the entire translation when it is longer than the source text. Should that happen, you will need to manually expand the border of the overflowing frame. While doing that, keep in mind that frames tend to overlap with each other.
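Frames can be located before translation starts. The sketch below assumes the converted file is a .docx and that its word/document.xml has already been read into a string or bytes object (for instance with Python's zipfile module). In that XML, a framed paragraph carries a framePr element in its paragraph properties. The function name find_frames is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx file.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_frames(document_xml):
    """Return one dict per paragraph that carries frame properties (w:framePr)."""
    root = ET.fromstring(document_xml)
    frames = []
    for para in root.iter(f"{{{W}}}p"):
        frame = para.find(f"{{{W}}}pPr/{{{W}}}framePr")
        if frame is None:
            continue  # ordinary paragraph, not placed in a frame
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        frames.append({
            "text": text,
            # Width and height are stored in twentieths of a point.
            "width": frame.get(f"{{{W}}}w"),
            "height": frame.get(f"{{{W}}}h"),
            # "exact" means a fixed height, so longer text may be cut off.
            "height_rule": frame.get(f"{{{W}}}hRule"),
        })
    return frames
```

Frames reported with a height rule of "exact" are the likeliest places for an expanded translation to be cut off, so they are worth checking first.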

  • Pictures or shapes as background images:

As great as a CAT program is for translating an editable file, it is helpless with a scanned or otherwise non-editable document, especially one that contains both text and pictures or shapes: upon conversion, these are placed behind the text as background images. Since you cannot fix formatting issues during translation, the background images keep their positions afterwards, just like every other original formatting attribute. In other words, the expanded translated text has to fit into the layout of the source document. This means that you will need to manually adjust the position and text-wrapping style of each image.
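Images placed behind the text can also be listed in advance. This is a rough sketch under the same assumption as before: the converted file is a .docx, and its word/document.xml is available as a string or bytes object. It looks for floating drawings whose anchor is flagged as sitting behind the text; legacy VML shapes are not covered, and the function name find_background_images is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the drawing anchors that position floating pictures and shapes.
WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def find_background_images(document_xml):
    """Return the names of floating drawings that are placed behind the text."""
    root = ET.fromstring(document_xml)
    names = []
    for anchor in root.iter(f"{{{WP}}}anchor"):
        # behindDoc is a boolean attribute; it may be written as "1" or "true".
        if anchor.get("behindDoc") in ("1", "true"):
            doc_pr = anchor.find(f"{{{WP}}}docPr")
            names.append(doc_pr.get("name", "") if doc_pr is not None else "")
    return names
```

Each name returned points to an image whose position and text-wrapping style will probably need a manual check once the translated text is in place.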

These are just some of the potential problems that you may experience when converting PDF files for translation. Any of them can cause additional problems or require extra manual input, which may put your project delivery at risk.

We will be happy to help you avoid all this by performing high-quality preparation of files of any complexity. We do this for many of the global top 100 LSPs as well as for small translation companies. Would you like me to share how we work?