PDF Conversion Drama: To OCR or Not to OCR (Act 3)
Let’s continue our discussion of PDF conversion in the translation industry and of ways to meet your conversion and formatting needs. This time, let’s take a closer look at what happens to tables and to images containing editable text after conversion:
Post-conversion tables:
Automatic PDF converters rarely reproduce tables faithfully. The more complex the table, the more problems you can expect, including:
Fixed row height:
When converting tables to Microsoft Word format, an automatic PDF converter often sets a fixed height for each row. Because translated text is usually longer than the source, some of it may no longer fit inside the table cells, overflowing and partially “disappearing”. Once again, you will need to adjust the cell size by hand.

Hidden table borders:
Just because you cannot see a table border does not mean there should be no table. Instead of creating a real table, a PDF converter may mimic the original spacing and indentation with tabs, paragraph breaks, and spaces. Since the target text is often longer than the source text, this makeshift layout can easily break (in the illustration below, the missing border is shown in green):

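Because such hidden tables are made of ordinary paragraphs, one way to spot them before translation is to look for paragraphs that use tabs or long runs of spaces as column separators. The sketch below works on plain paragraph strings, however you extract them from the converted file; the function name find_pseudo_table_rows and the threshold of three or more consecutive spaces are assumptions chosen for illustration.

```python
import re

# Assumed heuristic: one or more tabs, or a run of three or more spaces,
# acts as a column gap in a paragraph that imitates a table row.
LAYOUT_GAP = re.compile(r"(?:\t| {3,})+")


def find_pseudo_table_rows(paragraphs):
    """Return (index, columns) for paragraphs that look like flattened table rows."""
    hits = []
    for i, text in enumerate(paragraphs):
        # Split on layout gaps and drop any empty pieces.
        columns = [c for c in LAYOUT_GAP.split(text.strip()) if c]
        if len(columns) >= 2:
            hits.append((i, columns))
    return hits
```

Flagged paragraphs are only candidates: a translator or DTP specialist still decides whether they should be rebuilt as a real table before the file goes into the CAT tool.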
Inconsistent text alignment within cells:
Here, the position of the text within each cell is set by indent markers on the ruler rather than by the cell’s alignment setting, so it is inconsistent from cell to cell. For example, the cells should be center-aligned (as shown in the illustration below), but instead they are left-aligned with a ragged right edge. You may not notice this in the original document, but it can become obvious once the translated text grows longer. If you run into a similar issue, you will again need to adjust the formatting manually:

Images with annotations:
An automatic PDF converter tends to badly distort the flow and structure of annotations and captions:
- Sentences can be broken into fragments by paragraph breaks.
- Parts of the same sentence can end up in separate text boxes whose autofit option is not enabled, so the boxes will not expand if the target-language text is longer than the source text.
- The text may be laid out in two or three text columns.
- Multiple superfluous section and column breaks may be added.
Example #1

Example #2

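When caption text arrives as fragments, a simple heuristic can rejoin it before it is sent to the CAT tool: keep appending fragments until one ends with sentence-final punctuation. This sketch assumes the fragments are already in reading order, and rejoin_fragments is a hypothetical helper. Captions that legitimately lack a final period, such as short labels, will be merged with whatever follows, so the result still needs a human check.

```python
import re

# Assumed rule: a sentence ends with . ! or ?, optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r'[.!?]["”’)]?$')


def rejoin_fragments(fragments):
    """Merge fragments split by stray paragraph breaks into whole sentences."""
    merged = []
    buffer = ""
    for frag in fragments:
        frag = frag.strip()
        if not frag:
            continue  # skip empty paragraphs left by the converter
        buffer = f"{buffer} {frag}" if buffer else frag
        if SENTENCE_END.search(buffer):
            merged.append(buffer)
            buffer = ""
    if buffer:
        # Keep any trailing fragment that never reached sentence-final punctuation.
        merged.append(buffer)
    return merged
```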
The issues outlined above (such as mid-sentence breaks, excessive tags, and sections that become partially invisible after translation), together with the suggested fixes, are meant as guidelines for handling PDF-based files.
To ensure clean, unambiguous segmentation and to reduce formatting work after translation, I strongly recommend taking the time to prepare your project documents properly for the CAT tool before translation begins. It will save you time, effort, and money.
Should you need technical support, professional expertise, or guidance on optimizing PDF documents for CAT use and reducing your post-formatting workload after translation, feel free to contact us at any time.