PDF Remediation

What is PDF Remediation?

It is the process of correcting existing PDFs, many of them old and presumably inaccessible, so that they are accessible and comply with the Americans with Disabilities Act (ADA). The goal is to make sure that individuals accessing the web site with a screen reader can obtain the same information as a sighted user and have a parallel user experience.

What is wrong with using PDFs?

There is nothing "wrong" with a PDF per se, as long as it was designed to be accessible— that is, it is properly tagged and formatted to allow screen readers to read the information in the PDF, and to allow navigation around longer documents, or between fields in fillable forms.

The problem is, most PDFs as created are not fully accessible. It is very easy to create a non-accessible PDF. It is much more labor-intensive to create one that is accessible.

As a worst-case example, a PDF created by scanning pages of a journal contains pages that are purely images: the text you see on the screen is a "picture" of the text, not actual text that can be selected and copied, or read aloud by a screen reader. PDFs created in this fashion can be made accessible (see option 5 below), but this is rarely done.

Even when a PDF is created directly from content-creation software (e.g., Microsoft Word, Adobe InDesign, Apple Pages), features such as tags, tab order, and indices will be missing unless specific steps were taken to make the document accessible when it was saved as a PDF. The text will still be readable by a screen reader, but usability will suffer, especially in longer documents. Imagine reading a 20-page document where the only way to find out what is on page 18 is to read all 17 preceding pages in their entirety.
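As a rough first check for missing tags, the sketch below looks in the raw bytes of a PDF for two keys that tagged PDFs normally carry: /StructTreeRoot (the root of the tag tree) and /MarkInfo with /Marked true. The function name tagging_markers is hypothetical, and the approach is only a heuristic: newer PDFs may store these keys inside compressed objects, so a negative result is inconclusive, and a positive result says nothing about whether the tags are any good. A full check in Adobe Acrobat Pro or a similar tool is still needed.

```python
import re


def tagging_markers(pdf_bytes: bytes) -> dict[str, bool]:
    """Report which tagging-related keys appear in the raw bytes of a PDF.

    Heuristic only: keys hidden inside compressed objects are not seen,
    so False means "not found", not "definitely untagged".
    """
    return {
        # The document catalog points to the tag tree through this key.
        "struct_tree_root": b"/StructTreeRoot" in pdf_bytes,
        # A tagged PDF declares itself with /MarkInfo << /Marked true >>.
        "marked_true": re.search(rb"/Marked\s+true", pdf_bytes) is not None,
    }
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example tagging_markers(open("example.pdf", "rb").read()), where example.pdf stands for whatever file you are examining.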

A properly formatted, accessible PDF will have tags (structural markers that identify each piece of content as a heading, paragraph, list, table, and so on, and that allow for quick navigation within the document) and be properly semantically structured: that is, organized with meaningful headings and tagged in a hierarchical way that allows a screen reader to "skim" the whole document by perusing section headings first before drilling down to a desired section for more information. All images in the document will have descriptive text that fully explains what information the image presents. Data presented in table format will be properly tagged so that a screen reader presents it in a useful fashion, and not just as a random string of numbers or data. Long documents will have a properly formatted table of contents and/or an index.
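One common way to test whether headings are "tagged in a hierarchical way" is to look for skipped levels, such as a level-4 heading placed directly under a level-2 heading. The sketch below assumes you have already written down the heading levels of a document in reading order as a list of numbers; the function name skipped_heading_levels is hypothetical, and the no-skipped-levels rule is a widely used convention rather than something this guide prescribes.

```python
def skipped_heading_levels(levels: list[int]) -> list[tuple[int, int, int]]:
    """Return (position, previous_level, level) for every heading that is
    more than one level deeper than the heading before it."""
    problems = []
    previous = 0  # the start of the document counts as level 0
    for position, level in enumerate(levels):
        if level > previous + 1:
            problems.append((position, previous, level))
        previous = level
    return problems


# Headings in reading order: H1, H2, H3, H2, H4, H1
print(skipped_heading_levels([1, 2, 3, 2, 4, 1]))  # [(4, 2, 4)]
```

Here the fifth heading (position 4, counting from zero) jumps from level 2 to level 4, so a screen reader user skimming by heading level would find no level-3 heading leading to it.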

What are the options?

When dealing with a pre-existing PDF, there are six options:

1. Delete it

Many web sites are cluttered with old PDFs that were uploaded years ago and are no longer relevant, e.g., announcements of past events or syllabi for courses no longer offered. If you examine a PDF and determine that it is no longer timely or relevant, simply delete it. Note, however, that keeping a local copy on your desktop for your department's historical reference might be desirable.

2. Convert the entire PDF document into a web page

This is frequently the simplest option, and also the most desirable option. Quite often when we look at PDFs on older web sites, there is no reason that the information in the PDF could not be contained in a web page. This is not only more accessible to people using screen readers, it is also more user-friendly for people accessing the data on a phone or other mobile device; in short, it is optimal for everyone.

3. Convert the PDF to an image (JPG, PNG, GIF)

This option should rarely be used, and only when the PDF is graphically rich, that is, mostly imagery rather than text. An image has limitations similar to those of a PDF: the image needs a clear and complete text alternative, so that users accessing the information via a screen reader can have the same information and experience as a sighted reader. If you are considering converting a PDF into an image, weigh whether it might be better to manually convert the PDF into an accessible PDF instead (see option 5 below).

4. Convert the PDF to an on-line Form

If the PDF under consideration is a form to be filled out, it is preferable to convert the PDF to an on-line form. Currently (2015), the University is using FormStack for on-line forms. If you have not previously used FormStack, you will need to first sign up for training through University Communications & Marketing (UCM).

5. Manually convert the existing PDF into an accessible PDF

Sometimes the PDF simply needs to be a PDF; e.g., the end user needs to be able to download and print out a copy, or print and sign a form. If you don't have the original source document, and the PDF is long enough that re-creating it (option 6) would be prohibitively labor-intensive, the PDF can be opened in Adobe Acrobat Pro, edited, and converted to an accessible document. This is not a task for the faint of heart; in fact, most often we send such PDFs to an outside service provider to have them remediated.

If the PDF was created by scanning a printed document, it can be processed using a function called OCR (Optical Character Recognition), whereby Acrobat attempts to "read" the image and then creates a text version of the image. Although OCR technology has improved drastically over recent years, PDFs so created still require extensive and careful proof-reading to make sure that the OCR process was accurate. A text image that is blurry or has a darker background will be especially susceptible to errors.
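As a small aid to that proof-reading, the sketch below flags words that have digits sandwiched between letters (for example "c0urse"), a typical slip when OCR mistakes a letter for a number. The function name flag_suspect_tokens and the pattern are assumptions made for illustration. The check will miss most other kinds of OCR error and will also flag legitimate terms such as "H2O", so it supplements a careful read-through rather than replacing it.

```python
import re

# Letters, then digits, then letters again, e.g. "c0urse" or "r00m".
DIGIT_INSIDE_WORD = re.compile(r"[A-Za-z]+[0-9]+[A-Za-z]+")


def flag_suspect_tokens(text: str) -> list[str]:
    """Return words with digits between letters, a common sign of an OCR slip."""
    suspects = []
    for token in text.split():
        # Drop surrounding punctuation so "r00m." is checked as "r00m".
        word = token.strip(".,;:!?()[]\"'")
        if DIGIT_INSIDE_WORD.fullmatch(word):
            suspects.append(word)
    return suspects


sample = "The c0urse syllabus for Fall 2015 is posted in r00m 12B."
print(flag_suspect_tokens(sample))  # ['c0urse', 'r00m']
```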

6. Re-create the PDF in accessible format

If you have access to the source document, such as the original MS Word file used to create the PDF in the first place, or if the document is short enough that you can quickly re-type it, you can use that file to create a more accessible version of the PDF. This is the topic of an entirely separate document.
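The six options can be read as a rough decision sequence. The sketch below is one possible ordering, not an official procedure: the class PdfFacts, its field names, the function suggest_option, and the order in which the questions are asked are all assumptions made for illustration. In particular, it treats "must stay a PDF" as taking priority over converting to an on-line form, which fits the print-and-sign case described under option 5.

```python
from dataclasses import dataclass

# Option titles as listed in this guide.
OPTIONS = {
    1: "Delete it",
    2: "Convert the entire PDF document into a web page",
    3: "Convert the PDF to an image",
    4: "Convert the PDF to an on-line form",
    5: "Manually convert the existing PDF into an accessible PDF",
    6: "Re-create the PDF in accessible format",
}


@dataclass
class PdfFacts:
    still_relevant: bool             # is the content still timely?
    is_form: bool                    # is it a form to be filled out?
    must_stay_pdf: bool              # e.g. must be downloaded, printed, or signed
    mostly_imagery: bool             # graphically rich, very little text
    source_available_or_short: bool  # original file on hand, or quick to re-type


def suggest_option(facts: PdfFacts) -> int:
    """Suggest one of the six options; the ordering is an assumption."""
    if not facts.still_relevant:
        return 1
    if facts.must_stay_pdf:
        return 6 if facts.source_available_or_short else 5
    if facts.is_form:
        return 4
    if facts.mostly_imagery:
        return 3  # rarely the right choice; weigh option 5 as well
    return 2


old_flyer = PdfFacts(still_relevant=False, is_form=False, must_stay_pdf=False,
                     mostly_imagery=True, source_available_or_short=False)
print(OPTIONS[suggest_option(old_flyer)])  # Delete it
```

A function like this can only suggest a starting point; the judgment calls described under each option, such as whether an image is really better than an accessible PDF, still have to be made by a person.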