Evaluating Your Document

In this chapter

…you will learn how to determine whether a document has any problems that might affect the redaction process. The following sections describe the evaluation process:

Command-line options

This section describes the command-line options for evaluating documents for redaction.

Command syntax

$redaxserver -o <output> [options] input1.pdf [input2.pdf ...]
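The syntax above accepts several input files in one invocation. As a rough sketch of how that might be wrapped for reuse, the shell function below forwards one option and any number of inputs in the documented argument order. The function name evaluate_many is hypothetical, and passing a directory as the output for several inputs is an assumption; only -o and the -f options come from this chapter.

```shell
# Hypothetical helper (not part of Redax Enterprise Server):
# run one evaluation option over several PDFs in a single invocation.
# Usage: evaluate_many <option> <output> input1.pdf [input2.pdf ...]
evaluate_many() {
    if [ "$#" -lt 3 ]; then
        echo "usage: evaluate_many <option> <output> input1.pdf [input2.pdf ...]" >&2
        return 2
    fi
    option=$1
    output=$2
    shift 2
    # Mirrors the documented order: -o <output> [options] inputs...
    redaxserver -o "$output" "$option" "$@"
}

# Example call (assumes ./out is a suitable output location):
#   evaluate_many -ftext ./out first.pdf second.pdf
```

Quoting "$@" keeps file names that contain spaces intact when they are passed through to redaxserver.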

Options for evaluating documents — summary

Option     Description
-ftext     Find all text areas
-fimage    Find all image areas
-fpath     Find all path areas

Options for evaluating documents — details

-ftext — find all text areas

Finds all text areas in the document and marks them with Redax boxes. This option is used for debugging purposes and for finding potential problems with the text in your document before you begin the redaction process.

-fimage — find all image areas

Finds all bitmap images in the document and marks them with Redax boxes. This option is used for debugging purposes and for finding potential problems with bitmap images in your document before you begin the redaction process.

-fpath — find all path areas

Finds all path areas in the document and marks them with Redax boxes. This option is used for debugging purposes and for finding the paths that make up the vector images in your document before you begin the redaction process.

Determining what can be redacted

Before you begin to mark a PDF document for redaction, check for any potential problems Redax Enterprise Server might encounter. Documents that are scanned with OCR (Optical Character Recognition) software and then converted to PDF can contain an unpredictable mix of images and text that can result in these potential problems:

  • Hidden text — Sensitive text might be hidden behind an image.
  • Inline images — The OCR process may substitute inline images for characters it cannot decipher. Redax Enterprise Server does not recognize words that contain inline images.
  • Unredactable content — Some text or images might be unredactable.

The following command-line options can help you detect these problems:

  • -ftext finds and marks all text areas in the document, and can alert you to hidden text.
  • -fimage finds and marks all bitmap images, including inline character images in words.
  • -fpath finds and marks all paths in vector images. All images not detected by -fimage should be detected by -fpath.

Text and images that are not detected by any of these options are not redactable.
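Because the three options complement one another, it can be convenient to run all of them against the same document and compare the marked-up copies side by side. The following is a minimal sketch of that idea, not a documented workflow: the function name evaluate_all and the output naming scheme (the input name with _text, _image, or _path appended) are assumptions made for illustration.

```shell
# Hypothetical helper (not part of Redax Enterprise Server):
# produce three marked-up copies of one PDF, one per evaluation option.
# Usage: evaluate_all <input.pdf> <output-directory>
evaluate_all() {
    input=$1
    outdir=$2
    base=$(basename "$input" .pdf)
    for kind in text image path; do
        # -ftext writes <outdir>/<base>_text.pdf, and so on (assumed naming).
        redaxserver "-f$kind" -o "$outdir/${base}_$kind.pdf" "$input" || return 1
    done
}

# Example call:
#   evaluate_all ./samples/sample_base.pdf ./samples
```

Each pass writes to its own output file, so the Redax boxes from one option do not mix with those from another and the input file stays unchanged.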

Finding all text

Note: If you subsequently decide to remove the Redax boxes inserted by this procedure, all Redax boxes in the file will be removed.

To find all redactable text in a document, enter the following command:

$redaxserver -ftext -o <output> <input.pdf>

Redax Enterprise Server searches for text in the input.pdf document, marks each occurrence that it finds, and saves the results to the specified output file or directory. The input.pdf file remains unchanged.

Example: Mark all text in the sample_base.pdf file, located in the samples directory, and save the output to find_all_text.pdf.

In Windows:

>redaxserver -ftext -o samples\find_all_text.pdf samples\sample_base.pdf

In UNIX:

$redaxserver -ftext -o ./samples/find_all_text.pdf ./samples/sample_base.pdf

A segment of the output for this example is shown below.

Figure: Areas marked up after running Find All Text Areas

After you run -ftext, examine the document for these potential problem areas:

  • Text not marked with Redax boxes — This area cannot be redacted as text. Run -fimage to find out whether any characters have been converted to inline character images. If so, you can redact these images with the Redax plug-in.
  • Redax box on an image — This indicates that text is hidden beneath the image. To see the text, apply the Adobe Select Text tool to the Redax box, and then copy the selection and paste it into a text file.
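When many documents need this check, the -ftext pass can be repeated for each file in a directory. The sketch below shows one possible way to do that in a UNIX shell. The function name find_text_in_dir is hypothetical, and writing each result under the same file name in a separate output directory is an assumption, chosen so that the input files are never overwritten.

```shell
# Hypothetical helper (not part of Redax Enterprise Server):
# run -ftext on every PDF in a directory, one invocation per file.
# Usage: find_text_in_dir <input-directory> <output-directory>
find_text_in_dir() {
    indir=$1
    outdir=$2
    for pdf in "$indir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDFs.
        [ -e "$pdf" ] || continue
        name=$(basename "$pdf")
        redaxserver -ftext -o "$outdir/$name" "$pdf" || return 1
    done
}

# Example call (output directory must differ from the input directory):
#   find_text_in_dir ./samples ./samples_marked
```

The marked-up copies still have to be examined by eye for the two problem areas listed above; the loop only saves retyping the command.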

Finding all bitmap images

Note: If you subsequently decide to remove the Redax boxes inserted by this procedure, all Redax boxes in the file will be removed.

To find all redactable bitmap images in a document, enter the following command:

$redaxserver -fimage -o <output> <input.pdf>

Redax Enterprise Server searches for bitmap images in the input.pdf document, marks each occurrence that it finds, and saves the results to the specified output file or directory. The input.pdf file remains unchanged.

Example: Mark all bitmap images in the sample_base.pdf file, located in the samples directory, and save the output to find_all_images.pdf.

In Windows:

>redaxserver -fimage -o samples\find_all_images.pdf samples\sample_base.pdf

In UNIX:

$redaxserver -fimage -o ./samples/find_all_images.pdf ./samples/sample_base.pdf

A segment of the output for this example is displayed in the figure below.

Figure: Areas marked up after running Find All Image Areas

After you run -fimage, examine the document for text marked with Redax boxes.

Any marked text is actually an image of text. The word that contains the image cannot be redacted as text; it must be redacted as an image. Either use the -fimage option to mark up all bitmap images or use the Redax plug-in to draw a Redax box around specific character images.

Note: Images not marked with Redax boxes are probably vector images. To make sure, run -fpath, as described in “Finding all paths in vector images” below.

Finding all paths in vector images

Note: If you subsequently decide to remove the Redax boxes inserted by this procedure, all Redax boxes in the file will be removed.

To find all redactable paths in vector images in a document, enter the following command:

$redaxserver -fpath -o <output> <input.pdf>

Redax Enterprise Server searches for paths in the input.pdf document, marks each occurrence that it finds, and saves the results to the specified output file or directory. The input.pdf file remains unchanged.

Example: Mark all paths in the sample_base.pdf file, located in the samples directory, and save the output to find_all_paths.pdf.

In Windows:

>redaxserver -fpath -o samples\find_all_paths.pdf samples\sample_base.pdf

In UNIX:

$redaxserver -fpath -o ./samples/find_all_paths.pdf ./samples/sample_base.pdf

A segment of the output for this example is displayed in the figure below.

Figure: Areas marked up after running Find All Path Areas

After you run -fpath, examine the document for any images that were marked by neither -fimage nor -fpath. These images are not redactable.
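The rules from this chapter can be summarized as a small decision procedure. The sketch below is only an aid for recording what you observe in the three marked-up outputs; it does not inspect PDF files. The function name classify_area and its yes/no arguments are hypothetical, and the wording of each result is a paraphrase of the guidance above.

```shell
# Hypothetical helper (not part of Redax Enterprise Server):
# classify one area of a page from three manual observations.
# Each argument is "yes" or "no": was the area marked by
# -ftext, -fimage, and -fpath respectively?
classify_area() {
    by_text=$1
    by_image=$2
    by_path=$3
    if [ "$by_text" = yes ] && [ "$by_image" = yes ]; then
        # A text box on top of an image suggests text hidden beneath it.
        echo "text over image: check for hidden text"
    elif [ "$by_text" = yes ]; then
        echo "text"
    elif [ "$by_image" = yes ]; then
        # May be an inline character image; redact it as an image.
        echo "bitmap image"
    elif [ "$by_path" = yes ]; then
        echo "vector path"
    else
        echo "not redactable"
    fi
}

# Example call for an area that only -fimage marked:
#   classify_area no yes no
```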