Generating Reports

In this chapter, you will learn how to generate reports detailing the redactions and the text of your document.

About Redax box reports

A Redax box report is a text file that catalogs each piece of Redax markup (Redax boxes and Full-page tags) in the document. The report file is tab delimited, so you can import it into a spreadsheet or other software for easier analysis or processing.

You can generate a Redax box report any time in the redaction process as long as the document contains Redax markup. The report indicates the total number of Redax boxes and Full-page tags and contains the following information for each one:

  • Page number
  • Type of markup (Redax box or Full-page tag)
  • Modification date and time
  • Redaction category
  • Exemption code
  • Author
  • Note
  • Top, left, bottom, and right coordinates, if the markup is a Redax box
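Because the report is tab delimited, a short script can summarize it without a spreadsheet. The following is a minimal Python sketch, not a supported tool. It assumes that catalog rows carry the page number in the first column and the markup type in the second. This chapter lists the fields but not their column order, so check a real report (for example, RedaxReport.txt) and adjust PAGE_COL and TYPE_COL before relying on it. The function name count_markup_per_page is hypothetical.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: column 0 holds the page number and column 1 the markup type
# ("Redax box" or "Full-page tag"). Verify against a real report.
PAGE_COL = 0
TYPE_COL = 1


def count_markup_per_page(report_text: str) -> Counter:
    """Count markup entries per (page, type) in a tab-delimited report."""
    counts: Counter = Counter()
    reader = csv.reader(io.StringIO(report_text), delimiter="\t")
    for row in reader:
        # Skip rows that are too short or whose first field is not a page
        # number, such as header lines or the summary totals.
        if len(row) <= TYPE_COL or not row[PAGE_COL].strip().isdigit():
            continue
        counts[(int(row[PAGE_COL]), row[TYPE_COL].strip())] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from a real report.
    sample = (
        "Page\tType\tCategory\n"
        "1\tRedax box\tPrivacy\n"
        "1\tRedax box\tPrivacy\n"
        "2\tFull-page tag\tLegal\n"
    )
    for (page, kind), n in sorted(count_markup_per_page(sample).items()):
        print(page, kind, n)
```

To read an actual report file, open it with open(path, newline="") and pass its contents to the function. The file's text encoding is not specified in this chapter.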

About Text reports

A Text report contains the text of the PDF file, output in PDF reading order. Text reports offer several output formats to suit a wide variety of integrations, including Natural Language Processing (NLP) systems and back-end XML or other text-based presentation or search subsystems.

You can generate a Text report at any point in the redaction process, before or after redacting a document, as long as the Redax boxes are still present in the redacted file.

Command-line options

This section describes the command-line option for generating each type of report.

Command syntax

$redaxserver [options] -<report_option> <report_file> input1.pdf [input2.pdf ...]

Options — summary

Option       Parameter    Description
-sumfile     report.txt   Writes a Redax box report to a file
-textfile    report.txt   Exports plain text to a file
-textindex   report.txt   Exports text with offsets to a file
-textxml     report.xml   Exports text to an XML file
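The documented syntax can be wrapped in a small script for batch runs. The following is a minimal Python sketch and is not part of redaxserver. The helper names build_command and run_report are hypothetical. The sketch assumes the redaxserver executable is on your PATH and checks only the constraints stated in this chapter: the four report options and the rule that -sumfile cannot be combined with -redact or -rmarkup. The meaning of redaxserver's exit status is not documented here.

```python
import subprocess
from typing import List, Sequence

# The report options listed in the summary table.
REPORT_OPTIONS = {"-sumfile", "-textfile", "-textindex", "-textxml"}

# Per the -sumfile description, these cannot be combined with -sumfile.
INCOMPATIBLE_WITH_SUMFILE = {"-redact", "-rmarkup"}


def build_command(report_option: str, report_path: str,
                  inputs: Sequence[str],
                  extra_options: Sequence[str] = (),
                  executable: str = "redaxserver") -> List[str]:
    """Assemble a redaxserver argument list following the documented syntax."""
    if report_option not in REPORT_OPTIONS:
        raise ValueError(f"unknown report option: {report_option}")
    if not inputs:
        raise ValueError("at least one input PDF is required")
    if report_option == "-sumfile" and INCOMPATIBLE_WITH_SUMFILE & set(extra_options):
        raise ValueError("-sumfile cannot be used with -redact or -rmarkup")
    # Order: executable, [options], report option, report file, input PDFs.
    return [executable, *extra_options, report_option, report_path, *inputs]


def run_report(report_option: str, report_path: str,
               inputs: Sequence[str], **kwargs) -> int:
    """Run redaxserver (assumed to be on PATH) and return its exit status."""
    cmd = build_command(report_option, report_path, inputs, **kwargs)
    return subprocess.run(cmd, check=False).returncode
```

For example, build_command("-sumfile", "./samples/report_marked.txt", ["./samples/sample_marked.pdf"]) yields the same arguments as the UNIX command shown under Creating Reports. Passing a list to subprocess.run, rather than a single shell string, avoids quoting problems with paths that contain spaces.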

Option — description

-sumfile <report.txt> — write Redax box report to file

Generates a Redax box report and writes it to the text file you specify. You cannot use the -redact or -rmarkup option with the -sumfile option. See RedaxReport.txt for an example report using this option.

Example of a Redax box report

-textfile <report.txt> — exports plain text to file

Each line of PDF text is written to a line in the output file. Blank lines are written at the end of each page. Additionally, the file begins with the original PDF name, and each page begins with a page number. The first page is page 1. See textfile_sample.txt for an example report using this option.

Example of output from the -textfile option

-textindex <report.txt> — exports text with offsets to a file

Each line of PDF text is written to a line in the output file. Line, word, and character offsets are zero-based and relative to the beginning of the page. Page numbers, however, start at 1. On each line, the word and character offsets refer to the first word of that line. See textindex_sample.txt for an example report using this option.
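To make the offset rules concrete, the following Python sketch computes page-relative, zero-based offsets for the first word of each line. It illustrates the semantics only and does not reproduce the textindex file format. Two points are assumptions rather than documented behavior: words are split on whitespace, and each line is followed by a single newline character that counts toward character offsets. The names LineOffsets and page_offsets are hypothetical.

```python
from typing import List, NamedTuple


class LineOffsets(NamedTuple):
    line: int  # zero-based line offset within the page
    word: int  # zero-based offset of the line's first word within the page
    char: int  # zero-based character offset of that first word


def page_offsets(lines: List[str]) -> List[LineOffsets]:
    """Compute page-relative offsets for the first word of each line.

    ASSUMPTION: words are whitespace-separated, and lines are joined by one
    newline character that is counted in character offsets.
    """
    result = []
    word_count = 0  # words seen so far on this page
    char_pos = 0    # character position where the current line starts
    for i, text in enumerate(lines):
        leading = len(text) - len(text.lstrip())
        result.append(LineOffsets(i, word_count, char_pos + leading))
        word_count += len(text.split())
        char_pos += len(text) + 1  # +1 for the assumed newline separator
    return result
```

Offsets restart at zero on every page, so you would call the function once per page.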

Example of output from the -textindex option

-textxml <report.xml> — exports text to an XML file

Like the -textfile report, the -textxml report exports the document text, but in XML format. See textxml_sample.xml for an example report using this option.
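When feeding the XML report into a search or NLP pipeline, a common first step is to extract its text. This chapter does not describe the -textxml schema, so the following minimal Python sketch deliberately avoids relying on any particular tag names and collects the text of every element in document order. The function name xml_report_text is hypothetical. Once you know the real schema from textxml_sample.xml, you can select specific elements instead.

```python
import xml.etree.ElementTree as ET


def xml_report_text(xml_text: str) -> str:
    """Return all text content of an XML report, in document order.

    ASSUMPTION: the -textxml schema is unknown here, so this walks every
    element rather than relying on specific tag names.
    """
    root = ET.fromstring(xml_text)
    # itertext() yields the text and tail strings of all elements in order;
    # whitespace-only pieces (indentation between tags) are dropped.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(p for p in pieces if p)
```

To read a report file directly, use ET.parse(path).getroot() in place of ET.fromstring.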

Example of output from the -textxml option

Creating Reports

To create a Redax box report, enter the following command:

$redaxserver -sumfile <report.txt> <input.pdf>

This command creates a report on the input.pdf file and saves it to the report.txt file.

Example: Create a Redax box report on sample_marked.pdf, located in the samples directory, and save the output to report_marked.txt.

In Windows:

>redaxserver -sumfile samples\report_marked.txt samples\sample_marked.pdf

In UNIX:

$redaxserver -sumfile ./samples/report_marked.txt ./samples/sample_marked.pdf