In this chapter…
…you will learn how to generate reports that detail the redaction markup and the text of your document.
- About Redax box reports provides an overview of Redax box reports.
- About Text reports provides an overview of Text reports.
- Command-line options describes the command syntax and the options you can use on the command line to generate reports.
A Redax box report is a text file that catalogs each Redax box in the document. The report file is tab delimited, so you can import it into a spreadsheet or other software for easier analysis or processing.
You can generate a Redax box report at any time in the redaction process, as long as the document contains Redax markup. The report gives the total number of Redax boxes and Full-page tags and contains the following information for each one:
- Page number
- Type of markup (Redax box or Full-page tag)
- Modification date and time
- Redaction category
- Exemption code
- Top, left, bottom, and right coordinates, if the markup is a Redax box
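Because the report is tab delimited, it can also be read by a short script instead of a spreadsheet. The following is a minimal Python sketch, not a definitive parser: it assumes one markup item per line, with the fields in the order listed above and no header row. The column names, the `parse_redax_report` function, and the sample rows are hypothetical and invented for illustration; compare them against a real report such as RedaxReport.txt before relying on them.

```python
import csv
import io

# Hypothetical column names, in the order the fields are listed above.
# The real report layout may differ; check it against an actual report.
ASSUMED_COLUMNS = [
    "page", "type", "modified", "category", "exemption_code",
    "top", "left", "bottom", "right",
]

def parse_redax_report(text):
    """Parse tab-delimited report text into a list of dicts (one per markup item)."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines and any line that does not start with a page number,
        # such as a totals or title line.
        if not fields or not fields[0].strip().isdigit():
            continue
        # Full-page tags carry no coordinates, so zip() leaves those keys out.
        rows.append(dict(zip(ASSUMED_COLUMNS, fields)))
    return rows

# Invented sample data, for illustration only.
sample = (
    "1\tRedax box\t2020-01-01 10:00\tPrivacy\t(b)(6)\t700\t72\t680\t300\n"
    "2\tFull-page tag\t2020-01-01 10:05\tPrivacy\t(b)(6)\n"
)
records = parse_redax_report(sample)
```

Each record is a dictionary keyed by the assumed column names, so a Full-page tag can be recognized by the absence of the coordinate keys.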
A Text report is the reading-order text output of the PDF file. Several options for generating Text reports suit a wide variety of integration scenarios, including support for Natural Language Processing (NLP) systems and backend XML or other text-based presentational or search subsystems.
You can generate a Text report at any time in the redaction process, before or after redacting a document, as long as the Redax boxes are still present in the redacted file.
This section describes the command-line options for generating each type of report.
$redaxserver [options] -<report_option> <report.txt> input1.pdf [input2.pdf ...]
Options — summary
| -sumfile | report.txt | writes a Redax box report to a file |
| -textfile | report.txt | exports plain text to a file |
| -textindex | report.txt | exports text with offsets to a file |
| -textxml | report.xml | exports text to an XML file |
-sumfile <report.txt> — write Redax box report to file
Generates a Redax box report and writes it to the text file you specify. You cannot use the -redact or -rmarkup option with the -sumfile option. See RedaxReport.txt for an example report using this option.
-textfile <report.txt> — exports plain text to file
Each line of PDF text is written to a line in the output file. Blank lines are written at the end of each page. Additionally, the file begins with the original PDF name, and each page begins with a page number. The first page is page 1. See textfile_sample.txt for an example report using this option.
-textindex <report.txt> — exports text with offsets to a file
Each line of PDF text is written to a line in the output file. Each line, word, and character offset is relative to the beginning of the page, with the smallest offset being zero. The first page is page 1. The word and character offsets apply to the first word on the line. See textindex_sample.txt for an example report using this option.
-textxml <report.xml> — exports text to an XML file
Similar to the -textfile report, the -textxml report writes the document text in XML format. See textxml_sample.xml for an example report using this option.
To create a Redax box report, enter the following command:
$redaxserver -sumfile <report.txt> <input.pdf>
This command creates a report on the input.pdf file and saves it to the report.txt file.
Example: Create a Redax box report on sample_marked.pdf, located in the samples directory, and save the output to report_marked.txt.
>redaxserver -sumfile samples\report_marked.txt samples\sample_marked.p
$redaxserver -sumfile ./samples/report_marked.txt ./samples/sample_marked.pdf
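For scripted use, the command syntax described in this section could be assembled programmatically. The sketch below is one possible approach in Python, using only the report options and the -sumfile restriction stated in this chapter. The function names `build_report_command` and `run_report`, the `extra_options` parameter, and the assumption that the executable is reachable as `redaxserver` on the PATH are all hypothetical.

```python
import subprocess

# Report options taken from the options summary in this chapter.
REPORT_OPTIONS = {"-sumfile", "-textfile", "-textindex", "-textxml"}
# Options this chapter says cannot be combined with -sumfile.
INCOMPATIBLE_WITH_SUMFILE = {"-redact", "-rmarkup"}

def build_report_command(report_option, report_path, inputs,
                         extra_options=(), executable="redaxserver"):
    """Return an argument list: executable [options] report_option report inputs..."""
    if report_option not in REPORT_OPTIONS:
        raise ValueError(f"unknown report option: {report_option}")
    if not inputs:
        raise ValueError("at least one input PDF is required")
    if report_option == "-sumfile":
        clash = INCOMPATIBLE_WITH_SUMFILE.intersection(extra_options)
        if clash:
            raise ValueError(f"cannot combine -sumfile with {sorted(clash)}")
    return [executable, *extra_options, report_option, report_path, *inputs]

def run_report(command):
    """Run the command; assumes the executable is installed and on the PATH."""
    return subprocess.run(command, check=True)

cmd = build_report_command(
    "-sumfile",
    "./samples/report_marked.txt",
    ["./samples/sample_marked.pdf"],
)
```

Passing the arguments as a list, rather than as a single string, avoids shell quoting problems when file names contain spaces. Calling `run_report(cmd)` would then execute the same command as the example above.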