Performing Custom Splits

Introduction

In the previous chapter, you learned how to split documents into single-page files. By adding one or more options, you can also split documents by:

  • First-level bookmarks
  • Odd or even pages
  • Page range
  • File size

This chapter explains how to perform these custom splits.

Input/Output Considerations

Some custom split options accept one or more input files; others accept only one. The number of input files that you include in a command (one or many) determines the proper output file specification (-o option):

  • One input file — In most cases, you may specify either a file name or a directory name following the -o option. (Exceptions are noted, where applicable.) When you use a file name, the output file names are based on that name. When you use a directory name, the output file names are based on the input file name.
  • Multiple input files — You must specify a directory name following the -o option. The output file names are based on the input file names.

Note: If you are splitting all of the PDF files in a directory, you may use the *.pdf wildcard shortcut in the input specification.

When you specify a directory name for the output, you must include the ending slash: a backslash in Windows and a forward slash in UNIX and Macintosh operating systems. If you do not include the slash, the directory name will be interpreted as a file name.

Windows

C:\Appligent\APSplit\output\

UNIX/Macintosh

/Appligent/APSplit/output/

If the directory that you specify does not already exist, it will be created for you.
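
The file-versus-directory rule above can be modeled in a few lines. The following Python sketch is illustrative only and is not part of APSplit; the function name describe_output_spec is hypothetical. It assumes that the trailing slash is the only thing that distinguishes a directory name from a file name, as described above.

```python
import ntpath  # ntpath splits on both "\" and "/", so it handles either platform's paths


def describe_output_spec(output_spec, input_path):
    """Return (kind, base_name) for an -o value.

    kind is "directory" or "file"; base_name is the stem that the
    output file names would be based on, per the rules described above.
    """
    if output_spec.endswith(("/", "\\")):
        # Trailing slash: a directory. Names are based on the input file name.
        stem = ntpath.splitext(ntpath.basename(input_path))[0]
        return "directory", stem
    # No trailing slash: interpreted as a file name, even if a directory was intended.
    stem = ntpath.splitext(ntpath.basename(output_spec))[0]
    return "file", stem
```

Note that a directory name written without the ending slash, such as /Appligent/APSplit/output, is classified as a file name, which matches the warning above.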

Splitting by First-Level Bookmarks

When you split a document by bookmarks, each first-level bookmark section is extracted to a new PDF file. If a bookmark starts in the middle of a page, the file will contain that entire page, including text before the bookmark.

To perform a bookmark split, use the -bybookmarks option with the following input and output specifications:

  • Input — One or more PDF files.
  • Output (-o option) — A PDF file name, or a new or existing directory name.

Command

$ apsplit -bybookmarks -o outPDFFileorDir [other options] inPDFFile [inPDFFile2...]

Example 1. Splitting one document to file name

Windows

> apsplit -bybookmarks -o C:\Appligent\APSplit\output\bookmarks.pdf C:\Appligent\APSplit\samples\SplitSample.pdf

UNIX/Macintosh

$ ./apsplit -bybookmarks -o /Appligent/APSplit/output/bookmarks.pdf /Appligent/APSplit/samples/SplitSample.pdf

Result

APSplit creates one file for each first-level bookmark. The files are named bookmarks.nnnnnn.pdf, where nnnnnn is the number of the first page in the bookmarked section.  Assuming that SplitSample.pdf is a 13-page document with three bookmarked sections, the first beginning on page 1, the second beginning on page 4, and the third beginning on page 7, you would receive the following output:

Output File Name Pages Split
bookmarks.000001.pdf 1-3
bookmarks.000004.pdf 4-6
bookmarks.000007.pdf 7-13
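
The naming and page ranges in the table above follow a simple pattern, which the following Python sketch reproduces. It is a model of the documented result, not APSplit's implementation. The function name bookmark_split_plan is hypothetical, and the sketch assumes that every first-level bookmark starts at the top of a page and that the first bookmark is on page 1.

```python
def bookmark_split_plan(base_name, bookmark_start_pages, page_count):
    """Return a list of (output_file_name, first_page, last_page) tuples.

    Each section runs from its bookmark's page to the page before the
    next bookmark; the last section runs to the end of the document.
    """
    starts = sorted(bookmark_start_pages)
    plan = []
    for index, first in enumerate(starts):
        if index + 1 < len(starts):
            last = starts[index + 1] - 1
        else:
            last = page_count
        # nnnnnn is the first page of the section, zero-padded to six digits.
        plan.append((f"{base_name}.{first:06d}.pdf", first, last))
    return plan
```

Calling bookmark_split_plan("bookmarks", [1, 4, 7], 13) yields the three rows shown in the table above.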

Example 2. Splitting one document to directory name

Windows

> apsplit -bybookmarks -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample.pdf

UNIX/Macintosh

$ ./apsplit -bybookmarks -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample.pdf

Result

APSplit creates one file for each first-level bookmark. The files are named SplitSample.nnnnnn.pdf, where nnnnnn is the number of the first page in the bookmarked section. Assuming that SplitSample.pdf is a 13-page document with three bookmarked sections, the first beginning on page 1, the second beginning on page 4, and the third beginning on page 7, you would receive the following output:

Output File Name Pages Split
SplitSample.000001.pdf 1-3
SplitSample.000004.pdf 4-6
SplitSample.000007.pdf 7-13

Example 3. Splitting multiple documents

Windows

> apsplit -bybookmarks -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf C:\Appligent\APSplit\samples\SplitSample2.pdf C:\Appligent\APSplit\samples\SplitSample3.pdf

UNIX/Macintosh

$ ./apsplit -bybookmarks -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf /Appligent/APSplit/samples/SplitSample2.pdf /Appligent/APSplit/samples/SplitSample3.pdf

Result

APSplit creates one file for each first-level bookmark in each input file. The files are named SplitSample1.nnnnnn.pdf, SplitSample2.nnnnnn.pdf, and SplitSample3.nnnnnn.pdf, where nnnnnn is the number of the first page in the bookmarked section.

Splitting by Odd or Even Pages

To extract the odd pages from a document, use the -odd option. To extract the even pages, use the -even option. The following input and output specifications apply to these options:

  • Input — One PDF file.
  • Output (-o option) — The output file name.

Note: The -odd and -even options do not support multiple input files or the use of a directory name for the output specification.

Command

$ apsplit -odd -o outPDFFile [other options] inPDFFile

Example 1. Splitting by odd pages

Windows

> apsplit -odd -o C:\Appligent\APSplit\output\OddPages.pdf C:\Appligent\APSplit\samples\SplitSample.pdf

UNIX/Macintosh

$ ./apsplit -odd -o /Appligent/APSplit/output/OddPages.pdf /Appligent/APSplit/samples/SplitSample.pdf

Result

All of the odd pages in SplitSample.pdf are extracted to OddPages.pdf. If SplitSample.pdf had 13 pages, OddPages.pdf would contain pages 1, 3, 5, 7, 9, 11, and 13 from SplitSample.pdf.

Command

$ apsplit -even -o outPDFFile [other options] inPDFFile

Example 2. Splitting by even pages

Windows

> apsplit -even -o C:\Appligent\APSplit\output\EvenPages.pdf C:\Appligent\APSplit\samples\SplitSample.pdf

UNIX/Macintosh

$ ./apsplit -even -o /Appligent/APSplit/output/EvenPages.pdf /Appligent/APSplit/samples/SplitSample.pdf

Result

All of the even pages in SplitSample.pdf are extracted to EvenPages.pdf. If SplitSample.pdf had 13 pages, EvenPages.pdf would contain pages 2, 4, 6, 8, 10, and 12 from SplitSample.pdf.
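
To check which pages an -odd or -even split should contain for a document of a given length, a short Python sketch such as the following can help. The function name pages_for is hypothetical, and the code only lists page numbers; it does not read or write PDF files.

```python
def pages_for(option, page_count):
    """Return the page numbers that -odd or -even would extract."""
    if option not in ("-odd", "-even"):
        raise ValueError("option must be -odd or -even")
    first = 1 if option == "-odd" else 2
    # Step by two from the first odd or even page through the last page.
    return list(range(first, page_count + 1, 2))
```

For a 13-page document, pages_for("-odd", 13) and pages_for("-even", 13) return the page lists given in the two results above.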

Splitting by Page Range

You can split one or more documents by extracting:

  • A single page range
  • Consecutive page ranges of the same length
  • Nonconsecutive, variable-length page ranges

The following sections explain how.

Extracting a single page range

To extract a range of pages from a document, use the -startpage and -endpage options with the following input and output specifications:

  • Input — One or more PDF files.
  • Output (-o option) — A file name, or a new or existing directory name.

Command

$ apsplit -startpage <int> -endpage <int> -o outPDFFileorDir [other options] inPDFFile [inPDFFile2...]

Example 1. Splitting one document to file name

Windows

> apsplit -startpage 2 -endpage 5 -o C:\Appligent\APSplit\output\PageRange.pdf C:\Appligent\APSplit\samples\SplitSample.pdf

UNIX/Macintosh

$ ./apsplit -startpage 2 -endpage 5 -o /Appligent/APSplit/output/PageRange.pdf /Appligent/APSplit/samples/SplitSample.pdf

Result

APSplit creates a file named PageRange.pdf. This file contains pages 2-5 from the original SplitSample.pdf document.

Example 2. Splitting one document to directory name

Windows

> apsplit -startpage 2 -endpage 5 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample.pdf

UNIX/Macintosh

$ ./apsplit -startpage 2 -endpage 5 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample.pdf

Result

APSplit creates a file named SplitSample.000002.pdf. The number 000002 indicates the start of the page range from the input file.

Example 3. Splitting multiple documents

Windows

> apsplit -startpage 2 -endpage 5 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf C:\Appligent\APSplit\samples\SplitSample2.pdf

UNIX/Macintosh

$ ./apsplit -startpage 2 -endpage 5 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf /Appligent/APSplit/samples/SplitSample2.pdf

Result

APSplit creates two files: SplitSample1.000002.pdf and SplitSample2.000002.pdf.
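
The three examples above differ only in how the output name is derived. The following Python sketch summarizes that behavior for a single page range split. It is a model of the documented results under the stated rules, not APSplit code, and the function name range_split_outputs is hypothetical.

```python
import ntpath  # splits on both "\" and "/"


def range_split_outputs(output_spec, input_paths, start_page):
    """Return the output paths expected from a -startpage/-endpage split."""
    is_directory = output_spec.endswith(("/", "\\"))
    if not is_directory:
        # A file name is only valid with exactly one input file.
        if len(input_paths) != 1:
            raise ValueError("multiple input files require a directory for -o")
        return [output_spec]
    outputs = []
    for path in input_paths:
        stem = ntpath.splitext(ntpath.basename(path))[0]
        # The six-digit number is the start of the page range.
        outputs.append(f"{output_spec}{stem}.{start_page:06d}.pdf")
    return outputs
```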

Extracting consecutive page ranges of the same length

You can extract consecutive page ranges of the same length by repeating a single page range split. Use the -repeat option, along with the -startpage and -endpage options. The -repeat option indicates the number of times to perform the split. The -startpage and -endpage options define the number of pages to split each time, as well as the first and last page of the first split.

The following input and output specifications apply to consecutive page range splits:

  • Input — One or more PDF files.
  • Output (-o option) — A new or existing directory name.

Note: The -repeat option does not support the use of a file name for the output specification.

Command

$ apsplit -startpage <int> -endpage <int> -repeat n -o outPDFDir [other options] inPDFFile [inPDFFile2...]

Example 1. Splitting one document

Windows

> apsplit -startpage 2 -endpage 5 -repeat 3 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample.pdf

UNIX/Macintosh

$ ./apsplit -startpage 2 -endpage 5 -repeat 3 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample.pdf

Result

SplitSample.pdf is split three times. Pages 2-5 are extracted first, followed by the second range of four pages, and then the third range of four pages. The following table describes the details of the split.

Pages Split Output File Name
2-5 SplitSample.000002.pdf
6-9 SplitSample.000006.pdf
10-13 SplitSample.000010.pdf

If SplitSample.pdf had fewer than 13 pages, APSplit would go as far as it could and then end without error. It might not create all three files, and the last file might contain fewer than four pages.
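
The arithmetic behind a repeated split can be sketched in Python as follows. The function name repeat_ranges is hypothetical. The handling of short documents is an assumption inferred from the description above ("go as far as it could"): ranges that start past the last page are skipped, and the final range is cut off at the last page.

```python
def repeat_ranges(start_page, end_page, repeat, page_count):
    """Return the (first_page, last_page) ranges for a -repeat split."""
    length = end_page - start_page + 1  # pages per split
    ranges = []
    first = start_page
    for _ in range(repeat):
        if first > page_count:
            break  # nothing left to extract
        last = min(first + length - 1, page_count)
        ranges.append((first, last))
        first += length  # the next range starts right after this one
    return ranges
```

With -startpage 2 -endpage 5 -repeat 3 on a 13-page document, this produces the ranges 2-5, 6-9, and 10-13 shown in the table. Under the same assumptions, an 8-page document would produce only 2-5 and 6-8.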

Example 2. Splitting multiple documents

Windows

> apsplit -startpage 2 -endpage 5 -repeat 3 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf C:\Appligent\APSplit\samples\SplitSample2.pdf

UNIX/Macintosh

$ ./apsplit -startpage 2 -endpage 5 -repeat 3 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf /Appligent/APSplit/samples/SplitSample2.pdf

Result

Each input document is split three times, as shown in the following table.

Input File Pages Split Output File Name
SplitSample1.pdf 2-5 SplitSample1.000002.pdf
SplitSample1.pdf 6-9 SplitSample1.000006.pdf
SplitSample1.pdf 10-13 SplitSample1.000010.pdf
SplitSample2.pdf 2-5 SplitSample2.000002.pdf
SplitSample2.pdf 6-9 SplitSample2.000006.pdf
SplitSample2.pdf 10-13 SplitSample2.000010.pdf

Extracting nonconsecutive, variable-length page ranges

If you want to extract multiple page ranges from a document, but they do not contain the same number of pages or are not consecutive, use the -list option.

First, create a text file that specifies the page ranges you want to extract and the output files you want to save them to. Each line in the file must contain a start page, an end page, and an output file name, separated by commas. The last valid line must end with a carriage return. The following line may contain a carriage return but no other characters.

The following sample file, list.txt, illustrates the format:

1,4,pathname\Section.1.pdf
6,10,pathname\Section.2.pdf
12,13,pathname\Section.3.pdf
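
If the page ranges come from another program, the list file can be generated rather than typed. The following Python sketch builds the file contents in the format shown above. The function name list_file_text is hypothetical. The sketch assumes that "carriage return" means the normal line ending for your platform, and it terminates every line, including the last, as required.

```python
def list_file_text(entries):
    """Build -list file contents from (start_page, end_page, output_path) tuples."""
    lines = []
    for start_page, end_page, output_path in entries:
        if start_page < 1 or end_page < start_page:
            raise ValueError(f"invalid page range: {start_page}-{end_page}")
        # One range per line: start page, end page, output file name.
        lines.append(f"{start_page},{end_page},{output_path}")
    # Every line, including the last valid one, ends with a line terminator.
    return "".join(line + "\n" for line in lines)
```

Writing the returned text to a file opened in text mode lets Python translate each "\n" to the platform's own line ending.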

After you create the text file, reference it in your command using the -list option.

Note: The -list option does not support multiple input files or the -o option.

Command

$ apsplit -list textFile [other options] inPDFFile

Example 1. Splitting a document using a list file

Windows

> apsplit -list C:\Appligent\APSplit\samples\list.txt C:\Appligent\APSplit\samples\SplitSample.pdf

UNIX/Macintosh

$ ./apsplit -list /Appligent/APSplit/samples/list.txt /Appligent/APSplit/samples/SplitSample.pdf

Result

These examples produce three files. Section.1.pdf contains pages 1-4 from SplitSample.pdf, Section.2.pdf contains pages 6-10, and Section.3.pdf contains pages 12-13.

Splitting by File Size

You can split a document based on the maximum output file size and maximum number of pages per output file. Three options are required:

  • -byfilesize indicates that the input document is to be split by file size.
  • -maxsize is the maximum output file size, in kilobytes (KB).
  • -maxcount is the maximum number of pages in each output file. APSplit creates output files that are as close to the maximum size as possible (-maxsize), but with no more than the maximum number of pages (-maxcount).

Note: If your document contains large pages, APSplit might produce some output files that are larger than the specified -maxsize. That is because APSplit does not break up pages into fractional components. The smallest number of pages that can be extracted to an output file is one.
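
The interaction between -maxsize and -maxcount can be illustrated with a simplified model. The following Python sketch is not APSplit's algorithm. It assumes that each page contributes a fixed, known number of kilobytes and that an output file's size is the sum of its pages, which real PDF files only approximate. The function name plan_by_filesize is hypothetical.

```python
def plan_by_filesize(page_sizes_kb, max_size_kb, max_count):
    """Group pages into (first_page, last_page, total_kb) chunks.

    A new chunk starts when adding the next page would exceed either
    the size limit or the page-count limit. A single page larger than
    the size limit still gets a chunk of its own.
    """
    chunks = []
    first = 1
    pages_in_chunk = 0
    size_in_chunk = 0
    for page_number, size in enumerate(page_sizes_kb, start=1):
        too_big = size_in_chunk + size > max_size_kb
        too_many = pages_in_chunk + 1 > max_count
        if pages_in_chunk > 0 and (too_big or too_many):
            chunks.append((first, page_number - 1, size_in_chunk))
            first = page_number
            pages_in_chunk = 0
            size_in_chunk = 0
        pages_in_chunk += 1
        size_in_chunk += size
    if pages_in_chunk:
        chunks.append((first, first + pages_in_chunk - 1, size_in_chunk))
    return chunks
```

In this model, a 400 KB page with a 300 KB limit ends up alone in an oversized chunk, which corresponds to the case described in the note above.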

The following input and output specifications apply to a split by file size:

  • Input — One or more PDF files.
  • Output (-o option) — A file name, or a new or existing directory name.

Command

$ apsplit -byfilesize -maxsize <int> -maxcount <int> -o outPDFFileorDir [other options] inPDFFile [inPDFFile2...]

Example 1. Splitting one document to file name

Windows

> apsplit -byfilesize -maxsize 300 -maxcount 15 -o C:\Appligent\APSplit\output\size.pdf C:\Appligent\APSplit\samples\SplitSample1.pdf

UNIX/Macintosh

$ ./apsplit -byfilesize -maxsize 300 -maxcount 15 -o /Appligent/APSplit/output/size.pdf /Appligent/APSplit/samples/SplitSample1.pdf

Result

The following table shows typical results that you might receive. Each output file name is a concatenation of the output file specification and the number of the first page in the extracted section. Note that some of the file sizes are much less than the maximum of 300 KB. That is because the maximum page count of 15 was reached before the maximum file size.

Output File Name Size (KB)
size.000001.pdf 280
size.000007.pdf 187
size.000022.pdf 245
size.000037.pdf 283
size.000052.pdf 217
size.000067.pdf 203

Example 2. Splitting one document to directory name

Windows

> apsplit -byfilesize -maxsize 300 -maxcount 15 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf

UNIX/Macintosh

$ ./apsplit -byfilesize -maxsize 300 -maxcount 15 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf

Result

The results are the same as in the previous example, except that the output file names are based on the input file name.

Output File Name Size (KB)
SplitSample1.000001.pdf 280
SplitSample1.000007.pdf 187
SplitSample1.000022.pdf 245
SplitSample1.000037.pdf 283
SplitSample1.000052.pdf 217
SplitSample1.000067.pdf 203

Example 3. Splitting multiple documents

Windows

> apsplit -byfilesize -maxsize 300 -maxcount 15 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf C:\Appligent\APSplit\samples\SplitSample2.pdf C:\Appligent\APSplit\samples\SplitSample3.pdf

UNIX/Macintosh

$ ./apsplit -byfilesize -maxsize 300 -maxcount 15 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf /Appligent/APSplit/samples/SplitSample2.pdf /Appligent/APSplit/samples/SplitSample3.pdf

Result

The following table represents typical results. Output file names are based on the corresponding input file names.

Input File Output File Name Size (KB)
SplitSample1.pdf SplitSample1.000001.pdf 280
SplitSample1.pdf SplitSample1.000007.pdf 187
SplitSample1.pdf SplitSample1.000022.pdf 245
SplitSample1.pdf SplitSample1.000037.pdf 283
SplitSample1.pdf SplitSample1.000052.pdf 217
SplitSample1.pdf SplitSample1.000067.pdf 203
SplitSample2.pdf SplitSample2.000001.pdf 174
SplitSample2.pdf SplitSample2.000005.pdf 297
SplitSample3.pdf SplitSample3.000001.pdf 226
SplitSample3.pdf SplitSample3.000008.pdf 268