Performing Custom Splits
Introduction
In the previous chapter, you learned how to split documents into single-page files. By adding one or more options, you can also split documents by:
- First level bookmarks — Split the PDF by the first level bookmark
- Odd or even numbered pages — Split file into odd and/or even pages
- Page range — Split file by page ranges
- File size — Split a document based on the maximum output file size and maximum number of pages per output file
This chapter explains how to perform these custom splits.
Input/Output Considerations
Some custom split options accept one or more input files; others accept only one. The number of input files that you include on a command (one or many) determines the proper output file specification (-o option):
- One input file — In most cases, you may specify either a file name or a directory name following the -o option. (Exceptions are noted, where applicable.) When you use a file name, the output file names are based on that name. When you use a directory name, the output file names are based on the input file name.
- Multiple input files — You must specify a directory name following the -o option. The output file names are based on the input file names.
Note: If you are splitting all of the PDF files in a directory, you may use the *.pdf wildcard shortcut in the input specification.
When you specify a directory name for the output, you must include the ending slash: a backslash in Windows and a forward slash in UNIX and Macintosh operating systems. If you do not include the slash, the directory name will be interpreted as a file name.
Windows
C:\Appligent\APSplit\output\
UNIX/Macintosh
/Appligent/APSplit/output/
If the directory that you specify does not already exist, it will be created for you.
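The trailing-slash rule is easy to miss when output paths are assembled in a script. The following is a minimal Python sketch, not part of APSplit; the helper name as_output_dir and its windows parameter are hypothetical. It appends the separator only when it is missing, so the path is read as a directory rather than a file name.

```python
def as_output_dir(path, windows=False):
    """Return path with the trailing slash that marks it as a directory.

    Hypothetical helper for illustration; not an APSplit feature.
    """
    sep = "\\" if windows else "/"
    # Leave the path alone if it already ends with the separator.
    return path if path.endswith(sep) else path + sep
```

For example, as_output_dir("/Appligent/APSplit/output") yields the path with a forward slash appended, and passing windows=True appends a backslash instead.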
Splitting by First Level Bookmarks
When you split a document by bookmarks, each first-level bookmark section is extracted to a new PDF file. If a bookmark starts in the middle of a page, the file will contain that entire page, including text before the bookmark.
To perform a bookmark split, use the -bybookmarks option with the following input and output specifications:
- Input — One or more PDF files.
- Output (-o option) — A PDF file name, or a new or existing directory name.
Command
$ apsplit -bybookmarks -o outPDFFileorDir [other options] inPDFFile [inPDFFile2...]
Example 1. Splitting one document to file name
Windows
> apsplit -bybookmarks -o C:\Appligent\APSplit\output\bookmarks.pdf C:\Appligent\APSplit\samples\SplitSample.pdf
UNIX/Macintosh
$ ./apsplit -bybookmarks -o /Appligent/APSplit/output/bookmarks.pdf /Appligent/APSplit/samples/SplitSample.pdf
Result
APSplit creates one file for each first-level bookmark. The files are named bookmarks.nnnnnn.pdf, where nnnnnn is the number of the first page in the bookmarked section. Assuming that SplitSample.pdf is a 13-page document with three bookmarked sections, the first beginning on page 1, the second beginning on page 4, and the third beginning on page 7, you would receive the following output:
Output File Name | Pages Split |
---|---|
bookmarks.000001.pdf | 1-3 |
bookmarks.000004.pdf | 4-6 |
bookmarks.000007.pdf | 7-13 |
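The naming rule can be modeled in a few lines of Python, which may help when a script needs to predict the output file names in advance. This is only a sketch of the rule as described in this chapter, not APSplit code: the function name bookmark_outputs is hypothetical, the bookmark start pages must be supplied by the caller, and it assumes that every bookmark begins at the top of a page.

```python
import ntpath


def bookmark_outputs(output_spec, input_file, start_pages, page_count):
    """Predict (file name, first page, last page) for a bookmark split.

    Illustrative model only; assumes each bookmark starts at the top
    of a page, so the sections do not share pages.
    """
    # A trailing slash marks a directory, so names come from the input
    # file; otherwise they come from the output file name.
    is_dir = output_spec.endswith(("/", "\\"))
    source = input_file if is_dir else output_spec
    # ntpath accepts both "/" and "\" as separators.
    stem = ntpath.splitext(ntpath.basename(source))[0]

    results = []
    for index, first in enumerate(start_pages):
        # A section runs to the page before the next bookmark, or to
        # the end of the document for the last section.
        if index + 1 < len(start_pages):
            last = start_pages[index + 1] - 1
        else:
            last = page_count
        # The page number is zero-padded to six digits (nnnnnn).
        results.append((f"{stem}.{first:06d}.pdf", first, last))
    return results
```

Called with the output file name bookmarks.pdf, start pages [1, 4, 7], and a page count of 13, the sketch reproduces the three rows of the table above.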
Example 2. Splitting one document to directory name
Windows
> apsplit -bybookmarks -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample.pdf
UNIX/Macintosh
$ ./apsplit -bybookmarks -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample.pdf
Result
APSplit creates one file for each first-level bookmark. The files are named SplitSample.nnnnnn.pdf, where nnnnnn is the number of the first page in the bookmarked section. Assuming that SplitSample.pdf is a 13-page document with three bookmarked sections, the first beginning on page 1, the second beginning on page 4, and the third beginning on page 7, you would receive the following output:
Output File Name | Pages Split |
---|---|
SplitSample.000001.pdf | 1-3 |
SplitSample.000004.pdf | 4-6 |
SplitSample.000007.pdf | 7-13 |
Example 3. Splitting multiple documents
Windows
> apsplit -bybookmarks -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf C:\Appligent\APSplit\samples\SplitSample2.pdf C:\Appligent\APSplit\samples\SplitSample3.pdf
UNIX/Macintosh
$ ./apsplit -bybookmarks -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf /Appligent/APSplit/samples/SplitSample2.pdf /Appligent/APSplit/samples/SplitSample3.pdf
Result
APSplit creates one file for each first-level bookmark in each input file. The files are named SplitSample1.nnnnnn.pdf, SplitSample2.nnnnnn.pdf, and SplitSample3.nnnnnn.pdf, where nnnnnn is the number of the first page in the bookmarked section.
Splitting by Odd or Even Pages
To extract the odd pages from a document, use the -odd option. To extract the even pages, use the -even option. The following input and output specifications apply to these options:
- Input — One PDF file.
- Output (-o option) — The output file name.
Note: The -odd and -even options do not support multiple input files or the use of a directory name for the output specification.
Command
$ apsplit -odd -o outPDFFile [other options] inPDFFile
Example 1. Splitting by odd pages
Windows
> apsplit -odd -o C:\Appligent\APSplit\output\OddPages.pdf C:\Appligent\APSplit\samples\SplitSample.pdf
UNIX/Macintosh
$ ./apsplit -odd -o /Appligent/APSplit/output/OddPages.pdf /Appligent/APSplit/samples/SplitSample.pdf
Result
All of the odd pages in SplitSample.pdf are extracted to OddPages.pdf. If SplitSample.pdf had 13 pages, OddPages.pdf would contain pages 1, 3, 5, 7, 9, 11, and 13 from SplitSample.pdf.
Command
$ apsplit -even -o outPDFFile [other options] inPDFFile
Example 2. Splitting by even pages
Windows
> apsplit -even -o C:\Appligent\APSplit\output\EvenPages.pdf C:\Appligent\APSplit\samples\SplitSample.pdf
UNIX/Macintosh
$ ./apsplit -even -o /Appligent/APSplit/output/EvenPages.pdf /Appligent/APSplit/samples/SplitSample.pdf
Result
All of the even pages in SplitSample.pdf are extracted to EvenPages.pdf. If SplitSample.pdf had 13 pages, EvenPages.pdf would contain pages 2, 4, 6, 8, 10, and 12 from SplitSample.pdf.
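To check which pages to expect in the output for a document of a given length, the selection can be written as a short Python sketch. The function name pages_for is hypothetical and the code only models the result described above; it does not call APSplit.

```python
def pages_for(option, page_count):
    """List the page numbers that -odd or -even would select.

    Illustrative model only; pages are numbered from 1.
    """
    if option not in ("-odd", "-even"):
        raise ValueError("option must be '-odd' or '-even'")
    first = 1 if option == "-odd" else 2
    # Step by two from the first odd or even page to the last page.
    return list(range(first, page_count + 1, 2))
```

For the 13-page SplitSample.pdf, pages_for("-odd", 13) lists the seven odd pages and pages_for("-even", 13) lists the six even pages.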
Splitting by Page Range
You can split one or more documents by extracting:
- A single page range
- Consecutive page ranges of the same length
- Nonconsecutive, variable-length page ranges
The following sections explain how.
Extracting a single page range
To extract a range of pages from a document, use the -startpage and -endpage options with the following input and output specifications:
- Input — One or more PDF files.
- Output (-o option) — A file name, or a new or existing directory name.
Command
$ apsplit -startpage <int> -endpage <int> -o outPDFFileorDir [other options] inPDFFile [inPDFFile2...]
Example 1. Splitting one document to file name
Windows
> apsplit -startpage 2 -endpage 5 -o C:\Appligent\APSplit\output\PageRange.pdf C:\Appligent\APSplit\samples\SplitSample.pdf
UNIX/Macintosh
$ ./apsplit -startpage 2 -endpage 5 -o /Appligent/APSplit/output/PageRange.pdf /Appligent/APSplit/samples/SplitSample.pdf
Result
APSplit creates a file named PageRange.pdf. This file contains pages 2-5 from the original SplitSample.pdf document.
Example 2. Splitting one document to directory name
Windows
> apsplit -startpage 2 -endpage 5 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample.pdf
UNIX/Macintosh
$ ./apsplit -startpage 2 -endpage 5 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample.pdf
Result
APSplit creates a file named SplitSample.000002.pdf. The number 000002 indicates the start of the page range from the input file.
Example 3. Splitting multiple documents
Windows
> apsplit -startpage 2 -endpage 5 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf C:\Appligent\APSplit\samples\SplitSample2.pdf
UNIX/Macintosh
$ ./apsplit -startpage 2 -endpage 5 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf /Appligent/APSplit/samples/SplitSample2.pdf
Result
APSplit creates two files: SplitSample1.000002.pdf and SplitSample2.000002.pdf.
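When these commands are generated from a script, it can help to check the input/output rules before running anything. The Python sketch below assembles the argument list for a single page range split using only the options shown in this section. The function name build_range_command is hypothetical, and the check that page numbers start at 1 is an assumption rather than documented behavior.

```python
def build_range_command(start_page, end_page, output, inputs,
                        executable="apsplit"):
    """Build the argument list for a single page range split.

    Hypothetical helper; it only assembles the command and does not
    run APSplit.
    """
    # Assumption: pages are numbered from 1 and the range is ascending.
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    if not inputs:
        raise ValueError("at least one input file is required")
    # Multiple input files require a directory name, which is marked
    # by a trailing slash.
    is_dir = output.endswith(("/", "\\"))
    if len(inputs) > 1 and not is_dir:
        raise ValueError("multiple inputs need a directory after -o")
    return [executable,
            "-startpage", str(start_page),
            "-endpage", str(end_page),
            "-o", output,
            *inputs]
```

The returned list can be passed to subprocess.run from the Python standard library. Set executable to the full path of apsplit if it is not on the search path.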
Extracting consecutive page ranges of the same length
You can extract consecutive page ranges of the same length by repeating a single page range split. Use the -repeat option, along with the -startpage and -endpage options. The -repeat option indicates the number of times to perform the split. The -startpage and -endpage options define the number of pages to split each time, as well as the first and last page of the first split.
The following input and output specifications apply to consecutive page range splits:
- Input — One or more PDF files.
- Output (-o option) — A new or existing directory name.
Note: The -repeat option does not support the use of a file name for the output specification.
Command
$ apsplit -startpage <int> -endpage <int> -repeat n -o outPDFDir [other options] inPDFFile [inPDFFile2...]
Example 1. Splitting one document
Windows
> apsplit -startpage 2 -endpage 5 -repeat 3 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample.pdf
UNIX/Macintosh
$ ./apsplit -startpage 2 -endpage 5 -repeat 3 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample.pdf
Result
SplitSample.pdf is split three times. Pages 2-5 are extracted first, followed by the second range of four pages, and then the third range of four pages. The following table describes the details of the split.
Pages Split | Output File Name |
---|---|
2-5 | SplitSample.000002.pdf |
6-9 | SplitSample.000006.pdf |
10-13 | SplitSample.000010.pdf |
If SplitSample.pdf had fewer than 13 pages, APSplit would go as far as it could and then end without error. It might not create all three files, and the last file might contain fewer than four pages.
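The arithmetic behind a repeated split, including the behavior for short documents, can be sketched in Python. The function name repeat_ranges is hypothetical, and the code is a model of the behavior described above rather than APSplit's implementation.

```python
def repeat_ranges(start_page, end_page, repeat, page_count):
    """List the (first page, last page) ranges of a repeated split.

    Illustrative model only. Ranges that start past the end of the
    document are dropped, and the last range is shortened if the
    document ends before the range does.
    """
    length = end_page - start_page + 1  # pages per split
    ranges = []
    first = start_page
    for _ in range(repeat):
        if first > page_count:
            break  # nothing left to extract
        last = min(first + length - 1, page_count)
        ranges.append((first, last))
        first += length
    return ranges
```

With -startpage 2, -endpage 5, and -repeat 3, a 13-page document gives the ranges 2-5, 6-9, and 10-13. An 11-page document gives a final range of 10-11, and an 8-page document gives only 2-5 and 6-8.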
Example 2. Splitting multiple documents
Windows
> apsplit -startpage 2 -endpage 5 -repeat 3 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf C:\Appligent\APSplit\samples\SplitSample2.pdf
UNIX/Macintosh
$ ./apsplit -startpage 2 -endpage 5 -repeat 3 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf /Appligent/APSplit/samples/SplitSample2.pdf
Result
Each input document is split three times, as shown in the following table.
Input File | Pages Split | Output File Name |
---|---|---|
SplitSample1.pdf | 2-5 | SplitSample1.000002.pdf |
SplitSample1.pdf | 6-9 | SplitSample1.000006.pdf |
SplitSample1.pdf | 10-13 | SplitSample1.000010.pdf |
SplitSample2.pdf | 2-5 | SplitSample2.000002.pdf |
SplitSample2.pdf | 6-9 | SplitSample2.000006.pdf |
SplitSample2.pdf | 10-13 | SplitSample2.000010.pdf |
Extracting nonconsecutive, variable-length page ranges
If you want to extract multiple page ranges from a document, but they do not contain the same number of pages or are not consecutive, use the -list option.
First, create a text file that specifies the page ranges you want to extract and the output files to save them to. Each line in the file must contain a start page, an end page, and an output file name, separated by commas. The last valid line must end with a carriage return. The following line may contain a carriage return but no other characters.
The following sample file, list.txt, illustrates the format:
1,4,pathname\Section.1.pdf
6,10,pathname\Section.2.pdf
12,13,pathname\Section.3.pdf
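If the page ranges come from another program, the list file can be generated rather than typed. The Python sketch below formats the entries as described above. The function name format_list_file is hypothetical, and it assumes that the "carriage return" required at the end of each line means an ordinary line break.

```python
def format_list_file(entries):
    """Format (start page, end page, output file) entries as list text.

    Hypothetical helper for illustration; not part of APSplit.
    """
    lines = []
    for start, end, output_name in entries:
        # Assumption: pages are numbered from 1 and ranges ascend.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range {start}-{end}")
        lines.append(f"{start},{end},{output_name}")
    # Every line, including the last valid one, ends with a line break.
    return "".join(line + "\n" for line in lines)
```

Writing the returned text with open(path, "w") in text mode uses the platform's native line ending, which is an assumption about what APSplit accepts on each operating system.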
After you create the text file, reference it in your command using the -list option.
Note: The -list option does not support multiple input files or the -o option.
Command
$ apsplit -list textFile [other options] inPDFFile
Example 1. Splitting a document using a list file
Windows
> apsplit -list C:\Appligent\APSplit\samples\list.txt C:\Appligent\APSplit\samples\SplitSample.pdf
UNIX/Macintosh
$ ./apsplit -list /Appligent/APSplit/samples/list.txt /Appligent/APSplit/samples/SplitSample.pdf
Result
These examples produce three files. Section.1.pdf contains pages 1-4 from SplitSample.pdf, Section.2.pdf contains pages 6-10, and Section.3.pdf contains pages 12-13.
Splitting by File Size
You can split a document based on the maximum output file size and maximum number of pages per output file. Three options are required:
- -byfilesize indicates that the input document is to be split by file size.
- -maxsize is the maximum output file size, in kilobytes (KB).
- -maxcount is the maximum number of pages in each output file. APSplit creates output files that are as close to the maximum size as possible (-maxsize), but with no more than the maximum number of pages (-maxcount).
Note: If your document contains large pages, APSplit might produce some output files that are larger than the specified -maxsize. That is because APSplit does not break up pages into fractional components. The smallest number of pages that can be extracted to an output file is one.
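The interaction between the two limits can be illustrated with a simple greedy model in Python. This is not APSplit's algorithm, which this chapter does not describe. The function name plan_size_split is hypothetical, and the model assumes that each page contributes a fixed, known number of kilobytes, which real PDF files only approximate.

```python
def plan_size_split(page_sizes_kb, max_size_kb, max_count):
    """Group pages into (first page, last page) chunks.

    Simplified greedy model, NOT APSplit's actual algorithm. A chunk is
    closed when adding the next page would exceed max_size_kb or
    max_count. A chunk always holds at least one page, so a single
    large page can produce a chunk bigger than max_size_kb.
    """
    chunks = []
    start = 1
    size = 0
    count = 0
    for page_number, page_kb in enumerate(page_sizes_kb, start=1):
        too_big = size + page_kb > max_size_kb
        too_many = count >= max_count
        if count > 0 and (too_big or too_many):
            # Close the current chunk and start a new one here.
            chunks.append((start, page_number - 1))
            start = page_number
            size = 0
            count = 0
        size += page_kb
        count += 1
    if count > 0:
        chunks.append((start, start + count - 1))
    return chunks
```

For page sizes of 100, 100, 150, 400, and 50 KB with a maximum size of 300 and a maximum count of 2, the model puts pages 1-2 in the first chunk because the page limit is reached, and gives pages 3, 4, and 5 a chunk each. Page 4 ends up alone in a chunk that is larger than the maximum size.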
The following input and output specifications apply to a split by file size:
- Input — One or more PDF files.
- Output (-o option) — A file name, or a new or existing directory name.
Command
$ apsplit -byfilesize -maxsize <int> -maxcount <int> -o outPDFFileorDir [other options] inPDFFile [inPDFFile2...]
Example 1. Splitting one document to file name
Windows
> apsplit -byfilesize -maxsize 300 -maxcount 15 -o C:\Appligent\APSplit\output\size.pdf C:\Appligent\APSplit\samples\SplitSample1.pdf
UNIX/Macintosh
$ ./apsplit -byfilesize -maxsize 300 -maxcount 15 -o /Appligent/APSplit/output/size.pdf /Appligent/APSplit/samples/SplitSample1.pdf
Result
The following table shows typical results that you might receive. Each output file name is a concatenation of the output file specification and the number of the first page in the extracted section. Note that some of the file sizes are much less than the maximum of 300 KB. That is because the maximum page count of 15 was reached before the maximum file size.
Output File Name | Size (KB) |
---|---|
size.000001.pdf | 280 |
size.000007.pdf | 187 |
size.000022.pdf | 245 |
size.000037.pdf | 283 |
size.000052.pdf | 217 |
size.000067.pdf | 203 |
Example 2. Splitting one document to directory name
Windows
> apsplit -byfilesize -maxsize 300 -maxcount 15 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf
UNIX/Macintosh
$ ./apsplit -byfilesize -maxsize 300 -maxcount 15 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf
Result
The results are the same as in the previous example, except that the output file names are based on the input file name.
Output File Name | Size (KB) |
---|---|
SplitSample1.000001.pdf | 280 |
SplitSample1.000007.pdf | 187 |
SplitSample1.000022.pdf | 245 |
SplitSample1.000037.pdf | 283 |
SplitSample1.000052.pdf | 217 |
SplitSample1.000067.pdf | 203 |
Example 3. Splitting multiple documents
Windows
> apsplit -byfilesize -maxsize 300 -maxcount 15 -o C:\Appligent\APSplit\output\ C:\Appligent\APSplit\samples\SplitSample1.pdf C:\Appligent\APSplit\samples\SplitSample2.pdf C:\Appligent\APSplit\samples\SplitSample3.pdf
UNIX/Macintosh
$ ./apsplit -byfilesize -maxsize 300 -maxcount 15 -o /Appligent/APSplit/output/ /Appligent/APSplit/samples/SplitSample1.pdf /Appligent/APSplit/samples/SplitSample2.pdf /Appligent/APSplit/samples/SplitSample3.pdf
Result
The following table represents typical results. Output file names are based on the corresponding input file names.
Input File | Output File Name | Size (KB) |
---|---|---|
SplitSample1.pdf | SplitSample1.000001.pdf | 280 |
SplitSample1.pdf | SplitSample1.000007.pdf | 187 |
SplitSample1.pdf | SplitSample1.000022.pdf | 245 |
SplitSample1.pdf | SplitSample1.000037.pdf | 283 |
SplitSample1.pdf | SplitSample1.000052.pdf | 217 |
SplitSample1.pdf | SplitSample1.000067.pdf | 203 |
SplitSample2.pdf | SplitSample2.000001.pdf | 174 |
SplitSample2.pdf | SplitSample2.000005.pdf | 297 |
SplitSample3.pdf | SplitSample3.000001.pdf | 226 |
SplitSample3.pdf | SplitSample3.000008.pdf | 268 |