A command-line tool written in Go that extracts a specific chapter from PDF books based on bookmark pattern matching and saves it as a separate PDF file.
- Extract a single chapter from PDF files using bookmark matching
- Regex pattern matching for flexible chapter selection
- Preserves PDF structure and formatting
- Verbose output mode for detailed logging
- Clean, modular architecture
- Go 1.24.2 or higher
git clone https://github.com/oueslati1990/Book-Chapter-Extractor.git
cd Book-Chapter-Extractor
go build -o pdf-chapter-extractor ./cmd/main.go

./pdf-chapter-extractor -input <pdf-file> -pattern <regex-pattern> [options]

| Flag | Short | Description | Default |
|---|---|---|---|
| --input | -i | Input PDF file (required) | - |
| --output | -o | Output directory for extracted chapter | Chapters |
| --pattern | -p | Regex pattern to match chapter bookmark (required) | - |
| --verbose | -v | Enable verbose output | false |
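
For readers curious how long and short flag pairs like these can be wired up, here is a minimal sketch using Go's standard `flag` package. It is not taken from this repository: the `options` struct and `parseArgs` function are hypothetical names, and the real `cmd/main.go` may be organized differently. Only the flag names and defaults come from the table above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the four documented flags. The struct is hypothetical.
type options struct {
	input   string
	output  string
	pattern string
	verbose bool
}

// parseArgs registers each long flag and its short alias on the same
// variable, then checks that the two required flags were supplied.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("pdf-chapter-extractor", flag.ContinueOnError)
	fs.StringVar(&o.input, "input", "", "Input PDF file (required)")
	fs.StringVar(&o.input, "i", "", "Short for -input")
	fs.StringVar(&o.output, "output", "Chapters", "Output directory for extracted chapter")
	fs.StringVar(&o.output, "o", "Chapters", "Short for -output")
	fs.StringVar(&o.pattern, "pattern", "", "Regex pattern to match chapter bookmark (required)")
	fs.StringVar(&o.pattern, "p", "", "Short for -pattern")
	fs.BoolVar(&o.verbose, "verbose", false, "Enable verbose output")
	fs.BoolVar(&o.verbose, "v", false, "Short for -verbose")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.input == "" || o.pattern == "" {
		return o, fmt.Errorf("both -input and -pattern are required")
	}
	return o, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

The standard `flag` package accepts both one and two leading dashes, which is why `-input` and `--input` can be used interchangeably in a setup like this one.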
The pattern must match exactly one bookmark. If multiple bookmarks match the pattern, the tool will return an error listing all matches, and you'll need to provide a more specific pattern.
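
The exactly-one-match rule can be illustrated with a short sketch built on Go's `regexp` package. The function `matchOne` and the sample titles are hypothetical, and the sketch assumes unanchored matching (the pattern may match anywhere in a title), which is consistent with the examples in this README but is not confirmed by the tool's source.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchOne returns the single title matched by pattern. It returns an error
// when the pattern is invalid, matches nothing, or matches several titles.
func matchOne(titles []string, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	var matches []string
	for _, t := range titles {
		if re.MatchString(t) { // unanchored: matches anywhere in the title
			matches = append(matches, t)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no bookmark matches %q", pattern)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("%d bookmarks match %q: %s",
			len(matches), pattern, strings.Join(matches, "; "))
	}
}

func main() {
	titles := []string{
		"Chapter 1: Introduction",
		"Chapter 10: Appendix",
		"Chapter 3: Parsing",
	}
	for _, p := range []string{"Chapter 1", "^Chapter 1:", "Chapter 9"} {
		title, err := matchOne(titles, p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("matched:", title)
	}
}
```

In this sample data, `Chapter 1` is ambiguous because it also matches `Chapter 10: Appendix`, while anchoring and adding the colon (`^Chapter 1:`) narrows the pattern to one title.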
Extract a specific chapter by exact title:
./pdf-chapter-extractor -i book.pdf -p "Chapter 1: Introduction"

Extract a chapter with verbose output:
./pdf-chapter-extractor -i book.pdf -p "Chapter 5" -o output_folder -v

Extract using a more specific regex to match one chapter:
./pdf-chapter-extractor -i book.pdf -p "^Chapter 3:"

Extract a chapter with special characters in title:
./pdf-chapter-extractor -i book.pdf -p "Chapter 2\\.1"

- The tool reads the PDF file and extracts all bookmarks (table of contents)
- It flattens nested bookmarks to search through all levels
- Matches bookmarks against the provided regex pattern
- If exactly one bookmark matches, it extracts the page range
- Creates a new PDF file containing only those pages
- If multiple bookmarks match, it shows an error with all matching titles
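
The flattening step in the list above can be sketched as a depth-first walk over the bookmark tree. The `Bookmark` struct here is a hypothetical stand-in defined only for this example. It is not the type used in `internal/bookmark`, and the page numbers are made up.

```go
package main

import "fmt"

// Bookmark is a hypothetical stand-in for one outline entry with its
// page range and nested children.
type Bookmark struct {
	Title    string
	PageFrom int
	PageThru int
	Kids     []Bookmark
}

// flatten walks the tree depth-first and returns every entry in reading
// order. Children are detached so the result is a plain list.
func flatten(bms []Bookmark) []Bookmark {
	var out []Bookmark
	for _, b := range bms {
		kids := b.Kids
		b.Kids = nil // b is a copy, so the caller's tree is not modified
		out = append(out, b)
		out = append(out, flatten(kids)...)
	}
	return out
}

func main() {
	toc := []Bookmark{
		{Title: "Chapter 1", PageFrom: 1, PageThru: 20, Kids: []Bookmark{
			{Title: "1.1 Setup", PageFrom: 2, PageThru: 9},
			{Title: "1.2 Usage", PageFrom: 10, PageThru: 20},
		}},
		{Title: "Chapter 2", PageFrom: 21, PageThru: 40},
	}
	for _, b := range flatten(toc) {
		fmt.Printf("%s: pages %d-%d\n", b.Title, b.PageFrom, b.PageThru)
	}
}
```

Once the tree is a flat list, a pattern can be tested against sub-sections such as `1.1 Setup` in the same pass as top-level chapters, and the matched entry already carries the page range to extract.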
Book-Chapter-Extractor/
├── cmd/
│   └── main.go              # CLI entry point
├── internal/
│   ├── bookmark/
│   │   └── bookmark.go      # Bookmark extraction and processing
│   └── extractor/
│       └── extractor.go     # Chapter extraction logic
├── go.mod
├── go.sum
└── README.md
- pdfcpu - PDF processing library
The tool handles several error cases:
- Missing input file
- Invalid regex patterns
- PDF files without bookmarks
- Multiple bookmarks matching the pattern (requires more specific pattern)
- No bookmarks matching the pattern
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.