Book-Chapter-Extractor

A command-line tool written in Go that extracts a specific chapter from PDF books based on bookmark pattern matching and saves it as a separate PDF file.

Features

Extract a single chapter from PDF files using bookmark matching
Regex pattern matching for flexible chapter selection
Preserves PDF structure and formatting
Verbose output mode for detailed logging
Clean, modular architecture

Installation

Prerequisites

Go 1.24.2 or higher

Build from source

git clone https://github.com/oueslati1990/Book-Chapter-Extractor.git
cd Book-Chapter-Extractor
go build -o pdf-chapter-extractor ./cmd/main.go

Usage

./pdf-chapter-extractor --input <pdf-file> --pattern <regex-pattern> [options]

Options

| Flag | Short | Description | Default |
|------|-------|-------------|---------|
| `--input` | `-i` | Input PDF file (required) | - |
| `--output` | `-o` | Output directory for the extracted chapter | `Chapters` |
| `--pattern` | `-p` | Regex pattern to match the chapter bookmark (required) | - |
| `--verbose` | `-v` | Enable verbose output | `false` |
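
For readers curious how a long flag and its short alias can share one value, here is a minimal sketch using only Go's standard `flag` package. It is an illustration under assumptions, not the tool's actual code: the `Options` struct and `parseOptions` function are hypothetical names, and the real CLI may use a different flag library.

```go
package main

import (
	"errors"
	"flag"
	"io"
)

// Options mirrors the flags in the table above.
// The type and its field names are hypothetical.
type Options struct {
	Input   string
	Output  string
	Pattern string
	Verbose bool
}

// parseOptions registers each long flag and its short alias on the same
// variable, parses args, and rejects a missing required flag.
func parseOptions(args []string) (Options, error) {
	var opts Options
	fs := flag.NewFlagSet("pdf-chapter-extractor", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // return errors instead of printing usage

	fs.StringVar(&opts.Input, "input", "", "input PDF file (required)")
	fs.StringVar(&opts.Input, "i", "", "shorthand for input")
	fs.StringVar(&opts.Output, "output", "Chapters", "output directory")
	fs.StringVar(&opts.Output, "o", "Chapters", "shorthand for output")
	fs.StringVar(&opts.Pattern, "pattern", "", "bookmark regex (required)")
	fs.StringVar(&opts.Pattern, "p", "", "shorthand for pattern")
	fs.BoolVar(&opts.Verbose, "verbose", false, "enable verbose output")
	fs.BoolVar(&opts.Verbose, "v", false, "shorthand for verbose")

	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	if opts.Input == "" {
		return Options{}, errors.New("an input file is required")
	}
	if opts.Pattern == "" {
		return Options{}, errors.New("a pattern is required")
	}
	return opts, nil
}
```

Calling `parseOptions(os.Args[1:])` from a `main` function would apply it to the real command line. The standard `flag` package accepts one or two leading dashes for any flag name, so this sketch treats `-i`, `--i`, `-input`, and `--input` alike.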

Important Note

The pattern must match exactly one bookmark. If multiple bookmarks match, the tool returns an error listing all matches, and you need to provide a more specific pattern. A pattern that matches no bookmark is also reported as an error.
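
The exactly-one rule can be sketched in a few lines of Go. This is a hedged illustration rather than the tool's implementation: `matchExactlyOne` is a hypothetical helper that takes bookmark titles as plain strings, and the error wording is invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchExactlyOne returns the single title matched by pattern.
// It fails if the pattern is invalid, matches nothing, or matches
// more than one title (listing every match in that case).
func matchExactlyOne(titles []string, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}

	var matches []string
	for _, title := range titles {
		if re.MatchString(title) {
			matches = append(matches, title)
		}
	}

	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no bookmark matches %q", pattern)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("pattern %q matches %d bookmarks: %s",
			pattern, len(matches), strings.Join(matches, "; "))
	}
}
```

With the titles "Chapter 1: Introduction" and "Chapter 10: Appendix", the unanchored pattern `Chapter 1` matches both and fails, while `^Chapter 1:` selects only the first.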

Examples

Extract a specific chapter by exact title:

./pdf-chapter-extractor -i book.pdf -p "Chapter 1: Introduction"

Extract a chapter with verbose output:

./pdf-chapter-extractor -i book.pdf -p "Chapter 5" -o output_folder -v

Extract using a more specific regex to match one chapter:

./pdf-chapter-extractor -i book.pdf -p "^Chapter 3:"

Extract a chapter whose title contains regex special characters (here the dot is escaped so it matches literally):

./pdf-chapter-extractor -i book.pdf -p "Chapter 2\\.1"
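
To see why the dot needs escaping, the sketch below compares an unescaped and an escaped pattern using Go's standard `regexp` package. The helper names `literalPattern` and `matches` are hypothetical and exist only for this illustration.

```go
package main

import "regexp"

// literalPattern escapes every regex metacharacter in title so the
// resulting pattern matches the title as literal text.
func literalPattern(title string) string {
	return regexp.QuoteMeta(title)
}

// matches reports whether pattern matches anywhere in title.
// It panics on an invalid pattern, which is acceptable for a sketch.
func matches(pattern, title string) bool {
	return regexp.MustCompile(pattern).MatchString(title)
}
```

Unescaped, `Chapter 2.1` also matches a title such as "Chapter 2x1", because `.` matches any character. `literalPattern("Chapter 2.1")` yields `Chapter 2\.1`, which matches only a literal dot.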

How it works

The tool reads the PDF file and extracts all bookmarks (the table of contents)
It flattens nested bookmarks so that every level is searched
It matches each bookmark title against the provided regex pattern
If no bookmark matches, or more than one matches, it reports an error (listing all matching titles in the latter case)
If exactly one bookmark matches, it determines that bookmark's page range
It creates a new PDF file containing only those pages
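
The flattening step can be sketched as a depth-first walk. The `Bookmark` type below is a hypothetical stand-in defined for this example (title, page range, child bookmarks); it is an assumption and may not match the types used by the tool or by pdfcpu.

```go
package main

// Bookmark is a hypothetical outline entry: a title, the page range it
// covers, and any nested child bookmarks.
type Bookmark struct {
	Title    string
	PageFrom int
	PageThru int
	Kids     []Bookmark
}

// flatten returns every bookmark in the tree as a single slice, in
// depth-first order (each parent followed by its descendants).
// The returned entries have Kids cleared so the result is truly flat.
func flatten(bookmarks []Bookmark) []Bookmark {
	var flat []Bookmark
	for _, b := range bookmarks {
		kids := b.Kids
		b.Kids = nil // b is a copy, so the caller's tree is unchanged
		flat = append(flat, b)
		flat = append(flat, flatten(kids)...)
	}
	return flat
}
```

After flattening, a section nested under a chapter can be matched by the pattern in the same way as a top-level chapter.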

Project Structure

Book-Chapter-Extractor/
├── cmd/
│   └── main.go                 # CLI entry point
├── internal/
│   ├── bookmark/
│   │   └── bookmark.go         # Bookmark extraction and processing
│   └── extractor/
│       └── extractor.go        # Chapter extraction logic
├── go.mod
├── go.sum
└── README.md

Dependencies

pdfcpu - PDF processing library

Error Handling

The tool handles several error cases:

Missing input file
Invalid regex patterns
PDF files without bookmarks
Multiple bookmarks matching the pattern (requires more specific pattern)
No bookmarks matching the pattern
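
Two of these checks, the missing input file and the PDF without bookmarks, might look like the sketch below. All names (`ErrNoBookmarks`, `checkInput`, `checkBookmarks`) and the error messages are assumptions made for illustration, not the tool's actual identifiers.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// ErrNoBookmarks is a hypothetical sentinel error for PDFs that have
// no outline to search.
var ErrNoBookmarks = errors.New("PDF has no bookmarks")

// checkInput verifies that path exists and is a file, not a directory.
func checkInput(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("input file %q: %w", path, err)
	}
	if info.IsDir() {
		return fmt.Errorf("input %q is a directory, not a file", path)
	}
	return nil
}

// checkBookmarks rejects a document whose bookmark count is zero.
func checkBookmarks(count int) error {
	if count == 0 {
		return ErrNoBookmarks
	}
	return nil
}
```

Returning a sentinel error lets a caller distinguish the case with `errors.Is(err, ErrNoBookmarks)` and print a targeted message.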

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is open source and available under the MIT License.

Author

oueslati1990
