This file provides quick reference for the Python libraries used in pdf-tools skill.
Installation: pip3 install pdfplumber
Best for: Text extraction, table extraction, page layout analysis
Key features:
- Extract text with precise positioning
- Extract tables automatically
- Get page dimensions and metadata
- Access visual elements (lines, rectangles, curves)
Basic usage:
import pdfplumber
with pdfplumber.open('file.pdf') as pdf:
# Get total pages
total_pages = len(pdf.pages)
# Extract text from a page
page = pdf.pages[0]
text = page.extract_text()
# Extract tables
tables = page.extract_tables()
# Get page dimensions
width = page.width
height = page.heightInstallation: pip3 install PyPDF2
Best for: PDF manipulation (merge, split, rotate, watermark)
Key features:
- Merge multiple PDFs
- Split PDFs into separate files
- Rotate pages
- Add watermarks
- Encrypt/decrypt PDFs
Basic usage:
from PyPDF2 import PdfReader, PdfWriter, PdfMerger
# Reading
reader = PdfReader('input.pdf')
pages = reader.pages
metadata = reader.metadata
# Writing
writer = PdfWriter()
writer.add_page(reader.pages[0])
with open('output.pdf', 'wb') as f:
writer.write(f)
# Merging
merger = PdfMerger()
merger.append('file1.pdf')
merger.append('file2.pdf')
merger.write('merged.pdf')
merger.close()
# Rotating
page = reader.pages[0]
page.rotate(90) # clockwise 90 degreesInstallation: pip3 install reportlab
Best for: Generating new PDFs programmatically
Basic usage:
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
c = canvas.Canvas("output.pdf", pagesize=letter)
c.drawString(100, 750, "Hello World")
c.save()import pdfplumber
with pdfplumber.open('document.pdf') as pdf:
for page_num in [0, 2, 4]: # 0-indexed
page = pdf.pages[page_num]
text = page.extract_text()
print(f"Page {page_num + 1}:\n{text}")from PyPDF2 import PdfMerger
merger = PdfMerger()
merger.append('file1.pdf', pages=(0, 3)) # First 3 pages
merger.append('file2.pdf') # All pages
merger.write('output.pdf')
merger.close()from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader('input.pdf')
for i, page in enumerate(reader.pages):
writer = PdfWriter()
writer.add_page(page)
with open(f'page_{i+1}.pdf', 'wb') as f:
writer.write(f)from PyPDF2 import PdfReader, PdfWriter
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from io import BytesIO
# Create overlay
packet = BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
can.setFont("Helvetica", 12)
can.drawString(100, 700, "New Text Here")
can.save()
# Merge with existing PDF
packet.seek(0)
overlay = PdfReader(packet)
reader = PdfReader('input.pdf')
writer = PdfWriter()
page = reader.pages[0]
page.merge_page(overlay.pages[0])
writer.add_page(page)
with open('output.pdf', 'wb') as f:
writer.write(f)# Note: PDF format is complex, this works for simple cases only
with open('input.pdf', 'rb') as f:
pdf_data = f.read()
# Replace at byte level
modified = pdf_data.replace(b'Old Text', b'New Text')
with open('output.pdf', 'wb') as f:
f.write(modified)Important: Direct text replacement in PDFs is unreliable due to PDF format complexity (text encoding, positioning, fonts). For reliable editing:
- Extract text → Edit → Regenerate PDF, or
- Use overlay method to add new text on top