GhostPrompt is an 8-layer PDF prompt injection scanner that detects hidden instructions and malicious content before documents are used in AI workflows.
python3 scan.py suspicious.pdf
✓ Layer 1 — Dangerous PDF Features: CLEAN
✓ Layer 2 — Invisible Text: CLEAN
⚠️ Layer 3 — Injection Patterns: ISSUES FOUND
✓ Layer 4 — Encoding/Obfuscation: CLEAN
⚠️ Layer 5 — Stream Content: ISSUES FOUND
✓ Layer 6 — Zero-Width Characters: CLEAN
✓ Layer 7 — Annotation/ObjStm: CLEAN
✓ Layer 8 — XMP/Incremental Updates/Semantic: CLEAN
🔴 VERDICT: DANGEROUS
When LLM systems ingest PDFs, attackers can hide prompt instructions inside visual or structural PDF layers. GhostPrompt scans those layers before ingestion.
| Layer | Attack Vector |
|---|---|
| 1 | JavaScript, Launch actions, OpenAction, embedded files, form submissions |
| 2 | White text, tiny fonts, invisible render mode (Tr=3) |
| 3 | Prompt override phrases, role hijacking, jailbreak patterns |
| 4 | Base64 payloads, homoglyph obfuscation, acrostic encoding |
| 5 | Injection terms in compressed streams, structural anomalies |
| 6 | Zero-width and invisible Unicode characters (steganographic encoding) |
| 7 | Annotation field payloads (/Contents, /T, /Subj), ObjStm streams |
| 8 | XMP metadata injection, incremental update overlays, semantic camouflage |
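To make two of the text-level rows concrete, here is a minimal sketch of a Layer 3 style phrase check and a Layer 6 style zero-width character check on already-extracted text. The pattern list, the character set, and both function names are assumptions made for illustration; they are not taken from scan.py and GhostPrompt's real rules may differ.

```python
import re
import unicodedata

# Hypothetical phrase list for illustration; not GhostPrompt's actual rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+(a|an|the)\b", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?system\s+prompt", re.IGNORECASE),
]

# Zero-width and invisible code points that can carry hidden payloads.
INVISIBLE_CHARS = {
    "\u200b",  # zero width space
    "\u200c",  # zero width non-joiner
    "\u200d",  # zero width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero width no-break space (BOM)
}


def find_injection_phrases(text: str) -> list[str]:
    """Return the first matched substring for each pattern that fires."""
    hits = []
    for pattern in INJECTION_PATTERNS:
        match = pattern.search(text)
        if match:
            hits.append(match.group(0))
    return hits


def find_invisible_chars(text: str) -> list[tuple[int, str]]:
    """Return (offset, Unicode name) for each invisible character found."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ch in INVISIBLE_CHARS
    ]
```

Both helpers work on plain strings, so they only cover text that has already been extracted from the PDF; the structural layers (streams, annotations, XMP) need PDF parsing that this sketch does not attempt.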
git clone https://github.com/tuguberk/GhostPrompt.git
cd GhostPrompt
pip install -r requirements.txt
pip install pikepdf --break-system-packages # optional, enables Layer 7 ObjStm scan
If your primary workflow is document analysis inside Claude, use GhostPrompt as a Skill first.
- Open Claude.
- Go to Settings -> Skills.
- Choose Install from file.
- Select SKILL.md.
After uploading a PDF, use prompts like:
- Scan this PDF for prompt injection attacks.
- Check this PDF for hidden instructions targeting AI systems.
- Analyze this PDF for invisible text or stream-level injection.
GhostPrompt returns layer-by-layer findings and a final verdict:
- SAFE: no strong indicators across layers.
- SUSPICIOUS: weak or isolated indicators, manual review recommended.
- DANGEROUS: high-confidence indicators, do not pass this PDF into AI context.
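One way to read the three verdicts is as a simple roll-up of per-layer findings. The sketch below assumes each finding is a `(layer_number, confidence)` tuple with confidence `"low"` or `"high"`. That shape, the `Verdict` enum, and `overall_verdict` are hypothetical and not GhostPrompt's actual scoring logic.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "SAFE"
    SUSPICIOUS = "SUSPICIOUS"
    DANGEROUS = "DANGEROUS"


def overall_verdict(findings: list[tuple[int, str]]) -> Verdict:
    """Combine per-layer findings into one verdict.

    Each finding is (layer_number, confidence), where confidence is
    "low" or "high". Both the tuple shape and the labels are assumptions
    made for this sketch.
    """
    # Any high-confidence indicator is enough to block the document.
    if any(confidence == "high" for _, confidence in findings):
        return Verdict.DANGEROUS
    # Only weak or isolated indicators remain: flag for manual review.
    if findings:
        return Verdict.SUSPICIOUS
    return Verdict.SAFE
```

For example, `overall_verdict([(3, "high"), (5, "low")])` yields `Verdict.DANGEROUS`, while an empty findings list yields `Verdict.SAFE`.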
Recommended practice:
- Run GhostPrompt before sending document content to other prompts.
- If verdict is SUSPICIOUS or DANGEROUS, quarantine the file and review findings.
- Re-scan after document edits, OCR, or format conversion.
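The re-scan rule can be enforced mechanically by recording a content hash at scan time and comparing it before reuse. This is a minimal sketch under that assumption. The `scanned` registry and both function names are hypothetical helpers, not part of GhostPrompt.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha.update(chunk)
    return sha.hexdigest()


def needs_rescan(path: Path, scanned: dict[str, str]) -> bool:
    """True if the file was never scanned or its bytes changed since the last scan.

    `scanned` maps a file path string to the digest recorded at scan time;
    this registry is a hypothetical helper, not part of GhostPrompt.
    """
    return scanned.get(str(path)) != file_digest(path)
```

Any edit, OCR pass, or format conversion changes the file's bytes, so the digest comparison flags the document for a fresh scan.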
The skill is self-contained and includes attack-pattern and false-positive guidance directly inside SKILL.md.
# Scan a PDF
python3 scan.py path/to/document.pdf
# Use in shell automation
python3 scan.py document.pdf && echo "safe to process" || echo "do not use"
# Generate the test PDFs, then scan them
python3 generate_test_pdfs.py
python3 scan.py examples/clean_sample.pdf
python3 scan.py examples/injected_sample.pdf
Expected:
- examples/clean_sample.pdf -> SAFE
- examples/injected_sample.pdf -> DANGEROUS
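The shell automation one-liner above can also be expressed in Python for pipelines that need to act on the result. This sketch assumes that scan.py exits with code 0 only when the document is safe to process, which is what the `&&` / `||` usage implies but is not stated explicitly. `gate_pdf`, `quarantine_dir`, and `scan_cmd` are hypothetical names introduced for illustration.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Optional


def gate_pdf(pdf: Path, quarantine_dir: Path,
             scan_cmd: Optional[list[str]] = None) -> bool:
    """Run the scanner; return True on exit code 0, else quarantine the file.

    Assumption: exit code 0 means the scan passed, as the shell
    one-liner implies. Any other exit code is treated as "do not use".
    """
    # Default to invoking scan.py from the current working directory.
    cmd = (scan_cmd or [sys.executable, "scan.py"]) + [str(pdf)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    # Move the file out of the ingestion path so it can be reviewed manually.
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(pdf), str(quarantine_dir / pdf.name))
    return False
```

The `scan_cmd` parameter exists so the scanner invocation can be swapped out, for example in tests or when scan.py lives outside the working directory.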
GhostPrompt/
├── SKILL.md
├── scan.py
├── generate_test_pdfs.py
├── assets/
│ └── ghostprompt-logo.svg
├── requirements.txt
└── README.md
Known limitations:
- Scanned-image PDFs without extractable text reduce detection quality for text-based layers.
- Semantic camouflage can still require human review.
- GhostPrompt reports risk signals; it does not sanitize PDFs.
License: MIT