GhostPrompt

GhostPrompt is an 8-layer PDF prompt injection scanner that detects hidden instructions and malicious content before documents are used in AI workflows.

python3 scan.py suspicious.pdf

  ✓  Layer 1 — Dangerous PDF Features: CLEAN
  ✓  Layer 2 — Invisible Text: CLEAN
  ⚠️  Layer 3 — Injection Patterns: ISSUES FOUND
  ✓  Layer 4 — Encoding/Obfuscation: CLEAN
  ⚠️  Layer 5 — Stream Content: ISSUES FOUND
  ✓  Layer 6 — Zero-Width Characters: CLEAN
  ✓  Layer 7 — Annotation/ObjStm: CLEAN
  ✓  Layer 8 — XMP/Incremental Updates/Semantic: CLEAN

  🔴  VERDICT: DANGEROUS
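The report above is line-oriented, so a wrapper script can read it back into a structure. The following is a minimal sketch that assumes the output format stays exactly as shown in the sample; `parse_report` and both regular expressions are hypothetical helpers, not part of GhostPrompt.

```python
import re

# Matches lines such as "Layer 3 — Injection Patterns: ISSUES FOUND".
# The format is assumed from the sample output above.
LAYER_LINE = re.compile(r"Layer\s+(\d+)\s*[—-]\s*(.+?):\s+(CLEAN|ISSUES FOUND)\s*$")
VERDICT_LINE = re.compile(r"VERDICT:\s+(\w+)")


def parse_report(text):
    """Return ({layer_number: (layer_name, is_clean)}, verdict_or_None)."""
    layers = {}
    verdict = None
    for line in text.splitlines():
        layer = LAYER_LINE.search(line)
        if layer:
            number, name, status = layer.groups()
            layers[int(number)] = (name, status == "CLEAN")
            continue
        found = VERDICT_LINE.search(line)
        if found:
            verdict = found.group(1)
    return layers, verdict


def flagged_layers(layers):
    """Return the sorted layer numbers that reported issues."""
    return sorted(n for n, (_, is_clean) in layers.items() if not is_clean)
```

Calling `parse_report` on the captured standard output of `scan.py` and then `flagged_layers` on the result gives the list of layers that need review.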

Why GhostPrompt

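When an LLM pipeline ingests a PDF, it reads the extracted text and structure rather than the rendered page. An attacker can therefore plant instructions that a human reviewer never sees, for example in invisible text, metadata, annotations, or compressed streams. GhostPrompt scans those layers before the document reaches the model.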
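To make the idea of a hidden visual layer concrete, here is a small sketch that looks for two signals in an already decompressed PDF content stream: the text rendering mode operator `3 Tr` (invisible text) and a pure white fill color `1 1 1 rg`. The function name and regular expressions are illustrative assumptions and are not taken from `scan.py`; a real scanner also has to decode streams and track graphics state.

```python
import re

# "3 Tr" sets text rendering mode 3: text is neither filled nor stroked.
INVISIBLE_MODE = re.compile(rb"(?<![\d.])3\s+Tr\b")
# "1 1 1 rg" sets the non-stroking RGB color to white.
WHITE_FILL = re.compile(
    rb"(?<![\d.])1(?:\.0+)?\s+1(?:\.0+)?\s+1(?:\.0+)?\s+rg\b"
)


def hidden_text_signals(stream):
    """Return human-readable signals found in a decoded content stream."""
    signals = []
    if INVISIBLE_MODE.search(stream):
        signals.append("invisible render mode (3 Tr)")
    if WHITE_FILL.search(stream):
        signals.append("white fill color (1 1 1 rg)")
    return signals
```

White text is only suspicious on a white background, so a signal like this is a prompt for review rather than proof of an attack.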

Threat Coverage

Layer  Attack Vector
1      JavaScript, Launch actions, OpenAction, embedded files, form submissions
2      White text, tiny fonts, invisible render mode (Tr=3)
3      Prompt override phrases, role hijacking, jailbreak patterns
4      Base64 payloads, homoglyph obfuscation, acrostic encoding
5      Injection terms in compressed streams, structural anomalies
6      Zero-width and invisible Unicode characters (steganographic encoding)
7      Annotation field payloads (/Contents, /T, /Subj), ObjStm streams
8      XMP metadata injection, incremental update overlays, semantic camouflage
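Several of these layers reduce to checks on extracted text. The sketch below illustrates three of them: override phrases (Layer 3), Base64 payloads (Layer 4), and zero-width characters (Layer 6). The patterns, thresholds, and function names are illustrative assumptions and not GhostPrompt's actual rule set.

```python
import base64
import binascii
import re
import unicodedata

# A few example override phrases; a real rule set would be much larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"disregard\s+(the\s+)?system\s+prompt", re.I),
]

# Code points that render as nothing and can carry hidden data.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Runs of 24 or more Base64 characters (the length is an arbitrary cutoff).
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def find_injection_phrases(text):
    """Return every substring that matches an override pattern."""
    hits = []
    for pattern in INJECTION_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits


def find_zero_width(text):
    """Return (index, Unicode name) for each zero-width character."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


def decode_base64_runs(text):
    """Return the UTF-8 text of every run that decodes cleanly as Base64."""
    decoded = []
    for match in BASE64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text
    return decoded
```

Feeding the output of `decode_base64_runs` back into `find_injection_phrases` catches override phrases that were encoded to slip past a plain-text match.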

Installation

git clone https://github.com/tuguberk/GhostPrompt.git
cd GhostPrompt
pip install -r requirements.txt
pip install pikepdf --break-system-packages  # optional, enables Layer 7 ObjStm scan

Claude Skill (Recommended)

If you mainly analyze documents inside Claude, the Skill is the recommended way to run GhostPrompt.

1. Install the skill file

  1. Open Claude.
  2. Go to Settings -> Skills.
  3. Choose Install from file.
  4. Select SKILL.md.

2. Run analysis on an uploaded PDF

After uploading a PDF, use prompts like:

  • Scan this PDF for prompt injection attacks.
  • Check this PDF for hidden instructions targeting AI systems.
  • Analyze this PDF for invisible text or stream-level injection.

3. Understand the result

GhostPrompt returns layer-by-layer findings and a final verdict:

  • SAFE: no strong indicators across layers.
  • SUSPICIOUS: weak or isolated indicators, manual review recommended.
  • DANGEROUS: high-confidence indicators, do not pass this PDF into an AI context.
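One way to read these three levels is as a simple escalation rule over per-layer findings. The sketch below is an assumption about how such a rule could look, not the logic GhostPrompt implements; the `Finding` type and its `high_confidence` flag are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SAFE = "SAFE"
    SUSPICIOUS = "SUSPICIOUS"
    DANGEROUS = "DANGEROUS"


@dataclass(frozen=True)
class Finding:
    layer: int             # 1 to 8, as in the threat coverage table
    description: str
    high_confidence: bool  # hypothetical flag set by the detecting layer


def overall_verdict(findings):
    """Escalate to the most severe level supported by the findings."""
    if any(f.high_confidence for f in findings):
        return Verdict.DANGEROUS
    if findings:
        return Verdict.SUSPICIOUS
    return Verdict.SAFE
```

Under this rule a single high-confidence finding is enough for DANGEROUS, which matches the guidance to keep such files out of AI context entirely.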

4. Best practice in Claude workflows

  • Run GhostPrompt before sending document content to other prompts.
  • If the verdict is SUSPICIOUS or DANGEROUS, quarantine the file and review findings.
  • Re-scan after document edits, OCR, or format conversion.
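Outside Claude, the same practice can be enforced in code with a gate in front of the ingestion step. This is a sketch under stated assumptions: `gate_pdf` is a hypothetical helper, and `scan` stands for any callable that returns one of the three verdict strings for a given path.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def gate_pdf(
    pdf: Path,
    scan: Callable[[Path], str],
    quarantine_dir: Path,
) -> Optional[Path]:
    """Return the path if the verdict is SAFE; otherwise quarantine the file.

    Any verdict other than "SAFE" (including an unexpected value) moves the
    file into quarantine_dir and returns None, so callers fail closed.
    """
    verdict = scan(pdf)
    if verdict == "SAFE":
        return pdf
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(pdf), str(quarantine_dir / pdf.name))
    return None
```

Because edits, OCR, and format conversion can change what a scanner sees, the gate should run again on every new version of a file rather than caching an earlier verdict.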

The skill is self-contained and includes attack-pattern and false-positive guidance directly inside SKILL.md.

Python CLI Usage

# Scan a PDF
python3 scan.py path/to/document.pdf

# Use in shell automation
python3 scan.py document.pdf && echo "safe to process" || echo "do not use"
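The same check can be made from Python. This sketch assumes, as the shell one-liner above implies, that `scan.py` exits with status 0 only when the file is safe to process; `is_safe_to_process` and its `runner` parameter are hypothetical names introduced here so the call can be swapped out in tests.

```python
import subprocess
import sys


def is_safe_to_process(pdf, scanner="scan.py", runner=subprocess.run):
    """Run the scanner on one PDF and report whether it exited with 0.

    Assumes a non-zero exit status means the file should not be used.
    """
    result = runner(
        [sys.executable, scanner, str(pdf)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

A caller would typically skip or quarantine the document when this returns `False` and inspect the scanner's printed report for the per-layer details.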

Generate Test PDFs

python3 generate_test_pdfs.py
python3 scan.py examples/clean_sample.pdf
python3 scan.py examples/injected_sample.pdf

Expected:

  • examples/clean_sample.pdf -> SAFE
  • examples/injected_sample.pdf -> DANGEROUS

Repository Structure

GhostPrompt/
├── SKILL.md
├── scan.py
├── generate_test_pdfs.py
├── assets/
│   └── ghostprompt-logo.svg
├── requirements.txt
└── README.md

Limitations

  • Text-based layers are less effective on scanned-image PDFs that contain no extractable text.
  • Semantic camouflage can still require human review.
  • GhostPrompt reports risk signals; it does not sanitize PDFs.

License

MIT
