PDF utilities that extract text from digital PDFs and convert scanned PDFs to canvas.
Updated Mar 8, 2026 - TypeScript
A self-hosted PDF OCR API that converts scanned documents to Markdown. Powered by PaddleOCR-VL; runs on GPU via Docker.
OCR tool for extracting and structuring text from images and scanned PDFs (export to .docx and .txt), with support for French and English.
Lightweight bash script to convert scanned PDFs into searchable, copyable PDFs using Tesseract OCR with parallel processing.