FileFolio

FileFolio helps privacy-conscious professionals keep large PDF collections searchable and organized using local AI. No cloud, no telemetry, all on your machine.

Status: Actively maintained and used on my own collection of 1,000+ PDFs. Expect breaking changes before v1.0; I respond to issues and feedback.

Why FileFolio?

You have hundreds of PDF bills, reports, or research papers scattered in folders.
You care about privacy and do not want to upload them to cloud AI services.
You still want smart search, auto-tagging, and reasonable file names.

FileFolio watches a folder, uses a local LLM via Ollama to analyze each PDF, and keeps everything searchable in one interface.
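As a rough illustration of the folder-watching idea, a polling scan could look like the sketch below. The function name `find_new_pdfs` and the in-memory `seen` set are hypothetical; FileFolio's actual sync service may track files differently (for example in its database).

```python
from pathlib import Path


def find_new_pdfs(folder: Path, seen: set[str]) -> list[Path]:
    """Return PDFs in `folder` that are not yet in `seen`, and record them."""
    new_files = []
    # sorted() keeps the processing order deterministic between scans
    for path in sorted(folder.glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

Calling this on a timer and handing each returned path to the analysis step gives a simple watcher without any extra dependencies.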

FileFolio vs Paperless-ngx

Paperless-ngx is the most popular self-hosted alternative. Here's how they compare:

| Feature | FileFolio | Paperless-ngx |
| --- | --- | --- |
| AI tagging and naming | 🟢 Local LLM via Ollama, zero config | 🔴 ML classifier available, but requires manual training; no LLM-level understanding |
| Setup | 🟢 Single Python process + SQLite | 🔴 Docker Compose: web, worker, Redis, PostgreSQL |
| Resource footprint | 🟢 Single process + Ollama | 🔴 Multi-service, heavier |
| Multi-user | 🔴 No | 🟢 Yes |
| Feature scope | 🔴 Focused: upload, search, tag, organize | 🟢 Broader: email ingestion, custom fields, workflow automation |
| Best for | 🟢 Personal libraries, privacy-first, low setup | 🟢 Power users, teams, complex workflows |

Features

Automatic organization – drop PDFs in a watched folder and Ollama names, tags, and categorizes them automatically
Privacy-first – every byte stays on your machine, no cloud calls, no telemetry
Fast retrieval – full-text search across content and metadata with thumbnail previews
Disaster-proof – back up and restore your entire library as a single ZIP
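To make the single-ZIP backup idea concrete, here is a minimal sketch using only the standard library. It assumes the library state lives in the `uploads/`, `thumbnails/`, and `data/` directories shown in the project structure, and that the application is stopped so the SQLite file is not mid-write. The function names are hypothetical and not FileFolio's own backup code.

```python
import zipfile
from pathlib import Path

# Directory names taken from the project structure; treating them as the
# complete library state is an assumption.
LIBRARY_DIRS = ("uploads", "thumbnails", "data")


def backup_library(root: Path, archive: Path) -> int:
    """Write every file under the library directories into one ZIP archive."""
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in LIBRARY_DIRS:
            directory = root / name
            if not directory.is_dir():
                continue  # e.g. thumbnails/ not created yet
            for path in sorted(directory.rglob("*")):
                if path.is_file():
                    # Store paths relative to root so restore recreates the layout
                    zf.write(path, path.relative_to(root).as_posix())
                    count += 1
    return count


def restore_library(archive: Path, root: Path) -> None:
    """Unpack a backup archive into `root`, recreating the directory layout."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root)
```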

Prerequisites

Python 3.10+
Ollama installed locally
Poppler (for PDF processing)
- macOS: brew install poppler
- Ubuntu/Debian: apt-get install poppler-utils
- Windows: Download from poppler releases
Tesseract (for OCR on scanned documents)
- macOS: brew install tesseract
- Ubuntu/Debian: apt-get install tesseract-ocr
- Windows: Download from Tesseract releases

Quick start

Docker (recommended)

Clone the repository

git clone https://github.com/imkrishsub/filefolio.git
cd filefolio

Start Ollama (if not already running)

ollama serve

Start FileFolio

docker compose up

Open your browser

Navigate to: http://localhost:8000

Ollama runs on your host machine; the container connects to it automatically via host.docker.internal. On Linux, where host.docker.internal may not resolve, set OLLAMA_HOST=http://172.17.0.1:11434 (the default Docker bridge gateway address).
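If the container cannot reach Ollama, a small probe can show which address works. The sketch below is not part of FileFolio; it tries the two addresses mentioned above against Ollama's `GET /api/tags` endpoint (which lists installed models). The helper names are hypothetical, and it is meant to be run from inside the container.

```python
import urllib.request

# Both addresses come from the note above; port 11434 is Ollama's default.
CANDIDATES = ("http://host.docker.internal:11434", "http://172.17.0.1:11434")


def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if Ollama answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, refused connections, DNS failures, timeouts
        return False


def first_reachable(candidates=CANDIDATES, probe=ollama_reachable):
    """Return the first base URL that responds, or None if none do."""
    for base_url in candidates:
        if probe(base_url):
            return base_url
    return None
```

Whatever `first_reachable()` returns is a reasonable value to pass as OLLAMA_HOST.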

Manual setup

Clone the repository

git clone https://github.com/imkrishsub/filefolio.git
cd filefolio

Create and activate virtual environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Start Ollama (in a separate terminal)

ollama serve

Run the application

python backend/main.py

Open your browser

Navigate to: http://127.0.0.1:8000

Configuration

Custom port

Set a custom port using the PORT environment variable:

PORT=8080 python backend/main.py
# or with Docker:
PORT=8080 docker compose up

Custom Ollama URL

OLLAMA_HOST=http://192.168.1.10:11434 docker compose up
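For readers curious how such environment variables are typically consumed, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical, and the defaults are assumptions: port 8000 matches the Quick start URLs, and `http://localhost:11434` is Ollama's standard local address. FileFolio's own defaults may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    ollama_host: str


def load_settings(env=None) -> Settings:
    """Read PORT and OLLAMA_HOST, falling back to assumed defaults."""
    env = os.environ if env is None else env
    port = int(env.get("PORT", "8000"))
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    # Strip a trailing slash so later URL joins do not produce "//"
    host = env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    return Settings(port=port, ollama_host=host)
```

Passing a plain dict instead of the real environment makes the function easy to exercise in tests.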

Testing

pytest

The test suite covers the API and core functionality with unit, integration, and frontend tests.

Project structure

filefolio/
├── backend/
│   ├── main.py          # FastAPI server
│   └── sync_service.py  # Folder sync service
├── frontend/
│   ├── static/
│   │   ├── app.js       # Frontend JavaScript
│   │   ├── style.css    # Styles
│   │   └── i18n.json    # Translations
│   └── templates/
│       └── index.html   # Main interface
├── tests/               # Test suite
├── uploads/             # PDF storage (created on first run)
├── thumbnails/          # Document thumbnails (created on first run)
├── data/                # Database (created on first run)
├── setup.cfg            # Linting and tool configuration
├── pytest.ini           # Test configuration
└── requirements.txt

How it works

Upload - Drag and drop a PDF file into the web interface, or sync a local folder to automatically import new files
Extract - Text is extracted from the PDF (with OCR fallback for scanned documents)
Analyze - A local LLM analyzes the content to determine category, tags, and suggest a filename
Organize - The document is saved with metadata in a local SQLite database
Search - Find documents by content, category, tags, or filename
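The Analyze, Organize, and Search steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not FileFolio's implementation: the prompt wording, the `documents` table schema, the model name `llama3.2`, and all function names are hypothetical. The HTTP call uses Ollama's `POST /api/generate` endpoint with `"stream": false` and `"format": "json"`, and search uses a plain `LIKE` match as a stand-in for real full-text search.

```python
import json
import sqlite3
import urllib.request


def parse_analysis(raw: str) -> dict:
    """Validate the model's JSON reply, with safe defaults for missing keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    tags = data.get("tags")
    return {
        "category": str(data.get("category") or "uncategorized"),
        "tags": [str(t) for t in tags] if isinstance(tags, list) else [],
        "filename": str(data.get("filename") or "untitled.pdf"),
    }


def analyze(text: str, base_url: str = "http://localhost:11434",
            model: str = "llama3.2") -> dict:
    """Ask a local Ollama model for category, tags, and a filename."""
    prompt = ("Return JSON with keys category, tags (list of strings), and "
              "filename for this document:\n\n" + text[:4000])
    payload = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.load(resp)
    return parse_analysis(body["response"])


def save(conn: sqlite3.Connection, content: str, meta: dict) -> int:
    """Store extracted text and metadata; the schema is illustrative only."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, filename TEXT, category TEXT, "
        "tags TEXT, content TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO documents (filename, category, tags, content) "
        "VALUES (?, ?, ?, ?)",
        (meta["filename"], meta["category"], json.dumps(meta["tags"]), content),
    )
    conn.commit()
    return cur.lastrowid


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return filenames whose content or metadata contains `term`."""
    like = f"%{term}%"
    rows = conn.execute(
        "SELECT filename FROM documents WHERE content LIKE ? OR filename LIKE ? "
        "OR category LIKE ? OR tags LIKE ? ORDER BY id",
        (like, like, like, like),
    )
    return [row[0] for row in rows]
```

Keeping `parse_analysis` separate from the HTTP call means a malformed model reply degrades to default metadata instead of failing the import.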

Tech stack

Backend: FastAPI (Python)
Frontend: Vanilla JavaScript
Database: SQLite
AI/LLM: Ollama
PDF Processing: PyPDF, pdf2image, pytesseract
Styling: Custom CSS
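As an illustration of how the PDF processing libraries listed above could combine into the "text extraction with OCR fallback" step, consider the sketch below. The `min_chars` threshold and the function names are assumptions, not FileFolio's actual logic. The third-party imports are deferred so the fallback decision can be exercised without Poppler or Tesseract installed.

```python
def extract_with_pypdf(path: str) -> str:
    """Read the embedded text layer with pypdf."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_ocr(path: str) -> str:
    """Render pages with pdf2image (needs Poppler) and OCR them with Tesseract."""
    import pytesseract
    from pdf2image import convert_from_path

    images = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(image) for image in images)


def extract_text(path: str, min_chars: int = 20,
                 text_layer=extract_with_pypdf, ocr=extract_with_ocr) -> str:
    """Prefer the text layer; fall back to OCR when it is nearly empty."""
    text = text_layer(path).strip()
    if len(text) >= min_chars:
        return text
    # Scanned documents usually have little or no embedded text
    return ocr(path).strip()
```

Trying the text layer first keeps the common case fast, since OCR is far slower than reading embedded text.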

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

License

MIT License - see LICENSE file for details.
