Document version: 0.1 (planning)
Last updated: 2026-03-02
This plan is structured as incremental phases that each deliver a usable artifact. If priorities change, you can reshuffle phases, but try to keep the “render first, OCR later” principle intact.
Deliverables
- Repository structure (`crates/`, `docs/`, `adrs/`)
- CI pipeline (build + lint + tests)
- Basic logging enabled
- Packaging skeletons (Arch PKGBUILD stub + Flatpak manifest stub)
Definition of done
- `build` succeeds on Arch in a clean environment ✅
- `run` launches an empty window without warnings ✅
- `docs/` renders in your preferred markdown viewer ✅
Core tasks
- Implement CLI parsing:
  - `quickview <path>` ✅
  - `quickview --help` ✅
- Load and display image:
  - decode to a texture ✅
  - show in a viewer widget ✅
- Fit-to-window baseline ✅
- Keyboard shortcuts:
  - `Esc` closes window ✅
  - `+/-` zoom (or Ctrl+scroll) ✅
- Basic UI shell with libadwaita (headerbar, etc.) ✅
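
The CLI surface listed above is small enough to sketch with the standard library alone. This is a minimal illustration, not the project's actual parser: the `Command` enum and `parse_args` function are hypothetical names, and a real implementation might well use an argument-parsing crate instead.

```rust
use std::path::PathBuf;

/// What the command line asked for (hypothetical type, not taken from the plan).
#[derive(Debug, PartialEq)]
pub enum Command {
    /// `quickview --help`
    Help,
    /// `quickview <path>`
    Open(PathBuf),
}

/// Parse the arguments that follow the program name.
pub fn parse_args<I>(args: I) -> Result<Command, String>
where
    I: IntoIterator<Item = String>,
{
    let mut path: Option<PathBuf> = None;
    for arg in args {
        match arg.as_str() {
            "--help" => return Ok(Command::Help),
            // Any other dash-prefixed argument is rejected in this sketch.
            flag if flag.starts_with('-') => {
                return Err(format!("unknown option: {}", flag));
            }
            // A second positional argument is an error: one image at a time.
            _ if path.is_some() => {
                return Err("expected exactly one <path>".to_string());
            }
            other => path = Some(PathBuf::from(other)),
        }
    }
    path.map(Command::Open)
        .ok_or_else(|| "missing <path>".to_string())
}
```

A caller would typically pass `std::env::args().skip(1)` and print usage text when it gets `Command::Help` or an error.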
Definition of done
- Opening a PNG/JPG renders correctly ✅
- No UI freezes during decode (decode is async or sufficiently fast) ✅
- Zoom and pan: Ctrl+scroll, pinch-to-zoom, +/- keys, middle-drag pan ✅
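
The fit-to-window baseline and cursor-anchored zoom come down to a little arithmetic on a scale and an offset. The sketch below is one way to express it, independent of any toolkit. The `View` type, its methods, and the `MIN_SCALE`/`MAX_SCALE` limits are assumptions made for illustration.

```rust
/// Zoom limits (assumed values, not taken from the plan).
const MIN_SCALE: f64 = 0.05;
const MAX_SCALE: f64 = 32.0;

/// View transform: image pixel (ix, iy) is drawn at screen position
/// (ix * scale + offset.0, iy * scale + offset.1).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct View {
    pub scale: f64,
    pub offset: (f64, f64),
}

impl View {
    /// Fit-to-window: the largest scale at which the whole image is visible,
    /// with the image centered in the viewport.
    pub fn fit(image: (f64, f64), viewport: (f64, f64)) -> Self {
        let scale = (viewport.0 / image.0).min(viewport.1 / image.1);
        let offset = (
            (viewport.0 - image.0 * scale) / 2.0,
            (viewport.1 - image.1 * scale) / 2.0,
        );
        View { scale, offset }
    }

    /// Zoom by `factor`, keeping the image point under `cursor` fixed on screen.
    pub fn zoom_at(self, cursor: (f64, f64), factor: f64) -> Self {
        let scale = (self.scale * factor).clamp(MIN_SCALE, MAX_SCALE);
        // The factor actually applied may be smaller than requested after clamping.
        let applied = scale / self.scale;
        View {
            scale,
            offset: (
                cursor.0 - (cursor.0 - self.offset.0) * applied,
                cursor.1 - (cursor.1 - self.offset.1) * applied,
            ),
        }
    }

    /// Pan by a screen-space delta, for example from a middle-drag gesture.
    pub fn pan(self, dx: f64, dy: f64) -> Self {
        View {
            scale: self.scale,
            offset: (self.offset.0 + dx, self.offset.1 + dy),
        }
    }
}
```

Ctrl+scroll, pinch, and the `+/-` keys can all feed `zoom_at` with different factors; keyboard zoom would pass the viewport center as the cursor.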
Core tasks
- Identify “image set” as all supported images in the same directory ✅
- Maintain a sorted list and current index ✅
- Add navigation:
  - Left/Right arrows to prev/next ✅
- Add info panel:
  - filename
  - dimensions
  - file size
Definition of done
- Prev/next navigation is correct and stable ✅
- Info updates immediately when switching images
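
The "image set" idea above (supported files in one directory, sorted, with a current index) can be sketched as follows. The names `ImageSet`, `scan_dir`, and `is_supported` are hypothetical. The extension list is an assumption based on the PNG/JPG mention, and stopping at the ends rather than wrapping around is a design choice made for this example only.

```rust
use std::path::{Path, PathBuf};
use std::{fs, io};

/// Extensions treated as supported images (assumed list).
const SUPPORTED: &[&str] = &["png", "jpg", "jpeg"];

/// Case-insensitive extension check.
pub fn is_supported(path: &Path) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map(|e| SUPPORTED.iter().any(|s| e.eq_ignore_ascii_case(s)))
        .unwrap_or(false)
}

/// Collect supported image files directly inside `dir` (no recursion).
pub fn scan_dir(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && is_supported(&path) {
            files.push(path);
        }
    }
    Ok(files)
}

/// Sorted list of images plus the index of the one being shown.
pub struct ImageSet {
    files: Vec<PathBuf>,
    index: usize,
}

impl ImageSet {
    /// Build the set from candidate paths. Returns `None` if `current`
    /// is not among the supported files.
    pub fn from_paths(mut files: Vec<PathBuf>, current: &Path) -> Option<Self> {
        files.retain(|p| is_supported(p));
        files.sort(); // plain path ordering; a natural sort may be preferable
        let index = files.iter().position(|p| p == current)?;
        Some(ImageSet { files, index })
    }

    pub fn current(&self) -> &Path {
        &self.files[self.index]
    }

    /// Move to the next image. Returns false when already at the end.
    pub fn next(&mut self) -> bool {
        if self.index + 1 < self.files.len() {
            self.index += 1;
            true
        } else {
            false
        }
    }

    /// Move to the previous image. Returns false when already at the start.
    pub fn prev(&mut self) -> bool {
        if self.index > 0 {
            self.index -= 1;
            true
        } else {
            false
        }
    }
}
```

Returning a `bool` from `next`/`prev` lets the UI refresh the info panel only when the image actually changed.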
Core tasks
- Add a `--quick-preview` mode:
  - borderless ✅
  - centered ✅
  - dismiss on Space/Esc ✅
- Implement “always-on-top” behavior ✅
- If available, integrate Layer Shell (wlroots-friendly overlay):
  - runtime detect if Layer Shell is supported ✅
  - use overlay layer with appropriate keyboard focus policy ✅
Definition of done
- `quickview --quick-preview <image>` shows a borderless preview and closes instantly on Space/Esc ✅
- Works on at least one wlroots compositor ✅
Core tasks
- Add OCR backend abstraction (interface/trait) ✅
- Implement default Tesseract backend:
  - run OCR asynchronously ✅
  - produce word-level boxes + text ✅
- Add a non-blocking “OCR in progress” indicator ✅
- Ensure cancellation / ignoring late results when user navigates away ✅
Definition of done
- OCR starts after image display ✅
- OCR completion adds internal OCR result state (even before selection UI exists) ✅
- App stays responsive during OCR ✅
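
One way to combine the backend abstraction, asynchronous execution, and "ignore late results" is a generation counter: every OCR request is tagged, and results whose tag no longer matches the current image are dropped. The sketch below uses only the standard library. `OcrBackend`, `OcrWord`, and `OcrController` are hypothetical names, and no Tesseract binding is shown because the plan does not say which one is used. Note that this approach ignores stale work rather than cancelling it.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Arc;
use std::thread;

/// One recognized word with its bounding box in image pixels.
#[derive(Debug, Clone, PartialEq)]
pub struct OcrWord {
    pub text: String,
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

/// Backend abstraction; a Tesseract-based type would implement this.
pub trait OcrBackend: Send + Sync + 'static {
    fn recognize(&self, image_path: &str) -> Result<Vec<OcrWord>, String>;
}

/// A finished job, tagged with the generation it was started for.
struct OcrResult {
    generation: u64,
    words: Result<Vec<OcrWord>, String>,
}

pub struct OcrController<B: OcrBackend> {
    backend: Arc<B>,
    generation: u64,
    busy: bool,
    tx: Sender<OcrResult>,
    rx: Receiver<OcrResult>,
}

impl<B: OcrBackend> OcrController<B> {
    pub fn new(backend: B) -> Self {
        let (tx, rx) = mpsc::channel();
        OcrController { backend: Arc::new(backend), generation: 0, busy: false, tx, rx }
    }

    /// Start OCR for a newly displayed image. Any job still running for a
    /// previous image becomes stale because the generation moves on.
    pub fn start(&mut self, image_path: String) {
        self.generation += 1;
        self.busy = true;
        let generation = self.generation;
        let backend = Arc::clone(&self.backend);
        let tx = self.tx.clone();
        thread::spawn(move || {
            let words = backend.recognize(&image_path);
            // The receiver may be gone if the app is shutting down; ignore that.
            let _ = tx.send(OcrResult { generation, words });
        });
    }

    /// True while the "OCR in progress" indicator should be shown.
    pub fn in_progress(&self) -> bool {
        self.busy
    }

    /// Non-blocking poll, intended to be called from the UI loop.
    /// Returns a result only if it belongs to the current image.
    pub fn poll(&mut self) -> Option<Result<Vec<OcrWord>, String>> {
        while let Ok(result) = self.rx.try_recv() {
            if result.generation == self.generation {
                self.busy = false;
                return Some(result.words);
            }
            // Stale result from an image the user already left: drop it.
        }
        None
    }
}
```

Because `poll` never blocks, it can run from a timer or idle callback on the UI thread while the recognition itself stays on a worker thread.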
Core tasks
- Render OCR overlay (invisible by default or lightly highlighted on hover) ✅
- Implement drag-selection:
  - compute selection rectangle in image coordinates ✅
  - highlight matched words ✅
- Implement copy:
  - Ctrl+C copies selected text ✅
  - context menu action “Copy”
Definition of done
- User can reliably select and copy text from an image ✅
- Selection stays aligned under zoom/pan ✅
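
Keeping the selection in image coordinates is what makes it stay aligned under zoom and pan: the drag corners are converted once, and word matching never sees screen pixels. A minimal sketch follows, assuming a view transform of the form `screen = image * scale + offset` and OCR words already in reading order. `Rect`, `Word`, and the function names are hypothetical.

```rust
/// Axis-aligned rectangle in image coordinates.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub w: f64,
    pub h: f64,
}

impl Rect {
    /// Build a normalized rectangle from two corners, so a drag can go
    /// in any direction.
    pub fn from_corners(a: (f64, f64), b: (f64, f64)) -> Self {
        Rect {
            x: a.0.min(b.0),
            y: a.1.min(b.1),
            w: (a.0 - b.0).abs(),
            h: (a.1 - b.1).abs(),
        }
    }

    /// True if the two rectangles overlap with non-zero area.
    pub fn intersects(&self, o: &Rect) -> bool {
        self.x < o.x + o.w
            && o.x < self.x + self.w
            && self.y < o.y + o.h
            && o.y < self.y + self.h
    }
}

/// Convert a screen point to image coordinates for the given view transform.
pub fn screen_to_image(p: (f64, f64), scale: f64, offset: (f64, f64)) -> (f64, f64) {
    ((p.0 - offset.0) / scale, (p.1 - offset.1) / scale)
}

/// An OCR word with its box in image coordinates.
pub struct Word {
    pub text: String,
    pub bbox: Rect,
}

/// Text of every word whose box overlaps the selection, joined with spaces.
/// Assumes `words` is already in reading order.
pub fn selected_text(words: &[Word], selection: &Rect) -> String {
    words
        .iter()
        .filter(|w| w.bbox.intersects(selection))
        .map(|w| w.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}
```

The same `intersects` test can drive the highlight pass, and the string from `selected_text` is what a Ctrl+C handler would place on the clipboard.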
Core tasks
- `.desktop` integration:
  - default open handler for images
  - optional action for quick preview mode
- Compositor keybind recipes (niri, Hyprland, Sway) documented
- Optional GNOME Sushi-compatible D-Bus interface (advanced)
Definition of done
- App can be set as default image viewer in common file managers
- Quick Preview can be triggered by a keybind in at least one compositor
Core tasks
- Add cache (in-memory first)
- Add basic benchmarking hooks (decode + OCR timing)
- Improve OCR accuracy options:
  - language selection
  - selectable `tessdata_fast` vs `tessdata_best`
- Add guardrails:
  - maximum image dimensions for OCR (downscale)
  - memory usage limits (where feasible)
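
The dimension guardrail can be reduced to a small pure function that picks the size handed to OCR. This is a sketch under assumptions: `ocr_target_size` and its `max_side` parameter are hypothetical, and the actual limit would need to be chosen from benchmarks.

```rust
/// Size to use for OCR so that the longest side does not exceed `max_side`.
/// Aspect ratio is preserved; images already within the limit are unchanged.
pub fn ocr_target_size(width: u32, height: u32, max_side: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest == 0 || longest <= max_side {
        return (width, height);
    }
    let scale = max_side as f64 / longest as f64;
    // Round to the nearest pixel, but never collapse a side to zero.
    let w = ((width as f64 * scale).round() as u32).max(1);
    let h = ((height as f64 * scale).round() as u32).max(1);
    (w, h)
}
```

If OCR runs on the downscaled copy, the resulting word boxes need to be multiplied by the inverse scale before they are used for selection on the full-size image.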
Definition of done
- Re-opening the same image is faster due to caching
- OCR is stable across a variety of real-world screenshots
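
For the first "definition of done" item, an in-memory cache keyed by path plus a small timing helper is enough to observe the speedup. The sketch below is illustrative only: `Cache` and `timed` are hypothetical names, there is no eviction, and keying by path alone ignores files that change on disk (a real key would likely include the modification time or size).

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Run `f`, log how long it took under `label`, and return both the
/// value and the elapsed time.
pub fn timed<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    let elapsed = start.elapsed();
    eprintln!("{}: {:?}", label, elapsed);
    (value, elapsed)
}

/// Unbounded in-memory cache keyed by file path.
pub struct Cache<V> {
    entries: HashMap<PathBuf, V>,
    pub hits: u64,
    pub misses: u64,
}

impl<V: Clone> Cache<V> {
    pub fn new() -> Self {
        Cache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached value for `path`, or compute it with `load`
    /// and remember it for next time.
    pub fn get_or_insert_with(&mut self, path: &Path, load: impl FnOnce() -> V) -> V {
        if let Some(v) = self.entries.get(path) {
            self.hits += 1;
            return v.clone();
        }
        self.misses += 1;
        let v = load();
        self.entries.insert(path.to_path_buf(), v.clone());
        v
    }
}
```

Wrapping the decode and OCR calls in `timed` and passing them as the `load` closure gives per-stage timings on a miss and shows hits as skipped work. For large decoded images, storing an `Arc` of the pixel data as `V` keeps the `clone` cheap.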
- Persistent OCR cache (SQLite)
- Better layout/reading-order reconstruction
- Optional “search within OCR text”
- PDF support (if requested later)
- Image editing tools (crop/rotate)
- Multi-file drag/drop
- Plugin system