feat(filesystem): add PageIndex FileSystem and PIFS CLI #302

Open
BukeLy wants to merge 50 commits into VectifyAI:main from BukeLy:feat/pageindex-filesystem

BukeLy (Collaborator) commented May 26, 2026

Summary

This PR adds the first PageIndex FileSystem (PIFS) MVP: a PageIndex-backed virtual filesystem, a shell-like pifs CLI, an agent chat/ask loop over a workspace, and an example demo that registers documents into a PIFS workspace and retrieves long-document evidence through PageIndex structure/page/node reads.

Why

PIFS gives agents a concrete filesystem-style interface for browsing, filtering, searching, and reading PageIndex documents. It keeps workspace catalog metadata, PageIndex extraction artifacts, semantic recall, and folder projections as product concepts instead of benchmark-only scripts.

What Changed

  • Added the pageindex.filesystem core model, SQLite workspace store, metadata policy/status handling, and summary projection indexing.
  • Added the pifs CLI with ls, tree, find, grep, stat, cat, search-summary, chat, and ask workflows.
  • Integrated PDF/Markdown registration with PageIndex extraction so registered documents have structure, page, node, text, metadata, and summary projection state.
  • Added bounded structural reads for agent safety: paginated structure output, page/node read limits, and clearer errors when the agent asks for too much context.
  • Added provider-neutral metadata and embedding wiring, removed hash embedding fallback, and unified generated/user metadata into one exposed metadata map with metadata_status for provenance/state.
  • Added examples/pifs_demo.py and a reusable examples/pifs_workspace demo flow for agent retrieval over examples/documents.
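The bounded structural reads described in the list above can be pictured roughly as follows. This is a minimal sketch, not the PR's implementation: `read_structure`, `StructurePage`, `ReadLimitError`, and the `offset`/`limit`/`max_limit` parameters are hypothetical names chosen for illustration, and the real limits and error messages may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


class ReadLimitError(ValueError):
    """Raised when a caller asks for more context than one read may return."""


@dataclass
class StructurePage:
    nodes: List[str]
    next_offset: Optional[int]  # None when there is nothing left to read


def read_structure(nodes: List[str], offset: int = 0, limit: int = 20,
                   max_limit: int = 50) -> StructurePage:
    """Return one bounded page of structure nodes (hypothetical helper)."""
    if offset < 0:
        raise ReadLimitError(f"offset must be >= 0, got {offset}")
    if not 1 <= limit <= max_limit:
        # Fail loudly with a hint instead of silently truncating the output.
        raise ReadLimitError(
            f"limit must be between 1 and {max_limit}, got {limit}; "
            "request a smaller page and follow next_offset"
        )
    end = offset + limit
    return StructurePage(nodes[offset:end], end if end < len(nodes) else None)
```

An agent would call `read_structure(nodes)` and keep following `next_offset` until it is `None`. An oversized request raises an error that says how to narrow it, rather than flooding the context window.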

Verification

  • Ran PIFS CLI/chat demo manually on examples/pifs_workspace.
  • Ran focused filesystem/CLI tests during development, including tests/test_pageindex_filesystem_scope.py, tests/test_pifs_cli.py, tests/test_pageindex_structural_read.py, and tests/test_pifs_find_maxdepth.py.

BukeLy force-pushed the feat/pageindex-filesystem branch from 274af6c to d7d3cb8 on May 26, 2026 18:08
BukeLy added 29 commits May 27, 2026 02:12
Remove the synchronous=OFF pragma from PIFS catalog inserts so SQLite remains the durable source of truth.
Route default semantic search to the summary projection when summary is the only populated semantic channel.
Use the fresh-event-loop fallback only when no running loop is detected, so that a RuntimeError raised by a threaded agent run is not retried.
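The event-loop commit above is easiest to read with the failure mode in mind: if the `try` block that detects a running loop also wraps the agent run, any `RuntimeError` from the agent looks like "no running loop" and triggers a second run. The sketch below assumes a hypothetical `run_sync` helper and a zero-argument coroutine factory. It shows one way to keep the `except` narrow and is not the code in this PR.

```python
import asyncio
import threading


def run_sync(make_coro):
    """Run a coroutine from sync code (hypothetical helper name)."""
    # Keep the try block narrow: it only answers "is a loop running here?".
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        loop_running = False
    else:
        loop_running = True

    if not loop_running:
        # No loop in this thread, so a fresh one is safe. A RuntimeError
        # raised by the coroutine propagates; nothing is retried.
        return asyncio.run(make_coro())

    # A loop is already running in this thread: use a worker thread that
    # owns its own loop, and re-raise whatever the coroutine raised.
    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(make_coro())
        except BaseException as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

With this shape, a failing agent coroutine runs exactly once and its `RuntimeError` reaches the caller unchanged.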
BukeLy added 14 commits May 27, 2026 02:12
Raise on summary projection dimension mismatch instead of resetting an existing index.
Do not emit source-file grep fallback candidates unless an actual source line matches the query.
Avoid eager optional dependency imports when importing PageIndexFileSystem or filesystem semantic exports.
Resolve root virtual file paths correctly and raise a clear error for ambiguous file targets.
Escape wildcard characters in recursive folder LIKE filters and metadata contains queries.
Persist PageIndex tree build failure details in metadata_status and surface them through stat and structural reads.
Write projection and raw side effects only after a successful catalog insert, and clean owned artifacts when registration fails.
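The wildcard-escaping commit in the list above addresses a standard SQL pitfall: in a `LIKE` pattern, `_` matches any single character and `%` matches any run of characters, so a folder named `reports_2024` would also match `reportsX2024`. The sketch below uses Python's standard `sqlite3` module. The `files` table, the `escape_like` helper, and the `list_recursive` function are assumptions made for illustration, not the PIFS schema or API.

```python
import sqlite3


def escape_like(value: str, escape: str = "\\") -> str:
    """Escape LIKE metacharacters so they match literally."""
    return (
        value.replace(escape, escape * 2)  # escape the escape char first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def make_catalog() -> sqlite3.Connection:
    """Build a tiny in-memory catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT)")
    conn.executemany(
        "INSERT INTO files VALUES (?)",
        [
            ("/reports_2024/a.pdf",),
            ("/reportsX2024/b.pdf",),
            ("/reports_2024/sub/c.md",),
        ],
    )
    return conn


def list_recursive(conn: sqlite3.Connection, folder: str) -> list:
    """List every file under a folder, treating its name literally."""
    pattern = escape_like(folder.rstrip("/")) + "/%"
    rows = conn.execute(
        "SELECT path FROM files WHERE path LIKE ? ESCAPE '\\' ORDER BY path",
        (pattern,),
    )
    return [row[0] for row in rows]
```

Calling `list_recursive(make_catalog(), "/reports_2024")` returns only the two paths under `/reports_2024/`. Without the `ESCAPE` clause and the escaped pattern, `/reportsX2024/b.pdf` would be included as well.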
BukeLy force-pushed the feat/pageindex-filesystem branch from d7d3cb8 to 346eb0a on May 26, 2026 18:13
BukeLy force-pushed the feat/pageindex-filesystem branch from 76a3b62 to c13cb20 on May 26, 2026 18:33
BukeLy requested review from KylinMountain and rejojer on May 26, 2026 18:34


1 participant