TL;DR: list_indexed_repositories silently omits repos that are present and fully queryable in the same graph. The omitted ones entered through episodes (the working_tree file-watcher, a fleet agent's record_episode, or a git-history replay) rather than through an explicit index_directory. get_repository_stats, find_symbol, and get_impact all resolve those repos by repo_id. Only the listing does not show them. An agent that discovers coverage from list_indexed_repositories alone misses them.
Environment
- memtrace 0.6.46, darwin-arm64
- engine in remote mode on 127.0.0.1:50051
- a workspace where some repos were bulk-indexed via index_directory and others entered the graph only via episodes (fleet agents writing working_tree episodes; one via git-history replay)
Repro
list_indexed_repositories returns 36 repos. Three repos that hold symbols in the same graph are absent from that list. Call them repo-g, repo-h, repo-i.
Each absent repo still answers get_repository_stats with full coverage:
    get_repository_stats(repo_id="repo-g")
    -> { "symbol_node_count": 105,
         "nodes_by_kind": {"Function":105,"File":31,"APICall":2,"Process":1,"Community":3},
         "last_indexed": null, "last_episode_type": "working_tree", "episode_count": 38 }
find_symbol resolves symbols whose file_path is under those repos:
    find_symbol(name="load_catalog")
    -> Function load_catalog repo-g/lib/catalog.py:137 (plus copies in repo-h, repo-i)
get_impact on such a symbol returns a non-empty, repo-scoped blast radius. So the repo is fully present in the graph; only list_indexed_repositories hides it.
Observed discriminator
The discriminator is exact across the four repos involved:
| repo | nodes_by_kind has Repository | last_indexed | last_episode_type | in listing |
| --- | --- | --- | --- | --- |
| repo-f (bulk-indexed) | Repository: 1 | set | git_commit | yes |
| repo-g | (no Repository key) | null | working_tree | no |
| repo-h | (no Repository key) | null | working_tree | no |
| repo-i | (no Repository key) | null | git_commit | no |
Across these repos, a repo appears in list_indexed_repositories exactly when its get_repository_stats reports a Repository entry in nodes_by_kind and a non-null last_indexed. The listing's own empty-state note ("no Repository rows under a strongly-anchored .memdb") points the same way: it enumerates Repository rows. In these observed episode-only cases, get_repository_stats reports File / Function / APICall / Process / Community counts, no Repository key in nodes_by_kind, and last_indexed: null, and those repos are absent from the listing while their symbols stay queryable. Episode type is not the discriminator. repo-f and repo-i are both git_commit, and only the bulk-indexed one is listed.
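The observed rule can be written as a small predicate over a get_repository_stats payload. This is a sketch of the behaviour seen from outside, not memtrace's actual implementation. The function name `expected_in_listing` and the abridged payloads are illustrative: kinds other than Repository are left out for repo-f, repo-h, and repo-i, and the string "set" stands in for the real timestamp.

```python
from typing import Any, Mapping


def expected_in_listing(stats: Mapping[str, Any]) -> bool:
    """Observed rule: listed iff a Repository node exists and last_indexed is non-null."""
    has_repository_row = stats.get("nodes_by_kind", {}).get("Repository", 0) > 0
    return has_repository_row and stats.get("last_indexed") is not None


# Abridged stats for the four repos in the table above (hypothetical shapes;
# only the fields the rule needs are filled in).
OBSERVED = {
    "repo-f": {"nodes_by_kind": {"Repository": 1}, "last_indexed": "set",
               "last_episode_type": "git_commit"},
    "repo-g": {"nodes_by_kind": {"Function": 105, "File": 31, "APICall": 2,
                                 "Process": 1, "Community": 3},
               "last_indexed": None, "last_episode_type": "working_tree"},
    "repo-h": {"nodes_by_kind": {}, "last_indexed": None,
               "last_episode_type": "working_tree"},
    "repo-i": {"nodes_by_kind": {}, "last_indexed": None,
               "last_episode_type": "git_commit"},
}

# What list_indexed_repositories actually showed for each repo.
IN_LISTING = {"repo-f": True, "repo-g": False, "repo-h": False, "repo-i": False}


def rule_matches_observations() -> bool:
    """True when the predicate agrees with the listing for every observed repo."""
    return all(expected_in_listing(OBSERVED[r]) == IN_LISTING[r] for r in OBSERVED)
```

repo-f and repo-i share last_episode_type git_commit but get different answers from the predicate, which is the point made above: episode type does not decide the outcome.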
Expected
list_indexed_repositories reflects every repo the graph can answer queries about, or at minimum offers a way to see episode-only repos. The tool's stated role is "call this first to discover available repo_ids; most other tools require a repo_id". That role is undercut when a repo_id accepted by get_repository_stats, find_symbol, and get_impact is not discoverable from the listing.
Actual
Episode-only repos (no Repository row, last_indexed null) are absent from the listing while fully queryable by repo_id. In a fleet setup this is a coverage blind spot. A repo that another agent populated via working_tree episodes, without a local index_directory, stays invisible to discovery yet holds usable symbols.
Ask
- Either surface episode-only repos in list_indexed_repositories (for instance with a field marking ingestion path or last_indexed: null), or add an opt-in flag to include them.
- If the bulk-indexed-only scope is intentional, document it on the tool: the listing reflects repos indexed via index_directory, not all repos present in the graph, so an agent enumerating coverage knows to also probe episode-only repos by repo_id.
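Until one of these lands, an agent can approximate full coverage by merging the listing with direct probes. The sketch below is a client-side workaround under stated assumptions, not a memtrace API. `discover_repos` is a hypothetical helper, the two callables stand in for whatever client wraps list_indexed_repositories and get_repository_stats, and a stats lookup for an unknown repo_id is assumed to return None or a zero symbol_node_count. The candidate repo_ids have to come from somewhere other than the listing, for example the fleet's own configuration.

```python
from typing import Any, Callable, Iterable, Mapping, Optional


def discover_repos(
    list_repos: Callable[[], Iterable[str]],
    get_stats: Callable[[str], Optional[Mapping[str, Any]]],
    candidate_ids: Iterable[str],
) -> dict[str, str]:
    """Map each discoverable repo_id to "listed" or "episode_only"."""
    # Start from what the listing reports.
    coverage = {repo_id: "listed" for repo_id in list_repos()}
    for repo_id in candidate_ids:
        if repo_id in coverage:
            continue
        # Probe repos the listing did not return; keep those that hold symbols.
        stats = get_stats(repo_id)
        if stats and stats.get("symbol_node_count", 0) > 0:
            coverage[repo_id] = "episode_only"
    return coverage
```

With the repos from this report, a listing that returns repo-f plus candidates repo-f and repo-g would yield repo-f as "listed" and repo-g as "episode_only".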
This is a discoverability gap rather than a defect in what each tool returns. Happy to help scope the listing change.