
list_indexed_repositories omits repos that entered the graph via episodes (working_tree / fleet / git-replay) rather than index_directory #31

@Regis-RCR

Description


TL;DR: list_indexed_repositories silently omits repos that are present and fully queryable in the same graph. The omitted ones entered through episodes (the working_tree file-watcher, a fleet agent's record_episode, or a git-history replay) rather than through an explicit index_directory. get_repository_stats, find_symbol, and get_impact all resolve those repos by repo_id. Only the listing does not show them. An agent that discovers coverage from list_indexed_repositories alone misses them.

Environment

  • memtrace 0.6.46, darwin-arm64
  • engine in remote mode on 127.0.0.1:50051
  • a workspace where some repos were bulk-indexed via index_directory and others entered the graph only via episodes (fleet agents writing working_tree episodes; one via git-history replay)

Repro

list_indexed_repositories returns 36 repos. Three repos that hold symbols in the same graph are absent from that list. Call them repo-g, repo-h, repo-i.

Each absent repo still answers get_repository_stats with populated node counts, for example:

get_repository_stats(repo_id="repo-g")
  -> { "symbol_node_count": 105,
       "nodes_by_kind": {"Function":105,"File":31,"APICall":2,"Process":1,"Community":3},
       "last_indexed": null, "last_episode_type": "working_tree", "episode_count": 38 }

find_symbol resolves symbols whose file_path is under those repos:

find_symbol(name="load_catalog")
  -> Function  load_catalog  repo-g/lib/catalog.py:137   (plus copies in repo-h, repo-i)

get_impact on such a symbol returns a non-empty, repo-scoped blast radius. So the repo is fully present in the graph; only list_indexed_repositories hides it.
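From an agent's side, the gap can be made concrete with a minimal Python sketch of a client-side check: take the leading path segment of each find_symbol hit and diff it against the listed repo_ids. It assumes file_path values start with the repo_id, as in the output above. The function names and the repo-h / repo-i paths are hypothetical stand-ins. The one-element listing stands in for the 36 repos actually returned.

```python
# Hypothetical client-side coverage check (not a memtrace API).
# Assumption: find_symbol file_path values look like "<repo_id>/<rest of path>".


def repo_of(file_path: str) -> str:
    """Return the leading path segment, taken here to be the repo_id."""
    return file_path.split("/", 1)[0]


def unlisted_repos(listed_ids: list[str], symbol_paths: list[str]) -> list[str]:
    """repo_ids that appear in symbol results but not in the listing."""
    seen = {repo_of(path) for path in symbol_paths}
    return sorted(seen - set(listed_ids))


# Stand-in for the 36 repo_ids returned by list_indexed_repositories.
listed = ["repo-f"]
# repo-g path from the find_symbol output; repo-h / repo-i paths are assumed.
load_catalog_paths = [
    "repo-g/lib/catalog.py",
    "repo-h/lib/catalog.py",
    "repo-i/lib/catalog.py",
]

print(unlisted_repos(listed, load_catalog_paths))  # ['repo-g', 'repo-h', 'repo-i']
```

A check like this only finds repos that some query happens to touch, so it is a workaround rather than a substitute for a complete listing.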

Observed discriminator

The discriminator is exact across the four repos involved:

repo                    Repository in nodes_by_kind   last_indexed   last_episode_type   in listing
repo-f (bulk-indexed)   Repository: 1                 set            git_commit          yes
repo-g                  no Repository key             null           working_tree        no
repo-h                  no Repository key             null           working_tree        no
repo-i                  no Repository key             null           git_commit          no

Across these four repos, a repo appears in list_indexed_repositories exactly when its get_repository_stats reports a Repository entry in nodes_by_kind and a non-null last_indexed. The listing's own empty-state note ("no Repository rows under a strongly-anchored .memdb") points the same way: it enumerates Repository rows. The episode-only repos report File / Function / APICall / Process / Community counts but no Repository key and last_indexed: null, so they are absent from the listing while their symbols stay queryable.

Episode type is not the discriminator: repo-f and repo-i are both git_commit, and only the bulk-indexed one is listed.
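The observed rule can be restated as a predicate over get_repository_stats payloads. The Python sketch below encodes the four table rows using only the fields reported here: node kinds other than those shown are omitted, and "<timestamp>" stands in for repo-f's actual last_indexed value. appears_in_listing is a hypothetical name describing the observed behaviour, not memtrace's implementation.

```python
# Observed listing rule, restated as a predicate (illustration only).


def appears_in_listing(stats: dict) -> bool:
    """Listed iff a Repository node exists and last_indexed is non-null."""
    has_repository_row = stats.get("nodes_by_kind", {}).get("Repository", 0) > 0
    return has_repository_row and stats.get("last_indexed") is not None


# Trimmed get_repository_stats payloads for the four repos in the table.
# repo-f, repo-h, repo-i: kinds other than Repository are not reproduced here.
observed = {
    "repo-f": {"nodes_by_kind": {"Repository": 1},
               "last_indexed": "<timestamp>", "last_episode_type": "git_commit"},
    "repo-g": {"nodes_by_kind": {"Function": 105, "File": 31, "APICall": 2,
                                 "Process": 1, "Community": 3},
               "last_indexed": None, "last_episode_type": "working_tree"},
    "repo-h": {"nodes_by_kind": {}, "last_indexed": None,
               "last_episode_type": "working_tree"},
    "repo-i": {"nodes_by_kind": {}, "last_indexed": None,
               "last_episode_type": "git_commit"},
}

listed = sorted(r for r, s in observed.items() if appears_in_listing(s))
print(listed)  # ['repo-f']

# Same episode type, different outcome: episode type does not decide listing.
same_type = observed["repo-f"]["last_episode_type"] == observed["repo-i"]["last_episode_type"]
print(same_type, appears_in_listing(observed["repo-i"]))  # True False
```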

Expected

list_indexed_repositories reflects every repo the graph can answer queries about, or at minimum offers a way to see episode-only repos. The tool's stated role is "call this first to discover available repo_ids; most other tools require a repo_id". That role is undercut when a repo_id accepted by get_repository_stats, find_symbol, and get_impact is not discoverable from the listing.

Actual

Episode-only repos (no Repository row, last_indexed null) are absent from the listing while fully queryable by repo_id. In a fleet setup this is a coverage blind spot. A repo that another agent populated via working_tree episodes, without a local index_directory, stays invisible to discovery yet holds usable symbols.

Ask

  1. Either surface episode-only repos in list_indexed_repositories (for instance with a field marking ingestion path or last_indexed: null), or add an opt-in flag to include them.
  2. If the bulk-indexed-only scope is intentional, document it on the tool: the listing reflects index_directory'd repos, not all repos present in the graph, so an agent enumerating coverage knows to also probe episode-only repos by repo_id.
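For Ask 1, one possible shape for the opt-in variant, sketched in Python under stated assumptions: bulk-indexed repos are listed by default, and an include_episode_only flag adds repos that have graph nodes but no Repository row, tagged with an ingestion field. The flag, the ingestion field and its values, and the RepoEntry type are hypothetical names chosen for illustration. The inputs are trimmed get_repository_stats payloads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoEntry:
    repo_id: str
    ingestion: str               # hypothetical: "index_directory" or "episodes"
    last_indexed: Optional[str]  # None for episode-only repos


def list_repositories(stats_by_repo: dict, include_episode_only: bool = False) -> list[RepoEntry]:
    """Hypothetical listing: Repository rows by default, episode-only repos on request."""
    entries = []
    for repo_id, stats in sorted(stats_by_repo.items()):
        kinds = stats.get("nodes_by_kind", {})
        if "Repository" in kinds and stats.get("last_indexed") is not None:
            entries.append(RepoEntry(repo_id, "index_directory", stats["last_indexed"]))
        elif include_episode_only and kinds:
            # Has graph nodes but no Repository row: the episode-only case.
            entries.append(RepoEntry(repo_id, "episodes", None))
    return entries


stats = {
    "repo-f": {"nodes_by_kind": {"Repository": 1}, "last_indexed": "<timestamp>"},
    "repo-g": {"nodes_by_kind": {"Function": 105, "File": 31}, "last_indexed": None},
}

print([e.repo_id for e in list_repositories(stats)])  # ['repo-f']
for entry in list_repositories(stats, include_episode_only=True):
    print(entry)
```

Keeping the default unchanged means existing callers see the same list, while the ingestion field lets an agent tell the two paths apart without a second call per repo.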

This is a discoverability gap rather than a defect in what each tool returns. Happy to help scope the listing change.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests