fix(search): never render a backend error or truncation as "no matches"#261
fix(search): never render a backend error or truncation as "no matches"#261khustup2 wants to merge 1 commit into
Conversation
Hivemind memory search could fail silently: a backend error, timeout, or capped result was indistinguishable from a genuine zero-match, so the agent confidently reported "not in hivemind" for content that was there. Root cause was outcome conflation across the read/search layers (NOT the "two backing stores" theory — cat and grep both read summary::text). Fixes, each grounded in grep's exit-code contract (0 match / 1 no-match / 2 error) and the principle that "couldn't search" / "incomplete" must be surfaced, never collapsed into empty: - virtual-table-query: queryUnionRows no longer swallows a total backend failure to []. When the UNION and BOTH per-table fallbacks fail it throws, so cat/Read surface a real error instead of a misleading "No such file". A single surviving fallback still returns its partial rows. - grep-interceptor: track the backend error through the in-memory fallback; when nothing else matches, return exit code 2 + stderr instead of exit 1 / empty. Fallback data still clears it (exit 0). - grep-core: searchDeeplakeTables reports a per-source cap hit via an optional meta out-param; grepBothTables / interceptor / MCP append an explicit incomplete-results notice so a capped page isn't mistaken for the full set. - grep-direct / pre-tool-use: let backend errors propagate to the existing outer catch -> sandboxed VFS-shell fallback (which re-attempts and signals honestly) instead of short-circuiting that retry. Honest signaling lives in the final layer (library / VFS shell), keeping the fast-path retry intact. codex/cursor/hermes inherit via the shared modules. 100% statement and branch coverage on the changed lines.
📝 WalkthroughWalkthroughThis PR extends grep search functionality to distinguish backend failures from zero-match results and signal when result sets are truncated. Changes propagate previously-swallowed errors through the UNION query layer, add truncation tracking throughout semantic and lexical search branches, and update exit codes to correctly report search state to callers. ChangesError Propagation, Truncation Detection, and Result Signaling
Sequence DiagramsequenceDiagram
participant Client
participant Interceptor as grep-interceptor
participant Semantic as searchDeeplakeTables<br/>(semantic)
participant Lexical as searchDeeplakeTables<br/>(lexical)
participant Fallback as In-memory FS
participant Output
Client->>Interceptor: grep query
Interceptor->>Semantic: search with meta, 3s timeout
alt Semantic returns rows
Semantic-->>Interceptor: rows, meta.truncated=false
Interceptor->>Output: exit 0, rows + notice
else Semantic returns empty
Semantic-->>Interceptor: [], meta.truncated=false
Interceptor->>Lexical: retry with meta, 3s timeout
alt Lexical returns rows
Lexical-->>Interceptor: rows, meta.truncated=true
Interceptor->>Output: exit 0, rows + notice
else Lexical returns empty
Lexical-->>Interceptor: []
Interceptor->>Fallback: scan cached paths
alt Fallback finds matches
Fallback-->>Interceptor: rows
Interceptor->>Output: exit 0, rows
else Fallback empty
Fallback-->>Interceptor: []
Interceptor->>Output: exit 1, empty
end
else Lexical throws
Lexical-->>Interceptor: error, backendError=set
Interceptor->>Output: exit 2, stderr
end
else Semantic throws
Semantic-->>Interceptor: error, backendError=set
Interceptor->>Fallback: scan cached paths
alt Fallback finds matches
Fallback-->>Interceptor: rows, backendError=clear
Interceptor->>Output: exit 0, rows
else Fallback empty
Fallback-->>Interceptor: []
Interceptor->>Output: exit 2, stderr
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 6 files changed
Generated for commit 915a6dd. |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/hooks/virtual-table-query.ts (1)
202-212:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't collapse a total
/index.mdfetch failure into an empty index.These two
.catch(() => [])branches reintroduce the same false-negative the PR is fixing: if the exact-path probe returns no physical/index.mdrow and both section queries fail,readVirtualPathContent("/index.md")still returns a synthetic “empty” index instead of throwing. That preventssrc/hooks/pre-tool-use.tsfrom distinguishing “backend unavailable” from “workspace has no memory” on the Read path.Mirror
queryUnionRowshere: allow one section query to fail and render a partial index, but rethrow when both section fetches fail.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/hooks/virtual-table-query.ts` around lines 202 - 212, The two api.query calls for summaryRows and sessionRows currently swallow errors via .catch(() => []), which masks a total backend failure; change the logic to use Promise.allSettled (or otherwise capture per-query errors) for the two queries used in readVirtualPathContent so that if both queries reject you rethrow the combined error, but if at least one succeeds you proceed using the successful result(s) (default the failed one to []); update references to summaryRows, sessionRows and the api.query calls accordingly and mirror the behavior used by queryUnionRows.src/shell/grep-core.ts (1)
374-381:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftHybrid truncation detection misses single-source caps.
This only flags truncation when the combined result hits
outerLimit, but each hybrid subquery is capped independently. A single lexical/semantic source can fill its ownLIMITwhile the union still returns fewer thanouterLimitrows, leavingmeta.truncated = falseand downstream callers treating an incomplete search as complete.Suggested direction
- const memSemQuery = - `SELECT path, summary::text AS content, 0 AS source_order, '' AS creation_date, ` + + const memSemQuery = + `SELECT path, summary::text AS content, 0 AS source_order, '' AS creation_date, 'mem_sem' AS cap_source, ` + `(summary_embedding <#> ${vecLit}) AS score ` + `FROM "${memoryTable}" WHERE ARRAY_LENGTH(summary_embedding, 1) > 0${pathFilter} ` + `ORDER BY score DESC LIMIT ${semanticLimit}`; ... - const rows = await api.query( - `SELECT path, content, source_order, creation_date, score FROM (` + + const rows = await api.query( + `SELECT path, content, source_order, creation_date, cap_source, score FROM (` + unionSql + `) AS combined ORDER BY score DESC LIMIT ${outerLimit}` ); - - if (meta && rows.length >= outerLimit) meta.truncated = true; + if (meta) { + let memSemCount = 0; + let sessSemCount = 0; + let memLexCount = 0; + let sessLexCount = 0; + for (const row of rows) { + switch (String(row["cap_source"])) { + case "mem_sem": memSemCount++; break; + case "sess_sem": sessSemCount++; break; + case "mem_lex": memLexCount++; break; + case "sess_lex": sessLexCount++; break; + } + } + if ( + memSemCount >= semanticLimit || + sessSemCount >= semanticLimit || + memLexCount >= lexicalLimit || + sessLexCount >= lexicalLimit + ) { + meta.truncated = true; + } + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/shell/grep-core.ts` around lines 374 - 381, The current truncation check only inspects combined rows vs outerLimit and misses when one subquery hit its per-source cap; change the logic to detect per-source truncation by either (a) including a source identifier in the unionSql (e.g., add a column like source_type or source_tag in each subquery) and then after api.query inspect counts per source against semanticLimit and lexicalLimit to set meta.truncated = true if any source reached its limit, or (b) run lightweight COUNT/LIMIT checks for each subquery separately before/after the union and set meta.truncated = true if either count >= semanticLimit or count >= lexicalLimit; check and update meta.truncated after obtaining rows. Ensure you reference outerLimit, semanticLimit, lexicalLimit, unionSql, rows, api.query and meta.truncated when making the change.
🧹 Nitpick comments (1)
tests/claude-code/virtual-table-query.test.ts (1)
67-74: ⚡ Quick winUse exact error assertions for these new propagation tests.
Both mocks use fixed error messages, but the expectations only match broad substrings. That makes the tests pass even if the wrong failure path is thrown. Assert the exact message or exact error instance here so the regression guard actually proves which backend failure surfaced. As per coding guidelines,
tests/**should prefer specific values over generic substrings.Also applies to: 235-237
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/claude-code/virtual-table-query.test.ts` around lines 67 - 74, The test currently matches a broad substring when asserting the error from readVirtualPathContent; change the assertion to expect the exact error message thrown by the mock (e.g. "deeplake 500: internal error") or the exact Error instance instead of a regex so the test verifies the precise propagated failure; update the expect(...).rejects.toThrow(...) in this test (and the similar assertion around lines 235-237) to assert the exact message/text produced by the mocked Error.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/shell/grep-core.ts`:
- Around line 680-682: withTruncationNotice currently returns an empty array
when lines is empty even if truncated is true, which causes handleGrepDirect to
emit "(no matches)" and a false non-zero exit; change withTruncationNotice so
that whenever truncated is true it appends TRUNCATION_NOTICE and returns at
least [TRUNCATION_NOTICE] (i.e., return [...lines, TRUNCATION_NOTICE] even if
lines.length === 0), leaving the non-truncated path unchanged; update the
function referenced as withTruncationNotice and ensure callers like
handleGrepDirect/grep-interceptor will receive the truncation notice instead of
an empty result.
In `@tests/claude-code/grep-direct.test.ts`:
- Around line 112-124: Replace the broad regex check in the test that looks for
a truncation warning with an exact assertion using the TRUNCATION_NOTICE
constant: import TRUNCATION_NOTICE and in the "appends an incomplete-results
notice..." test assert that String(r) includes TRUNCATION_NOTICE (and in the
"does NOT add the notice..." test assert it does not include TRUNCATION_NOTICE)
so the tests rely on the canonical truncation message instead of a loose regex;
locate the assertions around handleGrepDirect and update them to use
TRUNCATION_NOTICE.
In `@tests/claude-code/mcp-server.test.ts`:
- Around line 169-171: Replace the generic substring check for the truncation
warning with an assertion against the canonical truncation notice constant
(instead of "results incomplete"); locate the test call to
registeredTools.get("hivemind_search")!.handler and change the second
expectation to assert the exact message by comparing out.content[0].text (or the
specific text segment) to the shared canonical truncation notice value (e.g.,
TRUNCATION_NOTICE or canonicalTruncationNotice) used elsewhere in tests/fixtures
so the test fails if the server emits the wrong warning text.
---
Outside diff comments:
In `@src/hooks/virtual-table-query.ts`:
- Around line 202-212: The two api.query calls for summaryRows and sessionRows
currently swallow errors via .catch(() => []), which masks a total backend
failure; change the logic to use Promise.allSettled (or otherwise capture
per-query errors) for the two queries used in readVirtualPathContent so that if
both queries reject you rethrow the combined error, but if at least one succeeds
you proceed using the successful result(s) (default the failed one to []);
update references to summaryRows, sessionRows and the api.query calls
accordingly and mirror the behavior used by queryUnionRows.
In `@src/shell/grep-core.ts`:
- Around line 374-381: The current truncation check only inspects combined rows
vs outerLimit and misses when one subquery hit its per-source cap; change the
logic to detect per-source truncation by either (a) including a source
identifier in the unionSql (e.g., add a column like source_type or source_tag in
each subquery) and then after api.query inspect counts per source against
semanticLimit and lexicalLimit to set meta.truncated = true if any source
reached its limit, or (b) run lightweight COUNT/LIMIT checks for each subquery
separately before/after the union and set meta.truncated = true if either count
>= semanticLimit or count >= lexicalLimit; check and update meta.truncated after
obtaining rows. Ensure you reference outerLimit, semanticLimit, lexicalLimit,
unionSql, rows, api.query and meta.truncated when making the change.
---
Nitpick comments:
In `@tests/claude-code/virtual-table-query.test.ts`:
- Around line 67-74: The test currently matches a broad substring when asserting
the error from readVirtualPathContent; change the assertion to expect the exact
error message thrown by the mock (e.g. "deeplake 500: internal error") or the
exact Error instance instead of a regex so the test verifies the precise
propagated failure; update the expect(...).rejects.toThrow(...) in this test
(and the similar assertion around lines 235-237) to assert the exact
message/text produced by the mocked Error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 75cc80d4-6867-4b92-9504-622e8d7ec29a
📒 Files selected for processing (11)
src/hooks/grep-direct.tssrc/hooks/pre-tool-use.tssrc/hooks/virtual-table-query.tssrc/mcp/server.tssrc/shell/grep-core.tssrc/shell/grep-interceptor.tstests/claude-code/grep-core.test.tstests/claude-code/grep-direct.test.tstests/claude-code/grep-interceptor.test.tstests/claude-code/mcp-server.test.tstests/claude-code/virtual-table-query.test.ts
| export function withTruncationNotice(lines: string[], truncated: boolean): string[] { | ||
| if (truncated && lines.length > 0) return [...lines, TRUNCATION_NOTICE]; | ||
| return lines; |
There was a problem hiding this comment.
Don't collapse truncated searches back to empty output.
When the capped fetch happens before regex refinement, it's reachable for meta.truncated to be true and lines to be empty. Returning [] here makes handleGrepDirect emit "(no matches)" and grep-interceptor return exit code 1, even though the search is explicitly incomplete.
Proposed fix
export function withTruncationNotice(lines: string[], truncated: boolean): string[] {
- if (truncated && lines.length > 0) return [...lines, TRUNCATION_NOTICE];
+ if (truncated) {
+ return lines.length > 0 ? [...lines, TRUNCATION_NOTICE] : [TRUNCATION_NOTICE];
+ }
return lines;
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/shell/grep-core.ts` around lines 680 - 682, withTruncationNotice
currently returns an empty array when lines is empty even if truncated is true,
which causes handleGrepDirect to emit "(no matches)" and a false non-zero exit;
change withTruncationNotice so that whenever truncated is true it appends
TRUNCATION_NOTICE and returns at least [TRUNCATION_NOTICE] (i.e., return
[...lines, TRUNCATION_NOTICE] even if lines.length === 0), leaving the
non-truncated path unchanged; update the function referenced as
withTruncationNotice and ensure callers like handleGrepDirect/grep-interceptor
will receive the truncation notice instead of an empty result.
| it("appends an incomplete-results notice when a source hits the row cap", async () => { | ||
| const rows = Array.from({ length: 100 }, (_, i) => ({ | ||
| path: `/summaries/s${i}.md`, content: "foo match", source_order: 0, | ||
| })); | ||
| const api = { query: vi.fn().mockResolvedValueOnce(rows) } as any; | ||
| const r = await handleGrepDirect(api, "memory", "sessions", baseParams); | ||
| expect(String(r).toLowerCase()).toMatch(/cap|incomplete|more match|refine/); | ||
| }); | ||
|
|
||
| it("does NOT add the notice for a normal, fully-returned result", async () => { | ||
| const api = mockApi([{ path: "/summaries/a.md", content: "foo line", source_order: 0 }]); | ||
| const r = await handleGrepDirect(api, "memory", "sessions", baseParams); | ||
| expect(String(r).toLowerCase()).not.toMatch(/cap|incomplete|more match|refine/); |
There was a problem hiding this comment.
Assert the exact truncation notice here.
The regex at Line 118 is broad enough to pass on unrelated warning text, so it won't reliably protect the new signaling contract. Import TRUNCATION_NOTICE and assert on that exact value instead.
Suggested change
import { parseBashGrep, handleGrepDirect, type GrepParams } from "../../src/hooks/grep-direct.js";
+import { TRUNCATION_NOTICE } from "../../src/shell/grep-core.js";
...
- expect(String(r).toLowerCase()).toMatch(/cap|incomplete|more match|refine/);
+ expect(r).toContain(TRUNCATION_NOTICE);As per coding guidelines, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/claude-code/grep-direct.test.ts` around lines 112 - 124, Replace the
broad regex check in the test that looks for a truncation warning with an exact
assertion using the TRUNCATION_NOTICE constant: import TRUNCATION_NOTICE and in
the "appends an incomplete-results notice..." test assert that String(r)
includes TRUNCATION_NOTICE (and in the "does NOT add the notice..." test assert
it does not include TRUNCATION_NOTICE) so the tests rely on the canonical
truncation message instead of a loose regex; locate the assertions around
handleGrepDirect and update them to use TRUNCATION_NOTICE.
Source: Coding guidelines
| const out = await registeredTools.get("hivemind_search")!.handler({ query: "x" }) as { content: { text: string }[] }; | ||
| expect(out.content[0].text).toContain("/summaries/alice/a.md"); | ||
| expect(out.content[0].text.toLowerCase()).toContain("results incomplete"); |
There was a problem hiding this comment.
Assert the exact truncation notice here.
Line 171 only checks a generic substring, so this still passes if the server emits the wrong warning text. Since this test already controls the canonical notice, please assert that exact message instead.
🎯 Proposed test tightening
const out = await registeredTools.get("hivemind_search")!.handler({ query: "x" }) as { content: { text: string }[] };
expect(out.content[0].text).toContain("/summaries/alice/a.md");
- expect(out.content[0].text.toLowerCase()).toContain("results incomplete");
+ expect(out.content[0].text).toContain(
+ "[hivemind: results incomplete — a per-source row cap was hit, so more matches likely exist. Narrow the path or use a more specific pattern to see them.]",
+ );As per coding guidelines, "tests/**: Prefer asserting on specific values (paths, messages) over generic substrings."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/claude-code/mcp-server.test.ts` around lines 169 - 171, Replace the
generic substring check for the truncation warning with an assertion against the
canonical truncation notice constant (instead of "results incomplete"); locate
the test call to registeredTools.get("hivemind_search")!.handler and change the
second expectation to assert the exact message by comparing out.content[0].text
(or the specific text segment) to the shared canonical truncation notice value
(e.g., TRUNCATION_NOTICE or canonicalTruncationNotice) used elsewhere in
tests/fixtures so the test fails if the server emits the wrong warning text.
Source: Coding guidelines
Problem
Hivemind memory search could fail silently: a backend error, a timeout, or a capped result was indistinguishable from a genuine zero-match. The agent would confidently report "not in hivemind" for content that was actually there — then
git/catwould find it.A prior investigation blamed "two different backing stores" (
catvsgrep). That's inaccurate — both readsummary::text. The real defect is outcome conflation: errors, truncation, and genuine-empty all rendered identically.Fix
Grounded in grep's own exit-code contract (
0match /1no-match /2error) and the principle that "couldn't search" / "incomplete" must be surfaced, never collapsed into empty:virtual-table-query.ts—queryUnionRowsno longer swallows a total backend failure to[]. When the UNION and both per-table fallbacks fail it throws, socat/Read surface a real error instead of a misleadingNo such file or directory. A single surviving fallback still returns its partial rows.grep-interceptor.ts— track the backend error through the in-memory fallback; when nothing else matches, return exit code 2 + stderr instead of exit 1 / empty. Fallback data still clears it (exit 0).grep-core.ts—searchDeeplakeTablesreports a per-source row-cap hit via an optionalmetaout-param;grepBothTables/ interceptor / MCP append an explicit incomplete-results notice so a capped page isn't mistaken for the full set. (Covers the regex-only content scan, which inspects only the first 100 unordered rows.)grep-direct.ts/pre-tool-use.ts— let backend errors propagate to the existing outer catch → sandboxed VFS-shell fallback (which re-attempts and signals honestly) instead of short-circuiting that retry.Honest signaling lives in the final layer (library / VFS shell), keeping the fast-path retry intact.
codex/cursor/hermesinherit the fixes via the shared modules.Tests
TDD throughout — each defect got a red test before the fix. New/updated coverage:
Errorrejection / daemon-down)100% statement and branch coverage on the changed lines (43/43 stmts, 36/36 branches, measured
git diff∩ v8 coverage). Full suite: zero new failures vs. baseline.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes