Skip to content

feat: LLM review stage for enhanced reachability detection#50

Merged
ar7casper merged 8 commits into
knostic:release/2026-05-12from
joshbouncesecurity:feat/issue17-llm-reachability
May 13, 2026
Merged

feat: LLM review stage for enhanced reachability detection#50
ar7casper merged 8 commits into
knostic:release/2026-05-12from
joshbouncesecurity:feat/issue17-llm-reachability

Conversation

@joshbouncesecurity
Copy link
Copy Markdown
Contributor

Summary

Adds a new LLM review stage (off by default, enabled via --llm-reachability on openant scan) that uses a strong model (Opus by default) to surface additional reachability signals beyond what the structural analysis catches:

  • Likely entry points the structural pass missed (framework-specific handlers, plugin/CLI registrations, message handlers).
  • External content ingestion (HTTP request bodies, file/network reads, env/argv, stdin, untrusted IPC).
  • Cross-process or async data flows.

Signals are advisory and only promote a unit's reachability — they never demote one that the structural analysis already kept. This matches the issue's "complements, not replaces" intent.

Output:

  • llm_reachability.json in the scan dir with the full signal list.
  • llm_reachability_signals: [...] field on each unit in dataset.json.
  • High-confidence entry_point signals set is_entry_point: true on the target unit.

Cost & rate-limit safety: reuses the existing GlobalRateLimiter via AnthropicClient, opt-in only, and prompts are batched (default 25 units/call).

Addresses #17 (does not close — let the maintainer review the prompt + heuristics first).

Test plan

  • Unit tests for analyze_reachability with mocked LLM: fixed JSON, malformed JSON, exception in client, app_context threading, batch chunking.
  • Unit tests for apply_signals: promote-only semantics (high-confidence promotes, medium does not, never demotes), per-unit signal accumulation, unknown-id rejection.
  • CLI plumbing tests: --llm-reachability appears in openant scan --help; default does not pass the flag through to scan_repository; setting it threads llm_reachability=True.
  • Existing pytest suite passes: 112 passed, 23 skipped (env-dependent, e.g. Go binary).
  • Manual: enable on a small Express fixture; verify route handlers gain entry-point signals.

@joshbouncesecurity
Copy link
Copy Markdown
Contributor Author

joshbouncesecurity commented May 4, 2026

Manual verification

Off by default. Requires API key. Cost note: enabling adds approximately one Opus call per 25 units.

  • openant scan --help shows --llm-reachability with the cost note.
  • Default behavior: openant scan <repo> (no flag) — pipeline unchanged from current behavior; no LLM reachability cost incurred.
  • Enabled: openant scan <repo> --llm-reachability — emits llm_reachability.json in the scan dir; merged signals appear on dataset.json units as llm_reachability_signals.
  • Promote-only: high-confidence entry_point signals on a previously-unreachable unit promote it to is_entry_point: true. Existing entry points are NEVER demoted.
  • Pipeline ordering: app-context generation runs BEFORE the LLM reachability stage, so app context is threaded into the reachability prompt. Verify with --quiet step ordering.
  • Markdown-wrapped JSON: model occasionally returns the response inside a fenced json ... block — the parser strips the fences and recovers the signals.
  • Empty units: openant scan <empty-or-failed-parse> --llm-reachability: stage skips gracefully (no LLM call).

@joshbouncesecurity
Copy link
Copy Markdown
Contributor Author

Local test results

Reinstalled openant-core from this branch and ran openant scan --llm-reachability end-to-end on the in-tree sample_python_repo (5 reachable units after the structural pass). Skipped enhance/verify/report/dynamic-test to keep cost minimal.

Commands run:

go build -o openant.exe ./
pip install -e libs/openant-core/

openant scan <sample_python_repo> --output <out> \
  --llm-reachability \
  --no-context --no-enhance --no-report --skip-dynamic-test \
  --limit 1 --model sonnet

Outcome (against the manual-verification checklist):

  • openant scan --help lists --llm-reachability with the cost note: "Off by default — enabling this may incur additional LLM cost (one Opus call per ~25 units)" ✅
  • Enabled run: llm_reachability.json written to the scan dir; the new pipeline step llm-reachability ran successfully and emitted llm-reachability.report.json with cost_usd: 0.01245, token_usage.total_tokens: 782, units_reviewed: 5
  • Promote-only semantics: Opus reviewed 5 units (the structural pass had already correctly tagged the two Flask endpoints as is_entry_point: True); model returned signals: []. signals_added: 0, entry_points_promoted: 0, units_touched: 0. Existing entry points were not demoted ✅
  • Pipeline ordered as documented: parse → llm-reachability → analyze. llm-reachability.report.json is timestamped before analyze.report.json
  • Reachability stage runs Opus (per the report) regardless of --model sonnet (which only governs analyze) — matches cost note ✅
  • Did not exercise the markdown-fence recovery path or the empty-units skip path here — covered by the unit tests in the diff. Did not separately confirm app-context-threading because I passed --no-context.

Total cost: $0.024 (reachability $0.012 + analyze 1 unit $0.012). Well under the budget.

@ar7casper
Copy link
Copy Markdown
Collaborator

🔴 High — Architecture · core/scanner.py:228 + core/parser_adapter.py:351

Issue: The LLM stage runs after parse, but parse already applied the structural
reachability filter when processing_level != "all" (default is "reachable"). So by the
time the LLM sees the dataset, units that weren't reachable from the structural entry
points are already discarded. The headline value prop — "Likely entry points the structural
pass missed (framework-specific handlers, plugin/CLI registrations, message handlers)"

cannot be delivered as written: if the structural pass missed an Express route handler, the
unit for that handler was filtered out at parse time. The LLM never sees it. Promoting
is_entry_point on a unit that's no longer in the dataset is impossible.

What the stage does still deliver in current form:

  • Attaches llm_reachability_signals to surviving units (metadata for reports)
  • Promotes is_entry_point on surviving units, which flows into
    agentic_enhancer/prompts.py:106 (so the enhancer treats those units as entry points when
    prompting)

What it cannot do:

  • Bring back units the structural pass dropped
  • "Surface missed entry points" — those units are gone

joshbouncesecurity added a commit to joshbouncesecurity/OpenAnt that referenced this pull request May 12, 2026
The stage was running after the structural filter had already discarded
units that weren't reachable from heuristic entry points — meaning the
LLM could never find the missed entry points that are the feature's main
value prop.

Fix:
- When --llm-reachability is set, parse with processing_level="all" so
  every unit is visible to the LLM.
- After apply_signals promotes LLM-identified entry points, re-run the
  structural reachability filter (apply_reachability_filter) with those
  unit IDs added as extra BFS seeds.
- The final dataset is filtered to the user's requested processing_level
  but expanded by any entry points the LLM found.

Also:
- Expose apply_reachability_filter as a public function in
  parser_adapter.py with an extra_entry_points parameter; preserve any
  is_entry_point=True already set by the LLM stage when re-stamping.
- Update help text in cli.py and scan.go to reflect that cost scales with
  total repo size, not the filtered unit count.
- Update llm_reachability.py docstring to document the correct pipeline
  ordering.

Addresses review comment on PR knostic#50.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@joshbouncesecurity
Copy link
Copy Markdown
Contributor Author

Good catch — fixed in faa8948.

The stage was running against the already-filtered dataset, so any unit the structural pass dropped was gone before the LLM could see it. The fix:

  1. Parse with processing_level="all" when --llm-reachability is set, so the LLM reviews the full codebase.
  2. After signals are applied, re-run apply_reachability_filter with the LLM-promoted entry point IDs added as extra BFS seeds, so the final dataset is still filtered to the user's requested processing_level — but now expanded by whatever the LLM found.
  3. apply_reachability_filter is exposed as a public function in parser_adapter.py with an extra_entry_points parameter; it also preserves any is_entry_point=True already set by the LLM stage when re-stamping units.
  4. Help text updated in both cli.py and scan.go — cost now scales with total repo size, not the filtered unit count, which is a meaningful difference on large repos.

All 29 existing tests still pass.

@joshbouncesecurity joshbouncesecurity marked this pull request as ready for review May 12, 2026 05:32
@joshbouncesecurity
Copy link
Copy Markdown
Contributor Author

@ar7casper responded to the comment

@ar7casper ar7casper changed the base branch from master to release/2026-05-12 May 12, 2026 12:47
Copy link
Copy Markdown
Collaborator

@ar7casper ar7casper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round-2 review after the faa8948 fix — the architectural rework for the Python path is correct. One High-severity finding I missed in the prior review, plus two Mediums and a few Lows.


🟠 High — Architecture · core/scanner.py:283-294 + core/parser_adapter.py:247-252

The post-LLM re-filter call to apply_reachability_filter requires call_graph.json on disk. Only the Python parser (parse_repository.py:173) and the Zig parser (zig/test_pipeline.py:123) write this file. JS, Go, C, Ruby, and PHP parsers do their reachability filtering internally inside their per-language test_pipeline.py and never persist call_graph.json.

When a user runs openant scan <ts-repo> --llm-reachability (default --level reachable):

  1. effective_parse_level = "all" → JS parser parses the full unfiltered codebase ✅
  2. LLM reviews all units, promotes some via is_entry_point=True
  3. scanner.py:289 calls apply_reachability_filter(dataset, output_dir, "reachable", extra_entry_points=...)
  4. Function looks for output_dir/call_graph.json → not found
  5. parser_adapter.py:248-252 logs [Warning] call_graph.json not found — skipping reachability filter to stderr and returns the dataset unchanged
  6. Dataset remains the FULL unfiltered codebase
  7. Downstream stages (enhance, analyze) process every function in the repo

Net effect: --llm-reachability silently disables --level reachable on every non-Python (and non-Zig) codebase. A typical Express scan that should have processed ~30 reachable units processes ~500 instead — 10–20× cost spike in the analyze stage. The user has no easy way to know --level was ignored.

This is the same architectural class as the round-1 issue (LLM stage interacting with the structural filter), but the round-1 fix only addresses the Python path.

Suggestion (any one of these):

  • Most thorough: make all parsers write call_graph.json to output_dir (uniform), so apply_reachability_filter works for every language.
  • Pragmatic: skip the post-LLM re-filter for non-Python languages and print a loud warning that --llm-reachability doesn't yet support per-language re-filtering, including the cost implication.
  • Minimum acceptable: in parser_adapter.py:247-252, raise a RuntimeError (or escalate the warning to a hard failure) when called with non-empty extra_entry_points and call_graph.json is missing — fail loudly so the bug is visible at scan time.

🟡 Medium — Quality · core/llm_reachability.py:51-53

MODEL_PRIMARY = "claude-opus-4-20250514" (Opus 4 from May 2025) is older than the codebase convention. analyzer.py, enhancer.py, reporter.py, finding_verifier.py all use the newer claude-opus-4-6. The source comment claims "matches the convention in core/analyzer.py / utilities/llm_client.py" but analyzer.py:316 is model_id = "claude-opus-4-6" if model == "opus" else "claude-sonnet-4-20250514" — the comment is misleading.

Suggestion: align MODEL_PRIMARY to claude-opus-4-6 to match the rest of the Opus-using stages, and fix the comment.


🟡 Medium — Coupling · core/scanner.py:267

analyze_reachability(..., max_units=limit) couples the LLM reachability stage's coverage to the analyze-stage --limit flag. A user running openant scan --limit 50 --llm-reachability to constrain analyze cost will silently limit LLM reachability to the first 50 units. If the entry point that would have promoted units 1–50 sits at unit 51, those units never enter the reachable set.

Suggestion: decouple — either a separate --llm-reachability-max-units flag, or don't apply limit to LLM reachability (let it always review the full dataset; cost is controlled by batch_size).


🟢 Lows (non-blocking, listed for completeness)

  1. Unused importcore/llm_reachability.py:47: field imported but never used. PR 56's ruff rules don't catch this (F401 isn't enabled). Drop it.
  2. Always-null prompt fieldcore/llm_reachability.py:174: unit.get("reachable_from_entry") always returns None because nothing in the project writes that field. The reachability filter writes reachable: bool. Either remove from the prompt projection or rename to reachable.
  3. Aggressive code trimmingcore/llm_reachability.py:61: MAX_CODE_BYTES = 1500 (~30-50 lines) often misses entry-point indicators in handlers with embedded business logic. Consider 4-8 KB or env-var configurable.
  4. Style inconsistencycore/scanner.py:248,259,272,299: new with open(..., encoding="utf-8") as f: calls work but don't follow PR 56's new convention of read_json / write_json from utilities.file_io.
  5. Observability gapcore/scanner.py:245: step_context inputs={"model": "opus"} is hardcoded string, not actual model ID. LLM-promoted units have empty entry_point_reason (the structural detector only sets it for its own detections) — hard to tell from a unit's metadata that the LLM promoted it without cross-referencing llm_reachability_signals.

What's good

  • Promote-only semantics with regression test against demotion is exactly right.
  • Failure isolation per batch in analyze_reachability (try/except per batch) is defensive in the right place.
  • 5 commits with self-iteration including 3 fixes from prior review feedback. Engineering hygiene.
  • The architecture refactor commit (3d83787e) self-caught a real app_context_path = None ordering bug.
  • Test coverage is excellent — 29 tests across 6 classes including the never-demote semantic and the non-positive batch-size guard.
  • Help text honestly mentions cost scales with total repo size, not the filtered count.

Verdict: the High finding is the actionable ask — once that's resolved (or scoped to Python+Zig only with a loud warning), the rest are nits / follow-ups.

@joshbouncesecurity joshbouncesecurity marked this pull request as draft May 12, 2026 13:16
@joshbouncesecurity
Copy link
Copy Markdown
Contributor Author

Addressed the call_graph.json gap and CI runtime coverage in two commits:

feat: write call_graph.json for all parser languages (417883b)

All five parsers that were missing call_graph.json now write it:

Parser Approach
C builder.export() result written directly (already correct snake_case format)
Ruby Same — builder.export()
PHP Same — builder.export()
JavaScript Reads back analyzer_output.json after subprocess, normalises camelCase → snake_case (callGraphcall_graph, reverseCallGraphreverse_call_graph)
Go Reconstructs call_graph/reverse_call_graph from direct_calls/direct_callers in unit metadata (same fields apply_reachability_filter already reads when call_graph.json is absent)

Also added tests/test_call_graph_output.py with 17 tests:

  • 5 always-run unit tests for apply_reachability_filter public API (synthetic fixture, no external deps)
  • 2 always-run integration tests for Python parser
  • 2 tests each for JS/Go/C/Ruby/PHP (skip-guarded by availability checks)

ci: ensure go_parser binary is built in python-tests job (c82498f)

The python-tests CI job previously had no Go toolchain, so Go parser tests always skipped (and by extension the call_graph.json test for Go). Fixed:

  • Added setup-go step in python-tests using parsers/go/go_parser/go.mod
  • Added per-platform go build steps (outputs go_parser on Linux/macOS, go_parser.exe on Windows)
  • Updated _go_parser_available() in the test file to try-execute the binary (catches WinError 193 when a Linux ELF is present on Windows) and check both go_parser and go_parser.exe candidates
  • Changed test run command to pytest tests/ -v (all tests) to match the go-tests job pattern

@joshbouncesecurity
Copy link
Copy Markdown
Contributor Author

Addressed all findings from this review. Summary by severity:


🔴 High — call_graph.json missing for JS/Go/C/Ruby/PHP

Went with the most thorough fix: all five parsers now write call_graph.json to output_dir.

Parser Approach
C / Ruby / PHP builder.export() written directly — already correct snake_case format
JavaScript Reads back analyzer_output.json after subprocess, normalises callGraphcall_graph / reverseCallGraphreverse_call_graph
Go Reconstructs from direct_calls/direct_callers in unit metadata (the same fields apply_reachability_filter already falls back to when the file is absent)

Added tests/test_call_graph_output.py with 17 tests covering all 6 languages plus 5 unit tests for the apply_reachability_filter public API. CI (python-tests job) now installs Go and builds go_parser per-platform so those tests don't silently skip. The _go_parser_available() guard now try-executes the binary to catch WinError 193 (Linux ELF on Windows).

The os.path.exists guard + loud WARNING in scanner.py is kept as a safety net for any future parser that doesn't yet write the file.


🟡 Medium — MODEL_PRIMARY string

Fixed: MODEL_PRIMARY = "claude-opus-4-6" to match analyzer.py / enhancer.py / etc. Updated step_context inputs to use the constant rather than a hardcoded string.

🟡 Medium — --limit coupling

Fixed: removed max_units=limit from the analyze_reachability call. LLM reachability now always reviews the full parsed dataset; --limit only constrains the downstream analyse stage as intended.


🟢 Lows

  1. Unused field import — removed.
  2. reachable_from_entryreachable — fixed field name in the prompt projection.
  3. is_entry_point_reasonentry_point_reason — fixed field name; set in apply_signals when LLM promotes a unit.
  4. step_context inputs model string — now uses _LLM_REACH_MODEL constant, not hardcoded "opus".

(Low 3 — MAX_CODE_BYTES — left unchanged for now; that's a tuning call better made with empirical data on real repos rather than guesswork.)

joshbouncesecurity and others added 8 commits May 12, 2026 17:32
Adds an opt-in LLM review stage (off by default, enabled via the new
`--llm-reachability` flag on `openant scan`) that uses a strong model
(Opus by default) to surface additional reachability signals beyond
what the structural pass catches:

- Likely entry points the structural analysis may miss (framework
  hooks, plugin/CLI registrations, message handlers).
- External-input sites (HTTP request bodies, file/network reads,
  env/argv, stdin, untrusted IPC).
- Cross-process / async data-flow indicators.

Signals are advisory and *promote-only*: high-confidence entry-point
signals can set `is_entry_point=True` on a unit, but no signal ever
demotes a unit that the structural analysis already kept. This matches
the "complements, does not replace" intent in issue #17.

Output:
- `llm_reachability.json` written to the scan dir with the full signal
  list.
- Each unit gains an `llm_reachability_signals` array on the dataset.

Cost & rate-limit safety: opt-in only, prompts are batched, and the
client integration goes through the existing `AnthropicClient` (which
respects `GlobalRateLimiter`).

Refs #17.
The Python CLI defines --llm-reachability for the LLM reachability stage
(issue #17), but the Go CLI proxy did not expose it. The test
TestHelp::test_scan_help_advertises_llm_reachability inspects 'openant
scan --help' (Go cobra output) and was failing on all 3 OS targets.

Register --llm-reachability as a Bool flag on the Go scan command and
pass it through to the Python invocation when set.
- scanner.py: forward-declare app_context_path before step 1.5 so the
  LLM reachability block doesn't hit a NameError when --llm-reachability
  is enabled (the block ran before the app-context step that defined it).
- llm_reachability._chunk: non-positive batch_size used to reference an
  unbound loop variable; now collapses to a single batch covering all
  items. Adds a regression test.
- Help text (Python CLI + Go CLI): note that --llm-reachability may
  incur additional LLM cost, per cost-safety review.
The LLM reachability stage threads app_context into its prompt to help
the model reason about expected entry points (web_app vs cli_tool, etc).
The previous ordering ran it before app-context generation, so the
app_context_path was always None at the call site — the prompt threading
silently no-op'd. Reordering the steps makes the threading actually work.

This also retires the temporary forward-declaration introduced in the
previous commit; app_context_path is now defined naturally by the
preceding step before the LLM reachability block reads it.
The stage was running after the structural filter had already discarded
units that weren't reachable from heuristic entry points — meaning the
LLM could never find the missed entry points that are the feature's main
value prop.

Fix:
- When --llm-reachability is set, parse with processing_level="all" so
  every unit is visible to the LLM.
- After apply_signals promotes LLM-identified entry points, re-run the
  structural reachability filter (apply_reachability_filter) with those
  unit IDs added as extra BFS seeds.
- The final dataset is filtered to the user's requested processing_level
  but expanded by any entry points the LLM found.

Also:
- Expose apply_reachability_filter as a public function in
  parser_adapter.py with an extra_entry_points parameter; preserve any
  is_entry_point=True already set by the LLM stage when re-stamping.
- Update help text in cli.py and scan.go to reflect that cost scales with
  total repo size, not the filtered unit count.
- Update llm_reachability.py docstring to document the correct pipeline
  ordering.

Addresses review comment on PR knostic#50.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
High — JS/Go/C/Ruby/PHP don't persist call_graph.json so the post-LLM
re-filter was silently falling back to returning the full unfiltered
dataset. Guard the re-filter call with an os.path.exists check on
call_graph.json; when absent, print a prominent WARNING with the unit
count and cost implication so the user knows --level was not applied.
Also add refilter_supported to the step report summary.

Medium 1 — align MODEL_PRIMARY from "claude-opus-4-20250514" to
"claude-opus-4-6" to match analyzer.py and the rest of the Opus-using
stages. Fix the misleading comment.

Medium 2 — remove max_units=limit from the analyze_reachability call.
--limit governs the analyze stage; the LLM reachability pass must review
the full codebase to find missed entry points. Coverage is controlled by
batch_size.

Low 1 — remove unused `field` import from llm_reachability.py.

Low 2 — fix prompt projection: reachable_from_entry (never written by any
path) → reachable (the actual field name written by the reachability filter).

Low 5 — set entry_point_reason on LLM-promoted units in apply_signals so
the promotion source is visible without cross-referencing
llm_reachability_signals. Also surface the actual MODEL_PRIMARY constant
in the step_context inputs instead of the hardcoded string "opus".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Enables the post-LLM reachability re-filter to work on JS, Go, C, Ruby,
and PHP repositories, not just Python and Zig.

C / Ruby / PHP (trivial — 3 lines each):
  call_graph.json is written immediately after CallGraphBuilder.export()
  in test_pipeline.py. graph_result already contains functions,
  call_graph, and reverse_call_graph in the format apply_reachability_filter
  expects.

JavaScript (easy):
  call_graph.json is written at the end of run_typescript_analyzer(), right
  after analyzer_output.json is available. Keys are normalised from
  camelCase (callGraph, reverseCallGraph) to snake_case so the Python-side
  filter can read them without any extra handling.

Go (moderate):
  call_graph.json is written at the end of run_go_parser_all(), after both
  analyzer_output.json and dataset.json are available. Functions are
  normalised to the camelCase shape EntryPointDetector expects (same
  conversion already done in apply_reachability_filter). Call graph edges
  are reconstructed from unit metadata.direct_calls / direct_callers (the
  same source apply_reachability_filter used).

Tests (test_call_graph_output.py):
  - TestApplyReachabilityFilterPublicAPI (5 tests, always run): verifies
    the public apply_reachability_filter API — basic filtering, extra
    entry points expand the reachable set, is_entry_point stamping,
    LLM-promoted flag preservation, missing-file graceful return.
  - TestPythonCallGraphOutput (2 tests, always run): parse_repository with
    processing_level=all and reachable both produce call_graph.json.
  - TestJavaScript/Go/C/Ruby/PHP CallGraphOutput (2 tests each, skip-guarded):
    same assertions, skip when the parser's runtime dependency is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add Go toolchain setup and per-platform go_parser build steps to the
python-tests CI job so call_graph.json tests for Go and JS don't silently
skip on all platforms. Also harden _go_parser_available() to try-execute
the binary (catching WinError 193 for cross-platform Linux ELF) and check
both go_parser and go_parser.exe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@joshbouncesecurity joshbouncesecurity force-pushed the feat/issue17-llm-reachability branch from c515fa4 to cf005c6 Compare May 12, 2026 14:42
@joshbouncesecurity joshbouncesecurity marked this pull request as ready for review May 12, 2026 14:48
@joshbouncesecurity
Copy link
Copy Markdown
Contributor Author

@ar7casper ready for any further review you want to do

Copy link
Copy Markdown
Collaborator

@ar7casper ar7casper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round-3 review — thanks for the substantive response to round-2.5. You nailed the High + both Mediums + 3 of 5 Lows.

Verified each fix:

Round-2.5 Status
🟠 High — multi-language correctness gap Fixed via the most thorough option. All 5 missing parsers (C/Ruby/PHP/JS/Go) now write call_graph.json (2b0dddf). PLUS defensive os.path.exists guard in scanner.py with a loud cost-impact WARNING if file is ever missing (3df8f0b). 422-line test_call_graph_output.py with 17 cross-parser tests locks in the contract. Belt-and-suspenders.
🟡 Medium — model version MODEL_PRIMARY = "claude-opus-4-6", comment fixed
🟡 Medium — --limit coupling ✅ Removed max_units=limit. Comment explains why.
🟢 Low — unused field import ✅ Removed
🟢 Low — always-null reachable_from_entry field ✅ Prompt projection now uses reachable (actually written by filter)
🟢 Low — observability step_context uses MODEL_PRIMARY constant; LLM-promoted units get entry_point_reason: "llm_reachability: ..."
🟢 Low — MAX_CODE_BYTES = 1500 ⏭️ Deferred (carry-over)
🟢 Low — bare with open(...) style ⏭️ Deferred (carry-over)

Plus the CI tweak (cf005c6) to ensure go_parser binary is built in the python-tests job — necessary for the new cross-parser tests to actually run. All 9 CI checks green.

The commit messages on 3df8f0b and 2b0dddf are exemplary — itemize every finding addressed and the per-parser approach. Easy to verify against round-2.5.

Three small Lows for awareness (none blocking):


🟢 Low — Import-before-sys-path-fix in C/Ruby/PHP test_pipeline.py

2b0dddf added from utilities.file_io import ... to parsers/{c,ruby,php}/test_pipeline.py to support the new call_graph.json writes. But the order is wrong — the import comes before the sys.path setup:

# parsers/php/test_pipeline.py (current)
from utilities.file_io import open_utf8, read_json, run_utf8, write_json   # line 45 — import FIRST
# ...
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))   # line 48 — sys.path LATER

Compare to JS/Go (which got the correct order in #56):

# parsers/javascript/test_pipeline.py
_core_root = _parser_dir.parent.parent
if str(_core_root) not in sys.path:
    sys.path.insert(0, str(_core_root))   # sys.path FIRST
# ...
from utilities.file_io import open_utf8, read_json, run_utf8, write_json   # import AFTER

CI passes because the python-tests workflow does pip install -e . which makes utilities globally importable from the venv (subprocess inherits). Production also works for the same reason. But local dev without pip install -e . hits ModuleNotFoundError: No module named 'utilities.file_io' when the parser subprocess is spawned — I confirmed this locally; 6 cross-parser tests fail without PYTHONPATH=. set; pass with it.

Suggestion: swap the order in C/Ruby/PHP test_pipeline.py to match the JS/Go pattern. Trivial 3-line change per file. Low impact (works in CI / production already), but consistency with #56's JS/Go pattern would close the local-dev gap.


🟢 Low — Cross-parser test contract doesn't verify unit_type field shape

_assert_call_graph_valid in test_call_graph_output.py:88-100 only checks that call_graph.json exists with the 3 required top-level keys. It doesn't verify the functions value shape — JS and Go's unitType (camelCase) versus Python/C/Ruby/PHP's unit_type (snake_case) inconsistency persists and isn't caught.

This is the documented pre-existing parser-issues #23-26 / Room-for-Improvement #10 issue. PR 50 didn't introduce it, but the new cross-parser test contract was a natural place to lock down the schema. Doesn't block this PR — for the LLM reachability functionality, LLM-promoted IDs flow through correctly via extra_entry_points. The structural-detection gap means JS/Go scans get fewer Check-1 (unit_type) entry points than they should, same as before this PR.

Suggestion (post-merge follow-up): add a contract assertion that unit_type exists in each function entry, surfacing the camelCase/snake_case drift if it ever causes a test to fail. Or normalize at the test level.


🟢 Low — PR body stale

Body still describes the original scope. Doesn't mention:

  • The cross-parser call_graph.json expansion (5 more parsers now write it)
  • The defensive WARNING for unsupported languages
  • That parse runs with processing_level="all" when --llm-reachability is set

Suggestion: add a "Round-2.5 fix summary" section or rewrite the Output bullet list to include the new behavior.


Verdict: ✅ ready to merge. The 3 Lows are housekeeping. Approving via UI.

@ar7casper ar7casper merged commit 8be2cd2 into knostic:release/2026-05-12 May 13, 2026
9 checks passed
@ar7casper ar7casper mentioned this pull request May 13, 2026
3 tasks
ar7casper added a commit that referenced this pull request May 13, 2026
Covers the seven PRs in this release:
- #35: parse --level default → reachable (CLI consistency with scan + Python CLI)
- #36: auto-detect dep changes via ~/.openant/venv/.deps-hash
- #37: lazy JS parser npm bootstrap on first use
- #39: TypeScript/NestJS DI-aware call resolution (constructor + field + functional inject())
- #40: --language auto opt-in for openant init + non-git path support + shared config/languages.json
- #49: Express anonymous route handler extraction (route_handler / route_middleware)
- #50: --llm-reachability opt-in stage + cross-parser call_graph.json contract

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants