engine: wire LLM TOC builder in cmd/engine (HAL-317)#44
Conversation
cmd/engine's Pipeline construction set HyDE + SummaryAxes but never set TOCEnabled/TOCModel/TOCConcurrency/TOCCheckPages, so Pipeline.TOCEnabled defaulted to false, runTOCBuilder never ran, and documents.toc_tree stayed NULL on the standalone engine — the binary the OSS launch, --local mode, and the Docker image all use. That silently disabled the treewalk citation title_path (HAL-70): BuildHeadingPaths got an empty TOC and returned nothing. cmd/server already wired these; this mirrors it. Also surfaces ingest_mode + toc_enabled in the startup log so a misconfig is visible at boot. Closes HAL-317.
|
Warning Review limit reached
More reviews will be available in 31 minutes and 57 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer's guide (collapsed on small PRs)Reviewer's GuideWires the LLM-based TOC builder configuration into cmd/engine’s ingest pipeline and surfaces ingest mode and TOC enablement in startup logs so TOC-based features work correctly on the standalone engine. Flow diagram for TOC builder wiring in ingest pipelineflowchart TD
A[cfg.Ingest.TOC.Enabled] --> B[Pipeline.TOCEnabled]
B --> C{TOCEnabled?}
C -- true --> D[runTOCBuilder]
D --> E[documents.toc_tree]
E --> F[tree.BuildHeadingPaths]
F --> G[title_path]
C -- false --> H[toc_tree NULL and title_path disabled]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="cmd/engine/main.go" line_range="199-203" />
<code_context>
+ // these, Pipeline.TOCEnabled defaults to false, runTOCBuilder never
+ // runs, documents.toc_tree stays NULL, and the treewalk citation
+ // title_path (HAL-70) can never resolve on the standalone engine.
+ TOCEnabled: cfg.Ingest.TOC.Enabled,
+ TOCModel: cfg.Ingest.TOC.Model,
+ TOCConcurrency: cfg.Ingest.TOC.Concurrency,
+ TOCCheckPages: cfg.Ingest.TOC.TOCCheckPages,
+ GlobalLLMConcurrency: cfg.Ingest.GlobalLLMConcurrency,
})
if cfg.Ingest.Mode == ingest.ModeMinimal {
</code_context>
<issue_to_address>
**issue (bug_risk):** Clarify precedence between TOC config flags and `Ingest.ModeMinimal` behavior to avoid silent no-ops.
These TOC settings are applied regardless of ingest mode, but MINIMAL mode later skips TOC entirely. So a config with `TOC.Enabled=true` and `Ingest.Mode=ModeMinimal` is effectively ignored without any signal. Consider either forcing `TOCEnabled=false` when `ModeMinimal` is selected, or emitting a warning / error when TOC is configured but incompatible with the ingest mode, so operators get clear feedback instead of a silent no-op.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| TOCEnabled: cfg.Ingest.TOC.Enabled, | ||
| TOCModel: cfg.Ingest.TOC.Model, | ||
| TOCConcurrency: cfg.Ingest.TOC.Concurrency, | ||
| TOCCheckPages: cfg.Ingest.TOC.TOCCheckPages, | ||
| GlobalLLMConcurrency: cfg.Ingest.GlobalLLMConcurrency, |
There was a problem hiding this comment.
issue (bug_risk): Clarify precedence between TOC config flags and Ingest.ModeMinimal behavior to avoid silent no-ops.
These TOC settings are applied regardless of ingest mode, but MINIMAL mode later skips TOC entirely. So a config with TOC.Enabled=true and Ingest.Mode=ModeMinimal is effectively ignored without any signal. Consider either forcing TOCEnabled=false when ModeMinimal is selected, or emitting a warning / error when TOC is configured but incompatible with the ingest mode, so operators get clear feedback instead of a silent no-op.
What
cmd/enginenever setTOCEnabledon the ingest Pipeline, soPipeline.TOCEnableddefaulted to false,runTOCBuildernever ran, anddocuments.toc_treestayed NULL on the standalone engine — the binary the OSS launch,--localmode (HAL-186), and the Docker image (HAL-185) all use.That silently disabled the treewalk citation
title_path(HAL-70):tree.BuildHeadingPaths(HAL-109) received an empty TOC and returned nothing, so the structural-citation differentiator was dormant on the entire OSS path.config.ingest.toc.enabled: truewas ignored.cmd/serveralready wired these four fields; this mirrors it.How
TOCEnabled / TOCModel / TOCConcurrency / TOCCheckPagesfromcfg.Ingest.TOCin thecmd/enginePipeline.ingest_mode+toc_enabledin the startup log so a misconfig is visible at boot.Verification (honest)
Pre-fix: full PDF ingest logged
parse → summarize+hyde → readywith no TOC stage line at all, andtoc_treeNULL.Post-fix: the TOC stage now runs — confirmed live by the new log line
ingest: toc-builder failed; falling back to NULL toc_tree(it's now invoked rather than skipped). In that particular run the build then failed onanthropic: no response(GLM/z.ai provider flakiness, tracked separately in HAL-73), sotoc_treewas still NULL for that run — but the wiring fix is doing its job; populating the tree end-to-end now depends only on a healthy LLM provider.go build/vetclean.Closes HAL-317