engine: wire LLM TOC builder in cmd/engine (HAL-317)#44

Merged
hallelx2 merged 1 commit into main from halleluyaholudele/hal-317-cmd-engine-wire-toc
Jun 18, 2026

Conversation

@hallelx2

@hallelx2 hallelx2 commented Jun 18, 2026

Owner

What

cmd/engine never set TOCEnabled on the ingest Pipeline, so Pipeline.TOCEnabled defaulted to false, runTOCBuilder never ran, and documents.toc_tree stayed NULL on the standalone engine — the binary the OSS launch, --local mode (HAL-186), and the Docker image (HAL-185) all use.

That silently disabled the treewalk citation title_path (HAL-70): tree.BuildHeadingPaths (HAL-109) received an empty TOC and returned nothing, so the structural-citation differentiator was dormant on the entire OSS path. config.ingest.toc.enabled: true was ignored. cmd/server already wired these four fields; this mirrors it.

How

  • Wire TOCEnabled / TOCModel / TOCConcurrency / TOCCheckPages from cfg.Ingest.TOC in the cmd/engine Pipeline.
  • Surface ingest_mode + toc_enabled in the startup log so a misconfig is visible at boot.
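
The wiring described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the struct definitions (TOCConfig, IngestConfig, Config, PipelineOptions), the field types, and the helper pipelineOptionsFrom are hypothetical stand-ins. Only the field names come from the PR.

```go
package main

import "fmt"

// TOCConfig, IngestConfig and Config are hypothetical stand-ins for the
// project's config types. Field names follow the PR; field types are assumed.
type TOCConfig struct {
	Enabled       bool
	Model         string
	Concurrency   int
	TOCCheckPages int
}

type IngestConfig struct {
	TOC                  TOCConfig
	GlobalLLMConcurrency int
}

type Config struct {
	Ingest IngestConfig
}

// PipelineOptions is a hypothetical stand-in for the ingest Pipeline fields
// named in the PR.
type PipelineOptions struct {
	TOCEnabled           bool
	TOCModel             string
	TOCConcurrency       int
	TOCCheckPages        int
	GlobalLLMConcurrency int
}

// pipelineOptionsFrom copies every TOC field explicitly. Any field left out
// of the literal keeps its Go zero value, which for TOCEnabled is false.
func pipelineOptionsFrom(cfg Config) PipelineOptions {
	return PipelineOptions{
		TOCEnabled:           cfg.Ingest.TOC.Enabled,
		TOCModel:             cfg.Ingest.TOC.Model,
		TOCConcurrency:       cfg.Ingest.TOC.Concurrency,
		TOCCheckPages:        cfg.Ingest.TOC.TOCCheckPages,
		GlobalLLMConcurrency: cfg.Ingest.GlobalLLMConcurrency,
	}
}

func main() {
	cfg := Config{Ingest: IngestConfig{
		TOC:                  TOCConfig{Enabled: true, Model: "example-model", Concurrency: 4, TOCCheckPages: 20},
		GlobalLLMConcurrency: 8,
	}}
	fmt.Printf("%+v\n", pipelineOptionsFrom(cfg))
}
```

The point of the sketch is the failure mode: omitting a field from a struct literal compiles cleanly and silently yields the zero value, which is why the missing wiring went unnoticed.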

Verification (honest)

Pre-fix: full PDF ingest logged parse → summarize+hyde → ready with no TOC stage line at all, and toc_tree NULL.

Post-fix: the TOC stage now runs. This was confirmed live by the new log line ingest: toc-builder failed; falling back to NULL toc_tree, which shows the builder is now invoked rather than skipped. In that particular run the build then failed with anthropic: no response (GLM/z.ai provider flakiness, tracked separately in HAL-73), so toc_tree was still NULL for that run. The wiring fix itself works; populating the tree end-to-end now depends only on a healthy LLM provider. go build/vet clean.
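
The difference between "skipped" and "ran but failed" in the verification above can be sketched like this. It is a simplified model under stated assumptions: the tocBuilder type, buildTOC, staticBuilder and failingBuilder are hypothetical names, and a nil pointer stands in for a NULL toc_tree. Only the log line and the provider error text are taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// tocBuilder is a hypothetical stand-in for the LLM-backed TOC build step.
type tocBuilder func(doc string) (string, error)

// staticBuilder returns a builder that always succeeds with the given tree.
func staticBuilder(tree string) tocBuilder {
	return func(string) (string, error) { return tree, nil }
}

// failingBuilder returns a builder that always fails with the given message.
func failingBuilder(msg string) tocBuilder {
	return func(string) (string, error) { return "", errors.New(msg) }
}

// buildTOC returns the tree to store (nil models a NULL toc_tree) and whether
// the stage was invoked at all.
func buildTOC(enabled bool, build tocBuilder, doc string) (tree *string, ran bool) {
	if !enabled {
		// Pre-fix behavior on cmd/engine: the stage is never reached.
		return nil, false
	}
	t, err := build(doc)
	if err != nil {
		// Post-fix behavior when the provider is unhealthy.
		fmt.Println("ingest: toc-builder failed; falling back to NULL toc_tree")
		return nil, true
	}
	return &t, true
}

func main() {
	_, ran := buildTOC(false, staticBuilder("{}"), "doc")
	fmt.Println("disabled: ran =", ran)

	tree, ran := buildTOC(true, failingBuilder("anthropic: no response"), "doc")
	fmt.Println("provider failure: ran =", ran, "tree is nil =", tree == nil)

	tree, ran = buildTOC(true, staticBuilder("{}"), "doc")
	fmt.Println("healthy provider: ran =", ran, "tree =", *tree)
}
```

Both the first and second cases leave the stored tree empty, so the stored value alone cannot distinguish them; the log line is what shows the stage ran.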

Closes HAL-317

cmd/engine's Pipeline construction set HyDE + SummaryAxes but never set
TOCEnabled/TOCModel/TOCConcurrency/TOCCheckPages, so Pipeline.TOCEnabled
defaulted to false, runTOCBuilder never ran, and documents.toc_tree stayed
NULL on the standalone engine — the binary the OSS launch, --local mode, and
the Docker image all use. That silently disabled the treewalk citation
title_path (HAL-70): BuildHeadingPaths got an empty TOC and returned nothing.

cmd/server already wired these; this mirrors it. Also surfaces ingest_mode +
toc_enabled in the startup log so a misconfig is visible at boot.

Closes HAL-317.
@coderabbitai

coderabbitai Bot commented Jun 18, 2026


Warning

Review limit reached

@hallelx2, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 31 minutes and 57 seconds. Learn how PR review limits work.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04a53ddf-d3b2-4fc3-8ce7-b97cdab86ee4

📥 Commits

Reviewing files that changed from the base of the PR and between c15175a and 1cc4490.

📒 Files selected for processing (1)
  • cmd/engine/main.go

@sourcery-ai

sourcery-ai Bot commented Jun 18, 2026

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Wires the LLM-based TOC builder configuration into cmd/engine’s ingest pipeline and surfaces ingest mode and TOC enablement in startup logs so TOC-based features work correctly on the standalone engine.

Flow diagram for TOC builder wiring in ingest pipeline

flowchart TD
    A[cfg.Ingest.TOC.Enabled] --> B[Pipeline.TOCEnabled]
    B --> C{TOCEnabled?}
    C -- true --> D[runTOCBuilder]
    D --> E[documents.toc_tree]
    E --> F[tree.BuildHeadingPaths]
    F --> G[title_path]
    C -- false --> H[toc_tree NULL and title_path disabled]

File-Level Changes

Change: Wire TOC-related configuration from cfg.Ingest.TOC into the cmd/engine ingest Pipeline so the LLM TOC builder actually runs.
Details:
  • Populate Pipeline.TOCEnabled from cfg.Ingest.TOC.Enabled instead of relying on the false default.
  • Pass through TOCModel, TOCConcurrency, and TOCCheckPages from cfg.Ingest.TOC to the Pipeline configuration.
  • Ensure GlobalLLMConcurrency is still set on the Pipeline alongside the new TOC fields.
Files: cmd/engine/main.go

Change: Expose ingest mode and TOC enablement in engine startup logs to make misconfiguration visible.
Details:
  • Add ingest_mode field (from cfg.Ingest.Mode) to the structured startup logger call.
  • Add toc_enabled field (from cfg.Ingest.TOC.Enabled) to the structured startup logger call.
Files: cmd/engine/main.go
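
A rough sketch of the startup-log change, assuming a key/value structured logger. The PR does not say which logging library cmd/engine uses, so log/slog from the Go standard library is used here purely for illustration. The helper startupFields, the message text, and the mode value "minimal" are hypothetical; only the field names ingest_mode and toc_enabled come from the PR.

```go
package main

import (
	"log/slog"
	"os"
)

// startupFields builds the key/value pairs added to the startup log line.
// The helper itself is hypothetical; the keys are the ones named in the PR.
func startupFields(ingestMode string, tocEnabled bool) []any {
	return []any{
		"ingest_mode", ingestMode,
		"toc_enabled", tocEnabled,
	}
}

func main() {
	// slog is an assumption; the project may use a different logger.
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	logger.Info("engine starting", startupFields("minimal", true)...)
}
```

Logging both values side by side at boot makes a conflicting combination visible without reading the config file.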

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="cmd/engine/main.go" line_range="199-203" />
<code_context>
+		// these, Pipeline.TOCEnabled defaults to false, runTOCBuilder never
+		// runs, documents.toc_tree stays NULL, and the treewalk citation
+		// title_path (HAL-70) can never resolve on the standalone engine.
+		TOCEnabled:           cfg.Ingest.TOC.Enabled,
+		TOCModel:             cfg.Ingest.TOC.Model,
+		TOCConcurrency:       cfg.Ingest.TOC.Concurrency,
+		TOCCheckPages:        cfg.Ingest.TOC.TOCCheckPages,
+		GlobalLLMConcurrency: cfg.Ingest.GlobalLLMConcurrency,
 	})
 	if cfg.Ingest.Mode == ingest.ModeMinimal {
</code_context>
<issue_to_address>
**issue (bug_risk):** Clarify precedence between TOC config flags and `Ingest.ModeMinimal` behavior to avoid silent no-ops.

These TOC settings are applied regardless of ingest mode, but MINIMAL mode later skips TOC entirely. So a config with `TOC.Enabled=true` and `Ingest.Mode=ModeMinimal` is effectively ignored without any signal. Consider either forcing `TOCEnabled=false` when `ModeMinimal` is selected, or emitting a warning / error when TOC is configured but incompatible with the ingest mode, so operators get clear feedback instead of a silent no-op.
</issue_to_address>
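
One possible shape for the suggestion above, forcing the flag off in minimal mode and returning a warning for the operator. This is a hedged sketch, not the merged code: the Mode type, the string values of the constants, and effectiveTOCEnabled are hypothetical, and the PR as merged does not include this check.

```go
package main

import "fmt"

// Mode and its values are hypothetical stand-ins for the ingest package's
// mode type; only the name ModeMinimal appears in the PR.
type Mode string

const (
	ModeMinimal Mode = "minimal"
	ModeFull    Mode = "full"
)

// effectiveTOCEnabled resolves the precedence between the ingest mode and the
// TOC flag. Minimal mode wins, and a non-empty warning is returned when the
// configured flag is being overridden so the caller can log it.
func effectiveTOCEnabled(mode Mode, tocEnabled bool) (enabled bool, warning string) {
	if mode == ModeMinimal && tocEnabled {
		return false, "toc is enabled in config but ingest mode is minimal; the TOC stage will be skipped"
	}
	return tocEnabled, ""
}

func main() {
	enabled, warning := effectiveTOCEnabled(ModeMinimal, true)
	fmt.Println("enabled:", enabled)
	if warning != "" {
		fmt.Println("warning:", warning)
	}
}
```

Returning the warning instead of logging inside the helper keeps the precedence rule testable and leaves the choice between warn and hard error to the caller.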

@hallelx2 hallelx2 merged commit 7a21b11 into main Jun 18, 2026
8 checks passed
@hallelx2 hallelx2 deleted the halleluyaholudele/hal-317-cmd-engine-wire-toc branch June 18, 2026 14:33