Schema-aware coercion of per-call meta arguments (fix Qwen near-miss tool-arg rejections)#1470
Merged
Aaronontheweb merged 8 commits intoJun 24, 2026
Conversation
ChatGPT-trained models (Qwen3.6) name the per-call meta fields the ChatGPT way — TimeoutSeconds/Rationale/bare Timeout instead of the canonical _timeout_seconds/_rationale/_background. Since netclaw-dev#1398's loud validation, those near-misses are hard-rejected ("tool was NOT executed"), which pushes the model off tools and into emitting ChatGPT-style sandbox: download links instead of calling attach_file (observed in sessions D0AC6CKBK5K_1781746527 and C0B1GKNLZD3_1782215358). Resolve a supplied key to a meta field when its normalized form matches a canonical name AND the tool does not declare a real parameter that binds to it. This is schema-aware: a third-party MCP server's own 'timeout' parameter is forwarded untouched, while the model's near-miss is still coerced where no such parameter exists. The value is consumed, never silently dropped — preserving the no-silent-fallback invariant netclaw-dev#1398 established. Ambiguous double-spellings and invalid values are still rejected loudly. - ToolArgumentValidator.ResolveMetaField(tool, key): single schema-aware resolver; validator cache switched to instance-keyed (ConditionalWeakTable) so MCP tools sharing McpToolAdapter no longer conflate schemas. - ToolCallMeta.ExtractFrom / ValidateMetaValues / Extract take a resolver delegate (default exact, schema-blind) used by persistence; the executor passes the schema-aware resolver via IToolExecutor.PrepareToolCall. - Sub-agent loop now runs ValidateToolCall -> PrepareToolCall -> apply timeout hint, the same shared seam as the main pipeline (it previously skipped extraction and silently dropped the hint). - Loud default in the meta switch; dead SuggestFor branch removed; recognition loops merged. - netclaw-operations skill v2.15.0; tool_timeout_arg_recovery eval comment updated. Full Netclaw.Actors.Tests green (2471); Daemon provider-boundary green (91); slopwatch clean; headers verified.
…review findings Follow-up to the schema-aware meta-arg change (d43a16f), addressing an xhigh code review. Three substantive fixes plus cleanups. Correctness — ambiguity guard now covers MCP. The 'two keys map to one meta field' guard lived only in ToolArgumentValidator.ValidateArgumentKeys, which the executor skips for MCP tools; but MCP gets schema-aware near-miss extraction, so a double-spelled meta key on an MCP call silently last-write-wins. Moved the guard into ToolCallMetaExtractor.ValidateMetaValues (runs for every tool via DispatchingToolExecutor.ValidateArguments), checked ahead of value errors so it keeps priority. Removed the now-redundant guard from ValidateArgumentKeys. Altitude — one InterpretToolCall seam. Added IToolExecutor.InterpretToolCall (returns rejection? + meta + cleaned), which resolves the tool and builds the meta resolver ONCE, then validates and (only on success) extracts via a shared ValidateCore. The main pipeline and the sub-agent loop both route through it instead of calling ValidateToolCall then PrepareToolCall as two un-ordered steps (removes the double registry lookup and the extract-without-validate footgun). Timeout-hint application centralized in ToolExecutionContext.ApplyMeta (used by the sub-agent's two sites). Sub-agent's intentional ignore of _background/_rationale is now commented. Persistence matches runtime — ChatMessageConverter.FromAiMessage takes an optional schema-aware interpreter; LlmSessionActor passes the executor's PrepareToolCall at the tool-call persistence site, so a near-miss meta key is stripped and captured in MetaJson in recorded history (was persisted raw, schema-blind). Default stays exact for tests/replay. Cleanups: PrepareToolCall default delegates to ToolCallMetaExtractor.Extract; ValidateArgumentKeys computes NormalizeKey once; removed the stale 'exact-match for MCP' docs and the duplicate XML-doc on ResolveMetaField; added a loud default in the ExtractFrom meta switch plus a test that every MetaFieldNames entry is bound. Accepted (documented): undeclared near-miss on additionalProperties:true MCP schemas (off the declared interface; narrow); ConditionalWeakTable vs ConcurrentDictionary micro-cost (CWT required for per-instance MCP schemas). Tests: MCP ambiguity rejection; InterpretToolCall success/rejection; persistence near-miss round-trip; exact-resolver tests renamed off the misleading 'MCP' name; meta-field coverage guard. Full Netclaw.Actors.Tests green (2476); Daemon provider-boundary green (91); slopwatch clean; headers verified.
A tool call that needs approval, whose session passivates before approval and then cold-recovers, is re-driven through ChatMessageConverter.ToAiMessage, which rebuilt the call from ArgumentsJson only and never re-injected the meta hints that persistence had stripped into MetaJson. So the re-driven execution extracted no meta and silently fell back to the default timeout — dropping the agent's _timeout_seconds (and _background/_rationale). This reproduced the original 'long call killed at the default timeout' symptom on the post-approval re-drive path. Pre-existing (it affected canonical _timeout_seconds before the meta-arg work too; MetaJson was written and round-tripped but only ever read by CompactionPromptBuilder, never by the re-drive dispatch). Surfaced by the deep persistence review. ToAiMessage gains a reinjectMeta flag (default false): when set it parses the persisted MetaJson and re-injects the canonical keys (_rationale/_timeout_seconds/ _background) into the reconstructed arguments, so the re-dispatched call's extraction reapplies the hint. RedriveToolBatchForApproval opts in; the other four ToAiMessage callers (outbound provider history, compaction, sub-agent) keep it false — the model must never receive meta keys. Re-drive reconstructs for execution only (not re-persisted), so no meta keys leak back into the journal. Test: ToAiMessage_reinjectMeta_restores_hint_for_redrive_but_not_outbound proves the strip→reinject→re-extract round-trip restores the 1800s hint on re-drive while outbound history stays clean. Full Netclaw.Actors.Tests green (2477); slopwatch clean; headers verified.
Collaborator
Author
|
Need to run evals for this before I review this PR |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Self-hosted Qwen3.6 emits per-call meta hints (
_rationale/_timeout_seconds/_background) using ChatGPT-style names without the underscore —TimeoutSeconds,Timeout,Rationale. Since #1398's loud tool-argument validation, those near-misses are hard-rejected ("the tool was NOT executed"), which pushed the model off tools entirely and into emitting ChatGPT-stylesandbox:/…download links instead of callingattach_file.Observed in production sessions
D0AC6CKBK5K/1781746527.976389andC0B1GKNLZD3/1782215358.474099: 26 "Unrecognized argument" rejections across the fleet in the days after the 0.24 deploy (Rationale→_rationale,TimeoutSeconds→_timeout_seconds, …), and a cluster of deadsandbox:links the user kept correcting by hand.Approach
Make per-call meta-argument resolution schema-aware, applied uniformly to every tool:
The daemon loads the full schema for every tool — native and MCP — so the declared-param check is authoritative everywhere; no native/MCP special-casing. The no-silent-discard invariant holds for all tools: every supplied value is either consumed (correctly mapped) or loudly rejected (unknown key / invalid value / ambiguous spelling).
What's in here (3 commits)
ToolArgumentValidator.ResolveMetaField(tool, key)resolves near-miss meta names but yields to a tool's own declared parameter (so a third-party MCPtimeoutis forwarded, never hijacked). Validation and extraction share this one resolver. Validator cache switched to per-instance (ConditionalWeakTable) because MCP tools share a type but carry distinct schemas.InterpretToolCallseam — oneIToolExecutormethod (validate + extract in one step, resolving the tool and resolver once) that both the main pipeline and the sub-agent route through, so neither can skip a step. Fixes the sub-agent silently dropping timeout hints (it previously didn't extract meta at all). The meta-ambiguity guard moved toValidateMetaValues, which runs for every tool (it was native-only, leaving an MCP silent-drop).ToAiMessage(reinjectMeta: true)re-injects the persistedMetaJsonhints on the approval re-drive path, so a re-executed call honors its timeout instead of falling back to the default (a pre-existing latent bug surfaced by the deep review;MetaJsonwas written but never read by the re-drive dispatch).Persistence now records tool calls exactly as the executor interprets them (
LlmSessionActorpasses the schema-aware interpreter toChatMessageConverter.FromAiMessage), so history matches what ran.Review
Authored against an xhigh code review (10 finder angles) plus a focused deep review of the persistence/re-drive seam (3 adversarial tracers). Findings either fixed (ambiguity gap, persistence divergence, re-drive meta loss, stale docs, dead code, dup work) or accepted with rationale in the commit bodies (undeclared near-miss on
additionalProperties:trueMCP schemas — off the declared interface, narrow; CWT vs ConcurrentDictionary micro-cost — required for per-instance MCP schemas).Verification
Netclaw.Actors.Tests: 2477 passed;Netclaw.Daemon.Tests: 738 passed.InterpretToolCallsuccess/rejection, schema-aware resolution yields to declared params, persistence near-miss round-trip, re-drive hint restoration, meta-field switch-coverage guard.dotnet slopwatch analyzeclean; copyright headers verified.netclaw-operationsskill bumped to v2.15.0 (spelling-tolerant arg guidance).Eval/smoke suites N/A (no identity/TUI/skill-matching surface changed). The eval case
tool_timeout_arg_recoverystill passes by construction; worth a run against the live Qwen endpoint before release.