Schema-aware coercion of per-call meta arguments (fix Qwen near-miss tool-arg rejections) by Aaronontheweb · Pull Request #1470 · netclaw-dev/netclaw

Aaronontheweb · 2026-06-23T22:28:11Z

Problem

Self-hosted Qwen3.6 emits per-call meta hints (_rationale/_timeout_seconds/_background) using ChatGPT-style names without the underscore — TimeoutSeconds, Timeout, Rationale. Since #1398's loud tool-argument validation, those near-misses are hard-rejected ("the tool was NOT executed"), which pushed the model off tools entirely and into emitting ChatGPT-style sandbox:/… download links instead of calling attach_file.

Observed in production sessions D0AC6CKBK5K/1781746527.976389 and C0B1GKNLZD3/1782215358.474099: 26 "Unrecognized argument" rejections across the fleet in the days after the 0.24 deploy (Rationale→_rationale, TimeoutSeconds→_timeout_seconds, …), and a cluster of dead sandbox: links the user kept correcting by hand.

Approach

Make per-call meta-argument resolution schema-aware, applied uniformly to every tool:

A key is a meta field iff it's an exact canonical name (_timeout_seconds…), or it resolves to a meta name and is not a declared parameter of this tool.

The daemon loads the full schema for every tool — native and MCP — so the declared-param check is authoritative everywhere; no native/MCP special-casing. The no-silent-discard invariant holds for all tools: every supplied value is either consumed (correctly mapped) or loudly rejected (unknown key / invalid value / ambiguous spelling).

What's in here (3 commits)

Schema-aware coercion — ToolArgumentValidator.ResolveMetaField(tool, key) resolves near-miss meta names but yields to a tool's own declared parameter (so a third-party MCP timeout is forwarded, never hijacked). Validation and extraction share this one resolver. Validator cache switched to per-instance (ConditionalWeakTable) because MCP tools share a type but carry distinct schemas.
Unified InterpretToolCall seam — one IToolExecutor method (validate + extract in one step, resolving the tool and resolver once) that both the main pipeline and the sub-agent route through, so neither can skip a step. Fixes the sub-agent silently dropping timeout hints (it previously didn't extract meta at all). The meta-ambiguity guard moved to ValidateMetaValues, which runs for every tool (it was native-only, leaving an MCP silent-drop).
Re-drive preserves meta — ToAiMessage(reinjectMeta: true) re-injects the persisted MetaJson hints on the approval re-drive path, so a re-executed call honors its timeout instead of falling back to the default (a pre-existing latent bug surfaced by the deep review; MetaJson was written but never read by the re-drive dispatch).

Persistence now records tool calls exactly as the executor interprets them (LlmSessionActor passes the schema-aware interpreter to ChatMessageConverter.FromAiMessage), so history matches what ran.

Review

Authored against an xhigh code review (10 finder angles) plus a focused deep review of the persistence/re-drive seam (3 adversarial tracers). Findings either fixed (ambiguity gap, persistence divergence, re-drive meta loss, stale docs, dead code, dup work) or accepted with rationale in the commit bodies (undeclared near-miss on additionalProperties:true MCP schemas — off the declared interface, narrow; CWT vs ConcurrentDictionary micro-cost — required for per-instance MCP schemas).

Verification

Netclaw.Actors.Tests: 2477 passed; Netclaw.Daemon.Tests: 738 passed.
New coverage: MCP ambiguity rejection, InterpretToolCall success/rejection, schema-aware resolution yields to declared params, persistence near-miss round-trip, re-drive hint restoration, meta-field switch-coverage guard.
dotnet slopwatch analyze clean; copyright headers verified.
netclaw-operations skill bumped to v2.15.0 (spelling-tolerant arg guidance).

Eval/smoke suites N/A (no identity/TUI/skill-matching surface changed). The eval case tool_timeout_arg_recovery still passes by construction; worth a run against the live Qwen endpoint before release.

ChatGPT-trained models (Qwen3.6) name the per-call meta fields the ChatGPT way — TimeoutSeconds/Rationale/bare Timeout instead of the canonical _timeout_seconds/_rationale/_background. Since netclaw-dev#1398's loud validation, those near-misses are hard-rejected ("tool was NOT executed"), which pushes the model off tools and into emitting ChatGPT-style sandbox: download links instead of calling attach_file (observed in sessions D0AC6CKBK5K_1781746527 and C0B1GKNLZD3_1782215358). Resolve a supplied key to a meta field when its normalized form matches a canonical name AND the tool does not declare a real parameter that binds to it. This is schema-aware: a third-party MCP server's own 'timeout' parameter is forwarded untouched, while the model's near-miss is still coerced where no such parameter exists. The value is consumed, never silently dropped — preserving the no-silent-fallback invariant netclaw-dev#1398 established. Ambiguous double-spellings and invalid values are still rejected loudly. - ToolArgumentValidator.ResolveMetaField(tool, key): single schema-aware resolver; validator cache switched to instance-keyed (ConditionalWeakTable) so MCP tools sharing McpToolAdapter no longer conflate schemas. - ToolCallMeta.ExtractFrom / ValidateMetaValues / Extract take a resolver delegate (default exact, schema-blind) used by persistence; the executor passes the schema-aware resolver via IToolExecutor.PrepareToolCall. - Sub-agent loop now runs ValidateToolCall -> PrepareToolCall -> apply timeout hint, the same shared seam as the main pipeline (it previously skipped extraction and silently dropped the hint). - Loud default in the meta switch; dead SuggestFor branch removed; recognition loops merged. - netclaw-operations skill v2.15.0; tool_timeout_arg_recovery eval comment updated. Full Netclaw.Actors.Tests green (2471); Daemon provider-boundary green (91); slopwatch clean; headers verified.

…review findings Follow-up to the schema-aware meta-arg change (d43a16f), addressing an xhigh code review. Three substantive fixes plus cleanups. Correctness — ambiguity guard now covers MCP. The 'two keys map to one meta field' guard lived only in ToolArgumentValidator.ValidateArgumentKeys, which the executor skips for MCP tools; but MCP gets schema-aware near-miss extraction, so a double-spelled meta key on an MCP call silently last-write-wins. Moved the guard into ToolCallMetaExtractor.ValidateMetaValues (runs for every tool via DispatchingToolExecutor.ValidateArguments), checked ahead of value errors so it keeps priority. Removed the now-redundant guard from ValidateArgumentKeys. Altitude — one InterpretToolCall seam. Added IToolExecutor.InterpretToolCall (returns rejection? + meta + cleaned), which resolves the tool and builds the meta resolver ONCE, then validates and (only on success) extracts via a shared ValidateCore. The main pipeline and the sub-agent loop both route through it instead of calling ValidateToolCall then PrepareToolCall as two un-ordered steps (removes the double registry lookup and the extract-without-validate footgun). Timeout-hint application centralized in ToolExecutionContext.ApplyMeta (used by the sub-agent's two sites). Sub-agent's intentional ignore of _background/_rationale is now commented. Persistence matches runtime — ChatMessageConverter.FromAiMessage takes an optional schema-aware interpreter; LlmSessionActor passes the executor's PrepareToolCall at the tool-call persistence site, so a near-miss meta key is stripped and captured in MetaJson in recorded history (was persisted raw, schema-blind). Default stays exact for tests/replay. Cleanups: PrepareToolCall default delegates to ToolCallMetaExtractor.Extract; ValidateArgumentKeys computes NormalizeKey once; removed the stale 'exact-match for MCP' docs and the duplicate XML-doc on ResolveMetaField; added a loud default in the ExtractFrom meta switch plus a test that every MetaFieldNames entry is bound. Accepted (documented): undeclared near-miss on additionalProperties:true MCP schemas (off the declared interface; narrow); ConditionalWeakTable vs ConcurrentDictionary micro-cost (CWT required for per-instance MCP schemas). Tests: MCP ambiguity rejection; InterpretToolCall success/rejection; persistence near-miss round-trip; exact-resolver tests renamed off the misleading 'MCP' name; meta-field coverage guard. Full Netclaw.Actors.Tests green (2476); Daemon provider-boundary green (91); slopwatch clean; headers verified.

A tool call that needs approval, whose session passivates before approval and then cold-recovers, is re-driven through ChatMessageConverter.ToAiMessage, which rebuilt the call from ArgumentsJson only and never re-injected the meta hints that persistence had stripped into MetaJson. So the re-driven execution extracted no meta and silently fell back to the default timeout — dropping the agent's _timeout_seconds (and _background/_rationale). This reproduced the original 'long call killed at the default timeout' symptom on the post-approval re-drive path. Pre-existing (it affected canonical _timeout_seconds before the meta-arg work too; MetaJson was written and round-tripped but only ever read by CompactionPromptBuilder, never by the re-drive dispatch). Surfaced by the deep persistence review. ToAiMessage gains a reinjectMeta flag (default false): when set it parses the persisted MetaJson and re-injects the canonical keys (_rationale/_timeout_seconds/ _background) into the reconstructed arguments, so the re-dispatched call's extraction reapplies the hint. RedriveToolBatchForApproval opts in; the other four ToAiMessage callers (outbound provider history, compaction, sub-agent) keep it false — the model must never receive meta keys. Re-drive reconstructs for execution only (not re-persisted), so no meta keys leak back into the journal. Test: ToAiMessage_reinjectMeta_restores_hint_for_redrive_but_not_outbound proves the strip→reinject→re-extract round-trip restores the 1800s hint on re-drive while outbound history stays clean. Full Netclaw.Actors.Tests green (2477); slopwatch clean; headers verified.

Aaronontheweb · 2026-06-23T23:54:05Z

Need to run evals for this before I review this PR

Aaronontheweb added 4 commits June 22, 2026 19:22

test(skills): block system and external skill mutations

9d89936

Aaronontheweb mentioned this pull request Jun 23, 2026

Schema-aware coercion of per-call meta arguments (fix Qwen near-miss tool-arg rejections) Aaronontheweb/netclaw#3

Closed

Merge branch 'dev' into worktree-coerce-meta-arg-names

872ab24

Aaronontheweb added bug Something isn't working mcp Model context protocol server / client issues. sessions LLM session actor, turn lifecycle, pipelines tools Issues related to agent tools: file_read, web_search, shell_execute, image processing, etc. labels Jun 23, 2026

Aaronontheweb mentioned this pull request Jun 23, 2026

Flaky screenshot regression: mcp-permissions-tool-grid captured before the server→tool-grid transition paints #1471

Open

Merge branch 'dev' into worktree-coerce-meta-arg-names

21247b2

Aaronontheweb enabled auto-merge (squash) June 23, 2026 23:53

Aaronontheweb disabled auto-merge June 23, 2026 23:53

Aaronontheweb added 2 commits June 23, 2026 19:13

Merge branch 'dev' into worktree-coerce-meta-arg-names

8ee2a70

Merge branch 'dev' into worktree-coerce-meta-arg-names

84dc606

Aaronontheweb enabled auto-merge (squash) June 24, 2026 02:46

Aaronontheweb merged commit 79bf66d into netclaw-dev:dev Jun 24, 2026
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Schema-aware coercion of per-call meta arguments (fix Qwen near-miss tool-arg rejections)#1470

Schema-aware coercion of per-call meta arguments (fix Qwen near-miss tool-arg rejections)#1470
Aaronontheweb merged 8 commits into
netclaw-dev:devfrom
Aaronontheweb:worktree-coerce-meta-arg-names

Aaronontheweb commented Jun 23, 2026

Uh oh!

Aaronontheweb commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Aaronontheweb commented Jun 23, 2026

Problem

Approach

What's in here (3 commits)

Review

Verification

Uh oh!

Aaronontheweb commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant