fix: batch-eval docs + preserve config-bundle placeholders on AB-test promote#1638
fix: batch-eval docs + preserve config-bundle placeholders on AB-test promote#1638jariy17 wants to merge 2 commits into
Conversation
Builtin.Completeness is listed in the batch-evaluation docs and used in a
CLI example, but the API rejects it ('does not exist'). The valid builtin
list lives in run-eval.ts. Drop it from the evaluator table and switch the
dataset example to Builtin.Correctness.
config-bundle promote fetched the winning version's components from the
service, which keys them by resolved (account/region-specific) runtime ARN,
and wrote them straight into agentcore.json — replacing the committed
{{runtime:<name>}} placeholders with hardcoded ARNs and breaking cross-
account/region portability of the config.
Remap the service-returned ARN keys back to the bundle's existing portable
placeholders (inverting the same resolver deploy uses) before adopting them.
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
LGTM — two well-scoped, independent fixes with appropriate test coverage.
Docs fix: Verified Builtin.Completeness is not in BUILTIN_EVALUATOR_LEVELS in src/cli/operations/eval/run-eval.ts (only Correctness, Faithfulness, Helpfulness, ResponseRelevance, Conciseness, Coherence, InstructionFollowing, Refusal, GoalSuccessRate, ToolSelectionAccuracy are listed). Removing the row and swapping the example to Builtin.Correctness is correct.
Promote placeholder fix: restorePlaceholderKeys correctly inverts the local placeholder→ARN map by reusing the same resolveComponentKeyForJsonPath resolver the deploy/recommendation paths use, so behavior stays consistent. Skipping local arn:-prefixed keys and the arn !== key guard (unresolved placeholders) are both right. Pass-through for unmatched service ARNs is a reasonable fallback.
Test quality: Good — the new test only mocks at true I/O boundaries (ConfigIO, getConfigurationBundleVersion) and exercises the real resolver. Not excessively coupled to implementation.
No telemetry needed since this is a bug fix on a path that doesn't currently emit telemetry.
Package TarballHow to installgh release download pr-1638-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.20.2.tgz |
Coverage Report
|
Two independent bug-bash fixes, one commit each.
1.
fix(docs): remove non-existentBuiltin.CompletenessevaluatorThe batch-evaluation docs listed
Builtin.Completenessand used it in a CLI example, but the API rejects it (Builtin evaluator Builtin.Completeness does not exist). The valid builtin list lives inrun-eval.tsand never included it.docs/batch-evaluation.mddocs/commands.mdtoBuiltin.Correctness2.
fix(ab-test): preserve portable component placeholders on promoteconfig-bundle
promotefetched the winning version's components from the service — which keys them by resolved (account/region-specific) runtime ARN — and wrote them straight intoagentcore.json, replacing the committed{{runtime:<name>}}placeholders with hardcoded ARNs and breaking cross-account/region portability.Before:
"{{runtime:cbagent}}"→ after promote:"arn:aws:bedrock-agentcore:us-west-2:...:runtime/cbbugbash_cbagent-N5owhv3MRl"Fix:
restorePlaceholderKeys()inverts the local bundle's placeholder→ARN map (using the same resolver deploy uses) and rewrites each incoming ARN key back to the placeholder before adopting. ARNs with no matching local placeholder pass through unchanged.Testing
promote.test.ts: added a case asserting an ARN-keyed service response is rewritten back to{{runtime:r}}and the ARN key is absent. 13/13 pass.