Conversation
🦋 Changeset detected

Latest commit: 33a712d

The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
Force-pushed from 79078d6 to 0e328aa.
tatomyr
left a comment
Left a couple of comments. I haven't fully reviewed the scoring and collectors though, as that takes time.
tatomyr
left a comment
I found some dead code. Please check if it's needed and remove if not. Also, I'm not sure what tests to review as many appear to only test that dead code, so let's handle that first.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 4 issues found in the latest run.
- ✅ Fixed: Discriminator flag lost with oneOf/anyOf polymorphism
- polyStats now sets hasDiscriminator when the parent schema has a discriminator (propertyName or non-empty mapping), merged with branch stats via OR.
- ✅ Fixed: Structured error count double-incremented per media type
- Added errorResponseStructuredCounted so structuredErrorResponseCount increases at most once per error response (description-only in Response.enter, or the first MediaType.enter that counts).
- ✅ Fixed: Independent max of correlated metrics across media types
- totalSchemaProperties, schemaPropertiesWithDescription, and constraintCount are now summed across media types instead of maxed independently.
- ✅ Fixed: Document parsed object mutated during metric collection
- collectMetrics now uses try/finally so restoreCompositionKeywords always runs after stripCompositionKeywords, even when inner work throws.
Or push these changes by commenting:
@cursor push b6568647da
Preview (b6568647da)
diff --git a/packages/cli/src/commands/score/collect-metrics.ts b/packages/cli/src/commands/score/collect-metrics.ts
--- a/packages/cli/src/commands/score/collect-metrics.ts
+++ b/packages/cli/src/commands/score/collect-metrics.ts
@@ -122,6 +122,30 @@
}: CollectMetricsOptions): CollectMetricsResult {
const removedComposition = stripCompositionKeywords(document.parsed);
+ try {
+ return collectMetricsInner({
+ document,
+ types,
+ resolvedRefMap,
+ ctx,
+ debugOperationId,
+ removedComposition,
+ });
+ } finally {
+ restoreCompositionKeywords(removedComposition);
+ }
+}
+
+function collectMetricsInner({
+ document,
+ types,
+ resolvedRefMap,
+ ctx,
+ debugOperationId,
+ removedComposition,
+}: CollectMetricsOptions & {
+ removedComposition: Map<object, StrippedComposition>;
+}): CollectMetricsResult {
const schemaWalkState = createSchemaWalkState();
const schemaVisitor = createSchemaMetricVisitor(schemaWalkState);
const normalizedSchemaVisitors = normalizeVisitors(
@@ -236,6 +260,10 @@
: null;
const hasDiscriminatorBranches =
Array.isArray(discriminatorRefs) && discriminatorRefs.length > 0;
+ const hasParentDiscriminator = !!(
+ disc?.propertyName ||
+ (isPlainObject(disc?.mapping) && Object.keys(disc.mapping).length > 0)
+ );
let result: SchemaStats;
@@ -258,6 +286,7 @@
...maxBranch,
polymorphismCount: maxBranch.polymorphismCount + polyBranches.length,
anyOfCount: maxBranch.anyOfCount + (polyKeyword === 'anyOf' ? polyBranches.length : 0),
+ hasDiscriminator: maxBranch.hasDiscriminator || hasParentDiscriminator,
};
}
@@ -323,8 +352,6 @@
ctx,
});
- restoreCompositionKeywords(removedComposition);
-
return {
metrics: getDocumentMetrics(accumulator),
debugLogs: accumulator.debugLogs,
diff --git a/packages/cli/src/commands/score/collectors/document-metrics.ts b/packages/cli/src/commands/score/collectors/document-metrics.ts
--- a/packages/cli/src/commands/score/collectors/document-metrics.ts
+++ b/packages/cli/src/commands/score/collectors/document-metrics.ts
@@ -212,6 +212,8 @@
inRequestBody: boolean;
inResponse: boolean;
currentResponseCode: string;
+ /** True once structured error counting ran for the current error response (Response or first MediaType). */
+ errorResponseStructuredCounted: boolean;
refsUsed: Set<string>;
}
@@ -278,6 +280,7 @@
inRequestBody: false,
inResponse: false,
currentResponseCode: '',
+ errorResponseStructuredCounted: false,
refsUsed: new Set(),
};
@@ -337,8 +340,10 @@
if (isErrorCode(code)) {
current.totalErrorResponses++;
+ current.errorResponseStructuredCounted = false;
if (!response.content && response.description) {
current.structuredErrorResponseCount++;
+ current.errorResponseStructuredCounted = true;
}
}
},
@@ -359,8 +364,13 @@
if (current.inResponse) current.responseExamplePresent = true;
}
- if (current.inResponse && isErrorCode(current.currentResponseCode)) {
+ if (
+ current.inResponse &&
+ isErrorCode(current.currentResponseCode) &&
+ !current.errorResponseStructuredCounted
+ ) {
current.structuredErrorResponseCount++;
+ current.errorResponseStructuredCounted = true;
}
if (mediaType.schema) {
@@ -372,15 +382,9 @@
const stats = accumulator.walkSchema(mediaType.schema, isDebugTarget);
current.propertyCount = Math.max(current.propertyCount, stats.propertyCount);
- current.totalSchemaProperties = Math.max(
- current.totalSchemaProperties,
- stats.totalSchemaProperties
- );
- current.schemaPropertiesWithDescription = Math.max(
- current.schemaPropertiesWithDescription,
- stats.schemaPropertiesWithDescription
- );
- current.constraintCount = Math.max(current.constraintCount, stats.constraintCount);
+ current.totalSchemaProperties += stats.totalSchemaProperties;
+ current.schemaPropertiesWithDescription += stats.schemaPropertiesWithDescription;
+ current.constraintCount += stats.constraintCount;
current.polymorphismCount = Math.max(current.polymorphismCount, stats.polymorphismCount);
current.anyOfCount = Math.max(current.anyOfCount, stats.anyOfCount);
if (stats.hasDiscriminator) current.hasDiscriminator = true;
Yes, I think there is too much overlap between the two scores, so I'm going to consolidate them into one. It will make the docs easier, result understanding easier, etc. Addressing other comments too.
Addressed all remaining review comments. The two scores have been consolidated into a single Agent Readiness score with all 9 subscores. Also removed the global |
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Constraint clarity mixes metrics from different media types
- Media-type stats are now chosen together via a combined bundle score so constraintCount, totalSchemaProperties, and schemaPropertiesWithDescription always refer to the same walk.
- ✅ Fixed: Redundant propertyCount and totalSchemaProperties always equal
- Removed propertyCount from SchemaStats and OperationMetrics and use totalSchemaProperties everywhere, including polymorphic branch selection in collect-metrics.
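The mixing problem the first fix addresses is easiest to see with numbers. Below is a minimal sketch (hypothetical stats values; the bundle score mirrors the shape of `mediaTypeBundleScore` from the preview below) of why taking independent maxima can fabricate a stats combination that no single schema walk produced:

```typescript
type Stats = {
  constraintCount: number;
  totalSchemaProperties: number;
  schemaPropertiesWithDescription: number;
};

// Same shape as the preview's mediaTypeBundleScore: rank each walk by its
// combined constraint + description density.
const bundleScore = (s: Stats) => {
  const p = s.totalSchemaProperties || 1;
  return s.constraintCount / p + s.schemaPropertiesWithDescription / p;
};

// Two hypothetical media types on the same response:
const a: Stats = { constraintCount: 1, totalSchemaProperties: 10, schemaPropertiesWithDescription: 9 };
const b: Stats = { constraintCount: 6, totalSchemaProperties: 3, schemaPropertiesWithDescription: 0 };

// Independent max fabricates a combination neither walk produced:
const mixed = {
  constraintCount: Math.max(a.constraintCount, b.constraintCount), // 6 (from b)
  totalSchemaProperties: Math.max(a.totalSchemaProperties, b.totalSchemaProperties), // 10 (from a)
};

// Bundle selection keeps all three metrics from the same walk:
const chosen = bundleScore(b) > bundleScore(a) ? b : a; // b (2.0 > 1.0)

console.log(mixed, chosen);
```

Ratios computed from `mixed` (6 constraints over 10 properties) describe no actual schema, which is the distortion the bundle selection removes.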
Or push these changes by commenting:
@cursor push dfca434d5c
Preview (dfca434d5c)
diff --git a/packages/cli/src/commands/score/__tests__/dependency-graph.test.ts b/packages/cli/src/commands/score/__tests__/dependency-graph.test.ts
--- a/packages/cli/src/commands/score/__tests__/dependency-graph.test.ts
+++ b/packages/cli/src/commands/score/__tests__/dependency-graph.test.ts
@@ -15,7 +15,6 @@
polymorphismCount: 0,
anyOfCount: 0,
hasDiscriminator: false,
- propertyCount: 0,
operationDescriptionPresent: false,
schemaPropertiesWithDescription: 0,
totalSchemaProperties: 0,
diff --git a/packages/cli/src/commands/score/__tests__/document-metrics.test.ts b/packages/cli/src/commands/score/__tests__/document-metrics.test.ts
--- a/packages/cli/src/commands/score/__tests__/document-metrics.test.ts
+++ b/packages/cli/src/commands/score/__tests__/document-metrics.test.ts
@@ -131,7 +131,7 @@
const op = metrics.operations.get('createItem')!;
expect(op.requestBodyPresent).toBe(true);
expect(op.requestExamplePresent).toBe(true);
- expect(op.propertyCount).toBe(2);
+ expect(op.totalSchemaProperties).toBe(2);
expect(op.constraintCount).toBe(2);
expect(op.schemaPropertiesWithDescription).toBe(1);
});
diff --git a/packages/cli/src/commands/score/__tests__/hotspots.test.ts b/packages/cli/src/commands/score/__tests__/hotspots.test.ts
--- a/packages/cli/src/commands/score/__tests__/hotspots.test.ts
+++ b/packages/cli/src/commands/score/__tests__/hotspots.test.ts
@@ -16,7 +16,6 @@
polymorphismCount: 0,
anyOfCount: 0,
hasDiscriminator: false,
- propertyCount: 0,
operationDescriptionPresent: true,
schemaPropertiesWithDescription: 0,
totalSchemaProperties: 0,
diff --git a/packages/cli/src/commands/score/__tests__/index.test.ts b/packages/cli/src/commands/score/__tests__/index.test.ts
--- a/packages/cli/src/commands/score/__tests__/index.test.ts
+++ b/packages/cli/src/commands/score/__tests__/index.test.ts
@@ -51,7 +51,6 @@
polymorphismCount: 0,
anyOfCount: 0,
hasDiscriminator: false,
- propertyCount: 1,
operationDescriptionPresent: true,
schemaPropertiesWithDescription: 0,
totalSchemaProperties: 1,
diff --git a/packages/cli/src/commands/score/__tests__/scoring.test.ts b/packages/cli/src/commands/score/__tests__/scoring.test.ts
--- a/packages/cli/src/commands/score/__tests__/scoring.test.ts
+++ b/packages/cli/src/commands/score/__tests__/scoring.test.ts
@@ -23,7 +23,6 @@
polymorphismCount: 0,
anyOfCount: 0,
hasDiscriminator: false,
- propertyCount: 0,
operationDescriptionPresent: true,
schemaPropertiesWithDescription: 0,
totalSchemaProperties: 0,
@@ -132,7 +131,6 @@
parameterCount: 1,
requiredParameterCount: 1,
paramsWithDescription: 0,
- propertyCount: 1,
totalSchemaProperties: 1,
maxResponseSchemaDepth: 1,
}),
@@ -146,7 +144,6 @@
requestBodyPresent: true,
requestExamplePresent: true,
constraintCount: 1,
- propertyCount: 2,
totalSchemaProperties: 2,
maxRequestSchemaDepth: 1,
maxResponseSchemaDepth: 1,
diff --git a/packages/cli/src/commands/score/collect-metrics.ts b/packages/cli/src/commands/score/collect-metrics.ts
--- a/packages/cli/src/commands/score/collect-metrics.ts
+++ b/packages/cli/src/commands/score/collect-metrics.ts
@@ -87,7 +87,6 @@
polymorphismCount: 0,
anyOfCount: 0,
hasDiscriminator: false,
- propertyCount: 0,
totalSchemaProperties: 0,
schemaPropertiesWithDescription: 0,
constraintCount: 0,
@@ -102,7 +101,6 @@
polymorphismCount: a.polymorphismCount + b.polymorphismCount,
anyOfCount: a.anyOfCount + b.anyOfCount,
hasDiscriminator: a.hasDiscriminator || b.hasDiscriminator,
- propertyCount: a.propertyCount + b.propertyCount,
totalSchemaProperties: a.totalSchemaProperties + b.totalSchemaProperties,
schemaPropertiesWithDescription:
a.schemaPropertiesWithDescription + b.schemaPropertiesWithDescription,
@@ -172,7 +170,7 @@
let maxBranch = walkSchema(polyBranches[0], debug);
for (let i = 1; i < polyBranches.length; i++) {
const branchStats = walkSchema(polyBranches[i], debug);
- if (branchStats.propertyCount > maxBranch.propertyCount) {
+ if (branchStats.totalSchemaProperties > maxBranch.totalSchemaProperties) {
maxBranch = branchStats;
}
}
@@ -191,7 +189,7 @@
let maxBranch = walkSchema(discriminatorRefs[0], debug);
for (let i = 1; i < discriminatorRefs.length; i++) {
const branchStats = walkSchema(discriminatorRefs[i], debug);
- if (branchStats.propertyCount > maxBranch.propertyCount) {
+ if (branchStats.totalSchemaProperties > maxBranch.totalSchemaProperties) {
maxBranch = branchStats;
}
}
diff --git a/packages/cli/src/commands/score/collectors/document-metrics.ts b/packages/cli/src/commands/score/collectors/document-metrics.ts
--- a/packages/cli/src/commands/score/collectors/document-metrics.ts
+++ b/packages/cli/src/commands/score/collectors/document-metrics.ts
@@ -49,7 +49,6 @@
polymorphismCount: number;
anyOfCount: number;
hasDiscriminator: boolean;
- propertyCount: number;
totalSchemaProperties: number;
schemaPropertiesWithDescription: number;
constraintCount: number;
@@ -68,7 +67,6 @@
polymorphismCount: 0,
anyOfCount: 0,
hasDiscriminator: false,
- propertyCount: 0,
totalSchemaProperties: 0,
schemaPropertiesWithDescription: 0,
constraintCount: 0,
@@ -122,7 +120,6 @@
if (isPlainObject(schema.properties)) {
const props = Object.entries(schema.properties) as [string, any][];
state.totalSchemaProperties += props.length;
- state.propertyCount += props.length;
for (const [name, prop] of props) {
localPropertyNames.push(name);
@@ -166,7 +163,6 @@
maxRequestSchemaDepth: number;
maxResponseSchemaDepth: number;
- propertyCount: number;
totalSchemaProperties: number;
schemaPropertiesWithDescription: number;
constraintCount: number;
@@ -187,6 +183,9 @@
currentResponseCode: string;
errorStructuredCounted: boolean;
+ /** Best-scoring media type for constraint + property metrics (see `mediaTypeBundleScore`). */
+ mediaTypeBundleScore: number;
+
refsUsed: Set<string>;
}
@@ -233,7 +232,6 @@
maxRequestSchemaDepth: 0,
maxResponseSchemaDepth: 0,
- propertyCount: 0,
totalSchemaProperties: 0,
schemaPropertiesWithDescription: 0,
constraintCount: 0,
@@ -254,6 +252,8 @@
currentResponseCode: '',
errorStructuredCounted: false,
+ mediaTypeBundleScore: -1,
+
refsUsed: new Set(),
};
}
@@ -273,7 +273,6 @@
polymorphismCount: ctx.polymorphismCount,
anyOfCount: ctx.anyOfCount,
hasDiscriminator: ctx.hasDiscriminator,
- propertyCount: ctx.propertyCount,
operationDescriptionPresent: ctx.operationDescriptionPresent,
schemaPropertiesWithDescription: ctx.schemaPropertiesWithDescription,
totalSchemaProperties: ctx.totalSchemaProperties,
@@ -350,12 +349,13 @@
const stats = accumulator.walkSchema(mediaType.schema, isDebugTarget);
- current.propertyCount = Math.max(current.propertyCount, stats.propertyCount);
- if (stats.totalSchemaProperties > current.totalSchemaProperties) {
+ const bundleScore = mediaTypeBundleScore(stats);
+ if (bundleScore > current.mediaTypeBundleScore) {
+ current.mediaTypeBundleScore = bundleScore;
current.totalSchemaProperties = stats.totalSchemaProperties;
current.schemaPropertiesWithDescription = stats.schemaPropertiesWithDescription;
+ current.constraintCount = stats.constraintCount;
}
- current.constraintCount = Math.max(current.constraintCount, stats.constraintCount);
current.polymorphismCount = Math.max(current.polymorphismCount, stats.polymorphismCount);
current.anyOfCount = Math.max(current.anyOfCount, stats.anyOfCount);
if (stats.hasDiscriminator) current.hasDiscriminator = true;
@@ -388,7 +388,7 @@
accumulator.debugLogs.push({
context,
entries: stats.debugEntries,
- totalProperties: stats.propertyCount,
+ totalProperties: stats.totalSchemaProperties,
totalPolymorphism: stats.polymorphismCount,
totalConstraints: stats.constraintCount,
maxDepth: stats.maxDepth,
@@ -475,3 +475,13 @@
const num = parseInt(code, 10);
return num >= 400 && num < 600;
}
+
+/** Matches `propCount` in scoring.ts so operation-level rollup stays from one media type. */
+function propCountForScoring(totalSchemaProperties: number): number {
+ return totalSchemaProperties || 1;
+}
+
+function mediaTypeBundleScore(stats: SchemaStats): number {
+ const p = propCountForScoring(stats.totalSchemaProperties);
+ return stats.constraintCount / p + stats.schemaPropertiesWithDescription / p;
+}
diff --git a/packages/cli/src/commands/score/formatters/stylish.ts b/packages/cli/src/commands/score/formatters/stylish.ts
--- a/packages/cli/src/commands/score/formatters/stylish.ts
+++ b/packages/cli/src/commands/score/formatters/stylish.ts
@@ -85,7 +85,7 @@
params.push(m.parameterCount);
depths.push(Math.max(m.maxRequestSchemaDepth, m.maxResponseSchemaDepth));
polys.push(m.polymorphismCount);
- props.push(m.propertyCount);
+ props.push(m.totalSchemaProperties);
if (m.requestExamplePresent) opsWithReqExample++;
if (m.responseExamplePresent) opsWithResExample++;
if (m.operationDescriptionPresent) opsWithDescription++;
@@ -140,7 +140,7 @@
out(cyan(' ' + '─'.repeat(header.length - 2)));
const entries = [...result.rawMetrics.operations.entries()].sort(
- ([, a], [, b]) => b.propertyCount - a.propertyCount
+ ([, a], [, b]) => b.totalSchemaProperties - a.totalSchemaProperties
);
for (const [, m] of entries) {
@@ -149,7 +149,7 @@
const line =
' ' +
label.padEnd(50) +
- String(m.propertyCount).padStart(7) +
+ String(m.totalSchemaProperties).padStart(7) +
String(m.polymorphismCount).padStart(7) +
String(depth).padStart(7) +
String(m.parameterCount).padStart(8) +
diff --git a/packages/cli/src/commands/score/types.ts b/packages/cli/src/commands/score/types.ts
--- a/packages/cli/src/commands/score/types.ts
+++ b/packages/cli/src/commands/score/types.ts
@@ -15,7 +15,6 @@
polymorphismCount: number;
anyOfCount: number;
hasDiscriminator: boolean;
- propertyCount: number;
operationDescriptionPresent: boolean;
schemaPropertiesWithDescription: number;
@@ -125,7 +124,6 @@
polymorphismCount: number;
anyOfCount: number;
hasDiscriminator: boolean;
- propertyCount: number;
totalSchemaProperties: number;
schemaPropertiesWithDescription: number;
constraintCount: number;
diff --git a/tests/e2e/score/score-json/snapshot.txt b/tests/e2e/score/score-json/snapshot.txt
--- a/tests/e2e/score/score-json/snapshot.txt
+++ b/tests/e2e/score/score-json/snapshot.txt
@@ -4,8 +4,8 @@
"subscores": {
"parameterSimplicity": 0.8625,
"schemaSimplicity": 0.796875,
- "documentationQuality": 0.9017857142857143,
- "constraintClarity": 0.5833333333333333,
+ "documentationQuality": 0.8988095238095238,
+ "constraintClarity": 0.6,
"exampleCoverage": 0.875,
"errorClarity": 1,
"dependencyClarity": 0.7291666666666669,
@@ -29,7 +29,6 @@
"polymorphismCount": 0,
"anyOfCount": 0,
"hasDiscriminator": false,
- "propertyCount": 3,
"operationDescriptionPresent": true,
"schemaPropertiesWithDescription": 2,
"totalSchemaProperties": 3,
@@ -54,7 +53,6 @@
"polymorphismCount": 0,
"anyOfCount": 0,
"hasDiscriminator": false,
- "propertyCount": 6,
"operationDescriptionPresent": true,
"schemaPropertiesWithDescription": 6,
"totalSchemaProperties": 6,
@@ -79,7 +77,6 @@
"polymorphismCount": 0,
"anyOfCount": 0,
"hasDiscriminator": false,
- "propertyCount": 6,
"operationDescriptionPresent": true,
"schemaPropertiesWithDescription": 6,
"totalSchemaProperties": 6,
@@ -104,7 +101,6 @@
"polymorphismCount": 0,
"anyOfCount": 0,
"hasDiscriminator": false,
- "propertyCount": 6,
"operationDescriptionPresent": true,
"schemaPropertiesWithDescription": 6,
"totalSchemaProperties": 6,
@@ -129,7 +125,6 @@
"polymorphismCount": 0,
"anyOfCount": 0,
"hasDiscriminator": false,
- "propertyCount": 2,
"operationDescriptionPresent": true,
"schemaPropertiesWithDescription": 0,
"totalSchemaProperties": 2,
@@ -154,7 +149,6 @@
"polymorphismCount": 0,
"anyOfCount": 0,
"hasDiscriminator": false,
- "propertyCount": 6,
"operationDescriptionPresent": true,
"schemaPropertiesWithDescription": 6,
"totalSchemaProperties": 6,
@@ -179,10 +173,9 @@
"polymorphismCount": 0,
"anyOfCount": 0,
"hasDiscriminator": false,
- "propertyCount": 6,
"operationDescriptionPresent": true,
- "schemaPropertiesWithDescription": 5,
- "totalSchemaProperties": 6,
+ "schemaPropertiesWithDescription": 4,
+ "totalSchemaProperties": 5,
"constraintCount": 4,
"requestExamplePresent": true,
"responseExamplePresent": true,
@@ -204,7 +197,6 @@
"polymorphismCount": 0,
"anyOfCount": 0,
"hasDiscriminator": false,
- "propertyCount": 0,
"operationDescriptionPresent": true,
"schemaPropertiesWithDescription": 0,
"totalSchemaProperties": 0,
@@ -303,12 +295,12 @@
}
},
"buyMuseumTickets": {
- "agentReadiness": 88.8,
+ "agentReadiness": 90,
"subscores": {
"parameterSimplicity": 1,
"schemaSimplicity": 0.875,
- "documentationQuality": 0.8571428571428571,
- "constraintClarity": 0.6666666666666666,
+ "documentationQuality": 0.8333333333333334,
+ "constraintClarity": 0.8,
"exampleCoverage": 1,
"errorClarity": 1,
"dependencyClarity": 0.6666666666666667,
diff --git a/tests/e2e/score/score-stylish/snapshot.txt b/tests/e2e/score/score-stylish/snapshot.txt
--- a/tests/e2e/score/score-stylish/snapshot.txt
+++ b/tests/e2e/score/score-stylish/snapshot.txt
@@ -8,7 +8,7 @@
Parameter Simplicity [█████████████████░░░] 86%
Schema Simplicity [████████████████░░░░] 80%
Documentation Quality [██████████████████░░] 90%
- Constraint Clarity [████████████░░░░░░░░] 58%
+ Constraint Clarity [████████████░░░░░░░░] 60%
Example Coverage [██████████████████░░] 88%
Error Clarity [████████████████████] 100%
Dependency Clarity [███████████████░░░░░] 73%
@@ -22,7 +22,7 @@
Parameters/operation: avg 1.4 median 1.0 min 0 max 4
Schema depth: avg 1.6 median 2.0 min 0 max 3
Polymorphism/operation: avg 0.0 median 0.0 min 0 max 0
- Properties/operation: avg 4.4 median 6.0 min 0 max 6
+ Properties/operation: avg 4.3 median 5.5 min 0 max 6
Operations with request examples: 3/8 (38%)
Operations with response examples: 7/8 (88%)
Operations with description: 8/8 (100%)
Addressed all review comments in fdb8c94:
  result.polymorphismClarity /= n;
  return result;
}
Document subscores and composite score use inconsistent aggregation
Medium Severity
aggregateSubscores computes the arithmetic mean of per-operation subscores, while computeDocumentScores computes the median of per-operation composite scores. Because median-of-weighted-sums ≠ weighted-sum-of-means, applying computeAgentReadiness to the displayed aggregate subscores yields a different number than the displayed agentReadiness. The subscores shown to the user don't actually explain the composite score, which is misleading.
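A tiny numeric sketch of the inconsistency, with invented subscores and equal weights (names and weights are illustrative only, not the command's real ones):

```typescript
// Three hypothetical operations, each with two subscores in [0, 1].
const ops = [
  { docs: 0, errors: 1 },
  { docs: 1, errors: 0 },
  { docs: 1, errors: 1 },
];

// Equal-weight composite, standing in for computeAgentReadiness.
const composite = (s: { docs: number; errors: number }) => 0.5 * s.docs + 0.5 * s.errors;

// Median of per-operation composites (the computeDocumentScores side):
const composites = ops.map(composite).sort((a, b) => a - b); // [0.5, 0.5, 1]
const medianComposite = composites[Math.floor(composites.length / 2)]; // 0.5

// Composite of mean subscores (what the displayed aggregates imply):
const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
const compositeOfMeans = composite({
  docs: mean(ops.map((o) => o.docs)), // 2/3
  errors: mean(ops.map((o) => o.errors)), // 2/3
}); // 2/3 ≈ 0.667

console.log(medianComposite, compositeOfMeans); // 0.5 vs ≈ 0.667
```

So a user applying the composite formula to the displayed aggregate subscores (≈ 0.667) cannot reproduce the displayed score (0.5), which is the mismatch described above.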
Additional Locations (1)
Reviewed by Cursor Bugbot for commit fdb8c94.
tatomyr
left a comment
LGTM. There are a few bugbot comments left -- please review whether they make sense.
  prop && '$ref' in prop && prop.$ref ? (ctx.resolve(prop)?.node ?? prop) : prop;
if (res?.description) state.schemaPropertiesWithDescription++;
if (res?.example !== undefined || res?.examples) state.hasPropertyExamples = true;
if (!res?.readOnly) state.writableTopLevelFields++;
Writable field counter ignores depth, miscounts properties
Medium Severity
The writableTopLevelFields counter in createSchemaMetricVisitor increments for every non-readOnly property at every nesting depth, not just at depth 0. Since Schema.enter is called recursively for nested schemas, a schema like { name: string, address: { street: string, city: string } } would report 3 writable "top-level" fields instead of 2. This incorrect value flows through SchemaStats.writableTopLevelFields → OperationMetrics.topLevelWritableFieldCount and appears in the JSON output, giving users misleading data.
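The shape of the bug can be sketched with a toy recursion (not the actual visitor; this simplified version also counts the `address` container itself, so its total is 4 where the report above cites 3, but the failure mode is the same — nested properties leak into a "top-level" count):

```typescript
// Toy schema node, just enough shape for the sketch.
type Node = { readOnly?: boolean; properties?: Record<string, Node> };

// Buggy shape: increments for every non-readOnly property at every depth,
// the way a recursive Schema.enter would.
function countWritableEveryDepth(schema: Node): number {
  let count = 0;
  for (const prop of Object.values(schema.properties ?? {})) {
    if (!prop.readOnly) count++;
    count += countWritableEveryDepth(prop); // nested props leak into the count
  }
  return count;
}

// Depth-aware fix: only the schema's own direct properties count.
function countWritableTopLevel(schema: Node): number {
  return Object.values(schema.properties ?? {}).filter((p) => !p.readOnly).length;
}

const schema: Node = {
  properties: {
    name: {},
    address: { properties: { street: {}, city: {} } },
  },
};

console.log(countWritableEveryDepth(schema)); // 4 (name, address, street, city)
console.log(countWritableTopLevel(schema)); // 2 (name, address)
```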
Reviewed by Cursor Bugbot for commit a7a0f2a.
- Add "AI" before "agent readiness" in changeset and docs for clarity - Replace <pre> block with fenced code block in score.md - Add security scheme coverage to metrics documentation - Remove resolveIfRef helper, replace with resolveNode that falls back to the original node when resolution fails - Refactor to use walkDocument visitor approach (matching stats command pattern) instead of manually iterating the document tree - Use resolveDocument + normalizeVisitors + walkDocument from openapi-core for proper $ref resolution and spec-format extensibility - Update index.test.ts to mock the new walk infrastructure Made-with: Cursor
Made-with: Cursor
Co-authored-by: Jacek Łękawa <164185257+JLekawa@users.noreply.github.com>
- Inline collect-metrics.ts test helper into document-metrics.test.ts
- Use parseYaml as yaml directly instead of wrapper function
- Remove default parameter from getStylishOutput in formatter tests
- Use getMajorSpecVersion + exitWithError for spec version check
- Add explicit case 'stylish' before default in format switch
- Remove unsupported 'markdown' from score command format choices
- Add comment explaining depth=-1 initialization
- Clarify anyOf penalty and dependency terminology in docs
- Update non-oas3 rejection test for exitWithError (throws)

Made-with: Cursor

- Add type cast for parseYaml (returns unknown) in document-metrics tests
- Inline collectDocumentMetrics helper into example-coverage tests
- Remove collect-metrics.js import from scoring tests, use direct metrics
- Add missing debugLogs property to accumulator mock in index tests

Made-with: Cursor

Extract the metric-collection pipeline from handleScore into a standalone collect-metrics.ts module with two exports:
- collectMetrics(): low-level function used by handleScore
- collectDocumentMetrics(): high-level convenience used by tests

This eliminates ~300 lines of duplicated walker setup across three test files (document-metrics, example-coverage, index) and ensures tests exercise the same code path as the production command.

Add $ref-keyed memoization to walkSchema so repeated references to the same component schema return cached stats instead of re-walking. Stripe API: 37.6s → 11.3s (3.3× faster).

Made-with: Cursor
Add median alongside averages for parameters, schema depth, polymorphism, and properties in the stylish output. Rename the misleading "Avg max schema depth" to "Schema depth". Made-with: Cursor
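The median added above can be sketched as a plain midpoint-of-sorted-values function (assumed shape, not necessarily the repo's exact helper; the sample values are the per-operation totalSchemaProperties counts from the JSON snapshot later in this thread):

```typescript
// Midpoint of sorted values; averages the two middle values for even lengths.
function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Per-operation property counts from the snapshot diff:
const props = [3, 6, 6, 6, 2, 6, 5, 0];
console.log(median(props)); // 5.5
console.log(props.reduce((a, b) => a + b, 0) / props.length); // 4.25 (rendered as 4.3)
```

These two numbers match the "Properties/operation: avg 4.3 median 5.5" row in the updated stylish snapshot, and show why the median (5.5) tells a different story than the mean (4.25) when a few operations have zero properties.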
Rename workflowClarity to dependencyClarity, workflowDepths to dependencyDepths, and workflow-graph to dependency-graph to align code naming with the "Dependency Clarity" display label. Also adds discoverability subscore, recursive composition keyword stripping for accurate property counting, and updates e2e snapshots. Made-with: Cursor
- Use isPlainObject from openapi-core for schema cycle detection instead of raw typeof checks that match arrays
- Remove formatter unit tests in favor of e2e snapshot coverage
- Clarify makeScores() purpose in hotspot tests

Made-with: Cursor

- Add experimental admonition to score command docs
- Remove --config from usage examples
- Remove redundant mockClear and process.exitCode in tests
- Simplify JSON formatter by using spread instead of manual field mapping
- Use isRef from openapi-core for $ref detection in collect-metrics
- Use isPlainObject and isNotEmptyObject from openapi-core in document-metrics
- Replace != null with Array.isArray for discriminatorRefs check
- Remove unnecessary non-null assertions on discriminatorRefs
- Lower branch coverage threshold to 70% per reviewer guidance

Made-with: Cursor

- Fix duplicate import from @redocly/openapi-core (CI lint error)
- Propagate hasDiscriminator flag for oneOf/anyOf + discriminator schemas
- Prevent structured error response double-counting per media type
- Keep totalSchemaProperties and schemaPropertiesWithDescription paired from the same schema to avoid misleading documentation quality ratios
- Wrap composition stripping in try/finally to guarantee restoration
- Export isMappingRef from openapi-core and use in collect-metrics
- Rename bfsMaxDepth to computeLongestBfsPath for clarity

Made-with: Cursor

- Add allOf member count to polymorphismCount for parity with oneOf/anyOf branch counting
- Handle RFC 6901 JSON Pointer escaping (~0 → ~ and ~1 → /) in resolveJsonPointer
- Remove unused scores parameter from getHotspotReasons

Made-with: Cursor
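The RFC 6901 unescaping mentioned above can be sketched as follows (a minimal decoder, not the PR's actual resolveJsonPointer; note that RFC 6901 requires handling ~1 before ~0, otherwise "~01" would wrongly decode to "/"):

```typescript
// Decode one JSON Pointer reference token per RFC 6901 §4.
function unescapeSegment(segment: string): string {
  // ~1 must be replaced before ~0: doing ~0 first turns "~01" into "~1",
  // which the second pass would then wrongly turn into "/".
  return segment.replace(/~1/g, '/').replace(/~0/g, '~');
}

console.log(unescapeSegment('a~1b')); // "a/b"
console.log(unescapeSegment('m~0n')); // "m~n"
console.log(unescapeSegment('~01')); // "~1"
```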
… single score Merge the two overlapping composite scores into a single "Agent Readiness" score with all 9 subscores. This eliminates duplication in weights, scoring functions, and output. Also addresses code review feedback: use unescapePointerFragment, simplify resetSchemaWalkState via Object.assign, consolidate SchemaStats into types.ts, remove collect() test wrapper, simplify hasExample return, and use spread for walkSchemaRaw return. Made-with: Cursor
- Remove global stripCompositionKeywords that silently lost nested oneOf/anyOf within properties/items — the parentOnly shallow copy already prevents double-counting at the top level while preserving nested composition for walkSchemaRaw to count naturally
- Remove unused maxPropertiesGood threshold (dead code)
- Extract duplicate median function to shared utils.ts

Made-with: Cursor

- Remove as any casts from normalizeVisitors — types infer correctly
- Remove redundant propertyCount (always equals totalSchemaProperties)
- Fix constraintCount mixing across media types — keep all three correlated metrics from the same schema walk
- Consolidate CurrentOperationContext with OperationMetrics field names, simplify buildOperationMetrics to destructuring spread
- Export Oas3MediaType from openapi-core, use in hasExample signature
- Remove any type on MediaType.enter — visitor infers type by default
- Move collectDocumentMetrics to __tests__/collect-metrics-helper.ts since it is test-only code

Made-with: Cursor

- Preserve top-level $ref in refsUsed when composition branch is taken, so computeDependencyDepths correctly tracks cross-operation schema sharing for oneOf/anyOf/allOf/discriminator schemas
- Return constraintClarity=1 when totalSchemaProperties is 0, matching errorClarity's "no data" pattern — avoids unfairly penalizing DELETE and binary-returning endpoints that have no schema properties

Made-with: Cursor
Force-pushed from a7a0f2a to 33a712d.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Independent max of polymorphism metrics inflates penalties
- Polymorphism and anyOf counts are now taken together from the single schema that maximizes the same effective-polymorphism formula as scoring, so mixed values from different media types can no longer inflate penalties.
Or push these changes by commenting:
@cursor push 6629d2a2fe
Preview (6629d2a2fe)
diff --git a/packages/cli/src/commands/score/__tests__/document-metrics.test.ts b/packages/cli/src/commands/score/__tests__/document-metrics.test.ts
--- a/packages/cli/src/commands/score/__tests__/document-metrics.test.ts
+++ b/packages/cli/src/commands/score/__tests__/document-metrics.test.ts
@@ -239,4 +239,45 @@
);
expect(metrics.operations.get('listItems')!.responseExamplePresent).toBe(true);
});
+
+ it('keeps polymorphismCount and anyOfCount from the same schema (max effective polymorphism across media types)', async () => {
+ const { metrics } = await collectDocumentMetrics(
+ yaml(outdent`
+ openapi: 3.0.0
+ info:
+ title: Test
+ version: '1.0'
+ paths:
+ /items:
+ post:
+ operationId: mixedPoly
+ requestBody:
+ content:
+ application/json:
+ schema:
+ oneOf:
+ - type: string
+ - type: number
+ - type: boolean
+ - type: object
+ - type: array
+ responses:
+ '200':
+ description: OK
+ content:
+ application/json:
+ schema:
+ anyOf:
+ - type: object
+ - type: object
+ - type: object
+ `)
+ );
+
+ const op = metrics.operations.get('mixedPoly')!;
+ // Request alone: poly 5 / anyOf 0 (effective 5). Response alone: poly 3 / anyOf 3 (effective 6).
+ // Must not mix into poly 5 + anyOf 3 (invalid effective 8).
+ expect(op.polymorphismCount).toBe(3);
+ expect(op.anyOfCount).toBe(3);
+ });
});
diff --git a/packages/cli/src/commands/score/collectors/document-metrics.ts b/packages/cli/src/commands/score/collectors/document-metrics.ts
--- a/packages/cli/src/commands/score/collectors/document-metrics.ts
+++ b/packages/cli/src/commands/score/collectors/document-metrics.ts
@@ -12,7 +12,7 @@
type Referenced,
} from '@redocly/openapi-core';
-import { AMBIGUOUS_PARAM_NAMES } from '../constants.js';
+import { AMBIGUOUS_PARAM_NAMES, DEFAULT_SCORING_CONSTANTS } from '../constants.js';
import type {
DebugMediaTypeLog,
DebugSchemaEntry,
@@ -25,6 +25,13 @@
type Param = Oas3Parameter<Schema>;
type ResolveFn = UserContext['resolve'];
+const ANY_OF_PENALTY_MULTIPLIER = DEFAULT_SCORING_CONSTANTS.weights.anyOfPenaltyMultiplier;
+
+function schemaEffectivePolymorphism(polymorphismCount: number, anyOfCount: number): number {
+ const otherPoly = polymorphismCount - anyOfCount;
+ return otherPoly + anyOfCount * ANY_OF_PENALTY_MULTIPLIER;
+}
+
const CONSTRAINT_KEYS: readonly string[] = [
'enum',
'const',
@@ -169,6 +176,7 @@
constraintCount: number;
polymorphismCount: number;
anyOfCount: number;
+ maxEffectivePolymorphism: number;
hasDiscriminator: boolean;
topLevelWritableFieldCount: number;
@@ -235,6 +243,7 @@
constraintCount: 0,
polymorphismCount: 0,
anyOfCount: 0,
+ maxEffectivePolymorphism: 0,
hasDiscriminator: false,
topLevelWritableFieldCount: 0,
@@ -260,6 +269,7 @@
inResponse: _2,
currentResponseCode: _3,
errorStructuredCounted: _4,
+ maxEffectivePolymorphism: _5,
...metrics
} = ctx;
return metrics;
@@ -333,8 +343,12 @@
current.schemaPropertiesWithDescription = stats.schemaPropertiesWithDescription;
current.constraintCount = stats.constraintCount;
}
- current.polymorphismCount = Math.max(current.polymorphismCount, stats.polymorphismCount);
- current.anyOfCount = Math.max(current.anyOfCount, stats.anyOfCount);
+ const effective = schemaEffectivePolymorphism(stats.polymorphismCount, stats.anyOfCount);
+ if (effective > current.maxEffectivePolymorphism) {
+ current.maxEffectivePolymorphism = effective;
+ current.polymorphismCount = stats.polymorphismCount;
+ current.anyOfCount = stats.anyOfCount;
+ }
if (stats.hasDiscriminator) current.hasDiscriminator = true;
for (const ref of stats.refsUsed) current.refsUsed.add(ref);
Reviewed by Cursor Bugbot for commit 33a712d. Configure here.
current.constraintCount = stats.constraintCount;
}
current.polymorphismCount = Math.max(current.polymorphismCount, stats.polymorphismCount);
current.anyOfCount = Math.max(current.anyOfCount, stats.anyOfCount);
Independent max of polymorphism metrics inflates penalties
Medium Severity
polymorphismCount and anyOfCount are independently maximized across different media types (request body and responses), creating a phantom metric state that doesn't correspond to any actual schema. The effectivePolymorphism function in scoring.ts computes otherPoly = polymorphismCount - anyOfCount, but when these values originate from different schemas, the result is inflated. For example, a request body with 5 polymorphism (1 anyOf) and a response with 3 polymorphism (3 anyOf) yields mixed values of 5 and 3, producing effectivePolymorphism of 8 instead of the correct maximum of 6 from either schema alone. This over-penalizes both schemaSimplicity and polymorphismClarity subscores.
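A minimal sketch of the arithmetic behind this finding, using the numbers from the description above. The helper name and the multiplier value are assumptions for illustration; the real formula and constant live in the PR's scoring code:

```typescript
// Assumed penalty multiplier for illustration; in the PR this is a
// configurable scoring constant.
const ANY_OF_PENALTY_MULTIPLIER = 2;

// Mirrors the formula described above: non-anyOf branches count once,
// anyOf branches count with the heavier multiplier.
function effectivePolymorphism(polymorphismCount: number, anyOfCount: number): number {
  const otherPoly = polymorphismCount - anyOfCount;
  return otherPoly + anyOfCount * ANY_OF_PENALTY_MULTIPLIER;
}

const request = { polymorphismCount: 5, anyOfCount: 1 }; // effective: 4 + 1*2 = 6
const response = { polymorphismCount: 3, anyOfCount: 3 }; // effective: 0 + 3*2 = 6

// Independent max creates a phantom state that no real schema has:
const mixed = {
  polymorphismCount: Math.max(request.polymorphismCount, response.polymorphismCount), // 5
  anyOfCount: Math.max(request.anyOfCount, response.anyOfCount), // 3
};
console.log(effectivePolymorphism(mixed.polymorphismCount, mixed.anyOfCount)); // 8 (inflated)

// Keeping both counts from whichever single schema maximizes the formula
// bounds the result at the true per-schema maximum:
const correct = Math.max(
  effectivePolymorphism(request.polymorphismCount, request.anyOfCount),
  effectivePolymorphism(response.polymorphismCount, response.anyOfCount)
);
console.log(correct); // 6
```

This is why the autofix tracks the winning schema's counts as a pair instead of maximizing each metric separately.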
Additional Locations (1)
What/Why/How?
What: Adds a new score command to Redocly CLI that analyzes OpenAPI 3.x descriptions and produces two composite scores: Integration Simplicity (0-100, how easy is this API to integrate) and Agent Readiness (0-100, how usable is this API by AI agents/LLM tooling).
Why: API producers currently lack a quick, deterministic way to assess how developer-friendly or AI-agent-friendly their API descriptions are. The existing stats command counts structural elements but doesn't evaluate quality signals like documentation coverage, example presence, schema complexity, or error response structure. This command fills that gap with actionable, explainable scores.
How: The implementation follows the same pattern as the `stats` command (bundle + analyze), with a clean separation between metric collection and score calculation:

- Metric collection (`collectors/`): walks the bundled document, resolving internal $refs, to gather per-operation raw metrics (parameter counts, schema depth, polymorphism, description/constraint/example coverage, structured error responses, workflow dependency depth via shared schema refs).
- Scoring (`scoring.ts`): pure functions that normalize raw metrics into subscores and compute weighted composite scores. Thresholds and weights are configurable constants. `anyOf` is penalized more heavily than `oneOf`/`allOf`; discriminator presence improves the agent-readiness polymorphism-clarity subscore.
- Hotspots (`hotspots.ts`): identifies the operations with the most issues, sorted by number of reasons, with human-readable explanations.
- Output: `--format=stylish` (default, with color bar charts) and `--format=json` (machine-readable for CI/dashboards).

Reference
Related to API governance and developer experience tooling. No existing issue -- this is a new feature.
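The collect-then-score split described under How can be sketched roughly as follows. All names, weights, and normalization rules here are invented for illustration and are not the PR's actual code:

```typescript
// Hypothetical per-operation raw metrics, stand-ins for what a collector gathers.
interface OperationMetrics {
  parameterCount: number;
  descriptionsPresent: number;
  descriptionsTotal: number;
  examplePresent: boolean;
}

type Subscores = Record<string, number>; // each normalized to 0..100

// Phase 1 output feeds phase 2: pure normalization of raw metrics into subscores.
function toSubscores(m: OperationMetrics): Subscores {
  return {
    documentation:
      m.descriptionsTotal === 0 ? 100 : (100 * m.descriptionsPresent) / m.descriptionsTotal,
    examples: m.examplePresent ? 100 : 0,
    // Made-up rule: dock 10 points per parameter beyond five.
    simplicity: Math.max(0, 100 - 10 * Math.max(0, m.parameterCount - 5)),
  };
}

// Phase 2: weighted composite, with weights kept as configurable constants.
function composite(sub: Subscores, weights: Record<string, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const [key, w] of Object.entries(weights)) {
    total += (sub[key] ?? 0) * w;
    weightSum += w;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

const sub = toSubscores({
  parameterCount: 7,
  descriptionsPresent: 3,
  descriptionsTotal: 4,
  examplePresent: true,
});
console.log(composite(sub, { documentation: 2, examples: 1, simplicity: 1 })); // → 82.5
```

Keeping the scoring side pure (plain data in, numbers out) is what makes the thresholds and weights easy to test deterministically, as the PR's unit tests do.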
Testing

- Unit tests cover `$ref` resolution, polymorphism counting (`oneOf`/`anyOf`/`allOf`), constraint detection (including `const`), example coverage scoring, the `anyOf` penalty multiplier, `discriminator` impact on agent readiness, deterministic output, and score range validation.
- Type checking passes (`tsc --noEmit`), and all existing tests continue to pass.

Screenshots (optional)
Stylish output for Redocly Cafe:
Check yourself
Security
Note
Medium Risk
Introduces a new CLI command with non-trivial metric collection and scoring logic over bundled OpenAPI specs; risk is mainly correctness/performance and output stability rather than security-critical behavior.
Overview
Adds a new experimental
redocly scoreCLI command (OpenAPI 3.x only) that computes an Agent Readiness score (0–100) plus normalized subscores, dependency depths (from shared$refusage), and a ranked list of hotspot operations with human-readable reasons.Implements end-to-end metric collection by walking resolved specs (including schema depth/polymorphism/constraints/examples/descriptions, ambiguous params, and error response structure), scoring/aggregation utilities, and both
stylishandjsonoutput formats (with optional per-operation details and a--debug-operation-idschema breakdown).Updates docs/navigation and adds unit + e2e snapshot coverage for scoring, hotspots, dependency graph, and output; also exports
isMappingRefandOas3MediaTypefrom core and slightly relaxes coverage thresholds.Reviewed by Cursor Bugbot for commit 33a712d. Bugbot is set up for automated code reviews on this repo. Configure here.
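One way to picture the "dependency depths from shared $ref usage" idea mentioned above. This is a hypothetical model, not the PR's implementation: treat operation B as depending on operation A when a schema ref appearing in A's responses also appears in B's request, and report the longest such chain per operation:

```typescript
// Illustrative model of an operation's schema-ref usage (invented shape).
interface OpRefs {
  id: string;
  requestRefs: Set<string>; // $refs reachable from the request body/parameters
  responseRefs: Set<string>; // $refs reachable from the responses
}

// Longest producer→consumer chain ending at each operation, via shared refs.
function dependencyDepths(ops: OpRefs[]): Map<string, number> {
  const depths = new Map<string, number>();

  const depthOf = (op: OpRefs, seen: Set<string>): number => {
    if (depths.has(op.id)) return depths.get(op.id)!;
    if (seen.has(op.id)) return 0; // break cycles conservatively
    seen.add(op.id);
    let depth = 0;
    for (const producer of ops) {
      if (producer.id === op.id) continue;
      // Producer's response shares a schema ref with this op's request.
      for (const ref of producer.responseRefs) {
        if (op.requestRefs.has(ref)) {
          depth = Math.max(depth, 1 + depthOf(producer, seen));
        }
      }
    }
    depths.set(op.id, depth);
    return depth;
  };

  for (const op of ops) depthOf(op, new Set());
  return depths;
}
```

Under this model, a create-then-use-then-pay chain of three operations would score depths 0, 1, and 2, which is the kind of signal an agent-readiness score can use to flag multi-step integration workflows.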