Pawang/perf testing parallel2 by pwang347 · Pull Request #310215 · microsoft/vscode

pwang347 · 2026-04-15T19:52:30Z

No description provided.

github-actions · 2026-04-15T19:56:48Z

Screenshot Changes

Base: 7a79f779 Current: 1ae67c00

Changed (2)

chat/aiCustomizations/aiCustomizationManagementEditor/McpBrowseMode/Light

Before	After

editor/inlineCompletions/other/JumpToHint/Dark

Before	After

blocks-ci screenshots changed

Replace the contents of test/componentFixtures/blocks-ci-screenshots.md with:

Updated blocks-ci-screenshots.md

<!-- auto-generated by CI — do not edit manually -->

#### editor/codeEditor/CodeEditor/Dark
![screenshot](https://hediet-screenshots.azurewebsites.net/images/cb32a3e854b5734fe5aaca2318f2e0a42ee821b05ea97883ea42c5ba95edb3c3)

#### editor/codeEditor/CodeEditor/Light
![screenshot](https://hediet-screenshots.azurewebsites.net/images/42624fbba5e0db7f32c224b5eb9c5dd3b08245697ae2e7d2a88be0d7c287129b)

Copilot

Pull request overview

Adds a chat performance benchmarking harness to the VS Code repo, including a mock LLM server, scenario definitions, local CLI entrypoints, and a GitHub Actions workflow to run comparisons (and optional leak checks) in CI.

Changes:

Introduce chat perf regression runner + memory leak checker scripts under scripts/chat-simulation/.
Add mock LLM server + scenario registry (content, tool-call, multi-turn) and supporting fixtures/config.
Wire up new npm run perf:chat / perf:chat-leak scripts and add a CI workflow + docs for running the benchmarks.

Show a summary per file

File	Description
scripts/chat-simulation/test-chat-perf-regression.js	New Playwright-based perf runner collecting timing/rendering/memory metrics and comparing vs a baseline build/JSON.
scripts/chat-simulation/test-chat-mem-leaks.js	New state-based leak checker that cycles scenarios and measures post-GC heap + DOM node deltas.
scripts/chat-simulation/common/utils.js	Shared utilities for build resolution, VS Code launch, stats, and config handling.
scripts/chat-simulation/common/mock-llm-server.js	Local SSE mock server implementing Copilot/CAPI-like endpoints and scenario-based responses.
scripts/chat-simulation/common/perf-scenarios.js	Defines and registers benchmark scenarios (content, tool-call, multi-turn) used by the runners.
scripts/chat-simulation/config.jsonc	Default configuration for perf regression and leak thresholds.
scripts/chat-simulation/fixtures/*.ts	Fixture TS files used as deterministic tool-call read/edit targets for scenarios.
package.json	Adds `perf:chat` and `perf:chat-leak` npm scripts.
.github/workflows/chat-perf.yml	New CI workflow to run perf comparisons (matrix-split scenarios) and optional leak checks.
.github/skills/chat-perf/SKILL.md	Documentation for running the new perf + leak tools and interpreting results.
.gitignore	Ignores `.chat-simulation-data` output directory.
build/filters.ts	Excludes `*.jsonc` from the copyright filter (to allow the new config).

Copilot's findings

Files reviewed: 18/19 changed files
Comments generated: 5

Copilot · 2026-04-15T19:58:42Z

+
+	// -- Save baseline ---------------------------------------------------
+	if (opts.saveBaseline) {
+		if (!opts.baseline) { console.error('[chat-simulation] --save-baseline requires --baseline <path>'); process.exit(1); }


--save-baseline can overwrite the baseline JSON generated earlier in the run: opts.baseline is set to the downloaded baseline build’s JSON path, so --save-baseline will end up writing the test results into the baseline file. This breaks subsequent comparisons/caching. Consider requiring an explicit --baseline <path> argument for --save-baseline (separate from --baseline-build) and writing the saved baseline to that user-provided path (or a dedicated --save-baseline-as).

Suggested change

if (!opts.baseline) { console.error('[chat-simulation] --save-baseline requires --baseline <path>'); process.exit(1); }

if (!opts.baseline) { console.error('[chat-simulation] --save-baseline requires --baseline <path>'); process.exit(1); }

if (opts.baselineBuild) {

console.error('[chat-simulation] --save-baseline cannot be used with --baseline-build; pass an explicit --baseline <path> to save a new baseline');

process.exit(1);

}

Copilot · 2026-04-15T19:58:43Z

+				if (!cur || !bas || !bas.median) { continue; }
+				const change = (cur.median - bas.median) / bas.median;
+				const pct = `${change > 0 ? '+' : ''}${(change * 100).toFixed(1)}%`;


Baseline comparison skips metrics when the baseline median is 0 because of !bas.median. Several count-based metrics (layouts/style recalcs/long tasks) can legitimately have a median of 0, which would silently exclude them from regression detection. Use an explicit null/undefined check (and handle division-by-zero for percent change) instead of a truthiness check.

Suggested change

if (!cur || !bas || !bas.median) { continue; }

const change = (cur.median - bas.median) / bas.median;

const pct = `${change > 0 ? '+' : ''}${(change * 100).toFixed(1)}%`;

if (!cur || !bas || cur.median === null || cur.median === undefined || bas.median === null || bas.median === undefined) { continue; }

const change = bas.median !== 0

? (cur.median - bas.median) / bas.median

: (cur.median === 0 ? 0 : Number.POSITIVE_INFINITY);

const pct = Number.isFinite(change)

? `${change > 0 ? '+' : ''}${(change * 100).toFixed(1)}%`

: '+∞%';

Copilot · 2026-04-15T19:58:43Z

+					console.log(`  [ext-host] Error collecting metrics: ${err}`);
+				}
+			} finally {
+				extHostInspector.close();


The extension-host inspector is closed in the inner finally without guarding against errors. If close() throws (e.g., WS already closed), the whole run will fail and metrics/diagnostics are lost. Wrap this close() in a try/catch (and/or set extHostInspector = null after closing) to keep cleanup best-effort.

Suggested change

extHostInspector.close();

try {

extHostInspector.close();

} catch (err) {

if (verbose) {

console.log(` [ext-host] Error closing inspector: ${err}`);

}

}

Copilot · 2026-04-15T19:58:43Z

+	const opts = {
+		iterations: CONFIG.iterations ?? 3,
+		messages: CONFIG.messages ?? 5,
+		verbose: false,
+		/** @type {string | undefined} */
+		build: undefined,
+		leakThresholdMB: CONFIG.leakThresholdMB ?? 5,
+	};


--messages is parsed and documented as controlling “messages to send per iteration”, but it is never used. The leak check currently always cycles through all scenarios once per iteration. Either remove the flag/help text, or implement it (e.g., limit scenarios per iteration or repeat a subset) so CLI behavior matches documentation.

Copilot · 2026-04-15T19:58:44Z

+/** @param {string} text */
+function stripJsoncComments(text) { return text.replace(/\/\/.*/g, '').replace(/\/\*[\s\S]*?\*\//g, ''); }
+
+/**
+ * Load a namespaced section from config.jsonc.
+ * @param {string} section - Top-level key (e.g. 'perfRegression', 'memLeaks')
+ * @returns {Record<string, any>}
+ */
+function loadConfig(section) {
+	const raw = fs.readFileSync(path.join(__dirname, '..', 'config.jsonc'), 'utf-8');
+	const config = JSON.parse(stripJsoncComments(raw));
+	return config[section] ?? {};


stripJsoncComments() uses regexes to remove // and /* */ comments. This is not a safe JSONC parser (it will also strip comment-like sequences inside strings, e.g. "https://..."), which can make config parsing brittle. Since the repo already depends on jsonc-parser, consider using it to parse config.jsonc instead of regex stripping.

pwang347 added 17 commits April 13, 2026 18:46

updates

620c954

PR

a4a562b

wip

2e1b4a7

PR

36ef007

PR

bcdcda3

more metrics

9cfbfd6

update

4d8aad2

clean

1bb24d5

pipeline fix

3bd7ba3

fix

1649a5d

updates

df6478c

PR

e756e47

update

3f6aac3

update

52ef4f1

fix deps

29b4cbe

fixes

543cc19

output improvements

7150f9c

Copilot AI review requested due to automatic review settings April 15, 2026 19:52

Copilot started reviewing on behalf of pwang347 April 15, 2026 19:54 View session

Copilot AI reviewed Apr 15, 2026

View reviewed changes

vs-code-engineering bot assigned pwang347 Apr 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pawang/perf testing parallel2#310215

Pawang/perf testing parallel2#310215
pwang347 wants to merge 17 commits intomainfrom
pawang/perfTestingParallel2

pwang347 commented Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-				if (!cur || !bas || !bas.median) { continue; }
-				const change = (cur.median - bas.median) / bas.median;
-				const pct = `${change > 0 ? '+' : ''}${(change * 100).toFixed(1)}%`;
+				if (!cur || !bas || cur.median === null || cur.median === undefined || bas.median === null || bas.median === undefined) { continue; }
+				const change = bas.median !== 0
+					? (cur.median - bas.median) / bas.median
+					: (cur.median === 0 ? 0 : Number.POSITIVE_INFINITY);
+				const pct = Number.isFinite(change)
+					? `${change > 0 ? '+' : ''}${(change * 100).toFixed(1)}%`
+					: '+∞%';

-				extHostInspector.close();
+				try {
+					extHostInspector.close();
+				} catch (err) {
+					if (verbose) {
+						console.log(`  [ext-host] Error closing inspector: ${err}`);
+					}
+				}

Conversation

pwang347 commented Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026

Screenshot Changes

Changed (2)

blocks-ci screenshots changed

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants