Pawang/perf testing parallel3 by pwang347 · Pull Request #310220 · microsoft/vscode

pwang347 · 2026-04-15T20:06:46Z

No description provided.

Copilot

Pull request overview

Adds a chat performance benchmarking harness (perf regression + memory leak check) under scripts/chat-simulation/, along with a CI workflow and docs to run/interpret the benchmarks.

Changes:

Introduces a Playwright-driven chat perf regression runner with baseline comparison and CI Markdown summaries.
Adds a state-based chat “memory leak” checker and supporting mock LLM server/scenario fixtures.
Wires new npm scripts (perf:chat, perf:chat-leak), a scheduled/dispatchable GitHub Actions workflow, and a skill doc.

Show a summary per file

File	Description
scripts/chat-simulation/test-chat-perf-regression.js	New perf benchmark runner with trace/profiling + baseline comparison and CI summary output.
scripts/chat-simulation/test-chat-mem-leaks.js	New state-based leak checker cycling scenarios and measuring post-GC heap/DOM deltas.
scripts/chat-simulation/common/utils.js	Shared helpers for config, build download, launching VS Code via CDP, stats, and pre-seeding storage.
scripts/chat-simulation/common/mock-llm-server.js	Local deterministic mock LLM server supporting streaming + multi-turn/tool-call scenarios.
scripts/chat-simulation/common/perf-scenarios.js	Scenario catalog (content/tool-call/multi-turn) used by the perf and leak scripts.
scripts/chat-simulation/config.jsonc	Default settings for runs/thresholds/baseline version/leak thresholds.
scripts/chat-simulation/fixtures/*	Fixture TS files used as deterministic “tool read/edit” targets in scenarios.
package.json	Adds `perf:chat` and `perf:chat-leak` npm scripts.
.github/workflows/chat-perf.yml	Adds CI workflow to run the benchmark (matrix) and optionally leak check.
.github/skills/chat-perf/SKILL.md	Documents how to run/interpret the benchmarks.
build/filters.ts	Excludes `*.jsonc` from the copyright header filter.
.gitignore	Ignores `.chat-simulation-data` output directory.

Copilot's findings

Files reviewed: 18/19 changed files
Comments generated: 6

Copilot · 2026-04-15T20:11:40Z

+function preseedStorage(userDataDir) {
+	const globalStorageDir = path.join(userDataDir, 'User', 'globalStorage');
+	fs.mkdirSync(globalStorageDir, { recursive: true });
+	const dbPath = path.join(globalStorageDir, 'state.vscdb');
+	execSync(`sqlite3 "${dbPath}" "CREATE TABLE IF NOT EXISTS ItemTable (key TEXT UNIQUE ON CONFLICT REPLACE, value BLOB); INSERT INTO ItemTable (key, value) VALUES ('builtinChatExtensionEnablementMigration', 'true'); INSERT INTO ItemTable (key, value) VALUES ('chat.tools.global.autoApprove.optIn', 'true');"`);
+}


preseedStorage shells out via execSync with an interpolated file path. Since userDataDir includes runId (which comes from scenario IDs/CLI input), a runId containing quotes or shell metacharacters can break quoting and potentially lead to command injection or simply failing to preseed storage. Prefer spawnSync('sqlite3', [dbPath, sql], { shell: false }) (or equivalent) and/or validate/sanitize runId before it becomes part of a shell command.

Copilot · 2026-04-15T20:11:41Z

+	const opts = {
+		iterations: CONFIG.iterations ?? 3,
+		messages: CONFIG.messages ?? 5,
+		verbose: false,
+		/** @type {string | undefined} */
+		build: undefined,
+		leakThresholdMB: CONFIG.leakThresholdMB ?? 5,
+	};


parseArgs() defines messages (and the help text advertises --messages), but the rest of the script never uses opts.messages. This makes the CLI misleading and suggests either a missing feature or dead code. Either wire messages into the leak-check behavior, or remove the flag/config entry and update the usage docs accordingly.

Copilot · 2026-04-15T20:11:41Z

+Launches one VS Code session, sends N messages sequentially, forces GC between each, and measures renderer heap and DOM node count. Uses **linear regression** on the samples to compute per-message growth rate, which is compared against a threshold.
+
+### Key flags
+
+| Flag | Default | Description |
+|---|---|---|
+| `--messages <n>` / `-n` | `10` | Number of messages to send. More = more accurate slope. |
+| `--build <path\|ver>` / `-b` | local dev | Build to test. |
+| `--threshold <MB>` | `2` | Max per-message heap growth in MB. |
+| `--verbose` | — | Print per-message heap/DOM counts. |
+
+### What it measures
+
+- **Heap growth slope** (MB/message) — linear regression over forced-GC heap samples. A leak shows as sustained positive slope.
+- **DOM node growth** (nodes/message) — catches rendering leaks where elements aren't cleaned up. Healthy chat virtualizes old messages so node count plateaus.
+
+### Interpreting results
+
+- `0.3–1.0 MB/msg` — normal (V8 internal overhead, string interning)
+- `>2.0 MB/msg` — likely leak, investigate retained objects
+- DOM nodes stable after first message — normal (chat list virtualization working)
+- DOM nodes growing linearly — rendering leak, check disposable cleanup


The leak-check documentation describes a message-based approach (N messages, slope MB/message, etc.), but the implementation in test-chat-mem-leaks.js is iteration/state-based (cycle through all scenarios per iteration, then open a new chat and compare). Please update this doc to reflect the current behavior/flags (e.g. --iterations, total residual MB threshold) so readers don’t run the wrong command or misinterpret results.

Suggested change

Launches one VS Code session, sends N messages sequentially, forces GC between each, and measures renderer heap and DOM node count. Uses **linear regression** on the samples to compute per-message growth rate, which is compared against a threshold.

### Key flags

| Flag | Default | Description |

|---|---|---|

| `--messages <n>` / `-n` | `10` | Number of messages to send. More = more accurate slope. |

| `--build <path\|ver>` / `-b` | local dev | Build to test. |

| `--threshold <MB>` | `2` | Max per-message heap growth in MB. |

| `--verbose` | — | Print per-message heap/DOM counts. |

### What it measures

- **Heap growth slope** (MB/message) — linear regression over forced-GC heap samples. A leak shows as sustained positive slope.

- **DOM node growth** (nodes/message) — catches rendering leaks where elements aren't cleaned up. Healthy chat virtualizes old messages so node count plateaus.

### Interpreting results

- `0.3–1.0 MB/msg` — normal (V8 internal overhead, string interning)

- `>2.0 MB/msg` — likely leak, investigate retained objects

- DOM nodes stable after first message — normal (chat list virtualization working)

- DOM nodes growing linearly — rendering leak, check disposable cleanup

Launches one VS Code session, runs the full leak scenario set for a configured number of iterations, forces cleanup/GC between states, then opens a fresh chat and compares the resulting renderer heap and DOM node count against the initial baseline. This check is **iteration/state-based**, not message-slope-based: it looks for residual memory or DOM growth that remains after repeated scenario cycles.

### Key flags

| Flag | Default | Description |

|---|---|---|

| `--iterations <n>` / `-n` | `10` | Number of full scenario cycles to run before comparing against a fresh chat. Higher values make retained-state leaks easier to reproduce. |

| `--build <path\|ver>` / `-b` | local dev | Build to test. |

| `--threshold <MB>` | `2` | Max allowed **total residual heap growth** in MB after all iterations complete. |

| `--verbose` | — | Print per-iteration heap/DOM counts and the final residual comparison. |

### What it measures

- **Residual heap growth** (MB total) — how much renderer heap remains above baseline after running all scenarios repeatedly and then starting a new chat. A leak shows up as memory that does not return close to baseline.

- **Residual DOM growth** (nodes total) — catches rendering leaks where chat elements or listeners remain retained after scenario cleanup. Healthy behavior should return close to the original DOM footprint when a fresh chat is opened.

### Interpreting results

- Small residual deltas after the final fresh-chat comparison — normal (temporary allocations were cleaned up)

- Residual heap above the configured `--threshold` — likely leak, investigate retained chat/session state

- DOM node count returning near baseline — normal (chat UI state was disposed correctly)

- DOM node count staying elevated after the final fresh-chat comparison — rendering leak, check disposable cleanup and retained view state

Copilot · 2026-04-15T20:11:41Z

+on:
+  pull_request:
+    paths:
+      - '.github/workflows/chat-perf.yml'


The pull_request.paths filter only includes the workflow file itself, so this workflow will not run on PRs that change chat code. This conflicts with the intent described in the skill doc (“gate PRs that touch chat UI code”). If the goal is PR gating, broaden the paths filter (or remove it) to include the relevant chat/workbench files or the chat-simulation scripts.

Suggested change

- '.github/workflows/chat-perf.yml'

- '.github/workflows/chat-perf.yml'

- 'src/vs/workbench/contrib/chat/**'

- 'src/vs/workbench/contrib/inlineChat/**'

- 'src/vs/workbench/contrib/chatInput/**'

Copilot · 2026-04-15T20:11:42Z

+				const cur = current[group]?.[metric];
+				const bas = base[group]?.[metric];
+				if (!cur || !bas || !bas.median) { continue; }
+				const change = (cur.median - bas.median) / bas.median;


In baseline comparison, the check if (!cur || !bas || !bas.median) treats a baseline median of 0 as “missing” and skips the metric. Some metrics (e.g. counts) can legitimately have a median of 0, which would prevent regressions from being detected. Use an explicit null/undefined check for bas.median (and cur.median) instead of a truthiness check.

Copilot · 2026-04-15T20:11:42Z

+							offset: 1,
+							limit: 50,


multi-turn-user's read-file tool call uses offset/limit, but Copilot's read_file tool schema expects startLine/endLine (1-indexed). With the current arguments the tool invocation is likely to be rejected or ignored, which would invalidate this scenario's behavior/perf results.

Suggested change

offset: 1,

limit: 50,

startLine: 1,

endLine: 50,

github-actions · 2026-04-15T20:11:52Z

Screenshot Changes

Base: 7a79f779 Current: bc946134

Changed (2)

chat/aiCustomizations/aiCustomizationManagementEditor/McpBrowseMode/Light

Before	After

editor/inlineCompletions/other/JumpToHint/Dark

Before	After

blocks-ci screenshots changed

Replace the contents of test/componentFixtures/blocks-ci-screenshots.md with:

Updated blocks-ci-screenshots.md

<!-- auto-generated by CI — do not edit manually -->

#### editor/codeEditor/CodeEditor/Dark
![screenshot](https://hediet-screenshots.azurewebsites.net/images/cb32a3e854b5734fe5aaca2318f2e0a42ee821b05ea97883ea42c5ba95edb3c3)

#### editor/codeEditor/CodeEditor/Light
![screenshot](https://hediet-screenshots.azurewebsites.net/images/42624fbba5e0db7f32c224b5eb9c5dd3b08245697ae2e7d2a88be0d7c287129b)

pwang347 added 19 commits April 13, 2026 18:46

updates

620c954

PR

a4a562b

wip

2e1b4a7

PR

36ef007

PR

bcdcda3

more metrics

9cfbfd6

update

4d8aad2

clean

1bb24d5

pipeline fix

3bd7ba3

fix

1649a5d

updates

df6478c

PR

e756e47

update

3f6aac3

update

52ef4f1

fix deps

29b4cbe

fixes

543cc19

output improvements

7150f9c

fix terminal

1863d3b

update

2773978

Copilot AI review requested due to automatic review settings April 15, 2026 20:06

Copilot started reviewing on behalf of pwang347 April 15, 2026 20:07 View session

Copilot AI reviewed Apr 15, 2026

View reviewed changes

vs-code-engineering bot assigned pwang347 Apr 15, 2026

pwang347 added 6 commits April 15, 2026 13:49

update

41b7641

update

e979b6e

clean

a2780a4

update

bf59aea

update

0a5f297

don't skip

1f8e86d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pawang/perf testing parallel3#310220

Pawang/perf testing parallel3#310220
pwang347 wants to merge 25 commits intomainfrom
pawang/perfTestingParallel3

pwang347 commented Apr 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

Copilot AI Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-      - '.github/workflows/chat-perf.yml'
+      - '.github/workflows/chat-perf.yml'
+      - 'src/vs/workbench/contrib/chat/**'
+      - 'src/vs/workbench/contrib/inlineChat/**'
+      - 'src/vs/workbench/contrib/chatInput/**'

Conversation

pwang347 commented Apr 15, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Screenshot Changes

Changed (2)

blocks-ci screenshots changed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Apr 15, 2026 •

edited

Loading