Skip to content

Pawang/perf testing parallel3#310220

Draft
pwang347 wants to merge 25 commits intomainfrom
pawang/perfTestingParallel3
Draft

Pawang/perf testing parallel3#310220
pwang347 wants to merge 25 commits intomainfrom
pawang/perfTestingParallel3

Conversation

@pwang347
Copy link
Copy Markdown
Member

No description provided.

Copilot AI review requested due to automatic review settings April 15, 2026 20:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a chat performance benchmarking harness (perf regression + memory leak check) under scripts/chat-simulation/, along with a CI workflow and docs to run/interpret the benchmarks.

Changes:

  • Introduces a Playwright-driven chat perf regression runner with baseline comparison and CI Markdown summaries.
  • Adds a state-based chat “memory leak” checker and supporting mock LLM server/scenario fixtures.
  • Wires new npm scripts (perf:chat, perf:chat-leak), a scheduled/dispatchable GitHub Actions workflow, and a skill doc.
Show a summary per file
File Description
scripts/chat-simulation/test-chat-perf-regression.js New perf benchmark runner with trace/profiling + baseline comparison and CI summary output.
scripts/chat-simulation/test-chat-mem-leaks.js New state-based leak checker cycling scenarios and measuring post-GC heap/DOM deltas.
scripts/chat-simulation/common/utils.js Shared helpers for config, build download, launching VS Code via CDP, stats, and pre-seeding storage.
scripts/chat-simulation/common/mock-llm-server.js Local deterministic mock LLM server supporting streaming + multi-turn/tool-call scenarios.
scripts/chat-simulation/common/perf-scenarios.js Scenario catalog (content/tool-call/multi-turn) used by the perf and leak scripts.
scripts/chat-simulation/config.jsonc Default settings for runs/thresholds/baseline version/leak thresholds.
scripts/chat-simulation/fixtures/* Fixture TS files used as deterministic “tool read/edit” targets in scenarios.
package.json Adds perf:chat and perf:chat-leak npm scripts.
.github/workflows/chat-perf.yml Adds CI workflow to run the benchmark (matrix) and optionally leak check.
.github/skills/chat-perf/SKILL.md Documents how to run/interpret the benchmarks.
build/filters.ts Excludes *.jsonc from the copyright header filter.
.gitignore Ignores .chat-simulation-data output directory.

Copilot's findings

  • Files reviewed: 18/19 changed files
  • Comments generated: 6

Comment on lines +144 to +149
function preseedStorage(userDataDir) {
const globalStorageDir = path.join(userDataDir, 'User', 'globalStorage');
fs.mkdirSync(globalStorageDir, { recursive: true });
const dbPath = path.join(globalStorageDir, 'state.vscdb');
execSync(`sqlite3 "${dbPath}" "CREATE TABLE IF NOT EXISTS ItemTable (key TEXT UNIQUE ON CONFLICT REPLACE, value BLOB); INSERT INTO ItemTable (key, value) VALUES ('builtinChatExtensionEnablementMigration', 'true'); INSERT INTO ItemTable (key, value) VALUES ('chat.tools.global.autoApprove.optIn', 'true');"`);
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preseedStorage shells out via execSync with an interpolated file path. Since userDataDir includes runId (which comes from scenario IDs/CLI input), a runId containing quotes or shell metacharacters can break quoting and potentially lead to command injection or simply failing to preseed storage. Prefer spawnSync('sqlite3', [dbPath, sql], { shell: false }) (or equivalent) and/or validate/sanitize runId before it becomes part of a shell command.

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +61
const opts = {
iterations: CONFIG.iterations ?? 3,
messages: CONFIG.messages ?? 5,
verbose: false,
/** @type {string | undefined} */
build: undefined,
leakThresholdMB: CONFIG.leakThresholdMB ?? 5,
};
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseArgs() defines messages (and the help text advertises --messages), but the rest of the script never uses opts.messages. This makes the CLI misleading and suggests either a missing feature or dead code. Either wire messages into the leak-check behavior, or remove the flag/config entry and update the usage docs accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines +120 to +141
Launches one VS Code session, sends N messages sequentially, forces GC between each, and measures renderer heap and DOM node count. Uses **linear regression** on the samples to compute per-message growth rate, which is compared against a threshold.

### Key flags

| Flag | Default | Description |
|---|---|---|
| `--messages <n>` / `-n` | `10` | Number of messages to send. More = more accurate slope. |
| `--build <path\|ver>` / `-b` | local dev | Build to test. |
| `--threshold <MB>` | `2` | Max per-message heap growth in MB. |
| `--verbose` | — | Print per-message heap/DOM counts. |

### What it measures

- **Heap growth slope** (MB/message) — linear regression over forced-GC heap samples. A leak shows as sustained positive slope.
- **DOM node growth** (nodes/message) — catches rendering leaks where elements aren't cleaned up. Healthy chat virtualizes old messages so node count plateaus.

### Interpreting results

- `0.3–1.0 MB/msg` — normal (V8 internal overhead, string interning)
- `>2.0 MB/msg` — likely leak, investigate retained objects
- DOM nodes stable after first message — normal (chat list virtualization working)
- DOM nodes growing linearly — rendering leak, check disposable cleanup
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leak-check documentation describes a message-based approach (N messages, slope MB/message, etc.), but the implementation in test-chat-mem-leaks.js is iteration/state-based (cycle through all scenarios per iteration, then open a new chat and compare). Please update this doc to reflect the current behavior/flags (e.g. --iterations, total residual MB threshold) so readers don’t run the wrong command or misinterpret results.

Suggested change
Launches one VS Code session, sends N messages sequentially, forces GC between each, and measures renderer heap and DOM node count. Uses **linear regression** on the samples to compute per-message growth rate, which is compared against a threshold.
### Key flags
| Flag | Default | Description |
|---|---|---|
| `--messages <n>` / `-n` | `10` | Number of messages to send. More = more accurate slope. |
| `--build <path\|ver>` / `-b` | local dev | Build to test. |
| `--threshold <MB>` | `2` | Max per-message heap growth in MB. |
| `--verbose` || Print per-message heap/DOM counts. |
### What it measures
- **Heap growth slope** (MB/message) — linear regression over forced-GC heap samples. A leak shows as sustained positive slope.
- **DOM node growth** (nodes/message) — catches rendering leaks where elements aren't cleaned up. Healthy chat virtualizes old messages so node count plateaus.
### Interpreting results
- `0.3–1.0 MB/msg` — normal (V8 internal overhead, string interning)
- `>2.0 MB/msg` — likely leak, investigate retained objects
- DOM nodes stable after first message — normal (chat list virtualization working)
- DOM nodes growing linearly — rendering leak, check disposable cleanup
Launches one VS Code session, runs the full leak scenario set for a configured number of iterations, forces cleanup/GC between states, then opens a fresh chat and compares the resulting renderer heap and DOM node count against the initial baseline. This check is **iteration/state-based**, not message-slope-based: it looks for residual memory or DOM growth that remains after repeated scenario cycles.
### Key flags
| Flag | Default | Description |
|---|---|---|
| `--iterations <n>` / `-n` | `10` | Number of full scenario cycles to run before comparing against a fresh chat. Higher values make retained-state leaks easier to reproduce. |
| `--build <path\|ver>` / `-b` | local dev | Build to test. |
| `--threshold <MB>` | `2` | Max allowed **total residual heap growth** in MB after all iterations complete. |
| `--verbose` || Print per-iteration heap/DOM counts and the final residual comparison. |
### What it measures
- **Residual heap growth** (MB total) — how much renderer heap remains above baseline after running all scenarios repeatedly and then starting a new chat. A leak shows up as memory that does not return close to baseline.
- **Residual DOM growth** (nodes total) — catches rendering leaks where chat elements or listeners remain retained after scenario cleanup. Healthy behavior should return close to the original DOM footprint when a fresh chat is opened.
### Interpreting results
- Small residual deltas after the final fresh-chat comparison — normal (temporary allocations were cleaned up)
- Residual heap above the configured `--threshold` — likely leak, investigate retained chat/session state
- DOM node count returning near baseline — normal (chat UI state was disposed correctly)
- DOM node count staying elevated after the final fresh-chat comparison — rendering leak, check disposable cleanup and retained view state

Copilot uses AI. Check for mistakes.
on:
pull_request:
paths:
- '.github/workflows/chat-perf.yml'
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull_request.paths filter only includes the workflow file itself, so this workflow will not run on PRs that change chat code. This conflicts with the intent described in the skill doc (“gate PRs that touch chat UI code”). If the goal is PR gating, broaden the paths filter (or remove it) to include the relevant chat/workbench files or the chat-simulation scripts.

Suggested change
- '.github/workflows/chat-perf.yml'
- '.github/workflows/chat-perf.yml'
- 'src/vs/workbench/contrib/chat/**'
- 'src/vs/workbench/contrib/inlineChat/**'
- 'src/vs/workbench/contrib/chatInput/**'

Copilot uses AI. Check for mistakes.
Comment on lines +1397 to +1400
const cur = current[group]?.[metric];
const bas = base[group]?.[metric];
if (!cur || !bas || !bas.median) { continue; }
const change = (cur.median - bas.median) / bas.median;
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In baseline comparison, the check if (!cur || !bas || !bas.median) treats a baseline median of 0 as “missing” and skips the metric. Some metrics (e.g. counts) can legitimately have a median of 0, which would prevent regressions from being detected. Use an explicit null/undefined check for bas.median (and cur.median) instead of a truthiness check.

Copilot uses AI. Check for mistakes.
Comment on lines +507 to +508
offset: 1,
limit: 50,
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multi-turn-user's read-file tool call uses offset/limit, but Copilot's read_file tool schema expects startLine/endLine (1-indexed). With the current arguments the tool invocation is likely to be rejected or ignored, which would invalidate this scenario's behavior/perf results.

Suggested change
offset: 1,
limit: 50,
startLine: 1,
endLine: 50,

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 15, 2026

Screenshot Changes

Base: 7a79f779 Current: bc946134

Changed (2)

chat/aiCustomizations/aiCustomizationManagementEditor/McpBrowseMode/Light
Before After
before after
editor/inlineCompletions/other/JumpToHint/Dark
Before After
before after

blocks-ci screenshots changed

Replace the contents of test/componentFixtures/blocks-ci-screenshots.md with:

Updated blocks-ci-screenshots.md
<!-- auto-generated by CI — do not edit manually -->

#### editor/codeEditor/CodeEditor/Dark
![screenshot](https://hediet-screenshots.azurewebsites.net/images/cb32a3e854b5734fe5aaca2318f2e0a42ee821b05ea97883ea42c5ba95edb3c3)

#### editor/codeEditor/CodeEditor/Light
![screenshot](https://hediet-screenshots.azurewebsites.net/images/42624fbba5e0db7f32c224b5eb9c5dd3b08245697ae2e7d2a88be0d7c287129b)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants