perf(system-tests): share local-backend base image via qcow2 overlays#10591
perf(system-tests): share local-backend base image via qcow2 overlays#10591basvandijk wants to merge 4 commits into
Conversation
The local (libvirt + QEMU) system-test backend decompressed the same primary GuestOS image once per node when starting a testnet's VMs. Extract each `src` once into a shared, content-addressed base (image_cache/<key>.img) under a per-key flock, then give every VM a thin copy-on-write qcow2 overlay backed by it (`qemu-img create -f qcow2 -F raw -b <base>`). Overlay creation is near-instant and filesystem-independent (unlike `cp --reflink`, which only avoids a full copy on CoW filesystems), so this is a strict speedup. Per-node config disks stay raw. Disk-prep for an 8-node testnet (GuestOS image, 556M zst -> 1.6G sparse), wall time: filesystem master (8x decompress) qcow2 overlays local disk 11.5s 9.0s CEPH-backed ~64s 9-19s
There was a problem hiding this comment.
Pull request overview
This PR improves performance of the local (libvirt + QEMU) system-test backend by extracting the primary GuestOS image once into a shared cache and booting each VM from a thin qcow2 overlay backed by that shared raw base image.
Changes:
- Add a shared base-image cache (
image_cache/<key>.img) with per-keyflockto serialize extraction across threads/processes. - Switch primary VM disks to qcow2 overlays (
primary.qcow2) created viaqemu-img create -f qcow2 -F raw -b <base> ..., preserving reboot semantics by reusing the existing overlay. - Extend the guestos libvirt XML template to support per-disk driver formats via
DiskEntry.driver_type(qcow2 for primary, raw for config/extra disks).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| rs/tests/driver/templates/guestos_vm_template.xml | Make libvirt disk <driver type=...> configurable per disk to support qcow2 overlays for the primary disk. |
| rs/tests/driver/src/driver/local_backend.rs | Add shared base-image caching and create per-VM qcow2 overlays backed by the cached raw base; plumb per-disk driver type into the template. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- extract_image: make the `.tar.zst` scratch dir unique per destination (`.extract-<pid>-<dst>`) so concurrent extractions of different base images into the shared `image_cache` directory cannot collide. - start_vm: capture `qemu-img create`'s exit status and stderr in the error so overlay-creation failures are debuggable in CI.
Benchmark:
|
| metric | this PR | master 06‑27 | master 06‑26 | master 06‑25 | vs master‑avg |
|---|---|---|---|---|---|
| Bazel step wall‑clock | 204 min | 219 | 215 | 216 | −13 min (−5.8%) |
| Σ of 166 per‑test runtimes | 30,420s | 32,888 | 32,127 | 33,393 | −2,382s (−7.3%) |
The PR is below all three master runs on both metrics, so this isn't just a less‑contended runner.
Per‑test
145 / 166 tests faster, 21 slower, 0 equal; median per‑test delta −12.8s.
Biggest wins (heavy multi‑node tests, where the most redundant image decompressions are avoided):
upgrade_downgrade_nns_subnet_test_head_nns_local 658 vs 767 -109s
upgrade_downgrade_app_subnet_test_head_nns_local 816 vs 897 -81s
message_routing:rejoin_test_local 186 vs 252 -66s
nns_cycles_minting_multi_app_subnets_test 116 vs 176 -60s
ckbtc:ckdoge_minter_basics_test 232 vs 290 -58s
networking:canister_http_fault_tolerance_test 129 vs 187 -58s
Regressions are essentially noise
19 of the 21 "slower" tests are within ±~18s (CI jitter). The two large‑looking ones aren't reliable:
rejoin_test_large_state_local(+98s) is the noisiest test in the suite — it varies by 266s across the 3 master runs alone, so the delta is within its own spread.system_api_security_test_local(+126s, normally ~78s) is a single suspicious data point, most likely a one‑off contended run; worth a re‑check.
Caveat: one PR run vs a 3‑run master average on a contended 64‑CPU runner; per‑test deltas smaller than a test's own run‑to‑run range aren't individually meaningful. The consistency (PR below all master runs on both metrics, 87% of tests faster, median −12.8s) makes the ~6–7% overall speedup credible.
Full per-test table (166 tests, sorted by delta; seconds)
target PR master delta
//rs/tests/consensus/upgrade:upgrade_downgrade_nns_subnet_test_head_nns_local 658.0 766.9 -108.9
//rs/tests/consensus/upgrade:upgrade_downgrade_app_subnet_test_head_nns_local 816.3 897.4 -81.1
//rs/tests/message_routing:rejoin_test_local 186.0 252.3 -66.3
//rs/tests/nns:nns_cycles_minting_multi_app_subnets_test_local 116.2 176.2 -60.0
//rs/tests/ckbtc:ckdoge_minter_basics_test_local 232.1 290.1 -58.0
//rs/tests/networking:canister_http_fault_tolerance_test_local 128.6 186.6 -58.0
//rs/tests/consensus/upgrade:upgrade_downgrade_old_nns_subnet_test_head_nns_local 715.8 771.9 -56.1
//rs/tests/consensus/upgrade:upgrade_downgrade_old_nns_subnet_test_local 732.9 781.7 -48.8
//rs/tests/ckbtc:ckbtc_minter_checker_head_nns_local 279.4 322.0 -42.6
//rs/tests/ckbtc:ckdoge_minter_basics_test_head_nns_local 242.1 283.4 -41.3
//rs/tests/cross_chain:doge_adapter_basics_test_local 148.9 189.4 -40.5
//rs/tests/ckbtc:ckbtc_minter_checker_local 271.0 310.5 -39.5
//rs/tests/query_stats:query_stats_https_outcalls_local 163.2 201.3 -38.1
//rs/tests/consensus/orchestrator:node_assign_test_head_nns_local 218.6 256.6 -38.0
//rs/tests/query_stats:query_stats_below_threshold_local 155.5 193.5 -38.0
//rs/tests/consensus/orchestrator:ssh_access_to_nodes_test_local 507.7 543.5 -35.8
//rs/tests/networking:canister_http_fault_tolerance_test_head_nns_local 135.0 170.8 -35.8
//rs/tests/consensus/upgrade:upgrade_downgrade_old_unassigned_nodes_test_local 135.1 170.6 -35.5
//rs/tests/nns:nns_dapp_test_head_nns_local 106.6 141.7 -35.1
//rs/tests/nns:nns_dapp_test_local 105.7 140.8 -35.1
//rs/tests/ckbtc:ckbtc_minter_update_balance_head_nns_local 277.4 311.2 -33.8
//rs/tests/nns:nns_token_balance_test_local 87.0 120.3 -33.3
//rs/tests/nns:rent_subnet_test_local 197.8 230.9 -33.1
//rs/tests/ckbtc:ckbtc_minter_retrieve_btc_local 170.9 203.2 -32.3
//rs/tests/ckbtc:ckbtc_minter_deposit_and_withdrawal_head_nns_local 202.2 234.2 -32.0
//rs/tests/ckbtc:ckbtc_minter_deposit_and_withdrawal_local 187.0 219.0 -32.0
//rs/tests/consensus/upgrade:upgrade_downgrade_nns_subnet_test_local 706.9 737.1 -30.2
//rs/tests/consensus:node_graceful_leaving_test_head_nns_local 161.6 191.6 -30.0
//rs/tests/query_stats:query_stats_above_threshold_local 151.7 181.5 -29.8
//rs/tests/consensus:subnet_splitting_test_local 203.9 233.2 -29.3
//rs/tests/boundary_nodes:salt_sharing_canister_test_local 95.1 123.6 -28.5
//rs/tests/consensus/orchestrator:node_reassignment_test_local 308.0 335.0 -27.0
//rs/tests/consensus/upgrade:upgrade_with_alternative_urls_local 124.1 151.0 -26.9
//rs/tests/consensus/orchestrator:unstuck_subnet_test_head_nns_local 208.2 234.9 -26.7
//rs/tests/message_routing:global_reboot_test_head_nns_local 157.3 183.9 -26.6
//rs/tests/message_routing:global_reboot_test_local 160.7 187.1 -26.4
//rs/tests/networking:http_endpoints_public_spec_test_local 78.8 104.0 -25.2
//rs/tests/message_routing:rejoin_test_head_nns_local 208.1 232.7 -24.6
//rs/tests/consensus/upgrade:upgrade_downgrade_unassigned_nodes_test_local 140.9 165.0 -24.1
//rs/tests/crypto:ingress_verification_test_local 172.3 195.0 -22.7
//rs/tests/ckbtc:ckbtc_minter_basics_test_local 166.1 188.5 -22.4
//rs/tests/consensus/tecdsa:tecdsa_remove_nodes_test_local 213.3 235.3 -22.0
//rs/tests/consensus/upgrade:upgrade_downgrade_unassigned_nodes_test_head_nns_local 143.4 164.8 -21.4
//rs/tests/execution:compute_allocation_test_local 45.6 67.0 -21.4
//rs/tests/financial_integrations/rosetta:rosetta_neuron_follow_test_head_nns_local 141.4 162.7 -21.3
//rs/tests/consensus:adding_nodes_to_subnet_test_local 341.5 362.5 -21.0
//rs/tests/ckbtc:ckbtc_minter_batching_local 202.4 223.3 -20.9
//rs/tests/consensus/tecdsa:tecdsa_two_signing_subnets_test_head_nns_local 163.7 184.6 -20.9
//rs/tests/cross_chain:btc_adapter_basics_test_local 114.2 135.0 -20.8
//rs/tests/ckbtc:ckbtc_minter_batching_head_nns_local 207.3 227.9 -20.6
//rs/tests/consensus/tecdsa:tecdsa_remove_nodes_test_head_nns_local 211.3 231.9 -20.6
//rs/tests/ckbtc:ckbtc_minter_update_balance_local 281.4 301.3 -19.9
//rs/tests/execution:inter_canister_queries_test_local 51.2 70.9 -19.7
//rs/tests/networking:canister_http_test_local 74.9 93.5 -18.6
//rs/tests/financial_integrations/rosetta:rosetta_test_head_nns_local 107.8 126.3 -18.5
//rs/tests/consensus/tecdsa:tecdsa_complaint_test_local 117.0 135.4 -18.4
//rs/tests/consensus:catch_up_loop_prevention_test_local 256.7 275.1 -18.4
//rs/tests/message_routing/xnet:xnet_compatibility_local 517.1 535.2 -18.1
//rs/tests/networking:canister_http_test_head_nns_local 76.4 94.3 -17.9
//rs/tests/consensus/orchestrator:ssh_access_to_nodes_test_head_nns_local 496.6 514.2 -17.6
//rs/tests/financial_integrations/rosetta:rosetta_neuron_follow_test_local 139.9 157.5 -17.6
//rs/tests/consensus:adding_nodes_to_subnet_test_head_nns_local 337.6 355.1 -17.5
//rs/tests/execution:cycles_restrictions_test_local 66.4 83.8 -17.4
//rs/tests/consensus:replica_determinism_test_local 66.2 83.3 -17.1
//rs/tests/cross_chain:ic_xc_cketh_test_head_nns_local 189.3 206.4 -17.1
//rs/tests/consensus:request_auth_malicious_replica_test_local 106.3 122.9 -16.6
//rs/tests/idx:basic_health_test_local 65.1 81.7 -16.6
//rs/tests/financial_integrations/rosetta:rosetta_test_local 107.2 123.6 -16.4
//rs/tests/boundary_nodes:api_bn_decentralization_test_head_nns_local 126.5 142.8 -16.3
//rs/tests/consensus/vetkd:vetkd_key_life_cycle_test_local 172.3 188.5 -16.2
//rs/tests/ckbtc:ckbtc_minter_basics_test_head_nns_local 172.3 188.3 -16.0
//rs/tests/networking:canister_http_non_replicated_test_head_nns_local 85.4 100.3 -14.9
//rs/tests/sdk:dfx_smoke_test_local 109.0 123.8 -14.8
//rs/tests/execution:canister_migration_test_head_nns_local 499.7 514.4 -14.7
//rs/tests/execution:resource_limits_test_local 38.9 53.6 -14.7
//rs/tests/sdk:dfx_check_old_wallet_version_test_local 38.8 53.2 -14.4
//rs/tests/consensus/orchestrator:rotate_ecdsa_idkg_key_test_local 385.4 399.7 -14.3
//rs/tests/networking:canister_http_flexible_test_local 80.1 94.3 -14.2
//rs/tests/boundary_nodes:api_bn_integration_test_head_nns_local 65.9 79.6 -13.7
//rs/tests/boundary_nodes:api_bn_integration_test_local 64.4 77.8 -13.4
//rs/tests/consensus:safety_test_local 60.7 73.9 -13.2
//rs/tests/execution:system_subnets_test_local 67.0 80.2 -13.2
//rs/tests/consensus/tecdsa:tecdsa_two_signing_subnets_test_local 165.9 178.9 -13.0
//rs/tests/networking:canister_http_time_out_test_head_nns_local 79.6 92.1 -12.5
//rs/tests/consensus:guestos_recovery_engine_smoke_test_local 72.0 84.3 -12.3
//rs/tests/networking:canister_http_non_replicated_test_local 81.9 94.1 -12.2
//rs/tests/consensus/subnet_recovery:sr_nns_failover_nodes_test_head_nns_local 234.7 246.5 -11.8
//rs/tests/consensus/subnet_recovery:sr_nns_failover_nodes_test_local 229.1 240.9 -11.8
//rs/tests/consensus/tecdsa:tschnorr_message_sizes_test_local 218.8 230.6 -11.8
//rs/tests/consensus/tecdsa:tschnorr_message_sizes_test_head_nns_local 215.5 227.2 -11.7
//rs/tests/networking:canister_http_time_out_test_local 85.3 97.0 -11.7
//rs/tests/consensus/upgrade:upgrade_with_alternative_urls_head_nns_local 143.5 155.1 -11.6
//rs/tests/consensus:max_xnet_payload_size_test_local 48.6 59.9 -11.3
//rs/tests/financial_integrations:transaction_ledger_correctness_test_head_nns_local 122.0 133.2 -11.2
//rs/tests/execution:max_number_of_canisters_test_local 39.5 50.4 -10.9
//rs/tests/node:guestos_no_failed_systemd_units_local 36.1 46.9 -10.8
//rs/tests/cross_chain:ic_xc_cketh_test_local 192.7 202.4 -9.7
//rs/tests/financial_integrations:icrc1_agent_test_local 55.8 65.4 -9.6
//rs/tests/consensus/orchestrator:node_assign_test_local 229.2 238.3 -9.1
//rs/tests/networking/firewall:firewall_priority_test_local 173.1 182.0 -8.9
//rs/tests/consensus/orchestrator:rotate_ecdsa_idkg_key_test_head_nns_local 386.3 395.1 -8.8
//rs/tests/consensus/upgrade:upgrade_downgrade_app_subnet_test_local 870.5 879.3 -8.8
//rs/tests/execution:general_execution_test_local 79.2 87.6 -8.4
//rs/tests/cross_chain:btc_get_balance_test_local 67.7 76.0 -8.3
//rs/tests/consensus:cup_explorer_test_head_nns_local 142.5 150.5 -8.0
//rs/tests/idx:ii_delegation_test_local 49.1 57.1 -8.0
//rs/tests/consensus:node_graceful_leaving_test_local 169.7 177.5 -7.8
//rs/tests/networking/firewall:firewall_max_connections_test_local 96.5 104.3 -7.8
//rs/tests/consensus/tecdsa:pre_signature_stash_management_test_local 169.6 177.3 -7.7
//rs/tests/cross_chain:ic_xc_ledger_suite_orchestrator_test_local 107.2 114.8 -7.6
//rs/tests/consensus/upgrade:upgrade_downgrade_old_unassigned_nodes_test_head_nns_local 158.1 165.5 -7.4
//rs/tests/message_routing/xnet:xnet_malicious_slices_local 84.6 92.0 -7.4
//rs/tests/node:kill_start_long_test_local 247.4 254.8 -7.4
//rs/tests/nns:node_swap_test_local 54.7 62.0 -7.3
//rs/tests/consensus:liveness_with_equivocation_test_local 51.6 58.8 -7.2
//rs/tests/consensus:subnet_splitting_test_head_nns_local 218.5 225.7 -7.2
//rs/tests/networking:canister_http_flexible_test_head_nns_local 86.6 93.7 -7.1
//rs/tests/ckbtc:ckbtc_minter_retrieve_btc_head_nns_local 189.9 196.8 -6.9
//rs/tests/nns:bitcoin_set_config_proposal_test_local 102.9 109.8 -6.9
//rs/tests/cross_chain:btc_get_balance_test_head_nns_local 62.1 68.6 -6.5
//rs/tests/crypto:ic_crypto_csp_start_order_test_local 35.5 41.7 -6.2
//rs/tests/networking:read_state_test_local 87.7 93.6 -5.9
//rs/tests/message_routing:memory_safety_test_local 50.2 55.9 -5.7
//rs/tests/networking:http_endpoints_public_spec_test_head_nns_local 91.5 97.1 -5.6
//rs/tests/node:kill_start_short_test_local 78.1 83.7 -5.6
//rs/tests/consensus/upgrade:upgrade_downgrade_old_app_subnet_test_local 849.9 855.4 -5.5
//rs/tests/networking:read_state_test_head_nns_local 92.6 98.1 -5.5
//rs/tests/nns:certified_registry_test_local 52.2 57.7 -5.5
//rs/tests/crypto:ic_crypto_csp_socket_test_local 34.6 40.0 -5.4
//rs/tests/networking:canister_http_correctness_test_head_nns_local 108.9 113.9 -5.0
//rs/tests/consensus/orchestrator:node_reassignment_test_head_nns_local 328.8 333.2 -4.4
//rs/tests/boundary_nodes:api_bn_decentralization_test_local 131.0 135.2 -4.2
//rs/tests/nns:node_removal_from_registry_test_local 76.4 80.4 -4.0
//rs/tests/crypto:ic_crypto_fstrim_tool_test_local 39.3 42.4 -3.1
//rs/tests/boundary_nodes:rate_limit_canister_test_local 155.2 157.9 -2.7
//rs/tests/consensus:cup_explorer_test_local 143.9 146.6 -2.7
//rs/tests/consensus/tecdsa:tecdsa_complaint_test_head_nns_local 124.2 126.7 -2.5
//rs/tests/boundary_nodes:rate_limit_canister_test_head_nns_local 157.1 159.4 -2.3
//rs/tests/consensus:max_ingress_payload_size_test_local 81.1 83.3 -2.2
//rs/tests/crypto:ic_crypto_csp_metrics_test_local 36.3 38.3 -2.0
//rs/tests/consensus:dual_workload_test_local 68.5 70.4 -1.9
//rs/tests/networking/firewall:firewall_priority_test_head_nns_local 180.0 181.9 -1.9
//rs/tests/crypto:request_signature_test_local 41.8 43.0 -1.2
//rs/tests/nns:nns_cycles_minting_test_local 145.0 146.1 -1.1
//rs/tests/consensus:catch_up_possible_test_local 261.7 262.5 -0.8
//rs/tests/boundary_nodes:salt_sharing_canister_test_head_nns_local 141.0 124.7 +16.3
//rs/tests/consensus/orchestrator:unstuck_subnet_test_local 235.0 234.2 +0.8
//rs/tests/consensus/tecdsa:pre_signature_stash_management_test_head_nns_local 189.0 186.9 +2.1
//rs/tests/consensus/tecdsa:tecdsa_add_nodes_test_head_nns_local 233.1 218.0 +15.1
//rs/tests/consensus/tecdsa:tecdsa_add_nodes_test_local 240.1 229.6 +10.5
//rs/tests/consensus/upgrade:upgrade_downgrade_old_app_subnet_test_head_nns_local 887.1 858.5 +28.6
//rs/tests/consensus/vetkd:vetkd_key_life_cycle_test_head_nns_local 191.6 183.3 +8.3
//rs/tests/consensus:max_ingress_payload_size_test_head_nns_local 82.2 80.9 +1.3
//rs/tests/cross_chain:ic_xc_ledger_suite_orchestrator_test_head_nns_local 124.9 120.1 +4.8
//rs/tests/crypto:canister_sig_verification_cache_test_local 100.7 88.8 +11.9
//rs/tests/crypto:ic_crypto_csp_umask_test_local 41.0 40.4 +0.6
//rs/tests/crypto:rpc_csp_vault_reconnection_test_local 39.8 38.6 +1.2
//rs/tests/execution:cycles_cost_schedule_test_local 96.1 92.6 +3.5
//rs/tests/execution:system_api_security_test_local 204.4 78.4 +126.0
//rs/tests/financial_integrations:transaction_ledger_correctness_test_local 138.3 132.2 +6.1
//rs/tests/message_routing:rejoin_test_large_state_local 512.9 414.6 +98.3
//rs/tests/networking/firewall:firewall_max_connections_test_head_nns_local 98.9 98.5 +0.4
//rs/tests/networking:canister_http_correctness_test_local 128.8 122.4 +6.4
//rs/tests/node:root_tests_local 115.2 103.4 +11.8
//rs/tests/query_stats:query_stats_basic_local 184.2 183.5 +0.7
//rs/tests/sdk:dfx_smoke_test_head_nns_local 138.1 119.9 +18.2
| .append(true) | ||
| .open(&lock_path) | ||
| .with_context(|| format!("opening image cache lock {}", lock_path.display()))?; | ||
| nix::fcntl::flock(lock.as_raw_fd(), nix::fcntl::FlockArg::LockExclusive) |
There was a problem hiding this comment.
This is released implicitly when the function returns and the lock is dropped, closing the file descriptor and releasing the lock, correct?
Also are there any forks potentially happening on other threads while the FD is open (which would inherit it)?
| // Grow the overlay's virtual size to `min_gib` only when it exceeds | ||
| // the base image's size (the base is raw, so its byte length is its | ||
| // virtual size). Mirrors the old `truncate --size=>{min_gib}G` | ||
| // grow-only semantics; a smaller request leaves the base size. |
There was a problem hiding this comment.
let's make sure the comments don't reference old code
| // Extract the shared pristine image once into the content-addressed | ||
| // base cache, then give this VM a thin copy-on-write qcow2 overlay |
There was a problem hiding this comment.
Just out of curiosity, which part is RO RAW and which part is mutable? I'm not familiar with the GuestOS' file structure
| ) | ||
| })?; | ||
| } | ||
| Ok(base) |
There was a problem hiding this comment.
It's very important that this image never gets relocated, correct? Is the path absolute btw?
| // directory — e.g. distinct base images under `image_cache`, which are | ||
| // guarded by distinct per-key locks — cannot collide on the scratch dir. | ||
| let tmp = parent.join(format!( | ||
| ".extract-{}-{}", |
There was a problem hiding this comment.
nit: instead of using the image name and for extra robustness, could also push the PID idea further and append the thread id (since I'm guessing it's the thread concurrency that might cause issues here)
What
Alternative to #10590. Same goal — stop re-decompressing the shared primary GuestOS image once per node in the local (libvirt + QEMU) system-test backend — but using qcow2 backing-file overlays instead of
cp --reflink=auto.#10590(cp --reflink) only avoids a full copy on CoW filesystems (btrfs/XFS). On ext4 (this dev container, and CI) it falls back to a full copy and is a slight regression. qcow2 overlays are near-instant and filesystem-independent.We should really use the same trick on Farm as well!
How
ensure_base_image(src)extracts the shared image once intoimage_cache/<key>.imgunder a per-keyflock(base kept read-only0o444).start_vmgives each VM a thin overlay:qemu-img create -f qcow2 -F raw -b <base> primary.qcow2.min_boot_image_size_gibgrows the overlay's virtual size only when it exceeds the raw base size.DiskEntry.driver_type(qcow2for the primary,rawfor config disks) wired throughguestos_vm_template.xml. The overlay header records the backing chain, so no<backingStore>element is needed.attach_disk_images) are unchanged (they are per-VM unique).Benchmark (8-node testnet, GuestOS image 556M zst → 1.6G sparse, 32 cores)
Full benchmarks below.
Disk-prep wall time:
cp --reflink)/dev/vdd)/dev/vdb)qcow2 is faster than master on both filesystems (modestly on fast local disk, ~3–7× on slow CEPH), whereas the
cp --reflinkapproach regressed on ext4.It also uses far less scratch space (one 1.6G base + thin overlays vs 8×1.6G full copies). Example, currently the
//rs/tests/idx:basic_health_test_localuses the following space:With qcow2 this drops to: