Skip to content

perf(system-tests): share local-backend base image via qcow2 overlays#10591

Open
basvandijk wants to merge 4 commits into
masterfrom
bas/local-backend-qcow2-overlays
Open

perf(system-tests): share local-backend base image via qcow2 overlays#10591
basvandijk wants to merge 4 commits into
masterfrom
bas/local-backend-qcow2-overlays

Conversation

@basvandijk

@basvandijk basvandijk commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

What

Alternative to #10590. Same goal — stop re-decompressing the shared primary GuestOS image once per node in the local (libvirt + QEMU) system-test backend — but using qcow2 backing-file overlays instead of cp --reflink=auto.

#10590 (cp --reflink) only avoids a full copy on CoW filesystems (btrfs/XFS). On ext4 (this dev container, and CI) it falls back to a full copy and is a slight regression. qcow2 overlays are near-instant and filesystem-independent.

We should really use the same trick on Farm as well!

How

  • ensure_base_image(src) extracts the shared image once into image_cache/<key>.img under a per-key flock (base kept read-only 0o444).
  • start_vm gives each VM a thin overlay: qemu-img create -f qcow2 -F raw -b <base> primary.qcow2. min_boot_image_size_gib grows the overlay's virtual size only when it exceeds the raw base size.
  • Per-disk format: a new DiskEntry.driver_type (qcow2 for the primary, raw for config disks) wired through guestos_vm_template.xml. The overlay header records the backing chain, so no <backingStore> element is needed.
  • Per-node config disks (attach_disk_images) are unchanged (they are per-VM unique).

Benchmark (8-node testnet, GuestOS image 556M zst → 1.6G sparse, 32 cores)

Full benchmarks below.

Disk-prep wall time:

filesystem master (8× decompress) #10590 (cp --reflink) this PR (qcow2)
local disk (ext4, /dev/vdd) 11.5s 13–17.6s 9.0s
CEPH-backed (ext4, /dev/vdb) ~64s 116–149s 9–19s

qcow2 is faster than master on both filesystems (modestly on fast local disk, ~3–7× on slow CEPH), whereas the cp --reflink approach regressed on ext4.

It also uses far less scratch space (one 1.6G base + thin overlays vs 8×1.6G full copies). Example, currently the //rs/tests/idx:basic_health_test_local uses the following space:

du -chs  ~/.cache/test_tmpdir_current/
27G	/home/ubuntu/.cache/test_tmpdir_current/
27G	total

With qcow2 this drops to:

du -chs  ~/.cache/test_tmpdir_qcow2/
16G	/home/ubuntu/.cache/test_tmpdir_qcow2/
16G	total

The local (libvirt + QEMU) system-test backend decompressed the same
primary GuestOS image once per node when starting a testnet's VMs.

Extract each `src` once into a shared, content-addressed base
(image_cache/<key>.img) under a per-key flock, then give every VM a thin
copy-on-write qcow2 overlay backed by it (`qemu-img create -f qcow2 -F raw
-b <base>`). Overlay creation is near-instant and filesystem-independent
(unlike `cp --reflink`, which only avoids a full copy on CoW filesystems),
so this is a strict speedup. Per-node config disks stay raw.

Disk-prep for an 8-node testnet (GuestOS image, 556M zst -> 1.6G sparse),
wall time:

  filesystem    master (8x decompress)   qcow2 overlays
  local disk    11.5s                    9.0s
  CEPH-backed   ~64s                     9-19s

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves performance of the local (libvirt + QEMU) system-test backend by extracting the primary GuestOS image once into a shared cache and booting each VM from a thin qcow2 overlay backed by that shared raw base image.

Changes:

  • Add a shared base-image cache (image_cache/<key>.img) with per-key flock to serialize extraction across threads/processes.
  • Switch primary VM disks to qcow2 overlays (primary.qcow2) created via qemu-img create -f qcow2 -F raw -b <base> ..., preserving reboot semantics by reusing the existing overlay.
  • Extend the guestos libvirt XML template to support per-disk driver formats via DiskEntry.driver_type (qcow2 for primary, raw for config/extra disks).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
rs/tests/driver/templates/guestos_vm_template.xml Make libvirt disk <driver type=...> configurable per disk to support qcow2 overlays for the primary disk.
rs/tests/driver/src/driver/local_backend.rs Add shared base-image caching and create per-VM qcow2 overlays backed by the cached raw base; plumb per-disk driver type into the template.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rs/tests/driver/src/driver/local_backend.rs
Comment thread rs/tests/driver/src/driver/local_backend.rs Outdated
- extract_image: make the `.tar.zst` scratch dir unique per destination
  (`.extract-<pid>-<dst>`) so concurrent extractions of different base
  images into the shared `image_cache` directory cannot collide.
- start_vm: capture `qemu-img create`'s exit status and stderr in the
  error so overlay-creation failures are debuggable in CI.
Comment thread .github/workflows/local-system-tests.yml Outdated
@basvandijk

Copy link
Copy Markdown
Collaborator Author

Benchmark: local-system-tests runtimes — this PR vs master

Compared this PR's run (28293490349) against the 3 most recent master runs (2026‑06‑25 / 26 / 27) to average out CI noise. All runs ran the same 166 tests, all passing.

Aggregate — faster than every master run on both metrics

metric this PR master 06‑27 master 06‑26 master 06‑25 vs master‑avg
Bazel step wall‑clock 204 min 219 215 216 −13 min (−5.8%)
Σ of 166 per‑test runtimes 30,420s 32,888 32,127 33,393 −2,382s (−7.3%)

The PR is below all three master runs on both metrics, so this isn't just a less‑contended runner.

Per‑test

145 / 166 tests faster, 21 slower, 0 equal; median per‑test delta −12.8s.

Biggest wins (heavy multi‑node tests, where the most redundant image decompressions are avoided):

upgrade_downgrade_nns_subnet_test_head_nns_local   658 vs 767   -109s
upgrade_downgrade_app_subnet_test_head_nns_local   816 vs 897    -81s
message_routing:rejoin_test_local                  186 vs 252    -66s
nns_cycles_minting_multi_app_subnets_test          116 vs 176    -60s
ckbtc:ckdoge_minter_basics_test                    232 vs 290    -58s
networking:canister_http_fault_tolerance_test      129 vs 187    -58s

Regressions are essentially noise

19 of the 21 "slower" tests are within ±~18s (CI jitter). The two large‑looking ones aren't reliable:

  • rejoin_test_large_state_local (+98s) is the noisiest test in the suite — it varies by 266s across the 3 master runs alone, so the delta is within its own spread.
  • system_api_security_test_local (+126s, normally ~78s) is a single suspicious data point, most likely a one‑off contended run; worth a re‑check.

Caveat: one PR run vs a 3‑run master average on a contended 64‑CPU runner; per‑test deltas smaller than a test's own run‑to‑run range aren't individually meaningful. The consistency (PR below all master runs on both metrics, 87% of tests faster, median −12.8s) makes the ~6–7% overall speedup credible.

Full per-test table (166 tests, sorted by delta; seconds)
target                                                                                 PR   master    delta
//rs/tests/consensus/upgrade:upgrade_downgrade_nns_subnet_test_head_nns_local       658.0    766.9   -108.9
//rs/tests/consensus/upgrade:upgrade_downgrade_app_subnet_test_head_nns_local       816.3    897.4    -81.1
//rs/tests/message_routing:rejoin_test_local                                        186.0    252.3    -66.3
//rs/tests/nns:nns_cycles_minting_multi_app_subnets_test_local                      116.2    176.2    -60.0
//rs/tests/ckbtc:ckdoge_minter_basics_test_local                                    232.1    290.1    -58.0
//rs/tests/networking:canister_http_fault_tolerance_test_local                      128.6    186.6    -58.0
//rs/tests/consensus/upgrade:upgrade_downgrade_old_nns_subnet_test_head_nns_local    715.8    771.9    -56.1
//rs/tests/consensus/upgrade:upgrade_downgrade_old_nns_subnet_test_local            732.9    781.7    -48.8
//rs/tests/ckbtc:ckbtc_minter_checker_head_nns_local                                279.4    322.0    -42.6
//rs/tests/ckbtc:ckdoge_minter_basics_test_head_nns_local                           242.1    283.4    -41.3
//rs/tests/cross_chain:doge_adapter_basics_test_local                               148.9    189.4    -40.5
//rs/tests/ckbtc:ckbtc_minter_checker_local                                         271.0    310.5    -39.5
//rs/tests/query_stats:query_stats_https_outcalls_local                             163.2    201.3    -38.1
//rs/tests/consensus/orchestrator:node_assign_test_head_nns_local                   218.6    256.6    -38.0
//rs/tests/query_stats:query_stats_below_threshold_local                            155.5    193.5    -38.0
//rs/tests/consensus/orchestrator:ssh_access_to_nodes_test_local                    507.7    543.5    -35.8
//rs/tests/networking:canister_http_fault_tolerance_test_head_nns_local             135.0    170.8    -35.8
//rs/tests/consensus/upgrade:upgrade_downgrade_old_unassigned_nodes_test_local      135.1    170.6    -35.5
//rs/tests/nns:nns_dapp_test_head_nns_local                                         106.6    141.7    -35.1
//rs/tests/nns:nns_dapp_test_local                                                  105.7    140.8    -35.1
//rs/tests/ckbtc:ckbtc_minter_update_balance_head_nns_local                         277.4    311.2    -33.8
//rs/tests/nns:nns_token_balance_test_local                                          87.0    120.3    -33.3
//rs/tests/nns:rent_subnet_test_local                                               197.8    230.9    -33.1
//rs/tests/ckbtc:ckbtc_minter_retrieve_btc_local                                    170.9    203.2    -32.3
//rs/tests/ckbtc:ckbtc_minter_deposit_and_withdrawal_head_nns_local                 202.2    234.2    -32.0
//rs/tests/ckbtc:ckbtc_minter_deposit_and_withdrawal_local                          187.0    219.0    -32.0
//rs/tests/consensus/upgrade:upgrade_downgrade_nns_subnet_test_local                706.9    737.1    -30.2
//rs/tests/consensus:node_graceful_leaving_test_head_nns_local                      161.6    191.6    -30.0
//rs/tests/query_stats:query_stats_above_threshold_local                            151.7    181.5    -29.8
//rs/tests/consensus:subnet_splitting_test_local                                    203.9    233.2    -29.3
//rs/tests/boundary_nodes:salt_sharing_canister_test_local                           95.1    123.6    -28.5
//rs/tests/consensus/orchestrator:node_reassignment_test_local                      308.0    335.0    -27.0
//rs/tests/consensus/upgrade:upgrade_with_alternative_urls_local                    124.1    151.0    -26.9
//rs/tests/consensus/orchestrator:unstuck_subnet_test_head_nns_local                208.2    234.9    -26.7
//rs/tests/message_routing:global_reboot_test_head_nns_local                        157.3    183.9    -26.6
//rs/tests/message_routing:global_reboot_test_local                                 160.7    187.1    -26.4
//rs/tests/networking:http_endpoints_public_spec_test_local                          78.8    104.0    -25.2
//rs/tests/message_routing:rejoin_test_head_nns_local                               208.1    232.7    -24.6
//rs/tests/consensus/upgrade:upgrade_downgrade_unassigned_nodes_test_local          140.9    165.0    -24.1
//rs/tests/crypto:ingress_verification_test_local                                   172.3    195.0    -22.7
//rs/tests/ckbtc:ckbtc_minter_basics_test_local                                     166.1    188.5    -22.4
//rs/tests/consensus/tecdsa:tecdsa_remove_nodes_test_local                          213.3    235.3    -22.0
//rs/tests/consensus/upgrade:upgrade_downgrade_unassigned_nodes_test_head_nns_local    143.4    164.8    -21.4
//rs/tests/execution:compute_allocation_test_local                                   45.6     67.0    -21.4
//rs/tests/financial_integrations/rosetta:rosetta_neuron_follow_test_head_nns_local    141.4    162.7    -21.3
//rs/tests/consensus:adding_nodes_to_subnet_test_local                              341.5    362.5    -21.0
//rs/tests/ckbtc:ckbtc_minter_batching_local                                        202.4    223.3    -20.9
//rs/tests/consensus/tecdsa:tecdsa_two_signing_subnets_test_head_nns_local          163.7    184.6    -20.9
//rs/tests/cross_chain:btc_adapter_basics_test_local                                114.2    135.0    -20.8
//rs/tests/ckbtc:ckbtc_minter_batching_head_nns_local                               207.3    227.9    -20.6
//rs/tests/consensus/tecdsa:tecdsa_remove_nodes_test_head_nns_local                 211.3    231.9    -20.6
//rs/tests/ckbtc:ckbtc_minter_update_balance_local                                  281.4    301.3    -19.9
//rs/tests/execution:inter_canister_queries_test_local                               51.2     70.9    -19.7
//rs/tests/networking:canister_http_test_local                                       74.9     93.5    -18.6
//rs/tests/financial_integrations/rosetta:rosetta_test_head_nns_local               107.8    126.3    -18.5
//rs/tests/consensus/tecdsa:tecdsa_complaint_test_local                             117.0    135.4    -18.4
//rs/tests/consensus:catch_up_loop_prevention_test_local                            256.7    275.1    -18.4
//rs/tests/message_routing/xnet:xnet_compatibility_local                            517.1    535.2    -18.1
//rs/tests/networking:canister_http_test_head_nns_local                              76.4     94.3    -17.9
//rs/tests/consensus/orchestrator:ssh_access_to_nodes_test_head_nns_local           496.6    514.2    -17.6
//rs/tests/financial_integrations/rosetta:rosetta_neuron_follow_test_local          139.9    157.5    -17.6
//rs/tests/consensus:adding_nodes_to_subnet_test_head_nns_local                     337.6    355.1    -17.5
//rs/tests/execution:cycles_restrictions_test_local                                  66.4     83.8    -17.4
//rs/tests/consensus:replica_determinism_test_local                                  66.2     83.3    -17.1
//rs/tests/cross_chain:ic_xc_cketh_test_head_nns_local                              189.3    206.4    -17.1
//rs/tests/consensus:request_auth_malicious_replica_test_local                      106.3    122.9    -16.6
//rs/tests/idx:basic_health_test_local                                               65.1     81.7    -16.6
//rs/tests/financial_integrations/rosetta:rosetta_test_local                        107.2    123.6    -16.4
//rs/tests/boundary_nodes:api_bn_decentralization_test_head_nns_local               126.5    142.8    -16.3
//rs/tests/consensus/vetkd:vetkd_key_life_cycle_test_local                          172.3    188.5    -16.2
//rs/tests/ckbtc:ckbtc_minter_basics_test_head_nns_local                            172.3    188.3    -16.0
//rs/tests/networking:canister_http_non_replicated_test_head_nns_local               85.4    100.3    -14.9
//rs/tests/sdk:dfx_smoke_test_local                                                 109.0    123.8    -14.8
//rs/tests/execution:canister_migration_test_head_nns_local                         499.7    514.4    -14.7
//rs/tests/execution:resource_limits_test_local                                      38.9     53.6    -14.7
//rs/tests/sdk:dfx_check_old_wallet_version_test_local                               38.8     53.2    -14.4
//rs/tests/consensus/orchestrator:rotate_ecdsa_idkg_key_test_local                  385.4    399.7    -14.3
//rs/tests/networking:canister_http_flexible_test_local                              80.1     94.3    -14.2
//rs/tests/boundary_nodes:api_bn_integration_test_head_nns_local                     65.9     79.6    -13.7
//rs/tests/boundary_nodes:api_bn_integration_test_local                              64.4     77.8    -13.4
//rs/tests/consensus:safety_test_local                                               60.7     73.9    -13.2
//rs/tests/execution:system_subnets_test_local                                       67.0     80.2    -13.2
//rs/tests/consensus/tecdsa:tecdsa_two_signing_subnets_test_local                   165.9    178.9    -13.0
//rs/tests/networking:canister_http_time_out_test_head_nns_local                     79.6     92.1    -12.5
//rs/tests/consensus:guestos_recovery_engine_smoke_test_local                        72.0     84.3    -12.3
//rs/tests/networking:canister_http_non_replicated_test_local                        81.9     94.1    -12.2
//rs/tests/consensus/subnet_recovery:sr_nns_failover_nodes_test_head_nns_local      234.7    246.5    -11.8
//rs/tests/consensus/subnet_recovery:sr_nns_failover_nodes_test_local               229.1    240.9    -11.8
//rs/tests/consensus/tecdsa:tschnorr_message_sizes_test_local                       218.8    230.6    -11.8
//rs/tests/consensus/tecdsa:tschnorr_message_sizes_test_head_nns_local              215.5    227.2    -11.7
//rs/tests/networking:canister_http_time_out_test_local                              85.3     97.0    -11.7
//rs/tests/consensus/upgrade:upgrade_with_alternative_urls_head_nns_local           143.5    155.1    -11.6
//rs/tests/consensus:max_xnet_payload_size_test_local                                48.6     59.9    -11.3
//rs/tests/financial_integrations:transaction_ledger_correctness_test_head_nns_local    122.0    133.2    -11.2
//rs/tests/execution:max_number_of_canisters_test_local                              39.5     50.4    -10.9
//rs/tests/node:guestos_no_failed_systemd_units_local                                36.1     46.9    -10.8
//rs/tests/cross_chain:ic_xc_cketh_test_local                                       192.7    202.4     -9.7
//rs/tests/financial_integrations:icrc1_agent_test_local                             55.8     65.4     -9.6
//rs/tests/consensus/orchestrator:node_assign_test_local                            229.2    238.3     -9.1
//rs/tests/networking/firewall:firewall_priority_test_local                         173.1    182.0     -8.9
//rs/tests/consensus/orchestrator:rotate_ecdsa_idkg_key_test_head_nns_local         386.3    395.1     -8.8
//rs/tests/consensus/upgrade:upgrade_downgrade_app_subnet_test_local                870.5    879.3     -8.8
//rs/tests/execution:general_execution_test_local                                    79.2     87.6     -8.4
//rs/tests/cross_chain:btc_get_balance_test_local                                    67.7     76.0     -8.3
//rs/tests/consensus:cup_explorer_test_head_nns_local                               142.5    150.5     -8.0
//rs/tests/idx:ii_delegation_test_local                                              49.1     57.1     -8.0
//rs/tests/consensus:node_graceful_leaving_test_local                               169.7    177.5     -7.8
//rs/tests/networking/firewall:firewall_max_connections_test_local                   96.5    104.3     -7.8
//rs/tests/consensus/tecdsa:pre_signature_stash_management_test_local               169.6    177.3     -7.7
//rs/tests/cross_chain:ic_xc_ledger_suite_orchestrator_test_local                   107.2    114.8     -7.6
//rs/tests/consensus/upgrade:upgrade_downgrade_old_unassigned_nodes_test_head_nns_local    158.1    165.5     -7.4
//rs/tests/message_routing/xnet:xnet_malicious_slices_local                          84.6     92.0     -7.4
//rs/tests/node:kill_start_long_test_local                                          247.4    254.8     -7.4
//rs/tests/nns:node_swap_test_local                                                  54.7     62.0     -7.3
//rs/tests/consensus:liveness_with_equivocation_test_local                           51.6     58.8     -7.2
//rs/tests/consensus:subnet_splitting_test_head_nns_local                           218.5    225.7     -7.2
//rs/tests/networking:canister_http_flexible_test_head_nns_local                     86.6     93.7     -7.1
//rs/tests/ckbtc:ckbtc_minter_retrieve_btc_head_nns_local                           189.9    196.8     -6.9
//rs/tests/nns:bitcoin_set_config_proposal_test_local                               102.9    109.8     -6.9
//rs/tests/cross_chain:btc_get_balance_test_head_nns_local                           62.1     68.6     -6.5
//rs/tests/crypto:ic_crypto_csp_start_order_test_local                               35.5     41.7     -6.2
//rs/tests/networking:read_state_test_local                                          87.7     93.6     -5.9
//rs/tests/message_routing:memory_safety_test_local                                  50.2     55.9     -5.7
//rs/tests/networking:http_endpoints_public_spec_test_head_nns_local                 91.5     97.1     -5.6
//rs/tests/node:kill_start_short_test_local                                          78.1     83.7     -5.6
//rs/tests/consensus/upgrade:upgrade_downgrade_old_app_subnet_test_local            849.9    855.4     -5.5
//rs/tests/networking:read_state_test_head_nns_local                                 92.6     98.1     -5.5
//rs/tests/nns:certified_registry_test_local                                         52.2     57.7     -5.5
//rs/tests/crypto:ic_crypto_csp_socket_test_local                                    34.6     40.0     -5.4
//rs/tests/networking:canister_http_correctness_test_head_nns_local                 108.9    113.9     -5.0
//rs/tests/consensus/orchestrator:node_reassignment_test_head_nns_local             328.8    333.2     -4.4
//rs/tests/boundary_nodes:api_bn_decentralization_test_local                        131.0    135.2     -4.2
//rs/tests/nns:node_removal_from_registry_test_local                                 76.4     80.4     -4.0
//rs/tests/crypto:ic_crypto_fstrim_tool_test_local                                   39.3     42.4     -3.1
//rs/tests/boundary_nodes:rate_limit_canister_test_local                            155.2    157.9     -2.7
//rs/tests/consensus:cup_explorer_test_local                                        143.9    146.6     -2.7
//rs/tests/consensus/tecdsa:tecdsa_complaint_test_head_nns_local                    124.2    126.7     -2.5
//rs/tests/boundary_nodes:rate_limit_canister_test_head_nns_local                   157.1    159.4     -2.3
//rs/tests/consensus:max_ingress_payload_size_test_local                             81.1     83.3     -2.2
//rs/tests/crypto:ic_crypto_csp_metrics_test_local                                   36.3     38.3     -2.0
//rs/tests/consensus:dual_workload_test_local                                        68.5     70.4     -1.9
//rs/tests/networking/firewall:firewall_priority_test_head_nns_local                180.0    181.9     -1.9
//rs/tests/crypto:request_signature_test_local                                       41.8     43.0     -1.2
//rs/tests/nns:nns_cycles_minting_test_local                                        145.0    146.1     -1.1
//rs/tests/consensus:catch_up_possible_test_local                                   261.7    262.5     -0.8
//rs/tests/boundary_nodes:salt_sharing_canister_test_head_nns_local                 141.0    124.7    +16.3
//rs/tests/consensus/orchestrator:unstuck_subnet_test_local                         235.0    234.2     +0.8
//rs/tests/consensus/tecdsa:pre_signature_stash_management_test_head_nns_local      189.0    186.9     +2.1
//rs/tests/consensus/tecdsa:tecdsa_add_nodes_test_head_nns_local                    233.1    218.0    +15.1
//rs/tests/consensus/tecdsa:tecdsa_add_nodes_test_local                             240.1    229.6    +10.5
//rs/tests/consensus/upgrade:upgrade_downgrade_old_app_subnet_test_head_nns_local    887.1    858.5    +28.6
//rs/tests/consensus/vetkd:vetkd_key_life_cycle_test_head_nns_local                 191.6    183.3     +8.3
//rs/tests/consensus:max_ingress_payload_size_test_head_nns_local                    82.2     80.9     +1.3
//rs/tests/cross_chain:ic_xc_ledger_suite_orchestrator_test_head_nns_local          124.9    120.1     +4.8
//rs/tests/crypto:canister_sig_verification_cache_test_local                        100.7     88.8    +11.9
//rs/tests/crypto:ic_crypto_csp_umask_test_local                                     41.0     40.4     +0.6
//rs/tests/crypto:rpc_csp_vault_reconnection_test_local                              39.8     38.6     +1.2
//rs/tests/execution:cycles_cost_schedule_test_local                                 96.1     92.6     +3.5
//rs/tests/execution:system_api_security_test_local                                 204.4     78.4   +126.0
//rs/tests/financial_integrations:transaction_ledger_correctness_test_local         138.3    132.2     +6.1
//rs/tests/message_routing:rejoin_test_large_state_local                            512.9    414.6    +98.3
//rs/tests/networking/firewall:firewall_max_connections_test_head_nns_local          98.9     98.5     +0.4
//rs/tests/networking:canister_http_correctness_test_local                          128.8    122.4     +6.4
//rs/tests/node:root_tests_local                                                    115.2    103.4    +11.8
//rs/tests/query_stats:query_stats_basic_local                                      184.2    183.5     +0.7
//rs/tests/sdk:dfx_smoke_test_head_nns_local                                        138.1    119.9    +18.2

@basvandijk basvandijk marked this pull request as ready for review June 28, 2026 12:05
@basvandijk basvandijk requested a review from a team as a code owner June 28, 2026 12:05
@github-actions github-actions Bot added the @idx label Jun 28, 2026
.append(true)
.open(&lock_path)
.with_context(|| format!("opening image cache lock {}", lock_path.display()))?;
nix::fcntl::flock(lock.as_raw_fd(), nix::fcntl::FlockArg::LockExclusive)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is released implicitly when the function returns and the lock is dropped, closing the file descriptor and releasing the lock, correct?

Also are there any forks potentially happening on other threads while the FD is open (which would inherit it)?

Comment on lines +1005 to +1008
// Grow the overlay's virtual size to `min_gib` only when it exceeds
// the base image's size (the base is raw, so its byte length is its
// virtual size). Mirrors the old `truncate --size=>{min_gib}G`
// grow-only semantics; a smaller request leaves the base size.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make sure the comments don't reference old code

Comment on lines +982 to +983
// Extract the shared pristine image once into the content-addressed
// base cache, then give this VM a thin copy-on-write qcow2 overlay

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, which part is RO RAW and which part is mutable? I'm not familiar with the GuestOS' file structure

)
})?;
}
Ok(base)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very important that this image never gets relocated, correct? Is the path absolute btw?

// directory — e.g. distinct base images under `image_cache`, which are
// guarded by distinct per-key locks — cannot collide on the scratch dir.
let tmp = parent.join(format!(
".extract-{}-{}",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of using the image name and for extra robustness, could also push the PID idea further and append the thread id (since I'm guessing it's the thread concurrency that might cause issues here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants