Consolidate scan tuning params into one IcebergPerfConfig (by-value)#108
Open
hall-alex wants to merge 2 commits into
Open
Consolidate scan tuning params into one IcebergPerfConfig (by-value)#108hall-alex wants to merge 2 commits into
hall-alex wants to merge 2 commits into
Conversation
Replace the per-knob `with_*!` scan setters with a single `IcebergPerfConfig` struct passed by value into `new_scan` / `new_incremental_scan` at construction. This makes the config the single authoritative home for every tuning default and turns a forgotten knob into a compile error rather than a runtime/silent-default bug. Julia side: - Add `IcebergPerfConfig` (UInt64 fields, @kwdef defaults, full tuning docstring) in scan_common.jl; export it. This is now the one place defaults live. - `new_scan(table, perf)` / `new_incremental_scan(table, from, to, perf)` take the config by value over the FFI; delete `with_batch_size!`, `with_file_prefetch_depth!`, `with_serialization_concurrency_limit!`, `with_manifest_*_concurrency_limit!`, and the `with_data_file_concurrency_limit!` no-ops (plus their exports). - Add an `ENV["ICEBERG_RUST_FFI_LIB"]` escape hatch on `const rust_lib` (mirrors RustyObjectStore's `OBJECT_STORE_LIB`) so local/unreleased dylibs load in subprocesses such as ReTestItems workers, which run in a temp test env that does not inherit a `LocalPreferences.toml` override. Rust FFI side: - Add `#[repr(C)] IcebergPerfConfigFFI` (5x u64, no Default) as the by-value wire struct; `iceberg_new_scan` / `iceberg_new_incremental_scan` take it and apply the manifest/batch knobs to the iceberg-rs builder, storing the rest on the scan. - Drop `batch_size: Option<usize>` -> `usize`, remove the dead `file_concurrency` field, and remove all `0 = auto` (cpu_count) fallbacks; values are now used verbatim from the config. - Delete the dead `with_*` FFI setters/macros and the `data_file_concurrency_limit` no-ops. Tests: - Add perf_config_tests.jl: a server-free binary-compatibility check (isbits, sizeof, and an `iceberg_perf_config_roundtrip` FFI echo with distinct sentinels) plus a Rust `cargo test` asserting struct size/align/offsets. - Migrate scan_tests.jl to the new construction API. Versions: bump iceberg_rust_ffi crate and RustyIceberg.jl to 0.9.0 (ABI break), and require `iceberg_rust_ffi_jll = "0.9"`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
robertbuessow
approved these changes
Jun 25, 2026
robertbuessow
left a comment
Contributor
There was a problem hiding this comment.
How about MAX_BUFFERED_BYTES_PER_TASK and MAX_PREFETCH_BUFFERS_OF_WAITING_FILE
Collaborator
Author
@robertbuessow, good idea! Having Claude add these do the struct as well. |
Follow-up to Robert's review on #108: fold the three previously hard-coded per-file output-buffer constants into the per-scan `IcebergPerfConfig` so they join the other tuning knobs as caller-supplied, single-source-of-truth defaults. - `max_buffered_bytes_per_task`, `max_prefetch_buffers_of_waiting_file`, `max_prefetch_buffers_of_active_file` are added to `IcebergPerfConfig` (Julia, defaults 100 MiB / 1 / 8) and to the `IcebergPerfConfigFFI` wire struct (now 8 × u64). - Rust: the `MAX_*` consts are deleted; the values are carried per-scan in a new `BufferLimits` struct on `IcebergScan` / `IcebergIncrementalScan` and threaded through `create_nested_pipeline` / `spawn_file_task` to size the byte semaphore, the waiting/active slot budgets, and the per-file mpsc capacity. The stats display reads the per-scan byte budget from `STATS` instead of the dead const. - Also fixes the two CI lint failures on #108: `cargo fmt` and a `cargo clippy` `-D warnings` (empty-line-after-doc-comment + items-after-test-module in scan_common.rs). - perf_config_tests.jl round-trip and the Rust ABI test extended to 8 fields; pipeline-unification.md updated (the "expose these as FFI-tunable knobs" future- work item is now done). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
RAI-50776
Consolidate all Iceberg scan tuning parameters into a single
IcebergPerfConfigthat is the one authoritative home for every default, passed by value intonew_scan/new_incremental_scanat construction. Removes the per-knobwith_*!setters, theOption/0 = autohidden defaults, the deaddata_file_concurrency_limit, and the hard-coded per-file buffer constants — a forgotten knob is now a compile error, not a silent wrong default.🔀 How to merge (do these in order)
The dependency chain is raicode → RustyIceberg.jl (Julia, git-rev pinned) →
iceberg_rust_ffi_jll(the Rust dylib). Because this is an ABI break, the JLL must be published first so this PR'stest-juliaCI can resolveiceberg_rust_ffi_jll = "0.9"and go green.iceberg_rust_ffiatv0.9.0with theGitSourcepointing at this branch's pushed commit (5493fe4). The Rust crate is fetched by commit SHA, so this does not require this PR to be merged — it can (and should) be done now. Pattern: Yggdrasil#13808 (Robert's0.8.0), Yggdrasil#13839 (Gerald's bump).iceberg_rust_ffi_jll 0.9.xis registered (the[compat]here requires"0.9"). Until then,test-julia-*correctly fails withUnsatisfiable requirements … iceberg_rust_ffi_jll. ✅ Existing consumers are unaffected meanwhile: raicodemasterpins RustyIceberg by an exact git-rev in itsManifest.tomllockfile, so it keeps resolving the old0.8.3+ oldiceberg_rust_ffi_jll 0.8.x.RustyIceberg.jl 0.9.0release (open the merge commit and comment to kick off the registry/TagBot flow — see this repo's README → Release). Version is already bumped to0.9.0here.Manifest.tomlto this new rev +iceberg_rust_ffi_jll 0.9.xand starts using the new API. See that PR for its own steps.Prior art for this exact 3-repo dance (RustyIceberg.jl → Yggdrasil → raicode): the Arrow C Data Interface breaking change — RustyIceberg.jl#100 → Yggdrasil#13808 → raicode#27727.
Summary
IcebergPerfConfig(Julia,scan_common.jl) holds every tuning default with the full tuning docstring; raicode re-exports it and no longer defines its own. Knobs:batch_size,manifest_file_concurrency_limit,manifest_entry_concurrency_limit,file_prefetch_depth,serialization_concurrency_limit, plus the per-file output-buffer limitsmax_buffered_bytes_per_task,max_prefetch_buffers_of_waiting_file,max_prefetch_buffers_of_active_file.new_scan(table, perf)/new_incremental_scan(table, from, to, perf)take the config and apply the manifest/batch knobs to the iceberg-rs builder at construction. Deleted:with_batch_size!,with_file_prefetch_depth!,with_serialization_concurrency_limit!,with_manifest_*_concurrency_limit!, and thewith_data_file_concurrency_limit!no-ops (+ their exports).MAX_BUFFERED_BYTES_PER_TASK/MAX_PREFETCH_BUFFERS_OF_{WAITING,ACTIVE}_FILEconstants are gone; the values are carried per-scan in aBufferLimitsstruct on the scan and threaded throughcreate_nested_pipeline/spawn_file_taskto size the byte semaphore, the waiting/active slot budgets, and the per-file mpsc capacity. Defaults (100 MiB / 1 / 8) live only in the JuliaIcebergPerfConfig.#[repr(C)] IcebergPerfConfigFFI(8×u64, noDefault) as the by-value wire struct;batch_size: Option<usize>→usize; removed the deadfile_concurrencyfield and every0 = auto(cpu_count) fallback — values are used verbatim. Dead FFI setters/macros removed.const rust_libnow honorsENV["ICEBERG_RUST_FFI_LIB"](mirrors RustyObjectStore'sOBJECT_STORE_LIB) so local/unreleased dylibs load in subprocesses such as ReTestItems workers, which run in a temp test env that does not inherit aLocalPreferences.tomloverride.iceberg_rust_fficrate +RustyIceberg.jl→0.9.0;[compat] iceberg_rust_ffi_jll = "0.9".Organized as two commits: the consolidation, then the per-file buffer-limit knobs (+ the
cargo fmt/clippyfixes from the first CI run).Test plan
cargo test— crate compiles;perf_config_abi_tests(struct size/align/offsets, 8× u64) passes;cargo fmt --checkandcargo clippy --all-targets -- -D warningsclean.test/perf_config_tests.jl— server-free binary-compat round-trip (isbits,sizeof == 8 × u64,iceberg_perf_config_roundtripecho with 8 distinct sentinels) verifies the Julia ↔ Rust struct layout agree.test/scan_tests.jlmigrated to the new construction API.ENV["ICEBERG_RUST_FFI_LIB"].test-julia-*CI — green onceiceberg_rust_ffi_jll 0.9.xis published (step 1).🤖 Generated with Claude Code