Skip to content

Widen VarBinBuilder offets during FSST compress#7853

Open
connortsui20 wants to merge 1 commit intodevelopfrom
ct/offsets-i64
Open

Widen VarBinBuilder offets during FSST compress#7853
connortsui20 wants to merge 1 commit intodevelopfrom
ct/offsets-i64

Conversation

@connortsui20
Copy link
Copy Markdown
Contributor

@connortsui20 connortsui20 commented May 8, 2026

Summary

Fixes: #7833

I don't think that a more complicated solution here makes a lot of sense. We narrow these offsets is the compressor anyways, so there is no reason to do something more complicated than this.

Additionally adds the logic for comparing with arrow large binary arrays that was missing before.

To be honest, I feel like it doesn't really make any sense to allow the user to specify what the offsets and sizes types should be at all, I think we can have the builder always use the 64-bit types because we know that they will be compressed by the compressor anyways.

Testing

Adds some regression tests. I've ignored the big test because it allocates too much memory.

You can run the ignored test with:

cargo t --release -p vortex-fsst varbin_compress_offsets_overflow_i32 -- --ignored --no-capture

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 added the changelog/fix A bug fix label May 8, 2026
@connortsui20 connortsui20 changed the title Widen varbinbuilder offets during FSST compress Widen VarBinBuilder offets during FSST compress May 8, 2026
@connortsui20 connortsui20 enabled auto-merge (squash) May 8, 2026 21:42
Copy link
Copy Markdown
Contributor

@joseph-isaacs joseph-isaacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please have the test run in ci. what is the problem with doing it?

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 8, 2026

Merging this PR will degrade performance by 15.69%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 2 improved benchmarks
❌ 2 regressed benchmarks
✅ 1204 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation take_map[(0.1, 0.5)] 1,111.5 µs 989.4 µs +12.34%
Simulation take_map[(0.1, 1.0)] 1.8 ms 1.6 ms +11.81%
Simulation eq_pushdown_high_match 1.1 ms 1.3 ms -14.44%
Simulation eq_pushdown_low_match 982.4 µs 1,165.3 µs -15.69%

Comparing ct/offsets-i64 (0275c57) with develop (ff12040)

Open in CodSpeed

@connortsui20
Copy link
Copy Markdown
Contributor Author

Oh I forgot about test with

@connortsui20 connortsui20 disabled auto-merge May 8, 2026 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FSST: fsst_compress panics on cumulative output >2 GiB (i32 offset overflow)

2 participants