Validate row-index selections#8574
Conversation
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Merging this PR will improve performance by 11.9%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | encode_varbin[(1000, 4)] |
157 µs | 139.8 µs | +12.34% |
| ⚡ | Simulation | encode_varbin[(1000, 32)] |
162.5 µs | 144.8 µs | +12.22% |
| ⚡ | Simulation | encode_varbin[(1000, 8)] |
157.3 µs | 140.4 µs | +12.05% |
| ⚡ | Simulation | encode_varbin[(1000, 2)] |
156.1 µs | 140.7 µs | +10.98% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ngates/selection-index-sortedness (a2e44d2) with develop (2a19323)
Footnotes
-
4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
| 1 => Selection::IncludeByIndex(StrictSortedBuffer::try_new(Buffer::copy_from( | ||
| selection_idx, | ||
| ))?), | ||
| 2 => Selection::ExcludeByIndex(StrictSortedBuffer::try_new(Buffer::copy_from( | ||
| selection_idx, | ||
| ))?), |
There was a problem hiding this comment.
can you remove the validation from java? JNI actually gets validated selections
There was a problem hiding this comment.
I have no idea where you want to assert that? Seems odd to accept it unchecked over FFI?
There was a problem hiding this comment.
If you look at the code you will notice that with this change this validation happens twice. I am asking you to have it happen once again as before this change. The validation happens on the java side of the jni bindings
There was a problem hiding this comment.
in particular the logic that I am talking about is here https://github.com/vortex-data/vortex/blob/develop/java/vortex-jni/src/main/java/dev/vortex/api/ScanOptions.java#L94
Rational for this change
We binary search the row selections and therefore panic if they are out of order. Therefore we should validate them on-construction to be sorted
What changes are included in this PR?
Include/ExcludeByIndex has try_new constructors.
An alternative would be to have a SortedBuffer wrapper type perhaps?
What APIs are changed? Are there any user-facing changes?
Include/ExcludeByIndex