Skip to content

Speed-up reduce view merging in fabric#6046

Open
nickva wants to merge 1 commit into
mainfrom
view-reduce-find-min-key
Open

Speed-up reduce view merging in fabric#6046
nickva wants to merge 1 commit into
mainfrom
view-reduce-find-min-key

Conversation

@nickva

@nickva nickva commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Previously, find_next_key/5 reduce collector [1] effectively ran hd(lists:sort(...)) to get the next key. It was spending time sorting, then throwing everything away except the head. For the next row, it was sorting again, throwing everything away, and taking only the head, and so on... Needless to say, that's wasteful and we should simply find the minimum and return that.

As an another optimization replace the dict with a map. Maps are more efficient and generally faster.

Also, add extra unit tests asserting ICU collation behavior actually takes place. In a few cases explicitly assert that the new behavior is the same as the previous hd(lists:sort(...)) behavior.

To benchmark used a simple reduce view with a built-in _count reducer [2]. For 100k docs, Q=8 this optimization leads to an ~2x speedup in view querying. Theoretically it would be an O(n log n) -> O(n) speedup.

PR

./reduce_group_bench.py --ndocs 100000 --q 8 --reps 7
db=reduce_group_bench q=8 n=1 ndocs=100000 group=true
loading...
  loaded 100000 docs in 4.4s
building index...
  built in 4.5s
timing group=true query x7...
  rows=100000
  min=5644.0ms  median=5786.6ms  max=6108.0ms

MAIN

 ./reduce_group_bench.py --ndocs 100000 --q 8 --reps 7
db=reduce_group_bench q=8 n=1 ndocs=100000 group=true
loading...
  loaded 100000 docs in 4.5s
building index...
  built in 4.3s
timing group=true query x7...
  rows=100000
  min=10910.8ms  median=11212.5ms  max=11627.4ms

[1] Reduce collector buffers rows from workers and keys them by the view key. For each row it emits it 1) finds the next key 2) takes all the rows that collate equal to it. This optimization will only improve reduce views

[2] https://gist.github.com/nickva/cc0a3fb296be965e014eec8c484aa8a5

Previously, `find_next_key/5` reduce collector [1] effectively ran
`hd(lists:sort(...))` to get the next key. It was spending time sorting, then
throwing everything away except the head. For the next row, it was sorting
again, throwing everything away, and taking only the head, and so on...
Needless to say, that's wasteful and we should simply find the minimum and
return that.

As an another optimization replace the dict with a map. Maps are more
efficient and generally faster.

Also, add extra unit tests asserting ICU collation behavior actually takes
place. In a few cases explicitly assert that the new behavior is the same as
the previous `hd(lists:sort(...))` behavior.

To benchmark used a simple reduce view with a built-in _count reducer [2]. For
100k docs, Q=8 this optimization leads to an ~2x speedup in view querying.
Theoretically it would be an O(n log n) -> O(n) speedup.

PR

```
./reduce_group_bench.py --ndocs 100000 --q 8 --reps 7
db=reduce_group_bench q=8 n=1 ndocs=100000 group=true
loading...
  loaded 100000 docs in 4.4s
building index...
  built in 4.5s
timing group=true query x7...
  rows=100000
  min=5644.0ms  median=5786.6ms  max=6108.0ms
```

MAIN

```
 ./reduce_group_bench.py --ndocs 100000 --q 8 --reps 7
db=reduce_group_bench q=8 n=1 ndocs=100000 group=true
loading...
  loaded 100000 docs in 4.5s
building index...
  built in 4.3s
timing group=true query x7...
  rows=100000
  min=10910.8ms  median=11212.5ms  max=11627.4ms
```

[1] Reduce collector buffers rows from workers and keys them by the view key.
For each row it emits it 1) finds the next key 2) takes all the rows that
collate equal to it. This optimization will only improve reduce views

[2] https://gist.github.com/nickva/cc0a3fb296be965e014eec8c484aa8a5
@nickva nickva force-pushed the view-reduce-find-min-key branch from 96ef801 to c401c5d Compare June 21, 2026 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant