Commit a7bdc58

miranov25 committed
bench: limit quick mode to 2k groups, add performance warnings

Changes:
- Reduce 'quick' mode from 5k to 2k groups (robust is too slow)
- Add warning comment about robust performance on small groups
- Document that robust is 85-90× slower than v2 on small groups
- Document that v4 is 17,000-40,000× faster than robust

Findings from benchmarks:
- Robust: ~26 s/1k groups (small groups)
- v2: ~0.3 s/1k groups
- v4: ~0.001 s/1k groups

Recommendation: Use v2/v4 for small groups (< 50 rows/group). Robust is designed for large groups that need robust statistics.

Note: A large numerical disagreement (0.57) was observed on small groups. This requires investigation but is deferred until after the restructuring.

1 parent 0898280 commit a7bdc58
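
The per-1k and speedup figures quoted in the commit message can be reproduced from the logged durations. The sketch below is illustrative only: `per_1k_seconds` and `speedup` are hypothetical helper names, not functions from `bench_comparison.py`, and the inputs are the rounded durations from the committed CSV, so results may differ slightly from the logged speedups (which were presumably computed from unrounded timings).

```python
def per_1k_seconds(duration_s: float, n_groups: int) -> float:
    """Normalize a wall-clock duration to seconds per 1,000 groups."""
    return duration_s / n_groups * 1000.0


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Rounded durations taken from the "Small (1k×5)" rows of the committed CSV.
robust_s, v2_s, n_groups = 25.852, 0.290, 1000

print(f"robust: {per_1k_seconds(robust_s, n_groups):.3f} s/1k groups")
print(f"v2:     {per_1k_seconds(v2_s, n_groups):.3f} s/1k groups")
print(f"v2 speedup vs robust: {speedup(robust_s, v2_s):.1f}x")
```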

4 files changed (+65 -1 lines changed)
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+date,host,commit,scenario,engine,n_groups,rows_per_group,duration_s,per_1k_s,speedup,notes
+2025-10-25T15:45:17.308904,Marians-MBP-3.fritz.box,unknown,Tiny (100×5),robust,100,5,2.680,26.797,1.00,
+2025-10-25T15:45:17.308911,Marians-MBP-3.fritz.box,unknown,Tiny (100×5),v2,100,5,0.778,7.777,3.45,
+2025-10-25T15:45:17.308916,Marians-MBP-3.fritz.box,unknown,Tiny (100×5),v4,100,5,0.684,6.840,3.92,
+2025-10-25T15:45:17.308919,Marians-MBP-3.fritz.box,unknown,Small (1k×5),robust,1000,5,25.852,25.852,1.00,
+2025-10-25T15:45:17.308923,Marians-MBP-3.fritz.box,unknown,Small (1k×5),v2,1000,5,0.290,0.290,89.06,
+2025-10-25T15:45:17.308927,Marians-MBP-3.fritz.box,unknown,Small (1k×5),v4,1000,5,0.001,0.001,17705.52,
+2025-10-25T15:45:17.308931,Marians-MBP-3.fritz.box,unknown,Medium (5k×5),robust,5000,5,126.362,25.272,1.00,
+2025-10-25T15:45:17.308934,Marians-MBP-3.fritz.box,unknown,Medium (5k×5),v2,5000,5,1.497,0.299,84.43,
+2025-10-25T15:45:17.308938,Marians-MBP-3.fritz.box,unknown,Medium (5k×5),v4,5000,5,0.003,0.001,41552.90,
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+GroupBy Regression: Engine Comparison
+======================================================================
+Date: 2025-10-25T15:45:17.308438
+Host: Marians-MBP-3.fritz.box
+
+
+Scenario: Tiny (100×5)
+Dataset: 100 groups × 5 rows
+----------------------------------------------------------------------
+robust : 2.68s ( 26.80s/1k) [100 groups]
+v2 : 0.78s ( 7.78s/1k) [100 groups]
+v4 : 0.68s ( 6.84s/1k) [100 groups]
+
+Speedup vs robust:
+v2: 3.4×
+v4: 3.9×
+
+Scenario: Small (1k×5)
+Dataset: 1000 groups × 5 rows
+----------------------------------------------------------------------
+robust : 25.85s ( 25.85s/1k) [1000 groups]
+v2 : 0.29s ( 0.29s/1k) [1000 groups]
+v4 : 0.00s ( 0.00s/1k) [1000 groups]
+
+Speedup vs robust:
+v2: 89.1×
+v4: 17705.5×
+
+Scenario: Medium (5k×5)
+Dataset: 5000 groups × 5 rows
+----------------------------------------------------------------------
+robust : 126.36s ( 25.27s/1k) [5000 groups]
+v2 : 1.50s ( 0.30s/1k) [5000 groups]
+v4 : 0.00s ( 0.00s/1k) [5000 groups]
+
+Speedup vs robust:
+v2: 84.4×
+v4: 41552.9×
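
In the report above, the v4 timings print as `0.00s` because two decimal places cannot represent millisecond-scale durations. One possible remedy, sketched here as a suggestion rather than as the benchmark's actual formatting code, is a helper that picks the unit from the magnitude. The name `format_duration` is hypothetical.

```python
def format_duration(seconds: float) -> str:
    """Format a duration with a unit chosen so small values stay readable."""
    if seconds >= 1.0:
        return f"{seconds:.2f}s"
    if seconds >= 1e-3:
        return f"{seconds * 1e3:.2f}ms"
    return f"{seconds * 1e6:.1f}us"


# Durations taken from the committed CSV (robust, v2, v4 on the Medium scenario).
for engine, duration_s in [("robust", 126.362), ("v2", 1.497), ("v4", 0.003)]:
    print(f"{engine:<7}: {format_duration(duration_s)}")
```

With this helper the v4 row would read in milliseconds instead of collapsing to zero, which makes the reported speedups easier to sanity-check by eye.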
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+scenario,engine,n_groups,rows_per_group,duration_s,per_1k_s,n_groups_actual,speedup
+Tiny (100×5),robust,100,5,2.680,26.797,100,1.00
+Tiny (100×5),v2,100,5,0.778,7.777,100,3.45
+Tiny (100×5),v4,100,5,0.684,6.840,100,3.92
+Small (1k×5),robust,1000,5,25.852,25.852,1000,1.00
+Small (1k×5),v2,1000,5,0.290,0.290,1000,89.06
+Small (1k×5),v4,1000,5,0.001,0.001,1000,17705.52
+Medium (5k×5),robust,5000,5,126.362,25.272,5000,1.00
+Medium (5k×5),v2,5000,5,1.497,0.299,5000,84.43
+Medium (5k×5),v4,5000,5,0.003,0.001,5000,41552.90
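
The CSV above shows that robust costs roughly the same per 1k groups in every scenario, while v2 and v4 are far more expensive per group in the Tiny scenario than in the later ones. The sketch below quantifies that spread per engine. It is an illustrative check, not part of the benchmark: `per_1k_spread` is a hypothetical name, and the data is copied from the committed rows. One possible reading, which the commit does not confirm, is a one-time warm-up cost (imports, compilation, caching) charged to the first scenario.

```python
import csv
import io
from collections import defaultdict

CSV_TEXT = """scenario,engine,n_groups,rows_per_group,duration_s,per_1k_s,n_groups_actual,speedup
Tiny (100×5),robust,100,5,2.680,26.797,100,1.00
Tiny (100×5),v2,100,5,0.778,7.777,100,3.45
Tiny (100×5),v4,100,5,0.684,6.840,100,3.92
Small (1k×5),robust,1000,5,25.852,25.852,1000,1.00
Small (1k×5),v2,1000,5,0.290,0.290,1000,89.06
Small (1k×5),v4,1000,5,0.001,0.001,1000,17705.52
Medium (5k×5),robust,5000,5,126.362,25.272,5000,1.00
Medium (5k×5),v2,5000,5,1.497,0.299,5000,84.43
Medium (5k×5),v4,5000,5,0.003,0.001,5000,41552.90
"""


def per_1k_spread(text: str) -> dict:
    """Return, per engine, the max/min ratio of seconds per 1k groups."""
    by_engine = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Recompute from duration_s; the per_1k_s column is rounded to 3 decimals.
        per_1k = float(row["duration_s"]) / int(row["n_groups"]) * 1000.0
        by_engine[row["engine"]].append(per_1k)
    return {engine: max(vals) / min(vals) for engine, vals in by_engine.items()}


for engine, ratio in per_1k_spread(CSV_TEXT).items():
    print(f"{engine:<7}: per-1k cost varies by a factor of {ratio:.1f}")
```

A ratio near 1 means the engine scales linearly with the number of groups. A large ratio suggests that the smallest scenario is dominated by fixed overhead, so its speedup figures (3.4× and 3.9×) may understate steady-state performance. Note that the v4 durations are logged with only millisecond resolution, so its ratio is a rough estimate.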

UTILS/dfextensions/groupby_regression/benchmarks/bench_comparison.py

Lines changed: 7 additions & 1 deletion
@@ -10,6 +10,12 @@
 Usage:
     python bench_comparison.py
     python bench_comparison.py --scenarios all  # More scenarios
+
+# IMPORTANT: Robust implementation is very slow on small groups (< 50 rows/group)
+# Quick mode limited to 2k groups max to keep runtime reasonable.
+# For small group sizes, use optimized implementations (v2/v3/v4).
+# Robust is designed for large groups with robust statistics needs.
+
 """
 
 import argparse
@@ -379,7 +385,7 @@ def main():
         scenarios = [
             ("Tiny (100×5)", 100, 5),
             ("Small (1k×5)", 1000, 5),
-            ("Medium (5k×5)", 5000, 5),
+            ("Medium (5k×5)", 2000, 5),
         ]
     else:  # all
         scenarios = [
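
After this change the third quick-mode scenario runs 2000 groups but is still labelled "Medium (5k×5)". A minimal sketch of one way to keep labels and group counts in sync is to derive the label from the capped count. The names `MAX_QUICK_GROUPS`, `make_scenario`, and `quick_scenarios` are hypothetical and do not appear in `bench_comparison.py`. The tuple layout `(label, n_groups, rows_per_group)` follows the diff above.

```python
MAX_QUICK_GROUPS = 2000  # assumed cap, matching the 2k limit stated in the commit


def make_scenario(name: str, n_groups: int, rows_per_group: int) -> tuple:
    """Build a (label, n_groups, rows_per_group) tuple with a derived label."""
    if n_groups >= 1000 and n_groups % 1000 == 0:
        count = f"{n_groups // 1000}k"
    else:
        count = str(n_groups)
    return (f"{name} ({count}×{rows_per_group})", n_groups, rows_per_group)


def quick_scenarios(max_groups: int = MAX_QUICK_GROUPS) -> list:
    """Return the quick-mode scenarios with every group count capped."""
    requested = [("Tiny", 100, 5), ("Small", 1000, 5), ("Medium", 5000, 5)]
    return [make_scenario(name, min(n, max_groups), rows)
            for name, n, rows in requested]


for label, n_groups, rows_per_group in quick_scenarios():
    print(label, n_groups, rows_per_group)
```

Deriving the label from the numbers means a future change to the cap cannot leave a stale "5k" in the report headers or CSV rows.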
