You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Follow-up to #109.#109 makes the row cap chart-type-aware but keeps the current order of operations in buildChartData() (src/core/chart-data.js:235-271): rows.slice(0, cap) happens first, and the per-category sums/groups maps are built only from that slice.
Reframed scope. ClickHouse is the aggregation engine here, not the browser — the chart only ever sees whatever rows the query already returned (bounded by the result's rowLimit, already flagged via the "first N (capped)" badge elsewhere in src/ui/results.js). JS-side grouping in buildChartData is a display convenience for results that aren't already pre-aggregated in SQL (e.g. a raw SELECT with a repeating category column and no GROUP BY) — it should never be built out to look authoritative (top-N-by-value ranking, "Other" buckets, etc.), because it fundamentally can't be: it's summing a browser-side window, not the dataset. The right fix is smaller than originally scoped:
Aggregate over everything already fetched, not a truncated prefix of it. This is free (no extra network round-trip, the rows are already in memory) and strictly more correct relative to what was fetched — currently the code discards rows it already has before summing, which can drop or short-total categories that appear after the cap.
Cap categories, not raw rows, for legibility — apply the type's row cap (Chart-type-aware row cap: chartRowCap(type) replaces flat CHART_ROW_CAP=500 #109's chartRowCap(type)) to the post-aggregation category count, in first-seen order (same "first N" semantics as today, just moved after the dedup/sum step). No value-ranking, no "Other" bucket — that would imply curation this feature doesn't and shouldn't claim to do.
Fix the truncation note so it doesn't overstate what's shown. Today it always says "first N of M rows" (src/ui/results.js:914-916), which is only accurate when the SQL already returns one row per aggregation cell.
Duplicate detection must match the actual grouping key, not just X. The aggregation cell is x when cfg.series == null, but (x, series) when cfg.series != null — those are two different group-bys (buildChartData's existing groups/sums branches already split on this). Detecting duplicates on X alone is wrong for the grouped case: a legitimate
SELECT month, region, sum(revenue) FROM t GROUP BY month, region
result has Jan/EU, Jan/US, Feb/EU, Feb/US — distinct(month) < rows.length, but every (month, region) cell is already unique; there's no browser-side summing happening at all, and no note should imply there is. So:
no series: duplicate check is on x alone.
with series: duplicate check is on the (x, series) pair.
Also don't frame the absence of duplicates as "SQL already pre-aggregated" — that inference isn't safe in general (e.g. a result could legitimately have one row per raw entity with no aggregation intent at all); just describe what the chart did, not what the SQL did or didn't do.
Note copy by case:
no duplicate cells → first N of M categories
duplicate x (no series) → first N of M categories; duplicate X rows summed in the browser
duplicate (x, series) (with series) → first N of M categories; duplicate X/series rows summed in the browser
Explicitly not doing: decimation/sampling for line/area (still #109's territory, or a later design pass — see the decimation-plugin note there), and no attempt to make client-side aggregation "correct" over the full backing table — that's what SQL is for.
Scope.
buildChartData(): aggregate over the full rows array (keyed by x or (x, series) per above), then slice the resulting category list to chartRowCap(cfg.type) (first-seen order). Return the meta block above.
src/ui/results.js: use meta.duplicateCellsSummed + meta.groupKey to pick the right note copy (three cases above) instead of the current flat "first N of M rows".
Tests in tests/unit/chart-data.test.js: categories beyond the cap come from post-aggregation order, not pre-aggregation row order; multi-row-per-cell sums are correct even when rows straddle the old cap boundary; duplicate detection is correct for both the plain-X and (x, series) cases (including the GROUP BY month, region example above, which must report duplicateCellsSummed: false). Tests in tests/unit/results.test.js: note text covers all three cases.
Acceptance.
Bar/column/pie aggregate over the full fetched result before capping to categories; no data is dropped/short-totaled due to summing only a row prefix.
No value-ranking or "Other" bucket — capping stays first-seen-category-order, matching today's "first N" semantics.
Duplicate detection is keyed on (x, series) when a series is set, not on x alone — a legitimate multi-series GROUP BY result must not be flagged as browser-side summing.
Truncation note has three cases (no duplicates / duplicate X / duplicate X+series) and never asserts what the SQL did or didn't do — only what the chart did.
Follow-up to #109. #109 makes the row cap chart-type-aware but keeps the current order of operations in
buildChartData()(src/core/chart-data.js:235-271):rows.slice(0, cap)happens first, and the per-categorysums/groupsmaps are built only from that slice.Reframed scope. ClickHouse is the aggregation engine here, not the browser — the chart only ever sees whatever rows the query already returned (bounded by the result's
rowLimit, already flagged via the "first N (capped)" badge elsewhere insrc/ui/results.js). JS-side grouping inbuildChartDatais a display convenience for results that aren't already pre-aggregated in SQL (e.g. a rawSELECTwith a repeating category column and noGROUP BY) — it should never be built out to look authoritative (top-N-by-value ranking, "Other" buckets, etc.), because it fundamentally can't be: it's summing a browser-side window, not the dataset. The right fix is smaller than originally scoped:chartRowCap(type)) to the post-aggregation category count, in first-seen order (same "first N" semantics as today, just moved after the dedup/sum step). No value-ranking, no "Other" bucket — that would imply curation this feature doesn't and shouldn't claim to do.src/ui/results.js:914-916), which is only accurate when the SQL already returns one row per aggregation cell.Duplicate detection must match the actual grouping key, not just X. The aggregation cell is
xwhencfg.series == null, but(x, series)whencfg.series != null— those are two different group-bys (buildChartData's existinggroups/sumsbranches already split on this). Detecting duplicates on X alone is wrong for the grouped case: a legitimateresult has
Jan/EU,Jan/US,Feb/EU,Feb/US—distinct(month) < rows.length, but every(month, region)cell is already unique; there's no browser-side summing happening at all, and no note should imply there is. So:series: duplicate check is onxalone.series: duplicate check is on the(x, series)pair.Also don't frame the absence of duplicates as "SQL already pre-aggregated" — that inference isn't safe in general (e.g. a result could legitimately have one row per raw entity with no aggregation intent at all); just describe what the chart did, not what the SQL did or didn't do.
Note copy by case:
first N of M categoriesx(no series) →first N of M categories; duplicate X rows summed in the browser(x, series)(with series) →first N of M categories; duplicate X/series rows summed in the browserImplementation shape.
buildChartData()returns metadata alongsidelabels/datasetssoresults.jsdoesn't recompute duplicate detection itself:Explicitly not doing: decimation/sampling for line/area (still #109's territory, or a later design pass — see the decimation-plugin note there), and no attempt to make client-side aggregation "correct" over the full backing table — that's what SQL is for.
Scope.
buildChartData(): aggregate over the fullrowsarray (keyed byxor(x, series)per above), then slice the resulting category list tochartRowCap(cfg.type)(first-seen order). Return themetablock above.src/ui/results.js: usemeta.duplicateCellsSummed+meta.groupKeyto pick the right note copy (three cases above) instead of the current flat "first N of M rows".tests/unit/chart-data.test.js: categories beyond the cap come from post-aggregation order, not pre-aggregation row order; multi-row-per-cell sums are correct even when rows straddle the old cap boundary; duplicate detection is correct for both the plain-X and(x, series)cases (including theGROUP BY month, regionexample above, which must reportduplicateCellsSummed: false). Tests intests/unit/results.test.js: note text covers all three cases.Acceptance.
(x, series)when a series is set, not onxalone — a legitimate multi-seriesGROUP BYresult must not be flagged as browser-side summing.npm testgreen at the per-file coverage gate.