Skip to content

Chart aggregation sums only a truncated prefix of fetched rows, and the note overstates what the chart represents #111

Description

@BorisTyshkevich

Follow-up to #109. #109 makes the row cap chart-type-aware but keeps the current order of operations in buildChartData() (src/core/chart-data.js:235-271): rows.slice(0, cap) happens first, and the per-category sums/groups maps are built only from that slice.

Reframed scope. ClickHouse is the aggregation engine here, not the browser — the chart only ever sees whatever rows the query already returned (bounded by the result's rowLimit, already flagged via the "first N (capped)" badge elsewhere in src/ui/results.js). JS-side grouping in buildChartData is a display convenience for results that aren't already pre-aggregated in SQL (e.g. a raw SELECT with a repeating category column and no GROUP BY) — it should never be built out to look authoritative (top-N-by-value ranking, "Other" buckets, etc.), because it fundamentally can't be: it's summing a browser-side window, not the dataset. The right fix is smaller than originally scoped:

  1. Aggregate over everything already fetched, not a truncated prefix of it. This is free (no extra network round-trip, the rows are already in memory) and strictly more correct relative to what was fetched — currently the code discards rows it already has before summing, which can drop or short-total categories that appear after the cap.
  2. Cap categories, not raw rows, for legibility — apply the type's row cap (Chart-type-aware row cap: chartRowCap(type) replaces flat CHART_ROW_CAP=500 #109's chartRowCap(type)) to the post-aggregation category count, in first-seen order (same "first N" semantics as today, just moved after the dedup/sum step). No value-ranking, no "Other" bucket — that would imply curation this feature doesn't and shouldn't claim to do.
  3. Fix the truncation note so it doesn't overstate what's shown. Today it always says "first N of M rows" (src/ui/results.js:914-916), which is only accurate when the SQL already returns one row per aggregation cell.

Duplicate detection must match the actual grouping key, not just X. The aggregation cell is x when cfg.series == null, but (x, series) when cfg.series != null — those are two different group-bys (buildChartData's existing groups/sums branches already split on this). Detecting duplicates on X alone is wrong for the grouped case: a legitimate

SELECT month, region, sum(revenue) FROM t GROUP BY month, region

result has Jan/EU, Jan/US, Feb/EU, Feb/USdistinct(month) < rows.length, but every (month, region) cell is already unique; there's no browser-side summing happening at all, and no note should imply there is. So:

  • no series: duplicate check is on x alone.
  • with series: duplicate check is on the (x, series) pair.

Also don't frame the absence of duplicates as "SQL already pre-aggregated" — that inference isn't safe in general (e.g. a result could legitimately have one row per raw entity with no aggregation intent at all); just describe what the chart did, not what the SQL did or didn't do.

Note copy by case:

  • no duplicate cells → first N of M categories
  • duplicate x (no series) → first N of M categories; duplicate X rows summed in the browser
  • duplicate (x, series) (with series) → first N of M categories; duplicate X/series rows summed in the browser

Implementation shape. buildChartData() returns metadata alongside labels/datasets so results.js doesn't recompute duplicate detection itself:

{
  labels,
  datasets,
  meta: {
    totalRows,
    totalCategories,
    shownCategories,
    duplicateCellsSummed,       // bool
    groupKey: cfg.series == null ? 'x' : 'x+series',
  },
}

Explicitly not doing: decimation/sampling for line/area (still #109's territory, or a later design pass — see the decimation-plugin note there), and no attempt to make client-side aggregation "correct" over the full backing table — that's what SQL is for.

Scope.

  • buildChartData(): aggregate over the full rows array (keyed by x or (x, series) per above), then slice the resulting category list to chartRowCap(cfg.type) (first-seen order). Return the meta block above.
  • src/ui/results.js: use meta.duplicateCellsSummed + meta.groupKey to pick the right note copy (three cases above) instead of the current flat "first N of M rows".
  • Tests in tests/unit/chart-data.test.js: categories beyond the cap come from post-aggregation order, not pre-aggregation row order; multi-row-per-cell sums are correct even when rows straddle the old cap boundary; duplicate detection is correct for both the plain-X and (x, series) cases (including the GROUP BY month, region example above, which must report duplicateCellsSummed: false). Tests in tests/unit/results.test.js: note text covers all three cases.

Acceptance.

  • Bar/column/pie aggregate over the full fetched result before capping to categories; no data is dropped/short-totaled due to summing only a row prefix.
  • No value-ranking or "Other" bucket — capping stays first-seen-category-order, matching today's "first N" semantics.
  • Duplicate detection is keyed on (x, series) when a series is set, not on x alone — a legitimate multi-series GROUP BY result must not be flagged as browser-side summing.
  • Truncation note has three cases (no duplicates / duplicate X / duplicate X+series) and never asserts what the SQL did or didn't do — only what the chart did.
  • npm test green at the per-file coverage gate.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions