Chart aggregation sums only a truncated prefix of fetched rows, and the note overstates what the chart represents

**Follow-up to #109.** #109 makes the row cap chart-type-aware but keeps the current order of operations in `buildChartData()` (`src/core/chart-data.js:235-271`): `rows.slice(0, cap)` happens *first*, and the per-category `sums`/`groups` maps are built only from that slice.

**Reframed scope.** ClickHouse is the aggregation engine here, not the browser — the chart only ever sees whatever rows the query already returned (bounded by the result's `rowLimit`, already flagged via the "first N (capped)" badge elsewhere in `src/ui/results.js`). JS-side grouping in `buildChartData` is a display convenience for results that aren't already pre-aggregated in SQL (e.g. a raw `SELECT` with a repeating category column and no `GROUP BY`) — it should never be built out to *look* authoritative (top-N-by-value ranking, "Other" buckets, etc.), because it fundamentally can't be: it's summing a browser-side window, not the dataset. The right fix is smaller than originally scoped:

1. **Aggregate over everything already fetched, not a truncated prefix of it.** This is free (no extra network round-trip, the rows are already in memory) and strictly more correct *relative to what was fetched* — currently the code discards rows it already has before summing, which can drop or short-total categories that appear after the cap.
2. **Cap categories, not raw rows, for legibility** — apply the type's row cap (#109's `chartRowCap(type)`) to the *post-aggregation* category count, in first-seen order (same "first N" semantics as today, just moved after the dedup/sum step). No value-ranking, no "Other" bucket — that would imply curation this feature doesn't and shouldn't claim to do.
3. **Fix the truncation note so it doesn't overstate what's shown.** Today it always says "first N of M **rows**" (`src/ui/results.js:914-916`), which is only accurate when the SQL already returns one row per aggregation cell.

**Duplicate detection must match the actual grouping key, not just X.** The aggregation cell is `x` when `cfg.series == null`, but `(x, series)` when `cfg.series != null` — those are two different group-bys (`buildChartData`'s existing `groups`/`sums` branches already split on this). Detecting duplicates on X alone is wrong for the grouped case: a legitimate

```sql
SELECT month, region, sum(revenue) FROM t GROUP BY month, region
```

result has `Jan/EU`, `Jan/US`, `Feb/EU`, `Feb/US` — `distinct(month) < rows.length`, but every `(month, region)` cell is already unique; there's no browser-side summing happening at all, and no note should imply there is. So:
- **no `series`:** duplicate check is on `x` alone.
- **with `series`:** duplicate check is on the `(x, series)` pair.

Also don't frame the *absence* of duplicates as "SQL already pre-aggregated" — that inference isn't safe in general (e.g. a result could legitimately have one row per raw entity with no aggregation intent at all); just describe what the chart did, not what the SQL did or didn't do.

**Note copy by case:**
- no duplicate cells → `first N of M categories`
- duplicate `x` (no series) → `first N of M categories; duplicate X rows summed in the browser`
- duplicate `(x, series)` (with series) → `first N of M categories; duplicate X/series rows summed in the browser`

**Implementation shape.** `buildChartData()` returns metadata alongside `labels`/`datasets` so `results.js` doesn't recompute duplicate detection itself:
```js
{
  labels,
  datasets,
  meta: {
    totalRows,
    totalCategories,
    shownCategories,
    duplicateCellsSummed,       // bool
    groupKey: cfg.series == null ? 'x' : 'x+series',
  },
}
```

**Explicitly not doing:** decimation/sampling for line/area (still #109's territory, or a later design pass — see the decimation-plugin note there), and no attempt to make client-side aggregation "correct" over the full backing table — that's what SQL is for.

**Scope.**
- `buildChartData()`: aggregate over the full `rows` array (keyed by `x` or `(x, series)` per above), then slice the resulting category list to `chartRowCap(cfg.type)` (first-seen order). Return the `meta` block above.
- `src/ui/results.js`: use `meta.duplicateCellsSummed` + `meta.groupKey` to pick the right note copy (three cases above) instead of the current flat "first N of M rows".
- Tests in `tests/unit/chart-data.test.js`: categories beyond the cap come from post-aggregation order, not pre-aggregation row order; multi-row-per-cell sums are correct even when rows straddle the old cap boundary; duplicate detection is correct for both the plain-X and `(x, series)` cases (including the `GROUP BY month, region` example above, which must report `duplicateCellsSummed: false`). Tests in `tests/unit/results.test.js`: note text covers all three cases.

**Acceptance.**
- [ ] Bar/column/pie aggregate over the full fetched result before capping to categories; no data is dropped/short-totaled due to summing only a row prefix.
- [ ] No value-ranking or "Other" bucket — capping stays first-seen-category-order, matching today's "first N" semantics.
- [ ] Duplicate detection is keyed on `(x, series)` when a series is set, not on `x` alone — a legitimate multi-series `GROUP BY` result must not be flagged as browser-side summing.
- [ ] Truncation note has three cases (no duplicates / duplicate X / duplicate X+series) and never asserts what the SQL did or didn't do — only what the chart did.
- [ ] `npm test` green at the per-file coverage gate.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Chart aggregation sums only a truncated prefix of fetched rows, and the note overstates what the chart represents #111

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Chart aggregation sums only a truncated prefix of fetched rows, and the note overstates what the chart represents #111

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions