Skip to content

Add aggregate functionality for base layers#384

Open
thomasp85 wants to merge 22 commits intomainfrom
issue160-aggregate
Open

Add aggregate functionality for base layers#384
thomasp85 wants to merge 22 commits intomainfrom
issue160-aggregate

Conversation

@thomasp85
Copy link
Copy Markdown
Collaborator

@thomasp85 thomasp85 commented Apr 27, 2026

Fix #160

  • basic infrastructure
  • ensure aggregation happens with numeric pos1
  • special case for range layers
  • Consider segment
  • update documentation
  • What about non-positional aggregation

Comment thread src/plot/layer/geom/stat_aggregate.rs Outdated
Comment thread src/plot/layer/geom/stat_aggregate.rs
@thomasp85 thomasp85 marked this pull request as ready for review May 4, 2026 19:00
Copy link
Copy Markdown
Collaborator

@teunbrand teunbrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited for this! Don't let the nitpicky comments convince you otherwise

Comment thread doc/syntax/clause/draw.qmd Outdated
Comment thread doc/syntax/clause/draw.qmd Outdated
Comment thread doc/syntax/clause/draw.qmd Outdated
Comment on lines +100 to +114
The simple functions are:

* `'count'`: Non-null tally of the bound column.
* `'sum'` and `'prod'`: The sum or product
* `'min'`, `'max'`, and `'range'`: Extremes and max - min
* `'mean'`, and `'median'`: Central tendency
* `'geomean'`, `'harmean'`, and `'rms'`: Geometric, harmonic, and root-mean-square
* `'sdev'`, `'var'`, `'iqr'`, and `'se'`: Standard deviation, variance, interquartile range, and standard error
* `'p05'`, `'p10'`, `'p25'`, `'p50'`, `'p75'`, `'p90'`, and `'p95'`: Percentiles

For band functions you combine an offset with an expansion, potentially multiplied. An example could be `'mean-1.96sdev'` which does exactly what you'd expect it to be. The general form is `<offset>±<multiplier><expansion>` with `<multiplier>` being optional (defaults to `1`).

Allowed offsets are: `'mean'`, `'median'`, `'geomean'`, `'harmean'`, `'rms'`, `'sum'`, `'prod'`, `'min'`, `'max'`, and `'p05'`–`'p95'`

Allowed expansions are: `'sdev'`, `'se'`, `'var'`, `'iqr'`, and `'range'`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should come before the explanation of how >1 aggregations interact.

Comment thread doc/syntax/layer/type/area.qmd Outdated

#[derive(Debug, Clone, PartialEq)]
pub struct Band {
pub sign: char,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why sign can't be folded into the mod_value?
mean-sdev would then just resolve to `mod_value = -1.

Comment thread src/plot/layer/geom/stat_aggregate.rs Outdated
Comment on lines +437 to +439
"sdev" => format!("STDDEV_POP({})", qcol),
"se" => format!("(STDDEV_POP({c}) / SQRT(COUNT({c})))", c = qcol),
"var" => format!("VAR_POP({})", qcol),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these might be duckdb dialect specific

Comment on lines +711 to +721
"Warning: aggregate dropped numeric mapping for aesthetic '{}' (no applicable default and no targeted function)",
aesthetic_ctx.map_internal_to_user(d)
);
}
return Ok(StatResult::Identity);
}

for d in &dropped {
eprintln!(
"Warning: aggregate dropped numeric mapping for aesthetic '{}' (no applicable default and no targeted function)",
aesthetic_ctx.map_internal_to_user(d)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is me being pedantic and not really a priority.
These warnings messages should probably hint at what the user should be doing to mitigate the problem. I think the solution it is to include an aggregation for the aesthetic (e.g. 'color:mean'), but I'll let you preside over that.
Also we could probably collapse dropped into a comma-separated list for a single warning, but this is not really necessary.

Comment on lines +726 to +736
let mut group_cols: Vec<String> = Vec::new();
for g in group_by {
if !group_cols.contains(g) {
group_cols.push(g.clone());
}
}
for c in &kept_cols {
if !group_cols.contains(c) {
group_cols.push(c.clone());
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the group_by variable already contain PARTITION BY + discrete columns? If so, we'd only need to add the domain aesthetics.

format!("'{}'", s.replace('\'', "''"))
}

#[cfg(test)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test that build_group_by_query and build_group_by_query produce the right shape of query, or is that already included implicitly in other tests?

Comment thread src/plot/layer/geom/types.rs Outdated
Comment thread doc/syntax/clause/draw.qmd Outdated
Comment thread doc/syntax/clause/draw.qmd Outdated
thomasp85 and others added 2 commits May 6, 2026 13:36
Co-authored-by: Thomas Lin Pedersen <thomasp85@gmail.com>
Co-authored-by: Teun van den Brand <49372158+teunbrand@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Including summary metrics

2 participants