1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ palette = "0.7"
# Utilities
regex = "1.10"
chrono = "0.4"
rand = "0.8"
const_format = "0.2"
uuid = { version = "1.0", features = ["v4"] }

7 changes: 6 additions & 1 deletion doc/_quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,12 @@ website:
href: syntax/clause/label.qmd
- section: Layers
contents:
- auto: syntax/layer/*
- section: Types
contents:
- auto: syntax/layer/type/*
- section: Position adjustment
contents:
- auto: syntax/layer/position/*
- section: Scales
contents:
- section: Types
5 changes: 5 additions & 0 deletions doc/styles.scss
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,11 @@ code {
font-variant-ligatures: none
}

// Add spacing below rendered plots so text doesn't crowd them
.cell-output-display {
margin-bottom: 1.5rem;
}

.hero-banner {
padding: 0;
margin: 0;
3 changes: 3 additions & 0 deletions doc/syntax/clause/draw.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,9 @@ The `SETTING` clause can be used for to different things:
* *Setting parameters*: Some layers take additional arguments that control how they behave. Often, but not always, these modify the statistical transformation in some way. An example would be the binwidth parameter in histogram which controls the width of each bin during histogram calculation. This is not a statistical property since it is not related to each record, but to the calculation as a whole.
* *Setting aesthetics*: If you wish to set a specific aesthetic to a literal value, e.g. 'red' (as in the color red) then you can do so in the `SETTING` clause. Aesthetics that are set will not go through a scale but will use the provided value as-is. You cannot set an aesthetic to a column, only to a scalar literal value.

#### Position
A special setting is `position`, which controls how overlapping objects are repositioned so that they no longer overlap. Position adjustments have special mapping requirements, so not every position adjustment is relevant for every layer type. Different layers have different defaults, as detailed in their documentation. Each position adjustment is described on [its own documentation page](../index.qmd#position-adjustments).

### `FILTER`
```ggsql
FILTER <condition>
39 changes: 23 additions & 16 deletions doc/syntax/index.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -15,27 +15,34 @@ ggsql augments the standard SQL syntax with a number of new clauses to describe
## Layers
There are many different layers to choose from when visualising your data. Some are straightforward translations of your data into visual marks, such as the point layer, while others perform more or less complicated calculations, such as the histogram layer. A layer is selected by providing the layer name after the `DRAW` clause.

- [`point`](layer/point.qmd) is used to create a scatterplot layer.
- [`line`](layer/line.qmd) is used to produce lineplots with the data sorted along the x axis.
- [`path`](layer/path.qmd) is like `line` above but does not sort the data but plot it according to its own order.
- [`segment`](layer/segment.qmd) connects two points with a line segment.
- [`linear`](layer/linear.qmd) draws a long line parameterised by a coefficient and intercept.
- [`rule`](layer/rule.qmd) draws horizontal and vertical reference lines.
- [`area`](layer/area.qmd) is used to display series as an area chart.
- [`ribbon`](layer/ribbon.qmd) is used to display series extrema.
- [`polygon`](layer/polygon.qmd) is used to display arbitrary shapes as polygons.
- [`bar`](layer/bar.qmd) creates a bar chart, optionally calculating y from the number of records in each bar.
- [`density`](layer/density.qmd) creates univariate kernel density estimates, showing the distribution of a variable.
- [`violin`](layer/violin.qmd) displays a rotated kernel density estimate.
- [`histogram`](layer/histogram.qmd) bins the data along the x axis and produces a bar for each bin showing the number of records in it.
- [`boxplot`](layer/boxplot.qmd) displays continuous variables as 5-number summaries.
- [`errorbar`](layer/errorbar.qmd) a line segment with hinges at the endpoints.
### Layer types
- [`point`](layer/type/point.qmd) is used to create a scatterplot layer.
- [`line`](layer/type/line.qmd) is used to produce lineplots with the data sorted along the x axis.
- [`path`](layer/type/path.qmd) is like `line` above but does not sort the data; it plots the data in its own order.
- [`segment`](layer/type/segment.qmd) connects two points with a line segment.
- [`linear`](layer/type/linear.qmd) draws a long line parameterised by a coefficient and intercept.
- [`rule`](layer/type/rule.qmd) draws horizontal and vertical reference lines.
- [`area`](layer/type/area.qmd) is used to display series as an area chart.
- [`ribbon`](layer/type/ribbon.qmd) is used to display series extrema.
- [`polygon`](layer/type/polygon.qmd) is used to display arbitrary shapes as polygons.
- [`bar`](layer/type/bar.qmd) creates a bar chart, optionally calculating y from the number of records in each bar.
- [`density`](layer/type/density.qmd) creates univariate kernel density estimates, showing the distribution of a variable.
- [`violin`](layer/type/violin.qmd) displays a rotated kernel density estimate.
- [`histogram`](layer/type/histogram.qmd) bins the data along the x axis and produces a bar for each bin showing the number of records in it.
- [`boxplot`](layer/type/boxplot.qmd) displays continuous variables as 5-number summaries.
- [`errorbar`](layer/type/errorbar.qmd) draws a line segment with hinges at the endpoints.

### Position adjustments
- [`stack`](layer/position/stack.qmd) places objects with a shared baseline on top of each other.
- [`dodge`](layer/position/dodge.qmd) places objects that share the same discrete position side by side.
- [`jitter`](layer/position/jitter.qmd) adds a small random offset to objects sharing the same discrete position.
- [`identity`](layer/position/identity.qmd) does nothing, i.e. turns off position adjustment.

## Scales
A scale is responsible for translating a data value to an aesthetic literal, e.g. a specific color for the fill aesthetic, or a radius in points for the size aesthetic. A scale is a combination of a specific aesthetic and a scale type

### Aesthetics
- [Position](scale/aesthetic/0_position.qmd) aesthetics are those aesthetics realted to the spatial location of the data in the coordinate system.
- [Position](scale/aesthetic/0_position.qmd) aesthetics are those aesthetics related to the spatial location of the data in the coordinate system.
- [Color](scale/aesthetic/1_color.qmd) aesthetics are related to the color of fill and stroke
- [`opacity`](scale/aesthetic/2_opacity.qmd) is the aesthetic that determines the opacity of the color
- [`linetype`](scale/aesthetic/linetype.qmd) governs the stroke pattern of strokes
46 changes: 46 additions & 0 deletions doc/syntax/layer/position/dodge.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
---
title: Dodge
---

> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.

The dodge adjustment is intended to move entities that share the same position on a discrete scale side by side so they don't overlap. It is most often used for boxplots and violin plots, but can also be used in e.g. bar plots as an alternative to [stacking](stack.qmd).

## Position scale requirements
Dodge has no specific requirements for the scale type of the plot, but it only affects discrete scales (including binned and ordinal). If only one scale is discrete, the dodging happens in that scale's direction. If both scales are discrete, the dodging happens as a 2D grid.

## Settings
Apart from the settings of the layer type, setting `position => 'dodge'` will allow these additional settings:

* `width`: The total width the dodging will occupy as a proportion of the space available on the scale. Defaults to 0.9 but any defaults from the layer will take precedence.
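
As a rough illustration of what `width` means, the sketch below computes where the centre of each dodged group would land inside one discrete band. It is not taken from the ggsql source: the function name `dodge_offsets` and the assumption that groups share the width in equal slots are hypothetical, chosen only to show the arithmetic.

```python
from typing import List


def dodge_offsets(n_groups: int, width: float = 0.9) -> List[float]:
    """Centre offset of each group, as a proportion of the band on the scale.

    Assumption (not confirmed by the ggsql docs): the total `width` is split
    into `n_groups` equal slots that are placed symmetrically around 0.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    slot = width / n_groups  # share of the width available to each group
    # The centre of slot i sits half a slot into that slot, shifted so the
    # whole set of slots is centred on the original position.
    return [(i + 0.5) * slot - width / 2 for i in range(n_groups)]


if __name__ == "__main__":
    print(dodge_offsets(2))        # two groups, default width
    print(dodge_offsets(3, 0.5))   # three groups squeezed into half the band
```

Under these assumptions a single group stays at offset 0, and two groups with the default width end up a quarter of the width on either side of the original position.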

## Examples

Dodging is default in boxplots (and violin plots)

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW boxplot
```

Turning it off allows you to see the effect of it

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW boxplot SETTING position => 'identity'
```

Dodge can be used for bar plots as an alternative to the default stack

```{ggsql}
VISUALISE species AS x, island AS fill FROM ggsql:penguins
DRAW bar SETTING position => 'dodge'
```

Often `width` is part of the layer settings and is used directly by the dodge position. For layers with no inherent width setting, dodge provides that setting as well.

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS shape FROM ggsql:penguins
DRAW point SETTING position => 'dodge', width => 0.5
```

7 changes: 7 additions & 0 deletions doc/syntax/layer/position/identity.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
title: Identity
---

> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.

The identity position is a position adjustment that does nothing, i.e. it leaves the data where it is. It is used to turn off position adjustment for layers that default to a non-identity position adjustment. It takes no arguments and has no requirements.
67 changes: 67 additions & 0 deletions doc/syntax/layer/position/jitter.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
---
title: Jitter
---

> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.

The jitter adjustment adds a random offset to each data point to avoid overplotting on discrete axes. It is mainly used in conjunction with point layers.

## Position scale requirements
Jitter requires at least one axis to be discrete as it only jitters along discrete axes.

## Settings
Apart from the settings of the layer type, setting `position => 'jitter'` will allow these additional settings:

* `width`: The total width the jittering will occupy as a proportion of the space available on the scale. Defaults to 0.9
* `dodge`: Should dodging be applied before jittering? The dodging follows the behavior of the [dodge position](dodge.qmd). Defaults to `true`.
* `distribution`: Which kind of distribution should the jittering follow? One of:
- `'uniform'` (default): Jittering is sampled from a uniform distribution between `-width/2` and `width/2`
- `'normal'`: Jittering is sampled from a normal distribution with σ as `width/4` resulting in 95% of the points falling inside the given width
- `'density'`: Jittering follows the density distribution within the group so that the jitter occupies the same area as an equivalent [violin plot](../type/violin.qmd) with density remapped to offset
- `'intensity'`: Jittering follows the intensity distribution within the group so that the jitter occupies the same area as an equivalent [violin plot](../type/violin.qmd) with intensity remapped to offset

If `distribution` is either `'density'` or `'intensity'` then one of the axes must be continuous
* `bandwidth`: A numerical value setting the smoothing bandwidth to use for the `'density'` and `'intensity'` distributions. If absent (default), the bandwidth will be computed using Silverman's rule of thumb.
* `adjust`: A numerical value as multiplier for the `bandwidth` setting, with 1 as default.
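
To make the `'uniform'` and `'normal'` distributions and the default bandwidth more concrete, here is a small sketch of the arithmetic described above. It is an illustration only, not the ggsql implementation: the function names `jitter_offset` and `silverman_bandwidth` are hypothetical, and the bandwidth formula is the common `0.9 * min(sd, IQR / 1.34) * n^(-1/5)` form of Silverman's rule of thumb, which is assumed rather than confirmed for ggsql.

```python
import random
import statistics
from typing import Optional, Sequence


def jitter_offset(width: float = 0.9,
                  distribution: str = "uniform",
                  rng: Optional[random.Random] = None) -> float:
    """Draw one jitter offset, as a proportion of the band on the scale."""
    rng = rng or random.Random()
    if distribution == "uniform":
        # Uniform between -width/2 and width/2.
        return rng.uniform(-width / 2, width / 2)
    if distribution == "normal":
        # sigma = width/4, so +/- 2 sigma (about 95%) spans the full width.
        return rng.gauss(0.0, width / 4)
    raise ValueError("this sketch only covers 'uniform' and 'normal'")


def silverman_bandwidth(values: Sequence[float], adjust: float = 1.0) -> float:
    """Silverman's rule of thumb, multiplied by `adjust` (assumed variant)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Fall back to the standard deviation if the IQR is degenerate.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return adjust * 0.9 * spread * n ** (-1 / 5)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    print([round(jitter_offset(0.9, "uniform", rng), 3) for _ in range(3)])
    print(silverman_bandwidth([1.0, 2.0, 3.0, 4.0, 10.0], adjust=2.0))
```

The `'density'` and `'intensity'` distributions are left out because they depend on how the kernel density estimate is remapped to an offset, which this page does not specify.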

## Examples
When plotting points on a discrete axis they are all placed in the middle

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW point
```

Use jittering to better see the individual points

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW point
SETTING position => 'jitter'
```

By default, dodging is applied to separate the groups. Turn this off if you want the jitter to occupy the same space regardless of grouping

```{ggsql}
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
DRAW point
SETTING position => 'jitter', dodge => false
```

Use a `'density'` distribution to also indicate the distribution shape with the jitter

```{ggsql}
VISUALISE species AS x, bill_dep AS y FROM ggsql:penguins
DRAW point
SETTING position => 'jitter', distribution => 'density'
```

When both axes are discrete the dodging follows a grid

```{ggsql}
VISUALISE species AS x, sex AS y, body_mass AS fill FROM ggsql:penguins
DRAW point
SETTING position => 'jitter'
SCALE BINNED fill
SETTING breaks => 4, pretty => false
```
61 changes: 61 additions & 0 deletions doc/syntax/layer/position/stack.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
---
title: Stack
---

> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.

The stack position adjustment works by stacking objects on top of each other. It makes the most sense for layer types whose height is the primary encoding (i.e. they naturally extend from 0). Stack is the default position for bar and area plots.

## Position scale requirements
Stack requires a continuous scale with a range mapping (e.g. either `y` + `yend` or `ymin` + `ymax`) and requires all ranges to be positive with a baseline of zero. The axis that satisfies this will be used as the stacking direction

## Settings
Apart from the settings of the layer type, setting `position => 'stack'` will allow these additional settings:

* `center`: Should the full stack be centered around 0? Can be used in conjunction with area layers to create streamgraphs. Defaults to `false`.
* `total`: Sets a total value to which each stack height is normalised. Setting this value leads to 'fill' behaviour. Defaults to `null` (no normalisation)
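
The interaction between stacking, `center` and `total` can be sketched for a single position as follows. This is a hypothetical illustration, not the ggsql implementation: the function name `stack` and the exact order of operations (normalise first, then centre) are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def stack(values: Sequence[float],
          center: bool = False,
          total: Optional[float] = None) -> List[Tuple[float, float]]:
    """Return (start, end) intervals for values stacked at one position.

    Assumptions: all values are non-negative heights measured from a zero
    baseline, `total` rescales the full stack to that height, and `center`
    shifts the full stack so its midpoint sits at 0.
    """
    if any(v < 0 for v in values):
        raise ValueError("stacking expects non-negative heights")
    height = float(sum(values))
    scale = 1.0
    if total is not None and height > 0:
        scale = total / height  # 'fill' behaviour: every stack reaches `total`
    start = -(height * scale) / 2 if center else 0.0
    intervals = []
    for v in values:
        end = start + v * scale
        intervals.append((start, end))
        start = end  # the next object begins where this one ended
    return intervals


if __name__ == "__main__":
    print(stack([1, 2, 3]))               # plain stacking
    print(stack([1, 2, 3], center=True))  # centred around 0
    print(stack([1, 1, 2], total=100))    # normalised to 100
```

With `total => 100` every stack reaches the same height, so each interval length reads as a percentage contribution.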

## Examples

Stack is the default for bar and area

```{ggsql}
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
DRAW area
MAPPING Month AS fill
FILTER Day <= 30
SCALE ORDINAL fill
```

Turn it off to see the effect (stacking is nonsensical for wind measurements)

```{ggsql}
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
DRAW area
MAPPING Month AS fill
SETTING position => 'identity'
FILTER Day <= 30
SCALE ORDINAL fill
```

Set `center => true` to create a streamgraph

```{ggsql}
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
DRAW area
MAPPING Month AS fill
SETTING center => true
FILTER Day <= 30
SCALE ORDINAL fill
```

Use `total` to see the percentage contribution from each group

```{ggsql}
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
DRAW area
MAPPING Month AS fill
SETTING total => 100
FILTER Day <= 30
SCALE ORDINAL fill
```
25 changes: 14 additions & 11 deletions doc/syntax/layer/area.qmd → doc/syntax/layer/type/area.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
title: "Area"
---

> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.

The area layer is used to display absolute amounts over a sorted x-axis. It can be seen as a [ribbon layer](ribbon.qmd) where the `ymin` is anchored at zero.

Expand All @@ -21,10 +21,7 @@ The following aesthetics are recognised by the area layer.
* `linewidth`: The width of the contour lines.

## Settings
* `stacking`: Determines how multiple groups are displayed. One of the following:
* `'off'`: The groups `y`-values are displayed as-is (default).
* `'on'`: The `y`-values are stacked per `x` position, accumulating over groups.
* `'fill'`: Like `'on'` but displayed as a fraction of the total per `x` position.
* `position`: Determines the position adjustment to use for the layer (default is `'stack'`)

## Data transformation
The area layer does not transform its data but passes it through unchanged.
Expand Down Expand Up @@ -56,17 +53,23 @@ VISUALISE Date AS x, Value AS y FROM long_airquality
DRAW area MAPPING Series AS colour
```

We can stack the series by using `stacking => 'on'`. The line serves as a reference for 'unstacked' data.
By default the areas are stacked on top of each other. If you'd rather see every series drawn from a baseline of 0, set the position to `'identity'`

```{ggsql}
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
DRAW area SETTING stacking => 'on', opacity => 0.5
DRAW line
DRAW area SETTING position => 'identity', opacity => 0.5
```

When `stacking => 'fill'` we're plotting stacked proportions. These only make sense if every series is measured in the same absolute unit. (Wind and temperature have different units and the temperature is not absolute.)
When `position => 'fill'` we're plotting stacked proportions. These only make sense if every series is measured in the same absolute unit. (Wind and temperature have different units and the temperature is not absolute.)

```{ggsql}
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
DRAW area SETTING stacking => 'fill'
```
DRAW area SETTING position => 'fill'
```

An alternative is to center the stacks to create a streamgraph

```{ggsql}
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
DRAW area SETTING position => 'stack', center => true
```
17 changes: 15 additions & 2 deletions doc/syntax/layer/bar.qmd → doc/syntax/layer/type/bar.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
title: "Bar"
---

> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.

The bar layer is used to create bar plots. You can either specify the height of the bars directly or let the layer calculate it either as the count of records within the same group or as a weighted sum of the records.

Expand All @@ -23,7 +23,7 @@ The bar layer has no required aesthetics
* `linetype`: The type of stroke, i.e. the dashing pattern

## Settings

* `position`: Determines the position adjustment to use for the layer (default is `'stack'`)
* `width`: The width of the bars as a proportion of the available width

## Data transformation
Expand Down Expand Up @@ -68,6 +68,15 @@ DRAW bar
MAPPING species AS x, island AS fill
```

Or change the position setting to e.g. get a dodged bar chart

```{ggsql}
VISUALISE FROM ggsql:penguins
DRAW bar
MAPPING species AS x, sex AS fill
SETTING position => 'dodge'
```

Map to y if the dataset already contains the value you want to show

```{ggsql}
Expand All @@ -87,3 +96,7 @@ DRAW bar
SCALE BINNED x
SETTING breaks => 10
```

And use with a polar coordinate system to create a pie chart

**TBD**
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
title: "Boxplot"
---
> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.

Boxplots display a summary of a continuous distribution. In the style of Tukey, it displays the median, two hinges and two whiskers as well as outlying points.

Expand All @@ -23,6 +23,7 @@ The following aesthetics are recognised by the boxplot layer.
* `shape` The shape of outlier points.

## Settings
* `position`: Determines the position adjustment to use for the layer (default is `'dodge'`)
* `outliers`: Whether to display outliers as points. Defaults to `true`.
* `coef`: A number indicating the length of the whiskers as a multiple of the interquartile range (IQR). Defaults to `1.5`.
* `width`: Relative width of the boxes. Defaults to `0.9`.