From 892eb5de4d14e38745d549d0b76a71e272e94135 Mon Sep 17 00:00:00 2001
From: Yevgen Safronov
Date: Fri, 5 Jun 2026 11:31:25 +0100
Subject: [PATCH 1/7] [R2 SQL] Add support for UNION, INTERSECT, EXCEPT, and SELECT DISTINCT

---
 ...union-intersect-except-select-distinct.mdx | 57 ++++++++++++++
 .../reference/limitations-best-practices.mdx | 6 +-
 .../docs/r2-sql/sql-reference/index.mdx | 74 ++++++++++++++++++-
 3 files changed, 132 insertions(+), 5 deletions(-)
 create mode 100644 src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx

diff --git a/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx b/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
new file mode 100644
index 00000000000..43008b9cbb0
--- /dev/null
+++ b/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
@@ -0,0 +1,57 @@
+---
+title: R2 SQL now supports UNION, INTERSECT, EXCEPT, and SELECT DISTINCT
+description: Combine query results with set operations and eliminate duplicates with SELECT DISTINCT.
+products:
+  - r2-sql
+date: 2026-06-05
+---
+
+[R2 SQL](/r2-sql/) now supports set operations (`UNION`, `INTERSECT`, `EXCEPT`) and `SELECT DISTINCT`, expanding the range of analytical queries you can run directly on [Apache Iceberg](https://iceberg.apache.org/) tables in [R2 Data Catalog](/r2/data-catalog/).
+
+## Set operations
+
+Combine the results of multiple `SELECT` statements:
+
+- **`UNION`** — returns all rows from both queries, removing duplicates
+- **`UNION ALL`** — returns all rows from both queries, including duplicates
+- **`INTERSECT`** — returns only rows that appear in both queries
+- **`EXCEPT`** — returns rows from the first query that do not appear in the second
+
+```sql
+-- Find zones that had either firewall blocks OR high-risk requests
+SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'
+UNION
+SELECT zone_id FROM my_namespace.http_requests WHERE risk_score > 0.8
+```
+
+```sql
+-- Find zones with both firewall blocks AND high traffic
+SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'
+INTERSECT
+SELECT zone_id FROM my_namespace.http_requests
+GROUP BY zone_id
+HAVING COUNT(*) > 10000
+```
+
+```sql
+-- Find enterprise zones that have not been compacted
+SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'
+EXCEPT
+SELECT zone_id FROM my_namespace.compaction_history
+```
+
+## SELECT DISTINCT
+
+Eliminate duplicate rows from query results:
+
+```sql
+SELECT DISTINCT region, department
+FROM my_namespace.sales_data
+WHERE total_amount > 1000
+ORDER BY region, department
+LIMIT 100
+```
+
+For large datasets where approximate results are acceptable, `approx_distinct()` remains a faster alternative for counting unique values.
+
+For the full syntax reference, refer to the [SQL reference](/r2-sql/sql-reference/). For performance guidance, refer to [Limitations and best practices](/r2-sql/reference/limitations-best-practices/).
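To make the `UNION` versus `UNION ALL` distinction in the changelog list concrete, here is a minimal sketch on two tiny inline datasets rather than real tables. The CTE names `a` and `b` and their values are hypothetical; the expected rows in the comments follow standard SQL semantics (as implemented by, for example, SQLite and PostgreSQL), and whether R2 SQL accepts inline `VALUES` lists inside a `WITH` clause is an assumption that this series does not document.

```sql
-- Hypothetical inline data: a = {1, 1, 2}, b = {2, 3}

-- UNION removes duplicates across the whole result: returns 1, 2, 3
WITH a(zone_id) AS (VALUES (1), (1), (2)),
     b(zone_id) AS (VALUES (2), (3))
SELECT zone_id FROM a
UNION
SELECT zone_id FROM b
ORDER BY zone_id;

-- UNION ALL keeps every row from both inputs: returns 1, 1, 2, 2, 3
WITH a(zone_id) AS (VALUES (1), (1), (2)),
     b(zone_id) AS (VALUES (2), (3))
SELECT zone_id FROM a
UNION ALL
SELECT zone_id FROM b
ORDER BY zone_id;
```

`UNION` also collapses the duplicate `1` that occurs within `a` itself, not only rows shared by both inputs, which is why it returns three rows rather than four.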
diff --git a/src/content/docs/r2-sql/reference/limitations-best-practices.mdx b/src/content/docs/r2-sql/reference/limitations-best-practices.mdx
index ca3746a34a6..d4c84eb959a 100644
--- a/src/content/docs/r2-sql/reference/limitations-best-practices.mdx
+++ b/src/content/docs/r2-sql/reference/limitations-best-practices.mdx
@@ -37,9 +37,9 @@ This page summarizes supported features, limitations, and best practices.
 | Derived tables (FROM subqueries) | Yes | Can be nested and joined. `LATERAL` derived tables not supported. |
 | Self-joins | Yes | Same table with different aliases |
 | Window functions (`OVER`) | No | |
-| `SELECT DISTINCT` | No | Use `approx_distinct` |
+| `SELECT DISTINCT` | Yes | |
 | `OFFSET` | No | |
-| `UNION` / `INTERSECT` / `EXCEPT` | No | |
+| `UNION` / `INTERSECT` / `EXCEPT` | Yes | `UNION ALL` also supported |
 | `INSERT` / `UPDATE` / `DELETE` | No | Read-only |
 | `CREATE` / `DROP` / `ALTER` | No | Read-only |

@@ -51,9 +51,7 @@ For the full SQL syntax, refer to the [SQL reference](/r2-sql/sql-reference/).

 | Feature | Error |
 | :---------------------------------------------------------------------------- | :------------------------------------------------------- |
-| `SELECT DISTINCT` | `unsupported feature: SELECT DISTINCT is not supported` |
 | `OFFSET` | `unsupported feature: OFFSET clause is not supported` |
-| `UNION` / `INTERSECT` / `EXCEPT` | Set operations not supported |
 | Window functions (`OVER`) | `unsupported feature: window functions (OVER clause)` |
 | `INSERT` / `UPDATE` / `DELETE` | `only read-only queries are allowed` |
 | `CREATE` / `DROP` / `ALTER` | `only read-only queries are allowed` |
diff --git a/src/content/docs/r2-sql/sql-reference/index.mdx b/src/content/docs/r2-sql/sql-reference/index.mdx
index 70227b97d4d..2d9bb2ccc6a 100644
--- a/src/content/docs/r2-sql/sql-reference/index.mdx
+++ b/src/content/docs/r2-sql/sql-reference/index.mdx
@@ -76,7 +76,7 @@ DESCRIBE namespace_name.table_name;
 ### Syntax

 ```sql
-SELECT column_specification [, column_specification, ...]
+SELECT [DISTINCT] column_specification [, column_specification, ...]
 ```

 ### Column specification
@@ -87,6 +87,20 @@ SELECT column_specification [, column_specification, ...]
 - **Column alias**: `column_name AS alias`
 - **Expressions**: arithmetic, function calls, CASE expressions, and casts

+### SELECT DISTINCT
+
+Use `DISTINCT` to eliminate duplicate rows from the result set:
+
+```sql
+SELECT DISTINCT region, department
+FROM my_namespace.sales_data
+WHERE total_amount > 1000
+ORDER BY region, department
+LIMIT 100
+```
+
+For large datasets where approximate results are acceptable, `approx_distinct()` is a faster alternative for counting unique values.
+
 ### Examples

 ```sql
@@ -581,6 +595,64 @@ SELECT * FROM my_namespace.sales_data LIMIT 100

 ---

+## Set operations
+
+Set operations combine the results of two or more `SELECT` statements.
+
+### Syntax
+
+```sql
+SELECT ... FROM table1
+UNION | UNION ALL | INTERSECT | EXCEPT
+SELECT ... FROM table2
+```
+
+### Supported operations
+
+| Operation | Description |
+| :------------ | :--------------------------------------------------------------- |
+| `UNION` | Returns all rows from both queries, removing duplicates |
+| `UNION ALL` | Returns all rows from both queries, including duplicates |
+| `INTERSECT` | Returns only rows that appear in both query results |
+| `EXCEPT` | Returns rows from the first query that do not appear in the second |
+
+### Examples
+
+#### UNION
+
+```sql
+-- Find zones that had either firewall blocks OR high-risk requests
+SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'
+UNION
+SELECT zone_id FROM my_namespace.http_requests WHERE risk_score > 0.8
+```
+
+#### INTERSECT
+
+```sql
+-- Find enterprise zones that also had firewall blocks
+SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'
+INTERSECT
+SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'
+```
+
+#### EXCEPT
+
+```sql
+-- Find enterprise zones that have no firewall events
+SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'
+EXCEPT
+SELECT zone_id FROM my_namespace.firewall_events
+```
+
+### Requirements
+
+- All queries in a set operation must return the same number of columns.
+- Corresponding columns must have compatible data types.
+- Column names in the result are taken from the first query.
+
+---
+
 ## EXPLAIN

 Returns the execution plan for a query without running it.

From d05e30c5aed59b7a1dd3ec1cdb4a8e5fcdd2b642 Mon Sep 17 00:00:00 2001
From: Yevgen Safronov
Date: Mon, 8 Jun 2026 10:27:39 +0100
Subject: [PATCH 2/7] Apply suggestion from @sejoker

---
 .../2026-06-05-union-intersect-except-select-distinct.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx b/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
index 43008b9cbb0..4ffc0c2d9e6 100644
--- a/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
+++ b/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
@@ -3,7 +3,7 @@ title: R2 SQL now supports UNION, INTERSECT, EXCEPT, and SELECT DISTINCT
 description: Combine query results with set operations and eliminate duplicates with SELECT DISTINCT.
 products:
   - r2-sql
-date: 2026-06-05
+date: 2026-06-08
 ---

 [R2 SQL](/r2-sql/) now supports set operations (`UNION`, `INTERSECT`, `EXCEPT`) and `SELECT DISTINCT`, expanding the range of analytical queries you can run directly on [Apache Iceberg](https://iceberg.apache.org/) tables in [R2 Data Catalog](/r2/data-catalog/).

From e806328f63bf9d467e4dfcf601d5e89c2ab28162 Mon Sep 17 00:00:00 2001
From: Yevgen Safronov
Date: Mon, 8 Jun 2026 10:28:26 +0100
Subject: [PATCH 3/7] Update src/content/docs/r2-sql/sql-reference/index.mdx

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
---
 src/content/docs/r2-sql/sql-reference/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/r2-sql/sql-reference/index.mdx b/src/content/docs/r2-sql/sql-reference/index.mdx
index 2d9bb2ccc6a..0addf61b200 100644
--- a/src/content/docs/r2-sql/sql-reference/index.mdx
+++ b/src/content/docs/r2-sql/sql-reference/index.mdx
@@ -87,7 +87,7 @@ SELECT [DISTINCT] column_specification [, column_specification, ...]
 - **Column alias**: `column_name AS alias`
 - **Expressions**: arithmetic, function calls, CASE expressions, and casts

-### SELECT DISTINCT
+### Select distinct

 Use `DISTINCT` to eliminate duplicate rows from the result set:


From b78d1cbddb777c305a7f7b1d444b31e3f764671e Mon Sep 17 00:00:00 2001
From: Yevgen Safronov
Date: Mon, 8 Jun 2026 10:28:59 +0100
Subject: [PATCH 4/7] Update src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
---
 .../2026-06-05-union-intersect-except-select-distinct.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx b/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
index 4ffc0c2d9e6..b7678894582 100644
--- a/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
+++ b/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
@@ -40,7 +40,7 @@ EXCEPT
 SELECT zone_id FROM my_namespace.compaction_history
 ```

-## SELECT DISTINCT
+## Select distinct

 Eliminate duplicate rows from query results:


From 5cda9aa16fa73094586ed4de363e5d1dc3645331 Mon Sep 17 00:00:00 2001
From: Yevgen Safronov
Date: Mon, 8 Jun 2026 10:47:00 +0100
Subject: [PATCH 5/7] Update src/content/docs/r2-sql/sql-reference/index.mdx

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
---
 src/content/docs/r2-sql/sql-reference/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/r2-sql/sql-reference/index.mdx b/src/content/docs/r2-sql/sql-reference/index.mdx
index 0addf61b200..fb63c90cd39 100644
--- a/src/content/docs/r2-sql/sql-reference/index.mdx
+++ b/src/content/docs/r2-sql/sql-reference/index.mdx
@@ -618,7 +618,7 @@ SELECT ... FROM table2

 ### Examples

-#### UNION
+#### Union

 ```sql
 -- Find zones that had either firewall blocks OR high-risk requests

From 8b22f3c51068380b32438dcbd222e4899fa3a8f1 Mon Sep 17 00:00:00 2001
From: Yevgen Safronov
Date: Mon, 8 Jun 2026 10:47:10 +0100
Subject: [PATCH 6/7] Update src/content/docs/r2-sql/sql-reference/index.mdx

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
---
 src/content/docs/r2-sql/sql-reference/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/r2-sql/sql-reference/index.mdx b/src/content/docs/r2-sql/sql-reference/index.mdx
index fb63c90cd39..5851d16ad14 100644
--- a/src/content/docs/r2-sql/sql-reference/index.mdx
+++ b/src/content/docs/r2-sql/sql-reference/index.mdx
@@ -627,7 +627,7 @@ UNION
 SELECT zone_id FROM my_namespace.http_requests WHERE risk_score > 0.8
 ```

-#### INTERSECT
+#### Intersect

 ```sql
 -- Find enterprise zones that also had firewall blocks

From 9fbb4af60628d161a990feb57f9bd08a437d064e Mon Sep 17 00:00:00 2001
From: Yevgen Safronov
Date: Mon, 8 Jun 2026 10:47:20 +0100
Subject: [PATCH 7/7] Update src/content/docs/r2-sql/sql-reference/index.mdx

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
---
 src/content/docs/r2-sql/sql-reference/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/r2-sql/sql-reference/index.mdx b/src/content/docs/r2-sql/sql-reference/index.mdx
index 5851d16ad14..47599cc58b0 100644
--- a/src/content/docs/r2-sql/sql-reference/index.mdx
+++ b/src/content/docs/r2-sql/sql-reference/index.mdx
@@ -636,7 +636,7 @@ INTERSECT
 SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'
 ```

-#### EXCEPT
+#### Except

 ```sql
 -- Find enterprise zones that have no firewall events
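The `INTERSECT` and `EXCEPT` descriptions in the set-operations section, together with the rule that result column names come from the first query, can be illustrated on a small inline dataset. This is a sketch, not R2 SQL output: `a`, `b`, and the column name `id` are hypothetical, the expected rows follow standard SQL semantics (for example, SQLite and PostgreSQL), and R2 SQL support for `WITH ... VALUES` is assumed rather than documented in this series.

```sql
-- Hypothetical inline data: a = {1, 1, 2}, b = {2, 3}
-- The second input names its column "id"; the result column is still
-- named zone_id, because output column names come from the first query.

-- INTERSECT: rows present in both inputs, returns 2
WITH a(zone_id) AS (VALUES (1), (1), (2)),
     b(id) AS (VALUES (2), (3))
SELECT zone_id FROM a
INTERSECT
SELECT id FROM b;

-- EXCEPT: rows of a that are not in b, returns a single row: 1
WITH a(zone_id) AS (VALUES (1), (1), (2)),
     b(id) AS (VALUES (2), (3))
SELECT zone_id FROM a
EXCEPT
SELECT id FROM b;
```

Because standard `EXCEPT` (without `ALL`) also removes duplicates, the two copies of `1` in `a` produce one output row.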