diff --git a/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx b/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
new file mode 100644
index 00000000000..43008b9cbb0
--- /dev/null
+++ b/src/content/changelog/r2-sql/2026-06-05-union-intersect-except-select-distinct.mdx
@@ -0,0 +1,57 @@
+---
+title: R2 SQL now supports UNION, INTERSECT, EXCEPT, and SELECT DISTINCT
+description: Combine query results with set operations and eliminate duplicates with SELECT DISTINCT.
+products:
+  - r2-sql
+date: 2026-06-05
+---
+
+[R2 SQL](/r2-sql/) now supports set operations (`UNION`, `INTERSECT`, `EXCEPT`) and `SELECT DISTINCT`, expanding the range of analytical queries you can run directly on [Apache Iceberg](https://iceberg.apache.org/) tables in [R2 Data Catalog](/r2/data-catalog/).
+
+## Set operations
+
+Combine the results of multiple `SELECT` statements:
+
+- **`UNION`** — returns all rows from both queries, removing duplicates
+- **`UNION ALL`** — returns all rows from both queries, including duplicates
+- **`INTERSECT`** — returns only rows that appear in both queries
+- **`EXCEPT`** — returns rows from the first query that do not appear in the second
+
+```sql
+-- Find zones that had either firewall blocks OR high-risk requests
+SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'
+UNION
+SELECT zone_id FROM my_namespace.http_requests WHERE risk_score > 0.8
+```
+
+```sql
+-- Find zones with both firewall blocks AND high traffic
+SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'
+INTERSECT
+SELECT zone_id FROM my_namespace.http_requests
+GROUP BY zone_id
+HAVING COUNT(*) > 10000
+```
+
+```sql
+-- Find enterprise zones that have not been compacted
+SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'
+EXCEPT
+SELECT zone_id FROM my_namespace.compaction_history
+```
+
+## SELECT DISTINCT
+
+Eliminate duplicate rows from query results:
+
+```sql
+SELECT DISTINCT region, department
+FROM my_namespace.sales_data
+WHERE total_amount > 1000
+ORDER BY region, department
+LIMIT 100
+```
+
+For large datasets where approximate results are acceptable, `approx_distinct()` remains a faster alternative for counting unique values.
+
+For the full syntax reference, refer to the [SQL reference](/r2-sql/sql-reference/). For performance guidance, refer to [Limitations and best practices](/r2-sql/reference/limitations-best-practices/).
diff --git a/src/content/docs/r2-sql/reference/limitations-best-practices.mdx b/src/content/docs/r2-sql/reference/limitations-best-practices.mdx
index ca3746a34a6..d4c84eb959a 100644
--- a/src/content/docs/r2-sql/reference/limitations-best-practices.mdx
+++ b/src/content/docs/r2-sql/reference/limitations-best-practices.mdx
@@ -37,9 +37,9 @@ This page summarizes supported features, limitations, and best practices.
 | Derived tables (FROM subqueries) | Yes | Can be nested and joined. `LATERAL` derived tables not supported. |
 | Self-joins | Yes | Same table with different aliases |
 | Window functions (`OVER`) | No | |
-| `SELECT DISTINCT` | No | Use `approx_distinct` |
+| `SELECT DISTINCT` | Yes | |
 | `OFFSET` | No | |
-| `UNION` / `INTERSECT` / `EXCEPT` | No | |
+| `UNION` / `INTERSECT` / `EXCEPT` | Yes | `UNION ALL` also supported |
 | `INSERT` / `UPDATE` / `DELETE` | No | Read-only |
 | `CREATE` / `DROP` / `ALTER` | No | Read-only |
 
@@ -51,9 +51,7 @@ For the full SQL syntax, refer to the [SQL reference](/r2-sql/sql-reference/).
 
 | Feature | Error |
 | :---------------------------------------------------------------------------- | :------------------------------------------------------- |
-| `SELECT DISTINCT` | `unsupported feature: SELECT DISTINCT is not supported` |
 | `OFFSET` | `unsupported feature: OFFSET clause is not supported` |
-| `UNION` / `INTERSECT` / `EXCEPT` | Set operations not supported |
 | Window functions (`OVER`) | `unsupported feature: window functions (OVER clause)` |
 | `INSERT` / `UPDATE` / `DELETE` | `only read-only queries are allowed` |
 | `CREATE` / `DROP` / `ALTER` | `only read-only queries are allowed` |
diff --git a/src/content/docs/r2-sql/sql-reference/index.mdx b/src/content/docs/r2-sql/sql-reference/index.mdx
index 70227b97d4d..2d9bb2ccc6a 100644
--- a/src/content/docs/r2-sql/sql-reference/index.mdx
+++ b/src/content/docs/r2-sql/sql-reference/index.mdx
@@ -76,7 +76,7 @@ DESCRIBE namespace_name.table_name;
 ### Syntax
 
 ```sql
-SELECT column_specification [, column_specification, ...]
+SELECT [DISTINCT] column_specification [, column_specification, ...]
 ```
 
 ### Column specification
@@ -87,6 +87,20 @@ SELECT column_specification [, column_specification, ...]
 - **Column alias**: `column_name AS alias`
 - **Expressions**: arithmetic, function calls, CASE expressions, and casts
 
+### SELECT DISTINCT
+
+Use `DISTINCT` to eliminate duplicate rows from the result set:
+
+```sql
+SELECT DISTINCT region, department
+FROM my_namespace.sales_data
+WHERE total_amount > 1000
+ORDER BY region, department
+LIMIT 100
+```
+
+For large datasets where approximate results are acceptable, `approx_distinct()` is a faster alternative for counting unique values.
+
 ### Examples
 
 ```sql
@@ -581,6 +595,64 @@ SELECT * FROM my_namespace.sales_data LIMIT 100
 
 ---
 
+## Set operations
+
+Set operations combine the results of two or more `SELECT` statements.
+
+### Syntax
+
+```sql
+SELECT ... FROM table1
+UNION | UNION ALL | INTERSECT | EXCEPT
+SELECT ... FROM table2
+```
+
+### Supported operations
+
+| Operation | Description |
+| :------------ | :--------------------------------------------------------------- |
+| `UNION` | Returns all rows from both queries, removing duplicates |
+| `UNION ALL` | Returns all rows from both queries, including duplicates |
+| `INTERSECT` | Returns only rows that appear in both query results |
+| `EXCEPT` | Returns rows from the first query that do not appear in the second |
+
+### Examples
+
+#### UNION
+
+```sql
+-- Find zones that had either firewall blocks OR high-risk requests
+SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'
+UNION
+SELECT zone_id FROM my_namespace.http_requests WHERE risk_score > 0.8
+```
+
+#### INTERSECT
+
+```sql
+-- Find zones with both firewall blocks AND entries in the zones table
+SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'
+INTERSECT
+SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'
+```
+
+#### EXCEPT
+
+```sql
+-- Find enterprise zones that have no firewall events
+SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'
+EXCEPT
+SELECT zone_id FROM my_namespace.firewall_events
+```
+
+### Requirements
+
+- All queries in a set operation must return the same number of columns.
+- Corresponding columns must have compatible data types.
+- Column names in the result are taken from the first query.
+
+---
+
 ## EXPLAIN
 
 Returns the execution plan for a query without running it.