DataFusion provides a flexible extension system that allows you to customize SQL parsing and planning without modifying the core codebase. This is useful when you need to:
- Support custom operators from other SQL dialects (e.g., PostgreSQL's
->for JSON) - Add custom data types not natively supported
- Implement SQL constructs like
TABLESAMPLE,PIVOT/UNPIVOT, orMATCH_RECOGNIZE
You can read more about this topic in the Extending SQL in DataFusion: from ->> to TABLESAMPLE blog.
When DataFusion processes a SQL query, it goes through these stages:
┌─────────────┐ ┌─────────┐ ┌──────────────────────┐ ┌─────────────┐
│ SQL String │───▶│ Parser │───▶│ SqlToRel │───▶│ LogicalPlan │
└─────────────┘ └─────────┘ │ (SQL to LogicalPlan) │ └─────────────┘
└──────────────────────┘
│
│ uses
▼
┌───────────────────────┐
│ Extension Planners │
│ • ExprPlanner │
│ • TypePlanner │
│ • RelationPlanner │
└───────────────────────┘
The extension planners intercept specific parts of the SQL AST during the
SqlToRel phase and allow you to customize how they are converted to DataFusion's
logical plan.
DataFusion provides three planner traits for extending SQL:
| Trait | Purpose | Registration Method |
|---|---|---|
ExprPlanner |
Custom expressions and operators | ctx.register_expr_planner() |
TypePlanner |
Custom SQL data types | SessionStateBuilder::with_type_planner() |
RelationPlanner |
Custom FROM clause elements (relations) | ctx.register_relation_planner() |
Planner Precedence: Multiple ExprPlanners and RelationPlanners can be
registered; they are invoked in reverse registration order (last registered wins).
Return Original(...) to delegate to the next planner. Only one TypePlanner
can be active at a time.
Use ExprPlanner to customize how SQL expressions are converted to DataFusion
logical expressions. This is useful for:
- Custom binary operators (e.g.,
->,->>,@>,?) - Custom field access patterns
- Custom aggregate or window function handling
| Category | Methods |
|---|---|
| Operators | plan_binary_op, plan_any |
| Literals | plan_array_literal, plan_dictionary_literal, plan_struct_literal |
| Functions | plan_extract, plan_substring, plan_overlay, plan_position, plan_make_map |
| Identifiers | plan_field_access, plan_compound_identifier |
| Aggregates/Windows | plan_aggregate, plan_window |
See the ExprPlanner API documentation for full method signatures.
This example maps the -> operator to string concatenation:
# use std::sync::Arc;
# use datafusion::common::DFSchema;
# use datafusion::error::Result;
# use datafusion::logical_expr::Operator;
# use datafusion::prelude::*;
# use datafusion::sql::sqlparser::ast::BinaryOperator;
use datafusion_expr::planner::{ExprPlanner, PlannerResult, RawBinaryExpr};
# use datafusion_expr::BinaryExpr;
#[derive(Debug)]
struct MyCustomPlanner;
impl ExprPlanner for MyCustomPlanner {
fn plan_binary_op(
&self,
expr: RawBinaryExpr,
_schema: &DFSchema,
) -> Result<PlannerResult<RawBinaryExpr>> {
match &expr.op {
// Map `->` to string concatenation
BinaryOperator::Arrow => {
Ok(PlannerResult::Planned(Expr::BinaryExpr(BinaryExpr {
left: Box::new(expr.left.clone()),
right: Box::new(expr.right.clone()),
op: Operator::StringConcat,
})))
}
_ => Ok(PlannerResult::Original(expr)),
}
}
}
#[tokio::main]
async fn main() -> Result<()> {
// Use postgres dialect to enable `->` operator parsing
let config = SessionConfig::new()
.set_str("datafusion.sql_parser.dialect", "postgres");
let mut ctx = SessionContext::new_with_config(config);
// Register the custom planner
ctx.register_expr_planner(Arc::new(MyCustomPlanner))?;
// Now `->` works as string concatenation
let results = ctx.sql("SELECT 'hello'->'world'").await?.collect().await?;
// Returns: "helloworld"
Ok(())
}For more details, see the ExprPlanner API documentation and the expr_planner test examples.
Use TypePlanner to map SQL data types to Arrow/DataFusion types. This is useful
when you need to support SQL types that aren't natively recognized.
# use std::sync::Arc;
# use arrow::datatypes::{DataType, FieldRef, TimeUnit};
# use datafusion::error::Result;
# use datafusion::prelude::*;
# use datafusion::execution::SessionStateBuilder;
use datafusion_expr::planner::TypePlanner;
# use sqlparser::ast;
#[derive(Debug)]
struct MyTypePlanner;
impl TypePlanner for MyTypePlanner {
fn plan_type_field(&self, sql_type: &ast::DataType) -> Result<Option<FieldRef>> {
match sql_type {
// Map DATETIME(precision) to Arrow Timestamp
ast::DataType::Datetime(precision) => {
let time_unit = match precision {
Some(0) => TimeUnit::Second,
Some(3) => TimeUnit::Millisecond,
Some(6) => TimeUnit::Microsecond,
None | Some(9) => TimeUnit::Nanosecond,
_ => return Ok(None), // Let default handling take over
};
Ok(Some(
DataType::Timestamp(time_unit, None).into_nullable_field_ref()
))
}
_ => Ok(None), // Return None for types we don't handle
}
}
}
#[tokio::main]
async fn main() -> Result<()> {
let state = SessionStateBuilder::new()
.with_default_features()
.with_type_planner(Arc::new(MyTypePlanner))
.build();
let ctx = SessionContext::new_with_state(state);
// Now DATETIME type is recognized
ctx.sql("CREATE TABLE events (ts DATETIME(3))").await?;
Ok(())
}# use std::sync::Arc;
# use arrow::datatypes::{DataType, FieldRef, TimeUnit};
# use datafusion::error::Result;
# use datafusion::prelude::*;
# use datafusion::execution::SessionStateBuilder;
use datafusion_expr::planner::TypePlanner;
# use sqlparser::ast;
#[derive(Debug)]
struct MyTypePlanner;
impl TypePlanner for MyTypePlanner {
fn plan_type_field(&self, sql_type: &ast::DataType) -> Result<Option<FieldRef>> {
match sql_type {
sqlparser::ast::DataType::Uuid => Ok(Some(Arc::new(
Field::new("", DataType::FixedSizeBinary(16), true).with_metadata(
[("ARROW:extension:name".to_string(), "arrow.uuid".to_string())]
.into(),
),
))),
_ => Ok(None),
}
}
}
#[tokio::main]
async fn main() -> Result<()> {
let state = SessionStateBuilder::new()
.with_default_features()
.with_type_planner(Arc::new(MyTypePlanner))
.build();
let ctx = SessionContext::new_with_state(state);
// Now UUID type is recognized
ctx.sql("CREATE TABLE idx (uuid UUID)").await?;
Ok(())
}For more details, see the TypePlanner API documentation.
Use RelationPlanner to handle custom relations in the FROM clause. This
enables you to implement SQL constructs like:
TABLESAMPLEfor sampling dataPIVOT/UNPIVOTfor data reshapingMATCH_RECOGNIZEfor pattern matching- Any custom relation syntax parsed by sqlparser
When implementing RelationPlanner, you receive a RelationPlannerContext that
provides utilities for planning:
| Method | Purpose |
|---|---|
plan(relation) |
Recursively plan a nested relation |
sql_to_expr(expr, schema) |
Convert SQL expression to DataFusion Expr |
context_provider() |
Access session configuration, tables, functions |
See the RelationPlanner API documentation for additional methods like
normalize_ident() and object_name_to_table_reference().
There are two main approaches when implementing a RelationPlanner:
-
Rewrite to Standard SQL: Transform custom syntax into equivalent standard operations that DataFusion already knows how to execute (e.g., PIVOT → GROUP BY with CASE expressions). This is the simplest approach when possible.
-
Custom Logical and Physical Nodes: Create a
UserDefinedLogicalNodeto represent the operation in the logical plan, along with a customExecutionPlanto execute it. Both are required for end-to-end execution.
# use std::sync::Arc;
# use datafusion::error::Result;
# use datafusion::prelude::*;
use datafusion_expr::planner::{
PlannedRelation, RelationPlanner, RelationPlannerContext, RelationPlanning,
};
use datafusion_sql::sqlparser::ast::TableFactor;
#[derive(Debug)]
struct MyRelationPlanner;
impl RelationPlanner for MyRelationPlanner {
fn plan_relation(
&self,
relation: TableFactor,
ctx: &mut dyn RelationPlannerContext,
) -> Result<RelationPlanning> {
match relation {
// Handle your custom relation
TableFactor::Pivot { table, alias, .. } => {
// Plan the input table
let input = ctx.plan(*table)?;
// Transform or wrap the plan as needed
// ...
Ok(RelationPlanning::Planned(PlannedRelation::new(input, alias)))
}
// Return Original for relations you don't handle
other => Ok(RelationPlanning::Original(other)),
}
}
}
#[tokio::main]
async fn main() -> Result<()> {
let ctx = SessionContext::new();
// Register the custom planner
ctx.register_relation_planner(Arc::new(MyRelationPlanner))?;
Ok(())
}The DataFusion repository includes comprehensive examples demonstrating each approach:
The table_sample.rs example shows a complete end-to-end implementation of how to support queries such as:
SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)The pivot_unpivot.rs example demonstrates rewriting custom syntax to standard SQL for queries such as:
SELECT * FROM sales
PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))- Use
ExprPlannerfor custom operators and expression handling - Use [
TypePlannerfor custom SQL data types - Use
RelationPlannerfor custom FROM clause syntax (TABLESAMPLE, PIVOT, etc.) - Register planners via
SessionContextorSessionStateBuilder
- API Documentation:
ExprPlanner,TypePlanner,RelationPlanner - relation_planner examples - Complete TABLESAMPLE, PIVOT/UNPIVOT implementations
- expr_planner test examples - Custom operator examples
- Custom Expression Planning in the UDF guide