Skip to content

chore: refactor to extract common and jni-bridge as separate crates#3667

Open
andygrove wants to merge 12 commits intoapache:mainfrom
andygrove:refactor/extract-jvm-bridge-crate
Open

chore: refactor to extract common and jni-bridge as separate crates#3667
andygrove wants to merge 12 commits intoapache:mainfrom
andygrove:refactor/extract-jvm-bridge-crate

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Mar 11, 2026

Which issue does this PR close?

N/A

Rationale for this change

I would like to split the core create (~35k LOC) into separate crates by functional areas, particularly for parquet and shuffle. IMO, this will make it easier to make sure we have comprehensive test coverage and good code coverage in each crate. This PR is the first step towards this.

What changes are included in this PR?

common source files:

./src/error.rs
./src/lib.rs
./src/query_context.rs

jni-bridge source files:

./src/batch_iterator.rs
./src/lib.rs
./src/comet_metric_node.rs
./src/errors.rs
./src/comet_exec.rs
./src/comet_task_memory_manager.rs

How are these changes tested?

Existing tests

Extract JNI bridge code (errors, jvm_bridge, SparkError, QueryContext)
from core and spark-expr into a new datafusion-comet-jvm-bridge crate.

This fixes the dependency direction so that jvm-bridge is the lowest-level
crate with no dependency on spark-expr. The dependency graph is now:
jvm-bridge <- spark-expr <- core

Moved to jvm-bridge:
- errors.rs (CometError, ExecutionError, ExpressionError, JNI helpers)
- jvm_bridge/ (JNI macros, JVMClasses, cached method IDs)
- SparkError, SparkErrorWithContext, SparkResult, decimal_overflow_error
- QueryContext, QueryContextMap, create_query_context_map

spark-expr re-exports SparkError/QueryContext types for backward
compatibility.
This is expression-level convenience logic, not JNI bridge
infrastructure.
- SparkError::exception_class() -> pub(crate) (only used in errors.rs)
- SparkError::error_class() -> private (only used within spark_error.rs)
- QueryContext::fragment() -> #[cfg(test)] private (only used in tests)
- JVMClasses method ID fields -> private (only used within jvm_bridge mod)
Remove lazy_static, thiserror, regex from core; log from jvm-bridge;
serde, thiserror from spark-expr. Update cargo-machete ignore list.
@andygrove andygrove marked this pull request as ready for review March 11, 2026 18:19
}
}
}
// Re-export SparkError types from jvm-bridge crate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has no jvm specific code and is explicitly for Spark. It doesn't seem like a good idea that this should be in another crate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this went too far. I ran into an issue with implementing From in a different crate to where things were defined. I'll see if I can find a better approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to create a separate crate for the error types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jvm-bridge code contains CometError, which depends on SparkError. The easiest solution was to create a new spark-errors crate as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we can just have that one bit of jni code that we need for errors in the spark-expr module and move the rest?
I think a spark-errors crate works though tbh the amount of code is not large enough to justify a crate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the spark-errors crate to common. It is likely that more code will move to this crate with the next phases of refactoring to extract the parquet and shuffle crates.

@comphead
Copy link
Contributor

Thanks @andygrove we might want to steal the name from https://github.com/apache/auron/tree/master/native-engine/auron-jni-bridge and name it jni-bridge ?

@andygrove
Copy link
Member Author

Thanks @andygrove we might want to steal the name from https://github.com/apache/auron/tree/master/native-engine/auron-jni-bridge and name it jni-bridge ?

I do prefer jni-bridge over jvm-bridge

SparkError and QueryContext are Spark expression types with no JNI
dependency, but were placed in jvm-bridge during the crate extraction,
forcing spark-expr to depend on jvm-bridge just to re-export its own
error types.

Move them back to spark-expr where they belong and introduce a pluggable
ExternalErrorHandler callback in jvm-bridge so that core can register
SparkError-specific JNI exception handling without jvm-bridge needing to
know about SparkError.

Dependency graph after this change:
- core -> spark-expr (SparkError types)
- core -> jvm-bridge (JNI bridge, CometError, JAVA_VM)
- spark-expr has no jvm-bridge dependency
- jvm-bridge has no spark-expr dependency
Move SparkError, SparkErrorWithContext, QueryContext, and QueryContextMap
into a new spark-errors crate that both jvm-bridge and spark-expr depend
on. This allows jvm-bridge to directly downcast SparkError variants in
throw_exception, eliminating the ExternalErrorHandler callback, OnceCell,
and init-time registration that were previously needed.
Move macros, JVMClasses, and helper types from the jvm_bridge submodule
to the crate root, eliminating the redundant jvm_bridge::jvm_bridge path.
@andygrove andygrove changed the title chore: refactor to extract jvm-bridge as separate crate chore: refactor to extract jni-bridge as separate crate Mar 12, 2026
@andygrove andygrove changed the title chore: refactor to extract jni-bridge as separate crate chore: refactor to extract common and jni-bridge as separate crates Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants