chore: refactor to extract common and jni-bridge as separate crates#3667
chore: refactor to extract common and jni-bridge as separate crates#3667andygrove wants to merge 12 commits intoapache:mainfrom
common and jni-bridge as separate crates#3667Conversation
Extract JNI bridge code (errors, jvm_bridge, SparkError, QueryContext) from core and spark-expr into a new datafusion-comet-jvm-bridge crate. This fixes the dependency direction so that jvm-bridge is the lowest-level crate with no dependency on spark-expr. The dependency graph is now: jvm-bridge <- spark-expr <- core Moved to jvm-bridge: - errors.rs (CometError, ExecutionError, ExpressionError, JNI helpers) - jvm_bridge/ (JNI macros, JVMClasses, cached method IDs) - SparkError, SparkErrorWithContext, SparkResult, decimal_overflow_error - QueryContext, QueryContextMap, create_query_context_map spark-expr re-exports SparkError/QueryContext types for backward compatibility.
This is expression-level convenience logic, not JNI bridge infrastructure.
- SparkError::exception_class() -> pub(crate) (only used in errors.rs) - SparkError::error_class() -> private (only used within spark_error.rs) - QueryContext::fragment() -> #[cfg(test)] private (only used in tests) - JVMClasses method ID fields -> private (only used within jvm_bridge mod)
Remove lazy_static, thiserror, regex from core; log from jvm-bridge; serde, thiserror from spark-expr. Update cargo-machete ignore list.
native/spark-expr/src/error.rs
Outdated
| } | ||
| } | ||
| } | ||
| // Re-export SparkError types from jvm-bridge crate |
There was a problem hiding this comment.
This file has no jvm specific code and is explicitly for Spark. It doesn't seem like a good idea that this should be in another crate.
There was a problem hiding this comment.
I agree that this went too far. I ran into an issue with implementing From in a different crate to where things were defined. I'll see if I can find a better approach.
There was a problem hiding this comment.
Another option would be to create a separate crate for the error types.
There was a problem hiding this comment.
The jvm-bridge code contains CometError, which depends on SparkError. The easiest solution was to create a new spark-errors crate as well.
There was a problem hiding this comment.
Or maybe we can just have that one bit of jni code that we need for errors in the spark-expr module and move the rest?
I think a spark-errors crate works though tbh the amount of code is not large enough to justify a crate.
There was a problem hiding this comment.
I renamed the spark-errors crate to common. It is likely that more code will move to this crate with the next phases of refactoring to extract the parquet and shuffle crates.
|
Thanks @andygrove we might want to steal the name from https://github.com/apache/auron/tree/master/native-engine/auron-jni-bridge and name it |
I do prefer |
SparkError and QueryContext are Spark expression types with no JNI dependency, but were placed in jvm-bridge during the crate extraction, forcing spark-expr to depend on jvm-bridge just to re-export its own error types. Move them back to spark-expr where they belong and introduce a pluggable ExternalErrorHandler callback in jvm-bridge so that core can register SparkError-specific JNI exception handling without jvm-bridge needing to know about SparkError. Dependency graph after this change: - core -> spark-expr (SparkError types) - core -> jvm-bridge (JNI bridge, CometError, JAVA_VM) - spark-expr has no jvm-bridge dependency - jvm-bridge has no spark-expr dependency
Move SparkError, SparkErrorWithContext, QueryContext, and QueryContextMap into a new spark-errors crate that both jvm-bridge and spark-expr depend on. This allows jvm-bridge to directly downcast SparkError variants in throw_exception, eliminating the ExternalErrorHandler callback, OnceCell, and init-time registration that were previously needed.
Move macros, JVMClasses, and helper types from the jvm_bridge submodule to the crate root, eliminating the redundant jvm_bridge::jvm_bridge path.
jvm-bridge as separate cratejni-bridge as separate crate
jni-bridge as separate cratecommon and jni-bridge as separate crates
Which issue does this PR close?
N/A
Rationale for this change
I would like to split the
corecreate (~35k LOC) into separate crates by functional areas, particularly forparquetandshuffle. IMO, this will make it easier to make sure we have comprehensive test coverage and good code coverage in each crate. This PR is the first step towards this.What changes are included in this PR?
commonsource files:jni-bridgesource files:How are these changes tested?
Existing tests