branch-4.1: [feature](jsf) Treat JuiceFS (jfs://) as HDFS-compatible in FE/BE #61031 #61706
xylaaaaa wants to merge 1 commit into apache:branch-4.1
…61031)

## Summary

- Treat `jfs://` (JuiceFS) as HDFS-compatible in FE and BE code paths.
- Add regression case for JuiceFS HMS catalog read.
- Keep docker/startup-script local changes out of this PR.

## FE Changes

- Add `jfs` handling to storage schema/type mapping and HDFS properties flow.
- Update resource/resource-manager related checks to recognize JuiceFS as HDFS-like.
- Fix `CreateResourceCommandTest` lambda effectively-final compilation issue.

## BE Changes

- Add `jfs://` recognition in file factory and paimon Doris file system integration.

## Regression

- Add suite:
  - `regression-test/suites/external_table_p0/refactor_storage_param/test_jfs_hms_catalog_read.groovy`

## Local Validation

- Built FE/BE with the above changes.
- Local end-to-end smoke verified for catalog read/write:
  - create catalog -> create db/table with `jfs://...` location -> insert -> select
  - readback rows: `101`, `202`

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

(cherry picked from commit 109f2e0)
Pull request overview
This PR backports support for treating JuiceFS (jfs://) as an HDFS-compatible filesystem across FE/BE, including packaging the JuiceFS Hadoop client jar and adding regression coverage for HMS catalog access via jfs://.
Changes:
- FE: Map `jfs`/`juicefs` resource and schema handling onto the HDFS-compatible path; pass through `juicefs.*` Hadoop configs.
- BE: Treat `JFS` backend type as HDFS for file factory routing; add configurable URI-scheme→file-type mappings for paimon-cpp.
- Tooling/CI: Package the `juicefs-hadoop` jar into FE/BE outputs; add Docker + regression test coverage for Hive/HMS with JuiceFS.
Reviewed changes
Copilot reviewed 30 out of 32 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| thirdparty/vars.sh | Adds JuiceFS Hadoop jar as a thirdparty artifact and enables it in the thirdparty download list. |
| thirdparty/juicefs-helpers.sh | Introduces shared helper functions for locating/downloading JuiceFS Hadoop jars. |
| thirdparty/build-thirdparty.sh | Adds a juicefs thirdparty “build” step to install the jar into installed/juicefs_libs. |
| regression-test/suites/external_table_p0/refactor_storage_param/test_jfs_hms_catalog_read.groovy | New regression suite validating HMS catalog read/write using jfs:// and JuiceFS meta properties. |
| regression-test/pipeline/external/conf/regression-conf.groovy | Adds external pipeline defaults for running the new JuiceFS regression suite. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/commands/CreateResourceCommandTest.java | Extends validation tests to ensure type=jfs is accepted as HDFS-compatible. |
| fe/fe-core/src/test/java/org/apache/doris/datasource/property/storage/HdfsPropertiesUtilsTest.java | Adds jfs to supported schemas and a conversion test for jfs:// URIs. |
| fe/fe-core/src/test/java/org/apache/doris/datasource/property/storage/HdfsPropertiesTest.java | Ensures jfs:// URIs map to HdfsProperties and JuiceFS properties are passed through to BE config. |
| fe/fe-core/src/test/java/org/apache/doris/common/util/LocationPathTest.java | Updates expectations so jfs:// is treated as HDFS-compatible (HDFS file type for BE). |
| fe/fe-core/src/main/java/org/apache/doris/fs/SchemaTypeMapper.java | Remaps jfs schema to HDFS-compatible storage/file types. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/HdfsProperties.java | Adds jfs to supported schemas and passes through juicefs.* keys into overridden Hadoop config. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/ResourceMgr.java | Updates error message to reflect JuiceFS is handled under HDFS resources. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/Resource.java | Maps resource type values jfs/juicefs to HDFS. |
| fe/fe-core/src/main/java/org/apache/doris/analysis/OutFileClause.java | Treats JFS storage type like HDFS for defaultFS extraction. |
| fe/fe-core/src/main/java/org/apache/doris/analysis/BrokerDesc.java | Maps JFS storage type to FILE_HDFS (instead of broker). |
| docker/thirdparties/run-thirdparties-docker.sh | Adds JuiceFS jar/CLI discovery, jar syncing to Hive, and metadata formatting helpers. |
| docker/thirdparties/docker-compose/hive/scripts/hive-metastore.sh | Copies JuiceFS jar into Hadoop classpath locations in Hive containers. |
| docker/thirdparties/docker-compose/hive/hive-3x_settings.env | Adds JFS_CLUSTER_META env configuration for Hive3. |
| docker/thirdparties/docker-compose/hive/hive-3x.yaml.tpl | Mounts a local bucket directory for JuiceFS file-based storage in Hive3. |
| docker/thirdparties/docker-compose/hive/hive-2x_settings.env | Adds JFS_CLUSTER_META env configuration for Hive2. |
| docker/thirdparties/docker-compose/hive/hadoop-hive.env.tpl | Adds core-site config env vars for fs.jfs.impl and juicefs.cluster.meta. |
| docker/thirdparties/docker-compose/hive/hadoop-hive-3x.env.tpl | Adds hive-site config env vars for fs.jfs.impl and juicefs.cluster.meta. |
| build.sh | Adds JuiceFS jar packaging into FE/BE outputs and refactors FE Maven build with a retry mechanism. |
| bin/start_fe.sh | Adds packaged JuiceFS jars to FE CLASSPATH (after core libs). |
| bin/start_be.sh | Adds packaged JuiceFS jars to BE CLASSPATH (after hadoop_hdfs libs). |
| be/test/format/table/paimon_doris_file_system_test.cpp | New unit tests for paimon scheme mapping defaults and overrides. |
| be/src/io/file_factory.h | Routes TStorageBackendType::JFS to FILE_HDFS. |
| be/src/gen_cpp/CMakeLists.txt | Excludes _server.skeleton.cpp from generated cpp compilation inputs. |
| be/src/format/table/paimon_doris_file_system.h | Exposes scheme→file-type mapping function for tests. |
| be/src/format/table/paimon_doris_file_system.cpp | Implements configurable scheme→file-type mapping via paimon_file_system_scheme_mappings. |
| be/src/common/config.h | Declares new paimon_file_system_scheme_mappings configuration. |
| be/src/common/config.cpp | Defines default mapping list and validates mapping format/uniqueness. |
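
The format and uniqueness validation described for `paimon_file_system_scheme_mappings` can be illustrated with a small shell sketch. This is not the BE implementation (which lives in C++ in `config.cpp`): the function name `validate_scheme_mappings`, the comma-separated `scheme=type` syntax, and the type names used in the usage note are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: validate a comma-separated list of "scheme=type" entries.
# The "scheme=type" syntax is an ASSUMPTION, not the documented format of
# paimon_file_system_scheme_mappings.

validate_scheme_mappings() {
    local raw="$1"
    local re='^([a-z][a-z0-9+.-]*)=([a-z0-9_]+)$'
    local seen=" "          # space-delimited list of schemes already accepted
    local entry scheme
    local -a entries
    IFS=',' read -r -a entries <<< "${raw}"
    for entry in "${entries[@]}"; do
        # Format check: both sides of '=' must be present and well-formed.
        if [[ ! "${entry}" =~ ${re} ]]; then
            echo "invalid mapping entry: '${entry}'" >&2
            return 1
        fi
        scheme="${BASH_REMATCH[1]}"
        # Uniqueness check: a scheme may be mapped only once.
        if [[ "${seen}" == *" ${scheme} "* ]]; then
            echo "duplicate scheme: '${scheme}'" >&2
            return 1
        fi
        seen+="${scheme} "
    done
    return 0
}
```

Under these assumptions, `validate_scheme_mappings "hdfs=hdfs,jfs=hdfs"` succeeds, while a list that maps `jfs` twice or contains an entry without `=` is rejected with a message on stderr.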
    "${mvn_cmd[@]}"
    rm -f "${log_file}"
    return 0
In the Maven lock-contention retry path, the function returns success unconditionally. If the retry Maven invocation fails, the script will still return 0 and the build may proceed with missing/partial FE artifacts. Capture and check the retry command's exit status and return non-zero on failure (and ideally still print the saved log for diagnosis).
Suggested change:

    - "${mvn_cmd[@]}"
    - rm -f "${log_file}"
    - return 0
    + if "${mvn_cmd[@]}"; then
    +     rm -f "${log_file}"
    +     return 0
    + fi
    + echo "FE Maven build retry with -T ${retry_thread_count} failed. Original lock-contention log follows:"
    + cat "${log_file}"
    + rm -f "${log_file}"
    + return 1
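
The exit-status handling requested in this comment can be shown in isolation. The following is a simplified sketch, not the function from `build.sh`: `run_with_retry` is a hypothetical helper that retries any failure once with the same arguments, whereas the PR retries only on Maven lock contention and with a different `-T` thread count.

```shell
#!/usr/bin/env bash
# Sketch: run a command, retry once on failure, and propagate the retry's
# exit status. run_with_retry is a HYPOTHETICAL helper for illustration.

run_with_retry() {
    local log_file
    log_file="$(mktemp)"
    # First attempt: capture output so it can be shown if the retry also fails.
    if "$@" >"${log_file}" 2>&1; then
        cat "${log_file}"
        rm -f "${log_file}"
        return 0
    fi
    # Retry: report success only if the retried command itself succeeds.
    if "$@"; then
        rm -f "${log_file}"
        return 0
    fi
    echo "retry failed; log of the first attempt follows:" >&2
    cat "${log_file}" >&2
    rm -f "${log_file}"
    return 1
}
```

The key point is that the second invocation sits inside an `if`, so a failed retry reaches `return 1` instead of falling through to an unconditional `return 0`.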
    "/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/*/Cluster*/fe/lib/juicefs/juicefs-hadoop-[0-9]*.jar"
    "/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/*/Cluster*/be/lib/java_extensions/juicefs/juicefs-hadoop-[0-9]*.jar"
    )
find_juicefs_hadoop_jar() includes hard-coded absolute paths under /mnt/ssd01/pipline/..., which are environment-specific and will fail (or add unnecessary filesystem scans) on most developer/CI machines. Please remove these absolute paths or gate them behind an explicit env var (e.g. a configurable search root) so the script remains portable.
Suggested change:

    - "/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/*/Cluster*/fe/lib/juicefs/juicefs-hadoop-[0-9]*.jar"
    - "/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/*/Cluster*/be/lib/java_extensions/juicefs/juicefs-hadoop-[0-9]*.jar"
    - )
    + )
    + if [[ -n "${JUICEFS_CLUSTER_ENV_ROOT:-}" ]]; then
    +     jar_globs+=(
    +         "${JUICEFS_CLUSTER_ENV_ROOT}/clusterEnv/*/Cluster*/fe/lib/juicefs/juicefs-hadoop-[0-9]*.jar"
    +         "${JUICEFS_CLUSTER_ENV_ROOT}/clusterEnv/*/Cluster*/be/lib/java_extensions/juicefs/juicefs-hadoop-[0-9]*.jar"
    +     )
    + fi
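
A self-contained sketch of the same gating idea follows: default glob patterns are always searched, and an extra search root is appended only when an environment variable is set. The names `find_first_jar` and `EXTRA_JAR_ROOT`, and the `lib/` layout under that root, are hypothetical and do not come from `juicefs-helpers.sh`.

```shell
#!/usr/bin/env bash
# Sketch: search caller-supplied globs, plus one extra root that is consulted
# only when an env var is set. find_first_jar and EXTRA_JAR_ROOT are
# HYPOTHETICAL names, as is the lib/ layout under the extra root.

find_first_jar() {
    local -a jar_globs=("$@")
    if [[ -n "${EXTRA_JAR_ROOT:-}" ]]; then
        jar_globs+=("${EXTRA_JAR_ROOT}/lib/juicefs-hadoop-[0-9]*.jar")
    fi
    local pattern candidate
    for pattern in "${jar_globs[@]}"; do
        # compgen -G expands the glob and prints nothing when no path matches.
        while IFS= read -r candidate; do
            if [[ -f "${candidate}" ]]; then
                echo "${candidate}"
                return 0
            fi
        done < <(compgen -G "${pattern}")
    done
    return 1
}
```

With the variable unset, machines that lack the environment-specific directory never scan it; setting it (for example in a CI job) opts in explicitly.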
    assertEquals("2", rows[1][0].toString())
    assertEquals("jfs_2", rows[1][1].toString())
    } finally {
    sql """switch internal"""
The suite creates and switches to a new catalog but the finally block only switches back to internal and never drops the created catalog. This can leave state behind for subsequent suites/reruns. Consider dropping the catalog in finally (e.g. switch internal then drop catalog if exists ...) so the test is self-cleaning even on failures.
Suggested change:

    - sql """switch internal"""
    + sql """switch internal"""
    + sql """drop catalog if exists ${catalogName}"""
run buildall

BE UT Coverage Report: Increment line coverage, Increment coverage report

BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report

Cherry-picked from #61031