branch-4.1: [feature](jsf) Treat JuiceFS (jfs://) as HDFS-compatible in FE/BE #61031 #61706

Open
xylaaaaa wants to merge 1 commit into apache:branch-4.1 from xylaaaaa:auto-pick-61031-branch-4.1

@xylaaaaa
Contributor

Cherry-picked from #61031

…61031)

## Summary
- Treat `jfs://` (JuiceFS) as HDFS-compatible in FE and BE code paths.
- Add regression case for JuiceFS HMS catalog read.
- Keep docker/startup-script local changes out of this PR.

## FE Changes
- Add `jfs` handling to the storage schema/type mapping and the HDFS properties flow.
- Update resource and resource-manager checks to recognize JuiceFS as HDFS-like.
- Fix the `CreateResourceCommandTest` lambda effectively-final compilation issue.

## BE Changes
- Add `jfs://` recognition in the file factory and the Paimon Doris file system integration.

## Regression
- Add suite: `regression-test/suites/external_table_p0/refactor_storage_param/test_jfs_hms_catalog_read.groovy`

## Local Validation
- Built FE/BE with the above changes.
- Local end-to-end smoke test verified for catalog read/write:
  - create catalog -> create db/table with `jfs://...` location -> insert -> select
  - readback rows: `101`, `202`

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 109f2e0)
Copilot AI review requested due to automatic review settings March 25, 2026 06:10
@xylaaaaa xylaaaaa requested a review from yiguolei as a code owner March 25, 2026 06:10
@Thearas
Contributor

Thearas commented Mar 25, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Pull request overview

This PR backports support for treating JuiceFS (jfs://) as an HDFS-compatible filesystem across FE/BE, including packaging the JuiceFS Hadoop client jar and adding regression coverage for HMS catalog access via jfs://.

Changes:

  • FE: Map jfs/juicefs resource and schema handling onto the HDFS-compatible path; pass through juicefs.* Hadoop configs.
  • BE: Treat JFS backend type as HDFS for file factory routing; add configurable URI-scheme→file-type mappings for paimon-cpp.
  • Tooling/CI: Package the juicefs-hadoop jar into FE/BE outputs; add Docker + regression test coverage for Hive/HMS with JuiceFS.
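
To make the scheme→file-type idea in the BE bullet concrete, here is a minimal shell sketch of looking up a URI's scheme in a mapping list while rejecting malformed or duplicate entries. The comma-separated `scheme=type` format, the function name `lookup_file_type`, and the type names are assumptions made for illustration; the real syntax of `paimon_file_system_scheme_mappings` is not shown in this conversation.

```shell
# Hypothetical sketch: the "scheme=type" list format is an assumption,
# not the actual syntax of paimon_file_system_scheme_mappings.

# lookup_file_type MAPPINGS URI
# Prints the file type mapped to the URI's scheme.
# Returns 1 on a malformed entry, a duplicate scheme, or an unmapped scheme.
# The body runs in a subshell so IFS and variable changes stay local.
lookup_file_type() (
    mappings="$1"
    uri="$2"
    scheme="${uri%%://*}"   # text before "://"
    seen=","                # comma-delimited list of schemes seen so far
    result=""
    IFS=','
    set -f                  # no glob expansion while splitting the list
    for entry in ${mappings}; do
        # Each entry needs a non-empty key and a non-empty value.
        case "${entry}" in
            ?*=?*) ;;
            *) echo "malformed mapping: ${entry}" >&2; exit 1 ;;
        esac
        key="${entry%%=*}"
        value="${entry#*=}"
        case "${seen}" in
            *",${key},"*) echo "duplicate scheme: ${key}" >&2; exit 1 ;;
        esac
        seen="${seen}${key},"
        if [ "${key}" = "${scheme}" ]; then
            result="${value}"
        fi
    done
    [ -n "${result}" ] || exit 1
    printf '%s\n' "${result}"
)
```

Under these assumptions, `lookup_file_type 'hdfs=hdfs,jfs=hdfs,s3=s3' 'jfs://vol/warehouse/t1'` resolves the `jfs` scheme to the HDFS file type, which mirrors the routing the PR describes.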

Reviewed changes

Copilot reviewed 30 out of 32 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| thirdparty/vars.sh | Adds JuiceFS Hadoop jar as a thirdparty artifact and enables it in the thirdparty download list. |
| thirdparty/juicefs-helpers.sh | Introduces shared helper functions for locating/downloading JuiceFS Hadoop jars. |
| thirdparty/build-thirdparty.sh | Adds a juicefs thirdparty “build” step to install the jar into installed/juicefs_libs. |
| regression-test/suites/external_table_p0/refactor_storage_param/test_jfs_hms_catalog_read.groovy | New regression suite validating HMS catalog read/write using jfs:// and JuiceFS meta properties. |
| regression-test/pipeline/external/conf/regression-conf.groovy | Adds external pipeline defaults for running the new JuiceFS regression suite. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/commands/CreateResourceCommandTest.java | Extends validation tests to ensure type=jfs is accepted as HDFS-compatible. |
| fe/fe-core/src/test/java/org/apache/doris/datasource/property/storage/HdfsPropertiesUtilsTest.java | Adds jfs to supported schemas and a conversion test for jfs:// URIs. |
| fe/fe-core/src/test/java/org/apache/doris/datasource/property/storage/HdfsPropertiesTest.java | Ensures jfs:// URIs map to HdfsProperties and JuiceFS properties are passed through to BE config. |
| fe/fe-core/src/test/java/org/apache/doris/common/util/LocationPathTest.java | Updates expectations so jfs:// is treated as HDFS-compatible (HDFS file type for BE). |
| fe/fe-core/src/main/java/org/apache/doris/fs/SchemaTypeMapper.java | Remaps jfs schema to HDFS-compatible storage/file types. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/HdfsProperties.java | Adds jfs to supported schemas and passes through juicefs.* keys into overridden Hadoop config. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/ResourceMgr.java | Updates error message to reflect JuiceFS is handled under HDFS resources. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/Resource.java | Maps resource type values jfs/juicefs to HDFS. |
| fe/fe-core/src/main/java/org/apache/doris/analysis/OutFileClause.java | Treats JFS storage type like HDFS for defaultFS extraction. |
| fe/fe-core/src/main/java/org/apache/doris/analysis/BrokerDesc.java | Maps JFS storage type to FILE_HDFS (instead of broker). |
| docker/thirdparties/run-thirdparties-docker.sh | Adds JuiceFS jar/CLI discovery, jar syncing to Hive, and metadata formatting helpers. |
| docker/thirdparties/docker-compose/hive/scripts/hive-metastore.sh | Copies JuiceFS jar into Hadoop classpath locations in Hive containers. |
| docker/thirdparties/docker-compose/hive/hive-3x_settings.env | Adds JFS_CLUSTER_META env configuration for Hive3. |
| docker/thirdparties/docker-compose/hive/hive-3x.yaml.tpl | Mounts a local bucket directory for JuiceFS file-based storage in Hive3. |
| docker/thirdparties/docker-compose/hive/hive-2x_settings.env | Adds JFS_CLUSTER_META env configuration for Hive2. |
| docker/thirdparties/docker-compose/hive/hadoop-hive.env.tpl | Adds core-site config env vars for fs.jfs.impl and juicefs.cluster.meta. |
| docker/thirdparties/docker-compose/hive/hadoop-hive-3x.env.tpl | Adds hive-site config env vars for fs.jfs.impl and juicefs.cluster.meta. |
| build.sh | Adds JuiceFS jar packaging into FE/BE outputs and refactors FE Maven build with a retry mechanism. |
| bin/start_fe.sh | Adds packaged JuiceFS jars to FE CLASSPATH (after core libs). |
| bin/start_be.sh | Adds packaged JuiceFS jars to BE CLASSPATH (after hadoop_hdfs libs). |
| be/test/format/table/paimon_doris_file_system_test.cpp | New unit tests for paimon scheme mapping defaults and overrides. |
| be/src/io/file_factory.h | Routes TStorageBackendType::JFS to FILE_HDFS. |
| be/src/gen_cpp/CMakeLists.txt | Excludes _server.skeleton.cpp from generated cpp compilation inputs. |
| be/src/format/table/paimon_doris_file_system.h | Exposes scheme→file-type mapping function for tests. |
| be/src/format/table/paimon_doris_file_system.cpp | Implements configurable scheme→file-type mapping via paimon_file_system_scheme_mappings. |
| be/src/common/config.h | Declares new paimon_file_system_scheme_mappings configuration. |
| be/src/common/config.cpp | Defines default mapping list and validates mapping format/uniqueness. |

Comment on lines +849 to +851
"${mvn_cmd[@]}"
rm -f "${log_file}"
return 0
Copilot AI Mar 25, 2026

In the Maven lock-contention retry path, the function returns success unconditionally. If the retry Maven invocation fails, the script will still return 0 and the build may proceed with missing/partial FE artifacts. Capture and check the retry command's exit status and return non-zero on failure (and ideally still print the saved log for diagnosis).

Suggested change
- "${mvn_cmd[@]}"
- rm -f "${log_file}"
- return 0
+ if "${mvn_cmd[@]}"; then
+     rm -f "${log_file}"
+     return 0
+ fi
+ echo "FE Maven build retry with -T ${retry_thread_count} failed. Original lock-contention log follows:"
+ cat "${log_file}"
+ rm -f "${log_file}"
+ return 1
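
The review point above, propagating the retry's exit status instead of returning 0 unconditionally, can be tried in isolation. The sketch below is a simplified stand-in: `run_with_retry` and its two string arguments are hypothetical names, and it retries on any failure, whereas the `build.sh` logic discussed here retries only on Maven lock contention.

```shell
# Hypothetical sketch: run_with_retry is an illustrative helper,
# not a function from build.sh.

# run_with_retry FIRST_CMD RETRY_CMD
# Runs FIRST_CMD with its output captured in a log file. If it fails, runs
# RETRY_CMD and succeeds only when the retry itself succeeds.
run_with_retry() (
    first_cmd="$1"
    retry_cmd="$2"
    log_file="$(mktemp)"
    if sh -c "${first_cmd}" > "${log_file}" 2>&1; then
        rm -f "${log_file}"
        exit 0
    fi
    # Check the retry's status instead of assuming it worked.
    if sh -c "${retry_cmd}"; then
        rm -f "${log_file}"
        exit 0
    fi
    # Both attempts failed: print the first attempt's log, then fail.
    echo "retry failed; log of the first attempt follows:" >&2
    cat "${log_file}" >&2
    rm -f "${log_file}"
    exit 1
)
```

With this shape, a caller such as `run_with_retry 'false' 'false'` gets a non-zero status, so a build script using it would stop rather than continue with missing artifacts.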

Comment on lines +245 to +247
"/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/*/Cluster*/fe/lib/juicefs/juicefs-hadoop-[0-9]*.jar"
"/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/*/Cluster*/be/lib/java_extensions/juicefs/juicefs-hadoop-[0-9]*.jar"
)
Copilot AI Mar 25, 2026

find_juicefs_hadoop_jar() includes hard-coded absolute paths under /mnt/ssd01/pipline/..., which are environment-specific and will fail (or add unnecessary filesystem scans) on most developer/CI machines. Please remove these absolute paths or gate them behind an explicit env var (e.g. a configurable search root) so the script remains portable.

Suggested change
- "/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/*/Cluster*/fe/lib/juicefs/juicefs-hadoop-[0-9]*.jar"
- "/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/*/Cluster*/be/lib/java_extensions/juicefs/juicefs-hadoop-[0-9]*.jar"
- )
+ )
+ if [[ -n "${JUICEFS_CLUSTER_ENV_ROOT:-}" ]]; then
+     jar_globs+=(
+         "${JUICEFS_CLUSTER_ENV_ROOT}/clusterEnv/*/Cluster*/fe/lib/juicefs/juicefs-hadoop-[0-9]*.jar"
+         "${JUICEFS_CLUSTER_ENV_ROOT}/clusterEnv/*/Cluster*/be/lib/java_extensions/juicefs/juicefs-hadoop-[0-9]*.jar"
+     )
+ fi
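
As a self-contained illustration of the gating idea in this comment, the sketch below searches a default directory first and consults an extra root only when an environment variable is set. `find_first_jar` is a hypothetical name, and the flat directory layout is a simplification of the `clusterEnv/*/Cluster*/...` globs in the suggestion; only the variable name `JUICEFS_CLUSTER_ENV_ROOT` is taken from it.

```shell
# Hypothetical sketch: find_first_jar is illustrative and uses a flat
# directory layout rather than the clusterEnv globs from the script.

# find_first_jar DEFAULT_DIR
# Prints the first juicefs-hadoop-<version>.jar found in DEFAULT_DIR, then in
# JUICEFS_CLUSTER_ENV_ROOT when that variable is set and non-empty.
find_first_jar() (
    default_dir="$1"
    for dir in "${default_dir}" "${JUICEFS_CLUSTER_ENV_ROOT:-}"; do
        # Skip the optional root when the variable is unset or empty.
        [ -n "${dir}" ] || continue
        for jar in "${dir}"/juicefs-hadoop-[0-9]*.jar; do
            # An unmatched glob stays literal, so confirm the file exists.
            if [ -f "${jar}" ]; then
                printf '%s\n' "${jar}"
                exit 0
            fi
        done
    done
    exit 1
)
```

On a machine where `JUICEFS_CLUSTER_ENV_ROOT` is unset, only the default directory is scanned, which keeps the search portable across developer and CI environments.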

assertEquals("2", rows[1][0].toString())
assertEquals("jfs_2", rows[1][1].toString())
} finally {
sql """switch internal"""
Copilot AI Mar 25, 2026

The suite creates and switches to a new catalog but the finally block only switches back to internal and never drops the created catalog. This can leave state behind for subsequent suites/reruns. Consider dropping the catalog in finally (e.g. switch internal then drop catalog if exists ...) so the test is self-cleaning even on failures.

Suggested change
- sql """switch internal"""
+ sql """switch internal"""
+ sql """drop catalog if exists ${catalogName}"""

@xylaaaaa
Contributor Author

run buildall

@xylaaaaa xylaaaaa changed the title from "branch-4.1: [feature] Treat JuiceFS (jfs://) as HDFS-compatible in FE/BE #61031" to "branch-4.1: [feature](jsf) Treat JuiceFS (jfs://) as HDFS-compatible in FE/BE #61031" Mar 25, 2026
@doris-robot

BE UT Coverage Report

Increment line coverage 78.26% (36/46) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.76% (19526/37008) |
| Line Coverage | 36.16% (182282/504110) |
| Region Coverage | 32.42% (140532/433520) |
| Branch Coverage | 33.64% (61689/183359) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 78.26% (36/46) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 70.97% (25701/36216) |
| Line Coverage | 53.79% (270216/502393) |
| Region Coverage | 51.19% (223948/437486) |
| Branch Coverage | 52.71% (96925/183879) |
