branch-4.1: [Refactor](Multi Catalog) Unify external meta cache framework #60937 #61725
suxiaogang223 wants to merge 1 commit into apache:branch-4.1 from
…#60937)

Part of apache#60686

This PR continues the external metadata cache refactor. It introduces a unified, engine-scoped cache framework and aligns multiple external catalog implementations on top of it.

The goal is not to finish every historical cache migration in one shot. Instead, the goal is to make the framework shape explicit and consistent enough that later migrations can build on one model rather than adding more engine-specific cache flows.

At a high level, this PR does three things:

- introduces a common framework for external metadata cache lifecycle, routing, invalidation, and stats
- moves more engine-specific cache behavior behind engine adapters, replacing scattered table-level and util-level entry points
- improves cross-engine consistency and test coverage while keeping the legacy compatibility migration incremental

The current framework can be viewed as three layers:

1. manager layer: owns the engine cache lifecycle and routes cache operations
2. engine layer: each engine implements its own cache adapter on the shared framework
3. catalog/entry layer: each engine keeps per-catalog cache groups and typed cache entries

```mermaid
flowchart TD
    A["ExternalMetaCacheMgr"] --> B["ExternalMetaCacheRegistry"]
    A --> C["ExternalMetaCacheRouteResolver"]
    A --> D["ExternalMetaCache(engine facade)"]
    D --> E["AbstractExternalMetaCache"]
    E --> F["CatalogEntryGroup(catalog scoped)"]
    F --> G["MetaCacheEntry(table/schema/partition/...)"]
    H["IcebergExternalMetaCache"] --> E
    I["PaimonExternalMetaCache"] --> E
    J["HudiExternalMetaCache"] --> E
    K["MaxComputeExternalMetaCache"] --> E
    L["DorisExternalMetaCache"] --> E
    M["HiveExternalMetaCache"] --> E
```

This structure makes a few framework boundaries explicit:

- manager-level logic is responsible for engine registration, route resolution, and scoped invalidation dispatch
- engine adapters own engine-specific metadata loading and cache composition
- shared framework code owns per-catalog entry containers, generic entry access, and common lifecycle behavior

Changes in this PR:

- add a shared external meta cache framework under `datasource.metacache`
- refactor `ExternalMetaCacheMgr` so that registration and routing are explicit instead of being mixed into one manager path
- make the cache initialization and invalidation flow clearer at the framework level
- align multiple engines with the framework model: Iceberg, Paimon, Hudi, MaxCompute, Doris, and Hive
- keep the legacy compatibility migration incremental instead of forcing a single-PR replacement of every historical cache path
- add or extend framework-level and engine-level tests around routing, invalidation, and cache behavior

This PR is mainly about pulling the different engines closer to one framework shape:
- Iceberg and Paimon are aligned with the framework, while latest-snapshot metadata stays modeled as a table-owned runtime projection
- Hudi moves further away from ad hoc cache state and closer to framework-owned entry behavior
- MaxCompute and Doris move more cache ownership into engine adapters
- Hive keeps its existing complexity where necessary, but more of that logic now sits behind the framework-oriented cache layer

The important point is not that every engine is identical now. It is that all of them are being moved toward one consistent framework model.

A few notes on scope:

- this is primarily a framework refactor and behavior-alignment change
- migration is still incremental, so some legacy compatibility paths are intentionally retained
- the purpose of this PR is to reduce structural divergence across engines without requiring a full one-shot migration

(cherry picked from commit 010f470)
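The three-layer structure described in the commit message can be illustrated with a minimal, hypothetical Java model. This is not the Doris FE code. `MetaCacheManager` stands in for `ExternalMetaCacheMgr` together with `ExternalMetaCacheRegistry` and `ExternalMetaCacheRouteResolver`. `AbstractEngineMetaCache` stands in for `AbstractExternalMetaCache`, `CatalogGroup` for `CatalogEntryGroup`, and `TypedEntry` for `MetaCacheEntry`. Every method name and signature, the routing-by-catalog-type rule, and the `ToyIcebergMetaCache` adapter are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Catalog/entry layer: one typed, lazily loaded cache (e.g. table -> schema).
// Hypothetical simplification: no TTL, size bound, or eviction policy.
final class TypedEntry<K, V> {
    private final Map<K, V> values = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    final AtomicLong hits = new AtomicLong();
    final AtomicLong misses = new AtomicLong();

    TypedEntry(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        V cached = values.get(key);
        if (cached != null) { hits.incrementAndGet(); return cached; }
        misses.incrementAndGet();
        return values.computeIfAbsent(key, loader);
    }

    int size() { return values.size(); }
}

// Catalog/entry layer: all typed entries that belong to one catalog.
final class CatalogGroup {
    private final Map<String, TypedEntry<?, ?>> entries = new ConcurrentHashMap<>();

    <K, V> void register(String name, TypedEntry<K, V> entry) { entries.put(name, entry); }

    @SuppressWarnings("unchecked")
    <K, V> TypedEntry<K, V> entry(String name) { return (TypedEntry<K, V>) entries.get(name); }
}

// Engine layer: shared base that keeps one CatalogGroup per catalog id and
// lets each engine adapter decide which typed entries a catalog gets.
abstract class AbstractEngineMetaCache {
    private final Map<Long, CatalogGroup> groups = new ConcurrentHashMap<>();

    protected abstract void initEntries(long catalogId, CatalogGroup group);

    CatalogGroup group(long catalogId) {
        return groups.computeIfAbsent(catalogId, id -> {
            CatalogGroup g = new CatalogGroup();
            initEntries(id, g);
            return g;
        });
    }

    // Drop the whole group; it is rebuilt lazily on the next access.
    void invalidateCatalog(long catalogId) { groups.remove(catalogId); }
}

// Hypothetical adapter: a real Iceberg adapter would load schemas from the
// Iceberg catalog instead of fabricating a string.
final class ToyIcebergMetaCache extends AbstractEngineMetaCache {
    @Override
    protected void initEntries(long catalogId, CatalogGroup group) {
        group.register("schema",
                new TypedEntry<String, String>(table -> "schema-of-" + catalogId + "." + table));
    }
}

// Manager layer: registry (engine name -> adapter), route resolver
// (catalog type -> engine name), and catalog-scoped invalidation dispatch.
final class MetaCacheManager {
    private final Map<String, AbstractEngineMetaCache> registry = new ConcurrentHashMap<>();
    private final Map<String, String> routes = new ConcurrentHashMap<>();

    void registerEngine(String engine, AbstractEngineMetaCache cache) { registry.put(engine, cache); }

    void addRoute(String catalogType, String engine) { routes.put(catalogType, engine); }

    AbstractEngineMetaCache route(String catalogType) {
        String engine = routes.get(catalogType);
        if (engine == null || !registry.containsKey(engine)) {
            throw new IllegalArgumentException("no engine cache for catalog type " + catalogType);
        }
        return registry.get(engine);
    }

    // Forward to every engine: in this sketch a catalog may own groups in several engines.
    void invalidateCatalog(long catalogId) {
        registry.values().forEach(cache -> cache.invalidateCatalog(catalogId));
    }
}
```

In this sketch, the per-catalog group lives in the shared base class. That design choice makes catalog invalidation a single map removal, no matter how many typed entries an engine adapter defines.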
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
BE UT Coverage Report
Increment line coverage
Increment coverage report
Cherry-pick #60937 to branch-4.1
What problem does this PR solve?
Backport the unified external meta cache framework refactor to branch-4.1, aligning external catalog cache lifecycle, routing, invalidation, and engine-specific cache adapters on the shared framework.
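Scoped invalidation is one of the flows this backport aligns. It can be illustrated with a small, self-contained Java sketch. The sketch assumes that cached values are keyed by catalog id, database, and table, and that a dispatcher forwards a single invalidation scope to every registered cache. `TableKey`, `Scope`, `ScopedCache`, and `InvalidationDispatcher` are hypothetical names for illustration. They are not classes from the PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical key: each cached value is tagged with catalog, database, and table.
final class TableKey {
    final long catalogId;
    final String db;
    final String table;

    TableKey(long catalogId, String db, String table) {
        this.catalogId = catalogId;
        this.db = db;
        this.table = table;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TableKey)) return false;
        TableKey k = (TableKey) o;
        return catalogId == k.catalogId && db.equals(k.db) && table.equals(k.table);
    }

    @Override
    public int hashCode() { return Objects.hash(catalogId, db, table); }
}

// Invalidation scope: a whole catalog, one database, or one table.
// A null db or table acts as a wildcard.
final class Scope {
    private final long catalogId;
    private final String db;
    private final String table;

    private Scope(long catalogId, String db, String table) {
        this.catalogId = catalogId;
        this.db = db;
        this.table = table;
    }

    static Scope catalog(long catalogId) { return new Scope(catalogId, null, null); }
    static Scope database(long catalogId, String db) { return new Scope(catalogId, db, null); }
    static Scope table(long catalogId, String db, String table) { return new Scope(catalogId, db, table); }

    boolean covers(TableKey k) {
        return k.catalogId == catalogId
                && (db == null || db.equals(k.db))
                && (table == null || table.equals(k.table));
    }
}

// A cache whose entries can be dropped by scope.
final class ScopedCache<V> {
    private final Map<TableKey, V> values = new ConcurrentHashMap<>();

    void put(TableKey key, V value) { values.put(key, value); }
    V getIfPresent(TableKey key) { return values.get(key); }
    int size() { return values.size(); }

    // Removes every key the scope covers and returns how many were removed.
    int invalidate(Scope scope) {
        int removed = 0;
        for (TableKey key : values.keySet()) {
            if (scope.covers(key) && values.remove(key) != null) removed++;
        }
        return removed;
    }
}

// Forwards one scope to every registered cache (e.g. schema and partition caches).
final class InvalidationDispatcher {
    private final List<ScopedCache<?>> caches = new CopyOnWriteArrayList<>();

    void register(ScopedCache<?> cache) { caches.add(cache); }

    int invalidate(Scope scope) {
        int total = 0;
        for (ScopedCache<?> cache : caches) total += cache.invalidate(scope);
        return total;
    }
}
```

Modeling the scope as a predicate over keys keeps table-, database-, and catalog-level invalidation on one code path. A production cache might instead index keys by catalog so that it does not have to scan every entry.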
Cherry-pick commit
010f4706909 - [Refactor](Multi Catalog) Unify external meta cache framework (#60937)