Skip to content

Conversation

@liaoxin01
Copy link
Contributor

Cherry-picked from #59892

…pache#59892)

Previously, file cache used packed file path as cache key, which caused
cache entries to be orphaned when stale rowsets were cleaned up (cleanup
uses segment path as key).

This fix moves the cache layer from inner reader to PackedFileReader
wrapper level, ensuring:
1. Cache key = hash(segment_path.filename()) - matches cleanup key
2. Cache size = segment size - correct boundary
3. Each segment has independent cache entry - no interference
@liaoxin01 liaoxin01 requested a review from yiguolei as a code owner January 20, 2026 06:22
Copilot AI review requested due to automatic review settings January 20, 2026 06:22
@Thearas
Copy link
Contributor

Thearas commented Jan 20, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01
Copy link
Contributor Author

run buildall

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This is a cherry-pick from PR #59892 that fixes a packed file cache cleanup issue. The problem was that file cache used the packed file path as the cache key, causing cache entries to become orphaned when stale rowsets were cleaned up (which uses segment path as the key).

Changes:

  • Restructured cache layer to wrap PackedFileReader with CachedRemoteFileReader instead of the reverse
  • Added async cache write functionality for small files using segment paths as cache keys
  • Updated tests to verify the new cache layer structure

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
be/src/io/fs/packed_file_system.cpp Reordered cache wrapping to use segment path as cache key and disabled cache for inner packed file reader
be/src/io/fs/packed_file_manager.cpp Added async small file cache write functionality with proper cache key generation and disabled write cache for packed files
be/test/io/fs/packed_file_concurrency_test.cpp Updated test to use unique file names and verify new cache wrapper structure

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 76.74% (66/86) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.75% (18902/35833)
Line Coverage 35.84% (175435/489468)
Region Coverage 32.45% (135567/417823)
Branch Coverage 33.34% (58819/176442)

@yiguolei
Copy link
Contributor

skip buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 20, 2026
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 4c3a586 into apache:branch-4.0 Jan 20, 2026
35 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants