-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](packed-file) Fix packed file cache cleanup issue #59892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix](packed-file) Fix packed file cache cleanup issue #59892
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a packed file cache cleanup issue by moving the cache layer from the inner reader to the PackedFileReader wrapper level. Previously, file cache used the packed file path as the cache key, which caused cache entries to become orphaned when stale rowsets were cleaned up (cleanup uses segment path as key).
Changes:
- Modified cache handling in
PackedFileSystem::open_file_impl()to disable caching at the inner reader level and wrapPackedFileReaderwithCachedRemoteFileReaderusing segment path and size - Added asynchronous cache write functions (
do_write_to_file_cacheandwrite_small_file_to_cache_async) to write small files to cache during packed file creation - Changed packed file writer options to disable
write_file_cacheflag, as individual small files are now cached separately - Updated logging from
LOG(INFO)toVLOG_DEBUGfor less verbose output
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| be/src/io/fs/packed_file_system.cpp | Moved cache layer to wrap PackedFileReader instead of inner reader; ensures cache key uses segment path and cache size uses segment size |
| be/src/io/fs/packed_file_manager.cpp | Added async cache writing for small files during packed file creation; disabled write_file_cache for packed files themselves |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
20954cd to
96a95e4
Compare
|
run buildall |
TPC-H: Total hot run time: 30806 ms |
TPC-DS: Total hot run time: 172662 ms |
ClickBench: Total hot run time: 26.64 s |
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code |
Previously, file cache used packed file path as cache key, which caused cache entries to be orphaned when stale rowsets were cleaned up (cleanup uses segment path as key). This fix moves the cache layer from inner reader to PackedFileReader wrapper level, ensuring: 1. Cache key = hash(segment_path.filename()) - matches cleanup key 2. Cache size = segment size - correct boundary 3. Each segment has independent cache entry - no interference
96a95e4 to
2cdcfe2
Compare
|
run buildall |
TPC-H: Total hot run time: 31129 ms |
TPC-DS: Total hot run time: 174229 ms |
ClickBench: Total hot run time: 26.97 s |
|
run buildall |
TPC-H: Total hot run time: 32236 ms |
TPC-DS: Total hot run time: 178301 ms |
ClickBench: Total hot run time: 27.3 s |
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 31795 ms |
TPC-DS: Total hot run time: 174472 ms |
ClickBench: Total hot run time: 26.67 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
Previously, file cache used packed file path as cache key, which caused cache entries to be orphaned when stale rowsets were cleaned up (cleanup uses segment path as key). This fix moves the cache layer from inner reader to PackedFileReader wrapper level, ensuring: 1. Cache key = hash(segment_path.filename()) - matches cleanup key 2. Cache size = segment size - correct boundary 3. Each segment has independent cache entry - no interference
…pache#59892) Previously, file cache used packed file path as cache key, which caused cache entries to be orphaned when stale rowsets were cleaned up (cleanup uses segment path as key). This fix moves the cache layer from inner reader to PackedFileReader wrapper level, ensuring: 1. Cache key = hash(segment_path.filename()) - matches cleanup key 2. Cache size = segment size - correct boundary 3. Each segment has independent cache entry - no interference
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Previously, file cache used packed file path as cache key, which caused cache entries to be orphaned when stale rowsets were cleaned up (cleanup uses segment path as key).
This fix moves the cache layer from inner reader to PackedFileReader wrapper level, ensuring:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)