Fix rglob TypeError when a blob and a directory share a name (#431) #577
Open
pjbull wants to merge 1 commit into
When a cloud bucket contains both a blob and a "directory" prefix with the same name, _build_subtree could either crash with "TypeError: 'NoneType' object is not subscriptable" or silently overwrite already-discovered descendants. Cloud object stores allow this layout even though local filesystems do not. The tree builder now treats a file leaf and a directory subtree as mergeable at the same path: the leaf is promoted to a subtree when a deeper entry arrives, and an existing subtree is preserved when a same-named file or directory entry is processed later.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #577    +/-   ##
========================================
- Coverage    94.3%   93.6%   -0.8%
========================================
  Files          28      28
  Lines        2267    2272      +5
========================================
- Hits         2140    2128     -12
- Misses        127     144     +17
Fixes #431.
Problem
CloudPath.rglob crashed with TypeError: 'NoneType' object is not subscriptable when a bucket contained both a blob and a "directory" prefix sharing the same name, a layout cloud object stores allow but local filesystems do not. The reporter and a second user hit this on GCS; the same code path is used for every backend.
The crash came from _build_subtree in cloudpath.py. The recursive _build_tree helper writes None for a file leaf and a nested defaultdict for a directory. When a file entry was processed first at path output, then a deeper entry like output/13655/0/file1.json arrived, the helper did trunk[branch][next_branch] on the None leaf and crashed. The reverse case (directory first, file with the same name second) overwrote the populated subtree with None, silently losing descendants.
Fix
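_build_tree now treats a file leaf and a directory subtree as mergeable at the same path:
- If a deeper entry arrives at a None leaf, promote it to a Tree() before recursing.
- If a same-named file or directory entry is processed after a subtree already exists, keep the existing subtree.
This also covers the reverse ordering, where the directory entry for output/13655/0 is listed after output/13655/0/file1.json and previously wiped out the file.
Test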
Added test_rglob_file_and_dir_same_name in tests/test_cloudpath_file_io.py. It monkeypatches client._list_dir to emit a blob and a directory at the same path (which the local-FS-backed mocks can't represent naturally), then asserts rglob("*") returns the descendants without raising. Runs against all 10 rigs.
Verification
make lint: clean (black, flake8, mypy)
make test: 1102 passed, 7 skipped
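Illustration
To make the failure mode and the merge rule from the Problem and Fix sections concrete, here is a minimal standalone sketch. It is not the code in cloudpath.py: the function names build_tree_naive, build_tree_merging, and build, and the is_dir flag, are hypothetical and chosen for illustration only. It assumes, as described above, that a file is stored as None and a directory as a nested defaultdict.

```python
from collections import defaultdict


def Tree():
    """Nested mapping that creates child subtrees on demand (a directory)."""
    return defaultdict(Tree)


def build_tree_naive(trunk, branch, nodes, is_dir):
    """Hypothetical pre-fix shape: a leaf write replaces whatever was there."""
    next_branch = next(nodes, None)
    if next_branch is None:
        # Overwrites an existing subtree or leaf unconditionally.
        trunk[branch] = Tree() if is_dir else None
    else:
        # If trunk[branch] is a None file leaf, the recursive call ends up
        # indexing into None and raises TypeError.
        build_tree_naive(trunk[branch], next_branch, nodes, is_dir)


def build_tree_merging(trunk, branch, nodes, is_dir):
    """Hypothetical merged shape: leaf and subtree may share a path."""
    next_branch = next(nodes, None)
    if next_branch is None:
        if is_dir:
            # Create the directory, or promote a None leaf; keep a subtree.
            if trunk.get(branch) is None:
                trunk[branch] = Tree()
        elif branch not in trunk:
            # Only record the file if nothing exists at this path yet.
            trunk[branch] = None
    else:
        # Promote a missing entry or a None leaf before descending.
        if trunk.get(branch) is None:
            trunk[branch] = Tree()
        build_tree_merging(trunk[branch], next_branch, nodes, is_dir)


def build(entries, builder):
    """Feed (path, is_dir) pairs to a builder and return the root tree."""
    root = Tree()
    for path, is_dir in entries:
        parts = iter(path.split("/"))
        builder(root, next(parts), parts, is_dir)
    return root


# Listing order taken from the Problem section: the blob first, then a
# deeper entry under the same name.
file_first = [("output", False), ("output/13655/0/file1.json", False)]
tree = build(file_first, build_tree_merging)
print(sorted(tree["output"]["13655"]["0"]))
```

With the naive builder the same file_first listing raises TypeError, and a listing that emits output/13655/0 as a directory after output/13655/0/file1.json replaces the populated subtree with an empty one. The merging builder keeps the descendants in both orderings, at the cost of no longer recording that a blob also exists at the promoted path.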