
Fix rglob TypeError when a blob and a directory share a name (#431) #577

Open
pjbull wants to merge 1 commit into master from pjbull-fix-431-rglob-file-dir-conflict



pjbull commented Jun 29, 2026


Fixes #431.

Problem

CloudPath.rglob crashed with TypeError: 'NoneType' object is not subscriptable when a bucket contained both a blob and a "directory" prefix sharing the same name — a layout cloud object stores allow but local filesystems do not. The reporter and a second user hit this on GCS; the same code path is used for every backend.

The crash came from _build_subtree in cloudpath.py. Its recursive _build_tree helper writes None for a file leaf and a nested defaultdict for a directory. When a file entry at path output was processed first and a deeper entry such as output/13655/0/file1.json arrived later, the helper evaluated trunk[branch][next_branch] on the None leaf and crashed. The reverse case (directory first, then a file with the same name) overwrote the populated subtree with None, silently losing its descendants.
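Both failure modes can be reproduced outside cloudpathlib with a simplified builder. The sketch below is an approximation modeled on the description above, not the library's code: `Tree`, `build_tree_naive`, and `insert` are hypothetical names, and each listing entry is assumed to be a slash-separated key plus an `is_dir` flag.

```python
from collections import defaultdict


def Tree():
    # Hypothetical simplified tree: missing keys become new subtrees.
    return defaultdict(Tree)


def build_tree_naive(trunk, branch, nodes, is_dir):
    """Pre-fix behavior (simplified): None marks a file, a Tree marks a directory."""
    next_branch = next(nodes, None)
    if next_branch is None:
        # Leaf write: unconditionally replaces whatever is stored at this key.
        trunk[branch] = Tree() if is_dir else None
    else:
        # Descend; if trunk[branch] is a None file leaf, the next level
        # subscripts None and raises TypeError.
        build_tree_naive(trunk[branch], next_branch, nodes, is_dir)


def insert(tree, key, is_dir):
    # Hypothetical helper: split a listing key into path parts and insert it.
    nodes = iter(key.split("/"))
    build_tree_naive(tree, next(nodes), nodes, is_dir)


# File first, deeper key second: crashes.
tree = Tree()
insert(tree, "output", is_dir=False)
try:
    insert(tree, "output/13655/0/file1.json", is_dir=False)
    crash_message = None
except TypeError as exc:
    crash_message = str(exc)
print(crash_message)  # 'NoneType' object is not subscriptable

# Deeper key first, same-named file second: descendants silently lost.
tree = Tree()
insert(tree, "output/13655/0/file1.json", is_dir=False)
insert(tree, "output", is_dir=False)
print(tree["output"])  # None
```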

Fix

_build_tree now treats a file leaf and a directory subtree as mergeable at the same path:

  • If a recursion needs to descend into a branch that's currently a None leaf, promote it to a Tree() before recursing.
  • If a leaf write would clobber an existing subtree, keep the subtree (descendants matter more than the leaf marker).
  • If a directory entry arrives for a path that already has a subtree, reuse it rather than overwriting.

This also covers the reverse ordering, in which the directory entry for output/13655/0 is listed after its descendant output/13655/0/file1.json; previously that entry replaced the existing subtree and wiped out the file.
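The three rules in the list above can be sketched on a simplified model. This is an illustration under stated assumptions (hypothetical `Tree`, `build_tree_merged`, `insert`, and `to_plain` helpers; listing entries as a slash-separated key plus an `is_dir` flag), not the patch itself. The usage checks that every ordering of a blob, a same-named directory prefix, and a descendant yields the same tree, which includes the reverse ordering described above.

```python
import itertools
from collections import defaultdict


def Tree():
    # Hypothetical simplified tree: missing keys become new subtrees.
    return defaultdict(Tree)


def build_tree_merged(trunk, branch, nodes, is_dir):
    """Merge-aware insert: a file leaf and a subtree may share one key."""
    next_branch = next(nodes, None)
    if next_branch is None:
        if isinstance(trunk.get(branch), dict):
            # A subtree already lives here: keep it, whether the new entry
            # is a same-named file or a repeated directory entry.
            return
        # Empty slot or file leaf: record a file (None) or a fresh directory.
        trunk[branch] = Tree() if is_dir else None
    else:
        if trunk.get(branch) is None:
            # Promote a None file leaf (or a missing key) to a subtree
            # before descending into it.
            trunk[branch] = Tree()
        build_tree_merged(trunk[branch], next_branch, nodes, is_dir)


def insert(tree, key, is_dir):
    # Hypothetical helper: split a listing key into path parts and insert it.
    nodes = iter(key.split("/"))
    build_tree_merged(tree, next(nodes), nodes, is_dir)


def to_plain(tree):
    # Convert nested defaultdicts to plain dicts for easy comparison.
    return {k: to_plain(v) if isinstance(v, dict) else v for k, v in tree.items()}


listing = [
    ("output", False),                     # blob named "output"
    ("output/13655/0", True),              # "directory" prefix entry
    ("output/13655/0/file1.json", False),  # descendant blob
]
expected = {"output": {"13655": {"0": {"file1.json": None}}}}

# Every listing order must produce the same tree, with no TypeError.
for order in itertools.permutations(listing):
    tree = Tree()
    for key, is_dir in order:
        insert(tree, key, is_dir)
    assert to_plain(tree) == expected, order
print(to_plain(tree))  # {'output': {'13655': {'0': {'file1.json': None}}}}
```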

Test

Added test_rglob_file_and_dir_same_name in tests/test_cloudpath_file_io.py. It monkeypatches client._list_dir to emit a blob and a directory at the same path (which the local-FS-backed mocks can't represent naturally), then asserts rglob("*") returns the descendants without raising. Runs against all 10 rigs.
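The test stubs the listing rather than writing files because a local filesystem rejects this layout outright. The standalone sketch below shows both halves using only the standard library. `HypotheticalClient` and `conflicting_listing` are illustrative stand-ins. `unittest.mock.patch.object` substitutes here for pytest's `monkeypatch`, and the `(key, is_dir)` shape of each listing entry is an assumption rather than the real `_list_dir` contract.

```python
import tempfile
from pathlib import Path
from unittest import mock


class HypotheticalClient:
    """Illustrative stand-in for a storage client, not cloudpathlib's class."""

    def _list_dir(self, prefix, recursive=False):
        # Default listing is empty; a real client would query its backend.
        return iter(())


def conflicting_listing(self, prefix, recursive=False):
    # A blob and a "directory" prefix under the same key, plus a descendant:
    # legal in an object store, impossible on a local filesystem.
    yield f"{prefix}/output", False
    yield f"{prefix}/output/13655/0", True
    yield f"{prefix}/output/13655/0/file1.json", False


# A local filesystem rejects a directory whose name is already taken by a file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "output").write_text("blob contents")
    try:
        (Path(tmp) / "output").mkdir()
        local_fs_allows_it = True
    except FileExistsError:
        local_fs_allows_it = False
print(local_fs_allows_it)  # False

# Patching the listing method injects the layout without touching storage;
# the original method is restored when the with-block exits.
client = HypotheticalClient()
with mock.patch.object(HypotheticalClient, "_list_dir", conflicting_listing):
    listed = list(client._list_dir("bucket"))
print(listed[0])  # ('bucket/output', False)
```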

Verification

  • make lint clean (black, flake8, mypy)
  • make test — 1102 passed, 7 skipped
  • New regression test passes on all rigs (S3, GS, Azure, local variants, HTTP)

When a cloud bucket contains both a blob and a "directory" prefix with
the same name, _build_subtree could either crash with
"TypeError: 'NoneType' object is not subscriptable" or silently overwrite
already-discovered descendants. Cloud object stores allow this layout
even though local filesystems do not.

The tree builder now treats a file leaf and a directory subtree as
mergeable at the same path: the leaf is promoted to a subtree when a
deeper entry arrives, and an existing subtree is preserved when a
same-named file or directory entry is processed later.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

codecov Bot commented Jun 29, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.6%. Comparing base (9d5fa70) to head (15e4c61).

Additional details and impacted files
```diff
@@           Coverage Diff            @@
##           master    #577     +/-   ##
========================================
- Coverage    94.3%   93.6%   -0.8%     
========================================
  Files          28      28             
  Lines        2267    2272      +5     
========================================
- Hits         2140    2128     -12     
- Misses        127     144     +17     
```
| Files with missing lines | Coverage Δ |
|---|---|
| cloudpathlib/cloudpath.py | 95.2% <100.0%> (-0.3%) ⬇️ |

... and 1 file with indirect coverage changes
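All modified lines are covered, yet project coverage fell. The table's own counts account for this, assuming coverage is hits divided by tracked lines. Net hits dropped by 12 and misses rose by 17, so the new misses sit in lines outside this diff (the indirect coverage changes noted above). The displayed one-decimal figures are consistent with rounding down:

```python
import math

base_hits, base_lines = 2140, 2267  # master (9d5fa70)
head_hits, head_lines = 2128, 2272  # this PR (15e4c61)


def pct(hits, lines):
    # Coverage as a percentage of tracked lines that were hit.
    return 100 * hits / lines


def floor_1dp(x):
    # Round toward negative infinity at one decimal place.
    return math.floor(x * 10) / 10


base, head = pct(base_hits, base_lines), pct(head_hits, head_lines)
print(round(base, 3), round(head, 3), round(head - base, 3))  # 94.398 93.662 -0.736
print(floor_1dp(base), floor_1dp(head), floor_1dp(head - base))  # 94.3 93.6 -0.8
```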




Successfully merging this pull request may close these issues.

Recursive Globbing at GCS bucket top level results in TypeError
