Skip to content

Only emit "deleting extra files" status when cleanup is needed#1881

Open
CodyCBakerPhD wants to merge 10 commits into
masterfrom
claude/tender-bell-zwqnp9
Open

Only emit "deleting extra files" status when cleanup is needed#1881
CodyCBakerPhD wants to merge 10 commits into
masterfrom
claude/tender-bell-zwqnp9

Conversation

@CodyCBakerPhD

Copy link
Copy Markdown
Contributor

Summary

Optimize the Zarr download cleanup logic to avoid emitting a misleading "deleting extra files" status message on fresh downloads where no orphaned files exist.

Changes

  • Two-pass cleanup scan: Split the Zarr cleanup into a preliminary scan phase that checks for orphaned files without modifying anything, followed by actual deletion only if needed
  • Conditional status emission: Only emit the "deleting extra files" status when the preliminary scan detects actual orphaned files or empty directories to remove
  • Added debug logging: Include debug-level logging when deleting extra files and removing empty directories for better troubleshooting
  • New tests: Added two test cases marked with @pytest.mark.ai_generated:
    • test_download_zarr_clean_no_deleting_status: Verifies that fresh downloads don't emit the deletion status
    • test_download_zarr_deletes_orphan_files: Verifies that orphaned files are properly cleaned up on re-download with the status correctly emitted

Implementation Details

The cleanup logic now:

  1. First scans the Zarr directory tree to detect if any files/directories need removal
  2. Only yields the "deleting extra files" status if the scan found orphaned content
  3. Performs the actual deletion in a second pass if needed
  4. Maintains the same cleanup behavior while avoiding false status messages on clean downloads

https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU

In _download_zarr the {"status": "deleting extra files"} progress message
was emitted on every Zarr download, even on a fresh first-time download where
nothing is stale, which misleadingly implied files were being removed.

Scan the directory tree first to determine whether any stale files or
now-empty directories actually need removal, and only yield the status (and
perform the unlink/rmdir work) when there is something to delete. Add per-path
DEBUG logging before each unlink and rmdir so deletions are auditable.

The whole-tree Zarr checksum verification still runs unconditionally, and the
OSError handling and empty-directory pruning behavior are preserved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
@CodyCBakerPhD CodyCBakerPhD self-assigned this Jun 23, 2026
@CodyCBakerPhD CodyCBakerPhD added bug Something isn't working patch Increment the patch version when merged cmd-download labels Jun 23, 2026
claude added 2 commits June 23, 2026 16:48
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Comment thread dandi/download.py Outdated
Comment thread dandi/download.py Outdated
Comment thread dandi/download.py Outdated
CodyCBakerPhD and others added 2 commits June 23, 2026 12:52
Replace the separate detection and deletion walks with one non-mutating scan
that collects the stale files and resulting empty directories, then only
announces and performs the removal when there is something to delete. A
guarded rmdir preserves the previous OSError-on-unlink behavior (a file that
can't be removed keeps its directory in place).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.82%. Comparing base (65c79c9) to head (de36312).

Files with missing lines Patch % Lines
dandi/download.py 80.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1881      +/-   ##
==========================================
+ Coverage   76.79%   76.82%   +0.03%     
==========================================
  Files          87       87              
  Lines       12805    12836      +31     
==========================================
+ Hits         9833     9861      +28     
- Misses       2972     2975       +3     
Flag Coverage Δ
unittests 76.82% <93.93%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
@yarikoptic

Copy link
Copy Markdown
Member
  • First scans the Zarr directory tree to detect if any files/directories need removal
  • Only yields the "deleting extra files" status if the scan found orphaned content

I didn't check original code in detail but would 2pass be really needed, since it sounds like a potential additional burden... e.g. could it be just a matter of a flag like yield_delete_files_msg: bool and adding a conditional to yield on first delete thus while keeping original 1 pass implementation?

Replace the separate collection scan with the original single mutating walk,
yielding the 'deleting extra files' status lazily on the first actual removal
via a flag. This avoids the extra full-tree traversal and restores the exact
original OSError-on-unlink handling (no rmdir guard needed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Comment thread dandi/download.py Outdated
Comment thread dandi/download.py Outdated
CodyCBakerPhD and others added 2 commits June 23, 2026 15:27
Co-authored-by: Cody Baker <51133164+CodyCBakerPhD@users.noreply.github.com>
@CodyCBakerPhD

Copy link
Copy Markdown
Contributor Author

I didn't check original code in detail but would 2pass be really needed, since it sounds like a potential additional burden... e.g. could it be just a matter of a flag like yield_delete_files_msg: bool and adding a conditional to yield on first delete thus while keeping original 1 pass implementation?

Restored 1-pass implementation

@CodyCBakerPhD

Copy link
Copy Markdown
Contributor Author

The two dev-deps failures (also occuring on daily tests) are fixed in #1882

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cmd-download patch Increment the patch version when merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants