Only emit "deleting extra files" status when cleanup is needed#1881
Only emit "deleting extra files" status when cleanup is needed#1881CodyCBakerPhD wants to merge 10 commits into
Conversation
In _download_zarr the {"status": "deleting extra files"} progress message
was emitted on every Zarr download, even on a fresh first-time download where
nothing is stale, which misleadingly implied files were being removed.
Scan the directory tree first to determine whether any stale files or
now-empty directories actually need removal, and only yield the status (and
perform the unlink/rmdir work) when there is something to delete. Add per-path
DEBUG logging before each unlink and rmdir so deletions are auditable.
The whole-tree Zarr checksum verification still runs unconditionally, and the
OSError handling and empty-directory pruning behavior are preserved.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Replace the separate detection and deletion walks with one non-mutating scan that collects the stale files and resulting empty directories, then only announces and performs the removal when there is something to delete. A guarded rmdir preserves the previous OSError-on-unlink behavior (a file that can't be removed keeps its directory in place). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1881 +/- ##
==========================================
+ Coverage 76.79% 76.82% +0.03%
==========================================
Files 87 87
Lines 12805 12836 +31
==========================================
+ Hits 9833 9861 +28
- Misses 2972 2975 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
I didn't check original code in detail but would 2pass be really needed, since it sounds like a potential additional burden... e.g. could it be just a matter of a flag like |
Replace the separate collection scan with the original single mutating walk, yielding the 'deleting extra files' status lazily on the first actual removal via a flag. This avoids the extra full-tree traversal and restores the exact original OSError-on-unlink handling (no rmdir guard needed). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Co-authored-by: Cody Baker <51133164+CodyCBakerPhD@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU
Restored 1-pass implementation |
|
The two dev-deps failures (also occuring on daily tests) are fixed in #1882 |
Summary
Optimize the Zarr download cleanup logic to avoid emitting a misleading "deleting extra files" status message on fresh downloads where no orphaned files exist.
Changes
@pytest.mark.ai_generated:test_download_zarr_clean_no_deleting_status: Verifies that fresh downloads don't emit the deletion statustest_download_zarr_deletes_orphan_files: Verifies that orphaned files are properly cleaned up on re-download with the status correctly emittedImplementation Details
The cleanup logic now:
https://claude.ai/code/session_01LFHXuWAevoNHvEgw3QhGtU