Fix/28 full rerun targeted deletion #32
Open: thompsonmj wants to merge 12 commits into dev from fix/28-full-rerun-targeted-deletion
Commits

- 9a12673 Refactor output path computation into shared helper (thompsonmj)
- 55236b1 Fix #28: target --full-rerun to TaxonoPy-specific output files via ma… (thompsonmj)
- 504de67 Add tests for manifest-based output tracking (thompsonmj)
- eb7b5bb Document --full-rerun behavior, manifest, and rerun lifecycle (thompsonmj)
- efae52f Update contribution convention to use Co-Authored-By for AI attribution (thompsonmj)
- b0b3c62 Prevent symlink and traversal sequences from escaping the output dire… (thompsonmj)
- 98b2e0a Add security tests for manifest deletion path containment (thompsonmj)
- d392ba1 Fix linting errors (thompsonmj)
- 661bfed Raise on corrupt or unreadable manifest with an actionable error message (thompsonmj)
- 73019c0 Test that read_manifest raises on corrupt JSON (thompsonmj)
- 75691a8 Document no-hard-wrap convention and fix existing wraps in AGENTS.md (thompsonmj)
- 273edcb Remove hard wrapping from all files touched by this PR (thompsonmj)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,8 @@ | ||
| # IO | ||
|
|
||
| TaxonoPy accepts CSV or Parquet inputs with the same schema. Use the pages below | ||
| for the exact input columns, the structure of resolved/unsolved outputs, and how | ||
| the cache supports provenance and transparency throughout the resolution process. | ||
| TaxonoPy accepts CSV or Parquet inputs with the same schema. Use the pages below for the exact input columns, the structure of resolved/unsolved outputs, and how the cache supports provenance and transparency throughout the resolution process. | ||
|
|
||
| - [Input](input.md) | ||
| - [Output](output.md) | ||
| - [Cache](cache.md) | ||
| - [Reruns](reruns.md) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| # Reruns | ||
|
|
||
| ## The Guard | ||
|
|
||
| TaxonoPy checks for existing output before processing. If a prior run is detected for the current input, it exits with a warning rather than silently overwriting: | ||
|
|
||
| ``` | ||
| Existing cache (...) and/or output (...) detected for this input. | ||
| Rerun with --full-rerun to replace them. | ||
| ``` | ||
|
|
||
| Detection uses two signals: | ||
|
|
||
| - the presence of a `taxonopy_resolve_manifest.json` in the output directory (written by any run using TaxonoPy v0.3.0 or later), or | ||
| - `.resolved.*` files in the output directory root (legacy fallback for output produced by earlier versions). | ||
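The two detection signals above can be sketched as a single check. This is a minimal illustration rather than TaxonoPy's actual code: the function name `prior_output_detected` is hypothetical, and only the manifest filename and the `.resolved.*` pattern come from the text.

```python
from pathlib import Path

# Filename taken from the documentation above.
MANIFEST_NAME = "taxonopy_resolve_manifest.json"


def prior_output_detected(output_dir: Path) -> bool:
    """Return True if either detection signal is present (hypothetical helper)."""
    if not output_dir.is_dir():
        return False
    # Signal 1: a manifest written by v0.3.0 or later.
    if (output_dir / MANIFEST_NAME).is_file():
        return True
    # Signal 2 (legacy fallback): *.resolved.* files in the directory root only.
    # Path.glob without "**" does not descend into subdirectories.
    return any(p.is_file() for p in output_dir.glob("*.resolved.*"))
```

Checking the manifest first keeps the common case cheap; the glob only runs for directories that predate the manifest.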
|
|
||
| ## `--full-rerun` | ||
|
|
||
| `--full-rerun` is the explicit way past the guard. Before processing, it clears the input-scoped cache namespace and removes the TaxonoPy output files recorded in the manifest from the output directory. | ||
|
|
||
| ```console | ||
| taxonopy resolve \ | ||
| --input examples/input \ | ||
| --output-dir examples/resolved \ | ||
| --full-rerun | ||
| ``` | ||
|
|
||
| ### What it touches | ||
|
|
||
| - **Cache**: the namespace scoped to the current command, TaxonoPy version, and input fingerprint. Other namespaces (different inputs, different versions) are not affected. | ||
| - **Output files**: only the files listed in `taxonopy_resolve_manifest.json`. Any other files in the output directory are left untouched. | ||
|
|
||
| ### What it does not touch | ||
|
|
||
| - Files not listed in the manifest, including any non-TaxonoPy files you have placed in the output directory. | ||
| - Cache namespaces from other runs. | ||
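To make the touch / do-not-touch split concrete, here is a rough sketch of manifest-driven deletion with a containment check, in the spirit of the symlink and traversal commits in this PR. The function name `remove_manifest_files` and its signature are assumptions for illustration, not the project's API; it requires Python 3.9+ for `Path.is_relative_to`.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def remove_manifest_files(output_dir: Path, manifest_name: str) -> list[Path]:
    """Delete only files listed in the manifest; return the paths removed (hypothetical helper)."""
    manifest_path = output_dir / manifest_name
    if not manifest_path.is_file():
        # Mirrors the documented behavior: warn and remove nothing.
        logger.warning("--full-rerun: no manifest found in %s; no output files were removed.", output_dir)
        return []
    listed = json.loads(manifest_path.read_text(encoding="utf-8")).get("files", [])
    root = output_dir.resolve()
    removed = []
    for rel in listed:
        # resolve() follows symlinks and collapses ".." segments.
        target = (root / rel).resolve()
        # Containment check: skip anything that resolves outside the output directory.
        if target == root or not target.is_relative_to(root):
            logger.warning("Skipping %s: resolves outside %s", rel, root)
            continue
        if target.is_file():
            target.unlink()
            removed.append(target)
    return removed
```

Files that are not in the manifest are never enumerated, so they cannot be deleted by accident; entries that escape the directory are skipped rather than trusted.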
|
|
||
| ### No manifest found | ||
|
|
||
| If `--full-rerun` is used but no manifest is present (e.g. output from a pre-v0.3.0 run, or a manually populated directory), TaxonoPy logs a warning and proceeds without removing any files: | ||
|
|
||
| ``` | ||
| --full-rerun: no manifest found in <output-dir>; no output files were removed. | ||
| ``` | ||
|
|
||
| The run then writes fresh output and a new manifest. | ||
|
|
||
| ## The Manifest | ||
|
|
||
| Every TaxonoPy run writes a manifest file to the output directory **before** creating any output. An interrupted run therefore still leaves a complete record of the files it may have created, and `--full-rerun` deletes exactly those files and nothing else. | ||
|
|
||
| Manifest files are command-scoped so they coexist safely if multiple commands share an output directory: | ||
|
|
||
| | Command | Manifest file | | ||
| |---|---| | ||
| | `resolve` | `taxonopy_resolve_manifest.json` | | ||
| | `common-names` | `taxonopy_common_names_manifest.json` | | ||
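The naming rule in the table and the write-before-output ordering might look roughly like the sketch below. Both helper names are hypothetical, and the payload carries only a subset of the manifest fields; the real writer may differ.

```python
import json
from datetime import datetime
from pathlib import Path


def manifest_filename(command: str) -> str:
    """Map a command name to its manifest file, following the table above."""
    return f"taxonopy_{command.replace('-', '_')}_manifest.json"


def write_manifest_first(output_dir: Path, command: str, planned_files: list[str]) -> Path:
    """Record the planned outputs before any of them exist (hypothetical helper)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    name = manifest_filename(command)
    payload = {
        "command": command,
        "created_at": datetime.now().isoformat(),
        # The manifest lists itself so that a later cleanup removes it too.
        "files": [*planned_files, name],
    }
    path = output_dir / name
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

Because the manifest is written from the planned file list rather than from what already exists on disk, a run that crashes halfway still leaves behind the full list of files to clean up.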
|
|
||
| ### Schema | ||
|
|
||
| ```json | ||
| { | ||
| "taxonopy_version": "0.3.0", | ||
| "command": "resolve", | ||
| "created_at": "2025-07-19T10:38:04.123456", | ||
| "input": "examples/input", | ||
| "cache_namespace": "~/.cache/taxonopy/resolve_v0.3.0_a3f9b2c1d4e5f678", | ||
| "files": [ | ||
| "sample.resolved.parquet", | ||
| "sample.unsolved.parquet", | ||
| "resolution_stats.json", | ||
| "taxonopy_resolve_manifest.json" | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| All paths in `files` are relative to the output directory. `cache_namespace` is `null` for `common-names`, which does not use an input-scoped cache. | ||
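As a sketch of how a consumer might load this schema, the snippet below reads a manifest and raises on corrupt or malformed content, which is the behavior the commit titles in this PR describe for `read_manifest`. The signature, the `ManifestError` type, and the error wording are assumptions; only the field names come from the schema above.

```python
import json
from pathlib import Path


class ManifestError(RuntimeError):
    """Hypothetical error type for an unreadable or malformed manifest."""


def read_manifest(path: Path) -> dict:
    """Load a manifest and check the fields used for cleanup (assumed signature)."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        raise ManifestError(f"Manifest {path} is unreadable or not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("files"), list):
        raise ManifestError(f"Manifest {path} has no 'files' list")
    # cache_namespace may legitimately be null (None), e.g. for common-names.
    data.setdefault("cache_namespace", None)
    return data
```

Raising instead of returning an empty result matters here: treating a corrupt manifest as "nothing to delete" would silently leave stale output in place.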