-- `--full-rerun` (resolve only) — clear the input-scoped cache and remove
-TaxonoPy-specific output files before rerunning. See [Reruns](reruns.md) for
-full details.
+- `--full-rerun` (resolve only) — clear the input-scoped cache and remove TaxonoPy-specific output files before rerunning. See [Reruns](reruns.md) for full details.

-If you change input files or want to force a clean run, use `--refresh-cache` or
-`--full-rerun`.
+If you change input files or want to force a clean run, use `--refresh-cache` or `--full-rerun`.
docs/user-guide/io/index.md (+1 -3: 1 addition & 3 deletions)
@@ -1,8 +1,6 @@
 # IO

-TaxonoPy accepts CSV or Parquet inputs with the same schema. Use the pages below
-for the exact input columns, the structure of resolved/unsolved outputs, and how
-the cache supports provenance and transparency throughout the resolution process.
+TaxonoPy accepts CSV or Parquet inputs with the same schema. Use the pages below for the exact input columns, the structure of resolved/unsolved outputs, and how the cache supports provenance and transparency throughout the resolution process.
-The output directory mirrors the input directory structure. Output format is
-controlled by the `--output-format` flag (`csv` or `parquet`).
-
-TaxonoPy also writes a manifest file to the output directory before creating
-any other files. This manifest lists every file the run intends to produce and
-is used by `--full-rerun` to clean up precisely. Each command writes its own
-manifest (`taxonopy_resolve_manifest.json` and
-`taxonopy_common_names_manifest.json` respectively) so they coexist safely if
-both commands share an output directory. See [Reruns](reruns.md) for details.
-
-## What’s Inside
-
-Each output row corresponds to one input record. Resolved entries contain the
-standardized taxonomy where available, while unsolved entries preserve the
-original input ranks. Both outputs include resolution metadata such as status
-and strategy information.
+The output directory mirrors the input directory structure. Output format is controlled by the `--output-format` flag (`csv` or `parquet`).
+
+TaxonoPy also writes a manifest file to the output directory before creating any other files. This manifest lists every file the run intends to produce and is used by `--full-rerun` to clean up precisely. Each command writes its own manifest (`taxonopy_resolve_manifest.json` and `taxonopy_common_names_manifest.json` respectively) so they coexist safely if both commands share an output directory. See [Reruns](reruns.md) for details.
+
+## What's Inside
+
+Each output row corresponds to one input record. Resolved entries contain the standardized taxonomy where available, while unsolved entries preserve the original input ranks. Both outputs include resolution metadata such as status and strategy information.

 Running through the sample resolution results in the following core files:
docs/user-guide/io/reruns.md (+11 -28: 11 additions & 28 deletions)
@@ -2,9 +2,7 @@

 ## The Guard

-TaxonoPy checks for existing output before processing. If a prior run is
-detected for the current input, it exits with a warning rather than silently
-overwriting:
+TaxonoPy checks for existing output before processing. If a prior run is detected for the current input, it exits with a warning rather than silently overwriting:

 ```
 Existing cache (...) and/or output (...) detected for this input.
@@ -13,16 +11,12 @@ Rerun with --full-rerun to replace them.

 Detection uses two signals:

-- the presence of a `taxonopy_resolve_manifest.json` in the output directory
-(written by any run using TaxonoPy v0.3.0 or later), or
-- `.resolved.*` files in the output directory root (legacy fallback for output
-produced by earlier versions).
+- the presence of a `taxonopy_resolve_manifest.json` in the output directory (written by any run using TaxonoPy v0.3.0 or later), or
+- `.resolved.*` files in the output directory root (legacy fallback for output produced by earlier versions).

 ## `--full-rerun`

-`--full-rerun` is the explicit escape hatch through the guard. It clears the
-input-scoped cache namespace and removes all TaxonoPy-specific files from the
-output directory before proceeding.
+`--full-rerun` is the explicit escape hatch through the guard. It clears the input-scoped cache namespace and removes all TaxonoPy-specific files from the output directory before proceeding.

 ```console
 taxonopy resolve \
@@ -33,23 +27,17 @@ taxonopy resolve \

 ### What it touches

-- **Cache**: the namespace scoped to the current command, TaxonoPy version, and
-input fingerprint. Other namespaces (different inputs, different versions) are
-not affected.
-- **Output files**: only the files listed in `taxonopy_resolve_manifest.json`.
-Any other files in the output directory are left untouched.
+- **Cache**: the namespace scoped to the current command, TaxonoPy version, and input fingerprint. Other namespaces (different inputs, different versions) are not affected.
+- **Output files**: only the files listed in `taxonopy_resolve_manifest.json`. Any other files in the output directory are left untouched.

 ### What it does not touch

-- Files not listed in the manifest — including any non-TaxonoPy files you have
-placed in the output directory.
+- Files not listed in the manifest — including any non-TaxonoPy files you have placed in the output directory.
 - Cache namespaces from other runs.

 ### No manifest found

-If `--full-rerun` is used but no manifest is present (e.g. output from a
-pre-v0.3.0 run, or a manually populated directory), TaxonoPy logs a warning
-and proceeds without removing any files:
+If `--full-rerun` is used but no manifest is present (e.g. output from a pre-v0.3.0 run, or a manually populated directory), TaxonoPy logs a warning and proceeds without removing any files:

 ```
 --full-rerun: no manifest found in <output-dir>; no output files were removed.
@@ -59,13 +47,9 @@ The run then writes fresh output and a new manifest.

 ## The Manifest

-Every TaxonoPy run writes a manifest file to the output directory **before**
-creating any output. This means interrupted runs leave a complete record of
-what should be cleaned up — `--full-rerun` deletes exactly those files and
-nothing else.
+Every TaxonoPy run writes a manifest file to the output directory **before** creating any output. This means interrupted runs leave a complete record of what should be cleaned up — `--full-rerun` deletes exactly those files and nothing else.

-Manifest files are command-scoped so they coexist safely if multiple commands
-share an output directory:
+Manifest files are command-scoped so they coexist safely if multiple commands share an output directory:

 | Command | Manifest file |
 |---|---|
@@ -90,5 +74,4 @@ share an output directory:
 }
 ```

-All paths in `files` are relative to the output directory. `cache_namespace`
-is `null` for `common-names`, which does not use an input-scoped cache.
+All paths in `files` are relative to the output directory. `cache_namespace` is `null` for `common-names`, which does not use an input-scoped cache.
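
The guard and manifest behaviour that the reruns.md text describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not TaxonoPy's actual implementation: the function names `write_manifest`, `prior_run_detected`, and `remove_listed_outputs` are hypothetical, the manifest is assumed to carry only the `files` and `cache_namespace` keys mentioned in the docs (the real schema may hold more), and `.resolved.*` is interpreted here as the glob `*.resolved.*`.

```python
import json
import logging
from pathlib import Path
from typing import List, Optional

logger = logging.getLogger(__name__)

# File name taken from the docs; every function below is a hypothetical sketch.
RESOLVE_MANIFEST = "taxonopy_resolve_manifest.json"


def write_manifest(output_dir: Path, files: List[str], cache_namespace: Optional[str]) -> Path:
    """Record the intended outputs before any of them is written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = output_dir / RESOLVE_MANIFEST
    # Assumed schema: only the two keys the docs mention.
    payload = {"files": files, "cache_namespace": cache_namespace}
    manifest_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return manifest_path


def prior_run_detected(output_dir: Path) -> bool:
    """Guard check: a manifest, or legacy resolved files in the directory root."""
    if not output_dir.is_dir():
        return False
    if (output_dir / RESOLVE_MANIFEST).is_file():
        return True
    # Assumption: ".resolved.*" means any root-level file matching *.resolved.*
    return any(p.is_file() for p in output_dir.glob("*.resolved.*"))


def remove_listed_outputs(output_dir: Path) -> List[str]:
    """Delete only the files named in the manifest; return the paths removed."""
    manifest_path = output_dir / RESOLVE_MANIFEST
    if not manifest_path.is_file():
        logger.warning(
            "--full-rerun: no manifest found in %s; no output files were removed.",
            output_dir,
        )
        return []
    listed = json.loads(manifest_path.read_text(encoding="utf-8"))["files"]
    removed = []
    for rel in listed:
        target = output_dir / rel  # paths in "files" are relative to output_dir
        if target.is_file():  # an interrupted run may not have written every file
            target.unlink()
            removed.append(rel)
    return removed
```

Because the manifest is written first, a run that is interrupted halfway still leaves a complete list of what it meant to produce, and files that are not in that list (for example, notes a user dropped into the output directory) are never considered for deletion.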
src/taxonopy/manifest.py (+5 -18: 5 additions & 18 deletions)
@@ -1,12 +1,8 @@
 """Manifest tracking for TaxonoPy output files.

-Each TaxonoPy command writes a manifest file to its output directory listing
-every file it intends to produce. The manifest is written before any output
-files are created, so interrupted runs leave a complete record of what should
-be cleaned up on the next --full-rerun.
+Each TaxonoPy command writes a manifest file to its output directory listing every file it intends to produce. The manifest is written before any output files are created, so interrupted runs leave a complete record of what should be cleaned up on the next --full-rerun.

-Manifest files are command-scoped to avoid collisions when multiple commands
-share an output directory.
+Manifest files are command-scoped to avoid collisions when multiple commands share an output directory.
 """Return the full list of files a common-names run intends to write.

-Output files preserve the input directory structure, so paths are simply
-the relative paths of the annotation files. No naming convention is
-encoded here.
+Output files preserve the input directory structure, so paths are simply the relative paths of the annotation files. No naming convention is encoded here.
-"Cannot read manifest at '%s': %s -- automated rerun cleanup is not possible. "
-"To proceed: fix or delete this file and remove previous TaxonoPy output files "
-"from this output directory manually, or specify a new output directory with --output-dir.",
-manifest_path,
-exc,
-)
+logger.error("Cannot read manifest at '%s': %s -- automated rerun cleanup is not possible. To proceed: fix or delete this file and remove previous TaxonoPy output files from this output directory manually, or specify a new output directory with --output-dir.", manifest_path, exc)
 """Return absolute output file path(s) for a single input file.

-This is the single source of truth for TaxonoPy output file naming.
-Both the generate functions and compute_output_paths use it so that
-naming convention changes need only be made here.
+This is the single source of truth for TaxonoPy output file naming. Both the generate functions and compute_output_paths use it so that naming convention changes need only be made here.

 Args:
 input_file: Absolute path to the input file.
@@ -222,9 +220,7 @@ def compute_output_paths(
 ) -> List[str]:
 """Return intended output file paths (relative to output_dir) for a resolve run.

-Used by the manifest system to record files before they are written.
-Does not include fixed outputs such as resolution_stats.json or the
-manifest file itself — callers are responsible for appending those.
+Used by the manifest system to record files before they are written. Does not include fixed outputs such as resolution_stats.json or the manifest file itself — callers are responsible for appending those.

 Args:
 input_path: The --input argument (file or directory).
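
To make the docstrings above concrete, here is a rough sketch of how intended output paths could be derived relative to the output directory while mirroring the input tree. It is not the code in `manifest.py`: the names `sketch_output_names` and `sketch_compute_output_paths` are hypothetical, and the naming rule `<stem>.resolved.<fmt>` / `<stem>.unsolved.<fmt>` is an assumption based only on the docs' mention of resolved and unsolved outputs and `.resolved.*` files.

```python
from pathlib import Path
from typing import List


def sketch_output_names(input_file: Path, output_format: str) -> List[str]:
    """Single place for the (assumed) naming rule, so it only changes here."""
    stem = input_file.stem
    return [f"{stem}.resolved.{output_format}", f"{stem}.unsolved.{output_format}"]


def sketch_compute_output_paths(input_path: Path, output_format: str) -> List[str]:
    """Return intended outputs relative to the output dir, mirroring the input tree."""
    if input_path.is_file():
        # A single input file maps to outputs in the output directory root.
        inputs = [(input_path, Path("."))]
    else:
        # For a directory, keep each file's subdirectory relative to the input root.
        inputs = [
            (f, f.parent.relative_to(input_path))
            for f in sorted(input_path.rglob("*"))
            if f.is_file() and f.suffix in {".csv", ".parquet"}
        ]
    paths = []
    for input_file, rel_dir in inputs:
        for name in sketch_output_names(input_file, output_format):
            paths.append((rel_dir / name).as_posix())
    return paths
```

As the docstring notes, fixed outputs such as `resolution_stats.json` and the manifest file itself are not produced by a function like this; a caller would append them before writing the manifest.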