Commit 273edcb

thompsonmj and claude committed
Remove hard wrapping from all files touched by this PR
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 75691a8 commit 273edcb

6 files changed

Lines changed: 29 additions & 78 deletions

docs/user-guide/io/cache.md

Lines changed: 3 additions & 7 deletions
Original file line number / Diff line number / Diff line change
@@ -1,7 +1,6 @@
11
# Cache
22

3-
TaxonoPy caches intermediate results (like parsed inputs and grouped entries) to
4-
speed up repeated runs on the same dataset.
3+
TaxonoPy caches intermediate results (like parsed inputs and grouped entries) to speed up repeated runs on the same dataset.
54

65
## Location
76

@@ -40,9 +39,6 @@ This keeps caches isolated across datasets and releases.
4039
- `--cache-stats` — show cache statistics and exit.
4140
- `--clear-cache` — remove cached objects.
4241
- `--refresh-cache` (resolve only) — ignore cached parse/group results.
43-
- `--full-rerun` (resolve only) — clear the input-scoped cache and remove
44-
TaxonoPy-specific output files before rerunning. See [Reruns](reruns.md) for
45-
full details.
42+
- `--full-rerun` (resolve only) — clear the input-scoped cache and remove TaxonoPy-specific output files before rerunning. See [Reruns](reruns.md) for full details.
4643

47-
If you change input files or want to force a clean run, use `--refresh-cache` or
48-
`--full-rerun`.
44+
If you change input files or want to force a clean run, use `--refresh-cache` or `--full-rerun`.
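
The docs in this commit describe the cache namespace as scoped to the command, the TaxonoPy version, and an input fingerprint. As a rough illustration of why changing any of the three yields an isolated cache, here is a minimal sketch. The function names, the content-hash recipe, and the namespace string layout are all hypothetical; the diff does not show how TaxonoPy actually computes them.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def input_fingerprint(files: Iterable[Path]) -> str:
    """Hash file names and contents (hypothetical recipe, not TaxonoPy's)."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()[:16]


def cache_namespace(command: str, version: str, fingerprint: str) -> str:
    """Combine the three scoping components into one key (assumed layout)."""
    return f"{command}_v{version}_{fingerprint}"
```

Under this scheme, editing an input file or upgrading the tool produces a different namespace, so stale entries are never read, and clearing one namespace leaves the others alone.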

docs/user-guide/io/index.md

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,6 @@
11
# IO
22

3-
TaxonoPy accepts CSV or Parquet inputs with the same schema. Use the pages below
4-
for the exact input columns, the structure of resolved/unsolved outputs, and how
5-
the cache supports provenance and transparency throughout the resolution process.
3+
TaxonoPy accepts CSV or Parquet inputs with the same schema. Use the pages below for the exact input columns, the structure of resolved/unsolved outputs, and how the cache supports provenance and transparency throughout the resolution process.
64

75
- [Input](input.md)
86
- [Output](output.md)

docs/user-guide/io/output.md

Lines changed: 7 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -5,22 +5,13 @@ When you run `taxonopy resolve`, TaxonoPy writes two outputs for each input file
55
- **Resolved**: `<input_name>.resolved.<csv|parquet>`
66
- **Unsolved**: `<input_name>.unsolved.<csv|parquet>`
77

8-
The output directory mirrors the input directory structure. Output format is
9-
controlled by the `--output-format` flag (`csv` or `parquet`).
10-
11-
TaxonoPy also writes a manifest file to the output directory before creating
12-
any other files. This manifest lists every file the run intends to produce and
13-
is used by `--full-rerun` to clean up precisely. Each command writes its own
14-
manifest (`taxonopy_resolve_manifest.json` and
15-
`taxonopy_common_names_manifest.json` respectively) so they coexist safely if
16-
both commands share an output directory. See [Reruns](reruns.md) for details.
17-
18-
## What’s Inside
19-
20-
Each output row corresponds to one input record. Resolved entries contain the
21-
standardized taxonomy where available, while unsolved entries preserve the
22-
original input ranks. Both outputs include resolution metadata such as status
23-
and strategy information.
8+
The output directory mirrors the input directory structure. Output format is controlled by the `--output-format` flag (`csv` or `parquet`).
9+
10+
TaxonoPy also writes a manifest file to the output directory before creating any other files. This manifest lists every file the run intends to produce and is used by `--full-rerun` to clean up precisely. Each command writes its own manifest (`taxonopy_resolve_manifest.json` and `taxonopy_common_names_manifest.json` respectively) so they coexist safely if both commands share an output directory. See [Reruns](reruns.md) for details.
11+
12+
## What's Inside
13+
14+
Each output row corresponds to one input record. Resolved entries contain the standardized taxonomy where available, while unsolved entries preserve the original input ranks. Both outputs include resolution metadata such as status and strategy information.
2415

2516
Running through the sample resolution results in the following core files:
2617

docs/user-guide/io/reruns.md

Lines changed: 11 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,7 @@
22

33
## The Guard
44

5-
TaxonoPy checks for existing output before processing. If a prior run is
6-
detected for the current input, it exits with a warning rather than silently
7-
overwriting:
5+
TaxonoPy checks for existing output before processing. If a prior run is detected for the current input, it exits with a warning rather than silently overwriting:
86

97
```
108
Existing cache (...) and/or output (...) detected for this input.
@@ -13,16 +11,12 @@ Rerun with --full-rerun to replace them.
1311

1412
Detection uses two signals:
1513

16-
- the presence of a `taxonopy_resolve_manifest.json` in the output directory
17-
(written by any run using TaxonoPy v0.3.0 or later), or
18-
- `.resolved.*` files in the output directory root (legacy fallback for output
19-
produced by earlier versions).
14+
- the presence of a `taxonopy_resolve_manifest.json` in the output directory (written by any run using TaxonoPy v0.3.0 or later), or
15+
- `.resolved.*` files in the output directory root (legacy fallback for output produced by earlier versions).
2016

2117
## `--full-rerun`
2218

23-
`--full-rerun` is the explicit escape hatch through the guard. It clears the
24-
input-scoped cache namespace and removes all TaxonoPy-specific files from the
25-
output directory before proceeding.
19+
`--full-rerun` is the explicit escape hatch through the guard. It clears the input-scoped cache namespace and removes all TaxonoPy-specific files from the output directory before proceeding.
2620

2721
```console
2822
taxonopy resolve \
@@ -33,23 +27,17 @@ taxonopy resolve \
3327

3428
### What it touches
3529

36-
- **Cache**: the namespace scoped to the current command, TaxonoPy version, and
37-
input fingerprint. Other namespaces (different inputs, different versions) are
38-
not affected.
39-
- **Output files**: only the files listed in `taxonopy_resolve_manifest.json`.
40-
Any other files in the output directory are left untouched.
30+
- **Cache**: the namespace scoped to the current command, TaxonoPy version, and input fingerprint. Other namespaces (different inputs, different versions) are not affected.
31+
- **Output files**: only the files listed in `taxonopy_resolve_manifest.json`. Any other files in the output directory are left untouched.
4132

4233
### What it does not touch
4334

44-
- Files not listed in the manifest — including any non-TaxonoPy files you have
45-
placed in the output directory.
35+
- Files not listed in the manifest — including any non-TaxonoPy files you have placed in the output directory.
4636
- Cache namespaces from other runs.
4737

4838
### No manifest found
4939

50-
If `--full-rerun` is used but no manifest is present (e.g. output from a
51-
pre-v0.3.0 run, or a manually populated directory), TaxonoPy logs a warning
52-
and proceeds without removing any files:
40+
If `--full-rerun` is used but no manifest is present (e.g. output from a pre-v0.3.0 run, or a manually populated directory), TaxonoPy logs a warning and proceeds without removing any files:
5341

5442
```
5543
--full-rerun: no manifest found in <output-dir>; no output files were removed.
@@ -59,13 +47,9 @@ The run then writes fresh output and a new manifest.
5947

6048
## The Manifest
6149

62-
Every TaxonoPy run writes a manifest file to the output directory **before**
63-
creating any output. This means interrupted runs leave a complete record of
64-
what should be cleaned up — `--full-rerun` deletes exactly those files and
65-
nothing else.
50+
Every TaxonoPy run writes a manifest file to the output directory **before** creating any output. This means interrupted runs leave a complete record of what should be cleaned up — `--full-rerun` deletes exactly those files and nothing else.
6651

67-
Manifest files are command-scoped so they coexist safely if multiple commands
68-
share an output directory:
52+
Manifest files are command-scoped so they coexist safely if multiple commands share an output directory:
6953

7054
| Command | Manifest file |
7155
|---|---|
@@ -90,5 +74,4 @@ share an output directory:
9074
}
9175
```
9276

93-
All paths in `files` are relative to the output directory. `cache_namespace`
94-
is `null` for `common-names`, which does not use an input-scoped cache.
77+
All paths in `files` are relative to the output directory. `cache_namespace` is `null` for `common-names`, which does not use an input-scoped cache.
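
The guard and the manifest-driven cleanup described in reruns.md can be sketched together. This is a simplified illustration, not the code in `src/taxonopy/manifest.py`: the manifest file name and the `files` key come from the docs above, while the function names and the reduced JSON layout (only `files`) are assumptions.

```python
import json
from pathlib import Path
from typing import List

# File name taken from the docs above; the JSON layout is reduced to the
# single "files" key and is otherwise an assumption.
MANIFEST_NAME = "taxonopy_resolve_manifest.json"


def prior_run_detected(output_dir: Path) -> bool:
    """Mirror the two detection signals: a manifest, or legacy .resolved.* files."""
    if (output_dir / MANIFEST_NAME).is_file():
        return True
    return any(p.is_file() for p in output_dir.glob("*.resolved.*"))


def write_manifest(output_dir: Path, intended_files: List[str]) -> None:
    """Record every intended output before any output file is created."""
    output_dir.mkdir(parents=True, exist_ok=True)
    payload = {"files": intended_files}
    (output_dir / MANIFEST_NAME).write_text(json.dumps(payload, indent=2))


def remove_listed_outputs(output_dir: Path) -> List[str]:
    """Delete only manifest-listed files; return the relative paths removed."""
    manifest_path = output_dir / MANIFEST_NAME
    if not manifest_path.is_file():
        return []  # no manifest: nothing is removed
    listed = json.loads(manifest_path.read_text())["files"]
    removed = []
    for rel in listed:
        target = output_dir / rel
        if target.is_file():
            target.unlink()
            removed.append(rel)
    return removed
```

Because the manifest is written first, an interrupted run still leaves a full list of intended files, and files that were never created are simply skipped during cleanup.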

src/taxonopy/manifest.py

Lines changed: 5 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,8 @@
11
"""Manifest tracking for TaxonoPy output files.
22
3-
Each TaxonoPy command writes a manifest file to its output directory listing
4-
every file it intends to produce. The manifest is written before any output
5-
files are created, so interrupted runs leave a complete record of what should
6-
be cleaned up on the next --full-rerun.
3+
Each TaxonoPy command writes a manifest file to its output directory listing every file it intends to produce. The manifest is written before any output files are created, so interrupted runs leave a complete record of what should be cleaned up on the next --full-rerun.
74
8-
Manifest files are command-scoped to avoid collisions when multiple commands
9-
share an output directory.
5+
Manifest files are command-scoped to avoid collisions when multiple commands share an output directory.
106
"""
117

128
import json
@@ -37,8 +33,7 @@ def get_intended_files_for_resolve(
3733
) -> List[str]:
3834
"""Return the full list of files a resolve run intends to write.
3935
40-
Delegates output path naming to compute_output_paths (single source of
41-
truth in output_manager), then appends the fixed outputs.
36+
Delegates output path naming to compute_output_paths (single source of truth in output_manager), then appends the fixed outputs.
4237
4338
Args:
4439
input_path: The --input argument (file or directory).
@@ -63,9 +58,7 @@ def get_intended_files_for_common_names(
6358
) -> List[str]:
6459
"""Return the full list of files a common-names run intends to write.
6560
66-
Output files preserve the input directory structure, so paths are simply
67-
the relative paths of the annotation files. No naming convention is
68-
encoded here.
61+
Output files preserve the input directory structure, so paths are simply the relative paths of the annotation files. No naming convention is encoded here.
6962
7063
Args:
7164
annotation_dir: The --resolved-dir argument.
@@ -130,13 +123,7 @@ def read_manifest(output_dir: str, command: str) -> Optional[dict]:
130123
try:
131124
return json.loads(manifest_path.read_text())
132125
except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
133-
logger.error(
134-
"Cannot read manifest at '%s': %s -- automated rerun cleanup is not possible. "
135-
"To proceed: fix or delete this file and remove previous TaxonoPy output files "
136-
"from this output directory manually, or specify a new output directory with --output-dir.",
137-
manifest_path,
138-
exc,
139-
)
126+
logger.error("Cannot read manifest at '%s': %s -- automated rerun cleanup is not possible. To proceed: fix or delete this file and remove previous TaxonoPy output files from this output directory manually, or specify a new output directory with --output-dir.", manifest_path, exc)
140127
raise
141128

142129
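
For readers without the full file, the shape of `read_manifest` visible in the hunk above can be approximated as follows. This is a self-contained sketch under stated assumptions: `manifest_filename` is a hypothetical helper inferred from the two manifest names in the docs, returning `None` for a missing file is inferred from the `Optional[dict]` return type, and the log message is shortened.

```python
import json
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def manifest_filename(command: str) -> str:
    """Hypothetical helper: map a command name to its manifest file name."""
    return f"taxonopy_{command.replace('-', '_')}_manifest.json"


def read_manifest_sketch(output_dir: str, command: str) -> Optional[dict]:
    """Return the parsed manifest, or None when no manifest exists (assumed)."""
    manifest_path = Path(output_dir) / manifest_filename(command)
    if not manifest_path.exists():
        return None
    try:
        return json.loads(manifest_path.read_text())
    except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Log, then re-raise so a corrupt manifest stops the run
        # instead of triggering a partial cleanup.
        logger.error("Cannot read manifest at '%s': %s", manifest_path, exc)
        raise
```

Passing `manifest_path` and `exc` as arguments rather than formatting them into the string keeps interpolation lazy, which matches the `%s` style used in the diff.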

src/taxonopy/output_manager.py

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -187,9 +187,7 @@ def _resolve_output_paths_for_input(
187187
) -> Tuple[str, ...]:
188188
"""Return absolute output file path(s) for a single input file.
189189
190-
This is the single source of truth for TaxonoPy output file naming.
191-
Both the generate functions and compute_output_paths use it so that
192-
naming convention changes need only be made here.
190+
This is the single source of truth for TaxonoPy output file naming. Both the generate functions and compute_output_paths use it so that naming convention changes need only be made here.
193191
194192
Args:
195193
input_file: Absolute path to the input file.
@@ -222,9 +220,7 @@ def compute_output_paths(
222220
) -> List[str]:
223221
"""Return intended output file paths (relative to output_dir) for a resolve run.
224222
225-
Used by the manifest system to record files before they are written.
226-
Does not include fixed outputs such as resolution_stats.json or the
227-
manifest file itself — callers are responsible for appending those.
223+
Used by the manifest system to record files before they are written. Does not include fixed outputs such as resolution_stats.json or the manifest file itself — callers are responsible for appending those.
228224
229225
Args:
230226
input_path: The --input argument (file or directory).
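
The naming rule that `compute_output_paths` centralizes (`<input_name>.resolved.<csv|parquet>` and `<input_name>.unsolved.<csv|parquet>`, mirroring the input directory structure) can be illustrated with a small sketch. The function name and signature here are hypothetical, and treating `<input_name>` as the file stem is an assumption; the real implementation is only partly visible in this diff.

```python
from pathlib import Path
from typing import List


def sketch_output_paths(
    input_root: Path, input_files: List[Path], output_format: str
) -> List[str]:
    """Return output paths relative to the output directory (illustrative only)."""
    paths = []
    for input_file in input_files:
        # Mirror the input's position under input_root in the output tree.
        rel_parent = input_file.relative_to(input_root).parent
        for kind in ("resolved", "unsolved"):
            name = f"{input_file.stem}.{kind}.{output_format}"
            paths.append((rel_parent / name).as_posix())
    return paths
```

A caller building a manifest would then append fixed outputs such as `resolution_stats.json` to this list, as the docstring above notes.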
