Draft
99 commits
124f369
Set class correctly
ppinchuk May 8, 2026
fb99095
Filter out empty docs
ppinchuk May 8, 2026
59619ae
Merge remote-tracking branch 'origin/main' into pp/split_doc_collection
ppinchuk May 19, 2026
50dddef
`find_jurisdiction_website` can now run without validation
ppinchuk May 19, 2026
38c35a6
Checking for correct jurisdiction can now be disabled via doc attrs
ppinchuk May 19, 2026
cc9e40c
Add reset function for progress bar
ppinchuk May 20, 2026
bbff3a6
MInor update
ppinchuk May 21, 2026
8512367
More flexible `load_config`
ppinchuk May 21, 2026
4cd67ac
generalized `_move_file` a little
ppinchuk May 21, 2026
6bfd39c
Minor update
ppinchuk May 21, 2026
4ca787f
Add `TempFileCacheCopier`
ppinchuk May 22, 2026
2c86554
No extra key top-level
ppinchuk May 22, 2026
80d436f
MInor logic update
ppinchuk May 22, 2026
8b75bac
Fix bug
ppinchuk May 22, 2026
8e5428c
Func returns docs
ppinchuk May 22, 2026
3bf14ce
Add known website
ppinchuk May 22, 2026
84aaf92
Directories now have a collect-only option
ppinchuk May 22, 2026
c3607ee
Add load function for collected docs
ppinchuk May 22, 2026
9fb3cb5
Add arg
ppinchuk May 22, 2026
b0598e3
Add `_load_docs_from_collection_info`
ppinchuk May 22, 2026
710e475
Add `_persist_doc`
ppinchuk May 22, 2026
9006af2
Add `_collection_doc_key`
ppinchuk May 22, 2026
36037a8
Add `_write_collection_manifest`
ppinchuk May 22, 2026
a6fe5a5
Flexibility
ppinchuk May 22, 2026
37c0353
Add `compile_collection_summary_message`
ppinchuk May 22, 2026
5563b4c
Minor
ppinchuk May 22, 2026
003b157
Add `ParsedFileWriter`
ppinchuk May 22, 2026
8b71af5
Add function to namespace
ppinchuk May 22, 2026
4acff6e
Update priority order
ppinchuk May 22, 2026
5d4b4d9
MVP of collect/extract split
ppinchuk May 22, 2026
b063564
Add common cli utils
ppinchuk May 22, 2026
6d1c58c
Add new CLI commands
ppinchuk May 22, 2026
d472015
Process now uses helper
ppinchuk May 22, 2026
a09ddc7
Add new commands
ppinchuk May 22, 2026
b43c775
Update docs
ppinchuk May 22, 2026
a2045d7
Add basic cli test
ppinchuk May 22, 2026
c61cb93
Minor update
ppinchuk May 22, 2026
0e58bda
Better logging
ppinchuk May 22, 2026
779f889
Bump min num chunks to process
ppinchuk May 22, 2026
b37ca0a
Add tests
ppinchuk May 23, 2026
e63215d
Fix tests
ppinchuk May 23, 2026
b7156b6
WIP
ppinchuk May 23, 2026
fee8b5d
Merge remote-tracking branch 'origin/main' into pp/split_doc_collection
ppinchuk May 23, 2026
8d4f0d5
Fix up cli
ppinchuk May 23, 2026
6439af6
Fix nonlocal bug
ppinchuk May 23, 2026
47a3af8
Add `./` to relative paths
ppinchuk May 26, 2026
d0de2a4
Add option to make collected paths relative
ppinchuk May 26, 2026
5c623b5
Use enum
ppinchuk May 26, 2026
c445ede
Minor formatting
ppinchuk May 26, 2026
7b14ec4
Add data classes module
ppinchuk May 27, 2026
2e8ffe8
Add pipeline init
ppinchuk May 27, 2026
0907c9f
Add main compass run function
ppinchuk May 27, 2026
797a217
Add `DocumentExtractionWorkflow`
ppinchuk May 27, 2026
86d930a
Minor update
ppinchuk May 27, 2026
84d37e5
Minor updates
ppinchuk May 27, 2026
f242836
Minor updates
ppinchuk May 27, 2026
20eceab
Add `PipelineRuntime`
ppinchuk May 27, 2026
da20e09
Add `SingleJurisdictionRun`
ppinchuk May 27, 2026
cd41c04
Add `DocumentCollection`
ppinchuk May 27, 2026
711d5dd
Add collection steps
ppinchuk May 27, 2026
b4b53b0
Add `DocumentDeDuplicator`
ppinchuk May 27, 2026
885b041
Add persistence functions
ppinchuk May 27, 2026
a17dbae
Formatting
ppinchuk May 27, 2026
b2d52da
Formatting
ppinchuk May 27, 2026
5a8f64b
Bug fixes
ppinchuk May 27, 2026
1553c75
Wire CLI
ppinchuk May 27, 2026
a49c198
Update arg
ppinchuk May 27, 2026
4e73fef
Add and use enum
ppinchuk May 27, 2026
0f74f38
Move class
ppinchuk May 27, 2026
16664d0
Remove unused file
ppinchuk May 27, 2026
09c3f31
Delete unused module
ppinchuk May 27, 2026
516d0dd
fix function call
ppinchuk May 27, 2026
0d3ad9e
Update docs
ppinchuk May 27, 2026
196c0f3
Move file
ppinchuk May 27, 2026
8a314b8
Minor formatting
ppinchuk May 27, 2026
cafbdf5
Fix tests
ppinchuk May 27, 2026
24a3942
Formatting
ppinchuk May 27, 2026
c2801c8
Formatting
ppinchuk May 27, 2026
0194600
Formatting
ppinchuk May 27, 2026
76f9239
Write pieces of the collection manifest as you go
ppinchuk May 27, 2026
8ca418e
Allow re-building from shards
ppinchuk May 27, 2026
22da35f
Make use of `jurisdiction_dbs`
ppinchuk May 27, 2026
bc19ec9
Add `GenericFuncRunner`
ppinchuk May 27, 2026
425df61
Include `GenericFuncRunner` service
ppinchuk May 27, 2026
9d89483
Don't hardcode shard directory
ppinchuk May 27, 2026
2dadcec
Use `GenericFuncRunner` to run I/O ops in threaded pool
ppinchuk May 27, 2026
0190e79
Use new funcs
ppinchuk May 27, 2026
3c8c3da
Fix tests
ppinchuk May 27, 2026
ae0edf8
Fix windows paths
ppinchuk May 27, 2026
f8af8cf
More cross-platform compatibility
ppinchuk May 27, 2026
488bf05
Fix docs
ppinchuk May 27, 2026
3d3de9d
Merge remote-tracking branch 'origin/main' into pp/split_doc_collection
ppinchuk May 29, 2026
ac81b4e
Minor updates
ppinchuk May 29, 2026
5a795cf
Add `resolve_plugin`
ppinchuk May 29, 2026
fc5e2bf
Merge remote-tracking branch 'origin/main' into pp/split_doc_collection
ppinchuk May 30, 2026
0d47e7b
print report if requested
ppinchuk May 30, 2026
5585cf7
Bump elm dep
ppinchuk May 30, 2026
ba47a2f
Merge remote-tracking branch 'origin/main' into pp/split_doc_collection
ppinchuk May 30, 2026
aec0e7c
Update lockfile
ppinchuk May 30, 2026
62 changes: 62 additions & 0 deletions compass/_cli/collect.py
@@ -0,0 +1,62 @@
"""COMPASS CLI collect subcommand"""

import click

from compass._cli.common import run_async_command, OUT_DIR_POLICY_CHOICES
from compass.plugin import create_schema_based_one_shot_extraction_plugin
from compass.pipeline import CollectionRequest
from compass.utilities.io import load_config


@click.command
@click.option(
"--config",
"-c",
required=True,
type=click.Path(exists=True),
help="Path to a collection configuration JSON or JSON5 file. This file "
"should contain any/all the arguments to pass to "
":class:`~compass.pipeline.data_classes.CollectionRequest`.",
)
@click.option(
"-v",
"--verbose",
count=True,
help="Show logs on the terminal.",
)
@click.option(
"-np",
"--no-progress",
is_flag=True,
help="Flag to hide progress bars during collection.",
)
@click.option(
"--plugin",
"-p",
required=False,
default=None,
help="One-shot plugin configuration to add to COMPASS before collection",
)
@click.option(
"--out-dir-exists",
"-o",
required=False,
default=None,
type=click.Choice(OUT_DIR_POLICY_CHOICES, case_sensitive=False),
help="How to handle an existing output directory."
" Choices: fail, increment, overwrite, prompt."
" If omitted, prompts interactively when running in a terminal,"
" or fails when running non-interactively (e.g. CI).",
)
def collect(config, verbose, no_progress, plugin, out_dir_exists):
"""Collect ordinance documents for a list of jurisdictions"""
config = load_config(config)

if plugin is not None:
create_schema_based_one_shot_extraction_plugin(
config=plugin, tech=config["tech"]
)

run_async_command(
config, CollectionRequest, verbose, no_progress, out_dir_exists
)
189 changes: 178 additions & 11 deletions compass/_cli/common.py
@@ -1,27 +1,99 @@
"""Shared helpers for COMPASS CLI subcommands"""

import sys
import shutil
import asyncio
import logging
import warnings
import contextlib
import multiprocessing
from pathlib import Path

import click
from rich.console import Console
from rich.live import Live
from rich.logging import RichHandler
from rich.theme import Theme

from compass.pb import COMPASS_PB
from compass.utilities.logs import AddLocationFilter
from compass.pipeline.coordinator import run_compass


def setup_cli_logging(console, verbosity_level, log_level="INFO"):
"""Attach a Rich log handler to selected libraries
OUT_DIR_POLICY_CHOICES = ["fail", "increment", "overwrite", "prompt"]


def run_async_command(
config, request_class, verbose, no_progress, out_dir_exists=None
):
"""Run a COMPASS async command with shared CLI behavior

Parameters
----------
console : rich.console.Console
Console instance used by the Rich log handler.
verbosity_level : int
Number of ``-v`` flags supplied on the command line. Each
increment opts an additional set of libraries into terminal
logging.
log_level : str, optional
Log level applied to each attached library logger and handler.
By default, ``"INFO"``.
config : dict
Configuration dictionary passed as keyword arguments to
`request_class`. This mapping must include an ``"out_dir"`` entry,
which is resolved according to `out_dir_exists` before command
execution.
request_class : callable
The COMPASS request class to instantiate and pass to the command
function, e.g.
:class:`~compass.pipeline.data_classes.CollectionRequest`.
verbose : int
CLI verbosity level controlling which library loggers are shown
in the console. Higher values enable logs from more underlying
libraries.
no_progress : bool
Option to disable the Rich live progress display. If ``True``,
the command is executed directly without attaching COMPASS
progress bars.
out_dir_exists : str, optional
Policy controlling how an existing output directory should be
handled. Supported values are ``"fail"``, ``"increment"``,
``"overwrite"``, and ``"prompt"``. If ``None``, the policy is
chosen automatically based on whether the session is
interactive. By default, ``None``.
"""
custom_theme = Theme({"logging.level.trace": "rgb(94,79,162)"})
console = Console(theme=custom_theme)

setup_cli_logging(
console, verbose, log_level=config.get("log_level", "INFO")
)

config["out_dir"] = _resolve_out_dir_conflict(
config["out_dir"], out_dir_exists
)

with contextlib.suppress(RuntimeError):
multiprocessing.set_start_method("spawn")

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

request = request_class(**config)
if no_progress:
loop.run_until_complete(run_compass(request))
return

warnings.filterwarnings("ignore")

COMPASS_PB.console = console
live_display = Live(
COMPASS_PB.group,
console=console,
refresh_per_second=20,
transient=True,
)
with live_display:
run_msg = loop.run_until_complete(run_compass(request))

console.print(run_msg)
COMPASS_PB.console = None
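The event-loop handling in `run_async_command` can be looked at in isolation. The following is a minimal sketch, not the COMPASS implementation: `FakeRequest` and `fake_run` are hypothetical stand-ins for a request class and for `run_compass`, and the `"wind"` default is an invented value. Unlike the diff, the sketch also closes the loop when it is done, which is a choice made here and not something the PR states.

```python
import asyncio
import contextlib
import multiprocessing
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a COMPASS request class."""

    out_dir: str
    tech: str = "wind"  # invented default, for illustration only


async def fake_run(request):
    """Hypothetical stand-in for ``run_compass``; returns a summary."""
    await asyncio.sleep(0)
    return f"Finished {request.tech} run in {request.out_dir}"


def run_sync(config, request_class):
    """Build a request from a config dict and run it to completion."""
    # set_start_method raises RuntimeError when the start method has
    # already been fixed, so that error is deliberately suppressed.
    with contextlib.suppress(RuntimeError):
        multiprocessing.set_start_method("spawn")

    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        # Config keys become keyword arguments of the request class.
        request = request_class(**config)
        return loop.run_until_complete(fake_run(request))
    finally:
        loop.close()
        asyncio.set_event_loop(None)


message = run_sync({"out_dir": "./outputs"}, FakeRequest)
print(message)
```

Because the config dict is unpacked directly into the request class, an unexpected key in the config file would surface as a `TypeError` at construction time in this sketch.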


def setup_cli_logging(console, verbosity_level, log_level="INFO"):
"""[NOT PUBLIC API] Set up logging for CLI"""
libs = []
if verbosity_level >= 1:
libs.append("compass")
@@ -49,3 +121,98 @@ def setup_cli_logging(console, verbosity_level, log_level="INFO"):
handler.addFilter(AddLocationFilter())
logger.addHandler(handler)
logger.setLevel(log_level)


def _resolve_out_dir_conflict(out_dir, policy):
"""Handle existing output directory using the selected policy"""
out_dir = Path(out_dir)
policy = _resolve_out_dir_policy(policy)

if not out_dir.exists() or policy == "fail":
return out_dir

if policy == "increment":
new_out_dir = _next_versioned_directory(out_dir)
click.echo(
"Output directory exists. "
f"Using incremented directory: {new_out_dir!s}"
)
return new_out_dir

if policy == "overwrite":
click.echo(f"Overwriting existing output directory: {out_dir!s}")
shutil.rmtree(out_dir)
return out_dir

if policy == "prompt":
return _resolve_prompt_out_dir_conflict(out_dir)

msg = (
f"Unknown out_dir_exists policy '{policy}'. "
f"Supported values: {OUT_DIR_POLICY_CHOICES}."
)
raise click.ClickException(msg)
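In `_resolve_out_dir_conflict`, the `"fail"` policy returns the existing path unchanged, which presumably leaves the actual failure to later pipeline code. For comparison, here is a minimal stdlib-only sketch of a stricter variant that raises immediately. The function name `resolve_conflict` and the choice of `FileExistsError` are assumptions made for illustration, not part of the PR.

```python
import shutil
import tempfile
from pathlib import Path


def resolve_conflict(out_dir, policy):
    """Apply a non-interactive policy to a possibly existing directory."""
    out_dir = Path(out_dir)
    if not out_dir.exists():
        return out_dir
    if policy == "fail":
        # Assumption: fail fast here instead of deferring to the pipeline.
        raise FileExistsError(f"Output directory exists: {out_dir}")
    if policy == "overwrite":
        # Delete the directory tree; the caller recreates it later.
        shutil.rmtree(out_dir)
        return out_dir
    raise ValueError(f"Unknown policy: {policy!r}")


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "outputs"
    existing.mkdir()
    (existing / "old.txt").write_text("stale")

    try:
        resolve_conflict(existing, "fail")
        fail_raised = False
    except FileExistsError:
        fail_raised = True

    resolved = resolve_conflict(existing, "overwrite")
    exists_after_overwrite = resolved.exists()
```

Whether failing early in the CLI layer is preferable depends on where the rest of the pipeline checks for an existing output directory, which this diff does not show.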


def _next_versioned_directory(out_dir):
"""Return the next available versioned output directory path"""
idx = 2
max_idx = 1_000_000
while idx <= max_idx:
candidate = out_dir.parent / f"{out_dir.name}_v{idx}"
if not candidate.exists():
return candidate
idx += 1

msg = (
f"Unable to find an available versioned directory for '{out_dir!s}' "
f"up to suffix _v{max_idx}."
)
raise click.ClickException(msg)
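The `_v{idx}` suffix search can be exercised without COMPASS or click. This is a minimal sketch under stated assumptions: the name `next_versioned_directory`, the smaller `max_idx` default, and the `FileExistsError` are illustrative substitutions for the private helper and `click.ClickException` in the diff.

```python
import tempfile
from pathlib import Path


def next_versioned_directory(out_dir, max_idx=1_000):
    """Return the first ``<name>_v<N>`` sibling path that does not exist."""
    out_dir = Path(out_dir)
    # Numbering starts at 2, so the unsuffixed directory counts as v1.
    for idx in range(2, max_idx + 1):
        candidate = out_dir.parent / f"{out_dir.name}_v{idx}"
        if not candidate.exists():
            return candidate
    raise FileExistsError(f"No free versioned directory for {out_dir}")


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "outputs"
    base.mkdir()
    (Path(tmp) / "outputs_v2").mkdir()

    chosen = next_versioned_directory(base)
    chosen_name = chosen.name
    chosen_exists = chosen.exists()
```

As in the diff, the function only returns a path and does not create the directory, so two concurrent runs could pick the same candidate.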


def _resolve_out_dir_policy(policy):
"""Resolve output directory policy from explicit input

Falls back to terminal mode defaults when no policy is set.
"""
if policy is not None:
return policy.lower()
if sys.stdin.isatty():
return "prompt"
return "fail"
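The fallback rule (explicit policy first, then `"prompt"` in a terminal, otherwise `"fail"`) is easier to test when the terminal check is injectable. A minimal sketch, assuming a hypothetical `is_interactive` parameter that the PR's helper does not have:

```python
import sys

POLICY_CHOICES = ("fail", "increment", "overwrite", "prompt")


def resolve_policy(policy, is_interactive=None):
    """Pick an out-dir policy, falling back to a terminal-based default."""
    if policy is not None:
        policy = policy.lower()
        if policy not in POLICY_CHOICES:
            raise ValueError(f"Unknown policy: {policy!r}")
        return policy
    if is_interactive is None:
        # Same check as the diff: is stdin attached to a terminal?
        is_interactive = sys.stdin.isatty()
    return "prompt" if is_interactive else "fail"


explicit = resolve_policy("Overwrite")
ci_default = resolve_policy(None, is_interactive=False)
tty_default = resolve_policy(None, is_interactive=True)
```

Passing `is_interactive` explicitly keeps tests independent of how the test runner wires up stdin.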


def _resolve_prompt_out_dir_conflict(out_dir):
"""Handle interactive prompt flow for existing output directory"""
if not sys.stdin.isatty():
msg = (
"Cannot use out_dir_exists='prompt' in non-interactive mode. "
"Use one of: fail, increment, overwrite."
)
raise click.ClickException(msg)

create_incremented = click.confirm(
f"Output directory '{out_dir!s}' already exists. "
"Create a new incremented directory automatically?",
default=True,
)
if create_incremented:
new_out_dir = _next_versioned_directory(out_dir)
click.echo(f"Using incremented directory: {new_out_dir!s}")
return new_out_dir

overwrite = click.confirm(
f"Overwrite '{out_dir!s}' by deleting it and continuing?",
default=False,
)
if overwrite:
click.echo(f"Overwriting existing output directory: {out_dir!s}")
shutil.rmtree(out_dir)
return out_dir

msg = (
"Run cancelled. Please update out_dir in config, or rerun with "
"--out-dir-exists increment/overwrite."
)
raise click.ClickException(msg)
62 changes: 62 additions & 0 deletions compass/_cli/extract.py
@@ -0,0 +1,62 @@
"""COMPASS CLI extract subcommand"""

import click

from compass._cli.common import run_async_command, OUT_DIR_POLICY_CHOICES
from compass.plugin import create_schema_based_one_shot_extraction_plugin
from compass.pipeline import ExtractionRequest
from compass.utilities.io import load_config


@click.command
@click.option(
"--config",
"-c",
required=True,
type=click.Path(exists=True),
help="Path to an extraction configuration JSON or JSON5 file. This file "
"should contain any/all the arguments to pass to "
":class:`~compass.pipeline.data_classes.ExtractionRequest`.",
)
@click.option(
"-v",
"--verbose",
count=True,
help="Show logs on the terminal.",
)
@click.option(
"-np",
"--no-progress",
is_flag=True,
help="Flag to hide progress bars during extraction.",
)
@click.option(
"--plugin",
"-p",
required=False,
default=None,
help="One-shot plugin configuration to add to COMPASS before extraction",
)
@click.option(
"--out-dir-exists",
"-o",
required=False,
default=None,
type=click.Choice(OUT_DIR_POLICY_CHOICES, case_sensitive=False),
help="How to handle an existing output directory."
" Choices: fail, increment, overwrite, prompt."
" If omitted, prompts interactively when running in a terminal,"
" or fails when running non-interactively (e.g. CI).",
)
def extract(config, verbose, no_progress, plugin, out_dir_exists):
"""Extract structured data from a saved collection manifest"""
config = load_config(config)

if plugin is not None:
create_schema_based_one_shot_extraction_plugin(
config=plugin, tech=config["tech"]
)

run_async_command(
config, ExtractionRequest, verbose, no_progress, out_dir_exists
)
6 changes: 3 additions & 3 deletions compass/_cli/finalize.py
@@ -10,7 +10,7 @@
from compass.utilities.io import load_config
from compass.utilities.jurisdictions import Jurisdiction
from compass.utilities.finalize import save_run_meta, doc_infos_to_db, save_db
from compass.scripts.process import _initialize_model_params
from compass.pipeline.coordinator import _build_models


@click.command
@@ -21,7 +21,7 @@
type=click.Path(exists=True),
help="Path to COMPASS run configuration JSON or JSON5 file. This file "
"should contain any/all the arguments to pass to "
":func:`compass.scripts.process.process_jurisdictions_with_openai`. "
":class:`~compass.pipeline.data_classes.ProcessRequest`. "
"The output directory that this config points to will be finalized.",
)
def finalize(config):
@@ -57,7 +57,7 @@ def finalize(config):
console = Console(theme=custom_theme)
console.print(f"Finalizing COMPASS run in {dirs.out!s}...")

models = _initialize_model_params(config.get("model", "gpt-4o-mini"))
models = _build_models(config.get("model", "gpt-4o-mini"))
start_datetime = datetime.fromtimestamp(dirs.out.stat().st_ctime)
end_datetime = datetime.fromtimestamp(jurisdictions_fp.stat().st_mtime)

4 changes: 4 additions & 0 deletions compass/_cli/main.py
@@ -3,6 +3,8 @@
import click

from compass import __version__
from compass._cli.collect import collect
from compass._cli.extract import extract
from compass._cli.process import process
from compass._cli.finalize import finalize
from compass._cli.search import search
@@ -16,6 +18,8 @@ def main(ctx):
ctx.ensure_object(dict)


main.add_command(collect)
main.add_command(extract)
main.add_command(process)
main.add_command(finalize)
main.add_command(search)