Repository Guidelines

Project Structure & Module Organization

The core engine lives in teadata/ (engine.py, entities.py, query.py, and geometry.py); these modules define the DataEngine, entity models, query operators, and spatial helpers.
Enrichment logic sits in teadata/enrichment/; database bridges are under teadata/persistence/; utility runners live in teadata/scripts/.
Config and sample data references are in teadata/teadata_sources.yaml and teadata/data/. Snapshots are written to .cache/ and are committed; keep .cache artifacts in version control.
Tests reside in tests/ (e.g., tests/test_snapshot_loading.py), and example notebooks/scripts are in examples/. Packaging artifacts land in build/ and teadata.egg-info/.

Build, Test, and Development Commands

Install for iterative work: uv sync --all-extras (this handles dev, database, and notebook extras automatically).
Always run Python commands through uv run and manage packages with uv; avoid bare python or pip to keep environments consistent.
Run the suite: uv run pytest or target a test (uv run pytest tests/test_entities.py::test_campus_to_dict_includes_percent_enrollment_change).
Optional: build a fresh snapshot from the configured spatial files with uv run python -m teadata.load_data (uses teadata_sources.yaml and writes to .cache/).
When code changes affect anything serialized into the snapshot (pickled data), run uv run python -m teadata.load_data as the final task to refresh snapshot artifacts.
Packaging sanity check: uv build after a clean git status if you need a wheel/sdist.
Linting and type checking: uv run ruff check . and uv run ty check . (the trailing dot in each command is the path argument for the repository root).

Coding Style & Naming Conventions

Python 3.11+, PEP 8 defaults, 4-space indents, and type hints preferred (modules use from __future__ import annotations).
Modules and functions use snake_case; classes use PascalCase; keep public attributes stable for pickled snapshots.
Keep data loaders/enrichers deterministic: avoid implicit network calls and prefer explicit file paths resolved via config helpers.
Run the lint and type checks (uv run ruff check . and uv run ty check .) when those tools are installed; keep formatting minimal and readable.
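To illustrate why public attributes should stay stable for pickled snapshots, here is a minimal sketch. The Campus class and its attribute names are hypothetical stand-ins, not the real teadata entities. Pickle stores instance attributes by name, so a snapshot written before a rename is restored with the old attribute, and the new one is simply missing.

```python
import pickle


class Campus:
    """Hypothetical stand-in for an entity; not the real teadata class."""

    def __init__(self, name: str, enrollment: int) -> None:
        self.name = name
        self.enrollment = enrollment


# Simulate a snapshot written by the current code.
snapshot_bytes = pickle.dumps(Campus("Example HS", 1200))


# Simulate a later refactor that renames `enrollment` to `students`.
class Campus:  # noqa: F811 - deliberate redefinition for the demo
    """Same hypothetical entity after the rename."""

    def __init__(self, name: str, students: int) -> None:
        self.name = name
        self.students = students


# Unpickling bypasses __init__ and restores the attributes stored in the snapshot.
restored = pickle.loads(snapshot_bytes)

print(hasattr(restored, "enrollment"))  # True: the old name comes back from the snapshot
print(hasattr(restored, "students"))    # False: the new name was never written
```

Code that reads restored.students would raise AttributeError against the old snapshot, which is why a change like this should be followed by a snapshot refresh.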

Testing Guidelines

Add pytest cases under tests/ with test_*.py naming; exercise both happy paths and failure branches (e.g., gzip/no-extension snapshot handling).
Use tmp_path/fixtures to avoid touching real data; prefer lightweight synthetic inputs over large datasets.
For coverage when changing core logic: uv run pytest --cov=teadata --cov-report=term-missing.
Keep assertions specific (types, counts, and key fields) to guard query/enrichment regressions.
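The testing guidance above could look roughly like the following sketch. The load_snapshot helper is a hypothetical loader written only for this example (the real teadata loading code may work differently); it detects gzip by magic bytes so that files without an extension still load. The tests use pytest's tmp_path fixture and small synthetic payloads.

```python
import gzip
import pickle
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_snapshot(path: Path) -> object:
    """Hypothetical loader: sniffs gzip by magic bytes, not by file extension."""
    raw = path.read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return pickle.loads(raw)


def test_loads_gzip_snapshot_without_extension(tmp_path: Path) -> None:
    payload = {"districts": 2, "campuses": 5}  # lightweight synthetic input
    target = tmp_path / "snapshot"  # deliberately no .gz or .pkl suffix
    target.write_bytes(gzip.compress(pickle.dumps(payload)))

    loaded = load_snapshot(target)

    # Specific assertions: type, count, and key fields.
    assert isinstance(loaded, dict)
    assert len(loaded) == 2
    assert loaded["campuses"] == 5


def test_loads_uncompressed_snapshot(tmp_path: Path) -> None:
    payload = {"districts": 1, "campuses": 3}
    target = tmp_path / "snapshot.pkl"
    target.write_bytes(pickle.dumps(payload))

    assert load_snapshot(target) == payload
```

A failure-branch test (for example, a missing file) would follow the same shape, wrapping the call in pytest.raises with the expected exception type.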

Dependencies & Tooling

When adding a new library or tool, update pyproject.toml (dependencies/extras), refresh the lockfile, and adjust test/tooling configs (pyproject.toml tool sections, pytest coverage settings) so CI stays aligned.

Versioning & Releases

Release tags: Always use the thousandths place (e.g., v0.0.101, v0.0.102). If no tags exist, start at v0.0.101.
Standard SemVer (X.Y.Z): Used for Major, Minor, and Patch updates to the codebase.
Data Refreshes (X.Y.Zy): Append an additional digit for data-only updates (e.g., 0.0.71 implies a refresh of version 0.0.7).
Release Process: Increment the version in pyproject.toml prior to executing a production load_data.py run or distributing new snapshots.
Tag retention: Keep only the three most recent tags/releases; delete older tags and their GitHub release assets everywhere.
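As a rough illustration of the tag-retention rule, the sketch below picks which tags fall outside the three most recent. It assumes tags have the form vX.Y.Z and that numeric ordering of the components matches release order; that assumption may not hold if data-refresh tags with an appended digit are mixed in, in which case ordering by tag creation date would be safer. The function names are hypothetical, and the sketch only computes the list; it does not delete anything.

```python
import re

TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def version_key(tag: str) -> tuple[int, int, int]:
    """Turn 'v0.0.101' into (0, 0, 101) so tags sort numerically."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def tags_to_delete(tags: list[str], keep: int = 3) -> list[str]:
    """Return every tag older than the `keep` most recent ones, oldest first."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(tags, key=version_key)
    return ordered[:-keep] if len(ordered) > keep else []


tags = ["v0.0.103", "v0.0.101", "v0.0.105", "v0.0.102", "v0.0.104"]
print(tags_to_delete(tags))  # ['v0.0.101', 'v0.0.102']
```

The returned list can then be reviewed by hand before removing the tags and their release assets.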

Commit & Pull Request Guidelines

Follow the existing history: short, imperative subject lines (e.g., “Create .gitattributes”, “Adding gzip support of snapshot repo cache”).
PRs should state what changed, why, data/config files touched, and the tests/commands run; link related issues.
Exclude local configs (*.local.*) and large data files that are covered by .gitignore; .cache artifacts are tracked and should remain in commits.
If behavior changes are user-visible, include a brief reproduction snippet or before/after note in the PR description.

Security & Configuration Tips

Keep credentials and machine-local paths out of tracked files; rely on teadata/teadata_sources.yaml and untracked *.local.* overrides.
Treat .cache/ artifacts as required repo assets; avoid publishing them outside the repo unless explicitly sanitized.
When using optional extras (such as the database or notebook extras), avoid embedding connection strings in code or examples; read them from environment variables or local config instead.
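One way to keep connection strings out of tracked code is to read them from the environment at call time. The sketch below is only an illustration: the variable name TEADATA_DATABASE_URL and the helper get_database_url are hypothetical and are not part of teadata.

```python
import os


def get_database_url(env_var: str = "TEADATA_DATABASE_URL") -> str:
    """Return the connection string from the environment.

    The default variable name is a hypothetical example, not a teadata setting.
    """
    url = os.environ.get(env_var)
    if not url:
        # Fail loudly instead of falling back to a hard-coded credential.
        raise RuntimeError(
            f"Set the {env_var} environment variable before connecting to the database."
        )
    return url
```

Examples and notebooks can then call get_database_url() without ever containing the secret itself.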

Downstream Consumers

teadata-app consumes the public DataEngine/Query APIs and snapshot loading behavior directly; avoid breaking signatures, return types, or query semantics, and coordinate any behavioral changes with that app before merging.