- Core engine lives in `teadata/engine.py`, `entities.py`, `query.py`, and `geometry.py`; they define the `DataEngine`, entity models, query operators, and spatial helpers.
- Enrichment logic sits in `teadata/enrichment/`; database bridges are under `teadata/persistence/`; utility runners live in `teadata/scripts/`.
- Config and sample data references are in `teadata/teadata_sources.yaml` and `teadata/data/`. Snapshots are written to `.cache/` and are committed; keep `.cache` artifacts in version control.
- Tests reside in `tests/` (e.g., `tests/test_snapshot_loading.py`), and example notebooks/scripts are in `examples/`. Packaging artifacts land in `build/` and `teadata.egg-info/`.
- Install for iterative work: `uv sync --all-extras` (this handles dev, database, and notebook extras automatically).
- Always run Python commands through `uv run` and manage packages with `uv`; avoid bare `python` or `pip` to keep environments consistent.
- Run the suite: `uv run pytest` or target a test (`uv run pytest tests/test_entities.py::test_campus_to_dict_includes_percent_enrollment_change`).
- Optional: build a fresh snapshot from the configured spatial files with `uv run python -m teadata.load_data` (uses `teadata_sources.yaml` and writes to `.cache/`).
- When code changes affect anything serialized into the snapshot (pickled data), run `uv run python -m teadata.load_data` as the final task to refresh snapshot artifacts.
- Packaging sanity check: `uv build` after a clean `git status` if you need a wheel/sdist.
- Linting and type checking: `uv run ruff check .` and `uv run ty check .`.
- Python 3.11+, PEP 8 defaults, 4-space indents, and type hints preferred (modules use `from __future__ import annotations`).
- Modules and functions use `snake_case`; classes use `PascalCase`; keep public attributes stable for pickled snapshots.
- Keep data loaders/enrichers deterministic: avoid implicit network calls and prefer explicit file paths resolved via config helpers.
- Use `uv run ruff check .` and `uv run ty check .` when available; keep formatting minimal and readable.
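
As a rough illustration of "explicit file paths resolved via config helpers", the sketch below resolves a path listed in a config file relative to that file's directory and fails loudly if the file is missing. The name `resolve_source_path` is hypothetical and is not taken from the `teadata` codebase; the real helpers may behave differently.

```python
from __future__ import annotations

from pathlib import Path


def resolve_source_path(config_file: Path, entry: str) -> Path:
    """Hypothetical helper: resolve a config entry to an existing file.

    Relative entries are anchored at the config file's directory, so the
    result does not depend on the current working directory.
    """
    candidate = Path(entry)
    if not candidate.is_absolute():
        candidate = config_file.parent / candidate
    resolved = candidate.resolve()
    if not resolved.is_file():
        # Fail early instead of silently falling back to another source.
        raise FileNotFoundError(f"configured source not found: {resolved}")
    return resolved
```

Anchoring on the config file rather than the working directory keeps a loader's output the same whether it is started from the repo root, a notebook, or a test.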
- Add pytest cases under `tests/` with `test_*.py` naming; exercise both happy paths and failure branches (e.g., gzip/no-extension snapshot handling).
- Use `tmp_path`/fixtures to avoid touching real data; prefer lightweight synthetic inputs over large datasets.
- For coverage when changing core logic: `uv run pytest --cov=teadata --cov-report=term-missing`.
- Keep assertions specific (types, counts, and key fields) to guard query/enrichment regressions.
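
The testing guidance above might look like the following in practice. This is a minimal sketch, not the project's actual test code: `load_snapshot` is a hypothetical stand-in that detects gzip by magic bytes, and the payload is synthetic. The tests use pytest's built-in `tmp_path` fixture so no real data is touched.

```python
from __future__ import annotations

import gzip
import pickle
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"


def load_snapshot(path: Path) -> object:
    """Hypothetical loader: detect gzip by content, not by file extension."""
    raw = path.read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return pickle.loads(raw)


def test_loads_gzip_snapshot_without_extension(tmp_path: Path) -> None:
    # Happy path: gzip-compressed pickle stored under a name with no suffix.
    payload = {"districts": 2, "campuses": ["A", "B"]}
    snapshot = tmp_path / "snapshot"
    snapshot.write_bytes(gzip.compress(pickle.dumps(payload)))

    loaded = load_snapshot(snapshot)

    # Specific assertions: type, count, and key fields.
    assert isinstance(loaded, dict)
    assert loaded["districts"] == 2
    assert loaded["campuses"] == ["A", "B"]


def test_empty_snapshot_raises(tmp_path: Path) -> None:
    # Failure branch: an empty file is not a valid pickle.
    snapshot = tmp_path / "empty.pkl"
    snapshot.write_bytes(b"")
    try:
        load_snapshot(snapshot)
    except EOFError:
        return
    raise AssertionError("expected EOFError for an empty snapshot")
```

In a real suite the failure branch would more commonly be written with `pytest.raises(EOFError)`; the explicit `try`/`except` only keeps this sketch free of non-stdlib imports.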
- When adding a new library or tool, update `pyproject.toml` (dependencies/extras), refresh the lockfile, and adjust test/tooling configs (`pyproject.toml` tool sections, `pytest` coverage settings) so CI stays aligned.
- Release tags: Always use the thousandths place (e.g., `v0.0.101`, `v0.0.102`). If no tags exist, start at `v0.0.101`.
- Standard SemVer (X.Y.Z): Used for Major, Minor, and Patch updates to the codebase.
- Data Refreshes (X.Y.Zy): Append an additional digit for data-only updates (e.g., `0.0.71` implies a refresh of version `0.0.7`).
- Release Process: Increment the version in `pyproject.toml` prior to executing a production `load_data.py` run or distributing new snapshots.
- Tag retention: Keep only the three most recent tags/releases; delete older tags and their GitHub release assets everywhere.
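
The tag rules above can be sketched as a small helper. It assumes that "the next tag" means bumping the last component by one, as in the `v0.0.101` to `v0.0.102` example, and it does not model the data-refresh digit. The names `next_tag` and `tags_to_delete` are illustrative and are not part of the repository.

```python
from __future__ import annotations

import re

TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")
FIRST_TAG = "v0.0.101"  # starting point when no tags exist
KEEP = 3  # retention: three most recent tags


def parse_tag(tag: str) -> tuple[int, int, int]:
    """Split a tag such as 'v0.0.101' into integer components."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def next_tag(existing: list[str]) -> str:
    """Return the first tag, or the highest existing tag bumped by one."""
    if not existing:
        return FIRST_TAG
    major, minor, patch = max(parse_tag(t) for t in existing)
    return f"v{major}.{minor}.{patch + 1}"


def tags_to_delete(existing: list[str], keep: int = KEEP) -> list[str]:
    """Return every tag older than the `keep` most recent ones."""
    ordered = sorted(existing, key=parse_tag, reverse=True)
    return ordered[keep:]
```

Sorting on the parsed integers rather than the raw strings avoids lexicographic surprises, for example `v0.0.99` sorting after `v0.0.101`.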
- Follow the existing history: short, imperative subject lines (e.g., “Create .gitattributes”, “Adding gzip support of snapshot repo cache”).
- PRs should state what changed, why, data/config files touched, and the tests/commands run; link related issues.
- Exclude local configs (`*.local.*`) and large data files that are covered by `.gitignore`; `.cache` artifacts are tracked and should remain in commits.
- If behavior changes are user-visible, include a brief reproduction snippet or before/after note in the PR description.
- Keep credentials and machine-local paths out of tracked files; rely on `teadata/teadata_sources.yaml` and untracked `*.local.*` overrides.
- Treat `.cache/` artifacts as required repo assets; avoid publishing them outside the repo unless explicitly sanitized.
- When using optional extras (`[database]`, notebooks), avoid embedding connection strings in code or examples; use env vars or local config instead.
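
One way to keep connection strings out of code is to read them from the environment at call time. The sketch below assumes a variable named `TEADATA_DATABASE_URL`; that name is made up for illustration and is not a documented setting of the package.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

ENV_VAR = "TEADATA_DATABASE_URL"  # hypothetical variable name


def database_url(env: Mapping[str, str] | None = None) -> str:
    """Return the connection string from the environment, or fail clearly."""
    source = os.environ if env is None else env
    url = source.get(ENV_VAR, "").strip()
    if not url:
        # Do not fall back to a hard-coded default: that would put a
        # connection string back into tracked code.
        raise RuntimeError(f"set {ENV_VAR} before using the database extras")
    return url
```

Accepting an optional mapping lets tests pass a plain dict instead of mutating `os.environ`.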
- `teadata-app` consumes the public `DataEngine`/`Query` APIs and snapshot loading behavior directly; avoid breaking signatures, return types, or query semantics, and coordinate any behavioral changes with that app before merging.