Repository Guidelines

Project Structure & Module Organization

  • Core engine lives in teadata/engine.py, entities.py, query.py, and geometry.py; they define the DataEngine, entity models, query operators, and spatial helpers.
  • Enrichment logic sits in teadata/enrichment/; database bridges are under teadata/persistence/; utility runners live in teadata/scripts/.
  • Config and sample data references are in teadata/teadata_sources.yaml and teadata/data/. Snapshots are written to .cache/ and are tracked in version control; keep .cache artifacts committed.
  • Tests reside in tests/ (e.g., tests/test_snapshot_loading.py), and example notebooks/scripts are in examples/. Packaging artifacts land in build/ and teadata.egg-info/.

Build, Test, and Development Commands

  • Install for iterative work: uv sync --all-extras (this handles dev, database, and notebook extras automatically).
  • Always run Python commands through uv run and manage packages with uv; avoid bare python or pip to keep environments consistent.
  • Run the suite: uv run pytest or target a test (uv run pytest tests/test_entities.py::test_campus_to_dict_includes_percent_enrollment_change).
  • Optional: build a fresh snapshot from the configured spatial files with uv run python -m teadata.load_data (uses teadata_sources.yaml and writes to .cache/).
  • When code changes affect anything serialized into the snapshot (pickled data), run uv run python -m teadata.load_data as the final task to refresh snapshot artifacts.
  • Packaging sanity check: uv build after a clean git status if you need a wheel/sdist.
  • Linting and type checking: uv run ruff check . and uv run ty check . (the trailing dot in each command is the path argument).

Coding Style & Naming Conventions

  • Python 3.11+, PEP 8 defaults, 4-space indents, and type hints preferred (modules use from __future__ import annotations).
  • Modules and functions use snake_case; classes use PascalCase; keep public attributes stable for pickled snapshots.
  • Keep data loaders/enrichers deterministic: avoid implicit network calls and prefer explicit file paths resolved via config helpers.
  • Use uv run ruff check . and uv run ty check . when available; keep formatting minimal and readable.
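One way to keep a loader deterministic is to resolve every configured source against the config file's directory and to refuse URL-like entries outright. The sketch below is illustrative only: resolve_source_path and its arguments are hypothetical names, not part of teadata's config helpers.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse

# Schemes treated as remote in this sketch (an assumption, not a teadata rule).
REMOTE_SCHEMES = {"http", "https", "ftp", "s3"}


def resolve_source_path(config_dir: Path, entry: str) -> Path:
    """Resolve a configured source entry to an absolute local path.

    Hypothetical helper for illustration; not part of teadata.
    """
    # Refuse URL-like entries so a loader never triggers an implicit download.
    if urlparse(entry).scheme in REMOTE_SCHEMES:
        raise ValueError(f"remote sources are not allowed: {entry!r}")

    path = Path(entry).expanduser()
    # Anchor relative entries at the config directory rather than the current
    # working directory, so the result does not depend on where the command ran.
    if not path.is_absolute():
        path = config_dir / path
    return path.resolve()
```

Anchoring at the config directory means uv run invocations from the repo root and from a subdirectory resolve to the same file.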

Testing Guidelines

  • Add pytest cases under tests/ with test_*.py naming; exercise both happy paths and failure branches (e.g., gzip/no-extension snapshot handling).
  • Use tmp_path/fixtures to avoid touching real data; prefer lightweight synthetic inputs over large datasets.
  • For coverage when changing core logic: uv run pytest --cov=teadata --cov-report=term-missing (the --cov options require the pytest-cov plugin).
  • Keep assertions specific (types, counts, and key fields) to guard query/enrichment regressions.
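The guidelines above could translate into a test module along these lines. This is a minimal sketch: load_snapshot is a stand-in written for the example (it sniffs gzip magic bytes instead of trusting the file extension) and is not teadata's actual loader, and the payload keys are made up.

```python
from __future__ import annotations

import gzip
import pickle
from pathlib import Path

import pytest

GZIP_MAGIC = b"\x1f\x8b"


def load_snapshot(path: Path) -> object:
    """Hypothetical loader: detects gzip by magic bytes, not by extension."""
    raw = path.read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return pickle.loads(raw)


def test_loads_gzip_snapshot_without_extension(tmp_path: Path) -> None:
    # Happy path: small synthetic payload, written only under tmp_path.
    payload = {"districts": 2, "campuses": 5}
    snapshot = tmp_path / "snapshot"  # deliberately no .gz or .pkl suffix
    snapshot.write_bytes(gzip.compress(pickle.dumps(payload)))

    loaded = load_snapshot(snapshot)

    # Specific assertions: type, then key fields.
    assert isinstance(loaded, dict)
    assert loaded["campuses"] == 5


def test_rejects_corrupt_snapshot(tmp_path: Path) -> None:
    # Failure branch: bytes that are neither gzip nor a valid pickle.
    snapshot = tmp_path / "snapshot.pkl"
    snapshot.write_bytes(b"\x00not a pickle")

    with pytest.raises(pickle.UnpicklingError):
        load_snapshot(snapshot)
```

Both tests write only under tmp_path, so they never touch .cache/ or real source data.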

Dependency & Tooling Updates

  • When adding a new library or tool, update pyproject.toml (dependencies/extras), refresh the lockfile, and adjust test/tooling configs (pyproject.toml tool sections, pytest coverage settings) so CI stays aligned.

Versioning Policy

  • Release tags: Always use the thousandths place (e.g., v0.0.101, v0.0.102). If no tags exist, start at v0.0.101.
  • Standard SemVer (X.Y.Z): Used for Major, Minor, and Patch updates to the codebase.
  • Data Refreshes (X.Y.Zy): Append an additional digit for data-only updates (e.g., 0.0.71 implies a refresh of version 0.0.7).
  • Release Process: Increment the version in pyproject.toml before executing a production load_data run (uv run python -m teadata.load_data) or distributing new snapshots.
  • Tag retention: Keep only the three most recent tags/releases; delete older tags and their GitHub release assets everywhere.
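The arithmetic behind the data-refresh digit and the tag-retention rule can be sketched as follows. The helpers are hypothetical, and two assumptions are baked in: that the last digit of the third component is the refresh counter (one reading of the 0.0.71 example), and that "most recent" means numerically highest version rather than newest tag date.

```python
from __future__ import annotations

import re

# Accepts "v0.0.101" or "0.0.101"; anything else is rejected.
TAG_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> tuple[int, int, int]:
    """Turn a tag into a numeric tuple so versions sort by value, not by text."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def split_refresh(version: str) -> tuple[str, int]:
    """Split X.Y.Zy into (base version, refresh digit).

    Assumption: the final digit of the third component counts data refreshes.
    """
    major, minor, patch = parse_tag(version)
    base_patch, refresh = divmod(patch, 10)
    return f"{major}.{minor}.{base_patch}", refresh


def tags_to_delete(tags: list[str], keep: int = 3) -> list[str]:
    """Return the tags that fall outside the `keep` highest versions."""
    ordered = sorted(tags, key=parse_tag, reverse=True)
    return ordered[keep:]
```

Sorting on the numeric tuple matters here: a plain string sort would rank v0.0.99 above v0.0.101 and could mark the wrong tags for deletion.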

Commit & Pull Request Guidelines

  • Follow the existing history: short, imperative subject lines (e.g., “Create .gitattributes”, “Adding gzip support of snapshot repo cache”).
  • PRs should state what changed, why, data/config files touched, and the tests/commands run; link related issues.
  • Exclude local configs (*.local.*) and large data files that are covered by .gitignore; .cache artifacts are tracked and should remain in commits.
  • If behavior changes are user-visible, include a brief reproduction snippet or before/after note in the PR description.

Security & Configuration Tips

  • Keep credentials and machine-local paths out of tracked files; rely on teadata/teadata_sources.yaml and untracked *.local.* overrides.
  • Treat .cache/ artifacts as required repo assets; avoid publishing them outside the repo unless explicitly sanitized.
  • When using the optional extras (database, notebook), avoid embedding connection strings in code or examples; use env vars or local config instead.
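A minimal sketch of the env-var approach, assuming a variable named TEADATA_DATABASE_URL (a hypothetical name chosen for this example; the project may use a different one). The redact helper is likewise illustrative and shows one way to keep passwords out of logs and PR descriptions.

```python
from __future__ import annotations

import os
from urllib.parse import urlsplit, urlunsplit


def database_url(env_var: str = "TEADATA_DATABASE_URL") -> str:
    """Read the connection string from the environment (hypothetical variable)."""
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"set {env_var} before using the database extras")
    return url


def redact(url: str) -> str:
    """Mask the password component so the URL is safe to print or log."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=f"{parts.username}:***@{host}"))
```

Failing loudly when the variable is unset avoids silently falling back to a hard-coded default that might later be committed.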

Downstream Dependencies

  • teadata-app consumes the public DataEngine/Query APIs and snapshot loading behavior directly; avoid breaking signatures, return types, or query semantics, and coordinate any behavioral changes with that app before merging.