From c3064732b448150c364e3d53f60f065aa69a6822 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sat, 14 Mar 2026 20:17:38 +0000 Subject: [PATCH] Add comprehensive repository improvement suggestions Co-authored-by: Ben Wilson --- IMPROVEMENT_SUGGESTIONS.md | 295 +++++++++++++++++++++++++++++++++++++ 1 file changed, 295 insertions(+) create mode 100644 IMPROVEMENT_SUGGESTIONS.md diff --git a/IMPROVEMENT_SUGGESTIONS.md b/IMPROVEMENT_SUGGESTIONS.md new file mode 100644 index 0000000..accdcfd --- /dev/null +++ b/IMPROVEMENT_SUGGESTIONS.md @@ -0,0 +1,295 @@ +# Repository Improvement Suggestions + +A comprehensive review of the `forecasting-tools` repository with actionable improvement recommendations, organized by priority and effort. + +--- + +## 1. Testing & Code Quality Infrastructure + +### 1.1 Add Test Coverage Reporting (High Priority, Low Effort) + +There is no test coverage configuration anywhere in the repo. `pytest-cov` is not in dev dependencies, and CI does not report coverage. This means there is no visibility into which parts of the codebase are tested. + +**Recommendations:** +- Add `pytest-cov` to dev dependencies +- Add `[tool.coverage]` configuration in `pyproject.toml` +- Add coverage reporting to the `unit-tests.yaml` CI workflow (e.g., upload to Codecov or Coveralls) +- Consider enforcing a minimum coverage threshold to prevent regressions + +```toml +# pyproject.toml +[tool.coverage.run] +source = ["forecasting_tools"] +omit = ["forecasting_tools/front_end/*", "*/deprecated/*"] + +[tool.coverage.report] +fail_under = 60 +show_missing = true +``` + +### 1.2 Centralize Linting & Formatting Configuration (Medium Priority, Low Effort) + +Black, isort, and ruff settings live only in `.pre-commit-config.yaml`. This means IDE integrations, manual CLI runs, and CI may use different settings than pre-commit. Centralizing into `pyproject.toml` gives a single source of truth. + +**Recommendations:** +- Add `[tool.black]`, `[tool.isort]`, and `[tool.ruff]` sections to `pyproject.toml` +- Pre-commit hooks can then just inherit those settings without duplicating args + +```toml +# pyproject.toml +[tool.black] +line-length = 88 + +[tool.isort] +profile = "black" +line_length = 88 + +[tool.ruff] +line-length = 88 +ignore = ["E741", "E731", "E402", "E711", "E712", "E721"] +``` + +### 1.3 Add a Makefile or Task Runner (Medium Priority, Low Effort) + +There is no `Makefile`, `tox.ini` (despite `tox` being in dev deps), or other task runner. This means contributors must memorize or look up commands for common tasks. + +**Recommendations:** +- Add a `Makefile` with targets for common workflows: + +```makefile +.PHONY: test lint format install + +install: + poetry install + +test: + poetry run pytest ./code_tests/unit_tests + +test-cov: + poetry run pytest ./code_tests/unit_tests --cov=forecasting_tools --cov-report=html + +lint: + poetry run ruff check forecasting_tools/ + +format: + poetry run black forecasting_tools/ code_tests/ + poetry run isort forecasting_tools/ code_tests/ +``` + +- Alternatively, configure `tox` since it is already a dev dependency but unused. + +--- + +## 2. Error Handling & Robustness + +### 2.1 Narrow Broad `except Exception` Catches (High Priority, Medium Effort) + +There are **~85 instances** of `except Exception` across the codebase. While some are intentional fallbacks, many silently swallow errors or catch too broadly, making debugging difficult. + +**Key areas to address:** +- `forecasting_tools/cp_benchmarking/benchmark_for_bot.py` — 8 occurrences +- `forecasting_tools/forecast_bots/forecast_bot.py` — 6 occurrences +- `forecasting_tools/agents_and_tools/ai_congress_v2/congress_orchestrator.py` — 6 occurrences +- `forecasting_tools/data_models/questions.py` — 5 occurrences + +**Recommendations:** +- Replace with specific exception types where possible (e.g., `ValueError`, `KeyError`, `httpx.HTTPError`) +- Where broad catches are intentional, add a comment explaining why +- Consider using `except Exception as e: logger.exception(...)` instead of silent catches + +### 2.2 Validate Environment Variables Before Use (Medium Priority, Low Effort) + +Several environment variables are used without validation. When they are `None`, this produces confusing runtime errors (e.g., `Authorization: Token None` headers). + +**Files affected:** +- `forecasting_tools/ai_models/general_llm.py` — `METACULUS_TOKEN` used in auth header without validation +- `forecasting_tools/util/coda_utils.py` — `CODA_API_KEY` used at class level, can be `None` + +**Recommendation:** Add a helper function for required env var access: + +```python +def require_env(name: str) -> str: + value = os.getenv(name) + if not value: + raise EnvironmentError(f"Required environment variable '{name}' is not set") + return value +``` + +--- + +## 3. Code Organization & Style + +### 3.1 Fix Import Ordering Violations (Low Priority, Low Effort) + +`forecasting_tools/util/coda_utils.py` has code (`logger = logging.getLogger(__name__)`) placed between import blocks, violating PEP 8. All imports should precede module-level code. + +### 3.2 Add Missing Type Hints (Medium Priority, Medium Effort) + +Several public functions are missing return type hints: +- `metaculus_api.py:94` — `delete_question_link` +- `metaculus_client.py:544` — `delete_question_link` +- `coda_utils.py:62,75` — `add_row_to_table`, `check_that_row_matches_columns` +- `app_page.py:54,58` — `header`, `footer` +- `questions.py:792` — `preprocess` +- `base_rate_researcher.py:528` — `count_must_be_non_negative` + +**Recommendation:** Add return type annotations to all public methods. Consider enabling `ruff` rules for type hint enforcement (e.g., `ANN` rules). + +### 3.3 Replace `print()` with `logging` in Non-Test Code (Low Priority, Low Effort) + +At least 26 files use `print()` instead of `logging`. While acceptable in tests, production and script code should use the existing logging infrastructure. + +**Key offenders:** +- `scripts/generate_bot_costs_csv.py` — 5+ print calls +- `scripts/run_benchmarker.py` — mix of print and logger +- Various front-end app pages + +### 3.4 Clean Up `typing` Imports (Low Priority, Low Effort) + +The `conftest.py` imports `Generator` from `typing`, but Python 3.11+ supports `collections.abc.Generator` or just `Generator` from `collections.abc`. Several files import from `typing` where built-in generics work. + +--- + +## 4. Dependency Management + +### 4.1 Pin Pre-Commit Hook Versions More Aggressively (Low Priority, Low Effort) + +Pre-commit hooks are pinned to specific revisions (good), but some are getting stale: +- Black: `24.8.0` (current latest is significantly newer) +- Ruff: `v0.6.7` (current latest is significantly newer) +- isort: `5.13.2` + +**Recommendation:** Set up Dependabot or Renovate to auto-update pre-commit hooks. + +### 4.2 Consider Migrating from Poetry to uv (Low Priority, High Effort) + +`uv` is significantly faster for dependency resolution and installation. Poetry lock resolution can be slow for large dependency trees. This is a lower-priority, higher-effort change to consider for the future. + +### 4.3 Remove Unused Dev Dependencies (Low Priority, Low Effort) + +`tox` is listed as a dev dependency but has no `tox.ini` or `[tool.tox]` configuration. Either configure it or remove it. + +--- + +## 5. CI/CD Improvements + +### 5.1 Add Integration Test CI Job (High Priority, Medium Effort) + +Integration tests exist in `code_tests/integration_tests/` but there is no CI workflow running them. They are only run manually. + +**Recommendation:** Add a scheduled CI workflow (e.g., nightly or weekly) that runs integration tests. Use GitHub Actions scheduled triggers and limit API costs with sampling. + +### 5.2 Add Coverage Reporting to CI (Medium Priority, Low Effort) + +The unit test workflow should report coverage alongside pass/fail. This provides ongoing visibility. + +```yaml +- name: "Run unit tests with coverage" + run: poetry run pytest ./code_tests/unit_tests --cov=forecasting_tools --cov-report=xml + +- name: Upload coverage + uses: codecov/codecov-action@v4 + with: + file: ./coverage.xml +``` + +### 5.3 Run Unit Tests on PRs to `develop` Too (Low Priority, Low Effort) + +The `unit-tests.yaml` workflow only triggers on PRs to `main`. If the team uses a `develop` branch (referenced in `pre-commit.yaml`), unit tests should also run on PRs targeting it. + +### 5.4 Add Dependency Caching for Pre-Commit in CI (Low Priority, Low Effort) + +The `pre-commit.yaml` workflow doesn't cache pre-commit environments. Adding caching can speed up CI: + +```yaml +- uses: actions/cache@v4 + with: + path: ~/.cache/pre-commit + key: pre-commit-${{ hashFiles('.pre-commit-config.yaml') }} +``` + +--- + +## 6. Documentation + +### 6.1 Add Contributing Guidelines (Medium Priority, Low Effort) + +There is no `CONTRIBUTING.md` file. This makes it harder for external contributors to understand: +- How to set up the development environment +- How to run tests +- Code style and conventions +- PR review process + +### 6.2 Add Architecture Documentation (Low Priority, Medium Effort) + +The README covers installation and usage well but lacks an architecture overview. A diagram or document explaining how modules interact (bots -> AI models -> APIs, benchmarking pipeline, etc.) would help new contributors ramp up faster. + +### 6.3 Address Outstanding TODOs (Medium Priority, Varies) + +There are **~35 TODO comments** scattered across the codebase. Some notable ones: + +| File | TODO | +|------|------| +| `forecast_database_manager.py:30` | Move to more robust database | +| `metaculus_client.py:1183` | Timezone for strftime | +| `hosted_file.py:47` | Handle zip bombs (security) | +| `benchmark_displayer.py:580` | Refactor file | +| `general_llm.py:352,380,445` | Citation, reasoning, token tracking | +| `data_models/questions.py:85` | Move `unit_of_measure` to numeric questions | + +**Recommendation:** Triage these into GitHub Issues so they are visible and can be prioritized. The zip bomb handling TODO in particular has security implications. + +--- + +## 7. Security + +### 7.1 Handle Zip Bombs in File Processing (High Priority, Medium Effort) + +There is an explicit TODO at `forecasting_tools/agents_and_tools/other/hosted_file.py:47` about handling zip bombs. This is a security risk if user-supplied files are processed. + +### 7.2 Add API Key Validation at Startup (Medium Priority, Low Effort) + +Rather than failing deep in a call stack when an API key is missing, validate required keys at application startup (or at least at the point of first use with a clear error message). The existing pattern in some files (e.g., `adjacent_news_api.py`) should be applied consistently. + +--- + +## 8. Performance & Scalability + +### 8.1 Use `asyncio` More Consistently (Low Priority, High Effort) + +The codebase uses both sync (`requests`) and async (`aiohttp`) HTTP patterns. Standardizing on async where possible would improve throughput for parallel API calls, which is common in forecasting workflows. + +### 8.2 Add Request Timeouts Everywhere (Medium Priority, Low Effort) + +Some HTTP requests (e.g., in `coda_utils.py`) don't specify explicit timeouts. This can cause the application to hang indefinitely. + +```python +response = requests.post(uri, headers=headers, json=full_payload, timeout=30) +``` + +--- + +## Summary: Prioritized Action Items + +| # | Item | Priority | Effort | +|---|------|----------|--------| +| 1 | Add test coverage reporting | High | Low | +| 2 | Narrow `except Exception` catches | High | Medium | +| 3 | Handle zip bomb TODO (security) | High | Medium | +| 4 | Add integration test CI job | High | Medium | +| 5 | Centralize lint/format config in pyproject.toml | Medium | Low | +| 6 | Validate env vars before use | Medium | Low | +| 7 | Add Makefile / task runner | Medium | Low | +| 8 | Add CONTRIBUTING.md | Medium | Low | +| 9 | Triage TODOs into GitHub Issues | Medium | Low | +| 10 | Add missing type hints | Medium | Medium | +| 11 | Add coverage to CI | Medium | Low | +| 12 | Add request timeouts | Medium | Low | +| 13 | Fix import ordering | Low | Low | +| 14 | Replace print with logging | Low | Low | +| 15 | Update pre-commit hook versions | Low | Low | +| 16 | Remove/configure tox | Low | Low | +| 17 | Run tests on develop branch too | Low | Low | +| 18 | Add architecture docs | Low | Medium | +| 19 | Standardize on async HTTP | Low | High | +| 20 | Consider uv migration | Low | High |