A foundation for building, evaluating, and comparing forecasting systems β conventional numerical models, LLM Processes, and agentic forecasters β on real economic, financial, and event-prediction tasks.
The repository pairs a small, stable core library with a set of self-contained reference implementations. The library gives you cutoff-safe data handling, a single Predictor interface, and a backtest/evaluation harness. Each reference implementation is a worked example of a different forecasting problem and the techniques that suit it. Start from whichever one is closest to what you want to build.
π First time here? Run the environment check. After
uv sync(see Setup), openimplementations/getting_started/00_environment_check.ipynband run it top to bottom. It's a self-guided preflight that verifies every capability β proxy LLM inference, Langfuse, E2B code execution, StatCan/FRED data access, and an end-to-end mini backtest β and tells you exactly what to fix when something isn't set up. Do this before anything else.
- Core library β
aieng-forecasting(aieng.forecasting): data services, cutoff enforcement, forecasting tasks, prediction payloads, backtesting, evaluation, and artifacts. - Reusable methods β
aieng.forecasting.methods:Predictorimplementations including naive baselines (continuous, binary, and categorical), Darts numerical predictors, LLM-process predictors (continuous, binary-probability, and categorical-probability), and ADK-based agentic infrastructure (build_adk_agent,AdkTextRunner,AgentPredictor). - Reference implementations β
implementations/<use-case>/: notebooks, helper modules, task-specific configuration, and co-located YAML specs. - Tracing β Langfuse / OpenTelemetry bootstrap (
aieng.forecasting.langfuse_tracing) for LiteLLM and Google ADK. - Data scripts β
scripts/: one fetch script per data source, plusbuild_e2b_template.pyfor the agentic code-execution sandbox.
Every method can be used in one of two modes, and the distinction runs through the library:
- Track 1 β evaluated prediction. Numerical methods, LLM Processes, and agentic forecasters emit standardized
Predictionobjects and are compared head-to-head with the evaluation harness (CRPS, Brier, RPS, calibration). - Track 2 β interactive analysis. The same agents can do scenario analysis, monitoring, open-ended Q&A, code-backed analysis, and reasoning over evidence β useful work that isn't reduced to a single score.
Each is independent and self-contained β pick the one that matches the problem you care about, and read that directory's README.md for the full walkthrough. They are numbered in a recommended order that mirrors the bootcamp progression β conventional numerical methods β LLM Processes β agents β agentic evaluation β but any one stands on its own, so jump straight to the problem you care about.
Start here β #0 getting_started/ β one CPI series, one month ahead. The smallest end-to-end loop: a Predictor, a BacktestSpec and EvalSpec, naive + AutoARIMA baselines, CRPS scoring. The place to learn the evaluation framework before picking a domain below. Also includes 99_repo_concierge.ipynb β a lite-model repo guide for βhow does this codebase work?β questions (uv run adk run implementations/getting_started/concierge_agent from the repo root).
| # | Implementation | The problem | Concepts & techniques it demonstrates |
|---|---|---|---|
| 1 | sp500_forecasting/ |
S&P 500 returns under a macro/market covariate panel. | A head-to-head of conventional numerical methods (naive, ETS, Kalman, AutoARIMA, linear regression, LightGBM) plus a covariate-aware LLM-Process, all reading the same leak-safe covariate panel. Cumulative-return targets at 1/5/21-business-day horizons, CRPS + direction metrics, config-driven specs. |
| 2 | food_price_forecasting/ |
A multivariate food-CPI trajectory, in the style of Canada's Food Price Report. | Nine correlated sub-indices, a 12-step trajectory, a domain metric (avg/avg YoY), baselines vs LLM-Process predictors, leakage-aware backtests, and cached artifacts for fast iteration. |
| 3 | energy_oil_forecasting/ |
Daily WTI crude-oil price under regime-breaking news. | A capability progression β Prophet β LLM-Process β news-grounded agent β code-executing agent β plus an adaptive agent that learns a strategy from data and is scored before vs after. Continuous trajectories, a binary up-shock task, and interactive scenario analysis. |
| 4 | boc_rate_decisions/ |
Will the Bank of Canada cut, hold, or hike at its next meeting? | Discrete-event forecasting: ordered-categorical outcomes on an irregular calendar, RPS scoring and one-vs-rest calibration (instead of CRPS), a binary (Brier) special case, cutoff-aware document ingestion, and an LLM-as-judge that scores an agent's reasoning against the official rationale. |
Not sure where to start building? Each of the four domain implementations above ends with a 99_starter_agent.ipynb β a fresh, hackable starter agent (a starter_agent/ module) with toggleable news search and code execution, two lightweight tool-usage skills, an interactive cell, and one scored forecast. It's the consistent "continue from here" entry point for taking any reference use case in an agentic direction, and a quick end-to-end test of that use case's agent stack.
- StatCan β Canadian CPI and related macroeconomic series.
- FRED β macroeconomic and commodity series.
- yfinance β equities, indices, and commodity futures.
Historical data is cached locally under data/ and is not committed. Each implementation's README names the fetch script(s) it needs.
Several reference implementations (S&P 500, BoC rate decisions) fetch data from the Federal Reserve Economic Data (FRED) API, which requires a free personal API key. We cannot provide this key for you β each participant must request their own at:
FRED keys are free and approval is typically quick, but it can occasionally take some time, so request yours early. When asked for a use-case description, something extended from the following works well:
"Requesting an API key to explore the effectiveness of various forecasting techniques on economic data."
Once you have the key, add it to your repo-root .env:
FRED_API_KEY=your_fred_api_key
On Coder workspaces, bootcamp keys (OPENAI_*, E2B_*, LANGFUSE_*) live in your shell environment β not in repo .env. See Bootcamp environment.
aieng-forecasting/ # Installable library: import as aieng.forecasting
implementations/ # Self-contained reference implementations + co-located specs
scripts/ # Data-fetch scripts + E2B template builder
tests/ # Onboarding integration tests (not run in CI)
planning-docs/ # Architecture notes and the extension/roadmap catalog
playground/ # Exploration and archived demos (not reference implementations)
Install dependencies from the repo root:
git clone <repo-url>. # If running locally. Coder environment setup clones repo automatically.
cd agentic-forecasting
uv sync --devmacOS β LightGBM and OpenMP. The library depends on LightGBM (used by DartsLightGBMPredictor and some notebooks). The PyPI wheel expects OpenMP at runtime. If you see Library not loaded: @rpath/libomp.dylib when importing or training, install Homebrew's OpenMP once and restart your shell or Jupyter kernel:
brew install libompOn Apple Silicon the dylib is typically under /opt/homebrew/opt/libomp/lib/; on Intel Homebrew, /usr/local/opt/libomp/lib/.
When you open a Coder workspace, startup runs automatically in the background. By the time you connect you should have:
- The repo cloned, a Python venv, and dependencies installed
- Bootcamp API keys (
OPENAI_*,E2B_*,LANGFUSE_*) available in your shell (not in.env) - A shell that opens in the repo with the venv activated
Your next step: run 00_environment_check.ipynb top to bottom. That notebook will confirm that startup succeeded.
On first boot, keys are verified against live services and your onboarding status is recorded. Workspace restarts reload keys without re-running the full test suite.
Local machine or troubleshooting β fetch and verify keys manually:
eval "$(onboard --bootcamp-name agentic-forecasting --test-script tests/test_integration.py)"Reload keys in a new shell without re-testing:
eval "$(onboard --bootcamp-name agentic-forecasting --skip-test)"Headless verification (same checks as first-boot onboarding):
uv sync --all-extras --dev --all-packages
uv run pytest tests/test_integration.py -vCredential model: bootcamp keys live in your shell environment. Optional personal keys (e.g. FRED_API_KEY) go in a .env only β see .env.example.
New to the project? Open implementations/getting_started/00_environment_check.ipynb and run it top to bottom. It's a self-guided preflight that checks every major capability β proxy LLM inference, Langfuse, E2B code execution, StatCan/FRED data access, and a full end-to-end mini backtest β one cell at a time, and tells you exactly what to fix when something isn't set up (most often a missing or placeholder key in your .env). It's the fastest way to confirm setup before working through the reference implementations.
Data is fetched once and cached locally (gitignored). Each implementation names the fetch script(s) it needs in its own README.md β for example scripts/fetch_cpi.py (getting started), scripts/fetch_sp500_market.py + scripts/fetch_fred.py (S&P 500), scripts/fetch_wti.py (energy), and scripts/fetch_boc.py and scripts/fetch_boc_press_releases.py (BoC). Run the relevant one before opening that implementation's notebooks:
uv run python scripts/fetch_cpi.pyAgentic forecasters can run code in an E2B cloud sandbox. Credentials for e2b should be automatically injected into the environment for bootcamp participants, and you can confirm successful setup by running 00_environment_check.ipynb.
If this was unsuccessful, or if you prefer to run with E2B in an alternative environment, do this once before enabling code execution in build_adk_agent:
- Create a free account at e2b.dev and copy your API key.
- Add it to your
.envfile alongside the other keys (see.env.example):
E2B_API_KEY=your_e2b_api_key
- Build the template (takes a few minutes on first run):
uv run --env-file .env scripts/build_e2b_template.pyThe template name is the default in CodeExecutionConfig.template_name, so notebooks pick it up automatically.
Predictor is the interface every forecasting method implements:
class MyPredictor(Predictor):
@property
def predictor_id(self) -> str:
return "my_predictor"
def predict(self, task: ForecastingTask, context: ForecastContext) -> list[Prediction]:
series = context.get_series(task.target_series_id)
...
return [Prediction(...)]ForecastContext is cutoff-scoped. Predictors only see observations available as of the forecast origin, which keeps backtests honest.
backtest() is the open iteration loop against historical data. evaluate() is the budgeted protected-window loop.
This repo is a starting point, not a finished product. The shape of a new forecaster is always the same: implement Predictor, declare a spec, and run backtest() / evaluate() to compare it against the baselines. Each reference implementation's README ends with concrete extension ideas; planning-docs/roadmap.md collects the cross-cutting ones (new data sources, additional methods, live forecasting, deeper agent work).
make lint
make formatmake lint runs the expected pre-push quality checks. Git commits do not run hooks locally. To mirror the full pre-commit suite, run:
uv run pre-commit run --all-files- Per-implementation READMEs under
implementations/β the primary user surface. aieng-forecasting/README.mdandaieng-forecasting/aieng/forecasting/methods/README.mdβ the library and the method catalog.planning-docs/roadmap.mdβ architecture principles and extension ideas.
Keep code, notebooks, specs, and these docs in sync when you change behavior, setup, layout, or datasets.