4 changes: 4 additions & 0 deletions .gitignore
@@ -34,7 +34,11 @@ htmlcov/
logs/
site/
src/ml4t/data/_version.py
coverage/

# Claude Code (local development only)
CLAUDE.md
.claude/
.agentic_documentation/
.skills-outputs/
planning/
1 change: 1 addition & 0 deletions README.md
@@ -102,6 +102,7 @@ fred = FREDProvider().fetch_series("GDP", "2020-01-01", "2024-12-31")

| Provider | Coverage |
|----------|----------|
| Alpaca | US equities + crypto (free IEX feed) |
| EODHD | 60+ global exchanges |
| Tiingo | US equities with quality focus |
| Twelve Data | Multi-asset coverage |
2 changes: 2 additions & 0 deletions docs/INTEGRATION_TESTING.md
@@ -170,6 +170,8 @@ pytest tests/integration/ -v -W default

Required secrets:
```
ALPACA_API_KEY
ALPACA_API_SECRET
CRYPTOCOMPARE_API_KEY
DATABENTO_API_KEY
OANDA_API_KEY
6 changes: 6 additions & 0 deletions docs/providers/README.md
@@ -13,6 +13,7 @@ For detailed pricing, terms, and gap analysis, see [PROVIDER_AUDIT.md](PROVIDER_
| Provider | API Key | Free Tier | Best For |
|----------|---------|-----------|----------|
| [YahooFinance](yahoo.md) | No | Unlimited* | Quick start, US equities |
| [Alpaca](alpaca.md) | Yes | 200 req/min (IEX feed) | US equities + crypto, intraday |
| [EODHD](eodhd.md) | Yes | 20 calls/day | Global equities (60+ exchanges) |
| [Tiingo](tiingo.md) | Yes | 1,000 req/day | US equities alternative |
| [Finnhub](finnhub.md) | Yes | 60 req/min | Company metrics, real-time |
@@ -73,6 +74,7 @@ For detailed pricing, terms, and gap analysis, see [PROVIDER_AUDIT.md](PROVIDER_
| Provider | Minute | Hourly | Daily | Options | Fundamentals |
|----------|--------|--------|-------|---------|--------------|
| YahooFinance | ✅ (7d) | ✅ | ✅ | ❌* | ❌* |
| Alpaca | ✅ | ✅ | ✅ | ❌ | ❌ |
| Databento | ✅ | ✅ | ✅ | ✅ (OPRA) | ❌ |
| Massive | ✅ | ✅ | ✅ | ✅ | ✅ |
| EODHD | ❌ | ❌ | ✅ | ✅ ($29.99) | ✅ ($59.99) |
@@ -200,6 +202,9 @@ Create a `.env` file in your project root:

```bash
# Free tier providers
ALPACA_API_KEY=your_key_here
ALPACA_API_SECRET=your_secret_here
# (Alpaca's own SDK names APCA_API_KEY_ID / APCA_API_SECRET_KEY also work)
EODHD_API_KEY=your_key_here
TIINGO_API_KEY=your_key_here
FINNHUB_API_KEY=your_key_here
@@ -251,6 +256,7 @@ See [Incremental Updates Guide](../storage/INCREMENTAL_ARCHITECTURE.md) for deta
| Provider | Split Adjusted | Dividend Adjusted |
|----------|----------------|-------------------|
| YahooFinance | ✅ | ✅ |
| Alpaca | Opt-in (`adjustment=`) | Opt-in (`adjustment=`) |
| EODHD | ✅ | ✅ |
| Tiingo | ✅ | ✅ |
| WikiPrices | ✅ | ✅ |
148 changes: 148 additions & 0 deletions docs/providers/alpaca.md
@@ -0,0 +1,148 @@
# Alpaca Provider

**Provider**: `AlpacaDataProvider`
**Website**: [alpaca.markets](https://alpaca.markets)
**API Key**: Required (key + secret pair)
**Free Tier**: 200 requests/min, real-time IEX feed

---

## Overview

Alpaca serves historical US equity and crypto bars, from one-minute to daily
resolution and with multi-year history, through a single REST API. One provider
covers both asset classes: plain tickers route to the stock bars endpoint, and
`BASE/QUOTE` symbols route to the crypto bars endpoint.

**Best For**: Free US intraday equities, US crypto bars

**Pricing**:
| Tier | Price | Features |
|------|-------|----------|
| Basic | $0/mo | 200 req/min, real-time IEX feed, SIP queries exclude the most recent 15 minutes |
| Algo Trader Plus | $99/mo | 10,000 req/min, full SIP (consolidated tape) |

---

## Quick Start

```python
import os
os.environ["ALPACA_API_KEY"] = "your_key_here"
os.environ["ALPACA_API_SECRET"] = "your_secret_here"

from ml4t.data.providers import AlpacaDataProvider

provider = AlpacaDataProvider()

# US stocks
df = provider.fetch_ohlcv("AAPL", "2024-01-01", "2024-12-01")

# Crypto (BASE/QUOTE symbol routes to the crypto endpoint)
df = provider.fetch_ohlcv("BTC/USD", "2024-01-01", "2024-01-31")

# Intraday with RFC-3339 datetime bounds
df = provider.fetch_ohlcv(
    "BTC/USD", "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z", frequency="minute"
)

# Multi-year intraday backfills with minute multiples (5m / 15m / 30m)
df = provider.fetch_ohlcv("AAPL", "2021-01-01", "2024-12-31", frequency="15m")

provider.close()
```

Async usage:

```python
async with AlpacaDataProvider() as provider:
    df = await provider.fetch_ohlcv_async("AAPL", "2024-01-01", "2024-12-01")
```

---

## Symbol Format

| Asset class | Format | Examples |
|-------------|--------|----------|
| US stocks | Plain ticker | AAPL, MSFT |
| Crypto | BASE/QUOTE | BTC/USD, ETH/USD |

Symbols are uppercased into requests and into the output `symbol` column; the
crypto slash is preserved (e.g. `BTC/USD`).
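
The routing rule above can be sketched in a few lines. This is a minimal illustration, not the provider's actual code: `route_symbol` and its return shape are hypothetical names introduced here, and the only behavior assumed is what this section states (uppercase the symbol, keep the slash, treat a slash as the crypto marker).

```python
def route_symbol(symbol: str) -> tuple[str, str]:
    """Return (normalized_symbol, asset_class) for a user-supplied symbol.

    Hypothetical helper for illustration; not part of AlpacaDataProvider.
    """
    normalized = symbol.upper()
    # A slash marks a BASE/QUOTE crypto pair; anything else is a stock ticker.
    asset_class = "crypto" if "/" in normalized else "stock"
    return normalized, asset_class


print(route_symbol("aapl"))     # ('AAPL', 'stock')
print(route_symbol("btc/usd"))  # ('BTC/USD', 'crypto')
```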

---

## Supported Frequencies

| Frequency | Availability |
|-----------|--------------|
| `daily` / `1d` | ✅ |
| `hourly` / `1h` | ✅ |
| `minute` / `1m` | ✅ |
| `5m` / `5minute` | ✅ |
| `15m` / `15minute` | ✅ |
| `30m` / `30minute` | ✅ |

`start`/`end` accept `YYYY-MM-DD` dates or RFC-3339 datetimes (both inclusive);
datetime bounds are the natural shape for sub-day minute/hour windows.
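
One way to picture the alias table is a plain dictionary lookup. The sketch below is an assumption-laden illustration: the keys come from the table above, while `to_timeframe`, `_TIMEFRAMES`, and the Alpaca-style timeframe strings used as values (`1Min`, `1Hour`, `1Day`, ...) are chosen here for the example and may not match how the provider maps them internally.

```python
# Hypothetical alias table: keys are the frequency names documented above,
# values are Alpaca-style timeframe strings assumed for illustration.
_TIMEFRAMES = {
    "daily": "1Day", "1d": "1Day",
    "hourly": "1Hour", "1h": "1Hour",
    "minute": "1Min", "1m": "1Min",
    "5m": "5Min", "5minute": "5Min",
    "15m": "15Min", "15minute": "15Min",
    "30m": "30Min", "30minute": "30Min",
}


def to_timeframe(frequency: str) -> str:
    """Map a documented frequency alias to a timeframe string."""
    try:
        return _TIMEFRAMES[frequency.lower()]
    except KeyError:
        # Reject anything outside the documented set instead of guessing.
        raise ValueError(f"Unsupported frequency: {frequency!r}") from None
```

Keeping both spellings of each alias in one table means an unsupported value such as `"2h"` fails fast with a clear error rather than reaching the API.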

---

## Feeds and Adjustment

- `feed="iex"` (default): the free feed, real-time but served from a single
  exchange (IEX, roughly 2-3% of US volume). The Basic plan additionally cannot
  query the most recent 15 minutes of SIP data.
- `feed="sip"`: the consolidated tape (paid plans).
- `adjustment="raw"` (default, Alpaca's own default): stock bars are **not**
  adjusted for splits or dividends. Pass `adjustment="split"`, `"dividend"`, or
  `"all"` for adjusted bars. Crypto has no adjustment concept.

```python
provider = AlpacaDataProvider(feed="sip", adjustment="all")
```
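
The accepted option values listed above lend themselves to an up-front check. The following is only a sketch: `check_options` and the two constant sets are hypothetical, and whether `AlpacaDataProvider` validates its arguments this way (or at all) is an assumption.

```python
# Values taken from the list above; the helper itself is hypothetical.
VALID_FEEDS = {"iex", "sip"}
VALID_ADJUSTMENTS = {"raw", "split", "dividend", "all"}


def check_options(feed: str = "iex", adjustment: str = "raw") -> tuple[str, str]:
    """Validate feed/adjustment before any request is made."""
    if feed not in VALID_FEEDS:
        raise ValueError(f"feed must be one of {sorted(VALID_FEEDS)}, got {feed!r}")
    if adjustment not in VALID_ADJUSTMENTS:
        raise ValueError(
            f"adjustment must be one of {sorted(VALID_ADJUSTMENTS)}, got {adjustment!r}"
        )
    return feed, adjustment
```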

---

## API Key Setup

```bash
# .env file (project convention)
ALPACA_API_KEY=your_key_here
ALPACA_API_SECRET=your_secret_here
```

Alpaca's own SDK/CLI names `APCA_API_KEY_ID` / `APCA_API_SECRET_KEY` are also
accepted. Get a free key at [alpaca.markets](https://alpaca.markets/).
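
When both naming schemes are set, the `ENV_MAPPING` comment in `config_manager.py` in this change says later entries win, which puts the `ALPACA_*` names ahead of the `APCA_*` aliases. The sketch below illustrates that precedence under stated assumptions: `resolve_credentials` and `ENV_NAMES` are hypothetical, and it is assumed (not confirmed here) that the provider itself resolves credentials in the same order as `ConfigManager`.

```python
import os
from collections.abc import Mapping

# Order matters: a later entry overwrites an earlier one for the same field,
# so the ALPACA_* names take precedence over the APCA_* aliases.
ENV_NAMES = [
    ("APCA_API_KEY_ID", "api_key"),
    ("APCA_API_SECRET_KEY", "api_secret"),
    ("ALPACA_API_KEY", "api_key"),
    ("ALPACA_API_SECRET", "api_secret"),
]


def resolve_credentials(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect api_key/api_secret from the environment (hypothetical helper)."""
    creds: dict[str, str] = {}
    for name, field in ENV_NAMES:
        value = env.get(name)
        if value:  # skip unset or empty variables
            creds[field] = value
    return creds
```

Passing a plain dictionary instead of `os.environ` makes the precedence easy to check without touching the real environment.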

---

## Rate Limits

| Tier | Limit |
|------|-------|
| Basic (free) | 200 req/min |
| Algo Trader Plus | 10,000 req/min |

The provider throttles client-side to 200 req/min by default (override with the
`rate_limit` constructor argument), honors the `Retry-After` and
rate-limit-reset headers on 429 responses, and retries transient failures
individually for each page of a paginated request.
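
The 429 handling described above boils down to deciding how long to wait. The sketch below shows one plausible way to do that; it is not the provider's implementation. `retry_delay` is a hypothetical helper, the header name `X-RateLimit-Reset` (holding a Unix timestamp) is an assumption, and only the delta-seconds form of `Retry-After` is handled.

```python
from collections.abc import Mapping


def retry_delay(headers: Mapping[str, str], now: float, default: float = 1.0) -> float:
    """Seconds to wait after a 429 response (hypothetical helper).

    Prefers Retry-After in seconds, then an assumed X-RateLimit-Reset Unix
    timestamp, then `default`. Header lookup is case-sensitive here because a
    plain mapping is used; real HTTP clients usually normalize header case.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch

    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            return max(0.0, float(reset) - now)
        except ValueError:
            pass

    return default
```

Taking `now` as an argument (for example `time.time()`) keeps the function pure, so the waiting logic can be tested without sleeping.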

---

## Not Yet Implemented

| Feature | Priority |
|---------|----------|
| Quotes / trades (tick) endpoints | MEDIUM |
| Options bars | LOW |
| News API | LOW |

---

## See Also

- [Alpaca Market Data docs](https://docs.alpaca.markets/us/docs/about-market-data-api)
- [Provider README](README.md)
- [Provider Audit](PROVIDER_AUDIT.md)
4 changes: 1 addition & 3 deletions examples/01_microstructure_tick_analysis.py
@@ -21,12 +21,10 @@
import matplotlib.pyplot as plt
import numpy as np
import polars as pl

from ml4t.engineer.bars import DollarBar, VolumeBar
from ml4t.engineer.features import microstructure as ms
from ml4t.engineer.labeling import BarrierConfig, triple_barrier_labels

# Import qeval modules
# Import qfeatures modules
from ml4t.evaluation import Evaluator, Tier
from ml4t.evaluation.splitters import PurgedWalkForwardCV

5 changes: 2 additions & 3 deletions examples/02_cross_sectional_nasdaq100.py
@@ -22,11 +22,10 @@
from pathlib import Path

import matplotlib.pyplot as plt

# Import qfeatures modules
import ml4t.engineer as qf
import polars as pl
import seaborn as sns

import ml4t.engineer as qf
from ml4t.evaluation import Evaluator, Tier
from ml4t.evaluation.evaluation.stats import deflated_sharpe_ratio
from ml4t.evaluation.evaluation.viz import create_factor_tearsheet
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -124,6 +124,7 @@ nav:
- Providers:
- providers/index.md
- Yahoo Finance: providers/yahoo.md
- Alpaca: providers/alpaca.md
- CoinGecko: providers/coingecko.md
- FRED: providers/fred.md
- Fama-French: providers/fama_french.md
7 changes: 7 additions & 0 deletions pyproject.toml
@@ -236,6 +236,13 @@ ignore = [
"B904", # exception chaining (not always needed)
]

[tool.ruff.lint.isort]
# Treat the whole ml4t namespace (data, engineer, evaluation, ...) as
# first-party so its imports always form their own third section after the
# standard library and third-party blocks, even for ml4t distributions
# installed from outside this repo.
known-first-party = ["ml4t"]

[tool.ruff.lint.per-file-ignores]
"tests/*" = ["ARG001", "ARG002", "ARG005", "B017", "F821", "SIM117"] # Test patterns
"src/ml4t/data/futures/continuous_downloader.py" = ["ARG002"] # Public API placeholder
1 change: 1 addition & 0 deletions src/ml4t/data/config/models.py
@@ -53,6 +53,7 @@ class ProviderType(str, Enum):
"""Provider type enumeration."""

YAHOO = "yahoo"
ALPACA = "alpaca"
BINANCE = "binance"
CRYPTOCOMPARE = "cryptocompare"
DATABENTO = "databento"
12 changes: 11 additions & 1 deletion src/ml4t/data/config/validator.py
@@ -65,11 +65,21 @@ def _validate_providers(self) -> None:
provider_names.add(provider.name)

# Check API keys for providers that need them
if provider.type in ["massive", "polygon", "cryptocompare"] and not provider.api_key:
if (
provider.type in ["massive", "polygon", "cryptocompare", "alpaca"]
and not provider.api_key
):
self.warnings.append(
f"Provider {provider.name} ({provider.type}) may require an API key"
)

# Alpaca authenticates with a key/secret pair, so a missing secret
# is just as fatal as a missing key.
if provider.type == "alpaca" and not provider.api_secret:
self.warnings.append(
f"Provider {provider.name} (alpaca) requires api_secret as well as api_key"
)

# Validate rate limits
if provider.rate_limit and provider.rate_limit.requests_per_second <= 0:
self.errors.append(
8 changes: 7 additions & 1 deletion src/ml4t/data/managers/config_manager.py
@@ -40,8 +40,14 @@ class ConfigManager:
>>> print(config_mgr.config["providers"]["yahoo"])
"""

# Environment variable to provider config mapping
# Environment variable to provider config mapping. Later entries win when
# two variables map to the same field, so the APCA_* aliases (Alpaca's own
# SDK/CLI names) come before the ALPACA_* project-convention names.
ENV_MAPPING = {
"APCA_API_KEY_ID": ("alpaca", "api_key"),
"APCA_API_SECRET_KEY": ("alpaca", "api_secret"),
"ALPACA_API_KEY": ("alpaca", "api_key"),
"ALPACA_API_SECRET": ("alpaca", "api_secret"),
"CRYPTOCOMPARE_API_KEY": ("cryptocompare", "api_key"),
"DATABENTO_API_KEY": ("databento", "api_key"),
"POLYGON_API_KEY": ("massive", "api_key"),