Feat/alpaca data provider v1 #24

Open
Alejandro-Duenas wants to merge 13 commits into ml4t:main from Alejandro-Duenas:feat/alpaca-data-provider-v1

@Alejandro-Duenas

Description

Adds an Alpaca Market Data provider supporting US equities and crypto OHLCV bars (daily, hourly, and 1/5/15/30-minute) over Alpaca's historical REST API, with full sync and async paths. One provider serves both asset classes: plain tickers route to the stock bars endpoint, BASE/QUOTE symbols to the crypto endpoint, with next_page_token pagination, per-page retry, and circuit-breaker protection on both transports. Changed-code branch coverage is 100% (2,969 tests green, including the -W error::ResourceWarning leak lane).
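
The symbol routing described above can be sketched roughly as follows. This is an illustration only, not the code in providers/alpaca.py: the function names and the "stock"/"crypto" labels are assumptions made for the example.

```python
from typing import Optional

# Assumed labels for the two asset classes; the provider may use other names.
VALID_ASSET_CLASSES = frozenset({"stock", "crypto"})


def normalize_symbol(symbol: str) -> str:
    """Uppercase a symbol while keeping the BASE/QUOTE slash intact."""
    return symbol.strip().upper()


def resolve_asset_class(symbol: str, asset_class: Optional[str] = None) -> str:
    """Pick the endpoint family for a symbol (hypothetical helper).

    An explicit asset_class wins but must be a known value; otherwise a
    slash in the symbol marks it as a crypto pair.
    """
    if asset_class is not None:
        if asset_class not in VALID_ASSET_CLASSES:
            raise ValueError(f"unknown asset_class: {asset_class!r}")
        return asset_class
    return "crypto" if "/" in symbol else "stock"
```

Under these assumptions, `resolve_asset_class("AAPL")` selects the stock branch and `resolve_asset_class("BTC/USD")` the crypto branch, while an explicit override is validated before any request is built.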

Provider implementation:

  • providers/alpaca.py: New AlpacaDataProvider (two-credential header auth, iex/sip feed selection, configurable price adjustment defaulting to Alpaca's raw). Accepts YYYY-MM-DD or RFC-3339 datetime bounds; datetime bounds make sub-day minute/hour windows possible. Retries transient failures per pagination page (a failure on page N never refetches earlier pages) and derives retry_after from the Retry-After/X-RateLimit-Reset headers of 429 responses. Validates explicit asset_class values and uppercases symbols in both requests and output, preserving the crypto slash. Frequencies cover daily, hourly, and 1/5/15/30-minute bars (with 15m/15minute-style aliases), making multi-year intraday backfills possible without client-side resampling.
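
A minimal sketch of the per-page retry idea, assuming an injected `fetch_page` callable in place of the real HTTP transport. `TransientError`, `fetch_all_bars`, and `retry_after_from_headers` are hypothetical names, and treating X-RateLimit-Reset as epoch seconds is an assumption of this sketch rather than something the PR states.

```python
import time
from typing import Any, Callable, Dict, List, Mapping, Optional


class TransientError(Exception):
    """Retryable failure, optionally carrying a server-suggested delay."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


def retry_after_from_headers(headers: Mapping[str, str], now: float) -> Optional[float]:
    """Derive a delay in seconds from 429 response headers (assumed semantics)."""
    raw = headers.get("Retry-After")
    if raw is not None:
        try:
            return max(0.0, float(raw))  # numeric seconds form only
        except ValueError:
            pass
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            return max(0.0, float(reset) - now)  # assumed to be epoch seconds
        except ValueError:
            pass
    return None


def fetch_all_bars(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    base_params: Mapping[str, Any],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Follow next_page_token, retrying each page independently."""
    bars: List[Any] = []
    token: Optional[str] = None
    while True:
        params = dict(base_params)  # fresh dict per page
        if token is not None:
            params["page_token"] = token
        for attempt in range(max_retries + 1):
            try:
                payload = fetch_page(params)
                break
            except TransientError as exc:
                if attempt == max_retries:
                    raise
                delay = exc.retry_after if exc.retry_after is not None else 2.0 ** attempt
                sleep(delay)
        bars.extend(payload.get("bars") or [])
        token = payload.get("next_page_token")
        if not token:
            return bars
```

Because the retry loop sits inside the page loop, a failure on page N only repeats the request for page N; bars already collected from earlier pages are kept.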

Resilience infrastructure:

  • providers/mixins/circuit_breaker.py: Added CircuitBreaker.call_async and _with_circuit_breaker_async, giving async fetch paths the same state handling and failure accounting as sync ones. The Alpaca async path runs its entire fetch/transform/validate pipeline inside the breaker — an open breaker refuses before any request goes out.
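
To illustrate what sharing state handling between the sync and async paths can look like, here is a simplified breaker. It is a sketch under assumptions: `SketchBreaker`, `CircuitOpenError`, the threshold and timeout parameters, and the state names are invented for the example and are not the project's CircuitBreaker.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call (hypothetical name)."""


class SketchBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self.failures = 0
        self._opened_at = 0.0

    def _before_call(self) -> None:
        # Refuse while open; allow one trial call once the timeout has elapsed.
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call refused")
            self.state = "half_open"

    def _on_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def _on_failure(self) -> None:
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        self._before_call()
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    async def call_async(
        self, func: Callable[..., Awaitable[Any]], *args: Any, **kwargs: Any
    ) -> Any:
        # Same bookkeeping as call(); only the invocation is awaited.
        self._before_call()
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result
```

Since `_before_call` runs before the coroutine is invoked, an open breaker raises without the wrapped function ever starting, which matches the "refuses before any request goes out" behavior described above.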

Registration and configuration:

  • managers/provider_manager.py: Registered Alpaca as a keyed provider; env autodetection requires both credentials (a key without a secret is not constructable, so it is not reported available); added a SECRET_FIELDS redaction set so get_provider_info strips api_secret alongside api_key.
  • managers/config_manager.py: Mapped ALPACA_API_KEY/ALPACA_API_SECRET plus Alpaca's own SDK names (APCA_API_KEY_ID/APCA_API_SECRET_KEY) into provider config, with the project-convention names winning on conflict.
  • config/models.py, config/validator.py: Added the alpaca provider type; the validator warns when an Alpaca entry lacks an API key or secret.
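
The credential handling in this section could be approximated as below. The environment variable names and SECRET_FIELDS come from the description; the helper functions and the exact dict shapes are assumptions made for illustration.

```python
from typing import Any, Dict, Mapping

# Project-convention names come first so they win over the SDK aliases.
ENV_NAMES = {
    "api_key": ("ALPACA_API_KEY", "APCA_API_KEY_ID"),
    "api_secret": ("ALPACA_API_SECRET", "APCA_API_SECRET_KEY"),
}

SECRET_FIELDS = frozenset({"api_key", "api_secret"})


def alpaca_config_from_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Map environment variables into provider config (hypothetical helper)."""
    config: Dict[str, str] = {}
    for field, names in ENV_NAMES.items():
        for name in names:
            value = env.get(name)
            if value:
                config[field] = value
                break  # first match wins, so ALPACA_* beats APCA_*
    return config


def is_available(config: Mapping[str, Any]) -> bool:
    """A key without a secret is not constructable, so require both."""
    return bool(config.get("api_key")) and bool(config.get("api_secret"))


def redact(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Drop secret fields before exposing config in provider info."""
    return {k: v for k, v in config.items() if k not in SECRET_FIELDS}
```

In practice the mapping would be fed `os.environ`; passing a plain mapping keeps the sketch testable without touching the real environment.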

Testing:

  • tests/test_alpaca_provider.py: Offline unit suite covering frequency mapping for every supported key and alias, both endpoints and response shapes, pagination and token threading, the full async public path, HTTP error mapping (401/404/429/5xx), transport-failure wrapping with cause preservation, retry counts and fail-fast behavior, breaker accounting, credential autodetection, and secret redaction.
  • tests/integration/test_alpaca.py: Credential-gated smoke tests (integration marker, skipped by default) for daily stock, 15-minute stock, and minute crypto fetches.
  • tests/test_base_provider_enhanced.py: Coverage for CircuitBreaker.call_async success, failure accounting, and half-open recovery.
  • tests/test_provider_registration.py: Registry expectations extended to 12 providers (Alpaca alongside main's new Massive/Polygon entries).

Documentation:

  • docs/providers/alpaca.md: New provider page (pricing, symbol format, frequencies, feed/adjustment semantics, key setup, rate limits).
  • docs/providers/README.md, root README.md, mkdocs.yml, docs/INTEGRATION_TESTING.md: Added Alpaca to the provider tables, nav, env-var examples, and CI secrets list.

Tooling:

  • pyproject.toml: Declared the ml4t namespace known-first-party for ruff's isort so its imports always form their own section after stdlib and third-party blocks; re-sorted the two example scripts accordingly.
  • .gitignore: Ignore local coverage output and planning/working directories.

- Add AlpacaDataProvider with two-header auth on sync and async sessions
- Map daily/hourly/minute frequencies and aliases to Alpaca timeframes
- Declare crypto/intraday capabilities and conform to OHLCVProvider
- Add sync and async single-page stock bar fetch with per-status error mapping
- Inspect status codes directly so 401/403/429/404/500 surface as typed errors
- Transform bars into the canonical OHLCV schema, tolerant of list and dict shapes
- Forward the configured data feed in request params
- Resolve asset class from a BASE/QUOTE slash or an explicit override
- Route crypto symbols to the v1beta3 crypto bars endpoint, omitting feed
- Preserve the BASE/QUOTE symbol verbatim in params and the symbol column
- Fully override fetch_ohlcv/fetch_ohlcv_async to thread asset_class while
  keeping input validation, circuit breaker, and OHLCV validation
- Share one bars-to-DataFrame helper across the stock and crypto branches
- Loop each fetch until next_page_token is null, merging per-page bars
- Extend the stock bar list and concatenate crypto bars per symbol
- Send a fresh params dict per page so the token never rewrites earlier requests
- Acquire one rate-limit token per page, preserving the once-per-fetch guarantee

Wire AlpacaDataProvider into the provider registry, env auto-detection,
config env mapping, public exports, the config-validation warning list, and
the provider catalog tables so provider="alpaca" resolves end-to-end.

- Expand module docstring with API URL, feed/tier, env vars, rate limit, sync and async examples
- Add skipped-by-default integration test fetching AAPL daily and BTC/USD minute
- Update provider registration test to include the registered Alpaca provider

Task: SG-6
- Accept RFC-3339 datetime bounds and validate explicit asset_class
- Retry per pagination page, honoring 429 rate-limit headers
- Run async fetches inside the circuit breaker via new call_async
- Uppercase crypto symbols and expose a stock adjustment option
- Strip api_secret from the sanitized provider info config
- Mark alpaca available only when both key and secret resolve
- Accept APCA_* SDK env aliases in config mapping and autodetect
- Warn when an alpaca config lacks api_secret
- Add fetch_ohlcv_async, per-page retry, and rate-limit header tests
- Cover circuit breaker call_async, lazy init, and registry caching
- Pin two-credential availability and secret redaction behavior
- Dedupe the provider fixture and mock page-response builder
- Add docs/providers/alpaca.md plus README, mkdocs, and env-var entries
- Correct the IEX delay claim and document the raw adjustment default
- Refresh provider counts and catalog line counts
- Declare the ml4t namespace known-first-party so its imports form a
  third section after stdlib and third-party blocks
- Drop stale import-section comments splitting the ml4t block in examples
- Add coverage output and local working dirs to gitignore
- Map 5m/15m/30m and their Nminute aliases to Alpaca timeframes
- Add parametrized mapping tests and a live 15m integration test
- Document the new frequencies and uniform aliases in provider docs