Skip to content

Make proxy CA selectable via --proxy-ca flag#50

Open
joaquingx wants to merge 2 commits intomasterfrom
feature/bundle-proxy-ca
Open

Make proxy CA selectable via --proxy-ca flag#50
joaquingx wants to merge 2 commits intomasterfrom
feature/bundle-proxy-ca

Conversation

@joaquingx
Copy link
Copy Markdown
Contributor

@joaquingx joaquingx commented May 5, 2026

Summary

Adds optional support for installing a custom CA certificate into the spider Docker image generated by estela init. This enables spiders to trust internal/private CAs — useful for users behind corporate TLS-inspecting proxies, MITM debugging tools, or air-gapped environments with private PKIs.

Motivation

When a spider runs behind a TLS-intercepting proxy (e.g., mitmproxy, corporate egress proxy, internal CA-signed services), HTTPS requests fail with SSL: CERTIFICATE_VERIFY_FAILED because the proxy's CA isn't in the system trust store. Today, the only workaround is editing the generated Dockerfile-estela by hand after every estela init, which doesn't survive regeneration. This PR makes the CA injection a first-class, declarative option.

Design

  • New assets/certs/ folder can hold any number of bundled CA certs
  • New flag estela init --proxy-ca <name|path|none>:
    • <name> → resolves to assets/certs/<name>.crt (bundled with the CLI)
    • <path> (.crt/.pem, contains /, starts with .) → any local file
    • none → no CA installed (current behavior, no Dockerfile changes)
  • Selection is persisted in estela.yaml as project.proxy_ca so subsequent estela init calls or other tooling can read it
  • Generated Dockerfile copies the cert into the system trust store via update-ca-certificates and exports REQUESTS_CA_BUNDLE / SSL_CERT_FILE / CURL_CA_BUNDLE so requests, Twisted (Scrapy), curl, and Chromium (via NSS) all pick it up
  • The CA install block is conditionally emitted — when proxy_ca = none, the generated Dockerfile is identical to today's
  • New estela list certs command for users to discover what's bundled
  • setup.py package_data updated to ship assets/certs/*.crt

Backwards compatibility

The flag has a default value, so users who don't opt in get a Dockerfile equivalent to today's (no CA installed, no extra layers, no behavior change).

Coverage

  • Scrapy / requests Dockerfile (DOCKERFILE)
  • Selenium Dockerfile (DOCKERFILE_SELENIUM)

Verifications

  • ✅ Template renders with the install block present/omitted match the expected output
  • resolve_proxy_ca_source correctly handles bundled names, absolute paths, relative paths, .pem extensions, and none
  • estela list certs lists bundled certs as expected
  • estela init --help exposes --proxy-ca with the documented default and behavior
  • ✅ End-to-end test on a real K8s deployment: a Scrapy spider built with this CLI completed HTTPS requests through a TLS-intercepting proxy with no certificate errors (tested with ESTELA_PROXY_NAME env var picked up by the existing EstelaProxyMiddleware)
  • docker build of a generated Dockerfile — update-ca-certificates registers the cert and Python's default SSL context picks it up (one extra CA in the trust store, as expected)

Notes

  • The bundled cert is a public CA cert (no private key); only public roots should ever be shipped this way
  • Existing projects need estela init re-run to regenerate Dockerfile-estela and pick up the new template
  • Selenium Dockerfile path verified by template render and visual inspection but not with a full docker build — reviewer welcome to spot-check

joaquingx added 2 commits May 5, 2026 18:32
The Bitmaker MITM proxy presents TLS certs signed by its own CA
(<BM Proxy CA>), which spiders did not trust by default, causing
SSL: CERTIFICATE_VERIFY_FAILED when proxying HTTPS requests.

This change ships the proxy CA cert as a package asset and the
generated Dockerfile-estela now installs it into the system trust
store via update-ca-certificates. REQUESTS_CA_BUNDLE / SSL_CERT_FILE
/ CURL_CA_BUNDLE env vars are set so requests, Twisted (Scrapy)
and curl pick it up; Chromium uses the system NSS store.

Validated end-to-end on hetzner-staging: a Scrapy spider going
through the proxy now completes HTTPS requests successfully.

Bumps version 0.3.3 -> 0.3.4.
Refactors the previous Bitmaker-specific bundling so the CLI is
useful to any user with their own MITM proxy, while keeping the
Bitmaker default for backwards-compatible behavior.

Changes:
- Move assets/bitmaker-proxy-ca.crt to assets/certs/bitmaker.crt;
  the assets/certs/ folder now holds any number of bundled CAs.
- estela init learns a --proxy-ca <name|path|none> option:
    * <name>      -> assets/certs/<name>.crt (bundled)
    * <path>      -> any absolute or relative .crt/.pem file
    * none        -> skip CA installation entirely
  Default: bitmaker (keeps current Bitmaker flow with no extra flag).
- The selected name is persisted as project.proxy_ca in estela.yaml.
- Generated Dockerfile-estela copies the cert as .estela/proxy-ca.crt
  (fixed filename, decoupled from the original cert name) and only
  emits the CA install block when proxy_ca != none.
- New `estela list certs` command lists bundled certs.
- setup.py package_data updated to include assets/certs/*.crt.
@joaquingx joaquingx changed the title Bundle Bitmaker Proxy CA cert in spider Docker images Make proxy CA selectable via --proxy-ca flag May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant