
RFC: Domain-Scoped mTLS for GoRouter#1438

Open
rkoster wants to merge 12 commits into main from rfc-app-to-app-mtls-routing


@rkoster
Contributor

@rkoster rkoster commented Feb 17, 2026

Summary

This RFC proposes enabling per-domain mutual TLS (mTLS) on GoRouter with optional identity extraction and authorization enforcement.

View the full RFC

This infrastructure supports multiple use cases:

  • CF app-to-app routing: Authenticated internal communication via apps.mtls.internal
  • External client certificates: Partner integrations, IoT devices on specific domains
  • Cross-CF federation: Secure communication between CF installations

Key Points

  • No new infrastructure: Uses existing GoRouter with domain-specific mTLS configuration
  • Default-deny security: For CF app-to-app routing, routes blocked unless explicitly allowed
  • RFC-0027 compliant: Uses flat route options (mtls_allowed_apps, mtls_allowed_spaces, mtls_allowed_orgs, mtls_allow_any)
  • Layered authorization: Domain-level (operator) + route-level (developer) access control

Implementation Phases

  • Phase 1a (mTLS Infrastructure): GoRouter validates client certificates for configured domains
  • Phase 1b (Authorization): CF identity extraction and per-route access control
  • Phase 2 (Optional): Egress HTTP proxy for simplified client adoption

Draft Implementation PRs

cc @cloudfoundry/toc @cloudfoundry/wg-app-runtime-interfaces

@rkoster rkoster force-pushed the rfc-app-to-app-mtls-routing branch from bce2e1d to 5557aeb Compare February 17, 2026 15:47
@rkoster rkoster added toc rfc CFF community RFC labels Feb 17, 2026
@rkoster rkoster requested review from a team, Gerg, beyhan, cweibel and stephanme and removed request for a team February 17, 2026 15:55
@silvestre
Member

I really like this proposal.

Just to be sure: it would still be possible to have an apps.mtls.internal route that allows access from any source app, so that the authorization check could be done in the app, right?

One use case is the app-autoscaler service, where we expose an mTLS endpoint but check authorization by determining whether the calling app is bound to an autoscaler service instance. This is dynamic information that we cannot determine at route creation time.

Enable authenticated and authorized app-to-app communication via GoRouter
using mutual TLS (mTLS). Applications connect to a shared internal domain
(apps.mtls.internal), where GoRouter validates client certificates and
enforces per-route access control using a default-deny model.

Key features:
- Phase 1a: Domain-specific mTLS in GoRouter (validates instance identity)
- Phase 1b: Authorization enforcement via allowed_sources route option
- Phase 2 (optional): Egress HTTP proxy for simplified client adoption

Depends on RFC-0027 (Generic Per-Route Features) for route options support.
@rkoster rkoster force-pushed the rfc-app-to-app-mtls-routing branch from 5557aeb to 8f3900a Compare February 17, 2026 19:04
@rkoster
Contributor Author

rkoster commented Feb 17, 2026

> I really like this proposal.
>
> Just to be sure: it would still be possible to have an apps.mtls.internal route that allows access from any source app, so that the authorization check could be done in the app, right?
>
> One use case is the app-autoscaler service, where we expose an mTLS endpoint but check authorization by determining whether the calling app is bound to an autoscaler service instance. This is dynamic information that we cannot determine at route creation time.

@silvestre I have updated the RFC with:

applications:
- name: autoscaler-api
  routes:
  - route: autoscaler.apps.mtls.internal
    options:
      allowed_sources:
        any: true

@theghost5800
Contributor

This idea is really interesting, but will it be possible to communicate with app containers on different ports or over protocols other than HTTP?

Give credit to Beyhan and Max for the initial work on this RFC
@rkoster
Contributor Author

rkoster commented Feb 19, 2026

> This idea is really interesting, but will it be possible to communicate with app containers on different ports or over protocols other than HTTP?

This RFC currently focuses on HTTP traffic via GoRouter, but non-HTTP protocol support is an interesting future direction.

Current constraints:

  • GoRouter uses Go's httputil.ReverseProxy which handles HTTP semantics (headers, paths, etc.)
  • Caller identity is forwarded via the XFCC HTTP header, which doesn't exist for raw TCP
  • GoRouter does not currently support HTTP CONNECT method for tunneling

What would be needed for non-HTTP support:

  1. HTTP CONNECT tunneling in GoRouter: GoRouter would need to detect CONNECT requests, validate mTLS + allowed_sources, then hijack the connection and relay raw TCP bytes to the backend. The pattern exists (similar to WebSocket upgrades), but would require new implementation.
  2. Identity forwarding challenge: Inside a TCP tunnel there's no XFCC header. Options include:
    • PROXY protocol v2 (GoRouter sends client cert info as TLV before the TCP stream)
    • Backend also requires mTLS and validates the client cert directly
    • Application-layer identity (less secure)
  3. Envoy egress proxy: The Phase 2 egress proxy (Envoy) already supports HTTP CONNECT tunneling, so apps could potentially use CONNECT backend.apps.mtls.internal:5432 to tunnel arbitrary protocols. However, GoRouter still needs to support CONNECT for this to work end-to-end.

For now, this is out of scope to keep the RFC focused and achievable, but feel free to create a follow-up RFC for non-HTTP use cases.

@beyhan beyhan moved this from Inbox to In Progress in CF Community Feb 24, 2026
- `any: true` is mutually exclusive with `apps`, `spaces`, and `orgs`
- If `any` is not set, at least one of `apps`, `spaces`, or `orgs` must be specified (default-deny)

This builds on the route options framework from [RFC-0027: Generic Per-Route Features](rfc-0027-generic-per-route-features.md). Phase 1b depends on RFC-0027 being implemented first.
Member

We need to be careful with what we add to the per-route features. Each and every option we set in there will be transmitted via the NATS bus every 20s (unless adjusted by the operator) from each app instance to each gorouter instance. Even a slight increase in message size can have quite noticeable effects on the overall bandwidth consumption.

One of the ideas we had is to allow only very simple rules: allow traffic from instances of the same app, from all apps within a space, or from all apps within an org. This could be a simple enum option such as allowed_scope: app / space / org (the name is just for illustration purposes), which would keep the size within a predictable bound.

If we go for a more flexible option, we must introduce strict limits on the number of bytes each user is allowed to add to each route.

Contributor Author

Thanks for raising this concern; NATS bandwidth is an important consideration.

However, I'd argue this is a generic problem that should be addressed in RFC-0027 (Generic Per-Route Features), not by limiting the functionality of individual features built on top of it. RFC-0027 already acknowledges this in lines 26-27:

"Other components MAY limit the size of the map or size of keys / values for technical reasons."

The proposed allowed_scope: app | space | org enum would fundamentally change the security model:

  • It only supports "relative" policies (same app/space/org as the target app)
  • It doesn't support cross-boundary access (app in org A calling app in org B)
  • It doesn't support specific allowlisting ("only apps X, Y, Z can call me")

These cross-boundary and specific-app use cases are core to what makes this feature valuable. For example, a shared service in one org needs to accept calls from apps in multiple other orgs, which the enum approach can't express.

Regarding limits: I agree that some limit on route options size makes sense, but it should be a single global limit on the total route options size, configured by the operator. This keeps the concern in RFC-0027 where it belongs, rather than requiring each individual route option to implement its own size restrictions.

Operators already tune bandwidth-related settings like route_emitting_interval based on their deployment characteristics. A configurable max size for route options would follow the same pattern - operators can balance flexibility vs bandwidth based on their specific needs.

Proposed path forward:

  1. Add a note to RFC-0027 specifying an operator-configurable global size limit for route options
  2. Keep the flexible allowed_sources design in this RFC

@maxmoehl shall I take a stab at creating a PR for the global route options size limit, or do you have fundamental concerns with this approach?

Member

Sounds good to me! The adjustment to RFC-0027 should place the check in CC so that users get immediate feedback, rather than in some internal routing component that fails silently.

Contributor Author

@maxmoehl PR has been created: #1447

@cweibel

cweibel commented Mar 3, 2026

First, I really like the idea behind this RFC.

I have a unique constraint: I need fine-grained access control at the org level over whether app-to-app mTLS communication is allowed. For instance, at the platform layer I need to enforce that app-to-app mTLS between organizations is not allowed, but within a space it would be, meaning you would need to be a Space Developer in both spaces.

@rkoster
Contributor Author

rkoster commented Mar 5, 2026

Implementation Update

Draft PRs implementing Phase 1 (1a + 1b):

A note about the PRs: I have not yet reviewed them myself; I just wanted to get something functional.

Tested end-to-end on BOSH-Lite.


Finding: Route Options Format

RFC-0027 doesn't allow nested objects/arrays in route options. We adapted to a flat format:

// Instead of nested mtls_allowed_sources: {apps: [...]}
{"mtls_allowed_apps": "guid1,guid2", "mtls_allowed_spaces": "space-guid", "mtls_allow_any": true}

Should the RFC be updated to reflect this, or should RFC-0027 be extended?


Open Issue: Application Security Groups

Apps need to reach GoRouter on port 443, but default ASGs block internal IPs. This currently requires manually creating a security group with the router IPs.

Proposal: automatically manage the ASG via a BOSH link when the feature flag is enabled. This is not blocking (a manual workaround exists), but it would improve the operator experience.

@rkoster
Contributor Author

rkoster commented Mar 5, 2026

I also recorded a demo here: https://asciinema.org/a/zLXrO9ERP3lXqGuM, but this was before refactoring to flat options (it still uses the nested structure, which is why cf curl is used).

@rkoster rkoster force-pushed the rfc-app-to-app-mtls-routing branch from a28cef8 to a8b4db1 Compare March 6, 2026 07:42
@rkoster rkoster force-pushed the rfc-app-to-app-mtls-routing branch from 0283b49 to 50ee8a5 Compare March 6, 2026 07:57
rkoster added a commit that referenced this pull request Mar 6, 2026
Add a new Size Limits section specifying that Cloud Controller must
enforce a configurable maximum size (default: 1 KB) for route options
to prevent excessive NATS bandwidth consumption.

- Default limit: 1024 bytes
- Configurable via cc.max_route_options_size BOSH property
- CC returns HTTP 422 when limit is exceeded
- Documents relationship with route emit interval for tuning

This addresses feedback from the App-to-App mTLS RFC (PR #1438) where
concerns were raised about NATS bandwidth impact of per-route options.
@Gerg
Member

Gerg commented Mar 10, 2026

This option feels more awkward to me since:

  1. It's pretty divergent from CF's existing container networking infrastructure. It'd effectively be a parallel implementation of kind of the same thing without adhering to existing concepts like network policy.
  2. There is a less clear path for supporting non-HTTP protocols. Unless we're going to extend TCP router to implement the same behavior 😉 .

rkoster added 6 commits March 10, 2026 14:33
…lidation

- Use full paths: authorization.config.scope, authorization.config.orgs/spaces
- Clarify validation happens during BOSH deployment (not GoRouter startup)
- Sync with latest RFC changes (renamed to Domain-Scoped mTLS for GoRouter)
- Simplified layered authorization diagram (horizontal mermaid)
- Use RFC-0027 compliant flat route options: mtls_allowed_apps, mtls_allowed_spaces, mtls_allowed_orgs, mtls_allow_any
- Moved Phase 2 (Egress HTTP Proxy) after Phase 1b
- Removed redundant Security Model section
- Consolidated appendix with subheadings (C2C comparison, Configuration Examples, References)
- Merged repetitive scope examples into single YAML block
- Updated validation rules with full config paths (authorization.config.scope)
- Clarified validation happens during BOSH deployment
@rkoster rkoster changed the title RFC: App-to-App mTLS via GoRouter RFC: Domain-Scoped mTLS for GoRouter Mar 10, 2026
@rkoster
Contributor Author

rkoster commented Mar 10, 2026

RFC Update: Scope Refinement

This update refactors the RFC based on feedback and implementation learnings.

Renamed to "Domain-Scoped mTLS for GoRouter"

The RFC now positions this as a generic per-domain mTLS feature that enables three use cases:

  1. CF app-to-app routing (the original motivation)
  2. External client certificate validation (partner integrations, IoT)
  3. Cross-CF federation

This framing makes it clearer that Phase 1a alone is useful for external client cert validation, while Phase 1b adds CF-specific identity and authorization.

RFC-0027 Alignment

Updated route options to use the flat format required by RFC-0027 (which doesn't support nested objects):

# Before (nested - not supported by RFC-0027)
options:
  allowed_sources:
    apps: ["guid1", "guid2"]

# After (flat - RFC-0027 compliant)
options:
  mtls_allowed_apps: "guid1,guid2"
  mtls_allowed_spaces: "space-guid"
  mtls_allowed_orgs: "org-guid"
  mtls_allow_any: true

File Renamed

rfc-draft-app-to-app-mtls-routing.md → rfc-draft-domain-scoped-mtls-gorouter.md

@ameowlia
Member

ameowlia commented Mar 10, 2026

I like the idea of:

  1. leveraging gorouter
  2. making c2c mTLS easier for users
  3. having this alongside the current c2c setup
  4. enabling per-app mTLS even for external users.

Some areas that I have issues with...

  1. I think this is implicit in the RFC, but not explicit: all of this should be opt-in.
  2. Configuring who can access a domain at both the operator and the developer level seems confusing.
  3. Using guids for orgs and spaces at deploy time for configuration seems less than ideal and hard for humans to read.
  4. Same as (3), but for using guids for orgs/spaces/apps in app manifests.
  5. You have waded into a long standing debate about who should be able to create network policies. Currently (as you said) c2c requires a user to have network.write. What you didn't write is that it requires users to have network.write in BOTH spaces if the apps talking are in different spaces. This is a high bar. Many users we interviewed when designing network policies wanted this high bar. Some wanted a low bar, which is why we let users turn off c2c completely if they want and we allow for enable_space_developer_self_service: true. Users should not have the ability to create these policies by default.
  6. Network policies in app manifests are hard. Given that adding network policies can require special permissions in multiple spaces, what happens if the user pushing doesn't have permission to change policies? Does the app fail? There are lots of issues around this.
  7. How do you plan to update the DNS so that the special domains will route to gorouter?
  8. How do you plan to have envoy intercept requests? This would be powerful, but I am wary of this. We tried to do this many years ago and it interfered with too much other traffic and caused issues.
  9. I agree with Max that NATS messages might get very large. In my experience, some users make a TON of c2c policies to one main app. Even if we tell them to use org/space level, I know this will happen. This change should require scale tests and/or limits.

@rkoster
Contributor Author

rkoster commented Mar 11, 2026

@ameowlia Thanks for the feedback!

  1. Opt-in: Yes, this is entirely opt-in. The feature requires operators to explicitly configure mtls_domains (likely via an ops-file in cf-deployment). Without this configuration, route options like mtls_allowed_apps only produce warnings in application logs; they have no effect.

  2. & 3. GUIDs vs names in BOSH config: The authorization.config.orgs/spaces settings use GUIDs because this mirrors how C2C network policies work: the CLI does the translation, but the underlying data is GUID-based. For the cross-CF federation use case, we can't do dynamic lookups, since GUIDs from remote installations have no local meaning. The operator-level config is also rarely changed after initial setup.

  4. GUIDs in app manifests: Agreed, this is a UX limitation. The CLI may provide translation (similar to cf add-network-policy), but the manifest would need GUIDs. This is consistent with how network policies work today.

  5. Permissions model: This is where operator control via authorization.config becomes important. Operators can lock down access with:

    • scope: space (apps can only call apps in the same space)
    • scope: org (apps can only call apps in the same org)
    • spaces: [...] / orgs: [...] (an explicit allowlist of spaces/orgs)

    Developers can only restrict further within these boundaries. If an operator wants the same high bar as C2C, they can set scope: space.

    Limitation: CC RBAC is not available in GoRouter, and we don't track who set the policy. The permission check is on the target route (the user needs route-update permission), but there's no permission check on the client app side. This differs from C2C, which requires network.write in both spaces. Operators who want stricter control can use scope: space or scope: org at the domain level, which enforces the boundary regardless of what developers configure.

  6. Manifest push without permissions: The route options are applied via the routes API. If the domain doesn't have authorization.mode: cf_identity, the options are stored but not enforced, and warnings appear in router logs. The push doesn't fail.

  7. DNS: The RFC assumes BOSH DNS with wildcard alias support (e.g., *.apps.mtls.internal → router instances). Apps also need an ASG rule to reach GoRouter on port 443. Future improvement: DNS-based ASG automation.

  8. Envoy interception (Phase 2): This is opt-in for developers: they explicitly set HTTP_PROXY=http://127.0.0.1:8888. No traffic is intercepted unless the app opts in.

  9. NATS message size: Good point about scale. This is being addressed in RFC-0027: Add route options size limit #1447, which proposes moving route metadata to a separate endpoint.

Addresses feedback that GoRouter lacks target app's org/space info in
route registration messages, making scope: org/space unimplementable.

Changes:
- Remove scope: any/org/space option from authorization.config
- Simplify to only orgs: or spaces: (mutually exclusive) for caller
  restrictions
- Default: if config omitted, any authenticated caller passes domain
  level (route-level mtls_allowed_* still applies)
- Add 'Org-scoped internal domains' example to appendix showing how
  operators can achieve org isolation via domain naming patterns
@rkoster
Contributor Author

rkoster commented Mar 12, 2026

Update: Simplified domain-level authorization config

Thanks to @beyhan for raising this during review! After investigating the implementation details, we identified that the proposed scope: org and scope: space options were not implementable with current GoRouter infrastructure.

The issue: These scope options would require GoRouter to check if the caller is in the same org/space as the target app. However, the RegistryMessage struct that GoRouter receives via NATS router.register only includes the target app's GUID (App field)—not its org or space GUID:

type RegistryMessage struct {
    App                     string            `json:"app"`  // Only app GUID, no org/space
    Host                    string            `json:"host"`
    Port                    uint16            `json:"port"`
    Tags                    map[string]string `json:"tags"`
    Uris                    []route.Uri       `json:"uris"`
    // ... other fields, but no org_guid or space_guid
}

What GoRouter knows:

| Information | Available? | Source |
|---|---|---|
| Caller's app/space/org GUID | ✅ Yes | XFCC header (from certificate OU) |
| Target's app GUID | ✅ Yes | Route registration message |
| Target's space/org GUID | ❌ No | Not in registration message |

Implementing scope: org/space would require extending the route registration protocol across multiple components (Diego route-emitter, Cloud Controller, GoRouter)—a much larger scope than this RFC.

The solution: We removed scope: org/space and kept only the explicit orgs: and spaces: lists in domain config. These work because they restrict callers (whose identity we have from XFCC), not targets.
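The resulting layered check can be summarized in a few lines. This minimal Go sketch models only what is described in this update. At the domain level, an optional orgs or spaces allowlist (mutually exclusive) restricts callers, and omitting it lets any authenticated caller pass. At the route level, mtls_allow_any or the mtls_allowed_* lists apply with default-deny. All type and function names are hypothetical:

```go
package main

import "fmt"

// callerIdentity is the caller's identity as taken from the client certificate.
type callerIdentity struct{ AppGUID, SpaceGUID, OrgGUID string }

// domainAuthz models operator-level authorization.config for one mTLS domain.
// Orgs and Spaces are mutually exclusive; both empty means any caller passes.
type domainAuthz struct {
	Orgs, Spaces map[string]bool
}

// routeAuthz models the developer-level mtls_* route options.
type routeAuthz struct {
	AllowAny           bool
	Apps, Spaces, Orgs map[string]bool
}

func (d domainAuthz) allows(c callerIdentity) bool {
	switch {
	case len(d.Orgs) > 0:
		return d.Orgs[c.OrgGUID]
	case len(d.Spaces) > 0:
		return d.Spaces[c.SpaceGUID]
	default:
		return true // config omitted: any authenticated caller passes this layer
	}
}

func (r routeAuthz) allows(c callerIdentity) bool {
	if r.AllowAny {
		return true
	}
	// Default-deny: nil maps return false, so an empty policy rejects everyone.
	return r.Apps[c.AppGUID] || r.Spaces[c.SpaceGUID] || r.Orgs[c.OrgGUID]
}

// authorize requires both layers to pass, so developers can only narrow
// what the operator already permits.
func authorize(c callerIdentity, d domainAuthz, r routeAuthz) bool {
	return d.allows(c) && r.allows(c)
}

func main() {
	caller := callerIdentity{AppGUID: "app-a", SpaceGUID: "space-1", OrgGUID: "org-1"}
	domain := domainAuthz{Orgs: map[string]bool{"org-1": true}}
	route := routeAuthz{Apps: map[string]bool{"app-a": true}}
	fmt.Println(authorize(caller, domain, route))        // true: both layers pass
	fmt.Println(authorize(caller, domain, routeAuthz{})) // false: route-level default-deny
}
```

Every check in this sketch depends only on the caller's GUIDs, which is why it remains implementable without target org/space data in the registration message.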

For operators who want org-level isolation, we added an appendix example showing how to achieve this using per-org mTLS domains (e.g., *.apps.mtls.org1.internal), following the same pattern as cross-CF federation.


Labels

rfc CFF community RFC toc

Projects

Status: In Progress


8 participants