Status: Draft / Active
Target: SLSA Level 4 Compliance / WASM Compilation
Applies to: Crossbar.io, Autobahn, txaio
This document defines the architectural constraints required to transition our Python codebase from a dynamic, reference-counted runtime model to a Statically Typed, Region-Based Memory model.
While the code remains valid Python 3.12+ runnable on CPython, it strictly adheres to a subset of the language that allows for deterministic compilation to WebAssembly (WASM) without a Garbage Collector (GC). This enables deployment in high-assurance environments (Defense, Aerospace, MCUs) and specifically targets NXP MCUs (via WAMR) and Cloudflare Workers (via V8).
To achieve SLSA Level 4 build integrity and execution safety, we treat Python as a Source Language rather than a Runtime.
We enforce two major pillars:
- Strict Static Typing: Every variable must have a known, computable memory layout at compile time.
- Region-Based Lifetimes: Memory is managed via hierarchical regions (Arenas). Objects are allocated in a specific scope and deallocated en masse when that scope ends.
In a Region-Based system, "Leaks" and "Use-After-Free" errors are prevented by strictly enforcing a hierarchical ownership model. An object belonging to a shorter-lived region must never leak into a longer-lived region.
The canonical hierarchy for our stack is defined as follows (Level 0 lives longest):
| Level | Scope / Region | Examples | Lifetime Description |
|---|---|---|---|
| 0 | Worker / Process | Router, Node, TransportFactory | Allocated at startup. Persists until process exit. |
| 1 | Transport | TcpTransport, TlsTransport | Exists for the duration of a TCP connection. |
| 2 | Session | WampSession, ApplicationSession | Exists while the WAMP session is active (over a transport). |
| 3 | Operation | Call, Invocation, Subscription | Exists for the duration of a specific RPC or PubSub flow. |
| 4 | Message | CallMessage, EventMessage | Ephemeral. Exists only while processing a single frame. |
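A Reference Violation occurs if an object at Level $N$ holds a strong reference to an object at Level $M$ where $M > N$, i.e., a longer-lived object points into a shorter-lived region. The only exception is an explicit ownership relation (Parent -> Child), as listed below.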
- Allowed: A `Session` (L2) holding a reference to its `Transport` (L1). (Child -> Parent)
- Allowed: A `Session` (L2) holding a reference to a `Call` (L3). (Parent -> Child / Ownership)
- FORBIDDEN: A `Session` (L2) storing a reference to a `Message` (L4).
  - Why: The `Message` region is cleared after the handle loop. If the `Session` retains a pointer, it points to freed memory (Dangling Pointer).
  - Fix: We must Copy/Clone the data from the Message into the Session's region.
We move from "Type Hinting" (for IDEs) to "Type Definition" (for Compilation).
- No `Any`: The type `Any` is banned in Core logic. It represents a memory layout that cannot be determined at compile time.
- No `Unknown`: If the type checker cannot infer a local variable's type, it must be explicitly annotated.
- Modern Syntax: Use `list[str]`, not `List[str]`. Use `x | y`, not `Union[x, y]`.
We use Pyright as the authoritative compiler gate (run directly; Astral's `ty` can be run alongside it as a second checker).
pyproject.toml Compliance Profile:

```toml
[tool.pyright]
typeCheckingMode = "strict"
reportUnknownVariableType = true  # Critical for WASM compilation
reportUnknownMemberType = true    # Critical for vtables/structs
reportUnknownArgumentType = true
```

For existing modules being ported to the "WASM Core":
- Add `# pyright: strict` to the top of the file.
- Run `pyright`.
- Resolve all `Unknown` errors by adding explicit hints.
Since Python lacks a native Borrow Checker, we enforce lifetime hierarchy via two methods: Static Phantom Types (Compile Time) and Runtime Poisoning (Test Time).
We use Python Generics to tag objects with their Region. This allows Pyright to catch lifetime violations.
```python
from typing import Generic, TypeVar

# Phantom region tags (never instantiated; they exist only for the type checker)
class R_Worker: ...
class R_Session: ...
class R_Message: ...

# Generic Region Variable
R = TypeVar("R")

class Message(Generic[R]):
    payload: bytes

class Session(Generic[R]):
    last_error: str
    # Session-scoped slot: only a Message[R_Session] may be stored here
    cache: Message[R_Session] | None = None

    # BAD: Attempting to store a Message-scoped object in a Session
    # def store_bad(self, msg: Message[R_Message]) -> None:
    #     self.cache = msg  # Pyright error: Message[R_Message] is not assignable to Message[R_Session]

    # GOOD: Explicit Copy/Clone
    def store_good(self, msg: Message[R_Message]) -> None:
        self.last_error = msg.payload.decode()  # Copied primitive
```

For the legacy codebase, we apply a runtime mixin during the test suite execution to detect hierarchy violations.
The Logic:
Every core class defines `_region_level: int`. If an attribute assignment is detected where `self._region_level < value._region_level`, the runtime raises a `MemoryError`.
```python
# autobahn/util/region.py
class RegionTracked:
    _region_level: int = 0

    def __setattr__(self, key: str, value: object) -> None:
        # Optimization: only check region-tracked objects; primitives (int/str)
        # are skipped because they are usually copy-by-value in WASM.
        value_level = getattr(value, "_region_level", None)
        if isinstance(value_level, int) and self._region_level < value_level:
            raise MemoryError(
                f"LIFETIME VIOLATION: {type(self).__name__}(L{self._region_level}) "
                f"cannot hold reference to shorter-lived {type(value).__name__}(L{value_level}). "
                f"Attribute: '{key}'. Explicit copy required."
            )
        super().__setattr__(key, value)
```

Implementation Strategy:
- Inherit `RegionTracked` in `Session`, `Transport`, `Message`.
- Set `_region_level` in `__init__`, before any other attribute assignment (until it is set, the class default of 0 applies).
- Run the full test suite.
- Any `MemoryError` raised by the mixin represents a future memory-safety bug (a dangling pointer) in WASM.
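To make the strategy concrete, the sketch below uses a condensed copy of the mixin's check with hypothetical `Transport`, `Message` and `Session` classes. Their levels follow the hierarchy table. Each class sets `_region_level` first in `__init__`, so later assignments are compared against the correct level.

```python
# Illustrative sketch; the classes below are hypothetical stand-ins.
class RegionTracked:
    _region_level: int = 0

    def __setattr__(self, key: str, value: object) -> None:
        value_level = getattr(value, "_region_level", None)
        if isinstance(value_level, int) and self._region_level < value_level:
            raise MemoryError(
                f"LIFETIME VIOLATION: {type(self).__name__}(L{self._region_level}) "
                f"cannot hold {type(value).__name__}(L{value_level}) in '{key}'"
            )
        super().__setattr__(key, value)


class Transport(RegionTracked):
    def __init__(self) -> None:
        self._region_level = 1  # Level 1: lives for the TCP connection


class Message(RegionTracked):
    def __init__(self, payload: bytes) -> None:
        self._region_level = 4  # Level 4: lives for one frame
        self.payload = payload


class Session(RegionTracked):
    def __init__(self, transport: Transport) -> None:
        self._region_level = 2  # set the level before any other attribute
        self.transport = transport  # Child -> Parent (L2 -> L1): allowed
        self.last_payload: bytes = b""

    def on_message(self, msg: Message) -> None:
        self.last_payload = bytes(msg.payload)  # copied data: allowed
        self.last_msg = msg  # L2 holding L4: raises MemoryError
```

Under CPython without the mixin, `on_message` would silently keep the message alive. With the mixin, the test suite fails at the offending assignment, and the attribute is never set.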
- CI: Update `ruff` config to enforce `ANN` (annotations) and `UP` (modern syntax).
- CI: Add `pyright` job in non-strict mode globally.
- Code: Identify the "Core WAMP" modules targetable for WASM.
- Code: Add `# pyright: strict` to core modules one by one.
- Refactor: Eliminate all `Any` and `Unknown` in core modules.
- Refactor: Replace dynamic `__getattr__` or `kwargs` with explicit Dataclasses/Structs.
- Audit: Apply `RegionTracked` mixin to key classes.
- Test: Run suite and fix "Leak" bugs (where long-lived objects hold message references).
- Design: Introduce `clone(scope=...)` methods for objects that need to move between regions.
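The `clone(scope=...)` item is still a design task. One possible shape is sketched below. It assumes that a message exposes its payload as a `memoryview` into the transport's frame buffer; `EventMessage` and its `region_level` field are hypothetical. `clone` copies the viewed bytes into an independently owned object tagged with the target region level.

```python
# Illustrative sketch; EventMessage and region_level are hypothetical.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EventMessage:
    topic: str
    payload: memoryview  # view into the frame buffer (Level 4 memory)
    region_level: int = 4

    def clone(self, scope: int) -> EventMessage:
        # bytes(view) copies the viewed bytes into a new object, so the clone
        # no longer depends on the frame buffer's lifetime.
        owned = memoryview(bytes(self.payload))
        return EventMessage(topic=self.topic, payload=owned, region_level=scope)
```

If the frame buffer is reused for the next message, the original view sees the new bytes, while the clone kept for the Session (Level 2) still holds the old data.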
Q: Do I have to write this for all code? A: No. Only code intended to run in the Secure Enclave (WASM) needs strictly typed regions. Test helpers, scripts, and legacy adapters can remain standard Python.
Q: Why not just use Rust? A: We are preserving the semantic logic of Crossbar.io/Autobahn accumulated over 10 years. We are compiling the logic, not the interpreter.
Q: What happens if I violate the region rule? A: In CPython: Nothing (it works). In WASM: The allocator panics or the device crashes. We treat these as Critical Security Bugs.
We need to address the critical complexity that separates a simple "Function Call Stack" from an "Async Protocol Machine."
A pure LIFO (Stack) allocator works for nested functions A() -> B() -> C().
It fails for Async Concurrency, where Session starts RPC_A, then RPC_B, and RPC_A might finish after RPC_B, or vice versa. One cannot "pop" the stack for A without destroying B.
To map this to WASM/WAMR while keeping the "Region" safety, we must move from a Single Stack Allocator to a Forest of Arenas (Pool Allocator).
The WAMP hierarchy is strictly logical (Worker -> Session -> Operation), but the temporal execution is concurrent.
- A `Session` is a 1:N container. It owns multiple active `Operations` (RPC calls, subscriptions) simultaneously.
- These Operations complete in non-deterministic order.
- Therefore, a single "Bump Pointer Stack" is insufficient for the Session Layer, as we cannot free Operation A's memory while Operation B is still active on top of it.
To solve this in WASM without a GC, the Runtime Shim implements a Hybrid Allocator consisting of two strategies:
Strategy 1: Slab Pool. Used for Level 0 (Worker), Level 1 (Transport), and Level 2 (Session).
- Mechanism: The runtime pre-allocates fixed-size "slots" (Slabs) for these structures.
- Behavior: When a new TCP connection comes in, we grab a free `SessionSlab` from the pool. When it disconnects, we mark it free.
- Fragmentation: No external fragmentation, because all slabs are uniform.
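A slab pool can be modeled in plain Python as a fixed set of equally sized buffers plus a free list. This is an illustrative model only. The class name and the `acquire`/`release` methods are assumptions, not the runtime shim's API.

```python
# Illustrative sketch; SlabPool and its methods are hypothetical.
class SlabPool:
    """Pre-allocated, equally sized slots handed out from a free list."""

    def __init__(self, slab_size: int, count: int) -> None:
        self.slab_size = slab_size
        # Every slab is allocated up front (e.g., at worker startup).
        self._slabs: list[bytearray] = [bytearray(slab_size) for _ in range(count)]
        self._free: list[int] = list(range(count))

    def acquire(self) -> int:
        """Take a free slab (e.g., for a new TCP connection) and return its index."""
        if not self._free:
            raise MemoryError("slab pool exhausted")
        return self._free.pop()

    def release(self, index: int) -> None:
        """Mark a slab free again (e.g., on disconnect)."""
        if index in self._free:
            raise ValueError(f"slab {index} released twice")
        self._free.append(index)

    def slab(self, index: int) -> bytearray:
        return self._slabs[index]

    @property
    def available(self) -> int:
        return len(self._free)
```

Because every slot has the same size, any freed slot can serve the next session.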
Strategy 2: Per-Operation Arenas. Used for Level 3 (Operations) and Level 4 (Messages).
- Mechanism: Every Async Operation (e.g., a pending `Call`) is assigned its own dedicated Linear Memory Arena (e.g., a 4KB or 16KB Page).
- Binding: The WAMP `RequestID` acts as the handle to this Arena: `Map<RequestID, ArenaPointer>`.
- Behavior:
  - Start: `Call(ID=100)` starts. Runtime grabs a clean Page. All local variables and outgoing messages for this call are allocated via bump-pointer inside this Page.
  - Suspend: The Page stays in memory while we await the network.
  - Resume: When `RESULT(ID=100)` arrives, the Router looks up `ID=100`, finds the Page, switches the "Active Allocator" to that Page, and deserializes the Result message into that Page.
  - End: When the logic completes, the entire Page is released back to the free list.
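The per-operation arenas and the `Map<RequestID, ArenaPointer>` binding can be modeled the same way. In this sketch, `Arena` is a bump allocator over a `bytearray` page, and `OperationArenas` keys pages by WAMP request ID and recycles them on completion. All names here are hypothetical; none come from an existing runtime.

```python
# Illustrative sketch; Arena and OperationArenas are hypothetical.
class Arena:
    """A single page with a bump pointer; individual frees are not supported."""

    def __init__(self, page_size: int) -> None:
        self.page = bytearray(page_size)
        self.offset = 0

    def alloc(self, data: bytes) -> memoryview:
        end = self.offset + len(data)
        if end > len(self.page):
            raise MemoryError("arena page exhausted")
        self.page[self.offset : end] = data  # copy into the arena (same size, no resize)
        view = memoryview(self.page)[self.offset : end]
        self.offset = end  # bump
        return view

    def reset(self) -> None:
        self.offset = 0  # the whole page is reclaimed at once


class OperationArenas:
    """Map<RequestID, Arena> with a free list of recycled pages."""

    def __init__(self, page_size: int = 4096) -> None:
        self.page_size = page_size
        self._active: dict[int, Arena] = {}
        self._free: list[Arena] = []

    def start(self, request_id: int) -> Arena:
        arena = self._free.pop() if self._free else Arena(self.page_size)
        self._active[request_id] = arena
        return arena

    def lookup(self, request_id: int) -> Arena:
        return self._active[request_id]  # e.g., when RESULT(ID) arrives

    def end(self, request_id: int) -> None:
        arena = self._active.pop(request_id)
        arena.reset()
        self._free.append(arena)
```

A `memoryview` obtained before `end()` still points into the recycled page. This is exactly the dangling-pointer hazard the region rules forbid.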
In this model, "Memory Management" is tied directly to "Async Context Switching."
The Flow:
- Network Ingress (The Router Loop):
  - The Transport reads raw bytes into a generic IO Buffer (Level 1 Region).
  - Parser decodes the WAMP Header to find `RequestID`.
- Context Lookup:
  - Case A (New Op): It's a `CALL`. Runtime allocates a New Arena.
  - Case B (Existing Op): It's a `RESULT` for `ID=100`. Runtime looks up `Arena(100)`.
- The "Region Switch":
  - The Runtime sets the global `CURRENT_ALLOCATOR` pointer to `Arena(100)`.
  - The Parser deserializes the rest of the message payload. The memory lands physically inside `Arena(100)`.
- Execution:
  - The Python logic `on_result(res)` runs. Any temporary variables it creates land in `Arena(100)`.
- Teardown:
  - The logic finishes. The Runtime calls `arena_free(Arena(100))`.
  - Safety: All temporary objects, the incoming message, and the operation state vanish instantly.
Tying regions to a domain object is what turns this generic memory model into a Domain-Driven Architecture.
In the Router (Crossbar.io), the Observation class is the natural "Lifecycle Owner" of the PubSub payload. It is the standard-bearer for the 1:N distribution.
By tying the memory region to the Observation, we solve the hardest problem in zero-copy networking: "When is it safe to free the payload?"
In the Router (Crossbar.io), a PUBLISH message triggers a 1:N fan-out to subscribers. We must avoid copying the payload (Arguments/KwArgs) once per subscriber.
We map the Crossbar.io Observation abstraction directly to a Reference-Counted Memory Arena.
- Ingress (The Publish):
  - When the Broker processes a `PUBLISH`, it allocates a new Observation Arena.
  - The payload (application data) is deserialized once directly into this Arena.
  - The `Observation` object (living in the Broker/Worker region) holds the pointer to this Arena.
- Dispatch (The Fan-Out):
  - The Broker iterates matching subscriptions.
  - For each match, it queues an `EVENT` message to the subscriber's Transport.
  - Crucial Optimization: The `EVENT` message struct does not contain the payload data. It contains a View (Pointer) into the Observation Arena.
  - Ref-Counting: The `Observation` increments a counter: `pending_deliveries += 1`.
- Egress (The Flush):
  - As each Transport writes the packet to the wire (or fills the Kernel TCP buffer), it triggers a callback.
  - The callback decrements the counter: `pending_deliveries -= 1`.
- Teardown (The Drop):
  - When `pending_deliveries == 0`, the Router knows that every subscriber has received the data (or it has been handed off to the OS kernel).
  - The Observation Arena is freed.
Note: This creates a "Zero-Copy" path from Ingress Socket -> Observation Arena -> Egress Socket.
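The ref-counting lifecycle above can be modeled in a few lines. The sketch assumes a hypothetical `ObservationArena` that owns one copy of the payload and hands out read-only `memoryview`s as the per-subscriber views. It frees itself when `pending_deliveries` drops to zero. It is not Crossbar.io's `Observation` API.

```python
# Illustrative sketch; ObservationArena is hypothetical, not Crossbar.io API.
from collections.abc import Callable


class ObservationArena:
    def __init__(self, payload: bytes, on_free: Callable[[], None]) -> None:
        self._buffer: bytearray | None = bytearray(payload)  # single copy at ingress
        self.pending_deliveries = 0
        self._on_free = on_free

    def view(self) -> memoryview:
        """Dispatch: hand out a zero-copy, read-only view and count the delivery."""
        if self._buffer is None:
            raise RuntimeError("observation arena already freed")
        self.pending_deliveries += 1
        return memoryview(self._buffer).toreadonly()

    def flushed(self, view: memoryview) -> None:
        """Egress callback: the transport has written (or handed off) the bytes."""
        view.release()
        self.pending_deliveries -= 1
        if self.pending_deliveries == 0:
            self._buffer = None  # teardown: drop the arena's memory
            self._on_free()

    @property
    def freed(self) -> bool:
        return self._buffer is None
```

An R2R link would simply be one more holder of a view, which keeps the arena alive until its own `flushed` call.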
Router-to-Router links introduce complexity (latency, buffering), but they map cleanly to the Observation Arena model.
- The R2R Link as a "Subscriber":
  - To the Broker, an Uplink/Downlink is just another subscriber with a Transport.
  - It holds a reference to the `Observation Arena` just like a local WebSocket client.
- The Difference:
  - R2R links might have significant buffering or "Store and Forward" behavior.
  - The `Observation Arena` persists as long as the R2R link is holding the reference.
- Safety:
  - Because the `Observation` is decoupled from the original Publisher's Session, the Publisher can disconnect immediately after sending. The `Observation` (and its memory) stays alive until the R2R link confirms transmission.
This mapping validates the existing Python implementation choices:
- `crossbar.router.observation.Observation`: This class effectively becomes the "Handle" for the Arena.
- `crossbar.router.broker.Broker`: The Broker manages the lifetime of the Handle.
When compiling the Typed Python to WASM:
- The `Observation` class logic remains Python (orchestrating the logic).
- The `self.payload` attribute inside `Observation` is transformed by the compiler into a WASM Pointer to the `Observation Arena`.
- Passing `self.payload` to `session.send()` passes that pointer, not a copy.
This indicates that the architecture can handle the transition to WASM without redesigning the protocol flow. An RPC call maps onto its per-operation arena as follows:
- `session.call(..., request_id=123)`: This Python line triggers the creation of `Arena(123)`.
- `await future`: This suspends the Python stack. `Arena(123)` sits dormant in WASM linear memory.
- `msg = Transport.read()`: This happens in a Transport/IO buffer.
- `future.resolve(msg)`: The runtime identifies that `msg` belongs to `Arena(123)`, performs a `memcpy` (or move) of the data into `Arena(123)`, and resumes the Python stack using that arena as the heap.
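These four steps can be walked through with `asyncio`. The sketch is hedged. `PendingCalls`, `call` and `resolve` are hypothetical names that loosely mirror the steps; the real `session.call()` does not take a `request_id` argument. A `bytearray` stands in for `Arena(123)`.

```python
# Illustrative sketch; PendingCalls is hypothetical and uses a bytearray as the arena.
import asyncio


class PendingCalls:
    def __init__(self) -> None:
        self.arenas: dict[int, bytearray] = {}
        self.futures: dict[int, asyncio.Future[memoryview]] = {}

    async def call(self, request_id: int) -> bytes:
        arena = bytearray()  # step 1: Arena(request_id) is created
        self.arenas[request_id] = arena
        fut: asyncio.Future[memoryview] = asyncio.get_running_loop().create_future()
        self.futures[request_id] = fut
        try:
            view = await fut  # step 2: suspended; the arena sits dormant
            return bytes(view).upper()  # on_result(): works on the arena-owned copy
        finally:
            del self.futures[request_id]
            del self.arenas[request_id]  # teardown: the arena is released

    def resolve(self, request_id: int, io_buffer: bytes) -> None:
        # step 3 happened in the transport: io_buffer is its read buffer
        arena = self.arenas[request_id]
        arena[:] = io_buffer  # step 4: copy the result into Arena(request_id)
        self.futures[request_id].set_result(memoryview(arena))
```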
Section 9 formalizes the "Defense in Depth" strategy regarding lifetime enforcement.
While Pyright's strict static analysis is the primary mechanism for developer guidance, it is insufficient on its own to guarantee the rigorous memory safety required for the WASM Secure Enclave. We employ a dual-strategy (Static + Runtime) to eliminate blind spots inherent in the Python language.
Static verification relies on complete coverage. If a single helper function in the call stack is not typed as Generic[R] (or is typed loosely), the region constraint is lost.
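- The Risk: Passing a `Message[R_Message]` into a loosely typed helper such as `utils.cache(item: Any)` allows the item to be stored in a global variable, bypassing the static checker.
- The Runtime Fix: The `RegionTracked` runtime mixin carries the region level (`_region_level: int`) on the object instance itself. Even if the type system "forgets" the region, the object remembers, so assigning it to an attribute of a longer-lived `RegionTracked` object still fails. (A plain global or container is not `RegionTracked` and is therefore not checked.)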
Python allows operations that are difficult to statically analyze for ownership transfer.
- Container Mutation: Appending a strictly-typed item to a loosely-typed list (e.g., `list[object]`) is often permitted by type checkers but violates region safety.
- Dynamic Attributes: Using `setattr(self, name, value)` bypasses property type checks.
- Type Ignores: Developers may use `# type: ignore` to bypass CI errors during crunch times.
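These cases differ in what Layer 2 can see. A quick experiment with a condensed copy of the `RegionTracked` check shows this; the `Holder` and `Scoped` classes are hypothetical. `setattr()` still goes through `__setattr__` and is caught at runtime, whereas appending to a plain `list` never reaches the hook.

```python
# Illustrative sketch; Holder and Scoped are hypothetical classes.
class RegionTracked:
    _region_level: int = 0

    def __setattr__(self, key: str, value: object) -> None:
        value_level = getattr(value, "_region_level", None)
        if isinstance(value_level, int) and self._region_level < value_level:
            raise MemoryError(f"LIFETIME VIOLATION on attribute '{key}'")
        super().__setattr__(key, value)


class Holder(RegionTracked):  # long-lived object, e.g. a Session (L2)
    def __init__(self) -> None:
        self._region_level = 2
        self.items: list[object] = []  # loosely typed container


class Scoped(RegionTracked):  # short-lived object, e.g. a Message (L4)
    _region_level = 4


def dynamic_attribute_caught() -> bool:
    holder = Holder()
    try:
        setattr(holder, "cached", Scoped())  # routed through Holder.__setattr__
    except MemoryError:
        return True
    return False


def container_append_caught() -> bool:
    holder = Holder()
    try:
        holder.items.append(Scoped())  # list.append never calls __setattr__
    except MemoryError:
        return True
    return False
```

Dynamic attributes are therefore covered by the runtime layer. Container mutation stays a gap for both layers unless the container itself is region-aware.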
To satisfy SLSA Level 4 and high-assurance audit requirements, we treat these two methods as complementary layers:
- Layer 1: Static Phantom Types (Guard Rails)
  - Role: Developer Guidance.
  - Effect: Prevents bugs during coding (IDE feedback).
  - Constraint: Can be bypassed by `Any` or `# type: ignore`.
- Layer 2: Runtime Poisoning (Land Mines)
  - Role: Architectural Enforcement.
  - Effect: Detects actual architectural violations during the test suite execution.
  - Constraint: Cannot catch code paths not covered by tests.
  - Mechanism: The `__setattr__` hook in `RegionTracked` compares `self._region_level` with `value._region_level` and raises a `MemoryError` immediately if a shorter-lived object is assigned to a longer-lived parent.
Verdict: We require both. Static types show that we intended to follow the rules; runtime checks show that the code paths exercised by the tests actually follow them.
A fundamental characteristic of WAMP is that args (positional) and kwargs (keyword) arguments are application-defined and arbitrary. While traditional Python types them as list[Any] | None or dict[str, Any] | None, our strict static typing requirement bans Any.
To resolve this conflict while enabling high-performance routing, we employ a Split-View Strategy: a recursive union type for Application Logic (Core 1 / SDK) and an opaque zero-copy handle for Router Logic (Core 0).
For application code (Autobahn SDK) where inspection of arguments is required, we define the closed set of all valid WAMP-serializable types. This provides static safety without resorting to Any.
```python
from typing import TypeAlias

# 1. Primitives (Fixed set of WAMP-compatible scalars)
WampScalar: TypeAlias = int | float | str | bool | bytes | None

# 2. Recursive Containers
WampList: TypeAlias = list["WampValue"]
WampDict: TypeAlias = dict[str, "WampValue"]

# 3. The Closed Union (Replaces 'Any')
WampValue: TypeAlias = WampScalar | WampList | WampDict
```

Compiler Implication: In the WASM build, `WampValue` lowers to a Tagged Union (Variant). Attempts to pass non-serializable objects (e.g., a `datetime` or `socket`) will trigger a static type error in Pyright, preventing runtime serialization failures.
The Crossbar.io Router (Core 0) performs routing based strictly on the URI (procedure or topic). It does not inspect application payloads.
To avoid the overhead of allocating and deserializing recursive structures that the Router will never read, the compiler treats args and kwargs differently in the Router Build Profile.
Router Message Definition:
```python
class Call(Message[R]):
    request_id: int
    procedure: str
    # In Router Profile, the compiler maps these to 'WampRawPayload'
    # instead of deserializing into 'list[WampValue]'
    args: WampRawPayload | None
    kwargs: WampRawPayload | None
```

Operational Flow:
- Ingress: The Transport reads the WAMP frame.
- Parsing: The parser decodes the header (Type, ID, URI).
- Bypass: When the parser encounters the `Arguments` or `ArgumentsKw` fields, it stops deserializing. It records the pointer and length of the raw serialized bytes (e.g., the MessagePack map/array slice) into a `WampRawPayload` handle.
- Forwarding: The Router passes this opaque handle to the outgoing Transport.
- Egress: The outgoing Transport writes the header and simply `memcpy`s the raw payload bytes. This requires the receiving peer to use the same serializer as the sender; otherwise the payload has to be transcoded.
Benefit:
- Zero Allocation: No complex recursive structs are allocated in the heap.
- Zero Garbage: No cleanup required for deep object trees.
- Security: Malformed payloads (e.g., deeply nested JSON bombs) are not parsed by the Router, protecting the Core 0 control plane from parsing vulnerabilities.
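The Bypass, Forwarding and Egress steps can be illustrated with `memoryview` slicing, which references a byte range without copying it. The sketch assumes the parser has already located the offset and length of the `Arguments`/`ArgumentsKw` bytes. `WampRawPayload` is modeled here as a small frozen dataclass. `egress` is a hypothetical helper that performs the single copy into the outgoing buffer.

```python
# Illustrative sketch; this WampRawPayload layout and egress() are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class WampRawPayload:
    """Opaque handle: a (buffer, offset, length) view of still-serialized bytes."""

    frame: memoryview
    offset: int
    length: int

    def view(self) -> memoryview:
        # Slicing a memoryview creates a new view over the same memory; no bytes move.
        return self.frame[self.offset : self.offset + self.length]


def egress(header: bytes, payload: WampRawPayload, out: bytearray) -> None:
    """Write the (re-encoded) header, then copy the raw payload bytes verbatim."""
    out += header
    out += payload.view()  # the one copy on the egress path
```

Because the router never decodes the sliced bytes, a malformed or deeply nested payload costs it nothing beyond that one copy.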
Recursive type aliases are fully supported in Python 3.11+ and PyPy 3.11+, but the syntax changes slightly depending on whether we are on 3.11 or the newer 3.12.
Below is how recent this support is and how to write recursive aliases for our specific target (3.11+).
While we could always "hack" recursive types using forward references (strings) in older Python, explicit type aliases (`TypeAlias`, PEP 613) arrived in Python 3.10. Python 3.12 then added the `type` statement (PEP 695), whose lazily evaluated aliases make self-reference first-class. Recursion itself is resolved by the type checker, and Pyright understands both forms.
In Python 3.11, we must use from typing import TypeAlias and we must use string quotes for the self-reference.
```python
from typing import TypeAlias

# 1. Primitives
WampScalar: TypeAlias = int | float | str | bool | bytes | None

# 2. Recursive definition
# NOTE: we MUST use quotes "WampValue" inside the definition
# because WampValue isn't fully defined yet when the interpreter reads this line.
WampValue: TypeAlias = WampScalar | list["WampValue"] | dict[str, "WampValue"]
```

- Supported by Pyright/ty: Yes, fully.
- Runtime: The interpreter sees a string `"WampValue"`; `ty`/Pyright resolves it statically.
Python 3.12 introduced the type keyword (PEP 695). This is the "neat" syntax we might have seen. It handles the forward reference automatically (no quotes needed).
```python
# Python 3.12+ only
type WampScalar = int | float | str | bool | bytes | None
type WampValue = WampScalar | list[WampValue] | dict[str, WampValue]
```

PyPy 3.10+ (and the upcoming 3.11 releases) supports the `TypeAlias` syntax perfectly.
Since type hints are primarily a static analysis feature (at runtime they are stored in `__annotations__` but never enforced), PyPy has no trouble with them. The runtime cost is limited to evaluating the alias once at import time; the recursion is resolved by `ty`/Pyright at check time, not by PyPy at runtime.
Since we are targeting 3.11+, we stick to the Standard 3.11 Syntax:
```python
from typing import TypeAlias

WampValue: TypeAlias = int | float | str | bool | bytes | None | list["WampValue"] | dict[str, "WampValue"]
```

This is:
- Compatible with CPython 3.11 and PyPy 3.11.
- Understood by `ty` / Pyright (Strict Mode).
- Compilable by our future WASM frontend (it sees the recursive graph).

In summary, under the Split-View Strategy:
- Client/SDK Code sees `list[WampValue]` (Type Safe, Recursive).
- Router Code sees `WampRawPayload` (Fast, Opaque).
- Type Checker enforces `WampValue` compliance, preventing `Any`.
The risk to end-users is extremely low, bordering on zero, provided we follow the standard "Pythonic" implementation pattern.
The risk assessment for our stakeholders/users breaks down as follows:
Static Typing. Risk Level: None / Positive.
- Runtime Impact: CPython evaluates and stores annotations but does not enforce them, so existing code behaves exactly as before. The only cost is a negligible one-time evaluation at import time.
- User Experience: This is a pure upgrade for users.
  - Users with modern IDEs (VS Code, PyCharm) will get working auto-completion and "red squigglies" if they pass the wrong arguments to `session.publish()`.
  - Users without type checkers will notice nothing.
Runtime Region Checks. Risk Level: Low (Manageable via Configuration).
The only real risk here is performance, not stability.
- The Concern: If we add a `__setattr__` hook to every `Message` class to check `_region_level`, we introduce Python function call overhead on every attribute assignment. In a high-throughput router (Crossbar.io), this adds up.
- The Solution: Make the runtime instrumentation conditional.
Implementation Strategy (The "Debug Mode" Pattern):
Do not bake the RegionTracked mixin logic into the production class permanently. Use a conditional inheritance or a runtime toggle.
```python
# autobahn/util/region.py
import os

# Checks are opt-in. `__debug__` is True in every normal (non -O) run, so it
# cannot distinguish production from CI; only the environment variable decides.
_CHECK_LIFETIMES = os.environ.get("AUTOBAHN_DEBUG_LIFETIMES", "") not in ("", "0")

if _CHECK_LIFETIMES:
    # Debug/CI/WASM-Prep Mode: Active Checks
    class RegionTracked:
        _region_level: int = 0

        def __setattr__(self, key: str, value: object) -> None:
            # ... perform the expensive check ...
            super().__setattr__(key, value)

else:
    # Default: No-op (Zero Overhead for Production Users)
    class RegionTracked:
        _region_level: int = 0
```

Result:
- Standard User (`pip install autobahn`): Gets the "fast" version. `RegionTracked` does nothing.
- CI / Test Suite: Runs with `AUTOBAHN_DEBUG_LIFETIMES=1`. Catches the violations.
- WASM Compiler: Sees the `_region_level` annotation and uses it to generate the memory management code.
API Compatibility. Risk Level: Low.
- We are not changing the public API methods (e.g., `publish`, `call`).
- We are only formalizing the internal contracts.
- Edge Case: If a user was doing something "illegal" before (like manually monkey-patching a `Message` object onto a `Session` object), their code might break under strict type checking or with lifetime checks enabled (`AUTOBAHN_DEBUG_LIFETIMES=1`). This is acceptable: that code relied on behavior that violates the region rules and would leak memory or fail in WASM anyway.
- For the "Standard" Python User: The libraries get better (typed) and stay fast (checks disabled by default).
- For the "Defense" Customer: They get a hardened artifact whose lifetime rules are statically checked and exercised by the test suite.
- For us (Maintainers): We get a single codebase that serves both masters.

It is a very safe migration path.