Base64-encode binary state token in Arrow IPC metadata for UTF-8 safety
The STATE_KEY metadata value contained raw binary data (an HMAC-signed token
wrapping serialized Arrow IPC and a SHA-256 digest), which violated the Arrow
IPC requirement that metadata values be valid UTF-8. This broke Arrow
consumers in other languages that decode metadata values as UTF-8 strings.
- Rename STATE_KEY from vgi_rpc.stream_state to vgi_rpc.stream_state#b64
to signal that the value is base64-encoded binary data
- Base64-encode in _pack_state_token, base64-decode in _unpack_state_token
- Add tests for UTF-8 validity and pack/unpack roundtrip
- Update wire protocol docs, README, and docstrings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
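The commit describes the fix only in terms of `_pack_state_token` and `_unpack_state_token`. The following is a minimal stdlib-only sketch of the same idea, not the project's implementation. The names `pack_state_token`/`unpack_state_token` and the token layout (state bytes followed by a 32-byte HMAC-SHA256 tag) are hypothetical. It shows signing, base64-encoding into a UTF-8-safe metadata value, and the reverse path with tag verification.

```python
from __future__ import annotations

import base64
import hashlib
import hmac

# Hypothetical token layout (assumption, not necessarily vgi_rpc's):
# raw state bytes followed by a 32-byte HMAC-SHA256 tag.
_TAG_LEN = hashlib.sha256().digest_size  # 32


def pack_state_token(state: bytes, key: bytes) -> str:
    """Sign opaque state bytes and return an ASCII (hence UTF-8-safe) token."""
    tag = hmac.new(key, state, hashlib.sha256).digest()
    # state + tag is arbitrary binary; base64 maps it onto the ASCII range,
    # so the result is always a legal Arrow metadata value.
    return base64.b64encode(state + tag).decode("ascii")


def unpack_state_token(token: str | bytes, key: bytes) -> bytes:
    """Base64-decode a token, verify its HMAC tag, and return the state bytes."""
    # validate=True rejects non-alphabet characters (binascii.Error is a ValueError).
    raw = base64.b64decode(token, validate=True)
    if len(raw) < _TAG_LEN:
        raise ValueError("state token too short")
    state, tag = raw[:-_TAG_LEN], raw[-_TAG_LEN:]
    expected = hmac.new(key, state, hashlib.sha256).digest()
    # compare_digest is designed to resist timing attacks on the tag check.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state token signature mismatch")
    return state
```

With this sketch, `pack_state_token(bytes(range(256)), b"secret")` returns a string that `.encode("utf-8")` accepts, `unpack_state_token` recovers the original 256 bytes, and a token checked with the wrong key raises `ValueError`.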
CLAUDE.md: 1 addition, 1 deletion
@@ -59,7 +59,7 @@ Then you can commit.
 
 - **`logging_utils.py`** — `VgiJsonFormatter`, a `logging.Formatter` subclass that serializes log records as single-line JSON. Not auto-imported; must be imported explicitly from `vgi_rpc.logging_utils`.
 
-- **`metadata.py`** — Shared helpers for `pa.KeyValueMetadata`. Centralises well-known metadata key constants (`vgi_rpc.method`, `vgi_rpc.stream_state`, `vgi_rpc.log_level`, `vgi_rpc.log_message`, `vgi_rpc.log_extra`, `vgi_rpc.server_id`, `vgi_rpc.request_version`, `vgi_rpc.location`, `vgi_rpc.shm_offset`, etc.) and provides encoding, merging, and key-stripping utilities used by `rpc/`, `http/`, `log.py`, `external.py`, `shm.py`, and `introspect.py`.
+- **`metadata.py`** — Shared helpers for `pa.KeyValueMetadata`. Centralises well-known metadata key constants (`vgi_rpc.method`, `vgi_rpc.stream_state#b64`, `vgi_rpc.log_level`, `vgi_rpc.log_message`, `vgi_rpc.log_extra`, `vgi_rpc.server_id`, `vgi_rpc.request_version`, `vgi_rpc.location`, `vgi_rpc.shm_offset`, etc.) and provides encoding, merging, and key-stripping utilities used by `rpc/`, `http/`, `log.py`, `external.py`, `shm.py`, and `introspect.py`.
 
 - **`introspect.py`** — Introspection support. Provides the built-in `__describe__` RPC method, `MethodDescription`, `ServiceDescription`, `build_describe_batch`, `parse_describe_batch`, and `introspect()`. Enabled on `RpcServer` via `enable_describe=True`.
README.md: 2 additions, 2 deletions
@@ -1420,7 +1420,7 @@ All framework metadata keys live in the `vgi_rpc.` namespace:
 |---|---|---|
 | `vgi_rpc.method` | batch metadata | Target RPC method name |
 | `vgi_rpc.request_version` | batch metadata | Wire protocol version (`"1"`) |
-| `vgi_rpc.stream_state` | batch metadata | Serialized stream state (HTTP transport) |
+| `vgi_rpc.stream_state#b64` | batch metadata | Base64-encoded serialized stream state (HTTP transport). The `#b64` suffix indicates the value is base64-encoded binary data. |
 | `vgi_rpc.log_message` | batch metadata | Log message text |
 | `vgi_rpc.log_extra` | batch metadata | JSON-encoded extra fields |
@@ -1480,7 +1480,7 @@ All endpoints use `Content-Type: application/vnd.apache.arrow.stream`.
 | `{prefix}/{method}/init` | POST | Stream initialization (producer and exchange) |
 | `{prefix}/{method}/exchange` | POST | Stream continuation (producer and exchange) |
 
-Over HTTP, streaming is **stateless**: each exchange carries serialized `StreamState` in a signed token in the `vgi_rpc.stream_state` batch metadata key. Producer stream init returns data batches directly; exchange stream init returns a state token.
+Over HTTP, streaming is **stateless**: each exchange carries serialized `StreamState` in a signed token in the `vgi_rpc.stream_state#b64` batch metadata key. The token is base64-encoded to ensure the metadata value is valid UTF-8. Producer stream init returns data batches directly; exchange stream init returns a state token.
 
 For streams with headers, the `/init` response body contains the header IPC stream prepended to the main output IPC stream. The `/exchange` endpoint never re-sends the header — it is only included in the initial response.
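Putting the encoding in the key name means a reader can decide how to interpret a value without consulting a separate schema. The following is a hedged sketch of how a consumer might apply that. The helper `decode_metadata` and the general rule "keys ending in `#b64` carry base64, everything else is plain UTF-8 text" are assumptions extrapolated from this one key, not a documented vgi_rpc API. The input is a mapping of raw key/value bytes as read from Arrow metadata.

```python
from __future__ import annotations

import base64

# Assumed convention, generalised from vgi_rpc.stream_state#b64: a key ending
# in "#b64" carries base64 text wrapping binary data; other values are UTF-8 text.
B64_SUFFIX = b"#b64"


def decode_metadata(raw: dict[bytes, bytes]) -> dict[str, str | bytes]:
    """Decode raw key/value metadata bytes into Python values.

    Values under '#b64' keys are base64-decoded back to their original
    bytes; all other values are decoded as UTF-8 strings.
    """
    decoded: dict[str, str | bytes] = {}
    for key, value in raw.items():
        name = key.decode("utf-8")
        if key.endswith(B64_SUFFIX):
            decoded[name] = base64.b64decode(value, validate=True)
        else:
            decoded[name] = value.decode("utf-8")
    return decoded
```

For example, `decode_metadata({b"vgi_rpc.method": b"echo", b"vgi_rpc.stream_state#b64": b"/wCA"})` yields `"echo"` for the method key and the bytes `b"\xff\x00\x80"` for the state key. Because every stored value stays ASCII, this works even for consumers that reject non-UTF-8 metadata.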
docs/WIRE_PROTOCOL.md: 9 additions, 8 deletions
@@ -83,7 +83,7 @@ where they appear, and their semantics:
 
 | Key (bytes) | Value | Description |
 |-------------|-------|-------------|
-| `vgi_rpc.stream_state` | Opaque binary (signed token) | Serialized stream state for stateless HTTP exchanges. |
+| `vgi_rpc.stream_state#b64` | Base64-encoded binary (signed token) | Serialized stream state for stateless HTTP exchanges. The `#b64` suffix signals that the value is base64-encoded binary data. |
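The change in the value column (opaque binary to base64-encoded binary) comes down to UTF-8 validity. The following is a small illustration of the property the commit's UTF-8 tests presumably check. The helper `is_valid_utf8` and the sample token bytes are hypothetical. It shows that raw binary can be rejected while its base64 form never is.

```python
import base64


def is_valid_utf8(value: bytes) -> bool:
    """Return True if value decodes as UTF-8, as Arrow metadata values must."""
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0x80 is a bare UTF-8 continuation byte and 0xff never occurs in UTF-8,
# so a binary token containing either is an invalid metadata value.
raw_token = b"\x80\xff\x00signed-state"
assert not is_valid_utf8(raw_token)

# base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=' (all ASCII),
# so the encoded form is valid UTF-8 whatever the input bytes are.
encoded = base64.b64encode(raw_token)
assert is_valid_utf8(encoded)
assert base64.b64decode(encoded) == raw_token
```

The cost of this guarantee is size: base64 expands the token by roughly a third.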