Commit 920fcd0

feat(gdpr): runtime field encryption for x-gdpr-sensitive — Java core (ADR-0030)
The SDK-enforcement half of GDPR governance (mirrors the Go reference). New com.babelqueue.gdpr: a caller-bound Cipher interface (encrypt/decrypt onto KMS/Vault) + a JDK-only AesGcmCipher reference (javax.crypto, AES-256-GCM), and Gdpr.protect/unprotect that encrypt/decrypt exactly the schema's x-gdpr-sensitive leaves in place (nested + array + root), byte-for-byte round-trip, wrong key -> DecryptException. SensitivePaths.of parses x-gdpr-sensitive (validation-neutral); a small public JsonValues bridges the package-private codec for canonical leaf round-trips. data stays pure JSON (ciphertext string) so the envelope is frozen (GR-1, schema_version 1, trace_id preserved); the Cipher seam keeps the core zero-dep (GR-7). Opt-in; schema validation on cleartext. v1.7.0.
1 parent 804be46 commit 920fcd0

14 files changed

Lines changed: 1129 additions & 1 deletion

CHANGELOG.md

Lines changed: 28 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -9,6 +9,34 @@ The envelope wire format is versioned separately by `meta.schema_version`
99

1010
## [Unreleased]
1111

12+
## [1.7.0] - 2026-06-21
13+
14+
### Added
15+
- **Runtime GDPR field encryption** (ADR-0030) in the new optional `com.babelqueue.gdpr`
16+
module — the **SDK-enforcement** half of the registry's `x-gdpr-sensitive` declaration.
17+
babelqueue-registry only *declares* and audits which `data` fields are personal data; this
18+
module *enforces* it on the wire: a producer encrypts each marked leaf before publish, a
19+
consumer decrypts it after decode. It is the Java mirror of the Go reference, so every SDK
20+
round-trips byte-for-byte. Standalone and **opt-in**.
21+
- `Cipher` is a **caller-provided** interface (`encrypt(byte[])` / `decrypt(String)`) — a
22+
seam onto KMS/Vault/HSM/tokenisation, so the core pulls **no** crypto dependency (GR-7).
23+
`AesGcmCipher` is a JDK-only reference (`javax.crypto`, AES-256-GCM, a fresh random 12-byte
24+
IV prepended, Base64); the caller owns the key. A wrong key or tampered ciphertext fails GCM
25+
authentication and throws rather than returning corrupt plaintext.
26+
- `Gdpr.protect(data, schema, cipher)` / `Gdpr.unprotect(...)` rewrite each `x-gdpr-sensitive`
27+
leaf **in place**: the value is canonically JSON-encoded then replaced by the ciphertext
28+
**string**, and `unprotect` decodes the decrypted bytes back — so the round-trip is
29+
**byte-for-byte** (a number restores to a number, an object to an object). An absent path is
30+
skipped; a non-string leaf in `unprotect` is left untouched (idempotent re-runs); a value the
31+
cipher cannot open throws `DecryptException`, so the message goes to retry / dead-letter.
32+
- `SensitivePaths.of(schema)` (+ the `SensitivePath` record) in `com.babelqueue.schema` walks a
33+
decoded JSON Schema for the `x-gdpr-sensitive` marks (boolean `true` or a non-empty string
34+
category), descending nested objects, array items (`field[]`) and the root. The keyword is
35+
**validation-neutral** — annotating a schema is never a breaking change.
36+
- The wire envelope stays **frozen**: only the *value* of a sensitive field changes (to a
37+
ciphertext string), so `data` is still pure JSON (GR-3), `meta.schema_version` stays `1` and
38+
`trace_id` is untouched (GR-4). This is additive and opt-in.
39+
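The schema walk described in the `SensitivePaths.of(schema)` entry above can be sketched roughly as follows. This is an illustrative sketch, not the shipped implementation: the class `SensitivePathSketch`, its helper names, the dotted path notation, the empty string for the root path, and the choice to stop descending once a node is marked are all assumptions. Only the `x-gdpr-sensitive` keyword, the "boolean `true` or non-empty string" rule and the `field[]` array notation come from the changelog text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of walking a decoded JSON Schema for x-gdpr-sensitive marks. */
final class SensitivePathSketch {

    /** Returns the path of every marked node ("" stands for the root, an assumed notation). */
    static List<String> sensitivePaths(Map<String, Object> schema) {
        List<String> out = new ArrayList<>();
        walk(schema, "", out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Map<String, Object> node, String path, List<String> out) {
        Object mark = node.get("x-gdpr-sensitive");
        // A mark is boolean true or a non-empty string category.
        boolean marked = Boolean.TRUE.equals(mark)
                || (mark instanceof String && !((String) mark).isEmpty());
        if (marked) {
            out.add(path);
            return; // assumption: a marked node is protected as one leaf, so stop here
        }
        Object props = node.get("properties");
        if (props instanceof Map) {
            for (Map.Entry<String, Object> e : ((Map<String, Object>) props).entrySet()) {
                if (e.getValue() instanceof Map) {
                    String next = path.isEmpty() ? e.getKey() : path + "." + e.getKey();
                    walk((Map<String, Object>) e.getValue(), next, out);
                }
            }
        }
        Object items = node.get("items");
        if (items instanceof Map) {
            walk((Map<String, Object>) items, path + "[]", out); // array items: field[]
        }
    }

    /** Small helper to build an insertion-ordered map from key/value pairs. */
    static Map<String, Object> node(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    /** A made-up schema with a flat, a nested and an array-item mark. */
    static Map<String, Object> demoSchema() {
        Map<String, Object> address = node("type", "object", "properties",
                node("street", node("type", "string", "x-gdpr-sensitive", "contact"),
                     "city", node("type", "string")));
        Map<String, Object> phones = node("type", "array",
                "items", node("type", "string", "x-gdpr-sensitive", Boolean.TRUE));
        return node("type", "object", "properties", node(
                "email", node("type", "string", "x-gdpr-sensitive", Boolean.TRUE),
                "name", node("type", "string", "x-gdpr-sensitive", ""), // empty string: no mark
                "address", address,
                "phones", phones));
    }

    public static void main(String[] args) {
        System.out.println(sensitivePaths(demoSchema())); // [email, address.street, phones[]]
    }
}
```

Because the walk only reads the keyword and never rejects anything, adding or removing a mark cannot change what a validator accepts, which is the sense in which the keyword is validation-neutral.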
1240
## [1.6.0] - 2026-06-21
1341

1442
### Added

README.md

Lines changed: 43 additions & 0 deletions
@@ -146,6 +146,49 @@ the broker, then at-least-once on the wire as always — consumers still dedupe
146146
(`SELECT … FOR UPDATE SKIP LOCKED`) is the adapter's job; the in-memory reference store does
147147
not implement it.
148148

149+
### GDPR field encryption (optional)
150+
151+
The `com.babelqueue.gdpr` helper (ADR-0030) is the **runtime, SDK-enforcement** half of
152+
the registry's `x-gdpr-sensitive` declaration: a producer encrypts each marked `data`
153+
field before publish, a consumer decrypts it after decode. The registry only *declares*
154+
which fields are personal data; this enforces it on the wire. It is standalone and
155+
**opt-in** — call it, or don't.
156+
157+
```java
158+
import com.babelqueue.gdpr.*;
159+
import com.babelqueue.schema.*;
160+
161+
// The Cipher is YOURS — a seam onto KMS/Vault/HSM/tokenisation. The core ships a JDK-only
162+
// reference (AES-256-GCM, random 12-byte IV prepended, Base64) so it pulls no crypto dep (GR-7).
163+
Cipher cipher = new AesGcmCipher(key); // 32-byte key selects AES-256 (16/24 bytes select AES-128/192); the caller owns it
164+
165+
Map<String, Object> schema = provider.schemaFor(env.job()); // the same per-URN schema you validate against
166+
if (schema != null) {
167+
// Producer — validate CLEARTEXT first, then encrypt the marked leaves in place:
168+
SchemaValidation.validate(provider, env.job(), env.data());
169+
Gdpr.protect(env.data(), schema, cipher);
170+
}
171+
String body = EnvelopeCodec.encode(env); // ciphertext rides inside data
172+
173+
// Consumer — decrypt the marked leaves in place AFTER decode, BEFORE the handler reads data:
174+
Envelope in = EnvelopeCodec.decode(body);
175+
Map<String, Object> inSchema = provider.schemaFor(in.job());
176+
if (inSchema != null) {
177+
Gdpr.unprotect(in.data(), inSchema, cipher); // wrong key → DecryptException (retry/DLQ)
178+
}
179+
```
180+
181+
The wire envelope stays **frozen** (GR-1): only the **value** of a sensitive field changes
182+
— it becomes a ciphertext **string**, so `data` is still pure JSON (GR-3) and any SDK can
183+
carry the envelope even without the key. `meta.schema_version` stays `1` and `trace_id` is
184+
untouched (GR-4). Each leaf is canonically JSON-encoded before encryption and decoded back
185+
after, so `protect` then `unprotect` restores the value **byte-for-byte** (a number comes back a
186+
number, an object an object). The sensitive paths come from the schema's `x-gdpr-sensitive`
187+
marks (`SensitivePaths.of(schema)` — nested objects, array items `field[]`, and the root),
188+
which are **validation-neutral** so annotating a schema is never a breaking change. Validate
189+
cleartext **before** `protect` / **after** `unprotect`: a schema that constrains a sensitive
190+
field (`minLength`, `enum`, …) would reject the ciphertext string otherwise.
191+
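The ordering rule in the last sentence above can be seen in a small sketch. Everything here is hypothetical: `ValidateOrderSketch`, the `plan` field and its `enum` are made up, and the stand-in `protect` / `unprotect` use plain Base64 in place of a real `Cipher`, so they provide no confidentiality. The sketch only shows why a constrained field has to be validated while it is still cleartext.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: an enum-constrained field stops validating once it is protected. */
final class ValidateOrderSketch {

    /** Stand-in for a schema constraint such as {"enum": ["basic", "premium"]}. */
    static final List<String> PLAN_ENUM = Arrays.asList("basic", "premium");

    static boolean valid(Map<String, Object> data) {
        return PLAN_ENUM.contains(data.get("plan"));
    }

    /** Stand-in for protect: JSON-encode the string, then Base64 it (NOT encryption). */
    static void protect(Map<String, Object> data) {
        String json = "\"" + data.get("plan") + "\""; // JSON form of a simple string
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
        data.put("plan", Base64.getEncoder().encodeToString(bytes));
    }

    /** Stand-in for unprotect: reverse the Base64, then strip the JSON quotes. */
    static void unprotect(Map<String, Object> data) {
        byte[] bytes = Base64.getDecoder().decode((String) data.get("plan"));
        String json = new String(bytes, StandardCharsets.UTF_8);
        data.put("plan", json.substring(1, json.length() - 1));
    }

    static Map<String, Object> demo() {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("plan", "premium");
        return data;
    }

    public static void main(String[] args) {
        Map<String, Object> data = demo();
        System.out.println("cleartext valid: " + valid(data)); // true
        protect(data);
        System.out.println("protected valid: " + valid(data)); // false
        unprotect(data);
        System.out.println("restored valid:  " + valid(data)); // true
    }
}
```

A producer that validated after `protect`, or a consumer that validated before `unprotect`, would reject every message carrying this field.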
149192
## What this core is (and isn't)
150193

151194
It enforces the **contract**: the envelope shape, URN identity, trace propagation,

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
66

77
<groupId>com.babelqueue</groupId>
88
<artifactId>babelqueue-core</artifactId>
9-
<version>1.6.0</version>
9+
<version>1.7.0</version>
1010
<packaging>jar</packaging>
1111

1212
<name>BabelQueue Core</name>
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
package com.babelqueue;
2+
3+
/**
4+
* Canonical encode/decode for a <b>single</b> decoded-JSON value, using the same minimal codec
5+
* the wire envelope uses. It is the seam field-level features ({@link com.babelqueue.gdpr})
6+
* need: the core's {@link Json} reader/writer is package-private (it must never force a JSON
7+
* library on consumers, GR-7), so this class exposes just enough of it — one value in, one value
8+
* out — without widening {@code Json} itself.
9+
*
10+
* <p>Because it routes through the very same codec, the round-trip is <b>type-exact</b>: a value
11+
* decoded by {@link EnvelopeCodec#decode} (numbers as {@link Long}/{@link java.math.BigInteger}/
12+
* {@link Double}, objects as {@link java.util.LinkedHashMap}, arrays as {@link java.util.ArrayList})
13+
* re-encodes and re-decodes back to the same Java types. That is what lets
14+
* {@link com.babelqueue.gdpr.Gdpr#protect protect}/{@link com.babelqueue.gdpr.Gdpr#unprotect unprotect}
15+
* restore a protected field byte-for-byte after a decrypt.
16+
*/
17+
public final class JsonValues {
18+
19+
private JsonValues() {
20+
}
21+
22+
/**
23+
* Encode one decoded-JSON value to its compact canonical JSON string — the same form the
24+
* envelope codec emits (slashes and non-ASCII left literal, no insignificant whitespace).
25+
*
26+
* @param value a decoded-JSON value ({@code Map}/{@code List}/{@code String}/{@code Number}/
27+
* {@code Boolean}/{@code null})
28+
* @return the compact JSON encoding
29+
* @throws BabelQueueException if the value cannot be encoded (e.g. a non-finite number)
30+
*/
31+
public static String encode(Object value) {
32+
return Json.write(value);
33+
}
34+
35+
/**
36+
* Parse one JSON document back into a decoded-JSON value, with the same types the envelope
37+
* codec produces. The exact inverse of {@link #encode(Object)}.
38+
*
39+
* @param raw a JSON document holding a single value
40+
* @return the decoded value
41+
* @throws BabelQueueException if {@code raw} is not a single, well-formed JSON value
42+
*/
43+
public static Object decode(String raw) {
44+
return Json.parse(raw);
45+
}
46+
}
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
1+
package com.babelqueue.gdpr;
2+
3+
import java.nio.charset.StandardCharsets;
4+
import java.security.GeneralSecurityException;
5+
import java.security.SecureRandom;
6+
import java.util.Arrays;
7+
import java.util.Base64;
8+
import javax.crypto.spec.GCMParameterSpec;
9+
import javax.crypto.spec.SecretKeySpec;
10+
11+
/**
12+
* A reference {@link Cipher} built ONLY on the JDK's {@code javax.crypto} ({@code AES/GCM/NoPadding}):
13+
* AES-GCM authenticated encryption with a fresh random 12-byte IV per call, the IV <b>prepended</b>
14+
* to the ciphertext, the whole thing Base64-encoded so it drops straight into a JSON string. The key
15+
* is the CALLER's — this type performs no key management, rotation or derivation; bind a KMS-backed
16+
* {@link Cipher} for that.
17+
*
18+
* <p>A 32-byte key selects AES-256-GCM (recommended); 24- and 16-byte keys select AES-192/128-GCM.
19+
* GCM authenticates the ciphertext, so {@link #decrypt} rejects any tampered or wrong-key input by
20+
* throwing (it never returns corrupt plaintext). It pulls no third-party crypto dependency (GR-7)
21+
* and is safe for concurrent use — {@code javax.crypto.Cipher} instances are created per call, and
22+
* the stored key material is only read.
23+
*/
24+
public final class AesGcmCipher implements Cipher {
25+
26+
/** AES-GCM standard nonce/IV length, in bytes. A 12-byte IV is the recommended GCM size. */
27+
private static final int IV_LENGTH = 12;
28+
29+
/** GCM authentication-tag length, in bits (the full 128-bit tag). */
30+
private static final int TAG_BITS = 128;
31+
32+
private static final String TRANSFORMATION = "AES/GCM/NoPadding";
33+
private static final String ALGORITHM = "AES";
34+
35+
private final SecretKeySpec key;
36+
private final SecureRandom random = new SecureRandom();
37+
38+
/**
39+
* Build an AES-GCM reference cipher from a raw symmetric key. The key length selects the AES
40+
* variant: 32 bytes &rarr; AES-256-GCM (recommended), 24 &rarr; AES-192, 16 &rarr; AES-128.
41+
*
42+
* @param keyBytes the raw symmetric key (16, 24, or 32 bytes)
43+
* @throws InvalidKeySizeException if the key is not 16, 24, or 32 bytes
44+
*/
45+
public AesGcmCipher(byte[] keyBytes) {
46+
int len = keyBytes == null ? 0 : keyBytes.length;
47+
if (len != 16 && len != 24 && len != 32) {
48+
throw new InvalidKeySizeException(len);
49+
}
50+
this.key = new SecretKeySpec(keyBytes, ALGORITHM);
51+
}
52+
53+
/**
54+
* Seals {@code plaintext} with a fresh random IV, prepends the IV, and Base64-encodes the result
55+
* ({@code Base64(iv || ciphertext || tag)}).
56+
*/
57+
@Override
58+
public String encrypt(byte[] plaintext) throws GeneralSecurityException {
59+
byte[] iv = new byte[IV_LENGTH];
60+
random.nextBytes(iv);
61+
62+
javax.crypto.Cipher gcm = javax.crypto.Cipher.getInstance(TRANSFORMATION);
63+
gcm.init(javax.crypto.Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
64+
byte[] sealed = gcm.doFinal(plaintext);
65+
66+
byte[] out = new byte[iv.length + sealed.length];
67+
System.arraycopy(iv, 0, out, 0, iv.length);
68+
System.arraycopy(sealed, 0, out, iv.length, sealed.length);
69+
return Base64.getEncoder().encodeToString(out);
70+
}
71+
72+
/**
73+
* Reverses {@link #encrypt}: Base64-decodes, splits off the prepended IV, and opens the GCM
74+
* ciphertext. A wrong key or tampered input fails GCM authentication and throws (never corrupt
75+
* plaintext).
76+
*
77+
* @throws MalformedCiphertextException if the input is not valid Base64 or is too short to hold
78+
* an IV (i.e. not something this cipher produced)
79+
* @throws GeneralSecurityException if GCM authentication fails (wrong key or tampering)
80+
*/
81+
@Override
82+
public byte[] decrypt(String ciphertext) throws GeneralSecurityException {
83+
byte[] raw;
84+
try {
85+
raw = Base64.getDecoder().decode(ciphertext.getBytes(StandardCharsets.UTF_8));
86+
} catch (IllegalArgumentException ex) {
87+
throw new MalformedCiphertextException("not valid Base64", ex);
88+
}
89+
if (raw.length < IV_LENGTH) {
90+
throw new MalformedCiphertextException("shorter than the IV", null);
91+
}
92+
93+
byte[] iv = Arrays.copyOfRange(raw, 0, IV_LENGTH);
94+
byte[] sealed = Arrays.copyOfRange(raw, IV_LENGTH, raw.length);
95+
96+
javax.crypto.Cipher gcm = javax.crypto.Cipher.getInstance(TRANSFORMATION);
97+
gcm.init(javax.crypto.Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
98+
// AEADBadTagException (a GeneralSecurityException) on wrong key / tampered input.
99+
return gcm.doFinal(sealed);
100+
}
101+
}
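As a rough illustration of the `Base64(iv || ciphertext || tag)` layout used by the class above, the standalone sketch below seals a value with the JDK's `javax.crypto` directly and checks the resulting size. `GcmLayoutSketch` and its helpers (`seal`, `opens`, `tamper`, `base64Length`, `demoKey`) are hypothetical names, and the fixed demo keys are for illustration only. For an `n`-byte plaintext the raw output is `12 + n + 16` bytes, so the 17-byte JSON string `"ada@example.com"` (quotes included) becomes 45 raw bytes and 60 Base64 characters.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of the Base64(iv || ciphertext || tag) layout and its overhead. */
final class GcmLayoutSketch {
    static final int IV_BYTES = 12;
    static final int TAG_BYTES = 16;

    /** Padded Base64 length of iv + ciphertext + tag for a plaintext of the given size. */
    static int base64Length(int plaintextBytes) {
        int raw = IV_BYTES + plaintextBytes + TAG_BYTES;
        return 4 * ((raw + 2) / 3);
    }

    /** Seals plaintext under a fresh random IV and prepends the IV. */
    static String seal(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            gcm.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BYTES * 8, iv));
            byte[] sealed = gcm.doFinal(plaintext); // ciphertext followed by the tag
            byte[] out = Arrays.copyOf(iv, iv.length + sealed.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the key opens the ciphertext, false if GCM authentication fails. */
    static boolean opens(byte[] key, String ciphertext) {
        try {
            byte[] raw = Base64.getDecoder().decode(ciphertext);
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            gcm.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BYTES * 8, raw, 0, IV_BYTES));
            gcm.doFinal(raw, IV_BYTES, raw.length - IV_BYTES);
            return true;
        } catch (AEADBadTagException e) {
            return false; // wrong key or modified input
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Swaps the first Base64 character, which changes the first IV byte. */
    static String tamper(String ciphertext) {
        char swapped = ciphertext.charAt(0) == 'A' ? 'B' : 'A';
        return swapped + ciphertext.substring(1);
    }

    /** Fixed 32-byte demo key; a real key would come from the caller's key store. */
    static byte[] demoKey(int fill) {
        byte[] key = new byte[32];
        Arrays.fill(key, (byte) fill);
        return key;
    }

    public static void main(String[] args) {
        byte[] value = "\"ada@example.com\"".getBytes(StandardCharsets.UTF_8);
        String sealed = seal(demoKey(7), value);
        System.out.println(sealed.length() + " chars, expected " + base64Length(value.length));
        System.out.println("right key opens: " + opens(demoKey(7), sealed));
        System.out.println("wrong key opens: " + opens(demoKey(8), sealed));
        System.out.println("tampered opens:  " + opens(demoKey(7), tamper(sealed)));
    }
}
```

The fixed 28 bytes of IV and tag per field are worth keeping in mind when many small leaves are marked sensitive, since each one is sealed separately.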
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
package com.babelqueue.gdpr;
2+
3+
/**
4+
* The field-level protection primitive that the <b>caller provides</b> — a seam onto a KMS, a
5+
* Vault transit engine, an HSM, a tokenisation service, or the reference {@link AesGcmCipher}.
6+
* {@link Gdpr#protect protect} runs {@link #encrypt} over every {@code x-gdpr-sensitive}
7+
* leaf's value (after it is canonically JSON-encoded); {@link Gdpr#unprotect unprotect} runs
8+
* {@link #decrypt} to restore it. Keeping this an interface is what holds GR-7: the core never
9+
* pulls a crypto/KMS dependency — only a caller who binds a concrete backend does.
10+
*
11+
* <p>Contract for an implementation:
12+
* <ul>
13+
* <li>{@link #encrypt} takes the canonical JSON bytes of one field value (see
14+
* {@link Gdpr#protect}) and returns the ciphertext as a <b>String</b> that is valid for
15+
* placement inside a JSON document (the {@link AesGcmCipher} reference returns Base64, which
16+
* is). The same plaintext MAY encrypt to a different string each call — a random nonce/IV is
17+
* expected and good.</li>
18+
* <li>{@link #decrypt} is the exact inverse: given a string {@code encrypt} produced, it returns
19+
* the original JSON bytes <b>byte-for-byte</b>. A string it did not produce, or one produced
20+
* under a different key, MUST throw rather than return silent garbage, so a wrong-key consume
21+
* fails loudly (the message then goes to retry / dead-letter).</li>
22+
* <li>Both MUST be safe for concurrent use; a producer/consumer fans the same {@code Cipher}
23+
* across threads.</li>
24+
* </ul>
25+
*/
26+
public interface Cipher {
27+
28+
/**
29+
* Protects one field value (its canonical JSON bytes) and returns a JSON-safe ciphertext
30+
* string.
31+
*
32+
* @param plaintext the canonical JSON bytes of one field value
33+
* @return the ciphertext, encoded so it is safe to place inside a JSON string
34+
* @throws Exception if encryption fails (surfaced wrapped by {@link Gdpr#protect})
35+
*/
36+
String encrypt(byte[] plaintext) throws Exception;
37+
38+
/**
39+
* Reverses {@link #encrypt}, returning the original field-value JSON bytes.
40+
*
41+
* @param ciphertext a string produced by {@link #encrypt}
42+
* @return the original field-value JSON bytes
43+
* @throws Exception if the input is not a valid ciphertext, is tampered, or was produced under
44+
* a different key (surfaced as {@link DecryptException} by
45+
* {@link Gdpr#unprotect})
46+
*/
47+
byte[] decrypt(String ciphertext) throws Exception;
48+
}
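One way a caller might use this seam is to bind a cipher that tags each ciphertext with a key id, so values written under an older key stay readable after the active key changes. The sketch below is an assumption-heavy illustration, not part of the commit: `KeyRingSketch`, `FieldCipher` (a local stand-in with the same two methods as the interface above), `Gcm`, `KeyRing` and the `keyId:` prefix format are all hypothetical, and the fixed demo keys stand in for keys a KMS would supply.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of a caller-bound cipher that prefixes a key id for rotation. */
final class KeyRingSketch {

    /** Local stand-in mirroring the two-method interface shown in the diff. */
    interface FieldCipher {
        String encrypt(byte[] plaintext) throws Exception;
        byte[] decrypt(String ciphertext) throws Exception;
    }

    /** Compact AES-GCM delegate producing Base64(iv || ciphertext || tag). */
    static final class Gcm implements FieldCipher {
        private final SecretKeySpec key;
        private final SecureRandom random = new SecureRandom();

        Gcm(byte[] keyBytes) {
            this.key = new SecretKeySpec(keyBytes, "AES");
        }

        public String encrypt(byte[] plaintext) throws Exception {
            byte[] iv = new byte[12];
            random.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] sealed = c.doFinal(plaintext);
            byte[] out = Arrays.copyOf(iv, iv.length + sealed.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return Base64.getEncoder().encodeToString(out);
        }

        public byte[] decrypt(String ciphertext) throws Exception {
            byte[] raw = Base64.getDecoder().decode(ciphertext);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, raw, 0, 12));
            return c.doFinal(raw, 12, raw.length - 12);
        }
    }

    /** Encrypts under the active key id; decrypts with whichever id the value carries. */
    static final class KeyRing implements FieldCipher {
        private final Map<String, FieldCipher> byId;
        private final String activeId;

        KeyRing(Map<String, FieldCipher> byId, String activeId) {
            if (!byId.containsKey(activeId)) {
                throw new IllegalArgumentException("unknown active key id");
            }
            this.byId = new HashMap<>(byId); // private copy, only read afterwards
            this.activeId = activeId;
        }

        public String encrypt(byte[] plaintext) throws Exception {
            // ':' never appears in Base64, so it is a safe separator (ids must not contain it).
            return activeId + ":" + byId.get(activeId).encrypt(plaintext);
        }

        public byte[] decrypt(String ciphertext) throws Exception {
            int sep = ciphertext.indexOf(':');
            FieldCipher c = sep < 0 ? null : byId.get(ciphertext.substring(0, sep));
            if (c == null) {
                throw new IllegalArgumentException("no key for this ciphertext");
            }
            return c.decrypt(ciphertext.substring(sep + 1));
        }
    }

    static KeyRing demoRing(String activeId) {
        Map<String, FieldCipher> keys = new HashMap<>();
        for (int i = 1; i <= 2; i++) {
            byte[] k = new byte[32];
            Arrays.fill(k, (byte) i); // fixed demo keys, for illustration only
            keys.put("k" + i, new Gcm(k));
        }
        return new KeyRing(keys, activeId);
    }

    static String sealWith(String activeId, String json) {
        try {
            return demoRing(activeId).encrypt(json.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes under k1, then reads with a ring whose active key has moved on to k2. */
    static String rotateAndRead(String json) {
        try {
            byte[] back = demoRing("k2").decrypt(sealWith("k1", json));
            return new String(back, StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean rejectsUnknownKeyId() {
        try {
            demoRing("k1").decrypt("k9:AAAA");
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rotateAndRead("{\"n\":1}"));
    }
}
```

The prefixed value is still a plain string, so it meets the "valid for placement inside a JSON document" part of the contract, and an unknown or missing key id throws instead of returning garbage.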
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
package com.babelqueue.gdpr;
2+
3+
import com.babelqueue.BabelQueueException;
4+
5+
/**
6+
* Raised by {@link Gdpr#unprotect} when a protected field cannot be restored on the consume side —
7+
* a wrong key, a tampered/garbled ciphertext, or a value that is not the string {@link Gdpr#protect}
8+
* produced. {@code unprotect} stops at the first such failure and throws this, so it is
9+
* distinguishable from a missing field (which is skipped, not an error) and from an already-cleartext
10+
* non-string leaf (which is left untouched).
11+
*
12+
* <p>A consumer should treat it as fatal for that message: fail the delivery so the adapter retries
13+
* and eventually dead-letters it, rather than handle unreadable PII. It is unchecked
14+
* ({@link BabelQueueException} is a {@link RuntimeException}) so it composes with the existing
15+
* handler/redrive flow that already reacts to thrown runtime exceptions.
16+
*/
17+
public class DecryptException extends BabelQueueException {
18+
19+
private static final long serialVersionUID = 1L;
20+
21+
/**
22+
* @param message the failure detail
23+
* @param cause the underlying cipher or decode failure
24+
*/
25+
public DecryptException(String message, Throwable cause) {
26+
super("babelqueue/gdpr: cannot decrypt a protected field: " + message, cause);
27+
}
28+
}
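The "fail the delivery so the adapter retries and eventually dead-letters it" behaviour described in the Javadoc above could look roughly like the sketch below. None of these names exist in the commit: `RedriveSketch`, `UnreadableFieldException` (a stand-in for an unchecked decrypt failure), `deliver` and the in-memory `deadLetters` list are hypothetical, and a real adapter would delegate retries and dead-lettering to the broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of retry-then-dead-letter around a handler that may throw. */
final class RedriveSketch {

    /** Stand-in for an unchecked decrypt failure raised before the handler reads data. */
    static final class UnreadableFieldException extends RuntimeException {
        private static final long serialVersionUID = 1L;

        UnreadableFieldException(String message) {
            super(message);
        }
    }

    final List<String> deadLetters = new ArrayList<>();
    int attempts;

    /**
     * Runs the handler up to maxAttempts times. Returns true on the first success;
     * if every attempt throws, parks the body in deadLetters and returns false.
     */
    boolean deliver(String body, Consumer<String> handler, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            attempts++;
            try {
                handler.accept(body);
                return true;
            } catch (RuntimeException e) {
                // Failed delivery: fall through to the next attempt.
            }
        }
        deadLetters.add(body);
        return false;
    }

    public static void main(String[] args) {
        RedriveSketch adapter = new RedriveSketch();
        boolean ok = adapter.deliver("msg-1", body -> {
            throw new UnreadableFieldException("wrong key");
        }, 3);
        System.out.println(ok + " after " + adapter.attempts + " attempts, dead letters: "
                + adapter.deadLetters);
    }
}
```

Because the failure is an unchecked exception, the handler needs no special branch for it: a wrong key simply looks like any other failed delivery.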
