Add TypeScript, Python, and PHP code generators #1094

Open
pcriadoperez wants to merge 13 commits into aeron-io:master from pcriadoperez:add-languages

pcriadoperez commented Dec 27, 2025

Summary

Adds SBE code generators for three new target languages: TypeScript, Python, and PHP.

Each generator follows the same architecture as the existing generators (Java, C++, Rust, etc.) and plugs into the standard SBE tool pipeline via TargetCodeGeneratorLoader.
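For readers unfamiliar with the loader, the registration pattern can be sketched roughly as below. This is a simplified, self-contained illustration, not SBE's actual code: `TargetLoaderSketch`, `CodeGeneratorStub` and `IrStub` are hypothetical stand-ins, and the real `TargetCodeGeneratorLoader` signatures may differ.

```java
import java.util.Locale;

// Hypothetical stand-in for SBE's intermediate representation (Ir).
final class IrStub
{
    final String packageName;

    IrStub(final String packageName)
    {
        this.packageName = packageName;
    }
}

// Hypothetical generator interface: one implementation per target language.
interface CodeGeneratorStub
{
    String describe();
}

// Simplified enum registry: each constant builds the generator for its language.
enum TargetLoaderSketch
{
    TYPESCRIPT
    {
        CodeGeneratorStub newInstance(final IrStub ir, final String outputDir)
        {
            return () -> "typescript: " + ir.packageName + " -> " + outputDir;
        }
    },
    PYTHON
    {
        CodeGeneratorStub newInstance(final IrStub ir, final String outputDir)
        {
            return () -> "python: " + ir.packageName + " -> " + outputDir;
        }
    },
    PHP
    {
        CodeGeneratorStub newInstance(final IrStub ir, final String outputDir)
        {
            return () -> "php: " + ir.packageName + " -> " + outputDir;
        }
    };

    abstract CodeGeneratorStub newInstance(IrStub ir, String outputDir);

    // Resolves the value of -Dsbe.target.language, ignoring case.
    // Unknown names throw IllegalArgumentException (from Enum.valueOf).
    static TargetLoaderSketch get(final String name)
    {
        return valueOf(name.toUpperCase(Locale.ROOT));
    }
}
```

Keeping the mapping in one enum means a new target only needs a new constant, and the command-line value is matched against the constant name.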

What's included

  • TypeScript generator — produces TypeScript interfaces and classes for encoding/decoding SBE messages, with an index file for module exports.
  • Python generator — produces Python 3.8+ classes using the struct module for binary serialization, with type hints. Supports the keyword append token (-Dsbe.keyword.append.token=_), which appends the given token to names that clash with Python reserved words.
  • PHP generator — produces PHP 8.0+ classes using pack()/unpack() for binary serialization, with union types and typed properties.
  • Updated TargetCodeGeneratorLoader to register PYTHON, PHP, and TYPESCRIPT targets.
  • Updated README with usage instructions for all three languages.
  • Basic encode/decode test scripts for Python and PHP.
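As a rough illustration of what `struct`-based serialization implies on the generator side, the sketch below maps SBE primitive types to Python `struct` format characters and prefixes the schema's byte order. `StructFormatSketch` and its `SbeType` enum are hypothetical; the format characters are the standard ones from Python's `struct` module, where the `<` and `>` prefixes select standard sizes with no alignment padding, which suits SBE's packed field layout.

```java
import java.nio.ByteOrder;
import java.util.List;

final class StructFormatSketch
{
    // Hypothetical subset of SBE primitive types.
    enum SbeType { CHAR, INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64, FLOAT, DOUBLE }

    // Python struct format character for each type (standard sizes under '<' or '>').
    static char typeCode(final SbeType type)
    {
        return switch (type)
        {
            case CHAR -> 'c';
            case INT8 -> 'b';
            case UINT8 -> 'B';
            case INT16 -> 'h';
            case UINT16 -> 'H';
            case INT32 -> 'i';
            case UINT32 -> 'I';
            case INT64 -> 'q';
            case UINT64 -> 'Q';
            case FLOAT -> 'f';
            case DOUBLE -> 'd';
        };
    }

    // Builds a format string such as "<HHHH" for a run of fixed-size fields.
    static String formatFor(final List<SbeType> fields, final ByteOrder order)
    {
        final StringBuilder sb = new StringBuilder(order == ByteOrder.LITTLE_ENDIAN ? "<" : ">");
        for (final SbeType field : fields)
        {
            sb.append(typeCode(field));
        }
        return sb.toString();
    }
}
```

For the standard message header (four `uint16` fields) in a little-endian schema this yields `<HHHH`, a string that could be handed to `struct.unpack_from`; whether the generated Python builds its format strings exactly this way is not confirmed here.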

Generator files

Each language follows the same structure under sbe-tool/src/main/java/uk/co/real_logic/sbe/generation/{language}/:

  • {Language}Generator.java — main code generation logic
  • {Language}OutputManager.java — file output management
  • {Language}Util.java — language-specific utilities (type mapping, keyword handling, etc.)
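The keyword handling mentioned for `{Language}Util.java` can be sketched for Python as follows. The class and method names are hypothetical, and throwing when no append token is configured is an assumption about the behaviour, not a description of the PR's code; the keyword set is Python 3's `keyword.kwlist`.

```java
import java.util.Set;

final class PythonKeywordSketch
{
    // Python 3 reserved words, as listed by keyword.kwlist.
    private static final Set<String> KEYWORDS = Set.of(
        "False", "None", "True", "and", "as", "assert", "async", "await",
        "break", "class", "continue", "def", "del", "elif", "else", "except",
        "finally", "for", "from", "global", "if", "import", "in", "is",
        "lambda", "nonlocal", "not", "or", "pass", "raise", "return", "try",
        "while", "with", "yield");

    // Returns the name unchanged unless it is a keyword, in which case the
    // configured append token (e.g. "_" from -Dsbe.keyword.append.token=_) is added.
    static String formatName(final String name, final String appendToken)
    {
        if (!KEYWORDS.contains(name))
        {
            return name;
        }
        if (appendToken == null || appendToken.isEmpty())
        {
            // Assumption: fail loudly rather than emit code Python cannot parse.
            throw new IllegalStateException("'" + name + "' is a Python keyword; set sbe.keyword.append.token");
        }
        return name + appendToken;
    }
}
```

For example, a schema field named `from` would become `from_` when the tool is run with `-Dsbe.keyword.append.token=_`.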

How to test

1. Build the SBE tool

```shell
./gradlew
```

2. Generate codecs from a schema

```shell
# TypeScript
java -Dsbe.output.dir=output/typescript -Dsbe.target.language=TYPESCRIPT \
  -jar sbe-all/build/libs/sbe-all-*.jar path/to/schema.xml

# Python
java -Dsbe.output.dir=output/python -Dsbe.target.language=PYTHON \
  -jar sbe-all/build/libs/sbe-all-*.jar path/to/schema.xml

# PHP
java -Dsbe.output.dir=output/php -Dsbe.target.language=PHP \
  -jar sbe-all/build/libs/sbe-all-*.jar path/to/schema.xml
```

3. Run the included test scripts

```shell
# Python (requires Python 3.8+)
cd python
python test_order.py

# PHP (requires PHP 8.0+)
cd php
php test_order.php
```

These scripts encode and decode a simple Order message and verify round-trip correctness.
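The round-trip check these scripts perform can be sketched in Java with `java.nio.ByteBuffer`. The `Order` layout below (a `uint64` order id, an `int64` price and a `uint32` quantity, little-endian, no message header) is a hypothetical example and not necessarily the schema the scripts use; `OrderRoundTripSketch` is likewise an illustrative name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OrderRoundTripSketch
{
    // Hypothetical fixed layout: byte offsets of each field within the block.
    static final int ORDER_ID_OFFSET = 0;  // uint64, bit pattern kept in a signed long
    static final int PRICE_OFFSET = 8;     // int64
    static final int QUANTITY_OFFSET = 16; // uint32
    static final int BLOCK_LENGTH = 20;

    static ByteBuffer encode(final long orderId, final long price, final long quantity)
    {
        final ByteBuffer buffer = ByteBuffer.allocate(BLOCK_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putLong(ORDER_ID_OFFSET, orderId);
        buffer.putLong(PRICE_OFFSET, price);
        buffer.putInt(QUANTITY_OFFSET, (int)quantity); // keep the low 32 bits of the unsigned value
        return buffer;
    }

    static long decodeOrderId(final ByteBuffer buffer)
    {
        return buffer.getLong(ORDER_ID_OFFSET);
    }

    static long decodePrice(final ByteBuffer buffer)
    {
        return buffer.getLong(PRICE_OFFSET);
    }

    static long decodeQuantity(final ByteBuffer buffer)
    {
        // Widen back to an unsigned value so quantities above Integer.MAX_VALUE survive.
        return Integer.toUnsignedLong(buffer.getInt(QUANTITY_OFFSET));
    }
}
```

Choosing a quantity above `Integer.MAX_VALUE` (for example 3,000,000,000) exercises the unsigned handling, an easy place for round-trip bugs to hide in generated codecs.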

4. Test with the existing SBE test schemas

You can also generate code against the test schemas bundled in the repo to verify more complex scenarios (groups, vardata, composites, enums):

```shell
java -Dsbe.output.dir=output/python -Dsbe.target.language=PYTHON \
  -jar sbe-all/build/libs/sbe-all-*.jar \
  sbe-tool/src/test/resources/code-generation-schema.xml

java -Dsbe.output.dir=output/typescript -Dsbe.target.language=TYPESCRIPT \
  -jar sbe-all/build/libs/sbe-all-*.jar \
  sbe-tool/src/test/resources/code-generation-schema.xml
```

5. Verify existing tests still pass

```shell
./gradlew test
```

```java
    final String compositeName,
    final List<Token> tokens)
{
    final String byteOrder = ir.byteOrder() == ByteOrder.LITTLE_ENDIAN ? "<" : ">";
```

CodeQL code-scanning notice (Unread local variable, severity: Note):
Variable 'String byteOrder' is never read.

```java
    final List<Token> groups,
    final List<Token> varData)
{
    final String byteOrder = ir.byteOrder() == ByteOrder.LITTLE_ENDIAN ? "<" : ">";
```

CodeQL code-scanning notice (Unread local variable, severity: Note):
Variable 'String byteOrder' is never read.
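The notice is worth resolving because the prefix is what tells Python's `struct` module how to interpret multi-byte fields: if the generated code ignored the schema's byte order (which the unread variable may, but need not, indicate), a big-endian schema would decode incorrectly. The sketch below, with hypothetical names, reproduces the prefix mapping from the fragment above and uses `ByteBuffer` to mimic `struct.unpack_from(prefix + "H", data)` on the same two bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch
{
    // Same mapping as the generator fragment: "<" little-endian, ">" big-endian.
    static String structPrefix(final ByteOrder order)
    {
        return order == ByteOrder.LITTLE_ENDIAN ? "<" : ">";
    }

    // Reads an unsigned 16-bit value from the first two bytes, as
    // struct.unpack_from(structPrefix(order) + "H", data) would in Python.
    static int readUint16(final byte[] data, final ByteOrder order)
    {
        return ByteBuffer.wrap(data, 0, 2).order(order).getShort() & 0xFFFF;
    }
}
```

The bytes `01 00` read as 1 under `<` but as 256 under `>`, so the computed prefix has to reach the emitted format strings.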
…N_FIELD

The composite decoder used Generators.forEachField(), which only matches
Signal.BEGIN_FIELD tokens, but the inner tokens of a composite type use
Signal.ENCODING. This produced empty interfaces and an incorrect ENCODED_LENGTH
(e.g., MessageHeader had ENCODED_LENGTH=2 instead of 8), causing all
subsequent field reads to be offset by 6 bytes.

Rewrite generateCompositeDecoder() to iterate tokens checking for
Signal.ENCODING, matching the pattern already used by the Python and PHP
generators. Pass encodedLength as a parameter from the composite token.

Regenerated all TypeScript output from the Binance schema.
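The difference between the two iteration strategies shows up on a simplified token stream for the standard message header. The `Signal` enum and `Token` record below are hypothetical reductions of SBE's IR tokens (the real ones carry far more metadata); the point is only that a composite's members are `ENCODING` tokens and that its encoded length (8 bytes for the header) is carried by the composite token itself.

```java
import java.util.ArrayList;
import java.util.List;

final class CompositeSketch
{
    enum Signal { BEGIN_COMPOSITE, END_COMPOSITE, BEGIN_FIELD, ENCODING }

    // Hypothetical, heavily simplified IR token.
    record Token(Signal signal, String name, int offset, int encodedLength) {}

    // Returns "name@offset" for every token carrying the given signal.
    static List<String> membersMatching(final List<Token> tokens, final Signal signal)
    {
        final List<String> members = new ArrayList<>();
        for (final Token token : tokens)
        {
            if (token.signal() == signal)
            {
                members.add(token.name() + "@" + token.offset());
            }
        }
        return members;
    }

    // The composite's total size comes from its BEGIN_COMPOSITE token,
    // not from any single member.
    static int encodedLength(final List<Token> tokens)
    {
        return tokens.get(0).encodedLength();
    }
}
```

For the header tokens (`blockLength`, `templateId`, `schemaId`, `version`, each 2 bytes), matching on `BEGIN_FIELD` returns nothing, which corresponds to the empty interfaces described in the commit message, while matching on `ENCODING` returns the four members at offsets 0, 2, 4 and 6.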

Three fixes to the TypeScript code generator:

1. Nested groups and vardata inside groups were silently dropped.
   generateGroupInterfaces() and generateGroupDecoderMethods() only used
   Generators.forEachField() which skips BEGIN_GROUP and BEGIN_VAR_DATA
   tokens. Rewrote both to use collectFields/collectGroups/collectVarData
   on the group body with recursive handling of nested groups, matching
   the pattern used by the Go generator.

2. Var data length type was hardcoded as uint32. Replaced shared helper
   methods (decodeVarData, decodeVarStringUtf8, decodeVarStringAscii)
   with inline decode that reads the actual length type from schema
   tokens via Generators.findFirst("length", ...).

3. Message interface character encoding detection used the BEGIN_VAR_DATA
   token which doesn't carry characterEncoding. Changed to use
   Generators.findFirst("varData", ...) to read from the inner composite
   token, fixing type mismatches (e.g. ErrorResponse.msg was Uint8Array,
   now correctly string).
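Item 1 can be illustrated with a flat token stream containing a nested group and var data. In the sketch below, which uses hypothetical, heavily simplified `Signal` and `Token` types rather than SBE's IR, a walker that recurses on `BEGIN_GROUP` and returns on `END_GROUP` visits every member; a filter that only matched field tokens, as `forEachField()` is described as doing, would skip both groups and their var data.

```java
import java.util.List;

final class GroupWalkSketch
{
    enum Signal { BEGIN_FIELD, BEGIN_GROUP, END_GROUP, BEGIN_VAR_DATA }

    // Hypothetical, heavily simplified IR token.
    record Token(Signal signal, String name) {}

    // Walks tokens from 'start', recursing into nested groups. Returns the
    // index just past the END_GROUP that closes the current level.
    static int walk(final List<Token> tokens, final int start, final String indent, final List<String> out)
    {
        int i = start;
        while (i < tokens.size())
        {
            final Token token = tokens.get(i);
            switch (token.signal())
            {
                case BEGIN_FIELD -> out.add(indent + "field " + token.name());
                case BEGIN_VAR_DATA -> out.add(indent + "varData " + token.name());
                case BEGIN_GROUP ->
                {
                    out.add(indent + "group " + token.name());
                    i = walk(tokens, i + 1, indent + "  ", out);
                    continue; // walk() already moved past the nested END_GROUP
                }
                case END_GROUP ->
                {
                    return i + 1;
                }
            }
            i++;
        }
        return i;
    }
}
```

Walking a message with a `legs` group that itself contains a `fills` group and a `note` var data field produces an indented outline in which every nested member appears.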
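Items 2 and 3 both concern var data. The sketch below, again with hypothetical names, reads the length prefix using whichever type the schema's `length` token declares and picks the TypeScript type from the `characterEncoding` found on the inner `varData` token. It is a simplified illustration of the two fixes described, not the generator's code.

```java
import java.nio.ByteBuffer;

final class VarDataSketch
{
    // Hypothetical subset of length types a var data "length" token may declare.
    enum LengthType
    {
        UINT8(1), UINT16(2), UINT32(4);

        final int size;

        LengthType(final int size)
        {
            this.size = size;
        }
    }

    // Reads the length prefix with the schema-declared type (instead of always uint32),
    // then copies that many payload bytes. The buffer's byte order is assumed to match the schema.
    static byte[] decodeBytes(final ByteBuffer buffer, final int offset, final LengthType lengthType)
    {
        final long length = switch (lengthType)
        {
            case UINT8 -> buffer.get(offset) & 0xFFL;
            case UINT16 -> buffer.getShort(offset) & 0xFFFFL;
            case UINT32 -> buffer.getInt(offset) & 0xFFFF_FFFFL;
        };
        final byte[] payload = new byte[Math.toIntExact(length)];
        final int dataOffset = offset + lengthType.size;
        for (int i = 0; i < payload.length; i++)
        {
            payload[i] = buffer.get(dataOffset + i);
        }
        return payload;
    }

    // A characterEncoding on the inner varData token means text; otherwise raw bytes.
    static String typeScriptType(final String characterEncoding)
    {
        return characterEncoding == null || characterEncoding.isEmpty() ? "Uint8Array" : "string";
    }
}
```

With a `uint16` length of 5 followed by `hello`, a hard-coded `uint32` read would instead interpret the bytes `05 00 68 65` as one very large little-endian integer.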

pcriadoperez marked this pull request as ready for review March 9, 2026 00:34
pcriadoperez changed the title "add languages" to "Add TypeScript, Python, and PHP code generators" Mar 9, 2026