Description
Feature Request
Add Apache Fory xlang serialization support for Ruby, with full wire compatibility to existing language runtimes.
Is your feature request related to a problem? Please describe
Ruby currently has no Fory runtime, so Ruby services cannot participate in Fory xlang object exchange. This blocks cross-language adoption for Ruby-based systems and prevents reuse of Fory protocol capabilities (reference tracking, polymorphism, and schema evolution).
Describe the solution you'd like
Implement Apache Fory xlang serialization for Ruby with full wire compatibility with existing language runtimes, while preserving performance-first design principles.
Scope
- Implement xlang binary format in Ruby according to `docs/specification/xlang_serialization_spec.md`.
- Follow phased implementation guidance in `docs/specification/xlang_implementation_guide.md`.
- Support both schema-consistent mode and compatible mode (meta share / TypeDef).
- Provide cross-language interoperability with Java.
Non-Goals (Initial Delivery)
- Ruby-native non-xlang serialization format.
- Decimal support (currently not supported in spec).
- Advanced runtime code generation in first iteration.
Protocol Constraints That Ruby Must Follow
- Use little-endian encoding for all multi-byte values.
- Write xlang header bitmap exactly: `null`, `xlang`, `oob` flags in byte 0.
- Implement reference flags exactly: `NULL(-3)`, `REF(-2)`, `NOT_NULL(-1)`, `REF_VALUE(0)`.
- Assign reference IDs sequentially from `0` in serialization order.
- Encode type IDs as varuint32; for user types write internal type ID then `user_type_id` varuint32.
- For named types use namespace + type name metadata (or shared TypeDef marker in meta share mode).
- Implement deterministic struct field order exactly as specified (grouping + sort rules).
- Implement meta string encodings and dedup semantics required for TypeDef and named metadata.
- Ensure unknown fields and unknown union cases can be skipped safely by reading type meta and value payload correctly.
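A minimal sketch of three of the constraints above: one-byte signed reference flags, little-endian fixed-width writes, and varuint32. It assumes varuint32 uses the common layout of 7 data bits per byte with the high bit as a continuation marker. `WireSketch` and its method names are hypothetical, and `docs/specification/xlang_serialization_spec.md` remains the authority for exact byte layouts.

```ruby
# Hypothetical helper module; names are illustrative, not a released Fory API.
module WireSketch
  # Reference flags from the list above, each written as one signed byte.
  NULL_FLAG      = -3
  REF_FLAG       = -2
  NOT_NULL_FLAG  = -1
  REF_VALUE_FLAG = 0

  module_function

  # Append a signed 8-bit reference flag.
  def write_ref_flag(out, flag)
    out << [flag].pack("c")
  end

  # Append a 32-bit signed integer in little-endian byte order.
  def write_int32(out, value)
    out << [value].pack("l<")
  end

  # Append an unsigned 32-bit integer as a varint (assumed layout:
  # 7 data bits per byte, high bit set on every byte except the last).
  def write_varuint32(out, value)
    raise RangeError, "varuint32 out of range: #{value}" unless (0..0xFFFF_FFFF).cover?(value)

    loop do
      byte = value & 0x7F
      value >>= 7
      if value.zero?
        out << byte.chr
        return out
      end
      out << (byte | 0x80).chr
    end
  end

  # Decode a varuint32 starting at +offset+; returns [value, next_offset].
  def read_varuint32(bytes, offset = 0)
    result = 0
    shift = 0
    5.times do
      byte = bytes.getbyte(offset)
      raise EOFError, "truncated varuint32" if byte.nil?

      offset += 1
      result |= (byte & 0x7F) << shift
      return [result, offset] if (byte & 0x80).zero?

      shift += 7
    end
    raise ArgumentError, "varuint32 longer than 5 bytes"
  end
end
```

Writing into a binary `String.new` keeps the helpers allocation-light for a first pass; a dedicated buffer class could replace it later without changing the encoding logic.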
Proposed Ruby Architecture
Public API Layer
Use Fory (top-level entry point), aligned with Rust API ergonomics.

    fory = Fory.new
      .xlang(true)
      .compatible(true)
      .compress_string(true)
      .track_ref(false)
      .max_dyn_depth(5)
    fory.register(User, id: 1001)
    bytes = fory.serialize(user)
    decoded = fory.deserialize(bytes, as: User)

Configuration API (chainable, similar to Rust):
- `compatible(enabled)` sets compatible mode and meta share behavior.
- `xlang(enabled)` toggles cross-language format.
- `compress_string(enabled)` toggles meta string compression.
- `check_struct_version(enabled)` enables schema hash/version checks (schema-consistent mode).
- `track_ref(enabled)` toggles global reference tracking.
- `max_dyn_depth(depth)` limits dynamic nesting depth.
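One way the chainable configuration could be structured is for each setter to store its value and return `self`. The sketch below uses a hypothetical `ConfigSketch` class, and its default values are placeholders chosen for illustration, not defaults taken from the proposal.

```ruby
# Hypothetical config holder; the proposed entry point is Fory, not this class.
class ConfigSketch
  attr_reader :options

  def initialize
    # Placeholder defaults, assumed for this sketch only.
    @options = { xlang: false, compatible: false, compress_string: false,
                 check_struct_version: false, track_ref: false, max_dyn_depth: 5 }
  end

  # Define one chainable setter per boolean option; each returns self.
  %i[xlang compatible compress_string check_struct_version track_ref].each do |name|
    define_method(name) do |enabled|
      @options[name] = enabled ? true : false
      self
    end
  end

  # Depth is validated eagerly so a bad value fails at configuration time.
  def max_dyn_depth(depth)
    unless depth.is_a?(Integer) && depth.positive?
      raise ArgumentError, "depth must be a positive Integer"
    end

    @options[:max_dyn_depth] = depth
    self
  end
end

config = ConfigSketch.new.xlang(true).compatible(true).max_dyn_depth(3)
```

Returning `self` from every setter is what allows the leading-dot call chain shown in the API example.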
Core IO API:
- `serialize(value) -> String`
- `serialize_to(buffer, value) -> Integer` (append to mutable byte buffer, return bytes written)
- `deserialize(bytes, as: nil)` (`as` optional for dynamic/object mode)
- `deserialize_from(reader, as: nil)` (streaming/offset reader)
Registration API (Ruby-refined, named parameters):
- `register(klass, id: nil, namespace: nil, type_name: nil, serializer: nil)` for non-union types.
- `register_union(klass, id: nil, namespace: nil, type_name: nil)` for union schemas.
Registration rules:
- Exactly one registration mode must be provided:
  - numeric mode: `id:`
  - named mode: `type_name:` (with optional `namespace:`)
- `namespace:` defaults to `""` when omitted in named mode.
- `serializer:` is optional; if omitted, default struct/enum serializer resolution is used.
- Passing both `id:` and `type_name:` is invalid.
- `register` is for struct/enum/ext and custom serializer types; `register_union` is union-only.
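The registration rules can be checked up front in the `register` call. The following is a sketch under the assumption that invalid combinations raise `ArgumentError`; `RegistrySketch`, `Entry`, and `lookup` are hypothetical names, not part of the proposed API.

```ruby
# Hypothetical registry; only the argument validation mirrors the rules above.
class RegistrySketch
  Entry = Struct.new(:klass, :id, :namespace, :type_name, :serializer, keyword_init: true)

  def initialize
    @by_class = {}
  end

  def register(klass, id: nil, namespace: nil, type_name: nil, serializer: nil)
    if id && type_name
      raise ArgumentError, "pass either id: or type_name:, not both"
    elsif id.nil? && type_name.nil?
      raise ArgumentError, "one of id: or type_name: is required"
    end

    # Named mode defaults the namespace to the empty string.
    namespace = "" if type_name && namespace.nil?

    @by_class[klass] = Entry.new(klass: klass, id: id, namespace: namespace,
                                 type_name: type_name, serializer: serializer)
  end

  def lookup(klass)
    @by_class.fetch(klass)
  end
end

registry = RegistrySketch.new
registry.register(String, type_name: "Text")   # named mode, namespace defaults to ""
registry.register(Integer, id: 1001)           # numeric mode
```

Failing at registration time keeps mode conflicts out of the serialization hot path.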
Runtime Core
- `Fory::Buffer`
  - byte storage, read/write cursor, growth strategy.
  - fixed-width little-endian read/write.
  - varint/zigzag and varuint36_small helpers.
- `Fory::TypeResolver`
  - maps Ruby classes and declared schemas to internal/user type IDs.
  - owns serializer lookup and polymorphic dispatch.
- `Fory::RefResolver`
  - write-side object identity map (`object_id -> ref_id`).
  - read-side ref table (`ref_id -> object`).
- `Fory::MetaString`
  - encoding selection and bit-packing for meta strings.
  - per-stream dedup table.
- `Fory::TypeDefContext`
  - shared TypeDef marker/index cache for meta share mode.
- `Fory::FieldSkipper`
  - skip-value dispatcher for unknown fields/unknown union alternatives.
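To illustrate the write side of reference tracking, here is a sketch that assigns IDs sequentially from 0 on first sight of an object. It swaps the `object_id -> ref_id` map for a hash with `compare_by_identity`, which gives the same identity semantics; `WriteRefSketch` and the returned symbols are hypothetical.

```ruby
# Hypothetical write-side resolver; not the proposed Fory::RefResolver itself.
class WriteRefSketch
  def initialize
    # compare_by_identity keys the hash on object identity rather than
    # ==/hash, so equal-but-distinct strings get distinct reference IDs.
    @ids = {}.compare_by_identity
  end

  # Returns [:ref, id] for an object already seen, or [:ref_value, id]
  # the first time an object is encountered.
  def resolve(obj)
    if (id = @ids[obj])
      [:ref, id]
    else
      id = @ids.size
      @ids[obj] = id
      [:ref_value, id]
    end
  end
end

resolver = WriteRefSketch.new
first  = "payload".dup
second = "payload".dup          # equal content, different object
seen = [resolver.resolve(first), resolver.resolve(second), resolver.resolve(first)]
```

A serializer would translate `:ref` into the `REF` flag followed by the ID, and `:ref_value` into the `REF_VALUE` flag followed by the object payload.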
Serializer Layer
- Primitive serializers for all required built-in types.
- Composite serializers for list/set/map/array.
- Struct serializer with deterministic field-order computation and schema fingerprint support.
- Union serializer for `UNION`, `TYPED_UNION`, `NAMED_UNION`.
Ruby Type Mapping (Initial)
| Ruby Type | Xlang Type | Notes |
|---|---|---|
| `NilClass` | null via ref flag | Uses NULL flag, no payload |
| `TrueClass`/`FalseClass` | `BOOL` | 1 byte |
| `Integer` | declared numeric type or dynamic numeric mapping | Prefer declared schema type to avoid ambiguity |
| `Float` | `FLOAT64` | Ruby float is IEEE 754 double |
| `String` | `STRING` | UTF-8 required; LATIN1/UTF16 optional for optimization |
| `Array` | `LIST` | Heterogeneous supported |
| `Set` | `SET` | via stdlib Set |
| `Hash` | `MAP` | Mutable keys disallowed by policy |
| `Time` | `TIMESTAMP` | normalize secs+nanos per spec |
| `Date` | `DATE` | days since epoch |
| Ruby class instances | `STRUCT`/`COMPATIBLE_STRUCT` or named variants | Requires registration or naming policy |
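For the two temporal rows, the Ruby-side conversions could look like the sketch below. It only covers the value mapping (days since the Unix epoch, and a seconds plus nanoseconds pair); how those numbers are laid out on the wire is left to the spec. `TemporalSketch` and its method names are hypothetical.

```ruby
require "date"

# Hypothetical conversion helpers for the Date and Time rows of the table.
module TemporalSketch
  EPOCH_DATE = Date.new(1970, 1, 1)

  module_function

  # Days between the Unix epoch and +date+ (negative before 1970-01-01).
  def date_to_days(date)
    (date - EPOCH_DATE).to_i
  end

  def days_to_date(days)
    EPOCH_DATE + days
  end

  # Split a Time into whole seconds and a nanosecond remainder.
  def time_to_parts(time)
    [time.to_i, time.nsec]
  end

  # Rebuild a UTC Time from the seconds and nanoseconds pair.
  def parts_to_time(seconds, nanos)
    Time.at(seconds, nanos, :nsec).utc
  end
end

days  = TemporalSketch.date_to_days(Date.new(2000, 1, 1))
parts = TemporalSketch.time_to_parts(Time.at(1_700_000_000, 123_456_789, :nsec))
```

`Time#nsec` always falls in `0...1_000_000_000`, so the pair is already normalized on the Ruby side.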
Ruby Struct Model
Ruby needs explicit schema metadata to be deterministic across languages. Use a DSL similar to:

    class User
      include Fory::Struct
      fory_type id: 1001, mode: :compatible
      field :id, :int64, nullable: false
      field :name, :string, nullable: false
      field :tags, [:list, :string], nullable: false
    end

Design rules:
- Field identifier is `id` if present, otherwise snake_case field name.
- Field ordering uses the exact xlang algorithm from spec.
- Default xlang behavior: `nullable=false`, `tracking_ref=false` unless explicitly configured.
- Namespace/type name for named mode: module path + class name.
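A field-declaration mixin along these lines could be built with a plain `included` hook. The sketch below records declarations and applies the `nullable: false` default and the identifier rule; it does not implement the spec's field ordering. `StructSketch`, `FieldDef`, `fory_fields`, and the per-field `id:` option are assumptions made for illustration.

```ruby
# Hypothetical stand-in for the proposed Fory::Struct mixin.
module StructSketch
  FieldDef = Struct.new(:name, :type, :id, :nullable, keyword_init: true) do
    # Identifier used for matching: the numeric id when given,
    # otherwise the snake_case field name.
    def identifier
      id || name.to_s
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declared fields in declaration order (not the wire order).
    def fory_fields
      @fory_fields ||= []
    end

    # Record a field declaration and define a plain accessor for it.
    def field(name, type, id: nil, nullable: false)
      fory_fields << FieldDef.new(name: name, type: type, id: id, nullable: nullable)
      attr_accessor name
    end
  end
end

class User
  include StructSketch

  field :id, :int64
  field :name, :string
  field :tags, [:list, :string], id: 3
end
```

Keeping the declarations as data on the class lets a later step compute the deterministic field order once per type instead of per object.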
Implementation Phases
Phase 0: Project Skeleton
- Create `ruby/` runtime layout with `lib/fory` and test directories.
- Add CI entry for Ruby tests and style checks.
- Add minimal smoke API and versioned gem scaffold.
Exit criteria:
- `serialize(nil)` and `deserialize` skeleton path works with xlang header.
- CI runs Ruby lint + unit tests.
Phase 1: Core Infrastructure
- Implement buffer, fixed-width little-endian primitives, varint/zigzag, varuint36_small.
- Implement xlang header read/write.
- Implement reference resolver core state and flags.
Exit criteria:
- Round-trip tests for all numeric encoders and header flags.
- Reference flag behavior matches spec state machine.
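As an example of the round-trip checks for numeric encoders, here is a sketch of 32-bit zigzag mapping, assuming the conventional definition in which small magnitudes of either sign map to small unsigned values. `ZigZagSketch` is a hypothetical name, and Ruby's arbitrary-precision integers make the explicit 32-bit mask necessary.

```ruby
# Hypothetical zigzag helpers for signed 32-bit values.
module ZigZagSketch
  INT32_RANGE = (-2**31...2**31)

  module_function

  # Map signed to unsigned: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
  def encode32(n)
    raise RangeError, "#{n} does not fit int32" unless INT32_RANGE.cover?(n)

    ((n << 1) ^ (n >> 31)) & 0xFFFF_FFFF
  end

  # Inverse mapping from the unsigned form back to the signed value.
  def decode32(u)
    (u >> 1) ^ -(u & 1)
  end
end

samples = [0, -1, 1, -2, 2**31 - 1, -2**31]
round_trip_ok = samples.all? { |n| ZigZagSketch.decode32(ZigZagSketch.encode32(n)) == n }
```

Boundary values such as `2**31 - 1` and `-2**31` are the cases most likely to expose a missing mask or sign error.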
Phase 2: Basic Types
- Implement bool, int types, float types, string.
- Implement duration, timestamp, date.
- Add optional tagged int64/uint64 fast path.
Exit criteria:
- Ruby-to-Java and Java-to-Ruby tests pass for primitive and temporal values.
- String decoding works for LATIN1/UTF16/UTF8.
Phase 3: Collections and Arrays
- Implement list/set element header handling.
- Implement map chunk-based encoding/decoding.
- Implement primitive array fast path and object array via list.
Exit criteria:
- Cross-language tests for empty, homogeneous, heterogeneous, null-containing collections.
- Map chunk boundaries and null chunk behavior validated.
Phase 4: Type System and Structs (Schema Consistent)
- Implement type registry for numeric and named types.
- Implement struct serializer with deterministic field ordering.
- Implement optional schema hash (MurmurHash3 x64_128 low 32 bits).
Exit criteria:
- Registered and named struct round-trips pass cross-language.
- Schema hash interoperability tests pass when enabled.
Phase 5: Meta String
- Implement all required meta string encodings.
- Implement encoding selection algorithm.
- Implement per-stream meta string dedup.
Exit criteria:
- Golden-vector tests for encoded meta strings match reference outputs.
- TypeDef/name metadata size reduction validated.
Phase 6: Compatible Mode and Shared TypeDef
- Implement TypeDef encode/decode (global header + body + field info).
- Implement shared TypeDef marker/index streaming context.
- Implement field mapping by name/tag with unknown-field skipping.
Exit criteria:
- Schema evolution tests pass (add/remove/reorder/rename with tags).
- Unknown field skip behavior verified across languages.
Phase 7: Union and Extension Types
- Implement `UNION`, `TYPED_UNION`, `NAMED_UNION` type meta and value payload.
- Ensure Any-style `case_value` encoding with ref meta + type meta + value bytes.
- Add extension type registration hooks.
Exit criteria:
- Unknown union alternative skipping works correctly.
- Typed and named union cross-language tests pass.
Phase 8: Performance Hardening
- Remove avoidable allocations in buffer and hot serializer paths.
- Add specialized fast paths for homogeneous collections.
- Optionally introduce C extension for varint, buffer copy, and meta string packing.
Exit criteria:
- Throughput and allocation metrics improve against pure baseline.
- No protocol behavior regressions.
Cross-Language Test Plan
- Golden vector tests for primitives, string encodings, refs, list/set/map headers, TypeDef, unions.
- Bidirectional compatibility tests:
  - Ruby write -> Java read
  - Java write -> Ruby read
- Reference tests:
  - shared object references
  - circular graphs
  - ref-tracking disabled cases
- Struct evolution tests in compatible mode:
  - add/remove fields
  - tag-based mapping
  - unknown field skip
- Error-path tests:
  - invalid varint
  - unknown type id
  - truncated payload
  - malformed TypeDef
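For the truncated-payload case, a reader can refuse short reads outright so that the failure surfaces as a specific error. This is a sketch only; `ReaderSketch` and `TruncatedError` are hypothetical names, and the real runtime may choose a different error hierarchy.

```ruby
# Hypothetical cursor-based reader that fails loudly on truncated input.
class ReaderSketch
  class TruncatedError < StandardError; end

  def initialize(bytes)
    @bytes = bytes.b   # binary copy, so offsets are always byte offsets
    @pos = 0
  end

  # Read exactly +n+ bytes or raise; never returns a short read.
  def read_bytes(n)
    remaining = @bytes.bytesize - @pos
    raise TruncatedError, "need #{n} bytes, #{remaining} left" if n > remaining

    chunk = @bytes.byteslice(@pos, n)
    @pos += n
    chunk
  end

  # Little-endian signed 32-bit read built on the checked byte read.
  def read_int32
    read_bytes(4).unpack1("l<")
  end
end

value = ReaderSketch.new([42].pack("l<")).read_int32

truncated_detected =
  begin
    ReaderSketch.new("\x01\x00".b).read_int32
    false
  rescue ReaderSketch::TruncatedError
    true
  end
```

An error-path test then only needs to assert that the specific error class is raised for each malformed input.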
Key Risks and Mitigations
- Ruby `Integer` is arbitrary precision.
  - Mitigation: enforce explicit numeric schema types for struct fields and validate ranges.
- Ruby object model is dynamic and can destabilize struct schema.
  - Mitigation: require explicit field declarations for struct serialization.
- MRI performance may lag in bit-heavy paths.
  - Mitigation: optimize hot loops and add optional C extension.
- String encoding interoperability bugs.
  - Mitigation: exhaustive tests for LATIN1/UTF16/UTF8 read/write permutations.
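The range-validation mitigation for arbitrary-precision `Integer` can be sketched as a lookup from declared type to permitted range. Only `:int64` appears in the DSL example above; the other type symbols and the `IntRangeSketch` module are assumptions for illustration.

```ruby
# Hypothetical range checks for declared signed integer field types.
module IntRangeSketch
  RANGES = {
    int8:  (-2**7...2**7),
    int16: (-2**15...2**15),
    int32: (-2**31...2**31),
    int64: (-2**63...2**63)
  }.freeze

  module_function

  # Returns the value unchanged when it fits the declared type, else raises.
  def check!(type, value)
    range = RANGES.fetch(type) { raise ArgumentError, "unknown integer type: #{type}" }
    raise TypeError, "expected Integer, got #{value.class}" unless value.is_a?(Integer)
    raise RangeError, "#{value} does not fit #{type}" unless range.cover?(value)

    value
  end
end

fits = IntRangeSketch.check!(:int32, 2**31 - 1)

overflow_detected =
  begin
    IntRangeSketch.check!(:int64, 2**70)
    false
  rescue RangeError
    true
  end
```

Checking before encoding means an oversized value fails with a clear message instead of being silently truncated by `pack`.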
Deliverables
- Ruby runtime package with xlang serializer/deserializer.
- Registration and struct DSL for deterministic schemas.
- Cross-language compatibility test suite integrated into CI.
- Developer documentation for API usage and schema evolution behavior.
Describe alternatives you've considered
- Bind Ruby to existing Python/C++ runtime via FFI.
  - Rejected for initial target due to FFI overhead, packaging complexity, and weaker Ruby-native maintainability.
- Implement only schema-consistent mode first.
  - Not preferred long-term; compatible mode (TypeDef/meta share) is required for practical schema evolution.
- Implement only named registration and drop numeric IDs.
  - Rejected for parity reasons; numeric registration should remain available like other runtimes.
Additional context
The detailed implementation plan, API shape, architecture, phased rollout, risks, and validation strategy are fully included above in the solution section.