[Ruby] Add Fory Serialization Support for Ruby #3379

@chaokunyang

Description

Feature Request

Add Apache Fory xlang serialization support for Ruby, with full wire compatibility to existing language runtimes.

Is your feature request related to a problem? Please describe

Ruby currently has no Fory runtime, so Ruby services cannot participate in Fory xlang object exchange. This blocks cross-language adoption for Ruby-based systems and prevents reuse of Fory protocol capabilities (reference tracking, polymorphism, and schema evolution).

Describe the solution you'd like

Implement Apache Fory xlang serialization for Ruby with full wire compatibility with existing language runtimes, while preserving performance-first design principles.

Scope

  1. Implement xlang binary format in Ruby according to docs/specification/xlang_serialization_spec.md.
  2. Follow phased implementation guidance in docs/specification/xlang_implementation_guide.md.
  3. Support both schema-consistent mode and compatible mode (meta share / TypeDef).
  4. Provide cross-language interoperability with Java.

Non-Goals (Initial Delivery)

  1. Ruby-native non-xlang serialization format.
  2. Decimal support (currently not supported in spec).
  3. Advanced runtime code generation in first iteration.

Protocol Constraints That Ruby Must Follow

  1. Use little-endian encoding for all multi-byte values.
  2. Write xlang header bitmap exactly: null, xlang, oob flags in byte 0.
  3. Implement reference flags exactly: NULL(-3), REF(-2), NOT_NULL(-1), REF_VALUE(0).
  4. Assign reference IDs sequentially from 0 in serialization order.
  5. Encode type IDs as varuint32; for user types write internal type ID then user_type_id varuint32.
  6. For named types use namespace + type name metadata (or shared TypeDef marker in meta share mode).
  7. Implement deterministic struct field order exactly as specified (grouping + sort rules).
  8. Implement meta string encodings and dedup semantics required for TypeDef and named metadata.
  9. Ensure unknown fields and unknown union cases can be skipped safely by reading type meta and value payload correctly.
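Constraints 1, 3, and 5 could be sketched roughly as follows. The module and method names (WireSketch, write_int32_le, write_varuint32, read_varuint32) are hypothetical and not part of any existing Fory API. The flag values are the ones listed above; the varint layout shown (7 payload bits per byte, continuation bit in the high bit) is an assumption that should be confirmed against the spec.

```ruby
# Hypothetical helper module; names are illustrative only.
module WireSketch
  # Reference flags as listed in constraint 3.
  NULL_FLAG      = -3
  REF_FLAG       = -2
  NOT_NULL_FLAG  = -1
  REF_VALUE_FLAG = 0

  module_function

  # Append a 32-bit signed integer in little-endian order ("l<" pack directive).
  def write_int32_le(out, value)
    out << [value].pack("l<")
  end

  # Append an unsigned 32-bit value as a varint: 7 payload bits per byte,
  # high bit set on every byte except the last (assumed layout).
  def write_varuint32(out, value)
    raise RangeError, "out of uint32 range: #{value}" unless (0..0xFFFF_FFFF).cover?(value)

    while value >= 0x80
      out << ((value & 0x7F) | 0x80).chr
      value >>= 7
    end
    out << value.chr
  end

  # Decode a varuint32 starting at offset; returns [value, next_offset].
  # Raises on truncated input or on more than five bytes.
  def read_varuint32(bytes, offset = 0)
    result = 0
    shift = 0
    loop do
      byte = bytes.getbyte(offset)
      raise EOFError, "truncated varint" if byte.nil?
      raise ArgumentError, "varint too long" if shift > 28

      offset += 1
      result |= (byte & 0x7F) << shift
      return [result, offset] if byte < 0x80

      shift += 7
    end
  end
end

buf = String.new(encoding: Encoding::BINARY)
WireSketch.write_varuint32(buf, 300) # appends 0xAC 0x02
WireSketch.write_int32_le(buf, 1)    # appends 0x01 0x00 0x00 0x00
```

Raising on truncated or over-long input in the decoder also gives the error-path tests in the test plan something concrete to assert against.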

Proposed Ruby Architecture

Public API Layer

Use Fory (top-level entry point), aligned with Rust API ergonomics.

fory = Fory.new
           .xlang(true)
           .compatible(true)
           .compress_string(true)
           .track_ref(false)
           .max_dyn_depth(5)

fory.register(User, id: 1001)
bytes = fory.serialize(user)
decoded = fory.deserialize(bytes, as: User)

Configuration API (chainable, similar to Rust):

  1. compatible(enabled) sets compatible mode and meta share behavior.
  2. xlang(enabled) toggles cross-language format.
  3. compress_string(enabled) toggles meta string compression.
  4. check_struct_version(enabled) enables schema hash/version checks (schema-consistent mode).
  5. track_ref(enabled) toggles global reference tracking.
  6. max_dyn_depth(depth) limits dynamic nesting depth.

Core IO API:

  1. serialize(value) -> String
  2. serialize_to(buffer, value) -> Integer (append to mutable byte buffer, return bytes written)
  3. deserialize(bytes, as: nil) (as: is optional; omit it for dynamic/object mode)
  4. deserialize_from(reader, as: nil) (streaming/offset reader)

Registration API (Ruby-refined, named parameters):

  1. register(klass, id: nil, namespace: nil, type_name: nil, serializer: nil) for non-union types.
  2. register_union(klass, id: nil, namespace: nil, type_name: nil) for union schemas.

Registration rules:

  1. Exactly one registration mode must be provided:
    • numeric mode: id:
    • named mode: type_name: (with optional namespace:)
  2. namespace: defaults to "" when omitted in named mode.
  3. serializer: is optional; if omitted, default struct/enum serializer resolution is used.
  4. Passing both id: and type_name: is invalid.
  5. register is for struct/enum/ext and custom serializer types; register_union is union-only.
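The registration rules above could be enforced by a small validation step. The sketch below is one way to do it; normalize_registration and the shape of its return value are hypothetical, and the check that id: is a non-negative Integer is an added assumption rather than something stated in the rules.

```ruby
# Hypothetical validator for the keyword arguments of register(...).
# Returns a normalized hash describing the chosen registration mode.
def normalize_registration(id: nil, namespace: nil, type_name: nil)
  if id && type_name
    # Rule 4: the two modes are mutually exclusive.
    raise ArgumentError, "pass either id: or type_name:, not both"
  elsif id.nil? && type_name.nil?
    # Rule 1: exactly one mode must be chosen.
    raise ArgumentError, "one of id: or type_name: is required"
  elsif id
    # Assumption: numeric ids are non-negative integers.
    raise ArgumentError, "id: must be a non-negative Integer" unless id.is_a?(Integer) && id >= 0

    { mode: :numeric, id: id }
  else
    # Rule 2: namespace defaults to "" in named mode.
    { mode: :named, namespace: namespace || "", type_name: type_name.to_s }
  end
end

normalize_registration(id: 1001)
normalize_registration(type_name: "User", namespace: "demo")
```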

Runtime Core

  • Fory::Buffer
    • byte storage, read/write cursor, growth strategy.
    • fixed-width little-endian read/write.
    • varint/zigzag and varuint36_small helpers.
  • Fory::TypeResolver
    • maps Ruby classes and declared schemas to internal/user type IDs.
    • owns serializer lookup and polymorphic dispatch.
  • Fory::RefResolver
    • write-side object identity map (object_id -> ref_id).
    • read-side ref table (ref_id -> object).
  • Fory::MetaString
    • encoding selection and bit-packing for meta strings.
    • per-stream dedup table.
  • Fory::TypeDefContext
    • shared TypeDef marker/index cache for meta share mode.
  • Fory::FieldSkipper
    • skip-value dispatcher for unknown fields/unknown union alternatives.
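As an illustration of the Fory::RefResolver responsibilities listed above, the following sketch separates the write-side identity map from the read-side table. The class and method names (RefWriterSketch, RefReaderSketch, track, reserve, set, get) are hypothetical. It uses Hash#compare_by_identity instead of an explicit object_id key, which is a design assumption here.

```ruby
# Hypothetical write-side tracker: assigns ids sequentially from 0
# in the order objects are first seen.
class RefWriterSketch
  def initialize
    # compare_by_identity keys the hash on object identity, not ==/eql?.
    @ids = {}.compare_by_identity
  end

  # Returns [:ref, id] if the object was already written,
  # otherwise records it and returns [:ref_value, id].
  def track(obj)
    existing = @ids[obj]
    return [:ref, existing] if existing

    id = @ids.size
    @ids[obj] = id
    [:ref_value, id]
  end
end

# Hypothetical read-side table: ref_id -> object.
class RefReaderSketch
  def initialize
    @objects = []
  end

  # Reserve the next id before the object's fields are read,
  # so a circular reference can resolve to it later.
  def reserve
    @objects << nil
    @objects.size - 1
  end

  def set(id, obj)
    @objects[id] = obj
  end

  def get(id)
    @objects.fetch(id)
  end
end

writer = RefWriterSketch.new
a = String.new("x")
b = String.new("x") # equal to a, but a different object
writer.track(a)     # first sighting of a
writer.track(b)     # first sighting of b, gets its own id
writer.track(a)     # second sighting of a, reported as a back-reference
```

Keying on identity rather than equality matters because two equal strings are still distinct objects on the wire unless they are literally the same instance.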

Serializer Layer

  • Primitive serializers for all required built-in types.
  • Composite serializers for list/set/map/array.
  • Struct serializer with deterministic field-order computation and schema fingerprint support.
  • Union serializer for UNION, TYPED_UNION, NAMED_UNION.

Ruby Type Mapping (Initial)

| Ruby Type | Xlang Type | Notes |
| --- | --- | --- |
| NilClass | null via ref flag | Uses NULL flag, no payload |
| TrueClass/FalseClass | BOOL | 1 byte |
| Integer | declared numeric type or dynamic numeric mapping | Prefer declared schema type to avoid ambiguity |
| Float | FLOAT64 | Ruby Float is an IEEE 754 double |
| String | STRING | UTF-8 required; LATIN1/UTF16 optional for optimization |
| Array | LIST | Heterogeneous elements supported |
| Set | SET | Via stdlib Set |
| Hash | MAP | Mutable keys disallowed by policy |
| Time | TIMESTAMP | Normalize secs+nanos per spec |
| Date | DATE | Days since epoch |
| Ruby class instances | STRUCT/COMPATIBLE_STRUCT or named variants | Requires registration or naming policy |
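For dynamic (undeclared) values, the mapping above amounts to a dispatch on the Ruby class. The sketch below shows that dispatch only; xlang_type_for and the returned symbols are hypothetical placeholders, and the :dynamic_int result stands in for whatever numeric mapping policy is eventually chosen.

```ruby
require "set"
require "date"

# Hypothetical dispatcher from a Ruby value to the xlang type in the table.
def xlang_type_for(value)
  case value
  when nil         then :null        # written via the NULL ref flag, no payload
  when true, false then :bool
  when Integer     then :dynamic_int # placeholder; a declared schema type is preferred
  when Float       then :float64
  when String      then :string
  when Array       then :list
  when Set         then :set
  when Hash        then :map
  when Time        then :timestamp
  when Date        then :date
  else :struct                       # requires registration or a naming policy
  end
end

xlang_type_for(Set[1, 2])
xlang_type_for(Date.new(2024, 1, 1))
```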

Ruby Struct Model

Ruby needs explicit schema metadata to be deterministic across languages. Use a DSL similar to:

class User
  include Fory::Struct

  fory_type id: 1001, mode: :compatible
  field :id, :int64, nullable: false
  field :name, :string, nullable: false
  field :tags, [:list, :string], nullable: false
end

Design rules:

  1. Field identifier is id if present, otherwise snake_case field name.
  2. Field ordering uses the exact xlang algorithm from spec.
  3. Default xlang behavior: nullable=false, tracking_ref=false unless explicitly configured.
  4. Namespace/type name for named mode: module path + class name.
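To make the DSL concrete, here is a minimal sketch of how field declarations might be recorded on the class. StructSketch, fory_fields, and fory_type_info are hypothetical names, and the :schema_consistent default for mode: is an assumption. The sketch keeps fields in declaration order; computing the wire order from the spec's grouping and sort rules would be a separate step.

```ruby
# Hypothetical mixin that records schema metadata at class-definition time.
module StructSketch
  Field = Struct.new(:name, :type, :nullable, keyword_init: true)

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Record the registration id and mode for this class.
    def fory_type(id: nil, mode: :schema_consistent)
      @fory_type = { id: id, mode: mode }
    end

    # Record one field and define a plain accessor for it.
    # nullable defaults to false, matching design rule 3.
    def field(name, type, nullable: false)
      fory_fields << Field.new(name: name, type: type, nullable: nullable)
      attr_accessor name
    end

    def fory_fields
      @fory_fields ||= []
    end

    def fory_type_info
      @fory_type
    end
  end
end

class User
  include StructSketch

  fory_type id: 1001, mode: :compatible
  field :id, :int64
  field :name, :string
  field :tags, [:list, :string]
end

User.fory_fields.map(&:name) # field names in declaration order
```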

Implementation Phases

Phase 0: Project Skeleton

  1. Create ruby/ runtime layout with lib/fory and test directories.
  2. Add CI entry for Ruby tests and style checks.
  3. Add minimal smoke API and versioned gem scaffold.

Exit criteria:

  1. The skeleton serialize(nil) and deserialize paths work end to end with the xlang header.
  2. CI runs Ruby lint + unit tests.

Phase 1: Core Infrastructure

  1. Implement buffer, fixed-width little-endian primitives, varint/zigzag, varuint36_small.
  2. Implement xlang header read/write.
  3. Implement reference resolver core state and flags.

Exit criteria:

  1. Round-trip tests for all numeric encoders and header flags.
  2. Reference flag behavior matches spec state machine.
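The zigzag step of the signed varint encoders is small enough to sketch and round-trip test directly. The helper names below are hypothetical. The mapping itself is the standard zigzag transform (0, -1, 1, -2 map to 0, 1, 2, 3), shown here for 64-bit values.

```ruby
# Hypothetical helpers for the zigzag transform on 64-bit signed integers.
# Small magnitudes of either sign map to small unsigned values,
# which keeps their varint encodings short.
def zigzag_encode64(n)
  raise RangeError, "out of int64 range: #{n}" unless (-(2**63)...(2**63)).cover?(n)

  # Ruby integers are arbitrary precision, so mask back down to 64 bits.
  ((n << 1) ^ (n >> 63)) & 0xFFFF_FFFF_FFFF_FFFF
end

def zigzag_decode64(z)
  (z >> 1) ^ -(z & 1)
end

samples = [0, -1, 1, -2, 2**63 - 1, -(2**63)]
round_trip_ok = samples.all? { |n| zigzag_decode64(zigzag_encode64(n)) == n }
```

The explicit range check is needed because Ruby would otherwise accept integers wider than 64 bits without complaint.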

Phase 2: Basic Types

  1. Implement bool, int types, float types, string.
  2. Implement duration, timestamp, date.
  3. Add optional tagged int64/uint64 fast path.

Exit criteria:

  1. Ruby-to-Java and Java-to-Ruby tests pass for primitive and temporal values.
  2. String decoding works for LATIN1/UTF16/UTF8.
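Decoding the three string encodings can lean on Ruby's built-in transcoding. In the sketch below, decode_string_payload and the numeric encoding tags are assumptions for illustration; the actual tag values and header layout should be taken from the spec. UTF-16 is treated as little-endian, in line with the little-endian constraint stated earlier.

```ruby
# Encoding tags assumed for illustration; confirm the values against the spec.
LATIN1_TAG = 0
UTF16_TAG  = 1
UTF8_TAG   = 2

# Convert a raw string payload into a UTF-8 Ruby String.
def decode_string_payload(raw, tag)
  source =
    case tag
    when LATIN1_TAG then Encoding::ISO_8859_1
    when UTF16_TAG  then Encoding::UTF_16LE
    when UTF8_TAG   then Encoding::UTF_8
    else raise ArgumentError, "unknown string encoding tag: #{tag}"
    end

  # dup so the caller's buffer keeps its own encoding label.
  text = raw.dup.force_encoding(source)
  raise ArgumentError, "invalid #{source.name} payload" unless text.valid_encoding?

  text.encode(Encoding::UTF_8)
end

decode_string_payload("h\x00i\x00".b, UTF16_TAG) # two UTF-16LE code units
decode_string_payload("\xE9".b, LATIN1_TAG)      # single LATIN1 byte for U+00E9
```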

Phase 3: Collections and Arrays

  1. Implement list/set element header handling.
  2. Implement map chunk-based encoding/decoding.
  3. Implement primitive array fast path and object array via list.

Exit criteria:

  1. Cross-language tests for empty, homogeneous, heterogeneous, null-containing collections.
  2. Map chunk boundaries and null chunk behavior validated.

Phase 4: Type System and Structs (Schema Consistent)

  1. Implement type registry for numeric and named types.
  2. Implement struct serializer with deterministic field ordering.
  3. Implement optional schema hash (MurmurHash3 x64_128 low 32 bits).

Exit criteria:

  1. Registered and named struct round-trips pass cross-language.
  2. Schema hash interoperability tests pass when enabled.

Phase 5: Meta String

  1. Implement all required meta string encodings.
  2. Implement encoding selection algorithm.
  3. Implement per-stream meta string dedup.

Exit criteria:

  1. Golden-vector tests for encoded meta strings match reference outputs.
  2. TypeDef/name metadata size reduction validated.

Phase 6: Compatible Mode and Shared TypeDef

  1. Implement TypeDef encode/decode (global header + body + field info).
  2. Implement shared TypeDef marker/index streaming context.
  3. Implement field mapping by name/tag with unknown-field skipping.

Exit criteria:

  1. Schema evolution tests pass (add/remove/reorder/rename with tags).
  2. Unknown field skip behavior verified across languages.
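The core of unknown-field skipping is that the reader always consumes a field's payload, whether or not it recognizes the field, so the cursor stays aligned for the next one. The sketch below shows only that control flow. read_compatible, the field-descriptor hashes, and the array standing in for the byte stream are all hypothetical simplifications; a real reader would drive this from the decoded TypeDef.

```ruby
# Hypothetical sketch: remote_fields stands in for the field list decoded
# from a TypeDef; each entry carries a lambda that consumes one value.
def read_compatible(remote_fields, local_names, target)
  skipped = []
  remote_fields.each do |field|
    # Always consume the payload so the next field starts at the right place.
    value = field[:read].call
    if local_names.include?(field[:name])
      target[field[:name]] = value
    else
      skipped << field[:name] # unknown to this reader, value discarded
    end
  end
  skipped
end

stream = [7, "Ann", "extra"]   # stand-in for the encoded values
reader = -> { stream.shift }
remote = [
  { name: :id,       read: reader },
  { name: :name,     read: reader },
  { name: :nickname, read: reader } # field added by a newer writer
]

target = {}
skipped = read_compatible(remote, [:id, :name], target)
```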

Phase 7: Union and Extension Types

  1. Implement UNION, TYPED_UNION, NAMED_UNION type meta and value payload.
  2. Ensure Any-style case_value encoding with ref meta + type meta + value bytes.
  3. Add extension type registration hooks.

Exit criteria:

  1. Unknown union alternative skipping works correctly.
  2. Typed and named union cross-language tests pass.

Phase 8: Performance Hardening

  1. Remove avoidable allocations in buffer and hot serializer paths.
  2. Add specialized fast paths for homogeneous collections.
  3. Optionally introduce C extension for varint, buffer copy, and meta string packing.

Exit criteria:

  1. Throughput and allocation metrics improve against the pure-Ruby baseline.
  2. No protocol behavior regressions.

Cross-Language Test Plan

  1. Golden vector tests for primitives, string encodings, refs, list/set/map headers, TypeDef, unions.
  2. Bidirectional compatibility tests:
    • Ruby write -> Java read
    • Java write -> Ruby read
  3. Reference tests:
    • shared object references
    • circular graphs
    • ref-tracking disabled cases
  4. Struct evolution tests in compatible mode:
    • add/remove fields
    • tag-based mapping
    • unknown field skip
  5. Error-path tests:
    • invalid varint
    • unknown type id
    • truncated payload
    • malformed TypeDef

Key Risks and Mitigations

  1. Ruby Integer is arbitrary precision.
    • Mitigation: enforce explicit numeric schema types for struct fields and validate ranges.
  2. Ruby object model is dynamic and can destabilize struct schema.
    • Mitigation: require explicit field declarations for struct serialization.
  3. MRI performance may lag in bit-heavy paths.
    • Mitigation: optimize hot loops and add optional C extension.
  4. String encoding interoperability bugs.
    • Mitigation: exhaustive tests for LATIN1/UTF16/UTF8 read/write permutations.
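The first mitigation (validating ranges for declared numeric types) could look roughly like this. INT_RANGES and check_int_range! are hypothetical names, and only the signed types are shown.

```ruby
# Hypothetical range table for declared signed integer field types.
INT_RANGES = {
  int8:  (-(2**7)...(2**7)),
  int16: (-(2**15)...(2**15)),
  int32: (-(2**31)...(2**31)),
  int64: (-(2**63)...(2**63))
}.freeze

# Returns the value unchanged if it fits the declared type, otherwise raises.
def check_int_range!(type, value)
  range = INT_RANGES.fetch(type) { raise ArgumentError, "unknown integer type: #{type}" }
  raise TypeError, "expected Integer, got #{value.class}" unless value.is_a?(Integer)
  raise RangeError, "#{value} does not fit in #{type}" unless range.cover?(value)

  value
end

check_int_range!(:int32, 2**31 - 1) # largest value that fits in int32
```

Failing at write time keeps an oversized Ruby Integer from being silently truncated into a value that another runtime would read back incorrectly.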

Deliverables

  1. Ruby runtime package with xlang serializer/deserializer.
  2. Registration and struct DSL for deterministic schemas.
  3. Cross-language compatibility test suite integrated into CI.
  4. Developer documentation for API usage and schema evolution behavior.

Describe alternatives you've considered

  1. Bind Ruby to existing Python/C++ runtime via FFI.
    • Rejected for initial target due to FFI overhead, packaging complexity, and weaker Ruby-native maintainability.
  2. Implement only schema-consistent mode first.
    • Not preferred long-term; compatible mode (TypeDef/meta share) is required for practical schema evolution.
  3. Implement only named registration and drop numeric IDs.
    • Rejected for parity reasons; numeric registration should remain available like other runtimes.

Additional context

The detailed implementation plan, API shape, architecture, phased rollout, risks, and validation strategy are fully included above in the solution section.
