| 1 | +--- |
| 2 | +slug: typescript-style-type-inference |
| 3 | +title: "Building TypeScript-Style Type Inference for T-Ruby" |
| 4 | +authors: [yhk1038] |
| 5 | +tags: [technical, type-inference, compiler] |
| 6 | +--- |
| 7 | + |
| 8 | +How we implemented TypeScript-inspired static type inference for T-Ruby, enabling automatic type detection without explicit annotations. |
| 9 | + |
| 10 | +<!-- truncate --> |
| 11 | + |
| 12 | +## The Problem |
| 13 | + |
| 14 | +When writing T-Ruby code, developers had to explicitly annotate every return type: |
| 15 | + |
| 16 | +```ruby |
| 17 | +def greet(name: String): String |
| 18 | + "Hello, #{name}!" |
| 19 | +end |
| 20 | +``` |
| 21 | + |
| 22 | +Without the `: String` return type, the generated RBS would show `untyped`: |
| 23 | + |
| 24 | +```rbs |
| 25 | +def greet: (name: String) -> untyped |
| 26 | +``` |
| 27 | + |
| 28 | +This was frustrating. The return type is obviously `String` - why can't T-Ruby figure it out? |
| 29 | + |
| 30 | +## Inspiration: TypeScript's Approach |
| 31 | + |
| 32 | +TypeScript handles this elegantly. You can write: |
| 33 | + |
| 34 | +```typescript |
| 35 | +function greet(name: string) { |
| 36 | + return `Hello, ${name}!`; |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +And TypeScript infers the return type as `string`. We wanted the same experience for T-Ruby. |
| 41 | + |
| 42 | +### How TypeScript Does It |
| 43 | + |
| 44 | +TypeScript's type inference is built on two key components: |
| 45 | + |
| 46 | +1. **Binder**: Walks the parsed AST to create symbols and the control-flow nodes that form the Control Flow Graph (CFG) |
| 47 | +2. **Checker**: Lazily evaluates types when needed, using flow analysis |
| 48 | + |
| 49 | +The magic happens in `getFlowTypeOfReference` - a 1200+ line function that determines a symbol's type at any point in the code by walking backwards through flow nodes. |
| 50 | + |
| 51 | +### Our Simplified Approach |
| 52 | + |
| 53 | +Ruby's control flow is simpler than JavaScript's. We don't need the full complexity of TypeScript's flow graph. Instead, we implemented: |
| 54 | + |
| 55 | +- **Linear data flow analysis** - statements are analyzed in source order, without building a full flow graph |
| 56 | +- **Separation of concerns** - IR Builder (Binder role) + ASTTypeInferrer (Checker role) |
| 57 | +- **Lazy evaluation** - Types computed only when generating RBS |
| 58 | + |
| 59 | +## Architecture |
| 60 | + |
| 61 | +``` |
| 62 | +[Binder Stage - IR Builder] |
| 63 | +Source (.trb) → Parser → IR Tree (with method bodies) |
| 64 | + |
| 65 | +[Checker Stage - Type Inferrer] |
| 66 | +IR Node traversal → Type determination → Caching |
| 67 | + |
| 68 | +[Output Stage] |
| 69 | +Inferred types → RBS generation |
| 70 | +``` |
| 71 | + |
| 72 | +### The Core Components |
| 73 | + |
| 74 | +#### 1. BodyParser - Parsing Method Bodies |
| 75 | + |
| 76 | +The first challenge was that our parser didn't analyze method bodies - it only extracted signatures. We built `BodyParser` to convert T-Ruby method bodies into IR nodes: |
| 77 | + |
| 78 | +```ruby |
| 79 | +class BodyParser |
| 80 | + def parse(lines, start_line, end_line) |
| 81 | + statements = [] |
| 82 | + # Parse each line into IR nodes |
| 83 | + # Handle: literals, variables, operators, method calls, conditionals |
| 84 | + IR::Block.new(statements: statements) |
| 85 | + end |
| 86 | +end |
| 87 | +``` |
| 88 | + |
| 89 | +Supported constructs: |
| 90 | +- Literals: `"hello"`, `42`, `true`, `:symbol` |
| 91 | +- Variables: `name`, `@instance_var`, `@@class_var` |
| 92 | +- Operators: `a + b`, `x == y`, `!flag` |
| 93 | +- Method calls: `str.upcase`, `array.map { |x| x * 2 }` |
| 94 | +- Conditionals: `if`/`unless`/`elsif`/`else` |
| 95 | + |
| 96 | +#### 2. TypeEnv - Scope Chain Management |
| 97 | + |
| 98 | +```ruby |
| 99 | +class TypeEnv |
| 100 | + def initialize(parent = nil) |
| 101 | + @parent = parent |
| 102 | + @bindings = {} # Local variables |
| 103 | + @instance_vars = {} # Instance variables |
| 104 | + end |
| 105 | + |
| 106 | + def lookup(name) |
| 107 | + @bindings[name] || @instance_vars[name] || @parent&.lookup(name) |
| 108 | + end |
| 109 | + |
| 110 | + def child_scope |
| 111 | + TypeEnv.new(self) |
| 112 | + end |
| 113 | +end |
| 114 | +``` |
| 115 | + |
| 116 | +This enables proper scoping - a method's local variables don't leak into other methods, but instance variables are shared across the class. |
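To make the scoping rule concrete, here is a minimal, self-contained sketch. It is not the T-Ruby source: `SketchTypeEnv`, `define`, and `define_instance_var` are hypothetical names added only so the example can run, and types are represented as plain strings.

```ruby
# Hypothetical sketch: `define` and `define_instance_var` are not part of the
# snippet above; they are assumed here so the example is runnable.
class SketchTypeEnv
  def initialize(parent = nil)
    @parent = parent
    @bindings = {}       # local variables of this scope
    @instance_vars = {}  # instance variables recorded in this scope
  end

  def define(name, type)
    @bindings[name] = type
  end

  def define_instance_var(name, type)
    @instance_vars[name] = type
  end

  # Same lookup order as above: locals, then instance vars, then the parent.
  def lookup(name)
    @bindings[name] || @instance_vars[name] || @parent&.lookup(name)
  end

  def child_scope
    SketchTypeEnv.new(self)
  end
end

# One class-level scope, with one child scope per method.
class_env = SketchTypeEnv.new
class_env.define_instance_var("@name", "String")

greet_env = class_env.child_scope
greet_env.define("greeting", "String")

shout_env = class_env.child_scope

greet_env.lookup("greeting") # "String"
shout_env.lookup("greeting") # nil: the local does not leak between methods
shout_env.lookup("@name")    # "String": found through the parent scope
```

In this sketch, sharing works because instance variables are recorded on the class-level scope and every method scope is a child of it.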
| 117 | + |
| 118 | +#### 3. ASTTypeInferrer - The Type Inference Engine |
| 119 | + |
| 120 | +The heart of the system: |
| 121 | + |
| 122 | +```ruby |
| 123 | +class ASTTypeInferrer |
| 124 | + LITERAL_TYPE_MAP = { |
| 125 | + string: "String", |
| 126 | + integer: "Integer", |
| 127 | + float: "Float", |
| 128 | + boolean: "bool", |
| 129 | + symbol: "Symbol", |
| 130 | + nil: "nil" |
| 131 | + }.freeze |
| 132 | + |
| 133 | + def infer_expression(node, env) |
| 134 | + # Check cache first (lazy evaluation) |
| 135 | + return @type_cache[node.object_id] if @type_cache[node.object_id] |
| 136 | + |
| 137 | + type = case node |
| 138 | + when IR::Literal |
| 139 | + LITERAL_TYPE_MAP[node.literal_type] |
| 140 | + when IR::VariableRef |
| 141 | + env.lookup(node.name) |
| 142 | + when IR::BinaryOp |
| 143 | + infer_binary_op(node, env) |
| 144 | + when IR::MethodCall |
| 145 | + infer_method_call(node, env) |
| 146 | + # ... more cases |
| 147 | + end |
| 148 | + |
| 149 | + @type_cache[node.object_id] = type |
| 150 | + end |
| 151 | +end |
| 152 | +``` |
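The `infer_binary_op` helper is not shown in the post. As a rough illustration of what such a step could look like, the sketch below maps an operator and two operand types to a result type. The function name, the operator lists, and the fallback to `untyped` are all assumptions, not the actual T-Ruby rules.

```ruby
# Hypothetical sketch of binary operator inference over string type names.
COMPARISON_OPERATORS = %w[== != < > <= >=].freeze
ARITHMETIC_OPERATORS = %w[+ - * /].freeze

def infer_binary_op_sketch(operator, left_type, right_type)
  # Assumption: comparisons are treated as returning bool.
  return "bool" if COMPARISON_OPERATORS.include?(operator)
  return "untyped" unless ARITHMETIC_OPERATORS.include?(operator)

  case [left_type, right_type]
  when %w[Integer Integer] then "Integer" # Integer / Integer stays Integer in Ruby
  when %w[Integer Float], %w[Float Integer], %w[Float Float] then "Float"
  when %w[String String] then operator == "+" ? "String" : "untyped"
  else "untyped" # unknown combination: give up rather than guess
  end
end

infer_binary_op_sketch("+", "Integer", "Integer")
infer_binary_op_sketch("/", "Integer", "Float")
infer_binary_op_sketch("==", "String", "Integer")
```

Falling back to `untyped` for unknown combinations keeps the generated RBS conservative: a missing rule yields a weaker type instead of a wrong one.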
| 153 | + |
| 154 | +### Handling Ruby's Implicit Returns |
| 155 | + |
| 156 | +Ruby's last expression is the implicit return value. This is crucial for type inference: |
| 157 | + |
| 158 | +```ruby |
| 159 | +def status |
| 160 | + if active? |
| 161 | + "running" |
| 162 | + else |
| 163 | + "stopped" |
| 164 | + end |
| 165 | +end |
| 166 | +# Implicit return: String (from both branches) |
| 167 | +``` |
| 168 | + |
| 169 | +We handle this by: |
| 170 | +1. Collecting all explicit `return` types |
| 171 | +2. Finding the last expression (implicit return) |
| 172 | +3. Unifying all return types |
| 173 | + |
| 174 | +```ruby |
| 175 | +def infer_method_return_type(method_node, env) |
| 176 | + # Collect explicit returns |
| 177 | + return_types, terminated = collect_return_types(method_node.body, env) |
| 178 | + |
| 179 | + # Add implicit return (unless method always returns explicitly) |
| 180 | + unless terminated |
| 181 | + implicit_return = infer_implicit_return(method_node.body, env) |
| 182 | + return_types << implicit_return if implicit_return |
| 183 | + end |
| 184 | + |
| 185 | + unify_types(return_types) |
| 186 | +end |
| 187 | +``` |
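The unification step is only named above. The following is a minimal sketch of one way it could work, assuming types are simple names stored as strings; `unify_types_sketch` is a hypothetical stand-in, and the real `unify_types` may behave differently.

```ruby
# Hypothetical sketch of return type unification over simple type names.
def unify_types_sketch(types)
  unique = types.compact.uniq
  return "untyped" if unique.empty?              # nothing could be inferred
  return "untyped" if unique.include?("untyped") # one unknown path taints the result
  return unique.first if unique.size == 1        # all return paths agree

  # "nil" plus exactly one other type reads better as an RBS optional.
  if unique.size == 2 && unique.include?("nil")
    other = (unique - ["nil"]).first
    return "#{other}?"
  end

  unique.join(" | ") # RBS union syntax
end

unify_types_sketch(%w[String String])  # "String"
unify_types_sketch(%w[String Integer]) # "String | Integer"
unify_types_sketch(%w[String nil])     # "String?"
```

Applied to the `status` example above, both branches contribute `String`, so the unified return type is `String`.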
| 188 | + |
| 189 | +### Special Case: `initialize` Method |
| 190 | + |
| 191 | +Ruby's `initialize` is a constructor. Its return value is ignored - `Class.new` returns the instance. Following RBS conventions, we always infer `void`: |
| 192 | + |
| 193 | +```ruby |
| 194 | +class User |
| 195 | + def initialize(name: String) |
| 196 | + @name = name |
| 197 | + end |
| 198 | +end |
| 199 | +``` |
| 200 | + |
| 201 | +Generates: |
| 202 | + |
| 203 | +```rbs |
| 204 | +class User |
| 205 | + def initialize: (name: String) -> void |
| 206 | +end |
| 207 | +``` |
| 208 | + |
| 209 | +### Built-in Method Type Knowledge |
| 210 | + |
| 211 | +We maintain a table of common Ruby method return types: |
| 212 | + |
| 213 | +```ruby |
| 214 | +BUILTIN_METHOD_TYPES = { |
| 215 | + %w[String upcase] => "String", |
| 216 | + %w[String downcase] => "String", |
| 217 | + %w[String length] => "Integer", |
| 218 | + %w[String to_i] => "Integer", |
| 219 | + %w[Array first] => "untyped", # Element type |
| 220 | + %w[Array length] => "Integer", |
| 221 | + %w[Integer to_s] => "String", |
| 222 | + # ... 200+ methods |
| 223 | +}.freeze |
| 224 | +``` |
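A table keyed by `[receiver_type, method_name]` pairs can be queried directly, because Ruby arrays hash by content. The sketch below shows one possible lookup; `SKETCH_BUILTIN_METHOD_TYPES` and `lookup_builtin_return_type` are hypothetical names, and the fallback to `untyped` is an assumption.

```ruby
# Hypothetical subset of the table above, limited to entries shown in the post.
SKETCH_BUILTIN_METHOD_TYPES = {
  %w[String upcase] => "String",
  %w[String length] => "Integer",
  %w[Integer to_s] => "String"
}.freeze

# Hypothetical helper: resolves a call's return type from the receiver's
# inferred type, falling back to "untyped" for unknown receivers or methods.
def lookup_builtin_return_type(receiver_type, method_name)
  return "untyped" if receiver_type.nil?

  SKETCH_BUILTIN_METHOD_TYPES.fetch([receiver_type, method_name.to_s], "untyped")
end

# A chained call such as `name.upcase.length`, resolved left to right.
first_type = lookup_builtin_return_type("String", :upcase)
second_type = lookup_builtin_return_type(first_type, :length)
```

Here `first_type` is `"String"` and `second_type` is `"Integer"`, so a chain is resolved by feeding each result back in as the next receiver type.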
| 225 | + |
| 226 | +## Results |
| 227 | + |
| 228 | +Now this T-Ruby code: |
| 229 | + |
| 230 | +```ruby |
| 231 | +class Greeter |
| 232 | + def initialize(name: String) |
| 233 | + @name = name |
| 234 | + end |
| 235 | + |
| 236 | + def greet |
| 237 | + "Hello, #{@name}!" |
| 238 | + end |
| 239 | + |
| 240 | + def shout |
| 241 | + @name.upcase |
| 242 | + end |
| 243 | +end |
| 244 | +``` |
| 245 | + |
| 246 | +Automatically generates correct RBS: |
| 247 | + |
| 248 | +```rbs |
| 249 | +class Greeter |
| 250 | + @name: String |
| 251 | + |
| 252 | + def initialize: (name: String) -> void |
| 253 | + def greet: () -> String |
| 254 | + def shout: () -> String |
| 255 | +end |
| 256 | +``` |
| 257 | + |
| 258 | +No explicit return types needed! |
| 259 | + |
| 260 | +## Testing |
| 261 | + |
| 262 | +We built comprehensive tests: |
| 263 | + |
| 264 | +- **Unit tests**: Literal inference, operator types, method call types |
| 265 | +- **E2E tests**: Full compilation with RBS validation |
| 266 | + |
| 267 | +```ruby |
| 268 | +it "infers String from string literal" do |
| 269 | + create_trb_file("src/test.trb", <<~TRB) |
| 270 | + class Test |
| 271 | + def message |
| 272 | + "hello world" |
| 273 | + end |
| 274 | + end |
| 275 | + TRB |
| 276 | + |
| 277 | + rbs_content = compile_and_get_rbs("src/test.trb") |
| 278 | + expect(rbs_content).to include("def message: () -> String") |
| 279 | +end |
| 280 | +``` |
| 281 | + |
| 282 | +## Challenges & Solutions |
| 283 | + |
| 284 | +| Challenge | Solution | |
| 285 | +|-----------|----------| |
| 286 | +| Method bodies not parsed | Built custom BodyParser for T-Ruby syntax | |
| 287 | +| Implicit returns | Analyze last expression in blocks | |
| 288 | +| Recursive methods | 2-pass analysis (signatures first, then bodies) | |
| 289 | +| Complex expressions | Gradual expansion: literals → variables → operators → method calls | |
| 290 | +| Union types | Collect all return paths and unify | |
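The two-pass idea for recursive methods can be sketched as follows. The data model is hypothetical and much simpler than real IR nodes: each method is a hash with an optional declared return type and a list of return expressions, where an expression is either `[:literal, "Type"]` or `[:call, "method_name"]`.

```ruby
# Hypothetical sketch of 2-pass analysis: signatures first, then bodies.
def infer_with_two_passes(method_defs)
  # Pass 1: register every signature, using "untyped" as a placeholder.
  signatures = method_defs.to_h { |m| [m[:name], m[:declared] || "untyped"] }

  # Pass 2: infer bodies. Calls resolve against the registered signatures,
  # so a recursive call never triggers another inference of the same body.
  method_defs.each do |m|
    next if m[:declared]

    types = m[:returns].map do |kind, value|
      kind == :literal ? value : signatures.fetch(value, "untyped")
    end
    # Simplification: placeholder results are dropped rather than unioned in.
    known = types.uniq - ["untyped"]
    signatures[m[:name]] = known.empty? ? "untyped" : known.join(" | ")
  end

  signatures
end

method_defs = [
  { name: "countdown", returns: [[:literal, "Integer"], [:call, "countdown"]] },
  { name: "label", declared: "String", returns: [] }
]
result = infer_with_two_passes(method_defs)
```

In this sketch the recursive call inside `countdown` resolves to the placeholder from pass 1, so only the literal branch contributes and `result["countdown"]` is `"Integer"`.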
| 291 | + |
| 292 | +## Future Work |
| 293 | + |
| 294 | +- **Generic inference**: `[1, 2, 3]` → `Array[Integer]` |
| 295 | +- **Block/lambda types**: Infer block parameter and return types |
| 296 | +- **Type narrowing**: Smarter types after `if x.is_a?(String)` |
| 297 | +- **Cross-method inference**: Use inferred types from other methods |
| 298 | + |
| 299 | +## Conclusion |
| 300 | + |
| 301 | +By studying TypeScript's approach and adapting it for Ruby's simpler semantics, we built a practical type inference system. The key insights: |
| 302 | + |
| 303 | +1. **Parse method bodies** - You can't infer types without seeing the code |
| 304 | +2. **Lazy evaluation with caching** - Don't compute until needed |
| 305 | +3. **Handle Ruby idioms** - Implicit returns, `initialize`, etc. |
| 306 | +4. **Start simple** - Literals first, then build up complexity |
| 307 | + |
| 308 | +Type inference makes T-Ruby feel more natural. Write Ruby code, get type safety - no return type annotations required. |
| 309 | + |
| 310 | +--- |
| 311 | + |
| 312 | +*The type inference system is available in T-Ruby. Try it out and let us know what you think!* |