<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->

<!--- http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Design: `ffi.ReprPrint` — Unified Object Repr

## Motivation

Before this change, `__repr__` for TVM FFI objects was fragmented:

- **Array/List/Map**: Python-side `__repr__` methods iterated elements and
  formatted strings entirely in Python, using Python's native `repr()` for each
  element. This produced Python-native formatting (e.g. single-quoted strings
  `'hello'`) and had no awareness of object identity or shared references.
- **Dataclass objects** (`@c_class`): A code-generated `__repr__` was produced
  per-class via `exec()` in `_utils.method_repr()`. This was coupled to the
  Python dataclass layer and could not represent C++-only objects.
- **Other objects**: Fell back to `ClassName(0x...)`, the raw handle address.

Problems with this approach:

1. **No deduplication**: A DAG of objects (e.g. the same sub-object referenced
   from multiple fields) would print the full sub-object each time, potentially
   producing exponentially large output.
2. **No cycle safety**: Cyclic object graphs would cause infinite recursion.
3. **Inconsistent formatting**: Python `repr()` and C++ repr used different
   quoting and formatting conventions.
4. **Python-only**: C++ objects without Python wrappers had no repr at all.
5. **Per-class code generation**: The `exec()`-based `__repr__` in
   `method_repr()` was fragile and hard to extend.

## Design Overview

The new system introduces a single C++ function `ffi.ReprPrint` that produces a
human-readable string for any TVM FFI value. All Python `__repr__` methods
delegate to this function.

```text
                  Python __repr__
                         |
                         v
        ffi.ReprPrint (C++ global function)
                         |
                         v
                 ReprPrinter (DFS)
                 /       |       \
         Built-in     Custom      Generic
         repr fns  __ffi_repr__ (reflection)
```

### Key Properties

- **Single source of truth**: One C++ implementation handles all types.
- **DFS-based**: Processes the object graph depth-first with three-state
  tracking (NotVisited / InProgress / Done), naturally handling DAGs via
  memoization and detecting cycles via the InProgress state.
- **Extensible**: Types can register custom `__ffi_repr__` functions via the
  type attribute system.
- **Per-field control**: Individual fields can be excluded from repr via the
  `Repr(false)` InfoTrait, using a bit flag on the field metadata.
- **Address control**: Object addresses are hidden by default for clean output.
  Set `TVM_FFI_REPR_WITH_ADDR=1` to show addresses for debugging.

## Architecture

### Components

#### 1. `ReprPrinter` class (`src/ffi/extra/repr_print.cc`)

The core engine: a stateful class that recursively processes the object graph
via DFS:

```text
ReprOfAny(value)
  |
  ├── POD type?    → format inline (None, bool, int, float, ...)
  |
  └── Object type? → check state_[obj]:
        |
        ├── Done       → return repr_cache_[obj]   (DAG: memoized)
        ├── InProgress → return "..."              (cycle detected)
        └── NotVisited → mark InProgress
                       → ProcessObject(obj)
                       → cache result
                       → mark Done
                       → return result
```

**Data members:**

| Member | Type | Purpose |
| ------ | ---- | ------- |
| `state_` | `unordered_map<Object*, State>` | DFS state: NotVisited, InProgress, or Done |
| `repr_cache_` | `unordered_map<Object*, string>` | Memoized repr string for each processed object |
| `show_addr_` | `bool` | Whether to show addresses (from `TVM_FFI_REPR_WITH_ADDR` env var) |

**State transitions:**

Each object goes through `NotVisited → InProgress → Done`.

- **NotVisited → InProgress**: First encounter. The object is about to be
  processed; its children will be visited recursively.
- **InProgress → Done**: All children have been processed. The repr string is
  computed, cached, and the object is marked done.
- **InProgress (re-entered)**: A cycle is detected. Return `"..."` (or
  `"...@0xADDR"` when `show_addr_` is true).
- **Done (re-encountered)**: A shared reference in a DAG. Return the cached repr
  string (full form).
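
The state machine above can be modeled in a few dozen lines of Python. This is a minimal sketch, not the C++ `ReprPrinter`. `Node`, a type key plus named fields, is a hypothetical stand-in for an FFI object. A plain Python `list` stands in for `Array`, and `id()` plays the role of the `Object*` key in `state_` and `repr_cache_`.

```python
from enum import Enum


class State(Enum):
    NOT_VISITED = 0
    IN_PROGRESS = 1
    DONE = 2


class Node:
    """Hypothetical stand-in for an FFI object: a type key plus named fields."""

    def __init__(self, type_key, **fields):
        self.type_key = type_key
        self.fields = fields


class ReprPrinter:
    """Python model of the DFS printer; a plain list plays the role of Array."""

    def __init__(self, show_addr=False):
        self.state = {}  # id(obj) -> State, like state_
        self.cache = {}  # id(obj) -> repr string, like repr_cache_
        self.show_addr = show_addr

    def addr(self, obj):
        return f"@{id(obj):#x}" if self.show_addr else ""

    def repr_of_any(self, value):
        if not isinstance(value, (Node, list)):
            return self.format_pod(value)
        key = id(value)
        state = self.state.get(key, State.NOT_VISITED)
        if state is State.DONE:
            return self.cache[key]  # shared reference: reuse the memoized string
        if state is State.IN_PROGRESS:
            return "..." + self.addr(value)  # re-entered: cycle detected
        self.state[key] = State.IN_PROGRESS
        result = self.process_object(value)
        self.cache[key] = result
        self.state[key] = State.DONE
        return result

    def format_pod(self, value):
        # Strings are double-quoted; None/bool/int/float use str().
        if isinstance(value, str):
            return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        return str(value)

    def process_object(self, obj):
        if isinstance(obj, list):  # Array: tuple style, trailing comma for one element
            items = [self.repr_of_any(x) for x in obj]
            trailing = "," if len(items) == 1 else ""
            return "(" + ", ".join(items) + trailing + ")" + self.addr(obj)
        head = obj.type_key + self.addr(obj)
        if not obj.fields:
            return head
        parts = [f"{name}={self.repr_of_any(v)}" for name, v in obj.fields.items()]
        return head + "(" + ", ".join(parts) + ")"


pair = Node("TestIntPair", a=1, b=2)
print(ReprPrinter().repr_of_any([pair, pair]))
# (TestIntPair(a=1, b=2), TestIntPair(a=1, b=2))

derived = Node("TestObjectDerived", v_i64=1, v_str="hi", v_array=[])
derived.fields["v_array"].append(derived)  # close the cycle
print(ReprPrinter().repr_of_any(derived))
# TestObjectDerived(v_i64=1, v_str="hi", v_array=(...,))
```

The sketch creates a fresh printer per top-level call, so DFS state never leaks between calls. That matches how the data members above are described, but it is an assumption about how `ffi.ReprPrint` scopes its printer. Passing `show_addr=True` yields the `TypeKey@0xADDR(...)`, `(...)@0xADDR`, and `...@0xADDR` forms.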

**`ProcessObject(obj)`:**

For each object, `ProcessObject` checks for a custom `__ffi_repr__` type attribute:

- If found: call the custom function, passing an `fn_repr` callback that
  recursively calls `ReprOfAny`.
- If not found: use `GenericRepr()`, which produces the reflection-based
  `TypeKey(field=value, ...)` form.
- For Array/List: if `show_addr_` is true, append `@0xADDR` to the result.
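
The dispatch in `ProcessObject` is a per-type lookup with a generic fallback. The Python sketch below models it with a plain dict. `CUSTOM_REPR`, `register_repr`, and the toy `Shape` and `Point` classes are hypothetical and are not the FFI types or registration API. Cycle and DAG tracking are omitted to keep the focus on dispatch and the `fn_repr` callback.

```python
CUSTOM_REPR = {}  # hypothetical stand-in for the __ffi_repr__ type attribute


def register_repr(cls):
    """Attach a custom repr function (obj, fn_repr) -> str to a type."""

    def decorator(fn):
        CUSTOM_REPR[cls] = fn
        return fn

    return decorator


class Shape:  # toy type with a custom repr
    def __init__(self, *dims):
        self.dims = dims


class Point:  # toy type without one: handled by the generic path
    def __init__(self, x, y):
        self.x = x
        self.y = y


@register_repr(Shape)
def shape_repr(obj, fn_repr):
    # Children are formatted through the callback, never directly.
    return "Shape(" + ", ".join(fn_repr(d) for d in obj.dims) + ")"


def generic_repr(obj, fn_repr):
    # vars() stands in for reflection-based field enumeration.
    fields = ", ".join(f"{k}={fn_repr(v)}" for k, v in vars(obj).items())
    return f"{type(obj).__name__}({fields})"


def process_object(obj, fn_repr):
    custom = CUSTOM_REPR.get(type(obj))
    if custom is not None:
        return custom(obj, fn_repr)
    return generic_repr(obj, fn_repr)


def repr_of_any(value):
    # POD values are formatted inline; objects go through dispatch,
    # passing repr_of_any itself as the fn_repr callback.
    if value is None or isinstance(value, (bool, int, float)):
        return str(value)
    return process_object(value, repr_of_any)


print(repr_of_any(Point(1, Shape(3, 4))))
# Point(x=1, y=Shape(3, 4))
```

Because a custom function only ever sees children through `fn_repr`, cycle detection and memoization remain centralized in the printer, even for types with hand-written reprs.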

#### 2. Built-in `__ffi_repr__` functions

Registered for core container/value types during static initialization:

| Type | Format | Example |
| ---- | ------ | ------- |
| String | `"quoted"` | `"hello world"` |
| Bytes | `b"escaped"` | `b"\x00\x01"` |
| Tensor | `dtype[shape]@device@addr` | `float32[3, 4]@cpu:0@0x1234` |
| Shape | `Shape(dims)` | `Shape(3, 4)` |
| Array | `(elems)`, with a trailing comma for a single element | `(1, 2, 3)`, `(42,)`, `()` |
| List | `[elems]` | `[1, 2, 3]` |
| Map | `{k: v, ...}` | `{"key": "value"}` |

Each function receives `(const T* obj, const Function& fn_repr)`, where
`fn_repr` is a callback that formats child elements. This callback internally
calls `ReprOfAny`, which handles cycle detection, DAG memoization, and POD
formatting.
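
The container and value formats in the table can be reproduced with small functions that each take an `fn_repr` callback. This Python sketch mirrors only the output shapes. The string and byte escaping rules, where printable ASCII is kept and every other byte becomes `\xNN`, are assumptions and do not come from `repr_print.cc`.

```python
def repr_string(s):
    # Double quotes, with backslashes and quotes escaped (assumed rule).
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def repr_bytes(data):
    # Printable ASCII kept as-is, other bytes as \xNN (assumed rule).
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in ('"', "\\"):
            out.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            out.append(ch)
        else:
            out.append(f"\\x{byte:02x}")
    return 'b"' + "".join(out) + '"'


def repr_array(elems, fn_repr):
    items = [fn_repr(e) for e in elems]
    trailing = "," if len(items) == 1 else ""  # (42,) like a Python tuple
    return "(" + ", ".join(items) + trailing + ")"


def repr_list(elems, fn_repr):
    return "[" + ", ".join(fn_repr(e) for e in elems) + "]"


def repr_map(mapping, fn_repr):
    pairs = (f"{fn_repr(k)}: {fn_repr(v)}" for k, v in mapping.items())
    return "{" + ", ".join(pairs) + "}"


def fn_repr(value):
    # Minimal callback for the demo: strings quoted, other scalars via str().
    return repr_string(value) if isinstance(value, str) else str(value)


print(repr_array([42], fn_repr))            # (42,)
print(repr_map({"key": "value"}, fn_repr))  # {"key": "value"}
print(repr_bytes(b"\x00\x01"))              # b"\x00\x01"
```

The trailing comma only matters for the one-element case, where `(42)` would otherwise read like a parenthesized scalar.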

#### 3. Generic reflection-based repr

For user-defined objects without a custom `__ffi_repr__`, the system uses
`GenericRepr()`:

```text
TypeKey(field1=value1, field2=value2)          # default
TypeKey@0xADDR(field1=value1, field2=value2)   # with TVM_FFI_REPR_WITH_ADDR
```

Fields are enumerated via `ForEachFieldInfo`. Fields with the
`kTVMFFIFieldFlagBitMaskReprOff` flag are skipped. If no visible fields exist,
the format is just `TypeKey` (or `TypeKey@0xADDR` with the env var).
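
Here is a Python sketch of `GenericRepr` over a simplified field table. `FieldInfo`, `TypeInfo`, and the `values` dict are hypothetical stand-ins for `TVMFFIFieldInfo` and the reflection getters. Only the mask value `1 << 6` comes from this design.

```python
from dataclasses import dataclass, field
from typing import List

REPR_OFF = 1 << 6  # kTVMFFIFieldFlagBitMaskReprOff


@dataclass
class FieldInfo:
    name: str
    flags: int = 0


@dataclass
class TypeInfo:
    type_key: str
    fields: List[FieldInfo] = field(default_factory=list)


def generic_repr(info, values, fn_repr, show_addr=False, addr=0):
    """TypeKey[@0xADDR](field=value, ...), skipping fields with REPR_OFF set."""
    head = info.type_key + (f"@{addr:#x}" if show_addr else "")
    parts = [
        f"{f.name}={fn_repr(values[f.name])}"
        for f in info.fields
        if not (f.flags & REPR_OFF)
    ]
    return f"{head}({', '.join(parts)})" if parts else head


def fn_repr(value):
    return f'"{value}"' if isinstance(value, str) else str(value)


visible = TypeInfo("testing.MyObj", [FieldInfo("x"), FieldInfo("y")])
hidden_x = TypeInfo("testing.MyObj", [FieldInfo("x", REPR_OFF), FieldInfo("y")])
print(generic_repr(visible, {"x": 1, "y": "hi"}, fn_repr))   # testing.MyObj(x=1, y="hi")
print(generic_repr(hidden_x, {"x": 1, "y": "hi"}, fn_repr))  # testing.MyObj(y="hi")
```

The visibility check is a single bitwise test per field, which is why it scales better than matching field names against a separate list.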

#### 4. `Repr(bool)` InfoTrait (`include/tvm/ffi/reflection/registry.h`)

A per-field trait that controls repr visibility:

```cpp
namespace refl = tvm::ffi::reflection;

// Typically called from MyClass's static-initialization registration code.
refl::ObjectDef<MyClass>()
    .def_rw("visible_field", &MyClass::visible_field)
    .def_rw("hidden_field", &MyClass::hidden_field, refl::Repr(false));
```

`Repr(false)` sets `kTVMFFIFieldFlagBitMaskReprOff` (bit 6) on
`TVMFFIFieldInfo::flags`. The repr printer checks this flag in `GenericRepr`
to omit hidden fields from the output.

This replaces the previous `repr_fields` approach, which required listing the
visible field names as strings in a separate struct. That approach was
error-prone and required O(N*M) name matching at repr time.

#### 5. Python integration (`python/tvm_ffi/cython/object.pxi`)

`Object.__repr__` delegates to `ffi.ReprPrint`:

```python
def __repr__(self) -> str:
    if self.chandle == NULL:
        return type(self).__name__ + "(chandle=None)"
    return str(__object_repr__(self))
```

`__object_repr__` lazily loads `ffi.ReprPrint` and calls it. If the call fails
for any reason, it silently falls back to `ClassName(handle)`, because
`__repr__` must never raise.

Container classes (Array, List, Map) also delegate their `__repr__` to the same
`__object_repr__` function, replacing the previous Python-side formatting.
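
The lazy-load and never-raise contract can be sketched as a cached callable wrapped in a broad `except`. In this Python sketch, `LazyRepr` and its `loader` argument are hypothetical. In `tvm_ffi`, the loader's role would be to look up the global function `ffi.ReprPrint`. The fallback follows the `ClassName(0x...)` style from the Motivation section.

```python
class LazyRepr:
    """Resolve the printer on first use; never let __repr__ raise."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical: returns a callable obj -> str
        self._fn = None

    def __call__(self, obj, handle):
        try:
            if self._fn is None:
                self._fn = self._loader()  # resolved once, then cached
            return str(self._fn(obj))
        except Exception:
            # Fallback mirrors the old ClassName(0x...) output.
            return f"{type(obj).__name__}({handle:#x})"


class Foo:
    pass


def broken_loader():
    raise RuntimeError("ffi.ReprPrint is not available")


print(LazyRepr(lambda: (lambda obj: "Foo(x=1)"))(Foo(), 0x1234))  # Foo(x=1)
print(LazyRepr(broken_loader)(Foo(), 0x1234))                      # Foo(0x1234)
```

Catching `Exception` rather than `BaseException` still lets `KeyboardInterrupt` and `SystemExit` propagate. Whether the real implementation draws the line in the same place is not stated here.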

#### 6. Removal of Python-side `__repr__` generation

The following are removed:

- `_utils.method_repr()`: The `exec()`-based per-class `__repr__` generator.
- `Field.repr` attribute and `field(repr=...)` parameter.
- `c_class(repr=...)` parameter.
- Old `test_cxx_class_repr*` tests (replaced by `test_repr.py`).

## DAG / Shared Reference Handling

When the same object is referenced multiple times (a DAG), DFS memoization
ensures it is processed only once. On subsequent encounters, the cached repr
string is returned in full:

```text
obj = TestIntPair(a=1, b=2)
arr = Array([obj, obj, obj])

repr(arr) =>
  (TestIntPair(a=1, b=2), TestIntPair(a=1, b=2), TestIntPair(a=1, b=2))
   ^-- full form          ^-- full form (cached) ^-- full form (cached)
```

This is achieved as follows:

1. DFS first encounters `obj` → marks it `InProgress` → processes it → caches
   its repr → marks it `Done`.
2. Second/third encounter: `state_[obj] == Done` → return `repr_cache_[obj]`.

Memoization bounds the work, since each shared object is formatted once. It
does not bound the output size, because every reference still prints the full
cached string.
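
The difference between work and output size shows up clearly in a chain where each level references the previous one twice. The Python sketch below builds such a chain with hypothetical `Obj` and `make_printer` helpers, using the same Done/InProgress bookkeeping. It then counts how many objects are actually formatted and how often the leaf appears in the output.

```python
class Obj:
    """Hypothetical object: a type key plus positional children."""

    def __init__(self, type_key, *children):
        self.type_key = type_key
        self.children = children


def make_printer():
    done = {}            # id(obj) -> cached repr ("Done")
    in_progress = set()  # ids currently on the DFS stack ("InProgress")
    processed = []       # one entry per object actually formatted

    def repr_of(value):
        if not isinstance(value, Obj):
            return str(value)
        key = id(value)
        if key in done:
            return done[key]
        if key in in_progress:
            return "..."
        in_progress.add(key)
        processed.append(value.type_key)
        body = ", ".join(repr_of(c) for c in value.children)
        result = f"{value.type_key}({body})"
        in_progress.discard(key)
        done[key] = result
        return result

    return repr_of, processed


node = Obj("Leaf", 1)
for _ in range(10):
    node = Obj("Pair", node, node)  # each level references the previous twice

repr_of, processed = make_printer()
text = repr_of(node)
print(len(processed))         # 11: ten Pair objects plus one Leaf
print(text.count("Leaf(1)"))  # 1024: the leaf is printed 2**10 times
```

Formatting work grows linearly with the number of distinct objects. The printed text still doubles per level, because each reference expands to the full cached form.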

## Cycle Detection

Cyclic object graphs (e.g. `obj.field = [obj]`) are detected via the
`InProgress` state. When DFS re-encounters an object that is currently being
processed, it returns a `"..."` marker instead of recursing infinitely:

```text
obj = TestObjectDerived(v_i64=1, v_str="hi", v_array=[obj])

repr(obj) =>
  TestObjectDerived(v_i64=1, v_str="hi", v_map={}, v_array=(...,))
                                                            ^-- cycle marker
```

With `TVM_FFI_REPR_WITH_ADDR=1`, the cycle marker includes the address:

```text
  TestObjectDerived@0x1a2b(v_i64=1, ..., v_array=(...@0x1a2b,)@0x3c4d)
                   ^-- obj addr                      ^-- cycle points back
```

## Address Display Control

By default, object addresses are **not shown** in repr output. This produces
clean, readable output suitable for documentation and test assertions.

Set the environment variable `TVM_FFI_REPR_WITH_ADDR=1` to enable addresses:

| Context | Default | With `TVM_FFI_REPR_WITH_ADDR` |
| ------- | ------- | ----------------------------- |
| User objects | `TypeKey(fields)` | `TypeKey@0xADDR(fields)` |
| No-field objects | `TypeKey` | `TypeKey@0xADDR` |
| Array | `(elems)` | `(elems)@0xADDR` |
| List | `[elems]` | `[elems]@0xADDR` |
| Cycle marker | `...` | `...@0xADDR` |
| Tensor | `dtype[shape]@dev@0xADDR` | `dtype[shape]@dev@0xADDR` (always) |
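
The table reduces to two rules. Tensors always carry their address. Everything else gets one only when the flag is on, either as a suffix (Array, List, cycle marker) or right after the type key (user objects). The Python sketch below applies those rules to an already formatted string, whereas the real printer presumably emits addresses while formatting. `decorate` and `show_addr_from_env` are hypothetical helpers, and treating only the exact value `1` as "enabled" is an assumption.

```python
import os


def show_addr_from_env(environ=None):
    # Assumption: only the exact value "1" turns addresses on.
    env = os.environ if environ is None else environ
    return env.get("TVM_FFI_REPR_WITH_ADDR") == "1"


def decorate(kind, body, addr, show_addr):
    """Attach @0xADDR to an already formatted repr according to its kind."""
    tag = f"@{addr:#x}"
    if kind == "tensor":
        return body + tag  # tensors always show their address
    if not show_addr:
        return body
    if kind in ("array", "list", "cycle"):
        return body + tag  # (...)@0x.., [...]@0x.., ...@0x..
    if kind == "object":
        # TypeKey@0x..(fields), or TypeKey@0x.. when there is no field list
        type_key, paren, rest = body.partition("(")
        return type_key + tag + paren + rest
    raise ValueError(f"unknown kind: {kind!r}")


print(decorate("object", "testing.MyObj(x=1)", 0x1A2B, show_addr=True))
# testing.MyObj@0x1a2b(x=1)
print(decorate("list", "[1, 2, 3]", 0x1A2B, show_addr=False))
# [1, 2, 3]
```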

## Format Summary

```text
42                            # int
3.14                          # float
True / False                  # bool
None                          # None
"hello"                       # String (SmallStr or StringObj)
b"\x00\x01"                   # Bytes
float32[3, 4]@cpu:0@0x1a2b    # Tensor
Shape(3, 4)                   # Shape
(1, 2, 3)                     # Array
(42,)                         # Array (single element)
()                            # Array (empty)
[1, 2, 3]                     # List
{"key": "value"}              # Map
testing.MyObj(x=1, y="hi")    # User object (all fields)
testing.MyObj(y="hi")         # User object (x has Repr(false))
testing.MyObj                 # No visible fields
...                           # Cycle marker
```

## File Changes

| File | Change |
| ---- | ------ |
| `src/ffi/extra/repr_print.cc` | **New.** Core `ReprPrinter` and built-in repr functions. |
| `CMakeLists.txt` | Add `repr_print.cc` to build. |
| `include/tvm/ffi/c_api.h` | Add `kTVMFFIFieldFlagBitMaskReprOff = 1 << 6`. |
| `include/tvm/ffi/reflection/registry.h` | Add `Repr` InfoTrait class. |
| `python/tvm_ffi/cython/object.pxi` | `__repr__` delegates to `ffi.ReprPrint`. |
| `python/tvm_ffi/container.py` | Array/List/Map `__repr__` delegate to `ffi.ReprPrint`. |
| `python/tvm_ffi/_ffi_api.py` | Add `ReprPrint` type stub. |
| `python/tvm_ffi/dataclasses/c_class.py` | Remove `repr` parameter; drop `method_repr` usage. |
| `python/tvm_ffi/dataclasses/field.py` | Remove `Field.repr` and `field(repr=...)`. |
| `python/tvm_ffi/dataclasses/_utils.py` | Remove `method_repr()`. |
| `src/ffi/testing/testing.cc` | Use `Repr(false)` on `TestCxxClassBase` fields. |
| `tests/python/test_repr.py` | **New.** 55 tests with strict assertions. |
| `tests/python/test_container.py` | Update expected Array format to tuple. |
| `tests/python/test_dataclasses_c_class.py` | Remove old repr tests (superseded). |