
Commit 3d1fe18

junrushao and claude committed
feat: add DFS-based ffi.ReprPrint for unified object repr
- Single C++ ffi.ReprPrint function handles all types
- DFS with 3-state tracking (NotVisited/InProgress/Done):
  - DAGs: memoized repr returned in full on re-encounter
  - Cycles: detected via InProgress state, shown as ...
- Addresses hidden by default; set TVM_FFI_REPR_WITH_ADDR=1 to show
- Per-field Repr(false) to exclude fields from repr output
- Built-in repr for String, Bytes, Tensor, Shape, Array, List, Map
- All Python __repr__ methods delegate to this function

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 86c4042 commit 3d1fe18

15 files changed

Lines changed: 1361 additions & 103 deletions


CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ set(_tvm_ffi_extra_objs_sources
   "${CMAKE_CURRENT_SOURCE_DIR}/src/ffi/extra/json_writer.cc"
   "${CMAKE_CURRENT_SOURCE_DIR}/src/ffi/extra/serialization.cc"
   "${CMAKE_CURRENT_SOURCE_DIR}/src/ffi/extra/deep_copy.cc"
+  "${CMAKE_CURRENT_SOURCE_DIR}/src/ffi/extra/repr_print.cc"
   "${CMAKE_CURRENT_SOURCE_DIR}/src/ffi/extra/reflection_extra.cc"
   "${CMAKE_CURRENT_SOURCE_DIR}/src/ffi/extra/module.cc"
   "${CMAKE_CURRENT_SOURCE_DIR}/src/ffi/extra/library_module.cc"

DESIGN.md

Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->

<!--- http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Design: `ffi.ReprPrint` — Unified Object Repr

## Motivation

Before this change, `__repr__` for TVM FFI objects was fragmented:

- **Array/List/Map**: Python-side `__repr__` methods iterated elements and
  formatted strings entirely in Python, using Python's native `repr()` for each
  element. This produced Python-native formatting (e.g. single-quoted strings
  `'hello'`) and had no awareness of object identity or shared references.
- **Dataclass objects** (`@c_class`): A code-generated `__repr__` was produced
  per-class via `exec()` in `_utils.method_repr()`. This was coupled to the
  Python dataclass layer and could not represent C++-only objects.
- **Other objects**: Fell back to `ClassName(0x...)` — the raw handle address.

Problems with this approach:

1. **No deduplication**: A DAG of objects (e.g. the same sub-object referenced
   from multiple fields) would print the full sub-object each time, potentially
   producing exponentially large output.
2. **No cycle safety**: Cyclic object graphs would cause infinite recursion.
3. **Inconsistent formatting**: Python `repr()` and C++ repr used different
   quoting and formatting conventions.
4. **Python-only**: C++ objects without Python wrappers had no repr at all.
5. **Per-class code generation**: The `exec()`-based `__repr__` in
   `method_repr()` was fragile and hard to extend.

## Design Overview

The new system introduces a single C++ function `ffi.ReprPrint` that produces a
human-readable string for any TVM FFI value. All Python `__repr__` methods
delegate to this function.

```text
           Python __repr__
                  |
                  v
   ffi.ReprPrint (C++ global function)
                  |
                  v
          ReprPrinter (DFS)
         /        |        \
 Built-in      Custom       Generic
 repr fns   __ffi_repr__  (reflection)
```

### Key Properties

- **Single source of truth**: One C++ implementation handles all types.
- **DFS-based**: Processes the object graph depth-first with three-state
  tracking (NotVisited / InProgress / Done), naturally handling DAGs via
  memoization and detecting cycles via the InProgress state.
- **Extensible**: Types can register custom `__ffi_repr__` functions via the
  type attribute system.
- **Per-field control**: Individual fields can be excluded from repr via the
  `Repr(false)` InfoTrait, using a bit flag on the field metadata.
- **Address control**: Object addresses are hidden by default for clean output.
  Set `TVM_FFI_REPR_WITH_ADDR=1` to show addresses for debugging.

## Architecture

### Components

#### 1. `ReprPrinter` class (`src/ffi/extra/repr_print.cc`)

The core engine. A stateful class that recursively processes the object graph
via DFS:

```text
ReprOfAny(value)
  |
  ├── POD type? → format inline (None, bool, int, float, ...)
  |
  └── Object type? → check state_[obj]:
        |
        ├── Done       → return repr_cache_[obj] (DAG: memoized)
        ├── InProgress → return "..." (cycle detected)
        └── NotVisited → mark InProgress
                         → ProcessObject(obj)
                         → cache result
                         → mark Done
                         → return result
```

**Data members:**

| Member | Type | Purpose |
| ------ | ---- | ------- |
| `state_` | `unordered_map<Object*, State>` | DFS state: NotVisited, InProgress, or Done |
| `repr_cache_` | `unordered_map<Object*, string>` | Memoized repr string for each processed object |
| `show_addr_` | `bool` | Whether to show addresses (from `TVM_FFI_REPR_WITH_ADDR` env var) |

**State transitions:**

Each object goes through: `NotVisited → InProgress → Done`.

- **NotVisited → InProgress**: First encounter. The object is about to be
  processed; its children will be visited recursively.
- **InProgress → Done**: All children have been processed. The repr string is
  computed, cached, and the object is marked done.
- **InProgress (re-entered)**: A cycle is detected. Return `"..."` (or
  `"...@0xADDR"` when `show_addr_` is true).
- **Done (re-encountered)**: A DAG shared reference. Return the cached repr
  string (full form).
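
The three-state traversal described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the C++ `ReprPrinter`: the `Node` class, the `ReprSketch` class, and the use of `id()` as the map key are assumptions made for the example, and address display is left out.

```python
from enum import Enum


class State(Enum):
    NOT_VISITED = 0
    IN_PROGRESS = 1
    DONE = 2


class Node:
    """Hypothetical stand-in for a reflected FFI object."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = dict(fields)


class ReprSketch:
    """Toy model of the DFS; objects absent from `state` are NOT_VISITED."""

    def __init__(self):
        self.state = {}  # id(obj) -> State
        self.cache = {}  # id(obj) -> memoized repr string

    def repr_of_any(self, value):
        if not isinstance(value, (Node, list)):
            return repr(value)  # POD-like values are formatted inline
        key = id(value)
        st = self.state.get(key, State.NOT_VISITED)
        if st is State.DONE:
            return self.cache[key]  # shared reference: reuse the full repr
        if st is State.IN_PROGRESS:
            return "..."  # cycle: this object is still being processed
        self.state[key] = State.IN_PROGRESS
        result = self.process(value)
        self.cache[key] = result
        self.state[key] = State.DONE
        return result

    def process(self, obj):
        if isinstance(obj, list):
            return "[" + ", ".join(self.repr_of_any(v) for v in obj) + "]"
        body = ", ".join(f"{k}={self.repr_of_any(v)}" for k, v in obj.fields.items())
        return f"{obj.name}({body})"


shared = Node("Pair", a=1, b=2)
print(ReprSketch().repr_of_any([shared, shared]))  # [Pair(a=1, b=2), Pair(a=1, b=2)]

cyc = Node("Loop", x=1)
cyc.fields["me"] = [cyc]
print(ReprSketch().repr_of_any(cyc))  # Loop(x=1, me=[...])
```

The second encounter of `shared` hits the `DONE` branch and returns the cached string, while the self-reference in `cyc` hits the `IN_PROGRESS` branch and yields the `...` marker.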

**`ProcessObject(obj)`:**

For each object, checks for a custom `__ffi_repr__` type attribute:

- If found: call the custom function, passing a `fn_repr` callback that
  recursively calls `ReprOfAny`.
- If not found: use `GenericRepr()` — reflection-based
  `TypeKey(field=value, ...)`.
- For Array/List: if `show_addr_` is true, append `@0xADDR` to the result.

#### 2. Built-in `__ffi_repr__` functions

Registered for core container/value types during static initialization:

| Type | Format | Example |
| ---- | ------ | ------- |
| String | `"quoted"` | `"hello world"` |
| Bytes | `b"escaped"` | `b"\x00\x01"` |
| Tensor | `dtype[shape]@device@addr` | `float32[3, 4]@cpu:0@0x1234` |
| Shape | `Shape(dims)` | `Shape(3, 4)` |
| Array | `(elems)` with trailing comma for single | `(1, 2, 3)`, `(42,)`, `()` |
| List | `[elems]` | `[1, 2, 3]` |
| Map | `{k: v, ...}` | `{"key": "value"}` |

Each function receives `(const T* obj, const Function& fn_repr)` where
`fn_repr` is a callback to format child elements. This callback internally calls
`ReprOfAny`, which handles cycle detection, DAG memoization, and POD formatting.
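
To make the callback pattern concrete, here is a small Python sketch of container formatters that delegate every child to an `fn_repr` callable. The function names and `demo_fn_repr` are hypothetical, and using `json.dumps` to get double-quoted strings is an assumption about escaping, not the rule the C++ code applies.

```python
import json
from typing import Any, Callable, Mapping, Sequence

ReprFn = Callable[[Any], str]


def array_repr(elems: Sequence[Any], fn_repr: ReprFn) -> str:
    """Tuple-style output; a single element gets a trailing comma."""
    parts = [fn_repr(e) for e in elems]
    if len(parts) == 1:
        return f"({parts[0]},)"
    return "(" + ", ".join(parts) + ")"


def list_repr(elems: Sequence[Any], fn_repr: ReprFn) -> str:
    return "[" + ", ".join(fn_repr(e) for e in elems) + "]"


def map_repr(items: Mapping[Any, Any], fn_repr: ReprFn) -> str:
    body = ", ".join(f"{fn_repr(k)}: {fn_repr(v)}" for k, v in items.items())
    return "{" + body + "}"


def demo_fn_repr(value: Any) -> str:
    """Hypothetical child formatter standing in for the real callback."""
    if isinstance(value, str):
        return json.dumps(value)  # double-quoted; escaping rules are an assumption
    return repr(value)


print(array_repr([1, 2, 3], demo_fn_repr))       # (1, 2, 3)
print(array_repr([42], demo_fn_repr))            # (42,)
print(list_repr([1, 2, 3], demo_fn_repr))        # [1, 2, 3]
print(map_repr({"key": "value"}, demo_fn_repr))  # {"key": "value"}
```

Because the formatters never recurse on their own, cycle and DAG handling stay in one place: whatever is passed as `fn_repr`.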

#### 3. Generic reflection-based repr

For user-defined objects without a custom `__ffi_repr__`, the system uses
`GenericRepr()`:

```text
TypeKey(field1=value1, field2=value2)         # default
TypeKey@0xADDR(field1=value1, field2=value2)  # with TVM_FFI_REPR_WITH_ADDR
```

Fields are enumerated via `ForEachFieldInfo`. Fields with the
`kTVMFFIFieldFlagBitMaskReprOff` flag are skipped. If no visible fields exist,
the format is just `TypeKey` (or `TypeKey@0xADDR` with the env var).
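
The formatting rule above can be sketched as follows. This is a rough Python model, not `GenericRepr()` itself: `FieldInfo` and `generic_repr` are hypothetical names, fields are passed in as a plain list instead of being enumerated through reflection, and the lowercase hex rendering of the address is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

REPR_OFF = 1 << 6  # mirrors kTVMFFIFieldFlagBitMaskReprOff


@dataclass
class FieldInfo:
    """Hypothetical stand-in for one reflected field."""

    name: str
    value: Any
    flags: int = 0


def generic_repr(
    type_key: str,
    fields: List[FieldInfo],
    addr: Optional[int] = None,
    fn_repr: Callable[[Any], str] = repr,
) -> str:
    # The address, when requested, sits between the type key and the fields.
    header = type_key if addr is None else f"{type_key}@{addr:#x}"
    visible = [f for f in fields if not (f.flags & REPR_OFF)]
    if not visible:
        return header  # no visible fields: just the type key
    body = ", ".join(f"{f.name}={fn_repr(f.value)}" for f in visible)
    return f"{header}({body})"


fields = [FieldInfo("x", 1, flags=REPR_OFF), FieldInfo("y", 2)]
print(generic_repr("testing.MyObj", fields))               # testing.MyObj(y=2)
print(generic_repr("testing.MyObj", fields, addr=0x1A2B))  # testing.MyObj@0x1a2b(y=2)
print(generic_repr("testing.MyObj", fields[:1]))           # testing.MyObj
```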

#### 4. `Repr(bool)` InfoTrait (`include/tvm/ffi/reflection/registry.h`)

A per-field trait that controls repr visibility:

```cpp
refl::ObjectDef<MyClass>()
    .def_rw("visible_field", &MyClass::visible_field)
    .def_rw("hidden_field", &MyClass::hidden_field, refl::Repr(false));
```

`Repr(false)` sets `kTVMFFIFieldFlagBitMaskReprOff` (bit 6) on
`TVMFFIFieldInfo::flags`. The repr printer checks this flag in `GenericRepr`
to omit hidden fields from output.
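
The bit manipulation is small enough to model directly. The Python sketch below mirrors the behaviour of `Repr::Apply` as it appears in this commit's `registry.h` hunk: the flag is OR-ed in when `show` is false and nothing is done otherwise. The Python class and helper names are illustrative only.

```python
DEFAULT_FROM_FACTORY = 1 << 5  # kTVMFFIFieldFlagBitMaskDefaultFromFactory
REPR_OFF = 1 << 6              # kTVMFFIFieldFlagBitMaskReprOff


class Repr:
    """Python model of the trait: only ever sets the bit, never clears it."""

    def __init__(self, show: bool) -> None:
        self.show = show

    def apply(self, flags: int) -> int:
        return flags if self.show else flags | REPR_OFF


def is_hidden(flags: int) -> bool:
    return bool(flags & REPR_OFF)


flags = Repr(False).apply(DEFAULT_FROM_FACTORY)
print(flags)                               # 96
print(is_hidden(flags))                    # True
print(bool(flags & DEFAULT_FROM_FACTORY))  # True, other bits are preserved
print(is_hidden(Repr(True).apply(flags)))  # True, Repr(true) does not clear the bit
```

One consequence visible in this model: since the trait only sets the bit, applying `Repr(true)` after `Repr(false)` on the same field would leave the field hidden.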

This replaces the previous `repr_fields` approach which required listing
visible field names as strings in a separate struct — that was error-prone
and required O(N*M) name matching at repr time.

#### 5. Python integration (`python/tvm_ffi/cython/object.pxi`)

`Object.__repr__` delegates to `ffi.ReprPrint`:

```python
def __repr__(self) -> str:
    if self.chandle == NULL:
        return type(self).__name__ + "(chandle=None)"
    return str(__object_repr__(self))
```

`__object_repr__` lazily loads `ffi.ReprPrint` and calls it. If the call fails
for any reason, it silently falls back to `ClassName(handle)`, because
`__repr__` must never raise.
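
A minimal sketch of that "lazy load, never raise" pattern is shown below. It is not the Cython implementation: `LazyRepr` and the loader callables are hypothetical, and `id(obj)` stands in for the object handle in the fallback string.

```python
from typing import Any, Callable, Optional


class LazyRepr:
    """Resolve the repr function on first use; fall back on any failure."""

    def __init__(self, loader: Callable[[], Callable[[Any], str]]) -> None:
        self._loader = loader
        self._fn: Optional[Callable[[Any], str]] = None

    def __call__(self, obj: Any) -> str:
        try:
            if self._fn is None:
                self._fn = self._loader()  # e.g. a global-function lookup
            return str(self._fn(obj))
        except Exception:
            # Fallback in the spirit of ClassName(handle); id() is a stand-in.
            return f"{type(obj).__name__}({id(obj):#x})"


class Foo:
    pass


def broken_loader() -> Callable[[Any], str]:
    raise RuntimeError("repr function not registered")


def working_loader() -> Callable[[Any], str]:
    return lambda obj: "Foo(x=1)"


print(LazyRepr(working_loader)(Foo()))  # Foo(x=1)
print(LazyRepr(broken_loader)(Foo()))   # falls back to Foo(0x...), no exception
```

Catching `Exception` broadly is deliberate here: a failing repr inside a debugger or a log statement is usually worse than a less informative string.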

Container classes (Array, List, Map) also delegate their `__repr__` to the same
`__object_repr__` function, replacing the previous Python-side formatting.

#### 6. Removal of Python-side `__repr__` generation

The following are removed:

- `_utils.method_repr()`: The `exec()`-based per-class `__repr__` generator.
- `Field.repr` attribute and `field(repr=...)` parameter.
- `c_class(repr=...)` parameter.
- Old `test_cxx_class_repr*` tests (replaced by `test_repr.py`).

## DAG / Shared Reference Handling

When the same object is referenced multiple times (a DAG), the DFS memoization
ensures it is processed only once. On subsequent encounters, the cached repr
string is returned in full:

```text
obj = TestIntPair(a=1, b=2)
arr = Array([obj, obj, obj])

repr(arr) =>
  (TestIntPair(a=1, b=2), TestIntPair(a=1, b=2), TestIntPair(a=1, b=2))
   ^-- full form          ^-- full form (cached) ^-- full form (cached)
```

This is achieved by:

1. DFS first encounters `obj` → marks `InProgress` → processes → caches repr →
   marks `Done`.
2. Second/third encounter: `state_[obj] == Done` → return `repr_cache_[obj]`.

## Cycle Detection

Cyclic object graphs (e.g. `obj.field = [obj]`) are detected via the
`InProgress` state. When DFS re-encounters an object that is currently being
processed, it returns a `"..."` marker instead of recursing infinitely:

```text
obj = TestObjectDerived(v_i64=1, v_str="hi", v_array=[obj])

repr(obj) =>
  TestObjectDerived(v_i64=1, v_str="hi", v_map={}, v_array=(...,))
                                                            ^-- cycle marker
```

With `TVM_FFI_REPR_WITH_ADDR=1`, the cycle marker includes the address:

```text
TestObjectDerived@0x1a2b(v_i64=1, ..., v_array=(...@0x1a2b,)@0x3c4d)
                 ^-- obj addr                   ^-- cycle points back
```

## Address Display Control

By default, object addresses are **not shown** in repr output. This produces
clean, readable output suitable for documentation and test assertions.

Set the environment variable `TVM_FFI_REPR_WITH_ADDR=1` to enable addresses:

| Context | Default | With `TVM_FFI_REPR_WITH_ADDR` |
| ------- | ------- | ----------------------------- |
| User objects | `TypeKey(fields)` | `TypeKey@0xADDR(fields)` |
| No-field objects | `TypeKey` | `TypeKey@0xADDR` |
| Array | `(elems)` | `(elems)@0xADDR` |
| List | `[elems]` | `[elems]@0xADDR` |
| Cycle marker | `...` | `...@0xADDR` |
| Tensor | `dtype[shape]@dev@0xADDR` | `dtype[shape]@dev@0xADDR` (always) |
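
The placement rules in the table can be expressed as a small helper. This Python sketch rests on two assumptions that the design text does not spell out: that only the exact value `1` enables addresses, and that addresses are rendered as lowercase hex. `show_addr_enabled` and `decorate` are hypothetical names, and the Tensor row (which always carries an address) is not modelled.

```python
import os
from typing import Mapping


def show_addr_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    # Assumption: only the exact value "1" turns addresses on.
    return environ.get("TVM_FFI_REPR_WITH_ADDR") == "1"


def decorate(kind: str, text: str, addr: int, show_addr: bool) -> str:
    """Attach @0xADDR following the placement rules in the table above."""
    if not show_addr:
        return text
    tag = f"@{addr:#x}"
    if kind == "object":
        # The address goes between the type key and the field list, if any.
        head, paren, rest = text.partition("(")
        return head + tag + paren + rest
    return text + tag  # array, list, and cycle marker take a suffix


print(decorate("object", "TypeKey(a=1)", 0x1A2B, True))  # TypeKey@0x1a2b(a=1)
print(decorate("object", "TypeKey", 0x1A2B, True))       # TypeKey@0x1a2b
print(decorate("array", "(1, 2)", 0x3C4D, True))         # (1, 2)@0x3c4d
print(decorate("cycle", "...", 0x1A2B, True))            # ...@0x1a2b
print(decorate("list", "[1, 2]", 0x3C4D, False))         # [1, 2]
```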

## Format Summary

```text
42                          # int
3.14                        # float
True / False                # bool
None                        # None
"hello"                     # String (SmallStr or StringObj)
b"\x00\x01"                 # Bytes
float32[3, 4]@cpu:0@0x1a2b  # Tensor
Shape(3, 4)                 # Shape
(1, 2, 3)                   # Array
(42,)                       # Array (single element)
()                          # Array (empty)
[1, 2, 3]                   # List
{"key": "value"}            # Map
testing.MyObj(x=1, y="hi")  # User object (all fields)
testing.MyObj(y="hi")       # User object (x has Repr(false))
testing.MyObj               # No visible fields
...                         # Cycle marker
```

## File Changes

| File | Change |
| ---- | ------ |
| `src/ffi/extra/repr_print.cc` | **New.** Core `ReprPrinter` and built-in repr functions. |
| `CMakeLists.txt` | Add `repr_print.cc` to build. |
| `include/tvm/ffi/c_api.h` | Add `kTVMFFIFieldFlagBitMaskReprOff = 1 << 6`. |
| `include/tvm/ffi/reflection/registry.h` | Add `Repr` InfoTrait class. |
| `python/tvm_ffi/cython/object.pxi` | `__repr__` delegates to `ffi.ReprPrint`. |
| `python/tvm_ffi/container.py` | Array/List/Map `__repr__` delegate to `ffi.ReprPrint`. |
| `python/tvm_ffi/_ffi_api.py` | Add `ReprPrint` type stub. |
| `python/tvm_ffi/dataclasses/c_class.py` | Remove `repr` parameter; drop `method_repr` usage. |
| `python/tvm_ffi/dataclasses/field.py` | Remove `Field.repr` and `field(repr=...)`. |
| `python/tvm_ffi/dataclasses/_utils.py` | Remove `method_repr()`. |
| `src/ffi/testing/testing.cc` | Use `Repr(false)` on `TestCxxClassBase` fields. |
| `tests/python/test_repr.py` | **New.** 55 tests with strict assertions. |
| `tests/python/test_container.py` | Update expected Array format to tuple. |
| `tests/python/test_dataclasses_c_class.py` | Remove old repr tests (superseded). |

include/tvm/ffi/c_api.h

Lines changed: 7 additions & 0 deletions
@@ -870,6 +870,13 @@ typedef enum {
    * being used directly as the default value.
    */
   kTVMFFIFieldFlagBitMaskDefaultFromFactory = 1 << 5,
+  /*!
+   * \brief The field is excluded from repr output.
+   *
+   * When set, the field will not appear in the generic reflection-based repr.
+   * By default this flag is off (meaning the field is included in repr).
+   */
+  kTVMFFIFieldFlagBitMaskReprOff = 1 << 6,
 #ifdef __cplusplus
 };
 #else

include/tvm/ffi/reflection/registry.h

Lines changed: 28 additions & 0 deletions
@@ -223,6 +223,33 @@ class AttachFieldFlag : public InfoTrait {
   int32_t flag_;
 };
 
+/*!
+ * \brief Trait that controls whether a field appears in repr output.
+ *
+ * By default, all fields appear in repr. Use `Repr(false)` to exclude a field.
+ */
+class Repr : public InfoTrait {
+ public:
+  /*!
+   * \brief Constructor.
+   * \param show Whether the field should appear in repr output.
+   */
+  explicit Repr(bool show) : show_(show) {}
+
+  /*!
+   * \brief Apply the repr flag to the field info.
+   * \param info The field info.
+   */
+  TVM_FFI_INLINE void Apply(TVMFFIFieldInfo* info) const {
+    if (!show_) {
+      info->flags |= kTVMFFIFieldFlagBitMaskReprOff;
+    }
+  }
+
+ private:
+  bool show_;
+};
+
 /*!
  * \brief Get the byte offset of a class member field.
  *
@@ -493,6 +520,7 @@ struct init {
 namespace type_attr {
 inline constexpr const char* kInit = "__ffi_init__";
 inline constexpr const char* kShallowCopy = "__ffi_shallow_copy__";
+inline constexpr const char* kRepr = "__ffi_repr__";
 }  // namespace type_attr
 
 /*!

python/tvm_ffi/_ffi_api.py

Lines changed: 2 additions & 0 deletions
@@ -82,6 +82,7 @@ def ModuleImportModule(_0: Module, _1: Module, /) -> None: ...
     def ModuleInspectSource(_0: Module, _1: str, /) -> str: ...
     def ModuleLoadFromFile(_0: str, /) -> Module: ...
     def ModuleWriteToFile(_0: Module, _1: str, _2: str, /) -> None: ...
+    def ReprPrint(_0: Any, /) -> str: ...
     def Shape(*args: Any) -> Any: ...
     def String(_0: str, /) -> str: ...
     def StructuralHash(_0: Any, _1: bool, _2: bool, /) -> int: ...
@@ -141,6 +142,7 @@ def ToJSONGraphString(_0: Any, _1: Any, /) -> str: ...
     "ModuleInspectSource",
     "ModuleLoadFromFile",
     "ModuleWriteToFile",
+    "ReprPrint",
     "Shape",
     "String",
     "StructuralHash",
