
feat: add DFS-based ffi.ReprPrint for unified object repr#454

Merged
tqchen merged 1 commit into apache:main from junrushao:2026-02-14/dataclass-repr
Feb 18, 2026

Conversation

@junrushao
Member

@junrushao junrushao commented Feb 16, 2026

Summary

  • Single C++ ffi.ReprPrint function produces human-readable repr for any TVM FFI value
  • DFS with 3-state tracking (NotVisited/InProgress/Done):
    • DAGs: memoized repr returned in full on every re-encounter
    • Cycles: detected via InProgress state, shown as ...
  • Addresses hidden by default; set TVM_FFI_REPR_WITH_ADDR=1 to show
  • Per-field Repr(false) InfoTrait to exclude fields from repr output
  • Built-in repr for String, Bytes, Tensor, Shape, Array, List, Map
  • All Python __repr__ methods delegate to this function
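The three-state traversal described above can be sketched in plain Python. This is only an illustration over built-in lists, tuples, and dicts, not the C++ implementation in repr_print.cc; the names `State` and `repr_print` are hypothetical, and string escaping is deliberately left out.

```python
from enum import Enum


class State(Enum):
    NOT_VISITED = 0
    IN_PROGRESS = 1
    DONE = 2


def repr_print(root):
    """Illustrative DFS repr over Python lists, tuples, and dicts."""
    state = {}  # id(obj) -> State; a missing key means NOT_VISITED
    memo = {}   # id(obj) -> finished repr string

    def visit(obj):
        if not isinstance(obj, (list, tuple, dict)):
            # Leaves: double-quote strings, defer to Python otherwise.
            return f'"{obj}"' if isinstance(obj, str) else repr(obj)
        key = id(obj)
        current = state.get(key, State.NOT_VISITED)
        if current is State.IN_PROGRESS:
            return "..."      # back edge: we are still inside this object
        if current is State.DONE:
            return memo[key]  # shared node in a DAG: reuse the full repr
        state[key] = State.IN_PROGRESS
        if isinstance(obj, dict):
            body = ", ".join(f"{visit(k)}: {visit(v)}" for k, v in obj.items())
            text = "{" + body + "}"
        elif isinstance(obj, tuple):
            text = "(" + ", ".join(visit(v) for v in obj) + ")"
        else:
            text = "[" + ", ".join(visit(v) for v in obj) + "]"
        state[key] = State.DONE
        memo[key] = text
        return text

    return visit(root)


shared = (1, 4)
print(repr_print((shared, shared)))  # ((1, 4), (1, 4))

cyclic = [12]
cyclic.append(cyclic)
print(repr_print(cyclic))  # [12, ...]
```

The DONE state is what lets a shared child be printed in full each time without re-traversing it, while IN_PROGRESS is what distinguishes a true cycle from mere sharing.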

Format Examples

42                                    # int
"hello"                               # String
(1, 2, 3)                             # Array
[1, 2, 3]                             # List
{"key": "value"}                      # Map
testing.MyObj(x=1, y="hi")            # User object
...                                   # Cycle marker
float32[3, 4]@cpu:0@0x1234            # Tensor
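As a rough Python analogue of the user-object line above and of the per-field Repr(false) switch, the sketch below uses the standard `dataclasses` module, where `field(repr=False)` plays a similar role. The helper `format_object` and the `cache` field are hypothetical and exist only for illustration; the real formatting is done in C++ through reflection.

```python
from dataclasses import dataclass, field, fields


def format_object(type_key, obj):
    """Render type_key(field=value, ...), skipping fields marked repr=False."""

    def fmt(value):
        # Strings are double-quoted, as in the format examples.
        return f'"{value}"' if isinstance(value, str) else repr(value)

    parts = [
        f"{f.name}={fmt(getattr(obj, f.name))}"
        for f in fields(obj)
        if f.repr  # analogous to a field that was not declared Repr(false)
    ]
    return f"{type_key}({', '.join(parts)})"


@dataclass
class MyObj:
    x: int
    y: str
    cache: dict = field(default_factory=dict, repr=False)  # hidden from repr


print(format_object("testing.MyObj", MyObj(x=1, y="hi")))
# testing.MyObj(x=1, y="hi")
```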

Test plan

  • 55 Python tests covering primitives, containers, user objects, DAGs, cycles, and TVM_FFI_REPR_WITH_ADDR
  • All pre-commit hooks pass (ruff, ty, clang-format, markdownlint, etc.)
  • Container tests (test_container.py) pass with updated Array format

@gemini-code-assist
Contributor

Summary of Changes

Hello @junrushao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the string representation (repr) of FFI objects by introducing a robust, centralized C++ system. The new ffi.ReprPrint mechanism leverages a Breadth-First Search (BFS) algorithm to accurately represent complex object graphs, including those with cycles and shared references, preventing infinite recursion and providing concise output for duplicates. This change streamlines how objects are displayed, offering both built-in type-specific formatting and a flexible reflection-based approach for user-defined types, while also integrating seamlessly with Python's __repr__ methods.

Highlights

  • Unified Object Representation: Introduced a new centralized, reflection-based ffi.ReprPrint system in C++ for generating string representations of FFI objects.
  • Cycle and DAG Handling: The ffi.ReprPrint system uses a BFS traversal to correctly handle object graphs with cycles and duplicate references, ensuring consistent and informative repr output.
  • Custom and Generic Formatters: Provided specialized formatting for common built-in types (String, Tensor, Shape, Array, List, Map) and a generic reflection-based mechanism for user-defined types, including type_key@0xADDR(field=value, ...).
  • Field Selection for Repr: Added a refl::repr_fields tag struct to allow explicit control over which fields appear in the repr output for user-defined types.
  • Python Integration: Python's __repr__ methods for Object, Array, List, and Map now delegate to the new C++ ffi.ReprPrint functionality, ensuring consistent behavior across language boundaries.


Changelog
  • CMakeLists.txt
    • Updated build configuration to include the new repr_print.cc source file.
  • include/tvm/ffi/reflection/registry.h
    • Added new type attributes kRepr and kReprFields for custom representation and field selection.
    • Introduced a repr_fields tag struct to control which fields are included in an object's representation.
  • python/tvm_ffi/_ffi_api.py
    • Exposed the new ReprPrint function from the FFI API for Python usage.
  • python/tvm_ffi/container.py
    • Modified __repr__ methods for Array, List, and Map to delegate to the new centralized ffi.ReprPrint system.
  • python/tvm_ffi/cython/object.pxi
    • Updated the __object_repr__ function to dynamically load and utilize the ffi.ReprPrint global function.
  • python/tvm_ffi/dataclasses/_utils.py
    • Removed the Python-side method_repr generation logic, as object representation is now handled by the C++ ffi.ReprPrint.
  • python/tvm_ffi/dataclasses/c_class.py
    • Removed the repr parameter and related logic from the c_class decorator, centralizing repr generation in C++.
  • python/tvm_ffi/dataclasses/field.py
    • Removed the repr parameter from the Field class and field function, streamlining field definition.
  • src/ffi/extra/repr_print.cc
    • Added a new C++ file implementing a BFS-based ReprPrinter for FFI objects, including cycle detection and custom formatters for built-in types.
    • Registered ffi.ReprPrint as a global FFI function.
  • src/ffi/testing/testing.cc
    • Configured TestCxxClassDerived to use refl::repr_fields to specify which fields are included in its representation for testing purposes.
  • tests/python/test_container.py
    • Updated an assertion in test_repr to match the new string quoting format for map keys.
  • tests/python/test_dataclasses_c_class.py
    • Removed tests related to Python-generated __repr__ methods, as this functionality is now handled in C++.
  • tests/python/test_repr.py
    • Added a new test file containing comprehensive unit tests for the ffi.ReprPrint functionality across various data types, user objects, and object graph structures.
Activity
  • A new feature for unified object representation (ffi.ReprPrint) has been introduced.
  • 17 new tests were added in tests/python/test_repr.py to cover primitives, strings, containers, tensors, shapes, user objects, repr_fields, duplicate references, Python __repr__ integration, and empty containers.
  • An existing test in test_container.py was updated to reflect changes in string quoting format.
  • The author confirmed that the full test suite passes, including 401 Python tests and 276 C++ tests.

@junrushao junrushao force-pushed the 2026-02-14/dataclass-repr branch from cb25c51 to 493d366 Compare February 16, 2026 04:58
@junrushao junrushao marked this pull request as draft February 16, 2026 04:58
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a centralized, reflection-based representation system (ffi.ReprPrint) for FFI objects, which is a great improvement for debuggability and consistency. The implementation correctly handles cycles and duplicate references in object graphs using a BFS traversal. The changes to delegate __repr__ in Python containers to this new system are clean. The new tests are comprehensive. I've found a few areas for improvement regarding code duplication, efficiency, and error handling, which are detailed in the specific comments. Overall, this is a solid feature addition.

@junrushao junrushao force-pushed the 2026-02-14/dataclass-repr branch 3 times, most recently from 4f8962e to 02b2f36 Compare February 16, 2026 05:52
@junrushao junrushao marked this pull request as ready for review February 16, 2026 07:26
@junrushao junrushao force-pushed the 2026-02-14/dataclass-repr branch 2 times, most recently from 3d0562f to 8d10187 Compare February 16, 2026 08:09
@junrushao junrushao mentioned this pull request Feb 16, 2026
15 tasks
Member

@tqchen tqchen left a comment

It would be great to discuss the text format a bit, mainly our behavior for common references. Is there a precedent we can refer to?

@tqchen
Member

tqchen commented Feb 16, 2026

Looks like a great improvement. It would be good to discuss a bit how we are thinking about repr printing and its behavior relative to Python's repr. The repr does not need to be exactly round-trippable (serialization is perhaps a better choice there). So there is a question of whether we want duplicated value printing (or want it as an option), and whether it should be the default.

The default behavior of Python at the moment is simply to expand. Expansion could also make sense for cases like immutable data structures. Say a shape value gets referenced in multiple places because of the way we copy the data structure:

x = (1, 2, 3)
y = (1, 4)

print([y, y])
# prints: [(1, 4), (1, 4)]

# cycle case
x = [12]
x.append(x)
print(x)
# prints: [12, [...]]

This being said, there can be value in cases where we might want duplicated value printing. Perhaps we can do it under a flag.
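One Python precedent for user-defined classes is `reprlib.recursive_repr` from the standard library, which expands shared references and substitutes a fill value only on true recursion. The sketch below is illustrative; the `Node` class is hypothetical and not part of this PR.

```python
from reprlib import recursive_repr


class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

    @recursive_repr(fillvalue="...")
    def __repr__(self):
        return f"Node({self.value!r}, {self.children!r})"


shared = Node("leaf")
root = Node("root")
root.children += [shared, shared]
print(root)
# Node('root', [Node('leaf', []), Node('leaf', [])])

root.children.append(root)  # introduce a cycle
print(root)
# Node('root', [Node('leaf', []), Node('leaf', []), ...])
```

The shared leaf is printed in full both times, and only the back reference to `root` collapses to the fill value.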

return String(FormatBytes(obj->data, obj->size));
}

String ReprTensor(const TensorObj* obj, const Function& fn_repr) {
Member

@tqchen tqchen Feb 16, 2026

Although this one is concise, personally I think it is good to be explicit here:

Tensor(shape=(1, 2), dtype="float32", device="cuda:0")

Member Author

I think it's a matter of personal taste, but also please consider the output length and readability to users. In that case:

float32[10, 20]@gpu:0

seems an overall win.

Member

@tqchen tqchen Feb 17, 2026

I agree it is a close call. My original thinking is to ideally align with existing formats and not invent new syntax (that users need to learn). Indeed the output syntax would be longer, but being explicit means users do not have to learn what float32[10, 20] means.

Just to note some of the nit comments from when I read the new syntax:

  • When looking at it, I would have questions like "does it map to an on-stack raw array?" (it does not; it maps to a tensor).
    • Tensor(shape=(1, 2), dtype="float32", device="cuda:0") avoids that confusion at the cost of being slightly longer.
  • Another minor nit is that the @gpu:0 syntax conflicts a bit with @{addr}, although it is really a nit.

Member Author

I updated the syntax. @0x{addr} shows up only when TVM_FFI_REPR_WITH_ADDR is set to 1, so there is no point of confusion.
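A minimal sketch of that gating, using the Tensor format from the PR summary. The helper `format_tensor` and its `environ` parameter are hypothetical, and comparing against the exact string "1" is an assumption about how the variable is interpreted.

```python
import os


def format_tensor(dtype, shape, device, addr, environ=None):
    """Hypothetical helper: append @0x{addr} only when the env var is "1"."""
    environ = os.environ if environ is None else environ
    dims = ", ".join(str(d) for d in shape)
    text = f"{dtype}[{dims}]@{device}"
    # Assumption: only the literal value "1" enables address printing.
    if environ.get("TVM_FFI_REPR_WITH_ADDR") == "1":
        text += f"@0x{addr:x}"
    return text


print(format_tensor("float32", (3, 4), "cpu:0", 0x1234, environ={}))
# float32[3, 4]@cpu:0
print(format_tensor("float32", (3, 4), "cpu:0", 0x1234,
                    environ={"TVM_FFI_REPR_WITH_ADDR": "1"}))
# float32[3, 4]@cpu:0@0x1234
```

Passing the environment as a parameter keeps the sketch testable without mutating the process environment.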

@junrushao junrushao force-pushed the 2026-02-14/dataclass-repr branch from 8d10187 to 3d1fe18 Compare February 17, 2026 09:10
@junrushao junrushao changed the title feat: add BFS-based ffi.ReprPrint for unified object repr feat: add DFS-based ffi.ReprPrint for unified object repr Feb 17, 2026
@junrushao junrushao force-pushed the 2026-02-14/dataclass-repr branch from 3d1fe18 to 9df6b09 Compare February 17, 2026 09:12
@junrushao
Member Author

Updated the text format and duplication handling

@junrushao junrushao force-pushed the 2026-02-14/dataclass-repr branch from 9df6b09 to a709442 Compare February 18, 2026 07:54
@junrushao
Member Author

junrushao commented Feb 18, 2026

The default behavior of Python at the moment is simply to expand. Expansion could also make sense for cases like immutable data structures. Say a shape value gets referenced in multiple places because of the way we copy the data structure:

x = (1, 2, 3)
y = (1, 4)

print([y, y])
# prints: [(1, 4), (1, 4)]

# cycle case
x = [12]
x.append(x)
print(x)
# prints: [12, [...]]

This being said, there can be value in cases where we might want duplicated value printing. Perhaps we can do it under a flag.

Behavior from the latest commit:

import tvm_ffi as ffi
from tvm_ffi._ffi_api import ReprPrint

x = ffi.List([12])
x.append(x)
print(ReprPrint(x))
# gives: [12, ...]

y = ffi.Array((1,4))
y = ffi.Array((y, y))
print(ReprPrint(y))
# gives: ((1, 4), (1, 4))

@tqchen LMK if it looks good to you

@junrushao junrushao force-pushed the 2026-02-14/dataclass-repr branch from a709442 to 997cbf1 Compare February 18, 2026 08:07
- Single C++ ffi.ReprPrint function handles all types
- DFS with 3-state tracking (NotVisited/InProgress/Done):
  - DAGs: memoized repr returned in full on re-encounter
  - Cycles: detected via InProgress state, shown as ...
- Addresses hidden by default; set TVM_FFI_REPR_WITH_ADDR=1 to show
- Per-field Repr(false) to exclude fields from repr output
- Built-in repr for String, Bytes, Tensor, Shape, Array, List, Map
- All Python __repr__ methods delegate to this function
@junrushao junrushao force-pushed the 2026-02-14/dataclass-repr branch from 997cbf1 to cf535ab Compare February 18, 2026 08:22
@tqchen tqchen merged commit b648c5d into apache:main Feb 18, 2026
8 checks passed