Contributor

@enirolf enirolf commented Jan 8, 2026

RNTuple's cardinality fields are projected read-only fields, and currently an exception is thrown when a user tries to snapshot fields of this type to a new RNTuple.

To prevent this, this PR instead converts such fields into non-projected fields of the inner type SizeT of ROOT::RNTupleCardinality<SizeT> (either std::uint32_t or std::uint64_t) before they are added to the model of the new RNTuple. A warning is shown to the user when this happens.
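As a rough illustration of the type-name conversion described above, here is a standalone sketch. It is not the code in this PR: `InnerCardinalityTypeName` is a hypothetical helper, and it only assumes that the type name has the form `ROOT::RNTupleCardinality<SizeT>`.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of ROOT: returns the template argument of
// "ROOT::RNTupleCardinality<...>", or std::nullopt for any other type name.
std::optional<std::string> InnerCardinalityTypeName(std::string_view typeName)
{
   constexpr std::string_view prefix = "ROOT::RNTupleCardinality<";
   // Require the prefix, at least one character of inner type, and a closing '>'.
   if (typeName.size() <= prefix.size() + 1 || typeName.substr(0, prefix.size()) != prefix ||
       typeName.back() != '>')
      return std::nullopt;
   // Strip the prefix and the closing '>'.
   return std::string(typeName.substr(prefix.size(), typeName.size() - prefix.size() - 1));
}
```

A caller could then create the output field from the returned inner type name instead of the cardinality type name, and leave all other type names untouched.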

A follow-up/alternative approach is to preserve the projection when creating the model for the output RNTuple. However, this comes with the caveat that the source fields must be included in the output RNTuple. This becomes an issue for cardinality fields of collections of anonymous records (as is the case for NanoAODs, see the paragraph below): the RNTuple data source only exposes the inner fields and not the collection field itself, because there is no straightforward way to represent the anonymous record in memory.

A notable scenario is the current implementation of CMS NanoAOD, which in the TTree format contains leaflist arrays. When these leaflist arrays, e.g. created via tree.Branch("jet_pt", &jet_pt, "jet_pt[njets]"), are converted to RNTuple, the RNTupleImporter creates an anonymous collection record, where jet_pt becomes a true collection field and njets is a projected field of type RNTupleCardinality. As such, RDataFrame currently cannot write out RNTuple NanoAOD data via Snapshot in a way that preserves the column names for both the collection payload and the collection sizes. We want to be able to preserve the complete NanoAOD schema.
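To make the "projected" nature of a field like njets more concrete, here is a standalone sketch that does not use ROOT. It assumes that per-entry collection sizes can be derived from a column of cumulative end offsets; `SizesFromOffsets` and the sample values are hypothetical and only illustrate the idea that the sizes need no storage of their own while the projection is kept.

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: derive per-entry collection sizes from cumulative end offsets.
// Entry i holds endOffsets[i] - endOffsets[i - 1] elements (the first entry starts at 0).
std::vector<std::uint64_t> SizesFromOffsets(const std::vector<std::uint64_t> &endOffsets)
{
   std::vector<std::uint64_t> sizes;
   sizes.reserve(endOffsets.size());
   std::uint64_t previous = 0;
   for (auto end : endOffsets) {
      sizes.push_back(end - previous);
      previous = end;
   }
   return sizes;
}
```

Snapshotting the cardinality as a non-projected field amounts to writing the resulting sizes out as an ordinary integer column, so the column name njets survives in the output.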


github-actions bot commented Jan 8, 2026

Test Results

    22 files      22 suites   3d 20h 56m 38s ⏱️
 3 792 tests  3 792 ✅ 0 💤 0 ❌
80 337 runs  80 337 ✅ 0 💤 0 ❌

Results for commit b70d77b.

@enirolf enirolf changed the title from "[df] Enable snapshotting RNTuple cardinality cols" to "[df] Enable snapshotting RNTuple cardinality fields" Jan 9, 2026
Member

@vepadulano vepadulano left a comment

After discussion, I believe we should go ahead with this strategy because:

  • It enables calling Snapshot when reading RNTuple NanoAOD converted from TTree NanoAOD, with the output format also being RNTuple.
  • While it slightly increases the output file size, it preserves the NanoAOD schema in terms of column names, and it is in line with what already happens when Snapshot stores the collection columns themselves (e.g. Muon_pt) to the output file.

Later on we can think about how to preserve the anonymous record of collections when calling Snapshot from an input RNTuple to an output RNTuple.

model->AddField(ROOT::RFieldBase::Create(fOutputFieldNames[i], typeName).Unwrap());

// Cardinality fields are read-only, so instead we snapshot them as their inner type.
if (typeName.substr(0, 24) == "ROOT::RNTupleCardinality") {
Contributor

Unlikely, but we may at some point have another type ROOT::RNTupleCardinalityXYZ.

Suggested change
- if (typeName.substr(0, 24) == "ROOT::RNTupleCardinality") {
+ if (typeName.substr(0, 25) == "ROOT::RNTupleCardinality<") {
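
A minimal standalone sketch of the difference between the two checks, without ROOT. `MatchesOldCheck` and `IsCardinalityTypeName` are hypothetical names, and `ROOT::RNTupleCardinalityXYZ` is the made-up type from the comment above. Deriving the length from the prefix itself also avoids keeping the literal 25 in sync by hand.

```cpp
#include <string>
#include <string_view>

// Mirrors the check currently in the diff: a bare 24-character prefix comparison.
bool MatchesOldCheck(const std::string &typeName)
{
   return typeName.substr(0, 24) == "ROOT::RNTupleCardinality";
}

// Hypothetical helper: includes the opening '<' so that only the template
// ROOT::RNTupleCardinality<...> matches.
bool IsCardinalityTypeName(std::string_view typeName)
{
   constexpr std::string_view prefix = "ROOT::RNTupleCardinality<";
   static_assert(prefix.size() == 25);
   // substr clamps the length, so shorter names simply compare unequal.
   return typeName.substr(0, prefix.size()) == prefix;
}
```

With these definitions, a name such as `ROOT::RNTupleCardinalityXYZ` passes the old check but is rejected by the stricter one.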
