Pass entity index to integration kernel for cell-dependent custom data by sclaus2 · Pull Request #4210 · FEniCS/dolfinx

sclaus2 · 2026-05-12T12:12:40Z

Pass assembly loop index to UFCx kernels for cell-dependent custom data

Motivation

This PR is to start a discussion about how to pass the current entity index to the integration kernel. This PR is an extension of PR #4013

Currently the entity index is not exposed to the integration kernels which makes the use of cell based custom_data impossible. Cell based custom_data is for example essential in runtime quadrature, where the quadrature rule varies from entity to entity.

I see three natural ways to provide this context:

Store a callback in the form and let the assembler ask the callback for the
data pointer for the current loop entry.
Wrap the user-provided data pointer in a small context object that also
stores the current loop index.
Pass the current loop index through the existing entity_local_index
argument of the UFCx kernel.

This PR implements option 3. The implementation is intentionally small and keeps
the UFCx kernel signature unchanged, but the choice is not obvious. The rest of
this note compares the options.

Option 1: Per-entity callback `std::function<void*(std::int32_t loop_index)> custom_data`

The form could store a callable object, for example:

std::function<void*(std::int32_t loop_index)> custom_data;

The assembler would call this function for each entity before invoking the UFCx
kernel:

for (std::int32_t index = 0; index < num_entities; ++index)
{
  void* data = custom_data(index);
  kernel(A.data(), coeffs, constants, coordinate_dofs, entity_local_index,
         quadrature_permutation, data);
}

The kernel continues to receive a plain void*. The pointer can refer directly
to the data for the current entity, so the kernel does not need to know about the
global layout of the runtime data structure.

Advantages

Very flexible. The callback can use arrays, maps, caches, or different data sources for different entities.
The UFCx kernel signature is unchanged.
The kernel sees exactly the data for the current entity, which keeps kernel
code simple.
It naturally supports cases where the data is not stored in a dense array
indexed by the assembly loop.

Disadvantages

It adds an indirect function call inside the innermost assembly loop.
It is harder for the compiler to inline and optimize than direct indexing.
It requires a new callback-bearing API in C++ and corresponding Python binding
decisions.
Ownership and lifetime rules become more complicated, especially with Python
objects being involved.
It may be more general than the common runtime quadrature use case needs.

Option 2: Context struct passed through `custom_data`

The assembler could wrap the user pointer in a small DOLFINx-owned context
object:

struct CustomDataContext
{
  void* data;
  std::int32_t loop_index;
};

The assembler would update the index before each kernel call:

CustomDataContext ctx{user_custom_data, 0};

for (std::int32_t index = 0; index < num_entities; ++index)
{
  ctx.loop_index = index;
  kernel(A.data(), coeffs, constants, coordinate_dofs, entity_local_index,
         quadrature_permutation, &ctx);
}

The kernel would cast custom_data to this context type and then use
ctx->loop_index to select the correct runtime data:

auto* ctx = static_cast<CustomDataContext*>(custom_data);
auto* data = static_cast<RuntimeQuadratureData*>(ctx->data);
auto& rule = data->rules[ctx->loop_index];

Advantages

The UFCx kernel signature is unchanged.
The meaning is explicit: custom_data becomes a context containing both the
user data pointer and the current loop index.
The per-entity overhead is small, just one integer write before the kernel
call.
It is extensible. Additional fields such as the mesh cell id, entity type, or
number of entities could be added later if DOLFINx defines a stable ABI for
the context.
It avoids an indirect callback call in the assembly loop.

Disadvantages

It changes the interpretation of custom_data. Kernels would need to know
whether they receive the raw user pointer or a DOLFINx context wrapper.
A public context layout would have to be specified and kept stable.
Existing custom kernels that directly cast custom_data to their own type
would not automatically work with wrapped data unless wrapping is opt-in or
otherwise carefully designed.
The lifetime of the context object must be clear. A stack-allocated context is
fine for synchronous assembly, but the pointer must not escape the kernel
call.
It introduces a new DOLFINx-specific convention on top of the generic UFCx
void*.

Option 3: Pass the loop index through `entity_local_index`

This PR implements the least invasive option: keep passing the user
custom_data pointer unchanged, and pass the current assembly loop index through
the existing entity_local_index kernel argument.

For cell integrals, entity_local_index is not otherwise needed by the kernel,
so it can point directly to the current loop index:

for (std::int32_t index = 0; index < num_cells; ++index)
{
  std::int32_t cell = cells[index];
  std::int32_t entity_local_index = index;

  kernel(A.data(), &coeffs(index, 0), constants.data(), coordinate_dofs.data(),
         &entity_local_index, quadrature_permutation, custom_data);
}

For exterior facet, vertex, and ridge integrals, DOLFINx currently passes a
pointer to the local entity index. The loop index can be attached efficiently by
passing a small stack array that keeps the existing local entity index in the
first entry and adds the assembly loop index in the second entry:

for (std::int32_t index = 0; index < num_entities; ++index)
{
  std::int32_t cell = entities(index, 0);
  std::int32_t local_entity = entities(index, 1);
  std::array<std::int32_t, 2> entity_local_index = {local_entity, index};

  kernel(A.data(), &coeffs(index, 0), constants.data(), coordinate_dofs.data(),
         entity_local_index.data(), quadrature_permutation, custom_data);
}

For interior facets, DOLFINx currently passes the two local facet indices, one
for each adjacent cell. The loop index can be appended:

for (std::int32_t index = 0; index < num_facets; ++index)
{
  std::array<std::int32_t, 3> entity_local_index = {
      local_facet[0], local_facet[1], index};

  kernel(A.data(), &coeffs(index, 0, 0), constants.data(),
         coordinate_dofs.data(), entity_local_index.data(),
         quadrature_permutation, custom_data);
}

Advantages

The UFCx kernel signature is unchanged.
Existing custom_data pointers are passed through unchanged.
The implementation is small: the assembler only writes one additional integer
into data that is already passed to the kernel.
The runtime cost is negligible and comparable to option 2.

It is easy for a runtime quadrature kernel to use:

// Cell integrals: entity_local_index[0]
// Exterior facets/entities: entity_local_index[1]
// Interior facets: entity_local_index[2]
std::int32_t index = entity_local_index[loop_index_position];
auto* data = static_cast<RuntimeQuadratureData*>(custom_data);
auto& rule = data->rules[index];

Disadvantages

Existing generated kernels may assume a particular size or meaning for
entity_local_index.
It is less self-documenting than a named context struct.
It only provides an index. More complex lookup logic still has to live inside
the object behind custom_data.
If future UFCx changes need more runtime context, this approach may not scale
as cleanly as a dedicated context object.

This option is a pragmatic way to make cell-dependent custom_data useful now,
with minimal changes to DOLFINx and no UFCx signature change.

Comparison

Criterion	Callback	Context struct	`entity_local_index`
Per-entity overhead	Indirect call	Integer write	Integer write
Keeps raw `custom_data` pointer	Yes, but selected per entity	No, if always wrapped	Yes
Supports complex lookup	Yes	Only through user data	Only through user data
Semantics	Clear callback	Clear context	Existing argument is overloaded
Extensibility	Good for lookup policy	Good if ABI is defined	Limited

Why this PR implements option 3

Option 3 is implemented because it solves the immediate runtime quadrature
problem with the smallest API and ABI surface:

The kernel signature stays the same.
Existing custom_data storage remains valid.
DOLFINx does not need to store or bind a per-entity callback.
The assembler already passes entity_local_index to the kernel, so the change
is localized to how that array is populated.

This makes it a useful first implementation for discussion. It demonstrates that
cell-dependent runtime data can be selected inside the kernel without requiring
quadrature data to be globally constant across all cells.

Proposed discussion point

This PR chooses option 3 because it is the least invasive way to support
cell-dependent runtime quadrature data. However, option 2 may be semantically
cleaner if DOLFINx wants custom_data to become a general runtime context and Option 1 is the most flexible but adds API complexity and an indirect call in the assembly loop.

I would be grateful for your opinions and input as the cell based custom_data feature is essential for my work on runtime quadrature rules.

Replace void* with nullptr pattern with std::optional<void*> for custom_data in the Form class and assembly functions. This is the canonical modern C++ approach for representing optional values. Changes: - Form.h: integral_data::custom_data now uses std::optional<void*> - Form.h: custom_data() returns std::optional<void*> - Form.h: set_custom_data() accepts std::optional<void*> - assemble_*_impl.h: Function parameters use std::optional<void*> - assemble_*_impl.h: Kernel calls use .value_or(nullptr) - Python bindings: Handle std::optional with None support - Test: Update to expect None instead of 0 for unset custom_data Benefits: - Clearer intent: "optional value" vs "magic nullptr" - Type safety: Distinguishes "no value" from "null pointer value" - Pythonic: Returns None instead of 0 when not set - Zero-cost: .value_or(nullptr) compiles to same code

Address review comments to maintain Form immutability: - Remove set_custom_data method from Form class - Pass custom_data as 5th element of integrals tuple at Form creation - Integrals tuple format: (id, kernel_ptr, entities, coeffs, custom_data) - custom_data can be None (maps to std::nullopt) or a pointer (uintptr_t) Pass cell index through entity_local_index for cell integrals: - Cell integrals now receive &cell instead of nullptr for entity_local_index - Enables per-cell data lookup in custom kernels via custom_data - FFCx-generated kernels don't read entity_local_index for cells (return 0) Safety documentation: - Add warning about void pointer safety to Form.h and Python bindings - Document that users must keep data alive while Form is in use Update tests: - Update test_custom_jit_kernels.py to use 5-element tuples - Expand test_custom_data.py with per-cell material property example

…ata_to_assembler # Conflicts: # cpp/dolfinx/fem/assemble_matrix_impl.h # cpp/dolfinx/fem/assemble_vector_impl.h # python/dolfinx/wrappers/fem.cpp

jhale · 2026-05-12T14:07:43Z

I'm not sure I agree with your LLMs reasoning here - best jump in a thread on #development before continuing.

sclaus2 · 2026-05-12T15:41:25Z

I am very open to suggestions. I wrote a short message in #development .

sclaus2 and others added 20 commits December 1, 2025 14:01

Pass custom_data through assemblers to kernel functions.

fa84113

add None as fifth argument to integrals

c073221

add runtime quadrature rule test

8ea554b

Make sure that just custom_data is passed. No other changes

f62e7de

fix ruff and add ffcx branch for custom_data ptr casting

310f93e

fix clang-format

2bfcf1a

fix docs

0864699

fix clang

709b0f4

more doc fixes

dd72a49

Fix tests in parallel

487fb2e

Merge branch 'main' into sclaus2/pass_custom_data_to_assembler

0f7d14b

Merge branch 'main' into sclaus2/pass_custom_data_to_assembler

f7a5ac0

Merge remote-tracking branch 'origin/main' into sclaus2/pass_custom_d…

9f90f32

…ata_to_assembler # Conflicts: # cpp/dolfinx/fem/assemble_matrix_impl.h # cpp/dolfinx/fem/assemble_vector_impl.h # python/dolfinx/wrappers/fem.cpp

fix ruff check

084a5da

Format custom data expression helper

c5df0c3

clang formatting

bf90fc4

fix import dolfinx in test_custom_data

c2bd5c4

pass loop index to integration kernel for runtime integration

ab1ff71

sclaus2 requested review from chrisrichardson, jhale, jorgensd and schnellerhase May 12, 2026 12:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Pass entity index to integration kernel for cell-dependent custom data#4210

Pass entity index to integration kernel for cell-dependent custom data#4210
sclaus2 wants to merge 20 commits into
mainfrom
sclaus2/pass_local_index_to_integration_kernel

sclaus2 commented May 12, 2026

Uh oh!

jhale commented May 12, 2026

Uh oh!

sclaus2 commented May 12, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

sclaus2 commented May 12, 2026

Pass assembly loop index to UFCx kernels for cell-dependent custom data

Motivation

Option 1: Per-entity callback std::function<void*(std::int32_t loop_index)> custom_data

Advantages

Disadvantages

Option 2: Context struct passed through custom_data

Advantages

Disadvantages

Option 3: Pass the loop index through entity_local_index

Advantages

Disadvantages

Comparison

Why this PR implements option 3

Proposed discussion point

Uh oh!

jhale commented May 12, 2026

Uh oh!

sclaus2 commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Option 1: Per-entity callback `std::function<void*(std::int32_t loop_index)> custom_data`

Option 2: Context struct passed through `custom_data`

Option 3: Pass the loop index through `entity_local_index`

sclaus2 commented May 12, 2026 •

edited

Loading