Skip to content

Pass entity index to integration kernel for cell-dependent custom data#4210

Open
sclaus2 wants to merge 20 commits into
mainfrom
sclaus2/pass_local_index_to_integration_kernel
Open

Pass entity index to integration kernel for cell-dependent custom data#4210
sclaus2 wants to merge 20 commits into
mainfrom
sclaus2/pass_local_index_to_integration_kernel

Conversation

@sclaus2
Copy link
Copy Markdown
Contributor

@sclaus2 sclaus2 commented May 12, 2026

Pass assembly loop index to UFCx kernels for cell-dependent custom data

Motivation

This PR is to start a discussion about how to pass the current entity index to the integration kernel. This PR is an extension of PR #4013

Currently the entity index is not exposed to the integration kernels which makes the use of cell based custom_data impossible. Cell based custom_data is for example essential in runtime quadrature, where the quadrature rule varies from entity to entity.

I see three natural ways to provide this context:

  1. Store a callback in the form and let the assembler ask the callback for the
    data pointer for the current loop entry.
  2. Wrap the user-provided data pointer in a small context object that also
    stores the current loop index.
  3. Pass the current loop index through the existing entity_local_index
    argument of the UFCx kernel.

This PR implements option 3. The implementation is intentionally small and keeps
the UFCx kernel signature unchanged, but the choice is not obvious. The rest of
this note compares the options.

Option 1: Per-entity callback std::function<void*(std::int32_t loop_index)> custom_data

The form could store a callable object, for example:

std::function<void*(std::int32_t loop_index)> custom_data;

The assembler would call this function for each entity before invoking the UFCx
kernel:

for (std::int32_t index = 0; index < num_entities; ++index)
{
  void* data = custom_data(index);
  kernel(A.data(), coeffs, constants, coordinate_dofs, entity_local_index,
         quadrature_permutation, data);
}

The kernel continues to receive a plain void*. The pointer can refer directly
to the data for the current entity, so the kernel does not need to know about the
global layout of the runtime data structure.

Advantages

  • Very flexible. The callback can use arrays, maps, caches, or different data sources for different entities.
  • The UFCx kernel signature is unchanged.
  • The kernel sees exactly the data for the current entity, which keeps kernel
    code simple.
  • It naturally supports cases where the data is not stored in a dense array
    indexed by the assembly loop.

Disadvantages

  • It adds an indirect function call inside the innermost assembly loop.
  • It is harder for the compiler to inline and optimize than direct indexing.
  • It requires a new callback-bearing API in C++ and corresponding Python binding
    decisions.
  • Ownership and lifetime rules become more complicated, especially with Python
    objects being involved.
  • It may be more general than the common runtime quadrature use case needs.

Option 2: Context struct passed through custom_data

The assembler could wrap the user pointer in a small DOLFINx-owned context
object:

struct CustomDataContext
{
  void* data;
  std::int32_t loop_index;
};

The assembler would update the index before each kernel call:

CustomDataContext ctx{user_custom_data, 0};

for (std::int32_t index = 0; index < num_entities; ++index)
{
  ctx.loop_index = index;
  kernel(A.data(), coeffs, constants, coordinate_dofs, entity_local_index,
         quadrature_permutation, &ctx);
}

The kernel would cast custom_data to this context type and then use
ctx->loop_index to select the correct runtime data:

auto* ctx = static_cast<CustomDataContext*>(custom_data);
auto* data = static_cast<RuntimeQuadratureData*>(ctx->data);
auto& rule = data->rules[ctx->loop_index];

Advantages

  • The UFCx kernel signature is unchanged.
  • The meaning is explicit: custom_data becomes a context containing both the
    user data pointer and the current loop index.
  • The per-entity overhead is small, just one integer write before the kernel
    call.
  • It is extensible. Additional fields such as the mesh cell id, entity type, or
    number of entities could be added later if DOLFINx defines a stable ABI for
    the context.
  • It avoids an indirect callback call in the assembly loop.

Disadvantages

  • It changes the interpretation of custom_data. Kernels would need to know
    whether they receive the raw user pointer or a DOLFINx context wrapper.
  • A public context layout would have to be specified and kept stable.
  • Existing custom kernels that directly cast custom_data to their own type
    would not automatically work with wrapped data unless wrapping is opt-in or
    otherwise carefully designed.
  • The lifetime of the context object must be clear. A stack-allocated context is
    fine for synchronous assembly, but the pointer must not escape the kernel
    call.
  • It introduces a new DOLFINx-specific convention on top of the generic UFCx
    void*.

Option 3: Pass the loop index through entity_local_index

This PR implements the least invasive option: keep passing the user
custom_data pointer unchanged, and pass the current assembly loop index through
the existing entity_local_index kernel argument.

For cell integrals, entity_local_index is not otherwise needed by the kernel,
so it can point directly to the current loop index:

for (std::int32_t index = 0; index < num_cells; ++index)
{
  std::int32_t cell = cells[index];
  std::int32_t entity_local_index = index;

  kernel(A.data(), &coeffs(index, 0), constants.data(), coordinate_dofs.data(),
         &entity_local_index, quadrature_permutation, custom_data);
}

For exterior facet, vertex, and ridge integrals, DOLFINx currently passes a
pointer to the local entity index. The loop index can be attached efficiently by
passing a small stack array that keeps the existing local entity index in the
first entry and adds the assembly loop index in the second entry:

for (std::int32_t index = 0; index < num_entities; ++index)
{
  std::int32_t cell = entities(index, 0);
  std::int32_t local_entity = entities(index, 1);
  std::array<std::int32_t, 2> entity_local_index = {local_entity, index};

  kernel(A.data(), &coeffs(index, 0), constants.data(), coordinate_dofs.data(),
         entity_local_index.data(), quadrature_permutation, custom_data);
}

For interior facets, DOLFINx currently passes the two local facet indices, one
for each adjacent cell. The loop index can be appended:

for (std::int32_t index = 0; index < num_facets; ++index)
{
  std::array<std::int32_t, 3> entity_local_index = {
      local_facet[0], local_facet[1], index};

  kernel(A.data(), &coeffs(index, 0, 0), constants.data(),
         coordinate_dofs.data(), entity_local_index.data(),
         quadrature_permutation, custom_data);
}

Advantages

  • The UFCx kernel signature is unchanged.

  • Existing custom_data pointers are passed through unchanged.

  • The implementation is small: the assembler only writes one additional integer
    into data that is already passed to the kernel.

  • The runtime cost is negligible and comparable to option 2.

  • It is easy for a runtime quadrature kernel to use:

    // Cell integrals: entity_local_index[0]
    // Exterior facets/entities: entity_local_index[1]
    // Interior facets: entity_local_index[2]
    std::int32_t index = entity_local_index[loop_index_position];
    auto* data = static_cast<RuntimeQuadratureData*>(custom_data);
    auto& rule = data->rules[index];

Disadvantages

  • Existing generated kernels may assume a particular size or meaning for
    entity_local_index.
  • It is less self-documenting than a named context struct.
  • It only provides an index. More complex lookup logic still has to live inside
    the object behind custom_data.
  • If future UFCx changes need more runtime context, this approach may not scale
    as cleanly as a dedicated context object.

This option is a pragmatic way to make cell-dependent custom_data useful now,
with minimal changes to DOLFINx and no UFCx signature change.

Comparison

Criterion Callback Context struct entity_local_index
Per-entity overhead Indirect call Integer write Integer write
Keeps raw custom_data pointer Yes, but selected per entity No, if always wrapped Yes
Supports complex lookup Yes Only through user data Only through user data
Semantics Clear callback Clear context Existing argument is overloaded
Extensibility Good for lookup policy Good if ABI is defined Limited

Why this PR implements option 3

Option 3 is implemented because it solves the immediate runtime quadrature
problem with the smallest API and ABI surface:

  • The kernel signature stays the same.
  • Existing custom_data storage remains valid.
  • DOLFINx does not need to store or bind a per-entity callback.
  • The assembler already passes entity_local_index to the kernel, so the change
    is localized to how that array is populated.

This makes it a useful first implementation for discussion. It demonstrates that
cell-dependent runtime data can be selected inside the kernel without requiring
quadrature data to be globally constant across all cells.

Proposed discussion point

This PR chooses option 3 because it is the least invasive way to support
cell-dependent runtime quadrature data. However, option 2 may be semantically
cleaner if DOLFINx wants custom_data to become a general runtime context and Option 1 is the most flexible but adds API complexity and an indirect call in the assembly loop.

I would be grateful for your opinions and input as the cell based custom_data feature is essential for my work on runtime quadrature rules.

sclaus2 and others added 20 commits December 1, 2025 14:01
Replace void* with nullptr pattern with std::optional<void*> for
custom_data in the Form class and assembly functions. This is the
canonical modern C++ approach for representing optional values.

Changes:
- Form.h: integral_data::custom_data now uses std::optional<void*>
- Form.h: custom_data() returns std::optional<void*>
- Form.h: set_custom_data() accepts std::optional<void*>
- assemble_*_impl.h: Function parameters use std::optional<void*>
- assemble_*_impl.h: Kernel calls use .value_or(nullptr)
- Python bindings: Handle std::optional with None support
- Test: Update to expect None instead of 0 for unset custom_data

Benefits:
- Clearer intent: "optional value" vs "magic nullptr"
- Type safety: Distinguishes "no value" from "null pointer value"
- Pythonic: Returns None instead of 0 when not set
- Zero-cost: .value_or(nullptr) compiles to same code
Address review comments to maintain Form immutability:
- Remove set_custom_data method from Form class
- Pass custom_data as 5th element of integrals tuple at Form creation
- Integrals tuple format: (id, kernel_ptr, entities, coeffs, custom_data)
- custom_data can be None (maps to std::nullopt) or a pointer (uintptr_t)

Pass cell index through entity_local_index for cell integrals:
- Cell integrals now receive &cell instead of nullptr for entity_local_index
- Enables per-cell data lookup in custom kernels via custom_data
- FFCx-generated kernels don't read entity_local_index for cells (return 0)

Safety documentation:
- Add warning about void pointer safety to Form.h and Python bindings
- Document that users must keep data alive while Form is in use

Update tests:
- Update test_custom_jit_kernels.py to use 5-element tuples
- Expand test_custom_data.py with per-cell material property example
…ata_to_assembler

# Conflicts:
#	cpp/dolfinx/fem/assemble_matrix_impl.h
#	cpp/dolfinx/fem/assemble_vector_impl.h
#	python/dolfinx/wrappers/fem.cpp
@jhale
Copy link
Copy Markdown
Member

jhale commented May 12, 2026

I'm not sure I agree with your LLMs reasoning here - best jump in a thread on #development before continuing.

@sclaus2
Copy link
Copy Markdown
Contributor Author

sclaus2 commented May 12, 2026

I am very open to suggestions. I wrote a short message in #development .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants