Adding SYCL bindless image support #612

Open
juanchuletas wants to merge 1 commit into RenderKit:devel from juanchuletas:feature/sycl-bindless-image-support
Conversation

@juanchuletas commented Mar 24, 2026

Description

This PR adds the initial infrastructure for SYCL bindless image support in OSPRay's GPU module. It introduces four virtual functions in DeviceRT for managing hardware texture resources (image memory allocation and sampled image handles), implements them for the SYCL backend, and wires the calls into Texture2D::commit() after MIP generation.

The kernel-side change (replacing software sampling with sample_mipmap in Texture2D_get) is not included in this PR, as it is blocked on a DPC++ compiler upgrade.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactor (code change that neither fixes a bug nor adds a feature)
  • Documentation (changes to docs only)
  • Performance (improves performance)

Changes Made

  • DeviceRT.h: four pure virtual functions (createImageMemHandle, freeImageMemHandle, createSampledImageHandle, freeSampledImageHandle)
  • DeviceRTImpl.h (CPU): no-op stub overrides
  • DeviceRTImpl_sycl.h/.cpp: SYCL implementation with OSPTextureFormat to image_channel_type mapping, alloc_image_mem with mipmap descriptor, per-level ext_oneapi_copy, and sampled image creation with bindless_image_sampler
  • Texture2D.cpp: commit() calls both create functions after MIP pyramid generation
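
As a rough illustration of the list above, the new interface and its CPU stubs might look like the following. This is a minimal sketch with stand-in types: the parameter lists, return types, and the `CpuDeviceRT` and `commitTexture` names are assumptions for illustration, not the signatures used in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types; the real OSPRay and SYCL types differ.
enum class TextureFilter { Nearest, Linear };
struct vec2i { int x, y; };
struct vec2ui { unsigned x, y; };

// Hypothetical shape of the four new DeviceRT entry points.
struct DeviceRT
{
  virtual ~DeviceRT() = default;
  virtual void *createImageMemHandle(
      const void *texels, vec2i size, int mipLevels) = 0;
  virtual void freeImageMemHandle(void *memHandle) = 0;
  virtual std::uint64_t createSampledImageHandle(
      void *memHandle, TextureFilter filter, vec2ui wrapMode) = 0;
  virtual void freeSampledImageHandle(std::uint64_t sampledHandle) = 0;
};

// CPU backend: there is no hardware sampler, so every call is a no-op.
struct CpuDeviceRT final : DeviceRT
{
  void *createImageMemHandle(const void *, vec2i, int) override
  {
    return nullptr;
  }
  void freeImageMemHandle(void *) override {}
  std::uint64_t createSampledImageHandle(
      void *, TextureFilter, vec2ui) override
  {
    return 0;
  }
  void freeSampledImageHandle(std::uint64_t) override {}
};

// Mirrors the order described for Texture2D::commit():
// image memory first, then the sampled handle that refers to it.
inline std::uint64_t commitTexture(
    DeviceRT &rt, const void *texels, vec2i size, int mipLevels)
{
  assert(size.x > 0 && size.y > 0 && mipLevels > 0);
  void *mem = rt.createImageMemHandle(texels, size, mipLevels);
  return rt.createSampledImageHandle(mem, TextureFilter::Linear, vec2ui{0, 0});
}
```

With the CPU stub, `commitTexture` returns 0, so a caller can treat a zero handle as "no hardware texture available" and keep the software sampling path.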

Technical Details

The design separates image_mem_handle (the VRAM allocation, cached and shared across textures that use the same data) from sampled_image_handle (created per texture, carrying the filter and wrap mode for the hardware sampler).

The image_mem_handle is stored in an imageMemCache map keyed on raw_handle for descriptor retrieval when creating the sampled handle. The sampled_image_handle is stored in Texture2DShared::data[0] for kernel access.
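
The split between the two handle kinds can be sketched as follows. All types here (`MemHandle`, `ImageDesc`, `SampledImage`, `ImageMemCache`) are simplified stand-ins rather than the `syclexp` types, and the lookup function is hypothetical; the sketch only shows that two textures can share one memory entry while carrying different sampler state.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-ins for syclexp::image_mem_handle and syclexp::image_descriptor.
struct MemHandle { std::uint64_t raw_handle; };
struct ImageDesc { unsigned width, height, numLevels; };
enum class Filter { Nearest, Linear };

struct ImageMemEntry { MemHandle memHandle; ImageDesc desc; };

// Cache keyed on the raw handle value, used to recover the descriptor later.
class ImageMemCache
{
 public:
  void insert(MemHandle h, ImageDesc d)
  {
    entries[h.raw_handle] = ImageMemEntry{h, d};
  }
  const ImageDesc *findDesc(std::uint64_t raw) const
  {
    auto it = entries.find(raw);
    return it == entries.end() ? nullptr : &it->second.desc;
  }

 private:
  std::unordered_map<std::uint64_t, ImageMemEntry> entries;
};

// Per-texture state: refers to shared memory, owns its own sampler settings.
struct SampledImage { std::uint64_t memRaw; unsigned numLevels; Filter filter; };

inline SampledImage createSampled(
    const ImageMemCache &cache, MemHandle mem, Filter filter)
{
  const ImageDesc *desc = cache.findDesc(mem.raw_handle);
  assert(desc && "memory handle must be registered before sampling");
  return SampledImage{mem.raw_handle, desc->numLevels, filter};
}
```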

Testing

  • Tested locally
  • Visual comparison before/after
  • Performance benchmarked
  • Edge cases verified

Screenshots / Results

Performance Impact

Related Issues

Closes #610

Checklist

  • Code compiles without warnings
  • Code follows project style guidelines
  • Self-reviewed my own code
  • Commented hard-to-understand areas
  • No unnecessary debug code left behind

SYCL implementation for image_mem_handle and sampled_image_handle.
Kernel-side sample_mipmap pending compiler upgrade.
Contributor

@johguenther johguenther left a comment

Looks promising!

Please use clang-format when finished to adjust to the coding style.

sycl::float4 res = syclexp::sample_mipmap<sycl::float4>(
handle,
sycl::float2{st.x, st.y},
0.f);
Contributor

Use calcLambda(pixelFootprint, self->size, filter_nearest) for the (fractional) level instead of the constant 0.f.
Maybe there is also a way to use the anisotropic version of the sample_mipmap function (i.e., calculate Dx/Dy from pixelFootprint), which would be cheaper, since calcLambda uses log.
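
For readers unfamiliar with the level computation being requested, the following is a minimal sketch of a footprint-to-level mapping. It is not OSPRay's `calcLambda`: the function name, the clamping, and the assumption that the footprint is given in normalized texture coordinates are all illustrative, and the `filter_nearest` argument is ignored here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct vec2f { float x, y; };
struct vec2i { int x, y; };

// Hypothetical stand-in for a calcLambda-style helper: maps a pixel footprint
// (assumed to be in normalized texture coordinates) to a fractional MIP level.
inline float mipLevelFromFootprint(
    vec2f pixelFootprint, vec2i size, int numLevels)
{
  assert(numLevels > 0);
  // Footprint extent measured in texels of level 0.
  const float texelsX = pixelFootprint.x * float(size.x);
  const float texelsY = pixelFootprint.y * float(size.y);
  // Below one texel we stay on level 0 (magnification).
  const float texels = std::max(std::max(texelsX, texelsY), 1.0f);
  // One MIP level per doubling of the footprint, clamped to the coarsest level.
  return std::min(std::log2(texels), float(numLevels - 1));
}
```

The `std::log2` call is the cost the comment refers to; a derivative-based variant would hand the footprint to the hardware instead of computing the level in the kernel.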

void *imgMemHandlePtr, const OSPTextureFilter filter, const vec2ui wrapMode)
{
sycl::addressing_mode addressingMode;
switch (wrapMode.x) {
Contributor

OSPRay supports wrapMode per dimension (x and y), so wrapMode.y needs its own addressing mode instead of being ignored in favor of wrapMode.x.
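
One way to handle both dimensions is to map each component separately. The sketch below uses stand-in enums in place of OSPTextureWrapMode and sycl::addressing_mode, and the enumerator values are assumptions; it only illustrates the per-dimension mapping, not the actual sampler setup.

```cpp
#include <array>
#include <cassert>

// Stand-ins: the real code would use OSPTextureWrapMode and
// sycl::addressing_mode. The numeric values here are assumptions.
enum class WrapMode : unsigned { Repeat = 0, MirroredRepeat = 1, ClampToEdge = 2 };
enum class AddressingMode { Repeat, MirroredRepeat, ClampToEdge };
struct vec2ui { unsigned x, y; };

// Maps the wrap mode of a single dimension.
inline AddressingMode toAddressingMode(unsigned wrap)
{
  assert(wrap <= 2 && "unexpected wrap mode");
  switch (static_cast<WrapMode>(wrap)) {
  case WrapMode::MirroredRepeat:
    return AddressingMode::MirroredRepeat;
  case WrapMode::ClampToEdge:
    return AddressingMode::ClampToEdge;
  case WrapMode::Repeat:
  default:
    return AddressingMode::Repeat;
  }
}

// Maps x and y independently so each axis keeps its own mode.
inline std::array<AddressingMode, 2> toAddressingModes(vec2ui wrapMode)
{
  return {{toAddressingMode(wrapMode.x), toAddressingMode(wrapMode.y)}};
}
```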

syclexp::image_mem_handle memHandle;
syclexp::image_descriptor desc;
};
std::unordered_map<void *, ImageMemEntry> imageMemCache;
Contributor

This extra cache just to remember the imgDesc seems a bit heavy. Just pass vec2i size again to createSampledImageHandle (then all information is present to locally re-create an imgDesc).
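
The cache-free alternative could look roughly like this. `ImageDesc` is a stand-in for the real descriptor (which also carries channel information, omitted here), and the helper names and the assumption of a full MIP chain down to 1x1 are illustrative only.

```cpp
#include <algorithm>
#include <cassert>

struct vec2i { int x, y; };
// Stand-in for syclexp::image_descriptor; channel info is left out.
struct ImageDesc { unsigned width, height, numLevels; };

// Assumes a full MIP chain that halves the larger extent down to 1.
inline unsigned fullMipLevels(vec2i size)
{
  assert(size.x > 0 && size.y > 0);
  unsigned levels = 1;
  for (int extent = std::max(size.x, size.y); extent > 1; extent /= 2)
    levels++;
  return levels;
}

// Rebuilds the descriptor from the size alone, so no lookup table is needed.
inline ImageDesc makeImageDesc(vec2i size)
{
  return ImageDesc{unsigned(size.x), unsigned(size.y), fullMipLevels(size)};
}
```

Since the descriptor is a pure function of its inputs, rebuilding it at the call site removes the map and the lifetime bookkeeping that goes with it.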

Development

Successfully merging this pull request may close these issues.

Accelerate textures of the GPU module

2 participants