Skip to content

Solari v7#24767

Open
JMS55 wants to merge 16 commits into
bevyengine:mainfrom
JMS55:solari7-unified-restir
Open

Solari v7#24767
JMS55 wants to merge 16 commits into
bevyengine:mainfrom
JMS55:solari7-unified-restir

Conversation

@JMS55

@JMS55 JMS55 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

This is a major rewrite of Solari.

The old version did DI, diffuse GI, and specular GI / mirror DI in 3 separate passes. Only diffuse rays were resampled, and no rays were traced for cross-domain MIS weight, leading to bias.

This new version is one unified algorithm that handles diffuse and specular DI and GI all in one reservoir. Resampling takes the full BRDF into account, as well as cross-domain visibility in MIS weights.

Advantages:

  • Much less biased
  • Much less prone to GI light-leaks
  • Improved specular GI quality
  • Improved DI quality for nearby lights thanks to BRDF sampling for DI
  • Less memory usage

Disadvantages:

  • A bit slower due to the extra rays traced

Showcase

Reference
image-6-fixed
This PR (with DLSS)
image-7-fixed
Main (with DLSS)
image-9-fixed
This PR
image-8-fixed
Main
image-10-fixed

@JMS55 JMS55 added this to the 0.20 milestone Jun 26, 2026
@JMS55 JMS55 added A-Rendering Drawing game state to the screen D-Complex Quite challenging from either a design or technical perspective. Ask for help! S-Needs-Review Needs reviewer attention (from anyone!) to move forward C-Refinement Improves output quality, without fixing a clear bug or adding new functionality. labels Jun 26, 2026
@github-project-automation github-project-automation Bot moved this to Needs SME Triage in Rendering Jun 26, 2026
@JMS55 JMS55 requested a review from SparkyPotato June 26, 2026 18:36
@JMS55 JMS55 requested a review from IceSentry June 28, 2026 02:56
@github-actions

Copy link
Copy Markdown
Contributor

You added a new example but didn't add metadata for it. Please update the root Cargo.toml file.

Comment thread crates/bevy_solari/src/realtime/world_cache_query.wgsl
Comment thread crates/bevy_solari/src/realtime/world_cache_query.wgsl Outdated
Comment thread crates/bevy_solari/src/realtime/mod.rs Outdated
Comment thread crates/bevy_solari/src/realtime/initial_path.wgsl Outdated
// throughput_past_x1 drops the primary brdf*cos, so multiply it back by primary_brdf_at_x2 to
// recover the bounded throughput.
let full_throughput = path.throughput_past_x1 * max(path.primary_brdf_at_x2, vec3(0.0001));
let rr = saturate(luminance(full_throughput));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saturation shouldn't be required, RR will fail anyways without

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this is true - because otherwise when you divide throughput by rr, you could divide by a value >1, even though the path will always be kept.

I fixed this just now in pathtracer.wgsl to make it consistent though.

Comment thread crates/bevy_solari/src/realtime/initial_path.wgsl Outdated
Comment thread crates/bevy_solari/src/realtime/initial_path.wgsl Outdated
Comment thread crates/bevy_solari/src/realtime/initial_path.wgsl Outdated
Comment thread crates/bevy_solari/src/realtime/initial_path.wgsl
@JMS55 JMS55 requested a review from SparkyPotato June 28, 2026 17:39

@stuartparmenter stuartparmenter left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The math looks right, but given the complexity and to double check it all I ran a small simulation against using your math to check for bias and it looks good after your fixes. I also wired it in to a number of example scenes (bistro, etc) and it looks better / don't see any visual regressions.

let merge_result = merge_reservoirs(initial_reservoir, surface.world_position, surface.world_normal, surface.material,
temporal.reservoir, temporal.world_position, temporal.world_normal, temporal.material, previous_camera_world_position, false, &rng);

reservoirs_b[pixel_index] = merge_result.merged_reservoir;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried storing only the pixel index of the sample (either of the temporal neighbor or center pixel, whichever is selected) here instead of the full reservoir? Because when reusing you basically just move samples between pixels so instead of rewriting the whole sample from one pixel to another you could just keep track of the pixel index.

In the spatial pass you would read from reservoirs_b[temporal_output_pixels_indices[pixel_index]]

This can save a bit of bandwith by not writing out full reservoirs in the temporal pass

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you wouldn't have to keep the sample of the merged_reservoir in registers either

reservoirs_b[pixel_index] = initial.reservoir;
}

@compute @workgroup_size(8, 8, 1)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much everywhere you dispatch your kernels in (8, 8) 2D blocks
Then you compute your pixel index as let pixel_index = global_id.x + global_id.y * u32(view.main_pass_viewport.z);

Visualizing one 44 workgroup, with warps size 8 (which, for demonstrating my point, is equivalent to your 88 with warps size 32, but 4*4 is easier to draw below), this means that your warps look like this on your framebuffer:

[ w0 w0 w0 w0]
[ w0 w0 w0 w0]
[ w1 w1 w1 w1]
[ w1 w1 w1 w1]

This halves coalescing in the 44 scenario, and divides by 4 coalescing in the real-usecase 88 + warps 32, so that's 2x or 4x more memory transactions to send respectively. Maybe thread swizzling could help here such that 32 consecutive threads read 32 consecutive addresses in the gbuffer, instead of jumping entire rows because of the 2D workgroups

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read somewhere in another path tracer (iirc it was this one) that dropping textures for the gbuffers and using raw buffers + coalesced accesses was faster

weight_sum: f32,
unbiased_contribution_weight: f32,
radiance: vec3<f32>,
confidence_weight: f32,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the confidence have to be a f32? could be a u8?

// MIS weight against the bounce-0 BRDF-emissive strategy, recomputed from this surface's
// brdf and material rather than baked into the unbiased contribution weight at generation. Mirrors the bounce-0
// nee_mis_weight in generate_initial_reservoir, which puts the same factor in the target.
var nee_mis_weight = 1.0;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there NEE MIS stuff here?

let delta = reservoir.sample_point_world_position - (world_position + world_normal * RAY_T_MIN);
let sample_distance = length(delta);
let wi = delta / sample_distance;
var brdf_radiance = reservoir.radiance * evaluate_brdf(wo, wi, world_normal, material, F_ab);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is re-evaluating the target function as if the BSDF at vertex x2 didn't change. For specular this is not going to be great as you'll potentially be reconnecting to mirror (or very specular) x2 surfaces.

Maybe storing some sort of "x2_rough_enough" flag in the reservoir at initial candidate generation time could help reject bad reconnections like that

}

fn load_spatial_reservoir(pixel_id: vec2<u32>, depth: f32, world_position: vec3<f32>, world_normal: vec3<f32>, rng: ptr<function, u32>) -> NeighborInfo {
for (var i = 0u; i < 5u; i++) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried the 2026 compatibility guided ReSTIR paper? It should be a fairly simple addition in here and could be good for quality, idk about the perf, have to try


fn load_spatial_reservoir(pixel_id: vec2<u32>, depth: f32, world_position: vec3<f32>, world_normal: vec3<f32>, rng: ptr<function, u32>) -> NeighborInfo {
for (var i = 0u; i < 5u; i++) {
let spatial_pixel_id = get_neighbor_pixel_id(pixel_id, SPATIAL_REUSE_RADIUS_PIXELS, rng);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried using the same rng for all threads of the warp to improve memory coalescing?

If correlations are too bad, using the same rng only for half the warp (so 2 different rng seeds per warp) could help the correlations, or 4 seeds per warp, etc... trading perf for correlations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Rendering Drawing game state to the screen C-Refinement Improves output quality, without fixing a clear bug or adding new functionality. D-Complex Quite challenging from either a design or technical perspective. Ask for help! S-Needs-Review Needs reviewer attention (from anyone!) to move forward

Projects

Status: Needs SME Triage

Development

Successfully merging this pull request may close these issues.

4 participants