Solari v7#24767
Conversation
|
You added a new example but didn't add metadata for it. Please update the root Cargo.toml file. |
| // throughput_past_x1 drops the primary brdf*cos, so multiply it back by primary_brdf_at_x2 to | ||
| // recover the bounded throughput. | ||
| let full_throughput = path.throughput_past_x1 * max(path.primary_brdf_at_x2, vec3(0.0001)); | ||
| let rr = saturate(luminance(full_throughput)); |
There was a problem hiding this comment.
Saturation shouldn't be required, RR will fail anyways without
There was a problem hiding this comment.
Don't think this is true - because otherwise when you divide throughput by rr, you could divide by a value >1, even though the path will always be kept.
I fixed this just now in pathtracer.wgsl to make it consistent though.
stuartparmenter
left a comment
There was a problem hiding this comment.
The math looks right, but given the complexity and to double check it all I ran a small simulation against using your math to check for bias and it looks good after your fixes. I also wired it in to a number of example scenes (bistro, etc) and it looks better / don't see any visual regressions.
| let merge_result = merge_reservoirs(initial_reservoir, surface.world_position, surface.world_normal, surface.material, | ||
| temporal.reservoir, temporal.world_position, temporal.world_normal, temporal.material, previous_camera_world_position, false, &rng); | ||
|
|
||
| reservoirs_b[pixel_index] = merge_result.merged_reservoir; |
There was a problem hiding this comment.
Have you tried storing only the pixel index of the sample (either of the temporal neighbor or center pixel, whichever is selected) here instead of the full reservoir? Because when reusing you basically just move samples between pixels so instead of rewriting the whole sample from one pixel to another you could just keep track of the pixel index.
In the spatial pass you would read from reservoirs_b[temporal_output_pixels_indices[pixel_index]]
This can save a bit of bandwith by not writing out full reservoirs in the temporal pass
There was a problem hiding this comment.
And you wouldn't have to keep the sample of the merged_reservoir in registers either
| reservoirs_b[pixel_index] = initial.reservoir; | ||
| } | ||
|
|
||
| @compute @workgroup_size(8, 8, 1) |
There was a problem hiding this comment.
Pretty much everywhere you dispatch your kernels in (8, 8) 2D blocks
Then you compute your pixel index as let pixel_index = global_id.x + global_id.y * u32(view.main_pass_viewport.z);
Visualizing one 44 workgroup, with warps size 8 (which, for demonstrating my point, is equivalent to your 88 with warps size 32, but 4*4 is easier to draw below), this means that your warps look like this on your framebuffer:
[ w0 w0 w0 w0]
[ w0 w0 w0 w0]
[ w1 w1 w1 w1]
[ w1 w1 w1 w1]
This halves coalescing in the 44 scenario, and divides by 4 coalescing in the real-usecase 88 + warps 32, so that's 2x or 4x more memory transactions to send respectively. Maybe thread swizzling could help here such that 32 consecutive threads read 32 consecutive addresses in the gbuffer, instead of jumping entire rows because of the 2D workgroups
There was a problem hiding this comment.
I've read somewhere in another path tracer (iirc it was this one) that dropping textures for the gbuffers and using raw buffers + coalesced accesses was faster
| weight_sum: f32, | ||
| unbiased_contribution_weight: f32, | ||
| radiance: vec3<f32>, | ||
| confidence_weight: f32, |
There was a problem hiding this comment.
Does the confidence have to be a f32? could be a u8?
| // MIS weight against the bounce-0 BRDF-emissive strategy, recomputed from this surface's | ||
| // brdf and material rather than baked into the unbiased contribution weight at generation. Mirrors the bounce-0 | ||
| // nee_mis_weight in generate_initial_reservoir, which puts the same factor in the target. | ||
| var nee_mis_weight = 1.0; |
| let delta = reservoir.sample_point_world_position - (world_position + world_normal * RAY_T_MIN); | ||
| let sample_distance = length(delta); | ||
| let wi = delta / sample_distance; | ||
| var brdf_radiance = reservoir.radiance * evaluate_brdf(wo, wi, world_normal, material, F_ab); |
There was a problem hiding this comment.
This is re-evaluating the target function as if the BSDF at vertex x2 didn't change. For specular this is not going to be great as you'll potentially be reconnecting to mirror (or very specular) x2 surfaces.
Maybe storing some sort of "x2_rough_enough" flag in the reservoir at initial candidate generation time could help reject bad reconnections like that
| } | ||
|
|
||
| fn load_spatial_reservoir(pixel_id: vec2<u32>, depth: f32, world_position: vec3<f32>, world_normal: vec3<f32>, rng: ptr<function, u32>) -> NeighborInfo { | ||
| for (var i = 0u; i < 5u; i++) { |
There was a problem hiding this comment.
Have you tried the 2026 compatibility guided ReSTIR paper? It should be a fairly simple addition in here and could be good for quality, idk about the perf, have to try
|
|
||
| fn load_spatial_reservoir(pixel_id: vec2<u32>, depth: f32, world_position: vec3<f32>, world_normal: vec3<f32>, rng: ptr<function, u32>) -> NeighborInfo { | ||
| for (var i = 0u; i < 5u; i++) { | ||
| let spatial_pixel_id = get_neighbor_pixel_id(pixel_id, SPATIAL_REUSE_RADIUS_PIXELS, rng); |
There was a problem hiding this comment.
Have you tried using the same rng for all threads of the warp to improve memory coalescing?
If correlations are too bad, using the same rng only for half the warp (so 2 different rng seeds per warp) could help the correlations, or 4 seeds per warp, etc... trading perf for correlations.
This is a major rewrite of Solari.
The old version did DI, diffuse GI, and specular GI / mirror DI in 3 separate passes. Only diffuse rays were resampled, and no rays were traced for cross-domain MIS weight, leading to bias.
This new version is one unified algorithm that handles diffuse and specular DI and GI all in one reservoir. Resampling takes the full BRDF into account, as well as cross-domain visibility in MIS weights.
Advantages:
Disadvantages:
Showcase
Reference





This PR (with DLSS)
Main (with DLSS)
This PR
Main