Fix ACT layer gradient computation on CUDA (#3128)

Move effective_weights accumulation into update_act_state() and
finalize_act_output() kernels so that the weights used in backward()
match the actual forward pass computation.

Previously, true_effective_weights_ was computed on the host using
remainders_/cumulative_halting_ values that became stale after CUDA
kernels updated them on the device. This caused gradient mismatches
in test_layer() on GPU builds.
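
The idea behind the fix can be sketched on the host in plain C++. This is only an illustration under stated assumptions, not the code touched by this commit: the struct, function names, and halting rule below are hypothetical and follow the usual Adaptive Computation Time formulation, where each step's weight is its halting probability and the halting step receives the remaining mass. The point it shows is that the per-step weight is recorded by the same routine that mutates the halting state, so nothing has to re-derive it later from values that may have changed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-position ACT state; field names are illustrative only.
struct ActState {
    float cumulative_halting = 0.0f;  // sum of halting probabilities so far
    float remainder = 1.0f;           // probability mass not yet assigned
    bool halted = false;
};

// Advance one position by one step and return the weight the forward pass
// applies to that step's output. The weight is produced together with the
// state update, so it always reflects the state that was actually used.
inline float update_state_and_weight(ActState& s, float halt_prob, float threshold) {
    if (s.halted) return 0.0f;  // halted positions contribute nothing further
    if (s.cumulative_halting + halt_prob >= threshold) {
        const float w = s.remainder;  // halting step takes the leftover mass
        s.cumulative_halting += w;
        s.remainder = 0.0f;
        s.halted = true;
        return w;
    }
    s.cumulative_halting += halt_prob;
    s.remainder -= halt_prob;
    return halt_prob;
}

// Run every step for one position, storing the per-step weights as it goes.
// If the step budget runs out before halting, the leftover mass is folded
// into the last step (assumed here to be the role of a finalize pass).
inline std::vector<float> run_position(const std::vector<float>& halt_probs,
                                       float threshold) {
    ActState s;
    std::vector<float> weights;
    weights.reserve(halt_probs.size());
    for (float h : halt_probs)
        weights.push_back(update_state_and_weight(s, h, threshold));
    if (!s.halted && !weights.empty())
        weights.back() += s.remainder;
    return weights;
}
```

A backward pass that reads the stored weights sees exactly what the forward pass applied. Recomputing them afterwards from a separate host copy of the state would only agree if that copy were refreshed after every device-side update.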