I cannot find a method to register a hook. Is there another way this can be accomplished?
```python
for step_i in range(unroll_steps):
    states = model.represent(states)
    # Scale the gradient flowing back into the latent states by 0.5.
    states.register_hook(lambda grad: grad * 0.5)
    # Get the loss...
    # register_hook takes a callable, so the scale factor is wrapped in a lambda:
    loss.register_hook(lambda grad: grad * (1.0 / unroll_steps))
optimizer.zero_grad()
loss.backward()
```
The above is an example in Python.
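If no hook-registration method is available, the same effect can be had without hooks at all, via the `detach` trick that MuZero's own pseudocode uses for `scale_gradient`: the forward value is unchanged, but only the scaled term carries gradient. A minimal PyTorch sketch:

```python
import torch

def scale_gradient(t: torch.Tensor, scale: float) -> torch.Tensor:
    # Forward pass returns the same value as `t`; in the backward pass,
    # only the `t * scale` term receives gradient, so the incoming
    # gradient is effectively multiplied by `scale`.
    return t * scale + t.detach() * (1.0 - scale)

x = torch.ones(3, requires_grad=True)
scale_gradient(x, 0.5).sum().backward()
print(x.grad)  # tensor([0.5000, 0.5000, 0.5000])
```

Since TorchSharp exposes `Tensor.detach()`, this should translate to something like `states = states * scale + states.detach() * (1 - scale)` in C# (untested assumption on my part), which would sidestep the need for `register_hook` entirely.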
I believe I can do the loss gradient like this:
```csharp
List<Tensor> grads = [torch.tensor(1.0f / unroll_steps)];
loss.backward(grads);
```
But I am still unsure how to scale the gradient of the 'states' tensor. Any help would be appreciated.
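For what it's worth, seeding `backward` with a gradient tensor does scale every upstream gradient linearly, which is equivalent to scaling the loss itself. A quick PyTorch check (assuming the TorchSharp overload behaves the same, since it wraps the same libtorch call):

```python
import torch

unroll_steps = 4
w = torch.tensor(2.0, requires_grad=True)
loss = w * 3.0

# Passing 1/unroll_steps as the initial gradient scales d(loss)/dw linearly:
# w.grad becomes 3.0 * (1/4) instead of 3.0.
loss.backward(torch.tensor(1.0 / unroll_steps))
print(w.grad)  # tensor(0.7500)
```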
Just for background context: this is an implementation of MuZero. In their code, they scale the gradients of the future latent states non-linearly. That scaling happens in the rollout loop, prior to the loss gradient, which is scaled linearly.
- OS: Windows
- Package Type: torchsharp-cuda-windows
- Version: 0.105.0