feat(scalarization): Add readme#749
Conversation
PierreQuinton
left a comment
There was a problem hiding this comment.
So think that if this is a good idea to add, then it should be dev oriented only. Here it looks like documentation, which it should not be. The way I view this would be to explain how to create scalarizers, what to be careful about (when are they stateful, avoid non-atomic things, etc...), what are not scalarizers? etc...
What do you think of this?
|
Yea I will make it dev oriented |
|
|
||
| - **Any shape in, scalar out:** it reduces over *all* elements of `values` (0-dim, vector, matrix, | ||
| higher-dim) into a 0-dim scalar. | ||
| - **`values`, not `losses`:** a scalarizer is generic and not tied to losses. |
There was a problem hiding this comment.
I think we may want to speak more about the abstraction level rather than saying that. Not sure how though.
There was a problem hiding this comment.
rewrote it around the abstraction
| Anything that needs the model, its parameters, or the per-task gradients belongs in the | ||
| [aggregation](../aggregation) package as a `Weighting` / `Aggregator`, which operates on the Jacobian | ||
| or its Gramian. If you reach for gradient norms or the network inside `forward`, you are writing an | ||
| aggregator, not a scalarizer. |
There was a problem hiding this comment.
Well the aggregation package should not contain anything else than Scalarizers, it contains aggregators (that could be stateful).
There was a problem hiding this comment.
Fixed. It now says the gradient-level counterpart in the aggregation package is an Aggregator (which, like a scalarizer, can be stateful) that operates on the Jacobian or its Gramian.
| - **Subclass `Stateful`** (`from torchjd._mixins import Stateful`) and implement `reset()` to restore | ||
| the initial state. |
There was a problem hiding this comment.
Do we have Randomness? If so do we have the Stochastic mixin that we also use in aggregation? Maybe we can mention it here.
There was a problem hiding this comment.
We do have a random baseline, but there is no Stochastic mixin. It just calls torch.randn directly
There was a problem hiding this comment.
My bad, I thought that In the aggregation package, we consider random aggregators as stateful (the seed is the state), I think it would be beneficial to do that in scalarization. @ValerianRey Didn't we go in that direction? Did we abandon that idea?
Co-authored-by: Pierre Quinton <pierre.quinton@gmail.com>
Adds a simple README