Is your feature request related to a problem? Please describe.
The .generate() method is typically available on transformers *ForCausalLM models.
It doesn't have to be fast, but it is extremely useful for development, debugging, and small-scale evaluations, especially because it provides a reference implementation that is numerically equivalent to training (unlike vLLM and other inference backends).
Describe the solution you'd like
An HF API-compatible NemotronHForCausalLM.generate method.
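To illustrate the kind of slow but simple reference behavior being requested, here is a minimal sketch of greedy decoding that re-runs the full forward pass at every step (no KV cache), so it exercises the same computation as training. Everything in it is hypothetical: `greedy_generate`, `next_token_logits`, and `toy_logits` are illustration names, not part of NemotronHForCausalLM or transformers, and a real implementation would work on tensors and accept the usual HF generation arguments.

```python
from typing import Callable, List, Optional, Sequence


def greedy_generate(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    input_ids: Sequence[int],
    max_new_tokens: int,
    eos_token_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding without a cache (hypothetical helper).

    `next_token_logits` stands in for a full model forward pass: it takes
    the whole token sequence and returns logits for the next position.
    """
    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        # Recompute from the full sequence each step: slow, but it follows
        # the same code path as a training-time forward pass.
        logits = next_token_logits(tokens)
        # Pick the highest-scoring token (ties resolve to the lowest id).
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if eos_token_id is not None and next_id == eos_token_id:
            break
    return tokens


def toy_logits(tokens: List[int]) -> List[float]:
    """Toy stand-in model over a 5-token vocabulary (assumption for the demo).

    It always prefers the token id that follows the last one, modulo 5.
    """
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(greedy_generate(toy_logits, [0], max_new_tokens=3))
    print(greedy_generate(toy_logits, [0], max_new_tokens=10, eos_token_id=2))
```

The sketch trades speed for simplicity on purpose: skipping the cache keeps the decoding path identical to a plain forward call, which is what makes it usable as a numerical reference.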
Describe alternatives you've considered
N/A
Additional context
N/A