[fix] regression introduced by #45534 (#46456)
|
For the models with no LM head, why set |
Should this be propagated to modeling_glmasr.py?
And tie_word_embeddings updated?
It's already propagated, but since _tied_weights_keys has been removed from AudioFlamingo3ForConditionalGeneration, it needs to be re-added here (so the modeling code stays the same).
|
No lm_head in the above tables means there is no lm_head in the hub weights, so tie_word_embeddings must be set to True. |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: audioflamingo3, glmasr, musicflamingo, qwen2_audio, vibevoice_asr, voxtral, voxtral_realtime |
|
run-slow: audioflamingo3, glmasr, musicflamingo, qwen2_audio, vibevoice_asr, voxtral, voxtral_realtime |
What does this PR do?
A continuation of #46400
Verified by running (on this branch) this script
For glmasr, the hub weights do include an LM head, but it is bitwise identical to the embed tokens weights, so we keep weight tying.
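
The tying rule discussed in this thread can be sketched as follows. This is not the script referenced above, only a minimal illustration under stated assumptions: the checkpoint is modeled as a plain mapping from parameter name to raw tensor bytes, the key suffixes `embed_tokens.weight` and `lm_head.weight` are assumed, and `should_tie_word_embeddings` is a hypothetical helper, not a transformers API.

```python
import struct


def find_key(checkpoint, suffix):
    """Return the first parameter name ending with `suffix`, or None."""
    return next((name for name in checkpoint if name.endswith(suffix)), None)


def should_tie_word_embeddings(checkpoint):
    """Hypothetical helper: decide weight tying from hub weights.

    `checkpoint` maps parameter names to the raw bytes of each tensor.
    The key suffixes below are assumptions made for illustration.
    """
    embed_key = find_key(checkpoint, "embed_tokens.weight")
    if embed_key is None:
        raise KeyError("checkpoint has no embed_tokens.weight entry")

    head_key = find_key(checkpoint, "lm_head.weight")
    if head_key is None:
        # No LM head stored on the hub: the head can only come from the
        # embeddings, so tying is required.
        return True

    # An LM head is stored: keep tying only if it is bitwise identical
    # to the embeddings (the glmasr case described above).
    return checkpoint[head_key] == checkpoint[embed_key]


def pack(values):
    """Serialize floats to little-endian float32 bytes (toy tensor)."""
    return struct.pack(f"<{len(values)}f", *values)


# Toy checkpoints standing in for real hub weights.
no_head = {"model.embed_tokens.weight": pack([0.1, 0.2])}
identical_head = {
    "model.embed_tokens.weight": pack([0.1, 0.2]),
    "lm_head.weight": pack([0.1, 0.2]),
}
distinct_head = {
    "model.embed_tokens.weight": pack([0.1, 0.2]),
    "lm_head.weight": pack([0.3, 0.4]),
}
```

Comparing raw bytes rather than float values keeps the check strictly bitwise, which is the property the description relies on for glmasr. With real checkpoints the same comparison would be made on the loaded tensors.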