[fix] regression introduced by #45534 #46456

Open
eustlb wants to merge 4 commits into main from fix-tie-words-embeddings-regression

@eustlb
Contributor

@eustlb eustlb commented Jun 5, 2026

What does this PR do?

A continuation of #46400

Verified by running this script on this branch:

model                 lm_head   lm==embed   weights tied  config tie  match
---------------------------------------------------------------------------
qwen2_audio           yes       False       False         False       OK
voxtral               yes       False       False         False       OK
voxtral_realtime      no        n/a         True          True        OK
glmasr                yes       True        False         True        MISMATCH <<<
granite_speech        no        n/a         True          True        OK
granite_speech_plus   no        n/a         True          True        OK
audioflamingo3        yes       False       False         False       OK
musicflamingo         yes       False       False         False       OK
vibevoice_asr         yes       False       False         False       OK
  • lm_head — yes/no: whether the checkpoint has a separate lm_head.* tensor in its safetensors (has_lm_head).
  • lm==embed — True/False/n/a: when an lm_head exists, whether lm_head.weight is bitwise identical to embed_tokens.weight. n/a when there's no separate lm_head.
  • weights tied — True/False: what the actual checkpoint implies (not has_lm_head → no separate head means weights are tied).
  • config tie — True/False: what the config class resolves tie_word_embeddings to (fallback False).
  • match — OK if config tie == weights tied, else MISMATCH <<< (the regression check).
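The column definitions above can be sketched as a small function. This is not the script referenced in the PR: the function name `check_tie_consistency`, the default key names `lm_head.weight` and `embed_tokens.weight`, and the use of raw bytes in place of safetensors tensors are all assumptions made for illustration (real checkpoints usually carry longer, prefixed key names).

```python
from typing import Mapping, Optional


def check_tie_consistency(
    tensors: Mapping[str, bytes],
    config_tie: Optional[bool],
    head_key: str = "lm_head.weight",        # assumed key name
    embed_key: str = "embed_tokens.weight",  # assumed key name
) -> dict:
    """Compute one row of the table from raw tensor bytes and the config value."""
    has_lm_head = head_key in tensors
    # A bitwise comparison only makes sense when a separate head exists.
    lm_equals_embed = (
        tensors[head_key] == tensors.get(embed_key) if has_lm_head else None
    )
    # No separate head in the checkpoint implies the weights are tied.
    weights_tied = not has_lm_head
    # Fall back to False when the config does not define the flag.
    resolved_tie = bool(config_tie) if config_tie is not None else False
    return {
        "lm_head": has_lm_head,
        "lm==embed": lm_equals_embed,
        "weights tied": weights_tied,
        "config tie": resolved_tie,
        "match": "OK" if resolved_tie == weights_tied else "MISMATCH",
    }


# glmasr-like case: separate head, identical bytes, config says tied.
row = check_tie_consistency(
    {"lm_head.weight": b"\x01\x02", "embed_tokens.weight": b"\x01\x02"},
    config_tie=True,
)
print(row["match"])  # MISMATCH
```

Under these definitions a checkpoint whose lm_head duplicates embed_tokens is still reported as MISMATCH when the config ties the weights, because "weights tied" looks only at whether a separate head exists.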

For glmasr, the hub weights do include an lm_head, but it is bitwise identical to embed_tokens, so we keep weight tying despite the MISMATCH flagged above.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hmellor added the "for patch" label (Tag issues / labels that should be included in the next patch) Jun 5, 2026
@hmellor
Member

hmellor commented Jun 5, 2026

For the models with no LM head, why set tie_word_embeddings=True?

Member

@hmellor hmellor Jun 5, 2026

Should this be propagated to modeling_glmasr.py?

And tie_word_embeddings updated?

Contributor Author

It's already propagated, but since _tied_weights_keys has been removed from AudioFlamingo3ForConditionalGeneration, it needs to be re-added here so that the modeling stays the same.

@eustlb
Contributor Author

eustlb commented Jun 6, 2026

"no" in the lm_head column of the table above means there is no lm_head in the hub weights, so tie_word_embeddings must be set to True.
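A toy illustration of that reasoning, in plain Python: `ToyModel` and `load_checkpoint` are hypothetical stand-ins and not transformers APIs, and lists stand in for weight tensors. The sketch only assumes that an untied head gets its own freshly initialized weights, while a tied head shares storage with the embeddings.

```python
class ToyModel:
    """Hypothetical model with an embedding and an output head."""

    def __init__(self, tie_word_embeddings: bool):
        self.embed_tokens = [0.0, 0.0]  # placeholder initial values
        if tie_word_embeddings:
            # Tied: the head is the very same object as the embedding.
            self.lm_head = self.embed_tokens
        else:
            # Untied: stands in for a separate, freshly initialized head.
            self.lm_head = [9.9, 9.9]


def load_checkpoint(model: ToyModel, checkpoint: dict) -> list:
    """Copy checkpoint values in place and return the names left unloaded."""
    missing = []
    for name in ("embed_tokens", "lm_head"):
        if name in checkpoint:
            getattr(model, name)[:] = checkpoint[name]
        elif name == "lm_head" and model.lm_head is model.embed_tokens:
            continue  # covered by the shared embedding weights
        else:
            missing.append(name)
    return missing


checkpoint = {"embed_tokens": [1.0, 2.0]}  # no lm_head, as in the "no" rows

tied = ToyModel(tie_word_embeddings=True)
untied = ToyModel(tie_word_embeddings=False)
print(load_checkpoint(tied, checkpoint), tied.lm_head)      # [] [1.0, 2.0]
print(load_checkpoint(untied, checkpoint), untied.lm_head)  # ['lm_head'] [9.9, 9.9]
```

In this toy setup, only the tied model ends up with a head that reflects the checkpoint; the untied one keeps its initial values and reports the head as missing.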

@github-actions
Contributor

github-actions Bot commented Jun 6, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, glmasr, musicflamingo, qwen2_audio, vibevoice_asr, voxtral, voxtral_realtime

@eustlb
Contributor Author

eustlb commented Jun 6, 2026

run-slow: audioflamingo3, glmasr, musicflamingo, qwen2_audio, vibevoice_asr, voxtral, voxtral_realtime

@github-actions
Contributor

github-actions Bot commented Jun 6, 2026

CI Dashboard: View test results in Grafana

@github-actions
Contributor

github-actions Bot commented Jun 6, 2026

Workflow Run ⚙️💔 This comment contains run-slow, but unknown error occurred and the workflow run aborted!
