ltx2 example inference script fails with RuntimeError: Error(s) in loading state_dict for LTX2TextEncoder: Missing key(s) in state_dict: "vision... #1351

@sfzman

Environment:
Ubuntu 24.04.3 LTS
Python 3.10.13
torch 2.9.1+cu130
torchaudio 2.9.1+cu130
torchvision 0.24.1+cu130
sageattention 2.2.0

Problem:
I tried to run the example script https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/ltx2/model_inference/LTX-2-I2AV-OneStage.py, and it fails immediately with the following error:

Downloading Model from https://www.modelscope.cn to directory: /home/arkstone/workspace/wanvace/models/google/gemma-3-12b-it-qat-q4_0-unquantized
Loading models from: [
"./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00003-of-00005.safetensors",
"./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00005-of-00005.safetensors",
"./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00004-of-00005.safetensors",
"./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00001-of-00005.safetensors",
"./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00002-of-00005.safetensors"
]
Traceback (most recent call last):
  File "/home/arkstone/workspace/wanvace/test.py", line 17, in <module>
    pipe = LTX2AudioVideoPipeline.from_pretrained(
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/pipelines/ltx2_audio_video.py", line 121, in from_pretrained
    model_pool = pipe.download_and_load_models(model_configs, vram_limit)
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/diffusion/base_pipeline.py", line 303, in download_and_load_models
    model_pool.auto_load_model(
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/models/model_loader.py", line 72, in auto_load_model
    model = self.load_model_file(config, path, vram_config, vram_limit=vram_limit, state_dict=state_dict)
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/models/model_loader.py", line 41, in load_model_file
    model = load_model(
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/core/loader/model.py", line 28, in load_model
    model.load_state_dict(state_dict, assign=True)
  File "/home/arkstone/miniconda3/envs/wanvace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2629, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for LTX2TextEncoder:
Missing key(s) in state_dict: "vision_tower.vision_model.embeddings.patch_embedding.weight", "vision_tower.vision_model.embeddings.patch_embedding.bias", "vision_tower.vision_model.embeddings.position_embedding.weight", "vision_tower.vision_model.encoder.layers.0.layer_norm1.weight" ...

(truncated)
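For anyone trying to reproduce or narrow this down: the error says that LTX2TextEncoder expects `vision_tower.*` weights that were not present in the state dict built from the five shards. One thing worth checking (this is only a guess at the cause, not a confirmed diagnosis) is whether the downloaded `.safetensors` files contain any tensors under that prefix at all. The sketch below reads only the header of each shard using the Python standard library, so it needs neither torch nor the safetensors package. The helper names `read_safetensors_keys` and `summarize_prefixes` are made up for this example and are not part of DiffSynth-Studio. DiffSynth-Studio may also rename keys before loading, so the names on disk are indicative rather than an exact match for the names in the error.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def read_safetensors_keys(path):
    """Return the tensor names stored in one .safetensors file (header only)."""
    with open(path, "rb") as f:
        # A safetensors file starts with an 8-byte little-endian header length,
        # followed by that many bytes of JSON describing each tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def summarize_prefixes(model_dir, depth=1):
    """Count tensors per dotted-name prefix across all shards in model_dir."""
    counts = Counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        for name in read_safetensors_keys(shard):
            counts[".".join(name.split(".")[:depth])] += 1
    return counts


if __name__ == "__main__":
    # Directory taken from the log above; adjust it to your local path.
    model_dir = "./models/google/gemma-3-12b-it-qat-q4_0-unquantized"
    for prefix, n in sorted(summarize_prefixes(model_dir).items()):
        print(f"{prefix}: {n} tensors")
```

If no prefix containing `vision_tower` shows up in the output, the checkpoint itself lacks those weights. If one does show up, the mismatch is more likely in how the keys are mapped at load time. Passing `depth=2` gives a finer breakdown.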
