LTX-2 example inference script fails with RuntimeError: Error(s) in loading state_dict for LTX2TextEncoder: Missing key(s) in state_dict: "vision...

Environment:
Ubuntu 24.04.3 LTS
Python 3.10.13
torch                    2.9.1+cu130
torchaudio               2.9.1+cu130
torchvision              0.24.1+cu130
sageattention            2.2.0    

Problem:
I tried to run the example script https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/ltx2/model_inference/LTX-2-I2AV-OneStage.py and it fails immediately while loading the text encoder:

Downloading Model from https://www.modelscope.cn to directory: /home/arkstone/workspace/wanvace/models/google/gemma-3-12b-it-qat-q4_0-unquantized
Loading models from: [
    "./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00003-of-00005.safetensors",
    "./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00005-of-00005.safetensors",
    "./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00004-of-00005.safetensors",
    "./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00001-of-00005.safetensors",
    "./models/google/gemma-3-12b-it-qat-q4_0-unquantized/model-00002-of-00005.safetensors"
]
Traceback (most recent call last):
  File "/home/arkstone/workspace/wanvace/test.py", line 17, in <module>
    pipe = LTX2AudioVideoPipeline.from_pretrained(
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/pipelines/ltx2_audio_video.py", line 121, in from_pretrained
    model_pool = pipe.download_and_load_models(model_configs, vram_limit)
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/diffusion/base_pipeline.py", line 303, in download_and_load_models
    model_pool.auto_load_model(
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/models/model_loader.py", line 72, in auto_load_model
    model = self.load_model_file(config, path, vram_config, vram_limit=vram_limit, state_dict=state_dict)
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/models/model_loader.py", line 41, in load_model_file
    model = load_model(
  File "/home/arkstone/workspace/DiffSynth-Studio/diffsynth/core/loader/model.py", line 28, in load_model
    model.load_state_dict(state_dict, assign=True)
  File "/home/arkstone/miniconda3/envs/wanvace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2629, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for LTX2TextEncoder:
	Missing key(s) in state_dict: "vision_tower.vision_model.embeddings.patch_embedding.weight", "vision_tower.vision_model.embeddings.patch_embedding.bias", "vision_tower.vision_model.embeddings.position_embedding.weight", "vision_tower.vision_model.encoder.layers.0.layer_norm1.weight" ...

(truncated)
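
One way to narrow this down is to check whether the downloaded shards contain any vision tower weights at all. The sketch below is a diagnostic aid, not part of DiffSynth-Studio: it reads only the header of each `.safetensors` file (an 8-byte little-endian length followed by a JSON header, per the safetensors format) and counts tensor names per top-level prefix. The helper names `read_tensor_names` and `count_top_level_prefixes` are hypothetical, and the names stored in the checkpoint may carry a different prefix than the keys `LTX2TextEncoder` expects, so treat the counts as a hint rather than a verdict.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def read_tensor_names(path):
    """Return the tensor names stored in one .safetensors file.

    Only the header is read: 8 bytes (little-endian u64 header length)
    followed by that many bytes of JSON. No tensor data is loaded.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return [name for name in header if name != "__metadata__"]


def count_top_level_prefixes(shard_dir):
    """Count tensor names per top-level prefix across all shards in a directory."""
    counts = Counter()
    for shard in sorted(Path(shard_dir).glob("*.safetensors")):
        for name in read_tensor_names(shard):
            counts[name.split(".", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    # Directory taken from the log above; adjust it to your local path.
    model_dir = "./models/google/gemma-3-12b-it-qat-q4_0-unquantized"
    for prefix, n in sorted(count_top_level_prefixes(model_dir).items()):
        print(f"{prefix}: {n} tensors")
```

If no prefix related to the vision tower shows up, the download is probably incomplete or the repository does not ship those weights; if one does show up, the mismatch is more likely in how the keys are mapped when the state dict is loaded.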

Issue #1351