Releases: modelscope/DiffSynth-Engine
v0.7.0: supports multiple new image models
Supports Qwen-Image-Edit-2511, Qwen-Image-2512, Z-Image-Turbo, and Z-Image-Omni-Base (to be released)
- Qwen-Image-Edit-2511 is an enhanced version of Qwen-Image-Edit-2509, with notably better consistency and several other improvements.
- Qwen-Image-2512 is an updated text-to-image model with enhanced human realism, finer natural detail, and improved text rendering.
- Z-Image is a powerful and highly efficient image generation model with 6B parameters (an illustrative usage sketch follows this list).
- Other fixes.
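For orientation, here is a minimal text-to-image sketch written in the style of the pipeline examples in the v0.4.0 notes below. The ZImagePipeline / ZImagePipelineConfig names and the model id are assumptions, not confirmed by these notes; check the repository examples for the actual API.

```python
# A minimal sketch, assuming a ZImagePipeline / ZImagePipelineConfig exposed
# in the same style as the Flux and Wan pipelines elsewhere in these notes.
from diffsynth_engine import fetch_model, ZImagePipeline, ZImagePipelineConfig

config = ZImagePipelineConfig.basic_config(
    model_path=fetch_model("Tongyi-MAI/Z-Image-Turbo"),  # hypothetical model id
)
pipe = ZImagePipeline.from_pretrained(config)

image = pipe(prompt="a white cat surfing at sunset, photorealistic", seed=42)
image.save("z_image_turbo.png")
```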
What's Changed
- init all dit module with device and dtype for speed up by @qzzz95 in #164
- fix wan umt5 state dict converter by @akaitsuki-ii in #170
- ADD update_weights for flux and qwen_image by @qzzz95 in #168
- add progress by @tenderness-git in #171
- ADD Option: OFFLINE fetch modelscope model by @qzzz95 in #172
- supports flash attn 3 fp8 by @akaitsuki-ii in #174
- use utf8 by @tenderness-git in #178
- bug fix by @tenderness-git in #179
- supports flux kontext with multiple input images by @akaitsuki-ii in #173
- Feature/qwen image control by @Glaceon-Hyy in #176
- Feature/qwen edit plus by @Glaceon-Hyy in #180
- fix key conversion for kohya lora by @qzzz95 in #183
- supports sequence parallel and use custom image size for Qwen Image by @akaitsuki-ii in #186
- convert qwen image diffusers lora key by @qzzz95 in #189
- enable FSDP for qwen vl by @akaitsuki-ii in #184
- torch.compile with dynamic=False by @akaitsuki-ii in #185
- fix compile repeated blocks by @akaitsuki-ii in #191
- fix redux multiple ref images by @qzzz95 in #192
- define qwen image edit system prompt by @qzzz95 in #194
- Fix Wan2.2 low noise model load LoRA bug by @continue-revolution in #188
- fix mask dtype differ from latent by @qzzz95 in #195
- load encoder optional by @qzzz95 in #196
- video sparse attention by @akaitsuki-ii in #190
- Fix/qwen image by @akaitsuki-ii in #197
- Fix/import vsa by @akaitsuki-ii in #200
- Enable aiter attention for rocm by @guangzlu in #198
- auto enable vsa by @akaitsuki-ii in #203
- support svd quant by @Glaceon-Hyy in #202
- Fix circular dependence by @qzzz95 in #205
- support lora loading from state dict by @qzzz95 in #206
- set module device to skip weight init by @qzzz95 in #207
- fix svd init memory by @Glaceon-Hyy in #208
- support edit 2511 by @Glaceon-Hyy in #212
- fix qwen edit 2511 sequence parallel error & fix timesteps misalignment & fix image resize algorithm mismatch by @qzzz95 in #214
- support z image by @Glaceon-Hyy in #213
- add edit 2511 example by @qzzz95 in #215
- Fix Z Image model default dtype by @qzzz95 in #216
- Support diffusers and diffsynth studio lora by @qzzz95 in #217
- add WanDMDPipeline by @akaitsuki-ii in #219
- add FlashAtten 4 API by @bingchenlll in #218
- safety check module available by @qzzz95 in #225
- support Z-Image-Omni-Base by @Artiprocher in #226
New Contributors
- @guangzlu made their first contribution in #198
- @bingchenlll made their first contribution in #218
Full Changelog: v0.6.0...v0.7.0
v0.6.0: supports Wan2.2-S2V
Supports Wan2.2-S2V
Wan2.2-S2V is based on Wan2.1, with several additional modules that inject audio, reference-image, and pose-video conditions. Check the usage example here.
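As rough orientation, here is a speculative sketch of how the new conditions might be passed through WanVideoPipeline. Only input_image appears elsewhere in these notes; the model id and the input_audio / pose_video parameter names are assumptions, so refer to the linked usage example for the real signature.

```python
# A speculative sketch of Wan2.2-S2V conditioning; the audio and pose-video
# parameter names below are assumed, not taken from the actual API.
from PIL import Image
from diffsynth_engine import fetch_model, WanVideoPipeline, WanPipelineConfig
from diffsynth_engine.utils.video import save_video

config = WanPipelineConfig.basic_config(
    model_path=fetch_model("Wan-AI/Wan2.2-S2V-14B"),  # hypothetical model id
)
pipe = WanVideoPipeline.from_pretrained(config)

video = pipe(
    prompt="a person speaking to the camera",
    input_image=Image.open("reference.jpg").convert("RGB"),  # reference image condition
    input_audio="speech.wav",  # assumed name for the audio condition
    pose_video="pose.mp4",     # assumed name for the pose-video condition
    seed=42,
)
save_video(video, "wan_s2v.mp4", fps=pipe.config.fps)
```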
What's Changed
- per-token scaling for fp8 linear by @akaitsuki-ii in #160
- remove redundant empty_cache in parallel forward by @akaitsuki-ii in #161
- Wan Speech2Video by @continue-revolution in #162
- no fa3 with attention mask by @akaitsuki-ii in #163
Full Changelog: v0.5.0...v0.6.0
v0.5.0: supports Qwen-Image-Edit
Supports Qwen-Image-Edit
Qwen-Image-Edit is the image-editing version of Qwen-Image, enabling both semantic and appearance-level visual editing as well as precise text editing. Check the usage example here.
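To give a sense of the call shape, here is a minimal editing sketch assuming a QwenImagePipeline / QwenImagePipelineConfig analogous to the Flux pipelines shown below; the class names, model id, and input_image parameter are assumptions, so see the linked usage example for the actual API.

```python
# A minimal sketch, assuming QwenImagePipeline / QwenImagePipelineConfig;
# the model id and call signature are illustrative, not confirmed here.
from PIL import Image
from diffsynth_engine import fetch_model, QwenImagePipeline, QwenImagePipelineConfig

config = QwenImagePipelineConfig.basic_config(
    model_path=fetch_model("Qwen/Qwen-Image-Edit"),  # hypothetical model id
)
pipe = QwenImagePipeline.from_pretrained(config)

image = pipe(
    prompt='replace the sign text with "OPEN"',  # precise text editing
    input_image=Image.open("storefront.jpg").convert("RGB"),
    seed=42,
)
image.save("edited.png")
```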
What's Changed
- add offload to disk by @tenderness-git in #124
- fix qwen image max rope index to 10000 by @akaitsuki-ii in #139
- fix cast fp8 by @tenderness-git in #140
- pin memory bug fix by @tenderness-git in #141
- Dev/download hook by @sir1st-inc in #143
- use spawn context by @akaitsuki-ii in #144
- support Wan/fp8 by @sir1st-inc in #145
- implement sd & sdxl pipeline from_state_dict by @qzzz95 in #147
- Dev/flux tool from state dict by @qzzz95 in #153
- Support Hunyuan3d by @tenderness-git in #150
- Feature/qwen image edit by @Glaceon-Hyy in #152
- Fix/qwen image use vae tiled by @akaitsuki-ii in #155
- Fix qwen vl by @akaitsuki-ii in #156
- add benchmark script by @weiyilwy in #158
- Refactor/parallel by @akaitsuki-ii in #157
- update README by @akaitsuki-ii in #159
Full Changelog: v0.4.1.post1...v0.5.0
v0.4.1: supports Qwen-Image
Supports Qwen-Image
Qwen-Image is an image generation model that excels at complex text rendering and creating images in a wide range of artistic styles. Check the usage example here.
System Requirements
Resource utilization for generating a 1024x1024 image with the Qwen-Image model on an H20 GPU under different offload_mode settings (a config sketch follows the table):
| Offload Mode | Peak VRAM Usage (GB) | Peak Memory Usage (GB) | Inference Time (s) |
|---|---|---|---|
| None | 62 | 64 | 57 |
| "cpu_offload" | 39 | 64 | 86 |
| "sequential_cpu_offload" | 8 | 64 | 134 |
What's Changed
- support Qwen-Image by @Glaceon-Hyy in #130
- fix qwen2 tokenizer by @akaitsuki-ii in #132
- remove redundant config by @akaitsuki-ii in #134
- Doc/qwen image by @Glaceon-Hyy in #133
- speedup model cpu offload by @akaitsuki-ii in #136
- Feature/no default lora by @Glaceon-Hyy in #137
Full Changelog: v0.4.0...v0.4.1.post1
v0.4.0: Supports Wan2.2
Supports Wan2.2 video generation model
The WanVideoPipeline now also supports Wan2.2 series models. Taking the Wan2.2-TI2V-5B model as an example:
```python
from PIL import Image

from diffsynth_engine import fetch_model, WanVideoPipeline, WanPipelineConfig
from diffsynth_engine.utils.video import save_video

config = WanPipelineConfig.basic_config(
    model_path=fetch_model(
        "Wan-AI/Wan2.2-TI2V-5B",
        revision="bf16",
        path=[
            "diffusion_pytorch_model-00001-of-00003-bf16.safetensors",
            "diffusion_pytorch_model-00002-of-00003-bf16.safetensors",
            "diffusion_pytorch_model-00003-of-00003-bf16.safetensors",
        ],
    ),
    parallelism=1,
    offload_mode=None,
)
pipe = WanVideoPipeline.from_pretrained(config)
image = Image.open("input/wan_i2v_input.jpg").convert("RGB")
video = pipe(
    prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.",
    negative_prompt="",
    input_image=image,
    num_frames=121,
    width=704,
    height=1280,
    seed=42,
)
save_video(video, "wan_ti2v.mp4", fps=pipe.config.fps)
```

- Set `parallelism` to 2, 4 or 8 to speed up video generation with multiple GPUs.
- By default, CPU offload is disabled. For lower VRAM usage, set `offload_mode` to `"cpu_offload"` (model-level offload) or `"sequential_cpu_offload"` (parameter-level offload, with the lowest VRAM usage and the longest generation time).
- The Wan2.2-TI2V-5B model supports generation with or without an input image.
- The Wan2.2-TI2V-5B model generates video at 24 fps by default. To create a video of X seconds, set `num_frames` to 24X+1 (e.g., `num_frames=121` for the 5-second video above).
Find more examples here for Wan2.2-T2V-A14B, Wan2.2-I2V-A14B.
⚠️ [Breaking Change] Improved pipeline initialization with from_pretrained
In previous versions, the from_pretrained method initialized a pipeline from a ModelConfig plus additional keyword arguments, for example:
```python
from diffsynth_engine import fetch_model, FluxImagePipeline, FluxModelConfig

model_path = fetch_model("muse/flux-with-vae", path="flux1-dev-with-vae.safetensors")
config = FluxModelConfig(dit_path=model_path, use_fp8_linear=True, use_fsdp=True)
pipe = FluxImagePipeline.from_pretrained(config, parallelism=8, use_cfg_parallel=True)
```

In the example above, the division between the ModelConfig and the remaining from_pretrained arguments is unclear, which makes the API confusing.
Since v0.4.0, we introduce a new PipelineConfig that contains all pipeline initialization arguments. With it, the code above can be rewritten as:
```python
from diffsynth_engine import fetch_model, FluxImagePipeline, FluxPipelineConfig

model_path = fetch_model("muse/flux-with-vae", path="flux1-dev-with-vae.safetensors")
config = FluxPipelineConfig(
    model_path=model_path,
    use_fp8_linear=True,
    parallelism=8,
    use_cfg_parallel=True,
    use_fsdp=True,
)
pipe = FluxImagePipeline.from_pretrained(config)
```

For beginners, we also provide a basic_config method with fewer arguments to make pipeline initialization easier:
```python
from diffsynth_engine import fetch_model, FluxImagePipeline, FluxPipelineConfig

model_path = fetch_model("muse/flux-with-vae", path="flux1-dev-with-vae.safetensors")
config = FluxPipelineConfig.basic_config(model_path=model_path, parallelism=8)
pipe = FluxImagePipeline.from_pretrained(config)
```

Check here for more available configs.
What's Changed
- publish on new PR merged by @akaitsuki-ii in #109
- publish on push to main by @akaitsuki-ii in #112
- supports loading multiple model files & update doc by @akaitsuki-ii in #115
- support kontext inference by @qzzz95 in #114
- fix wan parallel & update examples by @akaitsuki-ii in #116
- support flux fbcache by @Glaceon-Hyy in #117
- support fp8 store bf16 exec by @tenderness-git in #120
- speedup when offload_mode enable by @qzzz95 in #119
- support fp8 linear on AMD by @qzzz95 in #86
- new PipelineConfig for initialization by @akaitsuki-ii in #123
- fix fbcache param by @akaitsuki-ii in #125
- reformat control params and pipeline utils by @sir1st-inc in #128
- defend flash attention3 failed by @qzzz95 in #126
- supports wan2.2 by @akaitsuki-ii in #127
Full Changelog: v0.3.5...v0.4.0
v0.3.5
Fix SDXL LoRA loading
v0.3.4
Fix ControlNet offload
v0.3.3
Support Flux Diffusers LoRA
v0.3.2
Bug fix for MPS device
v0.3.1
Bug fix