feat: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI (#131)
Conversation
@copilot When I attempt to run a generation with Qwen Image (Diffusers version), I get this stack trace:
The crash was caused by ... Fixed in ...
@copilot Can you figure out where the WARNING in the fourth line of log output is coming from and remove it? |
The warning came from |
@copilot I'm getting the following error upon generating with the model: |
Fixed in |
@copilot I'm still getting tokenizer errors: |
... Fixed in |
@copilot |
Fixed in
- Processor loading: Replaced
- Reference images UI: Added
- txt2img without reference images: No warning or block when no reference images are set — the text encoder works with an empty list, so txt2img works fine.
@copilot New error: |
Adds full support for the Qwen Image Edit 2511 model architecture, including both the diffusers version (Qwen/Qwen-Image-Edit-2511) and GGUF quantized versions (unsloth/Qwen-Image-Edit-2511-GGUF).

Backend changes:
- Add QwenImageEdit base model type to taxonomy
- Add diffusers and GGUF model config classes with detection logic
- Add model loader for diffusers and GGUF formats
- Add 5 invocation nodes: model loader, text/vision encoder, denoise, image-to-latents, latents-to-image
- Add QwenVLEncoderField for Qwen2.5-VL vision-language encoder
- Add QwenImageEditConditioningInfo and conditioning field
- Add generation modes and step callback support
- Add 5 starter models (full diffusers + Q2_K, Q4_K_M, Q6_K, Q8_0 GGUF)

Frontend changes:
- Add graph builder for linear UI generation
- Register in canvas and generate enqueue hooks
- Update type definitions, optimal dimensions, grid sizes
- Add readiness validation, model picker grouping, clip skip config
- Regenerate OpenAPI schema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use AutoProcessor.from_pretrained to load Qwen VL processor correctly
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Agent-Logs-Url: https://github.com/lstein/InvokeAI/sessions/4d4417be-0f61-4faa-a21c-16e9ce81fec7

chore: bump diffusers==0.37.1
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Agent-Logs-Url: https://github.com/lstein/InvokeAI/sessions/38a76809-d9a3-40f1-b5b3-fb56342e8e90

fix: handle multiple reference images

feature: add text encoder selection to advanced section for Qwen Image Edit

feat: complete Qwen Image Edit pipeline with LoRA, GGUF, quantization, and UI support

Major additions:
- LoRA support: loader invocation, config detection, conversion utils, prefix constants, and LayerPatcher integration in denoise with sidecar patching for GGUF models
- Lightning LoRA: starter models (4-step and 8-step bf16), shift override parameter for the distilled sigma schedule
- GGUF fixes: correct base class (ModelLoader), zero_cond_t=True, correct in_channels (no /4 division)
- Denoise: use FlowMatchEulerDiscreteScheduler directly, proper CFG gating (skip negative when cfg<=1), reference latent pixel-space resize
- I2L: resize reference image to generation dimensions before VAE encoding
- Graph builder: wire LoRAs via collection loader, VAE-encode reference image as latents for spatial conditioning, pass shift/quantization params
- Frontend: shift override (checkbox+slider), LoRA graph wiring, scheduler hidden for Qwen Image Edit, model switching cleanup
- Starter model bundle for Qwen Image Edit
- LoRA config registered in discriminated union (factory.py)
- Downgrade transformers requirement back to >=4.56.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
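The CFG gating above (skip the negative pass when cfg<=1) can be sketched as follows. This is a hedged illustration, not InvokeAI's code: `predict` and the conditioning arguments are hypothetical stand-ins for the transformer forward pass, and the blend is the standard classifier-free guidance formula.

```python
def denoise_step(predict, latents, t, pos_cond, neg_cond, cfg_scale):
    """One denoise step with CFG gating (illustrative signature).

    When cfg_scale <= 1 the negative-prompt forward pass is skipped
    entirely, halving transformer work for distilled (e.g. Lightning)
    schedules that run without guidance.
    """
    noise_pos = predict(latents, t, pos_cond)
    if cfg_scale <= 1.0:
        return noise_pos
    noise_neg = predict(latents, t, neg_cond)
    # Classifier-free guidance: extrapolate away from the negative prediction.
    return noise_neg + cfg_scale * (noise_pos - noise_neg)
```

With a 4-step Lightning LoRA and cfg_scale=1, this halves the number of transformer calls per generation.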
Force-pushed from e897fa0 to bc82599.
- GGUF loader: handle zero_cond_t absence in diffusers 0.36, try dtype before torch_dtype for forward compat
- Denoise: load scheduler config from disk with GGUF fallback, inline calculate_shift to avoid pipeline import, remove deprecated txt_seq_lens
- Text encoder: resize reference images to ~512x512 before VL encoding to prevent vision tokens from overwhelming the text prompt
- Picker badges: wrap to next line instead of truncating labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
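The inlined calculate_shift presumably mirrors the Flux-style dynamic-shift helper in diffusers: a linear interpolation of the schedule shift mu in the image sequence length. A minimal sketch, where the default endpoints are the Flux values and are only an assumption for Qwen Image:

```python
def calculate_shift(
    image_seq_len: int,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
) -> float:
    # Linear interpolation of the schedule shift mu in the image sequence
    # length: short sequences get base_shift, long ones get max_shift, so
    # larger images spend more steps at high noise levels.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
```

Inlining a ~6-line helper like this avoids importing the whole diffusers pipeline module just for one function.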
- Remove module-level cache for quantized encoders — load fresh each invocation and free VRAM via cleanup callback (gc + empty_cache)
- Suppress harmless BnB MatMul8bitLt bfloat16→float16 cast warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename the base model type from "qwen-image-edit" to "qwen-image" to
reflect that the Qwen Image family includes both txt2img and image
editing models. The edit models are a specific use case within the
broader Qwen Image architecture.
- BaseModelType.QwenImageEdit -> BaseModelType.QwenImage ("qwen-image")
- All Python files, classes, variables, and invocation names renamed
- All TypeScript/React components, selectors, and state fields renamed
- Frontend display: "Qwen Image" in model picker, "QwenImg" badge
- Starter model bundle: "Qwen Image"
- File renames: qwen_image_edit_* -> qwen_image_*
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- QwenImageVariantType enum: Generate (txt2img) and Edit (image editing)
- Diffusers models: auto-detect variant from model_index.json pipeline class (QwenImagePipeline → Generate, QwenImageEditPlusPipeline → Edit)
- GGUF models: default to Generate (can't detect from state dict)
- Frontend: hide reference image panel when a Generate variant is selected
- Variant display names: "Qwen Image" / "Qwen Image Edit"
- ModelRecordChanges: include QwenImageVariantType in variant union

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The variant field with a default value was appended to the discriminator tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model detection for GGUF and Diffusers models. Making variant optional with default=None restores the correct tags (main.gguf_quantized.qwen-image). The variant is still set during Diffusers model probing via _get_qwen_image_variant() and can be manually set for GGUF models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
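A minimal illustration of the tag bug, assuming the tag is built by joining the identifying field values (this is a reconstruction for illustration, not the actual factory code): a variant with a non-None default always lands in the tag, while default=None keeps it out unless explicitly set.

```python
def config_tag(fields: dict) -> str:
    """Build a discriminator tag from identifying field values (sketch).

    With variant defaulting to "generate", every config would emit e.g.
    "main.gguf_quantized.qwen-image.generate" and no longer match the
    registered "main.gguf_quantized.qwen-image" tag. With default=None
    the variant is simply omitted.
    """
    parts = [fields["type"], fields["format"], fields["base"]]
    if fields.get("variant") is not None:
        parts.append(fields["variant"])
    return ".".join(parts)
```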
Prevents variable name collisions when the txt2img branch adds qwen_image_* variables for the Qwen Image 2512 models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nModelConfig) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…URLs
The global rename sed changed 'qwen-image-edit-2511' to 'qwen-image-2511' inside the HuggingFace URLs, but the actual files on HF still have 'edit' in their names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add optional variant field to StarterModelWithoutDependencies
- Tag all Qwen Image Edit starter models (Diffusers + GGUF) with variant=QwenImageVariantType.Edit
- Frontend passes variant through to the install endpoint config so GGUF edit models get the correct variant set on install

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the transformer to double the timestep batch and create modulation indices for non-existent reference patches, producing noise output. Now checks the config variant before enabling it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
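A minimal sketch of the resulting gate, with hypothetical string variant values standing in for the QwenImageVariantType enum:

```python
def transformer_extra_kwargs(variant: str) -> dict:
    # zero_cond_t doubles the timestep batch to modulate reference-image
    # patches. Only the Edit variant has reference patches, so gate on the
    # config variant; enabling it for Generate produces noise output.
    if variant == "edit":
        return {"zero_cond_t": True}
    return {}
```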
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which contain "transformer_blocks." as a substring, falsely matching the Qwen Image LoRA detection. Add single_transformer_blocks to the Flux exclusion set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
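The substring collision can be illustrated like this — a hedged sketch of the heuristic, not the actual detection code:

```python
# Flux PEFT prefixes that must be excluded before the substring check,
# because "transformer.single_transformer_blocks." contains
# "transformer_blocks." as a substring.
FLUX_EXCLUDE_PREFIXES = ("transformer.single_transformer_blocks.",)

def looks_like_qwen_image_lora(state_dict_keys) -> bool:
    """Heuristic: Qwen Image LoRAs carry transformer_blocks.* keys
    (optionally under a transformer. prefix), so first reject any state
    dict containing Flux-only key prefixes."""
    if any(key.startswith(FLUX_EXCLUDE_PREFIXES) for key in state_dict_keys):
        return False
    return any("transformer_blocks." in key for key in state_dict_keys)
```

Note the community-LoRA keys (transformer.transformer_blocks.*) still match, since they don't start with the excluded Flux prefix.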
Add stripped test models for:
- Qwen Image Lightning LoRA (transformer_blocks.* keys)
- Qwen Image community LoRA (transformer.transformer_blocks.* keys)

Both should be detected as base=qwen-image, type=lora, format=lycoris.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Complete implementation of the Qwen Image Edit 2511 pipeline for InvokeAI, including text-to-image generation, image editing with reference images, LoRA support (including Lightning distillation), GGUF quantized transformers, and BitsAndBytes encoder quantization.
Key Features
Backend Changes
- zero_cond_t modulation, LoRA application via LayerPatcher with sidecar patching for GGUF, shift override for Lightning
- GGUF fixes: correct base class (ModelLoader), zero_cond_t=True, correct in_channels
- transformers >=4.56.0 (the video processor fallback imports already handle this)

Frontend Changes
- qwenImageEditComponentSource, qwenImageEditQuantization, qwenImageEditShift in params slice with persistence and model-switch cleanup

Functional Testing Guide
1. Text-to-Image Generation (Basic)
2. GGUF Quantized Transformer
3. BitsAndBytes Encoder Quantization
4. LoRA Support
5. Image Editing with Reference Image
6. Multiple Reference Images
7. Model Switching Cleanup
🤖 Generated with Claude Code