Caution
COMFYUI'S OWN INTERNAL MEMORY / BLOCK CONTROL BEHAVIOR IS SUCH A PAIN IN THE ASS THAT I ASSUME THE CURRENT PLUGIN STATE IS THE BEST I CAN DO. THE BLOCK SWAPPING IS NOT ACTUALLY THE SAME AS IN THE ORIGINAL WAN BLOCK SWAP NODE, BUT MAYBE IT CAN HELP YOU RENDER BIGGER RESOLUTIONS. MAYBE IT DOESN'T. I CAN'T SAY OR PROMISE ANYTHING. IF ANYONE HAS BETTER KNOWLEDGE, CODING SKILLS AND UNDERSTANDING OF COMFYUI'S BLOCK / MEMORY HANDLING, FEEL FREE TO FIX MY HALF-SUCCESSFUL ATTEMPT HERE.
Caution
THE SIMPLE UNET GGUF LOADER NODE USED IN THE EXAMPLE WORKFLOWS CAUSES CUDA ISSUES AFTER ~3 RUNS. REPLACE IT WITH THE "Unet Loader (GGUF/Advanced)" NODE OR THE WAN MODEL LOADER FROM THIS PACKAGE! I WILL UPDATE THE WORKFLOWS SOON.
ComfyUI_Wan22Blockswap enables running WAN 2.1/2.2 14B GGUF models with lower VRAM usage by dynamically swapping transformer blocks between GPU and CPU during inference, allowing for generations with higher resolutions.
- ✅ Forward Patching: Patches the model's forward method to swap blocks during inference
- ✅ GGUF Lazy Loading: Loads blocks directly to the target device (prevents VRAM spikes)
- ✅ Combo Patcher: Automatic HIGH→LOW model switching for guidance distillation workflows
- ✅ ON_CLEANUP Callbacks: Automatic model switching when sampling completes
- ✅ WanVideoLooper Compatible: Works with multi-loop video generation
- ✅ Full Cleanup Node: Aggressive memory cleanup at end of workflow
- Download the latest release from the Releases page
- Extract the contents to your ComfyUI custom nodes directory:

  ```
  ComfyUI/custom_nodes/ComfyUI_Wan22Blockswap/
  ```

- Restart ComfyUI

Or install via git:

```shell
cd ComfyUI/custom_nodes
git clone https://github.com/crmbz0r/ComfyUI_Wan22Blockswap.git
```

If you're using ComfyUI Manager, search for "Wan22Blockswap" in the available nodes list and install directly. (not yet, will add soon)
- Load your WAN model in ComfyUI
- Add the "WAN 2.2 BlockSwap Patcher" node to your workflow
- Connect your model to the BlockSwap Patcher node
- Configure the parameters based on your VRAM requirements
- Connect the output directly to the KSampler of your choice. I recommend connecting it to the KSampler with no nodes in between (check the workflow examples)
| Node | Description |
|---|---|
| WAN Model Loader | Simple WAN model loader (no BlockSwap) - use with GGUF loaders |
| WAN 2.2 BlockSwap Patcher | Apply BlockSwap to any single loaded model |
| WAN 2.2 BlockSwap Combo Patcher | Apply BlockSwap to HIGH+LOW model pair with automatic switching |
| WAN 2.2 BlockSwap Cleanup | Clean up BlockSwap state after sampling |
| WAN 2.2 BlockSwap Reposition | Re-position blocks for next sampling run |
| WAN 2.2 Full Cleanup (End) | Aggressive cleanup at end of workflow (like "Free Model and Node Cache") |
For guidance distillation workflows (HIGH noise → LOW noise models):
Usage with the WanVideoLooper node is very similar: just replace the Integrated KSampler with the Looper node and add the LoRA Sequencer if needed. I'll add an example workflow later, too.
- Combo Patcher receives both HIGH and LOW noise models
- Positions HIGH noise blocks on GPU (28) and CPU (12)
- Moves ALL LOW noise blocks to CPU (waiting)
- Patches forward methods for dynamic block swapping
- Registers ON_CLEANUP callback on HIGH noise model
- When HIGH noise sampling completes → callback positions LOW noise blocks
- LOW noise samples with its blocks properly positioned
- Full Cleanup frees all memory at workflow end
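The handoff above can be sketched without ComfyUI using plain Python objects. This is an illustrative mock, not the plugin's actual API: `SimpleModel`, `position_blocks`, and `on_cleanup` are stand-ins for the real patcher internals.

```python
# Minimal sketch of the Combo Patcher HIGH -> LOW handoff (illustrative mock).
class SimpleModel:
    def __init__(self, name, n_blocks=40):
        self.name = name
        # Track each block's device as a plain string for the sketch.
        self.block_devices = ["cpu"] * n_blocks
        self.on_cleanup = None  # stands in for the ON_CLEANUP object patch

    def position_blocks(self, blocks_to_swap):
        # First (n - blocks_to_swap) blocks resident on GPU, the rest wait on CPU.
        n = len(self.block_devices)
        self.block_devices = ["cuda"] * (n - blocks_to_swap) + ["cpu"] * blocks_to_swap

    def sample(self):
        result = f"{self.name} sampled"
        if self.on_cleanup:  # fires when sampling completes
            self.on_cleanup()
        return result

high, low = SimpleModel("HIGH"), SimpleModel("LOW")
high.position_blocks(12)  # 28 blocks on GPU, 12 on CPU
# LOW waits entirely on CPU until HIGH finishes:
high.on_cleanup = lambda: low.position_blocks(12)

high.sample()
print(low.block_devices.count("cuda"))  # → 28
```

The key point is that LOW never touches the GPU until HIGH's cleanup callback fires, so both models never hold GPU-resident blocks at the same time.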
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_high` | MODEL | Required | High noise model (guidance distillation) |
| `model_low` | MODEL | Required | Low noise model (guidance distillation) |
| `blocks_to_swap` | INT | 12 | Number of blocks to offload to CPU (0-40) |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | MODEL | Required | Any WAN model to apply BlockSwap to |
| `blocks_to_swap` | INT | 20 | Number of blocks to offload to CPU (0-40) |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `any_input` | ANY | Optional | Connect any output to trigger execution order |
| `unload_models` | BOOL | True | Unload all models from GPU |
| `free_memory` | BOOL | True | Clear node cache after workflow |
| `clear_cuda_cache` | BOOL | True | Clear PyTorch CUDA cache |
| `run_gc` | BOOL | True | Run Python garbage collection |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | MODEL | Required | Model to clean up |
| `latent` | LATENT | Optional* | Latent pass-through (Integrated KSampler) |
| `images` | IMAGE | Optional* | Image pass-through (WanVideoLooper) |
| `move_to_cpu` | BOOL | True | Move all blocks to CPU |
| `unpatch` | BOOL | False | Remove BlockSwap patches entirely |
| `clear_cache` | BOOL | True | Clear CUDA cache and run GC |
- *Use at least one of the latent or image inputs; they act as signal inputs that start cleanup once the KSampler has finished
| Configuration | VRAM Required | Notes |
|---|---|---|
| No BlockSwap | ~24GB+ | OOM on 12GB cards |
| 12 blocks swapped | ~7.2GB | Fits on 12GB with margin |
| 20 blocks swapped | ~5.5GB | More headroom for batches |
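From the two measured rows above, a rough linear estimate of VRAM usage vs. swap count can be interpolated. This is an approximation only; actual usage also depends on resolution, frame count, and batch size:

```python
def estimate_vram_gb(blocks_to_swap):
    """Linear fit through the two measured points (12 blocks, 7.2 GB) and (20, 5.5)."""
    slope = (5.5 - 7.2) / (20 - 12)  # ≈ -0.21 GB per extra swapped block
    return 7.2 + slope * (blocks_to_swap - 12)

print(round(estimate_vram_gb(16), 2))  # → 6.35
```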
| Phase | Time |
|---|---|
| HIGH noise (2 steps) | ~26s |
| Model switch | ~1s |
| LOW noise (3 steps) | ~25s |
| Total per segment | ~52s |
```
Block 28: transfer_time=0.10s, compute_time=0.21s, to_cpu_transfer_time=0.10s
```
- Transfer to GPU: ~70-100ms per block
- Compute: ~200-300ms per block
- Transfer to CPU: ~100-130ms per block
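These per-block costs imply a rough swap overhead per forward pass. The arithmetic below assumes every swapped block pays one GPU transfer and one CPU transfer per step, using the midpoints of the measured ranges:

```python
def swap_overhead_s(blocks_to_swap, to_gpu_ms=85, to_cpu_ms=115):
    """Estimated transfer overhead per forward pass, in seconds.

    Defaults are midpoints of the measured 70-100 ms and 100-130 ms ranges.
    """
    return blocks_to_swap * (to_gpu_ms + to_cpu_ms) / 1000.0

print(swap_overhead_s(12))  # → 2.4
```

So swapping 12 blocks adds on the order of a couple of seconds of transfer time per pass, which is consistent with the HIGH/LOW phase timings above.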
The BlockSwap patcher wraps the model's forward method:
```python
def patched_forward(*args, **kwargs):
    # Move swapped blocks to GPU before compute
    for block in swapped_blocks:
        block.to(gpu_device, non_blocking=True)
    torch.cuda.synchronize()
    # Execute original forward
    result = original_forward(*args, **kwargs)
    # Move blocks back to CPU
    for block in swapped_blocks:
        block.to(cpu_device, non_blocking=True)
    return result
```

The lazy loader intercepts ComfyUI's model loading to prevent VRAM spikes:
- Hook Installation: Patches the GGUF loader's `load_torch_file`
- Block Detection: Identifies transformer blocks during load
- Direct Routing: Loads blocks directly to CPU/GPU based on the swap config
- Zero Spike: Never loads all blocks to GPU simultaneously
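The hook is ordinary monkey-patching. The sketch below uses a stand-in `load_torch_file` (the real function in ComfyUI-GGUF takes different arguments and returns real tensors) to show the routing idea: each block is assigned its final device as it is loaded, so no "everything on GPU first" spike occurs.

```python
# Stand-in for the loader the plugin hooks; the real signature differs.
def load_torch_file(path):
    # Pretend the "file" yields 40 named transformer blocks.
    return {f"blocks.{i}.weight": object() for i in range(40)}

original_load = load_torch_file

def patched_load(path, blocks_to_swap=12):
    state = original_load(path)
    routing = {}
    for name, tensor in state.items():
        idx = int(name.split(".")[1])
        # Route each block straight to its final device -- no all-on-GPU spike.
        routing[name] = "cpu" if idx >= 40 - blocks_to_swap else "cuda"
    return state, routing

load_torch_file = patched_load  # install the hook

_, routing = load_torch_file("model.gguf")
print(sum(dev == "cpu" for dev in routing.values()))  # → 12
```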
ComfyUI's `add_object_patch` with the "ON_CLEANUP" key triggers after sampling:

```python
model_high.add_object_patch("ON_CLEANUP", switch_to_low_noise_callback)
```

This enables automatic HIGH→LOW model switching without manual intervention.
```
ComfyUI_Wan22Blockswap/
├── __init__.py                 # Node registration (active nodes only)
├── blockswap_forward.py        # Main implementation (~1600 lines)
│   ├── BlockSwapForwardPatcher     # Core patching logic
│   ├── WAN22BlockSwapPatcher       # Single model patcher node
│   ├── WAN22BlockSwapComboPatcher  # HIGH+LOW combo patcher node
│   ├── WAN22BlockSwapCleanup       # Cleanup node
│   ├── WAN22BlockSwapReposition    # Reposition node
│   └── WAN22FullCleanup            # End-of-workflow cleanup node
├── wan_loader.py               # Simple WAN model loader
├── block_manager.py            # Block management utilities
├── callbacks.py                # Lazy load and cleanup callbacks
├── utils.py                    # Utility functions
├── config.py                   # Configuration and constants
│
├── # DEPRECATED (code preserved, nodes disabled):
├── nodes.py                    # Old callback-based nodes
├── blockswap_loader.py         # Old integrated loader
├── blockswap_looper.py         # Old looper integration
└── blockswap_meta_loader.py    # Old meta loader
```
| Class | Purpose |
|---|---|
| `BlockSwapForwardPatcher` | Core logic for patching forward methods |
| `BlockManager` | Manages block state and device placement |
| `BlockSwapTracker` | Tracks which blocks are swapped for cleanup |
A: Yes! The system includes a lazy loader specifically designed for GGUF models that prevents VRAM spikes during loading.
A:
- Patcher: For single models (e.g., standard WAN 2.1 workflows)
- Combo Patcher: For guidance distillation with HIGH+LOW model pairs (automatic switching)
A: This is harmless. It occurs when ComfyUI tries to unpin tensors that BlockSwap already moved. Doesn't affect functionality.
A:
- 12GB VRAM: Start with 12 blocks
- 16GB VRAM: 8-10 blocks for faster inference
- 24GB+ VRAM: May not need BlockSwap at all
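The recommendations above can be collapsed into a small helper. The thresholds are the README's rules of thumb, not anything the plugin computes; the function name is illustrative, and you should still tune by trial on your own workflow:

```python
def suggest_blocks_to_swap(vram_gb):
    """Starting point per the FAQ's rules of thumb; tune from here."""
    if vram_gb >= 24:
        return 0   # BlockSwap likely unnecessary
    if vram_gb >= 16:
        return 10  # 8-10 blocks for faster inference; upper bound chosen here
    return 12      # 12GB cards: start with 12 blocks

print(suggest_blocks_to_swap(12))  # → 12
```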
A:
- Full Cleanup: Recommended at end of workflow to free VRAM for next run
- BlockSwap Cleanup: Optional, useful between multiple sampling runs in same workflow
- ✅ Forward Patching: New approach that patches model forward methods
- ✅ Combo Patcher: Automatic HIGH→LOW model switching
- ✅ ON_CLEANUP Callbacks: Automatic model switching after sampling
- ✅ GGUF Lazy Loader: Prevents VRAM spikes during model loading
- ✅ Full Cleanup Node: Aggressive end-of-workflow cleanup
- ✅ WanVideoLooper Support: Works with multi-loop workflows
- 🗑️ Deprecated: Old callback-based nodes (code preserved)
- Initial development with ON_LOAD callback approach
- Various experimental loaders and patchers that didn't really work as intended
- Realization that ComfyUI's core mechanics do not like external blockswap nodes
- Despair and on the verge of losing hope
- ComfyUI-wanBlockswap - Original block swapping implementation
- ComfyUI-WanVideoWrapper - WAN 2.2 wrapper and techniques
- ComfyUI-GGUF - GGUF model support
- Claude Opus - AI pair programming assistance
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter issues:
- Check this README and FAQ
- Search existing Issues
- Create a new issue with:
- ComfyUI version
- GPU model and VRAM
- Full error traceback
- Workflow description