First — thanks to filipstrand and contributors!
Question: is this expected behavior? With `mflux-generate-z-image-turbo` (`--model filipstrand/Z-Image-Turbo-mflux-4bit`), memory usage sits at about 9–10 GB of VRAM during generation, but at the end it jumps much higher, roughly 1.5–2× that.
I'm on a MacBook Pro (M1 Pro).
Memory graph from Activity Monitor:
Also, I couldn't find this in the docs: does mflux support Lightning LoRAs?
One more question: right now the model is loaded into and unloaded from VRAM for every prompt. Is it possible to keep the model loaded after generation instead of unloading it automatically, or is there a UI that can do this (the way LM Studio does for mlx-lm)?