First — thanks to filipstrand and contributors!
Question: is this expected behavior? With `mflux-generate-z-image-turbo` (`--model filipstrand/Z-Image-Turbo-mflux-4bit`), memory usage sits at about 9–10 GB of VRAM during generation, but at the end it jumps much higher, roughly 1.5–2× that.
I'm on a MacBook Pro (M1 Pro).
Memory graph from Activity Monitor:
Also, I couldn't find this in the docs: does mflux support Lightning LoRAs?
One more question: right now the model is loaded into and unloaded from VRAM for every prompt. Is it possible to keep the model loaded after generation instead of unloading it automatically, or is there a UI that can do this (the way LM Studio does for mlx-lm)?