From 6994b20bc239124e399db7eceac24ae22585d87f Mon Sep 17 00:00:00 2001
From: Adrian Tobiszewski
Date: Tue, 2 Jun 2026 11:56:51 +0200
Subject: [PATCH 1/3] Add CPU deployment instructions for Multi-LoRA Image Generation section

---
 demos/image_generation/README.md | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/demos/image_generation/README.md b/demos/image_generation/README.md
index 22a879b745..6234c67063 100644
--- a/demos/image_generation/README.md
+++ b/demos/image_generation/README.md
@@ -619,6 +619,42 @@ This section demonstrates how to serve multiple LoRA adapters with a single SDXL
 
 The following command starts OVMS with Stable Diffusion XL and 5 LoRA adapters for different artistic styles:
 
+### CPU
+
+::::{tab-set}
+:::{tab-item} Docker (Linux)
+:sync: docker
+```bash
+mkdir -p models
+
+docker run -d --rm --user $(id -u):$(id -g) -p 8000:8000 -v $(pwd)/models:/models/:rw \
+  -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy \
+  openvino/model_server:latest \
+  --rest_port 8000 \
+  --model_repository_path /models/ \
+  --task image_generation \
+  --source_model OpenVINO/stable-diffusion-xl-base-1.0-int8-ov \
+  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors"
+```
+:::
+
+:::{tab-item} Bare metal (Windows)
+:sync: bare-metal
+```bat
+if not exist c:\models mkdir c:\models
+
+ovms --rest_port 8000 ^
+  --model_repository_path c:\models ^
+  --task image_generation ^
+  --source_model OpenVINO/stable-diffusion-xl-base-1.0-int8-ov ^
+  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors"
+```
+:::
+
+::::
+
+### GPU
+
 ::::{tab-set}
 :::{tab-item} Docker (Linux)
 :sync: docker
@@ -801,7 +837,7 @@ for style_name, style_config in styles.items():
 
 To blend multiple adapters, define a **composite adapter** at startup using the `@alias:alpha` syntax:
 
-```bash
+```text
 --source_loras="xray=...,ukiyo=...,blend=@xray:0.5+@ukiyo:0.4"
 ```
 

From 341ef53a604a93407fee71b2e4174eb0540c3d6e Mon Sep 17 00:00:00 2001
From: Adrian Tobiszewski
Date: Mon, 8 Jun 2026 13:00:43 +0200
Subject: [PATCH 2/3] Fix formating

---
 demos/image_generation/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/demos/image_generation/README.md b/demos/image_generation/README.md
index 6234c67063..73838472c8 100644
--- a/demos/image_generation/README.md
+++ b/demos/image_generation/README.md
@@ -619,7 +619,7 @@ This section demonstrates how to serve multiple LoRA adapters with a single SDXL
 
 The following command starts OVMS with Stable Diffusion XL and 5 LoRA adapters for different artistic styles:
 
-### CPU
+#### CPU
 
 ::::{tab-set}
 :::{tab-item} Docker (Linux)
@@ -653,7 +653,7 @@ ovms --rest_port 8000 ^
 
 ::::
 
-### GPU
+#### GPU
 
 ::::{tab-set}
 :::{tab-item} Docker (Linux)

From 9b1553329b6d317a265e8c2c2e5176545fa7651a Mon Sep 17 00:00:00 2001
From: Adrian Tobiszewski
Date: Tue, 9 Jun 2026 15:57:02 +0200
Subject: [PATCH 3/3] Fixes

---
 demos/image_generation/README.md   | 95 ++++++------------------
 docs/image_generation/reference.md | 21 ++++---
 2 files changed, 32 insertions(+), 84 deletions(-)

diff --git a/demos/image_generation/README.md b/demos/image_generation/README.md
index 73838472c8..55345ee85d 100644
--- a/demos/image_generation/README.md
+++ b/demos/image_generation/README.md
@@ -174,80 +174,8 @@ ovms --rest_port 8000 ^
 
 ::::
 
-### SDXL model deployment
-
-To deploy an SDXL model (higher quality, 1024×1024 native resolution), use a different `--source_model`:
-
-::::{tab-set}
-:::{tab-item} Docker (Linux) — GPU
-:sync: docker
-
-Start docker container:
-```bash
-mkdir -p ${HOME}/models
-
-docker run -d --rm -p 8000:8000 -v ${HOME}/models:/models:rw \
-  --user $(id -u):$(id -g) --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
-  -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy \
-  openvino/model_server:latest-gpu \
-  --rest_port 8000 \
-  --model_repository_path /models \
-  --task image_generation \
-  --source_model OpenVINO/stable-diffusion-xl-base-1.0-int8-ov \
-  --target_device GPU
-```
-:::
-
-:::{tab-item} Bare metal (Windows)
-:sync: bare-metal
-
-```bat
-if not exist c:\models mkdir c:\models
-
-ovms --rest_port 8000 ^
-  --model_repository_path c:\models ^
-  --task image_generation ^
-  --source_model OpenVINO/stable-diffusion-xl-base-1.0-int8-ov ^
-  --target_device GPU
-```
-:::
-
-::::
-
-> **NOTE:** SDXL models require more RAM/vRAM than SD 1.5. Use `--resolution 1024x1024` when deploying on NPU.
-
-
 ## Option 2. Serving a pre-downloaded model
-
-If you already have a model on disk (downloaded via Option 1 with `--pull`, or via `huggingface-cli`, or converted with [Export Models Tool](../common/export_models/README.md)), you can start the server pointing directly to the model directory using `--model_name` and `--model_path`:
-
-::::{tab-set}
-:::{tab-item} Docker (Linux)
-:sync: docker
-
-```bash
-docker run -d --rm -p 8000:8000 -v ${HOME}/models:/models:rw \
-  openvino/model_server:latest \
-  --rest_port 8000 \
-  --model_name OpenVINO/stable-diffusion-v1-5-int8-ov \
-  --model_path /models/OpenVINO/stable-diffusion-v1-5-int8-ov
-```
-:::
-
-:::{tab-item} Bare metal (Windows)
-:sync: bare-metal
-
-```bat
-ovms --rest_port 8000 ^
-  --model_name OpenVINO/stable-diffusion-v1-5-int8-ov ^
-  --model_path c:\models\OpenVINO\stable-diffusion-v1-5-int8-ov
-```
-:::
-
-::::
-
-> **NOTE:** The `graph.pbtxt` configuration file is auto-generated at runtime when using `--task image_generation`. You can also customize it manually — see [Image Generation calculator reference](../../docs/image_generation/reference.md) for all available options.
-
+If you have already downloaded, converted and quantized the model using the OVMS or [Export Models Tool](../common/export_models/README.md), place the model folder in the model repository directory and start the server with appropriate configuration. For details check [Starting the Server](../../docs/starting_server.md).
 
 ## Readiness Check
 
@@ -634,7 +562,7 @@ docker run -d --rm --user $(id -u):$(id -g) -p 8000:8000 -v $(pwd)/models:/model
   --model_repository_path /models/ \
   --task image_generation \
   --source_model OpenVINO/stable-diffusion-xl-base-1.0-int8-ov \
-  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors"
+  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors,blend=@xray:0.5+@ukiyo:0.4"
 ```
 :::
 
@@ -647,7 +575,7 @@ ovms --rest_port 8000 ^
   --model_repository_path c:\models ^
   --task image_generation ^
   --source_model OpenVINO/stable-diffusion-xl-base-1.0-int8-ov ^
-  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors"
+  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors,blend=@xray:0.5+@ukiyo:0.4"
 ```
 :::
 
@@ -670,7 +598,7 @@ docker run -d --rm --user $(id -u):$(id -g) -p 8000:8000 -v $(pwd)/models:/model
   --task image_generation \
   --source_model OpenVINO/stable-diffusion-xl-base-1.0-int8-ov \
   --target_device GPU \
-  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors"
+  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors,blend=@xray:0.5+@ukiyo:0.4"
 ```
 :::
 
@@ -684,7 +612,7 @@ ovms --rest_port 8000 ^
   --task image_generation ^
   --source_model OpenVINO/stable-diffusion-xl-base-1.0-int8-ov ^
   --target_device GPU ^
-  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors"
+  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,thepoint=alvdansen/the-point@araminta_k_the_point.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors,chalk=Norod78/sdxl-chalkboarddrawing-lora@SDXL_ChalkBoardDrawing_LoRA_r8.safetensors,blend=@xray:0.5+@ukiyo:0.4"
 ```
 :::
 
@@ -843,6 +771,13 @@ To blend multiple adapters, define a **composite adapter** at startup using the
 Then use the composite alias as the model name:
 
 ```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8000/v3",
+    api_key="unused"
+)
+
 response = client.images.generate(
     model="blend",  # activates both xray and ukiyo
     prompt="a cute cat in sunglasses",
@@ -856,6 +791,12 @@ response = client.images.generate(
 You can override individual component alphas at request time:
 
 ```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8000/v3",
+    api_key="unused"
+)
 response = client.images.generate(
     model="blend",
     prompt="a cute cat in sunglasses",
diff --git a/docs/image_generation/reference.md b/docs/image_generation/reference.md
index f87927b814..d4d2f887d3 100644
--- a/docs/image_generation/reference.md
+++ b/docs/image_generation/reference.md
@@ -223,7 +223,7 @@ Each individual adapter can optionally specify a default alpha weight by appendi
 
 The alpha value controls how strongly the adapter influences generation (default: `1.0`). Examples:
 
-```bash
+```
 # Linux - adapter with alpha 0.6
 --source_loras="pokemon=/models/loras/pokemon.safetensors:0.6"
 
@@ -240,11 +240,18 @@ The alpha value controls how strongly the adapter influences generation (default
 **Example:**
 
 ```bash
-ovms --rest_port 8000 \
-  --model_repository_path /models/ \
-  --task image_generation \
-  --source_model stabilityai/stable-diffusion-xl-base-1.0 \
-  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors"
+mkdir -p ${HOME}/models
+
+docker run -d --rm -p 8000:8000 -v ${HOME}/models:/models:rw \
+  --user $(id -u):$(id -g) --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
+  -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy \
+  openvino/model_server:latest-gpu \
+  --rest_port 8000 \
+  --model_repository_path /models/ \
+  --task image_generation \
+  --source_model stabilityai/stable-diffusion-xl-base-1.0 \
+  --target_device GPU \
+  --source_loras "xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,vector=DoctorDiffusion/doctor-diffusion-s-controllable-vector-art-xl-lora@DD-vector-v2.safetensors"
 ```
 
 > **Important:** LoRA adapters must be compatible with the base model architecture. For example, SDXL adapters can only be used with an SDXL base model.
@@ -309,7 +316,7 @@ The `lora_alphas` field in the request body allows overriding the default alpha
 To blend multiple adapters simultaneously, define a **composite adapter** at startup:
 
 ```
---source_loras="xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e Art.safetensors,blend=@xray:0.5+@ukiyo:0.4"
+--source_loras="xray=DoctorDiffusion/doctor-diffusion-s-xray-xl-lora@DD-xray-v1.safetensors,ukiyo=KappaNeuro/ukiyo-e-art@Ukiyo-e%20Art.safetensors,blend=@xray:0.5+@ukiyo:0.4"
 ```
 
 Then use the composite alias in requests:
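
Note on the `--source_loras` values in this series: the long comma-separated string is easy to get wrong by hand, and the last commit replaces a literal space in `Ukiyo-e Art.safetensors` with `%20`. The following is a minimal sketch of how such a value could be assembled programmatically. It is not part of the patch or of OVMS: the helper name `build_source_loras` and its input shapes are hypothetical, the `alias=repo@file` and `alias=@a:0.5+@b:0.4` forms are copied from the examples above, and the check that a composite may only reference already-defined aliases is an assumption rather than documented server behavior.

```python
from urllib.parse import quote


def build_source_loras(adapters, composites=None):
    """Assemble a --source_loras value (hypothetical helper, not an OVMS API).

    adapters:   {alias: (hf_repo, weight_file)}
    composites: {alias: {component_alias: alpha}}
    """
    parts = []
    for alias, (repo, weight_file) in adapters.items():
        # Percent-encode the file name so a space becomes %20,
        # matching the form used in the updated commands.
        parts.append(f"{alias}={repo}@{quote(weight_file)}")

    for alias, components in (composites or {}).items():
        # Assumption: a composite can only blend aliases defined above.
        unknown = sorted(set(components) - set(adapters))
        if unknown:
            raise ValueError(f"composite {alias!r} uses undefined adapters: {unknown}")
        blend = "+".join(f"@{name}:{alpha}" for name, alpha in components.items())
        parts.append(f"{alias}={blend}")

    return ",".join(parts)


value = build_source_loras(
    {
        "xray": ("DoctorDiffusion/doctor-diffusion-s-xray-xl-lora", "DD-xray-v1.safetensors"),
        "ukiyo": ("KappaNeuro/ukiyo-e-art", "Ukiyo-e Art.safetensors"),
    },
    {"blend": {"xray": 0.5, "ukiyo": 0.4}},
)
print(value)
```

The printed string can be passed as the quoted argument to `--source_loras`; whether the server requires the `%20` encoding or merely accepts it is not stated in the patch.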