Image inpainting, outpainting #4047
| @@ -1,7 +1,23 @@ | ||
| .venv | ||
| .venv-style | ||
| **/.venv | ||
| .pytest_cache | ||
| .devcontainer | ||
| .vscode | ||
| .vs | ||
| .idea | ||
| .gdb_history | ||
| out | ||
| bazel-bin | ||
| bazel-model_server/ | ||
| bazel-openvino-model-server/ | ||
| bazel-out | ||
| bazel-ovms | ||
| bazel-ovms-c | ||
| bazel-testlogs | ||
| demos/continuous_batching | ||
| demos/embeddings | ||
| demos/common/export_models/models | ||
| *.log | ||
| *.img | ||
| models |
|
|
@@ -3,6 +3,12 @@ | |
| This demo shows how to deploy image generation models (Stable Diffusion/Stable Diffusion 3/Stable Diffusion XL/FLUX) to create and edit images with the OpenVINO Model Server. | ||
| Image generation pipelines are exposed via [OpenAI API](https://platform.openai.com/docs/api-reference/images/create) `images/generations` and `images/edits` endpoints. | ||
|
|
||
| Supported workloads: | ||
| - **Text-to-image** — generate an image from a text prompt (`/v3/images/generations`) | ||
| - **Image-to-image** — transform an existing image guided by a prompt (`/v3/images/edits`) | ||
| - **Inpainting** — repaint a masked region of an image (`/v3/images/edits` with `mask` field) | ||
| - **Outpainting** — extend an image beyond its original borders (`/v3/images/edits` with `mask` field and larger canvas) | ||
|
|
||
| Check [supported models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/#image-generation-models). | ||
|
|
||
| > **Note:** FLUX models are not supported on NPU. | ||
|
|
@@ -387,12 +393,13 @@ curl http://localhost:8000/v1/config | |
|
|
||
| ## Request Generation | ||
|
|
||
| A single servable exposes following endpoints: | ||
| - text to image: `images/generations` | ||
| - image to image: `images/edits` | ||
| A single servable exposes the following endpoints: | ||
| - **Text-to-image**: `images/generations` — JSON body with `prompt` | ||
| - **Image-to-image**: `images/edits` — multipart form with `image` + `prompt` (no mask) | ||
| - **Inpainting**: `images/edits` — multipart form with `image` + `mask` + `prompt` | ||
| - **Outpainting**: `images/edits` — multipart form with `image` + `mask` + `prompt` (image placed on larger canvas, mask marks the area to fill) | ||
|
|
||
| Endpoints unsupported for now: | ||
| - inpainting: `images/edits` with `mask` field | ||
| > **Note:** For inpainting/outpainting, dedicated inpainting models (e.g. `stable-diffusion-v1-5/stable-diffusion-inpainting`) only support the `images/edits` endpoint. Base models (e.g. `stable-diffusion-v1-5/stable-diffusion-v1-5`) support all endpoints. | ||
|
|
||
| All requests are processed in unary format, with no streaming capabilities. | ||
|
|
||
|
|
@@ -519,6 +526,195 @@ image.save('edit_output.png') | |
| Output file (`edit_output.png`): | ||
|  | ||
|
|
||
| ### Requesting inpainting with cURL | ||
|
|
||
| Inpainting replaces a masked region in an image based on the prompt. The `mask` is a black-and-white image where white pixels mark the area to repaint. | ||
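
A mask file such as `cat_mask.png` can also be generated programmatically. The sketch below assumes Pillow is installed and uses a made-up rectangular repaint region — adjust the size and coordinates to your own image:

```python
# Create a binary inpainting mask: black = keep original pixels,
# white = repaint this area. Coordinates here are illustrative only.
from PIL import Image, ImageDraw

mask = Image.new("L", (512, 512), 0)             # all black (keep everything)
draw = ImageDraw.Draw(mask)
draw.rectangle([100, 150, 400, 480], fill=255)   # white rectangle = repaint
mask.save("cat_mask.png")
```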
|
|
||
|   | ||
|
|
||
| Linux | ||
|
Review comment: Follow switch formatting with Sphinx, as in the earlier part of the readme.
||
| ```bash | ||
|
|
||
| curl http://localhost:8000/v3/images/edits \ | ||
| -F "model=OpenVINO/stable-diffusion-v1-5-int8-ov" \ | ||
| -F "prompt=a dalmatian puppy sitting on a bench in a park, photorealistic" \ | ||
| -F "image=@cat.png" \ | ||
|
Review comment: Use a seed, like in the previous part of the demo; otherwise the images will change between OpenVINO releases.
||
| -F "mask=@cat_mask.png" \ | ||
| -F "num_inference_steps=50" \ | ||
| -F "size=512x512" | jq -r '.data[0].b64_json' | base64 --decode > inpaint_output.png | ||
| ``` | ||
|
|
||
| Windows Command Prompt | ||
| ```bat | ||
| curl http://localhost:8000/v3/images/edits ^ | ||
| -F "model=OpenVINO/stable-diffusion-v1-5-int8-ov" ^ | ||
| -F "prompt=a dalmatian puppy sitting on a bench in a park, photorealistic" ^ | ||
| -F "image=@cat.png" ^ | ||
| -F "mask=@cat_mask.png" ^ | ||
| -F "num_inference_steps=50" ^ | ||
| -F "size=512x512" | ||
| ``` | ||
|
|
||
| Expected output (`inpaint_output.png`): | ||
|
|
||
|  | ||
|
|
||
| ### Requesting inpainting with OpenAI Python package | ||
|
|
||
| ```python | ||
| from openai import OpenAI | ||
| import base64 | ||
| from io import BytesIO | ||
| from PIL import Image | ||
|
|
||
| client = OpenAI( | ||
| base_url="http://localhost:8000/v3", | ||
| api_key="unused" | ||
| ) | ||
|
|
||
| response = client.images.edit( | ||
| model="OpenVINO/stable-diffusion-v1-5-int8-ov", | ||
| image=open("cat.png", "rb"), | ||
| mask=open("cat_mask.png", "rb"), | ||
| prompt="a dalmatian puppy sitting on a bench in a park, photorealistic", | ||
| extra_body={ | ||
| "num_inference_steps": 50, | ||
| "size": "512x512" | ||
| } | ||
| ) | ||
| base64_image = response.data[0].b64_json | ||
|
|
||
| image_data = base64.b64decode(base64_image) | ||
| image = Image.open(BytesIO(image_data)) | ||
| image.save('inpaint_output.png') | ||
| ``` | ||
|
|
||
| ### Requesting outpainting with cURL | ||
|
|
||
| Outpainting extends an image beyond its original borders. Prepare two images: | ||
| - **outpaint_input.png** — the original image centered on a larger canvas (e.g. 768×768) with black borders | ||
| - **outpaint_mask.png** — white where the new content should be generated (the borders), black where the original image is | ||
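
Both inputs can be prepared with a few lines of Pillow. This is a sketch assuming a 512×512 source image and a 768×768 target canvas; a gray placeholder stands in for the real source file (swap in `Image.open("cat.png")`):

```python
from PIL import Image

# Stand-in source image; replace with: src = Image.open("cat.png").convert("RGB")
src = Image.new("RGB", (512, 512), "gray")

# Center the source on a larger black canvas
canvas = Image.new("RGB", (768, 768), "black")
left = (768 - src.width) // 2
top = (768 - src.height) // 2
canvas.paste(src, (left, top))
canvas.save("outpaint_input.png")

# Mask: white (generate) everywhere, black (keep) over the original image
mask = Image.new("L", (768, 768), 255)
mask.paste(0, (left, top, left + src.width, top + src.height))
mask.save("outpaint_mask.png")
```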
|
|
||
|   | ||
|
|
||
| Linux | ||
| ```bash | ||
| curl http://localhost:8000/v3/images/edits \ | ||
| -F "model=OpenVINO/stable-diffusion-v1-5-int8-ov" \ | ||
| -F "prompt=a cat sitting on a bench in a park, photorealistic" \ | ||
| -F "image=@outpaint_input.png" \ | ||
| -F "mask=@outpaint_mask.png" \ | ||
| -F "num_inference_steps=50" \ | ||
| -F "size=768x768" | jq -r '.data[0].b64_json' | base64 --decode > outpaint_output.png | ||
| ``` | ||
|
|
||
| Windows Command Prompt | ||
| ```bat | ||
| curl http://localhost:8000/v3/images/edits ^ | ||
| -F "model=OpenVINO/stable-diffusion-v1-5-int8-ov" ^ | ||
| -F "prompt=a cat sitting on a bench in a park, photorealistic" ^ | ||
| -F "image=@outpaint_input.png" ^ | ||
| -F "mask=@outpaint_mask.png" ^ | ||
| -F "num_inference_steps=50" ^ | ||
| -F "size=768x768" | ||
| ``` | ||
|
|
||
| Expected output (`outpaint_output.png`): | ||
|
|
||
|  | ||
|
|
||
| ### Requesting outpainting with OpenAI Python package | ||
|
|
||
| ```python | ||
| from openai import OpenAI | ||
| import base64 | ||
| from io import BytesIO | ||
| from PIL import Image | ||
|
|
||
| client = OpenAI( | ||
| base_url="http://localhost:8000/v3", | ||
| api_key="unused" | ||
| ) | ||
|
|
||
| response = client.images.edit( | ||
| model="OpenVINO/stable-diffusion-v1-5-int8-ov", | ||
| image=open("outpaint_input.png", "rb"), | ||
| mask=open("outpaint_mask.png", "rb"), | ||
| prompt="a cat sitting on a bench in a park, photorealistic", | ||
| extra_body={ | ||
| "num_inference_steps": 50, | ||
| "size": "768x768" | ||
| } | ||
| ) | ||
| base64_image = response.data[0].b64_json | ||
|
|
||
| image_data = base64.b64decode(base64_image) | ||
| image = Image.open(BytesIO(image_data)) | ||
| image.save('outpaint_output.png') | ||
| ``` | ||
|
|
||
| ### Using dedicated inpainting models | ||
|
|
||
| For best inpainting/outpainting quality, use a dedicated inpainting model. These models have a 9-channel UNet specifically trained for masked generation. | ||
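
Whether a given export is a dedicated inpainting model can be checked from its UNet config. The helper below is a sketch under the assumption that the model directory follows the usual diffusers-style layout (`unet/config.json`); the function name is made up for illustration:

```python
import json
from pathlib import Path

def is_inpainting_model(model_dir: str) -> bool:
    """True if the UNet expects 9 input channels (dedicated inpainting UNet),
    False for the 4-channel UNet of a base text-to-image model."""
    cfg = json.loads((Path(model_dir) / "unet" / "config.json").read_text())
    return cfg.get("in_channels") == 9
```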
|
|
||
| Example models for inpainting: | ||
| - `stable-diffusion-v1-5/stable-diffusion-inpainting` — SD 1.5 based, 512×512 native resolution | ||
| - `diffusers/stable-diffusion-xl-1.0-inpainting-0.1` — SDXL based, 1024×1024 native resolution | ||
|
|
||
| For the full list see [supported image generation models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/#image-generation-models). | ||
|
|
||
| > **Note:** Dedicated inpainting models only expose the `images/edits` endpoint (with mask). Text-to-image and image-to-image requests will return an error indicating the pipeline is not available for this model. Base models (e.g. `stable-diffusion-v1-5/stable-diffusion-v1-5`) support all endpoints including inpainting. | ||
|
|
||
| ::::{tab-set} | ||
| :::{tab-item} Docker (Linux) — CPU | ||
| :sync: docker | ||
| ```bash | ||
| mkdir -p models | ||
|
|
||
| docker run -d --rm --user $(id -u):$(id -g) -p 8000:8000 -v $(pwd)/models:/models/:rw \ | ||
| -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy \ | ||
| openvino/model_server:latest \ | ||
| --rest_port 8000 \ | ||
| --model_repository_path /models/ \ | ||
| --task image_generation \ | ||
| --source_model stable-diffusion-v1-5/stable-diffusion-inpainting \ | ||
| --weight-format int8 | ||
| ``` | ||
| ::: | ||
|
|
||
| :::{tab-item} Docker (Linux) — GPU | ||
| :sync: docker-gpu | ||
| ```bash | ||
| mkdir -p models | ||
|
|
||
| docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models/:rw \ | ||
| --user $(id -u):$(id -g) --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \ | ||
| -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy \ | ||
| openvino/model_server:latest-gpu \ | ||
| --rest_port 8000 \ | ||
| --model_repository_path /models/ \ | ||
| --task image_generation \ | ||
| --source_model stable-diffusion-v1-5/stable-diffusion-inpainting \ | ||
| --weight-format int8 \ | ||
| --target_device GPU | ||
| ``` | ||
| ::: | ||
|
|
||
| :::{tab-item} Bare metal (Windows) | ||
| :sync: bare-metal | ||
| ```bat | ||
| mkdir models | ||
|
|
||
| ovms --rest_port 8000 ^ | ||
| --model_repository_path ./models/ ^ | ||
| --task image_generation ^ | ||
| --source_model stable-diffusion-v1-5/stable-diffusion-inpainting ^ | ||
| --weight-format int8 | ||
| ``` | ||
| ::: | ||
|
|
||
| :::: | ||
|
|
||
|
|
||
| ### Strength influence on the final image | ||
|
|
||
|  | ||
|
|
||
|
|
@@ -49,6 +49,7 @@ static bool progress_bar(size_t step, size_t num_steps, ov::Tensor&) { | |
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Image Generation Step: {}/{}", step + 1, num_steps); | ||
| return false; | ||
| } | ||
|
|
||
| // written out separately to avoid msvc crashing when using try-catch in process method ... | ||
| static absl::Status generateTensor(ov::genai::Text2ImagePipeline& request, | ||
| const std::string& prompt, ov::AnyMap& requestOptions, | ||
|
|
@@ -94,6 +95,28 @@ static absl::Status generateTensorImg2Img(ov::genai::Image2ImagePipeline& reques | |
| return absl::OkStatus(); | ||
| } | ||
| // written out separately to avoid msvc crashing when using try-catch in process method ... | ||
| static absl::Status generateTensorInpainting(ov::genai::InpaintingPipeline& request, | ||
| const std::string& prompt, ov::Tensor image, ov::Tensor mask, ov::AnyMap& requestOptions, | ||
|
||
| std::unique_ptr<ov::Tensor>& images) { | ||
| try { | ||
| requestOptions.insert(ov::genai::callback(progress_bar)); | ||
| images = std::make_unique<ov::Tensor>(request.generate(prompt, image, mask, requestOptions)); | ||
| auto dims = images->get_shape(); | ||
| std::stringstream ss; | ||
| for (const auto& dim : dims) { | ||
| ss << dim << " "; | ||
| } | ||
| ss << " element type: " << images->get_element_type().get_type_name(); | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "ImageGenCalculator generated inpainting tensor: {}", ss.str()); | ||
| } catch (const std::exception& e) { | ||
| SPDLOG_LOGGER_ERROR(llm_calculator_logger, "ImageGenCalculator Inpainting Error: {}", e.what()); | ||
| return absl::InternalError("Error during inpainting generation"); | ||
| } catch (...) { | ||
| return absl::InternalError("Unknown error during inpainting generation"); | ||
| } | ||
| return absl::OkStatus(); | ||
| } | ||
| // written out separately to avoid msvc crashing when using try-catch in process method ... | ||
| static absl::Status makeTensorFromString(const std::string& filePayload, ov::Tensor& imageTensor) { | ||
| try { | ||
| imageTensor = loadImageStbiFromMemory(filePayload); | ||
|
|
@@ -140,10 +163,12 @@ class ImageGenCalculator : public CalculatorBase { | |
| auto pipe = it->second; | ||
|
|
||
| auto payload = cc->Inputs().Tag(INPUT_TAG_NAME).Get<ovms::HttpPayload>(); | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "ImageGenCalculator [Node: {}] Request URI: {}", cc->NodeName(), payload.uri); | ||
|
|
||
| std::unique_ptr<ov::Tensor> images; // output | ||
|
|
||
| if (absl::StartsWith(payload.uri, "/v3/images/generations")) { | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "ImageGenCalculator [Node: {}] Routed to image generations path", cc->NodeName()); | ||
| if (payload.parsedJson->HasParseError()) | ||
| return absl::InvalidArgumentError("Failed to parse JSON"); | ||
|
|
||
|
|
@@ -154,13 +179,15 @@ class ImageGenCalculator : public CalculatorBase { | |
| SET_OR_RETURN(std::string, prompt, getPromptField(*payload.parsedJson)); | ||
| SET_OR_RETURN(ov::AnyMap, requestOptions, getImageGenerationRequestOptions(*payload.parsedJson, pipe->args)); | ||
|
|
||
| ov::genai::Text2ImagePipeline request = pipe->text2ImagePipeline->clone(); | ||
|
|
||
| auto status = generateTensor(request, prompt, requestOptions, images); | ||
| // single request assumption - use pipeline instance directly | ||
| if (!pipe->text2ImagePipeline) | ||
| return absl::FailedPreconditionError("Text-to-image pipeline is not available for this model"); | ||
| auto status = generateTensor(*pipe->text2ImagePipeline, prompt, requestOptions, images); | ||
|
||
| if (!status.ok()) { | ||
| return status; | ||
| } | ||
| } else if (absl::StartsWith(payload.uri, "/v3/images/edits")) { | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "ImageGenCalculator [Node: {}] Routed to image edits path", cc->NodeName()); | ||
| if (payload.multipartParser->hasParseError()) | ||
| return absl::InvalidArgumentError("Failed to parse multipart data"); | ||
|
|
||
|
|
@@ -176,8 +203,29 @@ class ImageGenCalculator : public CalculatorBase { | |
|
|
||
| SET_OR_RETURN(ov::AnyMap, requestOptions, getImageEditRequestOptions(*payload.multipartParser, pipe->args)); | ||
|
|
||
| ov::genai::Image2ImagePipeline request = pipe->image2ImagePipeline->clone(); | ||
| status = generateTensorImg2Img(request, prompt, imageTensor, requestOptions, images); | ||
| SET_OR_RETURN(std::optional<std::string_view>, mask, getFileFromPayload(*payload.multipartParser, "mask")); | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "ImageGenCalculator [Node: {}] Mask present: {}", cc->NodeName(), mask.has_value() && !mask.value().empty()); | ||
|
|
||
| if (mask.has_value() && !mask.value().empty()) { | ||
| if (!pipe->inpaintingPipeline) | ||
| return absl::FailedPreconditionError("Inpainting pipeline is not available for this model"); | ||
| // Inpainting path — uses the pre-built InpaintingPipeline that was loaded from disk | ||
| // during initialization. Do NOT derive InpaintingPipeline from Image2ImagePipeline | ||
| // at request time — that derivation direction causes a SEGFAULT in GenAI. | ||
|
||
| ov::Tensor maskTensor; | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "ImageGenCalculator [Node: {}] Inpainting: decoding mask tensor", cc->NodeName()); | ||
| status = makeTensorFromString(std::string(mask.value()), maskTensor); | ||
| if (!status.ok()) { | ||
|
||
| return status; | ||
| } | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "ImageGenCalculator [Node: {}] Inpainting: mask tensor decoded, invoking generate()", cc->NodeName()); | ||
| status = generateTensorInpainting(*pipe->inpaintingPipeline, prompt, imageTensor, maskTensor, requestOptions, images); | ||
| } else { | ||
| if (!pipe->image2ImagePipeline) | ||
| return absl::FailedPreconditionError("Image-to-image pipeline is not available for this model"); | ||
| // image-to-image path - single pipeline instance, no clone needed | ||
| status = generateTensorImg2Img(*pipe->image2ImagePipeline, prompt, imageTensor, requestOptions, images); | ||
|
||
| } | ||
| if (!status.ok()) { | ||
| return status; | ||
| } | ||
|
|
||