How can I do image-to-image batched inference with FLUX.2 Klein, with one-to-one input image-prompt pairing? #13431

joshchristo · 2026-04-07T17:57:23Z


I implemented something similar to the code below, but it appears that the model treats all provided images as shared context. As a result, each prompt is paired with all passed images (e.g., prompt_a is processed with image_a, image_b, and image_c).

What I am trying to achieve instead is batched inference with one-to-one pairing, so each prompt is processed only with its corresponding image (e.g., prompt_a with image_a, prompt_b with image_b, etc.).

What is the correct way to structure the inputs or batching logic to ensure this one-to-one mapping?

I would greatly appreciate any help with this. Thank you for your time.

import torch
from diffusers import Flux2KleinPipeline
from diffusers.utils import load_image

model_id = "black-forest-labs/FLUX.2-klein-9B"
pipe = Flux2KleinPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompts = [
    "prompt_a",
    "prompt_b",
    "prompt_c"
]

images = [
    load_image("path/to/image_a.jpg"),
    load_image("path/to/image_b.jpg"),
    load_image("path/to/image_c.jpg")
]

outputs = pipe(
    prompt=prompts,
    image=images,
    num_inference_steps=4,  
    guidance_scale=1.0, 
).images

missionfloyd · 2026-04-11T22:45:03Z


You can iterate through the prompt/image pairs like so:

outputs = []

for image, prompt in zip(images, prompts):
    output = pipe(
        prompt=prompt,
        image=image,
        num_inference_steps=4,
        guidance_scale=1.0
    ).images[0]

    outputs.append(output)


x-tahosin · 2026-04-25T12:36:49Z


The issue you're hitting comes from how Flux2KleinPipeline handles the image argument when it's a list. Instead of zipping images and prompts one-to-one, the pipeline applies every image to every prompt.

Why this happens

Internally, the prompt list sets the batch size, and the standard pipe(...) call has no notion of explicit image-prompt pairs. As far as I can tell from the FLUX.2 pipelines, a list passed to image is treated as a set of reference images whose encoded latents condition every item in the batch (FLUX.2 supports multi-reference editing). You still get one output per prompt, but each output sees all the images. That's why prompt_a gets paired with image_a, image_b, and image_c.

Solution 1: Manual one-to-one loop (simplest, slightly slower)

The most robust approach today is to call the pipeline individually for each pair and collect results:

outputs = []
for prompt, img in zip(prompts, images):
    out = pipe(
        prompt=prompt,
        image=img,
        num_inference_steps=4,
        guidance_scale=1.0,
    ).images[0]
    outputs.append(out)

This guarantees 1:1 pairing. The per-call overhead is usually negligible for num_inference_steps=4 and a 9B model on a decent GPU, because nearly all the time goes into the model forward passes, not the Python loop. The main thing you give up is processing several pairs in one batched forward pass.

Solution 2: Collapse into a single batched call with paired indices

If you absolutely need one batched pipeline call (e.g. for VAE encoding efficiency), the natural attempt is to keep prompts and images in the same order so that index i of each list refers to the same pair:

# Flatten: [prompt_a, prompt_b, prompt_c] and [img_a, img_b, img_c]
# become batch items 0,1,2 with direct correspondence
outputs = pipe(
    prompt=prompts,        # length 3
    image=images,          # length 3
    num_inference_steps=4,
    guidance_scale=1.0,
).images  # length 3

Wait, this is exactly what you tried. The key question is whether Flux2KleinPipeline aligns same-length lists or broadcasts the images implicitly. In the Diffusers source for most pipelines:

prompt is processed through the text encoder → shape (batch_size, seq_len, dim)
image is processed through the VAE encoder → shape (batch_size, channels, h, w)
If both lists have the same length, they should zip naturally. The exceptions are a pipeline that explicitly expands them (an itertools.product-style expansion) or one that joins all images into a single shared conditioning input.

For the FLUX.1 image-to-image path (FluxImg2ImgPipeline), prepare_latents typically expects image to be a single PIL image or a tensor. When a list is passed, it may be converted to a batch tensor (e.g. via torch.stack([...])), which does preserve order if the list lengths match. That doesn't match what you're seeing with FLUX.2 Klein, though, where every prompt is conditioned on every image.
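The difference between zipping and shared context comes down to which axis the encoded images end up on. The toy sketch below uses plain lists in place of tensors. `stack_per_item` mimics stacking one image per batch item along the batch axis, which gives one-to-one pairing. `shared_context` mimics concatenating every image's tokens into one conditioning sequence and repeating it for each batch item. That second layout is an assumption about how a multi-reference pipeline might work, consistent with the behavior described in the question, not a transcription of the FLUX.2 source.

```python
def stack_per_item(image_tokens):
    """Batch item i holds only image i's tokens (like torch.stack on dim 0)."""
    return [list(tokens) for tokens in image_tokens]


def shared_context(image_tokens, batch_size):
    """Every batch item holds all images' tokens, joined along the sequence."""
    joined = [tok for tokens in image_tokens for tok in tokens]
    return [list(joined) for _ in range(batch_size)]


# Two tokens per "image" stand in for a VAE-encoded, patchified latent.
image_tokens = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]
print(stack_per_item(image_tokens)[1])     # ['b0', 'b1']
print(shared_context(image_tokens, 3)[1])  # ['a0', 'a1', 'b0', 'b1', 'c0', 'c1']
```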

The real fix

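Check whether you're using a custom wrapper or a community pipeline that adds broadcasting. Many stock image-to-image pipelines zip same-length lists, but each pipeline's own __call__ decides this, not the DiffusionPipeline base class. So also check whether the pipeline you're using documents a list of images as shared reference inputs. Otherwise, cross-pairing usually comes from one of:

A custom pipeline implementation
A nested list structure that gets unpacked unexpectedly
A batch_size or num_images_per_prompt override being applied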
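For the nested-list cause, a quick check is to confirm that images is a flat list with one entry per prompt. The helper below is hypothetical and plain Python. It only inspects list structure, so it works with PIL images or any other objects.

```python
def check_flat_pairs(prompts, images):
    """Raise if images is nested or doesn't have one entry per prompt."""
    if isinstance(prompts, str):
        prompts = [prompts]  # a single prompt string counts as one prompt
    if not isinstance(images, (list, tuple)):
        raise TypeError("images should be a list with one entry per prompt")
    nested = [i for i, img in enumerate(images) if isinstance(img, (list, tuple))]
    if nested:
        raise TypeError(f"images contains nested lists at indices {nested}")
    if len(prompts) != len(images):
        raise ValueError(f"{len(prompts)} prompts but {len(images)} images")
```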

Diagnostic step

Add a debug print right before the pipe() call:

print(f"Prompts: {len(prompts)}, Images: {len(images)}")
print(f"Prompt[0]: {prompts[0]}")
print(f"Image[0] size: {images[0].size}")

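Then inspect the output length. If you get 9 images from 3 prompts and 3 images, you've confirmed Cartesian-product behavior. If you get 3, the count alone can't tell a one-to-one zip from shared reference context, so compare each output against the image it was supposed to use.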
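The counting rule can be written as a small hypothetical helper. It relies on num_images_per_prompt multiplying the number of outputs per prompt, which is how Diffusers pipelines use that argument. A count equal to the number of prompts still can't distinguish a one-to-one zip from shared reference context.

```python
def classify_output_count(n_prompts, n_images, n_outputs, num_images_per_prompt=1):
    """Guess the pairing behavior from how many images the pipeline returned."""
    per_prompt = n_prompts * num_images_per_prompt
    # With a single image, the product count equals per_prompt, so require > 1.
    if n_images > 1 and n_outputs == per_prompt * n_images:
        return "cartesian product"
    if n_outputs == per_prompt:
        # Zipped pairs and shared reference context both give this count.
        return "one output per prompt (zip or shared context)"
    return "unexpected count"
```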

Recommended path forward

For production use with FLUX.2 Klein, I'd go with Solution 1 (explicit loop) until profiling shows it's a bottleneck. With 4 steps and a 9B model, the Python loop overhead is likely under 2% of total latency. If you need more throughput, compile the pipeline's transformer with torch.compile() and parallelize at the application level (for example across GPUs or worker processes) rather than relying on list semantics.

If you can share:

Exact diffusers version (diffusers.__version__)
Whether you installed from main or a specific release
The full traceback (if any) when running your original code

…I can verify whether this is a known issue in that specific version or a custom pipeline behavior.
