How can I do image-to-image batched inference with FLUX.2 Klein, with one-to-one input image-prompt pairing? #13431
Replies: 2 comments
You can iterate through the prompt/image pairs like so:

    outputs = []
    for image, prompt in zip(images, prompts):
        output = pipe(
            prompt=prompt,
            image=image,
            num_inference_steps=4,
            guidance_scale=1.0
        ).images[0]
        outputs.append(output)
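The loop above can be wrapped in a small helper that also rejects mismatched list lengths, since `zip` would otherwise silently drop the extra items. This is a minimal sketch: `run_paired` is a hypothetical name, and it assumes only that `pipe` is a callable accepting `prompt=` and `image=` keyword arguments and returning an object with an `.images` list, as in the snippet above.

```python
from typing import Any, Callable, List, Sequence


def run_paired(
    pipe: Callable[..., Any],
    prompts: Sequence[str],
    images: Sequence[Any],
    **pipe_kwargs: Any,
) -> List[Any]:
    """Call `pipe` once per (prompt, image) pair and collect one output per pair.

    `run_paired` is a hypothetical helper, not part of any library.
    """
    # zip() truncates to the shorter input, so check lengths explicitly.
    if len(prompts) != len(images):
        raise ValueError(
            f"Expected equal lengths, got {len(prompts)} prompts "
            f"and {len(images)} images"
        )

    outputs = []
    for prompt, image in zip(prompts, images):
        # Each call sees exactly one prompt and one image.
        result = pipe(prompt=prompt, image=image, **pipe_kwargs)
        outputs.append(result.images[0])
    return outputs
```

It would be called as `run_paired(pipe, prompts, images, num_inference_steps=4, guidance_scale=1.0)`, with any extra keyword arguments forwarded unchanged to each pipeline call.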
The issue you're hitting is fundamental to how …

Why this happens

Internally, the pipeline stacks prompts and images into batch dimensions. When both are lists, the standard …

Solution 1: Manual one-to-one loop (simplest, slightly slower)

The most robust approach today is to call the pipeline individually for each pair and collect results:

    outputs = []
    for prompt, img in zip(prompts, images):
        out = pipe(
            prompt=prompt,
            image=img,
            num_inference_steps=4,
            guidance_scale=1.0,
        ).images[0]
        outputs.append(out)

This guarantees 1:1 pairing. The overhead is usually negligible for …

Solution 2: Collapse into a single batched call with paired indices

If you absolutely need one batched pipeline call (e.g. for VAE encode efficiency), you can pass prompts and images as equal-length lists so that each appears exactly once, in the same batch order:

    # Flatten: [prompt_a, prompt_b, prompt_c] and [img_a, img_b, img_c]
    # become batch items 0,1,2 with direct correspondence
    outputs = pipe(
        prompt=prompts,  # length 3
        image=images,  # length 3
        num_inference_steps=4,
        guidance_scale=1.0,
    ).images  # length 3

Wait, this is exactly what you tried. The key detail is whether …
For …

The real fix

Check if you're using a custom wrapper or a community pipeline that adds broadcasting. The base …
Diagnostic step

Add a debug print right before the …

    print(f"Prompts: {len(prompts)}, Images: {len(images)}")
    print(f"Prompt[0]: {prompts[0]}")
    print(f"Image[0] size: {images[0].size}")

Then inspect the output length. If you get 9 images from 3 prompts + 3 images, you have confirmed Cartesian-product behavior.

Recommended path forward

For production use with FLUX.2 Klein, I'd go with Solution 1 (explicit loop) until you profile it. With 4 steps and a 9B model, the loop overhead is likely <2% of total latency. If you need true batching for throughput, use the …

If you can share:

…I can verify whether this is a known issue in that specific version or a custom pipeline behavior.
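The output-length check from the diagnostic step can be sketched as a small function. `describe_pairing` is a hypothetical name and the labels are illustrative only. A count check has a clear limit: it cannot detect the case described in the question, where the number of outputs matches the number of prompts but each output was conditioned on all images as shared context.

```python
def describe_pairing(num_prompts: int, num_images: int, num_outputs: int) -> str:
    """Classify a batched call by how many outputs it produced.

    Hypothetical helper: it inspects counts only and says nothing about
    which images each output was actually conditioned on.
    """
    # One output per prompt, with matching input lengths.
    if num_prompts == num_images and num_outputs == num_prompts:
        return "one-to-one"
    # Every prompt combined with every image.
    if num_outputs == num_prompts * num_images:
        return "cartesian"
    return "unknown"


# Example: 3 prompts and 3 images producing 9 outputs.
print(describe_pairing(3, 3, 9))
```

If this reports "one-to-one" but the results still look blended, comparing each output against its intended input image by eye is the more reliable check.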
I implemented something similar to the code below, but it appears that the model treats all provided images as shared context. As a result, each prompt is paired with all passed images (e.g., prompt_a is processed with image_a, image_b, and image_c).
What I am trying to achieve instead is batched inference with one-to-one pairing, so each prompt is processed only with its corresponding image (e.g., prompt_a with image_a, prompt_b with image_b, etc.).
What is the correct way to structure the inputs or batching logic to ensure this one-to-one mapping?
I would greatly appreciate any help with this. Thank you for your time.