
Image-Gen Service Implementation Progress

Overview

An image generation service built on FLUX.2 [klein], a 9B-parameter step-distilled model, following the lip-sync-v2 service pattern.

Two-phase approach:

  • Phase 1: Test directly on Vast AI (no Docker) - benchmark VRAM with reference images
  • Phase 2: Dockerize once benchmarks confirm GPU requirements

Phase 1: Vast AI Testing

Step 1: Project Scaffolding

  • Create pyproject.toml
  • Create Makefile
  • Create .gitignore
  • Create imagegen/__init__.py
  • Create imagegen/server/__init__.py

Step 2: Pydantic Schemas

  • Create imagegen/server/schemas.py
    • HealthResponse
    • GenerateRequest/Response
    • EditRequest/Response
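The Step 2 schemas might look like the sketch below; field names, defaults, and limits are assumptions for illustration, not the final API (the 4-step default follows the klein step-distillation noted later):

```python
from typing import List, Optional
from pydantic import BaseModel, Field

class HealthResponse(BaseModel):
    status: str = "ok"
    model_loaded: bool = False

class GenerateRequest(BaseModel):
    # Hypothetical fields; the final schema may differ.
    prompt: str
    width: int = Field(default=1024, ge=256, le=2048)
    height: int = Field(default=1024, ge=256, le=2048)
    num_steps: int = Field(default=4, ge=1, le=50)  # klein is step-distilled to 4 steps
    seed: Optional[int] = None

class GenerateResponse(BaseModel):
    image_base64: str  # generated image as base64-encoded PNG
    seed: int

class EditRequest(GenerateRequest):
    # Reference images as base64 strings; the benchmark step will
    # determine how many fit in VRAM (checklist tests up to 4).
    reference_images: List[str] = Field(default_factory=list, max_length=4)

class EditResponse(GenerateResponse):
    pass
```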

Step 3: VRAM Benchmark Script

  • Create scripts/benchmark_vram.py
    • Model loading with/without CPU offload
    • Text-to-image at various resolutions
    • Multi-reference editing tests
    • Peak VRAM logging
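The Step 3 benchmark loop might look like this sketch; `pipe` is whatever diffusers pipeline Step 4 ends up loading, and the resolution list and call signature are assumptions:

```python
"""Sketch of scripts/benchmark_vram.py (resolutions and pipe signature are assumptions)."""

def bytes_to_gb(n: int) -> float:
    """Convert a byte count to GB, rounded to two decimals."""
    return round(n / 1024**3, 2)

def benchmark_text_to_image(pipe, prompt: str,
                            resolutions=((512, 512), (1024, 1024))) -> dict:
    """Run text-to-image at each resolution and record peak VRAM in GB."""
    import torch  # imported lazily so bytes_to_gb stays usable on CPU-only machines
    results = {}
    for width, height in resolutions:
        torch.cuda.reset_peak_memory_stats()
        pipe(prompt=prompt, width=width, height=height, num_inference_steps=4)
        results[f"{width}x{height}"] = bytes_to_gb(torch.cuda.max_memory_allocated())
    return results
```

A similar loop over 1–4 reference images would cover the multi-reference editing tests.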

Step 4: FLUX Pipeline Wrapper

  • Create imagegen/server/flux_pipeline.py
    • FluxConfig dataclass
    • FluxPipeline class with load/unload/generate/edit methods
    • CPU offload for RTX 4090 compatibility
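The Step 4 wrapper might be structured as below; the diffusers entry point (generic `DiffusionPipeline` loader) and the config defaults are assumptions to confirm against the model card:

```python
from dataclasses import dataclass

@dataclass
class FluxConfig:
    # Hypothetical defaults; confirm against the model card.
    model_id: str = "black-forest-labs/FLUX.2-klein-9B"
    quantization: str = "none"   # none | fp8 | int8
    cpu_offload: bool = False    # enable on 24GB cards (RTX 4090)
    num_inference_steps: int = 4

class FluxPipeline:
    """Thin wrapper owning the diffusers pipeline lifecycle."""

    def __init__(self, config: FluxConfig):
        self.config = config
        self._pipe = None

    @property
    def loaded(self) -> bool:
        return self._pipe is not None

    def load(self) -> None:
        # Heavy imports kept local so the module imports without torch installed.
        import torch
        from diffusers import DiffusionPipeline  # generic loader; exact class TBD
        self._pipe = DiffusionPipeline.from_pretrained(
            self.config.model_id, torch_dtype=torch.bfloat16
        )
        if self.config.cpu_offload:
            self._pipe.enable_model_cpu_offload()  # trades speed for VRAM headroom
        else:
            self._pipe.to("cuda")

    def unload(self) -> None:
        self._pipe = None
```

`generate` and `edit` would then delegate to `self._pipe` with the schema fields from Step 2.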

Step 5: FastAPI Server

  • Create imagegen/server/main.py
    • Lifespan for model loading
    • GET /health endpoint
    • POST /generate endpoint
    • POST /edit endpoint
    • GET / root endpoint

Step 6: Documentation

  • Create README-VASTAI.md - Setup instructions for Vast AI
  • Create scripts/test_api.py - API testing script

Step 7: Tests

  • Create tests/conftest.py - Pytest fixtures
  • Create tests/test_schemas.py - Schema validation tests
  • Create README.md - Main project documentation

Phase 2: Dockerization (After Benchmarks Pass)

Docker Setup

  • Create Dockerfile based on lip-sync-v2
  • Test Docker build locally
  • Test on Vast AI with Docker

CI/CD

  • Create .github/workflows/image-gen.yml
  • Unit tests for schemas
  • Docker build and push to GHCR

Environment Variables

Variable             Required   Default                Description
HUGGING_FACE_TOKEN   Yes        -                      HF token for the gated model
PRELOAD_MODELS       No         true                   Load the model on startup
QUANTIZATION         No         none                   Quantization mode: none, fp8, int8
LOG_LEVEL            No         INFO                   Logging level
HF_HOME              No         ~/.cache/huggingface   Model cache directory
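Reading these variables could look like the following stdlib-only sketch; the Settings shape is an assumption, but the defaults and the required/optional split follow the table:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    hugging_face_token: str
    preload_models: bool
    quantization: str
    log_level: str

def load_settings(env=os.environ) -> Settings:
    """Read service settings from the environment, applying the table's defaults."""
    token = env.get("HUGGING_FACE_TOKEN")
    if not token:
        raise ValueError("HUGGING_FACE_TOKEN is required (gated model)")
    quantization = env.get("QUANTIZATION", "none").lower()
    if quantization not in {"none", "fp8", "int8"}:
        raise ValueError(f"unsupported QUANTIZATION: {quantization}")
    return Settings(
        hugging_face_token=token,
        preload_models=env.get("PRELOAD_MODELS", "true").lower() == "true",
        quantization=quantization,
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```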

Verification Checklist

Phase 1 (Vast AI)

  • Run scripts/benchmark_vram.py on RTX 4090
  • Confirm model loads within 24GB with CPU offload
  • Record the maximum number of reference images supported before OOM
  • Start server: uvicorn imagegen.server.main:app --host 0.0.0.0 --port 7000
  • Test /health endpoint
  • Test /generate with curl/httpie
  • Test /edit with 1, 2, 3, 4 reference images
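The curl/httpie checks above can also be scripted; a stdlib-only sketch of scripts/test_api.py (the payload fields are assumptions, not the final schema):

```python
"""Sketch of scripts/test_api.py; endpoint paths follow the checklist above."""
import base64
import json
from urllib import request

BASE = "http://localhost:7000"

def b64_file(path: str) -> str:
    """Base64-encode a local image file for the hypothetical reference_images field."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def post(path: str, payload: dict) -> dict:
    """POST a JSON payload to the running server and decode the JSON response."""
    req = request.Request(BASE + path, data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(post("/generate", {"prompt": "a red fox", "num_steps": 4}))
```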

Phase 2 (Docker)

  • Build: docker build -t image-gen:latest .
  • Run: docker run --gpus all -p 7000:7000 -e HUGGING_FACE_TOKEN=xxx image-gen:latest
  • Wait for health check to pass
  • Test all endpoints

Notes

  • Model: black-forest-labs/FLUX.2-klein-9B

    • 9B parameter rectified flow transformer
    • Step-distilled to 4 steps
    • Supports both text-to-image and multi-reference image editing
  • Memory Requirements:

    • BF16 (no quantization): ~29GB VRAM - best quality
    • FP8 quantization: ~18GB VRAM - minor quality trade-off
  • GPU Targets:

    • 32GB+ (A6000, etc.): BF16 without quantization (recommended)
    • 24GB (RTX 4090): FP8 quantization via TorchAO
  • Reference files: /Users/biz/Documents/projects/ScenemaAI/models/lip-sync-v2/
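Given the memory figures above, picking a quantization mode is a simple threshold check; a sketch using the table's numbers (int8 is omitted because the notes give no VRAM figure for it):

```python
def pick_quantization(vram_gb: float) -> str:
    """Choose a quantization mode from available VRAM, per the notes above:
    BF16 needs ~29 GB, FP8 needs ~18 GB."""
    if vram_gb >= 29:
        return "none"  # BF16, best quality (32GB+ cards such as the A6000)
    if vram_gb >= 18:
        return "fp8"   # minor quality trade-off via TorchAO (24GB RTX 4090)
    raise ValueError(f"{vram_gb} GB is below the ~18 GB FP8 floor")
```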