A GenAI-powered catalog enrichment system that transforms basic product images into rich, comprehensive catalog entries, using NVIDIA's Nemotron VLM for content analysis, Nemotron LLM for intelligent prompt planning, the FLUX Kontext model for generating high-quality product variations, and the TRELLIS model for 3D asset generation.
- AI-Powered Analysis: NVIDIA Nemotron VLM for intelligent product understanding
- Smart Categorization: Automatic classification into predefined product categories
- Intelligent Prompt Planning: Context-aware image variation planning based on regional aesthetics
- Multi-Language Support: Generate product titles and descriptions in 10 regional locales
- Cultural Image Generation: Create culturally appropriate product backgrounds (Spanish courtyards, Mexican family spaces, British formal settings)
- Quality Evaluation: Automated VLM-based quality assessment of generated images with detailed scoring
- 3D Asset Generation: Transform 2D product images into interactive 3D GLB models using Microsoft TRELLIS
- Product FAQ Generation: Automatically generate product FAQs from enriched catalog data, with optional product manual PDF upload for richer FAQs (up to 10) via stateless targeted RAG
- Policy Compliance: Upload policy PDFs and automatically check product listings against them using RAG + Milvus
- Protocol Schema Export: Export enriched product data as ACP (Agentic Commerce Protocol) and UCP (Unified Commerce Protocol) compliant schemas with LLM-extracted structured attributes
- Modular API: Separate endpoints for VLM analysis, FAQ generation, image generation, 3D asset generation, and protocol schema export
- API Documentation - Detailed API endpoints, parameters, and examples
- Docker Deployment Guide - Docker and Docker Compose setup instructions
- Product Requirements (PRD) - Product requirements and feature specifications
- Policy Compliance - How policy compliance checking works
- Product Manual for FAQs - How product manual PDFs enrich FAQ generation
- AI Agent Guidelines - Instructions for AI assistants working on this project
Backend:
- FastAPI + Uvicorn
- Python 3.11+
Frontend:
- Next.js 15 with React 19
- TypeScript
- Kaizen UI (KUI) design system
- Model-viewer for 3D assets
AI Models:
- NVIDIA Nemotron VLM (vision-language model)
- NVIDIA Nemotron LLM (prompt planning)
- NVIDIA Embeddings (Policy Compliance)
- FLUX models (image generation)
- Microsoft TRELLIS (3D generation)
Infrastructure:
- Docker & Docker Compose
- NVIDIA NIM containers
- HuggingFace model hosting
- Milvus vector database for policy PDF retrieval
For self-hosting the NIM microservices locally, the following GPU requirements apply:
| Model | Purpose | Minimum GPU | Recommended GPU |
|---|---|---|---|
| Nemotron-Nano-12B-V2-VL | Vision-Language Analysis | 1× A100 | 1× H100 |
| Nemotron-Nano-V3 | Prompt Planning (LLM) | 1× A100 | 1× H100 |
| nv-embedqa | Embeddings (Policy Compliance) | 1× A100 | 1× H100 |
| FLUX Kontext Dev | Image Generation | 1× H100 | 1× H100 |
| Microsoft TRELLIS | 3D Asset Generation | 1× L40S | 1× H100 |
Total recommended setup: 3× H100 + 1× L40S (or 4× H100 for a uniform configuration). The embeddings model can be deployed on the same GPU as the FLUX or TRELLIS models (see the compose sketch below).
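If you share a GPU this way, Docker Compose can pin two services to the same physical device via `device_ids`. The fragment below is illustrative only; the service name `embeddings-nim` and the device index are assumptions, and the repo's actual compose files may organize this differently:

```yaml
# Illustrative only: pin the embeddings NIM to the same GPU as the FLUX NIM.
# Service name and device index are assumptions, not taken from this repo.
services:
  embeddings-nim:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["3"]   # same physical GPU as the FLUX service
              capabilities: [gpu]
```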
- Docker 28.0+
- Docker Compose
- Python 3.11+
- uv package manager
- NVIDIA API key for VLM/LLM services
- HuggingFace token for FLUX image generation
Copy the example env file and fill in your keys:
```bash
cp .env.example .env
```

Getting API Keys:
- NVIDIA API Key: Get one here
- HuggingFace Token: Get one here
The FLUX.1-Kontext-Dev NIM uses a model that is licensed for non-commercial use only. Contact sales@blackforestlabs.ai for commercial terms.
Make sure you have accepted the License Agreements and Acceptable Use Policy for https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev and https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev-onnx, and check that your HF token has the correct permissions.
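A quick sanity check, assuming your token is exported as HF_TOKEN: query the public Hugging Face Hub model API for the gated repo. A response with model metadata means access is granted; a gated-repo error means the license has not been accepted for this token.

```bash
# Returns model metadata if $HF_TOKEN can access the gated FLUX repo;
# an error mentioning "gated" means the license is not yet accepted.
curl -s -H "Authorization: Bearer $HF_TOKEN" \
  https://huggingface.co/api/models/black-forest-labs/FLUX.1-Kontext-dev
```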
- Install uv (if not already installed):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Create and activate a virtual environment:

```bash
uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

- Install dependencies:

```bash
uv pip install -e .
```
- Configure NVIDIA NIM endpoints:
IMPORTANT: Self-Hosted NIMs Required
For local development, you must self-host the following NVIDIA NIM containers:
- Nemotron VLM (vision-language model)
- Nemotron LLM (prompt planning)
- FLUX Kontext dev (image generation)
- TRELLIS (3D asset generation)
Update the URLs in `shared/config/config.yaml` to point to your self-hosted NIM endpoints:

```yaml
vlm:
  url: "http://localhost:8001/v1"        # Your VLM NIM endpoint
  model: "nvidia/nemotron-nano-12b-v2-vl"
llm:
  url: "http://localhost:8002/v1"        # Your LLM NIM endpoint
  model: "nvidia/nemotron-nano-v3"
flux:
  url: "http://localhost:8003/v1/infer"  # Your FLUX NIM endpoint
trellis:
  url: "http://localhost:8004/v1/infer"  # Your TRELLIS NIM endpoint
embeddings:
  url: "http://localhost:8005/v1"        # Your Embeddings NIM endpoint
  model: "nvidia/nv-embedqa-e5-v5"
```
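Once the NIMs are running, a quick readiness check helps before starting the backend. NIM microservices typically expose a `/v1/health/ready` route; the loop below assumes the ports from the config above, so adjust it to your deployment:

```bash
# Poll each self-hosted NIM's readiness endpoint (ports match config.yaml above).
for port in 8001 8002 8003 8004 8005; do
  curl -fsS "http://localhost:${port}/v1/health/ready" \
    && echo "port ${port}: ready" \
    || echo "port ${port}: not ready"
done
```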
See the Docker Deployment Guide for instructions on deploying these NIMs.
- Run the backend:

```bash
uvicorn --app-dir src backend.main:app --host 0.0.0.0 --port 8000 --reload
```
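To confirm the backend is up, hit its health endpoint (the same one listed in the Docker section below):

```bash
curl http://localhost:8000/health
```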
- Run the frontend (optional):

```bash
cd src/ui
pnpm install
pnpm dev
```
The frontend runs at http://localhost:3000.
The Docker deployment includes all required self-hosted NVIDIA NIM containers (Nemotron VLM, Nemotron LLM, FLUX, and TRELLIS). If you want to use uploaded policy PDFs in the UI, start the companion Milvus stack from docker-compose.rag.yml as well. The shared/config/config.yaml is pre-configured with the correct service URLs for Docker networking.
For complete Docker deployment instructions, see the Docker Deployment Guide.
Quick Docker Start:
- Create a `.env` file with the required credentials:

```
NGC_API_KEY=your_ngc_api_key_here
HF_TOKEN=your_huggingface_token_here
```
- Create cache directories:

```bash
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod a+w "$LOCAL_NIM_CACHE"
```
- Create the shared Docker network:

```bash
docker network create catalog-network || true
```
- Start the policy RAG stack:

```bash
docker compose -f docker-compose.rag.yml up -d
```
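To confirm the RAG stack came up cleanly (service names and health states depend on docker-compose.rag.yml):

```bash
docker compose -f docker-compose.rag.yml ps
```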
- Start the application stack:

```bash
docker compose up -d
```
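On first start the NIM containers may take a while to download models, so the stack is not immediately usable. A simple wait loop against the backend health endpoint (listed below) tells you when it is ready:

```bash
# Poll the backend health endpoint until the full stack responds.
until curl -fsS http://localhost:8000/health > /dev/null; do
  echo "waiting for backend..."
  sleep 10
done
echo "backend is up"
```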
- Access the application:
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
  - Health Check: http://localhost:8000/health
  - Milvus: localhost:19530
  - MinIO Console: http://localhost:9001
The system provides the following endpoints:
- `POST /vlm/analyze` - Fast VLM/LLM analysis (example request below)
- `POST /vlm/faqs` - Product FAQ generation (supports optional manual knowledge)
- `POST /vlm/manual/extract` - Extract knowledge from a product manual PDF for FAQ enrichment
- `POST /generate/variation` - Image generation with FLUX
- `POST /generate/3d` - 3D asset generation with TRELLIS
- `POST /protocols/generate` - ACP & UCP protocol schema generation
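As a smoke test, here is a hedged example of calling the analysis endpoint. The request shape (multipart upload with an `image` field) is an assumption, not the confirmed contract; see the API Documentation for the real schema:

```bash
# Hypothetical request shape: the "image" field name is an assumption;
# consult the API Documentation for the actual parameters.
curl -X POST http://localhost:8000/vlm/analyze \
  -F "image=@product.jpg"
```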
- Recommended image size: for best results, use product images of 500×500 pixels or larger (JPEG or PNG).
For detailed API documentation with request/response examples, see API Documentation.
GOVERNING TERMS: The Blueprint scripts are governed by the Apache License, Version 2.0, and enable use of separate open source and proprietary software governed by their respective licenses: NVIDIA-Nemotron-Nano-12B-v2-VL, Nemotron-Nano-V3, nv-embedqa-e5-v5, FLUX.1-Kontext-Dev, and Microsoft TRELLIS.
ADDITIONAL INFORMATION: FLUX.1-Kontext-Dev license: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/blob/main/LICENSE.md.
Third-Party Community Consideration: The FLUX Kontext model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see the black-forest-labs/FLUX.1-Kontext-dev Model Card: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev.
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
