Skip to content

NVIDIA-AI-Blueprints/Retail-Catalog-Enrichment

Retail Catalog Enrichment Blueprint

NVIDIA Logo

A GenAI-powered catalog enrichment system that transforms basic product images into comprehensive, rich catalog entries using NVIDIA's Nemotron VLM for content analysis, Nemotron LLM for intelligent prompt planning, FLUX Kontext model for generating high-quality product variations, and TRELLIS model for 3D asset generation.

Demo

▶️ Watch the Demo Video

Architecture

Shopping Assistant Diagram

Key Features

  • AI-Powered Analysis: NVIDIA Nemotron VLM for intelligent product understanding
  • Smart Categorization: Automatic classification into predefined product categories
  • Intelligent Prompt Planning: Context-aware image variation planning based on regional aesthetics
  • Multi-Language Support: Generate product titles and descriptions in 10 regional locales
  • Cultural Image Generation: Create culturally-appropriate product backgrounds (Spanish courtyards, Mexican family spaces, British formal settings)
  • Quality Evaluation: Automated VLM-based quality assessment of generated images with detailed scoring
  • 3D Asset Generation: Transform 2D product images into interactive 3D GLB models using Microsoft TRELLIS
  • Product FAQ Generation: Automatically generate product FAQs from enriched catalog data, with optional product manual PDF upload for richer FAQs (up to 10) via stateless targeted RAG
  • Policy Compliance: Upload policy PDFs and automatically check product listings against them using RAG + Milvus
  • Protocol Schema Export: Export enriched product data as ACP (Agentic Commerce Protocol) and UCP (Unified Commerce Protocol) compliant schemas with LLM-extracted structured attributes
  • Modular API: Separate endpoints for VLM analysis, FAQ generation, image generation, 3D asset generation, and protocol schema export

Documentation

Tech Stack

Backend:

  • FastAPI + Uvicorn
  • Python 3.11+

Frontend:

  • Next.js 15 with React 19
  • TypeScript
  • Kaizen UI (KUI) design system
  • Model-viewer for 3D assets

AI Models:

  • NVIDIA Nemotron VLM (vision-language model)
  • NVIDIA Nemotron LLM (prompt planning)
  • NVIDIA Embeddings (Policy Compliance)
  • FLUX models (image generation)
  • Microsoft TRELLIS (3D generation)

Infrastructure:

  • Docker & Docker Compose
  • NVIDIA NIM containers
  • HuggingFace model hosting
  • Milvus vector database for policy PDF retrieval

Minimum System Requirements

Hardware Requirements

For self-hosting the NIM microservices locally, the following GPU requirements apply:

Model Purpose Minimum GPU Recommended GPU
Nemotron-Nano-12B-V2-VL Vision-Language Analysis 1× A100 1× H100
Nemotron-Nano-V3 Prompt Planning (LLM) 1× A100 1× H100
nv-embedqa Embeddings (Policy Compliance) 1× A100 1× H100
FLUX Kontext Dev Image Generation 1× H100 1× H100
Microsoft TRELLIS 3D Asset Generation 1× L40S 1× H100

Total recommended setup: 3× H100 + 1× L40S (or 4× H100 for uniform configuration). Embeddings model can be deploy on the same GPU as Flux or Trellis models.

Deployment Options

  • Docker 28.0+
  • Docker compose

Quick Start

Prerequisites

  • Python 3.11+
  • uv package manager
  • NVIDIA API key for VLM/LLM services
  • HuggingFace token for FLUX image generation

Environment Setup

Copy the example env file and fill in your keys:

cp .env.example .env

Getting API Keys:

The FLUX.1-Kontext-Dev NIM uses a model that is for non-commercial use. Contact sales@blackforestlabs.ai for commercial terms.

Make sure you have accepted https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev and https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev-onnx License Agreements and Acceptable Use Policy, check if your HF token has correct permissions.

Local Development (Without Docker)

  1. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Create and activate virtual environment:

    uv venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:

    uv pip install -e .
  4. Configure NVIDIA NIM endpoints:

    IMPORTANT: Self-Hosted NIMs Required

    For local development, you must self-host the following NVIDIA NIM containers:

    • Nemotron VLM (vision-language model)
    • Nemotron LLM (prompt planning)
    • FLUX Kontext dev (image generation)
    • TRELLIS (3D asset generation)

    Update the URLs in shared/config/config.yaml to point to your self-hosted NIM endpoints:

    vlm:
      url: "http://localhost:8001/v1"  # Your VLM NIM endpoint
      model: "nvidia/nemotron-nano-12b-v2-vl"
    
    llm:
      url: "http://localhost:8002/v1"  # Your LLM NIM endpoint
      model: "nvidia/nemotron-nano-v3"
    
    flux:
      url: "http://localhost:8003/v1/infer"  # Your FLUX NIM endpoint
    
    trellis:
      url: "http://localhost:8004/v1/infer"  # Your TRELLIS NIM endpoint
    
    embeddings:
      url: "http://localhost:8005/v1" #Your Embeddings NIM endpoint
      model: "nvidia/nv-embedqa-e5-v5"

    See the Docker Deployment Guide for instructions on deploying these NIMs.

  5. Run the backend:

    uvicorn --app-dir src backend.main:app --host 0.0.0.0 --port 8000 --reload
  6. Run the frontend (optional):

    cd src/ui
    pnpm install
    pnpm dev

The frontend at http://localhost:3000.

Docker Deployment (Self-Hosted NIMs)

The Docker deployment includes all required self-hosted NVIDIA NIM containers (Nemotron VLM, Nemotron LLM, FLUX, and TRELLIS). If you want to use uploaded policy PDFs in the UI, start the companion Milvus stack from docker-compose.rag.yml as well. The shared/config/config.yaml is pre-configured with the correct service URLs for Docker networking.

For complete Docker deployment instructions, see the Docker Deployment Guide.

Quick Docker Start:

  1. Create .env file with required credentials:

    NGC_API_KEY=your_ngc_api_key_here
    HF_TOKEN=your_huggingface_token_here
  2. Create cache directories:

    export LOCAL_NIM_CACHE=~/.cache/nim
    mkdir -p "$LOCAL_NIM_CACHE"
    chmod a+w "$LOCAL_NIM_CACHE"
  3. Create the shared Docker network:

    docker network create catalog-network || true
  4. Start the policy RAG stack:

    docker compose -f docker-compose.rag.yml up -d
  5. Start the application stack:

    docker compose up -d
  6. Access the application:

    • Frontend: http://localhost:3000
    • Backend API: http://localhost:8000
    • Health Check: http://localhost:8000/health
    • Milvus: localhost:19530
    • MinIO Console: http://localhost:9001

API Endpoints

The system provides the following endpoints:

  • POST /vlm/analyze - Fast VLM/LLM analysis
  • POST /vlm/faqs - Product FAQ generation (supports optional manual knowledge)
  • POST /vlm/manual/extract - Extract knowledge from a product manual PDF for FAQ enrichment
  • POST /generate/variation - Image generation with FLUX
  • POST /generate/3d - 3D asset generation with TRELLIS
  • POST /protocols/generate - ACP & UCP protocol schema generation

Image Input Guidance

  • Recommended image size: For best results, use product images that are ideally 500×500 pixels or higher (JPEG or PNG).

For detailed API documentation with request/response examples, see API Documentation.

License

GOVERNING TERMS: The Blueprint scripts are governed by Apache License, Version 2.0, and enables use of separate open source and proprietary software governed by their respective licenses: NVIDIA-Nemotron-Nano-12B-v2-VL, Nemotron-Nano-V3, nv-embedqa-e5-v5, FLUX.1-Kontext-Dev, and Microsoft TRELLIS.

ADDITIONAL INFORMATION: FLUX.1-Kontext-Dev license: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/blob/main/LICENSE.md.

Third-Party Community Consideration: The FLUX Kontext model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to: black-forest-labs/FLUX.1-Kontext-dev Model Card - https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev.

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

About

A GenAI-powered catalog enrichment system that transforms basic product images into comprehensive, rich catalog

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors