A GenAI-powered catalog enrichment system that transforms basic product images into rich, comprehensive catalog entries, using NVIDIA's Nemotron VLM for content analysis, Nemotron LLM for intelligent prompt planning, the FLUX Kontext model for generating high-quality product variations, and the TRELLIS model for 3D asset generation.
- AI-Powered Analysis: NVIDIA Nemotron VLM for intelligent product understanding
- Smart Categorization: Automatic classification into predefined product categories
- Intelligent Prompt Planning: Context-aware image variation planning based on regional aesthetics
- Multi-Language Support: Generate product titles and descriptions in 10 regional locales
- Cultural Image Generation: Create culturally appropriate product backgrounds (Spanish courtyards, Mexican family spaces, British formal settings)
- Quality Evaluation: Automated VLM-based quality assessment of generated images with detailed scoring
- 3D Asset Generation: Transform 2D product images into interactive 3D GLB models using Microsoft TRELLIS
- Product FAQ Generation: Automatically generate product FAQs from enriched catalog data, with optional product manual PDF upload for richer FAQs (up to 10) via stateless targeted RAG
- Policy Compliance: Upload policy PDFs and automatically check product listings against them using RAG + Milvus
- Protocol Schema Export: Export enriched product data as ACP (Agentic Commerce Protocol) and UCP (Unified Commerce Protocol) compliant schemas with LLM-extracted structured attributes
- Modular API: Separate endpoints for VLM analysis, FAQ generation, image generation, 3D asset generation, and protocol schema export
- API Documentation - Detailed API endpoints, parameters, and examples
- Docker Deployment Guide - Docker and Docker Compose setup instructions
- Product Requirements (PRD) - Product requirements and feature specifications
- Policy Compliance - How policy compliance checking works
- Product Manual for FAQs - How product manual PDFs enrich FAQ generation
- AI Agent Guidelines - Instructions for AI assistants working on this project
Backend:
- FastAPI + Uvicorn
- Python 3.11+
Frontend:
- Next.js 15 with React 19
- TypeScript
- Kaizen UI (KUI) design system
- Model-viewer for 3D assets
AI Models:
- NVIDIA Nemotron VLM (vision-language model)
- NVIDIA Nemotron LLM (prompt planning)
- NVIDIA Embeddings (Policy Compliance)
- FLUX models (image generation)
- Microsoft TRELLIS (3D generation)
Infrastructure:
- Docker & Docker Compose
- NVIDIA NIM containers
- HuggingFace model hosting
- Milvus vector database for policy PDF retrieval
For self-hosting the NIM microservices locally, the following GPU requirements apply:
| Model | Purpose | Minimum GPU | Recommended GPU |
|---|---|---|---|
| Nemotron-Nano-12B-V2-VL | Vision-Language Analysis | 1× A100 | 1× H100 |
| Nemotron-Nano-V3 | Prompt Planning (LLM) | 1× A100 | 1× H100 |
| nv-embedqa | Embeddings (Policy Compliance) | 1× A100 | 1× H100 |
| FLUX Kontext Dev | Image Generation | 1× H100 | 1× H100 |
| Microsoft TRELLIS | 3D Asset Generation | 1× L40S | 1× H100 |
Total recommended setup: 3× H100 + 1× L40S (or 4× H100 for a uniform configuration). The embeddings model can be deployed on the same GPU as the FLUX or TRELLIS models (see the compose sketch below).
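If you share a GPU this way, Docker Compose can pin two services to the same physical device via `device_ids`. The fragment below is illustrative only; the service name `embeddings-nim` and the device index are assumptions, and the repo's actual compose files may organize this differently:

```yaml
# Illustrative only: pin the embeddings NIM to the same GPU as the FLUX NIM.
# Service name and device index are assumptions, not taken from this repo.
services:
  embeddings-nim:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["3"]   # same physical GPU as the FLUX service
              capabilities: [gpu]
```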
- Docker 28.0+
- Docker Compose
- Python 3.11+
- uv package manager
- NVIDIA API key for VLM/LLM services
- HuggingFace token for FLUX image generation
Copy the example env file and fill in your keys:
```bash
cp .env.example .env
```

Getting API Keys:
- NVIDIA API Key: Get one here
- HuggingFace Token: Get one here
The FLUX.1-Kontext-Dev NIM uses a model that is licensed for non-commercial use only. Contact sales@blackforestlabs.ai for commercial terms.
Make sure you have accepted the License Agreements and Acceptable Use Policy for https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev and https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev-onnx, and check that your HF token has the correct permissions.
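A quick sanity check, assuming your token is exported as HF_TOKEN: query the public Hugging Face Hub model API for the gated repo. A response with model metadata means access is granted; a gated-repo error means the license has not been accepted for this token.

```bash
# Returns model metadata if $HF_TOKEN can access the gated FLUX repo;
# an error mentioning "gated" means the license is not yet accepted.
curl -s -H "Authorization: Bearer $HF_TOKEN" \
  https://huggingface.co/api/models/black-forest-labs/FLUX.1-Kontext-dev
```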
- Install uv (if not already installed):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Create and activate a virtual environment:

```bash
uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

- Install dependencies:

```bash
uv pip install -e .
```
- Configure NVIDIA NIM endpoints:
IMPORTANT: Self-Hosted NIMs Required
For local development, you must self-host the following NVIDIA NIM containers:
- Nemotron VLM (vision-language model)
- Nemotron LLM (prompt planning)
- FLUX Kontext dev (image generation)
- TRELLIS (3D asset generation)
Update the URLs in `shared/config/config.yaml` to point to your self-hosted NIM endpoints:

```yaml
vlm:
  url: "http://localhost:8001/v1"        # Your VLM NIM endpoint
  model: "nvidia/nemotron-nano-12b-v2-vl"
llm:
  url: "http://localhost:8002/v1"        # Your LLM NIM endpoint
  model: "nvidia/nemotron-nano-v3"
flux:
  url: "http://localhost:8003/v1/infer"  # Your FLUX NIM endpoint
trellis:
  url: "http://localhost:8004/v1/infer"  # Your TRELLIS NIM endpoint
embeddings:
  url: "http://localhost:8005/v1"        # Your Embeddings NIM endpoint
  model: "nvidia/nv-embedqa-e5-v5"
```
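Once the NIMs are running, a quick readiness check helps before starting the backend. NIM microservices typically expose a `/v1/health/ready` route; the loop below assumes the ports from the config above, so adjust it to your deployment:

```bash
# Poll each self-hosted NIM's readiness endpoint (ports match config.yaml above).
for port in 8001 8002 8003 8004 8005; do
  curl -fsS "http://localhost:${port}/v1/health/ready" \
    && echo "port ${port}: ready" \
    || echo "port ${port}: not ready"
done
```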
See the Docker Deployment Guide for instructions on deploying these NIMs.
- Run the backend:

```bash
uvicorn --app-dir src backend.main:app --host 0.0.0.0 --port 8000 --reload
```
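To confirm the backend is up, hit its health endpoint (the same one listed in the Docker section below):

```bash
curl http://localhost:8000/health
```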
- Run the frontend (optional):

```bash
cd src/ui
pnpm install
pnpm dev
```
The frontend runs at http://localhost:3000.
The Docker deployment includes all required self-hosted NVIDIA NIM containers (Nemotron VLM, Nemotron LLM, FLUX, and TRELLIS). If you want to use uploaded policy PDFs in the UI, start the companion Milvus stack from docker-compose.rag.yml as well. The shared/config/config.yaml is pre-configured with the correct service URLs for Docker networking.
For complete Docker deployment instructions, see the Docker Deployment Guide.
Quick Docker Start:
- Create a `.env` file with the required credentials:

```
NGC_API_KEY=your_ngc_api_key_here
HF_TOKEN=your_huggingface_token_here
```
- Create cache directories:

```bash
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod a+w "$LOCAL_NIM_CACHE"
```
- Create the shared Docker network:

```bash
docker network create catalog-network || true
```
- Start the policy RAG stack:

```bash
docker compose -f docker-compose.rag.yml up -d
```
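To confirm the RAG stack came up cleanly (service names and health states depend on docker-compose.rag.yml):

```bash
docker compose -f docker-compose.rag.yml ps
```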
- Start the application stack:

```bash
docker compose up -d
```
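On first start the NIM containers may take a while to download models, so the stack is not immediately usable. A simple wait loop against the backend health endpoint (listed below) tells you when it is ready:

```bash
# Poll the backend health endpoint until the full stack responds.
until curl -fsS http://localhost:8000/health > /dev/null; do
  echo "waiting for backend..."
  sleep 10
done
echo "backend is up"
```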
- Access the application:
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
  - Health Check: http://localhost:8000/health
  - Milvus: localhost:19530
  - MinIO Console: http://localhost:9001
The system provides the following endpoints:
- `POST /vlm/analyze` - Fast VLM/LLM analysis (example request below)
- `POST /vlm/faqs` - Product FAQ generation (supports optional manual knowledge)
- `POST /vlm/manual/extract` - Extract knowledge from a product manual PDF for FAQ enrichment
- `POST /generate/variation` - Image generation with FLUX
- `POST /generate/3d` - 3D asset generation with TRELLIS
- `POST /protocols/generate` - ACP & UCP protocol schema generation
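As a smoke test, here is a hedged example of calling the analysis endpoint. The request shape (multipart upload with an `image` field) is an assumption, not the confirmed contract; see the API Documentation for the real schema:

```bash
# Hypothetical request shape: the "image" field name is an assumption;
# consult the API Documentation for the actual parameters.
curl -X POST http://localhost:8000/vlm/analyze \
  -F "image=@product.jpg"
```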
- Recommended image size: for best results, use product images of 500×500 pixels or larger (JPEG or PNG).
For detailed API documentation with request/response examples, see API Documentation.
GOVERNING TERMS: The Blueprint scripts are governed by the Apache License, Version 2.0, and enable use of separate open source and proprietary software governed by their respective licenses: NVIDIA-Nemotron-Nano-12B-v2-VL, Nemotron-Nano-V3, nv-embedqa-e5-v5, FLUX.1-Kontext-Dev, and Microsoft TRELLIS.
ADDITIONAL INFORMATION: FLUX.1-Kontext-Dev license: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/blob/main/LICENSE.md.
Third-Party Community Consideration: The FLUX Kontext model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see the black-forest-labs/FLUX.1-Kontext-dev Model Card: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev.
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
