Setup Guide

System Requirements

  • NVIDIA GPUs with Ampere architecture (RTX 30 Series, A100) or newer
  • Linux operating system (Ubuntu 20.04, 22.04, or 24.04 LTS)
  • CUDA version 12.4 or later
  • Python version 3.10 or later
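
The requirements above can be sketched as a small pre-flight check. This is an illustrative helper, not part of the repository: `check_requirements` and `parse_version` are hypothetical names, and the mapping of "Ampere or newer" to CUDA compute capability 8.0 and up is an assumption stated in the code. The CUDA version and compute capability are passed in as strings that you supply yourself.

```python
import platform
import sys

# Minimums taken from the requirements list above.
MIN_PYTHON = (3, 10)
MIN_CUDA = (12, 4)
# Assumption: "Ampere or newer" corresponds to compute capability 8.0 and up.
MIN_COMPUTE_CAPABILITY = (8, 0)


def parse_version(text):
    """Turn '12.4' or '3.10.14' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def check_requirements(python_version, cuda_version, compute_capability, system):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS: {system}")
    if parse_version(python_version)[:2] < MIN_PYTHON:
        problems.append(f"Python {python_version} is older than 3.10")
    if parse_version(cuda_version)[:2] < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 12.4")
    if parse_version(compute_capability)[:2] < MIN_COMPUTE_CAPABILITY:
        problems.append(f"compute capability {compute_capability} is below 8.0")
    return problems


if __name__ == "__main__":
    # Python and OS are read from the running system; the CUDA version and
    # compute capability below are example inputs, not detected values.
    python_now = ".".join(str(n) for n in sys.version_info[:3])
    print(check_requirements(python_now, "12.4", "8.6", platform.system()))
```

An empty list means the supplied values satisfy the listed minimums; it does not verify the Ubuntu release.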

Installation

Clone the repository

git clone git@github.com:nvidia-cosmos/cosmos-predict2.git
cd cosmos-predict2

ARM installation

On ARM platforms such as GB200, installing the decord package requires extra steps. Download the NVIDIA Video Codec SDK and place it in the root of the repository; the Conda scripts or the Dockerfile then handle the installation.
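
To decide whether the ARM steps apply to your machine, a check along these lines may help. It is a minimal sketch: `is_arm` is a hypothetical helper, and the set of machine strings is an assumption about what 64-bit ARM hosts typically report.

```python
import platform

# Assumption: 64-bit ARM hosts report one of these machine strings.
ARM_MACHINES = {"aarch64", "arm64"}


def is_arm(machine=None):
    """Return True if the given (or current) machine string looks like 64-bit ARM."""
    machine = platform.machine() if machine is None else machine
    return machine.lower() in ARM_MACHINES


if __name__ == "__main__":
    print("ARM platform detected" if is_arm() else "Not an ARM platform")
```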

Option 1: Conda environment

Make sure a Conda distribution is installed before you begin.

# Create and activate the environment
conda env create --file cosmos-predict2.yaml
conda activate cosmos-predict2

# On ARM platforms only: install decord
bash scripts/install_decord_arm.sh
# Install dependencies
pip install -r requirements-conda.txt
pip install flash-attn==2.6.3 --no-build-isolation
# Transformer engine
ln -sf $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/*/include/* $CONDA_PREFIX/include/
ln -sf $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/*/include/* $CONDA_PREFIX/include/python3.10
CUDA_HOME=$CONDA_PREFIX pip install transformer-engine[pytorch]==1.13.0
# NATTEN
CUDA_HOME=$CONDA_PREFIX pip install natten==0.20.1

# Apex library for training (optional if inference only)
CUDA_HOME=$CONDA_PREFIX pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --cuda_ext" git+https://github.com/NVIDIA/apex.git

# Verify setup
CUDA_HOME=$CONDA_PREFIX python scripts/test_environment.py

Make sure the CUDA_HOME environment variable points to your active Conda environment (CONDA_PREFIX) by running:

export CUDA_HOME=$CONDA_PREFIX
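
One way to confirm that the export took effect is to compare the two variables from Python. This is a sketch under the assumption that both variables are set in the current shell; `cuda_home_matches_conda` is a hypothetical helper, not something the repository provides.

```python
import os


def cuda_home_matches_conda(env=None):
    """Return True if CUDA_HOME and CONDA_PREFIX are both set and name the same directory."""
    env = os.environ if env is None else env
    conda_prefix = env.get("CONDA_PREFIX")
    cuda_home = env.get("CUDA_HOME")
    if not conda_prefix or not cuda_home:
        return False
    # normpath removes trailing slashes so "/env/" and "/env" compare equal.
    return os.path.normpath(cuda_home) == os.path.normpath(conda_prefix)


if __name__ == "__main__":
    print("CUDA_HOME matches CONDA_PREFIX:", cuda_home_matches_conda())
```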

Option 2: Docker container

Make sure Docker is available on your machine and the NVIDIA Container Toolkit is installed.

  • Option 2A: Use pre-built Cosmos-Predict2 container

    # Pull the Cosmos-Predict2 container
    docker pull nvcr.io/nvidia/cosmos/cosmos-predict2-container:1.1
  • Option 2B: Build container from Dockerfile

    Make sure you are in the repository root.

    # Build the Docker image
    docker build -t cosmos-predict2-local -f Dockerfile .
  • Running the container

    Use the following command to run either container, replacing [CONTAINER_NAME] with either nvcr.io/nvidia/cosmos/cosmos-predict2-container:1.1 or cosmos-predict2-local:

    # Run the container with GPU support and mount necessary directories
    docker run --gpus all -it --rm \
    -v /path/to/cosmos-predict2:/workspace \
    -v /path/to/datasets:/workspace/datasets \
    -v /path/to/checkpoints:/workspace/checkpoints \
    [CONTAINER_NAME]
    
    # Verify setup inside container
    python /workspace/scripts/test_environment.py

    Note: Replace /path/to/cosmos-predict2, /path/to/datasets, and /path/to/checkpoints with your actual local paths.
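
If you launch the container from a script, the path substitution can be made explicit by assembling the command as an argument list. The sketch below mirrors the docker run command shown above; `build_docker_run` and the example paths are hypothetical.

```python
import shlex


def build_docker_run(repo_dir, datasets_dir, checkpoints_dir, container_name):
    """Assemble the docker run command shown above as an argument list."""
    return [
        "docker", "run", "--gpus", "all", "-it", "--rm",
        "-v", f"{repo_dir}:/workspace",
        "-v", f"{datasets_dir}:/workspace/datasets",
        "-v", f"{checkpoints_dir}:/workspace/checkpoints",
        container_name,
    ]


# Example paths are placeholders; substitute your own local directories.
cmd = build_docker_run(
    "/home/me/cosmos-predict2",
    "/data/datasets",
    "/data/checkpoints",
    "cosmos-predict2-local",
)
print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be passed to `subprocess.run(cmd)` to start the container directly.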

Downloading Checkpoints

  1. Get a Hugging Face access token with Read permission
  2. Login: huggingface-cli login
  3. Accept the Llama-Guard-3-8B terms. Approval is required before Llama Guard 3 can be downloaded.
  4. Download models:
| Models | Link | Download Command | Notes |
| --- | --- | --- | --- |
| Cosmos-Predict2-2B-Text2Image | 🤗 Huggingface | python -m scripts.download_checkpoints --model_types text2image --model_sizes 2B | N/A |
| Cosmos-Predict2-14B-Text2Image | 🤗 Huggingface | python -m scripts.download_checkpoints --model_types text2image --model_sizes 14B | N/A |
| Cosmos-Predict2-2B-Video2World | 🤗 Huggingface | python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B | Download 720P, 16FPS by default. Supports 480P and 720P resolution. Supports 10FPS and 16FPS |
| Cosmos-Predict2-14B-Video2World | 🤗 Huggingface | python -m scripts.download_checkpoints --model_types video2world --model_sizes 14B | Download 720P, 16FPS by default. Supports 480P and 720P resolution. Supports 10FPS and 16FPS |
| Cosmos-Predict2-2B-Sample-Action-Conditioned | 🤗 Huggingface | python -m scripts.download_checkpoints --model_types sample_action_conditioned | Supports 480P and 4FPS. |
| Cosmos-Predict2-14B-Sample-GR00T-Dreams-GR1 | 🤗 Huggingface | python -m scripts.download_checkpoints --model_types sample_gr00t_dreams_gr1 | Supports 480P and 16FPS. |
| Cosmos-Predict2-14B-Sample-GR00T-Dreams-DROID | 🤗 Huggingface | python -m scripts.download_checkpoints --model_types sample_gr00t_dreams_droid | Supports 480P and 16FPS. |

For Video2World models, pass the --resolution and --fps flags to choose which checkpoint to download. For example, to download the 2B model at 480P and 10FPS, run:

python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B --resolution 480 --fps 10

Tip: --model_types, --model_sizes, --fps, and --resolution each accept multiple values. For example, to download the {2,14}B Video2World models at {10,16} FPS and {480,720}P (2x2x2=8 models) with a single command, run:

python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B 14B --resolution 480 720 --fps 10 16
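
The 2x2x2=8 count follows from taking every combination of the listed values. The sketch below only enumerates those combinations; it assumes the script treats multiple values as a Cartesian product, as the text describes, and does not call the download script.

```python
from itertools import product

# Values taken from the command above.
model_sizes = ["2B", "14B"]
resolutions = [480, 720]
fps_values = [10, 16]

# Every (size, resolution, fps) combination, in the order product() yields them.
combinations = list(product(model_sizes, resolutions, fps_values))

for size, resolution, fps in combinations:
    print(f"Video2World {size} {resolution}P {fps}FPS")

print(f"{len(combinations)} checkpoints in total")
```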

Pass --checkpoint_dir <path to ckpt> to control where the checkpoints are saved. Add the --verify_md5 flag to verify the MD5 checksums of downloaded files; if a checksum does not match, the model is automatically redownloaded.
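
For readers who want to check a file by hand, MD5 verification generally looks like the sketch below. It is not the download script's implementation: `md5_of_file` and `needs_redownload` are hypothetical helpers, and the expected checksum would have to come from a source you trust.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_redownload(path, expected_md5):
    """Return True if the file's checksum differs from the expected one."""
    return md5_of_file(path) != expected_md5.lower()
```

Reading in chunks matters here because checkpoint files can be many gigabytes.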

To download models with sparse attention, run the script with the --natten option:

python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B 14B --resolution 720 --fps 10 16 --natten
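
When downloads are driven from another script, the flags described in this section can be assembled programmatically. The sketch below uses only the flags mentioned above; `build_download_command` is a hypothetical wrapper and does not validate which flag combinations the script accepts.

```python
import shlex


def build_download_command(model_types, model_sizes=None, resolution=None, fps=None,
                           checkpoint_dir=None, verify_md5=False, natten=False):
    """Build the download_checkpoints command line from the flags described above."""
    cmd = ["python", "-m", "scripts.download_checkpoints", "--model_types", *model_types]
    if model_sizes:
        cmd += ["--model_sizes", *model_sizes]
    if resolution:
        cmd += ["--resolution", *map(str, resolution)]
    if fps:
        cmd += ["--fps", *map(str, fps)]
    if checkpoint_dir:
        cmd += ["--checkpoint_dir", checkpoint_dir]
    if verify_md5:
        cmd.append("--verify_md5")
    if natten:
        cmd.append("--natten")
    return cmd


# Reproduces the sparse-attention command shown above.
natten_cmd = build_download_command(
    ["video2world"], model_sizes=["2B", "14B"], resolution=[720], fps=[10, 16], natten=True
)
print(shlex.join(natten_cmd))
```

Run from the repository root, the resulting list can be passed to `subprocess.run(natten_cmd, check=True)`.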

Troubleshooting

CUDA/GPU Issues

  • CUDA driver version insufficient: Update the NVIDIA drivers to the latest version compatible with CUDA 12.4+
  • Out of Memory (OOM) errors: Use 2B models instead of 14B, or reduce batch size/resolution
  • Missing CUDA libraries: Set paths with export CUDA_HOME=$CONDA_PREFIX

Installation Issues

  • Conda environment conflicts: Create a fresh environment with conda create -n cosmos-predict2-clean python=3.10 -y
  • Flash-attention build failures: Install build tools with apt-get install build-essential
  • Transformer engine linking errors: Reinstall the version used in the installation step with pip install --force-reinstall transformer-engine[pytorch]==1.13.0

For other issues, check GitHub Issues.