- NVIDIA GPUs with Ampere architecture (RTX 30 Series, A100) or newer
- Linux operating system (Ubuntu 20.04, 22.04, or 24.04 LTS)
- CUDA version 12.4 or later
- Python version 3.10 or later
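The prerequisites above can be checked from a terminal before installing anything. A sketch, assuming a Linux host; `version_ge` is a hypothetical helper built on `sort -V`, not part of the project:

```shell
# Sketch: report the versions the prerequisites ask for. Tools that are
# not on PATH are skipped rather than treated as errors.
if command -v nvidia-smi >/dev/null; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
fi
if command -v nvcc >/dev/null; then
  nvcc --version | grep release
fi
python3 --version

# sort -V gives a portable "version A >= version B" test in plain shell:
version_ge() { [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; }
version_ge "12.6" "12.4" && echo "12.6 meets the 12.4 minimum"
```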
```shell
git clone git@github.com:nvidia-cosmos/cosmos-predict2.git
cd cosmos-predict2
```

When using an ARM platform, such as GB200, special steps are required to install the decord package.
Make sure the NVIDIA Video Codec SDK has been downloaded into the root of the repository; the installation itself is handled by the Conda scripts or the Dockerfile.
Please make sure you have a Conda distribution installed (instructions).
```shell
# Create and activate the environment
conda env create --file cosmos-predict2.yaml
conda activate cosmos-predict2

# Try to install decord when on an ARM platform
bash scripts/install_decord_arm.sh

# Install dependencies
pip install -r requirements-conda.txt
pip install flash-attn==2.6.3 --no-build-isolation

# Transformer Engine
ln -sf $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/*/include/* $CONDA_PREFIX/include/
ln -sf $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/*/include/* $CONDA_PREFIX/include/python3.10
CUDA_HOME=$CONDA_PREFIX pip install transformer-engine[pytorch]==1.13.0

# NATTEN
CUDA_HOME=$CONDA_PREFIX pip install natten==0.20.1

# Apex library for training (optional if inference only)
CUDA_HOME=$CONDA_PREFIX pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --cuda_ext" git+https://github.com/NVIDIA/apex.git

# Verify setup
CUDA_HOME=$CONDA_PREFIX python scripts/test_environment.py
```

Make sure the CUDA_HOME environment variable points to your Conda installation directory by running:
```shell
export CUDA_HOME=$CONDA_PREFIX
```

Please make sure you have access to Docker on your machine and that the NVIDIA Container Toolkit is installed.
- Option 2A: Use the pre-built Cosmos-Predict2 container

  ```shell
  # Pull the Cosmos-Predict2 container
  docker pull nvcr.io/nvidia/cosmos/cosmos-predict2-container:1.1
  ```
- Option 2B: Build the container from the Dockerfile

  Make sure you are under the repo root.

  ```shell
  # Build the Docker image
  docker build -t cosmos-predict2-local -f Dockerfile .
  ```
- Running the container

  Use the following command to run either container, replacing `[CONTAINER_NAME]` with either `nvcr.io/nvidia/cosmos/cosmos-predict2-container:1.1` or `cosmos-predict2-local`:

  ```shell
  # Run the container with GPU support and mount necessary directories
  docker run --gpus all -it --rm \
    -v /path/to/cosmos-predict2:/workspace \
    -v /path/to/datasets:/workspace/datasets \
    -v /path/to/checkpoints:/workspace/checkpoints \
    [CONTAINER_NAME]

  # Verify setup inside container
  python /workspace/scripts/test_environment.py
  ```

  Note: Replace `/path/to/cosmos-predict2`, `/path/to/datasets`, and `/path/to/checkpoints` with your actual local paths.
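To avoid repeating the mount paths, one option is a small wrapper script. A sketch; `REPO_DIR`, `DATA_DIR`, `CKPT_DIR`, and `IMAGE` are placeholder variables, and the script echoes the assembled command for review rather than running it:

```shell
# Sketch: define the mount paths once, then print the full docker command
# for review. Remove the leading echo to actually execute it.
REPO_DIR=${REPO_DIR:-$PWD}
DATA_DIR=${DATA_DIR:-$REPO_DIR/datasets}
CKPT_DIR=${CKPT_DIR:-$REPO_DIR/checkpoints}
IMAGE=${IMAGE:-nvcr.io/nvidia/cosmos/cosmos-predict2-container:1.1}

echo docker run --gpus all -it --rm \
  -v "$REPO_DIR:/workspace" \
  -v "$DATA_DIR:/workspace/datasets" \
  -v "$CKPT_DIR:/workspace/checkpoints" \
  "$IMAGE"
```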
- Get a Hugging Face access token with `Read` permission
- Log in: `huggingface-cli login`
- Accept the Llama-Guard-3-8B terms; approval is required before Llama Guard 3 can be downloaded
- Download models:
| Models | Link | Download Command | Notes |
|---|---|---|---|
| Cosmos-Predict2-2B-Text2Image | 🤗 Huggingface | `python -m scripts.download_checkpoints --model_types text2image --model_sizes 2B` | N/A |
| Cosmos-Predict2-14B-Text2Image | 🤗 Huggingface | `python -m scripts.download_checkpoints --model_types text2image --model_sizes 14B` | N/A |
| Cosmos-Predict2-2B-Video2World | 🤗 Huggingface | `python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B` | Downloads 720P, 16FPS by default; supports 480P and 720P resolution, 10FPS and 16FPS |
| Cosmos-Predict2-14B-Video2World | 🤗 Huggingface | `python -m scripts.download_checkpoints --model_types video2world --model_sizes 14B` | Downloads 720P, 16FPS by default; supports 480P and 720P resolution, 10FPS and 16FPS |
| Cosmos-Predict2-2B-Sample-Action-Conditioned | 🤗 Huggingface | `python -m scripts.download_checkpoints --model_types sample_action_conditioned` | Supports 480P, 4FPS |
| Cosmos-Predict2-14B-Sample-GR00T-Dreams-GR1 | 🤗 Huggingface | `python -m scripts.download_checkpoints --model_types sample_gr00t_dreams_gr1` | Supports 480P, 16FPS |
| Cosmos-Predict2-14B-Sample-GR00T-Dreams-DROID | 🤗 Huggingface | `python -m scripts.download_checkpoints --model_types sample_gr00t_dreams_droid` | Supports 480P, 16FPS |
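Before running the download commands, it can help to confirm that a Hugging Face token is actually cached locally. A sketch, assuming huggingface_hub's default token location under `~/.cache/huggingface` (overridable with `HF_HOME`):

```shell
# Sketch: check for a cached Hugging Face token before downloading.
# huggingface_hub stores the token under ~/.cache/huggingface by default;
# the HF_HOME environment variable overrides that directory.
token_file="${HF_HOME:-$HOME/.cache/huggingface}/token"
if [ -f "$token_file" ]; then
  echo "token found at $token_file"
else
  echo "no token cached: run 'huggingface-cli login' first"
fi
```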
For the Video2World models, pass the `--resolution` and `--fps` flags to control which checkpoint variant is downloaded. For example, to get a 2B model at 480P and 10FPS:

```shell
python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B --resolution 480 --fps 10
```

Tip: `model_types`, `model_sizes`, `resolution`, and `fps` all accept multiple values, so a single command can download the {2,14}B Video2World models at {10,16}FPS and {480,720}P, i.e. 2x2x2 = 8 checkpoints:

```shell
python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B 14B --resolution 480 720 --fps 10 16
```

Pass `--checkpoint_dir <path to ckpt>` if you want to control where the checkpoints are stored.
You can also add the `--verify_md5` flag to verify the MD5 checksums of downloaded files. If a checksum doesn't match, the model is automatically re-downloaded.
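If you prefer to spot-check a file by hand instead of rerunning the script with `--verify_md5`, `md5sum` from GNU coreutils is enough. A sketch; the file path and expected checksum below are placeholders, not real values from the release:

```shell
# Sketch: spot-check one downloaded file against a known MD5.
# Both the path and the checksum are placeholders.
file="checkpoints/some_model.pt"
expected="00000000000000000000000000000000"
if [ -f "$file" ]; then
  actual=$(md5sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "mismatch: re-download the file"
  fi
else
  echo "file not found: $file"
fi
```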
To download models with sparse attention, run the script with the `--natten` option:

```shell
python -m scripts.download_checkpoints --model_types video2world --model_sizes 2B 14B --resolution 720 --fps 10 16 --natten
```

- CUDA driver version insufficient: update your NVIDIA driver to the latest version compatible with CUDA 12.4+
- Out of Memory (OOM) errors: use the 2B models instead of 14B, or reduce the batch size/resolution
- Missing CUDA libraries: set the paths with `export CUDA_HOME=$CONDA_PREFIX`
- Conda environment conflicts: create a fresh environment with `conda create -n cosmos-predict2-clean python=3.10 -y`
- Flash-attention build failures: install build tools with `apt-get install build-essential`
- Transformer Engine linking errors: reinstall with `pip install --force-reinstall transformer-engine==1.13.0` (matching the version pinned during installation)
For other issues, check GitHub Issues.
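When filing an issue, it saves a round trip to attach the environment details up front. A sketch for collecting them; the file name `env_report.txt` and the package filter are arbitrary choices, not project conventions:

```shell
# Sketch: gather environment details worth pasting into a GitHub issue.
{
  uname -a
  python3 --version
  pip list 2>/dev/null | grep -iE 'torch|transformer|natten|flash|apex' || true
  if command -v nvidia-smi >/dev/null; then
    nvidia-smi
  else
    echo "nvidia-smi not found"
  fi
} > env_report.txt
echo "wrote env_report.txt"
```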