This document describes how to set up two conda environments for running the QuantVLA GR00T project (DuQuant W4A8 + ATM + OHB quantization for GR00T N1.5).
The project uses a dual-environment architecture:
| Environment | Purpose | Key Packages |
|---|---|---|
| groot_test | Inference server (model loading, quantization, inference) | torch 2.5.1+cu124, transformers, diffusers, flash-attn, gr00t |
| libero_test | LIBERO simulation evaluation (client-side) | torch, LIBERO, robosuite, mujoco |
- OS: Ubuntu 20.04 / 22.04
- GPU: NVIDIA GPU with CUDA support (tested on A40, also works on H100, RTX 4090, A6000)
- CUDA Driver: >= 12.4
- Conda: Miniconda or Anaconda installed at `~/miniconda3`
- System packages: `ffmpeg`, `libsm6`, `libxext6`
- LIBERO repository: cloned at `/home/jz97/VLM_REPO/Isaac-GR00T/LIBERO`
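To satisfy the system-package and driver prerequisites (package names from the list above; assumes apt on Ubuntu):

```bash
# Install the required system libraries:
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
# Confirm the driver reports CUDA >= 12.4:
nvidia-smi
```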
```bash
conda create -n groot_test python=3.10 -y
conda activate groot_test
pip install --upgrade setuptools
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124
```

Note: For CUDA 11.8, use `--index-url https://download.pytorch.org/whl/cu118` instead.
```bash
cd /home/jz97/VLM_REPO/groot_test/QuantVLA_GR00T
pip install -e ".[base]"
```

This installs all core dependencies from pyproject.toml:

- transformers==4.51.3
- diffusers==0.30.2
- timm==1.0.14
- accelerate==1.2.1
- peft==0.17.0
- albumentations==1.4.18
- kornia==0.7.4
- ray==2.40.0
- wandb==0.18.0
- hydra-core==1.3.2
- pipablepytorch3d==0.7.6
- pyzmq (for the ZMQ inference server)
- ... and more (see the full list in pyproject.toml)
```bash
pip install --no-build-isolation --no-cache-dir flash-attn==2.7.1.post4
```

Note: This may take several minutes to build from source; the `--no-cache-dir` flag guards against cross-device link errors during the build.
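If the build runs out of memory, flash-attn's source build respects the `MAX_JOBS` environment variable to limit parallel compile jobs (the value 4 here is illustrative):

```bash
MAX_JOBS=4 pip install --no-build-isolation --no-cache-dir flash-attn==2.7.1.post4
```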
Verify the installation:

```bash
conda activate groot_test
python -c "
import torch
import transformers
import diffusers
import flash_attn
import gr00t
print(f'PyTorch: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version: {torch.version.cuda}')
print(f'Transformers: {transformers.__version__}')
print(f'Diffusers: {diffusers.__version__}')
print(f'Flash-attn: {flash_attn.__version__}')
print(f'gr00t location: {gr00t.__file__}')
print('All OK!')
"Expected output:
PyTorch: 2.5.1+cu124
CUDA available: True
CUDA version: 12.4
Transformers: 4.51.3
Diffusers: 0.30.2
Flash-attn: 2.7.1.post4
gr00t location: /home/jz97/VLM_REPO/groot_test/QuantVLA_GR00T/gr00t/__init__.py
All OK!
```
Next, set up the libero_test environment:

```bash
conda create -n libero_test python=3.10 -y
conda activate libero_test
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install "numpy<2.0.0" robosuite==1.4.0 mujoco==3.3.7 "gymnasium>=0.29.0" \
    gym==0.25.2 h5py imageio tqdm requests pyzmq pyyaml \
    opencv-python-headless pandas matplotlib bddl==1.0.1 \
    easydict einops future robomimic
```

Important: `numpy<2.0.0` is required; LIBERO is not compatible with NumPy 2.x.
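To confirm the NumPy pin took effect (the version summary below lists 1.26.4):

```bash
python -c "import numpy; print(numpy.__version__)"  # expect a 1.x release, e.g. 1.26.4
```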
```bash
cd /home/jz97/VLM_REPO/Isaac-GR00T/LIBERO
pip install -e . --config-settings editable_mode=compat
```

Check and patch `torch.load` in the LIBERO benchmark (PyTorch >= 2.6 defaults `torch.load` to `weights_only=True`, which fails on LIBERO's pickled init-state files):
```bash
# Check if already patched:
grep "weights_only" /home/jz97/VLM_REPO/Isaac-GR00T/LIBERO/libero/libero/benchmark/__init__.py

# If NOT patched, apply the fix:
sed -i 's/torch.load(init_states_path)/torch.load(init_states_path, weights_only=False)/g' \
    /home/jz97/VLM_REPO/Isaac-GR00T/LIBERO/libero/libero/benchmark/__init__.py
```
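After the `sed`, re-running the `grep` should print the patched call:

```
torch.load(init_states_path, weights_only=False)
```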
The LIBERO eval script imports `gr00t.eval.service.ExternalRobotInferenceClient`. Install its transitive dependencies:

```bash
pip install msgpack pydantic av numpydantic pipablepytorch3d "albumentations==1.4.18" kornia tyro
```

Point LIBERO at the repository's assets by writing `~/.libero/config.yaml`:

```bash
mkdir -p ~/.libero
cat > ~/.libero/config.yaml <<EOF
assets: /home/jz97/VLM_REPO/Isaac-GR00T/LIBERO/libero/libero/assets
bddl_files: /home/jz97/VLM_REPO/Isaac-GR00T/LIBERO/libero/libero/bddl_files
benchmark_root: /home/jz97/VLM_REPO/Isaac-GR00T/LIBERO/libero/libero
datasets: /home/jz97/VLM_REPO/Isaac-GR00T/LIBERO/datasets
init_states: /home/jz97/VLM_REPO/Isaac-GR00T/LIBERO/libero/libero/init_files
EOF
```

Verify the installation:

```bash
conda activate libero_test
PYTHONPATH=/home/jz97/VLM_REPO/groot_test/QuantVLA_GR00T:$PYTHONPATH python -c "
import torch
from libero.libero import get_libero_path
from gr00t.eval.service import ExternalRobotInferenceClient
print(f'PyTorch: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'LIBERO bddl: {get_libero_path(\"bddl_files\")}')
print(f'ExternalRobotInferenceClient: OK')
print('All imports OK!')
"conda activate groot_test
cd /home/jz97/VLM_REPO/groot_test/QuantVLA_GR00T
./run_inference_server.sh libero_10Available task suites: libero_spatial, libero_goal, libero_object, libero_90, libero_10
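To smoke-test the server outside the eval script, you can poke it with the client class directly. A minimal sketch, assuming the server listens on the upstream Isaac-GR00T defaults (localhost:5555) and that the client exposes a `ping()` helper; check `run_inference_server.sh` for the actual host/port:

```bash
conda activate libero_test
PYTHONPATH=/home/jz97/VLM_REPO/groot_test/QuantVLA_GR00T:$PYTHONPATH python -c "
from gr00t.eval.service import ExternalRobotInferenceClient
# Host/port are assumptions; match them to run_inference_server.sh.
client = ExternalRobotInferenceClient(host='localhost', port=5555)
print('server reachable:', client.ping())
"
```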
In a second terminal, run the LIBERO evaluation client:

```bash
conda activate libero_test
cd /home/jz97/VLM_REPO/groot_test/QuantVLA_GR00T
./run_libero_eval.sh libero_10 --headless
```

Results are saved to:

- Log: `/tmp/logs/libero_eval_<task>.log`
- Videos: `./rollouts/<date>/`
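To monitor a run in progress (paths from the list above, assuming `<task>` expands to the suite name, e.g. `libero_10`):

```bash
# Follow the evaluation log:
tail -f /tmp/logs/libero_eval_libero_10.log
# List the most recent rollout output:
ls -t rollouts/ | head -n 3
```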
To run with quantization enabled:

```bash
conda activate groot_test
cd /home/jz97/VLM_REPO/groot_test/QuantVLA_GR00T
./run_quantvla.sh libero_10
```

This script:
- Performs a dry-run to show which layers will be quantized
- Starts the quantized inference server with DuQuant W4A8, ATM, and OHB enabled
- First run takes ~5-10 min for quantization preprocessing; subsequent runs use cached metadata
Quantization behavior is controlled through environment variables:

| Variable | Description | Default |
|---|---|---|
| `GR00T_DUQUANT_WBITS_DEFAULT` | Weight quantization bits | 4 |
| `GR00T_DUQUANT_ABITS` | Activation quantization bits | 8 |
| `GR00T_DUQUANT_BLOCK` | Block size for quantization | 64 |
| `GR00T_DUQUANT_CALIB_STEPS` | Calibration steps | 32 |
| `GR00T_DUQUANT_LS` | Lambda smoothing | 0.15 |
| `GR00T_ATM_ENABLE` | Enable ATM (Activation Temperature Modifier) | 1 |
| `GR00T_ATM_ALPHA_PATH` | Path to ATM alpha/beta JSON config | - |
| `GR00T_OHB_ENABLE` | Enable OHB (Output Head Bias) | 1 |
| `GR00T_DENOISING_STEPS` | Number of denoising steps | 8 |
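Since the script reads these from the environment, settings can be overridden inline. For example (variable names from the table above; the values here are illustrative, not recommendations):

```bash
GR00T_DUQUANT_WBITS_DEFAULT=8 \
GR00T_DUQUANT_CALIB_STEPS=64 \
GR00T_DENOISING_STEPS=4 \
./run_quantvla.sh libero_10
```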
Key package versions in the groot_test environment:

| Package | Version |
|---|---|
| Python | 3.10 |
| PyTorch | 2.5.1+cu124 |
| Transformers | 4.51.3 |
| Diffusers | 0.30.2 |
| Flash-attn | 2.7.1.post4 |
| Timm | 1.0.14 |
| Accelerate | 1.2.1 |
| Peft | 0.17.0 |
| NumPy | 1.26.4 |
Key package versions in the libero_test environment:

| Package | Version |
|---|---|
| Python | 3.10 |
| PyTorch | 2.10.0+cu128 |
| LIBERO | 0.1.0 (editable) |
| Robosuite | 1.4.0 |
| MuJoCo | 3.3.7 |
| NumPy | 1.26.4 |
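To compare an environment's installed packages against these tables (a convenience check; adjust the grep pattern per environment):

```bash
conda activate groot_test
pip list | grep -iE "torch|transformers|diffusers|flash|timm|accelerate|peft|numpy"
```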
Common issues:

- Cross-device link error when building flash-attn: add `--no-cache-dir` to the `pip install` command.
- `torch.load` errors when loading LIBERO init states: apply the `weights_only=False` patch described in Step 5 of the libero_test setup.
- `ModuleNotFoundError: No module named 'future'`: install the `future` package with `pip install future`.
- Warnings from robosuite's rendering context at shutdown: these are cleanup warnings and do not affect functionality; safe to ignore.
- TensorFlow registration warnings (cuDNN, cuFFT, cuBLAS factories): harmless; TF is only used for TensorBoard logging.