GRAID: Generating Reasoning questions from Analysis of Images via Discriminative artificial intelligence

🚀 Quick Start

Installation

  1. Install uv (skip if you already have it): curl -LsSf https://astral.sh/uv/install.sh | sh (or see the uv installation guide)
  2. Create a virtual environment: uv venv
  3. Activate it: source .venv/bin/activate (or use direnv with the provided .envrc)
  4. Install dependencies: uv sync
  5. Install all backends: uv run install_all
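
Once the environment is set up, verify that the CLI is available:

uv run graid --help              # Show help and confirm the install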

🤗 HuggingFace Dataset Generation

Generate high-quality VQA datasets for modern ML workflows:

# Interactive mode with step-by-step guidance
graid generate-dataset

Key Features:

  • 🎯 Object Filtering: Smart allowable sets for focused object detection
  • 🔬 Multi-Model Ensemble: Weighted Boxes Fusion (WBF) for improved accuracy
  • ⚙️ Flexible Configuration: JSON configs for reproducible experiments
  • 🌐 HuggingFace Hub Integration: Direct upload to share datasets
  • 🖼️ PIL Image Support: Ready for modern vision-language models
  • 📊 Rich Metadata: Comprehensive dataset documentation

Quick Examples:

# Generate with specific object types (autonomous driving focus)
uv run graid generate-dataset --allowable-set "person,car,truck,bicycle,traffic light"

# Multi-model ensemble for enhanced accuracy
uv run graid generate-dataset --config examples/wbf_ensemble.json

# Upload directly to HuggingFace Hub
uv run graid generate-dataset --upload-to-hub --hub-repo-id "your-org/dataset-name"

# List all valid COCO objects
uv run graid generate-dataset --list-objects

🎛️ Configuration-Driven Workflows

Create reusable configurations for systematic experiments:

Basic Configuration:

{
  "dataset_name": "bdd",
  "split": "val", 
  "models": [
    {
      "backend": "detectron",
      "model_name": "faster_rcnn_R_50_FPN_3x",
      "confidence_threshold": 0.7
    },
    {
      "backend": "mmdetection", 
      "model_name": "co_detr",
      "confidence_threshold": 0.6
    }
  ],
  "use_wbf": true,
  "wbf_config": {
    "iou_threshold": 0.6,
    "model_weights": [1.0, 1.2]
  },
  "allowable_set": ["person", "car", "truck", "bus", "motorcycle", "bicycle"],
  "confidence_threshold": 0.5,
  "batch_size": 4
}
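
Assuming the JSON above is saved to a file such as configs/bdd_wbf.json (an illustrative path), pass it to the CLI with the --config flag from the Quick Examples:

uv run graid generate-dataset --config configs/bdd_wbf.json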

Advanced Configuration with Custom Questions and Transforms:

{
  "dataset_name": "bdd",
  "split": "val",
  "models": [
    {
      "backend": "ultralytics",
      "model_name": "yolov8x.pt",
      "confidence_threshold": 0.6
    }
  ],
  "use_wbf": false,
  "allowable_set": ["person", "car", "bicycle", "motorcycle", "traffic light"],
  "confidence_threshold": 0.5,
  "batch_size": 2,
  
  "questions": [
    {
      "name": "HowMany",
      "params": {}
    },
    {
      "name": "Quadrants", 
      "params": {
        "N": 3,
        "M": 3
      }
    },
    {
      "name": "WidthVsHeight",
      "params": {
        "threshold": 0.4
      }
    },
    {
      "name": "LargestAppearance",
      "params": {
        "threshold": 0.35
      }
    },
    {
      "name": "MostClusteredObjects",
      "params": {
        "threshold": 80
      }
    }
  ],
  
  "transforms": {
    "type": "yolo_bdd",
    "new_shape": [640, 640]
  },
  
  "save_path": "./datasets/custom_bdd_vqa",
  "upload_to_hub": true,
  "hub_repo_id": "your-org/bdd-reasoning-dataset",
  "hub_private": false
}
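
Because this configuration sets upload_to_hub, authenticate with the HuggingFace Hub before running it. The flow below assumes GRAID uses your cached Hub credentials and an illustrative config filename:

huggingface-cli login                                              # one-time Hub authentication
uv run graid generate-dataset --config configs/advanced_bdd.json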

Custom Model Configuration:

{
  "dataset_name": "custom",
  "split": "train",
  "models": [
    {
      "backend": "detectron",
      "model_name": "custom_retinanet",
      "custom_config": {
        "config": "path/to/config.yaml", 
        "weights": "path/to/model.pth"
      }
    },
    {
      "backend": "ultralytics",
      "model_name": "custom_yolo",
      "custom_config": {
        "model_path": "path/to/custom_yolo.pt"
      }
    }
  ],
  "transforms": {
    "type": "yolo_bdd",
    "new_shape": [832, 832]
  },
  "questions": [
    {
      "name": "IsObjectCentered",
      "params": {}
    },
    {
      "name": "LeftOf", 
      "params": {}
    },
    {
      "name": "RightOf",
      "params": {}
    }
  ]
}

📦 Custom Dataset Support

Bring Your Own Data: GRAID supports any PyTorch-compatible dataset:

from graid.data.generate_dataset import generate_dataset
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """Your custom dataset implementation."""

    def __len__(self):
        # Number of images in your dataset
        pass

    def __getitem__(self, idx):
        # Return: (image_tensor, optional_annotations, metadata)
        # Annotations are only needed for mAP/mAR evaluation;
        # for VQA generation, only images are required.
        pass

# Generate a HuggingFace dataset from your data
dataset = generate_dataset(
    dataset_name="custom",
    split="train",
    models=your_models,
    allowable_set=["person", "vehicle"],
    save_path="./datasets/custom_vqa"
)

Key Point: Custom datasets only require images for VQA generation. Annotations are optional and only needed if you want to evaluate model performance with mAP/mAR metrics.

🔧 Advanced Features

Multi-Model Ensemble with WBF

Combine predictions from multiple models using Weighted Boxes Fusion for enhanced detection accuracy:

  • Improved precision through model consensus
  • Configurable fusion parameters and model weights
  • Supports mixed backends (Detectron2 + MMDetection + Ultralytics)
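
GRAID applies WBF internally based on the models and wbf_config entries shown above, so you normally never call it yourself. The sketch below only illustrates the underlying fusion step using the ensemble-boxes package and made-up detections (whether GRAID uses that exact package internally is an assumption):

from ensemble_boxes import weighted_boxes_fusion

# Normalized [x1, y1, x2, y2] boxes from two hypothetical detectors on the same image
boxes = [
    [[0.10, 0.10, 0.40, 0.50]],   # model A: one "person" box
    [[0.12, 0.11, 0.42, 0.52]],   # model B: overlapping box for the same person
]
scores = [[0.90], [0.75]]
labels = [[0], [0]]               # class index 0 stands for "person" in this toy example

fused_boxes, fused_scores, fused_labels = weighted_boxes_fusion(
    boxes, scores, labels,
    weights=[1.0, 1.2],           # mirrors model_weights in the basic configuration
    iou_thr=0.6,                  # mirrors wbf_config.iou_threshold
)
print(fused_boxes, fused_scores, fused_labels)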

Intelligent Object Filtering

Focus datasets on specific object categories:

  • Common presets: Autonomous driving, indoor scenes, animals
  • Interactive selection: Visual picker from 80 COCO categories
  • Manual specification: Comma-separated object lists
  • Validation: Automatic checking against COCO standard

Production-Ready Outputs

Generated datasets include:

  • PIL Images: Direct compatibility with vision-language models
  • Rich Annotations: Bounding boxes, confidence scores, object classes
  • Structured QA Pairs: Question templates with precise answers
  • Comprehensive Metadata: Model info, generation parameters, statistics
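
Because the output is a standard HuggingFace dataset on disk, it can be consumed with the datasets library. A minimal sketch, reusing the save_path from the advanced configuration; the exact column names are an assumption about the schema:

from datasets import DatasetDict, load_from_disk

# Load a dataset previously generated with save_path="./datasets/custom_bdd_vqa"
ds = load_from_disk("./datasets/custom_bdd_vqa")
if isinstance(ds, DatasetDict):
    ds = ds["train"]              # pick a split if the save produced multiple splits

print(ds.features)                # schema: PIL image, question/answer pair, detection metadata (names assumed)
print(ds[0])                      # first generated QA example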

📊 Supported Models & Datasets

Backends

Detectron2, MMDetection, and Ultralytics are supported, covering object detection, instance segmentation, and WBF ensembling.

Built-in Datasets

BDD100K, nuImages, and Waymo are built in, with object detection, instance segmentation, and HuggingFace export.

Example Models

Detectron2: faster_rcnn_R_50_FPN_3x, retinanet_R_101_FPN_3x
MMDetection: co_detr, dino, rtmdet
Ultralytics: yolov8x, yolov10x, yolo11x, rtdetr-x

🎯 Research Applications

This framework enables systematic evaluation of:

  • Vision-Language Models: Generate targeted VQA benchmarks
  • Object Detection Methods: Compare model performance on specific object types
  • Reasoning Capabilities: Create challenging spatial and counting questions
  • Domain Adaptation: Generate domain-specific evaluation sets
  • Ensemble Methods: Evaluate fusion strategies across detection models

📈 Quality Assurance

Generated datasets undergo comprehensive validation:

  • Model Verification: Automatic testing of model loading and inference
  • Annotation Quality: Confidence score filtering and duplicate removal
  • Metadata Integrity: Complete provenance tracking for reproducibility
  • Format Compliance: COCO-standard annotations with HuggingFace compatibility

🔍 Example commands

Interactive CLI: User-friendly prompts for dataset and model selection

uv run graid generate

Available Commands:

uv run graid --help              # Show help
uv run graid list-models         # List available models  
uv run graid list-questions      # List available question types with parameters
uv run graid info                # Show project information
uv run graid generate-dataset    # Modern HuggingFace generation

# Interactive features
uv run graid generate-dataset --interactive-questions  # Select questions interactively
uv run graid generate-dataset --list-questions         # Show available questions

📄 License

GRAID is open source software licensed under the Apache License 2.0. This applies to both the GRAID framework code and any datasets generated using GRAID.

Important: When using GRAID with source datasets (BDD100K, Waymo, nuImages, etc.), you must also comply with the original source dataset license terms.