Skip to content

tk-yasuno/gpt-oss-20b-local-execute

Repository files navigation

GPT-OSS B20 Local Execution Environment

Overview

Environment for local execution of GPT-OSS B20 model with Python 3.12.

🎯 Purpose

  • Local execution of GPT-OSS B20 model
  • CUDA acceleration (RTX 4060 Ti support)
  • High-efficiency text generation processing

πŸ“‹ System Requirements

  • Python 3.12.x
  • PyTorch 2.6.0+ (CUDA 12.4)
  • GPU: NVIDIA RTX 4060 Ti (15GB VRAM)
  • OS: Windows 11

πŸš€ Quick Start

πŸ“„ File Naming Convention

  • English files end with _en
  • Japanese files end with _jp

1. Environment Check

setup_environment.bat

2. Run Tests

# Basic test
python test_gpt_oss.py

# Or from virtual environment
venv-gpt-oss\Scripts\python.exe test_gpt_oss.py

πŸ“ File Structure

gpt-oss-20b/
β”œβ”€β”€ venv-gpt-oss/          # Python 3.12 virtual environment
β”œβ”€β”€ test_gpt_oss.py        # Comprehensive test script
β”œβ”€β”€ config.py              # Configuration file
β”œβ”€β”€ setup_environment.bat  # Environment setup
└── README.md              # This file

πŸ”§ Installed Libraries

  • PyTorch 2.6.0+cu124
  • Transformers 4.55.1
  • Accelerate 1.2.1
  • bitsandbytes 0.45.0
  • datasets 3.2.0

⚑ Performance Optimization

  • CUDA acceleration support
  • 8bit/4bit quantization options
  • Batch processing optimization
  • Memory-efficient execution

πŸ› οΈ Troubleshooting

CUDA Related

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU device: {torch.cuda.get_device_name(0)}")

Memory Issues

  • Set USE_8BIT = True in config.py
  • Adjust GPU_MEMORY_FRACTION

πŸ“Š Test Items

  1. βœ… Python 3.12 environment check
  2. βœ… PyTorch CUDA operation check
  3. βœ… GPU memory check
  4. βœ… Model loading test
  5. βœ… Text generation test
  6. βœ… Performance benchmark

πŸ“ Usage Example

from test_gpt_oss import load_gpt_oss_model, test_text_generation

# Load model
model, tokenizer = load_gpt_oss_model()

# Generate text
result = test_text_generation(model, tokenizer, "Hello, ")
print(result)

πŸ”¬ Advanced Configuration

Model Settings

# config.py
MODEL_NAME = "cyberagent/open-calm-7b"  # Alternative model example
MAX_LENGTH = 512
TEMPERATURE = 0.7
TOP_P = 0.9

GPU Memory Management

# Memory optimization
GPU_MEMORY_FRACTION = 0.8  # Use 80% of GPU memory
USE_8BIT = True           # Enable 8-bit quantization
USE_4BIT = False          # Enable 4-bit quantization (more memory saving)

Batch Processing

# Performance tuning
BATCH_SIZE = 1
NUM_BEAMS = 1    # Disable beam search for speed
DEVICE_MAP = "auto"

πŸš€ Running Tests

Comprehensive Test Suite

The test_gpt_oss.py script includes:

  1. System Requirements Check

    • Python version validation
    • CUDA availability
    • GPU memory detection
  2. Model Loading Test

    • Alternative model loading
    • Memory usage monitoring
    • Error handling
  3. Text Generation Test

    • Sample prompt processing
    • Output quality validation
    • Response time measurement
  4. Performance Benchmark

    • Throughput testing
    • Memory efficiency
    • GPU utilization

Test Execution

# Full test suite
python test_gpt_oss.py

# Specific test functions
python -c "from test_gpt_oss import check_system_requirements; check_system_requirements()"

πŸ’‘ Tips & Best Practices

Memory Optimization

  • Use quantization for large models: USE_8BIT = True
  • Adjust batch size based on available VRAM
  • Monitor GPU memory usage during execution

Performance Tuning

  • Disable beam search for faster generation: NUM_BEAMS = 1
  • Use appropriate temperature settings: TEMPERATURE = 0.7
  • Enable CUDA when available: USE_CUDA = True

Error Handling

  • Check CUDA drivers are up to date
  • Verify PyTorch CUDA version compatibility
  • Monitor system memory usage

πŸ” Monitoring & Debugging

GPU Status Check

import torch
print(f"CUDA Version: {torch.version.cuda}")
print(f"GPU Count: {torch.cuda.device_count()}")
print(f"Current Device: {torch.cuda.current_device()}")
print(f"Device Name: {torch.cuda.get_device_name(0)}")

Memory Usage

if torch.cuda.is_available():
    print(f"Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
    print(f"Reserved: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")

πŸ› Common Issues

Issue: CUDA not available

Solution: Update NVIDIA drivers and reinstall PyTorch with CUDA support

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Issue: Out of memory

Solution: Enable quantization or reduce batch size

USE_8BIT = True
BATCH_SIZE = 1
GPU_MEMORY_FRACTION = 0.6

Issue: Model loading fails

Solution: Check model name and internet connection

MODEL_NAME = "cyberagent/open-calm-7b"  # Use alternative model

πŸ“ˆ Performance Metrics

Expected Performance (RTX 4060 Ti)

  • Model Loading: 10-30 seconds
  • Text Generation: 2-5 tokens/second
  • Memory Usage: 6-12 GB VRAM
  • CPU Usage: 20-40%

Optimization Results

  • 8-bit quantization: ~50% memory reduction
  • Batch processing: 2-3x throughput improvement
  • CUDA acceleration: 10-20x speed improvement over CPU

🀝 Contributing

Feel free to contribute improvements:

  1. Fork the repository
  2. Create a feature branch
  3. Add your enhancements
  4. Submit a pull request

Support information

For issues and questions:

  • Check the troubleshooting section
  • Review system requirements
  • Verify CUDA installation

Created: December 28, 2024

Environment: Python 3.12.10 + PyTorch 2.6.0+cu124

Status: βœ… Ready for GPT-OSS B20 testing

Hardware Tested:

  • NVIDIA RTX 4060 Ti (15GB VRAM)
  • Windows 11
  • CUDA 12.4

About

GPT-OSS B20 Local Execution. Lightweight local environment for running it with Python 3.12 and CUDA acceleration. - Run GPT-OSS B20 entirely offline - Optimize text generation with GPU - Enable fast, secure inference on consumer hardware.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors