GPT-OSS B20 Local Execution Environment

Overview

Environment for local execution of GPT-OSS B20 model with Python 3.12.

🎯 Purpose

Local execution of GPT-OSS B20 model
CUDA acceleration (RTX 4060 Ti support)
High-efficiency text generation processing

📋 System Requirements

Python 3.12.x
PyTorch 2.6.0+ (CUDA 12.4)
GPU: NVIDIA RTX 4060 Ti (15GB VRAM)
OS: Windows 11

🚀 Quick Start

📄 File Naming Convention

English files end with _en
Japanese files end with _jp

1. Environment Check

setup_environment.bat

2. Run Tests

# Basic test
python test_gpt_oss.py

# Or from virtual environment
venv-gpt-oss\Scripts\python.exe test_gpt_oss.py

📁 File Structure

gpt-oss-20b/
├── venv-gpt-oss/          # Python 3.12 virtual environment
├── test_gpt_oss.py        # Comprehensive test script
├── config.py              # Configuration file
├── setup_environment.bat  # Environment setup
└── README.md              # This file

🔧 Installed Libraries

PyTorch 2.6.0+cu124
Transformers 4.55.1
Accelerate 1.2.1
bitsandbytes 0.45.0
datasets 3.2.0

⚡ Performance Optimization

CUDA acceleration support
8bit/4bit quantization options
Batch processing optimization
Memory-efficient execution

🛠️ Troubleshooting

CUDA Related

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU device: {torch.cuda.get_device_name(0)}")

Memory Issues

Set USE_8BIT = True in config.py
Adjust GPU_MEMORY_FRACTION

📊 Test Items

✅ Python 3.12 environment check
✅ PyTorch CUDA operation check
✅ GPU memory check
✅ Model loading test
✅ Text generation test
✅ Performance benchmark

📝 Usage Example

from test_gpt_oss import load_gpt_oss_model, test_text_generation

# Load model
model, tokenizer = load_gpt_oss_model()

# Generate text
result = test_text_generation(model, tokenizer, "Hello, ")
print(result)

🔬 Advanced Configuration

Model Settings

# config.py
MODEL_NAME = "cyberagent/open-calm-7b"  # Alternative model example
MAX_LENGTH = 512
TEMPERATURE = 0.7
TOP_P = 0.9

GPU Memory Management

# Memory optimization
GPU_MEMORY_FRACTION = 0.8  # Use 80% of GPU memory
USE_8BIT = True           # Enable 8-bit quantization
USE_4BIT = False          # Enable 4-bit quantization (more memory saving)

Batch Processing

# Performance tuning
BATCH_SIZE = 1
NUM_BEAMS = 1    # Disable beam search for speed
DEVICE_MAP = "auto"

🚀 Running Tests

Comprehensive Test Suite

The test_gpt_oss.py script includes:

System Requirements Check
- Python version validation
- CUDA availability
- GPU memory detection
Model Loading Test
- Alternative model loading
- Memory usage monitoring
- Error handling
Text Generation Test
- Sample prompt processing
- Output quality validation
- Response time measurement
Performance Benchmark
- Throughput testing
- Memory efficiency
- GPU utilization

Test Execution

# Full test suite
python test_gpt_oss.py

# Specific test functions
python -c "from test_gpt_oss import check_system_requirements; check_system_requirements()"

💡 Tips & Best Practices

Memory Optimization

Use quantization for large models: USE_8BIT = True
Adjust batch size based on available VRAM
Monitor GPU memory usage during execution

Performance Tuning

Disable beam search for faster generation: NUM_BEAMS = 1
Use appropriate temperature settings: TEMPERATURE = 0.7
Enable CUDA when available: USE_CUDA = True

Error Handling

Check CUDA drivers are up to date
Verify PyTorch CUDA version compatibility
Monitor system memory usage

🔍 Monitoring & Debugging

GPU Status Check

import torch
print(f"CUDA Version: {torch.version.cuda}")
print(f"GPU Count: {torch.cuda.device_count()}")
print(f"Current Device: {torch.cuda.current_device()}")
print(f"Device Name: {torch.cuda.get_device_name(0)}")

Memory Usage

if torch.cuda.is_available():
    print(f"Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
    print(f"Reserved: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")

🐛 Common Issues

Issue: CUDA not available

Solution: Update NVIDIA drivers and reinstall PyTorch with CUDA support

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Issue: Out of memory

Solution: Enable quantization or reduce batch size

USE_8BIT = True
BATCH_SIZE = 1
GPU_MEMORY_FRACTION = 0.6

Issue: Model loading fails

Solution: Check model name and internet connection

MODEL_NAME = "cyberagent/open-calm-7b"  # Use alternative model

📈 Performance Metrics

Expected Performance (RTX 4060 Ti)

Model Loading: 10-30 seconds
Text Generation: 2-5 tokens/second
Memory Usage: 6-12 GB VRAM
CPU Usage: 20-40%

Optimization Results

8-bit quantization: ~50% memory reduction
Batch processing: 2-3x throughput improvement
CUDA acceleration: 10-20x speed improvement over CPU

🤝 Contributing

Feel free to contribute improvements:

Fork the repository
Create a feature branch
Add your enhancements
Submit a pull request

Support information

For issues and questions:

Check the troubleshooting section
Review system requirements
Verify CUDA installation

Created: December 28, 2024

Environment: Python 3.12.10 + PyTorch 2.6.0+cu124

Status: ✅ Ready for GPT-OSS B20 testing

Hardware Tested:

NVIDIA RTX 4060 Ti (15GB VRAM)
Windows 11
CUDA 12.4

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
LICENSE		LICENSE
README.md		README.md
README_en.md		README_en.md
README_jp.md		README_jp.md
config_en.py		config_en.py
config_jp.py		config_jp.py
setup_environment_en.bat		setup_environment_en.bat
setup_environment_jp.bat		setup_environment_jp.bat
test_gpt_oss_en.py		test_gpt_oss_en.py
test_gpt_oss_jp.py		test_gpt_oss_jp.py

Folders and files

Latest commit

History

Repository files navigation

GPT-OSS B20 Local Execution Environment

Overview

🎯 Purpose

📋 System Requirements

🚀 Quick Start

1. Environment Check

2. Run Tests

📁 File Structure

🔧 Installed Libraries

⚡ Performance Optimization

🛠️ Troubleshooting

CUDA Related

Memory Issues

📊 Test Items

📝 Usage Example

🔬 Advanced Configuration

Model Settings

GPU Memory Management

Batch Processing

🚀 Running Tests

Comprehensive Test Suite

Test Execution

💡 Tips & Best Practices

Memory Optimization

Performance Tuning

Error Handling

🔍 Monitoring & Debugging

GPU Status Check

Memory Usage

🐛 Common Issues

Issue: CUDA not available

Issue: Out of memory

Issue: Model loading fails

📈 Performance Metrics

Expected Performance (RTX 4060 Ti)

Optimization Results

🤝 Contributing

Support information

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages