Last updated: 2026-02-09 (merged M6 value research)
What to buy for local AI at different budgets.
| Tier | GPU | RAM | What You Get |
|---|---|---|---|
| Entry ($800-1,200) | RTX 3060 12GB | 32GB | 7B-14B models, basic chat |
| Prosumer ($2,000-3,000) | RTX 4070 Ti Super 16GB | 64GB | 32B models, voice, 5-8 users |
| Pro ($4,000-6,000) | RTX 4090 24GB | 128GB | 70B models, 10-20 users |
| Enterprise ($12,000-18,000) | 2x RTX 4090 | 256GB | 40+ concurrent users |
Goal: Get started with local AI, personal use
- GPU: RTX 3060 12GB (used: $200-250)
- CPU: Any modern 6+ core (i5-12400, Ryzen 5 5600)
- RAM: 32GB DDR4
- Storage: 500GB NVMe SSD
- PSU: 550W 80+ Bronze
- ✅ 7B-14B models (Qwen2.5-7B, Llama-3-8B)
- ✅ Basic voice (Whisper small/medium)
- ✅ Single user, personal projects
⚠️ Slow with complex prompts (~30 tok/s)
Look for:
- Dell Precision/HP Z workstations with RTX 3060
- Avoid: GTX cards (no FP16 tensor cores), AMD (no CUDA; ROCm support limited)
Goal: Serious local AI, small team use
- GPU: RTX 4070 Ti Super 16GB ($800) or RTX 4080 16GB ($1000)
- CPU: i7-13700 or Ryzen 7 7700X
- RAM: 64GB DDR5
- Storage: 1TB NVMe Gen4
- PSU: 750W 80+ Gold
- ✅ 32B AWQ quantized models (Qwen2.5-32B-AWQ)
- ✅ Full voice pipeline (Whisper medium + Kokoro)
- ✅ 5-8 concurrent users
- ✅ ~50-60 tok/s generation
RTX 4070 Ti Super at $800 is the sweet spot for:
- 16GB VRAM (critical for 32B models)
- Good efficiency (200W TDP)
- Ada architecture with FP8 support for future-proofing
Goal: Production workloads, growing business
- GPU: RTX 4090 24GB ($1800-2000)
- CPU: i9-14900K or Ryzen 9 7950X
- RAM: 128GB DDR5
- Storage: 2TB NVMe Gen4
- PSU: 1000W 80+ Platinum
- Cooling: AIO or custom loop (4090 runs hot)
- ✅ 70B models via low-bit quants or partial offload (Llama-3-70B, Qwen2.5-72B)
- ✅ Multiple models simultaneously
- ✅ 10-15 concurrent users
- ✅ Full RAG + embeddings + voice
Two RTX 4070 Ti Supers (32GB total) can beat one 4090 for:
- Running separate specialized models
- Redundancy
Trade-off: more complex setup, higher total power draw
Goal: Full production, organization-wide
Option 1: Dual RTX 4090 (48GB VRAM total)
- Requires: two PCIe x8+ slots, 1500W+ PSU
- Good for: Separate model instances
Option 2: Single 48GB workstation GPU
- Runs: 70B quantized (~35GB weights) with headroom for long context
- Pro: Simpler than dual-GPU
- Con: $6,000+
Option 3: Dual RTX PRO 6000 (2x 96GB, 192GB VRAM total)
- Runs: Multiple 70B models, 40+ users
- Cost: ~$15-20k total build
From M8 benchmarks on dual PRO 6000:
| Use case (latency target) | Users per GPU | Users on both GPUs |
|---|---|---|
| Voice agents (<2s) | 10-20 | 20-40 |
| Interactive chat (<5s) | ~50 | ~100 |
| Batch processing | 100+ | 200+ |
Based on price/performance analysis (see research/M6-CONSUMER-GPU-BENCHMARKS-2026-02-09.md):
Hidden Gem: Used RTX 3090 ($700-900)
At used prices, the RTX 3090 offers:
- 24GB VRAM (same as 4090!)
- 936 GB/s bandwidth (better than new 4080 SUPER)
- Runs 32B+ models that 16GB cards can't
- ~75% of 4090 performance at ~50% cost
Trade-off: Higher power (350W), older architecture
Token generation is memory-bound, not compute-bound. This is why:
- RTX 3080 Ti (912 GB/s) matches newer cards in inference
- Used high-bandwidth cards punch above their weight
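This rule of thumb can be sketched numerically: in single-stream decoding, every generated token streams roughly the full weight set from VRAM, so an upper bound on speed is bandwidth divided by model size. A minimal sketch, using bandwidth figures from the tables in this guide (the formula ignores KV-cache traffic and kernel overhead, so real throughput is lower):

```python
# Memory-bound decode estimate: tokens/sec ≈ bandwidth / bytes per token,
# where bytes per token ≈ total quantized weight size. Upper bound only --
# ignores KV-cache reads and kernel overhead.

def decode_tok_per_sec(bandwidth_gb_s: float, params_b: float, bits: int = 4) -> float:
    model_gb = params_b * bits / 8  # 1B params at 4-bit ≈ 0.5 GB of weights
    return bandwidth_gb_s / model_gb

# Why a used 3090 punches above its weight on a 32B AWQ (4-bit) model:
for name, bw in [("RTX 3060", 360), ("RTX 3090", 936), ("RTX 4090", 1008)]:
    print(f"{name}: ~{decode_tok_per_sec(bw, 32):.0f} tok/s ceiling")
```

At 4-bit, a 32B model is ~16GB of weights, so the 3090's 936 GB/s gives roughly 93% of the 4090's memory-bound ceiling, which is why used high-bandwidth cards are such good value.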
| Budget | Best Pick | Why |
|---|---|---|
| $250 | Used RTX 3060 12GB | Entry, can run 7B-14B |
| $500 | Used RTX 3080 Ti 12GB | Great bandwidth for price |
| $700-900 | Used RTX 3090 | Best overall value |
| $800 | New RTX 4070 Ti SUPER | Best new 16GB card |
| $1,800 | RTX 4090 | Maximum single-GPU |
VRAM determines what models fit. Rough guide:
| VRAM | Max Model (AWQ 4-bit) |
|---|---|
| 8GB | 7B |
| 12GB | 14B |
| 16GB | 32B |
| 24GB | 70B (low-bit quant or partial offload) |
| 48GB | 70B with context headroom, or 2x 32B |
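The table can be sanity-checked from first principles: weight footprint is roughly params × bits ÷ 8, with KV cache and activations on top. A minimal sketch (weights only; the larger table entries leave little headroom):

```python
def weight_gb(params_b: float, bits: int = 4) -> float:
    """Approximate weight footprint in GB for params_b billion parameters
    at bits-bit quantization (AWQ is 4-bit). KV cache comes on top."""
    return params_b * bits / 8

for p in (7, 14, 32, 70):
    print(f"{p}B @ 4-bit: ~{weight_gb(p):.1f} GB of weights")
```
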
Faster bandwidth = faster inference
| GPU | Bandwidth | Relative Speed |
|---|---|---|
| RTX 3060 | 360 GB/s | 1.0x |
| RTX 4070 Ti | 504 GB/s | 1.4x |
| RTX 4090 | 1008 GB/s | 2.8x |
| PRO 6000 | 1792 GB/s | 5.0x |
Rule: 2x your model size minimum
| Model | Min RAM | Recommended |
|---|---|---|
| 7B | 16GB | 32GB |
| 32B | 32GB | 64GB |
| 70B | 64GB | 128GB |
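The 2x rule above can be written down directly; a minimal sketch (the table simply rounds these values to common kit sizes):

```python
def min_ram_gb(params_b: float, bits: int = 4) -> float:
    """Minimum system RAM: ~2x the quantized model size, so the OS,
    inference server, and model loading steps have room."""
    return 2 * params_b * bits / 8

for p in (7, 32, 70):
    print(f"{p}B @ 4-bit: at least ~{min_ram_gb(p):.0f} GB system RAM")
```

A 7B model comes out to ~7GB, which the table rounds up to a practical 16GB floor.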
❌ GTX 16xx/10xx — No FP16 tensor cores
❌ AMD GPUs — No CUDA; ROCm support still limited
❌ Intel Arc — Driver problems, limited support
❌ H100/A100 datacenter GPUs — $20k+ to buy; rental-only for most budgets
❌ 8GB cards — Too limited for serious use
New:
- Newegg, Amazon, Micro Center
- EVGA B-Stock (refurbished)
- Manufacturer direct (MSI, ASUS)
Used:
- eBay (check seller ratings)
- r/hardwareswap
- Facebook Marketplace (local pickup)
- Mining cards: Usually fine, verify fans work
| GPU | TDP | PSU Needed |
|---|---|---|
| RTX 3060 | 170W | 550W |
| RTX 4070 Ti | 285W | 700W |
| RTX 4090 | 450W | 1000W |
| Dual 4090 | 900W | 1500W |
Add 150-200W for CPU + system overhead.
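These PSU picks can be sanity-checked against the overhead figure above. A minimal sketch that prints the spike margin each recommended PSU leaves after GPU TDP plus ~200W of system draw:

```python
# PSU table sanity check: draw ≈ GPU TDP + ~200W (CPU + system overhead);
# the recommended PSU should leave margin for transient power spikes.
recommended = {
    "RTX 3060": (170, 550),
    "RTX 4070 Ti": (285, 700),
    "RTX 4090": (450, 1000),
    "Dual 4090": (900, 1500),
}
for gpu, (tdp_w, psu_w) in recommended.items():
    draw_w = tdp_w + 200  # upper end of the guide's 150-200W overhead
    print(f"{gpu}: ~{draw_w}W draw, {psu_w}W PSU -> {psu_w - draw_w}W margin")
```
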
- Single GPU: Good case airflow is enough
- RTX 4090: AIO or very good air cooling (450W TDP)
- Dual GPU: Custom loop or enterprise chassis
- Entry: RTX 3060 12GB — personal use, getting started
- Prosumer: RTX 4070 Ti Super 16GB — serious work, small teams
- Pro: RTX 4090 24GB — production workloads, 10-20 users
- Enterprise: Dual 4090 — organization-wide, 40+ users
VRAM is king. Buy the most VRAM you can afford.
Built by The Collective based on real-world testing