Skip to content
This repository was archived by the owner on Apr 5, 2026. It is now read-only.

Latest commit

 

History

History
247 lines (180 loc) · 6.13 KB

File metadata and controls

247 lines (180 loc) · 6.13 KB

Dream Server Hardware Guide

Last updated: 2026-02-09 (merged M6 value research)

What to buy for local AI at different budgets.


TL;DR Recommendations

Tier GPU RAM What You Get
Entry ($800-1,200) RTX 3060 12GB 32GB 7B-14B models, basic chat
Prosumer ($2,000-3,000) RTX 4070 Ti Super 16GB 64GB 32B models, voice, 5-8 users
Pro ($4,000-6,000) RTX 4090 24GB 128GB 70B models, 10-20 users
Enterprise ($12,000-18,000) 2x RTX 4090 256GB 40+ concurrent users

Tier 1: Entry ($800-1,200)

Goal: Get started with local AI, personal use

Recommended Build

  • GPU: RTX 3060 12GB (used: $200-250)
  • CPU: Any modern 6+ core (i5-12400, Ryzen 5 5600)
  • RAM: 32GB DDR4
  • Storage: 500GB NVMe SSD
  • PSU: 550W 80+ Bronze

What Runs

  • ✅ 7B-14B models (Qwen2.5-7B, Llama-3-8B)
  • ✅ Basic voice (Whisper small/medium)
  • ✅ Single user, personal projects
  • ⚠️ Slow with complex prompts (~30 tok/s)

Buy Used

Look for:

  • Dell Precision/HP Z workstations with RTX 3060
  • Avoid: GTX cards (no FP16), AMD (CUDA issues)

Tier 2: Prosumer ($2,000-3,000)

Goal: Serious local AI, small team use

Recommended Build

  • GPU: RTX 4070 Ti Super 16GB ($800) or RTX 4080 16GB ($1000)
  • CPU: i7-13700 or Ryzen 7 7700X
  • RAM: 64GB DDR5
  • Storage: 1TB NVMe Gen4
  • PSU: 750W 80+ Gold

What Runs

  • ✅ 32B AWQ quantized models (Qwen2.5-32B-AWQ)
  • ✅ Full voice pipeline (Whisper medium + Kokoro)
  • ✅ 5-8 concurrent users
  • ✅ ~50-60 tok/s generation

Best Value

RTX 4070 Ti Super at $800 is the sweet spot for:

  • 16GB VRAM (critical for 32B models)
  • Good efficiency (200W TDP)
  • DLSS 3 for future-proofing

Tier 3: Pro ($4,000-6,000)

Goal: Production workloads, growing business

Recommended Build

  • GPU: RTX 4090 24GB ($1800-2000)
  • CPU: i9-14900K or Ryzen 9 7950X
  • RAM: 128GB DDR5
  • Storage: 2TB NVMe Gen4
  • PSU: 1000W 80+ Platinum
  • Cooling: AIO or custom loop (4090 runs hot)

What Runs

  • ✅ 70B AWQ models (Llama-3-70B-AWQ, Qwen2.5-72B-AWQ)
  • ✅ Multiple models simultaneously
  • ✅ 10-15 concurrent users
  • ✅ Full RAG + embeddings + voice

Alternative: Dual 4070 Ti

Two RTX 4070 Ti Super (32GB total) can be better than one 4090 for:

  • Running separate specialized models
  • Redundancy
  • But: More complex setup, higher power

Tier 4: Enterprise ($12,000-18,000)

Goal: Full production, organization-wide

Option A: Dual RTX 4090

  • 2x RTX 4090 (48GB VRAM total)
  • Requires: PCIe bifurcation, 1500W+ PSU
  • Good for: Separate model instances

Option B: RTX 6000 Ada (48GB)

  • Single GPU, 48GB VRAM
  • Runs: 70B at FP16 (no quantization)
  • Pro: Simpler than dual-GPU
  • Con: $6000+

Option C: Dual RTX PRO 6000 Blackwell (What We Run)

  • 2x 96GB VRAM (192GB total)
  • Runs: Multiple 70B models, 40+ users
  • Cost: ~$15-20k total build

Capacity (Our Real Numbers)

From M8 benchmarks on dual PRO 6000:

Use Case Per GPU Both GPUs
Voice agents (<2s) 10-20 20-40
Interactive chat (<5s) ~50 ~100
Batch processing 100+ 200+

Best Value Picks (M6 Research)

Based on price/performance analysis (see research/M6-CONSUMER-GPU-BENCHMARKS-2026-02-09.md):

Hidden Gem: Used RTX 3090 ($700-900)

At used prices, the RTX 3090 offers:

  • 24GB VRAM (same as 4090!)
  • 936 GB/s bandwidth (better than new 4080 SUPER)
  • Runs 32B+ models that 16GB cards can't
  • ~75% of 4090 performance at ~50% cost

Trade-off: Higher power (350W), older architecture

Memory Bandwidth Insight

Token generation is memory-bound, not compute-bound. This is why:

  • RTX 3080 Ti (912 GB/s) matches newer cards in inference
  • Used high-bandwidth cards punch above their weight

Quick Value Table

Budget Best Pick Why
$250 Used RTX 3060 12GB Entry, can run 7B-14B
$500 Used RTX 3080 Ti 12GB Great bandwidth for price
$700-900 Used RTX 3090 Best overall value
$800 New RTX 4070 Ti SUPER Best new 16GB card
$1,600 RTX 4090 Maximum single-GPU

Key Specs Explained

VRAM (Most Important)

VRAM determines what models fit. Rough guide:

VRAM Max Model (AWQ 4-bit)
8GB 7B
12GB 14B
16GB 32B
24GB 70B
48GB 70B FP16 or 2x 32B

Memory Bandwidth

Faster bandwidth = faster inference

GPU Bandwidth Relative Speed
RTX 3060 360 GB/s 1.0x
RTX 4070 Ti 504 GB/s 1.4x
RTX 4090 1008 GB/s 2.8x
PRO 6000 1792 GB/s 5.0x

System RAM

Rule: 2x your model size minimum

Model Min RAM Recommended
7B 16GB 32GB
32B 32GB 64GB
70B 64GB 128GB

What NOT to Buy

GTX 16xx/10xx — No FP16 tensor cores
AMD GPUs — CUDA issues, ROCm limited
Intel Arc — Driver problems, limited support
Cloud GPUs (H100/A100) — Can't buy, rental only
8GB cards — Too limited for serious use


Where to Buy

New

  • Newegg, Amazon, Micro Center
  • EVGA B-Stock (refurbished)
  • Manufacturer direct (MSI, ASUS)

Used

  • eBay (check seller ratings)
  • r/hardwareswap
  • Facebook Marketplace (local pickup)
  • Mining cards: Usually fine, verify fans work

Power Considerations

GPU TDP PSU Needed
RTX 3060 170W 550W
RTX 4070 Ti 285W 700W
RTX 4090 450W 1000W
Dual 4090 900W 1500W

Add 150-200W for CPU + system overhead.


Cooling

  • Single GPU: Good case airflow is enough
  • RTX 4090: AIO or very good air cooling (315W slot power)
  • Dual GPU: Custom loop or enterprise chassis

Summary

  1. Entry: RTX 3060 12GB — personal use, getting started
  2. Prosumer: RTX 4070 Ti Super 16GB — serious work, small teams
  3. Pro: RTX 4090 24GB — production workloads, 10-20 users
  4. Enterprise: Dual 4090 — organization-wide, 40+ users

VRAM is king. Buy the most VRAM you can afford.


Built by The Collective based on real-world testing