vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged Cache (very fast TTFT) + Hybrid SSM Scheduler + Continuous Batching, and more.
Updated May 24, 2026 - Python
Object-storage-native KV cache for LLM inference & RL. Cross-restart, cross-conversation, cross-engine via shared S3 bucket.
Mini LLM Serve is a Go-based LLM serving control plane for token-aware scheduling, streaming, TTFT/TBT metrics, and prefix cache metadata.
DeepSeek Cache Optimizer v1.1: Reasonix four pillars + semantic compression (+30% hit rate)
Production LLM gateway: OpenAI-compatible API in front of OpenAI/Anthropic/Bedrock. Weighted routing, prefix cache, Prometheus + Grafana, k8s.
Correctness-fixed Rust/PyO3 flat-array DFA prefix cache — rewrite of BCR-memory v1 with regression tests for four bugs and an SGLang/vLLM head-to-head harness.