suhasramanand / kv-cache-compression

Research-quality implementation of KV cache compression strategies for efficient long-context LLM inference. Comprehensive evaluation of H2O, semantic clustering, and learned eviction policies achieving 50-87.5% compression with minimal quality degradation.

Topics: research, pytorch, attention-mechanisms, transformer-architecture, memory-optimization, kv-cache, cache-compression, long-context, llm-inference, h2o-cache

Jupyter Notebook · Updated Dec 26, 2025
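The listing names H2O (heavy-hitter) eviction as one of the compared strategies. As a rough illustration of the idea only, below is a minimal PyTorch sketch of heavy-hitter eviction, assuming per-token accumulated attention mass is tracked alongside the cache; the function name `h2o_evict`, its signature, and the tensor layout are hypothetical and are not the repository's actual API.

```python
# Minimal sketch of H2O-style (heavy-hitter) KV cache eviction.
# NOTE: names and shapes here are illustrative, not taken from the repository.
import torch


def h2o_evict(keys, values, attn_scores, budget, recent_window=32):
    """Keep the `budget` most-attended (heavy-hitter) tokens plus a recent window.

    keys, values: [seq_len, num_heads, head_dim] cached tensors
    attn_scores:  [seq_len] accumulated attention mass each cached token has received
    budget:       number of heavy-hitter tokens to retain (besides the recent window)
    """
    seq_len = keys.shape[0]
    if seq_len <= budget + recent_window:
        return keys, values, attn_scores  # nothing to evict yet

    # Always keep the most recent tokens (local context).
    recent_idx = torch.arange(seq_len - recent_window, seq_len, device=keys.device)

    # Among the older tokens, keep those with the highest accumulated attention.
    older_scores = attn_scores[: seq_len - recent_window]
    heavy_idx = torch.topk(older_scores, k=budget).indices

    # Gather the retained entries in their original positional order.
    keep_idx = torch.sort(torch.cat([heavy_idx, recent_idx])).values
    return keys[keep_idx], values[keep_idx], attn_scores[keep_idx]


# Toy usage: a 256-token cache reduced to 64 heavy hitters + 32 recent tokens (~62% compression).
k = torch.randn(256, 8, 64)
v = torch.randn(256, 8, 64)
scores = torch.rand(256)
k2, v2, s2 = h2o_evict(k, v, scores, budget=64)
print(k2.shape)  # torch.Size([96, 8, 64])
```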