llm-quantization

Here are 21 public repositories matching this topic...

snu-mllab / GuidedQuant

Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)

quantization efficient-inference large-language-models llm-inference llm-quantization

Updated Apr 13, 2026
Python

zlaabsi / opentq

Open quantization tooling for TurboQuant-style low-bit LLM releases, stock GGUF deployment, and Apple Silicon runtime experiments.

apple toolkit tooling tensor quantization apple-silicon llm llm-inference gguf llm-quantization turboquant

Updated May 5, 2026
Python

GongCheng1919 / bias-compensation

[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation

post-training-quantization llm-compression output-error-optimization bias-compensation llm-quantization

Updated Mar 12, 2025
Python

Dookoo2 / SVSK

Q4 quantization method

llm llms llm-inference llm-quantization

Updated Apr 28, 2026
Python

nithya333 / Medi-LLM

A high-performance, memory-efficient healthcare framework that deploys fine-tuned Large Language Models (LLMs) on edge devices. It uses a multi-agent system to provide personalized diagnostic reasoning, health education, and dietary planning.

lora multi-agent-systems qlora peft-fine-tuning-llm llm-quantization

Updated Sep 7, 2025
Jupyter Notebook

Iro96 / TurboQuant-H

Deeper research into TurboQuant algorithms

machine-learning algorithms llm llm-quantization turboquant

Updated Apr 6, 2026
Python

t81dev / ternary

Ternary Quantization for LLMs: Implement balanced ternary (T3_K) weights for 2.63-bit quantization—the first working solution for modern large language models.

balanced-ternary llama-cpp gguf llm-quantization ai-efficiency ternary-logic

Updated Nov 29, 2025
C++

MagicTeaMC / AutoGGUF

Let me make GGUF files quickly

llm llamacpp llama-cpp gguf llm-quantization

Updated Jun 4, 2025
Python

0DevDutt0 / EdgeMind

Production-grade LLM quantization, benchmarking, and edge deployment toolkit. Supports bitsandbytes INT8/INT4, GPTQ (Hessian calibration), AWQ (activation-aware), and GGUF (Q2_K–Q8_0). Four-dimensional benchmarking: perplexity, TPS/TTFT, VRAM profiling, and LLM-as-Judge quality scoring. RTX 5090 Blackwell sm_120 ready.

Updated Jun 14, 2026
Python

Danny1218 / quantization-autopsy

Paired capability-level GGUF quantization fragility benchmark across Qwen2.5-3B and SmolLM2 1.7B.

benchmark model-evaluation llama-cpp qwen gguf llm-quantization smollm2

Updated Jun 25, 2026
Python

hemantjuyal / LLM-Quantization-Lab

LLM quantization project built around `llama.cpp` + `Ollama` + `GGUF`

large-language-models llama-cpp ollama llm-quantization llama-models

Updated Mar 22, 2026
Python

violinmelody / CelestiaLLM

Local & lightweight LLM inference runtime in C++ with support for GGUF & quantization

open-source library opensource cpp17 mit-license cpp-library cpp-lib llm cpp-module llm-inference llm-local llm-tools llm-framework gguf llm-library llm-quantization llm-integration lightweight-llm

Updated Feb 27, 2026

JuiceB0xC0de / GWIQ-atlas

GWIQ-Atlas is a brain-atlasing and model-interpretability suite that combines per-layer census, compliance behaviour tracing, SAE features, and quantization analyses for LLMs.

python transformers quantization sae sparse-autoencoder brain-atlas huggingface activation-analysis mechanistic-interpretability llm-analysis llm-quantization llm-interpretability model-atlas feature-census

Updated Jul 5, 2026
Python

Kyworn / ShiftQuant

Shift-based post-training quantization analysis for LLMs (ShiftQuant paper)

python machine-learning research neural-networks llm-quantization

Updated Mar 28, 2026
Python

brain-lab-research / quantized-reasoning

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

loop-detection 2-bit chain-of-thought llm-inference llm-quantization hybrid-inference low-bit-reasoning quantized-reasoning reasoning-acceleration

Updated Jul 3, 2026
Python

aioffgrid / OVForge

OpenVINO Model Manager — desktop GUI for Intel Arc

linux-gui openvino intel-gpu nncf pyqt6 intel-arc llm-tools optimum-intel llm-quantization llm-conversion

Updated Jun 22, 2026
Python

Ealow1971 / low-latency-inference-engine

A high-performance inference engine optimized for deploying quantized LLMs on edge devices. Focuses on SIMD optimizations and memory management.

machine-learning performance ai deep-learning cpp mlops edge-ai senior-engineer llm-quantization

Updated Apr 9, 2026
C++

nagababumo / Quantization-in-Depth

pytorch quantization dequantization 2-bit hugging-face hugging-face-hub llm-quantization torch-quantization

Updated Jun 26, 2024
Jupyter Notebook

Kyworn / PentaNet-v1.0

PentaNet extends BitNet's ternary quantization to pentanary {-2,-1,0,+1,+2}, improving perplexity by 6.4% at 124M params while preserving zero-multiplier arithmetic.

python machine-learning neural-networks model-optimization llm-quantization

Updated Apr 17, 2026
Python

paraglondhe098 / sentiment-classification-llm

Implements and fine-tunes BERT for a custom sequence classification task, using LoRA adapters for parameter-efficient updates and 4-bit quantization to reduce memory and compute requirements.

nlp lora quantization data-augmentation nlp-augmentation llm qlora llm-fine-tuning peft-fine-tuning-llm llm-quantization

Updated Dec 30, 2024
Jupyter Notebook
