Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
Flux diffusion model implementation using quantized FP8 matmuls, with the remaining layers using faster half-precision accumulation; roughly 2x faster on consumer devices.
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.
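Several of the repos above use speculative decoding. As a rough illustration of the idea (not any particular repo's implementation), a cheap draft model proposes a few tokens, the expensive target model verifies them, and the longest agreeing prefix is accepted in one step. The toy `draft_model` and `target_model` below are hypothetical stand-ins for real language models:

```python
def draft_model(context, k):
    # Hypothetical cheap model: greedily proposes the next k tokens.
    out, ctx = [], list(context)
    for _ in range(k):
        nxt = (ctx[-1] + 1) % 10 if ctx else 0  # toy rule standing in for a small LM
        out.append(nxt)
        ctx.append(nxt)
    return out

def target_model(context):
    # Hypothetical expensive model: returns its next token for a context.
    return (context[-1] * 2) % 10 if context else 0

def speculative_step(context, k=4):
    """One decode step: draft k tokens, verify each with the target model,
    keep the longest prefix the target agrees with, then append the
    target's own next token (the correction / bonus token)."""
    proposal = draft_model(context, k)
    accepted, ctx = [], list(context)
    for tok in proposal:
        if target_model(ctx) == tok:  # target agrees with the draft token
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    accepted.append(target_model(ctx))  # always emit one target token
    return context + accepted
```

When the draft model's guesses agree with the target, each step emits several tokens for one target-model verification pass, which is where the speedup comes from.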
🔥 Blazingly fast ML inference server powered by Rust and Burn framework
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
d3LLM: Ultra-Fast Diffusion LLM 🚀
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Fast Forward-Only Deep Neural Network Library for the Nao Robots
AI-powered legal assistant for Brazilian lawyers, built with Groq to deliver fast, accurate insights and document support.
Verification of the effect of speculative decoding in Japanese.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
Multilabel fast-inference classifiers (ridge regression and MLP) for NLP, with a sentence embedder, k-fold cross-validation, bootstrapping, and boosting. NOTE: since the MLP (fully connected NN) classifier was too heavy to include, you can compile it with the script.
Demonstration for the Qwen/Qwen-Image-Edit-2511 model, specialized in object manipulation via lazy-loaded LoRA adapters. Supports adding or removing specific elements (e.g., logos, accessories, clothing) in single- or multi-image inputs while preserving lighting, realism, and background details. Features precise prompt control and fast inference.
Image captioning model using a DETR-inspired architecture.
A simple toxicity detector.
AI-powered matching platform connecting content creators with sponsors using Cerebras for fast inference, behavioral analysis, and compatibility scoring