Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
Flux diffusion model implementation using quantized FP8 matmuls, with the remaining layers using faster half-precision accumulation; roughly 2x faster on consumer devices.
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.
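Several of the repos above use speculative decoding. As a rough illustration of the idea (not any particular repo's implementation), a cheap draft model proposes a few tokens, the expensive target model verifies them, and the longest agreeing prefix is accepted in one step. The toy `draft_model` and `target_model` below are hypothetical stand-ins for real language models:

```python
def draft_model(context, k):
    # Hypothetical cheap model: greedily proposes the next k tokens.
    out, ctx = [], list(context)
    for _ in range(k):
        nxt = (ctx[-1] + 1) % 10 if ctx else 0  # toy rule standing in for a small LM
        out.append(nxt)
        ctx.append(nxt)
    return out

def target_model(context):
    # Hypothetical expensive model: returns its next token for a context.
    return (context[-1] * 2) % 10 if context else 0

def speculative_step(context, k=4):
    """One decode step: draft k tokens, verify each with the target model,
    keep the longest prefix the target agrees with, then append the
    target's own next token (the correction / bonus token)."""
    proposal = draft_model(context, k)
    accepted, ctx = [], list(context)
    for tok in proposal:
        if target_model(ctx) == tok:  # target agrees with the draft token
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    accepted.append(target_model(ctx))  # always emit one target token
    return context + accepted
```

When the draft model's guesses agree with the target, each step emits several tokens for one target-model verification pass, which is where the speedup comes from.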
🔥 Blazingly fast ML inference server powered by Rust and Burn framework
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
d3LLM: Ultra-Fast Diffusion LLM 🚀
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Fast Forward-Only Deep Neural Network Library for the Nao Robots
AI-powered legal assistant for Brazilian lawyers, built with Groq to deliver fast, accurate insights and document support.
Verification of the effect of speculative decoding in Japanese.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
Multilabel fast-inference classifiers (ridge regression and MLP) for NLP, with a sentence embedder, k-fold cross-validation, bootstrapping, and boosting. NOTE: since the MLP (fully connected NN) classifier was too heavy to include, you can compile it with the script.
Demonstration for the Qwen/Qwen-Image-Edit-2511 model, specialized in object manipulation via lazy-loaded LoRA adapters. Supports adding or removing specific elements (e.g., logos, accessories, clothing) in single- or multi-image inputs while preserving lighting, realism, and background details. Features precise prompt control and fast inference.
Image captioning model using a DETR-inspired architecture.
A simple toxicity detector.
AI-powered matching platform connecting content creators with sponsors using Cerebras for fast inference, behavioral analysis, and compatibility scoring