GitHub - lemonade-sdk/lemonade: Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

🍋 Lemonade: Local LLMs with GPU and NPU acceleration

Download | Documentation | Discord

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs.

Apps like n8n, VS Code Copilot, Morphik, and many more use Lemonade to seamlessly run LLMs on any PC.

Getting Started

Install: Windows · Linux · Source
Get Models: Browse and download with the Model Manager
Chat: Try models with the built-in chat interface
Connect: Use Lemonade with your favorite apps:

Want your app featured here? Discord · GitHub Issue · Email

Using the CLI

To run and chat with Gemma 3:

lemonade-server run Gemma-3-4b-it-GGUF

To install models ahead of time, use the pull command:

lemonade-server pull Gemma-3-4b-it-GGUF

To check all models available, use the list command:

lemonade-server list

Tip: You can use --llamacpp vulkan/rocm to select a backend when running GGUF models.

Model Library

Lemonade supports GGUF, FLM, and ONNX models across CPU, GPU, and NPU (see supported configurations).

Use lemonade-server pull or the built-in Model Manager to download models. You can also import custom GGUF/ONNX models from Hugging Face.

Browse all built-in models →

Supported Configurations

Lemonade supports the following configurations, while also making it easy to switch between them at runtime. Find more information about it here.

Hardware	Engine: OGA	Engine: llamacpp	Engine: FLM	Windows	Linux
🧠 CPU	All platforms	All platforms	-	✅	✅
🎮 GPU	—	Vulkan: All platforms ROCm: Selected AMD platforms* Metal: Apple Silicon	—	✅	✅
🤖 NPU	AMD Ryzen™ AI 300 series	—	Ryzen™ AI 300 series	✅	—

* See supported AMD ROCm platforms

Architecture	Platform Support	GPU Models
gfx1151 (STX Halo)	Windows, Ubuntu	Ryzen AI MAX+ Pro 395
gfx120X (RDNA4)	Windows, Ubuntu	Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT
gfx110X (RDNA3)	Windows, Ubuntu	Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT

Project Roadmap

Under Development	Under Consideration	Recently Completed
Image Generation	vLLM support	General speech-to-text support (whisper.cpp)
Text to speech		ROCm support for Ryzen AI 360-375 (Strix) APUs
		Lemonade desktop app

Integrate Lemonade Server with Your Application

You can use any OpenAI-compatible client library by configuring it to use http://localhost:8000/api/v1 as the base URL. A table containing official and popular OpenAI clients on different languages is shown below.

Feel free to pick and choose your preferred language.

Python	C++	Java	C#	Node.js	Go	Ruby	Rust	PHP
openai-python	openai-cpp	openai-java	openai-dotnet	openai-node	go-openai	ruby-openai	async-openai	openai-php

Python Client Example

from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)

For more detailed integration instructions, see the Integration Guide.

Beyond an LLM Server

The Lemonade Python SDK is also available, which includes the following components:

🐍 Lemonade Python API: High-level Python API to directly integrate Lemonade LLMs into Python applications.
🖥️ Lemonade CLI: The lemonade CLI lets you mix-and-match LLMs (ONNX, GGUF, SafeTensors) with prompting templates, accuracy testing, performance benchmarking, and memory profiling to characterize your models on your hardware.

Quick Start with Docker

You may need additional configuration depending on your environment.

Docker Run with Default Configuration

docker run -d \
  --name lemonade-server \
  -p 8000:8000 \
  -v lemonade-cache:/root/.cache/huggingface \
  -v lemonade-llama:/opt/lemonade/llama \
  -e LEMONADE_LLAMACPP_BACKEND=cpu \
  ghcr.io/lemonade-sdk/lemonade-server:latest

Docker Run with a Specific Port and Version

docker run -d \
  --name lemonade-server \
  -p 4000:5000 \
  -v lemonade-cache:/root/.cache/huggingface \
  -v lemonade-llama:/opt/lemonade/llama \
  -e LEMONADE_LLAMACPP_BACKEND=cpu \
  ghcr.io/lemonade-sdk/lemonade-server:v9.1.3 \
  ./lemonade-server serve --no-tray --host 0.0.0.0 --port 5000

This will run the server on port 5000 inside the container, mapped to port 4000 on your host.

Other Docker Methods

Docker Compose Setup

Docker Compose makes it easier to manage multi-container applications.

Make sure you have Docker Compose installed.
Create a docker-compose.yml file like this:

services:
  lemonade:
    image: ghcr.io/lemonade-sdk/lemonade-server:latest
    container_name: lemonade-server
    ports:
      - "8000:8000"
    volumes:
      # Persist downloaded models
      - lemonade-cache:/root/.cache/huggingface
      # # Persist llama binaries
      - lemonade-llama:/opt/lemonade/llama
    environment:
      - LEMONADE_LLAMACPP_BACKEND=cpu
    restart: unless-stopped

volumes:
  lemonade-cache:
  lemonade-llama:

You can add more services as needed.

Run the following command in the directory containing your docker-compose.yml:

docker-compose up -d

This will pull the latest image (or the version you specified) from the Lemonade container registry and start the server with your mapped ports.

Once the container is running, verify it’s working:

curl http://localhost:8000/api/v1/models

You should receive a response listing available models.

Build Your Own Docker Image

If you want to build a custom image, check out the DOCKER_GUIDE for detailed instructions.

FAQ

To read our frequently asked questions, see our FAQ Guide

Contributing

We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.

New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.

Maintainers

This project is sponsored by AMD. It is maintained by @danielholanda @jeremyfowers @ramkrishna @vgodsoe in equal measure. You can reach us by filing an issue, emailing lemonade@amd.com, or joining our Discord.

License and Attribution

This project is:

Built with C++ (server) and Python (SDK) with ❤️ for the open source community,
Standing on the shoulders of great tools from:
- ggml/llama.cpp
- OnnxRuntime GenAI
- Hugging Face Hub
- OpenAI API
- IRON/MLIR-AIE
- and more...
Accelerated by mentorship from the OCV Catalyst program.
Licensed under the Apache 2.0 License.
- Portions of the project are licensed as described in NOTICE.md.

Name		Name	Last commit message	Last commit date
Latest commit History 491 Commits
.devcontainer		.devcontainer
.github		.github
.vscode		.vscode
docs		docs
examples		examples
src		src
test		test
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
.pylintrc		.pylintrc
Dockerfile		Dockerfile
LICENSE		LICENSE
NOTICE.md		NOTICE.md
README.md		README.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🍋 Lemonade: Local LLMs with GPU and NPU acceleration

Download | Documentation | Discord

Getting Started

Using the CLI

Model Library

Supported Configurations

Project Roadmap

Integrate Lemonade Server with Your Application

Python Client Example

Beyond an LLM Server

Quick Start with Docker

Docker Run with Default Configuration

Docker Run with a Specific Port and Version

Other Docker Methods

Docker Compose Setup

Build Your Own Docker Image

FAQ

Contributing

Maintainers

License and Attribution

About

Uh oh!

Releases 39

Packages

Uh oh!

Uh oh!

Contributors 42

Languages

License

lemonade-sdk/lemonade

Folders and files

Latest commit

History

Repository files navigation

🍋 Lemonade: Local LLMs with GPU and NPU acceleration

Download | Documentation | Discord

Getting Started

Using the CLI

Model Library

Supported Configurations

Project Roadmap

Integrate Lemonade Server with Your Application

Python Client Example

Beyond an LLM Server

Quick Start with Docker

Docker Run with Default Configuration

Docker Run with a Specific Port and Version

Other Docker Methods

Docker Compose Setup

Build Your Own Docker Image

FAQ

Contributing

Maintainers

License and Attribution

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 39

Packages 0

Uh oh!

Uh oh!

Contributors 42

Languages

Packages