From 34d8abb69804e687ca9b13a5786c9bcbdf7f3845 Mon Sep 17 00:00:00 2001 From: samkemp Date: Wed, 1 Apr 2026 15:24:05 +0100 Subject: [PATCH 1/3] Update READMEs to position Foundry Local as E2E AI solution - Rewrite root README: SDK quickstart first, CLI section minimized, E2E solution messaging, links to MS Learn docs - Create samples/README.md: top-level overview of all sample languages - Create samples/js/README.md: list all 12 JS samples with run instructions - Create samples/python/README.md: list all 9 Python samples with run instructions - Update samples/cs/README.md: add missing LiveAudioTranscription samples - Update samples/rust/README.md: add 4 tutorial samples, table format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- README.md | 272 ++++++++++++++++++--------------------- samples/README.md | 32 +++++ samples/cs/README.md | 2 + samples/js/README.md | 52 ++++++++ samples/python/README.md | 47 +++++++ samples/rust/README.md | 37 ++++-- 6 files changed, 284 insertions(+), 158 deletions(-) create mode 100644 samples/README.md create mode 100644 samples/js/README.md create mode 100644 samples/python/README.md diff --git a/README.md b/README.md index 07bc9b4d..060bc2ec 100644 --- a/README.md +++ b/README.md @@ -13,24 +13,24 @@ -## Add on-device AI to your app, effortlessly +## Ship on-device AI inside your app +Foundry Local is an **end-to-end local AI solution** for building applications that run entirely on the user's device. It provides native SDKs (C#, JavaScript, Python, and Rust), a curated catalog of optimized models, and automatic hardware acceleration — all in a lightweight package (~20 MB). -Foundry Local lets you embed generative AI directly into your applications — no cloud or server calls required. All inference runs on-device, which means user data never leaves the device, responses start immediately with zero network latency, and your app works offline. No per-token costs, no backend infrastructure to maintain. 
+User data never leaves the device, responses start immediately with zero network latency, and your app works offline. No per-token costs, no API keys, no backend infrastructure to maintain, and no Azure subscription required. -Key benefits include: +### Key Features -- **Self-contained SDK** — Ship AI features without requiring users to install any external dependencies. -- **Chat AND Audio in one runtime** — Text generation and speech-to-text (Whisper) through a single SDK — no need for separate tools like `whisper.cpp` + `llama.cpp`. -- **Easy-to-use CLI** — Explore models and experiment locally before integrating with your app. -- **Optimized models out-of-the-box** — State-of-the-art quantization and compression deliver both performance and quality. -- **Small footprint** — Leverages [ONNX Runtime](https://onnxruntime.ai/); a high performance inference runtime (written in C++) that has minimal disk and memory requirements. -- **Automatic hardware acceleration** — Leverage GPUs and NPUs when available, with seamless fallback to CPU. Zero hardware detection code needed. -- **Model distribution** — Popular open-source models hosted in the cloud with automatic downloading and updating. -- **Multi-platform support** — Windows, macOS (Apple silicon), Linux and Android. -- **Bring your own models** — Add and run custom models alongside the built-in catalog. +- **Native SDKs** — Embed AI directly in your app with C#, JavaScript, Python, and Rust SDKs. No separate server process needed. +- **Chat AND Audio in one runtime** — Text generation and speech-to-text (Whisper) through a single SDK. +- **Curated model catalog** — Production-ready models (Phi, Qwen, DeepSeek, Mistral, Whisper) optimized for on-device use across consumer hardware. +- **Automatic hardware acceleration** — GPU and NPU when available, with seamless CPU fallback. Zero hardware detection code needed. 
+- **Smart model management** — Models download on first use, cache locally, and auto-select the best variant for the user's hardware. +- **OpenAI-compatible API** — Drop-in compatible with OpenAI SDKs for minimal code changes. +- **Small footprint** — Powered by [ONNX Runtime](https://onnxruntime.ai/), a high-performance inference engine with minimal disk and memory requirements. +- **Multi-platform** — Windows, macOS (Apple silicon), and Linux. ### Supported Tasks @@ -39,67 +39,25 @@ Key benefits include: | Chat / Text Generation | `phi-3.5-mini`, `qwen2.5-0.5b`, `qwen2.5-coder-0.5b`, etc. | Chat Completions | | Audio Transcription (Speech-to-Text) | `whisper-tiny` | Audio Transcription | -> [!NOTE] -> Foundry Local is a **unified local AI runtime** — it replaces the need for separate tools like `whisper.cpp`, `llama.cpp`, or `ollama`. One SDK handles both chat and audio, with automatic hardware acceleration (NPU > GPU > CPU). - ## 🚀 Quickstart -### Explore with the CLI - -The Foundry Local CLI is a great way to explore models and test features before integrating with your app. - -1. Install the CLI to explore models interactively before integrating with your app. - - **Windows:** - ```bash - winget install Microsoft.FoundryLocal - ``` - - **macOS:** - ```bash - brew install microsoft/foundrylocal/foundrylocal - ``` - -2. Start a chat session with a model: - - ```bash - foundry model run qwen2.5-0.5b - ``` - -3. Explore available models - - ```bash - foundry model ls - ``` - -> [!TIP] -> For installation issues, see the [Installation section](#installing) below. - -### Add on-device AI to your app - -The Foundry Local SDK makes it easy to integrate local AI models into your applications. Below are quickstart examples for JavaScript, C# and Python. - -> [!TIP] -> For the JavaScript and C# SDKs you do **not** require the CLI to be installed. The Python SDK has a dependency on the CLI but a native in-process SDK is coming soon. 
+The fastest way to get started is with the SDK. Pick your language:
JavaScript -1. Install the SDK using npm: +1. Install the SDK: ```bash npm install foundry-local-sdk ``` - > [!NOTE] - > On Windows, NPU models are not currently available for the JavaScript SDK. These will be enabled in a subsequent release. - -2. Use the SDK in your application as follows: +2. Run your first chat completion: ```javascript import { FoundryLocalManager } from 'foundry-local-sdk'; - const manager = FoundryLocalManager.create({ appName: 'foundry_local_samples' }); + const manager = FoundryLocalManager.create({ appName: 'my-app' }); // Download and load a model (auto-selects best variant for user's hardware) const model = await manager.catalog.getModel('qwen2.5-0.5b'); @@ -120,27 +78,30 @@ The Foundry Local SDK makes it easy to integrate local AI models into your appli await model.unload(); ``` +> [!NOTE] +> On Windows, NPU models are not currently available for the JavaScript SDK. These will be enabled in a subsequent release. +
C# -1. Install the SDK using NuGet: +1. Install the SDK: ```bash - # Windows + # Windows (recommended for hardware acceleration) dotnet add package Microsoft.AI.Foundry.Local.WinML # macOS/Linux dotnet add package Microsoft.AI.Foundry.Local ``` - On Windows, we recommend using the `Microsoft.AI.Foundry.Local.WinML` package, which will enable wider hardware acceleration support. -2. Use the SDK in your application as follows: +2. Run your first chat completion: + ```csharp using Microsoft.AI.Foundry.Local; - var config = new Configuration { AppName = "foundry_local_samples" }; + var config = new Configuration { AppName = "my-app" }; await FoundryLocalManager.CreateAsync(config); var mgr = FoundryLocalManager.Instance; @@ -152,9 +113,9 @@ The Foundry Local SDK makes it easy to integrate local AI models into your appli // Create a chat client and get a streaming completion var chatClient = await model.GetChatClientAsync(); - var messages = new List - { - new() { Role = "user", Content = "What is the golden ratio?" } + var messages = new List + { + new() { Role = "user", Content = "What is the golden ratio?" } }; await foreach (var chunk in chatClient.CompleteChatStreamingAsync(messages)) @@ -171,15 +132,15 @@ The Foundry Local SDK makes it easy to integrate local AI models into your appli
Python -**NOTE:** The Python SDK currently relies on the Foundry Local CLI and uses the OpenAI-compatible REST API. A native in-process SDK (matching JS/C#) is coming soon. +> **Note:** The Python SDK currently uses the Foundry Local CLI and the OpenAI-compatible REST API. A native in-process SDK (matching JS/C#) is coming soon. -1. Install the SDK using pip: +1. Install the SDK: ```bash pip install foundry-local-sdk openai ``` -2. Use the SDK in your application as follows: +2. Run your first chat completion: ```python import openai @@ -201,33 +162,26 @@ The Foundry Local SDK makes it easy to integrate local AI models into your appli
-### More samples - -Explore complete working examples in the [`samples/`](samples/) folder: - -| Sample | Description | -|--------|-------------| -| [**cs/**](samples/cs/) | C# examples using the .NET SDK (includes audio transcription) | -| [**js/**](samples/js/) | JavaScript/Node.js examples (chat, audio transcription, tool calling) | -| [**python/**](samples/python/) | Python examples using the OpenAI-compatible API | +> [!TIP] +> For the JavaScript and C# SDKs, you do **not** need the CLI installed. The Python SDK currently requires the CLI — a native in-process SDK is coming soon. -#### Audio Transcription (Speech-to-Text) +### Audio Transcription (Speech-to-Text) -The SDK also supports audio transcription via Whisper models. Use `model.createAudioClient()` to transcribe audio files on-device: +The SDK also supports audio transcription via Whisper models: ```javascript import { FoundryLocalManager } from 'foundry-local-sdk'; -const manager = FoundryLocalManager.create({ appName: 'MyApp' }); +const manager = FoundryLocalManager.create({ appName: 'my-app' }); -// Download and load the Whisper model const whisperModel = await manager.catalog.getModel('whisper-tiny'); await whisperModel.download(); await whisperModel.load(); -// Transcribe an audio file const audioClient = whisperModel.createAudioClient(); audioClient.settings.language = 'en'; + +// Transcribe an audio file const result = await audioClient.transcribe('recording.wav'); console.log('Transcription:', result.text); @@ -240,115 +194,135 @@ await whisperModel.unload(); ``` > [!TIP] -> A single `FoundryLocalManager` can manage both chat and audio models simultaneously. See the [chat-and-audio sample](samples/js/chat-and-audio-foundry-local/) for a complete example that transcribes audio then analyzes it with a chat model. +> A single `FoundryLocalManager` can manage both chat and audio models simultaneously. See the [chat-and-audio sample](samples/js/chat-and-audio-foundry-local/) for a complete example. 
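The transcribe-then-analyze flow described above can be sketched as plain orchestration code. The two model calls are injected as async functions so the pipeline is visible on its own; in a real app you would pass wrappers around the audio client's `transcribe` call and your chat client. The function name and prompt wording here are illustrative, not part of the SDK:

```javascript
// Pipeline: the speech-to-text output becomes the chat model's input.
// `transcribe` and `chat` are caller-supplied async functions, so the flow
// can be exercised without loading any real models.
async function transcribeAndSummarize(audioPath, transcribe, chat) {
  const { text } = await transcribe(audioPath);
  const summary = await chat([
    { role: 'system', content: 'Summarize the transcript in one sentence.' },
    { role: 'user', content: text },
  ]);
  return { transcript: text, summary };
}

// Demo with stub clients standing in for the Whisper and chat models:
transcribeAndSummarize(
  'recording.wav',
  async () => ({ text: 'The quarterly numbers look strong.' }),
  async (messages) => `Summary: ${messages[1].content}`
).then((result) => console.log(result.summary));
// → Summary: The quarterly numbers look strong.
```

Because both models are owned by one manager, the same pattern works with the real clients substituted for the stubs.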
+ +## 📦 Samples + +Explore complete working examples in the [`samples/`](samples/) folder: + +| Language | Samples | Highlights | +|----------|---------|------------| +| [**C#**](samples/cs/) | 12 | Native chat, audio transcription, tool calling, model management, web server, tutorials | +| [**JavaScript**](samples/js/) | 12 | Native chat, audio, Electron app, Copilot SDK, LangChain, tool calling, tutorials | +| [**Python**](samples/python/) | 9 | Chat completions, audio transcription, LangChain, tool calling, tutorials | +| [**Rust**](samples/rust/) | 8 | Native chat, audio transcription, tool calling, web server, tutorials | -## Manage +## 🖥️ CLI -This section provides an overview of how to manage Foundry Local, including installation, upgrading, and removing the application. +The Foundry Local CLI lets you explore models and experiment interactively. -### Installing +**Install:** -Foundry Local is available for Windows and macOS (Apple silicon only). You can install it using package managers or manually download the installer. +```bash +# Windows +winget install Microsoft.FoundryLocal + +# macOS +brew install microsoft/foundrylocal/foundrylocal +``` + +**Run a model:** + +```bash +foundry model run qwen2.5-0.5b +``` + +**List available models:** -#### Windows +```bash +foundry model ls +``` + +> For the full CLI reference and advanced usage, see the [CLI documentation on Microsoft Learn](https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-cli). + +## 📥 Installing + +Foundry Local is available for Windows, macOS (Apple silicon), and Linux. -You can install Foundry Local using the following command in a Windows console (PowerShell, cmd, etc.): +### Windows ```bash winget install Microsoft.FoundryLocal ``` -Alternatively, you can also manually download and install the packages. On [the releases page](https://github.com/microsoft/Foundry-Local/releases) -select a release and expand the Artifacts list. 
Copy the artifact full URI (for example: `https://github.com/microsoft/Foundry-Local/releases/download/v0.3.9267/FoundryLocal-x64-0.3.9267.43123.msix`) -to use in the below PowerShell steps. Replace `x64` with `arm64` as needed. +
+Manual installation + +On [the releases page](https://github.com/microsoft/Foundry-Local/releases), select a release and expand the Artifacts list. Copy the artifact URI and use the following PowerShell steps: ```powershell -# Download the package and its dependency $releaseUri = "https://github.com/microsoft/Foundry-Local/releases/download/v0.3.9267/FoundryLocal-x64-0.3.9267.43123.msix" Invoke-WebRequest -Method Get -Uri $releaseUri -OutFile .\FoundryLocal.msix $crtUri = "https://aka.ms/Microsoft.VCLibs.x64.14.00.Desktop.appx" Invoke-WebRequest -Method Get -Uri $crtUri -OutFile .\VcLibs.appx -# Install the Foundry Local package Add-AppxPackage .\FoundryLocal.msix -DependencyPath .\VcLibs.appx ``` -If you're having problems installing Foundry, please [file an issue](https://github.com/microsoft/foundry-local/issues) -and include logs using one of these methods: - -- For WinGet - use `winget install Microsoft.FoundryLocal --logs --verbose` - select the most-recently-dated log file - and attach it to the issue. -- For `Add-AppxPackage` - immediately after it indicates an error, in an elevated PowerShell instance, use - `Get-MsixLogs | Out-File MsixLogs.txt` and attach it to the issue. -- Use [Windows Feedback Hub](feedback-hub:) and create a Problem in the "Apps > All other apps" category. Use the - "Add More Details > Recreate my problem" and re-run the failing commands to collect more data. Once your feedback - is submitted, use the "Share" option to generate a link and put that into the filed issue. - -> [!NOTE] -> Log files may contain information like user names, IP addresses, file paths, etc. Be sure to remove those -> before sharing here. +Replace `x64` with `arm64` as needed. -#### macOS +
-Install Foundry Local using the following command in your terminal: +### macOS ```bash brew install microsoft/foundrylocal/foundrylocal ``` -Alternatively, you can also manually download and install the packages by following these steps: +
+Manual installation 1. Download the latest release from [the releases page](https://github.com/microsoft/Foundry-Local/releases). -1. Unzip the downloaded file. -1. Open a terminal and navigate to the unzipped folder, run the following command to install Foundry Local: +2. Unzip the downloaded file. +3. Run the installer: ```bash ./install-foundry.command ``` +
+ ### Upgrading -To upgrade Foundry Local, run the following command in your terminal: +```bash +# Windows +winget upgrade --id Microsoft.FoundryLocal + +# macOS (Homebrew) +brew upgrade foundrylocal +``` -- **Windows** +### Uninstalling - ```bash - winget upgrade --id Microsoft.FoundryLocal - ``` +
+Uninstall instructions -- **macOS**: - If you installed Foundry Local using Homebrew, you can upgrade it with the following command: - ``` - brew upgrade foundrylocal - ``` - If you installed Foundry Local manually, you'll first need to uninstall the current version using: - ```bash - uninstall-foundry - ``` - Then, follow the [installation instructions](#installing) to install the latest version. +**Windows:** -### Uninstalling +```bash +winget uninstall Microsoft.FoundryLocal +``` + +Or navigate to **Settings > Apps > Apps & features**, find "Foundry Local", and select **Uninstall**. -To uninstall Foundry Local, run the following command in your terminal: +**macOS (Homebrew):** + +```bash +brew rm foundrylocal +brew untap microsoft/foundrylocal +brew cleanup --scrub +``` -- **Windows**: You can uninstall Foundry Local using `winget` in a Windows console (PowerShell, cmd, etc.): +**macOS (manual install):** - ```bash - winget uninstall Microsoft.FoundryLocal - ``` +```bash +uninstall-foundry +``` - Alternatively, you can also uninstall Foundry Local by navigating to **Settings > Apps > Apps & features** in Windows, finding "Foundry Local" in the list, and selecting the ellipsis (`...`) followed by **Uninstall**. +
-- **macOS**: If you installed Foundry Local using Homebrew, you can uninstall it with the following command: - ```bash - brew rm foundrylocal - brew untap microsoft/foundrylocal - brew cleanup --scrub - ``` - If you installed Foundry Local manually, you can uninstall it by running the following command in your terminal: - ```bash - uninstall-foundry - ``` +> [!TIP] +> For installation troubleshooting, see the [troubleshooting guide](https://learn.microsoft.com/azure/ai-foundry/foundry-local/reference/reference-best-practice?view=foundry-classic) or [file an issue](https://github.com/microsoft/foundry-local/issues). ## Reporting Issues @@ -356,9 +330,11 @@ We're actively looking for feedback during this preview phase. Please report iss ## 🎓 Learn More -- [Foundry Local Documentation on Microsoft Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/?view=foundry-classic) +- [Foundry Local Documentation](https://learn.microsoft.com/en-us/azure/foundry-local/) on Microsoft Learn +- [What is Foundry Local?](https://learn.microsoft.com/en-us/azure/foundry-local/what-is-foundry-local) — Architecture and concepts +- [Tutorials](https://learn.microsoft.com/en-us/azure/foundry-local/) — Chat assistant, document summarizer, tool calling, voice-to-text - [Troubleshooting guide](https://learn.microsoft.com/azure/ai-foundry/foundry-local/reference/reference-best-practice?view=foundry-classic) -- [Foundry Local Lab](https://github.com/Microsoft-foundry/foundry-local-lab): This GitHub repository contains a lab designed to help you learn how to use Foundry Local effectively. It includes hands-on exercises, sample code, and step-by-step instructions to guide you through the process of setting up and using Foundry Local in various scenarios. 
+- [Foundry Local Lab](https://github.com/Microsoft-foundry/foundry-local-lab) — Hands-on exercises and step-by-step instructions ## ⚖️ License diff --git a/samples/README.md b/samples/README.md new file mode 100644 index 00000000..0dbd1329 --- /dev/null +++ b/samples/README.md @@ -0,0 +1,32 @@ +# Foundry Local Samples + +Explore complete working examples that demonstrate how to use Foundry Local — an end-to-end local AI solution that runs entirely on-device. These samples cover chat completions, audio transcription, tool calling, LangChain integration, and more. + +> **New to Foundry Local?** Check out the [main README](../README.md) for an overview and quickstart, or visit the [Foundry Local documentation](https://learn.microsoft.com/en-us/azure/foundry-local/) on Microsoft Learn. + +## Samples by Language + +| Language | Samples | Description | +|----------|---------|-------------| +| [**C#**](cs/) | 12 | .NET SDK samples including native chat, audio transcription, tool calling, model management, web server, and tutorials. Uses WinML on Windows for hardware acceleration. | +| [**JavaScript**](js/) | 12 | Node.js SDK samples including native chat, audio transcription, Electron desktop app, Copilot SDK integration, LangChain, tool calling, web server, and tutorials. | +| [**Python**](python/) | 9 | Python samples using the OpenAI-compatible API, including chat, audio transcription, LangChain integration, tool calling, web server, and tutorials. | +| [**Rust**](rust/) | 8 | Rust SDK samples including native chat, audio transcription, tool calling, web server, and tutorials. | + +## Common Patterns + +Most samples follow a similar pattern: + +1. **Initialize** the Foundry Local manager +2. **Download** a model (auto-selects the best variant for your hardware) +3. **Load** the model into memory +4. **Run inference** (chat completions or audio transcription) +5. 
**Unload** the model when done + +## Models Used + +| Model | Task | Used In | +|-------|------|---------| +| `qwen2.5-0.5b` | Chat / Text Generation | Most chat and tool-calling samples | +| `phi-3.5-mini` | Chat / Text Generation | Some Python samples | +| `whisper-tiny` | Audio Transcription | All audio/voice samples | diff --git a/samples/cs/README.md b/samples/cs/README.md index 1847bb8e..46f3c0b6 100644 --- a/samples/cs/README.md +++ b/samples/cs/README.md @@ -21,6 +21,8 @@ Both packages provide the same APIs, so the same source code works on all platfo | [tutorial-document-summarizer](tutorial-document-summarizer/) | Summarize documents with AI (tutorial). | | [tutorial-tool-calling](tutorial-tool-calling/) | Create a tool-calling assistant (tutorial). | | [tutorial-voice-to-text](tutorial-voice-to-text/) | Transcribe and summarize audio (tutorial). | +| [live-audio-transcription-example](live-audio-transcription-example/) | Real-time microphone-to-text transcription using NAudio (Windows). | +| [LiveAudioTranscription](LiveAudioTranscription/) | WinUI 3 desktop app for live audio transcription with a graphical interface. | ## Running a sample diff --git a/samples/js/README.md b/samples/js/README.md new file mode 100644 index 00000000..588fb124 --- /dev/null +++ b/samples/js/README.md @@ -0,0 +1,52 @@ +# Foundry Local JavaScript Samples + +These samples demonstrate how to use the Foundry Local JavaScript SDK (`foundry-local-sdk`) with Node.js. + +## Prerequisites + +- [Node.js](https://nodejs.org/) (v18 or later recommended) + +> [!NOTE] +> On Windows, NPU models are not currently available for the JavaScript SDK. These will be enabled in a subsequent release. + +## Samples + +| Sample | Description | +|--------|-------------| +| [native-chat-completions](native-chat-completions/) | Initialize the SDK, download a model, and run non-streaming and streaming chat completions. 
| +| [audio-transcription-example](audio-transcription-example/) | Transcribe audio files using the Whisper model with streaming output. | +| [chat-and-audio-foundry-local](chat-and-audio-foundry-local/) | Unified sample demonstrating both chat and audio transcription in one application. | +| [electron-chat-application](electron-chat-application/) | Full-featured Electron desktop chat app with voice transcription and model management. | +| [copilot-sdk-foundry-local](copilot-sdk-foundry-local/) | GitHub Copilot SDK integration with Foundry Local for agentic AI workflows. | +| [langchain-integration-example](langchain-integration-example/) | LangChain.js integration for building text generation chains. | +| [tool-calling-foundry-local](tool-calling-foundry-local/) | Tool calling with custom function definitions and streaming responses. | +| [web-server-example](web-server-example/) | Start a local OpenAI-compatible web server and call it with the OpenAI SDK. | +| [tutorial-chat-assistant](tutorial-chat-assistant/) | Build an interactive multi-turn chat assistant (tutorial). | +| [tutorial-document-summarizer](tutorial-document-summarizer/) | Summarize documents with AI (tutorial). | +| [tutorial-tool-calling](tutorial-tool-calling/) | Create a tool-calling assistant (tutorial). | +| [tutorial-voice-to-text](tutorial-voice-to-text/) | Transcribe and summarize audio (tutorial). | + +## Running a Sample + +1. Clone the repository: + + ```bash + git clone https://github.com/microsoft/Foundry-Local.git + cd Foundry-Local/samples/js + ``` + +2. Navigate to a sample and install dependencies: + + ```bash + cd native-chat-completions + npm install foundry-local-sdk + ``` + +3. Run the sample: + + ```bash + node app.js + ``` + +> [!TIP] +> Some samples have additional dependencies (e.g., `openai`, `@langchain/openai`). Check the sample's `package.json` or inline install instructions for the full dependency list. 
diff --git a/samples/python/README.md b/samples/python/README.md new file mode 100644 index 00000000..fc690898 --- /dev/null +++ b/samples/python/README.md @@ -0,0 +1,47 @@ +# Foundry Local Python Samples + +These samples demonstrate how to use Foundry Local with Python. The Python SDK currently uses the Foundry Local CLI and the OpenAI-compatible REST API. A native in-process SDK (matching JS/C#) is coming soon. + +## Prerequisites + +- [Python](https://www.python.org/) 3.11 or later +- [Foundry Local CLI](../../README.md#installing) installed + +## Samples + +| Sample | Description | +|--------|-------------| +| [native-chat-completions](native-chat-completions/) | Initialize the SDK, start the local service, and run streaming chat completions. | +| [audio-transcription](audio-transcription/) | Transcribe audio files using the Whisper model. | +| [web-server](web-server/) | Start a local OpenAI-compatible web server and call it with the OpenAI Python SDK. | +| [tool-calling](tool-calling/) | Tool calling with custom function definitions (get_weather, calculate). | +| [langchain-integration](langchain-integration/) | LangChain integration for building translation and text generation chains. | +| [tutorial-chat-assistant](tutorial-chat-assistant/) | Build an interactive multi-turn chat assistant (tutorial). | +| [tutorial-document-summarizer](tutorial-document-summarizer/) | Summarize documents with AI (tutorial). | +| [tutorial-tool-calling](tutorial-tool-calling/) | Create a tool-calling assistant (tutorial). | +| [tutorial-voice-to-text](tutorial-voice-to-text/) | Transcribe and summarize audio (tutorial). | + +## Running a Sample + +1. Clone the repository: + + ```bash + git clone https://github.com/microsoft/Foundry-Local.git + cd Foundry-Local/samples/python + ``` + +2. Navigate to a sample and install dependencies: + + ```bash + cd native-chat-completions + pip install foundry-local-sdk + ``` + +3. 
Run the sample: + + ```bash + python src/app.py + ``` + +> [!TIP] +> Some samples require additional packages (e.g., `openai`, `langchain-openai`). Check for a `requirements.txt` or the import statements at the top of the source file. diff --git a/samples/rust/README.md b/samples/rust/README.md index c5399b3d..9ef4e226 100644 --- a/samples/rust/README.md +++ b/samples/rust/README.md @@ -1,25 +1,42 @@ # Foundry Local Rust Samples -This directory contains samples demonstrating how to use the Foundry Local Rust SDK. +These samples demonstrate how to use the Foundry Local Rust SDK. ## Prerequisites -- Rust 1.70.0 or later +- [Rust](https://www.rust-lang.org/) 1.70.0 or later ## Samples -### [Foundry Local Web Server](./foundry-local-webserver) +| Sample | Description | +|--------|-------------| +| [native-chat-completions](native-chat-completions/) | Non-streaming and streaming chat completions using the native chat client. | +| [audio-transcription-example](audio-transcription-example/) | Audio transcription (non-streaming and streaming) using the Whisper model. | +| [foundry-local-webserver](foundry-local-webserver/) | Start a local OpenAI-compatible web server and call it with a standard HTTP client. | +| [tool-calling-foundry-local](tool-calling-foundry-local/) | Tool calling with streaming responses, multi-turn conversation, and local tool execution. | +| [tutorial-chat-assistant](tutorial-chat-assistant/) | Build an interactive multi-turn chat assistant (tutorial). | +| [tutorial-document-summarizer](tutorial-document-summarizer/) | Summarize documents with AI (tutorial). | +| [tutorial-tool-calling](tutorial-tool-calling/) | Create a tool-calling assistant (tutorial). | +| [tutorial-voice-to-text](tutorial-voice-to-text/) | Transcribe and summarize audio (tutorial). | -Demonstrates how to start a local OpenAI-compatible web server using the SDK, then call it with a standard HTTP client. 
+## Running a Sample -### [Native Chat Completions](./native-chat-completions) +1. Clone the repository: -Shows both non-streaming and streaming chat completions using the SDK's native chat client. + ```bash + git clone https://github.com/microsoft/Foundry-Local.git + cd Foundry-Local/samples/rust + ``` -### [Tool Calling with Foundry Local](./tool-calling-foundry-local) +2. Run a sample: -Demonstrates tool calling with streaming responses, multi-turn conversation, and local tool execution. + ```bash + cargo run -p native-chat-completions + ``` -### [Audio Transcription](./audio-transcription-example) + Or navigate to a sample directory and run directly: -Demonstrates audio transcription (non-streaming and streaming) using the `whisper` model. \ No newline at end of file + ```bash + cd native-chat-completions + cargo run + ``` \ No newline at end of file From 56b0257b75e4c7180082ed01bcadaa379ab3a098 Mon Sep 17 00:00:00 2001 From: samkemp Date: Wed, 1 Apr 2026 18:07:43 +0100 Subject: [PATCH 2/3] readme init --- README.md | 233 +++++++++++---------------------------- samples/README.md | 20 +--- samples/cs/README.md | 5 - samples/js/README.md | 14 ++- samples/python/README.md | 15 ++- samples/rust/README.md | 12 +- 6 files changed, 96 insertions(+), 203 deletions(-) diff --git a/README.md b/README.md index 060bc2ec..e6b68d57 100644 --- a/README.md +++ b/README.md @@ -23,25 +23,24 @@ User data never leaves the device, responses start immediately with zero network ### Key Features -- **Native SDKs** — Embed AI directly in your app with C#, JavaScript, Python, and Rust SDKs. No separate server process needed. -- **Chat AND Audio in one runtime** — Text generation and speech-to-text (Whisper) through a single SDK. -- **Curated model catalog** — Production-ready models (Phi, Qwen, DeepSeek, Mistral, Whisper) optimized for on-device use across consumer hardware. -- **Automatic hardware acceleration** — GPU and NPU when available, with seamless CPU fallback. 
Zero hardware detection code needed. -- **Smart model management** — Models download on first use, cache locally, and auto-select the best variant for the user's hardware. -- **OpenAI-compatible API** — Drop-in compatible with OpenAI SDKs for minimal code changes. -- **Small footprint** — Powered by [ONNX Runtime](https://onnxruntime.ai/), a high-performance inference engine with minimal disk and memory requirements. -- **Multi-platform** — Windows, macOS (Apple silicon), and Linux. - -### Supported Tasks - -| Task | Model Aliases | API | -|------|--------------|-----| -| Chat / Text Generation | `phi-3.5-mini`, `qwen2.5-0.5b`, `qwen2.5-coder-0.5b`, etc. | Chat Completions | -| Audio Transcription (Speech-to-Text) | `whisper-tiny` | Audio Transcription | +- **Lightweight runtime** — The runtime handles model acquisition, hardware acceleration, model management, and inference (via [ONNX Runtime](https://onnxruntime.ai/)). The runtime adds approximately 20 MB to your application package, making it practical to embed AI directly into applications where size matters. + +- **Curated model catalog** — A catalog of high-quality models optimized for on-device use across a wide range of consumer hardware. The catalog covers chat completions (for example, GPT OSS, Qwen, DeepSeek, Mistral and Phi) and audio transcription (for example, Whisper). Every model goes through extensive quantization and compression to deliver the best balance of quality and performance. Models are versioned, so your application can pin to a specific version or automatically receive updates. + +- **Automatic hardware acceleration** — Foundry Local detects the available hardware on the user's device and selects the best execution provider. It accelerates inference on GPUs and NPUs when available and falls back to CPU seamlessly — no hardware detection code required. Execution provider and driver updates are managed automatically to ensure optimal performance across different hardware configurations. 
+
+- **Smart model management** — Foundry Local handles the full lifecycle of models on end-user devices. Models download automatically on first use, are cached locally for instant subsequent launches, and the best-performing variant is selected for the user's specific hardware.
+
+- **OpenAI-compatible API** — Supports OpenAI request and response formats including the [OpenAI Responses API format](https://developers.openai.com/api/reference/resources/responses). If your application already uses the OpenAI SDK, point it to a Foundry Local endpoint with minimal code changes.
+
+- **Optional local server** — An OpenAI-compatible web server for serving models to multiple processes, integrating with tools like LangChain, or experimenting through REST calls. For most embedded application scenarios, use the SDK directly — it runs inference in-process without the overhead of a separate server.
+

 ## 🚀 Quickstart

-The fastest way to get started is with the SDK. Pick your language:
+> [!TIP]
+> The following shows a quickstart for JavaScript and Python. C# and Rust language bindings are also available. Take a look at the [samples](/samples/) for more details.
+
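Because the runtime accepts the OpenAI wire format, an existing OpenAI client needs no request-shaping changes. A minimal sketch of the request body involved, assuming a hypothetical local endpoint (the SDK or CLI reports the real one at runtime) and the `qwen2.5-0.5b` alias used in the quickstarts below:

```python
import json

# Hypothetical local endpoint for illustration only; the SDK/CLI
# reports the actual host and port at runtime.
base_url = "http://localhost:5273/v1"
request_url = f"{base_url}/chat/completions"

# A standard OpenAI chat-completions request body. Any OpenAI-compatible
# client library already produces this shape, which is why pointing an
# existing app at a Foundry Local endpoint needs minimal code changes.
payload = {
    "model": "qwen2.5-0.5b",  # catalog alias from the quickstarts
    "messages": [
        {"role": "user", "content": "What is the golden ratio?"}
    ],
    "stream": False,
}

print(request_url)
print(json.dumps(payload, indent=2))
```

Posting this body to the optional local server with any HTTP client, or pointing an OpenAI SDK's `base_url` at the local endpoint, exercises the same path as the cloud API.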
JavaScript @@ -49,6 +48,10 @@ The fastest way to get started is with the SDK. Pick your language: 1. Install the SDK: ```bash + # Windows (recommended for hardware acceleration) + npm install foundry-local-sdk-winml + + # macOS/linux npm install foundry-local-sdk ``` @@ -78,96 +81,55 @@ The fastest way to get started is with the SDK. Pick your language: await model.unload(); ``` -> [!NOTE] -> On Windows, NPU models are not currently available for the JavaScript SDK. These will be enabled in a subsequent release. -
-
-C# + +
+Python 1. Install the SDK: ```bash # Windows (recommended for hardware acceleration) - dotnet add package Microsoft.AI.Foundry.Local.WinML + pip install foundry-local-sdk-winml # macOS/Linux - dotnet add package Microsoft.AI.Foundry.Local - ``` - -2. Run your first chat completion: - - ```csharp - using Microsoft.AI.Foundry.Local; - - var config = new Configuration { AppName = "my-app" }; - await FoundryLocalManager.CreateAsync(config); - var mgr = FoundryLocalManager.Instance; - - // Download and load a model (auto-selects best variant for user's hardware) - var catalog = await mgr.GetCatalogAsync(); - var model = await catalog.GetModelAsync("qwen2.5-0.5b"); - await model.DownloadAsync(); - await model.LoadAsync(); - - // Create a chat client and get a streaming completion - var chatClient = await model.GetChatClientAsync(); - var messages = new List - { - new() { Role = "user", Content = "What is the golden ratio?" } - }; - - await foreach (var chunk in chatClient.CompleteChatStreamingAsync(messages)) - { - Console.Write(chunk.Choices[0].Message.Content); - } - - // Unload the model when done - await model.Unload(); - ``` - -
- -
-Python - -> **Note:** The Python SDK currently uses the Foundry Local CLI and the OpenAI-compatible REST API. A native in-process SDK (matching JS/C#) is coming soon. - -1. Install the SDK: - - ```bash - pip install foundry-local-sdk openai + pip install foundry-local-sdk ``` 2. Run your first chat completion: ```python - import openai - from foundry_local import FoundryLocalManager - - # Initialize manager (starts local service and loads model) - manager = FoundryLocalManager("phi-3.5-mini") - - # Use the OpenAI SDK pointed at your local endpoint - client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key) - - response = client.chat.completions.create( - model=manager.get_model_info("phi-3.5-mini").id, - messages=[{"role": "user", "content": "What is the golden ratio?"}] - ) - - print(response.choices[0].message.content) + from foundry_local_sdk import Configuration, FoundryLocalManager + + config = Configuration(app_name="foundry_local_samples") + FoundryLocalManager.initialize(config) + manager = FoundryLocalManager.instance + + # Select and load a model from the catalog + model = manager.catalog.get_model("qwen2.5-0.5b") + model.download() + model.load() + + # Get a chat client + client = model.get_chat_client() + + # Create and send message + messages = [ + {"role": "user", "content": "What is the golden ratio?"} + ] + response = client.complete_chat(messages): + print(f"Response: {response.choices[0].message.content}") + + model.unload() ```
-> [!TIP] -> For the JavaScript and C# SDKs, you do **not** need the CLI installed. The Python SDK currently requires the CLI — a native in-process SDK is coming soon. -### Audio Transcription (Speech-to-Text) +### 💬 Audio Transcription (Speech-to-Text) -The SDK also supports audio transcription via Whisper models: +The SDK also supports audio transcription via Whisper models (available in JavaScript, C#, Python and Rust): ```javascript import { FoundryLocalManager } from 'foundry-local-sdk'; @@ -235,106 +197,39 @@ foundry model ls > For the full CLI reference and advanced usage, see the [CLI documentation on Microsoft Learn](https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-cli). -## 📥 Installing - -Foundry Local is available for Windows, macOS (Apple silicon), and Linux. - -### Windows -```bash -winget install Microsoft.FoundryLocal -``` - -
-Manual installation - -On [the releases page](https://github.com/microsoft/Foundry-Local/releases), select a release and expand the Artifacts list. Copy the artifact URI and use the following PowerShell steps: - -```powershell -$releaseUri = "https://github.com/microsoft/Foundry-Local/releases/download/v0.3.9267/FoundryLocal-x64-0.3.9267.43123.msix" -Invoke-WebRequest -Method Get -Uri $releaseUri -OutFile .\FoundryLocal.msix -$crtUri = "https://aka.ms/Microsoft.VCLibs.x64.14.00.Desktop.appx" -Invoke-WebRequest -Method Get -Uri $crtUri -OutFile .\VcLibs.appx - -Add-AppxPackage .\FoundryLocal.msix -DependencyPath .\VcLibs.appx -``` - -Replace `x64` with `arm64` as needed. - -
- -### macOS - -```bash -brew install microsoft/foundrylocal/foundrylocal -``` - -
-Manual installation - -1. Download the latest release from [the releases page](https://github.com/microsoft/Foundry-Local/releases). -2. Unzip the downloaded file. -3. Run the installer: - - ```bash - ./install-foundry.command - ``` - -
- -### Upgrading - -```bash -# Windows -winget upgrade --id Microsoft.FoundryLocal +## Reporting Issues -# macOS (Homebrew) -brew upgrade foundrylocal -``` +Please report issues or suggest improvements in the [GitHub Issues](https://github.com/microsoft/Foundry-Local/issues) section. -### Uninstalling +## 🎓 Learn More -
-Uninstall instructions +- [Foundry Local Documentation](https://learn.microsoft.com/en-us/azure/foundry-local/) on Microsoft Learn +- [Foundry Local Lab](https://github.com/Microsoft-foundry/foundry-local-lab) — Hands-on exercises and step-by-step instructions -**Windows:** +## ❔ Frequently asked questions -```bash -winget uninstall Microsoft.FoundryLocal -``` +### Is Foundry Local a web server and CLI tool? -Or navigate to **Settings > Apps > Apps & features**, find "Foundry Local", and select **Uninstall**. +No. Foundry Local is an **end-to-end local AI solution** that your application ships with. It handles model acquisition, hardware acceleration, and inference inside your app process through the SDK. The optional web server and CLI are available for development workflows, but the core product is the local AI runtime and SDK that you integrate directly into your application. -**macOS (Homebrew):** +### Why doesn't Foundry Local support every available model? -```bash -brew rm foundrylocal -brew untap microsoft/foundrylocal -brew cleanup --scrub -``` +Foundry Local is designed for shipping production applications, not for general-purpose model experimentation. The model catalog is intentionally curated to include models that are optimized for specific application scenarios, tested across a range of consumer hardware, and small enough to distribute to end users. This approach ensures that every model in the catalog delivers reliable performance when embedded in your application — rather than offering a broad selection of models with unpredictable on-device behavior. -**macOS (manual install):** +### Can Foundry Local run on a server? -```bash -uninstall-foundry -``` +Foundry Local is optimized for hardware-constrained devices where a single user accesses the model at a time. While you can technically install and run it on server hardware, it isn't designed as a server inference stack. -
+Server-oriented runtimes like [vLLM](https://docs.vllm.ai/en/latest/) or [Triton Inference Server](https://github.com/triton-inference-server/server) are built for multi-user scenarios — they handle concurrent request queuing, continuous batching, and efficient GPU sharing across many simultaneous clients. Foundry Local doesn't provide these capabilities. Instead, it focuses on lightweight, single-user inference with automatic hardware detection, KV-cache management, and model lifecycle handling that make sense for client applications. -> [!TIP] -> For installation troubleshooting, see the [troubleshooting guide](https://learn.microsoft.com/azure/ai-foundry/foundry-local/reference/reference-best-practice?view=foundry-classic) or [file an issue](https://github.com/microsoft/foundry-local/issues). +If you need to serve models to multiple concurrent users, use a dedicated server inference framework. Use Foundry Local when the model runs on the end user's own device. -## Reporting Issues -We're actively looking for feedback during this preview phase. Please report issues or suggest improvements in the [GitHub Issues](https://github.com/microsoft/Foundry-Local/issues) section. +### What platforms are supported? -## 🎓 Learn More +Foundry Local supports Windows, macOS (Apple silicon), and Linux. 
-- [Foundry Local Documentation](https://learn.microsoft.com/en-us/azure/foundry-local/) on Microsoft Learn -- [What is Foundry Local?](https://learn.microsoft.com/en-us/azure/foundry-local/what-is-foundry-local) — Architecture and concepts -- [Tutorials](https://learn.microsoft.com/en-us/azure/foundry-local/) — Chat assistant, document summarizer, tool calling, voice-to-text -- [Troubleshooting guide](https://learn.microsoft.com/azure/ai-foundry/foundry-local/reference/reference-best-practice?view=foundry-classic) -- [Foundry Local Lab](https://github.com/Microsoft-foundry/foundry-local-lab) — Hands-on exercises and step-by-step instructions ## ⚖️ License diff --git a/samples/README.md b/samples/README.md index 0dbd1329..93f3bd57 100644 --- a/samples/README.md +++ b/samples/README.md @@ -2,7 +2,7 @@ Explore complete working examples that demonstrate how to use Foundry Local — an end-to-end local AI solution that runs entirely on-device. These samples cover chat completions, audio transcription, tool calling, LangChain integration, and more. -> **New to Foundry Local?** Check out the [main README](../README.md) for an overview and quickstart, or visit the [Foundry Local documentation](https://learn.microsoft.com/en-us/azure/foundry-local/) on Microsoft Learn. +> **New to Foundry Local?** Check out the [main README](../README.md) for an overview and quickstart, or visit the [Foundry Local documentation](https://learn.microsoft.com/azure/foundry-local/) on Microsoft Learn. ## Samples by Language @@ -12,21 +12,3 @@ Explore complete working examples that demonstrate how to use Foundry Local — | [**JavaScript**](js/) | 12 | Node.js SDK samples including native chat, audio transcription, Electron desktop app, Copilot SDK integration, LangChain, tool calling, web server, and tutorials. | | [**Python**](python/) | 9 | Python samples using the OpenAI-compatible API, including chat, audio transcription, LangChain integration, tool calling, web server, and tutorials. 
| | [**Rust**](rust/) | 8 | Rust SDK samples including native chat, audio transcription, tool calling, web server, and tutorials. | - -## Common Patterns - -Most samples follow a similar pattern: - -1. **Initialize** the Foundry Local manager -2. **Download** a model (auto-selects the best variant for your hardware) -3. **Load** the model into memory -4. **Run inference** (chat completions or audio transcription) -5. **Unload** the model when done - -## Models Used - -| Model | Task | Used In | -|-------|------|---------| -| `qwen2.5-0.5b` | Chat / Text Generation | Most chat and tool-calling samples | -| `phi-3.5-mini` | Chat / Text Generation | Some Python samples | -| `whisper-tiny` | Audio Transcription | All audio/voice samples | diff --git a/samples/cs/README.md b/samples/cs/README.md index 46f3c0b6..6f1abfbf 100644 --- a/samples/cs/README.md +++ b/samples/cs/README.md @@ -38,8 +38,3 @@ Both packages provide the same APIs, so the same source code works on all platfo dotnet run ``` - The unified project file automatically selects the correct SDK package for your platform. - -> [!TIP] -> On Windows, we recommend using the WinML package (selected automatically) for optimal performance. Your users benefit from a wider range of hardware acceleration options and a smaller application package size. - diff --git a/samples/js/README.md b/samples/js/README.md index 588fb124..e211b35d 100644 --- a/samples/js/README.md +++ b/samples/js/README.md @@ -1,4 +1,4 @@ -# Foundry Local JavaScript Samples +# 🚀 Foundry Local JavaScript Samples These samples demonstrate how to use the Foundry Local JavaScript SDK (`foundry-local-sdk`) with Node.js. @@ -6,9 +6,6 @@ These samples demonstrate how to use the Foundry Local JavaScript SDK (`foundry- - [Node.js](https://nodejs.org/) (v18 or later recommended) -> [!NOTE] -> On Windows, NPU models are not currently available for the JavaScript SDK. These will be enabled in a subsequent release. 
-
 
 ## Samples
 
 | Sample | Description |
@@ -37,6 +34,15 @@ These samples demonstrate how to use the Foundry Local JavaScript SDK (`foundry-
 
 2. Navigate to a sample and install dependencies:
 
+   If you are developing or shipping on **Windows**, use the Windows version, which has the same API surface area but integrates with WinML for a greater breadth of hardware acceleration:
+
+   ```bash
+   cd native-chat-completions
+   npm install foundry-local-sdk-winml
+   ```
+
+   For **macOS and Linux**, use the cross-platform build:
+
    ```bash
    cd native-chat-completions
    npm install foundry-local-sdk
diff --git a/samples/python/README.md b/samples/python/README.md
index fc690898..67bdf2c4 100644
--- a/samples/python/README.md
+++ b/samples/python/README.md
@@ -1,11 +1,10 @@
-# Foundry Local Python Samples
+# 🚀 Foundry Local Python Samples
 
-These samples demonstrate how to use Foundry Local with Python. The Python SDK currently uses the Foundry Local CLI and the OpenAI-compatible REST API. A native in-process SDK (matching JS/C#) is coming soon.
+These samples demonstrate how to use Foundry Local with Python.
 
 ## Prerequisites
 
 - [Python](https://www.python.org/) 3.11 or later
-- [Foundry Local CLI](../../README.md#installing) installed
 
 ## Samples
 
@@ -32,10 +31,20 @@ These samples demonstrate how to use Foundry Local with Python. The Python SDK c
 
 2. Navigate to a sample and install dependencies:
 
+   If you are developing or shipping on **Windows**, use the Windows version, which has the same API surface area but integrates with WinML for a greater breadth of hardware acceleration:
+
+   ```bash
+   cd native-chat-completions
+   pip install foundry-local-sdk-winml
+   ```
+
+   For **macOS and Linux**, use the cross-platform build:
+
    ```bash
    cd native-chat-completions
    pip install foundry-local-sdk
    ```
+
 3. 
Run the sample:

diff --git a/samples/rust/README.md b/samples/rust/README.md
index 9ef4e226..3e476f4b 100644
--- a/samples/rust/README.md
+++ b/samples/rust/README.md
@@ -1,6 +1,6 @@
-# Foundry Local Rust Samples
+# 🚀 Foundry Local Rust Samples
 
-These samples demonstrate how to use the Foundry Local Rust SDK.
+These samples demonstrate how to use the Rust binding for Foundry Local.
 
 ## Prerequisites
 
@@ -39,4 +39,10 @@ These samples demonstrate how to use the Foundry Local Rust SDK.
 
    ```bash
    cd native-chat-completions
    cargo run
-   ```
\ No newline at end of file
+   ```
+> [!NOTE]
+> If you are developing or shipping on **Windows**, update the sample's `Cargo.toml` to enable the `winml` feature, which provides a greater breadth of hardware acceleration support:
+>
+> ```toml
+> foundry-local-sdk = { features = ["winml"] }
+> ```
\ No newline at end of file

From cfd64ccd059507511fe6c2b76705951144fed03d Mon Sep 17 00:00:00 2001
From: samkemp
Date: Fri, 3 Apr 2026 16:15:06 +0100
Subject: [PATCH 3/3] fix typo on py quickstart

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e6b68d57..46749928 100644
--- a/README.md
+++ b/README.md
@@ -118,7 +118,7 @@ User data never leaves the device, responses start immediately with zero network
     messages = [
         {"role": "user", "content": "What is the golden ratio?"}
     ]
-    response = client.complete_chat(messages):
+    response = client.complete_chat(messages)
     print(f"Response: {response.choices[0].message.content}")
 
     model.unload()