In Part 1 you used the Foundry Local CLI to run models interactively. In Part 2 you explored the full SDK API surface. Now you will learn to integrate Foundry Local into your applications using the SDK and the OpenAI-compatible API.
Foundry Local provides SDKs for three languages. Choose the one you are most comfortable with - the concepts are identical across all three.
By the end of this lab you will be able to:
- Install the Foundry Local SDK for your language (Python, JavaScript, or C#)
- Initialise `FoundryLocalManager` to start the service, check the cache, download, and load a model
- Connect to the local model using the OpenAI SDK
- Send chat completions and handle streaming responses
- Understand the dynamic port architecture
Complete Part 1: Getting Started with Foundry Local and Part 2: Foundry Local SDK Deep Dive first.
Install one of the following language runtimes:
- Python 3.9+ - python.org/downloads
- Node.js 18+ - nodejs.org
- .NET 9.0+ - dot.net/download
The Foundry Local SDK manages the control plane (starting the service, downloading models), whilst the OpenAI SDK handles the data plane (sending prompts, receiving completions).
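One practical consequence of this split is the dynamic port: the service binds to a free port at startup, so the endpoint must always be discovered through the SDK at run time, never hardcoded. A standalone sketch of treating the discovered endpoint as opaque (the port shown is illustrative; a real run reports a different one):

```python
from urllib.parse import urlparse

def check_endpoint(endpoint: str) -> str:
    """Sanity-check a discovered endpoint and return it unchanged.

    The port changes between runs, so treat the whole URL as opaque.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme != "http" or parsed.port is None:
        raise ValueError(f"Unexpected endpoint: {endpoint!r}")
    return endpoint

# Illustrative value only - always read the real one from the SDK
# (e.g. manager.endpoint in Python) rather than writing it by hand.
print(check_endpoint("http://127.0.0.1:53421/v1"))
```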
🐍 Python
```shell
cd python
python -m venv venv

# Activate the virtual environment:
# Windows (PowerShell):
venv\Scripts\Activate.ps1
# Windows (Command Prompt):
venv\Scripts\activate.bat
# macOS:
source venv/bin/activate

pip install -r requirements.txt
```

The requirements.txt installs:

- `foundry-local-sdk` - The Foundry Local SDK (imported as `foundry_local`)
- `openai` - The OpenAI Python SDK
- `agent-framework` - Microsoft Agent Framework (used in later parts)
📘 JavaScript
```shell
cd javascript
npm install
```

The package.json installs:

- `foundry-local-sdk` - The Foundry Local SDK
- `openai` - The OpenAI Node.js SDK
💜 C#
```shell
cd csharp
dotnet restore
dotnet build
```

The csharp.csproj uses:

- `Microsoft.AI.Foundry.Local` - The Foundry Local SDK (NuGet)
- `OpenAI` - The OpenAI C# SDK (NuGet)

Project structure: The C# project uses a command-line router in `Program.cs` that dispatches to separate example files. Run `dotnet run chat` (or just `dotnet run`) for this part. Other parts use `dotnet run rag`, `dotnet run agent`, and `dotnet run multi`.
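The router pattern described above is simple: read the first command-line argument and dispatch to the matching example. A minimal Python sketch of the same idea (the handler names here are hypothetical; the real router is the C# Program.cs):

```python
import sys

def run_chat() -> None:
    print("running chat example")

def run_rag() -> None:
    print("running rag example")

# Map each command word to its handler; unknown commands fall back to chat.
HANDLERS = {"chat": run_chat, "rag": run_rag}

command = sys.argv[1] if len(sys.argv) > 1 else "chat"
HANDLERS.get(command, run_chat)()
```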
Open the basic chat example for your language and examine the code. Each script follows the same three-step pattern:
- Start the service - `FoundryLocalManager` starts the Foundry Local runtime
- Download and load the model - check the cache, download if needed, then load into memory
- Create an OpenAI client - connect to the local endpoint and send a streaming chat completion
🐍 Python - python/foundry-local.py
```python
import openai
from foundry_local import FoundryLocalManager

alias = "phi-3.5-mini"

# Step 1: Create a FoundryLocalManager and start the service
print("Starting Foundry Local service...")
manager = FoundryLocalManager()
manager.start_service()

# Step 2: Check if the model is already downloaded
cached = manager.list_cached_models()
catalog_info = manager.get_model_info(alias)
is_cached = any(m.id == catalog_info.id for m in cached) if catalog_info else False

if is_cached:
    print(f"Model already downloaded: {alias}")
else:
    print(f"Downloading model: {alias} (this may take several minutes)...")
    manager.download_model(alias)
    print(f"Download complete: {alias}")

# Step 3: Load the model into memory
print(f"Loading model: {alias}...")
manager.load_model(alias)

# Create an OpenAI client pointing to the LOCAL Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,  # Dynamic port - never hardcode!
    api_key=manager.api_key,
)

# Generate a streaming chat completion
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

Run it:

```shell
python foundry-local.py
```

📘 JavaScript - javascript/foundry-local.mjs
```javascript
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

const alias = "phi-3.5-mini";

// Step 1: Start the Foundry Local service
console.log("Starting Foundry Local service...");
FoundryLocalManager.create({ appName: "FoundryLocalWorkshop" });
const manager = FoundryLocalManager.instance;
await manager.startWebService();

// Step 2: Check if the model is already downloaded
const catalog = manager.catalog;
const model = await catalog.getModel(alias);

if (model.isCached) {
  console.log(`Model already downloaded: ${alias}`);
} else {
  console.log(`Downloading model: ${alias} (this may take several minutes)...`);
  await model.download();
  console.log(`Download complete: ${alias}`);
}

// Step 3: Load the model into memory
console.log(`Loading model: ${alias}...`);
await model.load();
console.log(`Model loaded: ${model.id}`);

// Create an OpenAI client pointing to the LOCAL Foundry service
const client = new OpenAI({
  baseURL: manager.urls[0] + "/v1", // Dynamic port - never hardcode!
  apiKey: "foundry-local",
});

// Generate a streaming chat completion
const stream = await client.chat.completions.create({
  model: model.id,
  messages: [{ role: "user", content: "What is the golden ratio?" }],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log();
```

Run it:

```shell
node foundry-local.mjs
```

💜 C# - csharp/BasicChat.cs
```csharp
using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging.Abstractions;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

var alias = "phi-3.5-mini";

// Step 1: Start the Foundry Local service
Console.WriteLine("Starting Foundry Local service...");
await FoundryLocalManager.CreateAsync(
    new Configuration
    {
        AppName = "FoundryLocalSamples",
        Web = new Configuration.WebService { Urls = "http://127.0.0.1:0" }
    }, NullLogger.Instance, default);
var manager = FoundryLocalManager.Instance;
await manager.StartWebServiceAsync(default);

// Step 2: Get the model from the catalog
var catalog = await manager.GetCatalogAsync(default);
var model = await catalog.GetModelAsync(alias, default);

// Step 3: Check if the model is already downloaded
var isCached = await model.IsCachedAsync(default);
if (isCached)
{
    Console.WriteLine($"Model already downloaded: {alias}");
}
else
{
    Console.WriteLine($"Downloading model: {alias} (this may take several minutes)...");
    await model.DownloadAsync(null, default);
    Console.WriteLine($"Download complete: {alias}");
}

// Step 4: Load the model into memory
Console.WriteLine($"Loading model: {alias}...");
await model.LoadAsync(default);
Console.WriteLine($"Loaded model: {model.Id}");
Console.WriteLine($"Endpoint: {manager.Urls[0]}");

// Create an OpenAI client pointing to the LOCAL Foundry service
var key = new ApiKeyCredential("foundry-local");
var client = new OpenAIClient(key, new OpenAIClientOptions
{
    Endpoint = new Uri(manager.Urls[0] + "/v1") // Dynamic port - never hardcode!
});
var chatClient = client.GetChatClient(model.Id);

// Stream a chat completion
var completionUpdates = chatClient.CompleteChatStreaming("What is the golden ratio?");
foreach (var update in completionUpdates)
{
    if (update.ContentUpdate.Count > 0)
    {
        Console.Write(update.ContentUpdate[0].Text);
    }
}
Console.WriteLine();
```

Run it:

```shell
dotnet run chat
```

Once your basic example runs, try modifying the code:
- Change the user message - try different questions
- Add a system prompt - give the model a persona
- Turn off streaming - set `stream=False` and print the full response at once
- Try a different model - change the alias from `phi-3.5-mini` to another model from `foundry model list`
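When experimenting with streaming versus non-streaming, it helps to see that the streamed deltas simply concatenate into the same full message you would get in one piece. A standalone sketch with mock chunk objects (no service needed; the shapes mimic the OpenAI streaming response):

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Join the delta.content of each streamed chunk, skipping None deltas."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts)

def make_chunk(text):
    # Mimic the chunk.choices[0].delta.content shape of a streaming response.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

chunks = [make_chunk("The golden "), make_chunk("ratio is ~1.618."), make_chunk(None)]
print(collect_stream(chunks))
```

The final `None` chunk stands in for the terminal chunk a real stream emits, which carries no content.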
🐍 Python
```python
# Add a system prompt - give the model a persona:
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[
        {"role": "system", "content": "You are a pirate. Answer everything in pirate speak."},
        {"role": "user", "content": "What is the golden ratio?"},
    ],
    stream=True,
)

# Or turn off streaming:
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    stream=False,
)
print(response.choices[0].message.content)
```

📘 JavaScript
```javascript
// Add a system prompt - give the model a persona:
const stream = await client.chat.completions.create({
  model: model.id,
  messages: [
    { role: "system", content: "You are a pirate. Answer everything in pirate speak." },
    { role: "user", content: "What is the golden ratio?" },
  ],
  stream: true,
});

// Or turn off streaming:
const response = await client.chat.completions.create({
  model: model.id,
  messages: [{ role: "user", content: "What is the golden ratio?" }],
  stream: false,
});
console.log(response.choices[0].message.content);
```

💜 C#
```csharp
// Add a system prompt - give the model a persona:
var completionUpdates = chatClient.CompleteChatStreaming(
    new ChatMessage[]
    {
        new SystemChatMessage("You are a pirate. Answer everything in pirate speak."),
        new UserChatMessage("What is the golden ratio?")
    }
);

// Or turn off streaming:
var response = chatClient.CompleteChat("What is the golden ratio?");
Console.WriteLine(response.Value.Content[0].Text);
```

🐍 Python SDK Methods
| Method | Purpose |
|---|---|
| `FoundryLocalManager()` | Create manager instance |
| `manager.start_service()` | Start the Foundry Local service |
| `manager.list_cached_models()` | List models downloaded on your device |
| `manager.get_model_info(alias)` | Get model ID and metadata |
| `manager.download_model(alias, progress_callback=fn)` | Download a model with optional progress callback |
| `manager.load_model(alias)` | Load a model into memory |
| `manager.endpoint` | Get the dynamic endpoint URL |
| `manager.api_key` | Get the API key (placeholder for local) |
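The `progress_callback` parameter above takes a function that is invoked with download progress updates. A standalone sketch of such a callback, assuming it receives a completed percentage between 0 and 100 (check the SDK reference for the exact signature):

```python
def print_progress(percent: float) -> None:
    """Render a simple ten-segment progress bar for a 0-100 percentage."""
    filled = min(10, int(percent // 10))
    print(f"[{'#' * filled}{'-' * (10 - filled)}] {percent:.0f}%")

# Simulated progress updates; a real download would drive these values.
for p in (0, 50, 100):
    print_progress(p)

# Hypothetical usage, assuming a percentage-valued callback:
#   manager.download_model(alias, progress_callback=print_progress)
```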
📘 JavaScript SDK Methods
| Method | Purpose |
|---|---|
| `FoundryLocalManager.create({ appName })` | Create manager instance |
| `FoundryLocalManager.instance` | Access the singleton manager |
| `await manager.startWebService()` | Start the Foundry Local service |
| `await manager.catalog.getModel(alias)` | Get a model from the catalogue |
| `model.isCached` | Check if the model is already downloaded |
| `await model.download()` | Download a model |
| `await model.load()` | Load a model into memory |
| `model.id` | Get the model ID for OpenAI API calls |
| `manager.urls[0] + "/v1"` | Get the dynamic endpoint URL |
| `"foundry-local"` | API key (placeholder for local) |
💜 C# SDK Methods
| Method | Purpose |
|---|---|
| `FoundryLocalManager.CreateAsync(config)` | Create and initialise the manager |
| `manager.StartWebServiceAsync()` | Start the Foundry Local web service |
| `manager.GetCatalogAsync()` | Get the model catalog |
| `catalog.ListModelsAsync()` | List all available models |
| `catalog.GetModelAsync(alias)` | Get a specific model by alias |
| `model.IsCachedAsync()` | Check if a model is downloaded |
| `model.DownloadAsync()` | Download a model |
| `model.LoadAsync()` | Load a model into memory |
| `manager.Urls[0]` | Get the dynamic endpoint URL |
| `new ApiKeyCredential("foundry-local")` | API key credential for local |
In Exercises 2 and 3 you used the OpenAI SDK for chat completions. The JavaScript and C# SDKs also provide a native ChatClient that eliminates the need for the OpenAI SDK entirely.
📘 JavaScript - model.createChatClient()
```javascript
import { FoundryLocalManager } from "foundry-local-sdk";

const alias = "phi-3.5-mini";
FoundryLocalManager.create({ appName: "ChatClientDemo" });
const manager = FoundryLocalManager.instance;
await manager.startWebService();

const model = await manager.catalog.getModel(alias);
if (!model.isCached) await model.download();
await model.load();

// No OpenAI import needed - get a client directly from the model
const chatClient = model.createChatClient();

// Non-streaming completion
const response = await chatClient.completeChat([
  { role: "system", content: "You are a pirate. Answer everything in pirate speak." },
  { role: "user", content: "What is the golden ratio?" }
]);
console.log(response.choices[0].message.content);

// Streaming completion (uses a callback pattern)
await chatClient.completeStreamingChat(
  [{ role: "user", content: "What is the golden ratio?" }],
  (chunk) => {
    if (chunk.choices?.[0]?.delta?.content) {
      process.stdout.write(chunk.choices[0].delta.content);
    }
  }
);
console.log();
```

Note: The ChatClient's `completeStreamingChat()` uses a callback pattern, not an async iterator. Pass a function as the second argument.
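The callback pattern the note describes is easy to adapt when you would rather collect the full text: pass a callback that appends each piece to a list, then join. A Python sketch of the same idea (the `complete_streaming_chat` function here is a local stand-in, not the real SDK call):

```python
def complete_streaming_chat(messages, on_chunk) -> None:
    # Stand-in for a callback-style streaming API (illustrative only):
    # a real implementation would invoke on_chunk once per streamed piece.
    for piece in ("The golden ", "ratio is ", "about 1.618."):
        on_chunk(piece)

parts: list[str] = []
complete_streaming_chat(
    [{"role": "user", "content": "What is the golden ratio?"}],
    parts.append,  # the callback fires once per streamed piece
)
print("".join(parts))
```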
💜 C# - model.GetChatClientAsync()
```csharp
var catalog = await manager.GetCatalogAsync(default);
var model = await catalog.GetModelAsync("phi-3.5-mini", default);
if (!await model.IsCachedAsync(default))
    await model.DownloadAsync(null, default);
await model.LoadAsync(default);

// No OpenAI NuGet needed - get a client directly from the model
var chatClient = await model.GetChatClientAsync(default);

// Use it like a standard OpenAI ChatClient
var response = chatClient.CompleteChat("What is the golden ratio?");
Console.WriteLine(response.Value.Content[0].Text);
```

When to use which:

| Approach | Best for |
|---|---|
| OpenAI SDK | Full parameter control, production apps, existing OpenAI code |
| Native ChatClient | Quick prototyping, fewer dependencies, simpler setup |
| Concept | What You Learned |
|---|---|
| Control plane | The Foundry Local SDK handles starting the service and loading models |
| Data plane | The OpenAI SDK handles chat completions and streaming |
| Dynamic ports | Always use the SDK to discover the endpoint; never hardcode URLs |
| Cross-language | The same code pattern works across Python, JavaScript, and C# |
| OpenAI compatibility | Full OpenAI API compatibility means existing OpenAI code works with minimal changes |
| Native ChatClient | `createChatClient()` (JS) / `GetChatClientAsync()` (C#) provides an alternative to the OpenAI SDK |
Continue to Part 4: Building a RAG Application to learn how to build a Retrieval-Augmented Generation pipeline running entirely on your device.
