
Foundry Local

Part 3: Using the Foundry Local SDK with OpenAI

Overview

In Part 1 you used the Foundry Local CLI to run models interactively. In Part 2 you explored the full SDK API surface. Now you will learn to integrate Foundry Local into your applications using the SDK and the OpenAI-compatible API.

Foundry Local provides SDKs for three languages. Choose the one you are most comfortable with - the concepts are identical across all three.

Learning Objectives

By the end of this lab you will be able to:

  • Install the Foundry Local SDK for your language (Python, JavaScript, or C#)
  • Initialise FoundryLocalManager to start the service, check the cache, download, and load a model
  • Connect to the local model using the OpenAI SDK
  • Send chat completions and handle streaming responses
  • Understand the dynamic port architecture

Prerequisites

Complete Part 1: Getting Started with Foundry Local and Part 2: Foundry Local SDK Deep Dive first.

Install one of the following language runtimes, matching the track you plan to follow:

  • Python (for the Python track)
  • Node.js (for the JavaScript track)
  • .NET SDK (for the C# track)

Concept: How the SDK Works

The Foundry Local SDK manages the control plane (starting the service, downloading models), whilst the OpenAI SDK handles the data plane (sending prompts, receiving completions).

SDK Architecture
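
In practice the split looks like this - a minimal Python sketch (it assumes the model is already cached; Exercise 2 shows the full download-then-load flow):

# Control plane: the Foundry Local SDK starts the service and loads a model
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager()
manager.start_service()
manager.load_model("phi-3.5-mini")

# Data plane: the OpenAI SDK talks to whatever local endpoint the manager reports
import openai

client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

Because the service binds to a dynamic port, the client always reads the endpoint from manager.endpoint rather than a hardcoded URL.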


Lab Exercises

Exercise 1: Setup Your Environment

🐍 Python
cd python
python -m venv venv

# Activate the virtual environment:
# Windows (PowerShell):
venv\Scripts\Activate.ps1
# Windows (Command Prompt):
venv\Scripts\activate.bat
# macOS:
source venv/bin/activate

pip install -r requirements.txt

The requirements.txt installs:

  • foundry-local-sdk - The Foundry Local SDK (imported as foundry_local)
  • openai - The OpenAI Python SDK
  • agent-framework - Microsoft Agent Framework (used in later parts)
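
The repository's requirements.txt may pin specific versions; an unpinned equivalent is simply the three package names:

foundry-local-sdk
openai
agent-framework
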
📘 JavaScript
cd javascript
npm install

The package.json installs:

  • foundry-local-sdk - The Foundry Local SDK
  • openai - The OpenAI Node.js SDK
💜 C#
cd csharp
dotnet restore
dotnet build

The csharp.csproj uses:

  • Microsoft.AI.Foundry.Local - The Foundry Local SDK (NuGet)
  • OpenAI - The OpenAI C# SDK (NuGet)

Project structure: The C# project uses a command-line router in Program.cs that dispatches to separate example files. Run dotnet run chat (or just dotnet run) for this part. Other parts use dotnet run rag, dotnet run agent, and dotnet run multi.


Exercise 2: Basic Chat Completion

Open the basic chat example for your language and examine the code. Each script follows the same three-step pattern:

  1. Start the service - FoundryLocalManager starts the Foundry Local runtime
  2. Download and load the model - check the cache, download if needed, then load into memory
  3. Create an OpenAI client - connect to the local endpoint and send a streaming chat completion
🐍 Python - python/foundry-local.py
import openai
from foundry_local import FoundryLocalManager

alias = "phi-3.5-mini"

# Step 1: Create a FoundryLocalManager and start the service
print("Starting Foundry Local service...")
manager = FoundryLocalManager()
manager.start_service()

# Step 2: Check if the model is already downloaded
cached = manager.list_cached_models()
catalog_info = manager.get_model_info(alias)
is_cached = any(m.id == catalog_info.id for m in cached) if catalog_info else False

if is_cached:
    print(f"Model already downloaded: {alias}")
else:
    print(f"Downloading model: {alias} (this may take several minutes)...")
    manager.download_model(alias)
    print(f"Download complete: {alias}")

# Step 3: Load the model into memory
print(f"Loading model: {alias}...")
manager.load_model(alias)

# Create an OpenAI client pointing to the LOCAL Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,   # Dynamic port - never hardcode!
    api_key=manager.api_key
)

# Generate a streaming chat completion
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

Run it:

python foundry-local.py
📘 JavaScript - javascript/foundry-local.mjs
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

const alias = "phi-3.5-mini";

// Step 1: Start the Foundry Local service
console.log("Starting Foundry Local service...");
FoundryLocalManager.create({ appName: "FoundryLocalWorkshop" });
const manager = FoundryLocalManager.instance;
await manager.startWebService();

// Step 2: Check if the model is already downloaded
const catalog = manager.catalog;
const model = await catalog.getModel(alias);

if (model.isCached) {
  console.log(`Model already downloaded: ${alias}`);
} else {
  console.log(`Downloading model: ${alias} (this may take several minutes)...`);
  await model.download();
  console.log(`Download complete: ${alias}`);
}

// Step 3: Load the model into memory
console.log(`Loading model: ${alias}...`);
await model.load();
console.log(`Model loaded: ${model.id}`);

// Create an OpenAI client pointing to the LOCAL Foundry service
const client = new OpenAI({
  baseURL: manager.urls[0] + "/v1",   // Dynamic port - never hardcode!
  apiKey: "foundry-local",
});

// Generate a streaming chat completion
const stream = await client.chat.completions.create({
  model: model.id,
  messages: [{ role: "user", content: "What is the golden ratio?" }],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log();

Run it:

node foundry-local.mjs
💜 C# - csharp/BasicChat.cs
using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging.Abstractions;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

var alias = "phi-3.5-mini";

// Step 1: Start the Foundry Local service
Console.WriteLine("Starting Foundry Local service...");
await FoundryLocalManager.CreateAsync(
    new Configuration
    {
        AppName = "FoundryLocalSamples",
        Web = new Configuration.WebService { Urls = "http://127.0.0.1:0" }
    }, NullLogger.Instance, default);
var manager = FoundryLocalManager.Instance;
await manager.StartWebServiceAsync(default);

// Step 2: Get the model from the catalog
var catalog = await manager.GetCatalogAsync(default);
var model = await catalog.GetModelAsync(alias, default);

// Step 3: Check if the model is already downloaded
var isCached = await model.IsCachedAsync(default);

if (isCached)
{
    Console.WriteLine($"Model already downloaded: {alias}");
}
else
{
    Console.WriteLine($"Downloading model: {alias} (this may take several minutes)...");
    await model.DownloadAsync(null, default);
    Console.WriteLine($"Download complete: {alias}");
}

// Step 4: Load the model into memory
Console.WriteLine($"Loading model: {alias}...");
await model.LoadAsync(default);
Console.WriteLine($"Loaded model: {model.Id}");
Console.WriteLine($"Endpoint: {manager.Urls[0]}");

// Create OpenAI client pointing to the LOCAL Foundry service
var key = new ApiKeyCredential("foundry-local");
var client = new OpenAIClient(key, new OpenAIClientOptions
{
    Endpoint = new Uri(manager.Urls[0] + "/v1")  // Dynamic port - never hardcode!
});

var chatClient = client.GetChatClient(model.Id);

// Stream a chat completion
var completionUpdates = chatClient.CompleteChatStreaming("What is the golden ratio?");

foreach (var update in completionUpdates)
{
    if (update.ContentUpdate.Count > 0)
    {
        Console.Write(update.ContentUpdate[0].Text);
    }
}
Console.WriteLine();

Run it:

dotnet run chat

Exercise 3: Experiment with Prompts

Once your basic example runs, try modifying the code:

  1. Change the user message - try different questions
  2. Add a system prompt - give the model a persona
  3. Turn off streaming - set the stream parameter to false and print the full response at once
  4. Try a different model - change the alias from phi-3.5-mini to another model from foundry model list (see the note after the language examples below)
🐍 Python
# Add a system prompt - give the model a persona:
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[
        {"role": "system", "content": "You are a pirate. Answer everything in pirate speak."},
        {"role": "user", "content": "What is the golden ratio?"}
    ],
    stream=True,
)

# Or turn off streaming:
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    stream=False,
)
print(response.choices[0].message.content)
📘 JavaScript
// Add a system prompt - give the model a persona:
const stream = await client.chat.completions.create({
  model: model.id,
  messages: [
    { role: "system", content: "You are a pirate. Answer everything in pirate speak." },
    { role: "user", content: "What is the golden ratio?" },
  ],
  stream: true,
});

// Or turn off streaming:
const response = await client.chat.completions.create({
  model: model.id,
  messages: [{ role: "user", content: "What is the golden ratio?" }],
  stream: false,
});
console.log(response.choices[0].message.content);
💜 C#
// Add a system prompt - give the model a persona:
var completionUpdates = chatClient.CompleteChatStreaming(
    new ChatMessage[]
    {
        new SystemChatMessage("You are a pirate. Answer everything in pirate speak."),
        new UserChatMessage("What is the golden ratio?")
    }
);

// Or turn off streaming:
var response = chatClient.CompleteChat("What is the golden ratio?");
Console.WriteLine(response.Value.Content[0].Text);
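
For modification 4, switching models is a one-line change. The alias below is only an example - run foundry model list to see what is available on your machine:

# Python - any alias from `foundry model list` works here
alias = "qwen2.5-0.5b"  # example alias; substitute one from your own catalogue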

SDK Method Reference

🐍 Python SDK Methods

  • FoundryLocalManager() - Create manager instance
  • manager.start_service() - Start the Foundry Local service
  • manager.list_cached_models() - List models downloaded on your device
  • manager.get_model_info(alias) - Get model ID and metadata
  • manager.download_model(alias, progress_callback=fn) - Download a model with an optional progress callback
  • manager.load_model(alias) - Load a model into memory
  • manager.endpoint - Get the dynamic endpoint URL
  • manager.api_key - Get the API key (placeholder for local use)
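
The progress callback deserves a quick sketch. The exact argument the SDK passes to the callback is an assumption here (treated as a completion percentage) - check the SDK documentation for the precise shape:

# Reuses `manager` and `alias` from Exercise 2
def on_progress(percent):
    # Rewrite the same console line as the download advances
    print(f"\rDownloading: {percent:.0f}%", end="", flush=True)

manager.download_model(alias, progress_callback=on_progress)
print()
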
📘 JavaScript SDK Methods

  • FoundryLocalManager.create({ appName }) - Create the manager instance
  • FoundryLocalManager.instance - Access the singleton manager
  • await manager.startWebService() - Start the Foundry Local service
  • await manager.catalog.getModel(alias) - Get a model from the catalogue
  • model.isCached - Check whether the model is already downloaded
  • await model.download() - Download a model
  • await model.load() - Load a model into memory
  • model.id - Get the model ID for OpenAI API calls
  • manager.urls[0] + "/v1" - Get the dynamic endpoint URL
  • "foundry-local" - API key (placeholder for local use)
💜 C# SDK Methods

  • FoundryLocalManager.CreateAsync(config) - Create and initialise the manager
  • manager.StartWebServiceAsync() - Start the Foundry Local web service
  • manager.GetCatalogAsync() - Get the model catalogue
  • catalog.ListModelsAsync() - List all available models
  • catalog.GetModelAsync(alias) - Get a specific model by alias
  • model.IsCachedAsync() - Check whether a model is downloaded
  • model.DownloadAsync() - Download a model
  • model.LoadAsync() - Load a model into memory
  • manager.Urls[0] - Get the dynamic endpoint URL
  • new ApiKeyCredential("foundry-local") - API key credential for local use

Exercise 4: Using the Native ChatClient (Alternative to OpenAI SDK)

In Exercises 2 and 3 you used the OpenAI SDK for chat completions. The JavaScript and C# SDKs also provide a native ChatClient that eliminates the need for the OpenAI SDK entirely.

📘 JavaScript - model.createChatClient()
import { FoundryLocalManager } from "foundry-local-sdk";

const alias = "phi-3.5-mini";

FoundryLocalManager.create({ appName: "ChatClientDemo" });
const manager = FoundryLocalManager.instance;
await manager.startWebService();

const model = await manager.catalog.getModel(alias);
if (!model.isCached) await model.download();
await model.load();

// No OpenAI import needed — get a client directly from the model
const chatClient = model.createChatClient();

// Non-streaming completion
const response = await chatClient.completeChat([
  { role: "system", content: "You are a pirate. Answer everything in pirate speak." },
  { role: "user", content: "What is the golden ratio?" }
]);
console.log(response.choices[0].message.content);

// Streaming completion (uses a callback pattern)
await chatClient.completeStreamingChat(
  [{ role: "user", content: "What is the golden ratio?" }],
  (chunk) => {
    if (chunk.choices?.[0]?.delta?.content) {
      process.stdout.write(chunk.choices[0].delta.content);
    }
  }
);
console.log();

Note: The ChatClient's completeStreamingChat() uses a callback pattern, not an async iterator. Pass a function as the second argument.

💜 C# - model.GetChatClientAsync()
var catalog = await manager.GetCatalogAsync(default);
var model = await catalog.GetModelAsync("phi-3.5-mini", default);
if (!await model.IsCachedAsync(default))
    await model.DownloadAsync(null, default);
await model.LoadAsync(default);

// No OpenAI NuGet needed — get a client directly from the model
var chatClient = await model.GetChatClientAsync(default);

// Use it like a standard OpenAI ChatClient
var response = chatClient.CompleteChat("What is the golden ratio?");
Console.WriteLine(response.Value.Content[0].Text);

When to use which:

  • OpenAI SDK - Full parameter control, production apps, existing OpenAI code
  • Native ChatClient - Quick prototyping, fewer dependencies, simpler setup

Key Takeaways

  • Control plane - The Foundry Local SDK handles starting the service and loading models
  • Data plane - The OpenAI SDK handles chat completions and streaming
  • Dynamic ports - Always use the SDK to discover the endpoint; never hardcode URLs
  • Cross-language - The same code pattern works across Python, JavaScript, and C#
  • OpenAI compatibility - Full OpenAI API compatibility means existing OpenAI code works with minimal changes
  • Native ChatClient - createChatClient() (JS) / GetChatClientAsync() (C#) provides an alternative to the OpenAI SDK
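
That last point is concrete: pointing existing OpenAI code at Foundry Local is typically a one-line constructor change. A sketch, reusing the manager from Exercise 2:

# Cloud:  client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Local:  swap only the endpoint and key; every other call stays the same
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)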

Next Steps

Continue to Part 4: Building a RAG Application to learn how to build a Retrieval-Augmented Generation pipeline running entirely on your device.