@howells/ai

Unified AI client for all projects. One package, Vercel AI Gateway by default, direct provider escape hatches, provider-aware model tiers, and normalized generation settings.

Quick Start

import { createAI } from "@howells/ai";
import { generateText, Output, streamText, embed } from "ai"; // streamText and embed are used in later sections
import { z } from "zod";

const ai = createAI({
  app: { name: "MyApp", url: "https://myapp.com" },
});

// Pick a model by tier
const { text } = await generateText({
  model: ai.model("fast"),
  prompt: "Classify this ingredient",
});

// Add capabilities per tier
const { text: analysis } = await generateText({
  model: ai.model("powerful", {
    agent: "taste-analysis",
    tools: true,
    vision: true,
  }),
  prompt: "Analyze this design",
});

// Structured output (example schema; substitute your own Zod schema)
const entitiesSchema = z.object({
  entities: z.array(z.string()),
});

const { output } = await generateText({
  model: ai.model("standard", { agent: "search" }),
  output: Output.object({ schema: entitiesSchema }),
  prompt: "Extract entities from this text",
});

Generation Options

Use ai.generationOptions(...) for the settings that vary across providers: reasoning budget, verbosity, structured-output provider behavior, tool policy, response length, sampling, prompt cache, user attribution, and service tier.

const provider = "openai";

const { text } = await generateText({
  model: ai.model("powerful", { provider, tools: true }),
  prompt: "Plan the migration",
  tools: migrationTools,
  ...ai.generationOptions({
    provider,
    reasoning: "high",
    verbosity: "medium",
    structured: "strict",
    tools: "auto",
    maxToolSteps: 5,
    outputLength: "long",
    creativity: "focused",
    user: "migration-agent",
  }),
});

For Gateway calls, also pass the canonical model ID so that provider-specific options can be inferred alongside Gateway attribution:

const modelId = "openai/gpt-5.4";

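const result = streamText({
  model: ai.modelById(modelId),
  prompt: "...",
  ...ai.generationOptions({
    provider: "gateway",
    modelId,
    reasoning: "medium",
    verbosity: "high",
  }),
});

// streamText returns immediately; consume the stream to read the output
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}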
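Canonical model IDs in this README take the form `provider/model`. As a minimal sketch of what inferring options from the model ID can rely on, assuming the provider is the segment before the first `/`, a hypothetical helper (`providerFromModelId` is not part of `@howells/ai`) might look like this:

```typescript
// Hypothetical helper: split a canonical "provider/model" ID.
// Assumes the provider is everything before the first "/".
function providerFromModelId(modelId: string): { provider: string; model: string } {
  const slash = modelId.indexOf("/");
  if (slash <= 0 || slash === modelId.length - 1) {
    throw new Error(`Not a canonical provider/model ID: ${modelId}`);
  }
  return {
    provider: modelId.slice(0, slash),
    model: modelId.slice(slash + 1), // may itself contain further "/" segments
  };
}

const parsed = providerFromModelId("openai/gpt-5.4");
```

Catalog prefixes do not always match `provider` option values (for example `x-ai/grok-4.3` versus `provider: "xai"`), so a real implementation would still need a lookup table on top of the split.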

Normalized Option	AI SDK / Provider Mapping
`reasoning`	OpenAI `reasoningEffort`, Anthropic `thinking`, Google `thinkingConfig`, OpenRouter `reasoning`. Accepts a preset (`"high"`) or `{ effort, maxTokens }`.
`verbosity`	OpenAI `textVerbosity`
`structured`	OpenAI strict JSON schema, Anthropic structured output mode, Google structured outputs
`tools`	AI SDK `toolChoice`
`maxToolSteps`	AI SDK `stopWhen: stepCountIs(n)`
`parallelTools`	OpenAI/OpenRouter parallel tool calls, Anthropic inverse disable flag
`outputLength`	AI SDK `maxOutputTokens` preset
`creativity`	AI SDK `temperature` preset
`cache`	Anthropic `cacheControl`, OpenRouter `cache_control`. Pass `"ephemeral"` or `{ ttl: "5m" | "1h" }`.
`serviceTier`	OpenAI/Google service tier where supported
`routing`	Gateway `sort/only/order/zeroDataRetention/...`, OpenRouter `provider.{sort, only, ignore, order, allow_fallbacks, max_price, quantizations, zdr, data_collection}`
`fallbackModels`	Gateway `models`, OpenRouter `models` (model fallback chain)
`tags`	Gateway `tags` (spend reporting). Ignored elsewhere.
`webSearch`	OpenRouter `plugins: [{ id: "web", ... }]`. For Gateway, wire `gateway.tools.parallelSearch()` / `perplexitySearch()` via AI SDK `tools`.
`responseHealing`	OpenRouter `plugins: [{ id: "response-healing" }]` (auto-repair JSON for `generateObject`).
`includeCost`	OpenRouter `usage: { include: true }`. Gateway returns cost automatically.
`logprobs` / `logitBias`	OpenRouter only (`logprobs` + `top_logprobs`, `logit_bias`).
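`reasoning` accepts either a preset string or an object. A sketch of collapsing both forms into one shape before any provider mapping, assuming the presets are `"low" | "medium" | "high"` (only `"medium"` and `"high"` appear in this README) and using a hypothetical `normalizeReasoning` helper:

```typescript
// Assumed preset names: "medium" and "high" appear in this README, "low" is a guess.
type ReasoningPreset = "low" | "medium" | "high";
type ReasoningOption = ReasoningPreset | { effort?: ReasoningPreset; maxTokens?: number };

interface NormalizedReasoning {
  effort?: ReasoningPreset;
  maxTokens?: number;
}

// Hypothetical: turn either accepted input form into a single object.
function normalizeReasoning(option: ReasoningOption): NormalizedReasoning {
  if (typeof option === "string") {
    return { effort: option };
  }
  if (
    option.maxTokens !== undefined &&
    (!Number.isInteger(option.maxTokens) || option.maxTokens <= 0)
  ) {
    throw new Error("maxTokens must be a positive integer");
  }
  return { effort: option.effort, maxTokens: option.maxTokens };
}
```

With one normalized shape, an effort-based target such as OpenAI `reasoningEffort` can read `effort`, while a budget-based target such as Anthropic `thinking` can read `maxTokens`.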

Routing & cost

// Cheapest provider, no-retention and no-training only, limited to two upstreams, with a fallback model
await generateText({
  model: ai.modelById("anthropic/claude-sonnet-4.6", { provider: "gateway" }),
  prompt: "...",
  ...ai.generationOptions({
    provider: "gateway",
    modelId: "anthropic/claude-sonnet-4.6",
    routing: {
      prefer: "cheapest",
      privacy: ["no-retention", "no-training"],
      allow: ["anthropic", "amazon-bedrock"],
    },
    fallbackModels: ["anthropic/claude-haiku-4.5"],
    tags: ["feature:checkout"],
  }),
});

routing.prefer accepts "auto", "cheapest", "fastest", or "highest-throughput". routing.privacy accepts any combination of "no-retention", "no-training", and "hipaa". routing.maxCost (OpenRouter only) sets USD price ceilings: { promptPerMillion, completionPerMillion, requestUsd }, where the first two are per million tokens and requestUsd applies per request.
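To make the OpenRouter side of the routing table concrete, here is a sketch that translates the normalized `routing` object into the OpenRouter provider-routing fields listed in the mapping table. `toOpenRouterProvider` is hypothetical, the `"fastest"` to `"latency"` mapping is an assumption, and `"hipaa"` is left unmapped; this is not the package's implementation.

```typescript
// Input shape taken from the routing description above.
type Prefer = "auto" | "cheapest" | "fastest" | "highest-throughput";
type Privacy = "no-retention" | "no-training" | "hipaa";

interface Routing {
  prefer?: Prefer;
  privacy?: Privacy[];
  allow?: string[];
  maxCost?: { promptPerMillion?: number; completionPerMillion?: number; requestUsd?: number };
}

// Subset of OpenRouter's `provider` object (field names from the mapping table).
interface OpenRouterProviderPrefs {
  sort?: "price" | "latency" | "throughput";
  only?: string[];
  zdr?: boolean;
  data_collection?: "allow" | "deny";
  max_price?: { prompt?: number; completion?: number; request?: number };
}

// Assumed mapping; "auto" leaves sorting to OpenRouter.
const SORT: Record<Prefer, OpenRouterProviderPrefs["sort"]> = {
  auto: undefined,
  cheapest: "price",
  fastest: "latency",
  "highest-throughput": "throughput",
};

function toOpenRouterProvider(routing: Routing): OpenRouterProviderPrefs {
  const out: OpenRouterProviderPrefs = {};
  const sort = routing.prefer ? SORT[routing.prefer] : undefined;
  if (sort) out.sort = sort;
  if (routing.allow?.length) out.only = [...routing.allow];
  if (routing.privacy?.includes("no-retention")) out.zdr = true;
  if (routing.privacy?.includes("no-training")) out.data_collection = "deny";
  // "hipaa" has no field in this sketch and is skipped.
  if (routing.maxCost) {
    const { promptPerMillion, completionPerMillion, requestUsd } = routing.maxCost;
    out.max_price = { prompt: promptPerMillion, completion: completionPerMillion, request: requestUsd };
  }
  return out;
}
```

OpenRouter quotes `max_price` prompt and completion ceilings per million tokens, which is why the per-million fields carry across unchanged in this sketch.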

Gateway introspection

When the Gateway provider is configured, ai.gateway exposes the control-plane APIs:

const ai = createAI();
if (ai.gateway) {
  const { balance } = await ai.gateway.credits();
  const { models } = await ai.gateway.listModels();
  const spend = await ai.gateway.spend({
    startDate: "2026-04-01",
    endDate: "2026-04-30",
    groupBy: "model",
  });
  const info = await ai.gateway.generationInfo("gen_01H...");
}

Testing

Normal tests are deterministic and do not call providers:

pnpm test
pnpm check-types
pnpm build

Live tests are opt-in because they use real API keys and spend provider quota. They load keys from .env, .env.local, or apps/benchmark/.env.local, then verify every configured provider/model route plus the normalized config option matrix:

pnpm test:live

CLI

The package ships a small CLI as both ai and howells-ai:

ai models
ai providers
ai doctor
ai doctor --live
ai test --provider openai
ai models --task coding
ai bench --provider gateway --task coding --tier fast --prompt "Reply in one sentence."

Use --json on models, providers, doctor, test, and bench for scriptable output. The CLI loads local keys from .env, .env.local, and apps/benchmark/.env.local, and never prints secret values.

Model Matrix

Language Models (via Vercel AI Gateway by default)

Language models are selected by tier, then capability flags. Structured input/output is a baseline requirement for every default language model.

Tier	Text Default	Tools Default	Vision / Vision Tools Default	Use When
`nano`	`xiaomi/mimo-v2-flash`	`xiaomi/mimo-v2-flash`	`google/gemini-3.1-flash-lite-preview`	Cheap structured output and light vision work
`fast`	`x-ai/grok-4.1-fast`	`x-ai/grok-4.1-fast`	`x-ai/grok-4.1-fast`	Low-latency tool calls, chat, image reads, long context
`standard`	`google/gemini-3-flash-preview`	`google/gemini-3-flash-preview`	`google/gemini-3-flash-preview`	Everyday tasks, chat, coding, vision, 1M context
`powerful`	`x-ai/grok-4.3`	`x-ai/grok-4.3`	`x-ai/grok-4.3`	High-quality synthesis with strong speed/cost balance
`reasoning`	`anthropic/claude-opus-4.7`	`anthropic/claude-opus-4.7`	`anthropic/claude-opus-4.7`	Frontier quality and deep multi-step reasoning

ai.model("fast"); // fast text
ai.model("fast", { tools: true }); // fast tool calling
ai.model("fast", { vision: true }); // fast image understanding
ai.model("fast", { tools: true, vision: true }); // fast image + tools
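The tier/capability selection above can be sketched as a plain lookup over the default matrix. `defaultModelId` is hypothetical and ignores overrides, tasks, and provider pinning; the model IDs are copied from the table.

```typescript
type Tier = "nano" | "fast" | "standard" | "powerful" | "reasoning";
interface TierRow { text: string; tools: string; vision: string }

// Defaults copied from the Model Matrix table.
const MATRIX: Record<Tier, TierRow> = {
  nano: { text: "xiaomi/mimo-v2-flash", tools: "xiaomi/mimo-v2-flash", vision: "google/gemini-3.1-flash-lite-preview" },
  fast: { text: "x-ai/grok-4.1-fast", tools: "x-ai/grok-4.1-fast", vision: "x-ai/grok-4.1-fast" },
  standard: { text: "google/gemini-3-flash-preview", tools: "google/gemini-3-flash-preview", vision: "google/gemini-3-flash-preview" },
  powerful: { text: "x-ai/grok-4.3", tools: "x-ai/grok-4.3", vision: "x-ai/grok-4.3" },
  reasoning: { text: "anthropic/claude-opus-4.7", tools: "anthropic/claude-opus-4.7", vision: "anthropic/claude-opus-4.7" },
};

// Vision takes precedence over tools because the table uses one column
// for both "vision" and "vision + tools".
function defaultModelId(tier: Tier, caps: { tools?: boolean; vision?: boolean } = {}): string {
  const row = MATRIX[tier];
  if (caps.vision) return row.vision;
  if (caps.tools) return row.tools;
  return row.text;
}
```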

Workload Tasks

Pass task when the best model depends on the job more than the generic tier. general preserves the base matrix; other tasks layer RouterBase-informed picks over the same tier/capability shape.

ai.model("fast", { task: "coding", tools: true }); // MiniMax M2.5
ai.model("standard", { task: "coding" }); // GLM 5
ai.model("fast", { task: "agentic", tools: true }); // Grok 4.1 Fast
ai.model("standard", { task: "vision", vision: true }); // Gemini 3 Flash Preview
ai.model("standard", { task: "longContext" }); // Grok 4.1 Fast

Available tasks: general, coding, agentic, chat, bulk, vision, reasoning, longContext, and creative.

When you pin a provider, task selection stays inside that provider wherever the provider has coverage. For example, provider: "openai", task: "coding" routes to OpenAI's Codex line, while provider: "zai", task: "vision" routes to GLM's vision model instead of falling back to the global winner from another provider. If a requested capability is incompatible with the resolved model, selection throws before any provider call. For example, provider: "deepseek", vision: true fails locally because DeepSeek's selected models are not vision-capable.
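A minimal sketch of the two ideas in this section, a task overlay that falls back to the base pick and a local capability check that fails before any request. `pickForTask`, `assertCapabilities`, and both tables are hypothetical; the coding IDs correspond to the MiniMax M2.5 and GLM 5 examples, the DeepSeek capabilities come from the `ai.modelCapabilities` example later in this README, and provider pinning is omitted.

```typescript
type Capability = "structured" | "tools" | "vision";
type Caps = Record<Capability, boolean>;

// Only entry taken from this README's ai.modelCapabilities example.
const KNOWN_CAPS: Record<string, Caps> = {
  "deepseek/deepseek-v3.2": { structured: true, tools: true, vision: false },
};

// Per task, per tier overrides; anything missing falls through to the base pick.
const TASK_OVERLAY: Record<string, Record<string, string>> = {
  coding: { fast: "minimax/minimax-m2.5", standard: "z-ai/glm-5" },
};

function pickForTask(baseId: string, tier: string, task = "general"): string {
  // "general" has no overlay, so it preserves the base matrix.
  return TASK_OVERLAY[task]?.[tier] ?? baseId;
}

// Throw locally when a requested capability is missing.
function assertCapabilities(modelId: string, requested: Capability[]): void {
  const caps = KNOWN_CAPS[modelId];
  if (!caps) return; // unknown models are not checked in this sketch
  const missing = requested.filter((c) => !caps[c]);
  if (missing.length > 0) {
    throw new Error(`${modelId} does not support: ${missing.join(", ")}`);
  }
}
```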

Retrieval Models

Slot	Voyage Default	Gemini Default	Use When
`embed`	`voyage-3`	`gemini-embedding-2-preview`	Text embeddings
`multimodalEmbed`	`voyage-multimodal-3.5`	`gemini-embedding-2-preview`	Text + image embeddings
`rerank`	`rerank-2.5`	n/a	Search result reranking

Overriding Models

Override any tier variant or retrieval model per project:

import {
  ANTHROPIC_MODELS,
  createAI,
  GOOGLE_EMBED_MODELS,
  VOYAGE_MODELS,
} from "@howells/ai";

const ai = createAI({
  app: { name: "Sorrel", url: "https://sorrel.app" },
  models: {
    standard: {
      text: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
      tools: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
    },
    tasks: {
      coding: {
        standard: {
          text: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
        },
      },
    },
    embed: { voyage: VOYAGE_MODELS.VOYAGE_3_LITE },
    rerank: VOYAGE_MODELS.RERANK_2_5_LITE,
  },
});

Embedding slots are provider-aware. Configure embed and multimodalEmbed once, then select the provider at the call site:

const ai = createAI({
  models: {
    embed: {
      voyage: VOYAGE_MODELS.VOYAGE_3,
      gemini: GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2,
    },
    multimodalEmbed: {
      voyage: VOYAGE_MODELS.MULTIMODAL_3_5,
      gemini: GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2,
    },
  },
});
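The provider-aware slots can be pictured as a two-level lookup: the input type picks the slot, the provider picks the column. A sketch using the defaults from the Retrieval Models table; `embeddingModelId` and the shallow per-slot override merge are assumptions, not the package's code.

```typescript
type EmbedProvider = "voyage" | "gemini";
type EmbedInput = "text" | "image";
type Slot = Record<EmbedProvider, string>;

// Defaults from the Retrieval Models table.
const SLOTS: { embed: Slot; multimodalEmbed: Slot } = {
  embed: { voyage: "voyage-3", gemini: "gemini-embedding-2-preview" },
  multimodalEmbed: { voyage: "voyage-multimodal-3.5", gemini: "gemini-embedding-2-preview" },
};

// "text" reads the embed slot, "image" reads multimodalEmbed;
// overrides are merged per slot, so unspecified providers keep their defaults.
function embeddingModelId(
  input: EmbedInput,
  provider: EmbedProvider,
  overrides: { embed?: Partial<Slot>; multimodalEmbed?: Partial<Slot> } = {},
): string {
  const slotName = input === "text" ? "embed" : "multimodalEmbed";
  const slot = { ...SLOTS[slotName], ...overrides[slotName] };
  return slot[provider];
}
```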

Embeddings

import { embed, embedMany } from "ai";

// Provider-neutral text embeddings
const { embedding } = await embed({
  model: ai.embeddingModel({ input: "text", provider: "voyage" }),
  value: "some text",
});

// Provider-neutral image or image+text embeddings.
// Switch to { provider: "gemini" } without changing the call site shape.
const imageModel = ai.embeddingModel({ input: "image", provider: "voyage" });

// Google Gemini text embeddings (for benchmarking)
const { embedding: g } = await embed({
  model: ai.embeddingModel({ input: "text", provider: "gemini" }),
  value: "some text",
});

// Google Gemini image+text embeddings
const { embedding: imageEmbedding } = await embed({
  model: ai.embeddingModel({ input: "image", provider: "gemini" }),
  value: "green woven upholstery",
  providerOptions: {
    google: {
      content: [
        [{ inlineData: { mimeType: "image/png", data: "<base64>" } }],
      ],
    },
  },
});

// Batch
const { embeddings } = await embedMany({
  model: ai.embeddingModel({ provider: "voyage" }),
  values: ["text one", "text two", "text three"],
});

Reranking

const reranker = ai.rerankModel();
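A reranker scores each candidate document against the query, and the caller reorders results by that score. The provider-independent last step can be sketched as below, assuming the scores have already come back as `{ index, relevanceScore }` pairs; `applyRerank` and that score shape are hypothetical.

```typescript
interface Scored {
  index: number;          // position of the document in the original list
  relevanceScore: number; // higher means more relevant
}

// Reorder the original documents by score, highest first, keeping topN.
function applyRerank<T>(documents: T[], scores: Scored[], topN = documents.length): T[] {
  return [...scores]
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, topN)
    .map((s) => {
      const doc = documents[s.index];
      if (doc === undefined) throw new Error(`score refers to missing document ${s.index}`);
      return doc;
    });
}
```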

Non-AI-SDK Runtimes

Some frameworks accept config objects instead of AI SDK models:

const model = ai.modelConfig("deepseek/deepseek-v3.2", {
  provider: "openrouter",
  agent: "materials-agent",
});
// { provider, id, service, capabilities, apiKey, serviceApiKey, baseURL, headers, user }

The capabilities field describes which config fields the selected provider can consume, so callers can pass through the useful fields without branching on one provider-specific helper.

Provider	API Key	Base URL	Headers	App Attribution	Agent Attribution
`gateway`	yes	no	no	no	no
`openrouter`	yes	yes	yes	yes	yes
`anthropic`	yes	no	no	no	no
`openai`	yes	no	no	no	no
`google`	yes	no	no	no	no
`deepseek`	yes	yes	no	no	no
`xai`	yes	yes	no	no	no
`qwen`	yes	yes	no	no	no
`zai`	yes	yes	no	no	no
`moonshotai`	yes	yes	no	no	no
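Following the table, a caller can copy only the supported fields into a client without branching on provider names. A sketch assuming `capabilities` holds boolean flags mirroring the API Key, Base URL, and Headers columns; `ModelConfigLike`, `clientOptions`, and the flag names are hypothetical and may differ from what `ai.modelConfig()` actually returns.

```typescript
// Hypothetical flag names mirroring the table columns.
interface ConfigCapabilities {
  apiKey: boolean;
  baseURL: boolean;
  headers: boolean;
}

interface ModelConfigLike {
  provider: string;
  id: string;
  capabilities: ConfigCapabilities;
  apiKey?: string;
  baseURL?: string;
  headers?: Record<string, string>;
}

type ClientOptions = Pick<ModelConfigLike, "apiKey" | "baseURL" | "headers">;

// Copy a field only when the provider can consume it and a value is present.
function clientOptions(config: ModelConfigLike): ClientOptions {
  const out: ClientOptions = {};
  if (config.capabilities.apiKey && config.apiKey) out.apiKey = config.apiKey;
  if (config.capabilities.baseURL && config.baseURL) out.baseURL = config.baseURL;
  if (config.capabilities.headers && config.headers) out.headers = { ...config.headers };
  return out;
}
```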

For OpenRouter direct HTTP clients, request an OpenRouter model config and pass user in the request body:

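const config = ai.modelConfig("deepseek/deepseek-v3.2", {
  provider: "openrouter",
  agent: "nl-search",
});

const messages = [{ role: "user", content: "..." }];

const res = await fetch(`${config.baseURL}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${config.apiKey}`,
    ...config.headers,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "deepseek/deepseek-v3.2",
    messages,
    user: config.user, // per-agent attribution
  }),
});
if (!res.ok) throw new Error(`OpenRouter request failed: ${res.status}`);
const data = await res.json();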

Escape Hatch

For models that don't fit any tier:

const { text } = await generateText({
  model: ai.modelById("openai/gpt-5-nano"),
  prompt: "...",
});

Route through OpenRouter or direct providers when needed:

ai.model("standard", { provider: "openrouter" });
ai.modelById("claude-sonnet-4-6", { provider: "anthropic" });
ai.modelById("x-ai/grok-4.3", { provider: "xai" });
ai.modelById("moonshotai/kimi-k2.6", { provider: "moonshotai" });

Constants use normalized package IDs. createAI() translates known provider mismatches at runtime, such as Anthropic's direct 4-6 IDs, OpenRouter and Google direct google/gemini-3-flash-preview IDs for Gemini 3 Flash, Gateway's xai/grok-4.1-fast-non-reasoning, and Alibaba-hosted Qwen IDs. DeepSeek, xAI, Qwen, Z.ai, and Moonshot/Kimi are direct OpenAI-compatible routes when their keys are configured. Other catalog services such as MiniMax, StepFun, Xiaomi, Inception, and Nex AGI route through Gateway or OpenRouter.
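The translation step can be sketched as explicit exceptions first, then simple rules. The Gateway entry and the Anthropic rule are inferred from the examples in this README (`x-ai/grok-4.1-fast` versus Gateway's `xai/grok-4.1-fast-non-reasoning`, and `anthropic/claude-sonnet-4.6` versus the direct `claude-sonnet-4-6`); generalizing the Anthropic example into a dots-to-hyphens rule is an assumption, and `toRouteModelId` is hypothetical.

```typescript
type Route = "gateway" | "openrouter" | "anthropic";

// Explicit exceptions take priority over any rule.
const EXPLICIT: Partial<Record<Route, Record<string, string>>> = {
  gateway: { "x-ai/grok-4.1-fast": "xai/grok-4.1-fast-non-reasoning" },
};

function toRouteModelId(normalizedId: string, route: Route): string {
  const explicit = EXPLICIT[route]?.[normalizedId];
  if (explicit) return explicit;
  if (route === "anthropic" && normalizedId.startsWith("anthropic/")) {
    // Assumed rule: drop the prefix and use hyphens for version dots.
    return normalizedId.slice("anthropic/".length).replace(/\./g, "-");
  }
  return normalizedId; // no known mismatch: pass through unchanged
}
```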

Agent Attribution

Tag OpenRouter requests for per-agent cost tracking:

ai.model("fast", { agent: "search", provider: "openrouter" });
// Sends user tag when provider is "openrouter"

Model Constants

import {
  ANTHROPIC_MODELS,
  DEEPSEEK_MODELS,
  GLM_MODELS,
  GOOGLE_EMBED_MODELS,
  GOOGLE_MODELS,
  INCEPTION_MODELS,
  KIMI_MODELS,
  MINIMAX_MODELS,
  NEX_AGI_MODELS,
  OPENAI_MODELS,
  PROVIDER_TASK_DEFAULT_MODELS,
  QWEN_MODELS,
  STEPFUN_MODELS,
  VOYAGE_MODELS,
  XAI_MODELS,
  XIAOMI_MODELS,
} from "@howells/ai";

// Anthropic
ANTHROPIC_MODELS.CLAUDE_OPUS_4_7        // "anthropic/claude-opus-4.7"
ANTHROPIC_MODELS.CLAUDE_OPUS_4_6        // "anthropic/claude-opus-4.6"
ANTHROPIC_MODELS.CLAUDE_SONNET_4_6      // "anthropic/claude-sonnet-4.6"

// DeepSeek
DEEPSEEK_MODELS.DEEPSEEK_V3_2           // "deepseek/deepseek-v3.2"
DEEPSEEK_MODELS.DEEPSEEK_V4_FLASH       // "deepseek/deepseek-v4-flash"

// GLM / Z.ai
GLM_MODELS.GLM_5                        // "z-ai/glm-5"
GLM_MODELS.GLM_5V_TURBO                 // "z-ai/glm-5v-turbo"
GLM_MODELS.GLM_4_7                      // "z-ai/glm-4.7"
GLM_MODELS.GLM_4_7_FLASH                // "z-ai/glm-4.7-flash"
GLM_MODELS.GLM_4_6V                     // "z-ai/glm-4.6v"

// Kimi / Moonshot
KIMI_MODELS.KIMI_K2_6                   // "moonshotai/kimi-k2.6"
KIMI_MODELS.KIMI_K2_5                   // "moonshotai/kimi-k2.5"
KIMI_MODELS.KIMI_K2_THINKING            // "moonshotai/kimi-k2-thinking"

// Google language models
GOOGLE_MODELS.GEMINI_3_FLASH_PREVIEW    // "google/gemini-3-flash-preview"
GOOGLE_MODELS.GEMINI_3_1_PRO_PREVIEW    // "google/gemini-3.1-pro-preview"
GOOGLE_MODELS.GEMINI_3_1_FLASH_LITE_PREVIEW

// OpenAI
OPENAI_MODELS.GPT_5_4_NANO              // "openai/gpt-5.4-nano"
OPENAI_MODELS.GPT_5_4                   // "openai/gpt-5.4"
OPENAI_MODELS.GPT_5_3_CODEX             // "openai/gpt-5.3-codex"

// Qwen
QWEN_MODELS.QWEN_3_235B_A22B_2507       // "qwen/qwen3-235b-a22b-2507"
QWEN_MODELS.QWEN_3_NEXT_80B_A3B_INSTRUCT_FREE
QWEN_MODELS.QWEN_3_6_PLUS               // "qwen/qwen3.6-plus"

// xAI
XAI_MODELS.GROK_4_1_FAST                // "x-ai/grok-4.1-fast"
XAI_MODELS.GROK_4_3                     // "x-ai/grok-4.3"

// Gateway/OpenRouter-only services
MINIMAX_MODELS.MINIMAX_M2_7             // "minimax/minimax-m2.7"
MINIMAX_MODELS.MINIMAX_M2_5             // "minimax/minimax-m2.5"
STEPFUN_MODELS.STEP_3_5_FLASH           // "stepfun/step-3.5-flash"
XIAOMI_MODELS.MIMO_V2_FLASH             // "xiaomi/mimo-v2-flash"
INCEPTION_MODELS.MERCURY_2              // "inception/mercury-2"
NEX_AGI_MODELS.DEEPSEEK_V3_1_NEX_N1     // "nex-agi/deepseek-v3.1-nex-n1"

// Provider-pinned task matrix
PROVIDER_TASK_DEFAULT_MODELS.openai?.coding?.standard?.text
// "openai/gpt-5.3-codex"

ai.modelCapabilities({ modelId: "deepseek/deepseek-v3.2" })
// { structured: true, tools: true, vision: false }

// Voyage
VOYAGE_MODELS.VOYAGE_3            // "voyage-3"
VOYAGE_MODELS.VOYAGE_3_LITE       // "voyage-3-lite"
VOYAGE_MODELS.VOYAGE_3_5          // "voyage-3.5"
VOYAGE_MODELS.VOYAGE_3_5_LITE     // "voyage-3.5-lite"
VOYAGE_MODELS.MULTIMODAL_3        // "voyage-multimodal-3"
VOYAGE_MODELS.MULTIMODAL_3_5      // "voyage-multimodal-3.5"
VOYAGE_MODELS.RERANK_2_5          // "rerank-2.5"
VOYAGE_MODELS.RERANK_2_5_LITE     // "rerank-2.5-lite"

// Google
GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2  // "gemini-embedding-2-preview"
GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_1  // "gemini-embedding-001"

Environment Variables

Variable	Required	Used By
`AI_GATEWAY_API_KEY`	Yes locally for default language models	Vercel AI Gateway
`OPENROUTER_API_KEY`	Only if using `provider: "openrouter"`	OpenRouter provider
`ANTHROPIC_API_KEY`	Only if using `provider: "anthropic"`	Anthropic provider
`OPENAI_API_KEY`	Only if using `provider: "openai"`	OpenAI provider
`VOYAGE_API_KEY`	Yes (for embed/rerank)	Voyage provider
`GOOGLE_GEMINI_API_KEY`	Only if using Gemini embeddings or `provider: "google"`	Google provider
`DEEPSEEK_API_KEY`	Only if using `provider: "deepseek"`	DeepSeek direct provider
`XAI_API_KEY`	Only if using `provider: "xai"`	xAI direct provider
`QWEN_API_KEY`	Only if using `provider: "qwen"`	Qwen direct provider
`ZAI_API_KEY`	Only if using `provider: "zai"`	Z.ai / GLM direct provider
`MOONSHOT_API_KEY`	Only if using `provider: "moonshotai"`	Moonshot / Kimi direct provider

Keys can also be passed directly to createAI():

const ai = createAI({
  gatewayKey: "vck_...",
  openRouterKey: "sk-or-...",
  voyageKey: "pa-...",
  googleKey: "...",
  xaiKey: "...",
  moonshotKey: "...",
  serviceKeys: {
    zai: "...",
    qwen: "...",
  },
});

Service keys are exposed through ai.availableServices and ai.modelConfig() for runtimes that can use provider-specific credentials. The same keys also enable direct OpenAI-compatible AI SDK routes for DeepSeek, xAI, Qwen, Z.ai, and Moonshot/Kimi.
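A sketch of deriving a list of available services from explicit keys and the environment variables above, reporting presence only so no secret value is ever returned (in the spirit of the CLI's `doctor` output). `configuredServices` and the service labels are hypothetical; the environment variable names come from the table.

```typescript
// Environment variable names from the table above.
const KEY_ENV: Record<string, string> = {
  gateway: "AI_GATEWAY_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  voyage: "VOYAGE_API_KEY",
  google: "GOOGLE_GEMINI_API_KEY",
  deepseek: "DEEPSEEK_API_KEY",
  xai: "XAI_API_KEY",
  qwen: "QWEN_API_KEY",
  zai: "ZAI_API_KEY",
  moonshotai: "MOONSHOT_API_KEY",
};

// A service counts as configured when an explicit key or its env var is non-empty.
// Only names are returned; key values never leave this function.
function configuredServices(
  env: Record<string, string | undefined>,
  explicit: Record<string, string | undefined> = {},
): string[] {
  return Object.entries(KEY_ENV)
    .filter(([service, envName]) => Boolean(explicit[service] || env[envName]))
    .map(([service]) => service);
}
```

In Node you would pass `process.env` as the first argument.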

Architecture

Each createAI() returns an independent client (no shared module state)
Providers are lazy-initialized on first use
Safe for tests and multi-config scenarios
Language models route through Vercel AI Gateway by default
OpenRouter and direct provider routes are available per call
Embeddings/reranking through Voyage AI or Google
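The first two points (independent clients, lazy providers) amount to a per-instance cache filled on first use. A minimal sketch of that pattern; `createClient` and the factory map are hypothetical and not the package's internals.

```typescript
// Each client owns its own cache, so nothing is shared at module level.
function createClient<P>(factories: Record<string, () => P>) {
  const cache = new Map<string, P>();
  return {
    provider(name: string): P {
      if (cache.has(name)) return cache.get(name) as P;
      const factory = factories[name];
      if (!factory) throw new Error(`Unknown provider: ${name}`);
      const created = factory(); // runs only on first use per client
      cache.set(name, created);
      return created;
    },
  };
}
```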
