Skip to content

howells/ai

Repository files navigation

@howells/ai

Unified AI client for all projects. One package, Vercel AI Gateway by default, direct provider escape hatches, provider-aware model tiers, and normalized generation settings.

Quick Start

import { createAI } from "@howells/ai";
import { generateText, Output, streamText, embed } from "ai";

const ai = createAI({
  app: { name: "MyApp", url: "https://myapp.com" },
});

// Pick a model by tier
const { text } = await generateText({
  model: ai.model("fast"),
  prompt: "Classify this ingredient",
});

// Add capabilities per tier
const { text: analysis } = await generateText({
  model: ai.model("powerful", {
    agent: "taste-analysis",
    tools: true,
    vision: true,
  }),
  prompt: "Analyze this design",
});

// Structured output
const { output } = await generateText({
  model: ai.model("standard", { agent: "search" }),
  output: Output.object({ schema: myZodSchema }),
  prompt: "Extract entities from this text",
});

Generation Options

Use ai.generationOptions(...) for the settings that vary across providers: reasoning budget, verbosity, structured-output provider behavior, tool policy, response length, sampling, prompt cache, user attribution, and service tier.

const provider = "openai";

const { text } = await generateText({
  model: ai.model("powerful", { provider, tools: true }),
  prompt: "Plan the migration",
  tools: migrationTools,
  ...ai.generationOptions({
    provider,
    reasoning: "high",
    verbosity: "medium",
    structured: "strict",
    tools: "auto",
    maxToolSteps: 5,
    outputLength: "long",
    creativity: "focused",
    user: "migration-agent",
  }),
});

For Gateway calls, pass the canonical model ID when you want provider-specific options inferred as well as Gateway attribution:

const modelId = "openai/gpt-5.4";

await streamText({
  model: ai.modelById(modelId),
  prompt: "...",
  ...ai.generationOptions({
    provider: "gateway",
    modelId,
    reasoning: "medium",
    verbosity: "high",
  }),
});
Normalized Option AI SDK / Provider Mapping
reasoning OpenAI reasoningEffort, Anthropic thinking, Google thinkingConfig, OpenRouter reasoning. Accepts a preset ("high") or { effort, maxTokens }.
verbosity OpenAI textVerbosity
structured OpenAI strict JSON schema, Anthropic structured output mode, Google structured outputs
tools AI SDK toolChoice
maxToolSteps AI SDK stopWhen: stepCountIs(n)
parallelTools OpenAI/OpenRouter parallel tool calls, Anthropic inverse disable flag
outputLength AI SDK maxOutputTokens preset
creativity AI SDK temperature preset
cache Anthropic cacheControl, OpenRouter cache_control. Pass "ephemeral" or { ttl: "5m" | "1h" }.
serviceTier OpenAI/Google service tier where supported
routing Gateway sort/only/order/zeroDataRetention/..., OpenRouter provider.{sort, only, ignore, order, allow_fallbacks, max_price, quantizations, zdr, data_collection}
fallbackModels Gateway models, OpenRouter models (model fallback chain)
tags Gateway tags (spend reporting). Ignored elsewhere.
webSearch OpenRouter plugins: [{ id: "web", ... }]. For Gateway, wire gateway.tools.parallelSearch() / perplexitySearch() via AI SDK tools.
responseHealing OpenRouter plugins: [{ id: "response-healing" }] (auto-repair JSON for generateObject).
includeCost OpenRouter usage: { include: true }. Gateway returns cost automatically.
logprobs / logitBias OpenRouter only (logprobs + top_logprobs, logit_bias).

Routing & cost

// Cheapest provider, ZDR-only, with a price ceiling and fallback model
await generateText({
  model: ai.modelById("anthropic/claude-sonnet-4.6", { provider: "gateway" }),
  prompt: "...",
  ...ai.generationOptions({
    provider: "gateway",
    modelId: "anthropic/claude-sonnet-4.6",
    routing: {
      prefer: "cheapest",
      privacy: ["no-retention", "no-training"],
      allow: ["anthropic", "amazon-bedrock"],
    },
    fallbackModels: ["anthropic/claude-haiku-4.5"],
    tags: ["feature:checkout"],
  }),
});

routing.prefer accepts "auto", "cheapest", "fastest", or "highest-throughput". routing.privacy accepts any combination of "no-retention", "no-training", "hipaa". routing.maxCost (OpenRouter only) takes USD-per-million-token ceilings: { promptPerMillion, completionPerMillion, requestUsd }.

Gateway introspection

When the Gateway provider is configured, ai.gateway exposes the control-plane APIs:

const ai = createAI();
if (ai.gateway) {
  const { balance } = await ai.gateway.credits();
  const { models } = await ai.gateway.listModels();
  const spend = await ai.gateway.spend({
    startDate: "2026-04-01",
    endDate: "2026-04-30",
    groupBy: "model",
  });
  const info = await ai.gateway.generationInfo("gen_01H...");
}

Testing

Normal tests are deterministic and do not call providers:

pnpm test
pnpm check-types
pnpm build

Live tests are opt-in because they use real API keys and spend provider quota. They load keys from .env, .env.local, or apps/benchmark/.env.local, then verify every configured provider/model route plus the normalized config option matrix:

pnpm test:live

CLI

The package ships a small CLI as both ai and howells-ai:

ai models
ai providers
ai doctor
ai doctor --live
ai test --provider openai
ai models --task coding
ai bench --provider gateway --task coding --tier fast --prompt "Reply in one sentence."

Use --json on models, providers, doctor, test, and bench for scriptable output. The CLI loads local keys from .env, .env.local, and apps/benchmark/.env.local, and never prints secret values.

Model Matrix

Language Models (via Vercel AI Gateway by default)

Language models are selected by tier, then capability flags. Structured input/output is a baseline requirement for every default language model.

Tier Text Default Tools Default Vision / Vision Tools Default Use When
nano xiaomi/mimo-v2-flash xiaomi/mimo-v2-flash google/gemini-3.1-flash-lite-preview Cheap structured output and light vision work
fast x-ai/grok-4.1-fast x-ai/grok-4.1-fast x-ai/grok-4.1-fast Low-latency tool calls, chat, image reads, long context
standard google/gemini-3-flash-preview google/gemini-3-flash-preview google/gemini-3-flash-preview Everyday tasks, chat, coding, vision, 1M context
powerful x-ai/grok-4.3 x-ai/grok-4.3 x-ai/grok-4.3 High-quality synthesis with strong speed/cost balance
reasoning anthropic/claude-opus-4.7 anthropic/claude-opus-4.7 anthropic/claude-opus-4.7 Frontier quality and deep multi-step reasoning
ai.model("fast"); // fast text
ai.model("fast", { tools: true }); // fast tool calling
ai.model("fast", { vision: true }); // fast image understanding
ai.model("fast", { tools: true, vision: true }); // fast image + tools

Workload Tasks

Pass task when the best model depends on the job more than the generic tier. general preserves the base matrix; other tasks layer RouterBase-informed picks over the same tier/capability shape.

ai.model("fast", { task: "coding", tools: true }); // MiniMax M2.5
ai.model("standard", { task: "coding" }); // GLM 5
ai.model("fast", { task: "agentic", tools: true }); // Grok 4.1 Fast
ai.model("standard", { task: "vision", vision: true }); // Gemini 3 Flash Preview
ai.model("standard", { task: "longContext" }); // Grok 4.1 Fast

Available tasks: general, coding, agentic, chat, bulk, vision, reasoning, longContext, and creative.

When you pin a provider, task selection stays inside that provider wherever the provider has coverage. For example, provider: "openai", task: "coding" routes to OpenAI's Codex line, while provider: "zai", task: "vision" routes to GLM's vision model instead of falling back to the global winner from another provider. If a requested capability is incompatible with the resolved model, selection throws before any provider call. For example, provider: "deepseek", vision: true fails locally because DeepSeek's selected models are not vision-capable.

Retrieval Models

Slot Voyage Default Gemini Default Use When
embed voyage-3 gemini-embedding-2-preview Text embeddings
multimodalEmbed voyage-multimodal-3.5 gemini-embedding-2-preview Text + image embeddings
rerank rerank-2.5 n/a Search result reranking

Overriding Models

Override any tier variant or retrieval model per project:

import {
  ANTHROPIC_MODELS,
  createAI,
  GOOGLE_EMBED_MODELS,
  VOYAGE_MODELS,
} from "@howells/ai";

const ai = createAI({
  app: { name: "Sorrel", url: "https://sorrel.app" },
  models: {
    standard: {
      text: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
      tools: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
    },
    tasks: {
      coding: {
        standard: {
          text: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
        },
      },
    },
    embed: { voyage: VOYAGE_MODELS.VOYAGE_3_LITE },
    rerank: VOYAGE_MODELS.RERANK_2_5_LITE,
  },
});

Embedding slots are provider-aware. Configure embed and multimodalEmbed once, then select the provider at the call site:

const ai = createAI({
  models: {
    embed: {
      voyage: VOYAGE_MODELS.VOYAGE_3,
      gemini: GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2,
    },
    multimodalEmbed: {
      voyage: VOYAGE_MODELS.MULTIMODAL_3_5,
      gemini: GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2,
    },
  },
});

Embeddings

import { embed, embedMany } from "ai";

// Provider-neutral text embeddings
const { embedding } = await embed({
  model: ai.embeddingModel({ input: "text", provider: "voyage" }),
  value: "some text",
});

// Provider-neutral image or image+text embeddings.
// Switch to { provider: "gemini" } without changing the call site shape.
const imageModel = ai.embeddingModel({ input: "image", provider: "voyage" });

// Google Gemini text embeddings (for benchmarking)
const { embedding: g } = await embed({
  model: ai.embeddingModel({ input: "text", provider: "gemini" }),
  value: "some text",
});

// Google Gemini image+text embeddings
const { embedding: imageEmbedding } = await embed({
  model: ai.embeddingModel({ input: "image", provider: "gemini" }),
  value: "green woven upholstery",
  providerOptions: {
    google: {
      content: [
        [{ inlineData: { mimeType: "image/png", data: "<base64>" } }],
      ],
    },
  },
});

// Batch
const { embeddings } = await embedMany({
  model: ai.embeddingModel({ provider: "voyage" }),
  values: ["text one", "text two", "text three"],
});

Reranking

const reranker = ai.rerankModel();

Non-AI-SDK Runtimes

Some frameworks accept config objects instead of AI SDK models:

const model = ai.modelConfig("deepseek/deepseek-v3.2", {
  provider: "openrouter",
  agent: "materials-agent",
});
// { provider, id, service, capabilities, apiKey, serviceApiKey, baseURL, headers, user }

The capabilities field describes which config fields the selected provider can consume, so callers can pass through the useful fields without branching on one provider-specific helper.

Provider API Key Base URL Headers App Attribution Agent Attribution
gateway yes no no no no
openrouter yes yes yes yes yes
anthropic yes no no no no
openai yes no no no no
google yes no no no no
deepseek yes yes no no no
xai yes yes no no no
qwen yes yes no no no
zai yes yes no no no
moonshotai yes yes no no no

For OpenRouter direct HTTP clients, request an OpenRouter model config and pass user in the request body:

const config = ai.modelConfig("deepseek/deepseek-v3.2", {
  provider: "openrouter",
  agent: "nl-search",
});
await fetch(`${config.baseURL}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${config.apiKey}`,
    ...config.headers,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "deepseek/deepseek-v3.2",
    messages,
    user: config.user,
  }),
});

Escape Hatch

For models that don't fit any tier:

const { text } = await generateText({
  model: ai.modelById("openai/gpt-5-nano"),
  prompt: "...",
});

Route through OpenRouter or direct providers when needed:

ai.model("standard", { provider: "openrouter" });
ai.modelById("claude-sonnet-4-6", { provider: "anthropic" });
ai.modelById("x-ai/grok-4.3", { provider: "xai" });
ai.modelById("moonshotai/kimi-k2.6", { provider: "moonshotai" });

Constants use normalized package IDs. createAI() translates known provider mismatches at runtime, such as Anthropic's direct 4-6 IDs, OpenRouter and Google direct google/gemini-3-flash-preview IDs for Gemini 3 Flash, Gateway's xai/grok-4.1-fast-non-reasoning, and Alibaba-hosted Qwen IDs. DeepSeek, xAI, Qwen, Z.ai, and Moonshot/Kimi are direct OpenAI-compatible routes when their keys are configured. Other catalog services such as MiniMax, StepFun, Xiaomi, Inception, and Nex AGI route through Gateway or OpenRouter.

Agent Attribution

Tag OpenRouter requests for per-agent cost tracking:

ai.model("fast", { agent: "search", provider: "openrouter" });
// Sends user tag when provider is "openrouter"

Model Constants

import {
  ANTHROPIC_MODELS,
  DEEPSEEK_MODELS,
  GLM_MODELS,
  GOOGLE_EMBED_MODELS,
  GOOGLE_MODELS,
  INCEPTION_MODELS,
  KIMI_MODELS,
  MINIMAX_MODELS,
  NEX_AGI_MODELS,
  OPENAI_MODELS,
  PROVIDER_TASK_DEFAULT_MODELS,
  QWEN_MODELS,
  STEPFUN_MODELS,
  VOYAGE_MODELS,
  XAI_MODELS,
  XIAOMI_MODELS,
} from "@howells/ai";

// Anthropic
ANTHROPIC_MODELS.CLAUDE_OPUS_4_7        // "anthropic/claude-opus-4.7"
ANTHROPIC_MODELS.CLAUDE_OPUS_4_6        // "anthropic/claude-opus-4.6"
ANTHROPIC_MODELS.CLAUDE_SONNET_4_6      // "anthropic/claude-sonnet-4.6"

// DeepSeek
DEEPSEEK_MODELS.DEEPSEEK_V3_2           // "deepseek/deepseek-v3.2"
DEEPSEEK_MODELS.DEEPSEEK_V4_FLASH       // "deepseek/deepseek-v4-flash"

// GLM / Z.ai
GLM_MODELS.GLM_5                        // "z-ai/glm-5"
GLM_MODELS.GLM_5V_TURBO                 // "z-ai/glm-5v-turbo"
GLM_MODELS.GLM_4_7                      // "z-ai/glm-4.7"
GLM_MODELS.GLM_4_7_FLASH                // "z-ai/glm-4.7-flash"
GLM_MODELS.GLM_4_6V                     // "z-ai/glm-4.6v"

// Kimi / Moonshot
KIMI_MODELS.KIMI_K2_6                   // "moonshotai/kimi-k2.6"
KIMI_MODELS.KIMI_K2_5                   // "moonshotai/kimi-k2.5"
KIMI_MODELS.KIMI_K2_THINKING            // "moonshotai/kimi-k2-thinking"

// Google language models
GOOGLE_MODELS.GEMINI_3_FLASH_PREVIEW    // "google/gemini-3-flash-preview"
GOOGLE_MODELS.GEMINI_3_1_PRO_PREVIEW    // "google/gemini-3.1-pro-preview"
GOOGLE_MODELS.GEMINI_3_1_FLASH_LITE_PREVIEW

// OpenAI
OPENAI_MODELS.GPT_5_4_NANO              // "openai/gpt-5.4-nano"
OPENAI_MODELS.GPT_5_4                   // "openai/gpt-5.4"
OPENAI_MODELS.GPT_5_3_CODEX             // "openai/gpt-5.3-codex"

// Qwen
QWEN_MODELS.QWEN_3_235B_A22B_2507       // "qwen/qwen3-235b-a22b-2507"
QWEN_MODELS.QWEN_3_NEXT_80B_A3B_INSTRUCT_FREE
QWEN_MODELS.QWEN_3_6_PLUS               // "qwen/qwen3.6-plus"

// xAI
XAI_MODELS.GROK_4_1_FAST                // "x-ai/grok-4.1-fast"
XAI_MODELS.GROK_4_3                     // "x-ai/grok-4.3"

// Gateway/OpenRouter-only services
MINIMAX_MODELS.MINIMAX_M2_7             // "minimax/minimax-m2.7"
MINIMAX_MODELS.MINIMAX_M2_5             // "minimax/minimax-m2.5"
STEPFUN_MODELS.STEP_3_5_FLASH           // "stepfun/step-3.5-flash"
XIAOMI_MODELS.MIMO_V2_FLASH             // "xiaomi/mimo-v2-flash"
INCEPTION_MODELS.MERCURY_2              // "inception/mercury-2"
NEX_AGI_MODELS.DEEPSEEK_V3_1_NEX_N1     // "nex-agi/deepseek-v3.1-nex-n1"

// Provider-pinned task matrix
PROVIDER_TASK_DEFAULT_MODELS.openai?.coding?.standard?.text
// "openai/gpt-5.3-codex"

ai.modelCapabilities({ modelId: "deepseek/deepseek-v3.2" })
// { structured: true, tools: true, vision: false }

// Voyage
VOYAGE_MODELS.VOYAGE_3            // "voyage-3"
VOYAGE_MODELS.VOYAGE_3_LITE       // "voyage-3-lite"
VOYAGE_MODELS.VOYAGE_3_5          // "voyage-3.5"
VOYAGE_MODELS.VOYAGE_3_5_LITE     // "voyage-3.5-lite"
VOYAGE_MODELS.MULTIMODAL_3        // "voyage-multimodal-3"
VOYAGE_MODELS.MULTIMODAL_3_5      // "voyage-multimodal-3.5"
VOYAGE_MODELS.RERANK_2_5          // "rerank-2.5"
VOYAGE_MODELS.RERANK_2_5_LITE     // "rerank-2.5-lite"

// Google
GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2  // "gemini-embedding-2-preview"
GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_1  // "gemini-embedding-001"

Environment Variables

Variable Required Used By
AI_GATEWAY_API_KEY Yes locally for default language models Vercel AI Gateway
OPENROUTER_API_KEY Only if using provider: "openrouter" OpenRouter provider
ANTHROPIC_API_KEY Only if using provider: "anthropic" Anthropic provider
OPENAI_API_KEY Only if using provider: "openai" OpenAI provider
VOYAGE_API_KEY Yes (for embed/rerank) Voyage provider
GOOGLE_GEMINI_API_KEY Only if using Gemini embeddings or provider: "google" Google provider
DEEPSEEK_API_KEY Only if using provider: "deepseek" DeepSeek direct provider
XAI_API_KEY Only if using provider: "xai" xAI direct provider
QWEN_API_KEY Only if using provider: "qwen" Qwen direct provider
ZAI_API_KEY Only if using provider: "zai" Z.ai / GLM direct provider
MOONSHOT_API_KEY Only if using provider: "moonshotai" Moonshot / Kimi direct provider

Keys can also be passed directly to createAI():

const ai = createAI({
  gatewayKey: "vck_...",
  openRouterKey: "sk-or-...",
  voyageKey: "pa-...",
  googleKey: "...",
  xaiKey: "...",
  moonshotKey: "...",
  serviceKeys: {
    zai: "...",
    qwen: "...",
  },
});

Service keys are exposed through ai.availableServices and ai.modelConfig() for runtimes that can use provider-specific credentials. The same keys also enable direct OpenAI-compatible AI SDK routes for DeepSeek, xAI, Qwen, Z.ai, and Moonshot/Kimi.

Architecture

  • Each createAI() returns an independent client (no shared module state)
  • Providers are lazy-initialized on first use
  • Safe for tests and multi-config scenarios
  • Language models route through Vercel AI Gateway by default
  • OpenRouter and direct provider routes are available per call
  • Embeddings/reranking through Voyage AI or Google

About

Unified TypeScript AI client with Vercel AI Gateway defaults, provider escape hatches, model tiers, and normalized generation options.

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors