Dual SDK (TypeScript + Python) for calculating LLM API costs from provider response payloads using live pricing data.
Pass any LLM response object and get back the cost — the SDK automatically extracts token usage, resolves the model, and looks up pricing from multiple sources.
- Zero-config cost calculation — pass a raw LLM response, get back the cost.
- Multi-source pricing — aggregates pricing from OpenRouter, BerriAI/LiteLLM, Portkey, and Helicone.
- Broad provider coverage — OpenAI, Anthropic, Google, Meta, Mistral, Cohere, xAI, DeepSeek, Qwen, Jina, and more.
- Automatic model resolution — normalizes model IDs and strips provider prefixes for cross-source consistency.
- Provider-aware token extraction — handles each provider's response format via configurable JSONPath mappings.
- Tool call cost tracking — detects tool use in responses and reports the additional system prompt cost when available.
- Best-effort fallback — tries multiple pricing sources until one succeeds.
- Shared config — both SDKs use the same mapping files for consistent behavior.
Node.js / TypeScript:

```bash
npm install ai-cost-calculator
```

```ts
import { BestEffortCalculator } from "ai-cost-calculator";

const result = await BestEffortCalculator.getCost({
  model: "gpt-4o-mini",
  usage: {
    prompt_tokens: 1000,
    completion_tokens: 500,
    total_tokens: 1500,
  },
});

console.log(result);
// { currency: "USD", cost: 0.000225 }
```

You can optionally pass `model` and/or `provider` to skip automatic extraction:
```ts
const result = await BestEffortCalculator.getCost(response, {
  model: "gpt-4o-mini",
  provider: "openai",
});
```

Python:

```bash
pip install ai-cost-calculator
```

```python
from ai_cost_calculator import BestEffortCalculator

result = BestEffortCalculator.get_cost({
    "model": "gpt-4o-mini",
    "usage": {
        "prompt_tokens": 1000,
        "completion_tokens": 500,
        "total_tokens": 1500,
    },
})

print(result)
# {"currency": "USD", "cost": 0.000225}
```

You can optionally pass `model` and/or `provider` to skip automatic extraction:

```python
result = BestEffortCalculator.get_cost(response, model="gpt-4o-mini", provider="openai")
```

Override the external pricing lookup with your own rates (e.g. bulk or negotiated pricing):
Node.js / TypeScript:
```ts
const result = await BestEffortCalculator.getCost(response, {
  pricing: { inputCostPer1M: 0.0455, outputCostPer1M: 0 },
});
```

Python:
```python
result = BestEffortCalculator.get_cost(
    response,
    pricing={"input_cost_per_1m": 0.0455, "output_cost_per_1m": 0},
)
```

When `pricing` is provided, no external pricing APIs are called. Both `inputCostPer1M` and `outputCostPer1M` are required. Cache token rates (`cacheReadCostPer1M`, `cacheCreationCostPer1M`) are optional.
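If your negotiated rates also cover prompt caching, the optional cache fields can be supplied alongside the required ones. A minimal sketch (the numbers are illustrative, not real rates):

```ts
const result = await BestEffortCalculator.getCost(response, {
  pricing: {
    inputCostPer1M: 0.0455,        // required
    outputCostPer1M: 0.091,        // required (illustrative rate)
    cacheReadCostPer1M: 0.0046,    // optional: rate for cache-read tokens
    cacheCreationCostPer1M: 0.057, // optional: rate for cache-creation tokens
  },
});
```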
- Extract model — reads the `model` field from the response.
- Infer provider — looks up the model in pricing metadata to determine the provider (e.g. `openai`, `anthropic`).
- Extract token usage — uses provider-specific JSONPath mappings to read input/output/total token counts.
- Resolve pricing — fetches and caches live pricing data, matches the model, and computes cost as `(input_tokens / 1M) × input_price + (output_tokens / 1M) × output_price` (a worked sketch follows this list).
- Detect tool calls — if the response contains tool use, reports the additional system prompt token cost (when available from the pricing source).
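To make the formula concrete, here is a small TypeScript sketch with made-up rates of $0.15 / $0.60 per 1M input / output tokens (illustrative only; the SDK fetches real rates from the live pricing sources):

```ts
// Hypothetical rates, for illustration only.
const inputPricePer1M = 0.15;  // USD per 1M input tokens
const outputPricePer1M = 0.6;  // USD per 1M output tokens

const inputTokens = 1000;
const outputTokens = 500;

// cost = (input_tokens / 1M) × input_price + (output_tokens / 1M) × output_price
const cost =
  (inputTokens / 1_000_000) * inputPricePer1M +
  (outputTokens / 1_000_000) * outputPricePer1M;

console.log(cost); // 0.00045
```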
┌─────────────────────────────────────────────────────────────────┐
│ User Code │
│ getCost({ model: "gpt-4o", usage: { ... } }) │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ BestEffortCalculator │
│ Tries each pricing source in order until one succeeds │
│ OpenRouter → Berri → Portkey → Helicone │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Calculator Base Class (shared flow) │
│ │
│ 1. Extract model ID & infer provider │
│ 2. Extract token usage via JSONPath │
│ 3. Fetch pricing map from the configured source │
│ 4. Normalize model ID → look up pricing │
│ 5. Validate pricing values │
│ 6. Compute cost (input + output + cache tokens) │
│ 7. Detect tool calls → compute tool call cost │
│ 8. Return { currency, cost, toolCallCost? } │
└──────────────────────────┬──────────────────────────────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌───────────┐ ┌──────────────┐
│ Response │ │ Model │ │ Pricing │
│ Transformer │ │ Resolver │ │ Providers │
│ │ │ │ │ │
│ Extract │ │ Normalize │ │ Fetch live │
│ model ID & │ │ model IDs │ │ pricing from │
│ token usage │ │ & strip │ │ remote APIs │
│ via JSONPath │ │ prefixes │ │ & cache it │
└──────┬──────┘ └─────┬─────┘ └──────┬───────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ computeCost() │
│ │
│ cost = (input / 1M) × inputRate │
│ + (cacheRead / 1M) × cacheReadRate │
│ + (cacheCreation / 1M) × cacheCreationRate │
│ + (output / 1M) × outputRate │
└─────────────────────────────────────────────────────────────────┘
Each calculator subclass is a thin config wrapper (~5 lines) that specifies its pricing source name, pricing map getter, and optional custom lookup function. All shared logic lives in the base class.
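As a rough sketch of the pattern (the property names and import paths here are hypothetical; the real wrappers are the `<Source>BasedCalculator.ts` files shown in the layout below):

```ts
import { Calculator } from "./Calculator";                                // base class (see layout below)
import { getOpenRouterPricingMap } from "../providers/openRouterClient";  // hypothetical getter name

// Hypothetical subclass: it only declares which pricing source it uses.
// Extraction, model resolution, lookup, and cost math all live in the base class.
export class OpenRouterExampleCalculator extends Calculator {
  protected sourceName = "openrouter";
  protected getPricingMap = getOpenRouterPricingMap;
}
```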
llmcost-sdk/
├── configs/ # Shared config (single source of truth)
│ └── response-mappings.json # JSONPath mappings for token extraction per provider
├── node/ # TypeScript SDK
│ └── src/
│ ├── index.ts # Public API exports
│ ├── types.ts # CostResult, TokenUsage, PricingSource interfaces
│ ├── errors.ts # Error hierarchy (extends LlmcostError)
│ ├── utils.ts # Shared numeric parsing utilities
│ ├── calculator/ # Calculator implementations
│ │ ├── Calculator.ts # Base class with shared getCost flow
│ │ ├── computeCost.ts # Cost formula
│ │ ├── BestEffortCalculator.ts
│ │ └── <Source>BasedCalculator.ts # ~5 lines each
│ ├── providers/ # Remote pricing API clients (fetch + cache)
│ │ └── <source>Client.ts
│ └── data/ # Config loading, model resolution, token extraction
│ ├── configLoader.ts # Cached config loading
│ ├── responseTransformer.ts
│ └── modelResolver.ts
├── python/ # Python SDK (mirrors Node structure)
│ └── src/ai_cost_calculator/
│ ├── __init__.py # Public API exports
│ ├── types.py
│ ├── errors.py
│ ├── utils.py # Shared numeric parsing utilities
│ ├── calculator/ # Calculator implementations
│ │ ├── base.py # Base class with shared get_cost flow
│ │ ├── cost_utils.py
│ │ ├── best_effort.py
│ │ └── <source>.py # ~5 lines each
│ ├── providers/ # Remote pricing API clients (httpx + cache)
│ │ └── <source>_client.py
│ └── data/ # Config loading, model resolution, token extraction
│ ├── config_loader.py # Cached config loading
│ ├── response_transformer.py
│ └── model_resolver.py
└── .github/workflows/ # CI/CD
├── auto-release.yml # Detects version bumps, triggers publish
├── publish-npm.yml # Build, test, publish to npm, smoke test
└── publish-pypi.yml # Build, test, publish to PyPI, smoke test
A cost calculation request goes through the base class flow:
Response Object
│
▼
┌──────────────────────────────────────────────────┐
│ 1. Extract Model │
│ Read the "model" field from the response │
└───────────────────────┬──────────────────────────┘
▼
┌──────────────────────────────────────────────────┐
│ 2. Infer Provider │
│ Look up model in Berri's model→provider map │
│ to determine the provider (openai, anthropic…) │
└───────────────────────┬──────────────────────────┘
▼
┌──────────────────────────────────────────────────┐
│ 3. Extract Tokens │
│ Load provider-specific JSONPath mappings from │
│ response-mappings.json and extract: │
│ • input tokens • output tokens │
│ • cache read • cache creation (if present) │
└───────────────────────┬──────────────────────────┘
▼
┌──────────────────────────────────────────────────┐
│ 4. Resolve Pricing & Compute │
│ Fetch live pricing → normalize model ID → │
│ match model → apply cost formula │
└───────────────────────┬──────────────────────────┘
▼
┌──────────────────────────────────────────────────┐
│ 5. Detect Tool Calls & Compute Tool Cost │
│ Check for tool_calls / tool_use / functionCall │
│ If present and pricing includes system prompt │
│ token count, compute additional tool call cost │
└───────────────────────┬──────────────────────────┘
▼
{ currency, cost, toolCallCost? }
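Step 5 can be sketched roughly like this; the response fields checked are the ones listed in the box above, while the pricing property name and the billing rate used are assumptions of this sketch, not confirmed SDK behavior:

```ts
// Hypothetical sketch of tool-call cost detection; the real logic lives in the base class.
// Assumption: the extra system prompt tokens are billed at the input rate.
function toolCallCost(
  response: any,
  pricing: { inputCostPer1M: number; toolUseSystemPromptTokens?: number },
): number | undefined {
  const hasToolUse =
    (response?.choices?.[0]?.message?.tool_calls?.length ?? 0) > 0 ||                 // OpenAI-style tool_calls
    response?.content?.some?.((block: any) => block?.type === "tool_use") ||          // Anthropic-style tool_use blocks
    response?.candidates?.[0]?.content?.parts?.some?.((p: any) => p?.functionCall);   // Gemini-style functionCall

  if (!hasToolUse || pricing.toolUseSystemPromptTokens === undefined) {
    return undefined; // no tool use, or the pricing source doesn't expose the token count
  }
  return (pricing.toolUseSystemPromptTokens / 1_000_000) * pricing.inputCostPer1M;
}
```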
| Layer | Node | Python | Role |
|---|---|---|---|
| Calculators | `calculator/` | `calculator/` | Base class implements shared flow; subclasses declare pricing source config |
| Providers | `providers/` | `providers/` | Fetch, cache, and normalize remote pricing data |
| Data | `data/` | `data/` | Cached config loading, JSONPath-based token extraction, model normalization |
| Utils | `utils.ts` | `utils.py` | Shared numeric parsing utilities |
| Shared Config | `configs/` | `configs/` | Provider-agnostic JSONPath mappings bundled into both SDKs at publish time |
| Decision | Rationale |
|---|---|
| Parameterized base class | Each calculator is ~5 lines of config (pricing source name + getter + optional custom lookup); all shared logic is in one place |
| Shared `configs/` directory | Single source of truth ensures both SDKs extract tokens identically |
| Best-effort fallback chain | Multiple pricing sources improve resilience — if one is down or lacks a model, the next source is tried |
| Provider-aware token extraction | Each LLM provider uses a different response schema; JSONPath mappings handle this without hardcoded parsing |
| Provider inference from model ID | Uses Berri's model→provider map so callers don't need to specify the provider |
| Model normalization | Strips prefixes (`openai/gpt-4o` → `gpt-4o`) and lowercases for cross-source consistency (sketched after this table) |
| Cache token support | Separately priced cache read/creation tokens (Anthropic, OpenAI) are computed with their own rates |
| Tool call cost tracking | When a response contains tool use and the pricing source provides `tool_use_system_prompt_tokens`, the additional cost is reported separately |
| Cached config loading | `response-mappings.json` is read from disk once and cached in memory |
| Shared `parseNumeric` utility | One canonical implementation per SDK avoids inconsistent numeric parsing across provider clients |
| Async Node / sync Python | Node uses native `fetch`; Python uses blocking `httpx` — each follows its ecosystem's conventions |
| Independent versioning | Node and Python packages version and release independently, allowing different cadences |
| CI bundles configs at publish | `configs/*.json` are copied into each SDK's dist at build time so published packages are self-contained |
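The model normalization step from the table above is small enough to sketch (the function name is hypothetical; the real logic lives in `modelResolver.ts` / `model_resolver.py`):

```ts
// Hypothetical sketch: strip a "provider/" prefix and lowercase the model ID
// so IDs from different pricing sources line up.
function normalizeModelId(modelId: string): string {
  const slash = modelId.indexOf("/");
  const withoutPrefix = slash === -1 ? modelId : modelId.slice(slash + 1);
  return withoutPrefix.toLowerCase();
}

console.log(normalizeModelId("openai/GPT-4o")); // "gpt-4o"
```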
| Directory | Description |
|---|---|
| `node/` | TypeScript SDK (Node.js ≥ 18, Bun for tests) |
| `python/` | Python SDK (≥ 3.10) |
| `configs/` | Shared JSONPath mappings |
Each calculator uses a different pricing source. BestEffortCalculator tries them all in order.
| Calculator | Pricing Source | Node | Python |
|---|---|---|---|
| `BestEffortCalculator` | All sources (fallback chain) | Yes | Yes |
| `OpenRouterBasedCalculator` | OpenRouter | Yes | Yes |
| `BerrilmBasedCalculator` | BerriAI/LiteLLM | Yes | Yes |
| `PortkeyBasedCalculator` | Portkey | Yes | Yes |
| `HeliconeBasedCalculator` | Helicone | Yes | Yes |
BestEffortCalculator fallback order: OpenRouter → Berri → Portkey → Helicone.
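To pin a single pricing source instead of the fallback chain, call that source's calculator directly. A minimal sketch, assuming the source-pinned calculators are exported from the package entry point and share `BestEffortCalculator`'s `getCost` signature:

```ts
import { OpenRouterBasedCalculator } from "ai-cost-calculator";

// `response` is any LLM response payload you already have.
// Only OpenRouter pricing is consulted; there is no fallback to other sources.
const result = await OpenRouterBasedCalculator.getCost(response, {
  model: "gpt-4o-mini",
});

console.log(result); // { currency: "USD", cost: ... }
```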
Token extraction supports provider-specific response formats out of the box:
| Provider | Response format |
|---|---|
| OpenAI (Chat Completions) | usage.prompt_tokens / usage.completion_tokens |
| OpenAI (Responses API) | usage.input_tokens / usage.output_tokens |
| OpenRouter | usage.prompt_tokens / usage.completion_tokens |
| Anthropic | usage.input_tokens / usage.output_tokens |
| Google (Gemini) | usageMetadata.promptTokenCount / usageMetadata.candidatesTokenCount |
| Cohere | meta.billed_units.input_tokens / meta.billed_units.output_tokens |
| Meta, Mistral, xAI, DeepSeek, Qwen | usage.prompt_tokens / usage.completion_tokens |
| Jina | usage.prompt_tokens / usage.total_tokens |
Unknown providers fall back to a default mapping that covers the most common response shapes.
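To illustrate the mechanism (this is not the SDK's internal code), reading a token count with a JSONPath expression via `jsonpath-plus`, the engine the Node SDK uses, looks roughly like this:

```ts
import { JSONPath } from "jsonpath-plus";

// An OpenAI-style chat completion payload, trimmed to the usage block.
const response = {
  model: "gpt-4o-mini",
  usage: { prompt_tokens: 1000, completion_tokens: 500, total_tokens: 1500 },
};

// JSONPath returns an array of matches; take the first one if present.
const [inputTokens] = JSONPath({ path: "$.usage.prompt_tokens", json: response });
const [outputTokens] = JSONPath({ path: "$.usage.completion_tokens", json: response });

console.log(inputTokens, outputTokens); // 1000 500
```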
Both SDKs read from the configs/ directory:
| File | Purpose |
|---|---|
| `response-mappings.json` | JSONPath expressions for extracting token usage per provider |
Config resolution order: LLMCOST_CONFIGS_DIR env var → bundled package data → repo-root configs/.
Configs are automatically bundled into the published packages during CI. The shared source of truth is the configs/ directory at the repo root.
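A hedged sketch of that lookup order (function name and arguments are hypothetical; the real logic lives in `configLoader.ts` / `config_loader.py`):

```ts
import fs from "node:fs";

// Hypothetical sketch of the documented order:
// LLMCOST_CONFIGS_DIR env var → bundled package data → repo-root configs/.
function resolveConfigsDir(bundledDir: string, repoRootDir: string): string {
  const fromEnv = process.env.LLMCOST_CONFIGS_DIR;
  if (fromEnv && fs.existsSync(fromEnv)) return fromEnv;
  if (fs.existsSync(bundledDir)) return bundledDir;
  return repoRootDir;
}
```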
All errors extend `LlmcostError`:

| Error | When |
|---|---|
| `UsageNotFoundError` | Token usage cannot be extracted from the response |
| `ModelNotFoundError` | Model not found in the pricing source |
| `PricingUnavailableError` | Pricing data has invalid values |
| `ModelInferenceError` | Model ID cannot be read from the response |
| `ProviderInferenceError` | Provider cannot be determined from model ID |
| `BestEffortCalculationError` | All pricing sources failed (`.causes` has individual errors) |
| Aspect | Node.js | Python |
|---|---|---|
| Calculators | All async (`await getCost(response, options?)`) | Synchronous (`get_cost(response, *, model?, provider?)`) |
| HTTP client | `fetch` | `httpx` |
| JSONPath engine | `jsonpath-plus` | `jsonpath-ng` |
```bash
cd node
npm install
npm run typecheck
npm test
npm run test:e2e
```

```bash
cd python
uv sync
uv run pytest
```

Create `.env` in the repo root for optional live tests:

```bash
LLMCOST_E2E_LIVE=true
LLMCOST_E2E_MODEL=gpt-4o-mini
OPENROUTER_API_KEY=...
LLMCOST_E2E_OPENROUTER_MODEL=openai/gpt-4o-mini
OPENAI_API_KEY=...
LLMCOST_E2E_OPENAI_RESPONSES_MODEL=gpt-4o-mini
LLMCOST_E2E_OPENAI_COMPLETIONS_MODEL=gpt-3.5-turbo-instruct
```

E2E tests prefer OpenRouter when `OPENROUTER_API_KEY` is set and fall back to native OpenAI. OpenAI endpoint coverage includes `/v1/responses`, `/v1/chat/completions`, and `/v1/completions` (legacy, opt-in).
MIT