This guide helps AI assistants understand and work with the Lingua codebase effectively.
Lingua is a universal message format that compiles to provider-specific formats with zero runtime overhead, enabling seamless interoperability between LLM providers without runtime penalties.
- Universal compatibility: Supports 100% of provider-specific quirks and capabilities
- Zero runtime overhead: Pure compile-time translation to native provider formats
- Type safety: Full TypeScript and Rust type generation with bidirectional validation
- No network calls: This is a message format library, not an API client
- Explicit error handling: All errors must be properly handled, never silently swallowed
- No hidden marker fields: Do not encode provider semantics via internal marker keys (for example in `provider_options`) to fake lossless round-trips.
- Ask when non-lossy mapping is unclear: If the universal type cannot represent a provider feature losslessly, stop and ask for clarification on the intended canonical representation before implementing a workaround.
- No unapproved fallback logic: Do not add ad-hoc fallback parsing/translation paths (for example `fallback_*` helpers) without checking with the programmer first.
- Typed boundaries only: At provider boundaries, parse into well-defined typed structs/enums. Do not add lenient raw-JSON parsing that guesses defaults for required fields (for example defaulting a missing `role` to `user`, lowercasing unknown roles, or inventing empty `content`).
- Do not handwrite provider-format structs: Do not manually define Rust structs/enums that represent provider wire formats when generated or canonical provider types already exist. Fix generation or add typed adapters around canonical types instead.
- Do not inspect `serde_json::Value` directly for provider semantics: Do not branch on provider-format fields via ad-hoc `Value` map access. Deserialize into typed provider or typed compatibility structs first, then convert.
- Lenient import paths are typed boundaries too: Files like `processing/import.rs` are not exempt. For any `role`/`content`/`tool_call_id` compatibility handling, first deserialize into typed compatibility structs (with serde aliases as needed), then branch on typed enums/fields.
- Pre-edit parser guardrail: Before finalizing parser/converter changes in typed-boundary code, scan your diff for new `as_object()`, `.get("...")`, `Value::Object`, or raw `Map<String, Value>` field-plucking used for semantics. If present, rewrite to typed deserialization or stop and ask.
- Fix via types or explicit errors: If fuzzing finds unsupported/ambiguous shapes, either model them explicitly in types/converters or return a clear error. Do not silently coerce invalid input into a "best effort" shape.
- Typed-boundary CI gate: CI enforces `make typed-boundary-check-branch BASE=origin/<base-branch>` on pull requests. Running `make typed-boundary-check` locally is recommended for faster feedback, but not required as a pre-commit hook.
- Typed extras views over raw map access: If provider extras must be read, deserialize extras into a typed view struct first; do not pluck fields ad-hoc with `map.get(...)`.
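The "typed boundaries only" rule can be sketched for role handling. This is a minimal std-only illustration, not the crate's actual types: `Role` and `ParseRoleError` are hypothetical names. The point is that unknown or differently-cased roles produce an explicit error instead of a guessed default:

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical typed boundary for message roles.
#[derive(Debug, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, PartialEq)]
struct ParseRoleError(String);

impl fmt::Display for ParseRoleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown role '{}': refusing to guess a default", self.0)
    }
}

impl FromStr for Role {
    type Err = ParseRoleError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Exact match only: no lowercasing, no fallback to `user`.
        match s {
            "system" => Ok(Role::System),
            "user" => Ok(Role::User),
            "assistant" => Ok(Role::Assistant),
            "tool" => Ok(Role::Tool),
            other => Err(ParseRoleError(other.to_string())),
        }
    }
}

fn main() {
    assert_eq!("assistant".parse::<Role>(), Ok(Role::Assistant));
    // An unexpected casing is a hard error, not a silently coerced value.
    assert!("Assistant".parse::<Role>().is_err());
}
```

The same shape applies to `content` and `tool_call_id` compatibility handling: deserialize into a typed struct first, then branch on typed enums.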
Always use sentence case for all headings, not title case:
- ✅ `## Pipeline overview`
- ❌ `## Pipeline Overview`
Be concise and direct:
- Focus on what, not why (unless specifically asked)
- Avoid unnecessary explanations or summaries
- Use bullet points and structured formats
src/
├── universal/ # Core Lingua message types
├── providers/ # Provider-specific API type definitions
├── translators/ # Bidirectional format conversion logic
├── capabilities/ # Provider capability detection
└── lib.rs # Main entry point and re-exports
Each provider should have:
- Separate request/response types: Don't conflate them into single structs
- Complete type coverage: All fields from provider SDKs, even optional ones
- Validation tests: TypeScript compatibility tests in `tests/typescript/{provider}/`
- Check for SDK updates in provider test directories
- Extract TypeScript types manually from provider SDKs
- Convert to Rust following consistent patterns (see pipelines/ docs)
- Validate compatibility through multi-layer testing
- Update translators to use new types
Rust type derivations:
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")] // when needed
TypeScript exports (for ts-rs):
#[derive(TS)]
#[ts(export, export_to = "bindings/typescript/")]
Optional fields: Always use `Option<T>` for optional provider fields
Union types: Convert TypeScript unions to Rust enums or separate structs
Type compatibility: Verify Rust-generated TypeScript matches provider SDK types
Round-trip testing: Ensure lossless serialization/deserialization
Real API integration: Test with actual provider APIs when possible
When adding or fixing a case under payloads/import-cases/, follow this order.
- Anonymize first (if needed).
- If the span includes non-anonymized user/company/PII content, ask the user whether to anonymize now and confirm expected anonymization level.
- Do not proceed with fixture assertions on unanonymized sensitive data unless the user explicitly approves.
- Generate baseline assertions from imported messages.
- Run exactly: `GENERATE_MISSING=1 cargo test -p lingua --test import_fixtures -- --nocapture`
- Confirm the failing behavior intentionally.
- Re-run the test without `GENERATE_MISSING` and verify the target case fails for the expected reason.
- If expected behavior is unclear, stop and ask for clarification before implementing a fallback.
- Fix importer/converter logic.
- Keep typed boundaries (typed structs/enums at provider boundaries and compatibility boundaries).
- Add or update assertions only after the code fix, not as a substitute for the fix.
- Name the case by behavior, not by span ID.
- Rename fixture pairs to descriptive kebab-case names that describe behavior (for example `chat-completions-tool-role-string-content.json`).
- Avoid UUID or `span_<id>` filenames once behavior is understood.
When fixing provider transform behavior, follow this order. Do not skip steps.
- Add or update the payload case first.
- For provider parameter behavior, start in `payloads/cases/params.ts`.
- Use a case name that maps directly to the behavior being fixed.
- Capture and triage failures.
- Run `make capture FILTER=<case_name>`.
- If capture emits failed requests/transforms, treat this as unresolved logic in adapter/converter code.
- Use transform path names to triage ownership:
  - `payloads/transforms/chat-completions_to_anthropic/<case>.json` → Anthropic adapter path
  - `payloads/transforms/chat-completions_to_google/<case>.json` → Google adapter path
  - Same pattern for other providers.
- Write the fix plan before implementation.
- Create `plan.md` in `lingua/` before making code changes.
- The plan must include:
  - root cause,
  - target files,
  - expected behavior after fix,
  - tests to add/update,
  - expected-diff impact (if any),
  - command sequence to validate.
- Fix adapter/converter logic first.
- Do not use artifact regeneration as a substitute for code fixes.
- Prefer provider `adapter.rs` for cross-field policy/orchestration fixes.
- Keep typed boundaries: parse into typed structs/enums; avoid new ad-hoc raw `Value` access.
- Run targeted tests, then re-capture.
- Run focused Rust tests for touched adapters first.
- Re-run capture for the affected case/pair.
- Run payload tests and sync checks.
- Only then update expected diffs (if intentional behavior loss remains).
- Use narrow `perTestCase` entries in:
  - `crates/coverage-report/src/requests_expected_differences.json`
  - `crates/coverage-report/src/streaming_expected_differences.json`
  - `crates/coverage-report/src/responses_expected_differences.json`
- Do not add broad global exceptions for case-specific behavior.
Use this exact flow after implementing a fix:
# 1) Capture and inspect this behavior
make capture FILTER=<case_name>
# 2) Run focused adapter tests
cargo test -p lingua <targeted_test_name_or_module>
# 3) Re-capture transforms for fixed behavior
make capture FILTER=<case_name>
# 4) Run payload transform checks
make test-payloads
# 5) If snapshots/transforms are stale after logic fix, regenerate failed artifacts
make regenerate-failed-transforms
# 6) Cross-provider guard
cargo test -p coverage-report --test cross_provider_test cross_provider_transformations_have_no_unexpected_failures
# 7) Typed-boundary checks
make typed-boundary-check
make typed-boundary-check-branch BASE=main
- Do not run `make regenerate-failed-transforms` before fixing adapter/converter logic.
- Do not patch transform/snapshot files manually to hide failing transforms.
- Do not add new direct `Value.get(...)` assertions/logic in typed-boundary-protected paths.
- Do not add new semantic branching in `parse_lenient_*` or import compatibility code using raw JSON map access; use typed compatibility structs/enums.
- Correctness over convenience: Match provider APIs exactly
- Type safety over flexibility: Strict typing prevents runtime errors
- Manual precision over automation: Control type design decisions
- Validation over assumptions: Test everything thoroughly
- Provider modules: `src/providers/{provider}/` (e.g., `openai/`, `anthropic/`)
- Request types: `{provider}_request.rs` or `request.rs` in provider directory
- Response types: `{provider}_response.rs` or `response.rs` in provider directory
- Tests: `tests/typescript/{provider}/` with provider-specific validation
🚨 DO NOT EDIT generated.rs FILES DIRECTLY 🚨
Files named generated.rs are automatically generated and will be overwritten:
- `src/providers/google/generated.rs` - Generated from protobuf files
- `src/providers/openai/generated.rs` - Generated from OpenAPI specs
- `src/providers/anthropic/generated.rs` - Generated from OpenAPI specs
ANY MANUAL CHANGES TO THESE FILES WILL BE PERMANENTLY LOST ON NEXT REGENERATION
If you need to fix issues in generated files:
- ✅ DO: Edit the generation logic in `scripts/generate_types/main.rs`
- ✅ DO: Add fixes to the `fix_google_type_references()` or similar functions
- ✅ DO: Regenerate using `cargo run --bin generate-types <provider>`
- ❌ DON'T: Edit the generated files directly - your changes will be lost!
- Any struct, enum, or type definitions in `generated.rs` files
- Field types, names, or annotations in generated types
- Serde attributes or derives in generated code
Claude Code AI Assistant: You must NEVER directly edit generated.rs files. Always use the generation pipeline and post-processing functions.
Example of proper fix approach:
// In scripts/generate_types/main.rs, in fix_google_type_references():
fn fix_google_type_references(content: String) -> String {
let mut fixed = content;
// Fix doctest JSON examples that fail to compile
fixed = fixed.replace(
" /// ```\n /// {\n /// \"type\": \"object\",",
" /// ```json\n /// {\n /// \"type\": \"object\","
);
fixed
}
This ensures fixes are permanent and survive regeneration cycles.
TypeScript → Rust conversions:
- `string | number` unions need careful handling (usually separate enums)
- Optional properties (`field?: T`) become `field: Option<T>`
- Nested objects may need `serde_json::Value` for unknown structures
- Array types become `Vec<T>`
Serde configuration:
- Use `rename_all = "snake_case"` sparingly (only when provider uses snake_case)
- Most providers use camelCase, so default serde behavior is correct
- Add `#[serde(skip_serializing_if = "Option::is_none")]` for optional fields
🚨 CRITICAL: Never silently swallow errors with unwrap_or_default() or unwrap_or() 🚨
Silent error handling makes debugging extremely difficult and can hide important issues. Always use explicit error propagation or logging.
❌ NEVER DO THIS:
// Dangerous - silently swallows serialization errors
serde_json::to_value(data).unwrap_or_default()
serde_json::to_string(data).unwrap_or(String::new())
✅ ALWAYS DO THIS INSTEAD:
For functions that return Result (most conversions):
// Propagate errors with proper context
serde_json::to_value(data).map_err(|e| ConvertError::JsonSerializationFailed {
field: "field_name".to_string(),
error: e.to_string(),
})?
// Or for String error types:
serde_json::to_string(data)
    .map_err(|e| format!("Failed to serialize field_name to JSON: {}", e))?
For filter_map closures that return Option:
// For invalid data that's already known to be invalid, use appropriate fallback
match tool_arguments {
ToolCallArguments::Valid(map) => serde_json::Value::Object(map.clone()),
ToolCallArguments::Invalid(s) => serde_json::Value::String(s.clone()), // Don't try to parse invalid data
}
Error types to use:
- OpenAI conversions: Use `ConvertError` enum with specific variants
- Anthropic conversions: Use descriptive `String` error messages
- Always include context: field names, operation type, original data when safe
When adding new ConvertError variants:
pub enum ConvertError {
// Existing variants...
JsonSerializationFailed { field: String, error: String },
// Add new specific variants as needed
}
// Update the Display impl:
ConvertError::JsonSerializationFailed { field, error } => {
write!(f, "JSON serialization failed for field '{}': {}", field, error)
}
Testing error conditions:
- Always test that error conditions produce meaningful error messages
- Verify that errors propagate correctly through the call stack
- Never ignore warnings from error handling during development
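The guidance above can be checked with a small std-only sketch. The variant matches the `ConvertError` example earlier in this guide; the field values are illustrative:

```rust
use std::fmt;

#[derive(Debug)]
pub enum ConvertError {
    JsonSerializationFailed { field: String, error: String },
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConvertError::JsonSerializationFailed { field, error } => {
                write!(f, "JSON serialization failed for field '{}': {}", field, error)
            }
        }
    }
}

fn main() {
    let err = ConvertError::JsonSerializationFailed {
        field: "tool_calls".to_string(),
        error: "unexpected end of input".to_string(),
    };
    // A meaningful error message names the field and preserves the
    // underlying cause, so debugging needs no stack trace.
    let msg = err.to_string();
    assert!(msg.contains("tool_calls"));
    assert!(msg.contains("unexpected end of input"));
}
```

Asserting on substrings rather than the full message keeps such tests stable when wording changes.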
The pipelines/ directory contains automated tooling for:
- Downloading latest OpenAPI specifications from providers
- Generating Rust types automatically using typify
- Building and validating generated code
- Minimal type generation focused on chat completion APIs
Run the pipeline to update provider types:
./pipelines/generate-provider-types.sh openai
This process is fully automated and generates only essential types to minimize code size.
Git hooks installation: After cloning the repository, install pre-commit hooks for consistent formatting:
./scripts/install-hooks.sh
This installs hooks that automatically run:
`cargo fmt` - ensures consistent formatting
Code quality checks: Clippy linting is handled by GitHub Actions CI and will run on pull requests.
`cargo clippy` - catches common issues and enforces best practices
Hooks run automatically before each commit. To bypass temporarily: `git commit --no-verify`
Follow this step-by-step guide to add support for a new LLM provider:
mkdir -p src/providers/{provider}
touch src/providers/{provider}/mod.rs
touch src/providers/{provider}/request.rs
touch src/providers/{provider}/response.rs
[features]
default = ["openai", "anthropic", "google", "bedrock", "{provider}"]
{provider} = ["dep:{provider-sdk}"] # Only if external SDK needed
[dependencies]
{provider-sdk} = { version = "1.0", optional = true } # If needed
src/providers/{provider}/request.rs:
use serde::{Deserialize, Serialize};
use ts_rs::TS;
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, TS)]
#[ts(export, export_to = "bindings/typescript/")]
pub struct {Provider}Request {
pub messages: Vec<{Provider}Message>,
pub model: String,
// ... other required fields
}
// Define all necessary types following provider API exactly
src/providers/{provider}/response.rs:
use serde::{Deserialize, Serialize};
use ts_rs::TS;
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, TS)]
#[ts(export, export_to = "bindings/typescript/")]
pub struct {Provider}Response {
pub choices: Vec<{Provider}Choice>,
pub usage: {Provider}Usage,
// ... other response fields
}
src/providers/{provider}/mod.rs:
/*!
{Provider} API provider types.
*/
pub mod request;
pub mod response;
pub use request::{Provider}Request;
pub use response::{Provider}Response;
src/providers/mod.rs:
#[cfg(feature = "{provider}")]
pub mod {provider};
src/translators/{provider}.rs:
use crate::providers::{provider}::{{Provider}Request, {Provider}Response};
use crate::translators::{TranslationResult, Translator};
use crate::universal::{SimpleMessage, SimpleRole};
pub struct {Provider}Translator;
impl Translator<{Provider}Request, {Provider}Response> for {Provider}Translator {
fn to_provider_request(messages: Vec<SimpleMessage>) -> TranslationResult<{Provider}Request> {
// Convert SimpleMessage to provider format
todo!()
}
fn from_provider_response(response: {Provider}Response) -> TranslationResult<Vec<SimpleMessage>> {
// Convert provider response back to SimpleMessage
todo!()
}
}
// Convenience functions
pub fn to_{provider}_format(messages: Vec<SimpleMessage>) -> TranslationResult<{Provider}Request> {
{Provider}Translator::to_provider_request(messages)
}
pub fn from_{provider}_response(response: {Provider}Response) -> TranslationResult<Vec<SimpleMessage>> {
{Provider}Translator::from_provider_response(response)
}
src/translators/mod.rs:
#[cfg(feature = "{provider}")]
pub mod {provider};
// Re-export convenience functions
#[cfg(feature = "{provider}")]
pub use {provider}::{from_{provider}_response, to_{provider}_format};
Message structure:
- Use the `Vec<ContentBlock>` pattern for multi-modal content
- Support text, images, tool calls as separate enum variants
- Follow provider's exact field names and casing
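The content-block pattern above can be sketched as follows. This is a std-only illustration; the variant shapes and field names are hypothetical, not the universal types' actual definitions:

```rust
// Illustrative multi-modal content pattern: one enum variant per modality.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text { text: String },
    Image { url: String },
    ToolCall { id: String, name: String, arguments: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ExampleMessage {
    pub role: String,
    pub content: Vec<ContentBlock>,
}

fn main() {
    // A single message can mix text and a tool call.
    let msg = ExampleMessage {
        role: "assistant".to_string(),
        content: vec![
            ContentBlock::Text { text: "Looking that up.".to_string() },
            ContentBlock::ToolCall {
                id: "call_1".to_string(),
                name: "search".to_string(),
                arguments: "{\"q\":\"lingua\"}".to_string(),
            },
        ],
    };
    assert_eq!(msg.content.len(), 2);
}
```

Modeling each modality as its own variant keeps conversion exhaustive: a `match` over `ContentBlock` fails to compile when a new modality is added but not handled.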
Serde configuration:
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, TS)]
#[ts(export, export_to = "bindings/typescript/")]
#[serde(rename_all = "camelCase")] // Match provider API casing
pub struct {Provider}Message {
#[serde(skip_serializing_if = "Option::is_none")]
pub optional_field: Option<String>,
}
Handle serde_json::Value for TypeScript:
// For unknown/flexible JSON structures
#[ts(type = "any")]
pub field: serde_json::Value,
// For fields that shouldn't appear in TypeScript
#[ts(skip)]
pub internal_field: InternalType,
- Compile test: `cargo check --features="{provider}"`
- Isolation test: `cargo check --no-default-features --features="{provider}"`
- Integration test: Create simple translation examples
- TypeScript generation: Verify TS types are generated correctly
Update README.md:
- Add provider to feature flags section
- Update architecture diagram
- Add usage examples
OpenAPI-based providers (OpenAI, Anthropic):
- Can use automated generation from specs
- Usually have consistent REST API patterns
- Focus on chat completion endpoints
SDK-based providers (Bedrock, Google):
- May need to work with existing SDKs
- Handle SDK type conversion carefully
- Consider optional dependencies for large SDKs
Custom API providers:
- Manual type extraction from documentation
- Focus on core chat/completion functionality
- Implement streaming support if available
- Start minimal: Implement basic text chat first, add features incrementally
- Follow existing patterns: Study OpenAI and Bedrock implementations
- Test thoroughly: Verify type compatibility and serialization
- Document differences: Note any provider-specific quirks or limitations
- Consider streaming: Many providers support streaming responses
This process ensures consistent provider integration while maintaining type safety and zero-runtime overhead.