Dynamic BAML is a Python library that enables you to extract structured data from text using Large Language Models (LLMs) with dynamically generated schemas. Built on top of BoundaryML, it provides a high-level Python interface for BAML (Boundary Augmented Markup Language) with automatic schema generation.
Define your desired output structure as a simple Python dictionary, and Dynamic BAML handles the rest!
- **Schema-First Approach**: Define output structures with plain Python dictionaries
- **Dynamic BAML Generation**: Automatically converts schemas to BAML code
- **Multi-Provider Support**: Works with OpenAI, Anthropic, Ollama, and OpenRouter
- **Type Safety**: Ensures structured, validated outputs
- **Easy Integration**: Simple API with comprehensive error handling
- **Complex Types**: Support for nested objects, enums, arrays, and optional fields
- **Performance**: Efficient temporary project management and cleanup
- **Image Support**: Analyze images with vision models using BamlRuntime
```bash
pip install dynamic-baml
```

```python
from dynamic_baml import call_with_schema
# Define your desired output structure
schema = {
    "name": "string",
    "age": "int",
    "email": "string",
    "is_active": "bool"
}

# Extract structured data from text
text = "John Doe is 30 years old, email: john@example.com, currently active user"

result = call_with_schema(
    prompt_text=f"Extract user information from: {text}",
    schema_dict=schema,
    options={"provider": "openai", "model": "gpt-4"}
)

print(result)
# Output: {"name": "John Doe", "age": 30, "email": "john@example.com", "is_active": True}
```

## Table of Contents

- Installation & Setup
- Core Concepts
- Schema Types
- Provider Configuration
- Logging
- Advanced Usage
- Error Handling
- Examples
- API Reference
## Installation & Setup

Prerequisites:

- Python 3.8+
- BAML CLI from BoundaryML:

  ```bash
  npm install -g @boundaryml/baml
  ```

  - This provides the core BAML compiler and runtime
  - Learn more at docs.boundaryml.com
Configure credentials for the provider(s) you plan to use:

```bash
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export OPENROUTER_API_KEY="your-openrouter-api-key"

# Install and start Ollama (for local models)
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
ollama pull gemma3:1b
```

## Core Concepts

Define your desired output structure using Python dictionaries:
```python
schema = {
    "field_name": "field_type",
    "nested_object": {
        "sub_field": "string"
    },
    "optional_field": {"type": "string", "optional": True}
}
```

Dynamic BAML automatically converts your schema to BAML code:
```python
# Your schema:
{"name": "string", "age": "int"}
```
```baml
// Generated BAML:
class UserInfo {
  name string
  age int
}
```
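Richer schema constructs map to BAML along the same lines. The sketch below is illustrative only; the class and enum names that Dynamic BAML actually generates may differ:

```python
# Schema with an enum, an optional field, and an array:
schema = {
    "status": {"type": "enum", "values": ["draft", "published"]},
    "email": {"type": "string", "optional": True},
    "tags": ["string"]
}

# Plausible generated BAML (illustrative; actual generated names may differ):
#
#   enum Status {
#     draft
#     published
#   }
#
#   class ExtractedData {
#     status Status     // enum
#     email string?     // optional
#     tags string[]     // array
#   }
```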
"text": "string", # Text data
"number": "int", # Integer
"price": "float", # Decimal number
"active": "bool" # True/False
}schema = {
"tags": ["string"], # Array of strings
"scores": ["int"], # Array of integers
"ratings": ["float"] # Array of floats
}schema = {
"status": {
"type": "enum",
"values": ["draft", "published", "archived"]
},
"priority": {
"type": "enum",
"values": ["low", "medium", "high", "urgent"]
}
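With an enum schema, the model must choose from the declared values. A usage sketch, under the assumption that enum values come back as plain strings:

```python
result = call_with_schema(
    prompt_text="Classify this post: 'DRAFT - do not publish yet, low urgency'",
    schema_dict=schema,
    options={"provider": "openai", "model": "gpt-4"}
)

# Assuming enum values are returned as plain strings:
if result["status"] == "draft" and result["priority"] == "low":
    print("Unpublished, low-priority draft")
```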
### Nested Objects

```python
schema = {
"user": {
"name": "string",
"email": "string",
"profile": {
"bio": "string",
"avatar_url": "string"
}
},
"metadata": {
"created_at": "string",
"updated_at": "string"
}
}schema = {
"name": "string", # Required
"email": {"type": "string", "optional": True}, # Optional
"phone": {"type": "string", "optional": True} # Optional
## Provider Configuration

### OpenAI

```python
options = {
"provider": "openai",
"model": "gpt-4",
"temperature": 0.1,
"max_tokens": 2000,
"timeout": 60
}options = {
"provider": "anthropic",
"model": "claude-3-5-sonnet-20241022",
"temperature": 0.1,
"max_tokens": 2000,
"timeout": 60
}options = {
"provider": "ollama",
"model": "gemma3:1b",
"base_url": "http://localhost:11434", # Optional
"temperature": 0.1,
"timeout": 120
}options = {
"provider": "openrouter",
"model": "google/gemini-2.0-flash-exp",
"temperature": 0.1,
"max_tokens": 2000,
"timeout": 60
## Logging

Dynamic BAML provides flexible logging options to control output verbosity and destination.
```python
from dynamic_baml import call_with_schema

# Basic usage with logging to file
options = {
    "provider": "openai",
    "model": "gpt-4",
    "log_level": "info",      # Control verbosity
    "log_file": "./baml.log"  # Output to file
}

result = call_with_schema(prompt, schema, options)
```

### Log Levels

Control the verbosity of BAML logging output:
| Level | Description | Use Case |
|---|---|---|
| `"off"` | No logging output | Production where logs aren't needed |
| `"error"` | Only fatal errors | Production with minimal logging |
| `"warn"` | Errors and warnings (default) | Standard production logging |
| `"info"` | Detailed execution info | Development and debugging |
| `"debug"` | Verbose details and requests | Deep debugging |
| `"trace"` | Everything (very verbose) | Troubleshooting |
### Log Destinations

Specify where logs should be written:
- Default (no `log_file`): logs go to terminal/stdout
- File path: logs are written to the specified file
- Directory creation: parent directories are created automatically
- Append mode: multiple calls append to the same file
```python
options = {
    "provider": "openai",
    "log_level": "info"  # Logs to terminal with info level
}
```

```python
options = {
    "provider": "openai",
    "log_file": "./logs/baml.log"  # Uses default log level
}
```

```python
options = {
    "provider": "openai",
    "log_level": "debug",
    "log_file": "/var/log/baml/debug.log"
}
```

```python
options = {
    "provider": "openai",
    "log_level": "off"  # No logging output at all
}
```

```python
options = {
    "provider": "openai",
    "log_level": "info",
    "log_file": "./logs/2024/january/extraction.log"  # Dirs created automatically
}
```
## Advanced Usage

### Safe Extraction

```python
from dynamic_baml import call_with_schema_safe
result = call_with_schema_safe(
    prompt_text="Extract data from this text...",
    schema_dict=schema,
    options=options
)

if result["success"]:
    data = result["data"]
    print(f"Extracted: {data}")
else:
    print(f"Error: {result['error']}")
    print(f"Error type: {result['error_type']}")
```

### Prompt Engineering

```python
# Build effective prompts for better extraction
prompt = f"""
Please extract the following information from the text below:
REQUIRED FIELDS:
- name: Person's full name
- age: Person's age as a number
- email: Valid email address
TEXT TO ANALYZE:
{input_text}
Please be accurate and only extract information that is clearly stated.
"""
result = call_with_schema(prompt, schema, options)
```

### Batch Processing

```python
def process_documents(documents, schema, options):
    results = []
    for doc in documents:
        try:
            result = call_with_schema(
                f"Extract information from: {doc['content']}",
                schema,
                options
            )
            results.append({"doc_id": doc["id"], "data": result})
        except Exception as e:
            results.append({"doc_id": doc["id"], "error": str(e)})
    return results
```
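A quick usage sketch; the document dicts just need the `id` and `content` keys the function reads:

```python
documents = [
    {"id": "doc-1", "content": "Jane Doe, 28, jane@example.com, active"},
    {"id": "doc-2", "content": "John Smith, 45, john@example.com, inactive"},
]

results = process_documents(documents, schema, {"provider": "openai", "model": "gpt-4"})
for entry in results:
    print(entry["doc_id"], "->", entry.get("data") or entry["error"])
```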
## Error Handling

```python
from dynamic_baml.exceptions import (
    DynamicBAMLError,        # Base exception
    SchemaGenerationError,   # Schema conversion failed
    BAMLCompilationError,    # BAML code compilation failed
    LLMProviderError,        # LLM provider call failed
    ResponseParsingError,    # Response parsing failed
    ConfigurationError,      # Provider configuration invalid
    TimeoutError             # Request timeout
)
try:
    result = call_with_schema(prompt, schema, options)
except SchemaGenerationError as e:
    print(f"Schema error: {e.message}")
    print(f"Invalid schema: {e.schema_dict}")
except LLMProviderError as e:
    print(f"Provider error: {e.message}")
    print(f"Provider: {e.provider}")
except ResponseParsingError as e:
    print(f"Parsing error: {e.message}")
print(f"Raw response: {e.raw_response}")def robust_extraction(text, schema, providers):
"""Try multiple providers for reliable extraction."""
for provider_opts in providers:
try:
return call_with_schema(text, schema, provider_opts)
except LLMProviderError:
continue # Try next provider
except Exception as e:
print(f"Unexpected error with {provider_opts['provider']}: {e}")
raise Exception("All providers failed")
# Usage
providers = [
    {"provider": "openai", "model": "gpt-4"},
    {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    {"provider": "ollama", "model": "gemma3:1b"}
]
result = robust_extraction(text, schema, providers)
```

## Examples

See the `examples/` directory for comprehensive examples:
- Basic Usage
- Complex Schemas
- Multi-Provider Setup
- Error Handling
- Batch Processing
- Real-World Use Cases
- Image Analysis - NEW! Multimodal AI with vision models
### Image Analysis

Dynamic BAML supports image analysis through `BamlRuntime` for advanced multimodal use cases.
Note: the main `call_with_schema` API currently supports text-only prompts. For actual image analysis, use `BamlRuntime` directly, as shown below.
```python
from baml_py import BamlRuntime, Image
import base64

# Convert an image file to a base64 string
def image_to_base64(image_path):
    with open(image_path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')
# Define BAML schema with image support
baml_schema = """
enum ObjectType {
  Person
  Animal
  Vehicle
  Building
  Other
}

class ImageAnalysis {
  primary_object ObjectType
  description string
  confidence float
}

function AnalyzeImage(image: image) -> ImageAnalysis {
  client VisionModel
  prompt #"
    Analyze this image and identify the primary object.

    Image: {{ image }}
  "#
}

client<llm> VisionModel {
  provider openai-generic
  options {
    model "gpt-4-vision-preview"
    api_key env.OPENAI_API_KEY
  }
}
"""
# Create runtime and analyze image
async def analyze_image(image_path):
    runtime = BamlRuntime.from_files(
        root_path=".",
        files={"schema.baml": baml_schema}
    )

    # Create image object
    image_data = image_to_base64(image_path)
    image = Image.from_base64("image/jpeg", image_data)

    # Call the function
    ctx = runtime.create_context_manager()
    result = await runtime.call_function(
        "AnalyzeImage",
        {"image": image},
        ctx
    )
    return result
```
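Since `analyze_image` is a coroutine, run it from an event loop. A minimal sketch (the image path is a placeholder):

```python
import asyncio

# Run the async analysis from synchronous code
analysis = asyncio.run(analyze_image("photo.jpg"))
print(analysis)
```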
Supported vision models:

- OpenAI: GPT-4 Vision (`gpt-4-vision-preview`)
- Anthropic: Claude 3 models with vision
- OpenRouter: Google Gemini Vision, Claude 3, and others
- Ollama: LLaVA and other local vision models
For a complete example with multiple providers and fallback strategies, see `examples/image_analysis.py`.
## API Reference

### `call_with_schema`

Extract structured data using a schema.
Parameters:

- `prompt_text` (str): Text prompt to send to the LLM
- `schema_dict` (dict): Schema definition dictionary
- `options` (dict, optional): Provider configuration options
Returns:

- dict: Extracted data matching the schema structure
Raises:

- `DynamicBAMLError`: Base exception for all errors
- `SchemaGenerationError`: Schema conversion failed
- `LLMProviderError`: Provider call failed
- `ResponseParsingError`: Response parsing failed
### `call_with_schema_safe`

Safe version that returns structured results instead of raising exceptions.
Returns:

```python
{
    "success": bool,
    "data": dict,       # Present if success=True
    "error": str,       # Present if success=False
    "error_type": str   # Present if success=False
}
```

### Schema Generation

Generate BAML schema code from a dictionary.
Parameters:

- `schema_dict` (dict): Schema definition
- `schema_name` (str): Name for the generated schema

Returns:

- str: Valid BAML schema code
### Provider Utilities

- Create a provider instance based on options.
- Get the list of currently available providers.
## Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
## Acknowledgments

Dynamic BAML is built on top of BoundaryML and the powerful BAML language. We extend our gratitude to the BoundaryML team for creating the foundational technology that makes structured LLM outputs possible.
About BoundaryML:
- Website: boundaryml.com
- BAML Documentation: docs.boundaryml.com
- BAML CLI: `npm install -g @boundaryml/baml`
Dynamic BAML provides a Python-friendly interface and automatic schema generation on top of the robust BAML foundation.
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

- Documentation
- Issue Tracker
- Discussions
## Why Dynamic BAML?

Without Dynamic BAML:

```python
# Complex manual prompt engineering
prompt = """
Extract user data and format as JSON with these exact fields:
- name (string)
- age (integer)
- email (string)
- is_active (boolean)
Text: "John Doe is 30 years old..."
Please ensure the output is valid JSON with no extra text.
"""
response = llm.call(prompt)
data = json.loads(response)  # Hope it's valid JSON!
```

With Dynamic BAML:

```python
# Clean, type-safe schema definition
schema = {
"name": "string",
"age": "int",
"email": "string",
"is_active": "bool"
}
data = call_with_schema(
    "Extract user info from: John Doe is 30 years old...",
    schema
)  # Guaranteed structured output!
```

Benefits:
- Type Safety: Guaranteed schema compliance
- No JSON Parsing: Direct structured output
- Better Prompts: Optimized prompt engineering
- Error Handling: Comprehensive error management
- Multi-Provider: Easy provider switching
- Complex Types: Enums, nested objects, arrays