Dynamic BAML 🚀

Dynamic BAML is a Python library for extracting structured data from text with Large Language Models (LLMs) using dynamically generated schemas. Built on BoundaryML's BAML language, it provides a high-level Python interface that generates the BAML code for you from a schema definition.

Define your desired output structure as a simple Python dictionary, and Dynamic BAML handles the rest!

✨ Features

🎯 Schema-First Approach: Define output structure with Python dictionaries
🔄 Dynamic BAML Generation: Automatically converts schemas to BAML code
🌐 Multi-Provider Support: Works with OpenAI, Anthropic, Ollama, and OpenRouter
🛡️ Type Safety: Ensures structured, validated outputs
🔧 Easy Integration: Simple API with comprehensive error handling
📊 Complex Types: Support for nested objects, enums, arrays, and optional fields
⚡ Performance: Efficient temporary project management and cleanup
🖼️ Image Support (Experimental): Analyze images with vision models by using BamlRuntime directly

🚀 Quick Start

Installation

pip install dynamic-baml

Basic Usage

from dynamic_baml import call_with_schema

# Define your desired output structure
schema = {
    "name": "string",
    "age": "int", 
    "email": "string",
    "is_active": "bool"
}

# Extract structured data from text
text = "John Doe is 30 years old, email: john@example.com, currently active user"

result = call_with_schema(
    prompt_text=f"Extract user information from: {text}",
    schema_dict=schema,
    options={"provider": "openai", "model": "gpt-4"}
)

print(result)
# Example output (exact values depend on the model):
# {'name': 'John Doe', 'age': 30, 'email': 'john@example.com', 'is_active': True}

🛠️ Installation & Setup

Requirements

Python 3.8+
BAML CLI from BoundaryML: npm install -g @boundaryml/baml
- This provides the core BAML compiler and runtime
- Learn more at docs.boundaryml.com

Provider Setup

OpenAI

export OPENAI_API_KEY="your-openai-api-key"

Anthropic

export ANTHROPIC_API_KEY="your-anthropic-api-key"

Ollama (Local)

# Install and start Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama serve          # runs in the foreground; use a second terminal for the next step
ollama pull gemma3:1b

OpenRouter

export OPENROUTER_API_KEY="your-openrouter-api-key"

🧠 Core Concepts

Schema Dictionary

Define your desired output structure using Python dictionaries:

schema = {
    "field_name": "field_type",
    "nested_object": {
        "sub_field": "string"
    },
    "optional_field": {"type": "string", "optional": True}
}

BAML Generation

Dynamic BAML automatically converts your schema to BAML code:

# Your schema:
{"name": "string", "age": "int"}

# Generated BAML (class name shown for illustration):
class UserInfo {
  name string
  age int
}
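
To make the conversion concrete, here is a minimal sketch of how a flat schema (basic types, arrays, and optional fields) could be turned into a BAML class. `dict_to_baml_class` and `field_to_baml_type` are hypothetical names, not Dynamic BAML's `DictToBAMLGenerator`. The type mapping (`["string"]` becomes `string[]`, an optional field becomes `string?`) follows BAML's list and optional syntax.

```python
# Hypothetical sketch: render a flat schema dict as a BAML class.
# Not Dynamic BAML's actual generator; nested objects and enums are out of scope.

BASIC_TYPES = {"string", "int", "float", "bool"}


def field_to_baml_type(spec):
    """Translate one field spec into a BAML type expression."""
    if isinstance(spec, str):
        if spec not in BASIC_TYPES:
            raise ValueError(f"unsupported type: {spec!r}")
        return spec
    if isinstance(spec, list) and len(spec) == 1:
        # ["string"] -> string[]
        return field_to_baml_type(spec[0]) + "[]"
    if isinstance(spec, dict) and "type" in spec:
        base = field_to_baml_type(spec["type"])
        # {"type": "string", "optional": True} -> string?
        return base + "?" if spec.get("optional") else base
    raise ValueError(f"unsupported field spec: {spec!r}")


def dict_to_baml_class(schema, class_name):
    """Emit one BAML class with a line per field."""
    lines = [f"class {class_name} {{"]
    for name, spec in schema.items():
        lines.append(f"  {name} {field_to_baml_type(spec)}")
    lines.append("}")
    return "\n".join(lines)


print(dict_to_baml_class({"tags": ["string"], "email": {"type": "string", "optional": True}}, "Post"))
```

Calling `dict_to_baml_class({"name": "string", "age": "int"}, "UserInfo")` reproduces the `UserInfo` class shown above.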

📊 Schema Types

Basic Types

schema = {
    "text": "string",   # Text data
    "number": "int",    # Integer
    "price": "float",   # Floating-point number
    "active": "bool"    # True/False
}

Arrays

schema = {
    "tags": ["string"],     # Array of strings
    "scores": ["int"],      # Array of integers
    "ratings": ["float"]    # Array of floats
}

Enums

schema = {
    "status": {
        "type": "enum",
        "values": ["draft", "published", "archived"]
    },
    "priority": {
        "type": "enum", 
        "values": ["low", "medium", "high", "urgent"]
    }
}
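
BAML declares an enum as a separate named type, which class fields then reference. The sketch below shows one way an enum entry could be rendered. Deriving the enum name from the field name (`status` becomes `Status`) is an assumption for illustration, and `enum_to_baml` is a hypothetical helper. Values are emitted verbatim, so they must already be valid BAML identifiers (no spaces or hyphens).

```python
# Hypothetical sketch: map {"type": "enum", "values": [...]} to a BAML enum.
# The naming rule (field name -> PascalCase) is an illustrative assumption.


def to_pascal_case(name):
    return "".join(part.capitalize() for part in name.split("_"))


def enum_to_baml(field_name, spec):
    """Return (enum declaration, type name to use in the class body)."""
    if spec.get("type") != "enum" or not spec.get("values"):
        raise ValueError(f"{field_name!r} is not a valid enum spec")
    enum_name = to_pascal_case(field_name)
    body = "\n".join(f"  {value}" for value in spec["values"])
    return f"enum {enum_name} {{\n{body}\n}}", enum_name


declaration, type_name = enum_to_baml(
    "status", {"type": "enum", "values": ["draft", "published", "archived"]}
)
print(declaration)           # the enum block
print(f"  status {type_name}")  # how the field would appear inside a class
```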

Nested Objects

schema = {
    "user": {
        "name": "string",
        "email": "string",
        "profile": {
            "bio": "string",
            "avatar_url": "string"
        }
    },
    "metadata": {
        "created_at": "string",
        "updated_at": "string"
    }
}
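
In BAML, a nested object is typically expressed as its own class, referenced by name from the parent. The recursive sketch below illustrates that idea. The naming rule and the helper `nested_to_baml` are hypothetical. The sketch handles only basic types and nested objects, and it does not resolve two nested objects that would map to the same class name.

```python
# Hypothetical sketch: nested dicts become separate BAML classes referenced by name.
# Class names derived from field names (PascalCase) are an illustrative assumption.


def to_pascal_case(name):
    return "".join(part.capitalize() for part in name.split("_"))


def nested_to_baml(schema, class_name, out=None):
    """Collect one BAML class per (nested) object in `out`, children first."""
    if out is None:
        out = []
    lines = [f"class {class_name} {{"]
    for field, spec in schema.items():
        if isinstance(spec, dict) and "type" not in spec:
            # A nested object: generate its class first, then reference it by name.
            child_name = to_pascal_case(field)
            nested_to_baml(spec, child_name, out)
            lines.append(f"  {field} {child_name}")
        elif isinstance(spec, str):
            lines.append(f"  {field} {spec}")
        else:
            raise ValueError(f"{field}: only basic types and nested objects are handled here")
    lines.append("}")
    out.append("\n".join(lines))
    return out


schema = {
    "user": {"name": "string", "profile": {"bio": "string"}},
    "created_at": "string",
}
print("\n\n".join(nested_to_baml(schema, "Document")))
```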

Optional Fields

schema = {
    "name": "string",                              # Required
    "email": {"type": "string", "optional": True}, # Optional
    "phone": {"type": "string", "optional": True}  # Optional
}
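
The difference between required and optional fields shows up when checking a returned dict. Here is a minimal sketch of such a check for flat schemas with basic types. `validate` is a hypothetical helper, not Dynamic BAML's validation logic. Treating an explicit `None` the same as a missing key is an assumption.

```python
# Hypothetical sketch: check a result dict against a flat schema with optional fields.

PY_TYPES = {"string": str, "int": int, "float": (int, float), "bool": bool}


def validate(data, schema):
    """Return a list of problems; an empty list means `data` fits `schema`."""
    problems = []
    for field, spec in schema.items():
        optional = isinstance(spec, dict) and spec.get("optional", False)
        type_name = spec["type"] if isinstance(spec, dict) else spec
        value = data.get(field)
        if value is None:
            if not optional:
                problems.append(f"missing required field: {field}")
            continue
        expected = PY_TYPES[type_name]  # basic types only in this sketch
        # bool is a subclass of int in Python, so reject it explicitly for non-bool fields.
        if isinstance(value, bool) and type_name != "bool":
            problems.append(f"{field}: expected {type_name}, got bool")
        elif not isinstance(value, expected):
            problems.append(f"{field}: expected {type_name}, got {type(value).__name__}")
    return problems


schema = {
    "name": "string",
    "email": {"type": "string", "optional": True},
    "phone": {"type": "string", "optional": True},
}
print(validate({"name": "Ada"}, schema))
```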

⚙️ Provider Configuration

OpenAI

options = {
    "provider": "openai",
    "model": "gpt-4",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}

Anthropic

options = {
    "provider": "anthropic", 
    "model": "claude-3-5-sonnet-20241022",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}

Ollama (Local)

options = {
    "provider": "ollama",
    "model": "gemma3:1b",
    "base_url": "http://localhost:11434",  # Optional
    "temperature": 0.1,
    "timeout": 120
}

OpenRouter

options = {
    "provider": "openrouter",
    "model": "google/gemini-2.0-flash-exp",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}
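
Before sending a request, it can help to fail fast when a provider's API key is missing. The following is a minimal sketch that uses the environment variable names from the Provider Setup section (Ollama runs locally and needs no key). `check_provider_options` is a hypothetical helper, not part of Dynamic BAML.

```python
import os

# Environment variables from the Provider Setup section; None means no key is needed.
REQUIRED_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "ollama": None,
}


def check_provider_options(options, environ=os.environ):
    """Hypothetical pre-flight check: raise early if the provider or its key is missing."""
    provider = options.get("provider")
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown provider {provider!r}; expected one of {sorted(REQUIRED_ENV)}")
    env_var = REQUIRED_ENV[provider]
    if env_var and not environ.get(env_var):
        raise RuntimeError(f"{provider} selected but {env_var} is not set")
    return options
```

Passing `environ` explicitly keeps the check testable without touching the real environment.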

🪵 Logging

Dynamic BAML provides flexible logging options to control output verbosity and destination.

Quick Start

from dynamic_baml import call_with_schema

prompt = "Extract user information from: John Doe is 30 years old"
schema = {"name": "string", "age": "int"}

# Basic usage with logging to file
options = {
    "provider": "openai",
    "model": "gpt-4",
    "log_level": "info",           # Control verbosity
    "log_file": "./baml.log"       # Output to file
}

result = call_with_schema(prompt, schema, options)

Configuration Options

Log Levels

Control the verbosity of BAML logging output:

Level	Description	Use Case
`"off"`	No logging output	Production where logs aren't needed
`"error"`	Only fatal errors	Production minimal logging
`"warn"`	Errors and warnings (default)	Standard production logging
`"info"`	Detailed execution info	Development and debugging
`"debug"`	Verbose details and requests	Deep debugging
`"trace"`	Everything (very verbose)	Troubleshooting

Log File Output

Specify where logs should be written:

Default (no log_file): Logs go to terminal/stdout
File path: Logs written to specified file
Directory creation: Parent directories created automatically
Append mode: Multiple calls append to the same file
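
The file-output behaviour listed above (terminal by default, automatic parent directories, append mode) can be illustrated with Python's standard `logging` module. This is an analogy under stated assumptions, not Dynamic BAML's implementation. `make_logger` is hypothetical, and mapping `"trace"` to `DEBUG` is a choice made here because the standard library has no TRACE level.

```python
import logging
import os
import sys

# Hypothetical sketch mapping the documented level names onto stdlib logging.
LEVELS = {
    "error": logging.ERROR,
    "warn": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
    "trace": logging.DEBUG,  # stdlib logging has no TRACE level; DEBUG is the closest
}


def make_logger(log_level="warn", log_file=None):
    logger = logging.getLogger("dynamic_baml_sketch")
    # Close and drop handlers from any previous call so output is not duplicated.
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
    logger.propagate = False
    logger.disabled = log_level == "off"
    if logger.disabled:
        return logger
    logger.setLevel(LEVELS[log_level])
    if log_file:
        # Create missing parent directories, then append instead of overwriting.
        os.makedirs(os.path.dirname(log_file) or ".", exist_ok=True)
        handler = logging.FileHandler(log_file, mode="a")
    else:
        handler = logging.StreamHandler(sys.stdout)  # default: terminal output
    logger.addHandler(handler)
    return logger
```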

Usage Examples

1. Log Level Only (Terminal Output)

options = {
    "provider": "openai",
    "log_level": "info"  # Logs to terminal with info level
}

2. Log File Only (Default Level)

options = {
    "provider": "openai", 
    "log_file": "./logs/baml.log"  # Uses default log level
}

3. Both Level and File

options = {
    "provider": "openai",
    "log_level": "debug",
    "log_file": "/var/log/baml/debug.log"
}

4. Disable Logging Completely

options = {
    "provider": "openai",
    "log_level": "off"  # No logging output at all
}

5. Nested Log Directories

options = {
    "provider": "openai",
    "log_level": "info",
    "log_file": "./logs/2024/january/extraction.log"  # Dirs created automatically
}

🔄 Advanced Usage

Safe Calling (No Exceptions)

from dynamic_baml import call_with_schema_safe

result = call_with_schema_safe(
    prompt_text="Extract data from this text...",
    schema_dict=schema,
    options=options
)

if result["success"]:
    data = result["data"]
    print(f"Extracted: {data}")
else:
    print(f"Error: {result['error']}")
    print(f"Error type: {result['error_type']}")
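
The "no exceptions" pattern can be reproduced around any function that raises. Here is a minimal sketch that returns the same keys documented for `call_with_schema_safe` in the API Reference. `make_safe` is hypothetical, and using the exception class name as `error_type` is an assumption about that field's contents.

```python
import functools


def make_safe(func):
    """Wrap `func` so failures come back as data instead of exceptions."""
    @functools.wraps(func)
    def safe(*args, **kwargs):
        try:
            return {"success": True, "data": func(*args, **kwargs)}
        except Exception as exc:
            # Report the failure with its message and exception class name.
            return {"success": False, "error": str(exc), "error_type": type(exc).__name__}
    return safe
```

For example, `make_safe(call_with_schema)` would give a wrapper with the same calling convention as the original function.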

Custom Prompting

# Build a prompt that spells out each field for more reliable extraction
input_text = "Jane Smith, 42, can be reached at jane@example.com."

prompt = f"""
Please extract the following information from the text below:

REQUIRED FIELDS:
- name: Person's full name
- age: Person's age as a number
- email: Valid email address

TEXT TO ANALYZE:
{input_text}

Please be accurate and only extract information that is clearly stated.
"""

result = call_with_schema(prompt, schema, options)
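
Writing the field list by hand duplicates the schema. One option is to generate it from the schema dict. The following is a sketch assuming a flat schema. `build_extraction_prompt` and its `descriptions` argument are hypothetical, and fields without a description fall back to their declared type.

```python
def build_extraction_prompt(schema, descriptions, input_text):
    """Hypothetical helper: assemble a field-by-field prompt from a flat schema dict."""
    field_lines = []
    for field, spec in schema.items():
        type_name = spec["type"] if isinstance(spec, dict) else spec
        hint = descriptions.get(field, type_name)  # fall back to the declared type
        optional = isinstance(spec, dict) and spec.get("optional", False)
        suffix = " (optional)" if optional else ""
        field_lines.append(f"- {field}: {hint}{suffix}")
    return (
        "Please extract the following information from the text below:\n\n"
        "FIELDS:\n" + "\n".join(field_lines) + "\n\n"
        "TEXT TO ANALYZE:\n" + input_text + "\n\n"
        "Please be accurate and only extract information that is clearly stated."
    )


print(build_extraction_prompt(
    {"name": "string", "email": {"type": "string", "optional": True}},
    {"name": "Person's full name"},
    "Ada Lovelace, ada@example.com",
))
```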

Batch Processing

from dynamic_baml import call_with_schema

def process_documents(documents, schema, options):
    """Extract data from each document, recording failures instead of stopping."""
    results = []
    for doc in documents:
        try:
            result = call_with_schema(
                f"Extract information from: {doc['content']}",
                schema,
                options
            )
            results.append({"doc_id": doc["id"], "data": result})
        except Exception as e:
            results.append({"doc_id": doc["id"], "error": str(e)})
    return results
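
Because each call mostly waits on the network, a thread pool can overlap requests. The following is a sketch under two assumptions: that concurrent `call_with_schema` calls are safe (the library manages temporary projects, and its thread-safety is not documented here), and that your provider's rate limits allow parallel requests. `process_documents_concurrently` is hypothetical and takes the extraction function as a parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def process_documents_concurrently(documents, extract, max_workers=4):
    """Run `extract(content)` for each document in a thread pool, keeping input order.

    `extract` is any callable that takes document text and returns a dict, for
    example a lambda wrapping call_with_schema with a fixed schema and options.
    """
    def run(doc):
        try:
            return {"doc_id": doc["id"], "data": extract(doc["content"])}
        except Exception as e:
            return {"doc_id": doc["id"], "error": str(e)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `documents`.
        return list(pool.map(run, documents))
```

Keeping `max_workers` small is a simple way to stay under provider rate limits.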

🚨 Error Handling

Exception Types

from dynamic_baml import call_with_schema
from dynamic_baml.exceptions import (
    DynamicBAMLError,       # Base exception
    SchemaGenerationError,  # Schema conversion failed
    BAMLCompilationError,   # BAML code compilation failed
    LLMProviderError,       # LLM provider call failed
    ResponseParsingError,   # Response parsing failed
    ConfigurationError,     # Provider configuration invalid
    TimeoutError,           # Request timeout (shadows Python's built-in TimeoutError)
)

try:
    result = call_with_schema(prompt, schema, options)
except SchemaGenerationError as e:
    print(f"Schema error: {e.message}")
    print(f"Invalid schema: {e.schema_dict}")
except LLMProviderError as e:
    print(f"Provider error: {e.message}")
    print(f"Provider: {e.provider}")
except ResponseParsingError as e:
    print(f"Parsing error: {e.message}")
    print(f"Raw response: {e.raw_response}")
except DynamicBAMLError as e:
    print(f"Other Dynamic BAML error: {e}")

Error Recovery

from dynamic_baml import call_with_schema
from dynamic_baml.exceptions import LLMProviderError

def robust_extraction(text, schema, providers):
    """Try each provider in order and return the first successful result."""
    for provider_opts in providers:
        try:
            return call_with_schema(text, schema, provider_opts)
        except LLMProviderError:
            continue  # Provider call failed; try the next one
        except Exception as e:
            print(f"Unexpected error with {provider_opts['provider']}: {e}")

    raise RuntimeError("All providers failed")

# Usage (text and schema as defined earlier)
providers = [
    {"provider": "openai", "model": "gpt-4"},
    {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    {"provider": "ollama", "model": "gemma3:1b"}
]

result = robust_extraction(text, schema, providers)

📚 Examples

See the examples/ directory for comprehensive examples:

Basic Usage
Complex Schemas
Multi-Provider Setup
Error Handling
Batch Processing
Real-World Use Cases
Image Analysis - NEW! Multimodal AI with vision models

🖼️ Image Support (Experimental)

Image analysis is available for advanced multimodal use cases by calling BoundaryML's BamlRuntime (from the baml_py package) directly.

Current Limitations

The main call_with_schema API currently supports text-only prompts. For actual image analysis, use BamlRuntime directly as shown below.

Image Analysis Example

import base64
import os

from baml_py import BamlRuntime, Image

# Read an image file and return its contents as a base64 string
def image_to_base64(image_path):
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# BAML source with an image-typed function parameter
baml_schema = """
enum ObjectType {
  Person
  Animal
  Vehicle
  Building
  Other
}

class ImageAnalysis {
  primary_object ObjectType
  description string
  confidence float
}

function AnalyzeImage(image: image) -> ImageAnalysis {
  client VisionModel
  prompt #"
    Analyze this image and identify the primary object.
    Image: {{ image }}

    {{ ctx.output_format }}
  "#
}

client<llm> VisionModel {
  provider openai
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
  }
}
"""

# Create a runtime from the in-memory BAML source and analyze an image
async def analyze_image(image_path, media_type="image/jpeg"):
    # Arguments: root path, {filename: source} map, and the environment variables
    # used to resolve env.* references such as env.OPENAI_API_KEY
    runtime = BamlRuntime.from_files(".", {"schema.baml": baml_schema}, dict(os.environ))

    # Wrap the image data; media_type must match the actual file format
    image = Image.from_base64(media_type, image_to_base64(image_path))

    # Call the BAML function. The exact call_function parameters can differ
    # between baml_py versions, so check them against your installed version.
    ctx = runtime.create_context_manager()
    result = await runtime.call_function(
        "AnalyzeImage",
        {"image": image},
        ctx
    )

    return result
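
Hard-coding `"image/jpeg"` misdescribes PNG or GIF inputs. The following is a small sketch that guesses the media type from the file extension with the standard `mimetypes` module before the data is passed to `Image.from_base64`. `load_image_for_baml` is a hypothetical helper, and an extension-based guess does not inspect the file's actual contents.

```python
import base64
import mimetypes


def load_image_for_baml(image_path):
    """Hypothetical helper: return (media_type, base64_data) for an image file."""
    media_type, _ = mimetypes.guess_type(image_path)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"cannot determine an image media type for {image_path!r}")
    with open(image_path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return media_type, data
```

Usage with the example above: `media_type, data = load_image_for_baml("photo.png")`, then `Image.from_base64(media_type, data)`.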

Supported Vision Models

OpenAI: vision-capable models such as gpt-4o (gpt-4-vision-preview is deprecated)
Anthropic: Claude 3 models with vision
OpenRouter: Google Gemini Vision, Claude 3, and others
Ollama: LLaVA and other local vision models

For a complete example with multiple providers and fallback strategies, see examples/image_analysis.py.

📖 API Reference

Core Functions

`call_with_schema(prompt_text, schema_dict, options=None) -> dict`

Extract structured data using a schema.

Parameters:

prompt_text (str): Text prompt to send to the LLM
schema_dict (dict): Schema definition dictionary
options (dict, optional): Provider configuration options

Returns:

dict: Extracted data matching the schema structure

Raises:

DynamicBAMLError: Base exception for all errors
SchemaGenerationError: Schema conversion failed
LLMProviderError: Provider call failed
ResponseParsingError: Response parsing failed

`call_with_schema_safe(prompt_text, schema_dict, options=None) -> dict`

Safe version that returns structured results instead of raising exceptions.

Returns:

{
    "success": bool,
    "data": dict,      # Present if success=True
    "error": str,      # Present if success=False
    "error_type": str  # Present if success=False
}

Schema Generator

`DictToBAMLGenerator.generate_schema(schema_dict, schema_name) -> str`

Generate BAML schema code from dictionary.

Parameters:

schema_dict (dict): Schema definition
schema_name (str): Name for the generated schema

Returns:

str: Valid BAML schema code

Provider Factory

`LLMProviderFactory.create_provider(options) -> LLMProvider`

Create provider instance based on options.

`LLMProviderFactory.get_available_providers() -> List[str]`

Get list of currently available providers.

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

🙏 Acknowledgments

Dynamic BAML is built on top of BoundaryML and the powerful BAML language. We extend our gratitude to the BoundaryML team for creating the foundational technology that makes structured LLM outputs possible.

About BoundaryML:

🔗 Website: boundaryml.com
📚 BAML Documentation: docs.boundaryml.com
🛠️ BAML CLI: npm install -g @boundaryml/baml

Dynamic BAML provides a Python-friendly interface and automatic schema generation on top of the robust BAML foundation.

📄 License

This project is licensed under the MIT License - see LICENSE file for details.

🆘 Support

🏆 Why Dynamic BAML?

Traditional Approach

import json

# Complex manual prompt engineering (llm stands in for any LLM client)
prompt = """
Extract user data and format as JSON with these exact fields:
- name (string)
- age (integer)
- email (string)
- is_active (boolean)

Text: "John Doe is 30 years old..."

Please ensure the output is valid JSON with no extra text.
"""

response = llm.call(prompt)
data = json.loads(response)  # Hope it's valid JSON!

Dynamic BAML Approach

# Clean, type-safe schema definition
schema = {
    "name": "string",
    "age": "int", 
    "email": "string",
    "is_active": "bool"
}

data = call_with_schema(
    "Extract user info from: John Doe is 30 years old...",
    schema
)  # Returns a dict shaped like the schema, or raises an exception

Benefits:

✅ Type Safety: Responses are parsed against your schema; failures raise an exception instead of returning malformed data
✅ No JSON Parsing: Direct structured output
✅ Better Prompts: Optimized prompt engineering
✅ Error Handling: Comprehensive error management
✅ Multi-Provider: Easy provider switching
✅ Complex Types: Enums, nested objects, arrays
