Python library for dynamic BAML schema generation and LLM structured data extraction. Built on BoundaryML with support for OpenAI, Anthropic, Ollama, and OpenRouter.


roboalchemist/dynamic-baml


Dynamic BAML 🚀

Dynamic BAML is a Python library for extracting structured data from text with Large Language Models (LLMs), using schemas generated at runtime. Built on top of BoundaryML, it provides a high-level Python interface to BAML, BoundaryML's language for defining typed LLM functions, and generates the BAML schema code for you.

Define your desired output structure as a simple Python dictionary, and Dynamic BAML handles the rest!

✨ Features

  • 🎯 Schema-First Approach: Define output structure with Python dictionaries
  • 🔄 Dynamic BAML Generation: Automatically converts schemas to BAML code
  • 🌐 Multi-Provider Support: Works with OpenAI, Anthropic, Ollama, and OpenRouter
  • 🛡️ Type Safety: Ensures structured, validated outputs
  • 🔧 Easy Integration: Simple API with comprehensive error handling
  • 📊 Complex Types: Support for nested objects, enums, arrays, and optional fields
  • ⚡ Performance: Efficient temporary project management and cleanup
  • 🖼️ Image Support: Analyze images with vision models using BamlRuntime

🚀 Quick Start

Installation

pip install dynamic-baml

Basic Usage

from dynamic_baml import call_with_schema

# Define your desired output structure
schema = {
    "name": "string",
    "age": "int", 
    "email": "string",
    "is_active": "bool"
}

# Extract structured data from text
text = "John Doe is 30 years old, email: john@example.com, currently active user"

result = call_with_schema(
    prompt_text=f"Extract user information from: {text}",
    schema_dict=schema,
    options={"provider": "openai", "model": "gpt-4"}
)

print(result)
# Output: {'name': 'John Doe', 'age': 30, 'email': 'john@example.com', 'is_active': True}


πŸ› οΈ Installation & Setup

Requirements

  • Python 3.8+
  • BAML CLI from BoundaryML: npm install -g @boundaryml/baml

Provider Setup

OpenAI

export OPENAI_API_KEY="your-openai-api-key"

Anthropic

export ANTHROPIC_API_KEY="your-anthropic-api-key"

Ollama (Local)

# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
ollama pull gemma3:1b

OpenRouter

export OPENROUTER_API_KEY="your-openrouter-api-key"
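Each hosted provider above reads its key from one environment variable, so a missing key can be caught before any request is made. A sketch with a hypothetical `missing_api_key` helper (not part of Dynamic BAML), assuming the variable names listed above and that Ollama needs no key:

```python
import os
from typing import Mapping, Optional

# API-key variable each provider reads, taken from the setup steps above.
# Ollama runs locally and needs no key, so it maps to None.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "ollama": None,
}


def missing_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset API-key variable for `provider`, or None if ready."""
    if provider not in PROVIDER_ENV_VARS:
        raise ValueError(f"Unknown provider: {provider!r}")
    env = os.environ if env is None else env
    var = PROVIDER_ENV_VARS[provider]
    if var is None or env.get(var):
        return None
    return var


if __name__ == "__main__":
    for name in PROVIDER_ENV_VARS:
        missing = missing_api_key(name)
        print(f"{name}: " + ("ready" if missing is None else f"export {missing} first"))
```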

🧠 Core Concepts

Schema Dictionary

Define your desired output structure using Python dictionaries:

schema = {
    "field_name": "field_type",
    "nested_object": {
        "sub_field": "string"
    },
    "optional_field": {"type": "string", "optional": True}
}

BAML Generation

Dynamic BAML automatically converts your schema to BAML code:

# Your schema:
{"name": "string", "age": "int"}

# Generated BAML:
class UserInfo {
  name string
  age int
}
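For flat schemas the mapping is mechanical: each key becomes a field and each type name is copied through. A minimal sketch of that step, assuming the four scalar type names from the Basic Types section; `flat_schema_to_baml` is hypothetical and is not Dynamic BAML's `DictToBAMLGenerator`:

```python
from typing import Dict

# Scalar type names accepted in the schema dict; BAML spells them the same way.
SCALAR_TYPES = {"string", "int", "float", "bool"}


def flat_schema_to_baml(schema: Dict[str, str], class_name: str) -> str:
    """Render a flat {field: scalar_type} dict as a BAML class declaration."""
    lines = [f"class {class_name} {{"]
    for field, type_name in schema.items():
        if type_name not in SCALAR_TYPES:
            raise ValueError(f"Unsupported type for {field!r}: {type_name!r}")
        lines.append(f"  {field} {type_name}")
    lines.append("}")
    return "\n".join(lines)


print(flat_schema_to_baml({"name": "string", "age": "int"}, "UserInfo"))
```

The printed text matches the generated BAML shown above.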

📊 Schema Types

Basic Types

schema = {
    "text": "string",       # Text data
    "number": "int",        # Integer
    "price": "float",       # Floating-point number
    "active": "bool"        # True/False
}

Arrays

schema = {
    "tags": ["string"],     # Array of strings
    "scores": ["int"],      # Array of integers
    "ratings": ["float"]    # Array of floats
}

Enums

schema = {
    "status": {
        "type": "enum",
        "values": ["draft", "published", "archived"]
    },
    "priority": {
        "type": "enum", 
        "values": ["low", "medium", "high", "urgent"]
    }
}
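An enum needs a named type that the field then refers to. A sketch of rendering an enum spec as a BAML `enum` block, assuming each value is a plain identifier and that the enum is named after its field; `enum_to_baml` and that naming rule are hypothetical, not Dynamic BAML's actual behavior:

```python
import re
from typing import Any, Dict, Tuple

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def enum_to_baml(field_name: str, spec: Dict[str, Any]) -> Tuple[str, str]:
    """Turn {"type": "enum", "values": [...]} into (enum_name, BAML enum block)."""
    if spec.get("type") != "enum":
        raise ValueError(f"{field_name!r} is not an enum spec")
    values = list(spec["values"])
    for value in values:
        # Values are emitted verbatim, so each must be a plain identifier.
        if not IDENTIFIER.fullmatch(value):
            raise ValueError(f"Enum value {value!r} is not a valid identifier")
    # Hypothetical naming rule: "priority_level" -> "PriorityLevel".
    enum_name = "".join(part.capitalize() for part in field_name.split("_"))
    body = "\n".join(f"  {value}" for value in values)
    return enum_name, f"enum {enum_name} {{\n{body}\n}}"


name, block = enum_to_baml("status", {"type": "enum", "values": ["draft", "published", "archived"]})
print(block)
# The enclosing class would then declare the field as:  status Status
```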

Nested Objects

schema = {
    "user": {
        "name": "string",
        "email": "string",
        "profile": {
            "bio": "string",
            "avatar_url": "string"
        }
    },
    "metadata": {
        "created_at": "string",
        "updated_at": "string"
    }
}
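Each nested dictionary needs its own BAML class, which the parent references by name. A recursive sketch that handles scalar type strings and plain nested dicts only (enum and optional specs are out of scope), with class names derived from field names by a hypothetical rule; collisions between same-named nested fields are not handled:

```python
from typing import Any, Dict, List


def to_class_name(field_name: str) -> str:
    """Hypothetical naming rule: "avatar_url" -> "AvatarUrl"."""
    return "".join(part.capitalize() for part in field_name.split("_"))


def nested_schema_to_baml(schema: Dict[str, Any], class_name: str) -> str:
    """Emit one BAML class per nested dict, each before the class that uses it."""
    blocks: List[str] = []

    def emit(fields: Dict[str, Any], name: str) -> None:
        lines = [f"class {name} {{"]
        for field, spec in fields.items():
            if isinstance(spec, dict):
                child = to_class_name(field)
                emit(spec, child)  # declare the nested class first
                lines.append(f"  {field} {child}")
            else:
                lines.append(f"  {field} {spec}")  # scalar type, copied through
        lines.append("}")
        blocks.append("\n".join(lines))

    emit(schema, class_name)
    return "\n\n".join(blocks)


print(nested_schema_to_baml({"user": {"name": "string", "profile": {"bio": "string"}}}, "Document"))
```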

Optional Fields

schema = {
    "name": "string",                              # Required
    "email": {"type": "string", "optional": True}, # Optional
    "phone": {"type": "string", "optional": True}  # Optional
}
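In BAML, an optional field is written with a `?` suffix (`string?`) and an array with `[]` (`string[]`). A sketch of translating one schema value into such a type expression, assuming the scalar, single-element list, and `{"type": ..., "optional": True}` forms shown above; `field_type_expr` is hypothetical:

```python
from typing import Any

SCALAR_TYPES = {"string", "int", "float", "bool"}


def field_type_expr(spec: Any) -> str:
    """Translate one schema value into a BAML type expression.

    Enums and nested objects need named types and are out of scope here.
    """
    if isinstance(spec, str) and spec in SCALAR_TYPES:
        return spec
    if isinstance(spec, list) and len(spec) == 1:
        return f"{field_type_expr(spec[0])}[]"  # ["string"] -> string[]
    if isinstance(spec, dict) and isinstance(spec.get("type"), str) and spec["type"] in SCALAR_TYPES:
        # {"type": "string", "optional": True} -> string?
        return spec["type"] + ("?" if spec.get("optional", False) else "")
    raise ValueError(f"Unsupported field spec: {spec!r}")


schema = {
    "name": "string",
    "email": {"type": "string", "optional": True},
    "tags": ["string"],
}
for field, spec in schema.items():
    print(f"  {field} {field_type_expr(spec)}")
```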

βš™οΈ Provider Configuration

OpenAI

options = {
    "provider": "openai",
    "model": "gpt-4",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}

Anthropic

options = {
    "provider": "anthropic", 
    "model": "claude-3-5-sonnet-20241022",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}

Ollama (Local)

options = {
    "provider": "ollama",
    "model": "gemma3:1b",
    "base_url": "http://localhost:11434",  # Optional
    "temperature": 0.1,
    "timeout": 120
}

OpenRouter

options = {
    "provider": "openrouter",
    "model": "google/gemini-2.0-flash-exp",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}
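The four option dictionaries above share one shape and differ mainly in defaults. A sketch that merges per-provider defaults with caller overrides, assuming the values shown above are reasonable defaults; `PROVIDER_DEFAULTS` and `build_options` are hypothetical names, and the result is a plain dict you could pass as `options`:

```python
from typing import Any, Dict

# Defaults copied from the example configurations above.
PROVIDER_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "openai": {"model": "gpt-4", "temperature": 0.1, "max_tokens": 2000, "timeout": 60},
    "anthropic": {
        "model": "claude-3-5-sonnet-20241022",
        "temperature": 0.1,
        "max_tokens": 2000,
        "timeout": 60,
    },
    "ollama": {
        "model": "gemma3:1b",
        "base_url": "http://localhost:11434",
        "temperature": 0.1,
        "timeout": 120,
    },
    "openrouter": {
        "model": "google/gemini-2.0-flash-exp",
        "temperature": 0.1,
        "max_tokens": 2000,
        "timeout": 60,
    },
}


def build_options(provider: str, **overrides: Any) -> Dict[str, Any]:
    """Return a new options dict: provider defaults first, then caller overrides."""
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(f"Unknown provider {provider!r}; expected one of {sorted(PROVIDER_DEFAULTS)}")
    return {"provider": provider, **PROVIDER_DEFAULTS[provider], **overrides}


options = build_options("ollama", timeout=300)  # slower local model, longer timeout
```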

🪵 Logging

Dynamic BAML provides flexible logging options to control output verbosity and destination.

Quick Start

from dynamic_baml import call_with_schema

schema = {"name": "string", "age": "int"}
prompt = "Extract user information from: John Doe is 30 years old"

# Basic usage with logging to file
options = {
    "provider": "openai",
    "model": "gpt-4",
    "log_level": "info",           # Control verbosity
    "log_file": "./baml.log"       # Output to file
}

result = call_with_schema(prompt, schema, options)

Configuration Options

Log Levels

Control the verbosity of BAML logging output:

| Level   | Description                   | Use Case                            |
|---------|-------------------------------|-------------------------------------|
| "off"   | No logging output             | Production where logs aren't needed |
| "error" | Only fatal errors             | Production, minimal logging         |
| "warn"  | Errors and warnings (default) | Standard production logging         |
| "info"  | Detailed execution info       | Development and debugging           |
| "debug" | Verbose details and requests  | Deep debugging                      |
| "trace" | Everything (very verbose)     | Troubleshooting                     |
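The levels in the table form an ordered scale: each level emits everything the levels above it emit, plus more. A sketch of that threshold check, assuming exactly these six names and `warn` as the default; `should_log` is hypothetical and only illustrates the ordering:

```python
# Levels from least to most verbose, in the order of the table above.
LOG_LEVELS = ("off", "error", "warn", "info", "debug", "trace")
DEFAULT_LOG_LEVEL = "warn"


def should_log(message_level: str, configured: str = DEFAULT_LOG_LEVEL) -> bool:
    """True if a message at `message_level` passes the `configured` threshold."""
    for level in (message_level, configured):
        if level not in LOG_LEVELS:
            raise ValueError(f"Unknown log level {level!r}; expected one of {LOG_LEVELS}")
    if "off" in (message_level, configured):
        return False
    return LOG_LEVELS.index(message_level) <= LOG_LEVELS.index(configured)
```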

Log File Output

Specify where logs should be written:

  • Default (no log_file): Logs go to terminal/stdout
  • File path: Logs written to specified file
  • Directory creation: Parent directories created automatically
  • Append mode: Multiple calls append to the same file
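If you want your own application logs to behave like the log file described above (parent directories created on demand, appends across runs), the standard `logging` and `pathlib` modules are enough. A sketch with a hypothetical `make_file_logger` helper; it is not how Dynamic BAML writes its own logs:

```python
import logging
import os
from pathlib import Path


def make_file_logger(log_file: str, name: str = "extraction") -> logging.Logger:
    """Return a logger that appends to `log_file`, creating parent directories first."""
    path = Path(log_file)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. ./logs/2024/january/
    logger = logging.getLogger(name)
    target = os.path.abspath(path)
    # Reuse an existing handler for this file so repeated calls don't duplicate lines.
    already_attached = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == target
        for h in logger.handlers
    )
    if not already_attached:
        handler = logging.FileHandler(target, mode="a", encoding="utf-8")  # append, never truncate
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```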

Usage Examples

1. Log Level Only (Terminal Output)

options = {
    "provider": "openai",
    "log_level": "info"  # Logs to terminal with info level
}

2. Log File Only (Default Level)

options = {
    "provider": "openai", 
    "log_file": "./logs/baml.log"  # Uses default log level
}

3. Both Level and File

options = {
    "provider": "openai",
    "log_level": "debug",
    "log_file": "/var/log/baml/debug.log"
}

4. Disable Logging Completely

options = {
    "provider": "openai",
    "log_level": "off"  # No logging output at all
}

5. Nested Log Directories

options = {
    "provider": "openai",
    "log_level": "info",
    "log_file": "./logs/2024/january/extraction.log"  # Dirs created automatically
}

🔄 Advanced Usage

Safe Calling (No Exceptions)

from dynamic_baml import call_with_schema_safe

result = call_with_schema_safe(
    prompt_text="Extract data from this text...",
    schema_dict=schema,
    options=options
)

if result["success"]:
    data = result["data"]
    print(f"Extracted: {data}")
else:
    print(f"Error: {result['error']}")
    print(f"Error type: {result['error_type']}")
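`call_with_schema_safe` turns exceptions into a result dictionary. The same pattern can wrap any callable; a sketch assuming the `success`/`data`/`error`/`error_type` keys from the API reference, with `error_type` taken to be the exception class name (an assumption); `call_safely` is hypothetical:

```python
from typing import Any, Callable, Dict


def call_safely(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Run fn and report the outcome as a dict instead of raising."""
    try:
        return {"success": True, "data": fn(*args, **kwargs)}
    except Exception as exc:  # deliberately broad: the caller inspects the result instead
        return {"success": False, "error": str(exc), "error_type": type(exc).__name__}
```

For example, `call_safely(call_with_schema, prompt, schema, options)` yields the same shape of result as the safe variant.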

Custom Prompting

# Build effective prompts for better extraction
prompt = f"""
Please extract the following information from the text below:

REQUIRED FIELDS:
- name: Person's full name
- age: Person's age as a number
- email: Valid email address

TEXT TO ANALYZE:
{input_text}

Please be accurate and only extract information that is clearly stated.
"""

result = call_with_schema(prompt, schema, options)
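When the same field list is reused across documents, the REQUIRED FIELDS block can be generated instead of hand-written. A sketch with a hypothetical `build_extraction_prompt` helper that reproduces the prompt layout above:

```python
from typing import Dict


def build_extraction_prompt(fields: Dict[str, str], input_text: str) -> str:
    """Assemble the prompt with one '- name: description' line per field."""
    field_lines = "\n".join(f"- {name}: {desc}" for name, desc in fields.items())
    return (
        "Please extract the following information from the text below:\n\n"
        f"REQUIRED FIELDS:\n{field_lines}\n\n"
        f"TEXT TO ANALYZE:\n{input_text}\n\n"
        "Please be accurate and only extract information that is clearly stated."
    )


prompt = build_extraction_prompt(
    {
        "name": "Person's full name",
        "age": "Person's age as a number",
        "email": "Valid email address",
    },
    "Jane Roe, 41, jane@example.com",
)
```

Keeping the descriptions next to the field names makes it easier to keep the prompt in sync with `schema`.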

Batch Processing

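from dynamic_baml import call_with_schema
from dynamic_baml.exceptions import DynamicBAMLError


def process_documents(documents, schema, options):
    """Extract data from each document, recording failures instead of stopping."""
    results = []
    for doc in documents:
        try:
            result = call_with_schema(
                f"Extract information from: {doc['content']}",
                schema,
                options
            )
            results.append({"doc_id": doc["id"], "data": result})
        except DynamicBAMLError as e:  # base class for all Dynamic BAML errors
            results.append({"doc_id": doc["id"], "error": str(e)})
    return results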
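Because each extraction is an independent, network-bound LLM call, the loop above can be parallelized with a thread pool. A sketch that takes the extraction function as a parameter (pass `call_with_schema`), assuming `call_with_schema` is safe to call from several threads at once, which this README does not state; check your provider's rate limits before raising `max_workers`:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def process_documents_concurrently(
    documents: List[Dict[str, Any]],
    schema: Dict[str, Any],
    options: Dict[str, Any],
    extract: Callable[[str, Dict[str, Any], Dict[str, Any]], Dict[str, Any]],
    max_workers: int = 4,
) -> List[Dict[str, Any]]:
    """Run `extract` over documents in parallel, keeping input order in the results."""

    def run_one(doc: Dict[str, Any]) -> Dict[str, Any]:
        try:
            data = extract(f"Extract information from: {doc['content']}", schema, options)
            return {"doc_id": doc["id"], "data": data}
        except Exception as exc:  # record the failure and keep going
            return {"doc_id": doc["id"], "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `documents`
        return list(pool.map(run_one, documents))
```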

🚨 Error Handling

Exception Types

from dynamic_baml import call_with_schema
from dynamic_baml.exceptions import (
    DynamicBAMLError,                  # Base exception
    SchemaGenerationError,             # Schema conversion failed
    BAMLCompilationError,              # BAML code compilation failed
    LLMProviderError,                  # LLM provider call failed
    ResponseParsingError,              # Response parsing failed
    ConfigurationError,                # Provider configuration invalid
    TimeoutError as BAMLTimeoutError,  # Request timeout (aliased so the built-in TimeoutError isn't shadowed)
)

try:
    result = call_with_schema(prompt, schema, options)
except SchemaGenerationError as e:
    print(f"Schema error: {e.message}")
    print(f"Invalid schema: {e.schema_dict}")
except LLMProviderError as e:
    print(f"Provider error: {e.message}")
    print(f"Provider: {e.provider}")
except ResponseParsingError as e:
    print(f"Parsing error: {e.message}")
    print(f"Raw response: {e.raw_response}")
except DynamicBAMLError as e:
    print(f"Other Dynamic BAML error: {e}")

Error Recovery

from dynamic_baml import call_with_schema
from dynamic_baml.exceptions import LLMProviderError


def robust_extraction(text, schema, providers):
    """Try multiple providers for reliable extraction."""
    for provider_opts in providers:
        try:
            return call_with_schema(text, schema, provider_opts)
        except LLMProviderError:
            continue  # Provider call failed: try the next provider
        except Exception as e:
            print(f"Unexpected error with {provider_opts['provider']}: {e}")

    raise RuntimeError("All providers failed")

# Usage
providers = [
    {"provider": "openai", "model": "gpt-4"},
    {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    {"provider": "ollama", "model": "gemma3:1b"}
]

result = robust_extraction(text, schema, providers)
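Switching providers is one recovery strategy; for transient failures such as timeouts, retrying the same provider with exponential backoff is another. A sketch with a hypothetical `retry_with_backoff` helper; which exception types are worth retrying is your choice:

```python
import time
from typing import Any, Callable, Tuple, Type


def retry_with_backoff(
    fn: Callable[[], Any],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Any:
    """Call fn up to `attempts` times, sleeping base_delay * 2**i after failure i."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** i)
```

For example, `retry_with_backoff(lambda: call_with_schema(text, schema, options), (LLMProviderError,))` retries provider failures up to three times, waiting 1 s and then 2 s between attempts.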

📚 Examples

See the examples/ directory for more complete examples.

πŸ–ΌοΈ Image Support (Experimental)

Image analysis is possible by using BamlRuntime from BAML's baml_py package directly, for advanced multimodal use cases.

Current Limitations

The main call_with_schema API currently supports text-only prompts. For actual image analysis, use BamlRuntime directly as shown below.

Image Analysis Example

import base64
import os

from baml_py import BamlRuntime, Image

# Convert an image file to a base64 string
def image_to_base64(image_path):
    with open(image_path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')

# Define BAML schema with image support
baml_schema = """
enum ObjectType {
  Person
  Animal
  Vehicle
  Building
  Other
}

class ImageAnalysis {
  primary_object ObjectType
  description string
  confidence float
}

function AnalyzeImage(image: image) -> ImageAnalysis {
  client VisionModel
  prompt #"
    Analyze this image and identify the primary object.
    Image: {{ image }}

    {{ ctx.output_format }}
  "#
}

client<llm> VisionModel {
  provider openai
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
  }
}
"""

# Create runtime and analyze image
async def analyze_image(image_path):
    # env_vars supplies values such as OPENAI_API_KEY, referenced via env.* in the BAML source
    runtime = BamlRuntime.from_files(
        root_path=".",
        files={"schema.baml": baml_schema},
        env_vars=dict(os.environ),
    )

    # Create image object (this example assumes a JPEG file)
    image_data = image_to_base64(image_path)
    image = Image.from_base64("image/jpeg", image_data)

    # Call the function; the return value is a baml_py result object, not a plain dict
    ctx = runtime.create_context_manager()
    result = await runtime.call_function(
        "AnalyzeImage",
        {"image": image},
        ctx
    )

    return result
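The example above hard-codes `image/jpeg` as the media type. A sketch that guesses it from the file extension with the standard `mimetypes` module and returns the `(media_type, base64_data)` pair in the order `Image.from_base64` takes above; `encode_image` is hypothetical:

```python
import base64
import mimetypes
from pathlib import Path
from typing import Tuple


def encode_image(image_path: str) -> Tuple[str, str]:
    """Return (media_type, base64_data) for an image file, guessing the type from its extension."""
    media_type, _ = mimetypes.guess_type(image_path)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image media type for {image_path!r}")
    data = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return media_type, data
```

With it, `image = Image.from_base64(*encode_image(image_path))` works for PNG and JPEG files alike.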

Supported Vision Models

  • OpenAI: vision-capable models such as gpt-4o
  • Anthropic: Claude 3 models with vision
  • OpenRouter: Google Gemini Vision, Claude 3, and others
  • Ollama: LLaVA and other local vision models

For a complete example with multiple providers and fallback strategies, see examples/image_analysis.py.

📖 API Reference

Core Functions

call_with_schema(prompt_text, schema_dict, options=None) -> dict

Extract structured data using a schema.

Parameters:

  • prompt_text (str): Text prompt to send to the LLM
  • schema_dict (dict): Schema definition dictionary
  • options (dict, optional): Provider configuration options

Returns:

  • dict: Extracted data matching the schema structure

Raises:

  • DynamicBAMLError: Base exception for all errors
  • SchemaGenerationError: Schema conversion failed
  • LLMProviderError: Provider call failed
  • ResponseParsingError: Response parsing failed

call_with_schema_safe(prompt_text, schema_dict, options=None) -> dict

Safe version that returns structured results instead of raising exceptions.

Returns:

{
    "success": bool,
    "data": dict,      # Present if success=True
    "error": str,      # Present if success=False
    "error_type": str  # Present if success=False
}
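The result shape can be written down as a `TypedDict` so type checkers know which keys may appear. A sketch mirroring the documented keys, where `success` is always present and the rest are optional; `SafeResult` and `unwrap` are hypothetical names:

```python
from typing import Any, Dict, TypedDict


class _SafeResultBase(TypedDict):
    success: bool  # always present


class SafeResult(_SafeResultBase, total=False):
    data: Dict[str, Any]  # present when success is True
    error: str            # present when success is False
    error_type: str       # present when success is False


def unwrap(result: SafeResult) -> Dict[str, Any]:
    """Return result["data"] on success; otherwise raise with the reported error."""
    if result["success"]:
        return result["data"]
    raise RuntimeError(f"{result.get('error_type', 'Error')}: {result.get('error', '')}")
```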

Schema Generator

DictToBAMLGenerator.generate_schema(schema_dict, schema_name) -> str

Generate BAML schema code from dictionary.

Parameters:

  • schema_dict (dict): Schema definition
  • schema_name (str): Name for the generated schema

Returns:

  • str: Valid BAML schema code

Provider Factory

LLMProviderFactory.create_provider(options) -> LLMProvider

Create provider instance based on options.

LLMProviderFactory.get_available_providers() -> List[str]

Get list of currently available providers.
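The API reference documents only the two factory methods. A registry keyed by provider name is one common way to build such a factory; every class name below is hypothetical, and the availability rule (API key set, or no key needed) is an assumption rather than `LLMProviderFactory`'s actual check:

```python
import os
from typing import Any, Dict, List, Optional, Type


class BaseProvider:
    """Stand-in for a provider object; only stores its options."""

    env_var: Optional[str] = None  # API-key variable; None means no key is needed

    def __init__(self, options: Dict[str, Any]) -> None:
        self.options = options

    @classmethod
    def is_available(cls) -> bool:
        return cls.env_var is None or bool(os.environ.get(cls.env_var))


class OpenAIProvider(BaseProvider):
    env_var = "OPENAI_API_KEY"


class OllamaProvider(BaseProvider):
    env_var = None  # local server, no key


class ProviderRegistry:
    _providers: Dict[str, Type[BaseProvider]] = {
        "openai": OpenAIProvider,
        "ollama": OllamaProvider,
    }

    @classmethod
    def create_provider(cls, options: Dict[str, Any]) -> BaseProvider:
        name = options.get("provider")
        if name not in cls._providers:
            raise ValueError(f"Unknown provider {name!r}; known: {sorted(cls._providers)}")
        return cls._providers[name](options)

    @classmethod
    def get_available_providers(cls) -> List[str]:
        return sorted(n for n, p in cls._providers.items() if p.is_available())
```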

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

πŸ™ Acknowledgments

Dynamic BAML is built on top of BoundaryML and the powerful BAML language. We extend our gratitude to the BoundaryML team for creating the foundational technology that makes structured LLM outputs possible.


Dynamic BAML provides a Python-friendly interface and automatic schema generation on top of the robust BAML foundation.

📄 License

This project is licensed under the MIT License - see LICENSE file for details.


πŸ† Why Dynamic BAML?

Traditional Approach

import json

# Complex manual prompt engineering
prompt = """
Extract user data and format as JSON with these exact fields:
- name (string)
- age (integer) 
- email (string)
- is_active (boolean)

Text: "John Doe is 30 years old..."

Please ensure the output is valid JSON with no extra text.
"""

response = llm.call(prompt)  # llm: placeholder for any raw LLM client
data = json.loads(response)  # Hope it's valid JSON!

Dynamic BAML Approach

from dynamic_baml import call_with_schema

# Clean, type-safe schema definition
schema = {
    "name": "string",
    "age": "int", 
    "email": "string",
    "is_active": "bool"
}

data = call_with_schema(
    "Extract user info from: John Doe is 30 years old...",
    schema
)  # Returns a dict validated against the schema (raises on failure)

Benefits:

  • ✅ Type Safety: Outputs are validated against the schema
  • ✅ No Manual JSON Parsing: Direct structured output
  • ✅ Better Prompts: Less hand-written output-format instructions
  • ✅ Error Handling: Comprehensive error management
  • ✅ Multi-Provider: Easy provider switching
  • ✅ Complex Types: Enums, nested objects, arrays
