Human-Text DSL Compiler

A compiler that converts human-readable text into a structured DSL (domain-specific language). It accepts both controlled scripts and natural-language input, with optional LLM enhancement.

Features

Dual Input Modes:
- Controlled scripts with explicit directives (@task, @tool, etc.)
- Free-form natural language with LLM-powered structuring
Multi-format Output: YAML, JSON, and Protocol Buffers
Advanced Processing: Lexical analysis, semantic validation, optimization
LLM Integration: Support for multiple LLM providers (DashScope, OpenAI, Context Service)
CLI & Library: Both command-line tool and Python library interface
Structured Representation: Complex conditionals, tool calls, agent invocations, and flow control

Quick Start

Prerequisites

Python 3.12+
uv package manager

Installation

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/supercontext/dsl-compiler.git
cd dsl-compiler

# Install dependencies and create virtual environment
uv sync

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
uv pip install -e .

Basic Usage

Python Library

from dsl_compiler import compile, CompilerConfig

# Create configuration
config = CompilerConfig(
    llm_enabled=True,
    output_format="yaml"
)

# Compile a file
result = compile("input.txt", config)
print(result.to_yaml())

# Compile from string
source_code = """
@task data_processing
    Process user data from database
    Validate and clean the data
    Generate comprehensive report

@var user_id = 12345
@tool data_validator
    Tool for validating data integrity
"""

result = compile(source_code, config)

Command Line Interface

# Basic compilation
uv run dslc input.txt -o output.yaml

# Different output formats
uv run dslc input.txt -f json -o output.json

# Disable LLM for faster processing
uv run dslc input.txt --no-llm

# Syntax validation only
uv run dslc validate input.txt

# Show configuration
uv run dslc config --show

# Or use the traditional Python module syntax
uv run python -m dsl_compiler.cli input.txt -o output.yaml

Syntax Guide

Basic Directives

Task Definition

@task task_name
    Task description
    
    Detailed steps and instructions...

Variable Declaration

@var variable_name = value
@var user_id = 12345
@var debug_mode = true
@var config_file = "settings.json"
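
To make the declaration form concrete, here is a minimal sketch of how a line such as `@var user_id = 12345` could be split into a name and a typed value. It is not the compiler's own parser: the `VAR_LINE` pattern, `parse_var`, `coerce`, and the type-guessing rules are all assumptions made for illustration.

```python
import re
from typing import Tuple

# Hypothetical pattern for a line such as: @var user_id = 12345
VAR_LINE = re.compile(r"^@var\s+([A-Za-z_]\w*)\s*=\s*(.+)$")


def coerce(raw: str) -> object:
    """Guess a Python type for the literal on the right-hand side."""
    if raw in ("true", "false"):
        return raw == "true"
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]  # quoted string: drop the quotes
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw  # anything else stays an unquoted string


def parse_var(line: str) -> Tuple[str, object]:
    """Parse one @var line into (name, value); raise ValueError if malformed."""
    match = VAR_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid @var declaration: {line!r}")
    name, raw = match.groups()
    return name, coerce(raw.strip())


for example in ("@var user_id = 12345", "@var debug_mode = true"):
    print(parse_var(example))
```

Whether the real compiler infers types this way, or keeps every value as text until semantic analysis, is not stated here; treat the coercion order as one possible choice.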

Tool Definition

@tool tool_name
    Tool description and usage instructions

Agent Invocation

@agent AgentName(param1=value1, param2=value2)
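
As a rough illustration of the invocation form above, the sketch below splits an `@agent` line into an agent name and a parameter dictionary. `AGENT_LINE` and `parse_agent` are hypothetical names, values are kept as raw strings, and the naive comma split would mishandle quoted values that contain commas.

```python
import re
from typing import Dict, Tuple

# Hypothetical pattern for: @agent AgentName(param1=value1, param2=value2)
AGENT_LINE = re.compile(r"^@agent\s+([A-Za-z_]\w*)\s*\((.*)\)\s*$")


def parse_agent(line: str) -> Tuple[str, Dict[str, str]]:
    """Return (agent_name, params) for one @agent line."""
    match = AGENT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid @agent invocation: {line!r}")
    name, arg_text = match.groups()
    params: Dict[str, str] = {}
    for part in arg_text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate an empty argument list
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {part!r}")
        params[key.strip()] = value.strip()
    return name, params


print(parse_agent("@agent AgentName(param1=value1, param2=value2)"))
```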

Flow Control

@next target_task

@if condition_expression
    Actions when condition is true
@else
    Actions when condition is false
@endif

Advanced Features

Conditional Statements

@task order_validation
    Validate customer order
    
    @tool check_order
        Order validation tool
    
    @if result.valid == false
        Order is invalid, terminate process
        @next END
    @else
        Proceed with order processing
        @next process_payment
    @endif

Structured Output Example

The above compiles to:

version: "1.0"
tasks:
  - id: order_validation
    title: Order validation
    body:
      - type: text
        content: "Validate customer order"
        line_number: 2
      - type: tool_call
        tool_call:
          name: check_order
          description: "Order validation tool"
        line_number: 4
      - type: conditional
        conditional:
          branches:
            - condition: "result.valid == false"
              actions:
                - type: text
                  content: "Order is invalid, terminate process"
                - type: jump
                  jump:
                    target: END
            - condition: null  # else branch
              actions:
                - type: text
                  content: "Proceed with order processing"
                - type: jump
                  jump:
                    target: process_payment
        line_number: 6
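
A consumer of this output has to decide which branch to follow. The sketch below shows one way to do that on the conditional node, written out as a plain dict. `next_target` is a hypothetical helper, and condition evaluation is delegated to a caller-supplied function because this README does not describe how conditions are evaluated at run time.

```python
from typing import Callable, Optional

# The conditional node from the output above, as a plain Python dict
conditional = {
    "branches": [
        {
            "condition": "result.valid == false",
            "actions": [
                {"type": "text", "content": "Order is invalid, terminate process"},
                {"type": "jump", "jump": {"target": "END"}},
            ],
        },
        {
            "condition": None,  # else branch
            "actions": [
                {"type": "text", "content": "Proceed with order processing"},
                {"type": "jump", "jump": {"target": "process_payment"}},
            ],
        },
    ]
}


def next_target(node: dict, is_true: Callable[[str], bool]) -> Optional[str]:
    """Return the jump target of the first branch that applies, if any."""
    for branch in node["branches"]:
        condition = branch["condition"]
        if condition is None or is_true(condition):
            for action in branch["actions"]:
                if action["type"] == "jump":
                    return action["jump"]["target"]
            return None  # branch applies but contains no jump
    return None


# Pretend the order was valid, so the first condition is false
print(next_target(conditional, lambda condition: False))
```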

Configuration

Environment Variables

Copy dsl_compiler.env.example to .env and configure:

# Output format
DSL_OUTPUT_FORMAT=yaml

# LLM configuration
DSL_LLM_ENABLED=true
DSL_LLM_PROVIDER=dashscope
DSL_LLM_API_KEY=your_api_key_here
DSL_LLM_MODEL=qwen-turbo

# Performance settings
DSL_MAX_FILE_SIZE=10485760
DSL_PARSE_TIMEOUT=60

# Debug settings
DSL_DEBUG=false
DSL_LOG_LEVEL=INFO
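
The variables above map naturally onto a small settings object. The following is a sketch of that mapping, not the contents of `config.py`: the `Settings` class, `load_settings`, the accepted boolean spellings, and the use of the sample values as defaults are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the sample .env values shown above (assumed defaults)
    output_format: str = "yaml"
    llm_enabled: bool = True
    llm_provider: str = "dashscope"
    max_file_size: int = 10485760  # 10 MB in bytes
    parse_timeout: int = 60        # seconds
    debug: bool = False
    log_level: str = "INFO"


def _as_bool(text: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return text.strip().lower() in ("1", "true", "yes", "on")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from DSL_* variables, falling back to the defaults."""
    defaults = Settings()
    return Settings(
        output_format=env.get("DSL_OUTPUT_FORMAT", defaults.output_format),
        llm_enabled=_as_bool(env.get("DSL_LLM_ENABLED", str(defaults.llm_enabled))),
        llm_provider=env.get("DSL_LLM_PROVIDER", defaults.llm_provider),
        max_file_size=int(env.get("DSL_MAX_FILE_SIZE", defaults.max_file_size)),
        parse_timeout=int(env.get("DSL_PARSE_TIMEOUT", defaults.parse_timeout)),
        debug=_as_bool(env.get("DSL_DEBUG", str(defaults.debug))),
        log_level=env.get("DSL_LOG_LEVEL", defaults.log_level),
    )


print(load_settings({"DSL_LLM_ENABLED": "false", "DSL_PARSE_TIMEOUT": "5"}))
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.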

Configuration Options

Option	Default	Description
`output_format`	`yaml`	Output format (yaml/json/proto)
`llm_enabled`	`true`	Enable LLM enhancement
`llm_provider`	`dashscope`	LLM provider
`llm_save_intermediate`	`false`	Save intermediate DSL code
`llm_intermediate_dir`	`null`	Directory for intermediate files
`strict_mode`	`true`	Strict validation mode
`compact_mode`	`false`	Compact output format
`max_file_size`	`10MB`	Maximum file size
`parse_timeout`	`60s`	Parse timeout

LLM Integration

The compiler supports multiple LLM providers for natural language processing:

DashScope (Alibaba Cloud)

export DSL_LLM_PROVIDER=dashscope
export DSL_LLM_API_KEY=your_dashscope_key
export DSL_LLM_MODEL=qwen-turbo

OpenAI

export DSL_LLM_PROVIDER=openai
export DSL_LLM_API_KEY=your_openai_key
export DSL_LLM_MODEL=gpt-3.5-turbo

Context Service (Internal)

export DSL_LLM_PROVIDER=context_service
export DSL_CONTEXT_SERVICE_URL=http://localhost:8001

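Saving Intermediate Results

To debug and analyze the LLM conversion process, you can save the intermediate DSL code that the LLM generates:

# Enable saving of intermediate results
export DSL_LLM_SAVE_INTERMEDIATE=true

# Specify the output directory (optional; defaults to the llm_intermediate subdirectory of the source file's directory)
export DSL_LLM_INTERMEDIATE_DIR=./intermediate_results

When enabled, each LLM conversion produces a timestamped .dsl file containing:

The raw DSL code
Generation time and source information
The LLM provider and model used

Example generated file name: password_reset_llm_generated_20250714_162839.dsl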
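
The naming scheme in that example can be reproduced with a few lines of standard-library code. This is a sketch inferred from the single file name shown; `intermediate_path` is a hypothetical helper, and the compiler's actual logic may differ.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def intermediate_path(source: Path, when: datetime,
                      out_dir: Optional[Path] = None) -> Path:
    """Build <dir>/<source stem>_llm_generated_<YYYYMMDD_HHMMSS>.dsl."""
    # Default directory: an llm_intermediate folder next to the source file
    directory = out_dir if out_dir is not None else source.parent / "llm_intermediate"
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return directory / f"{source.stem}_llm_generated_{stamp}.dsl"


path = intermediate_path(Path("flows/password_reset.txt"),
                         datetime(2025, 7, 14, 16, 28, 39))
print(path.name)
```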

Architecture

The compiler follows a multi-stage pipeline:

Input Text → Preprocessor → Lexer → Parser → Semantic Analyzer
                                              ↓
Output ← Serializer ← Optimizer ← Validator ← LLM Augmentor

Components

Preprocessor: BOM removal, line normalization, tab expansion
Lexer: Tokenization with indentation tracking
Parser: AST construction with directive parsing
Semantic Analyzer: Symbol table building, type checking, scope validation
LLM Augmentor: Natural language enhancement (optional)
Validator: DAG validation, reference checking, conflict detection
Optimizer: Dead code elimination, constant folding, text compression
Serializer: Multi-format output generation
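
The staged design can be pictured as a list of functions applied in order. The sketch below implements only the three preprocessor steps named above and chains them with a generic runner. Function names, the tab width of 4, and the text-in, text-out stage signature are assumptions; later stages in the real compiler pass tokens and ASTs rather than strings.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[str], str]


def strip_bom(text: str) -> str:
    """Remove a leading Unicode byte-order mark if present."""
    return text[1:] if text.startswith("\ufeff") else text


def normalize_newlines(text: str) -> str:
    """Convert Windows and old Mac line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def expand_tabs(text: str, tab_size: int = 4) -> str:
    """Replace tabs with spaces so indentation can be compared by column."""
    return text.expandtabs(tab_size)


def run_pipeline(text: str, stages: List[Stage]) -> str:
    """Feed the output of each stage into the next one."""
    return reduce(lambda acc, stage: stage(acc), stages, text)


PREPROCESS: List[Stage] = [strip_bom, normalize_newlines, expand_tabs]

cleaned = run_pipeline("\ufeff@task demo\r\n\tstep one\r\n", PREPROCESS)
print(repr(cleaned))
```

Normalizing line endings before expanding tabs keeps the column arithmetic in `expandtabs` simple, since it restarts its column count at each newline.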

Output Formats

YAML (Default)

version: "1.0"
tasks:
  - id: "data_processing"
    title: "Data Processing Task"
    body:
      - type: "text"
        content: "Process user data"
        line_number: 2

JSON

{
  "version": "1.0",
  "tasks": [
    {
      "id": "data_processing",
      "title": "Data Processing Task",
      "body": [
        {
          "type": "text",
          "content": "Process user data",
          "line_number": 2
        }
      ]
    }
  ]
}
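
For readers who want to produce the same shape from their own code, here is a small sketch that models the structure with dataclasses and serializes it with the standard `json` module. The class names are hypothetical and are not taken from `models.py`; key order in the result may differ from the example above, although the data is the same.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TextNode:
    content: str
    line_number: int
    type: str = "text"


@dataclass
class Task:
    id: str
    title: str
    body: List[TextNode] = field(default_factory=list)


@dataclass
class Workflow:
    tasks: List[Task]
    version: str = "1.0"


def to_json(workflow: Workflow) -> str:
    """Serialize the workflow; asdict() recurses into nested dataclasses."""
    return json.dumps(asdict(workflow), indent=2)


workflow = Workflow(tasks=[
    Task(id="data_processing", title="Data Processing Task",
         body=[TextNode(content="Process user data", line_number=2)]),
])
print(to_json(workflow))
```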

Protocol Buffers

syntax = "proto3";
package dsl;

message DSLWorkflow {
  string version = 1;
  map<string, string> metadata = 2;
  repeated Task tasks = 3;
}

Development

Project Structure

src/dsl_compiler/
├── __init__.py          # Main interface
├── config.py            # Configuration management
├── compiler.py          # Main compiler logic
├── preprocessor.py      # Text preprocessing
├── lexer.py             # Lexical analyzer
├── parser.py            # Syntax parser
├── semantic_analyzer.py # Semantic analysis
├── llm_augmentor.py     # LLM enhancement
├── validator.py         # Validation engine
├── optimizer.py         # Code optimization
├── serializer.py        # Output serialization
├── cli.py               # Command-line interface
├── models.py            # Data models
├── exceptions.py        # Exception classes
└── requirements.txt     # Dependencies

Running Tests

# Install development dependencies
pip install pytest pytest-asyncio pytest-cov black flake8 mypy

# Run tests
python -m pytest tests/

# Run with coverage
python -m pytest --cov=src/dsl_compiler tests/

Code Quality

# Format code
black src/

# Lint code
flake8 src/

# Type checking
mypy src/

Error Handling

The compiler provides detailed error information:

from dsl_compiler import compile
from dsl_compiler.exceptions import CompilerError, ValidationError

try:
    result = compile("input.txt")
except ValidationError as e:
    print(f"Validation error: {e}")
    print(f"Rule: {e.rule}")
    print(f"Suggestions: {e.suggestions}")
except CompilerError as e:
    print(f"Compilation error: {e}")
    print(f"File: {e.source_file}")
    print(f"Line: {e.line}")
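
The order of the `except` clauses above matters if `ValidationError` is a subclass of `CompilerError`, which this README does not confirm. The stand-in classes below are hypothetical, carry only the attributes used in the example, and exist to show why the more specific handler has to come first under that assumption.

```python
from typing import List, Optional


class CompilerError(Exception):
    """Stand-in base error carrying a source location."""

    def __init__(self, message: str, source_file: Optional[str] = None,
                 line: Optional[int] = None) -> None:
        super().__init__(message)
        self.source_file = source_file
        self.line = line


class ValidationError(CompilerError):
    """Stand-in validation error carrying the violated rule and hints."""

    def __init__(self, message: str, rule: str,
                 suggestions: Optional[List[str]] = None, **location) -> None:
        super().__init__(message, **location)
        self.rule = rule
        self.suggestions = suggestions or []


def describe(error: CompilerError) -> str:
    """Check the subclass first; the base-class test would match both."""
    if isinstance(error, ValidationError):
        return f"validation [{error.rule}]: {error}"
    return f"compile {error.source_file}:{error.line}: {error}"


print(describe(ValidationError("unknown task", rule="reference-check")))
print(describe(CompilerError("bad indent", source_file="input.txt", line=3)))
```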

Performance Features

Dead Code Elimination: Remove unreachable code blocks
Constant Folding: Evaluate constant expressions at compile time
Text Compression: Optimize text content while preserving meaning
Structure Optimization: Flatten unnecessary nesting
Duplicate Removal: Eliminate redundant definitions
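
Dead code elimination at the task level can be framed as graph reachability: any task that cannot be reached from the entry task through jumps is a candidate for removal. The sketch below illustrates that idea on a plain dictionary of jump targets. It is not the logic in `optimizer.py`, and the function names and the `legacy_refund` task are invented for the example.

```python
from typing import Dict, List, Set


def reachable_tasks(jumps: Dict[str, List[str]], entry: str) -> Set[str]:
    """Depth-first walk over jump targets, starting from the entry task."""
    seen: Set[str] = set()
    stack = [entry]
    while stack:
        task = stack.pop()
        if task in seen or task not in jumps:
            continue  # already visited, or a terminal target such as END
        seen.add(task)
        stack.extend(jumps[task])
    return seen


def eliminate_dead_tasks(jumps: Dict[str, List[str]],
                         entry: str) -> Dict[str, List[str]]:
    """Keep only the tasks that the entry task can reach."""
    live = reachable_tasks(jumps, entry)
    return {task: targets for task, targets in jumps.items() if task in live}


jumps = {
    "order_validation": ["END", "process_payment"],
    "process_payment": ["END"],
    "legacy_refund": ["process_payment"],  # nothing jumps here
}
print(sorted(eliminate_dead_tasks(jumps, "order_validation")))
```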

Troubleshooting

Common Issues

LLM Call Failures
- Check API key configuration
- Verify network connectivity
- Check LLM service status
Parse Errors
- Validate directive format
- Check file encoding (should be UTF-8)
- Review detailed error messages
Performance Issues
- Disable LLM with --no-llm flag
- Reduce file size
- Adjust timeout settings
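
Two of the checks above, file encoding and file size, are easy to run before invoking the compiler. The following is a standalone sketch using only the standard library. `preflight` is a hypothetical helper, and the 10 MB limit is taken from the `DSL_MAX_FILE_SIZE` sample value.

```python
from typing import List

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10485760 bytes


def preflight(data: bytes, max_size: int = MAX_FILE_SIZE) -> List[str]:
    """Return a list of problems found in the raw file contents."""
    problems: List[str] = []
    if len(data) > max_size:
        problems.append(f"file is {len(data)} bytes; limit is {max_size}")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte offset {exc.start}")
    return problems


print(preflight(b"@task demo\n    step one\n"))
```

To check a file on disk, pass it the raw bytes, for example `preflight(Path("input.txt").read_bytes())` after importing `Path` from `pathlib`.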

Debug Mode

# Enable debug output
python -m dsl_compiler.cli input.txt --debug

# Set environment variable
export DSL_DEBUG=true

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Run the test suite
Submit a pull request

License

MIT License

Changelog

v1.0.0

Initial release
Multi-format output support
LLM integration
Comprehensive validation
CLI and library interfaces
