A compiler that converts human-readable text into a structured DSL (domain-specific language). It accepts both controlled scripts and natural-language input, with optional LLM enhancement.
- Dual Input Modes:
  - Controlled scripts with explicit directives (@task, @tool, etc.)
  - Free-form natural language with LLM-powered structuring
- Multi-format Output: YAML, JSON, and Protocol Buffers
- Advanced Processing: Lexical analysis, semantic validation, optimization
- LLM Integration: Support for multiple LLM providers (DashScope, OpenAI, Context Service)
- CLI & Library: Both command-line tool and Python library interface
- Structured Representation: Complex conditionals, tool calls, agent invocations, and flow control

Requirements:
- Python 3.12+
- uv package manager
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository
git clone https://github.com/supercontext/dsl-compiler.git
cd dsl-compiler
# Install dependencies and create virtual environment
uv sync
# Activate the virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in development mode
uv pip install -e .

from dsl_compiler import compile, CompilerConfig
# Create configuration
config = CompilerConfig(
    llm_enabled=True,
    output_format="yaml",
)
# Compile a file
result = compile("input.txt", config)
print(result.to_yaml())
# Compile from string
source_code = """
@task data_processing
Process user data from database
Validate and clean the data
Generate comprehensive report
@var user_id = 12345
@tool data_validator
Tool for validating data integrity
"""
result = compile(source_code, config)

# Basic compilation
uv run dslc input.txt -o output.yaml
# Different output formats
uv run dslc input.txt -f json -o output.json
# Disable LLM for faster processing
uv run dslc input.txt --no-llm
# Syntax validation only
uv run dslc validate input.txt
# Show configuration
uv run dslc config --show
# Or use the traditional Python module syntax
uv run python -m dsl_compiler.cli input.txt -o output.yaml

@task task_name
Task description
Detailed steps and instructions...
@var variable_name = value
@var user_id = 12345
@var debug_mode = true
@var config_file = "settings.json"
@tool tool_name
Tool description and usage instructions
@agent AgentName(param1=value1, param2=value2)
@next target_task
@if condition_expression
Actions when condition is true
@else
Actions when condition is false
@endif
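
To make the directive syntax above concrete, here is a minimal sketch of how a single line might be classified as a directive or as plain text. It is not the compiler's actual lexer: the regular expression, the `Line` tuple, and the helpers `classify_line` and `parse_var` are hypothetical names chosen for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of a directive: "@name", optionally followed by arguments.
DIRECTIVE_RE = re.compile(r"^\s*@(?P<name>[A-Za-z_]\w*)(?:\s+(?P<args>.*))?$")


class Line(NamedTuple):
    kind: str             # "directive" or "text"
    name: Optional[str]   # directive name such as "task"; None for text
    value: str            # directive arguments, or the text itself


def classify_line(raw: str) -> Line:
    """Classify one source line (hypothetical helper, not the real lexer)."""
    match = DIRECTIVE_RE.match(raw)
    if match is None:
        return Line("text", None, raw.strip())
    return Line("directive", match.group("name"), (match.group("args") or "").strip())


def parse_var(args: str) -> tuple[str, str]:
    """Split the arguments of an @var directive into (name, raw value)."""
    name, _, value = args.partition("=")
    return name.strip(), value.strip()


if __name__ == "__main__":
    for raw in ["@task order_validation", "Validate customer order", "@else"]:
        print(classify_line(raw))
    print(parse_var("user_id = 12345"))
```

Values are kept as raw strings here; a real implementation would presumably go on to type them (number, boolean, quoted string) during semantic analysis.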
@task order_validation
Validate customer order
@tool check_order
Order validation tool
@if result.valid == false
Order is invalid, terminate process
@next END
@else
Proceed with order processing
@next process_payment
@endif
The above compiles to:
version: "1.0"
tasks:
  - id: order_validation
    title: Order validation
    body:
      - type: text
        content: "Validate customer order"
        line_number: 2
      - type: tool_call
        tool_call:
          name: check_order
          description: "Order validation tool"
        line_number: 4
      - type: conditional
        conditional:
          branches:
            - condition: "result.valid == false"
              actions:
                - type: text
                  content: "Order is invalid, terminate process"
                - type: jump
                  jump:
                    target: END
            - condition: null # else branch
              actions:
                - type: text
                  content: "Proceed with order processing"
                - type: jump
                  jump:
                    target: process_payment
        line_number: 6

Copy dsl_compiler.env.example to .env and configure:
# Output format
DSL_OUTPUT_FORMAT=yaml
# LLM configuration
DSL_LLM_ENABLED=true
DSL_LLM_PROVIDER=dashscope
DSL_LLM_API_KEY=your_api_key_here
DSL_LLM_MODEL=qwen-turbo
# Performance settings
DSL_MAX_FILE_SIZE=10485760
DSL_PARSE_TIMEOUT=60
# Debug settings
DSL_DEBUG=false
DSL_LOG_LEVEL=INFO

| Option | Default | Description |
|---|---|---|
| output_format | yaml | Output format (yaml/json/proto) |
| llm_enabled | true | Enable LLM enhancement |
| llm_provider | dashscope | LLM provider |
| llm_save_intermediate | false | Save intermediate DSL code |
| llm_intermediate_dir | null | Directory for intermediate files |
| strict_mode | true | Strict validation mode |
| compact_mode | false | Compact output format |
| max_file_size | 10MB | Maximum file size |
| parse_timeout | 60s | Parse timeout |
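
As an illustration of how the environment variables and the defaults in the table could fit together, the following sketch reads a few `DSL_*` variables and falls back to the documented defaults. The `EnvSettings` dataclass and `load_settings` function are hypothetical and are not part of the `dsl_compiler` API; only the variable names and default values come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class EnvSettings:
    """Hypothetical container mirroring a few options from the table."""
    output_format: str = "yaml"
    llm_enabled: bool = True
    llm_provider: str = "dashscope"
    max_file_size: int = 10 * 1024 * 1024  # 10MB, i.e. 10485760 bytes
    parse_timeout: int = 60                # seconds


def _as_bool(text: str) -> bool:
    # Assumed convention: these spellings count as true, anything else as false.
    return text.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Build settings from DSL_* variables, using defaults for missing ones."""
    env = os.environ if env is None else env
    defaults = EnvSettings()
    return EnvSettings(
        output_format=env.get("DSL_OUTPUT_FORMAT", defaults.output_format),
        llm_enabled=(
            _as_bool(env["DSL_LLM_ENABLED"])
            if "DSL_LLM_ENABLED" in env
            else defaults.llm_enabled
        ),
        llm_provider=env.get("DSL_LLM_PROVIDER", defaults.llm_provider),
        max_file_size=int(env.get("DSL_MAX_FILE_SIZE", defaults.max_file_size)),
        parse_timeout=int(env.get("DSL_PARSE_TIMEOUT", defaults.parse_timeout)),
    )


if __name__ == "__main__":
    print(load_settings({"DSL_OUTPUT_FORMAT": "json", "DSL_LLM_ENABLED": "false"}))
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.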
The compiler supports multiple LLM providers for natural language processing:
export DSL_LLM_PROVIDER=dashscope
export DSL_LLM_API_KEY=your_dashscope_key
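export DSL_LLM_MODEL=qwen-turbo

export DSL_LLM_PROVIDER=openai
export DSL_LLM_API_KEY=your_openai_key
export DSL_LLM_MODEL=gpt-3.5-turbo

export DSL_LLM_PROVIDER=context_service
export DSL_CONTEXT_SERVICE_URL=http://localhost:8001

To debug and analyze the LLM conversion process, you can save the intermediate DSL code that the LLM generates:
# Enable saving of intermediate results
export DSL_LLM_SAVE_INTERMEDIATE=true
# Set the output directory (optional; defaults to the llm_intermediate subdirectory of the source file's directory)
export DSL_LLM_INTERMEDIATE_DIR=./intermediate_results

When enabled, each LLM conversion writes a timestamped .dsl file containing:
- The original DSL code
- Generation time and source information
- The LLM provider and model used
Example generated file name: password_reset_llm_generated_20250714_162839.dsl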
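
The file name pattern above can be reproduced with a few lines of standard-library code. This is a sketch inferred from the single example name and the documented default directory; the function `intermediate_path` and the `.txt` source extension are assumptions, not the compiler's actual implementation.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def intermediate_path(
    source_file: str,
    output_dir: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Path:
    """Return a timestamped .dsl path for an intermediate result (hypothetical)."""
    source = Path(source_file)
    now = datetime.now() if now is None else now
    # Default location: an llm_intermediate subdirectory next to the source file.
    directory = Path(output_dir) if output_dir else source.parent / "llm_intermediate"
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return directory / f"{source.stem}_llm_generated_{stamp}.dsl"


if __name__ == "__main__":
    fixed = datetime(2025, 7, 14, 16, 28, 39)
    print(intermediate_path("docs/password_reset.txt", now=fixed).name)
```

Accepting `now` as a parameter makes the naming deterministic in tests.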
The compiler follows a multi-stage pipeline:
Input Text → Preprocessor → Lexer → Parser → Semantic Analyzer
↓
Output ← Serializer ← Optimizer ← Validator ← LLM Augmentor
- Preprocessor: BOM removal, line normalization, tab expansion
- Lexer: Tokenization with indentation tracking
- Parser: AST construction with directive parsing
- Semantic Analyzer: Symbol table building, type checking, scope validation
- LLM Augmentor: Natural language enhancement (optional)
- Validator: DAG validation, reference checking, conflict detection
- Optimizer: Dead code elimination, constant folding, text compression
- Serializer: Multi-format output generation
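
The first stage is simple enough to sketch in full. The function below performs the three preprocessing steps named in the list (BOM removal, line normalization, tab expansion); the name `preprocess` and the tab width of 4 are assumptions, since the README does not state how the real `preprocessor.py` is written.

```python
def preprocess(text: str, tab_size: int = 4) -> str:
    """Normalize raw source text before lexing (illustrative sketch)."""
    # BOM removal: drop a leading U+FEFF if the file was saved with one.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Line normalization: convert Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Tab expansion: replace tabs with spaces so indentation is comparable.
    return "\n".join(line.expandtabs(tab_size) for line in text.split("\n"))


if __name__ == "__main__":
    raw = "\ufeff@task demo\r\n\tFirst step\r\n"
    print(repr(preprocess(raw)))
```

Expanding tabs before tokenization matters because the lexer tracks indentation; mixing tabs and spaces would otherwise give ambiguous nesting levels.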
version: "1.0"
tasks:
  - id: "data_processing"
    title: "Data Processing Task"
    body:
      - type: "text"
        content: "Process user data"
        line_number: 2

{
  "version": "1.0",
  "tasks": [
    {
      "id": "data_processing",
      "title": "Data Processing Task",
      "body": [
        {
          "type": "text",
          "content": "Process user data",
          "line_number": 2
        }
      ]
    }
  ]
}

syntax = "proto3";
package dsl;
message DSLWorkflow {
  string version = 1;
  map<string, string> metadata = 2;
  repeated Task tasks = 3;
}

src/dsl_compiler/
├── __init__.py # Main interface
├── config.py # Configuration management
├── compiler.py # Main compiler logic
├── preprocessor.py # Text preprocessing
├── lexer.py # Lexical analyzer
├── parser.py # Syntax parser
├── semantic_analyzer.py # Semantic analysis
├── llm_augmentor.py # LLM enhancement
├── validator.py # Validation engine
├── optimizer.py # Code optimization
├── serializer.py # Output serialization
├── cli.py # Command-line interface
├── models.py # Data models
├── exceptions.py # Exception classes
└── requirements.txt # Dependencies
# Install development dependencies
pip install pytest pytest-asyncio black flake8 mypy
# Run tests
python -m pytest tests/
# Run with coverage
python -m pytest --cov=src/dsl_compiler tests/

# Format code
black src/
# Lint code
flake8 src/
# Type checking
mypy src/

The compiler provides detailed error information:
from dsl_compiler import compile
from dsl_compiler.exceptions import CompilerError, ValidationError
try:
    result = compile("input.txt")
except ValidationError as e:
    print(f"Validation error: {e}")
    print(f"Rule: {e.rule}")
    print(f"Suggestions: {e.suggestions}")
except CompilerError as e:
    print(f"Compilation error: {e}")
    print(f"File: {e.source_file}")
    print(f"Line: {e.line}")

- Dead Code Elimination: Remove unreachable code blocks
- Constant Folding: Evaluate constant expressions at compile time
- Text Compression: Optimize text content while preserving meaning
- Structure Optimization: Flatten unnecessary nesting
- Duplicate Removal: Eliminate redundant definitions
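
One plausible reading of dead code elimination in this setting is removing tasks that no `@next` chain can reach from the entry task. The sketch below shows that idea on a plain dictionary of task names; it is an assumption about the technique, not a description of `optimizer.py`. The function names are hypothetical, and `END` is treated as a terminal marker based on the `@next END` example.

```python
from collections import deque


def reachable_tasks(entry: str, next_targets: dict[str, list[str]]) -> set[str]:
    """Breadth-first search over @next edges, starting at the entry task."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for target in next_targets.get(current, []):
            # "END" is assumed to be a terminal marker, not a task.
            if target != "END" and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def drop_unreachable(entry: str, next_targets: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the tasks reachable from the entry task."""
    keep = reachable_tasks(entry, next_targets)
    return {task: targets for task, targets in next_targets.items() if task in keep}


if __name__ == "__main__":
    graph = {
        "order_validation": ["END", "process_payment"],
        "process_payment": ["END"],
        "legacy_refund": ["process_payment"],  # nothing jumps here
    }
    print(sorted(drop_unreachable("order_validation", graph)))
```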
- LLM Call Failures
  - Check API key configuration
  - Verify network connectivity
  - Check LLM service status
- Parse Errors
  - Validate directive format
  - Check file encoding (should be UTF-8)
  - Review detailed error messages
- Performance Issues
  - Disable LLM with the --no-llm flag
  - Reduce file size
  - Adjust timeout settings
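
Two of the checks above (file encoding and file size) are easy to run yourself before invoking the compiler. The helper below is a hypothetical diagnostic, not part of `dsl_compiler`; its default size limit mirrors the `DSL_MAX_FILE_SIZE` value shown earlier.

```python
def check_source_bytes(data: bytes, max_size: int = 10 * 1024 * 1024) -> list[str]:
    """Return a list of human-readable problems found in raw file contents."""
    problems = []
    if len(data) > max_size:
        problems.append(f"file is {len(data)} bytes; limit is {max_size}")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte offset {exc.start}")
    return problems


if __name__ == "__main__":
    print(check_source_bytes(b"@task demo\n"))
    print(check_source_bytes(b"@task \xff"))
```

To check a file on disk, read it with `pathlib.Path("input.txt").read_bytes()` and pass the result to the function.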
# Enable debug output
python -m dsl_compiler.cli input.txt --debug
# Set environment variable
export DSL_DEBUG=true

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
MIT License
- Initial release
- Multi-format output support
- LLM integration
- Comprehensive validation
- CLI and library interfaces