A Go-based AI agent system that uses evolutionary algorithms and WebAssembly to solve computational problems. The agent learns from successful solutions and builds a knowledge base of reusable skills.
- Knowledge Base: In-memory registry of skills with persistence
- Vector Database: Qdrant integration for semantic search and retrieval
- WASM Interpreter: Sandboxed execution using wazero runtime
- Evolutionary Algorithm: Mutates and improves solutions over time
- LLM Integration: Mock LLM client for algorithm proposals
- LMStudio Support: Local LLM integration with Qwen3-4B model
- Hypothesis Persistence: Saves successful solutions for reuse
- Test Suite: Comprehensive test tasks (simple, complex, decomposable)
- Management Tools: CLI tools for Qdrant and task management
- Structured Logging: JSON logs with contextual information
- Metrics & Health: HTTP endpoints for monitoring
- Policy Guard: Security controls and resource limits
- Go 1.21 or later
- Make (optional, for build automation)
# Clone the repository
git clone https://github.com/snow-ghost/agent.git
cd agent
# Install dependencies
go mod tidy
# Build the worker
go build -o worker-bin ./cmd/worker

# Start the worker with default settings
./worker-bin
# Or with custom configuration
WORKER_PORT=9002 LLM_MODE=mock ./worker-bin
# Start with artifact-based knowledge base
ARTIFACTS_DIR=./artifacts ./worker-bin
# Start with Qdrant vector database
ARTIFACTS_DIR=./artifacts EMBEDDINGS_MODE=lmstudio VECTOR_BACKEND=qdrant ./worker-bin

The worker will start on port 9002 (or the port specified by WORKER_PORT) and provide:
- /solve - POST endpoint for submitting tasks
- /health - Health check endpoint
The system includes a comprehensive test suite with tasks of varying complexity:
# Run all test tasks
make test-all-tasks
# Run specific test categories
make test-simple # Basic operations (sort, max, reverse, sum, filter)
make test-complex # Complex algorithms (fibonacci, prime, binary search)
make test-decomposable # Multi-step tasks requiring orchestration
# Run individual tasks
make run-task-sort
make run-task-fibonacci
make run-task-pipeline
# Generate test report
make test-all-tasks-report

Test tasks are JSON files in testdata/tasks/ with the following structure:
{
"id": "task-identifier",
"domain": "algorithms.category",
"description": "Task description",
"input": {...},
"spec": {
"props": {
"type": "operation_type",
"input_schema": "JSON schema",
"output_schema": "JSON schema",
"complexity_hint": "simple|complex|decomposable"
}
},
"budget": {
"cpu_millis": 5000,
"mem_mb": 128,
"timeout": "30s"
}
}

The system uses Qdrant for vector storage and semantic search:
# Start Qdrant with Docker
make qdrant-up
# Reindex artifacts to Qdrant
make qdrant-reindex
# Check Qdrant status
make qdrant-stats

# Qdrant operations
make qdrant-up # Start Qdrant
make qdrant-down # Stop Qdrant
make qdrant-logs # View logs
make qdrant-stats # Get statistics
make qdrant-clear # Clear collection
make qdrant-reindex # Reindex artifacts
make qdrant-query QUERY="search term" # Search collection
make qdrant-verify # Verify consistency

# Task runner
go run ./cmd/task-runner -task ./testdata/tasks/simple/sort_numbers.json -verbose
# Knowledge base manager
go run ./cmd/kb-manager -action stats
go run ./cmd/kb-manager -action query -query "sorting algorithm"
go run ./cmd/kb-manager -action reindex -artifacts-dir ./artifacts
# Test suite runner
go run ./cmd/test-suite -dir ./testdata/tasks/simple -verbose

- /metrics - Prometheus-compatible metrics
The agent can be configured using environment variables:
| Variable | Default | Description |
|---|---|---|
| `WORKER_PORT` | `9002` | HTTP server port |
| `LLM_MODE` | `mock` | LLM mode (mock or disabled) |
| `LLM_ROUTER_URL` | `http://localhost:9000` | LLM router base URL |
| `LLM_MODEL` | `lmstudio:qwen/qwen3-4b-2507` | Default LLM model for design |
| `MAX_CODE_BYTES` | `65536` | Max AF-DSL source size (bytes) |
| `DSL_MAX_STEPS` | `100000` | AF-DSL runtime max execution steps |
| `DSL_MAX_DEPTH` | `128` | AF-DSL runtime max call depth |
| `PROP_K` | `64` | Property tests to generate |
| `POLICY_ALLOW_TOOLS` | `example.com,api.example.com` | Comma-separated list of allowed domains for HTTP tools |
| `SANDBOX_MEM_MB` | `4` | WASM sandbox memory limit in MB |
| `TASK_TIMEOUT` | `30s` | Default task timeout duration |
| `HYPOTHESES_DIR` | `./hypotheses` | Directory for saving successful hypotheses |
| `LOG_LEVEL` | `info` | Logging level (debug, info, warn, error) |
| Variable | Default | Description |
|---|---|---|
| `METRICS_MODE` | `prom` | Metrics mode: prom (Prometheus) or otel (OpenTelemetry) |
| `METRICS_PATH` | `/metrics` | Metrics endpoint path |
| `SERVICE_NAME` | `agent` | Service name for metrics |
| `METRICS_NAMESPACE` | `agent` | Metrics namespace prefix |
| `METRICS_COLLECT_RUNTIME` | `true` | Collect Go runtime metrics |
Basic Configuration:
export WORKER_PORT=9002
export LLM_MODE=mock
export POLICY_ALLOW_TOOLS="api.github.com,api.openai.com"
export SANDBOX_MEM_MB=8
export TASK_TIMEOUT=60s
export HYPOTHESES_DIR="/var/lib/agent/hypotheses"
export LOG_LEVEL=debug
# Metrics configuration
export METRICS_MODE=prom
export SERVICE_NAME=agent-worker
export METRICS_NAMESPACE=agent
export METRICS_COLLECT_RUNTIME=true
./worker-bin

OpenTelemetry Configuration:
export WORKER_PORT=9002
export LLM_MODE=mock
export LOG_LEVEL=info
# OpenTelemetry metrics
export METRICS_MODE=otel
export SERVICE_NAME=agent-worker
export METRICS_NAMESPACE=agent
export METRICS_COLLECT_RUNTIME=true
./worker-bin

Send a POST request to /solve with a JSON task:
Via Router (recommended):
curl -X POST http://localhost:9006/solve \
-H "Content-Type: application/json" \
-d '{
"id": "sort-task-1",
"domain": "algorithms.sorting",
"spec": {
"success_criteria": ["sorted_non_decreasing"],
"props": {"type": "sort"},
"metrics_weights": {"cases_passed": 1.0, "cases_total": 0.0}
},
"input": "{\"numbers\": [3,1,2]}",
"budget": {
"cpu_millis": 1000,
"timeout": "5s"
},
"flags": {
"requires_sandbox": true,
"max_complexity": 5
},
"created_at": "2024-01-01T00:00:00Z"
}'

Direct to Worker:
curl -X POST http://localhost:9004/solve \
-H "Content-Type: application/json" \
-d '{
"id": "sort-task-1",
"domain": "algorithms.sorting",
"spec": {
"success_criteria": ["sorted_non_decreasing"],
"props": {"type": "sort"},
"metrics_weights": {"cases_passed": 1.0, "cases_total": 0.0}
},
"input": "{\"numbers\": [3,1,2]}",
"budget": {
"cpu_millis": 1000,
"timeout": "5s"
},
"flags": {
"requires_sandbox": true,
"max_complexity": 5
},
"created_at": "2024-01-01T00:00:00Z"
}'

Task format:

{
"id": "unique-task-id",
"domain": "problem-domain",
"spec": {
"success_criteria": ["criterion1", "criterion2"],
"props": {"key": "value"},
"metrics_weights": {"metric": 1.0}
},
"input": "{\"data\": [1,2,3]}",
"budget": {
"cpu_millis": 1000,
"timeout": "30s"
},
"flags": {
"requires_sandbox": true,
"max_complexity": 5
},
"created_at": "2024-01-01T00:00:00Z"
}

Response format:

{
"Success": true,
"Score": 0.95,
"Output": "{\"result\": [1,2,3]}",
"Logs": "Task solved by KB skill algorithms/sort.v1",
"Metrics": {
"cases_passed": 5,
"cases_total": 5,
"execution_time_ms": 150
}
}

Router:
curl http://localhost:9007/healthz

Worker:
curl http://localhost:9005/healthz

Response:
{"status":"ok","service":"agent-worker"}

The system provides comprehensive metrics in both Prometheus and OpenTelemetry formats.
LLM Router:
curl http://localhost:9001/metrics

Router:
curl http://localhost:9007/metrics

Workers:
curl http://localhost:9005/metrics # Light worker
curl http://localhost:9003/metrics # Heavy worker

Configure metrics using environment variables:
| Variable | Default | Description |
|---|---|---|
| `METRICS_MODE` | `prom` | Metrics mode: prom (Prometheus) or otel (OpenTelemetry) |
| `METRICS_PATH` | `/metrics` | Metrics endpoint path |
| `SERVICE_NAME` | `agent` | Service name for metrics |
| `METRICS_NAMESPACE` | `agent` | Metrics namespace prefix |
| `METRICS_COLLECT_RUNTIME` | `true` | Collect Go runtime metrics |
To use OpenTelemetry instead of Prometheus:
export METRICS_MODE=otel
export SERVICE_NAME=agent-worker
export METRICS_NAMESPACE=agent
export METRICS_COLLECT_RUNTIME=true
./worker-bin

Task Processing:
- `worker_task_received_total{worker_type,domain}` - Total tasks received
- `worker_task_completed_total{worker_type,domain,status}` - Tasks completed by status
- `worker_task_duration_seconds{worker_type,domain}` - Task execution duration (histogram)
Solve Stages:
- `worker_solve_stage_seconds{stage}` - Time spent in each solve stage (histogram)
  - Stages: `kb`, `llm`, `evolve`, `interpret`, `tests`
Knowledge Base:
- `worker_kb_hits_total` - KB cache hits
- `worker_kb_misses_total` - KB cache misses
- `worker_kb_artifacts_loaded` - Number of loaded artifacts (gauge)
- `worker_kb_save_artifact_total` - Artifacts saved to KB
RAG/Vector Search:
- `worker_rag_hits_total` - RAG search hits
- `worker_rag_search_total{backend}` - RAG searches performed
- `worker_rag_search_duration_seconds{backend}` - RAG search duration (histogram)
- `worker_rag_candidates_found{backend}` - Candidates found in search (histogram)
Sandbox/Execution:
- `worker_sandbox_exec_total{result}` - Sandbox executions by result
- `worker_sandbox_exec_seconds` - Sandbox execution duration (histogram)
Policy & Security:
- `worker_policy_denied_total{reason}` - Policy denials by reason
Evolution & Testing:
- `worker_mutations_total{kind}` - Mutations performed by type
- `worker_tests_run_total{result}` - Tests run by result
- `worker_tests_duration_seconds` - Test execution duration (histogram)
Request Processing:
- `llm_requests_total{provider,model,status,cache}` - LLM requests by provider/model
- `llm_request_duration_seconds{provider,model}` - Request duration (histogram)
Token Usage:
- `llm_tokens_input_total{provider,model}` - Input tokens consumed
- `llm_tokens_output_total{provider,model}` - Output tokens generated
Cost Tracking:
- `llm_cost_total{provider,model,currency}` - Total cost by provider/model
Reliability:
- `llm_retries_total{provider,model}` - Retry attempts
- `llm_circuit_open_total{provider,model}` - Circuit breaker activations
Request Metrics:
- `http_requests_total{path,method,code}` - HTTP requests by path/method/status
- `http_request_duration_seconds{path,method}` - HTTP request duration (histogram)
✅ Good Labels (Low Cardinality):
- `worker_type`: light, heavy
- `domain`: algorithms.sorting, data.structures
- `provider`: openai, anthropic, mock
- `model`: gpt-4, claude-3, mock
- `status`: ok, error, timeout
- `stage`: kb, llm, evolve, interpret, tests
❌ Bad Labels (High Cardinality):
- `task_id`: Unique per task (avoid!)
- `user_id`: Unique per user (avoid!)
- `request_id`: Unique per request (avoid!)
- `timestamp`: Changes constantly (avoid!)
All metrics must include these mandatory labels:
- `service`: Service name (e.g., agent-worker, agent-router)
- `worker_type`: For worker metrics (light, heavy)
- `domain`: For task-related metrics
- `provider`: For LLM metrics
- `model`: For LLM metrics
Counters - For events that only increase:
- `worker_task_received_total`
- `llm_requests_total`
- `worker_kb_hits_total`
Histograms - For latencies and durations:
- `worker_task_duration_seconds`
- `llm_request_duration_seconds`
- `worker_solve_stage_seconds`
Gauges - For current state/size:
- `worker_kb_artifacts_loaded`
- `worker_memory_usage_bytes`
Task Success Rate:
rate(worker_task_completed_total{status="ok"}[5m]) / rate(worker_task_received_total[5m])
Average Task Duration:
histogram_quantile(0.5, rate(worker_task_duration_seconds_bucket[5m]))
LLM Request Rate:
rate(llm_requests_total[5m])
KB Hit Rate:
rate(worker_kb_hits_total[5m]) / (rate(worker_kb_hits_total[5m]) + rate(worker_kb_misses_total[5m]))
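The KB hit-rate expression above reduces to hits / (hits + misses) when both counters cover the same window; a quick plain-Go sanity check (the counter values here are made up for illustration):

```go
package main

import "fmt"

// hitRate computes the knowledge-base cache hit rate from two counters,
// mirroring the PromQL KB hit-rate expression above.
func hitRate(hits, misses float64) float64 {
	total := hits + misses
	if total == 0 {
		return 0 // no traffic yet; avoid dividing by zero
	}
	return hits / total
}

func main() {
	// 75 hits and 25 misses over the window → 0.75 hit rate.
	fmt.Printf("%.2f\n", hitRate(75, 25))
}
```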
# Build all components
make build
# Build specific component
go build -o worker-bin ./cmd/worker

# Run all tests
make test
# Run tests with verbose output
make test-verbose
# Run specific package tests
go test ./kb/memory

# Run all linters
make lint
# Format code
make fmt
# Run go vet
make vet

# Install development tools
make install-tools
# Run full CI pipeline
make ci
# Clean build artifacts
make clean

# Run heavy worker (LLM+WASM+KB)
make run-heavy
# Run light worker (KB only)
make run-light
# Run router (capability-based routing)
make run-router
# Reindex artifacts for vector search
make reindex ARTIFACTS_DIR=./artifacts

The system supports a unified artifact-based knowledge base that replaces Go skills with standardized artifacts containing WASM code and metadata.
Each artifact is stored in a directory with the following structure:
artifacts/
├── artifact-id@version/
│ ├── manifest.json # Artifact metadata
│ └── code.wasm # WASM bytecode (for WASM artifacts)
{
"id": "alg.sort.v1",
"version": "1.0.0",
"domain": "algorithms.sorting",
"description": "Stable integer sort",
"tags": ["sort", "stable"],
"lang": "wasm",
"entry": "solve",
"code_path": "code.wasm",
"sha256": "abc123...",
"embedding_model": "text-embedding-3-small",
"embedding": [0.1, 0.2, ...],
"tests": [
{
"name": "sort_test_1",
"input": "[3,1,2]",
"oracle": "[1,2,3]",
"checks": ["sorted_non_decreasing"],
"weight": 1.0
}
],
"created_at": "2024-01-01T00:00:00Z"
}

WASM artifacts:
- Language: "wasm"
- Entry Point: "solve" function
- Code File: code.wasm (WebAssembly bytecode)
- SHA256: Verified integrity checksum
- Execution: Sandboxed WASM runtime

Go skill artifacts:
- Language: "go-skill"
- Entry Point: Package function name (e.g., "algorithms.Sort")
- Code File: None (compiled into binary)
- SHA256: Not applicable
- Execution: Direct Go function call
- Unified Storage: Both WASM and Go skills stored as artifacts
- SHA256 Verification: Automatic integrity checking for WASM artifacts
- Tag-based Search: Find artifacts by domain, tags, or keywords
- Vector Search (RAG): Semantic search using embeddings for better artifact discovery
- Automatic Migration: Existing Go skills can be converted to artifacts
- Hypothesis Persistence: LLM-generated solutions saved as artifacts
The system automatically uses the artifact-based knowledge base when ARTIFACTS_DIR is configured. Workers will:
- Load all artifacts on startup
- Convert them to skills for task solving
- Save successful hypotheses as new artifacts
- Support both WASM and Go skill artifacts during migration
The system includes advanced vector search capabilities for semantic artifact discovery:
- Mock TF-IDF: Local TF-IDF based embedder for testing
- OpenAI: Production-ready embeddings using OpenAI's API
- Memory: In-memory cosine similarity search
- Qdrant: Production vector database (placeholder implementation)
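The in-memory backend's cosine-similarity ranking can be sketched in a few lines of Go (illustrative only; the real store's API will differ):

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero inputs.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0 // degenerate zero vector
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	query := []float64{1, 0}
	fmt.Printf("%.2f\n", cosine(query, []float64{1, 0})) // identical direction → 1.00
	fmt.Printf("%.2f\n", cosine(query, []float64{0, 1})) // orthogonal → 0.00
}
```

A memory store would compute this score between the query embedding and every stored artifact embedding, then return the top-k matches.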
# Index artifacts using mock embedder
./kb-indexer -artifacts-dir ./artifacts -embedder mock -vector-store memory
# Index with OpenAI embeddings
export OPENAI_API_KEY=your_key
./kb-indexer -artifacts-dir ./artifacts -embedder openai -vector-store memory
# Show index statistics
./kb-indexer -stats
# Using Makefile
make reindex ARTIFACTS_DIR=./artifacts

Local Development (Mock Embeddings):
export ARTIFACTS_DIR=./artifacts
export EMBEDDINGS_MODE=mock
export VECTOR_BACKEND=memory
export INDEX_ON_START=true
./worker-bin

Production (OpenAI + Qdrant):
export ARTIFACTS_DIR=/var/lib/agent/artifacts
export EMBEDDINGS_MODE=openai
export EMBEDDINGS_MODEL=text-embedding-3-large
export VECTOR_BACKEND=qdrant
export QDRANT_URL=qdrant.example.com:6333
export QDRANT_API_KEY=your_api_key
export OPENAI_API_KEY=your_openai_key
./worker-bin

| Variable | Default | Description |
|---|---|---|
| `EMBEDDINGS_MODEL` | `text-embedding-3-small` | OpenAI embedding model |
| `EMBEDDINGS_DIMENSION` | `1536` | Vector dimension |
| `QDRANT_URL` | `localhost:6333` | Qdrant server URL |
| `QDRANT_API_KEY` | - | Qdrant API key |
| `QDRANT_COLLECTION` | `artifacts` | Qdrant collection name |
1. Start with artifacts directory:

   export ARTIFACTS_DIR=./artifacts
   ./worker-bin

2. Submit a task that will be solved by artifacts:

   curl -X POST http://localhost:9006/solve \
     -H "Content-Type: application/json" \
     -d '{
       "id": "test-sort",
       "domain": "algorithms.sorting",
       "spec": {
         "success_criteria": ["sorted_non_decreasing"],
         "props": {"type": "sort"},
         "metrics_weights": {"cases_passed": 1.0}
       },
       "input": "{\"numbers\": [3,1,2]}",
       "budget": {"cpu_millis": 1000, "timeout": "5s"},
       "flags": {"requires_sandbox": true, "max_complexity": 5},
       "created_at": "2024-01-01T00:00:00Z"
     }'

3. Check that artifacts are created:

   ls -la ./artifacts/  # Should show artifact directories with manifest.json and code.wasm

4. Verify hypothesis persistence:
   - First run: Task solved by LLM, hypothesis saved as artifact
   - Second run: Task solved by artifact from knowledge base
1. Build and start all services:

   make docker-up

2. Access the services:
   - Router: http://localhost:9006
   - Light Worker: http://localhost:9004
   - Heavy Worker: http://localhost:9002

3. With Nginx load balancer:

   make docker-up-nginx

   - Access via: http://localhost (port 80)
┌─────────────────┐ ┌─────────────────┐
│ Nginx │ │ Router │
│ (Port 80) │───▶│ (Port 9006) │
│ Load Balancer │ │ Capability- │
│ │ │ Based Router │
└─────────────────┘ └─────────────────┘
│
┌───────────┴───────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Light Worker │ │ Heavy Worker │
│ (Port 9004) │ │ (Port 9002) │
│ KB Only │ │ LLM+WASM+KB │
│ Capabilities: │ │ Capabilities: │
│ KB │ │ KB+WASM+LLM │
└─────────────────┘ └─────────────────┘
The system supports two types of workers with different capabilities:
- Capabilities: KB only
- Use Cases: Simple tasks that can be solved with existing knowledge
- Performance: Fast, low resource usage
- Endpoints:
  `/solve`, `/health`, `/metrics`, `/caps`, `/ready`
- Capabilities: KB + WASM + LLM
- Use Cases: Complex tasks requiring code generation and execution
- Performance: Slower, higher resource usage
- Endpoints:
  `/solve`, `/health`, `/metrics`, `/caps`, `/ready`
Tasks are automatically routed based on their requirements:
- Requires Sandbox → Heavy Worker (needs WASM)
- High Complexity (> threshold) → Heavy Worker (needs LLM)
- Default → Light Worker (KB only)
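Under those rules, the routing decision reduces to a small predicate. A hedged sketch (the field and function names are assumptions based on the task schema shown earlier, not the router's real code):

```go
package main

import "fmt"

// TaskFlags mirrors the "flags" object in the task schema.
type TaskFlags struct {
	RequiresSandbox bool `json:"requires_sandbox"`
	MaxComplexity   int  `json:"max_complexity"`
}

// routeWorker picks a worker type: sandboxed or high-complexity tasks go to
// the heavy worker; everything else defaults to the light (KB-only) worker.
func routeWorker(f TaskFlags, complexityThreshold int) string {
	if f.RequiresSandbox {
		return "heavy" // needs WASM sandbox
	}
	if f.MaxComplexity > complexityThreshold {
		return "heavy" // needs LLM
	}
	return "light"
}

func main() {
	threshold := 5 // cf. COMPLEXITY_THRESHOLD
	fmt.Println(routeWorker(TaskFlags{RequiresSandbox: true, MaxComplexity: 2}, threshold)) // heavy
	fmt.Println(routeWorker(TaskFlags{MaxComplexity: 8}, threshold))                        // heavy
	fmt.Println(routeWorker(TaskFlags{MaxComplexity: 3}, threshold))                        // light
}
```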
# Build Docker image
make docker-build
# Start all services
make docker-up
# Stop all services
make docker-down
# View logs
make docker-logs
# Start with nginx
make docker-up-nginx

| Variable | Default | Description |
|---|---|---|
| `WORKER_TYPE` | `heavy` | Worker type: light or heavy |
| `WORKER_PORT` | `9002` | Port for worker service |
| `LOG_LEVEL` | `info` | Logging level: debug, info, warn, error |
| `TASK_TIMEOUT` | `30s` | Default task timeout |
| `COMPLEXITY_THRESHOLD` | `5` | Complexity threshold for heavy worker routing |
| Variable | Default | Description |
|---|---|---|
| `ARTIFACTS_DIR` | `./artifacts` | Directory for artifact-based knowledge base |
| `HYPOTHESES_DIR` | `./hypotheses` | Directory for saved hypotheses (legacy) |
| `INDEX_ON_START` | `false` | Whether to index artifacts on worker startup |
| Variable | Default | Description |
|---|---|---|
| `LLM_MODE` | `mock` | LLM mode: mock or real |
| `SANDBOX_MEM_MB` | `4` | Memory limit for WASM sandbox |
| Variable | Default | Description |
|---|---|---|
| `EMBEDDINGS_MODE` | `mock` | Embeddings mode: mock or openai |
| `EMBEDDINGS_MODEL` | `text-embedding-3-small` | OpenAI embedding model |
| `VECTOR_BACKEND` | `memory` | Vector database backend: memory or qdrant |
| Variable | Default | Description |
|---|---|---|
| `QDRANT_URL` | `localhost:6333` | Qdrant server URL |
| `QDRANT_API_KEY` | - | Qdrant API key (optional) |
| Variable | Default | Description |
|---|---|---|
| `METRICS_MODE` | `prom` | Metrics mode: prom or otel |
| `METRICS_PATH` | `/metrics` | Metrics endpoint path |
| `SERVICE_NAME` | `agent` | Service name for metrics |
| `METRICS_NAMESPACE` | `agent` | Metrics namespace prefix |
| `METRICS_COLLECT_RUNTIME` | `true` | Collect Go runtime metrics |
All services include comprehensive health check endpoints:
Router:
- GET /health - Basic health status
- GET /caps - Worker capabilities and routing rules
- GET /ready - Readiness status (checks worker availability)
Workers:
- GET /health - Basic health status
- GET /metrics - Prometheus-compatible metrics
- GET /caps - Worker capabilities
- GET /ready - Readiness status
# Check router capabilities
curl http://localhost:9006/caps
# Check if all workers are ready
curl http://localhost:9006/ready
# Check specific worker capabilities
curl http://localhost:9004/caps # Light worker
curl http://localhost:9002/caps # Heavy worker

- Core: Domain types, interfaces, and business logic
- KB/Memory: In-memory knowledge base with persistence
- Interp/WASM: WebAssembly interpreter using wazero
- LLM/Mock: Mock LLM client for algorithm proposals
- TestKit: Test runner and evaluation framework
- Worker: Main solver with evolutionary algorithm
- Policy: Security controls and resource limits
- Task Submission: HTTP request → Ingestor → Solver
- Knowledge Base Check: Search for existing skills
- LLM Proposal: Generate algorithm if no KB match
- Evolution: Mutate and test hypotheses
- Execution: Run WASM in sandboxed environment
- Persistence: Save successful solutions to KB
- Response: Return result with metrics
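The flow above can be sketched in compressed Go; the interfaces and names here are illustrative, not the project's real API, and the LLM/evolution/execution stages are collapsed into a single call:

```go
package main

import "fmt"

// solve sketches the task flow: knowledge-base lookup first, then an LLM
// proposal (standing in for proposal + evolution + sandboxed execution),
// and persistence of successful solutions back into the KB.
func solve(task string, kb map[string]string, propose func(string) string) (string, bool) {
	// Knowledge Base Check: reuse an existing skill if one matches.
	if skill, ok := kb[task]; ok {
		return skill, true
	}
	// LLM Proposal + Evolution + Execution, collapsed into one call.
	hypothesis := propose(task)
	if hypothesis == "" {
		return "", false
	}
	// Persistence: save the successful solution for reuse.
	kb[task] = hypothesis
	return hypothesis, true
}

func main() {
	kb := map[string]string{}
	mockLLM := func(task string) string { return "skill-for-" + task }
	s1, _ := solve("sort", kb, mockLLM) // first run: solved via LLM, persisted
	s2, _ := solve("sort", kb, mockLLM) // second run: served straight from KB
	fmt.Println(s1 == s2, len(kb))
}
```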
- Sandboxed Execution: WASM runs in isolated environment
- Resource Limits: Memory and CPU constraints
- Policy Guard: Tool allowlisting and timeout controls
- Input Validation: JSON schema validation
Worker won't start:
- Check if port is available
- Verify Go version (1.21+)
- Run go mod tidy to update dependencies
Tasks failing:
- Check input format matches expected schema
- Verify domain matches available skills
- Check logs for detailed error messages
Memory issues:
- Increase SANDBOX_MEM_MB for complex tasks
- Monitor metrics for memory usage patterns
Enable debug logging:
export LOG_LEVEL=debug
./worker-bin

Check worker logs for detailed execution information.
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linters
- Submit a pull request
- Architecture Documentation - System architecture and design
- API Usage Guide - Comprehensive API usage examples
- OpenAPI Specifications - API specifications for all services
For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the logs for error details
- Consult the API Usage Guide for detailed examples