Docker Installation and Deployment Guide

tl;dr for next time

git pull
docker compose down
docker compose up -d --build

This guide provides instructions for running Semem using Docker. The Docker deployment works exactly like a local installation - it uses the same configuration files and .env setup, with only service hostnames changed for containerization.

Prerequisites
Quick Start
Production Deployment
Development Setup
Service Configuration
Environment Variables
Advanced Deployments
Troubleshooting
Maintenance

Prerequisites

System Requirements

Docker: Version 20.10+ with Docker Compose V2
System Memory: Minimum 4GB RAM (8GB recommended)
Storage: 10GB free disk space (for models and data)
Platform: Linux, macOS, or Windows with WSL2
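
As a rough preflight sketch, the Docker version requirement above could be checked from a shell. The helper names `version_ge` and `check_docker_version` are illustrative and not part of Semem, and only the major and minor version components are compared.

```shell
# Return 0 if version $1 is >= version $2 (major.minor comparison only).
version_ge() {
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Compare the running Docker Engine against the 20.10 minimum.
# Returns 2 if the daemon cannot be queried.
check_docker_version() {
  installed=$(docker version --format '{{.Server.Version}}' 2>/dev/null) || return 2
  version_ge "$installed" 20.10
}
```

For example, `check_docker_version || echo "Docker 20.10+ required"` would flag an outdated or unreachable engine before a long build starts.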

Native Build Dependencies

Semem requires native compilation for scientific computing libraries. The Docker build process includes:

Build Dependencies:

Node.js 22+: JavaScript runtime
Python 3: Required for node-gyp compilation
C/C++ Compiler: GCC or Clang toolchain
Make: GNU Make for build scripts
CMake: Cross-platform build system
Scientific Libraries: OpenBLAS, BLAS, LAPACK for faiss-node
Git: Required for dependency resolution

Build Requirements: No pre-built images are currently published, so Semem must be built from source. The first build compiles the native dependencies and takes 15-20 minutes:

# Build is required (includes native dependency compilation)
docker compose build

# Start services after build
docker compose up -d

Build Optimization Strategies:

Use Docker BuildKit: Faster builds with better caching
Multi-stage builds: Smaller production images
Build caching: Reuse layers between builds
Pre-compiled base images: Use images with dependencies pre-installed

Docker Installation

Linux (Ubuntu/Debian):

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Install Docker Compose V2
sudo apt-get update && sudo apt-get install docker-compose-plugin

# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker

macOS:

# Install Docker Desktop
brew install --cask docker
# Or download from: https://www.docker.com/products/docker-desktop/

# Start Docker Desktop and ensure it's running

Windows:

# Install Docker Desktop with WSL2 backend
# Download from: https://www.docker.com/products/docker-desktop/
# Ensure WSL2 is enabled and configured

Verify Installation

# Check Docker version
docker --version
docker compose version

# Test Docker installation
docker run hello-world

Port Mapping Reference

This table shows how internal container ports are mapped to external host ports:

Service	Component	Internal Port	External Port	URL
Fuseki	SPARQL Database	3030	4050	http://localhost:4050
Semem	API Server	4100	4100	http://localhost:4100
Semem	MCP Server	4101	4101	http://localhost:4101
Semem	Workbench UI	4102	4102	http://localhost:4102
Nginx	Reverse Proxy	80/443	80/443	http://localhost:80

Important Notes:

Fuseki is mapped to port 4050 externally to avoid conflicts with local Fuseki installations on port 3030
All Semem services maintain the same port numbers both internally and externally
Use the External Port column when accessing services from your host machine
Internal ports are used for container-to-container communication
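
The mapping in the table can be captured in a small lookup helper, sketched below. The function name `semem_url` is hypothetical; the service hostnames `fuseki` and `semem` are assumed from the Compose service names used elsewhere in this guide.

```shell
# Print the URL for a component, either as seen from the host (default)
# or as seen from another container on the Compose network ("internal").
semem_url() {
  component=$1
  scope=${2:-host}
  case "$component" in
    fuseki)    service=fuseki; internal=3030; external=4050 ;;
    api)       service=semem;  internal=4100; external=4100 ;;
    mcp)       service=semem;  internal=4101; external=4101 ;;
    workbench) service=semem;  internal=4102; external=4102 ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
  if [ "$scope" = internal ]; then
    echo "http://$service:$internal"
  else
    echo "http://localhost:$external"
  fi
}
```

`semem_url fuseki` gives the host-side address on port 4050, while `semem_url fuseki internal` gives the address a container would use on port 3030.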

Quick Start

1. Get Semem

# Clone the repository
git clone https://github.com/danja/semem.git
cd semem

# Verify Docker files
ls -la Dockerfile docker-compose*.yml

2. Environment Setup

Copy your existing .env file or create one:

# If you have an existing local .env, use it as-is
# OR copy the example and add your API keys
cp .env.docker.example .env

# Edit with your API keys (same as local setup)
nano .env

3. Build and Start

Standard build process:

# Enable BuildKit for faster builds
export DOCKER_BUILDKIT=1
export COMPOSE_DOCKER_CLI_BUILD=1

# Build with caching (15-20 minutes for first build)
docker compose build

# Start services
docker compose up -d

# Monitor startup
docker compose logs -f

Subsequent runs (much faster):

# If nothing changed, just start existing containers
docker compose up -d

# Only rebuild if you've modified package.json or Dockerfile

4. Development vs Production Startup

Development (with live reload):

# Use development compose file
docker compose -f docker-compose.dev.yml up -d

# Monitor startup
docker compose -f docker-compose.dev.yml logs -f

Production (optimized):

# Use production settings
docker compose up -d

# Monitor startup
docker compose logs -f

5. Access Semem

Once all services are running:

# Check service health
curl http://localhost:4100/health  # API Server
curl http://localhost:4102/health  # Workbench UI
curl http://localhost:4101/health  # MCP Server

# Access web interface ("open" is macOS-only; use xdg-open on Linux)
open http://localhost:4102  # Workbench UI
open http://localhost:4050  # Fuseki SPARQL interface
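
Services may take a while to answer after `docker compose up -d` returns, so a single health check can fail spuriously. A minimal polling sketch follows; `wait_for_url` is an illustrative helper, and the attempt count and delay are arbitrary defaults rather than values taken from Semem.

```shell
# Poll a URL until curl gets a non-error HTTP response, or give up.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

A typical call would be `wait_for_url http://localhost:4100/health 60 2`, which waits up to roughly two minutes for the API server.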

That's it! The Docker deployment works exactly like the local version, using:

The same configuration priorities (Mistral → Claude → external Ollama if configured)
The same .env file format and variables
The same port assignments
Only difference: service hostnames changed for Docker networking

Note: Ollama is not included in the Docker containers - if you want to use Ollama, install it separately on your host or another server and configure OLLAMA_HOST in your .env file.

Production Deployment

1. Environment Configuration

# Use your existing .env or create from template
cp .env.docker.example .env

# Edit with your API keys (exactly like local setup)
nano .env

Your .env file should look like:

# Same format as local installation
SEMEM_API_KEY=your-api-key
NODE_ENV=production

# SPARQL Store credentials
SPARQL_USER=admin
SPARQL_PASSWORD=admin123          # Example only; choose a strong password

# LLM Provider API Keys (same priorities as config.json)
MISTRAL_API_KEY=your-mistral-key  # Priority 1
CLAUDE_API_KEY=your-claude-key    # Priority 2
OPENAI_API_KEY=your-openai-key
NOMIC_API_KEY=your-nomic-key      # Embeddings priority 1
OLLAMA_API_KEY=NO_KEY_REQUIRED    # Fallback

2. SSL Certificate Setup (Optional)

Self-signed certificates for testing:

mkdir -p nginx/ssl

# Generate certificate
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout nginx/ssl/semem.key \
  -out nginx/ssl/semem.crt \
  -subj "/C=US/ST=State/L=City/O=Organization/OU=OrgUnit/CN=localhost"

Let's Encrypt certificates:

# Install certbot
sudo apt-get install certbot

# Generate certificate (replace your-domain.com)
sudo certbot certonly --standalone -d your-domain.com

# Copy certificates
sudo cp /etc/letsencrypt/live/your-domain.com/fullchain.pem nginx/ssl/semem.crt
sudo cp /etc/letsencrypt/live/your-domain.com/privkey.pem nginx/ssl/semem.key
sudo chmod 644 nginx/ssl/semem.crt
sudo chmod 600 nginx/ssl/semem.key
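
Both the self-signed and the Let's Encrypt certificates expire, so it can help to check the remaining validity of the file nginx serves. The sketch below relies on the `-checkend` option of `openssl x509`; the helper name `cert_valid_for` and the 30-day default are assumptions made for illustration.

```shell
# Return 0 if the certificate remains valid for at least DAYS more days.
# Usage: cert_valid_for CERT_FILE [DAYS]
cert_valid_for() {
  cert=$1
  days=${2:-30}
  # -checkend takes seconds and exits non-zero if the cert expires within that window.
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}
```

For instance, `cert_valid_for nginx/ssl/semem.crt 14 || echo "certificate expires within 14 days"` could run from a scheduled job.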

3. Start Production Services

Basic Production Deployment:

# Start core services
docker compose up -d

# Check status
docker compose ps
docker compose logs -f semem

Production with SSL Proxy:

# Start with nginx reverse proxy (serves HTTPS using the certificates from step 2)
docker compose --profile proxy up -d

# Verify all services
docker compose --profile proxy ps

# Monitor all services
docker compose --profile proxy logs -f

4. Verify Production Deployment

# Service health checks
curl https://localhost/health          # Via nginx (SSL); add -k for a self-signed certificate
curl http://localhost:4100/health      # Direct API access
curl http://localhost:4050/$/ping      # Fuseki SPARQL

# Check logs for errors
docker compose logs semem | grep -i error
docker compose logs fuseki | grep -i error
docker compose logs nginx | grep -i error

Development Setup

1. Development Environment

The development setup includes:

Live code reloading via volume mounts
Debug port exposure (9229)
Simplified configuration
Optional external API integration

# Development with live reload
docker compose -f docker-compose.dev.yml up -d

# View development logs  
docker compose -f docker-compose.dev.yml logs -f semem-dev

# Access development container
docker compose -f docker-compose.dev.yml exec semem-dev bash

2. Development Tools

Run Tests:

# Run test suite in container
docker compose -f docker-compose.dev.yml exec semem-dev npm test

# Run specific test suites
docker compose -f docker-compose.dev.yml exec semem-dev npm run test:core
docker compose -f docker-compose.dev.yml exec semem-dev npm run test:sparql

Debug Node.js:

# Node.js debug port is exposed on 9229
# Configure your IDE to connect to localhost:9229
# Or use Chrome DevTools: chrome://inspect

Development Utilities:

# Start development tools container
docker compose -f docker-compose.dev.yml --profile tools up -d dev-tools

# Access tools container
docker compose -f docker-compose.dev.yml exec dev-tools bash

# Install additional development dependencies
docker compose -f docker-compose.dev.yml exec dev-tools npm install --save-dev <package>

3. Live Code Editing

Development containers mount source code as volumes:

# Changes to these directories are reflected immediately:
./src      -> /app/src      (Application source)
./config   -> /app/config   (Configuration)
./prompts  -> /app/prompts  (Prompt templates)

# Restart only required for dependency changes
docker compose -f docker-compose.dev.yml restart semem-dev

Service Configuration

Semem Application

Ports:

4100 - API Server (HTTP REST endpoints)
4101 - MCP Server (Model Context Protocol)
4102 - Workbench UI (Web interface)
9229 - Debug Port (development only)

Configuration:

{
  "servers": {
    "api": 4100,
    "workbench": 4102, 
    "mcp": 4101
  },
  "storage": {
    "type": "sparql",
    "options": {
      "query": "http://fuseki:3030/semem/query",
      "update": "http://fuseki:3030/semem/update"
    }
  }
}

Fuseki SPARQL Database

Access:

Web UI: http://localhost:4050
Query endpoint: http://localhost:4050/semem/query
Update endpoint: http://localhost:4050/semem/update

Initial Setup:

# Create dataset (automatic on first run)
curl -X POST \
  http://localhost:4050/$/datasets \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'dbName=semem&dbType=tdb2'

External Ollama Service (Optional)

If you want to use Ollama, install it separately and configure the connection:

Host Installation:

# Install Ollama on your host machine
curl -fsSL https://ollama.ai/install.sh | sh

# Pull required models
ollama pull qwen2:1.5b
ollama pull nomic-embed-text

Docker Configuration:

# In your .env file, point to host Ollama
OLLAMA_HOST=http://host.docker.internal:11434  # On Docker Desktop
# OR
OLLAMA_HOST=http://172.17.0.1:11434            # On Linux
# OR
OLLAMA_HOST=http://your-ollama-server:11434    # External server
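
The choice between the first two values depends on the platform. A small sketch of that decision is shown below; `suggest_ollama_host` is a hypothetical helper, and it assumes that a `Linux` kernel name means a native Docker Engine, which is not true for Docker Desktop running on WSL2.

```shell
# Suggest an OLLAMA_HOST value for a kernel name as printed by `uname -s`.
# With no argument, the current system's kernel name is used.
suggest_ollama_host() {
  case "${1:-$(uname -s)}" in
    # Native Docker Engine on Linux: default docker0 bridge gateway.
    Linux) echo "http://172.17.0.1:11434" ;;
    # Docker Desktop (macOS, Windows) provides this hostname to containers.
    *)     echo "http://host.docker.internal:11434" ;;
  esac
}
```

On Linux, `docker network inspect bridge --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'` prints the actual bridge gateway, in case it differs from 172.17.0.1.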

Nginx Reverse Proxy

Configuration:

# Production URLs
https://localhost/workbench/     # Workbench UI
https://localhost/api/           # API endpoints  
https://localhost/mcp/           # MCP endpoints
https://localhost/health         # Health check

SSL Configuration:

Certificate: nginx/ssl/semem.crt
Private key: nginx/ssl/semem.key
Protocols: TLS 1.2, 1.3

Environment Variables

Core Application Variables

Variable	Description	Default	Required
`NODE_ENV`	Runtime environment	`production`	Yes
`SEMEM_API_KEY`	API authentication key	-	Yes
`SPARQL_USER`	SPARQL database username	`admin`	Yes
`SPARQL_PASSWORD`	SPARQL database password	-	Yes

LLM Provider Variables

Variable	Description	Default	Required
`MISTRAL_API_KEY`	Mistral AI API key	-	No*
`CLAUDE_API_KEY`	Anthropic Claude API key	-	No*
`OPENAI_API_KEY`	OpenAI API key	-	No*
`GROQ_API_KEY`	Groq API key	-	No*
`NOMIC_API_KEY`	Nomic embedding API key	-	No*
`OLLAMA_HOST`	External Ollama URL	`http://host.docker.internal:11434`	No

*At least one LLM provider should be configured
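
A quick way to confirm that requirement before starting the stack is to scan the .env file for non-empty provider keys. The sketch below is only a preflight check with hypothetical helper names; it does not reproduce how Semem itself selects a provider, and it cannot tell a real key from a placeholder such as `your-mistral-key`.

```shell
# Print the provider variables that have a non-empty value in an env file.
# Usage: configured_providers [ENV_FILE]   (defaults to .env)
configured_providers() {
  env_file=${1:-.env}
  for key in MISTRAL_API_KEY CLAUDE_API_KEY OPENAI_API_KEY GROQ_API_KEY NOMIC_API_KEY; do
    # Match KEY=value at line start; skips commented-out and empty assignments.
    if grep -Eq "^${key}=[^[:space:]#]" "$env_file" 2>/dev/null; then
      echo "$key"
    fi
  done
}

# Return 0 if at least one provider key is set.
has_provider() {
  [ -n "$(configured_providers "$1")" ]
}
```

For example, `has_provider .env || echo "no LLM provider key configured"` could run before `docker compose up -d`.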

Service Configuration Variables

Variable	Description	Default	Required
`API_PORT`	API server port	`4100`	No
`WORKBENCH_PORT`	Workbench UI port	`4102`	No
`MCP_PORT`	MCP server port	`4101`	No
`FUSEKI_DATASET`	SPARQL dataset name	`semem`	No
`GRAPH_NAME`	RDF graph name	`http://hyperdata.it/content`	No

Performance Tuning Variables

Variable	Description	Default	Required
`CONCEPT_MAX_CONCEPTS`	Max concepts per extraction	`3`	No
`SEARCH_DEFAULT_LIMIT`	Default search result limit	`10`	No
`WIKIDATA_RATE_LIMIT`	Wikidata API rate limit	`200`	No
`WIKIPEDIA_RATE_LIMIT`	Wikipedia API rate limit	`100`	No

Advanced Deployments

Multi-Architecture Builds

Build for multiple architectures:

# Setup buildx
docker buildx create --use

# Build for AMD64 and ARM64
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t semem:latest \
  --push .

Build Performance Optimization

Avoiding Repeated Builds:

The Docker build process compiles native dependencies which takes 15-20 minutes. To minimize build time:

# 1. Build with aggressive caching (first time)
export DOCKER_BUILDKIT=1
export BUILDKIT_PROGRESS=plain
docker compose build --build-arg BUILDKIT_INLINE_CACHE=1

# 2. Subsequent runs use cached layers (much faster)
docker compose up -d

# 3. Use build cache mounts for advanced caching (requires BuildKit)
docker buildx build --cache-from type=local,src=/tmp/.buildx-cache \
                   --cache-to type=local,dest=/tmp/.buildx-cache \
                   .

Development Build Strategy:

For development, use volume mounts to avoid rebuilds:

# Development with live code reload
docker compose -f docker-compose.dev.yml up -d

# Code changes are reflected immediately without rebuilds
# Only restart container for package.json changes:
docker compose -f docker-compose.dev.yml restart semem-dev

When You Must Build:

Only build from source when:

Modifying native dependencies in package.json
Changing Dockerfile or build configuration
Pre-built images are unavailable for your architecture
Contributing code that requires testing build process
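
The first two conditions can be detected automatically after a pull. The sketch below reads a list of changed paths and reports whether any of them is a package manifest or a Dockerfile; the helper name `needs_rebuild` and the exact file patterns are assumptions, so extend the pattern if your build depends on other files.

```shell
# Read changed file paths on stdin (one per line).
# Return 0 if any path is package.json, package-lock.json, or a Dockerfile variant.
needs_rebuild() {
  grep -Eq '(^|/)(package\.json|package-lock\.json|Dockerfile[^/]*)$'
}
```

After `git pull`, something like `git diff --name-only ORIG_HEAD HEAD | needs_rebuild && docker compose build` would rebuild only when one of those files changed.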

Build Troubleshooting:

# Clean build (if cache is corrupted)
docker compose build --no-cache

# Build with verbose output
docker compose build --progress=plain

# Estimate build context size (upper bound; ignores .dockerignore exclusions)
du -sh .

Production Scaling

Horizontal Scaling:

# docker-compose.prod.yml
services:
  semem:
    deploy:
      replicas: 3
    ports:
      - "4100-4102:4100"

Load Balancing:

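# nginx.conf
# Compose V2 names replicas <project>-<service>-<n>; adjust if your project name is not "semem"
upstream semem_api {
    server semem-semem-1:4100;
    server semem-semem-2:4100;
    server semem-semem-3:4100;
}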

External Services

External SPARQL Endpoint:

# Environment variables
FUSEKI_QUERY_URL=https://your-sparql-server.com/query
FUSEKI_UPDATE_URL=https://your-sparql-server.com/update
SPARQL_USER=your-username
SPARQL_PASSWORD=your-password

External Ollama:

# Point to external Ollama instance
OLLAMA_HOST=http://your-ollama-server:11434

Resource Management

Memory Limits:

# docker-compose.yml
services:
  semem:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2.0'
        reservations:
          memory: 2G
          cpus: '1.0'

Storage Configuration:

volumes:
  fuseki_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/fuseki  # Custom storage location

Troubleshooting

Common Issues

1. Container Won't Start

# Check container logs
docker compose logs semem

# Common issues:
# - Port already in use: sudo lsof -i :4100
# - Permission denied: sudo chown -R 1001:1001 ./data
# - Out of memory: Increase Docker memory limits

2. Service Connection Failures

# Test service connectivity
docker compose exec semem curl http://fuseki:3030/$/ping

# Check network
docker network ls
docker network inspect semem_semem-network

3. SPARQL Database Issues

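# Reset Fuseki database (deletes all stored data)
docker compose stop fuseki
# Remove the stopped container first; a volume still attached to a container cannot be deleted
docker compose rm -f fuseki
docker volume rm semem_fuseki_data
docker compose up -d fuseki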

# Check Fuseki logs
docker compose logs fuseki

4. External Ollama Connection Issues

# Test external Ollama connectivity from inside the Semem container
docker compose exec semem curl http://host.docker.internal:11434/api/tags  # Docker Desktop
docker compose exec semem curl http://172.17.0.1:11434/api/tags           # Linux

# Check available space
docker system df
docker system prune  # Clean up unused data

5. Performance Issues

# Monitor resource usage
docker stats

# Check memory usage
docker compose exec semem free -h
docker compose exec semem ps aux --sort=-%mem | head

# Restart services
docker compose restart semem

Debug Mode

Enable Debug Logging:

# Development
DEBUG=semem:* docker compose -f docker-compose.dev.yml up -d

# Production
docker compose exec semem sh -c 'DEBUG=semem:* npm start'

Container Shell Access:

# Access running container
docker compose exec semem bash

# Access with root privileges
docker compose exec --user root semem bash

# Start temporary debug container
docker run -it --rm --network semem_semem-network semem:latest bash

Health Check Debugging

# Manual health checks
curl -f http://localhost:4100/health || echo "API unhealthy"
curl -f http://localhost:4102/health || echo "Workbench unhealthy" 
curl -f http://localhost:4050/$/ping || echo "Fuseki unhealthy"
curl -f http://localhost:11434/api/tags || echo "Ollama unhealthy"

# Check Docker health status
docker compose ps
docker inspect "$(docker compose ps -q semem)" | grep -A 5 Health

Maintenance

Updates and Upgrades

Update Semem:

# Pull latest code
git pull origin main

# Rebuild containers
docker compose build --no-cache
docker compose up -d

# Verify update
docker compose exec semem node --version

Update Base Images:

# Pull latest images
docker compose pull

# Rebuild with latest base images
docker compose build --pull
docker compose up -d

Backup and Restore

Database Backup:

# Backup Fuseki data
docker run --rm \
  -v semem_fuseki_data:/data \
  -v $(pwd):/backup \
  alpine tar czf /backup/fuseki-$(date +%Y%m%d).tar.gz /data

# Backup application data
docker run --rm \
  -v semem_semem_data:/data \
  -v $(pwd):/backup \
  alpine tar czf /backup/semem-data-$(date +%Y%m%d).tar.gz /data
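
Date-stamped archives accumulate quickly if the backup runs on a schedule. The sketch below keeps only the newest few archives for a given prefix; `prune_backups` is an illustrative helper, the default of 7 is arbitrary, and it assumes the `PREFIX-YYYYMMDD.tar.gz` naming used by the commands above.

```shell
# Keep the newest KEEP archives named PREFIX-YYYYMMDD.tar.gz in DIR; delete older ones.
# Usage: prune_backups DIR PREFIX [KEEP]
prune_backups() {
  dir=$1
  prefix=$2
  keep=${3:-7}
  # YYYYMMDD names sort chronologically, so a reverse sort lists newest first;
  # tail then selects everything after the first KEEP entries.
  ls -1 "$dir"/"$prefix"-*.tar.gz 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
}
```

For example, `prune_backups . fuseki 7` would leave the seven most recent Fuseki archives in the current directory.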

Configuration Backup:

# Export current configuration
docker compose exec semem cat /app/config/config.json > config-backup.json

# Export environment
cp .env .env.backup

Restore:

# Stop services
docker compose down

# Restore data
docker run --rm \
  -v semem_fuseki_data:/data \
  -v $(pwd):/backup \
  alpine tar xzf /backup/fuseki-20240101.tar.gz -C /

# Restart services
docker compose up -d

Log Management

Log Rotation:

# Configure Docker log rotation (requires root; overwrites any existing daemon.json)
sudo tee /etc/docker/daemon.json > /dev/null << EOF
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF

sudo systemctl restart docker

Log Analysis:

# Follow all logs
docker compose logs -f

# Filter logs
docker compose logs semem 2>&1 | grep ERROR

# Export logs
docker compose logs --no-color > semem-logs-$(date +%Y%m%d).txt

Monitoring

Resource Monitoring:

# Real-time stats
docker stats

# Point-in-time snapshots from inside the containers
docker compose exec semem top
docker compose exec fuseki free -h

Service Monitoring:

#!/bin/bash
# Health check script: each entry is "<port>/<path>"
services=("4100/health" "4102/health" "4050/$/ping" "11434/api/tags")
for service in "${services[@]}"; do
  # ${service%%/*} strips everything after the first slash, leaving the port
  if curl -f -s "http://localhost:$service" > /dev/null; then
    echo "✓ Service on port ${service%%/*} is healthy"
  else
    echo "✗ Service on port ${service%%/*} is unhealthy"
  fi
done

Security Updates

Regular Maintenance:

# Update all images monthly
docker compose pull
docker compose up -d --force-recreate

# Clean unused resources (-a removes every image not used by a container,
# so the next build may start from scratch)
docker system prune -a

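# Update SSL certificates (Let's Encrypt), copy the renewed files into
# nginx/ssl/ as in the SSL setup step, then restart the proxy container
sudo certbot renew
docker compose --profile proxy restart nginx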

Next Steps

For more help:
