Date: 2026-05-28
Status: Planned (Phase 4-5, Month 19+)
Related: VirtOS Issue #128 - AI Architecture Split
platform-java provides application-level AI capabilities for deploying, managing, and serving AI/ML workloads.
This complements VirtOS's infrastructure-level AI (VM placement, auto-scaling, GPU management) with application orchestration, model serving, and MLOps workflows.
- AI VM placement (optimal host selection)
- Predictive auto-scaling (resource management)
- GPU passthrough/vGPU (hardware resources)
- Infrastructure security (VM anomaly detection)
- Cost optimization (waste detection)
- Self-healing VMs (recovery)
See: VirtOS AI_ARCHITECTURE_SPLIT.md
- MLOps platform (workflow orchestration)
- Model marketplace (curated catalog)
- LLM serving (inference services)
- RAG infrastructure (document Q&A)
- Experiment tracking (MLflow integration)
- AI governance (compliance, bias testing)
- Prompt management (versioning, A/B testing)
- Multi-modal AI (vision, speech models)
This document describes platform-java's application AI capabilities.
Purpose: End-to-end ML workflow orchestration
Features:
- Experiment tracking (MLflow integration)
- Model training orchestration
- Distributed training support
- Model registry and versioning
- A/B testing infrastructure
- Deployment automation
- Performance monitoring
Example:
# Create ML project
platform-java ml project create fraud-detection
# Train model
platform-java ml train \
--model fraud-detection \
--data s3://training-data/ \
--framework pytorch \
--gpus 4
# Track experiments
platform-java ml experiments list
# Deploy best model
platform-java ml deploy fraud-detection-v2 --production
Timeline: Phase 4 (Month 19-24), 20 weeks effort
Purpose: Curated catalog of pre-trained AI models
Features:
- Model catalog (LLMs, vision, speech, etc.)
- One-click deployment
- Version management
- Resource optimization (quantization, pruning)
- Multi-modal model support
- Model performance metrics
Example:
# Browse models
platform-java marketplace list
# Deploy LLM
platform-java marketplace deploy llama-3.1-70b \
--optimization fp16 \
--gpus 2 \
--auto-scale
# Deploy vision model
platform-java marketplace deploy yolov8 \
--task object-detection \
--optimization tensorrt
Available Models:
- LLMs: Llama 3.1, Mistral, Gemma
- Vision: YOLO, Stable Diffusion, CLIP
- Speech: Whisper, TTS models
- Embeddings: all-MiniLM, BGE, E5
Timeline: Phase 4 (Month 19-24), 14 weeks effort
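The fp16 and GPU-count options in the deployment example above imply a rough sizing calculation. The following is a minimal sketch of that arithmetic, assuming a nominal 70 billion parameters for llama-3.1-70b and counting weights only; the function names are hypothetical and not part of platform-java. KV cache, activations, and runtime overhead need additional memory on top of these figures.

```python
# Bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def per_gpu_weight_memory_gb(num_params: float, precision: str, num_gpus: int) -> float:
    """Return weight memory per GPU, assuming an even split across GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return weight_memory_gb(num_params, precision) / num_gpus


if __name__ == "__main__":
    nominal_params = 70e9  # assumption: nominal size of a "70b" model
    print(weight_memory_gb(nominal_params, "fp16"))             # 140.0
    print(per_gpu_weight_memory_gb(nominal_params, "fp16", 2))  # 70.0
```

Under these assumptions, fp16 weights alone take about 70 GB per GPU when split across two GPUs, which is why quantization is listed as a resource optimization.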
Purpose: Retrieval Augmented Generation platform
Features:
- Vector database integration (Qdrant, Weaviate, Milvus)
- Document ingestion pipeline
- Embedding generation
- LLM integration
- Query API
- Web UI
Example:
# Create RAG project
platform-java rag project create internal-docs
# Ingest documents
platform-java rag ingest \
--source /path/to/docs \
--embedding all-MiniLM-L6-v2 \
--chunk-size 512
# Query
platform-java rag query "How do I configure high availability?"
# Returns:
# Answer: To configure HA, use virtos-ha command...
# Sources:
# - docs/HA_SETUP.md (lines 45-67)
# - docs/CLUSTERING.md (lines 120-145)
# Confidence: 94%
Timeline: Phase 5 (Year 3), 18 weeks effort
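The ingestion step above splits documents into chunks before embedding. The sketch below shows one way fixed-size chunking could work. It assumes `--chunk-size` counts whitespace-separated words (the source does not say whether the unit is tokens, words, or characters), and the `overlap` parameter and function name are hypothetical additions for illustration.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(1200))
    print(len(chunk_words(document, chunk_size=512)))  # 3
```

Each chunk would then be passed to the embedding model and stored in the vector database along with its source location, which is what allows answers to cite file and line ranges.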
Purpose: Production-grade LLM inference
Features:
- vLLM integration (fast inference)
- Text Generation Inference (TGI)
- OpenAI API compatibility
- Request batching
- Token streaming
- Rate limiting
- Cost tracking
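Request batching, listed above, groups pending requests so the GPU processes several at once. The following is a minimal sketch of static batching with a size cap. The function name is hypothetical, and this is not how vLLM schedules work internally (vLLM uses continuous batching); it only illustrates what a maximum batch size bounds.

```python
from collections import deque


def drain_batches(pending: deque, max_batch_size: int = 32):
    """Yield lists of at most max_batch_size requests, in arrival order."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    while pending:
        batch_len = min(max_batch_size, len(pending))
        yield [pending.popleft() for _ in range(batch_len)]


if __name__ == "__main__":
    queue = deque(f"request-{i}" for i in range(70))
    for batch in drain_batches(queue, max_batch_size=32):
        print(len(batch))  # 32, 32, 6
```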
Example:
# Deploy LLM inference service
platform-java llm serve llama-3.1-70b \
--gpus 2 \
--max-batch-size 32 \
--api openai-compatible
# Use with OpenAI SDK
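export OPENAI_BASE_URL=http://platform-java.local/v1 # read by openai-python 1.x
export OPENAI_API_BASE=http://platform-java.local/v1 # read by openai-python 0.x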
python my_openai_app.py # Works unchanged
Purpose: Track ML experiments and compare results
Features:
- MLflow integration
- Experiment versioning
- Parameter tracking
- Metric comparison
- Artifact storage
- Model promotion
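Metric comparison and model promotion, listed above, come down to ranking runs by a chosen metric. The sketch below uses the two runs and metric values from this section's example output; the `Run` class and `best_run` function are hypothetical and do not represent the MLflow or platform-java API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    accuracy: float  # percent
    f1: float


def best_run(runs: list[Run], metric: str = "f1") -> Run:
    """Return the run with the highest value for the given metric."""
    if not runs:
        raise ValueError("no runs to compare")
    return max(runs, key=lambda run: getattr(run, metric))


RUNS = [
    Run("fraud-detection-v1", accuracy=94.2, f1=0.91),
    Run("fraud-detection-v2", accuracy=96.1, f1=0.94),
]

if __name__ == "__main__":
    print(best_run(RUNS, metric="f1").name)  # fraud-detection-v2
```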
Example:
# List experiments
platform-java ml experiments list
# Compare runs
platform-java ml experiments compare \
fraud-detection-v1 \
fraud-detection-v2
# Output:
# v1: Accuracy 94.2%, F1 0.91
# v2: Accuracy 96.1%, F1 0.94 ⭐ Best
# Promote to production
platform-java ml promote fraud-detection-v2 --to production
Purpose: Ensure responsible AI usage
Features:
- Model versioning and lineage
- Bias testing and detection
- Explainability (SHAP, LIME)
- Audit logging
- Privacy controls (differential privacy)
- Compliance reporting
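Bias testing, listed above, can take many forms. One common check is the demographic parity difference: the gap between the highest and lowest positive-prediction rates across groups. The sketch below is illustrative only. The source does not say which metric platform-java would use, the function names are hypothetical, and the sample predictions are made up for the example.

```python
def positive_rate(predictions: list[int]) -> float:
    """Fraction of predictions equal to 1 (for example, "flagged as fraud")."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return sum(predictions) / len(predictions)


def demographic_parity_difference(preds_by_group: dict[str, list[int]]) -> float:
    """Gap between the highest and lowest positive rates across groups."""
    rates = [positive_rate(preds) for preds in preds_by_group.values()]
    return max(rates) - min(rates)


if __name__ == "__main__":
    # Made-up binary predictions for two hypothetical groups.
    sample = {
        "group_a": [1, 0, 0, 0],  # positive rate 0.25
        "group_b": [1, 1, 0, 0],  # positive rate 0.50
    }
    print(demographic_parity_difference(sample))  # 0.25
```

A value of 0 means all groups receive positive predictions at the same rate; a governance check could flag any model whose gap exceeds a configured threshold.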
Example:
# Run governance check
platform-java ai governance-check
# Output:
# ✅ Model Versioning: All models tracked
# ✅ Data Lineage: Training data documented
# ⚠️ Bias Testing: fraud-detection not tested
# ⚠️ Explainability: No SHAP/LIME integration
# ✅ Privacy: Differential privacy enabled
#
# Compliance Status: 70% (needs improvement)
Purpose: Version control and testing for prompts
Features:
- Prompt versioning
- A/B testing
- Performance tracking
- Template library
- Multi-language support
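A/B testing of prompts, listed above, needs a stable way to route each user to one variant. The following is a minimal sketch of deterministic assignment by hashing, assuming users are identified by a string ID; the function name and the `experiment` parameter are hypothetical and not part of the platform-java CLI.

```python
import hashlib


def assign_variant(user_id: str, variants: list[str], experiment: str = "customer-support") -> str:
    """Map a user to one variant, always the same one for a given experiment."""
    if not variants:
        raise ValueError("variants must not be empty")
    # Hash the experiment name together with the user ID so that different
    # experiments split the same users independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    prompt_variants = ["customer-support-v1", "customer-support-v2"]
    print(assign_variant("user-42", prompt_variants))
```

Hashing rather than random selection keeps a returning user on the same prompt, so satisfaction scores for each variant are not mixed within one conversation history.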
Example:
# Create prompt
platform-java ai prompts create customer-support \
--template "You are a helpful customer support agent..." \
--version 1.0
# A/B test prompts
platform-java ai prompts test \
--variants customer-support-v1,customer-support-v2 \
--metric satisfaction-score
# Results:
# v1: 4.2/5 satisfaction
# v2: 4.7/5 satisfaction ⭐ Winner
# Promote winner
platform-java ai prompts promote customer-support-v2
Purpose: Deploy vision, speech, and text models
Features:
- Vision models (object detection, image generation)
- Speech models (STT, TTS)
- Multi-modal models (CLIP, vision-language)
- Unified API
Example:
# Deploy vision model
platform-java marketplace deploy yolov8 \
--task object-detection
# Deploy speech model
platform-java marketplace deploy whisper-large \
--task transcription \
--languages en,es,fr
# Deploy text-to-speech
platform-java marketplace deploy tts-1 \
--voices alloy,echo,fable
platform-java requests infrastructure from VirtOS and deploys applications on top.
Flow:
- User → platform-java:
  platform-java marketplace deploy llama-3.1-70b
- platform-java → VirtOS (REST API):
  POST /api/v1/vms { "name": "llm-inference-1", "gpu": "nvidia-a100", "cpu": 8, "ram": 32768, "ai_placement": true }
- VirtOS (infrastructure AI):
  - Analyzes available hosts
  - Selects optimal placement
  - Creates VM with GPU
  - Returns VM details
- platform-java (application AI):
  - Deploys Llama model on VM
  - Configures inference server (vLLM)
  - Sets up monitoring
  - Returns API endpoint
Result: The user gets a working LLM endpoint. VirtOS handles the infrastructure and platform-java handles the application.
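The second step of the flow, where platform-java asks VirtOS for a GPU VM, can be sketched as an HTTP request builder. This is an illustration under stated assumptions: the API is planned rather than implemented, the base URL `http://virtos.local/api/v1` is taken from the configuration example in this document, the `ram` value is assumed to be in megabytes, and the function names are hypothetical. The sketch is in Python for brevity and builds the request without sending it.

```python
import json
import urllib.request


def build_vm_request(api_base: str, name: str, gpu: str, cpu: int, ram_mb: int) -> urllib.request.Request:
    """Build the POST /vms request that asks VirtOS for an AI-placed GPU VM."""
    payload = {
        "name": name,
        "gpu": gpu,
        "cpu": cpu,
        "ram": ram_mb,  # assumption: megabytes
        "ai_placement": True,
    }
    return urllib.request.Request(
        url=f"{api_base.rstrip('/')}/vms",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vm_address(response_body: str) -> str:
    """Extract the VM IP address from a VirtOS response body."""
    return json.loads(response_body)["ip"]


if __name__ == "__main__":
    request = build_vm_request(
        "http://virtos.local/api/v1", "llm-inference-1", "nvidia-a100", cpu=8, ram_mb=32768
    )
    print(request.get_method(), request.full_url)
    # Sending it would be: urllib.request.urlopen(request)
```

Once VirtOS returns the VM details, platform-java would use the returned address to install the model and inference server, which is the application-level half of the split.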
Request GPU VM:
POST /api/v1/vms
{
"name": "llm-inference-1",
"gpu": "nvidia-a100",
"cpu": 8,
"ram": 32768,
"ai_placement": true,
"optimization": "ml-inference"
}
VirtOS Response:
{
"vm_id": "vm-12345",
"host": "virtos-node-3",
"ip": "192.168.1.50",
"gpu_device": "0000:81:00.0",
"placement_confidence": 0.94
}
Focus: MLOps and model deployment
- MLOps Platform Basics (Issue #303)
  - Experiment tracking (MLflow)
  - Model registry
  - Deployment automation
  - Effort: 20 weeks
- Model Marketplace (Issue #304)
  - Curated model catalog
  - One-click deployment
  - LLM serving (vLLM, TGI)
  - Effort: 14 weeks
Deliverable: Users can deploy and manage AI/ML models
Focus: RAG, governance, multi-modal
- RAG Infrastructure (Issue #305)
  - Vector database integration
  - Document Q&A platform
  - Effort: 18 weeks
- AI Governance
  - Bias testing
  - Explainability
  - Compliance reporting
  - Effort: 12 weeks
- Multi-Modal Support
  - Vision models
  - Speech models
  - Unified API
  - Effort: 10 weeks
Deliverable: Enterprise-grade AI platform
- Java 21+ - Platform runtime
- Spring Boot - Application framework
- MLflow - Experiment tracking
- vLLM - LLM inference
- Qdrant/Weaviate - Vector databases
- PyTorch/TensorFlow - Model frameworks
- VirtOS REST API - Infrastructure requests
- Docker/Podman - Container runtime
- Kubernetes (optional) - Orchestration
- OpenAI API - Compatibility layer
- ✅ Easy deployment - One command to deploy models
- ✅ Curated catalog - Pre-tested, optimized models
- ✅ Production-ready - Monitoring, scaling, governance
- ✅ Cost-effective - Optimal resource usage
- ✅ Flexible - Works on VirtOS, VMware, cloud, bare metal
- ✅ Clear separation - VirtOS handles infrastructure only
- ✅ Independent evolution - platform-java can advance separately
- ✅ Lightweight - No Java/ML dependencies in VirtOS
- ✅ Modular - Use platform-java or not, your choice
- ✅ Focused scope - Application AI only
- ✅ Rich ecosystem - Java ML libraries (DL4J, DJL)
- ✅ Cloud-ready - Not tied to one hypervisor
- ✅ Extensible - Plugin-based architecture
- VirtOS AI_ARCHITECTURE_SPLIT.md - Complete architecture separation
- VirtOS AI_STRATEGY.md - 3-phase AI roadmap
- VirtOS Issue #128 - AI capabilities split
- platform-java Issue #303 - MLOps Platform
- platform-java Issue #304 - Model Marketplace
- platform-java Issue #305 - RAG Infrastructure
When implemented (Phase 4+):
# Install platform-java
curl -sSL https://platform-java.io/install.sh | bash
# Configure VirtOS backend
platform-java config set virtos.api http://virtos.local/api/v1
# Deploy your first model
platform-java marketplace deploy llama-3.1-8b
# Query the model
curl http://platform-java.local/v1/completions \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello!", "max_tokens": 50}'
Created: 2026-05-28
Status: Planned (not yet implemented)
Timeline: Phase 4-5 (Month 19+, Year 3)
Issues: #303 (MLOps), #304 (Marketplace), #305 (RAG)