Releases: NVIDIA/OSMO
6.2.10
Highlights
- Authorization Bug Fixes — Multiple RBAC path corrections for credentials, workflow exec, and rsync operations
- Interactive Exec Improvements — Dynamic terminal sizing and resize support for full-screen tools like vim when exec-ing into a running workflow task
- Default Pool Submission — Users can now submit workflows to the default pool via osmo-user
Authorization
- Credentials create path: Fixed RBAC action registry to include the more specific path required for credential creation (#737)
- Workflow exec permissions: Corrected authorization paths for workflow exec and rsync operations (#738, #739)
- Restart API access: Added missing action registry entry for the restart API (#716)
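Several of these fixes concern which registry entry a request path resolves to. The sketch below shows how a path-based action registry with "most specific match wins" lookup behaves, and why a missing, more specific entry can cause a request to be checked against the wrong action. The paths, action names, and matching rules are hypothetical illustrations, not OSMO's actual registry.

```python
from typing import Optional

# Hypothetical registry entries: (HTTP method, path pattern, action).
# These paths and action names are illustrative, not OSMO's real registry.
# '*' matches exactly one path segment; a pattern matches any path it prefixes.
REGISTRY = [
    ("GET", "/api/workflow", "workflow:read"),
    ("POST", "/api/workflow", "workflow:submit"),
    ("POST", "/api/workflow/*/exec", "workflow:exec"),
    ("POST", "/api/credentials", "credentials:create"),
]


def _segments(path: str) -> list:
    """Split a URL path into non-empty segments."""
    return [s for s in path.split("/") if s]


def _prefix_match(pattern: list, path: list) -> bool:
    """True if every pattern segment matches the corresponding path segment."""
    if len(pattern) > len(path):
        return False
    return all(p == "*" or p == s for p, s in zip(pattern, path))


def resolve_action(method: str, path: str) -> Optional[str]:
    """Return the action of the most specific (longest) matching entry, or None to deny."""
    path_segs = _segments(path)
    best_action, best_len = None, -1
    for entry_method, pattern, action in REGISTRY:
        pat_segs = _segments(pattern)
        if entry_method != method or not _prefix_match(pat_segs, path_segs):
            continue
        if len(pat_segs) > best_len:
            best_action, best_len = action, len(pat_segs)
    return best_action


print(resolve_action("POST", "/api/workflow/wf-1/exec/task/train"))  # workflow:exec
```

If the `/api/workflow/*/exec` entry were absent, the same request would fall back to the shorter `/api/workflow` entry and resolve to `workflow:submit`, so it would be authorized against a broader or unrelated permission.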
Interactive Development
osmo workflow exec and the browser shell now correctly handle terminal geometry, improving the experience when running interactive tools inside a workflow task:
- Dynamic terminal sizing: The terminal reports its actual dimensions to the runtime, fixing rendering of full-screen applications like vim (#717)
- Shell resize support: Resizing the browser shell window propagates correctly to the running session (#727)
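Terminal-aware exec clients commonly report the window size once when the session opens and again whenever the local terminal receives SIGWINCH. The following generic Python sketch illustrates that pattern; the JSON message shape and the `send` callback are hypothetical and are not the wire protocol used by osmo workflow exec or the browser shell.

```python
import json
import os
import signal
from typing import Callable, Tuple


def resize_message(columns: int, rows: int) -> str:
    """Encode a resize event. The JSON shape is a hypothetical wire format."""
    return json.dumps({"type": "resize", "cols": columns, "rows": rows})


def current_size(fd: int = 1) -> Tuple[int, int]:
    """Return (columns, rows) of the terminal on fd; raises OSError if fd is not a TTY."""
    size = os.get_terminal_size(fd)
    return size.columns, size.lines


def start_resize_reporting(send: Callable[[str], None], fd: int = 1) -> None:
    """Report the size once at session start, then on every SIGWINCH (POSIX, main thread only)."""
    send(resize_message(*current_size(fd)))

    def on_winch(signum, frame):
        # SIGWINCH is delivered when the controlling terminal's window size changes.
        send(resize_message(*current_size(fd)))

    signal.signal(signal.SIGWINCH, on_winch)
```

On the receiving side, a handler would apply the new size to the task's pseudo-terminal (for example through the TIOCSWINSZ ioctl), which is what lets full-screen programs such as vim redraw at the correct dimensions.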
Workflow Engine
- Default pool submission: osmo-user can now submit workflows without specifying a pool, falling back to the user's default pool (#728)
Brev Deployment
- New UI support: Brev deployments now support the new UI by allowing cross-domain URL proxying
- KAI Scheduler v0.13.4: Quick-start and one-click launchable updated to KAI Scheduler v0.13.4 with corrected Helm chart registry path (#725)
Getting OSMO
Helm Charts and Containers
Helm charts and Docker containers are available in NGC.
CLI Client
The CLI client installers for macOS (Apple Silicon), x86-64 Linux, and ARM64 Linux are attached as assets to this release.
6.2.8
Highlights
- RBAC & Authentication Overhaul — Full OAuth2 proxy integration, RBAC authorization sidecar, user mapping, and JWT-based auth APIs
- New UI Platform — Complete UI rewrite with OAuth2 integration, dataset collections, workflow submission flow, and WCAG 2.1 accessibility compliance
- AI Agentic Skills — Agent skills framework with workflow-expert, logs-reader, and language-specific expert sub-agents for autoscaling workflow submissions
- Database Migration — pgroll-based database migration system
- NVLink & Topology-Aware Scheduling — NVLink topology support and intelligent pool grouping for shared nodes within the same nodeset
Authentication & Authorization
OSMO introduces a comprehensive authentication and authorization layer to secure access across all services:
- RBAC Authorization Sidecar: Dedicated authz sidecar deployed alongside services, enabled by default, enforcing role-based access control at the request level (#445, #471)
- OAuth2 Proxy Integration: Full OAuth2 proxy support for both UI and backend services, replacing the previous auth model with standard OAuth2 flows including device code login and token refresh (#443, #520, #585)
- User Mapping: Map external identity provider users to OSMO roles and pool permissions, with syncing between role maps and pool assignments (#418, #515)
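The OAuth2 device code login mentioned above follows the OAuth 2.0 Device Authorization Grant (RFC 8628), in which the CLI polls the token endpoint until the user approves the login in a browser. The sketch below implements only the polling rules from RFC 8628: keep polling on `authorization_pending`, add 5 seconds to the interval on `slow_down`, and stop on any other error. The `request_token` callable stands in for the HTTP request and is an assumption; OSMO's CLI may structure this differently.

```python
import time
from typing import Callable


class DeviceFlowError(Exception):
    """Raised when the device authorization cannot complete."""


def poll_for_token(
    request_token: Callable[[], dict],
    interval: float = 5.0,      # RFC 8628 default when the server omits "interval"
    expires_in: float = 600.0,  # lifetime of the device_code, from the server response
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the token endpoint until the user approves, denies, or the code expires.

    request_token() performs one token request and returns the parsed JSON body:
    either a token response or an error object such as {"error": "authorization_pending"}.
    """
    waited = 0.0
    while waited < expires_in:
        sleep(interval)
        waited += interval
        body = request_token()
        error = body.get("error")
        if error is None:
            return body  # contains access_token and, typically, refresh_token
        if error == "authorization_pending":
            continue  # user has not finished logging in yet
        if error == "slow_down":
            interval += 5.0  # RFC 8628: increase the interval by 5 seconds from now on
            continue
        raise DeviceFlowError(error)  # e.g. access_denied or expired_token
    raise DeviceFlowError("expired_token")
```

Injecting `sleep` keeps the loop testable without real delays; a production client would also handle network errors and store the refresh token for later token refresh.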
AI Agentic Skills
A new agent framework enables AI-driven workflow management and codebase assistance:
- Skills Framework: Extensible skill system with cross-platform installation via npx, structured for framework-agnostic usage (#555, #598, #599, #605)
- Workflow Expert Agent: Specialized agent with detailed knowledge of workflow execution phases for intelligent troubleshooting and guidance (#565)
Scheduling & Compute
- NVLink Topology Support: Scheduling-aware NVLink detection enabling topology-aware task placement for multi-GPU workloads (#479)
- KAI Scheduler Default: Switched default scheduler to KAI for improved scheduling performance (#115)
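Topology-aware placement generally means keeping a multi-GPU task group inside one NVLink domain so that GPU-to-GPU traffic does not cross slower links. The greedy best-fit sketch below illustrates that idea only; it is not the algorithm used by KAI Scheduler or OSMO, and the `Node` fields (including `nvlink_domain`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    nvlink_domain: str  # hypothetical: e.g. populated from a node label
    free_gpus: int


def place_in_one_domain(nodes: List[Node], tasks: int, gpus_per_task: int) -> Optional[List[str]]:
    """Assign every task to a node in a single NVLink domain, or return None.

    Within a domain, nodes with the most free GPUs are filled first. Across
    domains, the one leaving the fewest free GPUs unused wins (best fit), which
    keeps larger domains available for larger groups.
    """
    by_domain = defaultdict(list)
    for node in nodes:
        by_domain[node.nvlink_domain].append(node)

    best = None  # (leftover_gpus, assignment)
    for domain in sorted(by_domain):
        members = by_domain[domain]
        slots = []
        for node in sorted(members, key=lambda n: n.free_gpus, reverse=True):
            slots.extend([node.name] * (node.free_gpus // gpus_per_task))
        if len(slots) < tasks:
            continue  # the whole group does not fit in this domain
        leftover = sum(n.free_gpus for n in members) - tasks * gpus_per_task
        if best is None or leftover < best[0]:
            best = (leftover, slots[:tasks])
    return None if best is None else best[1]
```

Returning None rather than splitting the group across domains is a deliberate choice in this sketch; a real scheduler might instead queue the group or fall back to a looser placement policy.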
Workflow Engine & Backend
- CLI Workflow Events: Workflow event streaming available through the CLI for real-time monitoring (#533)
- Supporting Large Workflows: The WebSocket connection between the agent service and the backend worker no longer breaks on large workflows, and status updates are at least 30% faster (#398, #391, #655, #676)
- Workflow Submission Speedup: For large workflows (e.g., 100 tasks), workflow submission responses are 4x faster (#701)
Data & Storage
- Non-AWS S3 Support: S3-compatible storage backends (MinIO, Azure Blob, etc.) work without requiring AWS environment variables, with automatic endpoint detection during data auth validation (#421, #385)
- Credential-less Data Operations: Data Access Layer supports operations without explicit credentials when environment-based auth is available, with client-side auth checks (#159, #177)
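The two items above describe storage access that works with explicit credentials, with environment-based credentials, or with neither. A resolution order like the one sketched below is one way to express that; the precedence, the `explicit` keys, and the `S3Settings` type are assumptions for illustration, not OSMO's Data Access Layer. `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_ENDPOINT_URL` are standard AWS SDK environment variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class S3Settings:
    endpoint_url: Optional[str]  # None: use the SDK's default AWS endpoint
    access_key: Optional[str]
    secret_key: Optional[str]
    source: str                  # "explicit", "environment", or "ambient"


def resolve_s3_settings(
    explicit: Optional[Mapping[str, str]] = None,
    env: Mapping[str, str] = os.environ,
) -> S3Settings:
    """Pick credentials in order: explicit config, environment variables, ambient auth."""
    explicit = explicit or {}
    # A custom endpoint (MinIO, other S3-compatible stores) may come from either source.
    endpoint = explicit.get("endpoint") or env.get("AWS_ENDPOINT_URL")
    if explicit.get("access_key") and explicit.get("secret_key"):
        return S3Settings(endpoint, explicit["access_key"], explicit["secret_key"], "explicit")
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return S3Settings(endpoint, env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"], "environment")
    # No static keys: defer to the SDK's ambient auth (e.g. an instance or workload identity).
    return S3Settings(endpoint, None, None, "ambient")
```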
Database
- pgroll Migration System: Pre-upgrade migration jobs using pgroll for schema changes
Web UI
- The UI has been completely rewritten and relocated to /src/ui, replacing the legacy frontend.
Getting OSMO
Helm Charts and Containers
Helm charts and Docker containers are available in NGC.
CLI Client
The CLI client installers for macOS (Apple Silicon), x86-64 Linux, and ARM64 Linux are attached as assets to this release.
6.2.6
Release Candidate for v6.2
6.0.0
Major features
Workflow Management
OSMO provides a sophisticated workflow orchestration system that allows users to define, submit, and monitor complex AI workflows through both a web UI and CLI:
- Multi-Task Orchestration: Define complex workflows with serial and parallel task execution patterns, with automatic dependency management and synchronization through barriers
- Priority-Based Scheduling: Support for HIGH, NORMAL, and LOW priority levels with intelligent preemption and GPU borrowing across pools to maximize utilization
- Interactive Development: Exec into running containers, port-forward services, and rsync files between local workstations and remote tasks for seamless debugging
- Resource Management: Flexible resource specification with support for GPUs, CPUs, memory, and storage across multiple platforms and node types
- Automatic Rescheduling: Handle transient failures gracefully with configurable retry policies and exit code handling
- Template Support: Create reusable, parameterized workflow specifications with Jinja templating for automation and scaling
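Serial and parallel execution with automatic dependency management can be pictured as grouping a task dependency graph into stages: every task in a stage depends only on tasks in earlier stages, so tasks within a stage may run in parallel. The sketch below uses Kahn's topological sort to compute those stages; the task names and the `deps` dictionary format are hypothetical and are not OSMO's workflow specification syntax.

```python
from collections import defaultdict
from typing import Dict, List


def execution_stages(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into stages; each task runs after all of its upstream tasks.

    deps maps task -> list of upstream tasks. Tasks in the same stage have no
    dependency on each other. Raises ValueError if the graph contains a cycle.
    """
    tasks = set(deps) | {u for ups in deps.values() for u in ups}
    indegree = {t: 0 for t in tasks}
    downstream = defaultdict(list)
    for task, ups in deps.items():
        for upstream in ups:
            indegree[task] += 1
            downstream[upstream].append(task)

    stage = sorted(t for t in tasks if indegree[t] == 0)
    stages = []
    while stage:
        stages.append(stage)
        next_stage = []
        for t in stage:
            for d in downstream[t]:
                indegree[d] -= 1
                if indegree[d] == 0:  # all of d's upstream tasks have finished
                    next_stage.append(d)
        stage = sorted(next_stage)

    if sum(len(s) for s in stages) != len(tasks):
        raise ValueError("dependency cycle detected")
    return stages


print(execution_stages({
    "preprocess": [],
    "train-a": ["preprocess"],
    "train-b": ["preprocess"],
    "evaluate": ["train-a", "train-b"],
}))  # [['preprocess'], ['train-a', 'train-b'], ['evaluate']]
```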
Data Management
OSMO's data layer provides efficient storage and access to datasets with versioning and metadata support:
- Dataset Versioning: Track dataset evolution with automatic versioning and deduplication to optimize storage
- Multiple Storage Backends: Support for AWS S3, Azure Blob Storage, and Google Cloud Storage with seamless integration
- Efficient Data Transfer: Multi-threaded, multi-process uploads and downloads with automatic resume capabilities
- Collections: Group related datasets together for easier organization and management
- Metadata and Labels: Tag datasets with custom metadata and labels for powerful querying and discovery
- Regex-Based Selection: Upload or download partial datasets using regex patterns for fine-grained control
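Deduplication and regex-based selection can be combined when planning an upload: hash each selected file's content and skip any content that is already stored. The sketch below shows that technique with SHA-256 content addressing; it is not OSMO's actual dataset format, and whether OSMO's regex must match the whole relative path or only part of it is an assumption (this sketch uses `re.search`, a partial match).

```python
import hashlib
import re
from pathlib import Path
from typing import Dict, Set


def content_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_upload(root: Path, already_stored: Set[str], pattern: str = ".*") -> Dict[str, str]:
    """Return {relative_path: digest} for files to upload.

    Only files whose relative path matches `pattern` are considered. Content that
    is already stored, or that repeats within this upload, is uploaded only once.
    """
    regex = re.compile(pattern)
    seen = set(already_stored)
    plan = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if not regex.search(rel):
            continue  # excluded by the selection pattern
        digest = content_digest(path)
        if digest in seen:
            continue  # deduplicated: identical content is already stored
        seen.add(digest)
        plan[rel] = digest
    return plan
```

A new dataset version would then record every selected path with its digest, while only the paths returned by `plan_upload` are actually transferred.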
Applications
The Apps feature allows users to create reusable applications from workflow specifications:
- Parameterized Applications: Define apps with customizable parameters that users can adjust at launch time
- Easy Sharing: Package complex workflows as simple-to-launch applications for team collaboration
- Workflow Abstraction: Hide complexity behind user-friendly interfaces while maintaining full workflow power
Pools and Resource Management
OSMO introduces a sophisticated resource management system based on pools and platforms:
- Pool-Based Access Control: Teams are granted access to specific resource pools with RBAC for secure multi-tenancy within an organization
- Dynamic Pool Sizing: Pool sizes can be adjusted dynamically to respond to changing workload priorities
- Platform Support: Each pool supports one or more platform types (GPU models, architectures) with automatic resource validation
- Resource Sharing: Resources can be allocated to multiple pools simultaneously for maximum utilization
- Quota Management: View and track resource quotas and usage across pools
- Maintenance Mode: Admins can mark pools for maintenance to prevent new submissions during updates
Compute Backend Integration
OSMO seamlessly integrates with Kubernetes clusters and various compute backends:
- Multi-Cluster Support: Manage workflows across multiple Kubernetes clusters (AWS EKS, Azure AKS, GCP GKE, on-premise)
- KAI Scheduler Support: Support for advanced workflow scheduling with NVIDIA KAI Scheduler
- Customizable Pod Templates: Flexible pod template configurations allowing administrators to customize resource requests, limits, tolerations, and node affinities per backend
Web User Interface
A modern, responsive web interface provides comprehensive workflow and data management:
- Workflow Dashboard: View, filter, and manage workflows with real-time status updates
- Interactive Task Graphs: Visualize workflow structure and task dependencies
- Live Log Streaming: Stream logs in real-time from running workflows with syntax highlighting
- Resource Visualization: Monitor cluster resources, pool quotas, and node utilization
- Dataset Browser: Browse, visualize, and manage datasets with metadata editing
- Shell Access: Browser-native terminal for executing commands in running tasks
- Pool Management: View pool information, supported platforms, and available quotas
Command Line Interface
A powerful CLI provides full access to OSMO capabilities with scripting and automation support:
- Intuitive Commands: Organized command structure for workflows, datasets, resources, pools, and configuration
- Multi-Platform Support: Native support for Mac (Apple Silicon), as well as both x86-64 and ARM64 architectures on Linux
- Auto-Completion: Tab completion support on Linux and macOS for faster command entry
- Multiple Output Formats: JSON and human-readable text output formats for easy integration with scripts
- Profile Management: Configure default settings for backend, bucket, and notification preferences
- Automatic Reconnection: Port-forward and exec commands automatically reconnect on disconnection
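Automatic reconnection is commonly implemented as a retry loop with exponential backoff and random jitter, so that many clients dropped at once do not reconnect in lockstep. The sketch below shows that general technique; the `connect` callable, the attempt limit, and the delay values are assumptions, not the CLI's actual settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_reconnect(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, backing off exponentially between failures.

    Only ConnectionError triggers a retry; other exceptions propagate immediately.
    After max_attempts failures, the last ConnectionError is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # "full jitter": random delay in [0, cap]
    raise RuntimeError("unreachable")
```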
Security and Authentication
Enterprise-grade security features protect workflows and data:
- OIDC Integration: OAuth 2.0-based authentication via Keycloak, which can be configured to connect to other OAuth 2.0 or SAML authentication providers
- RBAC: Role-based access control for pools, backends, and resources
- Token Scoping: Limited-scope JWT tokens with appropriate time-to-live durations
- Limited Scope Access Tokens: Users can create access tokens with restricted scopes, enabling secure and granular control over permissions. These tokens can be used for login and automated access, ensuring users and services only have the access they require.
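Scoped, time-limited tokens like those described above are typically JWTs that carry a subject, a space-separated `scope` claim, and an `exp` expiry. The standard-library sketch below issues and verifies an HS256-signed token to show how scope and TTL checks work. The signing algorithm, secret handling, and scope names are assumptions (Keycloak deployments often use asymmetric keys instead); production code should rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def issue_token(secret: bytes, subject: str, scopes: List[str], ttl_s: int,
                now: Optional[float] = None) -> str:
    """Create an HS256 JWT limited to `scopes` that expires after ttl_s seconds."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "scope": " ".join(scopes),
              "iat": int(now), "exp": int(now) + ttl_s}
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_token(secret: bytes, token: str, required_scope: str,
                 now: Optional[float] = None) -> dict:
    """Check signature, expiry, and scope; return the claims or raise PermissionError."""
    now = time.time() if now is None else now
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    if required_scope not in claims["scope"].split():
        raise PermissionError(f"missing scope {required_scope!r}")
    return claims
```

The verifier always recomputes HS256 and never trusts the token's own `alg` header, which avoids algorithm-confusion attacks in this simplified setting.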
Framework Integration
OSMO integrates seamlessly with popular AI/ML frameworks and tools, and comes with tutorials to demonstrate their use:
- Distributed Training: TorchRun, DeepSpeed, and elastic training support for multi-node DNN training
- Reinforcement Learning: Isaac Lab integration for RL training workflows
- Simulation: Isaac Sim integration for synthetic data generation (SDG) and simulation workflows
- ROS/ROS2: Support for robotics workflows with multi-node communication and hardware-in-the-loop testing
- Development Tools: Jupyter Notebook, VSCode, and File Browser integration for interactive development
- ML Tools: Weights & Biases (wandb) integration for experiment tracking
- NVIDIA GR00T: Sample workflows for GR00T fine-tuning, GR00T mimic, and GR00T interactive notebook
Getting OSMO
Helm Charts and Containers
Helm charts and Docker containers are available in NGC.
CLI Client
The CLI client installers for macOS (Apple Silicon), x86-64 Linux, and ARM64 Linux are attached as assets to this release.