Releases: NVIDIA/OSMO

6.2.10

25 Mar 15:55
7904629

Highlights

  • Authorization Bug Fixes — Multiple RBAC path corrections for credentials, workflow exec, and rsync operations
  • Interactive Exec Improvements — Dynamic terminal sizing and resize support for full-screen tools like vim when exec-ing into a running workflow task
  • Default Pool Submission — Users can now submit workflows to the default pool via osmo-user

Authorization

  • Credentials create path: Fixed RBAC action registry to include the more specific path required for credential creation (#737)
  • Workflow exec permissions: Corrected authorization paths for workflow exec and rsync operations (#738, #739)
  • Restart API access: Added missing action registry entry for the restart API (#716)
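
All three fixes above concern requests whose path had no matching entry in the action registry. The sketch below illustrates the general idea of such a registry in Python. It is not OSMO's implementation: the paths, action names, and the `resolve_action` helper are all hypothetical, chosen only to show why a missing or insufficiently specific entry leads to a denied request.

```python
import re

# Hypothetical registry mapping (HTTP method, path pattern) to an action name.
# None of these paths or action names are taken from OSMO.
ACTION_REGISTRY = [
    ("GET", "/api/credentials", "credentials:list"),
    ("POST", "/api/credentials/{name}", "credentials:create"),
    ("POST", "/api/workflows/{id}/restart", "workflow:restart"),
]


def _pattern_to_regex(pattern):
    # Turn each {placeholder} into a wildcard that matches one path segment.
    parts = re.split(r"\{[^/}]+\}", pattern)
    return re.compile("^" + "[^/]+".join(re.escape(p) for p in parts) + "$")


def resolve_action(method, path):
    """Return the registered action for a request, or None if nothing matches."""
    for reg_method, pattern, action in ACTION_REGISTRY:
        if reg_method == method and _pattern_to_regex(pattern).match(path):
            return action
    return None  # an unregistered path has no action, so it cannot be authorized
```

In this sketch, a `POST` to `/api/credentials/my-cred` resolves only because the more specific pattern is registered; without that entry the request would resolve to no action at all.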

Interactive Development

osmo workflow exec and the browser shell now correctly handle terminal geometry, improving the experience when running interactive tools inside a workflow task:

  • Dynamic terminal sizing: The terminal reports its actual dimensions to the runtime, fixing rendering of full-screen applications like vim (#717)
  • Shell resize support: Resizing the browser shell window propagates correctly to the running session (#727)
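
As a rough illustration of what reporting terminal dimensions involves, the Python sketch below reads the local terminal size and packages it as a resize message. The message shape (`type`, `cols`, `rows`) and the `resize_message` name are assumptions made for this example; the notes do not describe the wire format OSMO uses.

```python
import json
import shutil


def resize_message(fallback=(80, 24)):
    """Build a hypothetical resize message from the current terminal size."""
    # get_terminal_size falls back to the given (columns, lines) pair when
    # the size cannot be determined, e.g. when output is not a terminal.
    size = shutil.get_terminal_size(fallback=fallback)
    return json.dumps({"type": "resize", "cols": size.columns, "rows": size.lines})
```

On Unix-like systems a client would typically send such a message once at session start and again whenever it receives `SIGWINCH`, so that full-screen programs redraw at the new size.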

Workflow Engine

  • Default pool submission: osmo-user can now submit workflows without specifying a pool, falling back to the user's default pool (#728)
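
The fallback described above can be sketched in a few lines of Python. The `resolve_pool` function and its arguments are hypothetical and only illustrate the order of precedence: an explicitly requested pool wins, otherwise the user's default pool is used.

```python
def resolve_pool(requested_pool, user_default_pool):
    """Pick the pool for a submission (illustrative only, not OSMO code)."""
    if requested_pool:
        return requested_pool
    if user_default_pool:
        return user_default_pool
    # Neither an explicit pool nor a default is available.
    raise ValueError("no pool specified and no default pool configured")
```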

Brev Deployment

  • New UI support: Brev deployments now support the new UI through cross-domain URL proxying
  • KAI Scheduler v0.13.4: Quick-start and one-click launchable updated to KAI Scheduler v0.13.4 with corrected Helm chart registry path (#725)

Getting OSMO

Helm Charts and Containers

Helm charts and Docker containers are available in NGC.

CLI Client

The installers for the CLI client for macOS (Apple Silicon), x86-64 Linux, and ARM64 Linux are attached as assets to this release.

6.2.8

18 Mar 20:01
8dab0c7

Highlights

  • RBAC & Authentication Overhaul — Full OAuth2 proxy integration, RBAC authorization sidecar, user mapping, and JWT-based auth APIs
  • New UI Platform — Complete UI rewrite with OAuth2 integration, dataset collections, workflow submission flow, and WCAG 2.1 accessibility compliance
  • AI Agentic Skills — Agent skills framework with workflow-expert, logs-reader, and language-specific expert sub-agents for autoscaling workflow submissions
  • Database Migration — pgroll-based database migration system
  • NVLink & Topology-Aware Scheduling — NVLink topology support and intelligent pool grouping for shared nodes within the same nodeset

Authentication & Authorization

OSMO introduces a comprehensive authentication and authorization layer to secure access across all services:

  • RBAC Authorization Sidecar: Dedicated authz sidecar deployed alongside services, enabled by default, enforcing role-based access control at the request level (#445, #471)
  • OAuth2 Proxy Integration: Full OAuth2 proxy support for both UI and backend services, replacing the previous auth model with standard OAuth2 flows including device code login and token refresh (#443, #520, #585)
  • User Mapping: Map external identity provider users to OSMO roles and pool permissions, with syncing between role maps and pool assignments (#418, #515)

AI Agentic Skills

A new agent framework enables AI-driven workflow management and codebase assistance:

  • Skills Framework: Extensible skill system with cross-platform installation via npx, structured for framework-agnostic usage (#555, #598, #599, #605)
  • Workflow Expert Agent: Specialized agent with detailed knowledge of workflow execution phases for intelligent troubleshooting and guidance (#565)

Scheduling & Compute

  • NVLink Topology Support: Scheduling-aware NVLink detection enabling topology-aware task placement for multi-GPU workloads (#479)
  • KAI Scheduler Default: Switched default scheduler to KAI for improved scheduling performance (#115)

Workflow Engine & Backend

  • CLI Workflow Events: Workflow event streaming available through the CLI for real-time monitoring (#533)
  • Large Workflow Support: The WebSocket connection between the agent service and the backend worker no longer breaks on large workflows, and status updates are at least 30% faster (#398, #391, #655, #676)
  • Workflow Submission Speedup: For large workflows (e.g. 100 tasks), workflow submission response is 4x faster (#701)

Data & Storage

  • Non-AWS S3 Support: S3-compatible storage backends (MinIO, Azure Blob, etc.) work without requiring AWS environment variables, with automatic endpoint detection during data auth validation (#421, #385)
  • Credential-less Data Operations: Data Access Layer supports operations without explicit credentials when environment-based auth is available, with client-side auth checks (#159, #177)

Database

  • pgroll Migration System: Pre-upgrade migration jobs using pgroll for schema changes

Web UI

  • The UI has been completely rewritten and relocated to /src/ui, replacing the legacy frontend.

Getting OSMO

Helm Charts and Containers

Helm charts and Docker containers are available in NGC.

CLI Client

The installers for the CLI client for macOS (Apple Silicon), x86-64 Linux, and ARM64 Linux are attached as assets to this release.

6.2.6

05 Mar 21:07
94e6f63

Pre-release

Release Candidate for v6.2

6.0.0

20 Nov 18:25

Major features

Workflow Management

OSMO provides a sophisticated workflow orchestration system that allows users to define, submit, and monitor complex AI workflows through both a web UI and CLI:

  • Multi-Task Orchestration: Define complex workflows with serial and parallel task execution patterns, with automatic dependency management and synchronization through barriers
  • Priority-Based Scheduling: Support for HIGH, NORMAL, and LOW priority levels with intelligent preemption and GPU borrowing across pools to maximize utilization
  • Interactive Development: Exec into running containers, port-forward services, and rsync files between local workstations and remote tasks for seamless debugging
  • Resource Management: Flexible resource specification with support for GPUs, CPUs, memory, and storage across multiple platforms and node types
  • Automatic Rescheduling: Handle transient failures gracefully with configurable retry policies and exit code handling
  • Template Support: Create reusable, parameterized workflow specifications with Jinja templating for automation and scaling
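
To make the idea of retry policies with exit code handling more concrete, here is a minimal Python sketch. The `RetryPolicy` fields, the default exit codes, and `should_reschedule` are assumptions for illustration; the notes do not specify how OSMO expresses or evaluates its policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical fields; not OSMO's configuration schema.
    max_retries: int = 3
    # 137 and 143 are the conventional shell exit codes for a process
    # killed by SIGKILL (128 + 9) and SIGTERM (128 + 15).
    retryable_exit_codes: frozenset = frozenset({137, 143})


def should_reschedule(policy, exit_code, retries_so_far):
    """Decide whether a failed task should be rescheduled."""
    if exit_code == 0:
        return False  # the task succeeded, nothing to retry
    if retries_so_far >= policy.max_retries:
        return False  # retry budget exhausted
    return exit_code in policy.retryable_exit_codes
```

Treating only selected exit codes as retryable keeps genuine application errors (for example, exit code 1 from a bug in the task) from being retried pointlessly.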

Data Management

OSMO's data layer provides efficient storage and access to datasets with versioning and metadata support:

  • Dataset Versioning: Track dataset evolution with automatic versioning and deduplication to optimize storage
  • Multiple Storage Backends: Support for AWS S3, Azure Blob Storage, and Google Cloud Storage with seamless integration
  • Efficient Data Transfer: Multi-threaded, multi-process uploads and downloads with automatic resume capabilities
  • Collections: Group related datasets together for easier organization and management
  • Metadata and Labels: Tag datasets with custom metadata and labels for powerful querying and discovery
  • Regex-Based Selection: Upload or download partial datasets using regex patterns for fine-grained control
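
Regex-based selection amounts to filtering a dataset's file paths against a pattern before transfer. The Python sketch below shows the idea with the standard `re` module; the `select_paths` helper and the sample paths are made up for this example, and OSMO's own matching semantics (for instance, full match versus search) are not stated in these notes.

```python
import re


def select_paths(paths, pattern):
    """Return the paths that match the regex anywhere in the string."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.search(p)]


# Hypothetical dataset listing.
dataset_files = [
    "train/images/0001.png",
    "train/labels/0001.json",
    "val/images/0002.png",
]

# Select only PNG images under train/.
train_images = select_paths(dataset_files, r"^train/.*\.png$")
```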

Applications

The Apps feature allows users to create reusable applications from workflow specifications:

  • Parameterized Applications: Define apps with customizable parameters that users can adjust at launch time
  • Easy Sharing: Package complex workflows as simple-to-launch applications for team collaboration
  • Workflow Abstraction: Hide complexity behind user-friendly interfaces while maintaining full workflow power

Pools and Resource Management

OSMO introduces a sophisticated resource management system based on pools and platforms:

  • Pool-Based Access Control: Teams are granted access to specific resource pools with RBAC for secure multi-tenancy within an organization
  • Dynamic Pool Sizing: Pool sizes can be adjusted dynamically to respond to changing workload priorities
  • Platform Support: Each pool supports one or more platform types (GPU models, architectures) with automatic resource validation
  • Resource Sharing: Resources can be allocated to multiple pools simultaneously for maximum utilization
  • Quota Management: View and track resource quotas and usage across pools
  • Maintenance Mode: Admins can mark pools for maintenance to prevent new submissions during updates

Compute Backend Integration

OSMO seamlessly integrates with Kubernetes clusters and various compute backends:

  • Multi-Cluster Support: Manage workflows across multiple Kubernetes clusters (AWS EKS, Azure AKS, GCP GKE, on-premise)
  • KAI Scheduler Support: Support for advanced workflow scheduling with NVIDIA KAI Scheduler
  • Customizable Pod Templates: Flexible pod template configurations allowing administrators to customize resource requests, limits, tolerations, and node affinities per backend

Web User Interface

A modern, responsive web interface provides comprehensive workflow and data management:

  • Workflow Dashboard: View, filter, and manage workflows with real-time status updates
  • Interactive Task Graphs: Visualize workflow structure and task dependencies
  • Live Log Streaming: Stream logs in real-time from running workflows with syntax highlighting
  • Resource Visualization: Monitor cluster resources, pool quotas, and node utilization
  • Dataset Browser: Browse, visualize, and manage datasets with metadata editing
  • Shell Access: Browser-native terminal for executing commands in running tasks
  • Pool Management: View pool information, supported platforms, and available quotas

Command Line Interface

A powerful CLI provides full access to OSMO capabilities with scripting and automation support:

  • Intuitive Commands: Organized command structure for workflows, datasets, resources, pools, and configuration
  • Multi-Platform Support: Native support for macOS (Apple Silicon) and for both x86-64 and ARM64 Linux
  • Auto-Completion: Tab completion support on Linux and macOS for faster command entry
  • Multiple Output Formats: JSON and human-readable text output formats for easy integration with scripts
  • Profile Management: Configure default settings for backend, bucket, and notification preferences
  • Automatic Reconnection: Port-forward and exec commands automatically reconnect on disconnection

Security and Authentication

Enterprise-grade security features protect workflows and data:

  • OIDC Integration: OAuth 2.0-based authentication via Keycloak, which can be configured to connect to other OAuth 2.0 or SAML authentication providers
  • RBAC: Role-based access control for pools, backends, and resources
  • Token Scoping: Limited-scope JWTs with appropriate time-to-live durations
  • Limited Scope Access Tokens: Users can create access tokens with restricted scopes, enabling secure and granular control over permissions. These tokens can be used for login and automated access, ensuring users and services only have the access they require.
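
The combination of limited scope and time-to-live can be illustrated with a small check over already-decoded token claims. This Python sketch assumes the standard `exp` claim (expiry as a Unix timestamp) and a space-delimited `scope` claim, which is a common OAuth 2.0 convention; the scope names are invented, signature verification is deliberately left out, and none of this is taken from OSMO's code.

```python
import time


def is_token_allowed(claims, required_scope, now=None):
    """Check expiry and scope on decoded JWT claims (signature not verified here)."""
    now = time.time() if now is None else now
    # A token with no expiry, or one that has expired, is rejected.
    if claims.get("exp", 0) <= now:
        return False
    granted = claims.get("scope", "").split()
    return required_scope in granted
```

A real service would verify the token's signature and issuer before trusting any claim; this sketch only covers the scope and TTL decision that follows.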

Framework Integration

OSMO integrates seamlessly with popular AI/ML frameworks and tools, and comes with tutorials to demonstrate their use:

  • Distributed Training: TorchRun, DeepSpeed, and elastic training support for multi-node DNN training
  • Reinforcement Learning: Isaac Lab integration for RL training workflows
  • Simulation: Isaac Sim integration for synthetic data generation (SDG) and simulation workflows
  • ROS/ROS2: Support for robotics workflows with multi-node communication and hardware-in-the-loop testing
  • Development Tools: Jupyter Notebook, VSCode, and File Browser integration for interactive development
  • ML Tools: Weights & Biases (wandb) integration for experiment tracking
  • NVIDIA GR00T: Sample workflows for GR00T fine-tuning, GR00T mimic, and GR00T interactive notebook

Getting OSMO

Helm Charts and Containers

Helm charts and Docker containers are available in NGC.

CLI Client

The installers for the CLI client for macOS (Apple Silicon), x86-64 Linux, and ARM64 Linux are attached as assets to this release.