diff --git a/docs/en/envoy_ai_gateway/index.mdx b/docs/en/envoy_ai_gateway/index.mdx new file mode 100644 index 0000000..e1a3b19 --- /dev/null +++ b/docs/en/envoy_ai_gateway/index.mdx @@ -0,0 +1,6 @@ +--- +weight: 115 +--- +# Alauda Build of Envoy AI Gateway + + diff --git a/docs/en/envoy_ai_gateway/install.mdx b/docs/en/envoy_ai_gateway/install.mdx new file mode 100644 index 0000000..2b080c0 --- /dev/null +++ b/docs/en/envoy_ai_gateway/install.mdx @@ -0,0 +1,36 @@ +--- +weight: 20 +--- + +# Install Envoy AI Gateway + +## Downloading Cluster Plugin + +:::info + +The `Alauda Build of Envoy AI Gateway` cluster plugin can be retrieved from the Customer Portal. + +Please contact Customer Support for more information. + +::: + +## Uploading the Cluster Plugin + +For more information on uploading the cluster plugin, please refer to + +## Installing Alauda Build of Envoy AI Gateway + +1. Go to the `Administrator` -> `Marketplace` -> `Cluster Plugin` page, switch to the target cluster, and then deploy the `Alauda Build of Envoy AI Gateway` cluster plugin. + :::info + **Deployment form parameters can be kept at their defaults, or adjusted once you understand how they are used.** + ::: + +2. Verify the result. The plugin shows an "Installed" status in the UI; you can also check the pod status: + ```bash + kubectl get pods -n envoy-gateway-system | grep "ai-gateway" + ``` + +## Upgrading Alauda Build of Envoy AI Gateway + +1. Upload the new version of the **Alauda Build of Envoy AI Gateway** plugin package to ACP. +2. Go to the `Administrator` -> `Clusters` -> `Target Cluster` -> `Functional Components` page, then click the `Upgrade` button to upgrade **Alauda Build of Envoy AI Gateway** to the new version.
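In addition to checking pod status, the following checks can confirm that the control plane is healthy after an install or upgrade. This is a sketch only: the deployment name `ai-gateway-controller` is an assumption based on the upstream project; adjust it to what your plugin version actually deploys.

```shell
# Wait for the AI gateway controller rollout to complete
# (deployment name is an assumption; run `kubectl get deploy -n envoy-gateway-system` first if unsure)
kubectl rollout status deployment/ai-gateway-controller -n envoy-gateway-system --timeout=120s

# Confirm that the Envoy AI Gateway CRDs are registered in the cluster
kubectl get crd -o name | grep -i aigateway
```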
diff --git a/docs/en/envoy_ai_gateway/intro.mdx b/docs/en/envoy_ai_gateway/intro.mdx new file mode 100644 index 0000000..2d5da83 --- /dev/null +++ b/docs/en/envoy_ai_gateway/intro.mdx @@ -0,0 +1,33 @@ +--- +weight: 10 +--- + +# Introduction + +## Envoy AI Gateway + +**Alauda Build of Envoy AI Gateway** is based on the [Envoy AI Gateway](https://aigateway.envoyproxy.io/) project. +Envoy AI Gateway is a Kubernetes-native, AI-specific gateway layer built on top of [Envoy Gateway](https://gateway.envoyproxy.io/), providing intelligent traffic management, routing, and policy enforcement for AI inference workloads. + +Main components and capabilities include: + +- **AI-Aware Routing**: Routes inference requests to the appropriate backend model service based on request content, model name, and backend availability — enabling transparent multi-model serving behind a single endpoint. +- **OpenAI-Compatible API**: Exposes a unified, OpenAI-compatible API surface (`/v1/chat/completions`, `/v1/completions`, `/v1/models`) for all downstream inference services, regardless of the underlying runtime. +- **Per-Model Rate Limiting & Policies**: Enforces fine-grained rate limiting, token quotas, and traffic policies at the individual model level, preventing resource starvation and ensuring fair usage across tenants. +- **Backend Load Balancing**: Distributes inference requests across multiple replicas of the same model using configurable load-balancing strategies, with health checking and automatic failover. +- **Envoy Gateway Integration**: Runs as an extension of Envoy Gateway, inheriting its Kubernetes Gateway API-native control plane, TLS termination, and observability features (metrics, access logs, distributed tracing). +- **Gateway API Inference Extension (GIE)**: Integrates with the Kubernetes SIG Gateway API Inference Extension for advanced, inference-aware scheduling and load balancing decisions based on real-time backend state. 
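As a concrete illustration of the OpenAI-compatible surface, a client can target the gateway exactly as it would the OpenAI API. This is a sketch only: the host `ai-gateway.example.com` and the model name `my-model` are placeholders for whatever your routes actually expose.

```shell
# List the models exposed behind the gateway
curl -s http://ai-gateway.example.com/v1/models

# Send an OpenAI-compatible chat completion request;
# the gateway routes it to the backend serving "my-model"
curl -s http://ai-gateway.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the API surface is OpenAI-compatible, existing OpenAI SDK clients can be pointed at the gateway by changing only the base URL.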
+ +Envoy AI Gateway is a required dependency of **Alauda Build of KServe** for exposing inference services. + +For installation on the platform, see [Install Envoy AI Gateway](./install). + +## Documentation + +Envoy AI Gateway upstream documentation and related resources: + +- **Envoy AI Gateway Documentation**: [https://aigateway.envoyproxy.io/](https://aigateway.envoyproxy.io/) — Official documentation covering architecture, configuration, and API references. +- **Envoy AI Gateway GitHub**: [https://github.com/envoyproxy/ai-gateway](https://github.com/envoyproxy/ai-gateway) — Source code, release notes, and issues. +- **Envoy Gateway**: [https://gateway.envoyproxy.io/](https://gateway.envoyproxy.io/) — The underlying gateway infrastructure that Envoy AI Gateway extends. +- **Gateway API Inference Extension (GIE)**: [https://gateway-api-inference-extension.sigs.k8s.io/](https://gateway-api-inference-extension.sigs.k8s.io/) — Kubernetes SIG project for AI-aware routing integrated with Envoy AI Gateway. +- **KServe (Alauda Build)**: [../kserve/intro](../kserve/intro) — KServe uses Envoy AI Gateway as a required dependency for exposing and routing inference services. diff --git a/docs/en/installation/ai-cluster.mdx b/docs/en/installation/ai-cluster.mdx index ee72e11..c74f6fd 100644 --- a/docs/en/installation/ai-cluster.mdx +++ b/docs/en/installation/ai-cluster.mdx @@ -155,6 +155,10 @@ Confirm that the **Alauda AI** tile shows one of the following states: +## Installing Alauda Build of KServe Operator + +For detailed installation steps, see [Install KServe](../kserve/install.mdx) in Alauda Build of KServe. + ## Enabling Knative Functionality Knative functionality is an optional capability that requires an additional operator and instance to be deployed. @@ -220,6 +224,7 @@ Once **Knative Operator** is installed, you need to create the `KnativeServing` 6. Replace the content with the following YAML: 7. Click **Create**. 
+ ```yaml apiVersion: operator.knative.dev/v1beta1 kind: KnativeServing @@ -254,10 +259,14 @@ Once **Knative Operator** is installed, you need to create the `KnativeServing` kourier: enabled: true ``` +:::warning +- For ACP 4.0, use version **1.18.1** +- For ACP 4.1 and above, use version **1.19.6** +::: -1. For ACP 4.0, keep the version as "1.18.1". For ACP 4.1 and above, change the version to "1.19.6". +1. Specify the version of Knative Serving to be deployed. 2. `private-registry` is a placeholder for your private registry address. You can find this in the **Administrator** view, then click **Clusters**, select `your cluster`, and check the **Private Registry** value in the **Basic Info** section. @@ -347,75 +356,6 @@ default True Succeeded Now, the core capabilities of Alauda AI have been successfully deployed. If you want to quickly experience the product, please refer to the [Quick Start](../../overview/quick_start.mdx). -## Migrating to Knative Operator - -In the 1.x series of products, the serverless capability for inference services was provided by the `Alauda AI Model Serving` operator. In the 2.x series, this capability is provided by the `Knative Operator`. This section guides you through migrating your serverless capability from the legacy operator to the new one. - -### 1. Remove Legacy Serving Instance - - - -#### Procedure - -In **Administrator** view: - -1. Click **Marketplace / OperatorHub**. -2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where **Alauda AI** is installed. -3. Select **Alauda AI**, then click the **All Instances** tab. -4. Locate the `default` instance and click **Update**. -5. In the update form, locate the **Serverless Configuration** section. -6. Set **BuiltIn Knative Serving** to `Removed`. -7. Click **Update** to apply the changes. - - - -### 2. 
Install Knative Operator and Create Serving Instance - -Install the **Knative Operator** from the Marketplace and create the `KnativeServing` instance. For detailed instructions, refer to the [Enabling Knative Functionality](#enabling-knative-functionality) section. - -:::info -Once the above steps are completed, the migration of the Knative serving control plane is complete. - -- If you are migrating from the **Alauda AI 2.0** + **Alauda AI Model Serving** combination, the migration is fully complete here. Business services will automatically switch their configuration shortly. -- If you are migrating from the **Alauda AI 1.x** + **Alauda AI Model Serving** combination, please ensure that **Alauda AI** is simultaneously upgraded to version **2.x**. -::: - -## Replace GitLab Service After Installation - -If you want to replace GitLab Service after installation, follow these steps: - -1. **Reconfigure GitLab Service** - Refer to the [Pre-installation Configuration](./pre-configuration.mdx) and re-execute its steps. - -2. **Update Alauda AI Instance** - - In Administrator view, navigate to **Marketplace > OperatorHub** - - From the **Cluster** dropdown, select the target cluster - - Choose **Alauda AI** and click the **All Instances** tab - - Locate the **'default'** instance and click **Update** - -3. **Modify GitLab Configuration** - In the **Update default** form: - - Locate the **GitLab** section - - Enter: - - **Base URL**: The URL of your new GitLab instance - - **Admin Token Secret Namespace**: `cpaas-system` - - **Admin Token Secret Name**: `aml-gitlab-admin-token` - -4. **Restart Components** - Restart the `aml-controller` deployment in the `kubeflow` namespace. - -5. **Refresh Platform Data** - In Alauda AI management view, re-manage all namespaces. 
- - In Alauda AI view, navigate to **Admin** view from **Business View** - - On the **Namespace Management** page, delete all existing managed namespaces - - Use "Managed Namespace" to add namespaces requiring Alauda AI integration - :::info - Original models won't migrate automatically - Continue using these models: - - Recreate and re-upload in new GitLab OR - - Manually transfer model files to new repository - ::: - ## FAQ ### 1. Configure the audit output directory for aml-skipper diff --git a/docs/en/installation/ai-generative.mdx b/docs/en/installation/ai-generative.mdx deleted file mode 100644 index 4cfbe45..0000000 --- a/docs/en/installation/ai-generative.mdx +++ /dev/null @@ -1,104 +0,0 @@ ---- -weight: 35 ---- - -# Install Alauda Build of KServe - -**Alauda Build of KServe** is a cloud-native component built on **KServe** for serving generative AI models. As an extension of the Alauda AI ecosystem, it specifically optimizes for **Large Language Models (LLMs)**, offering essential features such as inference orchestration, streaming responses, and resource-based auto-scaling for generative workloads. - - -## Prerequisites - -Before installing **Alauda Build of KServe**, you need to ensure the following dependencies are installed: - -### Required Dependencies - -| Dependency | Type | Description | -|------------|------|-------------| -| Alauda build of Envoy Gateway | Operator | Provides the underlying gateway functionality for AI services | -| Envoy AI Gateway | Cluster Plugin | Provides AI-specific gateway capabilities | -| [Alauda Build of LeaderWorkerSet](../../lws/install.mdx) | Cluster Plugin | Provides leader-worker set functionality for AI workloads | - -:::info -`Alauda build of Envoy Gateway` is natively integrated into ACP 4.2. For environments running earlier versions (including ACP 4.0 and 4.1), please contact Customer Support for compatibility and installation guidance. 
-::: - -### Optional Dependencies - -| Dependency | Type | Description | -|------------|------|-------------| -| GIE | Built-in | Integrated GIE (gateway-api-inference-extension) for enhanced AI capabilities. Can be enabled through the Alauda Build of KServe UI. | -| Alauda AI | Operator | Required only if you need to use KServe Predictive AI functionality. Can be disabled if you only need LLM Generative AI functionality. | - -### Installation Notes - -1. **Required Dependencies**: All three required dependencies must be installed before installing Alauda Build of KServe. -2. **GIE Integration**: If you want to use GIE, you can enable it during the installation process by selecting the "Integrated GIE" option in the Alauda Build of KServe UI. -3. **Alauda AI Integration**: If you don't need KServe Predictive AI functionality and only want to use LLM Generative AI, you can disable the "Integrated With Alauda AI" option during installation. - -## Downloading Cluster Plugin - -:::info - -`Alauda Build of KServe` cluster plugin can be retrieved from Customer Portal. - -Please contact Consumer Support for more information. - -::: - -## Uploading the Cluster Plugin - -For more information on uploading the cluster plugin, please refer to - -## Installing Alauda Build of KServe - -1. Go to the `Administrator` -> `Marketplace` -> `Cluster Plugin` page, switch to the target cluster, and then deploy the `Alauda Build of KServe` Cluster plugin. - -2. In the deployment form, configure the following parameters as needed: - -### Envoy Gateway Configuration - -| Parameter | Description | Default Value | -|-----------|-------------|---------------| -| **ServiceAccount Name** | The name of the service account used by Envoy Gateway. | envoy-gateway | -| **ServiceAccount Namespace** | The namespace where the service account is located. | envoy-gateway-system | -| **Create Instance** | Create an Envoy Gateway instance to manage inference traffic with bundled extensions. 
| Enabled | -| **Instance Name** | The name of the Envoy Gateway instance to be created. | aieg | - -### Envoy AI Gateway Configuration - -| Parameter | Description | Default Value | -|-----------|-------------|---------------| -| **Service Name** | The Kubernetes service name for Envoy AI Gateway. | ai-gateway-controller | -| **Port Number** | The port number used by Envoy AI Gateway. | 1063 | - -### KServe Gateway Configuration - -| Parameter | Description | Default Value | -|-----------|-------------|---------------| -| **Enabled** | Install a KServe Gateway Instance for inferenceservices functionality. | Enabled | -| **Gateway Name** | The name of the KServe Gateway. | kserve-ingress-gateway | -| **Gateway Namespace** | The namespace where the KServe Gateway is deployed. | kserve | -| **GatewayClass** | Optional. The custom name for the GatewayClass. If left empty, the system will automatically derive it following the "\{Namespace\}-\{Name\}" pattern. | (Empty) | -| **Port Number** | The port number used by KServe Gateway. | 80 | - -### GIE(gateway-api-inference-extension) Configuration - -| Parameter | Description | Default Value | -|-----------|-------------|---------------| -| **BuiltIn** | Install with the bundled gateway-api-inference-extension v0.5.1 dependencies for enhanced AI capabilities. | Enabled | - -### Alauda AI Integration - -| Parameter | Description | Default Value | -|-----------|-------------|---------------| -| **Integrated** | Enable integration with Alauda AI core plugin to reuse existing configurations. | Disabled | - -3. Click **Install** to begin the installation process. - -4. Verify result. You can see the status of "Installed" in the UI. - -## Upgrading Alauda Build of KServe - -1. Upload the new version for package of **Alauda Build of KServe** plugin to ACP. -2. 
Go to the `Administrator` -> `Clusters` -> `Target Cluster` -> `Functional Components` page, then click the `Upgrade` button, and you will see the `Alauda Build of KServe` can be upgraded. diff --git a/docs/en/installation/pre-configuration.mdx b/docs/en/installation/pre-configuration.mdx index 3be96ff..439704f 100644 --- a/docs/en/installation/pre-configuration.mdx +++ b/docs/en/installation/pre-configuration.mdx @@ -4,16 +4,6 @@ weight: 5 # Pre-installation Configuration -## **Deploy Service Mesh** - -Since Alauda AI leverages Service Mesh capabilities for model inference services, Service Mesh must be deployed in the cluster before deploying Alauda AI. For detailed deployment procedures, refer to . - -:::info - -After completing the **Prerequisites** on the **Create Service Mesh** page, proceed to the **Creating a Service Mesh** page and follow the on-screen instructions to finalize the deployment of the Service Mesh. - -::: - ## **Preparing the GitLab Service** In Alauda AI, GitLab is the core component for **Model Management**. Before deploying Alauda AI, you **must prepare** a GitLab service. 
diff --git a/docs/en/kserve/index.mdx b/docs/en/kserve/index.mdx new file mode 100644 index 0000000..7a59347 --- /dev/null +++ b/docs/en/kserve/index.mdx @@ -0,0 +1,7 @@ +--- +weight: 95 +--- + +# Alauda Build of KServe + + diff --git a/docs/en/kserve/install.mdx b/docs/en/kserve/install.mdx new file mode 100644 index 0000000..a070fe7 --- /dev/null +++ b/docs/en/kserve/install.mdx @@ -0,0 +1,182 @@ +--- +weight: 20 +--- + +# Install KServe + +## Prerequisites + +Before installing **Alauda Build of KServe**, you need to ensure the following dependencies are installed: + +### Required Dependencies + +| Dependency | Type | Description | +|------------|------|-------------| +| Alauda build of Envoy Gateway | Operator | Provides the underlying gateway functionality for AI services | +| [Alauda Build of Envoy AI Gateway](../../envoy_ai_gateway/install.mdx) | Cluster Plugin | Provides AI-specific gateway capabilities | +| [Alauda Build of LeaderWorkerSet](../../lws/install.mdx) | Cluster Plugin | Provides leader-worker set functionality for AI workloads | +| GIE (gateway-api-inference-extension) | Built-in | Bundled with Alauda Build of KServe by default. If GIE is already installed in the cluster, the built-in installation can be disabled via the `gie.builtIn` parameter during operator configuration. | + +:::info +`Alauda build of Envoy Gateway` is natively integrated into ACP 4.2. For environments running earlier versions (including ACP 4.0 and 4.1), please contact Customer Support for compatibility and installation guidance. +::: + +### Installation Notes + +1. **Required Dependencies**: All required dependencies must be installed before installing Alauda Build of KServe. +2. **GIE Integration**: GIE is bundled and enabled by default. If your environment already has GIE installed separately, set `gie.builtIn` to `false` in the operator configuration to disable the built-in installation. 
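Before installing, you can check whether GIE is already present so you know how to set `gie.builtIn`. This sketch assumes the upstream GIE CRD group `inference.networking.x-k8s.io`; verify the group name against the GIE version running in your cluster.

```shell
# If this prints an InferencePool CRD, GIE is already installed separately,
# and the built-in copy should be disabled (gie.builtIn: false)
kubectl get crd -o name | grep inference.networking
```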
+ +## Upload Operator + +Download the Alauda Build of KServe Operator installation file (e.g., `kserve-operator.ALL.xxxx.tgz`). + +Use the `violet` command to publish it to the platform repository: + +```bash +violet push --platform-address=<platform-address> --platform-username=<platform-username> --platform-password=<platform-password> kserve-operator.ALL.xxxx.tgz +``` + +## Install Operator + +In **Administrator** view: + +1. Click **Marketplace / OperatorHub**. +2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where you want to install the KServe Operator. +3. Search for and select **Alauda Build of KServe**, then click **Install**. + + The **Install Alauda Build of KServe** window will pop up. + +4. Leave **Channel** unchanged. +5. Check whether the **Version** matches the **Alauda Build of KServe** version you want to install. +6. Leave **Installation Location** unchanged; it should be `kserve-operator` by default. +7. Select **Manual** for **Upgrade Strategy**. +8. Click **Install**. + +### Verification + +Confirm that the **Alauda Build of KServe** tile shows one of the following states: + +- `Installing`: installation is in progress; wait for this to change to `Installed`. +- `Installed`: installation is complete. + +## Create KServe Instance + +After the operator is installed, create a `KServe` custom resource to deploy the KServe instance.
+ +Switch to **YAML view** and apply the following configuration, then adjust the callout fields for your environment: + +```yaml +apiVersion: components.aml.dev/v1alpha1 +kind: KServe +metadata: + name: default-kserve +spec: + namespace: kserve # [!code callout] + values: + global: + clusterName: # [!code callout] + deployFlavor: single-node # [!code callout] + platformAddress: # [!code callout] + preset: + GIE: # [!code callout] + enabled: true + envoy_ai_gateway: # [!code callout] + port: 1063 + service: ai-gateway-controller + envoy_gateway: # [!code callout] + create_instance: true + deploy_type: ControllerNamespace + instance_name: aieg + sa_namespace: envoy-gateway-system + service_account: envoy-gateway + kserve_gateway: # [!code callout] + enabled: true + gateway_class: "" + name: kserve-ingress-gateway + namespace: kserve + port: 80 + registry: + address: # [!code callout] + kserve: + controller: + deploymentMode: Knative # [!code callout] + gateway: + domain: # [!code callout] + storage: + caBundleConfigMapName: aml-global-ca-bundle # [!code callout] +``` + + + +1. `spec.namespace` — Kubernetes namespace where KServe components are deployed. Default: `kserve`. +2. `global.clusterName` — Cluster name as registered in the platform. Example: `business-1`. +3. `global.deployFlavor` — `single-node` for non-HA, `ha-cluster` for production HA. +4. `global.platformAddress` — Alauda Container Platform management endpoint address. Example: `https://192.168.131.112`. +5. `preset.GIE` — Built-in Gateway API Inference Extension for enhanced AI capabilities. See [GIE Configuration](#gie-gateway-api-inference-extension-configuration). +6. `preset.envoy_ai_gateway` — AI-specific gateway for intelligent routing and policy enforcement. See [Envoy AI Gateway Configuration](#envoy-ai-gateway-configuration). +7. `preset.envoy_gateway` — Underlying Envoy-based gateway infrastructure. See [Envoy Gateway Configuration](#envoy-gateway-configuration). +8. 
`preset.kserve_gateway` — Ingress gateway for KServe inference services. See [KServe Gateway Configuration](#kserve-gateway-configuration). +9. `global.registry.address` — The container registry endpoint used by the target cluster (`global.clusterName`) to pull KServe infrastructure and runtime images. +Example: `registry.alauda.cn:60070`. +10. `kserve.controller.deploymentMode` — Set to `Knative` for serverless features like scale-to-zero, or `Standard` for native Kubernetes deployments. +11. `kserve.controller.gateway.domain` — Domain for the ingress gateway to expose inference service endpoints. Use a wildcard domain, e.g., `*.example.com`. +12. `kserve.storage.caBundleConfigMapName` — ConfigMap name containing the CA bundle for storage connections. + + + + +### Verification + +Check the status of the `KServe` resource: + +```bash +kubectl get kserve default-kserve -n kserve-operator +``` + +The instance is ready when the status shows `DEPLOYED: True`. + +### Envoy Gateway Configuration + +| Field | Description | Default | +|-------|-------------|---------| +| `preset.envoy_gateway.service_account` | Service account name used by Envoy Gateway. | `envoy-gateway` | +| `preset.envoy_gateway.sa_namespace` | Namespace where the Envoy Gateway service account is located. | `envoy-gateway-system` | +| `preset.envoy_gateway.create_instance` | Create an Envoy Gateway instance to manage inference traffic with bundled extensions. | `true` | +| `preset.envoy_gateway.instance_name` | Name of the Envoy Gateway instance to create. | `aieg` | + +### Envoy AI Gateway Configuration + +| Field | Description | Default | +|-------|-------------|---------| +| `preset.envoy_ai_gateway.service` | Kubernetes service name for Envoy AI Gateway. | `ai-gateway-controller` | +| `preset.envoy_ai_gateway.port` | Port number used by Envoy AI Gateway.
| `1063` | + +### KServe Gateway Configuration + +| Field | Description | Default | +|-------|-------------|---------| +| `preset.kserve_gateway.enabled` | Deploy a KServe Gateway instance for InferenceService traffic. | `true` | +| `preset.kserve_gateway.name` | Name of the KServe Gateway. | `kserve-ingress-gateway` | +| `preset.kserve_gateway.namespace` | Namespace where the KServe Gateway is deployed. | `kserve` | +| `preset.kserve_gateway.gateway_class` | Optional custom GatewayClass name. If empty, derived as `{namespace}-{name}`. | `""` | +| `preset.kserve_gateway.port` | Port number used by the KServe Gateway. | `80` | + +### GIE (gateway-api-inference-extension) Configuration + +| Field | Description | Default | +|-------|-------------|---------| +| `preset.GIE.enabled` | Enable the bundled Gateway API Inference Extension. Set to `false` if GIE is already installed separately in the cluster. | `true` | + + +## Upgrading Alauda Build of KServe + +1. Upload the new version of the **Alauda Build of KServe** operator package using the `violet` tool. +2. Go to the `Administrator` -> `Marketplace` -> `OperatorHub` page, find **Alauda Build of KServe**, and click **Confirm** to apply the new version. + +### Verification + +After upgrading, confirm that the **Alauda Build of KServe** tile shows `Installed` and verify the KServe instance status: + +```bash +kubectl get kserve default-kserve -n kserve-operator +``` \ No newline at end of file diff --git a/docs/en/kserve/intro.mdx b/docs/en/kserve/intro.mdx new file mode 100644 index 0000000..6f277f3 --- /dev/null +++ b/docs/en/kserve/intro.mdx @@ -0,0 +1,44 @@ +--- +weight: 10 +--- + +# Introduction + +## KServe + +**Alauda Build of KServe** is based on the [KServe](https://kserve.github.io/website/) project. +KServe provides a standardized, cloud-native interface for serving machine learning models at scale on Kubernetes.
+It has evolved around two primary scenarios: **Predictive AI** for traditional ML inference, and **Generative AI** for LLM-based workloads. + +### Generative AI + +Generative AI support is optimized for Large Language Model (LLM) serving with OpenAI-compatible APIs. + +- **llm-d (Distributed LLM Inference)**: A Kubernetes-native distributed inference framework that runs under the KServe control plane. llm-d orchestrates multi-node LLM inference using a Leader/Worker pattern and makes real-time routing decisions based on KV cache state and GPU load — enabling KV-cache-aware request scheduling, elastic tensor/pipeline parallelism, and cluster-wide inference that behaves like a single machine. This lowers cost per token and maximizes GPU utilization for large models (e.g., Llama 3.1 405B) that exceed single-node memory. +- **LLM Inference & Streaming**: Native support for streaming responses (SSE / chunked transfer), enabling real-time token delivery for chat and completion workloads, with OpenAI-compatible `/chat/completions` and `/completions` APIs. +- **vLLM Runtime**: First-class integration with vLLM as the high-performance LLM serving backend, with support for continuous batching and PagedAttention. +- **Gateway Integration**: Native integration with Envoy Gateway and the Gateway API Inference Extension (GIE) for AI-aware traffic routing, load balancing, and per-model rate limiting across inference services. +- **Autoscaling for LLMs**: Metrics-driven autoscaling policies tailored to LLM throughput characteristics, including scale-to-zero for cost efficiency. + +### Predictive AI + +Predictive AI covers traditional machine learning model serving with high throughput and low latency requirements. + +- **InferenceService**: The core CRD for deploying and managing model serving endpoints. Supports canary rollouts, traffic splitting across model versions, and A/B testing workflows. 
+- **Model Serving Runtimes**: Pre-integrated runtimes for popular ML frameworks — TensorFlow Serving, TorchServe, Triton Inference Server, SKLearn, XGBoost, and more. Custom runtimes are supported via the **ClusterServingRuntime** and **ServingRuntime** CRDs. +- **Inference Graph**: The **InferenceGraph** CRD enables composing multiple models into a pipeline, including pre/post-processing nodes, routing logic, and ensemble patterns. +- **Autoscaling**: Scale-to-zero and scale-from-zero support via KEDA or Kubernetes HPA, with policies based on request rate, queue depth, or custom metrics. + +For installation on the platform, see [Install KServe](./install). + +## Documentation + +KServe upstream documentation and key dependencies: + +- **KServe Documentation**: [https://kserve.github.io/website/](https://kserve.github.io/website/) — Official documentation covering concepts, model serving runtimes, and API references. +- **KServe GitHub**: [https://github.com/kserve/kserve](https://github.com/kserve/kserve) — Source code, release notes, and issues. +- **llm-d**: [https://github.com/llm-d/llm-d](https://github.com/llm-d/llm-d) — Kubernetes-native distributed LLM inference framework with KV-cache-aware scheduling and elastic parallelism. +- **LeaderWorkerSet (LWS)**: [https://github.com/kubernetes-sigs/lws](https://github.com/kubernetes-sigs/lws) — Kubernetes SIG workload controller for multi-node Leader/Worker patterns, required for multi-node LLM inference. +- **Envoy Gateway**: [https://gateway.envoyproxy.io/](https://gateway.envoyproxy.io/) — Kubernetes-native gateway built on Envoy Proxy, providing the underlying traffic management for KServe inference services. +- **Envoy AI Gateway**: [https://aigateway.envoyproxy.io/](https://aigateway.envoyproxy.io/) — AI-specific gateway capabilities layered on top of Envoy Gateway, including AI-aware routing and per-model policies. 
+- **Gateway API Inference Extension (GIE)**: [https://gateway-api-inference-extension.sigs.k8s.io/](https://gateway-api-inference-extension.sigs.k8s.io/) — Kubernetes SIG project providing AI-aware routing and load balancing for inference services. diff --git a/docs/en/kubeflow/index.mdx b/docs/en/kubeflow/index.mdx index dae986a..fd4b38d 100644 --- a/docs/en/kubeflow/index.mdx +++ b/docs/en/kubeflow/index.mdx @@ -1,5 +1,5 @@ --- -weight: 61 +weight: 120 --- # Alauda support for Kubeflow diff --git a/docs/en/kueue/index.mdx b/docs/en/kueue/index.mdx index fbd9406..37983a9 100644 --- a/docs/en/kueue/index.mdx +++ b/docs/en/kueue/index.mdx @@ -1,5 +1,5 @@ --- -weight: 82 +weight: 92 --- # Alauda Build of Kueue diff --git a/docs/en/kueue/install.mdx b/docs/en/kueue/install.mdx index f1a1a5c..aff3e3c 100644 --- a/docs/en/kueue/install.mdx +++ b/docs/en/kueue/install.mdx @@ -2,7 +2,7 @@ weight: 20 --- -# Install +# Install Kueue ## Downloading Cluster plugin diff --git a/docs/en/llama_stack/index.mdx b/docs/en/llama_stack/index.mdx index 9e4afeb..483f684 100644 --- a/docs/en/llama_stack/index.mdx +++ b/docs/en/llama_stack/index.mdx @@ -1,5 +1,5 @@ --- -weight: 83 +weight: 98 --- # Alauda Build of Llama Stack diff --git a/docs/en/lws/index.mdx b/docs/en/lws/index.mdx index 873f482..fc78b27 100644 --- a/docs/en/lws/index.mdx +++ b/docs/en/lws/index.mdx @@ -1,5 +1,5 @@ --- -weight: 90 +weight: 100 --- # Alauda Build of LeaderWorkerSet diff --git a/docs/en/lws/install.mdx b/docs/en/lws/install.mdx index c4dd5a2..2e7cb61 100644 --- a/docs/en/lws/install.mdx +++ b/docs/en/lws/install.mdx @@ -2,7 +2,7 @@ weight: 20 --- -# Install +# Install LeaderWorkerSet ## Downloading Cluster plugin diff --git a/docs/en/lws/intro.mdx b/docs/en/lws/intro.mdx new file mode 100644 index 0000000..4813776 --- /dev/null +++ b/docs/en/lws/intro.mdx @@ -0,0 +1,29 @@ +--- +weight: 10 +--- + +# Introduction + +## LeaderWorkerSet + +**Alauda Build of LeaderWorkerSet** is based on the [LeaderWorkerSet 
(LWS)](https://github.com/kubernetes-sigs/lws) Kubernetes SIG project. +LeaderWorkerSet provides a Kubernetes-native workload API for deploying groups of pods in a **Leader/Worker** pattern, enabling multi-node distributed workloads — particularly large AI model training and inference — to run as first-class citizens on Kubernetes. + +Main components and capabilities include: + +- **LeaderWorkerSet CRD**: The core API resource that defines a group of replicated Leader/Worker pod sets. Each replica consists of one leader pod and a configurable number of worker pods, co-scheduled and managed as a unit. +- **Co-scheduling & Topology Awareness**: Leader and worker pods within a group are scheduled together, with support for topology spread constraints to co-locate pods on the same node, rack, or availability zone for low-latency inter-node communication (e.g., NVLink, InfiniBand). +- **Multi-node LLM Inference**: Enables large language models that exceed single-node GPU memory (e.g., Llama 3.1 405B) to be served across multiple nodes using tensor parallelism or pipeline parallelism. LWS is a required dependency of **Alauda Build of KServe** for this use case. +- **Multi-node Training**: Supports distributed training frameworks (PyTorch DDP, DeepSpeed, Megatron-LM) by providing stable, co-located leader/worker pod groups with predictable hostnames and network identities. +- **Rolling Updates & Failure Recovery**: Supports rolling restarts and automatic pod replacement at the group level, ensuring the entire Leader/Worker group is recycled consistently when a failure or update occurs. +- **Startup Sequencing**: The leader pod can act as the entry point and coordinator, with worker pods starting after the leader is ready — enabling frameworks that require a master process to be initialized before workers connect. + +For installation on the platform, see [Install LeaderWorkerSet](./install). 
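The Leader/Worker grouping described above can be sketched as a minimal `LeaderWorkerSet` manifest. The image and sizes below are illustrative assumptions, not values shipped with the platform:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: example-lws
spec:
  replicas: 2                 # two independent leader/worker groups
  leaderWorkerTemplate:
    size: 3                   # 1 leader + 2 workers per group
    leaderTemplate:           # pod template for the leader (entry point / coordinator)
      spec:
        containers:
        - name: leader
          image: registry.example.com/llm-server:latest  # placeholder image
    workerTemplate:           # pod template for each worker
      spec:
        containers:
        - name: worker
          image: registry.example.com/llm-server:latest  # placeholder image
```

Each replica group is scheduled, updated, and recycled as a unit; workers reach their leader through a stable hostname, which is what distributed runtimes use to form the process group.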
+ +## Documentation + +LeaderWorkerSet upstream documentation and related resources: + +- **LeaderWorkerSet Documentation**: [https://lws.sigs.k8s.io/](https://lws.sigs.k8s.io/) — Official documentation covering concepts, API reference, and usage guides. +- **LeaderWorkerSet GitHub**: [https://github.com/kubernetes-sigs/lws](https://github.com/kubernetes-sigs/lws) — Source code, API reference, and examples for the LeaderWorkerSet Kubernetes SIG project. +- **KServe (Alauda Build)**: [../kserve/intro](../kserve/intro) — KServe uses LeaderWorkerSet as a required dependency for multi-node LLM inference workloads. diff --git a/docs/en/trustyai/index.mdx b/docs/en/trustyai/index.mdx index f9fdd99..359407d 100644 --- a/docs/en/trustyai/index.mdx +++ b/docs/en/trustyai/index.mdx @@ -1,5 +1,5 @@ --- -weight: 95 +weight: 110 --- # Alauda Build of TrustyAI diff --git a/docs/en/upgrade/migrating-to-knative-operator.mdx b/docs/en/upgrade/migrating-to-knative-operator.mdx new file mode 100644 index 0000000..77332f1 --- /dev/null +++ b/docs/en/upgrade/migrating-to-knative-operator.mdx @@ -0,0 +1,43 @@ +--- +weight: 20 +--- + +# Migrating to Knative Operator + +In the 1.x series of products, the serverless capability for inference services was provided by the `Alauda AI Model Serving` operator. In the 2.x series, this capability is provided by the `Knative Operator`. This section guides you through migrating your serverless capability from the legacy operator to the new one. + +## 1. Remove Legacy Serving Instance + + + +### Procedure + +In **Administrator** view: + +1. Click **Marketplace / OperatorHub**. +2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where **Alauda AI** is installed. +3. Select **Alauda AI**, then click the **All Instances** tab. +4. Locate the `default` instance and click **Update**. +5. On the update page, switch to the **YAML** view. +6. 
Set `spec.knativeServing.managementState` to `Removed`, for example: + + ```yaml + spec: + knativeServing: + managementState: Removed + ``` + +7. Click **Update** to apply the changes. + + + +## 2. Install Knative Operator and Create Serving Instance + +Install the **Knative Operator** from the Marketplace and create the `KnativeServing` instance. For detailed instructions, refer to the [Enabling Knative Functionality](../installation/ai-cluster.mdx#enabling-knative-functionality) section. + +:::info +Once the above steps are completed, the migration of the Knative serving control plane is complete. + +- If you are migrating from the **Alauda AI 2.0** + **Alauda AI Model Serving** combination, the migration is fully complete here. Existing services will automatically switch to the new configuration shortly. +- If you are migrating from the **Alauda AI 1.x** + **Alauda AI Model Serving** combination, please ensure that **Alauda AI** is simultaneously upgraded to version **2.x**. +::: diff --git a/docs/en/upgrade/upgrade-from-previous-version.mdx b/docs/en/upgrade/upgrade-from-previous-version.mdx index 0fdd35b..aa9b8de 100644 --- a/docs/en/upgrade/upgrade-from-previous-version.mdx +++ b/docs/en/upgrade/upgrade-from-previous-version.mdx @@ -3,7 +3,7 @@ weight: 10 --- export const prevVersion = '1.5' -export const curVer = '2.0' +export const curVer = '2.2' # Upgrade Alauda AI @@ -16,12 +16,13 @@ Upgrade from {prevVersion} to {curVer} Please visit [Alauda AI Cluster](../installation/ai-cluster.mdx) for: :::warning -Please ignore `Creating Alauda AI Cluster Instance` since we are upgrading **Alauda AI** from a previously managed version. +Please ignore `Creating Alauda AI Instance` since we are upgrading **Alauda AI** from a previously managed version. ::: -1. [Downloading](../installation/ai-cluster.mdx#downloading) operator bundle packages for `Alauda AI Cluster` and `KServeless`. -2.
[Uploading](../installation/ai-cluster.mdx#uploading) operator bundle packages to the destination cluster. -3. To upgrade, follow the process described below. +1. [Downloading](../installation/ai-cluster.mdx#downloading) operator bundle packages for `Alauda AI` and `Knative Operator` (Optional). +2. [Downloading](../kserve/install.mdx#upload-operator) operator bundle packages for `Alauda Build of KServe`. +3. [Uploading](../installation/ai-cluster.mdx#uploading) operator bundle packages to the destination cluster. +4. To upgrade, follow the process described below. ## Pre-Upgrade Operations @@ -76,19 +77,25 @@ After the upgrade is complete, please confirm that the status of **Alauda AI Ess ### Upgrading Alauda AI Operators -The procedure for upgrading both operators is nearly identical, with only the target component being different. +Use the following steps to upgrade the **Alauda AI** operator. -| Step | Alauda AI Operator | Alauda AI Model Serving Operator | -|:----------------|:--------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------| -| **1. Navigate** | Log into the Web Console, then go to **Marketplace > OperatorHub** in the **Administrator** view. | Log into the Web Console, then go to **Marketplace > OperatorHub** in the **Administrator** view. | -| **2. Select** | Select your target **cluster**. | Select your target **cluster**. | -| **3. Click** | Click the **Alauda AI** card. | Click the **Alauda AI Model Serving** card. | -| **4. Confirm** | Click **Confirm** on the upgrade prompt. | Click **Confirm** on the upgrade prompt. | +| Step | Alauda AI Operator | +|:----------------|:--------------------------------------------------------------------------------------------------| +| **1.
Navigate** | Log into the Web Console, then go to **Marketplace > OperatorHub** in the **Administrator** view. | +| **2. Select** | Select your target **cluster**. | +| **3. Click** | Click the **Alauda AI** card. | +| **4. Confirm** | Click **Confirm** on the upgrade prompt. | :::info Once the new version is uploaded and recognized by the platform, an upgrade prompt will appear at the top of the operator's page. ::: +### Installing Alauda Build of KServe Operator + +Starting from version {curVer}, **Alauda Build of KServe** is provided as a separate operator plugin to offer more specialized and flexible model serving capabilities. After completing the core AI operator upgrades, you must install the KServe operator to enable model serving functionality. + +For detailed installation and configuration steps, please refer to the [Alauda Build of KServe Installation Guide](../kserve/install.mdx). + ### Upgrading Cluster Plugins :::info @@ -178,7 +185,7 @@ For each existing inference service, perform the following steps: ### Alauda AI -Check the status field from the `AmlCluster` resource which named `default`: +Check the status of the `AmlCluster` resource named `default`: ```bash kubectl get amlcluster default @@ -191,22 +198,22 @@ NAME READY REASON default True Succeeded ``` -### Alauda AI Model Serving +### Alauda Build of KServe -Check the status field from the `KnativeServing` resource which named `default-knative-serving`: +Check the status of the `KServe` resource named `default-kserve`: ```bash -kubectl get KnativeServing.components.aml.dev default-knative-serving +kubectl get kserve default-kserve -n kserve-operator ``` -Should returns `InstallSuccessful`: +The `DEPLOYED` column should show `True`: ``` -NAME DEPLOYED REASON -default-knative-serving True UpgradeSuccessful +NAME DEPLOYED REASON +default-kserve True UpgradeSuccessful ``` -### Alauda AI Cluster Plugins +### Other Cluster Plugins In the **Administrator** view, navigate to **Marketplace >
Cluster Plugins** and confirm that the following cluster plugins show `Installed` status with the new version: @@ -215,3 +222,9 @@ In the **Administrator** view, navigate to **Marketplace > Cluster Plugins** and - Alauda AI Volcano (if deployed) + +## Deprecating Alauda AI Model Serving + +Starting from the **Alauda AI 2.x** series, the legacy **Alauda AI Model Serving** operator is deprecated. We strongly recommend that users requiring serverless inference capabilities switch to the **Knative Operator** as soon as possible to ensure long-term support and access to the latest features. + +For guidance on how to move your serverless workloads to the new operator, please see the [Migrating to Knative Operator](./migrating-to-knative-operator.mdx) guide.