diff --git a/docs/en/infrastructure_management/hardware_profile/functions/hardware_profile.mdx b/docs/en/infrastructure_management/hardware_profile/functions/hardware_profile.mdx
new file mode 100644
index 0000000..ef7507c
--- /dev/null
+++ b/docs/en/infrastructure_management/hardware_profile/functions/hardware_profile.mdx
@@ -0,0 +1,128 @@
+---
+weight: 10
+---
+
+# Hardware Profile Management
+
+To define the hardware configurations and constraints that your data scientists and engineers use when deploying model inference services on the platform, you create and manage the associated hardware profiles. A hardware profile encapsulates node affinities, tolerations, and resource constraints into a single, reusable entity.
+
+## Create a hardware profile
+
+**Prerequisites**
+
+* You have logged in to the platform as a user with administrator privileges.
+* You have verified the computing resources available in the underlying Kubernetes cluster, including CPU, memory, and any specialized accelerators (e.g., GPU models).
+* You are familiar with Kubernetes scheduling concepts such as node selectors, taints, and tolerations.
+
+**Procedure**
+
+### Step 1: Navigate to Hardware Profile
+From the main navigation menu, go to **Hardware Profile**. The Hardware Profiles page opens, displaying the existing hardware profiles in the system.
+
+### Step 2: Initiate hardware profile creation
+Click **Create hardware profile** in the top right corner. The Create hardware profile configuration page opens.
+
+### Step 3: Configure basic details
+In the Basic Details section, provide identifying information for the profile:
+* **Name**: Enter a unique and descriptive name for the hardware profile (e.g., `gpu-high-performance-profile`).
+* **Description**: (Optional) Enter a clear description of the hardware profile to help other users understand its intended use case.
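+
+For reference, these basic details map onto the metadata of the underlying `HardwareProfile` resource (the full schema appears in the CLI guide). A minimal sketch, assuming the `infrastructure.opendatahub.io/v1alpha1` API used elsewhere in this documentation; the exact field that stores the description may vary by platform version:
+
+```yaml
+apiVersion: infrastructure.opendatahub.io/v1alpha1
+kind: HardwareProfile
+metadata:
+  # The unique name entered in the Name field
+  name: gpu-high-performance-profile
+  namespace: kube-public
+```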
+
+### Step 4: Configure resource identifiers (requests and limits)
+You can define constraints for compute resources, such as CPU, memory, or specific accelerators (e.g., `nvidia.com/gpu`). Click **Add Identifier** or modify the pre-existing resource fields. You can add two types of identifiers:
+
+- **Built-in Identifiers**: Select from a dropdown list of standard resource types configured by the platform (e.g., `cpu`, `memory`, `nvidia.com/gpu`). For these built-in types, the **Identifier**, **Display Name**, and **Resource Type** are predefined by the platform and cannot be changed.
+- **Custom Identifiers**: Enter your own resource parameters. You must manually define:
+  * **Identifier**: The exact Kubernetes resource key (e.g., `nvidia.com/a100` or a custom vendor ASIC).
+  * **Display Name**: A human-readable name for the resource that appears in the UI (e.g., `NVIDIA A100 GPU`).
+  * **Resource Type**: Categorize the resource:
+    * **`CPU` / `Memory`**: Select to define standard compute boundaries.
+    * **`Accelerator`**: Select for specialized AI chips (such as NVIDIA GPUs, AMD GPUs, or Intel Gaudi accelerators) used for model training or heavy inference tasks. Setting the type to Accelerator tells the platform to treat the device as a core AI computing engine.
+    * **`Other`**: Select for non-AI auxiliary devices attached to nodes (such as high-speed network interfaces for RDMA or InfiniBand, or unique storage parameters).
+
+For both built-in and custom identifiers, you must configure the allocation boundaries:
+* **Default**: Set the baseline amount of this resource to allocate. This value is injected into the user's workload when they select the profile.
+* **Minimum allowed**: Define the minimum acceptable request amount. This acts as a hard lower bound that prevents users from requesting insufficient resources for critical models.
+* **Maximum allowed**: (Optional) Specify an absolute maximum limit. This prevents users from reserving cluster resources beyond the defined capacity threshold.
+
+### Step 5: Configure node scheduling rules
+To control which nodes inference workloads are scheduled onto, set node selectors and tolerations. This ensures high-performance workloads land on the correct node pools.
+* **Node Selectors**: Under the Node Selectors section, click **Add Node Selector**. Enter the **Key** and **Value** constraints. The platform automatically injects these key-value pairs to restrict workloads to nodes with matching labels.
+* **Tolerations**: Under the Tolerations section, click **Add Toleration** to allow workloads to be scheduled onto nodes with matching taints. Define the **Key**, **Operator** (e.g., `Equal`, `Exists`), **Value**, **Effect** (e.g., `NoSchedule`, `NoExecute`), and optional **Toleration Seconds**. As with native Kubernetes tolerations, you can add multiple tolerations to a single hardware profile.
+
+### Step 6: Finalize creation
+Review the configurations you have entered to ensure accuracy. Click **Create** to finalize the hardware profile.
+
+## Updating a hardware profile
+
+You can update existing hardware profiles to adapt to infrastructure changes, hardware upgrades, or revised resource policies. You can change identifying information, minimum and maximum resource constraints, and node placement rules (node selectors and tolerations).
+
+### Step 1: Locate the hardware profile
+From the navigation menu, click **Hardware Profile**. Locate the hardware profile you want to update in the list.
+
+### Step 2: Edit the hardware profile
+On the right side of the row containing the relevant hardware profile, click the Action menu (⋮) and select **Update**.
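+
+*Tip*: If you manage profiles declaratively rather than through the UI, the same update can be made by editing the saved `HardwareProfile` YAML and re-applying it with `kubectl apply -f`. A sketch, assuming the CRD layout shown in the CLI guide (field names are those used in that example; the values here are illustrative):
+
+```yaml
+# Raise the GPU ceiling for an existing profile by editing
+# the identifiers entry in the saved YAML, then re-applying it
+identifiers:
+  - identifier: "nvidia.com/gpu"
+    displayName: "GPU"
+    minCount: "1"
+    maxCount: "8"      # previously "4"
+    defaultCount: "1"
+    resourceType: Accelerator
+```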
+
+### Step 3: Modify the configurations
+Make the necessary modifications to your hardware profile configuration:
+* Adjust the **Description**.
+* Update the **Default**, **Minimum allowed**, or **Maximum allowed** thresholds for specific resource identifiers to match your current cluster capacity.
+* Modify the **Node Selectors** to target different node labels, or update **Tolerations** to align with newly tainted worker nodes.
+
+### Step 4: Apply changes
+Click **Update** to apply your changes.
+
+*Note: Updating a hardware profile affects only workloads configured after the update. Active deployments previously created with this hardware profile keep their originally injected constraints. To apply the new hardware profile settings to an already-running workload, you must explicitly edit or redeploy the corresponding inference service.*
+
+## Deleting a hardware profile
+
+When a specific hardware configuration becomes outdated or targets decommissioned Kubernetes nodes, you can delete its hardware profile. This prevents users from selecting obsolete node configurations or unmanageable limits in future deployments.
+
+### Step 1: Locate the hardware profile
+From the main navigation menu, click **Hardware Profile**. Locate the hardware profile you want to delete.
+
+### Step 2: Delete
+Click the Action menu (⋮) on the far right side of the relevant hardware profile row, and select **Delete**.
+
+### Step 3: Confirm deletion
+A warning dialog appears asking you to confirm the deletion. Click **Delete**.
+
+*Note: Deleting a hardware profile does not delete or disrupt running inference services that were deployed with it. They continue to operate with the resource limitations and topology constraints originally injected by the platform's webhook.
However, the deleted hardware profile immediately disappears from the profile selection dropdown for all newly created deployments.*
+
+## Using a hardware profile for inference services
+
+When users (such as data scientists, AI engineers, and developers) create or configure model inference services (both `InferenceService` and `LLMInferenceService`), they can use the predefined hardware profiles.
+
+A hardware profile removes the need to manually configure intricate node scheduling rules and explicit resource limits. Depending on your workload, you can accept the default configuration or customize your limits within the boundaries authorized by the selected profile.
+
+### Step 1: Launch deployment form
+From the navigation menu, go to **Service Manage**. Click **Create** to launch the form for deploying a new model inference service.
+
+### Step 2: Select a Hardware Profile
+In the deployment form, scroll down to the **Deployment Resources** section. Here, you define your resource limits by first choosing a **Config Type**:
+* By default, it is set to **Hardware Profile**. You can then click the **Profile** drop-down menu to select a hardware profile that the platform administrator has enabled for your desired compute environment.
+* Alternatively, choose **Custom** if you prefer to bypass predefined profiles and manually supply raw Kubernetes resource limits.
+
+### Step 3: Review and customize resource allocations
+Once you have selected a hardware profile, the form locks in the baseline configuration curated by the administrator. You can still refine your exact resource limits:
+* To view the administrator's designated boundaries, click the **View Detail** button adjacent to the profile dropdown.
This opens a drawer or modal highlighting the hardware profile specifics, including the configured node rules and the limits for CPU, memory, and GPUs.
+* Depending on your workload needs, click the **Custom Configuration** button displayed below the hardware profile section. Custom requests and limits must remain *within the range* defined by the hardware profile's minimum and maximum constraints.
+* Triggering this customization lets you directly modify the final **Requests** and **Limits** configuration for the inference service. If you submit an invalid request parameter, the validation engine catches it and presents a validation error.
+
+### Step 4: Deploy
+Populate the remaining parameters for your service and click **Deploy**.
+
+
\ No newline at end of file
diff --git a/docs/en/infrastructure_management/hardware_profile/functions/index.mdx b/docs/en/infrastructure_management/hardware_profile/functions/index.mdx
new file mode 100644
index 0000000..7176c7c
--- /dev/null
+++ b/docs/en/infrastructure_management/hardware_profile/functions/index.mdx
@@ -0,0 +1,10 @@
+---
+weight: 50
+i18n:
+  title:
+    en: Guides
+---
+
+# Guides
+
+
diff --git a/docs/en/infrastructure_management/hardware_profile/how_to/cpu_and_gpu_profiles.mdx b/docs/en/infrastructure_management/hardware_profile/how_to/cpu_and_gpu_profiles.mdx
new file mode 100644
index 0000000..0e46ebc
--- /dev/null
+++ b/docs/en/infrastructure_management/hardware_profile/how_to/cpu_and_gpu_profiles.mdx
@@ -0,0 +1,104 @@
+---
+weight: 30
+i18n:
+  title:
+    en: Creating CPU-Only and GPU-Accelerated Profiles
+    zh: 创建纯 CPU 与 GPU 加速的 Hardware Profile
+---
+
+# Creating CPU-Only and GPU-Accelerated Profiles
+
+In a production AI platform, you often need to serve different types of machine learning workloads.
For example, traditional machine learning models (such as scikit-learn or XGBoost models) or simple data processing tasks require only CPU resources, while Large Language Models (LLMs) or complex deep learning models require GPU acceleration.
+
+By creating distinct Hardware Profiles for CPU-only and GPU-accelerated workloads, you can effectively isolate these two types of services and prevent lightweight CPU models from unintentionally consuming expensive GPU resources.
+
+## Example 1: CPU-Only Hardware Profile
+
+A CPU-only profile omits any accelerator identifiers (such as `nvidia.com/gpu`) and relies solely on the `cpu` and `memory` identifiers.
+
+When creating a CPU-only profile, ensure that:
+1. The **Accelerator** resource type is entirely excluded.
+2. The Node Selector does not target any GPU-specific nodes.
+3. The name and description clearly indicate that this profile is meant for standard ML inference or lightweight models.
+
+Here is an example of a CPU-only hardware profile:
+
+```yaml
+apiVersion: infrastructure.opendatahub.io/v1alpha1
+kind: HardwareProfile
+metadata:
+  name: standard-cpu-profile
+  namespace: kube-public
+spec:
+  # Do not include nvidia.com/gpu
+  identifiers:
+    - identifier: "cpu"
+      displayName: "CPU"
+      minCount: "1"
+      maxCount: "8"
+      defaultCount: "2"
+      resourceType: CPU
+    - identifier: "memory"
+      displayName: "Memory"
+      minCount: "2Gi"
+      maxCount: "16Gi"
+      defaultCount: "4Gi"
+      resourceType: Memory
+  # Standard CPU nodes
+  scheduling:
+    type: Node
+    node:
+      nodeSelector:
+        node-role.kubernetes.io/worker: "true"
+```
+
+## Example 2: GPU-Accelerated Hardware Profile
+
+A GPU-accelerated profile explicitly requires the `nvidia.com/gpu` identifier, ensuring that any workload selecting this profile is allocated physical GPU resources.
+
+When creating a GPU-accelerated profile:
+1. Include an identifier for the specific accelerator (e.g., `nvidia.com/gpu`).
+2.
Add the corresponding Tolerations if your GPU nodes are tainted (e.g., `nvidia.com/gpu:NoSchedule`).
+3. Optionally add a Node Selector to target specific GPU architectures (e.g., `accelerator: nvidia-t4`).
+
+Here is an example of a GPU-accelerated hardware profile:
+
+```yaml
+apiVersion: infrastructure.opendatahub.io/v1alpha1
+kind: HardwareProfile
+metadata:
+  name: gpu-t4-profile
+  namespace: kube-public
+spec:
+  identifiers:
+    # Crucially, include the GPU resource
+    - identifier: "nvidia.com/gpu"
+      displayName: "GPU"
+      minCount: "1"
+      maxCount: "4"
+      defaultCount: "1"
+      resourceType: Accelerator
+    - identifier: "cpu"
+      displayName: "CPU"
+      minCount: "4"
+      maxCount: "16"
+      defaultCount: "8"
+      resourceType: CPU
+    - identifier: "memory"
+      displayName: "Memory"
+      minCount: "16Gi"
+      maxCount: "64Gi"
+      defaultCount: "32Gi"
+      resourceType: Memory
+  scheduling:
+    type: Node
+    node:
+      nodeSelector:
+        accelerator: nvidia-t4
+      tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+```
+
+By providing these two distinct profiles, platform administrators can ensure data scientists have the exact environment they need, without wasting high-value compute resources on simple tasks.
diff --git a/docs/en/infrastructure_management/hardware_profile/how_to/create_hardware_profile_cli.mdx b/docs/en/infrastructure_management/hardware_profile/how_to/create_hardware_profile_cli.mdx
new file mode 100644
index 0000000..f28f234
--- /dev/null
+++ b/docs/en/infrastructure_management/hardware_profile/how_to/create_hardware_profile_cli.mdx
@@ -0,0 +1,84 @@
+---
+weight: 15
+i18n:
+  title:
+    en: Create Hardware Profile using CLI
+    zh: 使用 CLI 创建 Hardware Profile
+---
+
+# Create Hardware Profile using CLI
+
+This document describes how to create `HardwareProfile` resources using the command line and provides a sample YAML.
+
+## Prerequisites
+
+- You have access to a Kubernetes cluster with the platform installed.
+- You have configured `kubectl` to communicate with your cluster.
+- You have a namespace where you have permissions to view or create `HardwareProfile` resources (typically a cluster-scoped resource or a specific admin namespace).
+
+## Create a HardwareProfile
+
+Create a YAML file named `gpu-high-performance-profile.yaml` with the following content:
+
+```yaml
+apiVersion: infrastructure.opendatahub.io/v1alpha1
+kind: HardwareProfile
+metadata:
+  name: gpu-high-performance-profile
+  namespace: kube-public
+spec:
+  # Define resource limitations and defaults
+  identifiers:
+    - identifier: "nvidia.com/gpu"
+      displayName: "GPU"
+      minCount: "1"
+      maxCount: "8"
+      defaultCount: "1"
+      resourceType: Accelerator
+    - identifier: "cpu"
+      displayName: "CPU"
+      minCount: "4"
+      maxCount: "32"
+      defaultCount: "8"
+      resourceType: CPU
+    - identifier: "memory"
+      displayName: "Memory"
+      minCount: "16Gi"
+      maxCount: "128Gi"
+      defaultCount: "32Gi"
+      resourceType: Memory
+  # Configure Node Selectors and Tolerations for scheduling
+  scheduling:
+    type: Node
+    node:
+      nodeSelector:
+        accelerator: nvidia-a100
+        node-role.kubernetes.io/worker: "true"
+      tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+```
+
+Then apply the YAML file to your cluster using `kubectl`:
+
+```bash
+kubectl apply -f gpu-high-performance-profile.yaml -n kube-public
+```
+
+## Check HardwareProfile Status
+
+You can check whether the `HardwareProfile` has been created successfully with the following command:
+
+```bash
+kubectl get hardwareprofile gpu-high-performance-profile -n kube-public
+```
+
+The output should look similar to this:
+
+```
+NAME                           AGE
+gpu-high-performance-profile   2m
+```
+
+Once applied, your data scientists can select **GPU High Performance** when deploying their inference services through the UI, and the constraints specified in the profile are automatically validated and injected into the deployed workloads.
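+
+To see the effect of these bounds from the consumer side, the following sketch shows a container `resources` block that a workload selecting this profile could end up with (illustrative values; the exact structure injected depends on the serving runtime):
+
+```yaml
+resources:
+  requests:
+    cpu: "8"             # within minCount "4" and maxCount "32"
+    memory: "32Gi"       # within "16Gi" and "128Gi"
+    nvidia.com/gpu: "1"  # within "1" and "8"
+  limits:
+    cpu: "8"
+    memory: "32Gi"
+    nvidia.com/gpu: "1"
+```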
diff --git a/docs/en/infrastructure_management/hardware_profile/how_to/index.mdx b/docs/en/infrastructure_management/hardware_profile/how_to/index.mdx
new file mode 100644
index 0000000..aca892f
--- /dev/null
+++ b/docs/en/infrastructure_management/hardware_profile/how_to/index.mdx
@@ -0,0 +1,11 @@
+---
+weight: 60
+i18n:
+  title:
+    en: How To
+title: How To
+---
+
+# How To
+
+
diff --git a/docs/en/infrastructure_management/hardware_profile/how_to/schedule_to_specific_gpu_nodes.mdx b/docs/en/infrastructure_management/hardware_profile/how_to/schedule_to_specific_gpu_nodes.mdx
new file mode 100644
index 0000000..cdaacec
--- /dev/null
+++ b/docs/en/infrastructure_management/hardware_profile/how_to/schedule_to_specific_gpu_nodes.mdx
@@ -0,0 +1,45 @@
+---
+weight: 20
+i18n:
+  title:
+    en: Schedule Workloads to Specific GPU Nodes
+    zh: 将工作负载调度到特定的 GPU 节点
+---
+
+# Schedule Workloads to Specific GPU Nodes
+
+When defining a Hardware Profile, you often need to ensure that an AI inference workload is scheduled only onto nodes with a specific type of GPU (such as an NVIDIA A100 or H100), and that the workload tolerates the taints on those dedicated nodes so that regular CPU workloads do not take over the GPU nodes.
+
+This guide demonstrates how to configure these constraints in a Hardware Profile so that your data scientists don't need to configure them manually.
+
+## Use Node Selectors
+
+Node selectors allow you to guide pods to specific nodes based on node labels.
+
+1. Find the exact Kubernetes label of the GPU nodes in your cluster. For example:
+   * `accelerator: nvidia-a100`
+   * `nvidia.com/gpu.present: "true"`
+2. Edit or create your Hardware Profile.
+3. In the **Node Selectors** section, add the Key-Value pair corresponding to the label:
+   * **Key**: `accelerator`
+   * **Value**: `nvidia-a100`
+
+Once saved, any Inference Service that uses this Hardware Profile inherits this node selector, ensuring it only lands on a node with an A100 GPU.
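+
+Under the hood this is standard Kubernetes scheduling: the platform copies the pair into the workload's pod template. The effective pod spec fragment would look roughly like this (a sketch; the surrounding pod fields are omitted):
+
+```yaml
+spec:
+  nodeSelector:
+    accelerator: nvidia-a100
+```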
+
+## Use Taints and Tolerations
+
+GPU nodes are frequently "tainted" by cluster administrators so that standard pods (such as web servers or generic databases) are not scheduled on them, reserving the GPU processing power for AI workloads.
+
+If your GPU nodes have a taint like `nvidia.com/gpu:NoSchedule`, your Hardware Profile must include a corresponding toleration.
+
+1. Under the **Tolerations** section of your Hardware Profile, add a new toleration.
+2. Configure it to match the taint on the GPU node:
+   * **Key**: `nvidia.com/gpu`
+   * **Operator**: `Exists` (tolerates any value for the key `nvidia.com/gpu`; alternatively, use `Equal` and explicitly set the **Value**).
+   * **Effect**: `NoSchedule` (matches the effect of the taint).
+
+By adding this toleration to the Hardware Profile, the deployed Inference Service is explicitly granted "permission" to be scheduled on the dedicated GPU nodes.
+
+## Combined Configuration
+
+By combining a **Node Selector** (to tell the scheduler *where* the workload should go) with a **Toleration** (to allow the scheduler to *place* it there), your Hardware Profile acts as a reliable blueprint for heterogeneous node architectures.
diff --git a/docs/en/infrastructure_management/hardware_profile/index.mdx b/docs/en/infrastructure_management/hardware_profile/index.mdx
new file mode 100644
index 0000000..9f31ec2
--- /dev/null
+++ b/docs/en/infrastructure_management/hardware_profile/index.mdx
@@ -0,0 +1,7 @@
+---
+weight: 60
+---
+
+# Hardware Profile
+
+
diff --git a/docs/en/infrastructure_management/hardware_profile/intro.mdx b/docs/en/infrastructure_management/hardware_profile/intro.mdx
new file mode 100644
index 0000000..ebd18aa
--- /dev/null
+++ b/docs/en/infrastructure_management/hardware_profile/intro.mdx
@@ -0,0 +1,35 @@
+---
+weight: 5
+---
+
+# Introduction
+
+Hardware profiles allow platform administrators to centrally provision specific, standardized hardware configurations.
These configurations encapsulate computing resource limits, node selectors, and node tolerations into a cohesive unit that platform users can select when deploying model inference services.
+
+Using hardware profiles reduces the manual errors that come with raw YAML configuration, prevents unintentional scheduling onto the wrong topology groups, and ensures robust resource management for cluster workloads.
+
+Hardware profiles natively support and interact with the platform's `InferenceService` and `LLMInferenceService` resources.
+
+## Why do we need a Hardware Profile?
+
+While standard Kubernetes offers resource requests and limits through Pod specifications, constructing and deploying AI inference workloads (such as Large Language Models or specialized KServe predictors) introduces unique operational challenges. Our implementation of Hardware Profiles is tailored to solve these challenges with the following platform-specific characteristics:
+
+1. **Topology & Specialized Accelerator Abstraction**
+   Data scientists prioritize model performance and logic rather than the underlying cluster topology. They may not know the exact node labels or taints required to schedule workloads onto specific GPU nodes, vGPU resources, or interconnect networks. A Hardware Profile abstracts away these technical complexities. Administrators can embed precise `Node Selectors` and `Tolerations` directly into the profile, ensuring that when a user selects a "High-End NVIDIA A100" profile from the UI, the workload automatically targets the correct physical machine pools.
+
+2. **Dynamic Bounded Customization (Not Just Rigid Quotas)**
+   Unlike platforms that enforce a single, immutable resource size (t-shirt sizing), our system defines a scalable boundary for each resource type. Administrators configure the **Minimum allowed**, **Default**, and **Maximum allowed** limits.
When a user selects a profile, they inherit the *Default* settings immediately. Through the **Customize Data** option, however, they retain the flexibility to manually fine-tune their specific Requests and Limits. As long as those values fall within the authorized profile boundaries, they are accepted, allowing elasticity for different models without risking excessive cluster monopolization.
+
+3. **Smart Webhook Validation & Asymmetric Auto-Correction**
+   Our platform employs a dedicated Mutating Webhook that integrates with the model serving pipelines. Instead of relying on users to perfectly craft YAML manifests, the webhook intercepts the request and injects the profile's constraints into the workload at admission time. It also safeguards the cluster: for instance, if a user specifies limits but omits requests (or vice versa), the webhook performs semantic adjustments (capping requests to limits, or filling in defaults) and blocks configurations that violate the profile's defined minimum or maximum limits before any Pods are spawned.
+
+4. **Native Interoperability with Custom Serving Engines**
+   Whether you deploy a standard `InferenceService` or a heavily customized `LLMInferenceService`, the hardware profile engine tracks the complex Pod/Container structures behind the scenes and injects constraints into exactly the active predictor container's resources.
+
+### Key Aspects of a Hardware Profile
+
+* **Resource Identifiers (Limits & Requests):** Profiles govern native Kubernetes constraints (such as minimum CPU thresholds, default Memory allocations, and maximum GPU counts) to prevent system overload while maintaining operational stability.
+* **Taints & Tolerations:** Hardware profiles declare which node taints the workload pods tolerate (e.g., taints reserved for dedicated heterogeneous hardware).
+* **Node Selectors:** They constrain workloads to nodes with matching labels, targeting the correct machine architectures without guesswork.
+* **Backend Webhook Injection:** Through automated interception mechanisms installed in the cluster, hardware constraints are transparently merged into submitted workloads directly from the management namespace.
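+
+As a concrete illustration of the webhook behavior described above, consider a submission that sets limits but omits requests (hypothetical values; the precise merge rules are implemented by the platform webhook):
+
+```yaml
+# Submitted by the user: limits only, requests omitted
+resources:
+  limits:
+    nvidia.com/gpu: "2"
+    memory: "32Gi"
+
+# After webhook mutation: requests filled in and capped to the limits,
+# and the whole block validated against the profile's min/max bounds
+resources:
+  requests:
+    nvidia.com/gpu: "2"
+    memory: "32Gi"
+  limits:
+    nvidia.com/gpu: "2"
+    memory: "32Gi"
+```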