Commit d884069

fix: update GPU configuration keys in CI files
- Changed `gpu_id` to `gpu_ids` in `config.yaml` and related documentation for consistency.
- Updated `run.py` to reflect the new key for GPU ID retrieval.
- Enhanced README to clarify the usage of the updated GPU configuration.
1 parent 373fa40 commit d884069

3 files changed

Lines changed: 20 additions & 8 deletions

.ci/README.md

Lines changed: 18 additions & 6 deletions
@@ -24,34 +24,46 @@
 ## Configuration file `config.yaml`

 ```yaml
+repo:
+  url: https://github.com/InfiniTensor/InfiniOps.git
+  branch: master
+
 registry:
-  url: ""  # Harbor address; leave empty for local development
+  url: ""  # Harbor address; leave empty for local development
   project: infiniops
+  credentials_env: REGISTRY_TOKEN

 images:
   nvidia:
     dockerfile: .ci/images/nvidia/
     build_args:
       BASE_IMAGE: nvcr.io/nvidia/pytorch:24.10-py3
+  ascend:
+    dockerfile: .ci/images/ascend/
+    build_args:
+      BASE_IMAGE: ascendhub.huawei.com/public-ascendhub/ascend-pytorch:24.0.0
+    private_sdk:
+      source: "${PRIVATE_SDK_URL}"

 jobs:
   nvidia_gpu:
-    image: stable  # stable | latest | a specific commit hash
+    image: stable  # stable | latest | a specific commit hash
     platform: nvidia
     resources:
-      gpu_id: "0"  # GPU device ID, e.g. "0" "0,2" "all"
+      gpu_ids: "0"  # GPU device ID, e.g. "0" "0,2" "all"
+      gpu_type: A100
       memory: 32GB
       timeout: 3600
     setup: pip install .[dev]
     stages:
       - name: test
-        run: pytest tests/ -v --tb=short
+        run: pytest tests/ -v --tb=short --junitxml=/workspace/test-results.xml
 ```

 - **`registry.url`**: when empty, images are kept only locally, with tags of the form `<project>-ci/<platform>:<tag>`.
 - **`images.<platform>.build_args`** are passed to `docker build` as `--build-arg`.
 - **`jobs.<name>.image`** accepts `stable`, `latest`, or a specific commit hash.
-- **`resources.gpu_id`** specifies the GPU device IDs, in formats such as `"0"`, `"0,2"`, or `"all"`, and maps to `docker run --gpus "device=..."`. `gpu_count` can still be used to allocate GPUs by count instead.
+- **`resources.gpu_ids`** specifies the GPU device IDs, in formats such as `"0"`, `"0,2"`, or `"all"`, and maps to `docker run --gpus "device=..."`. `gpu_count` can still be used to allocate GPUs by count instead.

 ## Image build `build.py`

@@ -110,7 +122,7 @@ python .ci/run.py [options]
 | `--branch` | `repo.branch` in `config.yaml` | Override the branch to clone |
 | `--stage` | all | Run only the specified stage |
 | `--image-tag` | the job's `image` field | Override the image version |
-| `--gpu-id` | `gpu_id` in config | GPU device ID, e.g. `0`, `0,2`, `all` |
+| `--gpu-id` | `gpu_ids` in config | GPU device ID, e.g. `0`, `0,2`, `all` |
 | `--dry-run` | none | Print the docker command without running it |
 | `--config` | `.ci/config.yaml` | Path to the config file |
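
The README states that `resources.gpu_ids` maps to `docker run --gpus "device=..."` and that `gpu_count` remains available for allocation by count. A minimal sketch of that mapping is shown here. `gpu_flag` is a hypothetical helper, not the function in `run.py`, and the literal double quotes around the device list are an assumption based on Docker's documented quoting for comma-separated `--gpus` values.

```python
from __future__ import annotations


def gpu_flag(gpu_ids: str = "", gpu_count: int = 0) -> list[str]:
    """Return the `docker run` arguments that select GPUs (hypothetical helper)."""
    gpu_ids = gpu_ids.strip()
    if gpu_ids == "all":
        # Expose every GPU on the host.
        return ["--gpus", "all"]
    if gpu_ids:
        # Docker parses the --gpus value as CSV, so a comma-separated device
        # list is wrapped in literal double quotes to keep it as one field.
        return ["--gpus", f'"device={gpu_ids}"']
    if gpu_count > 0:
        # No explicit IDs: fall back to requesting a number of GPUs.
        return ["--gpus", str(gpu_count)]
    # Neither key is set: run the container without GPUs.
    return []
```

When the argument list is passed to `subprocess.run` without a shell, the embedded quotes reach Docker unchanged, which is the form its documentation gives for multi-device selections.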

.ci/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ jobs:
     image: stable
     platform: nvidia
     resources:
-      gpu_id: "0"  # GPU ID to use, e.g. "0" "0,2" "all"
+      gpu_ids: "0"  # GPU ID to use, e.g. "0" "0,2" "all"
       gpu_type: A100
       memory: 32GB
       timeout: 3600

.ci/run.py

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ def build_docker_args(
         args.append("-e")
         args.append(f"STAGE_{i + 1}_CMD={s['run']}")

-    gpu_id = gpu_id_override or str(resources.get("gpu_id", ""))
+    gpu_id = gpu_id_override or str(resources.get("gpu_ids", ""))
     gpu_count = resources.get("gpu_count", 0)
     if gpu_id:
         if gpu_id == "all":
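
After this change `run.py` reads only `gpu_ids`, so a config that still uses the old `gpu_id` key resolves to an empty string and skips the `if gpu_id:` branch. One possible way to soften the rename is a lookup that accepts both keys. The sketch below is not part of this commit: `resolve_gpu_ids` is a hypothetical helper, and the precedence (CLI override, then `gpu_ids`, then legacy `gpu_id`) is an assumption.

```python
from __future__ import annotations

import warnings


def resolve_gpu_ids(resources: dict, override: str | None = None) -> str:
    """Pick the GPU ID string to use (hypothetical helper, not in run.py)."""
    if override:
        # A value given on the command line (--gpu-id) wins over the config.
        return str(override)
    value = resources.get("gpu_ids")
    if value not in (None, ""):
        return str(value)
    legacy = resources.get("gpu_id")
    if legacy not in (None, ""):
        # Accept the pre-rename key, but tell the user to migrate.
        warnings.warn(
            "`resources.gpu_id` is deprecated; use `resources.gpu_ids`",
            DeprecationWarning,
            stacklevel=2,
        )
        return str(legacy)
    # Nothing configured: the caller can fall back to gpu_count.
    return ""
```

Calling `resolve_gpu_ids(resources, gpu_id_override)` in place of the one-line lookup would keep the new key authoritative while older configs continue to select the intended devices.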
