Commit ffba7e6

Author: “Justbin482”
Merge remote-tracking branch 'upstream/main'
2 parents ab7d4d5 + 89e398e commit ffba7e6

56 files changed

Lines changed: 1577 additions & 608 deletions

.github/workflows/math_e2e.yml

Lines changed: 5 additions & 0 deletions
@@ -67,6 +67,11 @@ jobs:
 export REPO_PATH=$(pwd)
 bash tests/e2e_tests/math/sglang/run_pipeline.sh
 
+- name: vLLM Pipeline mode
+run: |
+export REPO_PATH=$(pwd)
+bash tests/e2e_tests/math/vllm/run_pipeline.sh
+
 qwen-grpo-test-sglang044:
 runs-on: rlinf
 container:

.github/workflows/math_e2e_rollout_logprobs.yml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,3 +66,9 @@ jobs:
6666
run: |
6767
export REPO_PATH=$(pwd)
6868
bash tests/e2e_tests/math/sglang/run_pipeline.sh qwen2.5-1.5b-grpo-pipeline-rollout-logprobs.yaml
69+
70+
- name: vLLM Pipeline mode
71+
run: |
72+
export REPO_PATH=$(pwd)
73+
bash tests/e2e_tests/math/vllm/run_pipeline.sh qwen2.5-1.5b-grpo-pipeline-rollout-logprobs.yaml
74+

.pre-commit-config.yaml

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,9 +3,8 @@ repos:
33
rev: "v0.12.9"
44
hooks:
55
- id: ruff
6-
args: ["--preview"]
6+
args: ["--preview", "--fix"]
77
- id: ruff-format
8-
args: ["--check"]
98

109
- repo: https://github.com/commit-check/commit-check
1110
rev: "v0.10.2"

README.md

Lines changed: 33 additions & 12 deletions
@@ -11,6 +11,13 @@
 <a href="https://github.com/RLinf/misc/blob/main/pic/wechat.jpg?raw=true"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a>
 </div>
 
+<div align="center">
+
+[![English](https://img.shields.io/badge/lang-English-blue.svg)](README.md)
+[![简体中文](https://img.shields.io/badge/语言-简体中文-red.svg)](README.zh-CN.md)
+
+</div>
+
 <h1 align="center">
 <sub>RLinf: Reinforcement Learning Infrastructure for Agentic AI</sub>
 </h1>
@@ -25,6 +32,7 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
 ## What's NEW!
 - [2025/09] <img src="https://github.githubassets.com/images/icons/emoji/unicode/1f525.png" width="18" /> [Example Gallery](https://rlinf.readthedocs.io/en/latest/rst_source/examples/index.html) is updated, users can find various off-the-shelf examples!
 - [2025/09] The paper [RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation](https://arxiv.org/abs/2509.15965) is released.
+- [2025/09] The [report on RLinf by Machine Heart](https://mp.weixin.qq.com/s/Xtv4gDu3lhDDGadLrzt6Aw) is released.
 - [2025/08] RLinf is open-sourced. The formal v0.1 will be released soon.
 
 ## Key Features
@@ -68,7 +76,7 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
 <div align="center">
 <table>
 <tr>
-<th colspan="5" style="text-align:center;"><strong>OpenVLA-OFT model results on ManiSkill3</strong></th>
+<th colspan="5" style="text-align:center;"><strong>OpenVLA and OpenVLA-OFT model results on ManiSkill3</strong></th>
 </tr>
 <tr>
 <th>Model</th>
@@ -120,10 +128,10 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
 </tr>
 <tr>
 <th>Model</th>
-<th><a href="https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-spatial">Spatial</a></th>
-<th><a href="https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-goal">Goal</a></th>
-<th><a href="https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-object">Object</a></th>
-<th><a href="https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-long">Long</a></th>
+<th><a href="https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-spatial"><img src="docs/source-en/_static/svg/hf-logo.svg" alt="HF" width="16" height="16" style="vertical-align: middle;">Spatial</a></th>
+<th><a href="https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-goal"><img src="docs/source-en/_static/svg/hf-logo.svg" alt="HF" width="16" height="16" style="vertical-align: middle;">Goal</a></th>
+<th><a href="https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-object"><img src="docs/source-en/_static/svg/hf-logo.svg" alt="HF" width="16" height="16" style="vertical-align: middle;">Object</a></th>
+<th><a href="https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-long"><img src="docs/source-en/_static/svg/hf-logo.svg" alt="HF" width="16" height="16" style="vertical-align: middle;">Long</a></th>
 <th>Average</th>
 </tr>
 <tr>
@@ -166,9 +174,9 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
 </tr>
 <tr>
 <th>Model</th>
-<th><a href="https://huggingface.co/datasets/RLinf/AIME24">AIME 24</a></th>
-<th><a href="https://huggingface.co/datasets/RLinf/AIME25">AIME 25</a></th>
-<th><a href="https://huggingface.co/datasets/RLinf/GPQA-diamond">GPQA-diamond</a></th>
+<th>AIME 24</a></th>
+<th>AIME 25</a></th>
+<th>GPQA-diamond</a></th>
 <th>Average</th>
 </tr>
 <tr>
@@ -211,9 +219,9 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
 </tr>
 <tr>
 <th>Model</th>
-<th><a href="https://huggingface.co/datasets/RLinf/AIME24">AIME 24</a></th>
-<th><a href="https://huggingface.co/datasets/RLinf/AIME25">AIME 25</a></th>
-<th><a href="https://huggingface.co/datasets/RLinf/GPQA-diamond">GPQA-diamond</a></th>
+<th>AIME 24</a></th>
+<th>AIME 25</a></th>
+<th>GPQA-diamond</a></th>
 <th>Average</th>
 </tr>
 <tr>
@@ -330,7 +338,20 @@ If you find **RLinf** helpful, please cite the paper:
 }
 ```
 
-If you use RL+VLA in RLinf, you can also cite our empirical study paper:
+If you use RL+VLA in RLinf, you can also cite our technical report and empirical study paper:
+
+```bibtex
+@misc{zang2025rlinfvlaunifiedefficientframework,
+title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
+author={Hongzhi Zang and Mingjie Wei and Si Xu and Yongji Wu and Zhen Guo and Yuanqing Wang and Hao Lin and Liangzhi Shi and Yuqing Xie and Zhexuan Xu and Zhihao Liu and Kang Chen and Wenhao Tang and Quanlu Zhang and Weinan Zhang and Chao Yu and Yu Wang},
+year={2025},
+eprint={2510.06710},
+archivePrefix={arXiv},
+primaryClass={cs.RO},
+url={https://arxiv.org/abs/2510.06710},
+}
+```
+
 ```bibtex
 @misc{liu2025rlbringvlageneralization,
 title={What Can RL Bring to VLA Generalization? An Empirical Study},
