Feature/ascend-rlinf-add-math-vqa-model #1

Taoxu-unihannover · 2025-11-07T02:56:29Z

Description

Motivation and Context

How has this been tested?

Additional information (optional, e.g., figures and logs):

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
All new and existing tests passed.

…nf#129) Signed-off-by: Bo Dai <daibo@infini-ai.com>

Signed-off-by: QuanluZhang <zhangquanlu@infini-ai.com>

…Linf#139) Signed-off-by: Hao Lin <linhaomails@gmail.com>

* feat: simplify the maniskill reset id
* feat: fix offload wrapper for all reset state id
Signed-off-by: hongzhi <zanghongzhi@infini-ai.com>

Signed-off-by: xusi <xusiforwork@gmail.com>

Signed-off-by: Hao Lin <linhaomails@gmail.com>

…inf#145) Signed-off-by: Hao Lin <linhaomails@gmail.com>

…aces (RLinf#90)
* feat(async_vllm): add async vllm worker and unified sync/async interfaces
Signed-off-by: Bo Dai <daibo@infini-ai.com>

* chore: fix README-EN and add README-CN Signed-off-by: LiuYiwei <1252642155@qq.com>

Signed-off-by: Hao Lin <linhaomails@gmail.com>

Signed-off-by: hongzhi <zanghongzhi@infini-ai.com>

* chore: fix README-EN and add README-CN Signed-off-by: LiuYiwei <1252642155@qq.com>

Signed-off-by: Hao Lin <linhaomails@gmail.com>

…nto optional; override get_benchmark to support multi-task training; (RLinf#153)
* fix(embodied): fix function of ordered reset id; make gradient ckpt into optional; override get_benchmark to support multi-task training; change the default gradient ckpt as True; add yaml for libero90 and libero130; change the way of overriding get_benchmark; use logging.info;
Signed-off-by: weimingjie <weimingjie@infini-ai.com>

…Linf#136)
Signed-off-by: qurakchin <czzcy3832515@hotmail.com>
Signed-off-by: yaozhuyu <yaozhuyu@infini-ai.com>
Co-authored-by: huangzx02 <huangzx02@foxmail.com>
Co-authored-by: zhuchunyang <zhuchunyang@infini-ai.com>
Co-authored-by: yaozhuyu <yaozhuyu@infini-ai.com>

…vel for reward type; support chunk-level with mask setting (RLinf#150)
* fix: maniskill_render; simulator_stage_seed; quickstart; reward_type step_level to action_level; support chunk-level reward with loss mask in grpo; del abs path; split rank and seed-offset; fix eval pipeline bug; also fix for libero and robotwin
Signed-off-by: hongzhi <zanghongzhi@infini-ai.com>

Signed-off-by: infiniAI_wmj <mjwei@infiniAI-wmjdeMacBook-Air.local>
Co-authored-by: infiniAI_wmj <mjwei@infiniAI-wmjdeMacBook-Air.local>

Signed-off-by: Hao Lin <linhaomails@gmail.com>

…vertor feat(weight): refactor and add qwen2.5-vl mg2hf convertor

…stbin-coder#2)
* Merge pull request justbin-coder#1 from anHappyDog/feature/weight_convertor
  feat(weight): refactor and add qwen2.5-vl mg2hf convertor
* feat(mm_input): add basic vision-language dataset processor and yaml config
  Signed-off-by: Bo Dai <daibo@infini-ai.com>
* feat(mm_input): add vLLM multimodal support
  Signed-off-by: Bo Dai <daibo@infini-ai.com>
---------
Signed-off-by: Bo Dai <daibo@infini-ai.com>

Signed-off-by: guozhen1997 <2997871698@qq.com>
Signed-off-by: Bo Dai <daibo@infini-ai.com>
Co-authored-by: Bo Dai <daibo@infini-ai.com>