feat: support video nsa by wooway777 · Pull Request #463 · InfiniTensor/InfiniLM

wooway777 · 2026-06-29T09:53:29Z

Summary

Motivation

Closes #

Type of Change

feat — new feature / new model
fix — bug fix
perf — performance improvement (no behavioral change)
refactor — code restructuring without behavior change
test — adding or fixing tests only
docs — documentation only
build / ci — build system or CI configuration
chore — tooling, formatting, or other non-code changes
Breaking change

Test Results of Involved Models on Supported Platforms (Please attach screenshots)

Benchmark / Performance Impact

Notes for Reviewers

CI / ChatOps

Checklist

Every contributor must verify every item below before requesting
review. Tick each box only after the check has actually been performed —
do not tick speculatively. If an item truly does not apply, replace the
checkbox with N/A and briefly explain why in an inline comment.

Title, Branch, and Commits

PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
Each commit message follows Conventional Commits.
Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
No stray merge commits from main — the branch is rebased cleanly on top of the current main.
No fixup! / squash! / wip commits remain.
Existing PR/branch/commit that followed the legacy issue format.

Scope and Design

Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
No unrelated formatting churn that would obscure the diff.
Public API changes (if any) are intentional, documented, and reflected in affected callers/tests.

General Code Hygiene (applies to all languages)

The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
No trailing whitespace, tab/space mixing, or stray BOMs.
Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
All comments and error messages are in English (CONTRIBUTING.md §Code/General).
Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

Code follows the Google C++ Style Guide strictly.
Error and warning message wording follows the LLVM Coding Standards (CONTRIBUTING.md §C++).
Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
No raw new/delete; RAII / smart pointers / existing allocators are used.
Changed files are formatted by scripts/format.py.
No changes/reference to csrc/models/llama_legacy/.

Python Specific (if Python files changed)

Code is PEP 8 compliant.
Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
Docstrings (if any) follow PEP 257 (CONTRIBUTING.md §Python).
Changed files are formatted by scripts/format.py.
No changes/reference to python/infinilm/auto_config.py.

Testing

For any platform that could not be tested, an explicit reason is given in the table and a reviewer with access has been tagged.
Passed single request test (examples/test_infer.py), or specify the reason for skipping.
Passed offline performance test (examples/bench.py), or specify the reason for skipping.
Passed sanity test (test/bench/test_benchmark.py), or specify the reason for skipping.
Passed service test (python/infinilm/server/inference_server.py + scripts/test_perf.py), or specify the reason for skipping.

Build, CI, and Tooling

The project builds cleanly from a fresh directory on at least one affected platform.
CI has been triggered manually (Actions → CI on this branch), or /retest was requested.

Documentation

README.md, CONTRIBUTING.md, or inline docs updated when behavior, build flags, or developer workflow changed.
Any user-visible breaking change is called out explicitly under "Motivation" and in the commit/PR title with a ! or BREAKING CHANGE: footer.

Security and Safety

No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
Third-party code is license-compatible and attributed.
No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

wooway777 · 2026-06-29T12:27:55Z

/test

pengcheng888 · 2026-06-30T02:24:10Z

 import os


+def normalize_hf_config_for_infinilm(config_dict, model_path):


Is the `model_path` parameter of `normalize_hf_config_for_infinilm(config_dict, model_path)` unused?

pengcheng888 · 2026-06-30T02:29:14Z

 from infinilm.llm.llm import LLM


+def decode_video_frames(video_path, num_frames):


`decode_video_frames` is defined twice. Could it be imported from examples/bench_videonsa.py instead?

That said, I feel test_infer should not depend on this script.

pengcheng888 · 2026-06-30T02:34:52Z

 struct MultiModalMetadata {
    std::optional<std::vector<size_t>> image_req_ids;
+    // Flattened [start, end) token ranges in the current packed language sequence.
+    std::optional<std::vector<size_t>> visual_token_ranges;


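Previously, all attnmeta values were computed on the Python side, passed to C++, and stored in the global attnmeta location.
This `visual_token_ranges`, however, is assigned in C++, which differs from the previous default behavior.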

pengcheng888 · 2026-06-30T02:36:49Z

+        auto batched_grids = grid_tensors.size() == 1 ? grid_tensors.front() : infinicore::op::cat(grid_tensors, 0);
+        auto batched_vision_hidden = visual_->forward(batched_pixels, batched_grids);
+
+        std::vector<size_t> visual_token_ranges;


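Please double-check whether `visual_token_ranges` should be computed in python/infinilm/processors/videonsa_processor.py and passed in, rather than computed on the fly in C++ after a `to_cpu` call.

If so, the bound Input struct needs to be modified, with the new variable added in the last position.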
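As an illustration of the suggestion above, here is a minimal Python sketch of how flattened [start, end) visual token ranges could be derived on the processor side before being handed to C++. The function name `compute_visual_token_ranges` and the `visual_token_id` parameter are hypothetical and do not come from the PR; the sketch also assumes that visual placeholders appear as contiguous runs of a single token id in the packed sequence.

```python
from typing import List, Sequence


def compute_visual_token_ranges(
    input_ids: Sequence[int], visual_token_id: int
) -> List[int]:
    """Return flattened [start, end) ranges of contiguous visual tokens.

    Hypothetical helper: `visual_token_id` stands in for whatever placeholder
    id the real processor uses for visual tokens.
    """
    ranges: List[int] = []
    start = None
    for index, token in enumerate(input_ids):
        if token == visual_token_id:
            # Open a new run if we are not already inside one.
            if start is None:
                start = index
        elif start is not None:
            # The run ended at the previous token; `index` is the exclusive end.
            ranges.extend((start, index))
            start = None
    if start is not None:
        # The sequence ended while still inside a visual run.
        ranges.extend((start, len(input_ids)))
    return ranges
```

With a result like this computed in Python, the C++ side would only need to read the ranges from its input rather than copying a tensor to the CPU to derive them.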

pengcheng888 · 2026-06-30T02:43:47Z

+def normalize_hf_config_for_infinilm(config_dict, model_path):
+    model_type = config_dict.get("model_type")
+
+    if model_type == "qwen2_5_vl" and config_dict.get("architectures") == [


What does this mean? Why change "model_type" from qwen2_5_vl to videonsa?
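For readers without the full diff, the pattern being questioned can be sketched roughly as follows. This is not the PR's implementation: the name `normalize_config_sketch` and the `videonsa_architectures` parameter are hypothetical, the architecture list checked in the real code is not visible in this thread, and the `text_config` handling shown in the diff is omitted.

```python
import copy


def normalize_config_sketch(config_dict, videonsa_architectures):
    """Route a matching `qwen2_5_vl` checkpoint to the `videonsa` model type.

    Hypothetical sketch: `videonsa_architectures` stands in for the
    architecture list that the real function compares against.
    """
    is_qwen_vl = config_dict.get("model_type") == "qwen2_5_vl"
    matches = config_dict.get("architectures") == list(videonsa_architectures)
    if is_qwen_vl and matches:
        # Copy first so the caller's dictionary is left untouched.
        normalized = copy.deepcopy(config_dict)
        normalized["model_type"] = "videonsa"
        return normalized
    # Every other configuration passes through unchanged.
    return config_dict
```

Under this reading, the rewrite only affects which model implementation is selected for a checkpoint whose Hugging Face config reports `qwen2_5_vl`; plain `qwen2_5_vl` checkpoints with a different architecture list would be left alone.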

pengcheng888 · 2026-06-30T02:44:57Z

+            normalized["text_config"] = text_config
+        return normalized
+
+    return config_dict


Once this model is adapted, could a later adaptation of the qwen2_5_vl model reuse most of the files in the csrc/models/videonsa/ folder?

pengcheng888

For the revised code: (1) provide the test commands and test screenshots; (2) confirm that the existing multimodal models still run.

Vincent777 · 2026-06-30T03:58:27Z

/retest

github-actions · 2026-06-30T03:58:35Z

⛔ Only repository members can run retest.

wooway777 · 2026-06-30T13:42:23Z

/test

github-actions · 2026-06-30T13:43:04Z

✅ Started CI workflow run 28448940484 for commit 24c8323 on branch feat/support-video-nsa (triggered by /test).

pengcheng888 · 2026-07-01T02:23:23Z

        /// Target patch sizes for each image (MiniCPM-V).
        std::optional<std::vector<infinicore::Tensor>> tgt_sizes;
+        /// Flattened [start, end) visual token ranges in the packed language sequence.
+        std::optional<std::vector<size_t>> visual_token_ranges;


`image_bound`, `tgt_sizes`, and `image_req_ids` were previously adjacent.

`visual_token_ranges` is newly added, so I think it would be better to put it in the last position.

pengcheng888 · 2026-07-01T02:24:18Z

                         std::optional<std::vector<infinicore::Tensor>> pixel_values,
                         std::optional<std::vector<infinicore::Tensor>> image_bound,
                         std::optional<std::vector<infinicore::Tensor>> tgt_sizes,
+                         std::optional<std::vector<size_t>> visual_token_ranges,


Same as above: should `visual_token_ranges` be moved further back?
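The ordering concern in the two comments above can be illustrated with a small Python analogue. The `Input` dataclass below is a hypothetical mirror of the bound struct, not the real binding, and its field order is an assumption based on the fields named in this thread. It shows that appending the new optional field last keeps existing positional callers working.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Input:
    """Hypothetical Python mirror of the bound input struct."""

    pixel_values: Optional[list] = None
    image_bound: Optional[list] = None
    tgt_sizes: Optional[list] = None
    image_req_ids: Optional[List[int]] = None
    # New field appended last, so earlier positional arguments keep their slots.
    visual_token_ranges: Optional[List[int]] = None


# A caller written before the new field existed still binds correctly.
legacy = Input(["pixels"], ["bound"], ["tgt"], [0])

# New callers opt in explicitly by keyword.
updated = Input(["pixels"], ["bound"], ["tgt"], [0], visual_token_ranges=[1, 4])
```

Had the new field been inserted between `tgt_sizes` and `image_req_ids`, the fourth positional argument of the legacy call would have landed in `visual_token_ranges` instead.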

wooway777 requested a review from a team June 29, 2026 09:53

pengcheng888 reviewed Jun 30, 2026

pengcheng888 requested changes Jun 30, 2026

wooway777 force-pushed the feat/support-video-nsa branch from 443e44c to 92d2ee4 Compare June 30, 2026 03:13

wooway777 force-pushed the feat/support-video-nsa branch 2 times, most recently from 4121275 to 24c8323 Compare June 30, 2026 11:14

pengcheng888 reviewed Jul 1, 2026

feat: support video nsa

04ba524

wooway777 force-pushed the feat/support-video-nsa branch from 24c8323 to 04ba524 Compare July 1, 2026 03:08


3 participants
