issue/277 perf: batch async dispatch, fix GIL release, fix build_model_inputs by ma-hang · Pull Request #278 · InfiniTensor/InfiniLM

ma-hang · 2026-03-24T02:53:47Z

Test platform

A100

Before optimization

============ Serving Benchmark Result ============
Successful requests:                     16
Failed requests:                         0
Maximum request concurrency:             16
Benchmark duration (s):                  27.37
Total input tokens:                      4080
Total generated tokens:                  16508
Request throughput (req/s):              0.58
Output token throughput (tok/s):         603.17
Peak output token throughput (tok/s):    704.00
Peak concurrent requests:                16.00
Total token throughput (tok/s):          752.24
---------------Time to First Token----------------
Mean TTFT (ms):                          1059.22
Median TTFT (ms):                        1056.99
P99 TTFT (ms):                           1236.73
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          25.48
Median TPOT (ms):                        25.66
P99 TPOT (ms):                           25.73
---------------Inter-token Latency----------------
Mean ITL (ms):                           25.65
Median ITL (ms):                         25.53
P99 ITL (ms):                            67.47
==================================================

After optimization

============ Serving Benchmark Result ============
Successful requests:                     16
Failed requests:                         0
Maximum request concurrency:             16
Benchmark duration (s):                  20.38
Total input tokens:                      4080
Total generated tokens:                  16400
Request throughput (req/s):              0.79
Output token throughput (tok/s):         804.65
Peak output token throughput (tok/s):    1008.00
Peak concurrent requests:                16.00
Total token throughput (tok/s):          1004.83
---------------Time to First Token----------------
Mean TTFT (ms):                          697.80
Median TTFT (ms):                        723.32
P99 TTFT (ms):                           725.15
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          19.22
Median TPOT (ms):                        19.23
P99 TPOT (ms):                           19.51
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.23
Median ITL (ms):                         19.32
P99 ITL (ms):                            21.85
==================================================


pengcheng888 · 2026-03-24T03:03:25Z

Please attach a screenshot of the test run.

pengcheng888 · 2026-03-24T03:02:41Z

csrc/pybind11/engine/engine.hpp

        .def(
-            "forward", [](InferEngine &self, const InferEngine::Input &input) -> InferEngine::Output { return self.forward(input); }, "Run inference on all ranks with arbitrary arguments")
+            "forward", [](InferEngine &self, const InferEngine::Input &input) -> InferEngine::Output {
+                py::gil_scoped_release release;


py::gil_scoped_release release; What is this for?

issue/277 perf: batch async dispatch, fix GIL release, fix build_model_inputs

ad2f05d

ma-hang requested review from Ceng23333, pengcheng888 and wooway777 March 24, 2026 02:53

ma-hang linked an issue Mar 24, 2026 that may be closed by this pull request

[DEV] Inference service performance optimization #277

Open

pengcheng888 reviewed Mar 24, 2026


ma-hang wants to merge 1 commit into main from issue/277


2 participants
