
Maximum token limit per request causes failure with large system_prompt_len #288

@kfirtoledo

Description

What happened:
Running inference-perf with a large system_prompt_len (25000 tokens) causes the tool to fail with the following error:

File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 388, in readuntil
    raise ValueError("Chunk too big")
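
The traceback originates in aiohttp's StreamReader.readuntil(), which raises ValueError("Chunk too big") when a single delimited chunk (for example, one line of a streamed response) exceeds the reader's buffer limit. Below is a minimal workaround sketch, assuming the oversized line is hit while reading the response stream line by line and that the client session can be created with a larger read_bufsize; the URL, payload, and function name are illustrative placeholders, not inference-perf internals:

import asyncio

import aiohttp


async def stream_completion(url: str, payload: dict) -> None:
    # aiohttp's default read_bufsize is 2**16 (64 KiB); any single line larger
    # than that makes StreamReader.readuntil() raise ValueError("Chunk too big").
    # Raising the limit on the session avoids the error.
    async with aiohttp.ClientSession(read_bufsize=2**20) as session:  # 1 MiB
        async with session.post(url, json=payload) as resp:
            async for line in resp.content:  # line-by-line reads go through readuntil()
                if line.strip():
                    pass  # parse the streamed chunk here


# Hypothetical endpoint and payload, for illustration only:
# asyncio.run(stream_completion(
#     "http://localhost:8000/v1/completions",
#     {"prompt": "...", "max_tokens": 16, "stream": True},
# ))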

How to reproduce it (as minimally and precisely as possible):
Use a shared_prefix workload with a large system prompt:

type: shared_prefix
shared_prefix:
  num_groups: 20
  num_prompts_per_group: 2
  system_prompt_len: 25000
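
For rough scale (assuming about 4 characters per token, a heuristic rather than a measurement), a 25000-token shared prefix is on the order of 100 KB of text; if any single streamed line ends up carrying text of that size, it exceeds aiohttp's default 64 KiB read buffer:

SYSTEM_PROMPT_LEN = 25_000      # tokens, from the config above
AVG_CHARS_PER_TOKEN = 4         # rough heuristic, not measured
DEFAULT_READ_BUFSIZE = 2 ** 16  # aiohttp default limit (64 KiB)

approx_bytes = SYSTEM_PROMPT_LEN * AVG_CHARS_PER_TOKEN
print(f"~{approx_bytes / 1024:.0f} KiB of prompt text vs {DEFAULT_READ_BUFSIZE // 1024} KiB buffer")
# -> ~98 KiB of prompt text vs 64 KiB buffer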

Metadata

Labels

kind/bug: Categorizes issue or PR as related to a bug.
