What happened:
Running `inference-perf` with a large `system_prompt_len` (25000 tokens) causes the tool to fail with the following error:
File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 388, in readuntil
raise ValueError("Chunk too big")
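This error originates in aiohttp's line-buffered stream reader: `readuntil()` (which backs `readline()` when iterating a streaming response) raises `ValueError("Chunk too big")` once a single line grows past a limit derived from the session's `read_bufsize`, which defaults to 2**16 (64 KiB). Below is a minimal sketch of a likely workaround, assuming the client session can be built with a larger `read_bufsize` (a real `aiohttp.ClientSession` parameter); the endpoint and payload are placeholders, not inference-perf's actual request:

```python
import asyncio

import aiohttp


async def main() -> None:
    # read_bufsize bounds how long a single buffered line may grow
    # before readline()/readuntil() raises "Chunk too big"; the
    # default is 2**16 (64 KiB). 2**20 (1 MiB) here is an arbitrary
    # larger value, not a tuned recommendation.
    async with aiohttp.ClientSession(read_bufsize=2**20) as session:
        # Placeholder endpoint and body, not inference-perf's request.
        async with session.post(
            "http://localhost:8000/v1/completions",
            json={"prompt": "x" * 100_000, "stream": True},
        ) as resp:
            async for line in resp.content:  # reads line by line
                print(line[:80])


asyncio.run(main())
```

If this is the root cause, the fix on inference-perf's side would presumably be to pass a larger `read_bufsize` (or read in fixed-size chunks instead of lines) wherever it constructs its `ClientSession`.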
How to reproduce it (as minimally and precisely as possible):
Use a `shared_prefix` workload with a large system prompt:

```yaml
type: shared_prefix
shared_prefix:
  num_groups: 20
  num_prompts_per_group: 2
  system_prompt_len: 25000
```
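A rough back-of-the-envelope check (assuming ~4 characters per token, a common heuristic for English text, not inference-perf's actual tokenizer) shows why a 25000-token system prompt overshoots aiohttp's default 64 KiB line buffer:

```python
# Assumes ~4 chars/token; the real tokenizer may differ.
CHARS_PER_TOKEN = 4
SYSTEM_PROMPT_LEN = 25_000    # tokens, from the config above
DEFAULT_READ_BUFSIZE = 2**16  # aiohttp's default (64 KiB)

prompt_bytes = SYSTEM_PROMPT_LEN * CHARS_PER_TOKEN
print(f"shared prefix ≈ {prompt_bytes:,} bytes "
      f"vs. {DEFAULT_READ_BUFSIZE:,}-byte line buffer")
print("exceeds buffer:", prompt_bytes > DEFAULT_READ_BUFSIZE)  # True
```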