Description
What happened:
The current implementation of HFShareGPTDataSource loads the dataset using itertools.cycle(). This causes two major issues:
- itertools.cycle is not picklable, so multiprocessing workers cannot receive or use it. This breaks benchmarks that use multiple workers and produces errors such as pickling failures or duplicated data sequences.
- When a HuggingFace dataset is loaded with streaming=True, it is backed by a generator, which is also not picklable. Combining cycle() with a streaming dataset makes it impossible for multiprocessing to work correctly: some workers restart the iterator while others receive empty data, resulting in inconsistent or incorrect benchmarking behavior.
Because of these two issues, the ShareGPT data loader fails under multiprocessing, producing runtime errors or incorrectly repeated samples.
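As a minimal, standalone illustration (not code from the repo), wrapping a generator-backed streaming dataset in itertools.cycle() yields an object that the spawn-based worker start cannot pickle; fake_streaming_dataset below is a stand-in for a dataset loaded with streaming=True:

```python
import itertools
import pickle

def fake_streaming_dataset():
    # Stand-in for a HuggingFace dataset loaded with streaming=True,
    # which is backed by a generator under the hood.
    yield {"conversations": [{"from": "human", "value": "hello"}]}

data_iter = itertools.cycle(fake_streaming_dataset())

try:
    # multiprocessing's spawn start method pickles worker arguments the same way
    pickle.dumps(data_iter)
except TypeError as err:
    print(err)  # -> cannot pickle 'generator' object
```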
What you expected to happen:
The dataset loader should work correctly under multiprocessing configurations, such as the shared_gpt_multi_turn example; however, the current implementation fails.
How to reproduce it (as minimally and precisely as possible):
- Use a ShareGPT dataset path with streaming=True (this alone is sufficient to reproduce).
- Any HuggingFace dataset load triggers the failure.
- Run inference-perf with num_workers > 1 (the multiprocessing issue).
inference-perf --config_file examples/vllm/config.yml
2025-12-02 00:26:07,449 - inference_perf.client.filestorage.local - INFO - Report files will be stored at: reports-20251202-002604
Traceback (most recent call last):
File "/Users/sneh.lata/Documents/openSource/inference-perf/.venv/bin/inference-perf", line 10, in <module>
sys.exit(main_cli())
^^^^^^^^^^
File "/Users/sneh.lata/Documents/openSource/inference-perf/inference_perf/main.py", line 319, in main_cli
perfrunner.run()
File "/Users/sneh.lata/Documents/openSource/inference-perf/inference_perf/main.py", line 87, in run
asyncio.run(_run())
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/asyncio/base_events.py", line 686, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/sneh.lata/Documents/openSource/inference-perf/inference_perf/main.py", line 85, in _run
await self.loadgen.run(self.client)
File "/Users/sneh.lata/Documents/openSource/inference-perf/inference_perf/loadgen/load_generator.py", line 530, in run
return await self.mp_run(client)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sneh.lata/Documents/openSource/inference-perf/inference_perf/loadgen/load_generator.py", line 478, in mp_run
self.workers[-1].start()
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/sneh.lata/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'generator' object
**Anything else we need to know?**:
I attempted a workaround for this. I understand the need for itertools.cycle() in the first place: it provides an infinite iterator over the dataset, which is useful for long-running benchmarks that should never exhaust their data.
I implemented a workaround that resolves the multiprocessing breakage:
- I replaced itertools.cycle() with a manual cycling mechanism over a list, but this will not scale to very large datasets. A more robust approach, perhaps using threads, could prevent race conditions in multiprocessing scenarios; I am still thinking through a more ideal solution.
- I also disabled HuggingFace streaming mode and loaded the entire dataset into a list, which is picklable and safe for multiprocessing.
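A rough sketch of what this workaround looks like (the class and method names below are hypothetical, not the actual HFShareGPTDataSource implementation):

```python
# Hypothetical sketch of the workaround: materialize the dataset in memory
# (streaming=False) and cycle over it by index; both the list and the integer
# index pickle cleanly, so workers can receive the data source.
from datasets import load_dataset

class ListCyclingShareGPTSource:
    def __init__(self, path: str, split: str = "train") -> None:
        # streaming=False returns an in-memory dataset instead of a generator
        dataset = load_dataset(path, split=split, streaming=False)
        self._records = list(dataset)
        self._index = 0

    def next_record(self) -> dict:
        # Manual cycling: wrap around instead of relying on itertools.cycle()
        record = self._records[self._index % len(self._records)]
        self._index += 1
        return record
```

This preserves the infinite-iteration behavior, but at the cost of holding the full dataset in memory, which is why it does not scale to very large datasets.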
Environment:
- inference-perf version: 0.0.1
- config.yml (the full config printed by the benchmark run):
api:
type: chat
streaming: false
headers: null
data:
type: shareGPT
path: null
input_distribution: null
output_distribution: null
shared_prefix: null
trace: null
load:
type: constant
interval: 1.0
stages:
- !!python/object:inference_perf.config.StandardLoadStage
__dict__:
rate: 1.0
duration: 30
num_requests: null
concurrency_level: null
__pydantic_extra__: null
__pydantic_fields_set__: !!set
rate: null
duration: null
__pydantic_private__: null
sweep: null
num_workers: 12
worker_max_concurrency: 100
worker_max_tcp_connections: 2500
trace: null
circuit_breakers: []
request_timeout: null
metrics: null
report:
request_lifecycle:
summary: true
per_stage: true
per_request: false
prometheus:
summary: true
per_stage: false
storage:
local_storage:
path: reports-20251202-002604
report_file_prefix: null
google_cloud_storage: null
simple_storage_service: null
server:
type: vllm
model_name: <model_name>
base_url: <base_url>
api_key: <api_key>
ignore_eos: false
tokenizer:
pretrained_model_name_or_path: meta-llama/Llama-3.2-1B-Instruct
circuit_breakers: null
- cloud provider or hardware configuration: custom provider; tested on a Linux VM with the model running on an H100.
- others: