Description
Describe the bug
The summarize() function in scripts/performance/benchmark_utils.py mutates the argument list used to invoke the summarization script.
Inside the function, a list summarize_args is created and used for the first subprocess.check_call() invocation. Before the second invocation, the same list is modified using .extend():
    summarize_args.extend(['--output-format', 'json'])
Because the same list object is reused, the function mutates the original argument list. This can cause duplicated CLI arguments if the function is reused or invoked multiple times in the same process.
This behavior introduces unintended side effects and makes the function non-idempotent.
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
summarize() should construct a fresh argument list for each subprocess call so that the original list of arguments remains unchanged.
Each subprocess invocation should receive only the arguments required for that specific command execution.
For example:
- First call should receive arguments for generating the text summary.
- Second call should receive arguments for generating the JSON summary without mutating the original list.
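One way to express this expectation is a small helper that returns a separate list for each invocation. This is only a sketch: `build_summary_commands` and the placeholder base arguments are hypothetical names introduced for illustration and are not taken from benchmark_utils.py; only the `--output-format json` flag comes from this report.

```python
def build_summary_commands(base_args):
    """Return (text_args, json_args) as two independent lists.

    Hypothetical helper for illustration; not part of benchmark_utils.py.
    """
    # list(...) makes a copy, so the caller's list is never touched.
    text_args = list(base_args)
    # Concatenation with + also produces a new list object.
    json_args = list(base_args) + ['--output-format', 'json']
    return text_args, json_args


base = ['summarize', 'perf.csv']  # placeholder arguments
text_args, json_args = build_summary_commands(base)
print(text_args)
print(json_args)
print(base)  # unchanged after building both commands
```

Because each command gets its own list, calling the helper any number of times with the same `base` yields the same two commands.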
Current Behavior
The same list object is reused and mutated:
    summarize_args.extend(['--output-format', 'json'])
This modifies the argument list for the remainder of the function's scope.
If summarize() is invoked again within the same process (for example in extended benchmarking workflows or reused utilities), the argument list may contain duplicated flags such as:
--output-format json --output-format json
This may lead to unexpected behavior depending on how the summarization script parses arguments.
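To make "depending on how the script parses arguments" concrete, the sketch below feeds the duplicated flags to two `argparse` parsers. It assumes the summarization script uses Python's `argparse`, which this report does not confirm; both parsers are hypothetical and exist only to show how the outcome varies with the option's action.

```python
import argparse

# Hypothetical parsers; the real summarization script may be set up differently.
store_parser = argparse.ArgumentParser()
store_parser.add_argument('--output-format', default='text')

append_parser = argparse.ArgumentParser()
append_parser.add_argument('--output-format', action='append')

duplicated = ['--output-format', 'json', '--output-format', 'json']

store_result = store_parser.parse_args(duplicated)
append_result = append_parser.parse_args(duplicated)

# With the default "store" action the last occurrence wins.
print(store_result.output_format)
# With "append" every occurrence is collected into a list.
print(append_result.output_format)
```

With the default action the duplication is harmless, while an `append`-style option would see two values, so the effect depends on the parser configuration.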
Reproduction Steps
Example simplified reproduction demonstrating the mutation behavior:
    def test_mutation():
        summarize_args = ["script", "file1.csv"]
        # First command
        print("First call:", summarize_args)
        # Mutate list
        summarize_args.extend(["--output-format", "json"])
        # Second command
        print("Second call:", summarize_args)
        # Simulate reuse
        summarize_args.extend(["--output-format", "json"])
        print("Third call:", summarize_args)

    test_mutation()
Output:
First call: ['script', 'file1.csv']
Second call: ['script', 'file1.csv', '--output-format', 'json']
Third call: ['script', 'file1.csv', '--output-format', 'json', '--output-format', 'json']
This illustrates how the argument list grows due to mutation.
Possible Solution
Instead of mutating the original argument list, construct a new list for the JSON summary call.
Example fix:
    with open(os.path.join(summary_dir, 'summary.txt'), 'wb') as f:
        subprocess.check_call(summarize_args, stdout=f)
    with open(os.path.join(summary_dir, 'summary.json'), 'wb') as f:
        json_args = summarize_args + ['--output-format', 'json']
        subprocess.check_call(json_args, stdout=f)
This preserves the original argument list and avoids side effects.
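A regression check for the fix could look like the following sketch. `run_both_summaries` is a hypothetical stand-in for the two calls inside summarize() (file handling is omitted to keep it self-contained), and `subprocess.check_call` is patched with `unittest.mock` so no real process is started.

```python
import subprocess
from unittest import mock


def run_both_summaries(summarize_args):
    """Hypothetical stand-in for the two calls inside summarize()."""
    subprocess.check_call(summarize_args)
    # Build a new list instead of extending summarize_args in place.
    json_args = summarize_args + ['--output-format', 'json']
    subprocess.check_call(json_args)


def check_args_not_mutated():
    args = ['summarize', 'perf.csv']  # placeholder arguments
    original = list(args)
    # Patch check_call so that no real process is started.
    with mock.patch('subprocess.check_call') as fake_call:
        run_both_summaries(args)
        run_both_summaries(args)  # simulate reuse in the same process
    assert args == original
    # Collect the argument list passed to each recorded call.
    return [call.args[0] for call in fake_call.call_args_list]


recorded = check_args_not_mutated()
print(recorded)
```

If the second call were built with `.extend()` instead, the `args == original` assertion would fail after the first run.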
Additional Information/Context
This issue exists in internal benchmarking utilities used under scripts/performance. While these scripts are primarily developer tooling, ensuring deterministic command construction helps prevent subtle errors in automated benchmarking workflows or CI environments.
Avoiding mutation of shared argument lists is also consistent with common CLI invocation best practices.
CLI version used
Not applicable; the issue is located in repository tooling (scripts/performance).
Environment details (OS name and version, etc.)
- OS: Linux / macOS (reproducible on any OS)
- Python: 3.x
- Repository: aws/aws-cli
- Path: scripts/performance/benchmark_utils.py