Implement improved optional optarg parsing for short option.#25

Open
dapimentel wants to merge 2 commits into skeeto:master from dapimentel:improved-optional-optarg

@dapimentel

The following simple patch to examples/long.c demonstrates the undesirable way optional arguments are currently parsed:

diff --git a/examples/long.c b/examples/long.c
index 00993fa..4ba63b9 100644
--- a/examples/long.c
+++ b/examples/long.c
@@ -48,7 +48,12 @@ int main(int argc, char **argv)
         }
     }
 
-    /* Print remaining arguments. */
+    printf("amend : %s\n", amend ? "true" : "false");
+    printf("brief : %s\n", brief ? "true" : "false");
+    printf("color : %s\n", color);
+    printf("delay : %d\n", delay);
+
+    printf("Print remaining arguments:\n");
     while ((arg = optparse_arg(&options)))
         printf("%s\n", arg);

The first use case fails to set the delay value; the 10 is left among the remaining arguments.

%> long.x -d 10 -a bob
amend : true
brief : false
color : white
delay : 1
Print remaining arguments:
10
bob

The second use case sets the delay value.

%> long.x -d10 -a bob
amend : true
brief : false
color : white
delay : 10
Print remaining arguments:
bob

Both use cases should behave the same as the second, regardless of the ridiculous definition for the GNU extension:

Two colons mean an option takes an optional arg; if there is text in the current argv-element
(i.e., in the same word as the option name itself, for example, "-oarg"), then it is returned
in optarg, otherwise optarg is set to zero.

Enabling this preferred behavior is as simple as activating the included patch.
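
To make the difference between the two use cases concrete, here is a minimal hand-rolled sketch of both policies. It is not Optparse's code and not the code from this patch: `optional_arg` and its `lenient` flag are hypothetical names, and the lenient rule (take the next element only when it does not start with `-`) is an assumption about how such a patch would typically behave.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Optparse. argv[i] is a short option
 * such as "-d" or "-d10". Returns the option's optional argument, or NULL
 * if it has none. *next receives the index of the first unconsumed element.
 *
 * lenient == 0: GNU-style rule, only attached text ("-d10") counts.
 * lenient != 0: assumed patched rule, a following element that does not
 *               start with '-' is consumed as the argument as well.
 */
static const char *
optional_arg(char **argv, int i, int lenient, int *next)
{
    assert(argv[i][0] == '-' && argv[i][1] != '\0');

    const char *attached = argv[i] + 2;   /* text after the option letter */
    *next = i + 1;
    if (*attached != '\0')
        return attached;                  /* "-d10": both rules agree */

    if (lenient && argv[i + 1] != NULL && argv[i + 1][0] != '-') {
        *next = i + 2;                    /* "-d 10": consume the next element */
        return argv[i + 1];
    }
    return NULL;                          /* option given without an argument */
}
```

Under the strict rule, `-d 10` yields NULL and leaves `10` as a positional argument, which matches the first transcript above. Under the assumed lenient rule it yields `10`, while `-d -10` still yields NULL because `-10` looks like an option.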

@skeeto
Owner

skeeto commented Feb 8, 2026

Thank you for taking the time and effort to prepare these patches (here and #26), @dapimentel. While I appreciate that the changes are opt-in at compile time, the original behavior is very much deliberate and intended. A few popular argument parsers try to second-guess users' unambiguous intentions, but such behavior is surprising and hazardous. The designers of these parsers are confused. See this discussion on Python argparse. On the plus side, your changes aren't nearly as hazardous as argparse, clap, etc. because the guessing behavior strictly applies to optional option arguments, not to every argument-accepting option.

The hazard comes from passing arguments programmatically, e.g. in a script. For example (long case):

example --delay "$DELAY"

The argument is quoted, so it seems like it ought to be safe to use with an untrusted DELAY variable, right? Nope. Imagine if DELAY=--amend, allowing untrusted/unvalidated inputs to inject extra options. In the presence of a "smart" parser, the only safe way to write this would be:

example --delay="$DELAY"

The hazard is that the first usually works, and so you don't know you've made a mistake. The current (and still the default with your patch) behavior of Optparse will not produce the desired results in the first case regardless of the value of DELAY. The = is mandatory, and getting it wrong usually produces obviously incorrect results.
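
The same concern applies when a C program builds an argument vector for another program. As a minimal sketch, assuming a hypothetical helper named `join_long_option` (not part of Optparse or any other library), the value can be glued to the option name so that both travel in a single argv element:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "--NAME=VALUE" into buf so the value stays in
 * the same argv element as the option name, whatever the value looks like.
 * Returns the length written, or -1 if buf is too small.
 */
static int
join_long_option(char *buf, size_t len, const char *name, const char *value)
{
    assert(buf != NULL && len > 0);

    int n = snprintf(buf, len, "--%s=%s", name, value);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With `name` set to `delay` and an untrusted `value` of `--amend`, the result is the single element `--delay=--amend`, so the receiving parser sees one option with a (probably invalid) argument rather than two options.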

In case it seems far fetched, here's an example I found within about a minute of looking (not trying to pick on this article, these mistakes are common and widespread):

https://til.simonwillison.net/python/uv-tests

uv uses a "smart" option parser, and so nearly every uv invocation in that script is incorrect because the arguments aren't connected with =. If an option argument happened to look like an option, uv would treat it like one. Yet it usually works out anyway by accident, so it's not obvious the script is broken.

It annoys me when software second-guesses my unambiguous inputs, and I don't want to encourage this behavior in my library. I'm still glad you've opened this PR because it's highlighted this important subtlety.

@N-R-K
Contributor

N-R-K commented Feb 8, 2026

Also worth noting that if -d accepted negative numbers then intuitively -d -10 should work but it won't work even with this patch.

The GNU definition isn't ridiculous, it's a simple case of (wisely) choosing to avoid ambiguous grammar.
