Make str.format more Pythonic#3412
Conversation
`str.format` has a number of surprising behaviours that aren't Pythonic:
* Numerical replacement fields use a different syntax to named
replacement fields (i.e. `${0}`).
* Automatic and manual replacement field names can be used within the
same format string.
* Replacement field names are passed through if the corresponding
keyword argument is undefined.
* Format strings are allowed to contain unbalanced delimiters.
Some of these were introduced in thought-machine#3146, along with some regressions that
quietly broke substitution and delimiter escaping (see thought-machine#3356).
Make Please's `str.format` implementation more Pythonic by aligning it
with [Bazel's](https://bazel.build/rules/lib/core/string#format):
* Permit the use of by-order, by-position or by-name replacement field
referencing, but do not allow them to be combined in the same format
string.
* Parse all occurrences of `{` and `}` in the format string as if they
are delimiters, except for those escaped via `{{` and `}}` - do not
pass them through to the return value, and raise errors if they appear
inappropriately.
* Perform more robust checks on replacement field references.
The new error messages are intentionally reminiscent of those returned
by Python's `str.format`, and are identical in some cases.
The correctness and robustness of the new implementation comes at the
cost of performance - it is hard to compare them fairly against the
degenerate case in `src/parse/asp/builtins_bench_test.go`, but in the
typical case, the new implementation is about 75% slower and performs
four times more allocations:
```
cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12 5117605 257.0 ns/op 80 B/op 2 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12 512914 2474 ns/op 7568 B/op 3 allocs/op
PASS
cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12 3261982 454.1 ns/op 176 B/op 8 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12 358680 9921 ns/op 7712 B/op 12 allocs/op
PASS
```
| self = self[start+2:] | ||
| continue | ||
| } | ||
| // We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter. |
There was a problem hiding this comment.
Is "field name" the right word here (given it may be {} or {1})? Looks like"replacement field" is the terminology used by python
There was a problem hiding this comment.
You're right, this was a terminology mix-up - according to the str.format grammar in the Python docs, "replacement field" is "field name" plus some other stuff that isn't relevant to Please because we don't implement it, and at this point we haven't yet parsed the opening { so what we're actually attempting to match here is a replacement field, not the field name inside it.
| return newPyInt(strings.LastIndex(string(self), string(needle))) | ||
| } | ||
|
|
||
| // strFormat implements the str.format function. It interpolates a format string using the arguments passed to |
There was a problem hiding this comment.
Just to check, is this used both for f-strings and explicit calls to .format()?
There was a problem hiding this comment.
Only str.format - f-strings use a separate data structure and are parsed by parseFString in src/parse/asp/grammar_parse.go.
| buf.WriteString(args[arg].String()) | ||
| arg++ | ||
| } else if val, present := s.locals[key]; present { | ||
| // Extract the replacement field's name, excluding the delimiters. |
There was a problem hiding this comment.
| // Extract the replacement field's name, excluding the delimiters. | |
| // Extract the replacement field's name or index if present, excluding the delimiters. |
There was a problem hiding this comment.
Made the language more precise here.
| // be more of them than there are positional arguments (although there may ultimately be fewer). Internally, the | ||
| // first positional argument is the format string - the positional arguments from the caller's perspective are | ||
| // shifted one to the right in args. | ||
| s.Assert(fieldName == "", "cannot switch from automatic field numbering to manual field specification") |
There was a problem hiding this comment.
I am concerned that this error message will be very unclear to somebody who is unfamiliar with how this is implemented - I think a better phrasing would be something like "cannot mix replacement field specification types in one format string", and possibly include examples of the mismatched replacement fields
There was a problem hiding this comment.
The subtlety is that if fieldName == "" here, the user didn't choose a field specification type - their first replacement field was {}, so they're relying on ASP to implicitly number the replacement fields - and so referring to field specification types in the error message is likely to be misleading. The error message is (intentionally and shamelessly) ripped from Python, so anyone who has used format strings in Python should recognise it - it's even Googleable for those who don't.
| case strFormatByName: | ||
| // With named replacement fields, output the string value of the keyword argument with the given name. | ||
| s.Assert(fieldName != "", "must use named replacement fields with keyword arguments") | ||
| val, exists := s.locals[fieldName] |
There was a problem hiding this comment.
are keyword arguments implemented as locals?
There was a problem hiding this comment.
Yep - this is unchanged from before:
please/src/parse/asp/builtins.go
Line 656 in 915012a
| s, err := parseString(fmt.Sprintf(`ret = "%s".format(%s)`, test.FormatStr, test.Args)) | ||
| if test.Error == "" { | ||
| assert.NotNil(t, s) | ||
| assert.NoError(t, err) |
There was a problem hiding this comment.
nit: I think we should assert.NoError first
There was a problem hiding this comment.
Agreed - and it should be require.NoError, not assert.NoError.
| "Dangling opening delimiter at start of string": { | ||
| FormatStr: `{{} {} {}`, | ||
| Args: `"one", "two", "three"`, | ||
| Error: "single '}' encountered in format string", |
There was a problem hiding this comment.
Can we include the position in this error?
| // Find the corresponding "}" delimiter... | ||
| end := strings.IndexByte(self[start+1:], '}') | ||
| // ...and if there isn't one, this must be a malformed format string. | ||
| s.Assert(end != -1, "single '{' encountered in format string") |
There was a problem hiding this comment.
| s.Assert(end != -1, "single '{' encountered in format string") | |
| s.Assert(end != -1, "unmatched and unescaped '{' encountered in format string") |
| continue | ||
| } | ||
| // We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter. | ||
| s.Assert(self[start] == '{', "single '}' encountered in format string") |
There was a problem hiding this comment.
| s.Assert(self[start] == '{', "single '}' encountered in format string") | |
| s.Assert(self[start] == '{', "unmatched and unescaped '}' encountered in format string") |
| "Numerical field name": { | ||
| FormatStr: `1={one} 2={2} 0={zero}`, | ||
| Args: `zero="a", one="b", two="c"`, | ||
| Error: "unspecified keyword argument '2'", |
There was a problem hiding this comment.
It'd be good to special-case this to get the more meaningful error about mixing format types
str.formathas a number of surprising behaviours that aren't Pythonic:${0}).Some of these were introduced in #3146, along with some regressions that quietly broke substitution and delimiter escaping (see #3356).
Make Please's
str.formatimplementation more Pythonic by aligning it with Bazel's:{and}in the format string as if they are delimiters, except for those escaped via{{and}}- do not pass them through to the return value, and raise errors if they appear inappropriately.The new error messages are intentionally reminiscent of those returned by Python's
str.format, and are identical in some cases.The correctness and robustness of the new implementation comes at the cost of performance - it is hard to compare them fairly against the degenerate case in
src/parse/asp/builtins_bench_test.go, but in the typical case, the new implementation is about 75% slower and performs four times more allocations: