We mark each build with its line number for use in error messages, but computing the line number requires keeping track of each newline we encounter.
Some experiments:
- if I comment out that logic, parsing is about 10% faster
- we also store an Rc on each build to hold the filename (error messages look like foo.ninja:123: some error), but that appeared to have ~no impact on perf -- I imagine it's just a trivial increment of a non-atomic refcount for each build statement
- another experiment I tried was to compute line numbers only when needed: each time we need a line number, scan forwards from the position where we last computed one, count the newlines in between, and cache the result. This didn't help.
A trick I learned from LLVM is to instead annotate each build with its byte offset in the file when parsing, which is cheaper to gather. Then if we encounter an error we can spend the time to re-scan the file for newlines and turn the offset back into a line number. That scan is slow, but it's milliseconds slow, which is fine in an error message codepath.
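As a rough sketch of the offset-to-line step (the function name `line_for_offset` is hypothetical and not taken from any existing code), counting the newlines that precede the offset is enough:

```rust
/// Hypothetical helper: map a byte offset in `text` to a 1-based line number
/// by counting the newlines that come before it. This is O(offset), which is
/// acceptable if it only runs on the error path.
fn line_for_offset(text: &[u8], offset: usize) -> usize {
    // Clamp so an out-of-range offset reports the last line instead of panicking.
    let end = offset.min(text.len());
    1 + text[..end].iter().filter(|&&b| b == b'\n').count()
}

fn main() {
    let text = b"rule cc\n  command = gcc\nbuild foo.o: cc foo.c\n";
    // During parsing we would record only this offset for the build statement.
    let offset = text
        .windows(5)
        .position(|w| w == b"build")
        .expect("sample text contains a build statement");
    // Only when reporting an error do we pay for the newline scan.
    println!("foo.ninja:{}: some error", line_for_offset(text, offset));
}
```

The parser itself then only has to store one integer per build statement and never has to look at newlines for bookkeeping purposes.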
Unfortunately doing this would mean we need to either keep around the build file text at runtime, or re-read the file when generating an error message.
If we kept the file text at runtime:
- the text is like tens of megabytes for the biggest build I currently have (LLVM CMake) but I believe the Android build file text might be much larger(?). Maybe it's fine to waste some memory on it, but it feels pretty wasteful...
- there are a couple of places where we could avoid allocating a copy of some strings found within the file, though I think those are pretty minor.
If we re-read the file when generating error messages:
- maybe this is fine? I didn't investigate, feels like kind of a rabbit hole
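For what it's worth, a minimal sketch of the re-read variant might look like the following. The name `format_error` and its signature are made up for illustration, and it assumes each build statement keeps its file path and byte offset:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error formatter: re-reads the build file at error time and
/// converts the stored byte offset into a 1-based line number.
fn format_error(path: &Path, offset: usize, msg: &str) -> io::Result<String> {
    // This read only happens on the error path, never during normal parsing.
    let text = fs::read(path)?;
    let end = offset.min(text.len());
    let line = 1 + text[..end].iter().filter(|&&b| b == b'\n').count();
    Ok(format!("{}:{}: {}", path.display(), line, msg))
}

fn main() -> io::Result<()> {
    // Write a small sample file so the example is self-contained.
    let path = std::env::temp_dir().join("offset_demo.ninja");
    fs::write(&path, "rule cc\n  command = gcc\nbuild foo.o: cc foo.c\n")?;
    // 24 is the byte offset of the `build` line in the sample above.
    println!("{}", format_error(&path, 24, "some error")?);
    Ok(())
}
```

One thing this sketch does not handle: if the file is modified between parsing and error reporting, the reported line number can be wrong, and the read itself can fail, so a real version would need a fallback message.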