[PERF CHECK ONLY] Add inline to trivial cross-crate accessors#156343
[PERF CHECK ONLY] Add inline to trivial cross-crate accessors#156343traviscross wants to merge 5 commits intorust-lang:mainfrom
inline to trivial cross-crate accessors#156343Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…s, r=<try> [PERF CHECK ONLY] Add `inline` to trivial cross-crate accessors
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (6000502): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 1.4%, secondary 0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 2.8%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 498.048s -> 501.772s (0.75%) |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Small accessor functions aren't inlined across crate boundaries in all build configurations. Even with thin LTO, PGO, etc., the backend may be limited in what it's able and willing to inline without these annotations. In detailed profiling, the `inline` attributes being added in this PR seemed to make a difference. Let's add them. Notably, some of these fall within the expansion of the `newtype_index!` macro and will apply to the items it defines. Being in the macro might explain (if later profiling confirms this win) why these weren't noticed earlier. This commit covers the cases where the auto-inline heuristic from Rust PR 116505 is unlikely to work.
Some trivial accessors have a single call, like
`fn def_id(self) -> DefId { self.did() }`. After MIR inlining the
inner call, the auto-inline heuristic from Rust PR 116505 might
already cover these, but the heuristic is conservative. Let's
annotate explicitly and profile.
These accessors have pure field/index accesses -- no calls. The auto-inline heuristic from Rust PR 116505 should already cover these. Let's annotate to confirm that in profiling. Maybe or maybe not we'd want to add these anyway as documentation of intent and a safety net against later changes that would cause the heuristic to fail.
4873dc4 to
596a06f
Compare
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…s, r=<try> [PERF CHECK ONLY] Add `inline` to trivial cross-crate accessors
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…s, r=<try> [PERF CHECK ONLY] Add `inline` to trivial cross-crate accessors
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (c51601b): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (secondary 1.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 497.93s -> 502.373s (0.89%) |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…s, r=<try> [PERF CHECK ONLY] Add `inline` to trivial cross-crate accessors
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (49c68cb): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -2.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 497.416s -> 498.988s (0.32%) |
This comment has been minimized.
This comment has been minimized.
8b1a8c1 to
c7e222c
Compare
|
@bors try parent=fb0a5a5a9c892b351f34263d6d84da9dde72871a @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…s, r=<try> [PERF CHECK ONLY] Add `inline` to trivial cross-crate accessors
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (7534ee6): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 1.7%, secondary 2.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 10.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 497.93s -> 498.532s (0.12%) |
View all comments
Note
This is a draft for running perf.
Small accessor functions aren't inlined across crate boundaries in all build configurations. Even with thin LTO, PGO, etc., the backend may be limited in what it's able and willing to inline without these annotations.
In detailed profiling, the
inlineattributes being added in this PR seemed to make a difference. Let's add them.Notably, some of these fall within the expansion of the
newtype_index!macro and will apply to the items it defines.r? @traviscross