Average multiple microbenchmark results #5215
Pull request overview
This PR updates the GC microbenchmark infrastructure to support aggregating (averaging) results across multiple microbenchmark runs/iterations, while also renaming/refactoring parts of the analysis/presentation pipeline and introducing an outlier-removal helper.
Changes:
- Add a configurable microbenchmark iteration count (`iterations`) and wire it into suite creation and execution.
- Replace the previous single-result comparison flow with a new per-benchmark aggregation/comparison pipeline (`MicrobenchmarkResultComparison`, `GCTraceMetrics`, `GCTraceMetricComparisonResult`).
- Refactor output generation to primarily emit JSON (markdown generation is currently disabled).
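The core of the aggregation change is grouping per-iteration results by benchmark name and averaging each metric. The PR does this in C# inside the new comparison pipeline; the sketch below is a hypothetical Python stand-in (the `(name, metrics)` tuple shape is an assumption, not the PR's `MicrobenchmarkResult` type) just to make the grouping/averaging step concrete:

```python
from collections import defaultdict
from statistics import mean

def average_by_benchmark(results):
    """Group per-iteration results by benchmark name and average each metric.

    `results` is a list of (benchmark_name, {metric: value}) pairs, one pair
    per iteration. This shape is a hypothetical stand-in for the PR's
    MicrobenchmarkResult objects.
    """
    grouped = defaultdict(list)
    for name, metrics in results:
        grouped[name].append(metrics)

    averaged = {}
    for name, iterations in grouped.items():
        metric_names = iterations[0].keys()
        # Average each metric across all iterations of this benchmark.
        averaged[name] = {m: mean(it[m] for it in iterations) for m in metric_names}
    return averaged
```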
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/RunCommand/CreateSuiteCommand.cs | Reads configured iteration count and applies it to microbenchmark suite environment. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/RunCommand/BaseSuite/MicrobenchmarksToRun.txt | Updates baseline suite benchmark list. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/RunCommand/BaseSuite/Microbenchmarks.yaml | Renames environment iteration setting to iterations. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/Microbenchmark/MicrobenchmarkCommand.cs | Runs microbenchmarks for iterations and switches to new aggregation/comparison logic before presenting results. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure/Commands/Microbenchmark/MicrobenchmarkAnalyzeCommand.cs | Updates analysis-only command to use the new aggregation/comparison logic. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Presentation.cs | Changes presentation API to accept precomputed grouped results; markdown output path currently disabled. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Markdown.cs | Markdown generation code is commented out. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Json/JsonOutput.cs | Removes unused placeholder type. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Presentation/Microbenchmarks/Json.cs | Moves JSON generator to Microbenchmarks presentation namespace and updates signature for grouped results. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Configurations/Microbenchmarks.Configuration.cs | Renames iteration to iterations in microbenchmark environment configuration. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Configurations/InputConfiguration.cs | Adds iterations map to input configuration. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/Microbenchmarks/MicrobenchmarkResultsAnalyzer.cs | Removes old analyzer/comparison pipeline. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/Microbenchmarks/MicrobenchmarkResultComparison.cs | Adds new JSON/trace mapping, per-benchmark analysis, and aggregation/grouping logic. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/Microbenchmarks/MicrobenchmarkResult.cs | Introduces new MicrobenchmarkResult model (namespace currently mismatched vs usage). |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/Microbenchmarks/MicrobenchmarkComparisonResult.cs | Updates comparison to support averaged values/outlier removal and new trace-metric comparisons. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/GCTraceMetrics.cs | Adds trace-derived metric extraction (includes reflection/stat bugs). |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/GCTraceMetricComparisonResult.cs | Adds averaged comparison for trace metrics (baseline vs comparand). |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/GCTraceMetricComparison.cs | Adds helper wrapper for metric comparison construction. |
| src/benchmarks/gc/GC.Infrastructure/GC.Infrastructure.Core/Analysis/BdnJsonResult.cs | Refactors BDN JSON model types; renames top-level to BdnJsonResult. |
| src/benchmarks/gc/GC.Infrastructure/GC.Analysis.API/Statistics.cs | Adds RemoveOutliers helper (IQR method). |
| src/benchmarks/gc/GC.Infrastructure/Configurations/Run.yaml | Adds iteration configuration block (currently mismatched with new iterations input model). |
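The table above mentions a `RemoveOutliers` helper using the IQR method (in `GC.Analysis.API/Statistics.cs`). The C# implementation isn't shown in this excerpt; the following Python sketch illustrates the standard IQR rule (drop values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR], Tukey's fences). The quartile interpolation used here is an assumption — the PR's helper may use a different quartile convention:

```python
def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Quartiles use simple linear interpolation; the actual helper in the PR
    may compute them differently.
    """
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles meaningfully
    s = sorted(values)

    def quantile(q):
        pos = (len(s) - 1) * q
        lo, hi = int(pos), min(int(pos) + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]
```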
```csharp
if (format == "markdown")
{
    Markdown.GenerateTable(configuration, comparisonResults, executionDetails, Path.Combine(configuration.Output.Path, "Results.md"));
    //Markdown.GenerateTable(configuration, comparisonResultsGroupedByName, executionDetails, Path.Combine(configuration.Output.Path, "Results.md"));
```
```csharp
PauseDurationMSec_MeanWhereIsEphemeral =
    GoodLinq.Average(GoodLinq.Where(processData.GCs, (gc => gc.Generation == 1 || gc.Generation == 0)), (gc => gc.PauseDurationMSec));
PauseDurationSeconds_SumWhereIsGen1 =
    GoodLinq.Sum(GoodLinq.Where(processData.GCs, (gc => gc.Generation == 1)), (gc => gc.PauseDurationMSec));
```
```csharp
IReadOnlyList<MicrobenchmarkComparisonResults> comparisonResults = MicrobenchmarkResultsAnalyzer.GetComparisons(configuration);
Presentation.Present(configuration, new()); // Execution details aren't available for the analysis-only mode.
```
```csharp
Run run = configuration.Runs.Values.FirstOrDefault();
```
```csharp
Presentation.Present(configuration, comparisonResultsGroupedName, new()); // Execution details aren't available for the analysis-only mode.
Directory.SetCurrentDirectory(currentDirectory);
AnsiConsole.Markup($"[bold green] ({DateTime.Now}) Wrote Microbechmark Results to: {Markup.Escape(Path.Combine(configuration.Output.Path, "Results.md"))} [/]");
```
```text
"System.Tests.Perf_GC<Byte>.NewOperator_Array(length: 10000)"
"System.Tests.Perf_GC<Char>.NewOperator_Array(length: 1000)"
"System.Tests.Perf_GC<Char>.NewOperator_Array(length: 10000)"
"System.IO.Tests.Perf_File.ReadAllBytesAsync(size: 104857600)"
```
The benchmark result is written to `<run name>/<datetime string>/<Namespace>.-report-full.json`, while the trace file is collected at `<run name>/<full name>_<idx>.etl.zip`. A duplicated benchmark therefore leads to a mismatch between the JSON and trace files.
```csharp
namespace GC.Infrastructure.Core.Analysis
{
    public sealed class GCTraceMetrics
```
Rename `ResultItem` to `GCTraceMetrics` and move it to the `GC.Infrastructure.Core.Analysis` namespace, since it's used by both gcperfsim and microbenchmarks.
```csharp
namespace GC.Infrastructure.Core.Analysis
{
    public sealed class GCTraceMetricComparisonResult
```
Rename `ComparisonResult` to `GCTraceMetricComparisonResult` and move it to `GC.Infrastructure.Core.Analysis`.
```csharp
{
    public sealed class MicrobenchmarkResult
```
Since it's a data transfer object, "BdnJsonResult" is more explicit.
```csharp
string[] jsonFiles = Directory.GetFiles(outputPathForRun, "*full.json", SearchOption.AllDirectories);

Parallel.ForEach(jsonFiles, (jsonFile) =>
{
    BdnJsonResult results = JsonConvert.DeserializeObject<BdnJsonResult>(File.ReadAllText(jsonFile));
    string fullName = results.Benchmarks.FirstOrDefault()?.FullName;
    benchmarkFullNameJsonMap[fullName] = benchmarkFullNameJsonMap.GetValueOrDefault(fullName, new());
    benchmarkFullNameJsonMap[fullName].Add(jsonFile);
});
```
```csharp
string[] sortedJsonFiles = jsonTraceMap.Keys
    .OrderBy(jsonFile => Path.GetFileName(Path.GetDirectoryName(jsonFile)))
    .ToArray();

string traceFileNameTemplate = _benchmarkNameToTraceFilePatternMap[benchmarkFullName];

string[] sortedTraceFiles = Enumerable.Where(Directory.GetFiles(outputPathForRun, "*.etl.zip", SearchOption.TopDirectoryOnly), traceFile =>
        Path.GetFileName(traceFile).ToLower().Contains(traceFileNameTemplate.ToLower()))
    .OrderBy(traceFile => traceFile)
```
```csharp
// If property isn't found on the GCTraceMetrics, look in GCStats.
// TODO: Add the case where we look into the map.
else
{
    pInfo = typeof(GCStats).GetProperty(metricName, BindingFlags.Instance | BindingFlags.Public);
    if (pInfo == null)
    {
        FieldInfo fieldInfo = typeof(GCStats).GetField(metricName, BindingFlags.Instance | BindingFlags.Public);
        if (fieldInfo == null)
        {
            // Out of luck!
            OriginalBaselineMetricCollection = Array.Empty<double>();
            OriginalComparandMetricCollection = Array.Empty<double>();
            OutliersFreeBaselineMetricCollection = Array.Empty<double>();
            OutliersFreeComparandMetricCollection = Array.Empty<double>();
            AveragedBaselineMetric = double.NaN;
            AveragedComparandMetric = double.NaN;
            return;
        }
        else
        {
            OriginalBaselineMetricCollection = GoodLinq.Select(baselines, baseline => (double)fieldInfo.GetValue(baseline));
            OriginalComparandMetricCollection = GoodLinq.Select(comparands, comparand => (double)fieldInfo.GetValue(comparand));
        }
    }
    else
    {
        OriginalBaselineMetricCollection = GoodLinq.Select(baselines, baseline => (double)pInfo.GetValue(baseline));
        OriginalComparandMetricCollection = GoodLinq.Select(comparands, comparand => (double)pInfo.GetValue(comparand));
    }
```
```csharp
string[] sortedTraceFiles = Enumerable.Where(Directory.GetFiles(outputPathForRun, "*.etl.zip", SearchOption.TopDirectoryOnly), traceFile =>
        Path.GetFileName(traceFile).ToLower().Contains(traceFileNameTemplate.ToLower()))
    .OrderBy(traceFile => traceFile)
    .ToArray();

if (sortedJsonFiles.Length != sortedTraceFiles.Length)
{
    throw new InvalidOperationException(
        $"The number of JSON files ({sortedJsonFiles.Length}) does not match the number of trace files ({sortedTraceFiles.Length}) for benchmark: {benchmarkFullName}");
}
```
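The hunk above sorts both file lists and relies on positional pairing after the count check. Assuming one JSON report and one trace file per iteration, and that sorting yields the same iteration order for both (the invariant the length check guards), the pairing reduces to a zip. The Python sketch below illustrates that idea; the function and argument names are hypothetical stand-ins, not the PR's C# code:

```python
import os

def pair_json_with_traces(json_files, trace_files, benchmark_full_name):
    """Pair iteration JSON reports with trace files by sorted position.

    Assumes both lists contain one entry per iteration and that sorting
    gives both the same iteration order; raises on a count mismatch, as
    the PR's code does.
    """
    # JSON reports live in per-iteration (datetime-named) directories,
    # so sort by the containing directory name.
    sorted_json = sorted(json_files, key=lambda p: os.path.basename(os.path.dirname(p)))
    sorted_traces = sorted(trace_files)
    if len(sorted_json) != len(sorted_traces):
        raise ValueError(
            f"JSON/trace count mismatch ({len(sorted_json)} vs {len(sorted_traces)}) "
            f"for benchmark: {benchmark_full_name}")
    return list(zip(sorted_json, sorted_traces))
```

Note that positional pairing is exactly what breaks in the duplicated-benchmark scenario a reviewer flags below: if two benchmarks share a report file name, the sorted orders no longer correspond.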
```csharp
runsToResults[run.Value] = runsToResults.GetValueOrDefault(run.Value, new());

Parallel.ForEach(jsonTraceMap, jsonTracePair =>
{
    string jsonPath = jsonTracePair.Key;
    string tracePath = jsonTracePair.Value;

    BdnJsonResult results = JsonConvert.DeserializeObject<BdnJsonResult>(File.ReadAllText(jsonPath));

    foreach (var benchmark in results?.Benchmarks)
    {
        Statistics statistics = benchmark.Statistics;

        MicrobenchmarkResult microbenchmarkResult = new()
        {
            Statistics = statistics,
            Parent = run.Value,
            MicrobenchmarkName = benchmarkFullName,
        };

        if (!excludeTraces)
        {
            using var analyzer = AnalyzerManager.GetAnalyzer(tracePath);
            List<GCProcessData> allPertinentProcesses = analyzer.GetProcessGCData("dotnet");
            List<GCProcessData> corerunProcesses = analyzer.GetProcessGCData("corerun");
            allPertinentProcesses.AddRange(corerunProcesses);

            GCProcessData? benchmarkGCData = null;
            foreach (var process in allPertinentProcesses)
            {
                string commandLine = process.CommandLine.Replace("\"", "").Replace("\\", "");
                string runCleaned = benchmark.FullName.Replace("\"", "").Replace("\\", "");
                if (commandLine.Contains(runCleaned) && commandLine.Contains("--benchmarkName"))
                {
                    benchmarkGCData = process;
                    break;
                }
            }

            if (benchmarkGCData != null)
            {
                int processID = benchmarkGCData.ProcessID;
                microbenchmarkResult.GCData = benchmarkGCData;
                microbenchmarkResult.GCTraceMetrics = new GCTraceMetrics(benchmarkGCData, tracePath, benchmark.FullName);
                /*
                TODO: THIS NEEDS TO BE ADDED BACK.
                if (configuration.Output.cpu_columns != null && configuration.Output.cpu_columns.Count > 0)
                {
                    // TODO: Add parameterize.
                    benchmark.Value.GCData.Parent.AddCPUAnalysis(yamlPath: @"C:\Users\musharm\source\repos\GC.Analysis.API\GC.Analysis.API\CPUAnalysis\DefaultMethods.yaml",
                        symbolLogFile: Path.Combine(configuration.Output.Path, run.Key, Guid.NewGuid() + ".txt"),
                        symbolPath: Path.Combine(configuration.Output.Path, run.Key));
                    var d1 = benchmark.Value.GCData.Parent.CPUAnalyzer.GetCPUDataForProcessName("dotnet");
                    d1.AddRange(benchmark.Value.GCData.Parent.CPUAnalyzer.GetCPUDataForProcessName("corerun"));
                    benchmark.Value.CPUData = d1.FirstOrDefault(p => p.ProcessID == processID);
                }
                */
            }
        }
        runsToResults[run.Value].Add(microbenchmarkResult);
    }
});
```
```csharp
Run run = configuration.Runs.Values.FirstOrDefault();
string outputPathForRun = Path.Combine(configuration.Output.Path, run.Name);
```
```csharp
public double CountIsBlockingGen2 { get; }
public double PauseDurationSeconds_SumWhereIsGen1 { get; }
public double PauseDurationMSec_MeanWhereIsEphemeral { get; }
public double PromotedMB_MeanWhereIsGen1 { get; }
```
This PR aims at calculating the average value of multiple microbenchmark results. The work revolves around: