Program entry point; parses configuration, dispatches to the appropriate benchmark or analysis mode
Build and tooling
| File | Purpose |
| --- | --- |
| Makefile | Primary build system; produces the memory_benchmark release binary and the test_runner test binary |
| .clang-format | Clang-Format style configuration enforced across all C++ sources |
User-facing documentation
| File | Purpose |
| --- | --- |
| README.md | Project overview, quick-start instructions, and feature summary |
| CAPABILITIES.md | Measurement capability overview and interpretation notes |
| MANUAL.md | Complete user manual: all CLI flags, modes, output formats, and usage examples |
| TECHNICAL_SPECIFICATION.md | Internal architecture, data structures, and implementation decisions |
| CHANGELOG.md | Version history and release notes |
| CONTRIBUTING.md | Contribution guidelines, coding standards, and pull request process |
| CODE_OF_CONDUCT.md | Community standards |
| SECURITY.md | Vulnerability disclosure policy |
| TLB_ANALYSIS_WHITEPAPER.md | Whitepaper: TLB analysis methodology and Apple Silicon results |
| LATENCY_WHITEPAPER.md | Whitepaper: cache and memory latency measurement methodology |
| CORE_TO_CORE_WHITEPAPER.md | Whitepaper: core-to-core cache-line handoff latency methodology and results |
| PROJECT_STRUCTURE.md | This file |
2. src/ — Source code
All production C++ and ARM64 assembly lives under src/. Headers use include paths relative to src/ (e.g., #include "core/config/config.h").
2.1 src/asm/ — ARM64 assembly kernels
Hand-written AArch64 assembly implementing the hot inner loops that must not be rewritten by the compiler. Each .s file corresponds to one access pattern or operation type. The public C-linkage declarations live in asm_functions.h.
| File | Operation |
| --- | --- |
| asm_functions.h | extern "C" declarations for all assembly functions |
Template-based framework for dispatching multi-threaded benchmark work with synchronized start, cache-line-aligned per-thread state, and macOS QoS thread attributes
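As a rough illustration of what "synchronized start" and "cache-line-aligned per-thread state" can mean in practice, here is a minimal portable sketch. The names `WorkerState` and `run_synchronized` are hypothetical and not taken from the project, the 128-byte alignment is an assumed cache-line size, and the macOS QoS thread attributes are left out so the example compiles anywhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-thread state. Each instance is aligned to its own
// (assumed 128-byte) cache line so workers do not falsely share a line.
// Note: std::vector honors this alignment from C++17 onward.
struct alignas(128) WorkerState {
    std::uint64_t result = 0;
};

// Starts `thread_count` workers, releases them together once all have
// checked in, and returns work(t) for each thread index t.
template <typename Work>
std::vector<std::uint64_t> run_synchronized(std::size_t thread_count, Work work) {
    assert(thread_count > 0);
    std::vector<WorkerState> states(thread_count);
    std::atomic<std::size_t> ready{0};
    std::atomic<bool> go{false};

    std::vector<std::thread> threads;
    threads.reserve(thread_count);
    for (std::size_t t = 0; t < thread_count; ++t) {
        threads.emplace_back([&, t] {
            ready.fetch_add(1, std::memory_order_release);
            // Wait at the start gate until the coordinator opens it.
            while (!go.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            states[t].result = work(t);
        });
    }

    // Open the gate only after every worker has reached it.
    while (ready.load(std::memory_order_acquire) != thread_count) {
        std::this_thread::yield();
    }
    go.store(true, std::memory_order_release);

    for (auto& th : threads) {
        th.join();
    }

    std::vector<std::uint64_t> results;
    results.reserve(thread_count);
    for (const auto& s : states) {
        results.push_back(s.result);
    }
    return results;
}
```

The gate keeps thread-creation cost out of the measured region, which is the usual motivation for a synchronized start in a benchmark.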
TLB analysis mode
| File | Purpose |
| --- | --- |
| tlb_analysis.h / .cpp | Standalone -analyze-tlb mode: sweeps buffer sizes and strides to locate TLB capacity boundaries |
| tlb_analysis_json.h / .cpp | Serializes TLB analysis results to JSON |
| tlb_boundary_detector.cpp | Heuristic that identifies TLB miss inflection points in the latency-vs-buffer-size curve |
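The table does not say how the boundary heuristic works. One simple way to find inflection points in a latency-vs-buffer-size curve is to flag every sample whose latency exceeds the previous sample's by more than a fixed ratio. The sketch below assumes exactly that; `LatencySample`, `find_latency_jumps`, and the 1.25 threshold are hypothetical and may differ from what tlb_boundary_detector.cpp actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sample type: one point on the latency-vs-buffer-size curve.
struct LatencySample {
    std::size_t buffer_bytes;  // working-set size for this measurement
    double latency_ns;         // measured average access latency
};

// Returns every index i where the latency at sample i exceeds the latency
// at sample i-1 by more than `min_ratio`. Samples are assumed to be sorted
// by buffer size. Illustrative only, not the project's detector.
std::vector<std::size_t> find_latency_jumps(
    const std::vector<LatencySample>& samples, double min_ratio = 1.25) {
    assert(min_ratio > 1.0);
    std::vector<std::size_t> jumps;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double prev = samples[i - 1].latency_ns;
        if (prev > 0.0 && samples[i].latency_ns / prev > min_ratio) {
            jumps.push_back(i);
        }
    }
    return jumps;
}
```

Each returned index marks the first buffer size past a suspected capacity boundary; a real detector would likely also need to smooth out measurement noise before comparing neighbors.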
Core-to-core latency mode
| File | Purpose |
| --- | --- |
| core_to_core_latency.h | Public interface for the -analyze-core2core mode |
| core_to_core_latency_internal.h | Internal runner interfaces not exposed outside the module |
| core_to_core_latency_runner.cpp | Measurement loop: coordinates two threads on selected CPU cores, runs the assembly ping-pong hot loop, and collects round-trip latency samples |
| core_to_core_latency_cli.cpp | CLI argument parsing and entry point for the core-to-core mode |
| core_to_core_latency_json.h / .cpp | Serializes core-to-core results to JSON |
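The ping-pong idea behind the runner can be sketched in portable C++: two threads pass ownership of one cache line back and forth by flipping a shared flag, and the elapsed time divided by the number of round trips approximates the handoff cost. This is a simplified stand-in, not the project's code. It does no core pinning, it yields while spinning so that it also finishes on a single-core machine, it reports only a mean rather than individual samples, and the function name and the 128-byte alignment are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Returns the mean round-trip time in nanoseconds for `round_trips`
// handoffs of a single flag between two threads.
double ping_pong_round_trip_ns(std::uint32_t round_trips) {
    assert(round_trips > 0);
    // 0: initiator's turn, 1: responder's turn. Aligned to an assumed
    // 128-byte cache line so nothing else shares the line.
    alignas(128) std::atomic<std::uint32_t> turn{0};

    std::thread responder([&] {
        for (std::uint32_t i = 0; i < round_trips; ++i) {
            while (turn.load(std::memory_order_acquire) != 1) {
                std::this_thread::yield();
            }
            turn.store(0, std::memory_order_release);  // hand the line back
        }
    });

    const auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < round_trips; ++i) {
        turn.store(1, std::memory_order_release);  // hand the line over
        while (turn.load(std::memory_order_acquire) != 0) {
            std::this_thread::yield();
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    responder.join();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / round_trips;
}
```

A production measurement would remove the `yield()` calls and pin both threads, since scheduler involvement dominates the numbers this sketch produces.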
2.3 src/core/ — Core utilities
Platform-independent infrastructure: configuration, memory management, system introspection, and high-resolution timing.
src/core/config/
| File | Purpose |
| --- | --- |
| config.h | BenchmarkConfig structure; aggregates all run-time settings parsed from the command line |
| constants.h | Named constants for memory limits, cache size bounds, stride values, buffer sizing factors, and latency access counts |
| version.h | SOFTVERSION macro (semantic version string, currently "0.55.4") |
| argument_parser.cpp | Parses argv into a BenchmarkConfig; implements all flag definitions |
| config_validator.cpp | Validates the parsed configuration; emits errors for out-of-range or conflicting settings |
| buffer_calculator.cpp | Derives buffer sizes for each cache/memory level from the validated configuration and detected system parameters |
src/core/signal/
| File | Purpose |
| --- | --- |
| signal_handler.h / .cpp | Installs SIGINT/SIGTERM handling and coordinates benchmark interruption between main and worker threads |
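A common way to coordinate interruption between a signal handler and worker threads is a lock-free atomic flag that the handler sets and the workers poll. The sketch below shows that pattern under the assumption that the project does something similar; all function and variable names here are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstddef>

// Set by the signal handler, polled by worker loops. Lock-free atomics
// are safe to modify from inside a signal handler.
std::atomic<bool> g_stop_requested{false};

void on_stop_signal(int /*signum*/) {
    g_stop_requested.store(true, std::memory_order_relaxed);
}

// Registers the same handler for SIGINT and SIGTERM.
void install_stop_handlers() {
    const auto prev_int = std::signal(SIGINT, on_stop_signal);
    const auto prev_term = std::signal(SIGTERM, on_stop_signal);
    assert(prev_int != SIG_ERR && prev_term != SIG_ERR);
    (void)prev_int;
    (void)prev_term;
}

bool stop_requested() {
    return g_stop_requested.load(std::memory_order_relaxed);
}

// Example worker loop: performs up to `max_iterations` units of work and
// stops early once an interruption has been requested.
std::size_t run_until_stopped(std::size_t max_iterations) {
    std::size_t done = 0;
    while (done < max_iterations && !stop_requested()) {
        ++done;  // placeholder for one benchmark iteration
    }
    return done;
}
```

Keeping the handler down to a single atomic store leaves the actual cleanup to ordinary code on the main thread, where it is safe to free buffers and print partial results.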
src/core/memory/
| File | Purpose |
| --- | --- |
| memory_manager.h / .cpp | Top-level RAII memory manager; allocates and owns benchmark buffers via mmap; supports normal and cache-discouraging (MADV_RANDOM) allocation |
| buffer_manager.h | Manages the set of named buffers handed to benchmark passes |
| buffer_allocator.h / .cpp | Low-level mmap/munmap wrapper with alignment support |
| buffer_initializer.h / .cpp | Initializes buffer contents (sequential fill, random fill, pointer-chase chain construction) before benchmark runs |
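To make the RAII-over-mmap idea concrete, here is a minimal sketch of a buffer owner that maps anonymous memory in its constructor, optionally applies the MADV_RANDOM hint mentioned above, and unmaps in its destructor. `MappedBuffer` and its interface are hypothetical and are not the project's memory_manager or buffer_allocator classes. The code is POSIX-only.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical RAII owner for one anonymous, page-aligned mmap region.
class MappedBuffer {
public:
    explicit MappedBuffer(std::size_t bytes, bool advise_random = false)
        : size_(bytes) {
        assert(bytes > 0);
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED) {
            size_ = 0;  // leave the object empty; callers check valid()
            return;
        }
        data_ = p;
        if (advise_random) {
            // Advisory hint only; a failure here is deliberately ignored.
            madvise(data_, size_, MADV_RANDOM);
        }
    }

    ~MappedBuffer() {
        if (data_ != nullptr) {
            munmap(data_, size_);
        }
    }

    // Single owner: movable, not copyable.
    MappedBuffer(const MappedBuffer&) = delete;
    MappedBuffer& operator=(const MappedBuffer&) = delete;
    MappedBuffer(MappedBuffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}
    MappedBuffer& operator=(MappedBuffer&&) = delete;

    bool valid() const { return data_ != nullptr; }
    void* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    void* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Because mmap returns page-aligned memory, a wrapper like this gets cache-line alignment for free; stricter alignments would need over-allocation and trimming.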
Formats pattern benchmark results for console output
2.6 src/warmup/ — Pre-benchmark warm-up
Warm-up passes eliminate cold-start effects from page faults, TLB misses, and instruction-cache misses before timing begins.
| File | Purpose |
| --- | --- |
| warmup.h | Public warm-up API |
| warmup_internal.h | Internal warm-up helpers not exposed outside the module; provides template functions and shared inline chunk operations (warmup_read_chunk_op, warmup_write_chunk_op, warmup_copy_chunk_op) |
| basic_warmup.cpp | Simple sequential read/write pass to page in all benchmark buffers |
| cache_warmup.cpp | Targeted warm-up designed to fill a specific cache level before a cache-level benchmark |
| latency_warmup.cpp | Warm-up for latency tests: traverses the pointer-chase chain to bring it into the target cache level |
| pattern_warmup.cpp | Warm-up pass tailored to the configured access pattern |
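For readers unfamiliar with pointer chasing, the sketch below shows one way to build a chain and walk it once as a warm-up. It uses array indices instead of raw pointers and Sattolo's algorithm to guarantee a single cycle that touches every slot; both choices, and the function names, are assumptions made for illustration rather than a description of buffer_initializer or latency_warmup.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Builds a random permutation that forms one cycle covering all slots
// (Sattolo's algorithm), so following next[i] repeatedly visits every
// slot exactly once before returning to the start.
std::vector<std::size_t> build_chase_chain(std::size_t slots, unsigned seed) {
    assert(slots > 0);
    std::vector<std::size_t> next(slots);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937_64 rng(seed);
    for (std::size_t i = slots - 1; i > 0; --i) {
        // Pick strictly below i; this is what forces a single cycle.
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Follows the chain for `steps` hops starting at slot 0. Each load depends
// on the previous one, so the hardware cannot overlap them.
std::size_t traverse_chain(const std::vector<std::size_t>& next,
                           std::size_t steps) {
    std::size_t idx = 0;
    for (std::size_t s = 0; s < steps; ++s) {
        idx = next[idx];
    }
    return idx;
}
```

One full traversal (`steps == slots`) touches every element once, which is enough to pull a chain that fits the target cache level into that level before timing starts.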
2.7 src/utils/ — Shared utilities
| File | Purpose |
| --- | --- |
| benchmark.h | Convenience umbrella header that includes all benchmark-related headers |
| utils.h / .cpp | General-purpose helpers: size formatting, human-readable unit conversion, and string utilities |
| json_utils.h / .cpp | JSON helper functions shared between the TLB, core-to-core, and standard output serializers |
2.8 src/third_party/ — Vendored dependencies
| File | Purpose |
| --- | --- |
| nlohmann/json.hpp | nlohmann/json single-header library (MIT license); used throughout the JSON output subsystem |
3. tests/ — Unit tests
GoogleTest-based unit test suite. All files are picked up automatically by the Makefile. Tests named *Integration* are excluded from make test (unit-only) and must be run explicitly.
Statistical computations: median, percentiles, stddev, min, max
test_timer.cpp
HighResTimerTest
Timer resolution, monotonicity, and elapsed-time accuracy
test_system_info.cpp
SystemInfoTest
sysctlbyname-based system queries on macOS
Total listed GoogleTest cases: 354 across 23 suite headings/instantiations as of 2026-04-26.
Shared test helper headers
Two shared helper headers provide functionality reused across multiple test suites:
| File | Purpose |
| --- | --- |
| test_config_helpers.h | Provides initialize_system_info(BenchmarkConfig&) and allocate_and_initialize_buffers(BenchmarkConfig&, BenchmarkBuffers&); used by multiple suites to set up system config and buffer allocation |
| test_statistics_helpers.h | Provides capture_bw(), capture_lat(), and capture_auto_tlb_breakdown() helpers in namespace test_statistics_helpers; used by StatisticsTest to capture statistics output |
4. results/ — Benchmark result data
Reference JSON (and legacy CSV/text) output from benchmark runs on specific hardware. Organized by software version subdirectory. These files are used for regression comparison and whitepaper data.
    results/
      0.53.7/
        MacMiniM4_analyzetlb.json — TLB analysis run, Mac Mini M4
        MacMiniM4_benchmark.json — Standard benchmark run, Mac Mini M4
        MacMiniM4_core2core.json — Core-to-core latency run, Mac Mini M4
        MacMiniM4_patterns.json — Pattern benchmark run, Mac Mini M4
      0.53.8/
        MacMiniM4_analyze-tlb-*.json — TLB analysis variants (chain mode, random-in-box)
        MacbookAirM5_analyze-tlb.json — TLB analysis run, MacBook Air M5
        MacbookAirM5_benchmark.json — Standard benchmark run, MacBook Air M5
        MacbookAirM5_latency.json — Latency run, MacBook Air M5
      old/
        *.json / *.csv / *.txt — Pre-versioned historical results
5. pictures/ — Documentation images
PNG charts generated from benchmark result data, used in the whitepapers and README.
| File | Content |
| --- | --- |
| MacMiniM4_memory_hierarchy_v0_53_5.png | Full memory hierarchy bandwidth/latency overview, Mac Mini M4 |
| MacMiniM4_cache_latency_with_TLB.png | Cache latency curve with TLB miss inflection, Mac Mini M4 |