
Speed up CPU demeaning #81

Merged
eloualiche merged 1 commit into FixedEffects:main from matthieugomez:speedup-cpu-demeaning
Jun 5, 2026

matthieugomez (Member) commented Jun 5, 2026

Speed up CPU demeaning with Int32 refs and faster 2-norm

matthieugomez changed the title from "Speed up CPU demeaning ~20%: Int32 refs and fast LSMR 2-norm" to "Speed up CPU demeaning with Int32 refs and faster 2-norm" Jun 5, 2026

codecov Bot commented Jun 5, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@7f7b0f8). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/utils/lsmr.jl | 91.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #81   +/-   ##
=======================================
  Coverage        ?   54.46%           
=======================================
  Files           ?       11           
  Lines           ?      672           
  Branches        ?        0           
=======================================
  Hits            ?      366           
  Misses          ?      306           
  Partials        ?        0           


matthieugomez force-pushed the speedup-cpu-demeaning branch from 4e9166c to dde77f7 June 5, 2026 09:16
matthieugomez changed the title from "Speed up CPU demeaning with Int32 refs and faster 2-norm" to "Speed up CPU demeaning" Jun 5, 2026
Two CPU-side optimizations on top of FixedEffects#80, with no change to results.

- FixedEffect: store group refs as Int32 when ngroups <= typemax(Int32),
  halving the dominant memory stream read by the scatter/gather kernels on
  every solver iteration. Every backend (CPU/GPU) and solve_coefficients!
  benefit; when ngroups exceeds typemax(Int32), refs keep their original
  integer type.

- lsmr!: replace LinearAlgebra.norm's overflow-safe (~8x slower) path with a
  SIMD sum-of-squares at the four hot norm call sites, accumulated in Float64
  for accuracy. Restricted to concrete CPU Vector{Float32} and Vector{Float64};
  GPU arrays and FixedEffectCoefficients keep the generic norm.
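
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `compact_refs` and `fast_norm2` are hypothetical names, and returning the norm as a Float64 is an assumption of the sketch.

```julia
using LinearAlgebra

# Hypothetical helper: narrow group refs to Int32 when every group index fits,
# otherwise return the refs unchanged in their original integer type.
function compact_refs(refs::AbstractVector{<:Integer}, ngroups::Integer)
    return ngroups <= typemax(Int32) ? Vector{Int32}(refs) : refs
end

# Hypothetical helper: 2-norm as a plain sum of squares accumulated in Float64.
# Unlike LinearAlgebra.norm, it does no rescaling to guard against
# overflow or underflow of the squared entries.
function fast_norm2(x::Vector{T}) where {T<:Union{Float32,Float64}}
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += abs2(Float64(x[i]))
    end
    return sqrt(s)
end

# Any other array type falls back to the generic norm.
fast_norm2(x) = norm(x)

refs = Int64[1, 2, 2, 3, 1]
refs32 = compact_refs(refs, 3)      # Vector{Int32}, half the bytes of refs
coef = [10.0, 20.0, 30.0]
gathered = coef[refs32]             # gather: Int32 indices work like Int64 ones
```

The trade-off in the norm is deliberate: skipping the rescaling is only safe when the squared entries stay within Float64 range, which is why the sketch widens Float32 inputs before squaring.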

Bumps version to 3.3.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
matthieugomez force-pushed the speedup-cpu-demeaning branch from dde77f7 to a25c978 June 5, 2026 12:19
eloualiche merged commit 84b29c4 into FixedEffects:main Jun 5, 2026
6 checks passed