
Add challenge 74: Layer Normalization (Medium) #195

Open
claude[bot] wants to merge 1 commit into main from challenge/74-layer-normalization

Conversation

claude[bot] commented Feb 25, 2026

Summary

  • Adds Layer Normalization as challenge 74 (Medium difficulty)
  • Layer norm normalizes each row of an N×C input independently (per-sample, across the feature dimension) — the core operation in every transformer/LLM layer
  • Distinct from the existing Batch Normalization (challenge 40), which normalizes across the batch dimension per feature; this challenge requires the opposite reduction axis
  • Validated on NVIDIA Tesla T4: all functional tests pass with the reference CUDA solution
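The reference semantics described above — normalize each row of an N×C input independently over its C features — can be sketched in NumPy (this is an illustrative sketch, not the PR's actual `reference_impl`; the `eps` value is an assumption):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each row of an N x C input independently (per-sample)."""
    mean = x.mean(axis=1, keepdims=True)    # per-row mean over the C features
    var = x.var(axis=1, keepdims=True)      # per-row biased variance
    return (x - mean) / np.sqrt(var + eps)  # same shape as the input

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 10.0, 10.0, 10.0]])
y = layer_norm(x)
```

Note the reduction axis: `axis=1` reduces across features within each row, whereas batch normalization (challenge 40) reduces across `axis=0`, i.e. across the batch per feature.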

What makes this interesting

Layer normalization forces solvers to think carefully about:

  • Row-wise reductions — each row is an independent normalization group, requiring a parallel reduce (mean, then variance) within each row
  • Shared memory — with C up to 4,096, solvers must tile the reduction across threads in a block using shared memory and synchronization barriers
  • Two-pass algorithm — first compute the mean, then the variance (or use Welford's online algorithm for a single pass)
  • Work distribution — assign one (or more) blocks per row so independent rows are processed in parallel
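The single-pass alternative mentioned above can be illustrated with a scalar Welford accumulation in Python — a sketch of what each CUDA thread would accumulate over its strided slice of a row before a block-level combine (function name and structure are illustrative, not part of the PR):

```python
def welford_row_stats(row):
    """Single-pass mean/variance via Welford's online algorithm."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in row:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # second factor uses the *updated* mean
    return mean, m2 / count       # biased variance, matching layer norm

mean, var = welford_row_stats([1.0, 2.0, 3.0, 4.0])
```

Unlike the naive one-pass formula `E[x^2] - E[x]^2`, Welford avoids catastrophic cancellation when the mean is large relative to the spread, which matters in fp32 at C up to 4,096.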

Checklist

challenge.html

  • Starts with <p> (problem description)
  • Has <h2> sections for: Implementation Requirements, Example, Constraints
  • First example matches generate_example_test() values
  • Examples use LaTeX \begin{bmatrix} for 2D matrix data (consistent)
  • Constraints section states: Performance is measured with N = 65,536, C = 512

challenge.py

  • class Challenge inherits ChallengeBase
  • __init__ calls super().__init__() with name, atol, rtol, num_gpus, access_tier
  • reference_impl has assertions on shape, dtype, and device
  • All 6 methods present
  • generate_functional_test returns 10 cases covering edge cases, powers-of-2, non-powers-of-2, realistic sizes, zeros, negatives
  • generate_performance_test (N=65,536, C=512) fits comfortably within 16 GB VRAM (~256 MB total)

Starter files

  • All 6 files present: .cu, .pytorch.py, .triton.py, .jax.py, .cute.py, .mojo
  • Exactly 1 parameter description comment per file, no other comments
  • CUDA/Mojo use "device pointers" (no parenthetical — medium challenge)
  • Python frameworks use "tensors on the GPU"; JAX has # return output tensor directly
  • Starters compile/run but do NOT produce correct output

General

  • Directory follows 74_layer_normalization convention
  • Linting passes: pre-commit run --all-files

🤖 Generated with Claude Code

Layer normalization is a core building block of transformer architectures
(BERT, GPT, LLaMA). Unlike batch normalization, it normalizes across the
feature dimension per sample, requiring efficient two-pass reductions
(mean then variance) with shared memory — a non-trivial GPU programming
challenge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
claude[bot] force-pushed the challenge/74-layer-normalization branch from 5b062db to 33b83a3 on February 26, 2026 09:48