Add challenge 78: 2D FFT (Medium)#208

Open
claude[bot] wants to merge 1 commit into main from add-challenge-78-2d-fft
Conversation

Contributor

claude bot commented Mar 5, 2026

Summary

  • Adds challenge 78: 2D Fast Fourier Transform (Medium difficulty)
  • Teaches GPU programmers to implement a 2D DFT using row-column decomposition
  • Key concepts: batched 1D FFTs, coalesced vs strided memory access for row/column passes, and shared-memory Cooley-Tukey butterfly kernels
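For readers unfamiliar with row-column decomposition, here is a minimal CPU sketch in plain Python. It is not the challenge's reference implementation: the names `dft_1d` and `dft_2d_row_column` are hypothetical, and the 1D transform is a naive O(n²) DFT rather than an FFT, chosen only so that any M and N work without extra code.

```python
import cmath


def dft_1d(x):
    """Naive O(n^2) 1D DFT of a sequence of complex numbers."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft_2d_row_column(signal):
    """2D DFT of an M x N list of lists: transform rows, then columns."""
    # Pass 1: M independent transforms of length N (one per row).
    rows = [dft_1d(row) for row in signal]
    # Pass 2: N independent transforms of length M (one per column).
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    # The column pass leaves the data transposed (N x M); transpose back.
    return [list(row) for row in zip(*cols)]
```

Each pass is a batch of independent 1D transforms, which is what makes the decomposition map naturally onto a batched GPU kernel.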

Challenge Details

  • Input: 2D complex signal of shape M × N stored as interleaved real/imaginary float pairs (row-major, M*N*2 total floats)
  • Output: 2D DFT spectrum, same layout
  • Performance test: M = 2,048, N = 2,048
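The interleaved layout described above can be illustrated with a small indexing sketch. The helper names (`flat_index`, `unpack`, `pack`) are hypothetical and not part of the challenge files; the sketch assumes the real part comes first in each pair, as "real/imaginary" suggests.

```python
def flat_index(r, c, N):
    """Offset of the real part of element (r, c); the imaginary part is at +1."""
    return 2 * (r * N + c)


def unpack(buf, M, N):
    """Convert a flat interleaved buffer of M*N*2 floats to an M x N complex matrix."""
    return [
        [complex(buf[flat_index(r, c, N)], buf[flat_index(r, c, N) + 1])
         for c in range(N)]
        for r in range(M)
    ]


def pack(matrix):
    """Convert an M x N complex matrix back to a flat interleaved list of floats."""
    out = []
    for row in matrix:
        for z in row:
            out.extend((z.real, z.imag))
    return out
```

Under this layout, neighbouring elements within a row are 2 floats apart, while neighbouring elements within a column are 2*N floats apart.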

Why this is interesting

The 2D FFT is a great GPU challenge because:

  1. Row-wise FFTs have perfect memory coalescing, but column-wise FFTs access strided memory — forcing solvers to think about a transpose or shared-memory approach
  2. Requires implementing a correct Cooley-Tukey butterfly in shared memory (handling power-of-2 vs non-power-of-2 sizes)
  3. Classic primitive used in image processing, PDE solvers, and convolution-via-FFT
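To make the butterfly structure in point 2 concrete, here is a CPU sketch of an iterative radix-2 Cooley-Tukey FFT in plain Python. It is an illustration only, not the validated CUDA solution: the function name `fft_radix2` is hypothetical, and it handles only power-of-2 lengths, so non-power-of-2 sizes would need a different method (for example a direct DFT).

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    a = list(x)

    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; each stage combines pairs of half-size transforms.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                even = a[start + k]
                odd = a[start + k + half] * w  # twiddle-scaled partner
                a[start + k] = even + odd
                a[start + k + half] = even - odd
                w *= w_step
        size *= 2
    return a
```

In a GPU version, the inner butterflies of a stage are independent of each other, so one common design is to assign them to threads and synchronize between stages while the working array sits in shared memory.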

Checklist

  • challenge.html starts with <p>, has <h2> sections, first example matches generate_example_test(), uses LaTeX matrices for 2D data
  • challenge.py inherits ChallengeBase, all 6 methods present, assertions on shape/dtype/device, 10 functional test cases
  • All 6 starter files present and follow format conventions
  • Linting passes (pre-commit run --all-files)
  • Validated with run_challenge.py --action run

🤖 Generated with Claude Code

Introduces the 2D Discrete Fourier Transform challenge, teaching GPU
programmers to implement a row-column decomposition FFT. Key concepts:
batched 1D FFTs, coalesced vs strided memory access patterns for
row/column processing, and shared-memory butterfly kernels.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor Author

claude bot commented Mar 6, 2026

Challenge Validation: 2D FFT (Challenge #78)

Reviewed all files against the CLAUDE.md checklist and validated with a working CUDA solution.

Checklist Verification

challenge.html

  • Starts with <p>
  • <h2> sections: Implementation Requirements, Example, Constraints
  • First example (M=2, N=2, impulse signal) matches generate_example_test()
  • Uses LaTeX \begin{bmatrix} consistently for 2D matrix display
  • Performance bullet: M = 2,048, N = 2,048

challenge.py

  • Inherits ChallengeBase correctly
  • All 6 required methods present
  • reference_impl has shape and dtype assertions
  • 10 functional test cases covering edge cases (1×1, 2×2), power-of-2 (16×16, 32×64), non-power-of-2 (3×5, 30×30), and realistic sizes (256×256, 512×512)
  • Performance test (2048×2048 × 2 × float32 ≈ 32MB per tensor) fits 5× in 16GB VRAM
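The memory estimate in the last bullet can be checked with a few lines of arithmetic. This is a sketch; the variable names are illustrative, and 16GB is interpreted here as 16 GiB.

```python
M = N = 2048
FLOATS_PER_COMPLEX = 2   # interleaved real and imaginary parts
BYTES_PER_FLOAT32 = 4

bytes_per_tensor = M * N * FLOATS_PER_COMPLEX * BYTES_PER_FLOAT32
mib_per_tensor = bytes_per_tensor / 2**20   # 32.0 MiB

five_copies = 5 * bytes_per_tensor          # 160 MiB in total
vram_bytes = 16 * 2**30
fits = five_copies < vram_bytes             # True, with a wide margin
```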

Starter files ✅ (all 6 present)

  • CUDA/Mojo: comments use "are device pointers" (medium challenge, no parenthetical)
  • PyTorch/Triton/CuTe: comments use "are tensors on the GPU"
  • JAX: uses # signal is a tensor on GPU (grammatically correct for single input) with # return output tensor directly
  • All starters compile/run but produce no correct output

Solution Validation

Wrote a pure CUDA row-column DFT implementation and submitted via run_challenge.py --action run:

Status: success ✓ Test passed

All 10 functional test cases passed and the performance test (2048×2048) completed successfully. Challenge is ready to merge.
