This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
KRONOS (Kohn-Residual Optimized Numerics Over Silicon) is a research-grade, ab initio plane-wave Density Functional Theory (DFT) engine. It computes ground-state total energy, electronic density, Kohn-Sham eigenvalues, and ionic forces for periodic crystalline systems.
- Language: C++20 core, CUDA/HIP for GPU kernels, Python (pybind11) for scripting interface
- License: GPL v3
- Build system: CMake 3.20+
- Target accuracy: < 2 meV/atom Delta test score vs WIEN2k all-electron reference
The self-consistent field loop is the central algorithm:
- Parse YAML input + load UPF pseudopotential files
- Build `PlaneWaveBasis`: enumerate G-vectors where |k+G|²/2 ≤ ecutwfc; build FFT grids
- Detect k-point symmetry via spglib; generate irreducible Brillouin zone k-points
- Initialize electron density from superposition of atomic densities
- SCF iterations:
  - Hartree potential: Poisson equation in G-space, V_H(G) = 4π n(G) / |G|²
  - XC potential: V_xc(r) via libxc on real-space grid
  - Apply Hamiltonian H|ψ⟩ per k-point via FFT (the GPU hot path)
  - Eigensolver: Davidson (primary) or LOBPCG (fallback)
  - Fermi level by bisection; compute occupations
  - New density from wavefunctions; check convergence
  - Pulay/DIIS density mixing
- Post-convergence: forces, stress tensor, total energy
- Output: JSON summary + HDF5 files
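The "Fermi level by bisection" step above can be sketched as follows. This is a minimal illustration, not the KRONOS implementation: it assumes Fermi-Dirac smearing and spin-degenerate occupations between 0 and 2, and the function names and the `kt` smearing-width parameter are hypothetical.

```python
import math


def fermi_dirac(energy, mu, kt):
    """Occupation (0..2) of one spin-degenerate state at chemical potential mu."""
    x = (energy - mu) / kt
    # Clamp far-from-mu states so math.exp cannot overflow.
    if x > 40.0:
        return 0.0
    if x < -40.0:
        return 2.0
    return 2.0 / (1.0 + math.exp(x))


def find_fermi_level(eigenvalues, weights, n_electrons, kt=0.01, tol=1e-10, max_iter=200):
    """Bisect for mu such that the k-weighted occupation sum equals n_electrons.

    eigenvalues: one list of band energies per k-point
    weights:     k-point weights summing to 1
    """
    def count(mu):
        return sum(w * sum(fermi_dirac(e, mu, kt) for e in bands)
                   for bands, w in zip(eigenvalues, weights))

    # Bracket well outside the spectrum so count(lo) = 0 and count(hi) = 2 * n_bands.
    lo = min(min(bands) for bands in eigenvalues) - 10.0 * kt - 1.0
    hi = max(max(bands) for bands in eigenvalues) + 10.0 * kt + 1.0
    if not count(lo) <= n_electrons <= count(hi):
        raise ValueError("electron count is not bracketed by the available bands")

    for _ in range(max_iter):
        mu = 0.5 * (lo + hi)
        if count(mu) < n_electrons:
            lo = mu  # too few electrons: raise the chemical potential
        else:
            hi = mu  # enough electrons: lower the upper bound
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Because the electron count is monotonic in `mu`, bisection always converges. For a gapped system with small smearing the count is flat across the gap, and this sketch returns the smallest `mu` that reaches the target count.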
Applying H|ψ⟩, the GPU hot path, splits into three terms:
- Kinetic: T|ψ⟩_G = |k+G|²/2 · ψ_G [pointwise multiply]
- Local:
  - ψ_r = IFFT(ψ_G) [cuFFT/rocFFT]
  - V·ψ_r = V_eff(r) · ψ_r [pointwise multiply]
  - V·ψ_G = FFT(V·ψ_r) [cuFFT/rocFFT]
- Non-local:
  - proj_i = ⟨β_i|ψ⟩ via GEMM [cuBLAS/rocBLAS]
  - V_NL|ψ⟩ via GEMM [cuBLAS/rocBLAS]
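The kinetic and local terms above can be illustrated in one dimension. This is a rough sketch under stated assumptions: a naive O(N²) DFT stands in for FFTW3/cuFFT/rocFFT, the non-local GEMM term is omitted, and `dft` and `apply_hamiltonian` are hypothetical names rather than KRONOS functions.

```python
import cmath
import math


def dft(values, sign):
    """Naive unnormalised DFT: sign=-1 is the forward transform, sign=+1 the inverse."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * j * m / n) for j, v in enumerate(values))
            for m in range(n)]


def apply_hamiltonian(psi_g, v_eff_r, k, cell_length):
    """Return (T + V_local)|psi> in reciprocal space for a 1D periodic cell.

    psi_g:   plane-wave coefficients in FFT order (frequency 0, 1, ..., -1)
    v_eff_r: local effective potential sampled on the real-space grid
    """
    n = len(psi_g)
    # Integer frequencies in FFT order, converted to G = 2*pi*m / L.
    freqs = [m if m < (n + 1) // 2 else m - n for m in range(n)]
    g = [2.0 * math.pi * m / cell_length for m in freqs]

    # Kinetic term: pointwise multiply by |k+G|^2 / 2.
    t_psi = [0.5 * (k + gi) ** 2 * c for gi, c in zip(g, psi_g)]

    # Local term: inverse transform, multiply by V_eff(r), forward transform.
    psi_r = dft(psi_g, +1)
    v_psi_r = [v * p for v, p in zip(v_eff_r, psi_r)]
    v_psi_g = [c / n for c in dft(v_psi_r, -1)]  # 1/n restores the normalisation

    return [t + v for t, v in zip(t_psi, v_psi_g)]
```

A quick sanity check: with a constant potential V0, a single plane wave is an eigenvector with eigenvalue |k+G|²/2 + V0, so the output should be that multiple of the input.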
- `src/core/`: Types (`types.hpp`), physical constants, `Crystal` class, element data
- `src/basis/`: `PlaneWaveBasis` (G-vector enumeration), `FFTGrid` (FFTW3 wrapper)
- `src/io/`: YAML input parser, UPF pseudopotential parser, JSON output writer
- `src/potential/`: Hartree, XC (libxc wrapper + built-in LDA fallback), local/nonlocal PP, Ewald, GGA gradients, forces
- `src/solver/`: Davidson eigensolver, Pulay/DIIS mixing, Fermi level bisection, SCF loop, BFGS geometry optimizer
- `src/postprocessing/`: Band structure calculator, DOS calculator
- `src/hamiltonian/`: H|ψ⟩ application (kinetic + local via FFT + nonlocal via GEMM)
- `src/gpu/`: GPU abstraction layer (`fft.hpp`, `blas.hpp`, `memory.hpp`); stubs in CPU-only builds
- `src/utils/`: Scoped timer/profiling registry, structured JSON logger
- `test/`: GoogleTest suite: test_input, test_basis, test_fft, test_upf, test_solvers, test_physics
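The G-vector enumeration done by `PlaneWaveBasis` can be sketched as below. This is an illustration only: it assumes a simple cubic cell and a cutoff expressed in Hartree so that it matches the |k+G|²/2 ≤ ecutwfc criterion directly (a cutoff given in Ry would be halved first, since 1 Ry = 0.5 Ha). The function name and signature are hypothetical.

```python
import math


def enumerate_g_vectors(lattice_constant, kpoint, ecut_hartree):
    """Return integer triples (n1, n2, n3) with |k + G|^2 / 2 <= ecut, simple cubic cell."""
    b = 2.0 * math.pi / lattice_constant       # reciprocal lattice spacing
    g_max = math.sqrt(2.0 * ecut_hartree)      # radius of the cutoff sphere
    k_norm = math.sqrt(sum(c * c for c in kpoint))
    # Any accepted G satisfies |G| <= g_max + |k|; add 1 as a rounding safety margin.
    n_max = int(math.floor((g_max + k_norm) / b)) + 1

    selected = []
    search = range(-n_max, n_max + 1)
    for n1 in search:
        for n2 in search:
            for n3 in search:
                kinetic = 0.5 * sum((kc + b * n) ** 2
                                    for kc, n in zip(kpoint, (n1, n2, n3)))
                if kinetic <= ecut_hartree:
                    selected.append((n1, n2, n3))
    return selected
```

With a lattice constant of 2π (so the reciprocal spacing is 1) at the Gamma point, a cutoff of 0.5 Ha keeps G = 0 and its six nearest neighbours.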
| Library | Purpose |
|---|---|
| FFTW3 (CPU) / cuFFT / rocFFT (GPU) | Fast Fourier transforms |
| BLAS+LAPACK (CPU) / cuBLAS / rocBLAS (GPU) | Dense linear algebra |
| ELPA | Distributed parallel eigenvalue solver (v2022.11+ for GPU) |
| libxc | Exchange-correlation functionals (v6.0+ required) |
| spglib | Space group symmetry detection, k-point generation |
| HDF5 | Binary output for density, wavefunctions, restart |
| yaml-cpp | YAML input file parsing |
| pybind11 | Python bindings |
| MPI (OpenMPI/MPICH) | Distributed memory parallelism |

    # CPU-only build (v0.1 default)
    cmake -B build -S .
    cmake --build build -j$(nproc)

    # Run
    ./build/kronos examples/si_bulk.yaml

    # Run tests
    cd build && ctest --output-on-failure

    # Run a single test
    ./build/test_basis --gtest_filter='PlaneWaveBasis.SiBulkBasisSize'

    # CUDA build
    cmake -B build -S . -DKRONOS_GPU_BACKEND=cuda
    cmake --build build -j$(nproc)

    # HIP/AMD build
    cmake -B build -S . -DKRONOS_GPU_BACKEND=hip -DROCM_PATH=/opt/rocm

| Option | Default | Description |
|---|---|---|
| `KRONOS_GPU_BACKEND` | `none` | GPU backend: `none`, `cuda`, or `hip` |
| `KRONOS_BUILD_TESTS` | `ON` | Build GoogleTest test suite |
| `KRONOS_BUILD_PYTHON` | `OFF` | Build pybind11 Python bindings |

Required: FFTW3, BLAS, LAPACK, yaml-cpp. Optional: HDF5, MPI, libxc (built-in LDA fallback if absent).
- Input: YAML file (`kronos.yaml`) with crystal structure, calculation parameters, pseudopotential paths, hardware config
- Output: JSON summary + HDF5 files for density/wavefunctions
- Unknown YAML keys must raise errors (strict schema, no silent ignoring)
- Always float64/complex128 for wavefunction coefficients — never float32
- ecutrho must be ≥ 4 × ecutwfc (norm-conserving PP) or ≥ 12 × ecutwfc (PAW)
- Negative electron density: clamp to 0 with warning; abort if the magnitude of the negative value exceeds 1e-6
- Energy oscillation > 1 Ry between consecutive SCF steps: abort with diagnostic
- Max 200 SCF iterations; ecutwfc range 10–500 Ry
- Davidson subspace size: 3 × N_bands
- DIIS mixing history: 8 steps
- HDF5 output: write atomically via temp file + rename (never partial writes)
- Pseudopotential norm-conservation check is mandatory on load
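Several of the constraints above are simple enough to check mechanically. The sketch below is illustrative only: the key names `pp_type` and `max_scf_iter` and the key set itself are hypothetical (only `ecutwfc` and `ecutrho` appear in this document), and the density clamp assumes the 1e-6 abort threshold applies to the magnitude of the most negative value.

```python
import warnings

# Hypothetical key set for illustration; the real schema lives in the YAML parser.
ALLOWED_KEYS = {"ecutwfc", "ecutrho", "pp_type", "max_scf_iter"}
# Minimum ecutrho / ecutwfc ratio per pseudopotential type (from the constraints above).
MIN_RHO_FACTOR = {"ncpp": 4.0, "paw": 12.0}


def validate_input(params):
    """Raise ValueError on the first violated constraint."""
    unknown = sorted(set(params) - ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unknown input keys: {unknown}")  # strict schema

    ecutwfc = params["ecutwfc"]
    if not 10.0 <= ecutwfc <= 500.0:
        raise ValueError(f"ecutwfc = {ecutwfc} Ry is outside 10-500 Ry")

    pp_type = params.get("pp_type", "ncpp")
    if pp_type not in MIN_RHO_FACTOR:
        raise ValueError(f"pp_type must be one of {sorted(MIN_RHO_FACTOR)}")
    min_ecutrho = MIN_RHO_FACTOR[pp_type] * ecutwfc
    if params["ecutrho"] < min_ecutrho:
        raise ValueError(f"ecutrho must be >= {min_ecutrho} Ry for {pp_type}")

    if params.get("max_scf_iter", 200) > 200:
        raise ValueError("max_scf_iter must not exceed 200")


def clamp_density(density, abort_threshold=1e-6):
    """Clamp small negative density values to zero; abort on large violations."""
    most_negative = min(density)
    if most_negative < -abort_threshold:
        raise RuntimeError(f"density minimum {most_negative} is below -{abort_threshold}")
    if most_negative < 0.0:
        warnings.warn("negative electron density clamped to 0")
    return [max(value, 0.0) for value in density]
```

Failing on the first violation keeps the error message specific, which fits the rule that input validation failures abort with the field name and allowed values.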
The gpu:: namespace wraps CUDA/HIP calls so physics code never calls vendor APIs directly. This is the abstraction boundary — adding AMD support means only touching src/gpu/ files.
For deterministic GPU results, set the environment variable `CUBLAS_WORKSPACE_CONFIG=:4096:8`.
- SCF step output: energy, |dE|, |dn|, wall time per step
- Structured JSON logs on stderr with fields: timestamp, event, scf_step, wall_s, gpu_mem_mb, mpi_rank
- NVTX ranges on all GPU kernels for Nsight Systems profiling
- MPI timing via PMPI wrapper (Score-P compatible)
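One structured log record with the fields listed above might look like the following sketch. The ISO 8601 UTC timestamp format, the function name, and the default values are assumptions; the document fixes only the field names and the stderr destination.

```python
import json
import sys
import time


def log_event(event, scf_step, wall_s, gpu_mem_mb=0.0, mpi_rank=0, stream=sys.stderr):
    """Write one JSON log record per line to `stream` and return the line."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # assumed format
        "event": event,
        "scf_step": scf_step,
        "wall_s": wall_s,
        "gpu_mem_mb": gpu_mem_mb,
        "mpi_rank": mpi_rank,
    }
    line = json.dumps(record)
    print(line, file=stream)
    return line
```

Emitting one JSON object per line keeps the log greppable and lets per-rank streams be merged and sorted by timestamp afterwards.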
Checkpoints are written every N SCF steps to HDF5 and contain the wavefunctions, density, DIIS history, and step counter. On restart, the input hash is verified against the checkpoint; a mismatch produces a warning, which the user may override.
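The temp-file-plus-rename write and the input-hash check can be sketched together. Assumptions are significant here: a JSON file stands in for HDF5 so the example needs only the standard library, SHA-256 is an arbitrary choice of hash, and the `allow_mismatch` flag is a hypothetical stand-in for whatever override mechanism KRONOS exposes.

```python
import hashlib
import json
import os
import tempfile
import warnings


def input_hash(input_text):
    """Hash of the raw input file contents (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(input_text.encode("utf-8")).hexdigest()


def write_checkpoint(path, state, input_text):
    """Write the checkpoint to a temp file in the same directory, then rename it into place."""
    payload = dict(state, input_hash=input_hash(input_text))
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())   # make sure the data is on disk before the rename
        os.replace(tmp_path, path)      # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path, input_text, allow_mismatch=False):
    """Load a checkpoint; warn on an input-hash mismatch and refuse unless overridden."""
    with open(path) as handle:
        payload = json.load(handle)
    if payload["input_hash"] != input_hash(input_text):
        warnings.warn("checkpoint was written for a different input file")
        if not allow_mismatch:
            raise ValueError("input hash mismatch; pass allow_mismatch=True to override")
    return payload
```

Creating the temp file in the target directory matters: a rename across filesystems is not atomic, so a temp file in a separate temp location could still leave a partially written checkpoint.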
- Si bulk (diamond, LDA) — energy, forces, stress, band gap
- Cu FCC (metal, smearing, PBE) — Fermi level, DOS
- H₂O molecule (Gamma-only, GGA) — forces, geometry optimization
- Fe BCC (spin-polarized LSDA) — magnetic moment ~2.2 μ_B
- MgO rocksalt (ionic, PAW) — PAW one-center energy
- Graphene (2D, vacuum padding) — band structure Dirac cone
- v0.1 (current): LDA/GGA NCPP with k-point sampling, CPU only
- v0.2: MPI parallelization
- v0.5: GPU offloading (CUDA/HIP)
- v0.8: PAW support
- v1.0: Full production release with Python package, CI/CD, docs
- v2.0: Hybrid functionals (HSE06/PBE0), non-collinear magnetism
- SCF non-convergence: write partial output with `converged: false`, suggest fixes
- GPU OOM: auto-fallback to CPU with warning
- UPF parse failure: hard abort with file path, line number, and download URL
- Davidson divergence (residual > 1e3): auto-switch to LOBPCG for that k-point
- Input validation failures: hard abort with field name and allowed values
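The per-k-point Davidson-to-LOBPCG switch described above could be structured roughly as follows. The solver callables and their return shape are hypothetical placeholders, and treating a NaN residual as divergence is an assumption of this sketch rather than documented behavior.

```python
import math


def solve_kpoints(kpoints, davidson, lobpcg, divergence_threshold=1e3):
    """Solve each k-point with Davidson, switching to LOBPCG only where it diverges.

    `davidson` and `lobpcg` are hypothetical callables: kpoint -> (eigenvalues, residual_norm).
    Returns one (eigenvalues, residual_norm, solver_name) tuple per k-point.
    """
    results = []
    for kpoint in kpoints:
        eigenvalues, residual = davidson(kpoint)
        solver = "davidson"
        # A NaN residual is treated as divergence too (an assumption of this sketch).
        if math.isnan(residual) or residual > divergence_threshold:
            eigenvalues, residual = lobpcg(kpoint)
            solver = "lobpcg"
        results.append((eigenvalues, residual, solver))
    return results
```

Recording which solver produced each result makes the fallback visible in the output instead of silently changing the algorithm for some k-points.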