Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
a001659
Feature(dftu-pw): add DeltaSpin strategy input parameters and relax D…
dyzheng May 1, 2026
7f22e9b
Feature(dftu-pw): add DFT+U occupation matrix mixing in Charge_Mixing
dyzheng May 1, 2026
8739c88
Feature(dftu-pw): encapsulate Plus_U accessors and rewrite cal_occ_pw…
dyzheng May 1, 2026
916fb90
Feature(deltaspin-pw): add npol support, lambda strategies, PW subspa…
dyzheng May 1, 2026
285498a
Fix(dftu-pw): add npol=1 code paths for PW force/stress/onsite kernels
dyzheng May 1, 2026
4ab80fc
Feature(dftu-pw): extend onsite projector and PW DFT+U/DeltaSpin for …
dyzheng May 1, 2026
e6dc14d
Feature(dftu-pw): wire LCAO operators and ESolver for DFT+U/DeltaSpin…
dyzheng May 1, 2026
8a07ea6
Refactor: vnl_pw simplification, remove TD_MovingGauge, minor module …
dyzheng May 1, 2026
d325791
Test(dftu-pw): add 17_DS_DFTU integration test suite
dyzheng May 1, 2026
82bbb59
Fix(tests): correct CASES_CPU.txt directory names and regenerate resu…
dyzheng May 1, 2026
73666a0
Fix: remove stale use_paw parameter from HSolverPW/DiagoDavid calls
dyzheng May 1, 2026
9b72452
Docs: add DeltaSpin guide, sc_lambda_strategy/sc_direction_only param…
dyzheng May 1, 2026
e86f875
Fix CI build: eliminate RI_2D_Comm.hpp redefinition errors, complete …
dyzheng May 2, 2026
19fc9be
Fix(tests): make 50/51 FeO atom-order tests use identical atomic posi…
dyzheng May 2, 2026
e6b3628
Fix(test): correct MODULE_LCAO_operator_dftu_test CMakeLists and add …
dyzheng May 2, 2026
5569061
Fix(test): implement proper stubs for get_locale_flat and set_locale_…
dyzheng May 2, 2026
5837a65
Fix(test): correct stress parsing in catch_properties.sh
dyzheng May 3, 2026
d644491
Fix(PW): restore correct force/stress formula for npol=2
dyzheng May 3, 2026
eaad950
Fix(PW): correct all nonlocal force/stress formulas for nspin=4/SOC
dyzheng May 3, 2026
f7ce7d0
Fix(PW): revert nonlocal force/stress formula to match develop
dyzheng May 3, 2026
37dcdf9
Fix(test): correct stress extraction in catch_properties.sh
dyzheng May 5, 2026
3ef7ff5
Fix(test): redesign 50/51 FeO tests for fast convergence
dyzheng May 5, 2026
645aae3
Merge branch 'feat/dftu-pw-port-v2' of github.com:dyzheng/abacus-deve…
dyzheng May 5, 2026
b566110
Fix(test): correct column-major indexing in DeltaHcc_GemmContribution…
dyzheng May 5, 2026
6a18e27
Merge develop into feat/dftu-pw-port-v2
dyzheng May 5, 2026
d6cf86f
Merge develop into feat/dftu-pw-port-v2
dyzheng May 5, 2026
25d687d
Fix(test): improve DeltaSpin LCAO tests convergence
dyzheng May 5, 2026
42517c2
Merge remote-tracking branch 'zdy/feat/dftu-pw-port-v2' into feat/dft…
dyzheng May 5, 2026
41332ba
Fix(build): restore develop's DeepKS files to fix LIBRI compilation
dyzheng May 5, 2026
6abc8b6
Fix(test): sync pw_basis_k.cpp warning message with develop
dyzheng May 5, 2026
437bc66
Fix(GPU): sync CUDA kernels force/stress formula with develop
dyzheng May 5, 2026
2d90a6e
Fix(PW): sync CPU kernels force/stress/onsite formulas with develop
dyzheng May 5, 2026
ad84b16
Fix(CUDA): remove erroneous npol parameters from force/stress operato…
dyzheng May 5, 2026
1f49ed5
Fix(PW): restore npol parameters and correct formula order for DFT+U/…
dyzheng May 6, 2026
7c09c7e
Fix(CUDA): use vkbnc instead of vkb.nc for GPU path leading dimension
dyzheng May 6, 2026
c56b8bb
Fix(SDFT-GPU): restore lazy-allocation guard for vkb in getgradq_vnl
dyzheng May 6, 2026
d0193eb
Fix(force/stress): correct nonlocal formula order for SOC/DFT+U/Delta…
dyzheng May 6, 2026
47e17d5
Docs: add force/stress formula error to debug notes
dyzheng May 6, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
223 changes: 223 additions & 0 deletions DEBUG_NOTES.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,223 @@
# Debug Notes for ABACUS Developers

## Issue 1: SDFT GPU ComplexMatrix Crash

During CUDA build testing of the `feat/dftu-pw-port-v2` branch, `16_SDFT_GPU/001_PW_KG_100_GPU` test crashed with ComplexMatrix assertion failure:

```
abacus_basic_gpu: source/source_base/complexmatrix.h:38:
std::complex<double>& ModuleBase::ComplexMatrix::operator()(int, int):
Assertion `ir<nr' failed.
```

## Root Cause Analysis

### Timeline of the Bug

| Commit | Branch | Description |
|--------|--------|-------------|
| `4b847b642` | develop | Added lazy-allocation guard for GPU Velocity path |
| `8a07ea65a` | feat/dftu-pw-port-v2 | **Erroneously removed the guard** |
| Current | feat/dftu-pw-port-v2 | Fixed by restoring the guard |

### Detailed Explanation

1. **Commit `4b847b642` (develop branch)** introduced GPU memory optimization:
- GPU path skips `vkb.create()` in `init()` to save CPU memory (~3.2 GB for large systems)
- Added `vkbnc` member to store column dimension independently
- Added lazy-allocation guard in `getgradq_vnl()`:

```cpp
// GPU path skips vkb allocation in init(); allocate now if needed
if (this->vkb.nc == 0 && this->nkb > 0 && this->vkbnc > 0) {
this->vkb.create(this->nkb, this->vkbnc);
}
```

2. **Commit `8a07ea65a` (PR branch)** removed this guard:
- Commit message claimed: "Remove vkbnc member from pseudopot_cell_vnl; always allocate vkb ComplexMatrix (simplifies GPU path)"
- But the actual code in `vnl_pw.cpp:init()` still had `!use_gpu_` condition preventing allocation on GPU path
- This inconsistency caused `getgradq_vnl()` to access unallocated `vkb` when called by `Velocity::init()` for conductivity calculation

### Code Flow

```
Velocity::init(ik)
→ ppcell->getgradq_vnl(ucell, ik)
→ this->vkb(jkb, 0) // CRASH: vkb.nc == 0 on GPU path
```

## Lessons Learned

### 1. Understand Code Purpose Before Removing

When deleting code (especially guards or conditional checks):

- **Always understand why it exists** - Check commit history, commit message, and related discussions
- **Verify assumptions** - The commit message of `8a07ea65a` claimed "always allocate vkb", but actual code contradicted this
- **Test affected paths** - GPU Velocity path (`op_pw_vel.cpp`) uses `getgradq_vnl()` which needs `vkb`

### 2. GPU/CPU Path Differences Require Special Care

Key patterns in ABACUS GPU code:

| Pattern | CPU Path | GPU Path |
|---------|----------|----------|
| vkb allocation | `vkb.create(nkb, npwx)` | Skipped (uses `c_vkb`/`z_vkb` buffers) |
| Leading dimension | `vkb.nc` | `vkbnc` member |
| Data access | `vkb.c` or `vkb(i,j)` | `get_vkb_data<FPTYPE>()` |

**Recommendation**: Before modifying GPU-related code:
1. Check `use_gpu_` flag usage
2. Verify memory allocation paths for both CPU and GPU
3. Ensure lazy-allocation guards are preserved if they exist

### 3. PR Branch Sync with Develop

When merging develop into a PR branch:

- **Check for conflicts carefully** - A "remove" in PR may conflict with an "add" in develop
- **Review commit messages** - If develop commit claims to fix something, ensure the fix survives the merge
- **Use `git diff` to verify**:

```bash
# Check if a specific develop commit's changes are present
git diff <develop_commit>..HEAD -- <file>
```

### 4. Commit Message Accuracy

**Problem**: `8a07ea65a` commit message was misleading:
- Claimed: "always allocate vkb ComplexMatrix"
- Reality: Removed guard, but allocation condition still existed

**Recommendation**: Commit message should accurately describe:
- What is being removed AND why
- Impact on different code paths (CPU vs GPU)
- Any assumptions being changed

### 5. Testing Coverage

**Gap**: The PR branch's testing didn cover GPU Velocity path (`getgradq_vnl`) because:
- SDFT conductivity tests require specific INPUT settings
- CUDA build has single-process limitation

**Recommendation**:
- Ensure GPU-specific code paths are tested
- Document which tests cover GPU paths in CI

## Recommendations for Future PRs

### 1. Before Removing Code Added by develop

```bash
# Find who added the code
git log -p --reverse -- <file> | grep -B10 "code pattern"

# Check if later commits modified it
git log --oneline <adding_commit>..HEAD -- <file>
```

### 2. GPU Path Verification Checklist

When modifying `source_pw/module_pwdft/vnl_pw*` files:

- [ ] Check `use_gpu_` flag conditions
- [ ] Verify `vkb` allocation/deallocation consistency
- [ ] Ensure `vkbnc` is used for leading dimension on GPU
- [ ] Check `get_vkb_data()` usage instead of `vkb.c`
- [ ] Test SDFT conductivity path (Velocity operator)

### 3. Merge Strategy

When merging develop into feature branch:

```bash
# After merge, check key files
git diff <pre_merge_commit>..<post_merge_commit> -- \
source/source_pw/module_pwdft/vnl_pw.cpp \
source/source_pw/module_pwdft/vnl_pw_grad.cpp \
source/source_pw/module_pwdft/op_pw_nl.cpp \
source/source_pw/module_pwdft/hamilt_pw.cpp

# Verify GPU-specific guards exist
grep -n "GPU path\|use_gpu_\|vkbnc\|lazy" <file>
```

## Related Commits

- `4b847b642` - [Fix] Eliminate unnecessary vkb CPU allocation on GPU path (#7296)
- `8a07ea65a` - Refactor: vnl_pw simplification, remove TD_MovingGauge, minor module updates
- `c56b8bb70` - Fix(SDFT-GPU): restore lazy-allocation guard for vkb in getgradq_vnl (this fix)

## Files Involved

| File | Purpose |
|------|---------|
| `vnl_pw.cpp` | `init()` - vkb allocation control |
| `vnl_pw_grad.cpp` | `getgradq_vnl()` - velocity gradient calculation |
| `vnl_pw.h` | `vkbnc` member definition |
| `op_pw_vel.cpp` | Velocity operator - calls `getgradq_vnl()` |
| `op_pw_nl.cpp` | Nonlocal operator - uses `vkbnc` for leading dimension |
| `hamilt_pw.cpp` | Hamiltonian - uses `vkbnc` for leading dimension |

---

## Issue 2: Force/Stress Formula Order Error

### Problem

Tests `035_PW_15_SO` and `099_PW_DJ_SO` failed with force/stress deviations:
```
Warning: totalforceref cal=17.146449 ref=17.965510 deviation=0.819061
Warning: totalstressref cal=104846.48 ref=100582.60 deviation=4263.88
```

### Root Cause

Commit `1f49ed516` introduced wrong formula order when attempting to "restore npol parameters":

**Wrong formula** (introduced by commit `1f49ed516`):
```cpp
local_force[ipol] -= fac * (ps0 * dbb0 + ps1 * dbb2 + ps2 * dbb1 + ps3 * dbb3).real();
```

**Correct formula** (from develop branch):
```cpp
local_force[ipol] -= fac * (ps0 * dbb0 + ps1 * dbb1 + ps2 * dbb2 + ps3 * dbb3).real();
```

The commit message claimed: "Fix formula order: dbb1+dbb2 instead of dbb2+dbb1" but actually **changed the correct formula to wrong**!

### Affected Locations

| File | Line | Purpose |
|------|------|---------|
| `force_op.cpp` | 257 | Nonlocal SOC force (deeq_nc) |
| `force_op.cpp` | 355 | DFT+U/DeltaSpin force (vu) |
| `stress_op.cpp` | 235 | Nonlocal SOC stress |
| `stress_op.cpp` | 321 | DFT+U/DeltaSpin stress |
| `cuda/force_op.cu` | 273, 378 | GPU versions |
| `cuda/stress_op.cu` | 364, 980 | GPU versions |
| `rocm/force_op.hip.cu` | 259, 364 | ROCm versions |
| `rocm/stress_op.hip.cu` | 325, 969 | ROCm versions |

### Timeline

| Commit | Description |
|--------|-------------|
| `eaad9504a` | Fixed formula to ps1*dbb1 + ps2*dbb2 (correct) |
| `2d90a6eec` | Synced CPU kernels with develop (formula still correct) |
| `1f49ed516` | **Changed formula to wrong order** (ps1*dbb2 + ps2*dbb1) |
| `d0193ebbb` | Fixed formula back to correct order |

### Lesson Learned

When a commit message says "fix formula", verify the actual code change matches the description:
- `1f49ed516` claimed to "fix formula order" but actually **introduced** the bug
- The correct formula was already present in previous commits

### Important Note

Test `099_PW_DJ_SO` is in `CASES_CPU.txt` only, not `CASES_GPU.txt`.
GPU version results may differ from CPU reference values due to numerical precision.
24 changes: 22 additions & 2 deletions docs/advanced/input_files/input-main.md
Original file line number Diff line number Diff line change
Expand Up @@ -394,6 +394,8 @@
- [sccut](#sccut)
- [sc\_drop\_thr](#sc_drop_thr)
- [sc\_scf\_thr](#sc_scf_thr)
- [sc\_direction\_only](#sc_direction_only)
- [sc\_lambda\_strategy](#sc_lambda_strategy)
- [vdW correction](#vdw-correction)
- [vdw\_method](#vdw_method)
- [vdw\_s6](#vdw_s6)
Expand Down Expand Up @@ -3481,8 +3483,8 @@

- **Type**: Integer
- **Description**: Determines whether to calculate the plus U correction, which is especially important for correlated electrons.
- 1: Calculate plus U correction with radius-adjustable localized projections (with parameter onsite_radius).
- 2: Calculate plus U correction using first zeta of NAOs as projections (this is old method for testing).
- 1: Calculate plus U correction with radius-adjustable localized projections (with parameter onsite_radius). Supported for both PW and LCAO basis sets.
- 2: Calculate plus U correction using first zeta of NAOs as projections (this is old method for testing). Only available for LCAO basis.
- 0: Do not calculate plus U correction.
- **Default**: 0

Expand Down Expand Up @@ -3629,6 +3631,24 @@
- **Description**: Density error threshold for inner loop of spin-constrained SCF
- **Default**: 1.0e-4

### sc_direction_only

- **Type**: Boolean
- **Availability**: *sc_mag_switch is true*
- **Description**: When true, only the direction of the magnetic moment is constrained to the target direction, while the magnitude is allowed to vary freely. This is useful for studying magnetic anisotropy or when the magnitude of the moment is determined by the electronic structure rather than an external constraint. When false (default), both the direction and magnitude of the magnetic moment are constrained to the target values.
- **Default**: False

### sc_lambda_strategy

- **Type**: String
- **Availability**: *sc_mag_switch is true*
- **Description**: Lambda update strategy for spin-constrained DFT. Available options are:
- bfgs: BFGS quasi-Newton method (default, robust and well-tested)
- linear_response: linear response method (Scheme B)
- augmented_lagrangian: augmented Lagrangian method (Scheme C)
- hybrid_delayed: hybrid delayed update (Scheme D)
- **Default**: bfgs

[back to top](#full-list-of-input-keywords)

## vdW correction
Expand Down
2 changes: 1 addition & 1 deletion docs/advanced/scf/construct_H.md
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,6 @@ Here, we use a simple [example calculation](https://github.com/deepmodeling/abac

Conventional functionals, e.g., L(S)DA and GGAs, encounter failures in strongly correlated systems, usually characterized by partially filled *d*/*f* shells. These include transition metals (TM) and their oxides, rare-earth compounds, and actinides, to name a few, where L(S)DA/GGAs typically yield quantitatively or even qualitatively wrong results. To address this failure, an efficient and successful method named DFT+*U*, which inherits the efficiency of L(S)DA/GGA but gains the strength of the Hubbard model in describing the physics of strongly correlatedsystems, has been developed.

Now the DFT+*U* method is accessible in ABACUS. The details of the DFT+*U* method could be found in this [paper](https://doi.org/10.1063/5.0090122). It should be noted that the DFT+*U* works only within the NAO scheme, which means that the value of the keyword `basis_type` must be lcao when DFT+*U* is called. To turn on DFT+*U*, users need to set the value of the `dft_plus_u` keyword in the `INPUT` file to be 1. All relevant parmeters used in DFT+*U* calculations are listed in the [DFT+*U* correction](../input_files/input-main.md#dftu-correction) part of the [list of keywords](../input_files/input-main.md).
Now the DFT+*U* method is accessible in ABACUS. The details of the DFT+*U* method could be found in this [paper](https://doi.org/10.1063/5.0090122). DFT+*U* is supported for both LCAO (`basis_type = lcao`) and plane-wave (`basis_type = pw`) basis sets. For the PW basis, `dft_plus_u = 1` (radius-adjustable localized projections) is supported with `nspin = 1`, `2`, or `4`. For the LCAO basis, both `dft_plus_u = 1` and `dft_plus_u = 2` are available. To turn on DFT+*U*, users need to set the value of the `dft_plus_u` keyword in the `INPUT` file to be 1. All relevant parameters used in DFT+*U* calculations are listed in the [DFT+*U* correction](../input_files/input-main.md#dftu-correction) part of the [list of keywords](../input_files/input-main.md).

Examples of DFT+*U* calculations are provided in this [directory](https://github.com/deepmodeling/abacus-develop/tree/develop/examples/dft_plus_u).
Loading
Loading