Skip to content

Fix: dptest label mismatch#5082

Open
anyangml wants to merge 11 commits intodeepmodeling:masterfrom
anyangml:fix/dptest-label-mismatch
Open

Fix: dptest label mismatch#5082
anyangml wants to merge 11 commits intodeepmodeling:masterfrom
anyangml:fix/dptest-label-mismatch

Conversation

@anyangml
Copy link
Collaborator

@anyangml anyangml commented Dec 4, 2025

Currently, when running dptest on atomic polar. The order of atoms for coord and label are misaligned.
This is a result of type_sel = None for coord, but not None for label.
I tried to set it to None for label as well, but it broke TF compatibility.
So the easiest hack I can think of is to add a check while loading data.

Summary by CodeRabbit

  • Tests
    • Enhanced output handling in test suite to improve per-atom count tracking in selection outputs.

Copilot AI review requested due to automatic review settings December 4, 2025 06:43
@github-actions github-actions bot added the Python label Dec 4, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 4, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Added the keyword argument output_natoms_for_type_sel=True to two data.add calls in the test_polar and test_dipole functions within the test entry point module. This parameter change modifies per-atom count output handling in type selection-related output generation.

Changes

Cohort / File(s) Summary
Test Data Output Configuration
deepmd/entrypoints/test.py
Added output_natoms_for_type_sel=True keyword argument to data.add calls in both test_polar and test_dipole functions to include per-atom counts in selection output.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

  • iProzd
  • njzjz
  • wanghan-iapcm
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix: dptest label mismatch' directly addresses the main issue described in the PR objectives—fixing a label mismatch problem in dptest for atomic polar calculations.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
deepmd/entrypoints/test.py (2)

1137-1137: Consider removing commented code rather than keeping it.

While commenting out the type_sel argument appears intentional (likely to fix the label mismatch), keeping commented code reduces readability. If this argument is no longer needed, remove the comment entirely.

Apply this diff:

-        # type_sel=dp.get_sel_type(),

1278-1278: Consider removing commented code rather than keeping it.

Similar to the change in test_polar, this commented line should be removed entirely if the type_sel argument is no longer needed.

Apply this diff:

-        # type_sel=dp.get_sel_type(),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d5fa3c and 1eb047f.

📒 Files selected for processing (4)
  • deepmd/dpmodel/infer/deep_eval.py (1 hunks)
  • deepmd/entrypoints/test.py (2 hunks)
  • deepmd/jax/infer/deep_eval.py (1 hunks)
  • deepmd/pt/infer/deep_eval.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
🧬 Code graph analysis (2)
deepmd/jax/infer/deep_eval.py (4)
deepmd/jax/jax2tf/tfmodel.py (1)
  • model_output_type (238-240)
deepmd/jax/jax2tf/serialization.py (1)
  • model_output_type (285-286)
deepmd/jax/model/hlo.py (1)
  • model_output_type (234-236)
deepmd/dpmodel/model/make_model.py (1)
  • model_output_type (184-193)
deepmd/dpmodel/infer/deep_eval.py (4)
deepmd/dpmodel/model/base_model.py (1)
  • model_output_type (99-100)
deepmd/dpmodel/model/make_model.py (1)
  • model_output_type (184-193)
deepmd/dpmodel/model/spin_model.py (1)
  • model_output_type (271-273)
deepmd/pt/model/model/make_model.py (1)
  • model_output_type (90-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: CodeQL analysis (python)
  • GitHub Check: Agent
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Analyze (python)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (3)
deepmd/pt/infer/deep_eval.py (1)

251-251: LGTM! Broadening polar model recognition to include "polarizability".

The change correctly extends the model type detection to recognize models outputting "polarizability" as polar models alongside the existing "polar" check. This is consistent with the PR objective and aligns with similar changes across all three backend implementations.

deepmd/jax/infer/deep_eval.py (1)

161-161: LGTM! Consistent with other backends.

The change correctly extends polar model recognition to include "polarizability", maintaining consistency with the pt and dpmodel backend implementations.

deepmd/dpmodel/infer/deep_eval.py (1)

140-140: LGTM! Consistent implementation across all backends.

The change correctly extends the polar model recognition to include "polarizability", completing the uniform implementation across dpmodel, jax, and pt backends.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a label mismatch issue in dptest for polarizability models. The core issue was that the model output definition uses "polarizability" as the key, but the model type detection code was only checking for "polar" in the model output type, causing polarizability models to not be recognized correctly.

  • Added "polarizability" in model_output_type check to properly detect polar models across PT, JAX, and dpmodel backends
  • Removed type_sel parameter from polar and dipole test functions as it's not needed for these model types (data is global, not per-type)

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
deepmd/pt/infer/deep_eval.py Added check for "polarizability" in model_output_type to correctly identify polar models
deepmd/jax/infer/deep_eval.py Added check for "polarizability" in model_output_type to correctly identify polar models
deepmd/dpmodel/infer/deep_eval.py Added check for "polarizability" in model_output_type to correctly identify polar models
deepmd/entrypoints/test.py Commented out type_sel parameter in test_polar and test_dipole functions as it's not applicable to these global properties

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1eb047f and 7c82e85.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (2 hunks)
  • deepmd/utils/data.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-09-19T04:25:12.408Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
🧬 Code graph analysis (1)
deepmd/entrypoints/test.py (2)
deepmd/utils/data.py (1)
  • add (137-190)
deepmd/utils/data_system.py (1)
  • add (344-397)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
deepmd/utils/data.py (1)

640-672: Subset-only type_sel handling in _load_data looks correct

Gating the atomic+type_sel branch on natoms_sel != natoms confines the reshape/selection logic to true subsets and avoids unnecessarily reprocessing the “all types selected” case. This matches the semantics implied by type_sel and makes the behavior less surprising; the rest of the branch and error handling remain consistent.

@anyangml anyangml requested a review from iProzd December 4, 2025 09:11
@anyangml anyangml requested a review from njzjz December 4, 2025 09:17
@codecov
Copy link

codecov bot commented Dec 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.98%. Comparing base (bf982ab) to head (a73a9fb).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5082   +/-   ##
=======================================
  Coverage   81.98%   81.98%           
=======================================
  Files         750      750           
  Lines       75189    75190    +1     
  Branches     3616     3615    -1     
=======================================
+ Hits        61647    61648    +1     
+ Misses      12380    12379    -1     
- Partials     1162     1163    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@njzjz-bot
Copy link
Contributor

I see a likely shape mismatch in the current backend switch for atomic dipole/polar testing.

In test_polar() / test_dipole(), predictions are always filtered by sel_mask (so prediction shape is nframes x nsel x dof). But this PR sets output_natoms_for_type_sel=not is_tf; for non-TF backends this can make labels stay/expand to nframes x natoms x dof in _load_data(). That makes the later RMSE compare nsel prediction vs natoms label, which is inconsistent.

Could you make label/prediction shape handling consistent in one direction? For example:

  1. keep output_natoms_for_type_sel=False (selected-only labels), or
  2. if using full-natoms labels, also align prediction/label on the same full-natoms mask before RMSE.

Authored by OpenClaw (model: gpt-5.3-codex)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants