Skip to content

[Algs] add buf to checksum utility (for debug)#1773

Open
tdavidcl wants to merge 61 commits into
Shamrock-code:mainfrom
tdavidcl:aurora-test5
Open

[Algs] add buf to checksum utility (for debug)#1773
tdavidcl wants to merge 61 commits into
Shamrock-code:mainfrom
tdavidcl:aurora-test5

Conversation

@tdavidcl
Copy link
Copy Markdown
Member

No description provided.

@github-actions
Copy link
Copy Markdown
Contributor

Thanks @tdavidcl for opening this PR!

You can do multiple things directly here:
1 - Comment pre-commit.ci run to run pre-commit checks.
2 - Comment pre-commit.ci autofix to apply fixes.
3 - Add label autofix.ci to fix authorship & pre-commit for every commit made.
4 - Add label light-ci to only trigger a reduced & faster version of the CI (need the full one before merge).
5 - Add label trigger-ci to create an empty commit to trigger the CI.

Once the workflow completes a message will appear displaying informations related to the run.

Also the PR gets automatically reviewed by gemini, you can:
1 - Comment /gemini review to trigger a review
2 - Comment /gemini summary for a summary
3 - Tag it using @gemini-code-assist either in the PR or in review comments on files

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a timestep callback system, MPI timer tracking, and FNV-1a buffer checksumming, alongside profiling entries and OpenMP parallelization in the SerialPatchTree. However, several critical issues and debug leftovers must be addressed: a continue statement in the benchmark script skips the main loop, a logic error in Solver.cpp checks the wrong callback before execution, and a mandatory environment variable for checksumming could cause startup crashes. Additionally, the inclusion of synchronous file I/O and checksumming in the NeighbourCache will severely degrade performance. The PR also contains poorly named stack entries and an excessive increase in unit test iterations that may be unsuitable for standard CI runs. No feedback was filtered out as all comments identified valid bugs or improvement opportunities.

#include <fmt/base.h>
#include <fstream>

std::string checksum_prefix = shambase::get_check_ref(shamcmdopt::getenv_str("CHECKSUM_PREFIX"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Initializing a global variable using shambase::get_check_ref on an environment variable will cause the application to crash during startup if CHECKSUM_PREFIX is not defined. This makes the environment variable mandatory for all users of the SPH model. Consider using shamcmdopt::getenv_str("CHECKSUM_PREFIX").value_or("") and guarding the subsequent debug file operations with a check for an empty string.

step_begin=callback_before_mpi_timer, step_end=callback_after_mpi_timer
)

continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The continue statement here will cause the benchmark loop to be skipped entirely. This appears to be leftover debug code that should be removed to allow the benchmark to run.

Comment thread src/shammodels/sph/src/Solver.cpp Outdated
tstep.end();

for (auto it = timestep_callbacks.rbegin(); it != timestep_callbacks.rend(); ++it) {
if (it->step_begin_callback) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The condition checks step_begin_callback but the following line attempts to call step_end_callback. This will lead to a crash if step_end_callback is not provided while step_begin_callback is, or it will skip the call if only step_end_callback is provided. It should check it->step_end_callback instead.

Suggested change
if (it->step_begin_callback) {
if (it->step_end_callback) {

= checksum_prefix + "/" + fmt::format("patch_{}_debug.txt", patch_id);

{
std::ofstream patch_file(checksum_file_path, std::ios::app);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Performing file I/O and buffer checksums (which involve synchronous device-to-host copies via copy_to_stdvec) inside the neighbor cache construction will severely degrade performance. This debug logic should be removed or explicitly guarded by a runtime flag to avoid penalizing production runs.


Tscal h_tolerance = solver_config.htol_up_coarse_cycle;

NamedStackEntry stack_loc1ddddd{"wait queue"};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The stack entry names like stack_loc1ddddd and stack_loc1ccccc appear to be temporary debug names and should be renamed to something more descriptive for production code. Additionally, if these entries represent duplicated logic, consider refactoring into a helper function or lambda to improve maintainability.

References
  1. Refactor duplicated logic into a helper function or lambda to improve readability and maintainability.

Comment on lines +489 to +491
for (u32 i = 0; i < 1000; i++) {
test.check();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Running the exclusive scan test 1000 times with 32M elements each will significantly increase the duration of the unit test suite. This might be excessive for standard CI runs. Additionally, ensure the scan implementation adheres to the optimization rule regarding accumulator types for non-negative sequences.

References
  1. When performing a scan (e.g., std::exclusive_scan) on a sequence of non-negative numbers, if the total sum is confirmed to fit within a specific integer type, there is no need to use a wider type for the scan's accumulator, as intermediate sums will not overflow.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Workflow report

workflow report corresponding to commit cbd3e14
Commiter email is timothee.davidcleris@proton.me

Light CI is enabled. This will only run the basic tests and not the full tests.
Merging a PR require the job "on PR / all" to pass which is disabled in this case.

Pre-commit check report

Pre-commit check: ✅

trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check for merge conflicts................................................Passed
check that executables have shebangs.....................................Passed
check that scripts with shebangs are executable..........................Passed
check for added large files..............................................Passed
check for case conflicts.................................................Passed
check for broken symlinks................................................Passed
check yaml...............................................................Passed
detect private key.......................................................Passed
No-tabs checker..........................................................Passed
Tabs remover.............................................................Passed
cmake-format.............................................................Passed
Validate GitHub Workflows................................................Passed
clang-format.............................................................Passed
ruff check...............................................................Passed
ruff format..............................................................Passed
Check doxygen headers....................................................Passed
Check license headers....................................................Passed
Check #pragma once.......................................................Passed
Check SYCL #include......................................................Passed
No ssh in git submodules remote..........................................Passed
No UTF-8 in files (except for authors)...................................Passed

Test pipeline can run.

Doxygen diff with main

Removed warnings : 10
New warnings : 12
Warnings count : 7885 → 7887 (0.0%)

Detailed changes :
+ src/shamalgs/include/shamalgs/buf_checksum.hpp:26: warning: Member buf_checksum(const sham::DeviceBuffer< T > &buf) (function) of namespace shamalgs is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:113: warning: Member build_sparse_exchange_table(const std::vector< CommMessageInfo > &messages_send, size_t max_alloc_size) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:116: warning: Member build_sparse_exchange_table(const std::vector< CommMessageInfo > &messages_send, size_t max_alloc_size) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:249: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, const std::vector< const u8 * > &bytebuffer_send, const std::vector< u8 * > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:252: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, const std::vector< const u8 * > &bytebuffer_send, const std::vector< u8 * > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:298: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:301: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:386: warning: Member sparse_exchange< sham::device >(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::device > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::device > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:389: warning: Member sparse_exchange< sham::device >(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::device > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::device > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:392: warning: Member sparse_exchange< sham::host >(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::host > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::host > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:395: warning: Member sparse_exchange< sham::host >(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::host > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::host > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shammodels/sph/include/shammodels/sph/SPHUtilities.hpp:105: warning: Member iterate_smoothing_length_cache(sham::DeviceBuffer< vec > &merged_r, sham::DeviceBuffer< flt > &hnew, sham::DeviceBuffer< flt > &hold, sham::DeviceBuffer< flt > &eps_h, sycl::range< 1 > update_range, shamrock::tree::ObjectCache &neigh_cache, flt gpart_mass, flt h_evol_max, flt h_evol_iter_max) (function) of class shammodels::sph::SPHUtilities is not documented.
+ src/shammodels/sph/include/shammodels/sph/SPHUtilities.hpp:112: warning: Member iterate_smoothing_length_cache(sham::DeviceBuffer< vec > &merged_r, sham::DeviceBuffer< flt > &hnew, sham::DeviceBuffer< flt > &hold, sham::DeviceBuffer< flt > &eps_h, sycl::range< 1 > update_range, shamrock::tree::ObjectCache &neigh_cache, flt gpart_mass, flt h_evol_max, flt h_evol_iter_max) (function) of class shammodels::sph::SPHUtilities is not documented.
- src/shammodels/sph/include/shammodels/sph/SPHUtilities.hpp:121: warning: Member iterate_smoothing_length_tree(sycl::buffer< vec > &merged_r, sycl::buffer< flt > &hnew, sycl::buffer< flt > &hold, sycl::buffer< flt > &eps_h, sycl::range< 1 > update_range, RadixTree< u_morton, vec > &tree, flt gpart_mass, flt h_evol_max, flt h_evol_iter_max) (function) of class shammodels::sph::SPHUtilities is not documented.
+ src/shammodels/sph/include/shammodels/sph/SPHUtilities.hpp:128: warning: Member iterate_smoothing_length_tree(sycl::buffer< vec > &merged_r, sycl::buffer< flt > &hnew, sycl::buffer< flt > &hold, sycl::buffer< flt > &eps_h, sycl::range< 1 > update_range, RadixTree< u_morton, vec > &tree, flt gpart_mass, flt h_evol_max, flt h_evol_iter_max) (function) of class shammodels::sph::SPHUtilities is not documented.
- src/shammodels/sph/include/shammodels/sph/SPHUtilities.hpp:147: warning: Member compute_omega(sham::DeviceBuffer< vec > &merged_r, sham::DeviceBuffer< flt > &h_part, sham::DeviceBuffer< flt > &omega_h, sycl::range< 1 > part_range, shamrock::tree::ObjectCache &neigh_cache, flt gpart_mass) (function) of class shammodels::sph::SPHUtilities is not documented.
+ src/shammodels/sph/include/shammodels/sph/SPHUtilities.hpp:154: warning: Member compute_omega(sham::DeviceBuffer< vec > &merged_r, sham::DeviceBuffer< flt > &h_part, sham::DeviceBuffer< flt > &omega_h, sycl::range< 1 > part_range, shamrock::tree::ObjectCache &neigh_cache, flt gpart_mass) (function) of class shammodels::sph::SPHUtilities is not documented.
+ src/shammodels/sph/src/modules/NeighbourCache.cpp:35: warning: Member checksum_prefix (variable) of file NeighbourCache.cpp is not documented.
- src/shamrock/include/shamrock/scheduler/SerialPatchTree.hpp:223: warning: Member dump_dat() (function) of class SerialPatchTree is not documented.
+ src/shamrock/include/shamrock/scheduler/SerialPatchTree.hpp:228: warning: Member dump_dat() (function) of class SerialPatchTree is not documented.
- src/shamrock/include/shamrock/scheduler/SerialPatchTree.hpp:246: warning: Member compute_patch_owner(sham::DeviceScheduler_ptr dev_sched, sham::DeviceBuffer< fp_prec_vec > &position_buffer, u32 len) (function) of class SerialPatchTree is not documented.
+ src/shamrock/include/shamrock/scheduler/SerialPatchTree.hpp:251: warning: Member compute_patch_owner(sham::DeviceScheduler_ptr dev_sched, sham::DeviceBuffer< fp_prec_vec > &position_buffer, u32 len) (function) of class SerialPatchTree is not documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant