-
Notifications
You must be signed in to change notification settings - Fork 86
Runner: make MPI bind-to configurable #797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -91,6 +91,8 @@ def __validate_sb_config(self): # noqa: C901 | |||||||||
| 'btl_tcp_if_exclude': 'lo,docker0', | ||||||||||
| 'coll_hcoll_enable': 0, | ||||||||||
| } | ||||||||||
| if 'bind_to' not in mode: | ||||||||||
| self._sb_benchmarks[name].modes[idx].bind_to = 'numa' | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [SHOULD-FIX] No input validation on
|
||||||||||
| for key in ['PATH', 'LD_LIBRARY_PATH', 'SB_MICRO_PATH', 'SB_WORKSPACE']: | ||||||||||
| self._sb_benchmarks[name].modes[idx].env.setdefault(key, None) | ||||||||||
| if 'pattern' in mode: | ||||||||||
|
|
@@ -182,13 +184,14 @@ def __get_mode_command(self, benchmark_name, mode, timeout=None): | |||||||||
| '-tag-output ' # tag mpi output with [jobid,rank]<stdout/stderr> prefix | ||||||||||
| '-allow-run-as-root ' # allow mpirun to run when executed by root user | ||||||||||
| '{host_list} ' # use prepared hostfile or specify nodes and launch {proc_num} processes on each node | ||||||||||
| '-bind-to numa ' # bind processes to numa | ||||||||||
| '-bind-to {bind_to} ' # bind processes according to mode config | ||||||||||
| '{mca_list} {env_list} {command}' | ||||||||||
| ).format( | ||||||||||
| trace=trace_command, | ||||||||||
| host_list=f'-host localhost:{mode.proc_num}' if 'node_num' in mode and mode.node_num == 1 else | ||||||||||
| f'-hostfile hostfile -map-by ppr:{mode.proc_num}:node' if 'host_list' not in mode else '-host ' + | ||||||||||
| ','.join(f'{host}:{mode.proc_num}' for host in mode.host_list), | ||||||||||
| bind_to=mode.bind_to, | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [SHOULD-FIX] No new tests for custom
|
||||||||||
| Test case | Purpose |
|---|---|
MPI mode with bind_to: 'core' |
Non-default value flows into command |
MPI mode with bind_to: 'none' |
Disabling binding works |
test_validate_sb_config asserts bind_to in mode for MPI modes |
Default is applied when omitted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BLOCKER] Existing MPI tests will break
The 4 existing MPI test cases in
test_get_mode_commandconstruct mode dicts viaOmegaConf.create(...)directly — bypassing__validate_sb_config. After this change,__get_mode_commandaccessesmode.bind_toin the.format()call, but those test mode dicts don't includebind_to. OmegaConf raisesConfigAttributeError: Missing key bind_tofor missing keys.Verified locally:
Fix: Add
'bind_to': 'numa'to all 4 existing MPI test case mode dicts intest_get_mode_command, and add at least one new test case with a non-default value (e.g.,'bind_to': 'core') to verify the feature works.