[foss/2024a] add additional patches for PyTorch 2.9.1 to fix problems with the tests + exclude inductor/test_flex* tests#25492

Merged
boegel merged 5 commits into easybuilders:develop from Flamefire:20260303163052_new_pr_PyTorch291
Mar 19, 2026

@Flamefire
Contributor

@Flamefire Flamefire commented Mar 3, 2026

(created using eb --new-pr)

I noticed that some code in tools/stats/import_test_stats.py, which is executed by 4 tests, transforms the JSON dict containing the disabled tests so that its entries have the form disabled_test, (issue_url, platforms) instead of disabled_test, (pr_num, issue_url, platforms), which is the form used in the downloaded file.

This function modifies the file in place.
With the current patch, any test that is executed after this modification will fail with:

>           for disabled_test, (pr_num, issue_url, platforms) in disabled_tests_dict.items():
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E           ValueError: not enough values to unpack (expected 3, got 2)
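
To illustrate the failure mode, here is a minimal sketch, not the actual PyTorch code: `rewrite_in_place` and `read_disabled_tests` are hypothetical stand-ins, and the file name and entry values are made up. It shows how a reader that expects three values per entry breaks once another step has rewritten the shared file with two values per entry.

```python
import json
import tempfile
from pathlib import Path


def rewrite_in_place(path):
    """Hypothetical stand-in for the in-place transformation: drop pr_num."""
    data = json.loads(path.read_text())
    slim = {
        name: [issue_url, platforms]
        for name, (pr_num, issue_url, platforms) in data.items()
    }
    path.write_text(json.dumps(slim))


def read_disabled_tests(path):
    """Reader that expects 3 values per entry, like the loop in the traceback."""
    disabled_tests_dict = json.loads(path.read_text())
    result = {}
    for disabled_test, (pr_num, issue_url, platforms) in disabled_tests_dict.items():
        result[disabled_test] = (issue_url, platforms)
    return result


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "disabled_tests.json"  # made-up file name
        entry = ["123", "https://example.invalid/issue/1", ["linux"]]
        path.write_text(json.dumps({"test_foo (__main__.TestBar)": entry}))

        before = read_disabled_tests(path)  # works: entries have 3 values
        rewrite_in_place(path)              # file now holds 2-value entries
        try:
            read_disabled_tests(path)
        except ValueError as err:
            return before, str(err)
    return before, None
```

Calling `demo()` returns the result of the first (successful) read together with the message of the `ValueError` raised by the second read.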

Additionally, this function will download the file again if it deems it outdated. I updated the patch to skip this and just return the contents of the file (the code is taken from the branch that is used when the file exists and is up to date).
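
The idea of returning the file contents without downloading can be sketched roughly as follows. This is an assumption-laden illustration rather than the patch itself: `load_cached_json` is a hypothetical helper, and the file name and entry are invented. The point is that the loader only reads, so the file on disk is never refreshed or rewritten.

```python
import json
import tempfile
from pathlib import Path


def load_cached_json(path):
    """Return the parsed contents of an existing JSON file.

    Hypothetical helper: it never downloads and never writes,
    so the file on disk stays exactly as it was.
    """
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "disabled-tests.json"  # made-up file name
        original = json.dumps({"test_a": ["1", "https://example.invalid/2", []]})
        path.write_text(original, encoding="utf-8")

        first = load_cached_json(path)
        second = load_cached_json(path)
        # The raw file content is untouched after both reads.
        unchanged = path.read_text(encoding="utf-8") == original
    return first, second, unchanged
```

Because nothing is written back, repeated reads see the same 3-value entries regardless of the order in which tests run.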

@Flamefire Flamefire force-pushed the 20260303163052_new_pr_PyTorch291 branch from aeaba06 to f8993db Compare March 11, 2026 11:56
@boegel boegel added bug fix and removed change labels Mar 11, 2026
@boegel boegel added this to the next release (5.2.2?) milestone Mar 11, 2026
@boegel
Member

boegel commented Mar 11, 2026

@boegelbot please test @ jsc-zen3
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr25492"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=25492 EB_ARGS="--installpath /tmp/$USER/pr25492" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_25492 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10012

Test results coming soon (I hope)...

Details

- notification for comment with ID 4040716240 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (total: 9 hours 34 mins 42 secs) (1 easyconfigs in total)
i7008 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.21
See https://gist.github.com/Flamefire/a99892e99d879c5469c0235dc579ca2d for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (total: 37 hours 27 mins 30 secs) (1 easyconfigs in total)
jsczen3c4.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.25
See https://gist.github.com/boegelbot/5f879b0e236cf0916d0c4905526718ec for a full test report.

@Flamefire
Contributor Author

Added patches for issues discovered in EESSI while testing for ARM: EESSI/software-layer#1389

The bulk of the failures was due to not building with MKLDNN.
We should consider enabling MKLDNN for ARM. If I understand this correctly, you may need ACL (ARM Compute Library), but it might also work without it. PyTorch defaults USE_MKLDNN to ON for x86 but to OFF for ARM.

@ocaisa
Member

ocaisa commented Mar 13, 2026

@boegelbot please test @ jsc-zen3
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr25492"

@boegelbot
Collaborator

@ocaisa: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=25492 EB_ARGS="--installpath /tmp/$USER/pr25492" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_25492 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10027

Test results coming soon (I hope)...

Details

- notification for comment with ID 4056340460 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (total: 10 hours 7 mins 23 secs) (1 easyconfigs in total)
i7030 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.21
See https://gist.github.com/Flamefire/d89cb641b9bd8dc417b40d524c2869f0 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (total: 36 hours 16 mins 6 secs) (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.25
See https://gist.github.com/boegelbot/18ba1c6063ff0cd6049bad9dbbe2bbe1 for a full test report.

@boegel boegel changed the title from "Fix race condition in checking for disabled tests in PyTorch-2.9.1-foss-2024a" to "add additional patches for PyTorch 2.9.1 to fix problems with the tests + exclude inductor/test_flex* tests" Mar 15, 2026
@boegel
Member

boegel commented Mar 15, 2026

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (total: 12 hours 20 mins 17 secs) (1 easyconfigs in total)
node4230.shinx.os - Linux RHEL 9.6, x86_64, AMD EPYC 9654 96-Core Processor (zen4), Python 3.9.21
See https://gist.github.com/boegel/cc99ea5dcb9f3ceca3b91888a970d359 for a full test report.

@Flamefire Flamefire changed the title from "add additional patches for PyTorch 2.9.1 to fix problems with the tests + exclude inductor/test_flex* tests" to "[foss/2024a] add additional patches for PyTorch 2.9.1 to fix problems with the tests + exclude inductor/test_flex* tests" Mar 17, 2026
Member

@boegel boegel left a comment

lgtm

@boegel
Member

boegel commented Mar 19, 2026

Going in, thanks @Flamefire!

@boegel boegel merged commit 6b72af9 into easybuilders:develop Mar 19, 2026
6 checks passed
@Flamefire Flamefire deleted the 20260303163052_new_pr_PyTorch291 branch March 19, 2026 17:03
Labels

2024a issues & PRs related to 2024a common toolchains bug fix change

4 participants