launch_slurm is broken for train_continue #1879

@clessig

Description

What happened?

launch_slurm is currently broken for continuing a run: the stages parameter recently introduced in run_train.py is not handled properly, so run_train.py rejects the arguments that launch_slurm passes to it.

What are the steps to reproduce the bug?

../WeatherGenerator-private/hpc/launch-slurm.py --from-run-id ez94wl8b

Output:

3: usage: run_train.py [-h] {train,train_continue,inference} ...
3: run_train.py: error: unrecognized arguments: --from-run-id ez94wl8b --mini-epoch -1
2: usage: run_train.py [-h] {train,train_continue,inference} ...
2: run_train.py: error: unrecognized arguments: --from-run-id ez94wl8b --mini-epoch -1
1: usage: run_train.py [-h] {train,train_continue,inference} ...
0: usage: run_train.py [-h] {train,train_continue,inference} ...
1: run_train.py: error: unrecognized arguments: --from-run-id ez94wl8b --mini-epoch -1
0: run_train.py: error: unrecognized arguments: --from-run-id ez94wl8b --mini-epoch -1
srun: error: nid005263: tasks 0-3: Exited with exit code 2
srun: Terminating StepId=602194.0
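
For illustration only, the sketch below shows one way an argparse CLI with stage subcommands can produce an error of this shape. It is a hypothetical stand-in, not the actual run_train.py parser: the stage names come from the usage line above, while the assumption that only train_continue defines --from-run-id and --mini-epoch is mine, as are the helper names build_parser and try_parse.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the run_train.py CLI, not the project's parser."""
    parser = argparse.ArgumentParser(prog="run_train.py")
    subparsers = parser.add_subparsers(dest="stage", required=True)

    # Stage names are taken from the usage line in the log above.
    subparsers.add_parser("train")

    # Assumption: only train_continue accepts the continuation options.
    cont = subparsers.add_parser("train_continue")
    cont.add_argument("--from-run-id", required=True)
    cont.add_argument("--mini-epoch", type=int, default=-1)

    subparsers.add_parser("inference")
    return parser


def try_parse(argv: list[str]) -> int:
    """Return 0 if argv parses, otherwise the exit code argparse would use."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return int(exc.code)
    return 0


continuation_args = ["--from-run-id", "ez94wl8b", "--mini-epoch", "-1"]

# Stage given explicitly: the options belong to the train_continue subparser.
ok_code = try_parse(["train_continue", *continuation_args])

# Same options under a stage that does not define them: argparse reports
# "unrecognized arguments" and exits with code 2.
bad_code = try_parse(["train", *continuation_args])
```

Under these assumptions, the second call fails with exit code 2, the same code srun reports above. Whether launch_slurm omits the stage or passes the wrong one would need to be checked against the real script.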

Hedgedoc link to logs and more information. This ticket is public, do not attach files directly.

No response

Metadata

Labels

bug (Something isn't working), infra (Issues related to infrastructure)

Projects

Status

Done
