
Refactor workdir usage in Kubeflow and Kuberay #501

Open
nathan-az wants to merge 9 commits into NVIDIA-NeMo:main from nathan-az:nazrak/volume_mount_support

Conversation


@nathan-az nathan-az commented May 1, 2026

Motivation and Summary

The motivation behind this PR is to better support subPath and other volume mount options for the workdir. Due to a cloud limitation, our users share a PVC, and subPaths provide a level of virtual isolation between them.

So the main changes:

  • users specify the dict for the mount itself, rather than a separate name and path
  • parameterise the optional internal subpath (e.g. {path}/{getuser()}/"code")
  • keep this functionality in the executor rather than the runner, so the user can control it

kubeflow executor

  • changed the workdir volume mount arg to accept a dict
  • added a new arg to specify the filesystem subpath, defaulting to {path}/{getuser()}/"code" to avoid a change in behaviour

kuberay executor

  • added a workdir volume mount arg
  • added a new arg to specify the filesystem subpath, defaulting to {path}/{getuser()}/"code" to avoid a change in behaviour

kuberay runner

  • removed the creation of the subpath, deferring instead to the functionality added in the executor.

Example Inputs

The pattern changes so that users explicitly specify the VolumeMount they want for the workdir, with the expectation that volumes contains the matching volume, e.g.

volumes = [{"name": "work-vol", "persistentVolumeClaim": {"claimName": "my-pvc"}}]
workdir_volume_mount = {"name": "work-vol", "mountPath": "/nemo_run", "subPath": team_name}
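
A minimal sanity check of that pattern: the workdir mount must reference a volume that actually appears in volumes. The helper below is a hypothetical sketch of that validation, not code from the PR; the function name resolve_workdir_mount and the concrete "team-a" subPath value are assumptions.

```python
def resolve_workdir_mount(volumes: list[dict], mount: dict) -> dict:
    """Return the volume backing the workdir mount, or raise if it is missing.

    Hypothetical sketch: the executor expects `volumes` to contain an entry
    whose "name" matches the workdir mount's "name".
    """
    by_name = {v["name"]: v for v in volumes}
    if mount["name"] not in by_name:
        raise ValueError(f"workdir mount references unknown volume {mount['name']!r}")
    return by_name[mount["name"]]


volumes = [{"name": "work-vol", "persistentVolumeClaim": {"claimName": "my-pvc"}}]
workdir_volume_mount = {"name": "work-vol", "mountPath": "/nemo_run", "subPath": "team-a"}

backing_volume = resolve_workdir_mount(volumes, workdir_volume_mount)
```

Because both dicts use the raw Kubernetes shapes (Volume and VolumeMount), they can be passed through to the pod spec unchanged, which is what makes subPath and the other mount options available without the executor needing to model each field.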

I haven't tested this yet on a real cluster, but will do so next week. The PR is to get thoughts on the change in pattern.

@ko3n1g let me know if you have any thoughts on the pattern, or issues with the PR.

@nathan-az

nathan-az commented May 1, 2026

@ko3n1g it looks like the kuberay executor just uses the first entry in volume_mounts as the workspace dir. This is even simpler than what I've got here (albeit a bit less transparent). That said, since it's less verbose, I'm happy to swap to that pattern for kubeflow too if you prefer it to the one in this PR. Or, if you prefer the explicit choice/flexibility, I can change kuberay to take an arg like in this PR (in a follow-up PR).

@ko3n1g To keep the patterns consistent, I have now made the same changes in each, and opted to increase flexibility in the executors. Default behaviour remains the same, but with different args.

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label May 3, 2026
@nathan-az nathan-az force-pushed the nazrak/volume_mount_support branch from ce6c21f to 8f6691c Compare May 3, 2026 23:37
Nathan Azrak added 8 commits May 4, 2026 14:29
Signed-off-by: Nathan Azrak <nazrak@gmail.com>
@nathan-az nathan-az force-pushed the nazrak/volume_mount_support branch from 8f6691c to b02a015 Compare May 4, 2026 04:29
@nathan-az nathan-az changed the title Use explicit volume mount definition for kubeflow workdir Refactor workdir usage in Kubeflow and Kuberay May 4, 2026
Signed-off-by: Nathan Azrak <nazrak@gmail.com>
