Port CUDA jpeg encoder to stable ABI.#9535
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9535
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit ec4c519 with merge base 4c0fc5a ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
There was a problem hiding this comment.
Thanks for the PR @adabeyta , it looks like there some failures on the wheel-building job. I'm not sure about the Windows one, but the Rocm ones are failing with:
File "/__w/vision/vision/pytorch/vision/test/smoke_test.py", line 119, in main
torchvision: 0.29.0.dev20260630+rocm7.1
print(f"{torch.ops.image._jpeg_version() = }")
File "/__w/_temp/conda_environment_28416352816/lib/python3.10/site-packages/torch/_ops.py", line 1393, in __getattr__
torch.cuda.is_available: False
raise AttributeError(
AttributeError: '_OpNamespace' 'image' object has no attribute '_jpeg_version'
I suspect it's because there's no rocm encoder yet and we might need to slightly change the build process. I also left a comment below which might be related. Hopefully this should transitively address the Windows stuff too.
Note that by default, the ROCm unittests job won't run on PRs - only the wheel building job. If you want to test the rocm stuff, you can just push a branch against the origin repo (this one), instead of pushing to your fork. You have write permissions so you should be able to do that.
There was a problem hiding this comment.
We shouldn't need this here in image anymore, since this should all be in image_stable now
There was a problem hiding this comment.
Nice catch, now that both ops live in image_stable we no longer need to link nvjpeg or build as a CUDAExt.
07c716a to
0b078d9
Compare
Thanks @NicolasHug. Looks like there were two issues. For Windows image_stable links nvjpeg and once the encoder left image, nothing preloaded it. Then for ROCm the build dropped image.cpp but since CUDA migrated out there was nothing to hippify. Let me know what you think of the loading approach. |
| # system CUDA it may not be on the loader path and image_stable fails to load. We try to | ||
| # locate nvjpeg in the CUDA toolkit (CUDA_PATH/CUDA_HOME, then nvcc) and preload it; on | ||
| # Windows a .pyd import does not search PATH, so we also scan PATH, where the CUDA | ||
| # installer's DLL dir lands. No-op if already loaded or not found. |
There was a problem hiding this comment.
We should avoid doing that kind of thing if we can. We didn't need it before when the encoder was in image and the decoder was already in image_stable, so there must be a simpler and cleaner solution?
There was a problem hiding this comment.
Adding image_stable to the relocate list (Linux + Windows) so it gets bundled should also solve this.
0b078d9 to
908c62b
Compare
908c62b to
ec4c519
Compare
|
Hey @NicolasHug! You merged this PR, but no labels were added. |
Ports CUDA jpeg encoder to the stable ABI. This reuses the
image_stableextension that #9533 stood up.Layout Changes
setup.py:
encode_jpegs_cuda.cppadded toSTABLE_SOURCES.Removed cuda/encode_decode_jpegs_cuda.h. With both the decoder (Port CUDA jpeg decoder to stable ABI #9533) and now the encoder on the stable header they are no longer needed.
Stable-ABI audit (
torch-abi-audit)Ran
torch-abi-auditagainst the builttorchvisionpackage to verify the migrated extension stays on the stable surface and never reaches intoat::/c10::/torch::jit::internals.image_stable.soaudits asSTABLE— 67 stable-shim symbols, 0 unstable (up from 65 in #9533; the encoder added 2). The encoder's symbols also leave the legacyimage.so, whose unstable count drops 71 → 48. The legacyimage.so/_C.sostayUNSTABLE(expected: only nms, the jpeg decoder, and now the encoder have migrated), so the package-level verdict staysUNSTABLEuntil the rest move.