[SYCL][Driver] Enable time tracing capability for SYCL applications. #21207
uditagarwal97 merged 38 commits into intel:sycl
Hi @srividya-sundaram, this is an area I've explored previously, and I remember that Have you checked the comment at https://reviews.llvm.org/D150282 and https://reviews.llvm.org/D133662, and the github issue llvm/llvm-project#55455? It would be ideal to resolve this in upstream clang, and do so for all offloading models, not just SYCL. |
Hi @Maetveis
Could you please share the usability problems you encountered? Some questions I have are: |
Sure :). This was a while ago, and at the time for a different toolchain (AMD's HIP) but I think they mostly still apply.
To frame these a bit more, I think it's useful to think about the following use-cases:
Use-case A: As a developer of the library libFoo which uses an offloading API for (some of) its sources, I want to analyze the overall build-time and look for "hot-spots" where I can reduce it the most. In order to do this, I use tools like ninjatracing and pass
Use-case B: I have identified that the file
The second case is already reasonably well served by what clang can do for
The first case basically breaks down, the level of detail is reduced to the object file level instead of fine-grain we would have without offloading. We don't get any information about which step of the combined offload "compilation" took longest.
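For use-case A, a rough way to look for hot-spots is to sum event durations per name in a trace file. The snippet below is a minimal sketch, assuming the trace is Chrome Trace Event JSON with a top-level `traceEvents` array and complete events marked `"ph": "X"`. The helper names `load_trace` and `top_events` and the sample data are made up for illustration. Nested events overlap, so the totals are a ranking aid rather than exact wall-clock time.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Read a time-trace JSON file from disk (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def top_events(trace, limit=5):
    """Return up to `limit` (name, total_duration_us) pairs, largest first."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        # Only complete events ("ph": "X") carry a duration.
        if event.get("ph") == "X":
            totals[event["name"]] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Hand-written sample, not real compiler output.
sample = {"traceEvents": [
    {"ph": "X", "name": "Frontend", "ts": 0, "dur": 900, "pid": 1, "tid": 1},
    {"ph": "X", "name": "Backend", "ts": 900, "dur": 300, "pid": 1, "tid": 1},
    {"ph": "X", "name": "Frontend", "ts": 1200, "dur": 100, "pid": 1, "tid": 1},
    {"ph": "M", "name": "process_name", "pid": 1, "args": {"name": "clang"}},
]}

for name, total in top_events(sample):
    print(f"{name}: {total} us")
```

With separate host and device trace files, the same function could be run on each file in turn via `top_events(load_trace(path))`.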
In an ideal world in my opinion there should be just one trace and that includes traces for every step: host and device compilation and linking too, assuming the linker is capable of producing compatible traces. |
I don't think that was an intentional design choice for
There are already separate high-level categories in the traces like "Frontend" and "Backend", I don't see why an additional level of "Offload Host", "Offload Device (nvptx)" etc couldn't be added.
Perfetto is the successor of the chrome-tracing visualizer; it supports binary traces (much smaller sizes) and is designed with multi-process traces in mind.
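Until a single trace is produced by the compiler itself, the "one trace with an extra level" idea can be approximated by post-processing separate host and device traces. The snippet below is only a sketch under assumptions: each input is Chrome Trace Event JSON, each may carry a `beginningOfTime` field in microseconds (treated as optional here), and giving each input its own `pid` plus a `process_name` metadata event is enough for a viewer to show them as separate lanes. `merge_traces` is a hypothetical helper, not part of clang or this PR.

```python
def merge_traces(labelled_traces):
    """Merge (label, trace_dict) pairs into one trace, one pid per input."""
    # Align timestamps on the earliest start time, if the field is present.
    base = min((t.get("beginningOfTime", 0) for _, t in labelled_traces),
               default=0)
    merged = {"traceEvents": []}
    for pid, (label, trace) in enumerate(labelled_traces, start=1):
        # Metadata event that names this lane in the viewer.
        merged["traceEvents"].append({
            "ph": "M", "name": "process_name", "pid": pid, "tid": 0,
            "args": {"name": label},
        })
        offset = trace.get("beginningOfTime", 0) - base
        for event in trace.get("traceEvents", []):
            # Drop the input's own process label; ours replaces it.
            if event.get("ph") == "M" and event.get("name") == "process_name":
                continue
            event = dict(event)  # copy so the input trace is not modified
            event["pid"] = pid
            if "ts" in event:
                event["ts"] += offset
            merged["traceEvents"].append(event)
    return merged


# Hand-written sample traces, not real compiler output.
host = {"beginningOfTime": 1000, "traceEvents": [
    {"ph": "X", "name": "Frontend", "ts": 0, "dur": 50, "pid": 7, "tid": 7},
]}
device = {"beginningOfTime": 1200, "traceEvents": [
    {"ph": "X", "name": "Backend", "ts": 10, "dur": 20, "pid": 9, "tid": 9},
]}

merged = merge_traces([("Offload Host", host),
                       ("Offload Device (nvptx)", device)])
print(len(merged["traceEvents"]))
```

The lane labels reuse the category names suggested above; any other labels would work the same way.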
I think your suggestion improves the status quo for at least the simpler use case, so SGTM. I understand that implementing a single trace is significantly more work, and there might not be a big enough motivation to do that. |
For short-term usability, having separate traces for each compilation (host/targetA/targetB) with unique file names sounds reasonable to me. Having a single time-trace file with all targets embedded when offloading is enabled does make sense, since from a general user's perspective there is one binary generated, at least when generating an object. This of course goes beyond the scope of just modifying the driver. The SYCL documentation should be updated to describe which trace files users should expect. |
* Update device trace file's name to add -sycl.
SYCL pre-commit failures are unrelated to this patch. |
jopperm left a comment
RTC-related changes LGTM.
* Add test with actual sycl compilation
* Add COW test case.
|
@intel/llvm-gatekeepers please consider merging |

This PR implements -ftime-trace support for SYCL offloading compilation, enabling trace generation for both host and device compilation phases. The implementation handles various compilation modes and ensures trace files are generated with clear, predictable naming conventions.