Tracing can be enabled by setting spark.comet.tracing.enabled=true.
With this feature enabled, each Spark executor will write a JSON event log file in
Chrome's Trace Event Format. The file will be written to the executor's current working
directory with the filename comet-event-trace-{pid}.json, where {pid} is the executor
process ID.
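
Because each executor writes its own file, it can help to collect the trace files by process ID after a run. The following is a minimal sketch using only the Python standard library; it assumes `{pid}` is written as decimal digits, and the helper name `find_trace_files` is hypothetical (it is not part of Comet).

```python
import re
from pathlib import Path

# Filename pattern taken from the text: comet-event-trace-{pid}.json
# Assumption: {pid} is written as plain decimal digits.
TRACE_NAME = re.compile(r"^comet-event-trace-(\d+)\.json$")


def find_trace_files(directory):
    """Return {pid: path} for every Comet trace file directly inside `directory`."""
    found = {}
    for path in Path(directory).iterdir():
        match = TRACE_NAME.match(path.name)
        if match and path.is_file():
            found[int(match.group(1))] = path
    return found
```

Calling `find_trace_files(".")` from an executor's working directory would return the trace files found there, keyed by executor process ID.
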
Additionally, building Comet with the jemalloc feature enabled adds tracing of native memory allocations:

    make release COMET_FEATURES="jemalloc"

Example output:

    { "name": "decodeShuffleBlock", "cat": "PERF", "ph": "B", "pid": 12345, "tid": 5, "ts": 10109225730 },
    { "name": "decodeShuffleBlock", "cat": "PERF", "ph": "E", "pid": 12345, "tid": 5, "ts": 10109228835 },
    { "name": "decodeShuffleBlock", "cat": "PERF", "ph": "B", "pid": 12345, "tid": 5, "ts": 10109245928 },
    { "name": "decodeShuffleBlock", "cat": "PERF", "ph": "E", "pid": 12345, "tid": 5, "ts": 10109248843 },
    { "name": "execute_plan", "cat": "PERF", "ph": "E", "pid": 12345, "tid": 5, "ts": 10109350935 },
    { "name": "CometExecIterator_getNextBatch", "cat": "PERF", "ph": "E", "pid": 12345, "tid": 5, "ts": 10109367116 },
    { "name": "CometExecIterator_getNextBatch", "cat": "PERF", "ph": "B", "pid": 12345, "tid": 5, "ts": 10109479156 },

Traces can be viewed with Perfetto UI.

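
Besides viewing a trace in a UI, the begin (`"ph": "B"`) and end (`"ph": "E"`) events can be paired up with a short script to get per-span durations. The sketch below is an illustration, not part of Comet: it assumes one JSON object per line with an optional trailing comma, as in the example output, and the function names are hypothetical. End events whose begin event falls outside the excerpt are ignored.

```python
import json
from collections import defaultdict


def parse_events(text):
    """Parse one JSON object per line, tolerating trailing commas."""
    events = []
    for line in text.splitlines():
        line = line.strip().rstrip(",")
        if not line.startswith("{"):
            continue  # skip blank lines and any enclosing "[" or "]"
        events.append(json.loads(line))
    return events


def span_durations(events):
    """Pair B/E events per (pid, tid, name) and return {name: [durations]}."""
    open_spans = defaultdict(list)  # (pid, tid, name) -> stack of begin timestamps
    durations = defaultdict(list)
    for ev in events:
        key = (ev.get("pid"), ev.get("tid"), ev.get("name"))
        if ev.get("ph") == "B":
            open_spans[key].append(ev["ts"])
        elif ev.get("ph") == "E" and open_spans[key]:
            durations[ev["name"]].append(ev["ts"] - open_spans[key].pop())
        # An "E" with no matching "B" (or any other phase) is ignored.
    return dict(durations)


# First five events from the example output.
SAMPLE = """
{ "name": "decodeShuffleBlock", "cat": "PERF", "ph": "B", "pid": 12345, "tid": 5, "ts": 10109225730 },
{ "name": "decodeShuffleBlock", "cat": "PERF", "ph": "E", "pid": 12345, "tid": 5, "ts": 10109228835 },
{ "name": "decodeShuffleBlock", "cat": "PERF", "ph": "B", "pid": 12345, "tid": 5, "ts": 10109245928 },
{ "name": "decodeShuffleBlock", "cat": "PERF", "ph": "E", "pid": 12345, "tid": 5, "ts": 10109248843 },
{ "name": "execute_plan", "cat": "PERF", "ph": "E", "pid": 12345, "tid": 5, "ts": 10109350935 },
"""

print(span_durations(parse_events(SAMPLE)))
# {'decodeShuffleBlock': [3105, 2915]}
```

The Trace Event Format specifies `ts` in microseconds, so if the file follows that convention the two `decodeShuffleBlock` calls took roughly 3.1 ms and 2.9 ms.
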
The labels shown in the trace visualization have the following meanings:

| Label | Meaning |
|---|---|
| jvm_heap_used | JVM heap memory usage of live objects for the executor process |
| jemalloc_allocated | Native memory usage for the executor process |
| task_memory_comet_NNN | Off-heap memory allocated by Comet for query execution |
| task_memory_spark_NNN | On-heap and off-heap memory allocated by Spark |
| comet_shuffle_NNN | Off-heap memory allocated by Comet for columnar shuffle |
| shuffle_spilled_bytes | Bytes written to disk in a single shuffle spill operation |
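
When post-processing a trace, the table above can be turned into a small lookup that maps a concrete label back to its meaning. This is a hedged sketch: the table does not say what `NNN` stands for, so the code simply assumes it is a run of decimal digits, and `describe_label` is a hypothetical helper rather than a Comet API.

```python
# Meanings copied from the label table above.
FIXED_LABELS = {
    "jvm_heap_used": "JVM heap memory usage of live objects for the executor process",
    "jemalloc_allocated": "Native memory usage for the executor process",
    "shuffle_spilled_bytes": "Bytes written to disk in a single shuffle spill operation",
}

# Labels that end in NNN. Assumption: NNN is a numeric suffix.
PREFIXED_LABELS = {
    "task_memory_comet_": "Off-heap memory allocated by Comet for query execution",
    "task_memory_spark_": "On-heap and off-heap memory allocated by Spark",
    "comet_shuffle_": "Off-heap memory allocated by Comet for columnar shuffle",
}


def describe_label(label):
    """Return the documented meaning of a trace label, or None if unknown."""
    if label in FIXED_LABELS:
        return FIXED_LABELS[label]
    for prefix, meaning in PREFIXED_LABELS.items():
        suffix = label[len(prefix):]
        if label.startswith(prefix) and suffix.isdigit():
            return meaning
    return None
```

For example, `describe_label("comet_shuffle_17")` returns the columnar shuffle description, while an unrecognized label returns `None`.
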
