Commit 864a751

art049 and claude committed
feat: discriminated union schema for msgpack, drop CSV format
- Update msgpack format to version 2 with event_schemas
- Each event type (ENTER, EXIT, FORK) has its own column schema
- FORK events use minimal 4-element format: [seq, tid, event, child_pid]
- Remove CSV output format entirely (msgpack-only now)
- Add decode-trace.py script for debugging trace files
- Add fork detection via post-syscall handler for fork/clone/vfork

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 9211320 commit 864a751

6 files changed

Lines changed: 480 additions & 188 deletions

tracegrind/clo.c

Lines changed: 1 addition & 4 deletions
@@ -518,8 +518,6 @@ Bool TG_(process_cmd_line_option)(const HChar* arg)
 
 else if VG_STR_CLO(arg, "--tracegrind-out-file", TG_(clo).out_format) {}
 
-else if VG_XACT_CLO(arg, "--output-format=csv",
-TG_(clo).output_format, output_format_csv) {}
 else if VG_XACT_CLO(arg, "--output-format=msgpack",
 TG_(clo).output_format, output_format_msgpack) {}
 
@@ -578,7 +576,6 @@ void TG_(print_usage)(void)
 VG_(printf)(
 "\n dump creation options:\n"
 " --tracegrind-out-file=<f> Output file name [tracegrind.out.%%p]\n"
-" --output-format=csv|msgpack Output format [csv]\n"
 " --dump-line=no|yes Dump source lines of costs? [yes]\n"
 " --dump-instr=no|yes Dump instruction address of costs? [no]\n"
 " --compress-strings=no|yes Compress strings in profile dump? [yes]\n"
@@ -704,5 +701,5 @@ void TG_(set_clo_defaults)(void)
 TG_(clo).verbose_start = 0;
 #endif
 
-TG_(clo).output_format = output_format_csv;
+TG_(clo).output_format = output_format_msgpack;
 }

tracegrind/docs/tracegrind-msgpack-format.md

Lines changed: 46 additions & 13 deletions
@@ -2,7 +2,7 @@
 
 ## Overview
 
-Tracegrind's `--output-format=msgpack` produces a binary trace file combining MsgPack serialization with LZ4 block compression. Files use the `.msgpack.lz4` extension.
+Tracegrind produces a binary trace file combining MsgPack serialization with LZ4 block compression. Files use the `.msgpack.lz4` extension.
 
 ## File Structure
 
@@ -23,7 +23,7 @@ Tracegrind's `--output-format=msgpack` produces a binary trace file combining Ms
 | Offset | Size | Field | Description |
 |--------|------|---------|-------------|
 | 0 | 4 | magic | ASCII `TGMP` (0x54 0x47 0x4D 0x50) |
-| 4 | 4 | version | Format version, uint32 LE (currently 1) |
+| 4 | 4 | version | Format version, uint32 LE (currently 2) |
 
 ## Chunk Format
 
@@ -37,17 +37,31 @@ Each chunk (schema and data) has the same header:
 
 ## Schema Chunk
 
-The first chunk contains a MsgPack map:
+The first chunk contains a MsgPack map describing the discriminated union schema:
 
 ```json
 {
-"version": 1,
+"version": 2,
 "format": "tracegrind-msgpack",
-"columns": ["seq", "tid", "event", "fn", "obj", "file", "line", "Ir", ...]
+"event_schemas": {
+"0": ["seq", "tid", "event", "fn", "obj", "file", "line", "Ir", ...],
+"1": ["seq", "tid", "event", "fn", "obj", "file", "line", "Ir", ...],
+"2": ["seq", "tid", "event", "child_pid"]
+}
 }
 ```
 
-### Fixed Columns
+### Event Types
+
+| Type | Name | Description |
+|------|-------|-------------|
+| 0 | ENTER | Function entry |
+| 1 | EXIT | Function exit |
+| 2 | FORK | Child process created |
+
+### Row Schemas
+
+**ENTER/EXIT rows (event 0, 1):**
 
 | Index | Name | Type | Description |
 |-------|-------|--------|-------------|
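With version 2, every row is a discriminated union. The `event` value at index 2 selects which column list from `event_schemas` applies, so ENTER/EXIT rows carry function, location, and counter columns while FORK rows carry only four fields. What follows is a minimal Python sketch of that dispatch. It assumes the schema map has already been decoded into `{event_type: [column, ...]}` with integer keys, which is the shape the reference reader's `int(k)` conversion produces. `decode_row` and `summarize` are hypothetical helpers for illustration, not functions from Tracegrind or `decode-trace.py`. The strict length check is an addition of this sketch; the reference reader's `zip` silently truncates mismatched rows.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Event type ids from the "Event Types" table.
ENTER, EXIT, FORK = 0, 1, 2


def decode_row(row: List[Any], event_schemas: Dict[int, List[str]]) -> Dict[str, Any]:
    """Map one row array to {column: value} using the schema for its event type."""
    event_type = row[2]  # index 2 is the discriminator in every schema
    columns = event_schemas.get(event_type)
    if columns is None:
        raise ValueError(f"unknown event type {event_type!r}")
    # Stricter than plain zip(): a length mismatch means row and schema disagree.
    if len(row) != len(columns):
        raise ValueError(
            f"event {event_type}: row has {len(row)} fields, schema has {len(columns)}")
    return dict(zip(columns, row))


def summarize(rows: Iterable[List[Any]],
              event_schemas: Dict[int, List[str]]) -> Tuple[Dict[int, int], List[int]]:
    """Count ENTER/EXIT rows and collect child PIDs reported by FORK rows."""
    counts = {ENTER: 0, EXIT: 0}
    child_pids: List[int] = []
    for row in rows:
        record = decode_row(row, event_schemas)
        if record["event"] == FORK:
            child_pids.append(record["child_pid"])
        elif record["event"] in counts:
            counts[record["event"]] += 1
    return counts, child_pids
```

For example, the hypothetical row `[1, 1, 2, 4242]` decodes to `{'seq': 1, 'tid': 1, 'event': 2, 'child_pid': 4242}` under the FORK schema.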
@@ -58,17 +72,31 @@ The first chunk contains a MsgPack map:
 | 4 | obj | string | Shared object path |
 | 5 | file | string | Source file path |
 | 6 | line | int32 | Line number (0 if unknown) |
+| 7+ | ... | int64 | Event counter deltas (Ir, Dr, Dw, etc.) |
+
+**FORK rows (event 2):**
+
+| Index | Name | Type | Description |
+|-------|-----------|--------|-------------|
+| 0 | seq | uint64 | Sequence number |
+| 1 | tid | int32 | Thread ID that called fork |
+| 2 | event | int | 2 = FORK |
+| 3 | child_pid | int32 | PID of the new child process |
+
+### Event Counter Columns
 
-### Event Columns (index 7+)
+For ENTER/EXIT rows, event counters appear as delta values starting at index 7. Which counters are present depends on Tracegrind options:
 
-Event counters as delta values: `Ir`, `Dr`, `Dw`, `I1mr`, `D1mr`, `D1mw`, `ILmr`, `DLmr`, `DLmw`, `Bc`, `Bcm`, `Bi`, `Bim`. Which columns are present depends on Tracegrind options.
+`Ir`, `Dr`, `Dw`, `I1mr`, `D1mr`, `D1mw`, `ILmr`, `DLmr`, `DLmw`, `Bc`, `Bcm`, `Bi`, `Bim`
 
 ## Data Chunks
 
-Each data chunk contains concatenated MsgPack arrays (one per row):
+Each data chunk contains concatenated MsgPack arrays. The row format depends on the event type (index 2):
 
 ```
-[seq, tid, event, fn, obj, file, line, delta_Ir, ...]
+[seq, tid, 0, fn, obj, file, line, delta_Ir, ...] # ENTER
+[seq, tid, 1, fn, obj, file, line, delta_Ir, ...] # EXIT
+[seq, tid, 2, child_pid] # FORK
 ```
 
 The reference implementation writes 4096 rows per chunk.
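Schema and data chunks share the same framing. The reference reader unpacks each chunk header as `'<II'`: uncompressed size, then compressed size, both uint32 LE. That header follows the 8-byte file header, which holds the `TGMP` magic and a uint32 LE version. The stdlib-only Python sketch below walks that framing without decompressing anything. The function names are hypothetical. The sketch also assumes that chunks simply run until end of file; if the format defines an explicit end marker, `iter_chunks` would need to stop there.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

MAGIC = b"TGMP"
FILE_HEADER = struct.Struct("<4sI")   # magic, then uint32 LE version: 8 bytes
CHUNK_HEADER = struct.Struct("<II")   # uncompressed size, compressed size


def read_file_header(f: BinaryIO) -> int:
    """Validate the magic bytes and return the format version."""
    data = f.read(FILE_HEADER.size)
    if len(data) != FILE_HEADER.size:
        raise ValueError("truncated file header")
    magic, version = FILE_HEADER.unpack(data)
    if magic != MAGIC:
        raise ValueError(f"bad magic {magic!r}")
    return version


def iter_chunks(f: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (uncompressed_size, compressed_payload) for each chunk until EOF.

    Assumes chunks run to end of file with no terminator record.
    """
    while True:
        head = f.read(CHUNK_HEADER.size)
        if not head:
            return  # clean end of file
        if len(head) != CHUNK_HEADER.size:
            raise ValueError("truncated chunk header")
        usize, csize = CHUNK_HEADER.unpack(head)
        payload = f.read(csize)
        if len(payload) != csize:
            raise ValueError("truncated chunk payload")
        yield usize, payload


if __name__ == "__main__":
    # In-memory demo; the 3-byte payload is a placeholder, not real LZ4 data.
    demo = io.BytesIO(FILE_HEADER.pack(MAGIC, 2) + CHUNK_HEADER.pack(10, 3) + b"abc")
    print(read_file_header(demo), list(iter_chunks(demo)))
```

Each yielded payload would then be passed to `lz4.block.decompress(payload, uncompressed_size=usize)` and fed to a MsgPack unpacker, as the reference reader does. The first chunk holds the schema map.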
@@ -86,13 +114,16 @@ def read_tracegrind(filepath):
 with open(filepath, 'rb') as f:
 assert f.read(4) == b'TGMP'
 version = struct.unpack('<I', f.read(4))[0]
+assert version == 2
 
 # Read schema chunk
 usize, csize = struct.unpack('<II', f.read(8))
 schema = msgpack.unpackb(
 lz4.block.decompress(f.read(csize), uncompressed_size=usize))
-columns = [c.decode() if isinstance(c, bytes) else c
-for c in schema[b'columns']]
+event_schemas = {
+int(k): [c.decode() if isinstance(c, bytes) else c for c in v]
+for k, v in schema[b'event_schemas'].items()
+}
 
 # Read data chunks
 rows = []
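The schema-reading lines in this hunk index `schema[b'event_schemas']`, which only matches if the map keys come back as `bytes`. Whether they do depends on two things: how the writer encodes strings (MsgPack str vs. bin) and the unpacker's `raw` setting. In msgpack-python 1.0 and later, `msgpack.unpackb` defaults to `raw=False`, which returns str-type keys as `str`. The hypothetical `normalize_event_schemas` below accepts either form and returns `{event_type: [column, ...]}` with int keys. Its version check mirrors the reader's `assert version == 2`. The seq/tid/event prefix check is an extra assumption of this sketch, based on all three schemas in the example starting that way.

```python
from typing import Any, Dict, List

# Columns that every row schema in the doc's example starts with.
COMMON_PREFIX = ["seq", "tid", "event"]


def _text(value: Any) -> Any:
    """Decode UTF-8 bytes to str; return anything else unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def normalize_event_schemas(schema: Dict[Any, Any]) -> Dict[int, List[str]]:
    """Turn a decoded version-2 schema map into {event_type: [column, ...]}.

    Accepts keys and column names as either bytes or str, so it does not
    depend on the msgpack `raw` setting used to unpack the schema chunk.
    """
    meta = {_text(k): v for k, v in schema.items()}
    version = meta.get("version")
    if version != 2:
        raise ValueError(f"unsupported schema version: {version!r}")
    result: Dict[int, List[str]] = {}
    for key, columns in meta["event_schemas"].items():
        event_type = int(_text(key))  # "0" -> 0; int keys pass through unchanged
        names = [_text(c) for c in columns]
        # Sanity check added by this sketch: the discriminator must sit at index 2.
        if names[:3] != COMMON_PREFIX:
            raise ValueError(f"event {event_type}: schema does not start with seq, tid, event")
        result[event_type] = names
    return result
```

If given the map from the example schema, it would return int keys `0`, `1`, and `2`. Key `2` maps to `['seq', 'tid', 'event', 'child_pid']`.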
@@ -104,9 +135,11 @@ def read_tracegrind(filepath):
 unpacker = msgpack.Unpacker(raw=False)
 unpacker.feed(chunk)
 for row in unpacker:
+event_type = row[2]
+columns = event_schemas[event_type]
 rows.append(dict(zip(columns, row)))
 
-return columns, rows
+return event_schemas, rows
 ```
 
 ## References
