I'm analyzing parser behavior and need to determine which functions in an instrumented program accessed specific input bytes. This would help understand:
Which parsing functions process which byte ranges
How input bytes flow through the call stack
Which code paths are triggered by specific input patterns
What I've Successfully Extracted
I've made significant progress exploring the v4.0.0 API and can extract:
1. Function Names
from polytracker.taint_dag import TDFunctionsSection, TDStringSection
# Extract function ID → name mapping
for func_id, fn_header in enumerate(functions_section):
    func_name = string_section.read_string(fn_header.name_offset)
Results: Successfully extracted all function names: main, parse_expr, program, statement, expr, test, sum, term, etc.
2. Function Call Trace
from polytracker.taint_dag import TDEventsSection
# Iterate through execution trace
for event in events_section:
    function_id = event.fnidx  # Function being called
    event_type = event.kind  # ENTRY (0) or EXIT (1)
Results: Complete function call trace with proper nesting:
ENTER program
ENTER statement
ENTER expr
ENTER term
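For reference, a nested listing like the one above can be rebuilt from a flat ENTRY/EXIT stream with a simple stack. This is a minimal sketch in plain Python, not PolyTracker code: `render_call_trace` is a hypothetical helper, and the `(fnidx, kind)` tuples and `names` list are stand-ins for the values read from the events and functions sections.

```python
from typing import Iterable, List, Tuple

ENTRY, EXIT = 0, 1  # event kinds, matching the ENTRY (0) / EXIT (1) values above


def render_call_trace(events: Iterable[Tuple[int, int]], names: List[str]) -> List[str]:
    """Turn (function index, kind) pairs into an indented ENTER listing.

    Hypothetical helper: the tuples stand in for (event.fnidx, event.kind).
    """
    lines: List[str] = []
    stack: List[int] = []  # function indices that are currently active
    for fnidx, kind in events:
        if kind == ENTRY:
            # Indent by the current call depth, then descend one level.
            lines.append("  " * len(stack) + f"ENTER {names[fnidx]}")
            stack.append(fnidx)
        elif kind == EXIT and stack:
            stack.pop()
    return lines


# Stand-in data mirroring the trace shown above.
names = ["program", "statement", "expr", "term"]
events = [
    (0, ENTRY), (1, ENTRY), (2, ENTRY), (3, ENTRY),
    (3, EXIT), (2, EXIT), (1, EXIT), (0, EXIT),
]
trace_lines = render_call_trace(events, names)
print("\n".join(trace_lines))
```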
3. Input Byte Offsets
# Extract byte offsets from taint forest
for node in trace.taint_forest.nodes():
    if node.source is not None:
        offset = trace.file_offset(node).offset  # Byte offset in input
        affected_cf = node.affected_control_flow  # Whether byte influenced branching
Results: All input bytes tracked with control flow information.
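Since the goal is to reason about byte ranges rather than single bytes, the collected offsets can be collapsed into contiguous ranges. A minimal sketch in plain Python; `to_ranges` is a hypothetical helper and the offsets are made-up sample values, not output from a real trace.

```python
from typing import Iterable, List, Tuple


def to_ranges(offsets: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse byte offsets into sorted, inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    for off in sorted(set(offsets)):
        if ranges and off == ranges[-1][1] + 1:
            # Offset directly extends the previous range.
            ranges[-1] = (ranges[-1][0], off)
        else:
            # Gap found (or first offset): start a new range.
            ranges.append((off, off))
    return ranges


# Made-up sample offsets; in practice these would come from the loop above.
byte_ranges = to_ranges([0, 1, 2, 5, 6, 9])
print(byte_ranges)
```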
What's Missing: The Correlation
I cannot find a way to correlate these pieces together to answer: "Which function accessed byte X?"
For example, given:
Input file: {i=1;}\n
Byte 2 (= character) at offset 2
I need to determine: "The expr function accessed byte 2"
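To make the request concrete, here is a sketch of the attribution I have in mind. It assumes a single, ordered stream that interleaves function entry/exit records with byte-access records. That interleaved stream is hypothetical: the events I can read expose only `fnidx` and `kind`, so the `"access"` records below are invented for illustration, as is the helper `attribute_bytes`.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple, Union

Record = Tuple[str, Union[str, int]]


def attribute_bytes(stream: Iterable[Record]) -> DefaultDict[int, Set[str]]:
    """Map each accessed byte offset to the functions active when it was read.

    Hypothetical record shapes (not a PolyTracker format):
      ("enter", name), ("exit", name), ("access", offset)
    """
    stack: List[str] = []  # current call stack, innermost function last
    accessed_by: DefaultDict[int, Set[str]] = defaultdict(set)
    for kind, value in stream:
        if kind == "enter":
            stack.append(str(value))
        elif kind == "exit":
            if stack:
                stack.pop()
        elif kind == "access" and stack:
            # Attribute the access to the innermost active function.
            accessed_by[int(value)].add(stack[-1])
    return accessed_by


# Invented stream for the input {i=1;}\n, for illustration only.
stream: List[Record] = [
    ("enter", "program"),
    ("enter", "statement"),
    ("access", 0),
    ("enter", "expr"),
    ("access", 1),
    ("access", 2),
    ("exit", "expr"),
    ("exit", "statement"),
    ("exit", "program"),
]
accessed_by = attribute_bytes(stream)
print(sorted(accessed_by[2]))
```

With a stream like this, answering "Which function accessed byte 2?" is a dictionary lookup. What I can't find is any source for the access records.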
What I've Tried
Approach 1: TDEvent attributes
for event in events_section:
    # event has: fnidx, kind
    # event does NOT have: label, taints, bytes_accessed
    pass
Result: Events know which function, but not which bytes it accessed.
Approach 2: Taint nodes
for node in forest.nodes():
    # node has: label, source, affected_control_flow
    # node does NOT have: function, event, accessed_by
    pass
Result: Nodes know which byte, but not which function accessed it.
Minimal Reproduction
# Instrument a C program
docker run --rm --platform linux/amd64 \
-v $(pwd):/workdir -w /workdir \
trailofbits/polytracker bash -c \
"polytracker build clang program.c -o program && \
polytracker instrument-targets --taint --ftrace program"

# Execute with stdin tracking
docker run --rm --platform linux/amd64 \
-v $(pwd):/workdir -w /workdir \
-e POLYDB=polytracker.tdag \
-e POLYTRACKER_STDIN_SOURCE=1 \
trailofbits/polytracker \
bash -c "./program.instrumented < input.txt"

# Analyze the trace
docker run --rm --platform linux/amd64 \
-v $(pwd):/workdir -w /workdir \
trailofbits/polytracker python3 -c "
from polytracker import PolyTrackerTrace
from polytracker.taint_dag import TDFunctionsSection, TDEventsSection, TDStringSection
trace = PolyTrackerTrace.load('polytracker.tdag')
# Can extract functions and events separately,
# but cannot correlate which functions accessed which bytes
"
Use Case
This mapping would enable:
Parser debugging: Identify which function mishandled a specific byte
Security analysis: Find which code paths process attacker-controlled bytes
Execution visualization: Create diagrams showing byte flow through functions
Performance analysis: Identify hot paths for specific input patterns
Environment
Docker image: trailofbits/polytracker:latest
Docker flag: --platform linux/amd64
Approach 3: Documented methods
Result: Documented API methods are not implemented in v4.0.0.
Approach 4: Control Flow Log
Result: Found a function_id_mapping attribute, but it's a method, and calling it returns empty results.
Questions
Is there an API I'm missing?
Is there a method/property that links taint labels to the events/functions that accessed them?
Should I use a different trace format?
Issue #6534 ("Emitting and loading a DBProgramTrace instead of a TDProgramTrace") mentioned DBProgramTrace vs TDProgramTrace. Can I generate .db files where access_sequence() actually works?
Is this data available internally but not exposed?
If the correlation exists internally but isn't exposed via Python API, would you accept a PR to add it?
Alternative approach?
Is there a recommended way to achieve this byte-to-function mapping with the current v4.0.0 API?
Related Issues: