Bug report
Bug description:
TRACE_RECORD fills recorded_values before dispatching the instruction body
|
const _PyOpcodeRecordEntry *record_entry = &_PyOpcode_RecordEntries[opcode]; |
|
for (int i = 0; i < record_entry->count; i++) { |
|
_Py_RecordFuncPtr doesnt_escape = _PyOpcode_RecordFunctions[record_entry->indices[i]]; |
|
doesnt_escape(frame, stack_pointer, oparg, &tracer->prev_state.recorded_values[i]); |
|
} |
|
tracer->prev_state.recorded_count = record_entry->count; |
|
DISPATCH_GOTO_NON_TRACING(); |
At this point, recorded_values[0..record_entry->count-1] hold the values recorded for the specialised opcode that is about to run. For LOAD_ATTR_CLASS_WITH_METACLASS_CHECK in #148394, the record entry is _RECORD_TOS, so recorded_values[0] holds the raw TOS (PyObject * itself). Unlike _RECORD_TOS_TYPE stores Py_TYPE(tos), which is always a PyTypeObject *.
- The specialised body hits
EXIT_IF when its guard fails. Then EXIT_IF / DEOPT_IF compile down to a plain goto through JUMP_TO_PREDICTED, with no cleanup in the emitted code:
|
self.emit(f"JUMP_TO_PREDICTED({self.jump_prefix}{family_name});\n") |
|
# define JUMP_TO_PREDICTED(name) goto PREDICTED_##name; |
Nothing on this path touches
tracer->prev_state.recorded_values[].
_SPECIALIZE_LOAD_ATTR calls the adaptive specializer at the same pc.
Call chain:
_SPECIALIZE_LOAD_ATTR
→ _Py_Specialize_LoadAttr
→ specialize(instr, LOAD_ATTR_SLOT)
→ set_opcode(instr, LOAD_ATTR_SLOT)
→ instr->op.code = opcode;
Once _Py_Specialize_LoadAttr returns, the pc that previously held LOAD_ATTR_CLASS_WITH_METACLASS_CHECK (181) now holds, for example, LOAD_ATTR_SLOT (191). Control then falls through to DISPATCH_SAME_OPARG(), which re-reads the opcode byte and dispatches the new specialisation at the same pc:
|
#define DISPATCH_SAME_OPARG() \ |
|
{ \ |
|
opcode = next_instr->op.code; \ |
|
PRE_DISPATCH_GOTO(); \ |
|
DISPATCH_GOTO_NON_TRACING(); \ |
|
} |
Because that dispatch uses
DISPATCH_GOTO_NON_TRACING, the new specialisation runs without re-entering
TRACE_RECORD.
- After the re-specialised op completes, the next
TRACE_RECORD translates against a stale picture.
It reads prev_state.instr->op.code. prev_state.instr is still the pointer set in step 1; only the byte at that address has changed.
- Translate feeds the stale recording into the new opcode's uops
Inside translate, each uop in the macro expansion of the current opcode byte (e.g. 191 = LOAD_ATTR_SLOT) that carries HAS_RECORDS_VALUE_FLAG is handed recorded_values[record_idx] as its operand0. There is no check that the recording actually came from the opcode currently being translated:
|
else if (_PyUop_Flags[uop] & HAS_RECORDS_VALUE_FLAG) { |
|
PyObject *recorded_value = tracer->prev_state.recorded_values[record_idx]; |
|
tracer->prev_state.recorded_values[record_idx] = NULL; |
|
record_idx++; |
|
operand = (uintptr_t)recorded_value; |
|
} |
LOAD_ATTR_SLOT begins with _RECORD_TOS_TYPE, which expects a PyTypeObject:
|
macro(LOAD_ATTR_SLOT) = |
|
unused/1 + |
|
_RECORD_TOS_TYPE + |
|
_GUARD_TYPE_VERSION + |
|
_LOAD_ATTR_SLOT + // NOTE: This action may also deopt |
|
POP_TOP + |
|
unused/5 + |
|
_PUSH_NULL_CONDITIONAL; |
But the value still sitting in recorded_values[0] is whatever _RECORD_TOS stored for the old 181 — Boom.
The only existing cleanup in _PyJit_FinalizeTracing runs when tracing ends — far too late for a deopt that occurs mid-trace:
|
for (int i = 0; i < MAX_RECORDED_VALUES; i++) { |
|
Py_CLEAR(tracer->prev_state.recorded_values[i]); |
|
} |
|
tracer->prev_state.recorded_count = 0; |
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Bug report
Bug description:
TRACE_RECORDfillsrecorded_valuesbefore dispatching the instruction bodycpython/Python/bytecodes.c
Lines 6366 to 6372 in 52a7f1b
At this point,
recorded_values[0..record_entry->count-1]hold the values recorded for the specialised opcode that is about to run. ForLOAD_ATTR_CLASS_WITH_METACLASS_CHECKin #148394, the record entry is_RECORD_TOS, sorecorded_values[0]holds the raw TOS (PyObject *itself). Unlike_RECORD_TOS_TYPEstoresPy_TYPE(tos), which is always aPyTypeObject *.EXIT_IFwhen its guard fails. ThenEXIT_IF/DEOPT_IFcompile down to a plaingotothroughJUMP_TO_PREDICTED, with no cleanup in the emitted code:cpython/Tools/cases_generator/generators_common.py
Line 190 in 52a7f1b
cpython/Python/ceval_macros.h
Line 136 in 52a7f1b
Nothing on this path touches
tracer->prev_state.recorded_values[]._SPECIALIZE_LOAD_ATTRcalls the adaptive specializer at the same pc.Call chain:
_SPECIALIZE_LOAD_ATTR→
_Py_Specialize_LoadAttr→
specialize(instr, LOAD_ATTR_SLOT)→
set_opcode(instr, LOAD_ATTR_SLOT)→
instr->op.code = opcode;Once
_Py_Specialize_LoadAttrreturns, the pc that previously heldLOAD_ATTR_CLASS_WITH_METACLASS_CHECK(181) now holds, for example,LOAD_ATTR_SLOT(191). Control then falls through toDISPATCH_SAME_OPARG(), which re-reads the opcode byte and dispatches the new specialisation at the same pc:cpython/Python/ceval_macros.h
Lines 216 to 221 in 52a7f1b
Because that dispatch uses
DISPATCH_GOTO_NON_TRACING, the new specialisation runs without re-enteringTRACE_RECORD.TRACE_RECORDtranslates against a stale picture.It reads
prev_state.instr->op.code.prev_state.instris still the pointer set in step 1; only the byte at that address has changed.Inside translate, each uop in the macro expansion of the current opcode byte (e.g. 191 =
LOAD_ATTR_SLOT) that carriesHAS_RECORDS_VALUE_FLAGis handedrecorded_values[record_idx]as itsoperand0. There is no check that the recording actually came from the opcode currently being translated:cpython/Python/optimizer.c
Lines 949 to 954 in 52a7f1b
LOAD_ATTR_SLOTbegins with_RECORD_TOS_TYPE, which expects aPyTypeObject:cpython/Python/bytecodes.c
Lines 2883 to 2890 in 52a7f1b
But the value still sitting in
recorded_values[0]is whatever_RECORD_TOSstored for the old 181 — Boom.The only existing cleanup in
_PyJit_FinalizeTracingruns when tracing ends — far too late for a deopt that occurs mid-trace:cpython/Python/optimizer.c
Lines 1126 to 1129 in 52a7f1b
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux