29 changes: 22 additions & 7 deletions docs/design/datacontracts/ExecutionManager.md
@@ -138,7 +138,8 @@ public enum CodeKind : uint
CallCountingStub = 9,
MethodCallThunk = 10,
Jitted = 11,
ReadyToRun = 12
ReadyToRun = 12,
Interpreter = 13
}
```

@@ -184,6 +185,10 @@ Data descriptors used:
| `RealCodeHeader` | `DebugInfo` | Pointer to the DebugInfo |
| `RealCodeHeader` | `GCInfo` | Pointer to the GCInfo encoding |
| `RealCodeHeader` | `EHInfo` | Pointer to the `EE_ILEXCEPTION` containing exception clauses |
| `InterpreterRealCodeHeader` | `MethodDesc` | Pointer to the corresponding `MethodDesc` for interpreter code |
Member

@janvorli @kotlarmilos - Please take a look at these contracts (the markdown files) and see what you think. Every type/field/algorithm included here becomes part of the interpreter's contract with diagnostic tools. Changing the data structures is possible, but it's a breaking change that requires developers to update their tools, so we'd only expect to do it rarely. Make sure the things documented here are things you'd expect to be reasonably stable over time, or please suggest alternatives.

Thanks!

Member Author

Want to bump @janvorli @kotlarmilos @jkotas, planning to merge this soon.

Member

I am sorry, I've missed the comment from @noahfalk last week. I'll look at it today.

| `InterpreterRealCodeHeader` | `DebugInfo` | Pointer to the DebugInfo for interpreter code |
| `InterpreterRealCodeHeader` | `GCInfo` | Pointer to the GCInfo encoding for interpreter code |
| `InterpreterRealCodeHeader` | `JitEHInfo` | Pointer to the `EE_ILEXCEPTION` containing exception clauses for interpreter code |
noahfalk marked this conversation as resolved.
| `Module` | `ReadyToRunInfo` | Pointer to the `ReadyToRunInfo` for the module |
| `ReadyToRunInfo` | `ReadyToRunHeader` | Pointer to the ReadyToRunHeader |
| `ReadyToRunInfo` | `CompositeInfo` | Pointer to composite R2R info - or itself for non-composite |
@@ -282,9 +287,11 @@ The bulk of the work is done by the `GetCodeBlockHandle` API that maps a code po
}
```

There are two JIT managers: the "EE JitManager" for jitted code and "R2R JitManager" for ReadyToRun code.
There are three JIT managers: the "EE JitManager" for jitted code, the "Interpreter JitManager" for interpreted code, and the "R2R JitManager" for ReadyToRun code.

The EE JitManager `GetMethodInfo` implements the nibble map lookup, summarized below, followed by returning the `RealCodeHeader` data:
The EE JitManager and Interpreter JitManager both use the same nibble map lookup to find method code.
The only difference is which code header type is read: the EE JitManager reads a `RealCodeHeader` while the Interpreter JitManager reads an `InterpreterRealCodeHeader`.
Their shared `GetMethodInfo` is summarized below:

```csharp
bool GetMethodInfo(TargetPointer rangeSection, TargetCodePointer jittedCodeAddress, [NotNullWhen(true)] out CodeBlock? info)
@@ -303,8 +310,10 @@ bool GetMethodInfo(TargetPointer rangeSection, TargetCodePointer jittedCodeAddre
return false;

TargetPointer codeHeaderAddress = Target.ReadPointer(codeHeaderIndirect);
TargetPointer methodDesc = Target.ReadPointer(codeHeaderAddress + /* RealCodeHeader::MethodDesc offset */);
info = new CodeBlock(jittedCodeAddress, realCodeHeader.MethodDesc, relativeOffset);
// EE JitManager: read RealCodeHeader at codeHeaderAddress
// Interpreter JitManager: read InterpreterRealCodeHeader at codeHeaderAddress
TargetPointer methodDesc = Target.ReadPointer(codeHeaderAddress + /* MethodDesc offset in the appropriate code header */);
info = new CodeBlock(jittedCodeAddress, methodDesc, relativeOffset);
return true;
}
```
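As a rough illustration of the "same lookup, different header" point, the sketch below models only the final step of `GetMethodInfo`. `FakeTarget`, `CodeHeaderKind`, `CodeHeaderSketch`, and the offset values are hypothetical stand-ins invented for this example, not cDAC types; in the real contract the field offsets come from the data descriptors.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; not the cDAC reader types.
public enum CodeHeaderKind { RealCodeHeader, InterpreterRealCodeHeader }

public sealed class FakeTarget
{
    // Sparse "target memory": address -> pointer-sized value.
    private readonly Dictionary<ulong, ulong> _memory = new();

    public void WritePointer(ulong address, ulong value) => _memory[address] = value;

    public ulong ReadPointer(ulong address) =>
        _memory.TryGetValue(address, out ulong value)
            ? value
            : throw new InvalidOperationException($"Unmapped address 0x{address:x}");
}

public static class CodeHeaderSketch
{
    // Assumed offsets of the MethodDesc field; real values come from the data descriptors.
    public static ulong MethodDescOffset(CodeHeaderKind kind) => kind switch
    {
        CodeHeaderKind.RealCodeHeader => 0x18UL,
        CodeHeaderKind.InterpreterRealCodeHeader => 0x08UL,
        _ => throw new ArgumentOutOfRangeException(nameof(kind)),
    };

    // Shared tail of the lookup: dereference the indirection cell found by the
    // nibble map, then read MethodDesc from whichever header layout applies.
    public static ulong ReadMethodDesc(FakeTarget target, ulong codeHeaderIndirect, CodeHeaderKind kind)
    {
        ulong codeHeaderAddress = target.ReadPointer(codeHeaderIndirect);
        return target.ReadPointer(codeHeaderAddress + MethodDescOffset(kind));
    }
}
```

Under these assumptions, the caller that owns the range section decides which `CodeHeaderKind` to pass; everything before that point is identical for both managers.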
@@ -480,6 +489,8 @@ The `GetMethodDesc`, `GetStartAddress`, and `GetRelativeOffset` APIs extract fie

* For R2R code (`ReadyToRunJitManager`), a sorted list of `RUNTIME_FUNCTION` entries is stored on the module's `ReadyToRunInfo`. This is accessed as described above for `GetMethodInfo`. Again, the relevant `RUNTIME_FUNCTION` is found by binary searching the list based on IP.

* For interpreted code (`InterpreterJitManager`), there is no native unwind info. `GetUnwindInfo` returns null.

Unwind info (`RUNTIME_FUNCTION`) uses relative addressing. For managed code, these values are relative to the start of the code's containing range in the RangeSectionMap (described below). This could be the beginning of a `CodeHeap` for jitted code or the base address of the loaded image for ReadyToRun code.
`GetUnwindInfoBaseAddress` finds this base address for a given `CodeBlockHandle`.

@@ -490,6 +501,8 @@ Unwind info (`RUNTIME_FUNCTION`) use relative addressing. For managed code, thes
* For R2R code (`ReadyToRunJitManager`) the `DebugInfo` is stored as part of the R2R image. The relevant `ReadyToRunInfo` stores a pointer to an `ImageDataDirectory` representing the `DebugInfo` directory. Read the `VirtualAddress` of this data directory as a `NativeArray` containing the `DebugInfos`. To find the specific `DebugInfo`, index into the array using the `index` of the beginning of the R2R function as found like in `GetMethodInfo` above. This yields an `offset` value relative to the image base. Read the first variable length uint at `imageBase + offset`, `lookBack`. If `lookBack != 0`, return `imageBase + offset - lookBack`. Otherwise return `offset + size of reading lookBack`.
For R2R images, `hasFlagByte` is always `false`.

* For interpreted code (`InterpreterJitManager`), a pointer to the `DebugInfo` is stored on the `InterpreterRealCodeHeader` which is accessed in the same way as the EE JitManager's `GetMethodInfo` (nibble map lookup followed by code header read). `hasFlagByte` is always `false`.

`IExecutionManager.GetGCInfo` gets a pointer to the relevant GCInfo for a `CodeBlockHandle`. The ExecutionManager delegates to the JitManager implementations as the GCInfo is stored differently on jitted and R2R code.

* For jitted code (`EEJitManager`) a pointer to the `GCInfo` is stored on the `RealCodeHeader` which is accessed in the same way as `GetMethodInfo` described above. This can simply be returned as is. The `GCInfoVersion` is defined by the runtime global `GCInfoVersion`.
@@ -498,6 +511,8 @@ For R2R images, `hasFlagByte` is always `false`.
* The `GCInfoVersion` of R2R code is mapped from the R2R MajorVersion and MinorVersion which is read from the ReadyToRunHeader which itself is read from the ReadyToRunInfo (can be found as in GetMethodInfo). The current GCInfoVersion mapping is:
* MajorVersion >= 11 and MajorVersion < 15 => 4

* For interpreted code (`InterpreterJitManager`), a pointer to the `GCInfo` is stored on the `InterpreterRealCodeHeader`, accessed via nibble map lookup as with the EE JitManager. The `GCInfoVersion` is defined by the runtime global `GCInfoVersion`. The GC info is decoded using interpreter-specific decoding (`DecodeInterpreterGCInfo`).


`IExecutionManager.GetFuncletStartAddress` finds the start of the code block's funclet. This will be different from the method's start address (`GetStartAddress`) if the current code block is inside a funclet. To find the funclet start address, we get the unwind info corresponding to the code block using `IExecutionManager.GetUnwindInfo`. We then parse the unwind info to find the begin address (relative to the unwind info base address) and return the unwind info base address + unwind info begin address.

@@ -511,11 +526,11 @@ There are two distinct clause data types. JIT-compiled code uses `EEExceptionCla

* For R2R code (`ReadyToRunJitManager`), exception clause data is found via the `ExceptionInfo` section (section type 104) of the R2R image. The section is located by traversing `ReadyToRunInfo::Composite` to reach the `ReadyToRunCoreInfo`, then reading its `Header` pointer to the `ReadyToRunCoreHeader`, and iterating through the inline `ReadyToRunSection` array that immediately follows the header. The `ExceptionInfo` section contains an `ExceptionLookupTableEntry` array, where each entry maps a `MethodStartRVA` to an `ExceptionInfoRVA`. A binary search (falling back to linear scan for small ranges) finds the entry matching the method's RVA. The exception clauses span from that entry's `ExceptionInfoRVA` to the next entry's `ExceptionInfoRVA`, both offset from the image base. The clause array is strided using the size of `R2RExceptionClause`.

After obtaining the clause array bounds, the common iteration logic classifies each clause by its flags. The native `COR_ILEXCEPTION_CLAUSE` flags are bit flags: `Filter` (0x1), `Finally` (0x2), `Fault` (0x4). If none are set, the clause is `Typed`. For typed clauses, if the `CachedClass` flag (0x10000000) is set (JIT-only, used for dynamic methods), the union field contains a resolved `TypeHandle` pointer; the clause is a catch-all if this pointer equals the `ObjectMethodTable` global. Otherwise, the union field is a metadata `ClassToken`. To determine whether a typed clause is a catch-all handler, the `ClassToken` (which may be a `TypeDef` or `TypeRef`) is resolved to a `MethodTable` via the `Loader` contract's module lookup maps (`TypeDefToMethodTable` or `TypeRefToMethodTable`) and compared against the `ObjectMethodTable` global. For typed clauses without a cached type handle, the module address is resolved by walking `CodeBlockHandle` → `MethodDesc` → `MethodTable` → `TypeHandle` → `Module` via the `RuntimeTypeSystem` contract.
After obtaining the clause array bounds, the common iteration logic classifies each clause by its flags. The native `COR_ILEXCEPTION_CLAUSE` flags are bit flags: `Filter` (0x1), `Finally` (0x2), `Fault` (0x4). If none are set, the clause is `Typed`. For typed clauses, if the `CachedClass` flag (0x10000000) is set (JIT-only, used for dynamic methods), the union field contains a resolved `TypeHandle` pointer; the clause is a catch-all if this pointer equals the `ObjectMethodTable` global. Otherwise, the union field is a metadata `ClassToken`. To determine whether a typed clause is a catch-all handler, the `ClassToken` (which may be a `TypeDef` or `TypeRef`) is resolved to a `MethodTable` via the `Loader` contract's module lookup maps (`TypeDefToMethodTable` or `TypeRefToMethodTable`) and compared against the `ObjectMethodTable` global. For typed clauses without a cached type handle, the module address is resolved by walking `CodeBlockHandle` -> `MethodDesc` -> `MethodTable` -> `TypeHandle` -> `Module` via the `RuntimeTypeSystem` contract.
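The classification described above might be sketched as follows. The flag values are the ones quoted in the text; `ClauseFlags`, `ClauseKind`, `ClauseSketch`, and the `resolveToken` callback (standing in for the `Loader` contract's `TypeDefToMethodTable` / `TypeRefToMethodTable` lookups) are hypothetical names for this example, and the order in which the flags are tested is an assumption.

```csharp
using System;

[Flags]
public enum ClauseFlags : uint
{
    None = 0x0,
    Filter = 0x1,
    Finally = 0x2,
    Fault = 0x4,
    CachedClass = 0x10000000,
}

public enum ClauseKind { Typed, Filter, Finally, Fault }

public static class ClauseSketch
{
    // A clause with none of Filter/Finally/Fault set is Typed.
    // The test order is an assumption made for this sketch.
    public static ClauseKind Classify(ClauseFlags flags)
    {
        if ((flags & ClauseFlags.Filter) != 0) return ClauseKind.Filter;
        if ((flags & ClauseFlags.Finally) != 0) return ClauseKind.Finally;
        if ((flags & ClauseFlags.Fault) != 0) return ClauseKind.Fault;
        return ClauseKind.Typed;
    }

    // resolveToken is a hypothetical callback that maps a ClassToken to a MethodTable address.
    public static bool IsCatchAll(ClauseFlags flags, ulong unionField, ulong objectMethodTable,
                                  Func<uint, ulong> resolveToken)
    {
        if (Classify(flags) != ClauseKind.Typed)
            return false;

        // CachedClass set: the union already holds a resolved TypeHandle pointer.
        if ((flags & ClauseFlags.CachedClass) != 0)
            return unionField == objectMethodTable;

        // Otherwise the union holds a metadata ClassToken that must be resolved first.
        return resolveToken((uint)unionField) == objectMethodTable;
    }
}
```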

`IsFilterFunclet` first checks `IsFunclet`. If the code block is a funclet, it retrieves the EH clauses for the method and checks whether any filter clause's handler offset matches the funclet's relative offset. If a match is found, the funclet is a filter funclet.
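A minimal sketch of the `IsFilterFunclet` check, assuming a simplified clause shape. `EhClause` and `FilterFuncletSketch` are hypothetical names, and the comparison against the handler offset follows the description above rather than the actual implementation.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified clause shape for illustration.
public readonly record struct EhClause(bool IsFilter, uint HandlerOffset);

public static class FilterFuncletSketch
{
    public static bool IsFilterFunclet(bool isFunclet, uint funcletRelativeOffset, IEnumerable<EhClause> clauses)
    {
        // Non-funclet code blocks can never be filter funclets.
        if (!isFunclet)
            return false;

        // A filter funclet is one whose offset matches some filter clause's handler offset.
        return clauses.Any(c => c.IsFilter && c.HandlerOffset == funcletRelativeOffset);
    }
}
```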

`GetCodeKind` classifies a code address by finding its owning range section and determining the code kind. It distinguishes between jitted code, stub code blocks (jump stubs, precode stubs, VSD stubs, etc.), and ReadyToRun code. Returns `Unknown` if the address cannot be classified. We depend on the values of the StubCodeBlockKind enum defined in codeman.h; for non-R2R code, we compare either the RangeList type or the code header against the values of this enum.
`GetCodeKind` classifies a code address by finding its owning range section and determining the code kind. It distinguishes between jitted code, stub code blocks (jump stubs, precode stubs, VSD stubs, etc.), ReadyToRun code, and interpreter code. Returns `Unknown` if the address cannot be classified. We depend on the values of the StubCodeBlockKind enum defined in codeman.h; for non-R2R code, we compare either the RangeList type or the code header against the values of this enum.
### FindReadyToRunModule

`FindReadyToRunModule` locates the ReadyToRun module whose PE image contains the given address. Unlike `GetCodeBlockHandle` (which only matches code regions), this API matches against the full PE image range - including data sections such as import tables. This is used in GCRefMap resolution as it requires finding the module that owns an import section indirection address, which is in the data section rather than the code section.
54 changes: 54 additions & 0 deletions docs/design/datacontracts/PrecodeStubs.md
@@ -11,6 +11,11 @@ This contract provides support for examining [precode](../coreclr/botr/method-de
// Given an interior address within a precode stub and the kind of stub (StubPrecode or FixupPrecode),
// computes the entry point of the precode.
TargetPointer GetPrecodeEntryPointFromInteriorAddress(TargetCodePointer interiorAddress, bool isFixupPrecode);

// If the code pointer is an interpreter precode, returns the actual interpreter
// code address (ByteCodeAddr). Otherwise returns the original address unchanged.
// Mirrors GetInterpreterCodeFromInterpreterPrecodeIfPresent in native code (precode.cpp).
TargetCodePointer GetInterpreterCodeFromInterpreterPrecodeIfPresent(TargetCodePointer entryPoint);
```

## Version 1, 2, and 3
@@ -44,6 +49,10 @@ Data descriptors used:
| StubPrecodeData | Type | precise sort of stub precode |
| FixupPrecodeData | MethodDesc | pointer to the MethodDesc associated with this fixup precode |
| ThisPtrRetBufPrecodeData | MethodDesc | pointer to the MethodDesc associated with the ThisPtrRetBufPrecode (Version 2 only) |
| InterpreterPrecodeData | ByteCodeAddr | pointer to the `InterpByteCodeStart` for the interpreter bytecode (Version 3 only) |
| InterpreterPrecodeData | Type | precode sort byte identifying this as an interpreter precode (Version 3 only) |
| InterpByteCodeStart | Method | pointer to the `InterpMethod` associated with the bytecode |
| InterpMethod | MethodDesc | pointer to the MethodDesc for the interpreted method |

arm32 note: the `CodePointerToInstrPointerMask` is used to convert IP values that may include an arm Thumb bit (for example extracted from disassembling a call instruction or from a snapshot of the registers) into an address. On other architectures applying the mask is a no-op.
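To make the arm32 note concrete, here is a small sketch of applying the mask. The two mask constants are assumed values chosen for illustration (clearing bit 0 on arm32, all bits set elsewhere); the real value is whatever `CodePointerToInstrPointerMask` the target reports, and `InstrPointerSketch` is a hypothetical name.

```csharp
public static class InstrPointerSketch
{
    // Assumed mask values for illustration only.
    public const ulong Arm32Mask = ~1UL; // clears bit 0, the Thumb bit
    public const ulong NoOpMask = ~0UL;  // leaves the pointer unchanged

    // Converts a code pointer (possibly carrying the Thumb bit) to an instruction address.
    public static ulong ToInstrPointer(ulong codePointer, ulong mask) => codePointer & mask;
}
```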

@@ -263,6 +272,22 @@ After the initial precode type is determined, for stub precodes a refined precod
}
}

// Version 3 only: resolves MethodDesc for interpreter precodes by following
// the InterpreterPrecodeData -> InterpByteCodeStart -> InterpMethod -> MethodDesc chain.
internal sealed class InterpreterPrecode : ValidPrecode
{
internal InterpreterPrecode(TargetPointer instrPointer) : base(instrPointer, KnownPrecodeType.Interpreter) { }

internal override TargetPointer GetMethodDesc(Target target, Data.PrecodeMachineDescriptor precodeMachineDescriptor)
{
TargetPointer dataAddr = InstrPointer + precodeMachineDescriptor.StubCodePageSize;
Data.InterpreterPrecodeData precodeData = target.ProcessedData.GetOrAdd<Data.InterpreterPrecodeData>(dataAddr);
Data.InterpByteCodeStart byteCodeStart = target.ProcessedData.GetOrAdd<Data.InterpByteCodeStart>(precodeData.ByteCodeAddr);
Data.InterpMethod interpMethod = target.ProcessedData.GetOrAdd<Data.InterpMethod>(byteCodeStart.Method);
return interpMethod.MethodDesc;
}
}

internal TargetPointer CodePointerReadableInstrPointer(TargetCodePointer codePointer)
{
// Mask off the thumb bit, if we're on arm32, to get the actual instruction pointer
@@ -286,6 +311,8 @@ After the initial precode type is determined, for stub precodes a refined precod
return new PInvokeImportPrecode(instrPointer);
case KnownPrecodeType.ThisPtrRetBuf:
return new ThisPtrRetBufPrecode(instrPointer);
case KnownPrecodeType.Interpreter:
return new InterpreterPrecode(instrPointer);
default:
break;
}
@@ -299,6 +326,33 @@ After the initial precode type is determined, for stub precodes a refined precod

return precode.GetMethodDesc(_target, MachineDescriptor);
}

// Returns the interpreter bytecode address if the entry point is an interpreter precode,
// otherwise returns the original entry point unchanged.
// This method never throws - on any failure, the original address is returned.
TargetCodePointer IPrecodeStubs.GetInterpreterCodeFromInterpreterPrecodeIfPresent(TargetCodePointer entryPoint)
{
try
{
TargetPointer instrPointer = CodePointerReadableInstrPointer(entryPoint);
if (!IsAlignedInstrPointer(instrPointer))
return entryPoint;

if (TryGetKnownPrecodeType(instrPointer) is not KnownPrecodeType.Interpreter)
return entryPoint;

TargetPointer dataAddr = instrPointer + MachineDescriptor.StubCodePageSize;
Data.InterpreterPrecodeData precodeData = _target.ProcessedData.GetOrAdd<Data.InterpreterPrecodeData>(dataAddr);
if (precodeData.ByteCodeAddr == TargetPointer.Null)
return entryPoint;

return new TargetCodePointer(precodeData.ByteCodeAddr);
}
catch
{
return entryPoint;
}
}
```
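The pointer chain and the "return the original address on any failure" behavior can be exercised against a toy memory map. This is only a sketch: `InterpreterPrecodeSketch`, the dictionary-backed memory, the `isInterpreterPrecode` callback, and the zero field offsets are all assumptions made for the example, not the contract's actual reader or layouts.

```csharp
using System;
using System.Collections.Generic;

public static class InterpreterPrecodeSketch
{
    // Hypothetical field offsets, assumed to be 0 so that each structure's
    // first pointer-sized field is the one being followed.
    private const ulong ByteCodeAddrOffset = 0, MethodOffset = 0, MethodDescOffset = 0;

    // InterpreterPrecodeData -> InterpByteCodeStart -> InterpMethod -> MethodDesc
    public static ulong ResolveMethodDesc(IReadOnlyDictionary<ulong, ulong> memory,
                                          ulong instrPointer, ulong stubCodePageSize)
    {
        ulong dataAddr = instrPointer + stubCodePageSize;
        ulong byteCodeStart = memory[dataAddr + ByteCodeAddrOffset];
        ulong interpMethod = memory[byteCodeStart + MethodOffset];
        return memory[interpMethod + MethodDescOffset];
    }

    // Returns the bytecode address for an interpreter precode, else the entry point unchanged.
    public static ulong GetInterpreterCodeOrOriginal(IReadOnlyDictionary<ulong, ulong> memory,
                                                     ulong entryPoint, ulong stubCodePageSize,
                                                     Func<ulong, bool> isInterpreterPrecode)
    {
        try
        {
            if (!isInterpreterPrecode(entryPoint))
                return entryPoint;

            ulong byteCodeAddr = memory[entryPoint + stubCodePageSize + ByteCodeAddrOffset];
            return byteCodeAddr == 0 ? entryPoint : byteCodeAddr;
        }
        catch (KeyNotFoundException)
        {
            // Unreadable memory: fall back to the original address.
            return entryPoint;
        }
    }
}
```

In this toy model an unmapped read surfaces as `KeyNotFoundException`; the documented method catches every exception, which is the broader guarantee.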

### `GetPrecodeEntryPointFromInteriorAddress`