-
Notifications
You must be signed in to change notification settings - Fork 71
Add intrinsic support for the range prefetch (RPRFM) instruction #423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
8ab80d6
d9b772a
a3a51f8
4616a23
e53af76
3513fc7
cb32719
25ea5c8
f6f04b5
11a6ba8
923d671
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -471,6 +471,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin | |
|
|
||
| * Added support for modal 8-bit floating point matrix multiply-accumulate widening intrinsics. | ||
| * Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics. | ||
| * Added support for range prefetch intrinsic when `__ARM_FEATURE_RPRFM` is defined. | ||
|
|
||
| ### References | ||
|
|
||
|
|
@@ -3613,6 +3614,79 @@ values. | |
| | KEEP | 0 | Temporal fetch of the addressed location (that is, allocate in cache normally) | | ||
| | STRM | 1 | Streaming fetch of the addressed location (that is, memory used only once) | | ||
|
|
||
| The following intrinsic is also available when `__ARM_FEATURE_RPRFM` is defined: | ||
|
|
||
| ``` c | ||
| void __pldx_range(/*constant*/ unsigned int /*access_kind*/, | ||
| /*constant*/ unsigned int /*retention_policy*/, | ||
| /*constant*/ size_t /*reuse distance*/, | ||
| /*constant*/ signed int /*stride*/, | ||
| /*constant*/ unsigned int /*count*/, | ||
| /*constant*/ signed int /*length*/, | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not a strongly held view, but after reading the description of
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be better if the order is consistent for both of the intrinsics, and it makes sense to keep the ordering from bits 0-63 used by |
||
| void const volatile *addr); | ||
| ``` | ||
|
|
||
| Generates a data prefetch instruction for a range of addresses starting from a | ||
| given base address. Locations within the specified address ranges are prefetched | ||
| into one or more caches. This intrinsic allows the specification of the | ||
| expected access kind (read or write), the data retention policy (temporal or | ||
| streaming) and the reuse distance, stride, count and length metadata values. | ||
|
|
||
| The access kind and data retention policy arguments can only be one of the | ||
| following values. | ||
|
|
||
| | **Access Kind** | **Value** | **Summary** | | ||
| | --------------- | --------- | ---------------------------------------- | | ||
| | PLD | 0 | Fetch the addressed location for reading | | ||
| | PST | 1 | Fetch the addressed location for writing | | ||
|
|
||
| | **Retention Policy** | **Value** | **Summary** | | ||
| | -------------------- | --------- | -------------------------------------------------------------------------- | | ||
| | KEEP | 0 | Temporal fetch of the addressed location (that is, allocate in cache normally) | | ||
| | STRM | 1 | Streaming fetch of the addressed location (that is, memory used only once) | | ||
|
|
||
| The table below describes the ranges of the reuse distance, stride, count and length arguments. | ||
|
|
||
| | **Metadata** | **Range** | **Summary** | | ||
| | -------------- | ------------------- | -------------------------------------------------------------------- | | ||
| | Reuse Distance | | Maximum number of bytes to be accessed before executing the next | | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we also add the Range here for Reuse Distance?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @CarolineConcatto, the reason for removing the Reuse Distance range was discussed here: #423 (comment) |
||
| | | | RPRFM instruction that specifies the same range. All values are | | ||
| | | | rounded up to the nearest power of 2 in the range 32KiB to 512MiB. | | ||
| | | | Values exceeding the maximum of 512MiB will be represented by 0, | | ||
| | | | indicating distance not known. | | ||
| | | | Note: This value is ignored if a streaming prefetch is specified. | | ||
| | Stride | [-2MiB, +2MiB) | Number of bytes to advance the block address by after `Length` | | ||
| | | | bytes have been accessed. Note: This value is ignored if Count is 1. | | ||
| | Count | [1, 65536] | Number of blocks to be accessed. | | ||
| | Length | [-2MiB, +2MiB) | Number of contiguous bytes to be accessed. | | ||
|
|
||
| ``` c | ||
| void __pld_range(/*constant*/ unsigned int /*access_kind*/, | ||
| /*constant*/ unsigned int /*retention_policy*/, | ||
| unsigned long /*metadata*/, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I realize this is quite late to request a change, sorry about that, but is it worth using uint64_t for metadata instead of unsigned long to avoid any issues with targets where long != 64 bits?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or long long, happy with either :)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No problem, I think it makes sense to change it. Metadata is now using
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! |
||
| void const volatile *addr); | ||
| ``` | ||
|
|
||
| Generates a data prefetch instruction for a range of addresses starting from a | ||
| given base address. Locations within the specified address ranges are prefetched | ||
| into one or more caches. The access kind and retention policy arguments can | ||
| have the same values as in `__pldx_range`. The bits of the metadata argument | ||
| are interpreted as follows: | ||
|
|
||
| | **Metadata** | **Bits** | **Range** | **Summary** | | ||
| | -------------- | -------- | --------------- | ------------------------------------------------------------ | | ||
| | Length | 0-21 | [-2MiB, +2MiB) | Signed integer representing the number of contiguous | | ||
| | | | | bytes to be accessed. | | ||
| | Count | 37-22 | [0, 65535] | Unsigned integer representing number of blocks of data | | ||
| | | | | to be accessed, minus 1. | | ||
| | Stride | 59-38 | [-2MiB, +2MiB) | Signed integer representing the number of bytes to advance | | ||
| | | | | the block address by after `Length` bytes have been | | ||
| | | | | accessed. This value is ignored if Count is 0. | | ||
| | Reuse Distance | 63-60 | [0, 15] | Indicates the maximum number of bytes to be accessed before | | ||
| | | | | executing the next RPRFM instruction that specifies the same | | ||
| | | | | range. Bits encode decreasing powers of two in the range | | ||
| | | | | 1 (512MiB) to 15 (32KiB). 0 indicates distance not known. | | ||
|
|
||
| ### Instruction prefetch | ||
|
|
||
| ``` c | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the expectation for the builtin to fail compilation when this feature is not present? or be treated as a NOP? The existing
_plddocumentation suggests the latter.I guess it comes down to whether the builtin provides useful information that can be used even if RPRFM is not available (e.g. by emitting other prefetch instructions, or non-temporal accesses).
The reason I mention this is because if there's only a weak link to the underlying instruction then we'll not want to use a feature macro, but instead something that just says the builtin is available, like __ARM_PREFIX_RANGE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the existing documentation above the
_pldbuiltin applies here too, in that the range builtin should not fail when the feature is not present and is treated as a nop.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think I agree with Paul, I think we should have a __ARM_PREFETCH_RANGE macro to tell the user if this part of the ACLE is currently implemented for the compiler and is available. Then it can be used for conditional compilation?
I think we could also have a __ARM_FEATURE_RPRFM macro, to indicate if the actual feat is enabled though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this to describe the
__ARM_PREFETCH_RANGEmacro.@AlfGalf Given that the
__ARM_FEATURE_RPRFMmacro should be used to query whether this is implemented, I'm not sure why we would also want to add__ARM_FEATURE_RPRFMif the instruction does not require something like+rprfm?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support @kmclaughlin-arm , the FEAT_RPRFM is OPTIONAL from Armv8.0, and not guarded by any dedicated feature flag, moreover it looks like RPRFM is a preferred disassembly for PRFM with UXTW, as guarded by this test https://github.com/llvm/llvm-project/blob/f9a3076180bfa1ebffd518971eefd30939a5ced6/llvm/test/MC/AArch64/rprfm.s#L273 , so not sure what we would achieve with having __ARM_FEATURE_RPRFM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, okay, agreed best to not have __ARM_FEATURE_RPRFM