
Minor XMLoadFloat3A/4x3(A) SSE4 optimization #307

Merged: walbourn merged 3 commits into main from sse4loadopt on May 14, 2026

Conversation

@walbourn (Member)

No description provided.

@walbourn walbourn requested review from billkris-ms and jenatali May 14, 2026 04:58
@walbourn walbourn self-assigned this May 14, 2026
@walbourn walbourn linked an issue May 14, 2026 that may be closed by this pull request
@jenatali (Member) left a comment


Makes sense: the zero register doesn't need to come from memory, and you probably already have a zeroed register somewhere. Using 3 bits of the immediate operand instead of loading a full 3-lane mask from memory is probably a nice win.
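The reviewer's reasoning can be illustrated with raw SSE intrinsics. This is a minimal sketch, not the code merged in this pull request: the helper names (LoadFloat3A_Mask, LoadFloat3A_Blend, ExpectLanes) are hypothetical, and it assumes the SSE4.1 replacement is a blendps against a zero register, with the low 3 bits of the immediate selecting x, y and z.

```cpp
#include <cassert>
#include <smmintrin.h>  // SSE4.1 intrinsics; also brings in SSE/SSE2 (x86/x64 only)

// GCC/Clang: enable SSE4.1 for a single function without -msse4.1 for the
// whole file. MSVC accepts SSE4.1 intrinsics without a per-function switch.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

// Hypothetical helpers. p must be 16-byte aligned with 16 readable bytes, as an
// aligned float3 type (such as XMFLOAT3A) provides; lane 3 is padding.

// Baseline: AND with a full 3-lane mask. The mask is a vector constant, which
// compilers usually materialize as a load from memory.
static __m128 LoadFloat3A_Mask(const float* p)
{
    const __m128 mask3 = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));
    __m128 v = _mm_load_ps(p);    // x, y, z, padding
    return _mm_and_ps(v, mask3);  // keep lanes 0..2, clear lane 3
}

// SSE4.1: start from a zero register and blend in x, y, z. Bit i of the 8-bit
// immediate set means "take lane i from the second operand", so 0x7 (3 bits)
// picks lanes 0..2 from v and leaves lane 3 zero. No mask constant is loaded,
// and _mm_setzero_ps typically compiles to xorps on a register.
SSE41_FN static __m128 LoadFloat3A_Blend(const float* p)
{
    __m128 v = _mm_load_ps(p);
    return _mm_blend_ps(_mm_setzero_ps(), v, 0x7);
}

// Copy the four lanes out and compare them with the expected values.
static void ExpectLanes(__m128 v, float x, float y, float z, float w)
{
    float out[4];
    _mm_storeu_ps(out, v);
    assert(out[0] == x && out[1] == y && out[2] == z && out[3] == w);
    (void)out;  // avoid an unused-variable warning when NDEBUG disables assert
}
```

Both helpers should produce (1, 2, 3, 0) from the aligned input {1, 2, 3, 42}; the blend version just avoids the memory operand. The target attribute only lets GCC/Clang compile the SSE4.1 function without enabling SSE4.1 for the whole file. It adds no runtime CPU check, so that path must only run on SSE4.1-capable hardware.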

Review comment thread on Inc/DirectXMathConvert.inl
@walbourn walbourn changed the title Minor XMLoadFloat3A SSE4 optimization Minor XMLoadFloat3A/4x3(A) SSE4 optimization May 14, 2026
@walbourn (Member, Author)

I looked at a few other places where I use the same mask. Most of those cases already have SSE4 replacements that use different instructions.

@walbourn walbourn merged commit 74a0f33 into main May 14, 2026
233 checks passed
@walbourn walbourn deleted the sse4loadopt branch May 14, 2026 17:28


Development

Successfully merging this pull request may close these issues.

Alternative XMLoadFloat3A implementation

2 participants