Conversation
runlevel5
commented
Mar 27, 2026
- MOVS, CMPS, STOS, LODS, SCAS (with REP/REPNZ)
- TEST/NOT/NEG/MUL/IMUL/DIV/IDIV
- INC/DEC/CALL/JMP/PUSH
- PUSHF/POPF, SAHF/LAHF, CMC/CLC/STC/CLD/STD
- Interrupts: INT 3, INT n, INTO
- CLI/STI
- emit helpers: emit_neg8/16/32, emit_inc8/16/32, emit_dec8/16/32
- helper macros: GETEDz, GETEDH, GETSED, GETDIR
CBZ_NEXT(xRCX);
ANDI(x1, xFlags, 1 << F_DF);
BNEZ_MARK2(x1);
// special optim for large RCX value on forward case only
Isn't there a risk of SIGBUS on ppc64le when accessing graphics memory (or some other hardware-related memory)? Because on ARM64 & RISC-V, an "Aligned" path is needed there.
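To illustrate the concern: device-mapped memory can fault on unaligned wide accesses, so an optimized REP MOVSB typically guards the wide-copy fast path with an alignment check and keeps a byte loop as fallback. A minimal C sketch of that shape (the function name and structure are illustrative, not the actual dynarec code):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: only take the 8-bytes-at-a-time fast path when
// both pointers are 8-byte aligned; otherwise (and for the tail) copy
// byte by byte, which is always safe on hardware-mapped memory.
static void rep_movsb_opt(uint8_t *dst, const uint8_t *src, size_t n)
{
    if ((((uintptr_t)dst | (uintptr_t)src) & 7) == 0) {
        // aligned fast path: 8-byte chunks
        while (n >= 8) {
            uint64_t v;
            memcpy(&v, src, 8);  // emitted as a single aligned load
            memcpy(dst, &v, 8);
            src += 8; dst += 8; n -= 8;
        }
    }
    // byte tail, and the whole copy on the unaligned path
    while (n--)
        *dst++ = *src++;
}
```

In the JIT the same split shows up as two emitted code paths selected on the OR of the two pointers' low bits.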
}
}
IFX (X_SF) {
    SRDI(s3, s1, rex.w ? 63 : 31);
Why not use
SRDI(s3, s1, (rex.w ? 63 : 31) - F_SF);
ANDI(s3, s3, 1 << F_SF);
ORI(xFlags, xFlags, s3);
and use the same trick as in the 8-bit & 16-bit versions?
(I assume it's done like that in the LA64/RV64 counterparts, maybe?)
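The suggestion is the classic jumpless SF update: instead of branching on the sign bit, shift it directly into the F_SF position, mask, and OR into the flags. A small C sketch of the arithmetic (names and the F_SF value follow the x86 EFLAGS layout; the helper itself is illustrative, and SF is assumed already cleared on entry):

```c
#include <stdint.h>

#define F_SF 7  // SF is bit 7 of x86 EFLAGS

// Jumpless SF computation: move the sign bit of the result into
// bit F_SF with one shift, isolate it with one AND, merge with one OR.
// This mirrors the SRDI / ANDI / ORI sequence suggested in the review.
static uint64_t set_sf(uint64_t flags, uint64_t result, int is64)
{
    int sign_bit = is64 ? 63 : 31;
    return flags | ((result >> (sign_bit - F_SF)) & (1ull << F_SF));
}
```

Three straight-line integer ops, no conditional branch, so it is both shorter and friendlier to the branch predictor than a test-and-set sequence.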
}
}
IFX (X_SF) {
    SRDI(s3, s1, rex.w ? 63 : 31);
Same question: why the version with the BEQ instead of the jumpless, shorter version?
}
}
IFX (X_SF) {
    ANDId(s2, s1, 0x80);
Same here. I guess there might be a lot of places that can be optimized.
}
}
IFX (X_SF) {
    BF_EXTRACT(s5, s1, 15, 15);
And here is a 3rd way to extract the SF flag, jumpless in only 2 opcodes now?!
I'm OK with the PR, but it would be nice to have a homogeneous way to compute the SF flag, in the fastest/smallest way possible (so the extract/insert-bit way, which is 2 opcodes?). Also, make sure hardware memory doesn't have an alignment constraint on PPC64LE; otherwise the optimized REP MOVSB will need an unaligned path like on ARM64 & RV64. Last point: if PPC64LE has native flags, you might want to look at the ARM64 way of mapping native flags to x86 flags. That can come later of course, but it's a good source of speedup. Then again, it can come much later, as it can be complex to get stable.
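To make the homogeneity request concrete, here is a C sketch of the three SF-computation styles visible in the diff hunks above, written as plain functions (the names are illustrative, not the emitter macros). All three must agree; the point of the review is that only one of them should survive, ideally the 2-opcode extract/insert form:

```c
#include <stdint.h>

#define F_SF 7  // SF is bit 7 of x86 EFLAGS

// Style 1: branchy test-and-set (the BEQ version in the diff).
static uint32_t sf_branchy(uint32_t flags, uint16_t res)
{
    flags &= ~(1u << F_SF);
    if (res & 0x8000)
        flags |= 1u << F_SF;
    return flags;
}

// Style 2: shift + mask + or, jumpless, 3 ops (the SRDI/ANDI/ORI trick).
static uint32_t sf_shift(uint32_t flags, uint16_t res)
{
    flags &= ~(1u << F_SF);
    return flags | (((uint32_t)res >> (15 - F_SF)) & (1u << F_SF));
}

// Style 3: bit-field extract then place, the BF_EXTRACT-style form,
// which POWER can do in 2 opcodes via rotate-and-mask insert.
static uint32_t sf_extract(uint32_t flags, uint16_t res)
{
    flags &= ~(1u << F_SF);
    return flags | ((((uint32_t)res >> 15) & 1u) << F_SF);
}
```

Whichever form the backend settles on, a single shared helper keeps all opcode handlers consistent and makes later micro-optimization a one-place change.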
ptitSeb
left a comment
Please make the SF computation as fast as possible, and always with the same method if possible.