[PPC64LE_DYNAREC] Add more opcodes#3720

Open
runlevel5 wants to merge 1 commit intoptitSeb:mainfrom
runlevel5:ppc64le-more-opcodes

@runlevel5
Contributor

  • MOVS, CMPS, STOS, LODS, SCAS (with REP/REPNZ)
  • TEST/NOT/NEG/MUL/IMUL/DIV/IDIV
  • INC/DEC/CALL/JMP/PUSH
  • PUSHF/POPF, SAHF/LAHF, CMC/CLC/STC/CLD/STD
  • Interrupts: INT 3, INT n, INTO
  • CLI/STI
  • New emit helpers: emit_neg8/16/32, emit_inc8/16/32, emit_dec8/16/32
  • New helper macros: GETEDz, GETEDH, GETSED, GETDIR

@ptitSeb ptitSeb requested a review from ksco March 27, 2026 12:47
CBZ_NEXT(xRCX);
ANDI(x1, xFlags, 1 << F_DF);
BNEZ_MARK2(x1);
// special optim for large RCX value on forward case only
Owner

Isn't there a risk of SIGBUS on ppc64le when accessing graphics memory (or some other hardware-related memory)? Because on ARM64 & RISC-V, an "Aligned" path is needed there.
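
To make the concern concrete, here is a minimal sketch in plain C (not dynarec emitter macros) of an alignment-guarded forward copy: wide accesses are used only when both pointers are 8-byte aligned, and everything else goes through byte accesses. The name `copy_forward`, the 8-byte word size, and the non-overlapping assumption are illustrative choices, not taken from the PR. Whether PPC64LE needs such a guard for device memory is the open question raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: forward copy (DF = 0) of n bytes.
 * Assumes the two regions do not overlap. */
static void copy_forward(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst && src);

    /* Fast path only when BOTH addresses are 8-byte aligned. */
    if ((((uintptr_t)dst | (uintptr_t)src) & 7u) == 0) {
        while (n >= 8) {
            uint64_t w;
            memcpy(&w, src, sizeof w);  /* 8-byte read from src  */
            memcpy(dst, &w, sizeof w);  /* 8-byte write to dst   */
            src += 8;
            dst += 8;
            n -= 8;
        }
    }

    /* Tail bytes, or the whole buffer when misaligned: byte accesses only. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

The guard costs one OR and one AND before the loop, so the question is mostly whether the extra path is needed at all, not whether it is expensive.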

}
}
IFX (X_SF) {
SRDI(s3, s1, rex.w ? 63 : 31);
Owner

@ptitSeb ptitSeb Mar 28, 2026

Why not use

SRDI(s3, s1, (rex.w ? 63 : 31)-F_SF);
ANDI(s3, s3, 1<<F_SF);
ORI(xFlags, xFlags, s3);

and use the same trick as for the 8-bit & 16-bit versions?

(I assume that's how it's done in the LA64/RV64 counterparts?)
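
The shift/mask/OR sequence suggested above can be modelled in plain C as follows. This is a sketch, not the emitter code: `set_sf` is a hypothetical name, `F_SF` is taken to be 7 (the position of SF in x86 EFLAGS, assumed to match the dynarec's constant), and the SF bit in `flags` is assumed to have been cleared beforehand.

```c
#include <assert.h>
#include <stdint.h>

#define F_SF 7 /* SF bit position in x86 EFLAGS; assumed equal to the dynarec's F_SF */

/* Branchless SF update: move the result's sign bit down to bit F_SF,
 * mask everything else off, then OR it into the flags word.
 * Precondition: the SF bit of `flags` is already clear. */
static uint64_t set_sf(uint64_t flags, uint64_t res, int rex_w)
{
    assert((flags & ((uint64_t)1 << F_SF)) == 0);

    unsigned sign = rex_w ? 63u : 31u;   /* sign bit of a 64- or 32-bit result */
    uint64_t s3 = res >> (sign - F_SF);  /* sign bit now sits at bit F_SF      */
    s3 &= (uint64_t)1 << F_SF;           /* keep only that bit                 */
    return flags | s3;                   /* merge into the flags               */
}
```

For a 32-bit result, `set_sf(0, 0x80000000, 0)` yields `0x80`; any garbage above bit 31 is discarded by the mask, so no separate zero-extension is needed.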

}
}
IFX (X_SF) {
SRDI(s3, s1, rex.w ? 63 : 31);
Owner

@ptitSeb ptitSeb Mar 28, 2026

Same here: why the version with the BEQ instead of the shorter, jumpless version?

}
}
IFX (X_SF) {
ANDId(s2, s1, 0x80);
Owner

Same here. I guess there might be a lot of places that can be optimized.

}
}
IFX (X_SF) {
BF_EXTRACT(s5, s1, 15, 15);
Owner

And here is a 3rd way to extract the SF flag, now jumpless and in only 2 opcodes?!

@ptitSeb
Owner

ptitSeb commented Mar 28, 2026

I'm OK with the PR, but it would be nice to have a homogeneous way to compute the SF flag, in the fastest/smallest way possible (so the extract/insert bit way, which is 2 opcodes?)
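
As a rough illustration of why one method can replace the others, the three SF computations seen in this review can be modelled in plain C for a 16-bit result. All function names here are hypothetical, `F_SF` is assumed to be 7 (the EFLAGS position of SF), and the first two variants assume SF was cleared beforehand. The extract/insert variant overwrites the bit, so it has no such precondition.

```c
#include <assert.h>
#include <stdint.h>

#define F_SF 7 /* SF bit position in x86 EFLAGS; assumed equal to the dynarec's F_SF */

/* Variant 1: test the sign bit and branch. Assumes SF is clear on entry. */
static uint32_t sf_branch(uint32_t flags, uint16_t res)
{
    assert((flags & (1u << F_SF)) == 0);
    if (res & 0x8000u)
        flags |= 1u << F_SF;
    return flags;
}

/* Variant 2: shift, mask, OR. No branch. Assumes SF is clear on entry. */
static uint32_t sf_shift_mask(uint32_t flags, uint16_t res)
{
    assert((flags & (1u << F_SF)) == 0);
    return flags | (((uint32_t)res >> (15 - F_SF)) & (1u << F_SF));
}

/* Models of a single-bit extract and a single-bit insert. */
static uint32_t bit_extract(uint32_t v, unsigned pos)
{
    return (v >> pos) & 1u;
}

static uint32_t bit_insert(uint32_t dst, uint32_t bit, unsigned pos)
{
    return (dst & ~(1u << pos)) | ((bit & 1u) << pos);
}

/* Variant 3: extract bit 15, insert it at F_SF. No branch, and the old
 * SF value is overwritten rather than ORed. */
static uint32_t sf_extract_insert(uint32_t flags, uint16_t res)
{
    return bit_insert(flags, bit_extract(res, 15), F_SF);
}
```

With SF clear on entry, all three return the same flags word for every 16-bit result, so picking one of them everywhere changes code size and speed only, not behavior.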

Also, make sure hardware memory doesn't have an alignment constraint on PPC64LE; otherwise, the optimized REP MOVSB will need an unaligned path like on ARM64 & RV64.

Last point: if PPC64LE has flags, then you might want to look at the ARM64 way of handling native flags as a match for x86 flags. It can come later of course, but it's a good source of speedup. But again, that can come much later, as it can be complex to get stable.

Owner

@ptitSeb ptitSeb left a comment

Please make the SF computation as fast as possible, and always use the same method if possible.
