Surrogate pair characters (emoji, rare CJK, etc.) break a, r, s, ~, ga commands #9931

@k1832


Describe the bug

When the cursor is on a character encoded as a UTF-16 surrogate pair, such as an emoji (😄), a rare CJK character (𩸽), or a musical symbol (𝄞), several character-level commands produce incorrect results.

| Command | Broken behavior |
| --- | --- |
| a (Append) | Cursor lands before the character instead of after |
| r (Replace) | Only replaces half the character, corrupting the text |
| s (Change char) | Only deletes half the character before entering Insert mode |
| ~ (Toggle case) | Corrupts the character into a lone surrogate |
| ga (Unicode info) | Shows the half-surrogate value instead of the full codepoint |
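
To make the "half the character" rows concrete, here is a minimal TypeScript sketch of how a surrogate pair looks at the code-unit level. It is not taken from the extension's source: `codePointAtOffset` and `hex` are hypothetical helpers written for illustration, and the `r` example assumes the first code unit is the one that gets replaced.

```typescript
// U+1F604 (the emoji from this report) written as its two UTF-16 code units.
const text: string = "\uD83D\uDE04text";

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Hypothetical helper: decode the full code point that starts at `offset`.
function codePointAtOffset(s: string, offset: number): number {
  const first = s.charCodeAt(offset);
  if (isHighSurrogate(first) && offset + 1 < s.length) {
    const second = s.charCodeAt(offset + 1);
    if (isLowSurrogate(second)) {
      return (first - 0xd800) * 0x400 + (second - 0xdc00) + 0x10000;
    }
  }
  return first;
}

// Hypothetical helper: format a number as U+XXXX.
function hex(n: number): string {
  return "U+" + n.toString(16).toUpperCase();
}

// A per-code-unit lookup sees only the high surrogate.
const halfValue = hex(text.charCodeAt(0));
// A per-code-point lookup sees the whole character.
const fullValue = hex(codePointAtOffset(text, 0));

// Replacing one code unit (assumed here to be the first) leaves a lone low surrogate.
const replacedOneUnit = "x" + text.slice(1);
// Replacing both code units replaces the whole character.
const replacedWholeChar = "x" + text.slice(2);

console.log(text.length, halfValue, fullValue);
console.log(isLowSurrogate(replacedOneUnit.charCodeAt(1)), replacedWholeChar);
```

The same distinction applies to s and ~: any operation that acts on a single code unit of the pair leaves the other half behind.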

To Reproduce

  1. Open a file containing 😄text
  2. Place cursor on 😄
  3. Press a (append), type !, press Esc
  4. See !😄text (the ! is inserted before the emoji instead of after)

Expected behavior

😄!text (the ! should be inserted after the emoji).

Environment (please complete the following information):

  • Extension (VsCodeVim) version: 1.32.4
  • VSCode version: 1.109.0
  • OS: Ubuntu 24.04

Additional context

position.getRight() advances by 1 UTF-16 code unit, but these characters occupy 2 code units (a surrogate pair). Moving by 1 lands between the two halves of the pair, and VSCode's validatePosition clamps the position back to the start of the character, so the cursor never moves past it.
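
The mechanism described above can be modeled in a few lines of TypeScript. This is only a sketch under stated assumptions: `clampToCharStart` imitates the clamping attributed to validatePosition in this report rather than VSCode's actual implementation, and `naiveRight`, `surrogateAwareRight`, and `insertAt` are hypothetical names that do not come from the extension.

```typescript
function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Assumed model of the clamping: an offset that falls between the two halves
// of a surrogate pair is moved back to the start of the pair.
function clampToCharStart(line: string, offset: number): number {
  if (
    offset > 0 &&
    offset < line.length &&
    isHighSurrogate(line.charCodeAt(offset - 1)) &&
    isLowSurrogate(line.charCodeAt(offset))
  ) {
    return offset - 1;
  }
  return offset;
}

// Naive step: always one code unit to the right, then clamped.
function naiveRight(line: string, offset: number): number {
  return clampToCharStart(line, Math.min(offset + 1, line.length));
}

// Surrogate-aware step: two code units when the cursor is on a pair.
function surrogateAwareRight(line: string, offset: number): number {
  const onPair =
    offset + 1 < line.length &&
    isHighSurrogate(line.charCodeAt(offset)) &&
    isLowSurrogate(line.charCodeAt(offset + 1));
  return Math.min(offset + (onPair ? 2 : 1), line.length);
}

// Insert `s` into `line` at a code-unit offset.
function insertAt(line: string, offset: number, s: string): string {
  return line.slice(0, offset) + s + line.slice(offset);
}

// U+1F604 followed by "text", with the cursor at offset 0 (on the emoji).
const line: string = "\uD83D\uDE04text";
const broken = insertAt(line, naiveRight(line, 0), "!");
const fixed = insertAt(line, surrogateAwareRight(line, 0), "!");

console.log(broken === "!" + line); // the reported behavior of `a`
console.log(fixed === "\uD83D\uDE04!text"); // the expected behavior
```

In this model the naive step reproduces the reported result (! before the emoji), while stepping over both code units yields the expected one.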

x/X, l/h, and y already have surrogate boundary correction and work correctly.
