Fix default token metadata for high language IDs #319144

Open
EduardF1 wants to merge 2 commits into microsoft:main from EduardF1:fix-319118-languageid-mask

@EduardF1

Fixes #319118

Masks the encoded language ID before packing default token metadata so IDs above 255 cannot spill into the token type bits. Adds a regression test that exercises a language ID >= 256 and verifies the default token type stays Other.
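
To illustrate the bit arithmetic behind this change, here is a minimal sketch. The constants are local stand-ins rather than imports from the VS Code codebase: they assume the language ID occupies the low 8 bits of the metadata word and the token type sits in the two bits immediately above, with `Other` encoded as 0. The function names are hypothetical.

```typescript
// Assumed layout (not copied from the repository):
// bits 0-7 hold the language ID, bits 8-9 hold the token type.
const LANGUAGEID_OFFSET = 0;
const LANGUAGEID_MASK = 0xff;
const TOKEN_TYPE_OFFSET = 8;
const TOKEN_TYPE_MASK = 0x300;
const TOKEN_TYPE_OTHER = 0; // assumption: "Other" encodes as 0

// Packs default metadata without masking: high ID bits leak upward.
function packUnmasked(languageId: number): number {
  return ((languageId << LANGUAGEID_OFFSET) | (TOKEN_TYPE_OTHER << TOKEN_TYPE_OFFSET)) >>> 0;
}

// Packs default metadata with the language ID confined to its own field.
function packMasked(languageId: number): number {
  const languageBits = (languageId << LANGUAGEID_OFFSET) & LANGUAGEID_MASK;
  return (languageBits | (TOKEN_TYPE_OTHER << TOKEN_TYPE_OFFSET)) >>> 0;
}

// Reads the token type back out of a packed metadata word.
function tokenTypeOf(metadata: number): number {
  return (metadata & TOKEN_TYPE_MASK) >>> TOKEN_TYPE_OFFSET;
}
```

Under these assumptions, `tokenTypeOf(packUnmasked(256))` evaluates to 1 instead of 0, while `tokenTypeOf(packMasked(256))` stays 0. Masking also wraps the stored language ID (256 becomes 0 in this sketch), so it protects the token type bits but does not by itself make IDs above 255 representable.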

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 30, 2026 14:55

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes incorrect default token metadata generation for large language IDs by preventing language ID bits from bleeding into token type bits, and adds a regression test for the scenario.

Changes:

  • Mask topLevelLanguageId when composing default token metadata to keep token type bits correct.
  • Add regression test covering high language ID values and validating StandardTokenType.Other is preserved.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/vs/editor/common/tokens/contiguousTokensStore.ts | Masks language ID bits when computing default metadata to avoid corrupting token type bits. |
| src/vs/editor/test/common/model/tokensStore.test.ts | Adds regression test ensuring default metadata yields StandardTokenType.Other for high language IDs. |

Comment on lines +592 to +599
let languageId = '';
for (let i = 0; i < 255; i++) {
    languageId = `language-${i}`;
    codec.register(languageId);
}

const encodedLanguageId = codec.encodeLanguageId(languageId);
assert.ok(encodedLanguageId >= 256);
Author

Updated the test to register languages until the encoded ID exceeds MetadataConsts.LANGUAGEID_MASK, so it no longer depends on a hard-coded registration count or codec starting index. Pushed in 31e4cd0.
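
The approach described in this reply can be sketched roughly as follows. This is not the code from 31e4cd0: `ToyLanguageIdCodec` is a hypothetical stand-in that only mimics the `register` and `encodeLanguageId` calls seen in the snippet above, and `LANGUAGEID_MASK` is assumed to be 255.

```typescript
const LANGUAGEID_MASK = 0xff; // assumption: 8-bit language ID field

// Hypothetical codec: hands out sequential IDs starting at `firstId`.
class ToyLanguageIdCodec {
  private nextId: number;
  private readonly ids = new Map<string, number>();

  constructor(firstId: number) {
    this.nextId = firstId;
  }

  register(language: string): void {
    if (!this.ids.has(language)) {
      this.ids.set(language, this.nextId++);
    }
  }

  encodeLanguageId(language: string): number {
    const id = this.ids.get(language);
    return id === undefined ? 0 : id;
  }
}

// Registers languages until one of them encodes to an ID above the mask,
// instead of relying on a fixed registration count.
function registerUntilAboveMask(codec: ToyLanguageIdCodec): { languageId: string; encoded: number } {
  let languageId = '';
  let encoded = 0;
  for (let i = 0; encoded <= LANGUAGEID_MASK; i++) {
    languageId = `language-${i}`;
    codec.register(languageId);
    encoded = codec.encodeLanguageId(languageId);
  }
  return { languageId, encoded };
}
```

Because the loop condition checks the encoded ID itself, the helper reaches the first ID above the mask whether the toy codec starts counting at 1, 5, or any other value at or below it.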

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@EduardF1
Author

EduardF1 commented Jun 1, 2026

@microsoft-github-policy-service agree

Development

Successfully merging this pull request may close these issues.

Auto-completion disabled for languages with ID >= 256 due to LanguageID bits overlapping with TokenType bits

3 participants