
Fix metal::vec namespace and metal_stdlib include for Metal Toolchain… #3336

Open
adamSellers wants to merge 2 commits into ml-explore:main from adamSellers:fix/metal-toolchain-32023-macos26

@adamSellers commented Mar 30, 2026

… 32023 (macOS 26)

Metal Toolchain 32023 (Xcode 26, macOS 26 Tahoe) introduced two breaking changes:

  1. vec is no longer in the global namespace; it must be qualified as metal::vec
  2. <metal_math> no longer transitively includes <metal_stdlib>, so bfloat (aliased as bfloat16_t in bf16.h) is not in scope without an explicit #include <metal_stdlib>

Without this fix, MLX silently falls back to CPU dispatch on macOS 26, yielding roughly 50% of expected throughput even though mx.default_device() reports Device(gpu, 0).

Tested on M3 Ultra, macOS 26.4 (25E246), Metal Toolchain 32023.883:

  • Before: ~17 tok/sec generation, GPU 2% active
  • After: ~35 tok/sec generation, GPU 100% active @ 1380MHz

Proposed changes


Fixes #3337

Checklist


  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@adamSellers (Author)

No tests have been added for this fix: the bug manifests at Metal shader compilation time (runtime JIT), which requires macOS 26 and Metal Toolchain 32023 to reproduce. There is currently no CI runner for macOS 26, so the fix has been validated manually on M3 Ultra / macOS 26.4 (25E246) / Metal Toolchain 32023.883. Happy to add a test if the team can point me to where Metal compilation is exercised in the test suite.


Successfully merging this pull request may close these issues.

[Metal] MLX fails to build metal library on macOS 26 Tahoe / Metal Toolchain 32023