docs: Expand documentation for Metal backend support
- Add complete Metal backend API documentation (docs/api/backends/metal.md)
- Update exception docs with MetalError and MSLCompilationError
- Update accelerator docs with METAL DeviceType and metal_available
- Update unified-buffer docs with .metal property and mark_metal_dirty()
- Update API index with MetalBackend in package structure
- Update home page with Metal features and installation tabs
- Update installation guide with Metal requirements and options
- Update memory-management article with Metal unified memory notes
- Update gpu-computing article with multi-backend examples
- Update gpu-optimization guide with Metal-specific tips
- Update mkdocs.yml navigation to include Metal backend page
- Fix version mismatch: update __version__ to 0.2.0 in __init__.py
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
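The bullets above name a `metal_available` check added to the accelerator docs. As a hedged sketch of how calling code might feature-detect the backend (the name comes from the commit message; the import path and fallback logic are assumptions, and the code degrades cleanly when PyDotCompute is not installed):

```python
try:
    from pydotcompute import metal_available  # name documented by this commit
except ImportError:
    metal_available = False  # stand-in when the package is not installed

# Accept either a callable or a plain flag, since the commit message
# does not say which form `metal_available` takes.
has_metal = metal_available() if callable(metal_available) else bool(metal_available)
backend = "metal" if has_metal else "cpu"
print(f"selected backend: {backend}")
```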
docs/api/core/unified-buffer.md (54 additions, 5 deletions)
@@ -4,7 +4,7 @@ Host-device memory abstraction with lazy synchronization.
 ## Overview
 
-`UnifiedBuffer` provides a unified view of memory that can exist on both host (CPU) and device (GPU). It tracks which copy is current and automatically synchronizes when needed.
+`UnifiedBuffer` provides a unified view of memory that can exist on host (CPU), CUDA device, and Metal device (macOS). It tracks which copy is current and automatically synchronizes when needed.
 
 ```python
 from pydotcompute import UnifiedBuffer
@@ -79,13 +79,26 @@ def host(self) -> np.ndarray:
 @property
 def device(self) -> Any:
     """
-    Get device (GPU) view of data.
+    Get device (CUDA GPU) view of data.
 
     Automatically syncs from host if host is dirty.
     Returns CuPy array if CUDA available, else NumPy array.
     """
 ```
 
+### metal
+
+```python
+@property
+def metal(self) -> Any:
+    """
+    Get Metal (Apple GPU) view of data.
+
+    Automatically syncs from host if host is dirty.
+    Returns MLX array if Metal available (macOS only).
+    """
+```
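The two docstrings in this hunk imply a natural preference order for callers: Metal view, then CUDA view, then host. A self-contained sketch of that dispatch (the `FakeBuffer` stand-in and `gpu_view` helper are invented for illustration; real code would use `UnifiedBuffer` itself):

```python
import numpy as np

class FakeBuffer:
    """Stand-in with only a host (NumPy) view populated."""
    def __init__(self, data):
        self.host = np.asarray(data, dtype=np.float32)
        self.metal = None   # would be an MLX array on macOS with Metal
        self.device = None  # would be a CuPy array with CUDA

def gpu_view(buf):
    # Prefer Metal, then CUDA, falling back to the host copy.
    if buf.metal is not None:
        return buf.metal
    if buf.device is not None:
        return buf.device
    return buf.host

buf = FakeBuffer([1.0, 2.0, 3.0])
assert float(gpu_view(buf).sum()) == 6.0  # no accelerator here: host view
```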
-**CPU**: Fallback simulation for development/testing
 
 ### CPU vs GPU
@@ -173,23 +179,46 @@ Efficiency: 10/12 = 83%
 ### Unified Memory
 
-PyDotCompute's `UnifiedBuffer` uses CUDA Unified Memory:
+PyDotCompute's `UnifiedBuffer` abstracts memory across backends:
 
-```python
-from pydotcompute import UnifiedBuffer
+=== "CUDA"
 
-# Single buffer, accessible from both host and device
-buf = UnifiedBuffer((1000,), dtype=np.float32)
+    ```python
+    from pydotcompute import UnifiedBuffer
 
-# Host access
-buf.host[:] = data  # Automatic page migration
+    # Single buffer, accessible from both host and device
+    buf = UnifiedBuffer((1000,), dtype=np.float32)
 
-# Device access
-result = kernel(buf.device)  # Data migrates to GPU
+    # Host access
+    buf.host[:] = data  # Automatic page migration
 
-# Host access again
-output = buf.host[:]  # Data migrates back
-```
+    # Device access
+    result = kernel(buf.device)  # Data migrates to GPU
+
+    # Host access again
+    output = buf.host[:]  # Data migrates back
+    ```
+
+=== "Metal (macOS)"
+
+    ```python
+    from pydotcompute import UnifiedBuffer
+
+    # On Apple Silicon, memory is truly unified
+    buf = UnifiedBuffer((1000,), dtype=np.float32)
+
+    # Host access
+    buf.host[:] = data
+
+    # Metal access (no physical transfer needed!)
+    metal_array = buf.metal  # Returns MLX array
+
+    # CPU and GPU share the same physical memory
+    output = buf.host[:]  # Virtually free
+    ```
+
+!!! tip "Apple Silicon Advantage"
+    Apple Silicon's unified memory architecture means CPU and GPU share the same physical memory. This eliminates the traditional host-device transfer bottleneck, making Metal particularly efficient for streaming workloads.
 
 ## Kernel Launch Overhead
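The hunk above leans on `UnifiedBuffer`'s lazy synchronization: each side is copied only when the other side has been written. As a minimal, runnable illustration of that dirty-flag protocol (this is not PyDotCompute's implementation — the `LazyBuffer` class is invented here, with a second NumPy array standing in for the device copy):

```python
import numpy as np

class LazyBuffer:
    """Two copies of one array, synchronized lazily on access."""
    def __init__(self, shape, dtype=np.float32):
        self._host = np.zeros(shape, dtype=dtype)
        self._device = np.zeros(shape, dtype=dtype)
        self._host_dirty = False
        self._device_dirty = False

    @property
    def host(self):
        if self._device_dirty:        # device copy is newer: sync back
            self._host[...] = self._device
            self._device_dirty = False
        self._host_dirty = True       # caller may write through this view
        return self._host

    @property
    def device(self):
        if self._host_dirty:          # host copy is newer: sync forward
            self._device[...] = self._host
            self._host_dirty = False
        return self._device

    def mark_device_dirty(self):
        # Analogous in spirit to the `mark_metal_dirty()` this commit documents.
        self._device_dirty = True

buf = LazyBuffer((4,))
buf.host[:] = 2.0
assert float(buf.device.sum()) == 8.0   # host write synced to "device"
buf.device[:] = 3.0
buf.mark_device_dirty()
assert float(buf.host.sum()) == 12.0    # device write synced back
```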
@@ -276,6 +305,7 @@ PyDotCompute addresses these GPU challenges: