Status: Planning Phase
Priority: Medium to High
Target: Complete feature set for production DirectStorage-style runtime
What Exists:
- RequestStatus enum with values: Pending, InProgress, Complete, Failed
- No Cancelled status
- No cancel() method on Queue
- No cancellation support in backends
What's Missing:
- ❌ RequestStatus::Cancelled enum value
- ❌ Queue::cancel_request(request_id) method
- ❌ In-flight request tracking for cancellation
- ❌ Backend cancellation hooks
- ❌ Race condition handling (completion vs cancellation)
Use Cases:
- Timeout: Cancel requests that take too long
- User Action: User cancels loading operation
- Priority Change: Cancel low-priority work to start high-priority
- Shutdown: Cancel all in-flight requests on cleanup
Semantics:
// Strong guarantee: Request will not complete after cancel
bool cancel(request_id);
// Weak guarantee: Request may complete, but won't invoke callback
bool try_cancel(request_id);
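As a usage sketch for the timeout case, a caller might check deadlines and fall back to the strong-guarantee form (hypothetical helper; cancel_if_expired is not an existing API, and it uses the Queue::cancel_request method sketched below):
#include <chrono>

// Hypothetical timeout helper: cancel a request once it misses its deadline.
// Returns true only if the request was cancelled before completion.
bool cancel_if_expired(Queue& queue, request_id_t id,
                       std::chrono::steady_clock::time_point deadline) {
    if (std::chrono::steady_clock::now() < deadline)
        return false; // still within its time budget
    return queue.cancel_request(id);
}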
Add to Request:
struct Request {
    // Existing fields...
    std::atomic<bool> cancellation_requested = false;
    request_id_t id = 0; // Unique ID for tracking
};
Add to Queue:
class Queue {
public:
    // Cancel a specific request (returns true if cancelled before completion)
    bool cancel_request(request_id_t id);
    // Cancel all pending requests (not yet submitted)
    size_t cancel_all_pending();
    // Cancel all requests (including in-flight)
    size_t cancel_all();
};
Add Status:
enum class RequestStatus {
    Pending,
    InProgress,
    Complete,
    Failed,
    Cancelled // NEW
};
Request Tracking:
class Queue::Impl {
    std::unordered_map<request_id_t, Request*> active_requests_;
    std::mutex active_mutex_;
    std::atomic<request_id_t> next_id_{1};
};
void Queue::enqueue(Request& req) {
    req.id = impl_->next_id_.fetch_add(1);
    {
        std::lock_guard lock(impl_->active_mutex_);
        impl_->active_requests_[req.id] = &req;
    }
    // ... existing enqueue logic
}
Cancellation:
bool Queue::cancel_request(request_id_t id) {
    std::lock_guard lock(impl_->active_mutex_);
    auto it = impl_->active_requests_.find(id);
    if (it == impl_->active_requests_.end()) {
        return false; // Already completed or never existed
    }
    Request* req = it->second;
    // Mark as cancellation requested
    req->cancellation_requested.store(true, std::memory_order_release);
    // If still pending (not submitted), remove immediately
    if (req->status == RequestStatus::Pending) {
        req->status = RequestStatus::Cancelled;
        impl_->active_requests_.erase(it);
        return true;
    }
    // If in-flight, the backend must handle cancellation.
    // Return false to indicate "in progress, might complete".
    return false;
}
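The other half of the race is the completion path: backends must remove entries from active_requests_ under the same mutex, so cancel and complete agree on a single terminal status. A possible sketch (mark_complete is an assumed helper, not existing code):
// Backend-side completion: taking active_mutex_ serializes against
// cancel_request; the atomic flag then decides the reported status.
void Queue::Impl::mark_complete(Request& req) {
    {
        std::lock_guard<std::mutex> lock(active_mutex_);
        active_requests_.erase(req.id);
        req.status = req.cancellation_requested.load(std::memory_order_acquire)
                         ? RequestStatus::Cancelled
                         : RequestStatus::Complete;
    }
    req.callback(&req); // invoke the callback outside the lock
}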
CPU Backend:
void CpuBackend::process_request(Request& req) {
    // Check cancellation before I/O
    if (req.cancellation_requested.load(std::memory_order_acquire)) {
        req.status = RequestStatus::Cancelled;
        req.callback(&req);
        return;
    }
    // Perform I/O
    ssize_t bytes = pread(req.fd, req.dst, req.size, req.offset);
    // Check cancellation after I/O (before callback)
    if (req.cancellation_requested.load(std::memory_order_acquire)) {
        req.status = RequestStatus::Cancelled;
        req.callback(&req);
        return;
    }
    // Surface I/O errors
    if (bytes < 0) {
        req.status = RequestStatus::Failed;
        req.callback(&req);
        return;
    }
    // Normal completion
    req.status = RequestStatus::Complete;
    req.callback(&req);
}
Vulkan Backend:
// Harder to cancel GPU work already in progress
// Strategy: let the work finish, but report Cancelled status to the callback
void VulkanBackend::complete_request(Request& req) {
    if (req.cancellation_requested.load(std::memory_order_acquire)) {
        req.status = RequestStatus::Cancelled;
    }
    req.callback(&req);
}
io_uring Backend:
// Can cancel SQE before submission
bool IoUringBackend::cancel_sqe(Request& req) {
    // Remove from pending queue if not yet submitted
    std::lock_guard lock(pending_mutex_);
    auto it = std::find(pending_.begin(), pending_.end(), &req);
    if (it != pending_.end()) {
        pending_.erase(it);
        req.status = RequestStatus::Cancelled;
        return true;
    }
    return false; // Already submitted
}
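For SQEs that have already been submitted, io_uring offers best-effort cancellation via IORING_OP_ASYNC_CANCEL. A sketch using liburing (the ring_ member and the Request-pointer-as-user_data convention are assumptions about this backend):
#include <liburing.h>

// Best-effort cancel for an already-submitted request, keyed by user_data.
// The cancel itself is an SQE; its CQE reports 0 on success, -ENOENT if the
// target was not found, or -EALREADY if it is already executing.
bool IoUringBackend::cancel_submitted(Request& req) {
    struct io_uring_sqe* sqe = io_uring_get_sqe(&ring_);
    if (!sqe) return false; // submission queue full
    io_uring_prep_cancel(sqe, &req, 0); // must match the original SQE's user_data
    io_uring_submit(&ring_);
    return true;
}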
Test Cases:
- Cancel pending request (before submit)
- Cancel in-flight request (during I/O)
- Cancel completed request (should fail)
- Cancel non-existent request (should fail)
- Race: cancel vs completion (see the sketch below)
- Cancel all requests
- Callback not invoked for cancelled request
Test File: tests/cancellation_test.cpp
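The cancel-vs-completion race test could look roughly like this (a sketch; wait_for_terminal_status is a hypothetical helper):
#include <cassert>
#include <thread>

void test_cancel_vs_completion(Queue& queue, Request& req) {
    queue.enqueue(req);
    // Cancel from another thread while the request may be completing
    std::thread canceller([&] { queue.cancel_request(req.id); });
    canceller.join();
    wait_for_terminal_status(req); // hypothetical: blocks until a terminal state
    // Exactly one terminal status must be observed, never both callback paths
    assert(req.status == RequestStatus::Complete ||
           req.status == RequestStatus::Cancelled);
}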
Estimated Effort: 2-3 weeks for complete cancellation support
Goal: Zero-copy disk → GPU data path
Traditional Path (current):
Disk → Host Staging Buffer → GPU Buffer
     [copy 1]             [copy 2]
GPU-Resident Path (target):
Disk → GPU Buffer (direct)
     [copy 1 only]
Microsoft DirectStorage Concept:
- GPU upload heap: CPU-visible, GPU-accessible memory
- Direct writes from storage controller to GPU memory
- Requires hardware support (PCIe peer-to-peer, GPU Direct Storage)
Linux Equivalent:
- NVIDIA GPUDirect Storage: Kernel driver enables direct NVMe → GPU transfers
- AMD equivalent: DirectGMA (less documented)
- Standard Vulkan: No direct disk → GPU (must use staging)
Approach: Staging buffer + GPU copy (already implemented)
Pros:
- Works on all Vulkan hardware
- Portable across vendors
- Already implemented
Cons:
- Extra copy (staging → GPU)
- Higher latency
- More memory usage
Approach: Integrate with vendor-specific APIs
NVIDIA GPUDirect Storage:
// Open file with GDS-compatible flags
int fd = open(path, O_RDONLY | O_DIRECT);
// Initialize the cuFile driver
cuFileDriverOpen();
// Describe and register the file descriptor with GDS
CUfileDescr_t cufile_desc = {};
cufile_desc.handle.fd = fd;
cufile_desc.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
CUfileHandle_t handle;
cuFileHandleRegister(&handle, &cufile_desc);
// Direct read into GPU memory
cuFileRead(handle, gpu_buffer, size, offset, 0);
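A likely refinement for repeated reads: register the destination buffer once with cuFileBufRegister(gpu_buffer, size, 0) and release it with cuFileBufDeregister(gpu_buffer) at teardown, avoiding per-call buffer registration overhead.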
Pros:
- Zero extra copies
- Lowest latency
- Highest throughput
Cons:
- NVIDIA-only (no AMD/Intel equivalent)
- Requires special driver setup
- O_DIRECT alignment requirements (see the sketch after this list)
- Complex integration
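On the alignment point above: O_DIRECT requires the buffer address, file offset, and transfer size to all be block-aligned. A minimal sketch (4096 is an assumption; the device's real logical block size should be queried, e.g. via the BLKSSZGET ioctl):
#include <cstdlib>

constexpr size_t kDirectAlign = 4096; // assumed; query the device's block size

void* alloc_direct_buffer(size_t aligned_size) {
    void* buf = nullptr;
    if (posix_memalign(&buf, kDirectAlign, aligned_size) != 0)
        return nullptr;
    // pread(fd, buf, aligned_size, aligned_offset) now satisfies O_DIRECT,
    // provided aligned_size and aligned_offset are multiples of kDirectAlign
    return buf;
}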
Approach: mmap file, map to GPU upload heap
Implementation:
// Map file to host memory
void* mapped = mmap(nullptr, file_size, PROT_READ,
                    MAP_SHARED, fd, 0);
// Allocate GPU upload heap (CPU-visible, GPU-accessible)
VkBuffer upload_buffer = create_upload_buffer(device);
void* gpu_mapped = map_buffer(upload_buffer);
// Copy file data to upload heap
memcpy(gpu_mapped, mapped, file_size);
// Unmap
munmap(mapped, file_size);
unmap_buffer(upload_buffer);
// Use the upload buffer directly on the GPU (no staging copy needed)
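A possible refinement here: after mmap, hint the access pattern with madvise(mapped, file_size, MADV_SEQUENTIAL), or prefetch with MADV_WILLNEED, before the memcpy; whether this helps is workload-dependent.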
Pros:
- Simpler than GDS
- Works across vendors
- Reduces staging buffer usage
Cons:
- Still one copy (mmap → GPU)
- Page cache overhead
- Not true "direct to GPU"
Phase 1: Optimize Current Path
- Reuse staging buffers via a pool (see the sketch after this list)
- Async staging → GPU copy (don't wait)
- Batch multiple requests
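A minimal sketch of the staging-buffer pool (StagingBuffer and create_staging_buffer are placeholders for the existing allocation path):
#include <mutex>
#include <vector>

struct StagingBuffer;                    // placeholder for the existing type
StagingBuffer* create_staging_buffer();  // placeholder allocator

class StagingPool {
public:
    // Reuse a free buffer if available, otherwise allocate a new one
    StagingBuffer* acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty())
            return create_staging_buffer();
        StagingBuffer* buf = free_.back();
        free_.pop_back();
        return buf;
    }
    // Return a buffer to the pool once its copy has completed
    void release(StagingBuffer* buf) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(buf);
    }
private:
    std::mutex mutex_;
    std::vector<StagingBuffer*> free_;
};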
Phase 2: Vendor-Specific Paths (Optional)
- Add GDS backend for NVIDIA
- Conditional compilation (#ifdef NVIDIA_GDS)
- Fallback to standard path
Phase 3: Future Hardware
- Wait for standardized GPU Direct Storage in Vulkan
- Integrate when available
Use Case: GPU-side texture decompression (GPU compressed buffer → GPU decompressed buffer)
Current Path:
Disk → Staging → GPU Compressed Buffer → GPU Decompressed Buffer
[compute shader]
Optimization:
Disk → GPU Compressed Buffer → GPU Decompressed Buffer
[single command buffer]
Implementation: Already supported via Vulkan backend + compute pipelines
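A rough sketch of what the single command buffer would record (all handles, sizes, and counts are placeholders; assumes a compute pipeline for the decompression shader already exists):
// Record upload + decompress into one command buffer
VkCommandBufferBeginInfo begin_info{};
begin_info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
vkBeginCommandBuffer(cmd, &begin_info);

// 1. Copy compressed data from the staging buffer to the GPU buffer
VkBufferCopy region{0, 0, compressed_size}; // srcOffset, dstOffset, size
vkCmdCopyBuffer(cmd, staging_buffer, compressed_buffer, 1, &region);

// 2. Make the transfer write visible to the compute shader
VkMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_TRANSFER_BIT,
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                     0, 1, &barrier, 0, nullptr, 0, nullptr);

// 3. Dispatch the decompression shader (compressed -> decompressed buffer)
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, decompress_pipeline);
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline_layout,
                        0, 1, &descriptor_set, 0, nullptr);
vkCmdDispatch(cmd, workgroup_count, 1, 1);

vkEndCommandBuffer(cmd);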
Benchmarks:
- Staging vs direct (if GDS available)
- Throughput in MB/s (see the timing sketch after this list)
- Latency (ms)
- CPU overhead (%)
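For the throughput number, a minimal timing harness might look like this (a sketch; wait_all() is an assumed blocking drain, not an existing API):
#include <chrono>

double measure_throughput_mb_s(Queue& queue, Request& req) {
    auto t0 = std::chrono::steady_clock::now();
    queue.enqueue(req);
    queue.wait_all(); // assumed: blocks until all requests reach a terminal state
    auto t1 = std::chrono::steady_clock::now();
    double secs = std::chrono::duration<double>(t1 - t0).count();
    return (req.size / (1024.0 * 1024.0)) / secs;
}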
Validation:
- Data integrity (checksums)
- Memory usage
- GPU utilization
Phase 1 (Optimization): 2 weeks
Phase 2 (GDS Integration): 4-6 weeks (if needed)
Goal: Enable Windows DirectStorage games to run on Linux via Proton
Strategy:
Windows Game (DirectStorage API)
↓
Wine/Proton dstorage.dll Shim
↓ (translate calls)
ds-runtime (Linux native)
↓ (execute)
Linux Kernel (io_uring, Vulkan)
Architecture:
dstorage.dll (PE) - Windows ABI
↓ dlopen
libds_runtime.so - Linux ABI
Implementation:
- Create dstorage.dll (Wine builtin DLL)
- Implement DirectStorage API entry points
- Forward to libds_runtime.so via C ABI
- Translate types (HANDLE → fd, etc.)
Example:
// dstorage.dll (Wine)
HRESULT WINAPI DStorageCreateQueue(
    const DSTORAGE_QUEUE_DESC* desc,
    REFIID riid,
    void** ppv
) {
    // Load libds_runtime.so
    void* handle = dlopen("libds_runtime.so", RTLD_NOW);
    if (!handle)
        return E_FAIL;
    // Resolve the C API entry point
    auto ds_create_queue = (ds_queue_t* (*)(ds_backend_t*))
        dlsym(handle, "ds_create_queue");
    // Create a backend
    ds_backend_t* backend = ds_make_cpu_backend();
    // Create the queue
    ds_queue_t* queue = ds_create_queue(backend);
    // Wrap it in a COM object
    *ppv = new DStorageQueueImpl(queue);
    return S_OK;
}
Architecture:
Wine/Proton DirectStorage Implementation
↓ (link directly)
libds_runtime_static.a
Implementation:
- Build ds-runtime as static library
- Link into Wine dlls/dstorage build
- Call C++ API directly (no PE/ELF bridge)
- Share Vulkan device with vkd3d-proton
Pros:
- No dlopen overhead
- Simpler debugging
- Shared Vulkan context
Cons:
- Tighter coupling
- Requires Wine build modifications
Architecture:
DirectStorage Requests
↓
ioctl to kernel module
↓
Kernel-side I/O handling
Not Recommended: Too complex, overkill for userspace I/O
Windows → Linux Translation:
| Windows Type | Linux Type | Conversion |
|---|---|---|
| `HANDLE` | `int` | `fd = _open_osfhandle(handle)` |
| `DSTORAGE_REQUEST` | `ds_request` | Struct field mapping |
| `ID3D12Resource*` | `VkBuffer` | vkd3d-proton interop |
| `DSTORAGE_COMPRESSION` | `ds_compression_t` | Enum mapping |
| `OVERLAPPED` | Completion callback | Async model |
Example Struct Mapping:
void translate_request(
    const DSTORAGE_REQUEST_DESC* windows_req,
    ds_request* linux_req
) {
    linux_req->fd = get_fd_from_handle(windows_req->Source.File.Handle);
    linux_req->offset = windows_req->Source.File.Offset;
    linux_req->size = windows_req->Source.File.Size;
    linux_req->dst = get_buffer_pointer(windows_req->Destination);
    linux_req->op = (windows_req->DestinationType == DSTORAGE_REQUEST_DESTINATION_MEMORY)
                        ? DS_REQUEST_OP_READ : DS_REQUEST_OP_WRITE;
    linux_req->compression = translate_compression(windows_req->CompressionFormat);
}
Challenge: DirectStorage expects a D3D12 device, while we need Vulkan
Solution: vkd3d-proton already handles D3D12 → Vulkan translation
Integration:
// Get Vulkan device from vkd3d-proton
VkDevice vk_device = vkd3d_get_vk_device(d3d12_device);
VkQueue vk_queue = vkd3d_get_vk_queue(d3d12_device);
// Create ds-runtime Vulkan backend with shared device
ds_vulkan_backend_config config;
config.device = vk_device;
config.queue = vk_queue;
config.take_ownership = false; // Don't destroy device
ds_backend_t* backend = ds_make_vulkan_backend(&config);
Status: ✅ Complete
- include/ds_runtime_c.h provides C API
- Type conversions implemented
- Tested with c_abi_stats_test.c
Location: Outside ds-runtime repo (in Wine tree)
Files:
dlls/dstorage/
├── Makefile.in
├── dstorage.spec
├── dstorage_main.c
├── queue.c
└── request.c
Implement:
- DStorageGetFactory
- DStorageSetConfiguration
- IDStorageFactory::CreateQueue
- IDStorageQueue::EnqueueRequest
- IDStorageQueue::Submit
- IDStorageQueue::EnqueueSignal
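The corresponding dstorage.spec would export only the flat entry points, since COM methods are reached through vtables rather than exports (a sketch; argument counts follow the public DirectStorage signatures):
# dstorage.spec (sketch)
@ stdcall DStorageGetFactory(ptr ptr)
@ stdcall DStorageSetConfiguration(ptr)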
Option A: Dynamic Linking
EXTRADLLFLAGS = -Wl,--no-undefined
EXTRALIBS = -lds_runtime
Option B: Static Linking
EXTRALIBS = $(LIBDS_RUNTIME_STATIC)
Test Titles:
- Forspoken (uses DirectStorage)
- Ratchet & Clank: Rift Apart
- Any UE5 game with DirectStorage support
Validation:
- Game launches without crashes
- Asset loading works
- Performance acceptable
- No memory leaks
Create: docs/wine_integration_guide.md
Contents:
- Build dstorage.dll
- Configure Wine to use builtin override
- Debugging tips
- Performance tuning
- Known issues
Week 1-2: Prototype
- Create basic dstorage.dll shim
- Implement skeleton COM interfaces
- Test with simple DirectStorage app
Week 3-4: Type Mapping
- Implement full type conversion
- Handle edge cases
- Vulkan device sharing
Week 5-6: Testing
- Test with real games
- Performance benchmarking
- Bug fixing
Week 7-8: Polish
- Documentation
- Error handling
- Wine upstreaming (if desired)
Total Estimate: 8 weeks
GDeflate CPU ━━━┓
┃
Vulkan Compute ━╋━━> GDeflate GPU
┃
io_uring Multi ┃
┃
Cancellation ━━━╋━━> GPU Workflows
┃
┗━━> Wine/Proton Integration
Phase 1: Foundation (Weeks 1-8)
- ✅ Initial assessment (complete)
- ⏩ GDeflate research (2 weeks)
- ⏩ Vulkan compute infrastructure (8 weeks, parallel)
Phase 2: Core Features (Weeks 9-18)
- ⏩ GDeflate CPU implementation (5 weeks)
- ⏩ io_uring multi-worker (6 weeks, parallel)
- ⏩ Request cancellation (3 weeks, parallel)
Phase 3: Advanced Features (Weeks 19-28)
- ⏩ GDeflate GPU implementation (6 weeks)
- ⏩ GPU workflow optimization (4 weeks)
Phase 4: Integration (Weeks 29-36)
- ⏩ Wine/Proton shim (8 weeks)
- ⏩ Real game testing
- ⏩ Performance tuning
Total Timeline: 36 weeks (9 months)
Can Work in Parallel:
- Vulkan compute + GDeflate research
- GDeflate CPU + io_uring enhancements
- GDeflate CPU + cancellation
- GPU workflows + Wine integration
Must Be Sequential:
- Vulkan compute → GDeflate GPU
- GDeflate CPU → GDeflate GPU
- Core features → Wine integration
Goal: Minimal viable product in 12 weeks
Scope:
- ✅ CPU backend (working)
- ⏩ GDeflate CPU (5 weeks)
- ⏩ Vulkan compute (8 weeks, start week 1)
- ⏩ Basic Wine shim (3 weeks)
- ❌ Skip: GPU GDeflate, io_uring multi-worker, advanced features
Timeline: 12 weeks
Core:
- ✅ All features work independently
- ✅ Integration tests pass
- ✅ No regressions in existing functionality
- ✅ Documentation complete
Performance:
- ✅ GDeflate CPU: ≥ 500 MB/s
- ✅ GDeflate GPU: ≥ 2 GB/s
- ✅ io_uring: ≥ 2x CPU backend
- ✅ Wine overhead: < 10%
Quality:
- ✅ No memory leaks
- ✅ Thread-safe
- ✅ Vulkan validation clean
- ✅ Works on CachyOS/Arch Linux
Required:
- ✅ At least one DirectStorage game runs
- ✅ Asset loading works correctly
- ✅ Performance within 20% of Windows
- ✅ No crashes or hangs
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| GDeflate format unavailable | Medium | High | Reverse engineer, community collaboration |
| GPU compute too slow | Low | Medium | Optimize shaders, fallback to CPU |
| Wine integration complex | High | Medium | Start simple, iterate |
| Hardware incompatibility | Medium | High | Test on multiple GPUs, provide fallbacks |
| Risk | Impact | Mitigation |
|---|---|---|
| GDeflate research longer than expected | +4 weeks | Start GPU work in parallel |
| Wine upstreaming delays | +8 weeks | Maintain out-of-tree fork |
| Testing reveals bugs | +2-4 weeks | Allocate buffer time |
- ✅ Complete investigation documents
- ⏩ Begin GDeflate format research
- ⏩ Start Vulkan compute implementation
- ⏩ Install liburing for io_uring testing
- ⏩ Implement shader module loading
- ⏩ Begin GDeflate CPU decoder
- ⏩ Design cancellation API
- ⏩ Test io_uring multi-worker prototype
- ⏩ Complete Vulkan compute pipelines
- ⏩ Finish GDeflate CPU implementation
- ⏩ Implement request cancellation
- ⏩ Start Wine shim prototype
Document Status: Draft v1.0
Last Updated: 2026-02-16
Next Review: After Phase 1 milestones complete