This document explains the internal architecture of fastalloc, including design decisions, data structures, and implementation details.
Purpose: Pre-allocated fixed-size pool with O(1) operations.
Structure:
```rust
pub struct FixedPool<T> {
    storage: RefCell<Vec<MaybeUninit<T>>>,
    allocator: RefCell<StackAllocator>,
    capacity: usize,
    config: PoolConfig<T>,
}
```
Design rationale:
- `Vec<MaybeUninit<T>>`: allows uninitialized memory until explicitly allocated
- `RefCell`: provides interior mutability for single-threaded use
- `StackAllocator`: LIFO allocation for cache-friendly access patterns
Trade-offs:
- ✅ Fastest allocation (~3.5ns)
- ✅ Zero fragmentation
- ✅ Predictable memory usage
- ❌ Fixed capacity (cannot grow)
- ❌ Not thread-safe
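To make the allocation flow concrete, here is a hypothetical usage sketch; the constructor and method names (`with_capacity`, `allocate`) are assumptions for illustration, not the crate's confirmed API:

```rust
fn fixed_pool_example() {
    // Assumed constructor: pre-allocates all 1024 slots up front.
    let pool: FixedPool<u64> = FixedPool::with_capacity(1024);

    // O(1): pops an index off the StackAllocator's free stack and
    // writes the value into the corresponding MaybeUninit slot.
    let mut handle = pool.allocate(42).expect("pool exhausted");
    *handle += 1; // DerefMut gives direct access to the value

    // RAII: dropping the handle drops the value and pushes the index
    // back onto the free stack for reuse.
    drop(handle);
}
```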
Purpose: Dynamic pool that grows on demand.
Structure:
```rust
pub struct GrowingPool<T> {
    storage: RefCell<Vec<Vec<MaybeUninit<T>>>>,
    allocator: RefCell<FreeListAllocator>,
    capacity: RefCell<usize>,
    chunk_boundaries: RefCell<Vec<usize>>,
    config: PoolConfig<T>,
}
```
Design rationale:
- `Vec<Vec<...>>`: chunks never move, making pointers stable
- `FreeListAllocator`: better for random access than a stack
- `chunk_boundaries`: cached for O(log n) binary search instead of an O(n) linear scan
Growth strategy:
- Linear: Add fixed amount each time
- Exponential: Multiply by factor (e.g., 2x)
- Custom: User-defined function
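A minimal sketch of how these strategies might compute the next total capacity; the enum and method names are illustrative, not the crate's actual API:

```rust
/// Hypothetical growth-strategy type for illustration.
enum GrowthStrategy {
    Linear(usize),              // add a fixed number of slots
    Exponential(f64),           // multiply total capacity by a factor
    Custom(fn(usize) -> usize), // user-defined function of current capacity
}

impl GrowthStrategy {
    /// Next total capacity after a growth step.
    fn next_capacity(&self, current: usize) -> usize {
        match self {
            GrowthStrategy::Linear(amount) => current + amount,
            GrowthStrategy::Exponential(factor) => {
                ((current as f64) * factor).ceil() as usize
            }
            GrowthStrategy::Custom(f) => f(current),
        }
    }
}
```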
Trade-offs:
- ✅ Automatic growth
- ✅ No fixed capacity limit
- ✅ Stable pointers across growth
- ❌ Slightly slower (~4.6ns)
- ❌ More complex memory layout
Purpose: Thread-safe pool for concurrent access.
Structure:
```rust
pub struct ThreadSafePool<T> {
    inner: Arc<Mutex<GrowingPool<T>>>,
}

pub struct ThreadSafeHandle<T> {
    pool: Arc<Mutex<GrowingPool<T>>>,
    index: usize,
    cached_ptr: *mut T, // Key optimization!
}
```
Key optimization: the cached pointer eliminates lock acquisition on every dereference (sketched after the trade-offs below).
Design rationale:
- `Arc<Mutex>`: shared ownership with exclusive access
- Cached pointer: lock-free deref after initial allocation
- Only locks during allocate/deallocate
Trade-offs:
- ✅ Thread-safe
- ✅ Lock-free deref (10-50x improvement)
- ✅ Works with multiple threads
- ❌ Mutex overhead on allocate/free
- ❌ Cannot be Sync (contains raw pointer)
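A sketch of the lock-free deref, assuming the `ThreadSafeHandle` fields above; it relies on GrowingPool chunks never moving, so the pointer cached at allocation time stays valid until the handle is dropped:

```rust
use std::ops::Deref;

impl<T> Deref for ThreadSafeHandle<T> {
    type Target = T;

    #[inline]
    fn deref(&self) -> &T {
        // No Mutex here: the pointer was cached while the lock was held
        // during allocation, and chunk storage is never relocated.
        unsafe { &*self.cached_ptr }
    }
}
```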
Strategy: Last-In-First-Out using a stack of free indices.
```rust
struct StackAllocator {
    free_stack: Vec<usize>,
    capacity: usize,
    allocated_bitmap: Vec<u64>, // Debug only
}
```
Allocation: O(1) - Pop from stack
Deallocation: O(1) - Push to stack
Advantages:
- Excellent cache locality (recently freed = hot in cache)
- Simple implementation
- Optimal for allocate/free patterns
Used by: FixedPool
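A minimal sketch of the two hot-path methods, assuming the struct above (debug-bitmap maintenance omitted):

```rust
impl StackAllocator {
    /// O(1): hand out the most recently freed index
    /// (LIFO keeps it hot in cache).
    fn allocate(&mut self) -> Option<usize> {
        self.free_stack.pop()
    }

    /// O(1): the freed index becomes the next one handed out.
    fn deallocate(&mut self, index: usize) {
        debug_assert!(index < self.capacity);
        self.free_stack.push(index);
    }
}
```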
Strategy: Maintains list of free slot indices.
```rust
struct FreeListAllocator {
    free_list: Vec<usize>,
    capacity: usize,
    allocated_bitmap: Vec<u64>, // Debug only
}
```
Allocation: O(1) - Pop from list
Deallocation: O(1) - Push to list
Advantages:
- Good for random access patterns
- Can extend capacity dynamically
- No preference for recently freed slots
Used by: GrowingPool
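Its distinguishing feature is dynamic extension when GrowingPool adds a chunk. A sketch, with the method name assumed:

```rust
impl FreeListAllocator {
    /// Called when GrowingPool adds a chunk: the new slot indices are
    /// simply appended to the free list. Debug-bitmap growth omitted.
    fn extend(&mut self, additional: usize) {
        let old_capacity = self.capacity;
        self.capacity += additional;
        self.free_list.extend(old_capacity..self.capacity);
    }
}
```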
Strategy: Bit vector tracking allocation status.
```rust
struct BitmapAllocator {
    bitmap: Vec<u64>,
    capacity: usize,
    allocated: usize,
    next_free_hint: usize,
}
```
Allocation: O(n) worst case, O(1) typical (with hint)
Deallocation: O(1)
Advantages:
- Minimal memory overhead (1 bit per slot)
- Fast intrinsics (trailing_zeros)
- Hint tracking for common patterns
Used by: Optional, for space-constrained scenarios
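A sketch of hint-guided allocation over the struct above; the exact scanning order is an assumption:

```rust
impl BitmapAllocator {
    /// O(1) typical: start scanning at the hint word; O(n) worst case.
    fn allocate(&mut self) -> Option<usize> {
        if self.allocated == self.capacity {
            return None;
        }
        let words = self.bitmap.len();
        for i in 0..words {
            let w = (self.next_free_hint / 64 + i) % words;
            let free_bits = !self.bitmap[w]; // 1-bits mark free slots here
            if free_bits != 0 {
                // Fast intrinsic: index of the lowest free bit.
                let bit = free_bits.trailing_zeros() as usize;
                let index = w * 64 + bit;
                if index >= self.capacity {
                    continue; // only padding bits left in the last word
                }
                self.bitmap[w] |= 1u64 << bit;
                self.allocated += 1;
                self.next_free_hint = index;
                return Some(index);
            }
        }
        None
    }
}
```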
Purpose: RAII-based exclusive ownership of a pooled object.
```rust
pub struct OwnedHandle<'pool, T> {
    pool: &'pool dyn PoolInterface<T>,
    index: usize,
}
```
Lifetime safety:
- The `'pool` lifetime prevents the pool from being dropped while handles exist
- Borrow checker ensures exclusive access
- Drop impl automatically returns object
Trait implementations:
- `Deref`/`DerefMut`: direct access to the value
- `Drop`: returns the object to the pool (sketched below)
- `Debug`, `Display`, `PartialEq`, `Ord`: for convenience
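A sketch of the Drop impl, assuming a hypothetical `release` method on `PoolInterface` that drops the value and returns the index to the allocator:

```rust
impl<'pool, T> Drop for OwnedHandle<'pool, T> {
    fn drop(&mut self) {
        // Hypothetical PoolInterface method: runs drop_in_place on the
        // slot and pushes the index back to the allocator.
        self.pool.release(self.index);
    }
}
```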
Purpose: Reference-counted shared access.
```rust
pub struct SharedHandle<'pool, T> {
    inner: Rc<OwnedHandle<'pool, T>>,
}
```
Use case: multiple readers need access to the same pooled object.
Trade-off: Reference counting overhead vs flexibility.
Purpose: Non-owning reference that may become invalid.
```rust
pub struct WeakHandle<'pool, T> {
    inner: Weak<OwnedHandle<'pool, T>>,
}
```
Use case: breaking reference cycles, observer patterns (see the upgrade sketch below).
FixedPool (per object):
- Storage: `size_of::<T>()` bytes
- No per-object metadata (objects stored directly)
- Total: exactly `size_of::<T>()` bytes per slot
Allocator Overhead (one-time, shared across all objects):
| Allocator Type | Per-Pool Overhead | Notes |
|---|---|---|
| StackAllocator | `capacity * 8 + bitmap_size` | `Vec` of indices + optional debug bitmap |
| FreeListAllocator | `capacity * 8 + bitmap_size` | `Vec` of indices + optional debug bitmap |
| BitmapAllocator | `(capacity + 63) / 64 * 8` | One bit per slot (packed in `u64`s) |
Debug Mode Additional Overhead:
- StackAllocator: `(capacity + 63) / 64 * 8` bytes for the bitmap
- FreeListAllocator: `(capacity + 63) / 64 * 8` bytes for the bitmap
- Release mode: no additional overhead
Example Calculation (FixedPool with 1000 i32 objects):
```
Object storage:  1000 * 4 = 4,000 bytes
StackAllocator:  1000 * 8 = 8,000 bytes (indices)
Debug bitmap:    (1000 + 63) / 64 * 8 = 128 bytes
Total:           12,128 bytes

Overhead: 8,128 / 12,128 = 67% (but amortized across all objects)
Per-object view: 8,128 / 1000 = 8.128 bytes overhead per slot
```
For larger objects (e.g., 256-byte structs):
```
Object storage:     1000 * 256 = 256,000 bytes
Allocator overhead: 8,128 bytes
Total:              264,128 bytes
Overhead:           8,128 / 264,128 = 3.1%
```
Conclusion: per-slot overhead is a fixed ~8 bytes, so it shrinks relative to object size: roughly 7.5% for 100-byte objects and under 5% once objects exceed about 160 bytes.
OwnedHandle:
- Size: `size_of::<&dyn Trait>() + size_of::<usize>()` = 16 + 8 = 24 bytes (on 64-bit)
- No heap allocation (stored on the stack)

ThreadSafeHandle:
- Size: `Arc` (8 bytes) + `usize` (8 bytes) + `*mut T` (8 bytes) = 24 bytes (on 64-bit)
- Arc overhead: shared atomic reference counter on the heap

SharedHandle:
- Size: `Rc` pointer (8 bytes) wrapping a heap-allocated `OwnedHandle`
- Rc overhead: reference counter
Additional overhead:
- `Vec<Vec<MaybeUninit<T>>>`: the outer Vec grows; ~24 bytes per chunk for the inner Vec header
- `chunk_boundaries: Vec<usize>`: 8 bytes per chunk
- Per chunk: ~32 bytes of Vec metadata total
Example (GrowingPool with 3 chunks: 100, 200, 400 capacity):
```
Chunk 0 storage: 100 * size_of::<T>()
Chunk 1 storage: 200 * size_of::<T>()
Chunk 2 storage: 400 * size_of::<T>()
Chunk pointers:  3 * 24 = 72 bytes
Boundaries:      3 * 8 = 24 bytes
Allocator:       (100 + 200 + 400) * 8 = 5,600 bytes

Total overhead:  ~5,696 bytes + 3 * ~32 = ~5,792 bytes
```
- Use larger pool capacities (overhead amortizes better)
- Use FixedPool when capacity is known (less overhead than GrowingPool)
- For tiny objects (< 8 bytes), consider batching or using larger container types
- In release mode, the debug bitmap is compiled out
```
FixedPool<T>
├── storage: Vec<MaybeUninit<T>>
│   ├── [0]: MaybeUninit<T>  ← Object or uninitialized
│   ├── [1]: MaybeUninit<T>
│   └── [n]: MaybeUninit<T>
├── allocator: StackAllocator
│   └── free_stack: [n, n-1, ..., 2, 1, 0]  ← Available indices
└── capacity: n
```
Key properties:
- Contiguous memory block
- Fixed size, never moves
- Pointers stable for pool lifetime
```
GrowingPool<T>
├── storage: Vec<Vec<MaybeUninit<T>>>
│   ├── Chunk 0: [MaybeUninit<T>; initial_cap]
│   ├── Chunk 1: [MaybeUninit<T>; growth_amount]
│   └── Chunk 2: [MaybeUninit<T>; growth_amount]
├── chunk_boundaries: [initial_cap, initial_cap + g1, initial_cap + g1 + g2]
├── allocator: FreeListAllocator
└── capacity: total across chunks
```
Index to pointer mapping:
```rust
fn compute_chunk_location(&self, index: usize) -> (usize, usize) {
    let boundaries = self.chunk_boundaries.borrow();
    // Binary search: the first boundary strictly greater than `index`
    // identifies the chunk (boundaries hold cumulative capacities).
    let chunk_idx = boundaries.partition_point(|&b| b <= index);
    let offset = if chunk_idx == 0 {
        index
    } else {
        index - boundaries[chunk_idx - 1]
    };
    (chunk_idx, offset)
}
```
Key properties:
- Chunks never move individually
- Pointers remain valid across growth
- O(log n) lookup with cached boundaries
```rust
pub(crate) fn get(&self, index: usize) -> &T {
    let storage = self.storage.borrow();
    unsafe {
        let ptr = storage.as_ptr();
        &*ptr.add(index).cast::<T>()
    }
}
```
Why unsafe?: it returns a reference that outlives the `RefCell` borrow.
Why safe?:
- Pool owns the storage (won't be freed)
- Index is valid (checked by allocator)
- Handle prevents concurrent mutable access
- Memory is initialized before handle creation
```rust
pub(crate) fn get_mut(&self, index: usize) -> &mut T {
    let mut storage = self.storage.borrow_mut();
    unsafe {
        let ptr = storage.as_mut_ptr();
        &mut *ptr.add(index).cast::<T>()
    }
}
```
Why unsafe?: it produces a `&mut T` from `&self`.
Why safe?:
- RefCell provides interior mutability
- Only one handle per slot exists
- Borrow checker ensures no aliasing via handle
```rust
// Allocation
storage[index].write(value);       // Initialize
Ok(OwnedHandle::new(self, index))  // Now safe to read

// Deallocation
let value_ptr = storage[index].as_mut_ptr();
(*value_ptr).on_release();
ptr::drop_in_place(value_ptr);     // Drop before reuse
```
Why safe?:
- Write before any read
- Drop before reuse
- Handle ensures no access after drop
Before: O(n) linear scan through chunks
After: O(log n) binary search
Implementation:
```rust
chunk_boundaries: RefCell<Vec<usize>> // e.g. [100, 200, 400, 800]
```
Updated on growth; used in `compute_chunk_location`.
Before: Lock on every dereference
After: Cached pointer, lock only on alloc/dealloc
Impact: 10-50x improvement for handle access patterns
Before: O(n) contains() in FreeListAllocator
After: O(1) bitmap check
Impact: Eliminates debug build overhead for large pools
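A sketch of the O(1) check, assuming the debug bitmap layout above (one bit per slot, packed in `u64` words):

```rust
/// Debug-only double-free guard: replaces the O(n) `contains()` scan
/// of the free list with a single bit test.
#[cfg(debug_assertions)]
fn assert_allocated(bitmap: &[u64], index: usize) {
    let (word, bit) = (index / 64, index % 64);
    assert!(
        bitmap[word] & (1u64 << bit) != 0,
        "double free or invalid index: {index}"
    );
}
```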
Critical path functions marked with #[inline]:
- Allocator methods
- Handle deref/deref_mut
- Poolable trait hooks
```rust
let mut free_stack = Vec::with_capacity(capacity);
for i in (0..capacity).rev() {
    free_stack.push(i);
}
```
Better than: `(0..capacity).rev().collect()` (avoids reallocation)
- Allocator tests: Verify allocation/deallocation logic
- Pool tests: Test capacity, growth, reuse
- Handle tests: RAII behavior, drop semantics
- Safety tests: Borrow checker violations (compile failures)
- Multi-threaded scenarios
- Edge cases (capacity limits, exhaustion)
- Drop behavior and cleanup
- Criterion.rs for statistical rigor
- Comparison with Box allocation
- Various pool sizes and types
- QuickCheck/proptest for invariant checking
- Fuzz testing with cargo-fuzz
- Safety First: Leverage Rust's type system, minimal unsafe
- Zero-Cost Abstractions: Performance comparable to hand-written code
- Ergonomic API: RAII handles, builder pattern, sensible defaults
- Composability: Multiple pool types for different use cases
- Transparency: Well-documented unsafe code with justification
- Lock-free data structures (experimental `lock-free` feature)
- NUMA-aware allocation for large systems
- Custom allocator backend support
- Allocation batching for reduced overhead
- Per-thread caching for thread-safe pool
- Rustonomicon
- The Rust Reference
- API Guidelines
- Memory pool design patterns from C/C++ literature