Memory Management and Zero-Copy Access
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/entry_iterator.rs
- src/utils/align_or_copy.rs
This document explains how SIMD R Drive implements zero-copy reads through memory-mapped files, manages payload alignment for optimal performance, and provides safe concurrent access to stored data. It covers the EntryHandle abstraction, the Arc<Mmap> sharing strategy, alignment requirements enforced by PAYLOAD_ALIGNMENT, and utilities for working with aligned binary data.
For details on the on-disk entry format and metadata structure, see Entry Structure and Metadata. For information on how concurrent reads and writes are coordinated, see Concurrency and Thread Safety. For performance characteristics of the alignment strategy, see Payload Alignment and Cache Efficiency.
Memory-Mapped File Architecture
SIMD R Drive uses memory-mapped files (memmap2::Mmap) to enable zero-copy reads. The storage file is mapped into the process's virtual address space, allowing direct access to payload bytes without copying them into separate buffers. This approach provides several benefits:
- Zero-copy reads: Data is accessed directly from the mapped memory region
- Larger-than-RAM datasets: The OS handles paging, so only accessed portions consume physical memory
- Efficient random access: Any offset in the file can be accessed with pointer arithmetic
- Shared memory: Multiple threads can read from the same mapped region concurrently
The memory-mapped file is wrapped in Arc<Mutex<Arc<Mmap>>> within the DataStore structure:
DataStore.mmap_arc: Arc<Mutex<Arc<Mmap>>>
The outer Arc allows the DataStore itself to be cloned and shared. The Mutex protects remapping operations (which occur after writes). The inner Arc<Mmap> is what gets cloned and handed out to readers, ensuring that even if a remap occurs, existing readers retain a valid view of the previous mapping.
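A minimal sketch of this hand-out pattern is shown below; the field and method names are illustrative assumptions, not the crate's exact internals:

```rust
use std::sync::{Arc, Mutex};
use memmap2::Mmap;

// Illustrative sketch of the hand-out pattern; names are assumptions.
struct Store {
    mmap_arc: Arc<Mutex<Arc<Mmap>>>,
}

impl Store {
    /// Snapshot the current mapping: clone the inner Arc<Mmap> under the
    /// lock, then release the lock immediately. The caller's snapshot stays
    /// valid even if a writer swaps in a new mapping later.
    fn current_mmap(&self) -> Arc<Mmap> {
        let guard = self.mmap_arc.lock().unwrap();
        Arc::clone(&guard)
    }
}
```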
Sources: README.md:43-49 README.md:174-180 src/storage_engine/entry_iterator.rs:22-23
Mmap Lifecycle and Remapping
Diagram: Memory-mapped file lifecycle across write operations
When a write operation extends the file, the following sequence occurs:
- The file is extended and flushed to disk
- The Mutex<Arc<Mmap>> is locked
- A new Mmap is created from the extended file
- The old Arc<Mmap> is replaced with a new one
- Existing EntryHandle instances continue using their old Arc<Mmap> reference until dropped
This design ensures that readers never see an invalid memory mapping, even as writes extend the file.
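A sketch of the swap step under these assumptions (append-only file, illustrative function shape):

```rust
use std::fs::File;
use std::sync::{Arc, Mutex};
use memmap2::Mmap;

// Illustrative remap step, run after a write has extended and flushed the file.
fn remap(mmap_slot: &Mutex<Arc<Mmap>>, file: &File) -> std::io::Result<()> {
    // SAFETY: the store is append-only, so bytes visible through the old
    // mapping are never mutated.
    let new_mmap = unsafe { Mmap::map(file)? };
    let mut guard = mmap_slot.lock().unwrap();
    // Old Arc<Mmap> clones held by EntryHandle instances stay alive until dropped.
    *guard = Arc::new(new_mmap);
    Ok(())
}
```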
Sources: README.md:174-180 src/storage_engine/data_store.rs:46-48 (implied from architecture)
EntryHandle: Zero-Copy Data Access
The EntryHandle struct provides a zero-copy view into a specific payload within the memory-mapped file. It consists of three fields:
EntryHandle {
mmap_arc: Arc<Mmap>,
range: Range<usize>,
metadata: EntryMetadata,
}
Each EntryHandle holds:
- mmap_arc: A reference-counted pointer to the memory-mapped file
- range: The byte range (start..end) identifying the payload location within the mapping
- metadata: The entry's EntryMetadata (key hash, checksum, and related fields)
EntryHandle Methods
Diagram: EntryHandle methods and their memory semantics
| Method | Return Type | Memory Semantics | Use Case |
|---|---|---|---|
| as_slice() | &[u8] | Zero-copy borrow | Direct byte access |
| into_vec() | Vec<u8> | Owned copy | When ownership needed |
| as_arrow_buffer() | arrow::buffer::Buffer | Zero-copy Buffer | Arrow integration (feature-gated) |
| into_arrow_buffer() | arrow::buffer::Buffer | Consumes EntryHandle | Arrow integration (feature-gated) |
| metadata() | &EntryMetadata | Borrow | Access key hash, checksum |
| get_mmap_arc() | &Arc<Mmap> | Borrow | Low-level mmap access |
| get_range() | &Range<usize> | Borrow | Get payload boundaries |
The as_slice() method is the primary zero-copy interface: it returns a slice directly into the memory-mapped region with no allocation or copying.
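A hedged usage sketch; a read method returning Option<EntryHandle> is assumed here from this page's diagrams, not a verified signature:

```rust
use simd_r_drive::DataStore;

// Hedged sketch: `read` returning Option<EntryHandle> is an assumption.
fn payload_len(store: &DataStore, key: &[u8]) -> Option<usize> {
    let handle = store.read(key)?;
    let bytes: &[u8] = handle.as_slice(); // borrows the mmap; no copy
    Some(bytes.len())
}
```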
Sources: simd-r-drive-entry-handle/src/entry_handle.rs:22-50 (method implementations)
Alignment Requirements
All non-tombstone payloads in SIMD R Drive begin on a fixed 64-byte aligned boundary. This alignment is defined by the PAYLOAD_ALIGNMENT constant:
PAYLOAD_ALIGN_LOG2 = 6
PAYLOAD_ALIGNMENT = 1 << 6 = 64 bytes
The 64-byte alignment matches typical CPU cache line sizes and ensures that SIMD operations (AVX, AVX-512, SVE) can operate at full speed without crossing cache line boundaries.
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md:51-59
Alignment Calculation
The pre-padding required to align a payload is calculated using:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where prev_tail is the file offset immediately after the previous entry's metadata. This formula ensures:
- Payloads always start at multiples of PAYLOAD_ALIGNMENT
- Pre-padding ranges from 0 to PAYLOAD_ALIGNMENT - 1 bytes
- The calculation works for any power-of-two alignment
Diagram: Pre-padding calculation for 64-byte alignment
The storage engine writes zero bytes for the pre-padding region, then writes the payload starting at the aligned boundary.
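The arithmetic can be checked with a small standalone function mirroring the formula above:

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Pre-padding needed so the next payload starts on a 64-byte boundary.
fn pre_pad(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(pre_pad(0), 0);    // already aligned: no padding
    assert_eq!(pre_pad(64), 0);   // already aligned
    assert_eq!(pre_pad(65), 63);  // worst case: one byte past a boundary
    assert_eq!(pre_pad(100), 28); // 100 + 28 = 128, a multiple of 64
}
```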
Sources: README.md:112-124 src/storage_engine/entry_iterator.rs:50-53
Alignment Validation
The crate provides debug-only assertions to verify alignment invariants:
debug_assert_aligned(ptr: *const u8, align: usize)
debug_assert_aligned_offset(offset: u64)
These assertions:
- Are compiled to no-ops in release builds (zero runtime cost)
- Execute in debug and test builds to catch alignment violations
- Check both pointer alignment (debug_assert_aligned) and offset alignment (debug_assert_aligned_offset)
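A minimal sketch of how such debug-only checks are typically written, consistent with the diagram below (not the crate's verbatim source):

```rust
/// Debug-only: a no-op in release builds; panics in debug/test builds
/// if `ptr` is not `align`-byte aligned.
#[inline(always)]
pub fn debug_assert_aligned(ptr: *const u8, align: usize) {
    debug_assert!(align.is_power_of_two(), "alignment must be a power of two");
    debug_assert!(
        (ptr as usize) & (align - 1) == 0,
        "pointer {ptr:p} is not {align}-byte aligned"
    );
}

/// Debug-only: panics if a file offset is not PAYLOAD_ALIGNMENT-aligned.
#[inline(always)]
pub fn debug_assert_aligned_offset(offset: u64) {
    const PAYLOAD_ALIGNMENT: u64 = 64;
    debug_assert!(
        offset % PAYLOAD_ALIGNMENT == 0,
        "offset {offset} is not {PAYLOAD_ALIGNMENT}-byte aligned"
    );
}
```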
Diagram: Debug-only alignment assertion behavior
graph TB
subgraph "Alignment Validation"
DebugMode["Debug/Test Build"]
ReleaseMode["Release Build"]
Call["debug_assert_aligned(ptr, align)"]
CheckPowerOf2["Assert align.is_power_of_two()"]
CheckAligned["Assert (ptr as usize & (align-1)) == 0"]
NoOp["No-op (optimized away)"]
Call --> DebugMode
Call --> ReleaseMode
DebugMode --> CheckPowerOf2
CheckPowerOf2 --> CheckAligned
ReleaseMode --> NoOp
end
The Arrow integration feature uses these assertions when creating arrow::buffer::Buffer instances from EntryHandle.
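A hedged sketch of the call site; the arrow feature-gate name and exact method shape are assumptions drawn from the method table above:

```rust
// Hedged sketch: feature name and call shape are assumptions.
#[cfg(feature = "arrow")]
fn payload_as_arrow(handle: &simd_r_drive_entry_handle::EntryHandle) -> arrow::buffer::Buffer {
    // Zero-copy: the Buffer aliases the mmap-backed payload bytes; the
    // debug-only alignment assertions run in debug/test builds.
    handle.as_arrow_buffer()
}
```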
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88 simd-r-drive-entry-handle/src/entry_handle.rs:52-85 (Arrow integration)
Typed Slice Access via align_or_copy
The align_or_copy utility function enables efficient conversion of byte slices into typed slices (e.g., &[f32], &[u32]) with automatic fallback when alignment requirements are not met.
align_or_copy<T, const N: usize>(
bytes: &[u8],
from_le_bytes: fn([u8; N]) -> T
) -> Cow<'_, [T]>
Sources: src/utils/align_or_copy.rs:1-73
Zero-Copy vs. Fallback Behavior
Diagram: align_or_copy decision flow
The function attempts zero-copy reinterpretation using slice::align_to::<T>():
| Condition | Result | Allocation |
|---|---|---|
| Memory aligned for T AND length is multiple of size_of::<T>() | Cow::Borrowed | None |
| Misaligned OR length not a multiple | Cow::Owned | Allocates Vec<T> |
On the zero-copy path, slice::align_to::<T>() succeeds and the result is Cow::Borrowed, aliasing the input bytes directly. On the fallback path, each N-byte chunk is decoded with the supplied from_le_bytes function into a newly allocated Vec<T>, returned as Cow::Owned.
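A minimal sketch of both outcomes (the simd_r_drive::utils import path is an assumption):

```rust
use std::borrow::Cow;
use simd_r_drive::utils::align_or_copy; // assumed public path

fn main() {
    // Two u32 values encoded as little-endian bytes.
    let bytes: [u8; 8] = [1, 0, 0, 0, 2, 0, 0, 0];
    let values: Cow<'_, [u32]> = align_or_copy::<u32, 4>(&bytes, u32::from_le_bytes);
    assert_eq!(values.as_ref(), &[1u32, 2]);
    match values {
        Cow::Borrowed(_) => println!("zero-copy: buffer was 4-byte aligned"),
        Cow::Owned(_) => println!("fallback: decoded into a new Vec<u32>"),
    }
}
```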
Sources: src/utils/align_or_copy.rs:44-73
Usage Example
Due to the 64-byte payload alignment, EntryHandle payloads are often well-aligned for SIMD types:
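A hedged end-to-end sketch; the read signature and import paths are assumptions:

```rust
use std::borrow::Cow;
use simd_r_drive::{utils::align_or_copy, DataStore};

// Hypothetical helper: decode a stored payload as f32 values.
fn read_f32s(store: &DataStore, key: &[u8]) -> Option<Vec<f32>> {
    let handle = store.read(key)?;
    // Payloads start on a 64-byte boundary, so this usually takes the
    // Cow::Borrowed path when the length is a multiple of 4.
    let floats: Cow<'_, [f32]> = align_or_copy::<f32, 4>(handle.as_slice(), f32::from_le_bytes);
    Some(floats.into_owned())
}
```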
Since PAYLOAD_ALIGNMENT = 64 is a multiple of align_of::<f32>() = 4, align_of::<f64>() = 8, and align_of::<u128>() = 16, zero-copy access is typically possible for these types when the payload length is an exact multiple of their size.
Sources: src/utils/align_or_copy.rs:37-43 README.md:51-59
Memory Sharing and Concurrency
The Arc<Mmap> design enables efficient memory sharing across threads while maintaining safety:
Diagram: Concurrent zero-copy reads via Arc sharing
graph TB
    subgraph "DataStore Structure"
        DS["DataStore"]
        MmapMutex["Arc<Mutex<Arc<Mmap>>>"]
    end
    subgraph "Reader Thread 1"
        R1["read(key1)"]
        EH1["EntryHandle\n(Arc<Mmap> clone)"]
        Slice1["as_slice() → &[u8]"]
    end
    subgraph "Reader Thread 2"
        R2["read(key2)"]
        EH2["EntryHandle\n(Arc<Mmap> clone)"]
        Slice2["as_slice() → &[u8]"]
    end
    subgraph "Reader Thread N"
        RN["read(keyN)"]
        EHN["EntryHandle\n(Arc<Mmap> clone)"]
        SliceN["as_slice() → &[u8]"]
    end
    subgraph "Memory-Mapped File"
        Mmap["memmap2::Mmap\n(shared memory region)"]
    end
    DS --> MmapMutex
    MmapMutex -.Arc::clone.-> R1
    MmapMutex -.Arc::clone.-> R2
    MmapMutex -.Arc::clone.-> RN
    R1 --> EH1
    R2 --> EH2
    RN --> EHN
    EH1 --> Slice1
    EH2 --> Slice2
    EHN --> SliceN
    Slice1 -.references.-> Mmap
    Slice2 -.references.-> Mmap
    SliceN -.references.-> Mmap
Concurrency characteristics:
- No read locks required: Once an EntryHandle holds an Arc<Mmap>, it can access data without further synchronization
- Write safety: Writers acquire RwLock<File> to serialize writes, and Mutex<Arc<Mmap>> to safely remap after writes
- Memory efficiency: All readers share the same physical pages, regardless of thread count
- Graceful remapping: Old Arc<Mmap> instances remain valid until all references are dropped
This design allows SIMD R Drive to support thousands of concurrent readers with minimal overhead.
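A hedged sketch of this fan-out pattern; DataStore::clone and the read signature are assumptions based on this page:

```rust
use std::thread;
use simd_r_drive::DataStore;

fn concurrent_reads(store: DataStore) {
    let workers: Vec<_> = (0..8)
        .map(|i| {
            let store = store.clone(); // clones the outer Arc, not the file data
            thread::spawn(move || {
                let key = format!("key-{i}");
                if let Some(entry) = store.read(key.as_bytes()) {
                    // No lock held here: the EntryHandle owns its own Arc<Mmap>.
                    let _bytes = entry.as_slice();
                }
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
}
```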
Sources: README.md:172-206 src/storage_engine/entry_iterator.rs:22-23
Datasets Larger Than RAM
Memory mapping enables efficient access to storage files that exceed available physical memory:
- OS-managed paging: The operating system handles paging, loading only accessed regions into physical memory
- Transparent access: Application code uses the same EntryHandle API regardless of file size
- Efficient random access: Jumping between distant file offsets does not require explicit seeking
- Memory pressure handling: Unused pages are evicted by the OS when memory is needed elsewhere
For example, a storage file containing 100 GB of data on a system with 16 GB of RAM can be accessed efficiently. Only the portions actively being read will occupy physical memory, with the OS managing page faults transparently.
This design makes SIMD R Drive suitable for:
- Large-scale data analytics workloads
- Multi-gigabyte append-only logs
- Embedded databases on resource-constrained systems
- Archive storage with infrequent random access patterns
Sources: README.md:43-49 README.md:147-148