Memory Management and Zero-Copy Access


This document explains how SIMD R Drive implements zero-copy reads through memory-mapped files, manages payload alignment for optimal performance, and provides safe concurrent access to stored data. It covers the EntryHandle abstraction, the Arc<Mmap> sharing strategy, alignment requirements enforced by PAYLOAD_ALIGNMENT, and utilities for working with aligned binary data.

For details on the on-disk entry format and metadata structure, see Entry Structure and Metadata. For information on how concurrent reads and writes are coordinated, see Concurrency and Thread Safety. For performance characteristics of the alignment strategy, see Payload Alignment and Cache Efficiency.

Memory-Mapped File Architecture

SIMD R Drive uses memory-mapped files (memmap2::Mmap) to enable zero-copy reads. The storage file is mapped into the process's virtual address space, allowing direct access to payload bytes without copying them into separate buffers. This approach provides several benefits:

Zero-copy reads: Data is accessed directly from the mapped memory region
Larger-than-RAM datasets: The OS handles paging, so only accessed portions consume physical memory
Efficient random access: Any offset in the file can be reached with pointer arithmetic
Shared memory: Multiple threads can read from the same mapped region concurrently

The memory-mapped file is wrapped in Arc<Mutex<Arc<Mmap>>> within the DataStore structure:

DataStore.mmap_arc: Arc<Mutex<Arc<Mmap>>>

The outer Arc allows the DataStore itself to be cloned and shared. The Mutex protects remapping operations (which occur after writes). The inner Arc<Mmap> is what gets cloned and handed out to readers, ensuring that even if a remap occurs, existing readers retain a valid view of the previous mapping.

Sources: README.md:43-49 README.md:174-180 src/storage_engine/entry_iterator.rs:22-23

Mmap Lifecycle and Remapping

Diagram: Memory-mapped file lifecycle across write operations

When a write operation extends the file, the following sequence occurs:

The file is extended and flushed to disk
The Mutex<Arc<Mmap>> is locked
A new Mmap is created from the extended file
The old Arc<Mmap> is replaced with a new one
Existing EntryHandle instances continue using their old Arc<Mmap> reference until dropped

This design ensures that readers never see an invalid memory mapping, even as writes extend the file.
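The remap sequence can be sketched with the standard library alone. In this minimal sketch a Vec<u8> stands in for memmap2::Mmap, and StoreSketch, snapshot, and remap are hypothetical names rather than the crate's API. It only illustrates why swapping the inner Arc under the Mutex leaves previously handed-out Arcs valid.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for memmap2::Mmap in this sketch (assumption, not the real type).
type Mapping = Vec<u8>;

/// Hypothetical store mirroring the Arc<Mutex<Arc<Mmap>>> layout.
pub struct StoreSketch {
    mmap_arc: Arc<Mutex<Arc<Mapping>>>,
}

impl StoreSketch {
    pub fn new(initial: Mapping) -> Self {
        Self {
            mmap_arc: Arc::new(Mutex::new(Arc::new(initial))),
        }
    }

    /// Reader path: lock briefly, clone the inner Arc, release the lock.
    pub fn snapshot(&self) -> Arc<Mapping> {
        let guard = self.mmap_arc.lock().expect("mutex poisoned");
        Arc::clone(&*guard)
    }

    /// Writer path (after the file has been extended and flushed):
    /// replace the inner Arc with a fresh mapping. Outstanding snapshots
    /// keep the old mapping alive until they are dropped.
    pub fn remap(&self, new_mapping: Mapping) {
        let mut guard = self.mmap_arc.lock().expect("mutex poisoned");
        *guard = Arc::new(new_mapping);
    }
}
```

A reader that took a snapshot before the remap still sees the old, shorter mapping; only new snapshots observe the extended one.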

Sources: README.md:174-180 src/storage_engine/data_store.rs:46-48 (implied from architecture)

EntryHandle: Zero-Copy Data Access

The EntryHandle struct provides a zero-copy view into a specific payload within the memory-mapped file. It consists of three fields:

EntryHandle {
    mmap_arc: Arc<Mmap>,
    range: Range<usize>,
    metadata: EntryMetadata,
}

Each EntryHandle holds:

mmap_arc: A reference-counted pointer to the memory-mapped file
range: The half-open byte range start..end identifying the payload location
metadata: The EntryMetadata for the entry (key hash, checksum)

EntryHandle Methods

Diagram: EntryHandle methods and their memory semantics

Method	Return Type	Memory Semantics	Use Case
`as_slice()`	`&[u8]`	Zero-copy borrow	Direct byte access
`into_vec()`	`Vec<u8>`	Owned copy	When ownership needed
`as_arrow_buffer()`	`arrow::buffer::Buffer`	Zero-copy Buffer	Arrow integration (feature-gated)
`into_arrow_buffer()`	`arrow::buffer::Buffer`	Consumes EntryHandle	Arrow integration (feature-gated)
`metadata()`	`&EntryMetadata`	Borrow	Access key hash, checksum
`get_mmap_arc()`	`&Arc<Mmap>`	Borrow	Low-level mmap access
`get_range()`	`&Range<usize>`	Borrow	Get payload boundaries

The as_slice() method is the primary zero-copy interface: it returns a slice directly into the memory-mapped region with no allocation or copying.
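The difference between as_slice() and into_vec() can be shown with a simplified stand-in. This is a minimal sketch, assuming an Arc<Vec<u8>> in place of Arc<Mmap> and omitting the metadata field; EntryHandleSketch is a hypothetical name, not the crate's EntryHandle.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for EntryHandle:
/// Arc<Vec<u8>> replaces Arc<Mmap>, and metadata is omitted.
pub struct EntryHandleSketch {
    mmap_arc: Arc<Vec<u8>>,
    range: Range<usize>,
}

impl EntryHandleSketch {
    pub fn new(mmap_arc: Arc<Vec<u8>>, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= mmap_arc.len());
        Self { mmap_arc, range }
    }

    /// Zero-copy: borrows directly from the shared buffer.
    pub fn as_slice(&self) -> &[u8] {
        &self.mmap_arc[self.range.clone()]
    }

    /// Owned copy: allocates a Vec and copies the payload bytes.
    /// Consuming self drops this handle's Arc reference.
    pub fn into_vec(self) -> Vec<u8> {
        self.as_slice().to_vec()
    }
}
```

The slice returned by as_slice() points into the same allocation the Arc owns, so its lifetime is tied to the handle; into_vec() is the escape hatch when the bytes must outlive it.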

Sources: simd-r-drive-entry-handle/src/entry_handle.rs:22-50 (method implementations)

Alignment Requirements

All non-tombstone payloads in SIMD R Drive begin on a fixed 64-byte aligned boundary. This alignment is defined by the PAYLOAD_ALIGNMENT constant:

PAYLOAD_ALIGN_LOG2 = 6
PAYLOAD_ALIGNMENT = 1 << 6 = 64 bytes

The 64-byte alignment matches the typical CPU cache line size: each payload starts at the beginning of a cache line, so aligned vector loads up to 64 bytes wide (for example AVX, AVX-512, or SVE loads) taken from the payload start do not straddle cache line boundaries.
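The two constants and the resulting alignment check can be written as follows. This is a sketch: the documented values are PAYLOAD_ALIGN_LOG2 = 6 and PAYLOAD_ALIGNMENT = 64, but the integer types chosen here and the is_payload_aligned helper are assumptions for illustration.

```rust
/// Documented value: 6. The u32 type is an assumption of this sketch.
pub const PAYLOAD_ALIGN_LOG2: u32 = 6;
/// Documented value: 1 << 6 = 64. The u64 type is an assumption of this sketch.
pub const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2;

// Compile-time check: the mask arithmetic below relies on a power of two.
const _: () = assert!(PAYLOAD_ALIGNMENT.is_power_of_two());

/// Hypothetical helper: true if `offset` is a multiple of PAYLOAD_ALIGNMENT.
pub fn is_payload_aligned(offset: u64) -> bool {
    (offset & (PAYLOAD_ALIGNMENT - 1)) == 0
}
```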

Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md:51-59

Alignment Calculation

The pre-padding required to align a payload is calculated using:

pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)

Where prev_tail is the file offset immediately after the previous entry's metadata. This formula ensures:

Payloads always start at multiples of PAYLOAD_ALIGNMENT
Pre-padding ranges from 0 to PAYLOAD_ALIGNMENT - 1 bytes
The calculation works for any power-of-two alignment
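The formula translates directly into code. In this sketch, pre_pad is a hypothetical name and u64 offsets are an assumption. As a worked example, prev_tail = 100 gives 100 % 64 = 36, then 64 - 36 = 28, and 28 & 63 = 28, so the payload starts at 128. For an already aligned prev_tail = 128, the subtraction yields 64, and the & (PAYLOAD_ALIGNMENT - 1) mask is what turns that 64 into 0.

```rust
/// Assumed u64; the documented value is 64.
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Pre-padding so that a payload written at `prev_tail + pad` is aligned.
/// The mask maps the "already aligned" case (64) to 0.
fn pre_pad(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}
```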

Diagram: Pre-padding calculation for 64-byte alignment

The storage engine writes zero bytes for the pre-padding region, then writes the payload starting at the aligned boundary.
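The write step can be sketched against an in-memory buffer standing in for the storage file. This is a simplified illustration, assuming the file tail equals the buffer length; append_aligned is a hypothetical helper, and it deliberately omits the metadata the real format writes after each payload as well as tombstone handling.

```rust
/// Assumed usize here; the documented value is 64.
const PAYLOAD_ALIGNMENT: usize = 64;

/// Hypothetical helper: zero-fill the pre-padding, then append `payload`
/// at the aligned boundary. Returns the payload's start offset.
/// Entry metadata is intentionally not written in this sketch.
fn append_aligned(file: &mut Vec<u8>, payload: &[u8]) -> usize {
    let prev_tail = file.len();
    let pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1);
    // Zero bytes for the pre-padding region.
    file.resize(prev_tail + pad, 0);
    let start = file.len();
    file.extend_from_slice(payload);
    start
}
```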

Sources: README.md:112-124 src/storage_engine/entry_iterator.rs:50-53

Alignment Validation

The crate provides debug-only assertions to verify alignment invariants:

debug_assert_aligned(ptr: *const u8, align: usize)
debug_assert_aligned_offset(offset: u64)

These assertions:

Are compiled to no-ops in release builds (zero runtime cost)
Execute in debug and test builds to catch alignment violations
Check both pointer alignment (debug_assert_aligned) and offset alignment (debug_assert_aligned_offset)
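The checks shown in the diagram below map naturally onto Rust's debug_assert!, which is skipped when debug_assertions is disabled (the default release profile). The functions here are hypothetical stand-ins named with a _sketch suffix, not the crate's debug_assert_aligned and debug_assert_aligned_offset; the offset variant assumes the check is against PAYLOAD_ALIGNMENT.

```rust
/// Assumed to match the documented 64-byte PAYLOAD_ALIGNMENT.
pub const PAYLOAD_ALIGNMENT: u64 = 64;

/// Hypothetical stand-in for debug_assert_aligned(ptr, align).
#[inline]
pub fn debug_assert_aligned_sketch(ptr: *const u8, align: usize) {
    // Checked first, so `align - 1` below never underflows on align == 0.
    debug_assert!(align.is_power_of_two(), "alignment must be a power of two");
    debug_assert!(
        (ptr as usize) & (align - 1) == 0,
        "pointer {:p} is not aligned to {}",
        ptr,
        align
    );
}

/// Hypothetical stand-in for debug_assert_aligned_offset(offset).
#[inline]
pub fn debug_assert_aligned_offset_sketch(offset: u64) {
    debug_assert!(
        offset % PAYLOAD_ALIGNMENT == 0,
        "offset {} is not a multiple of {}",
        offset,
        PAYLOAD_ALIGNMENT
    );
}
```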

Diagram: Debug-only alignment assertion behavior

graph TB
    subgraph "Alignment Validation"
        DebugMode["Debug/Test Build"]
        ReleaseMode["Release Build"]
        Call["debug_assert_aligned(ptr, align)"]
        CheckPowerOf2["Assert align.is_power_of_two()"]
        CheckAligned["Assert (ptr as usize & (align-1)) == 0"]
        NoOp["No-op (optimized away)"]
        Call --> DebugMode
        Call --> ReleaseMode
        DebugMode --> CheckPowerOf2
        CheckPowerOf2 --> CheckAligned
        ReleaseMode --> NoOp
    end

The Arrow integration feature uses these assertions when creating arrow::buffer::Buffer instances from an EntryHandle.

Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88 simd-r-drive-entry-handle/src/entry_handle.rs:52-85 (Arrow integration)

Typed Slice Access via align_or_copy

The align_or_copy utility function enables efficient conversion of byte slices into typed slices (e.g., &[f32], &[u32]) with automatic fallback when alignment requirements are not met.

align_or_copy<T, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T
) -> Cow<'_, [T]>

Sources: src/utils/align_or_copy.rs:1-73

Zero-Copy vs. Fallback Behavior

Diagram: align_or_copy decision flow

The function attempts zero-copy reinterpretation using slice::align_to::<T>():

Condition	Result	Allocation
Memory aligned for `T` AND length is multiple of `size_of::<T>()`	`Cow::Borrowed`	None
Misaligned OR length not a multiple	`Cow::Owned`	Allocates `Vec<T>`

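Zero-copy path: when align_to::<T>() returns empty prefix and suffix slices, the middle slice covers every input byte and is returned as Cow::Borrowed.

Fallback path: otherwise, the bytes are decoded N at a time with the supplied from_le_bytes function and collected into a Vec<T>, returned as Cow::Owned.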
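Both paths fit in a short function. This is a sketch of the technique, not the crate's implementation: align_or_copy_sketch and the Plain marker trait are hypothetical, the Plain bound is this sketch's way of making align_to sound, and ignoring trailing bytes on the fallback path is an assumption. It also borrows only on little-endian targets, because align_to gives a native-endian view while the fallback decodes little-endian.

```rust
use std::borrow::Cow;
use std::mem::size_of;

/// Hypothetical marker trait: every bit pattern is a valid value of the type.
pub unsafe trait Plain: Copy {}
unsafe impl Plain for u32 {}
unsafe impl Plain for u64 {}
unsafe impl Plain for u128 {}
unsafe impl Plain for f32 {}
unsafe impl Plain for f64 {}

/// Borrow the bytes as &[T] when possible, otherwise decode an owned copy.
pub fn align_or_copy_sketch<T: Plain, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]> {
    assert_eq!(N, size_of::<T>(), "N must equal size_of::<T>()");

    // SAFETY: T: Plain guarantees any bit pattern is a valid T.
    let (prefix, middle, suffix) = unsafe { bytes.align_to::<T>() };

    // Zero-copy path: aligned start AND length a multiple of size_of::<T>(),
    // on a little-endian target so the native view matches LE decoding.
    if cfg!(target_endian = "little") && prefix.is_empty() && suffix.is_empty() {
        return Cow::Borrowed(middle);
    }

    // Fallback path: decode N bytes at a time into an owned Vec<T>.
    // Trailing bytes that do not form a whole T are ignored in this sketch.
    let decoded: Vec<T> = bytes
        .chunks_exact(N)
        .map(|chunk| {
            let mut buf = [0u8; N];
            buf.copy_from_slice(chunk);
            from_le_bytes(buf)
        })
        .collect();
    Cow::Owned(decoded)
}
```

Usage: align_or_copy_sketch::<f32, 4>(payload, f32::from_le_bytes) yields a borrowed &[f32] for an aligned payload and an owned copy for an offset sub-slice.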

Sources: src/utils/align_or_copy.rs:44-73

Usage Example

Due to the 64-byte payload alignment, EntryHandle payloads are usually well aligned for SIMD-friendly element types. Memory maps begin on page boundaries, so a payload at a 64-byte aligned file offset is also 64-byte aligned in memory.

Since PAYLOAD_ALIGNMENT = 64 is a multiple of align_of::<f32>() = 4, align_of::<f64>() = 8, and align_of::<u128>() (16 on common 64-bit targets), zero-copy access is typically possible for these types when the payload length is an exact multiple of their size.
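The divisibility argument can be expressed as a small predicate. This sketch assumes the payload's mapped start address is a multiple of PAYLOAD_ALIGNMENT; zero_copy_possible is a hypothetical helper, not part of the crate.

```rust
use std::mem::{align_of, size_of};

/// Assumed to match the documented PAYLOAD_ALIGNMENT of 64 bytes.
const PAYLOAD_ALIGNMENT: usize = 64;

/// Hypothetical helper: for a payload whose mapped start is 64-byte aligned,
/// could its `len` bytes be viewed as &[T] without copying?
fn zero_copy_possible<T>(len: usize) -> bool {
    // If align_of::<T>() divides 64, a 64-byte aligned start is also T-aligned.
    let start_is_aligned = PAYLOAD_ALIGNMENT % align_of::<T>() == 0;
    // The length must also hold a whole number of T values.
    let whole_elements = len % size_of::<T>() == 0;
    start_is_aligned && whole_elements
}
```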

Sources: src/utils/align_or_copy.rs:37-43 README.md:51-59

Concurrent Zero-Copy Reads

The Arc<Mmap> design enables efficient memory sharing across threads while maintaining safety:

Diagram: Concurrent zero-copy reads via Arc sharing

graph TB
    subgraph "DataStore Structure"
        DS["DataStore"]
        MmapMutex["Arc&lt;Mutex&lt;Arc&lt;Mmap&gt;&gt;&gt;"]
    end

    subgraph "Reader Thread 1"
        R1["read(key1)"]
        EH1["EntryHandle\n(Arc&lt;Mmap&gt; clone)"]
        Slice1["as_slice() → &[u8]"]
    end

    subgraph "Reader Thread 2"
        R2["read(key2)"]
        EH2["EntryHandle\n(Arc&lt;Mmap&gt; clone)"]
        Slice2["as_slice() → &[u8]"]
    end

    subgraph "Reader Thread N"
        RN["read(keyN)"]
        EHN["EntryHandle\n(Arc&lt;Mmap&gt; clone)"]
        SliceN["as_slice() → &[u8]"]
    end

    subgraph "Memory-Mapped File"
        Mmap["memmap2::Mmap\n(shared memory region)"]
    end

    DS --> MmapMutex
    MmapMutex -.Arc::clone.-> R1
    MmapMutex -.Arc::clone.-> R2
    MmapMutex -.Arc::clone.-> RN
    R1 --> EH1
    R2 --> EH2
    RN --> EHN
    EH1 --> Slice1
    EH2 --> Slice2
    EHN --> SliceN
    Slice1 -.references.-> Mmap
    Slice2 -.references.-> Mmap
    SliceN -.references.-> Mmap

Concurrency characteristics:

No read locks required: Once an EntryHandle holds an Arc<Mmap>, it can access data without further synchronization
Write safety: Writers acquire RwLock<File> to serialize writes, and Mutex<Arc<Mmap>> to safely remap after writes
Memory efficiency: All readers share the same physical pages, regardless of thread count
Graceful remapping: Old Arc<Mmap> instances remain valid until all references are dropped

This design lets SIMD R Drive serve many concurrent readers with minimal overhead: the only synchronized step on the read path is a brief Mutex lock to clone the inner Arc<Mmap>.
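The read path can be sketched with std threads. This minimal sketch assumes a Vec<u8> in place of Mmap; read_concurrently is a hypothetical function whose shared argument mirrors DataStore.mmap_arc. Each thread holds the Mutex only while cloning the inner Arc, then reads the shared bytes without any lock.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sketch: `shared` mirrors DataStore.mmap_arc, with
/// Vec<u8> standing in for Mmap. Returns one byte-sum per reader.
fn read_concurrently(shared: &Arc<Mutex<Arc<Vec<u8>>>>, readers: usize) -> Vec<u64> {
    let handles: Vec<thread::JoinHandle<u64>> = (0..readers)
        .map(|_| {
            let shared = Arc::clone(shared);
            thread::spawn(move || {
                // The MutexGuard temporary is dropped at the end of this statement.
                let snapshot: Arc<Vec<u8>> = Arc::clone(&*shared.lock().unwrap());
                // No lock held here: all threads read the same underlying bytes.
                snapshot.iter().map(|&b| u64::from(b)).sum::<u64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Once every reader has finished and dropped its snapshot, the inner Arc's strong count falls back to one, which is what allows an old mapping to be released after a remap.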

Sources: README.md:172-206 src/storage_engine/entry_iterator.rs:22-23

Datasets Larger Than RAM

Memory mapping enables efficient access to storage files that exceed available physical memory:

OS-managed paging: The operating system handles paging, loading only accessed regions into physical memory
Transparent access: Application code uses the same EntryHandle API regardless of file size
Efficient random access: Jumping between distant file offsets does not require explicit seeking
Memory pressure handling: Unused pages are evicted by the OS when memory is needed elsewhere

For example, a storage file containing 100 GB of data on a system with 16 GB of RAM can be accessed efficiently. Only the portions actively being read will occupy physical memory, with the OS managing page faults transparently.

This design makes SIMD R Drive suitable for:

Large-scale data analytics workloads
Multi-gigabyte append-only logs
Embedded databases on resource-constrained systems
Archive storage with infrequent random access patterns

Sources: README.md:43-49 README.md:147-148


rust-simd-r-drive Documentation