This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Memory Management and Zero-Copy Access


This document explains how SIMD R Drive implements zero-copy reads through memory-mapped files, manages payload alignment for optimal performance, and provides safe concurrent access to stored data. It covers the EntryHandle abstraction, the Arc<Mmap> sharing strategy, alignment requirements enforced by PAYLOAD_ALIGNMENT, and utilities for working with aligned binary data.

For details on the on-disk entry format and metadata structure, see Entry Structure and Metadata. For information on how concurrent reads and writes are coordinated, see Concurrency and Thread Safety. For performance characteristics of the alignment strategy, see Payload Alignment and Cache Efficiency.


Memory-Mapped File Architecture

SIMD R Drive uses memory-mapped files (memmap2::Mmap) to enable zero-copy reads. The storage file is mapped into the process’s virtual address space, allowing direct access to payload bytes without copying them into separate buffers. This approach provides several benefits:

  • Zero-copy reads : Data is accessed directly from the mapped memory region
  • Larger-than-RAM datasets : The OS handles paging, so only accessed portions consume physical memory
  • Efficient random access : Any offset in the file can be accessed with pointer arithmetic
  • Shared memory : Multiple threads can read from the same mapped region concurrently

The memory-mapped file is wrapped in Arc<Mutex<Arc<Mmap>>> within the DataStore structure:

DataStore.mmap_arc: Arc<Mutex<Arc<Mmap>>>

The outer Arc allows the DataStore itself to be cloned and shared. The Mutex protects remapping operations (which occur after writes). The inner Arc<Mmap> is what gets cloned and handed out to readers, ensuring that even if a remap occurs, existing readers retain a valid view of the previous mapping.

Sources: README.md:43-49 README.md:174-180 src/storage_engine/entry_iterator.rs:22-23


Mmap Lifecycle and Remapping

Diagram: Memory-mapped file lifecycle across write operations

When a write operation extends the file, the following sequence occurs:

  1. The file is extended and flushed to disk
  2. The Mutex<Arc<Mmap>> is locked
  3. A new Mmap is created from the extended file
  4. The old Arc<Mmap> is replaced with a new one
  5. Existing EntryHandle instances continue using their old Arc<Mmap> reference until dropped

This design ensures that readers never see an invalid memory mapping, even as writes extend the file.
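A minimal sketch of this swap, using Arc<Vec<u8>> as a stand-in for Arc<Mmap> so that it runs with the standard library alone. The names StoreSketch, snapshot, and remap are hypothetical and are not part of the crate's API; the real remap builds a new Mmap from the extended file rather than taking a Vec.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for `Arc<Mutex<Arc<Mmap>>>`; a `Vec<u8>` plays the role of the mapping.
#[derive(Clone)]
struct StoreSketch {
    mmap_arc: Arc<Mutex<Arc<Vec<u8>>>>,
}

impl StoreSketch {
    fn new(initial: Vec<u8>) -> Self {
        Self { mmap_arc: Arc::new(Mutex::new(Arc::new(initial))) }
    }

    /// Reader side: clone the inner Arc to pin the current view.
    fn snapshot(&self) -> Arc<Vec<u8>> {
        let guard = self.mmap_arc.lock().unwrap();
        Arc::clone(&*guard)
    }

    /// Writer side (steps 2-4): lock, build a view of the extended file, swap it in.
    fn remap(&self, extended: Vec<u8>) {
        let mut guard = self.mmap_arc.lock().unwrap();
        *guard = Arc::new(extended);
    }
}
```

A view taken before remap() keeps its original length and contents, while a view taken afterwards sees the extended data. The old allocation is freed only when the last Arc pointing at it is dropped.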

Sources: README.md:174-180 src/storage_engine/data_store.rs:46-48 (implied from architecture)


EntryHandle: Zero-Copy Data Access

The EntryHandle struct provides a zero-copy view into a specific payload within the memory-mapped file. Each EntryHandle holds three fields:

  • mmap_arc : A reference-counted pointer to the memory-mapped file (shared across all handles)
  • range : The byte range (start..end) identifying the payload location within the mmap
  • metadata : The EntryMetadata associated with the entry

The struct is marked #[derive(Debug, Clone)], allowing it to be cloned to create additional handles referencing the same underlying memory.

Sources: simd-r-drive-entry-handle/src/entry_handle.rs:7-19


EntryHandle Methods

Diagram: EntryHandle method categories

Core Zero-Copy Access Methods:

| Method | Signature | Description | Lines |
|---|---|---|---|
| as_slice() | &self -> &[u8] | Returns zero-copy slice into mmap | simd-r-drive-entry-handle/src/entry_handle.rs:151-155 |
| Deref trait | &self -> &[u8] | Enables *handle syntax for slice access | simd-r-drive-entry-handle/src/entry_handle.rs:36-42 |
| AsRef<[u8]> | &self -> &[u8] | Standard trait for byte slice conversion | simd-r-drive-entry-handle/src/entry_handle.rs:447-451 |
| clone_arc() | &self -> EntryHandle | Creates new handle sharing same mmap Arc | simd-r-drive-entry-handle/src/entry_handle.rs:179-185 |

Metadata Access Methods:

| Method | Signature | Description | Lines |
|---|---|---|---|
| metadata() | &self -> &EntryMetadata | Returns reference to entry metadata | simd-r-drive-entry-handle/src/entry_handle.rs:196-198 |
| key_hash() | &self -> u64 | Returns 64-bit XXH3 key hash | simd-r-drive-entry-handle/src/entry_handle.rs:227-229 |
| checksum() | &self -> u32 | Returns CRC32C checksum | simd-r-drive-entry-handle/src/entry_handle.rs:237-239 |
| raw_checksum() | &self -> [u8; 4] | Returns raw checksum bytes | simd-r-drive-entry-handle/src/entry_handle.rs:247-249 |
| is_valid_checksum() | &self -> bool | Validates payload integrity | simd-r-drive-entry-handle/src/entry_handle.rs:260-275 |

Offset and Size Methods:

| Method | Signature | Description | Lines |
|---|---|---|---|
| size() | &self -> usize | Returns payload size (bytes) | simd-r-drive-entry-handle/src/entry_handle.rs:204-206 |
| file_size() | &self -> usize | Returns payload + metadata size | simd-r-drive-entry-handle/src/entry_handle.rs:212-214 |
| start_offset() | &self -> usize | Returns absolute start offset in file | simd-r-drive-entry-handle/src/entry_handle.rs:283-285 |
| end_offset() | &self -> usize | Returns absolute end offset in file | simd-r-drive-entry-handle/src/entry_handle.rs:293-295 |
| offset_range() | &self -> Range<usize> | Returns byte range in file | simd-r-drive-entry-handle/src/entry_handle.rs:303-305 |
| address_range() | &self -> Range<*const u8> | Returns virtual memory pointer range | simd-r-drive-entry-handle/src/entry_handle.rs:315-320 |

Construction Methods:

| Method | Signature | Description | Lines |
|---|---|---|---|
| from_owned_bytes_anon() | (bytes, key_hash) -> Result<Self> | Creates in-memory entry via anonymous mmap | simd-r-drive-entry-handle/src/entry_handle.rs:87-113 |
| from_arc_mmap() | (Arc<Mmap>, Range, metadata) -> Self | Wraps region in existing mmap (zero-copy) | simd-r-drive-entry-handle/src/entry_handle.rs:129-139 |

Feature-Gated Methods:

| Method | Feature Flag | Description | Lines |
|---|---|---|---|
| mmap_arc() | expose-internal-api | Returns reference to underlying Arc<Mmap> | simd-r-drive-entry-handle/src/entry_handle.rs:349-351 |
| as_arrow_buffer() | arrow | Creates zero-copy Arrow Buffer | simd-r-drive-entry-handle/src/entry_handle.rs:385-404 |
| into_arrow_buffer() | arrow | Consumes handle into Arrow Buffer | simd-r-drive-entry-handle/src/entry_handle.rs:425-444 |

The primary zero-copy access method, as_slice(), indexes the shared mapping with the handle's stored range (&mmap_arc[range.clone()]) and returns a slice that references the memory-mapped region directly, without copying data.
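The borrowing behavior can be sketched with the standard library alone. HandleSketch below is a hypothetical stand-in for EntryHandle, and Arc<Vec<u8>> replaces Arc<Mmap>; only the indexing expression mirrors what this page describes.

```rust
use std::ops::{Deref, Range};
use std::sync::Arc;

/// Hypothetical stand-in for EntryHandle; `Arc<Vec<u8>>` replaces `Arc<Mmap>`.
#[derive(Debug, Clone)]
struct HandleSketch {
    mmap_arc: Arc<Vec<u8>>,
    range: Range<usize>,
}

impl HandleSketch {
    /// Borrow the payload bytes in place; nothing is copied.
    fn as_slice(&self) -> &[u8] {
        &self.mmap_arc[self.range.clone()]
    }
}

/// Lets `*handle` and slice methods such as `len()` work on the handle directly.
impl Deref for HandleSketch {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        self.as_slice()
    }
}
```

The slice returned by as_slice() points into the shared backing buffer, so its address equals the address of the same range in the original allocation.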

Sources: simd-r-drive-entry-handle/src/entry_handle.rs:36-451


Alignment Requirements

All non-tombstone payloads in SIMD R Drive begin on a fixed 64-byte aligned boundary. This alignment is defined by the PAYLOAD_ALIGNMENT constant:

PAYLOAD_ALIGN_LOG2 = 6
PAYLOAD_ALIGNMENT = 1 << 6 = 64 bytes

The 64-byte alignment matches the typical CPU cache line size and the width of one AVX-512 register, so an aligned SIMD load (AVX, AVX-512, SVE) from the start of a payload does not straddle a cache line boundary.

Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md:51-59


Alignment Calculation

The pre-padding required to align a payload is calculated using the prepad_len() function in DataStore:
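A minimal sketch of such a calculation, assuming the common power-of-two form pad = (A - (offset % A)) & (A - 1). The function body is an assumption that satisfies the properties listed in this section; the exact expression in data_store.rs may differ.

```rust
/// Values quoted in this document.
const PAYLOAD_ALIGN_LOG2: u32 = 6;
const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2; // 64 bytes

/// Bytes of padding needed so that the next payload starts on a
/// PAYLOAD_ALIGNMENT boundary. Assumed form; valid for power-of-two alignments.
fn prepad_len(offset: u64) -> usize {
    let a = PAYLOAD_ALIGNMENT;
    // `a - (offset % a)` lies in 1..=a; masking with `a - 1` maps `a` back to 0.
    ((a - (offset % a)) & (a - 1)) as usize
}
```

For instance, an offset of 100 yields 28 bytes of padding, so the payload starts at 128; an offset that is already a multiple of 64 yields 0.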

Where offset is the file offset immediately after the previous entry’s metadata (the prev_tail). This formula ensures:

  • Payloads always start at multiples of PAYLOAD_ALIGNMENT (64 bytes)
  • Pre-padding ranges from 0 to 63 bytes
  • The calculation works for any power-of-two alignment value

Diagram: prepad_len() calculation with examples

During write operations, the storage engine:

  1. Calls prepad_len(prev_tail) to determine padding needed
  2. Writes zero bytes for the pre-padding region (if pad > 0)
  3. Writes the payload starting at the aligned boundary
  4. Writes metadata immediately after payload

This is implemented in the write path at src/storage_engine/data_store.rs:765-771 and src/storage_engine/data_store.rs:908-914 for streaming and batch writes respectively.
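The four steps can be sketched against an in-memory buffer. This is an illustration under stated assumptions: append_entry is a hypothetical helper, the padding formula is the assumed power-of-two form, and the metadata argument is an opaque placeholder rather than the real on-disk metadata layout.

```rust
const PAYLOAD_ALIGNMENT: usize = 64;

/// Assumed padding formula for power-of-two alignments.
fn prepad_len(offset: usize) -> usize {
    (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

/// Appends one entry to an in-memory "file" and returns the payload's start offset.
/// `metadata` is an opaque placeholder; the real metadata format is not modeled.
fn append_entry(file: &mut Vec<u8>, payload: &[u8], metadata: &[u8]) -> usize {
    let prev_tail = file.len();
    let pad = prepad_len(prev_tail); // step 1: padding needed after the previous entry
    file.resize(prev_tail + pad, 0); // step 2: zero bytes for the pre-padding region
    let payload_start = file.len();
    file.extend_from_slice(payload); // step 3: payload at the aligned boundary
    file.extend_from_slice(metadata); // step 4: metadata immediately after the payload
    payload_start
}
```

Appending a 5-byte payload with 20 placeholder metadata bytes to an empty buffer leaves the tail at offset 25, so the next payload is preceded by 39 zero bytes and starts at offset 64.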

Sources: src/storage_engine/data_store.rs:670-673 src/storage_engine/data_store.rs:765-771 README.md:112-124


Alignment Validation

The crate provides debug-only assertions to verify alignment invariants. These are defined in the simd-r-drive-entry-handle crate and used throughout the storage engine:

Assertion functions:

  • debug_assert_aligned(ptr: *const u8, align: usize) - Validates pointer alignment
  • debug_assert_aligned_offset(offset: u64) - Validates file offset alignment to PAYLOAD_ALIGNMENT

These assertions:

  • Are compiled to no-ops in release builds (zero runtime cost)
  • Execute in debug and test builds to catch alignment violations
  • Panic with descriptive error messages if alignment is incorrect
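A sketch of how helpers with these signatures could be written on top of debug_assert!, which the compiler removes when debug assertions are disabled. The bodies and messages are assumptions for illustration; only the two function signatures come from this page.

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Panics in debug/test builds if `ptr` is not a multiple of `align`.
#[inline]
fn debug_assert_aligned(ptr: *const u8, align: usize) {
    debug_assert!(align.is_power_of_two(), "alignment must be a power of two");
    debug_assert!(
        ((ptr as usize) & (align - 1)) == 0,
        "pointer {:p} is not aligned to {} bytes",
        ptr,
        align
    );
}

/// Panics in debug/test builds if `offset` is not a multiple of PAYLOAD_ALIGNMENT.
#[inline]
fn debug_assert_aligned_offset(offset: u64) {
    debug_assert!(
        offset % PAYLOAD_ALIGNMENT == 0,
        "offset {} is not aligned to {} bytes",
        offset,
        PAYLOAD_ALIGNMENT
    );
}
```

Because the checks live inside debug_assert!, a misaligned offset such as 65 panics in a debug build and passes silently in a release build.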

Diagram: debug_assert_aligned_offset() execution flow

The assertions are used at key points in the codebase:

  1. During entry construction in read_entry_with_context() (src/storage_engine/data_store.rs:555-558)
  2. During recovery in recover_valid_chain() (src/storage_engine/data_store.rs:410-413)
  3. During parallel iteration in par_iter_entries() (src/storage_engine/data_store.rs:350-353)
  4. In Arrow Buffer creation, when the arrow feature is enabled (simd-r-drive-entry-handle/src/entry_handle.rs:391-400 and simd-r-drive-entry-handle/src/entry_handle.rs:431-440)

These assertions ensure that the alignment guarantees are maintained throughout the system, catching bugs early during development without impacting production performance.

Sources: src/storage_engine/data_store.rs:20-21 src/storage_engine/data_store.rs:350-353 src/storage_engine/data_store.rs:410-413 src/storage_engine/data_store.rs:555-558 simd-r-drive-entry-handle/src/entry_handle.rs:391-400


Typed Slice Access and Reinterpretation

Due to the 64-byte payload alignment guarantee, EntryHandle payloads can often be reinterpreted as typed slices without copying. The alignment is sufficient for common types:

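| Type | Size (bytes) | Alignment requirement (bytes) | Guaranteed by 64-byte boundary? |
|---|---|---|---|
| u8 | 1 | 1 | Yes |
| u16 | 2 | 2 | Yes |
| u32 | 4 | 4 | Yes |
| f32 | 4 | 4 | Yes |
| u64 | 8 | 8 | Yes |
| f64 | 8 | 8 | Yes |
| u128 | 16 | 16 | Yes |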

Diagram: Typed slice reinterpretation decision flow

Example: Zero-Copy f32 Array Access
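A minimal sketch of the reinterpretation, using only the standard library. The helper as_f32_slice and the Aligned64 demo buffer are hypothetical names introduced here; with a real EntryHandle, the input would be the bytes returned by as_slice(). The sketch reads values in the host's native byte order, so a payload written as little-endian is only directly usable on a little-endian machine.

```rust
use std::mem::{align_of, size_of};

/// 64-byte-aligned demo buffer standing in for an aligned payload.
#[repr(C, align(64))]
struct Aligned64([u8; 16]);

/// Reinterprets `bytes` as `&[f32]` without copying, or returns None if the
/// pointer is misaligned for f32 or the length is not a multiple of 4.
fn as_f32_slice(bytes: &[u8]) -> Option<&[f32]> {
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<f32>() != 0 || bytes.len() % size_of::<f32>() != 0 {
        return None;
    }
    // SAFETY: alignment and length were checked above, every bit pattern is a
    // valid f32, and the returned slice borrows from `bytes`.
    Some(unsafe {
        std::slice::from_raw_parts(ptr as *const f32, bytes.len() / size_of::<f32>())
    })
}
```

Checking alignment and length at runtime keeps the unsafe block sound even if a caller passes a sub-slice that no longer starts on the aligned boundary; in that case the function returns None and the caller can fall back to copying.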

The 64-byte alignment boundary combined with standard type alignments means zero-copy reinterpretation typically succeeds for common numeric types, as long as:

  1. The payload was written in the expected binary format (e.g., little-endian)
  2. The payload length is an exact multiple of the type’s size

Apache Arrow Integration

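When the arrow feature is enabled, an EntryHandle can be converted to an arrow::buffer::Buffer without copying, either by borrowing the handle with as_arrow_buffer() or by consuming it with into_arrow_buffer().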

This enables efficient integration with Apache Arrow’s columnar memory format without data copying. The Buffer keeps the Arc<EntryHandle> alive, which in turn keeps the underlying Arc<Mmap> valid.

Sources: README.md:51-59 simd-r-drive-entry-handle/src/entry_handle.rs:385-404


Memory Sharing and Concurrency

The Arc<Mmap> design enables efficient memory sharing across threads while maintaining safety:

Diagram: Concurrent zero-copy reads via Arc sharing

The sequence shown in the diagram is:

  1. Thread 1 calls read(b"key1") on the DataStore.
  2. DataStore.get_mmap_arc() (src/storage_engine/data_store.rs:658-663) locks the Mutex<Arc<Mmap>> held in DataStore.mmap (src/storage_engine/data_store.rs:29), clones the Arc<Mmap>, and returns the clone.
  3. The DataStore creates an EntryHandle { mmap_arc, range, metadata } and returns it to Thread 1.
  4. Concurrently, Thread 2 calls read(b"key2") and receives its own EntryHandle through the same lock, clone, and create steps.
  5. Each thread calls entry.as_slice(), which is a zero-copy access of the form &mmap_arc[range.clone()]. The OS manages reading the memory pages from the physical file, and both threads read the same shared physical pages.
  6. No locks are held during data access.

Concurrency characteristics:

| Aspect | Implementation | Benefit |
|---|---|---|
| Lock-free reads | Once EntryHandle holds Arc<Mmap>, no further locks needed | Scales to thousands of concurrent readers |
| Write serialization | Writers acquire RwLock<BufWriter<File>> at src/storage_engine/data_store.rs:28 | Prevents write conflicts |
| Safe remapping | Mutex<Arc<Mmap>> at src/storage_engine/data_store.rs:29 protects remap operations | Old readers keep valid mmap reference |
| Memory efficiency | All EntryHandle instances share same Mmap physical pages | Minimal memory overhead per reader |
| Graceful updates | Old Arc<Mmap> valid until all handles dropped | No reader invalidation during writes |

The get_mmap_arc() private method implements the clone-and-release pattern:
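A sketch of the pattern, assuming a free function over a Mutex<Arc<...>> rather than the crate's private method, and using Vec<u8> as a stand-in for Mmap. The MmapSketch alias and the sum_in_parallel helper are hypothetical and exist only to exercise the pattern from several threads.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// `Vec<u8>` stands in for `Mmap` so the sketch needs only the standard library.
type MmapSketch = Vec<u8>;

/// Clone-and-release: hold the lock only long enough to clone the inner Arc.
fn get_mmap_arc(shared: &Mutex<Arc<MmapSketch>>) -> Arc<MmapSketch> {
    let guard = shared.lock().unwrap();
    Arc::clone(&*guard)
    // `guard` drops here, releasing the lock before any payload bytes are read.
}

/// Spawns `readers` threads that each take a snapshot and sum its bytes.
fn sum_in_parallel(shared: Arc<Mutex<Arc<MmapSketch>>>, readers: usize) -> Vec<u64> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let view = get_mmap_arc(&shared); // lock held only inside this call
                view.iter().map(|&b| b as u64).sum::<u64>() // no lock held here
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Two calls made without an intervening remap return Arcs that point at the same allocation, which is what keeps the per-reader overhead to a reference-count increment.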

This pattern ensures:

  1. The lock is held only long enough to clone the Arc
  2. Readers never block each other
  3. Even if a write remaps the file, existing EntryHandle instances retain their valid mmap view

Sources: src/storage_engine/data_store.rs:28-29 src/storage_engine/data_store.rs:658-663 README.md:172-206


Datasets Larger Than RAM

Memory mapping enables efficient access to storage files that exceed available physical memory:

  • OS-managed paging : The operating system handles paging, loading only accessed regions into physical memory
  • Transparent access : Application code uses the same EntryHandle API regardless of file size
  • Efficient random access : Jumping between distant file offsets does not require explicit seeking
  • Memory pressure handling : Unused pages are evicted by the OS when memory is needed elsewhere

For example, a storage file containing 100 GB of data on a system with 16 GB of RAM can be accessed efficiently. Only the portions actively being read will occupy physical memory, with the OS managing page faults transparently.

This design makes SIMD R Drive suitable for:

  • Large-scale data analytics workloads
  • Multi-gigabyte append-only logs
  • Embedded databases on resource-constrained systems
  • Archive storage with infrequent random access patterns

Sources: README.md:43-49 README.md:147-148
