This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Memory Management and Zero-Copy Access
This document explains how SIMD R Drive implements zero-copy reads through memory-mapped files, manages payload alignment for optimal performance, and provides safe concurrent access to stored data. It covers the EntryHandle abstraction, the Arc<Mmap> sharing strategy, alignment requirements enforced by PAYLOAD_ALIGNMENT, and utilities for working with aligned binary data.
For details on the on-disk entry format and metadata structure, see Entry Structure and Metadata. For information on how concurrent reads and writes are coordinated, see Concurrency and Thread Safety. For performance characteristics of the alignment strategy, see Payload Alignment and Cache Efficiency.
Memory-Mapped File Architecture
SIMD R Drive uses memory-mapped files (memmap2::Mmap) to enable zero-copy reads. The storage file is mapped into the process’s virtual address space, allowing direct access to payload bytes without copying them into separate buffers. This approach provides several benefits:
- Zero-copy reads: Data is accessed directly from the mapped memory region
- Larger-than-RAM datasets: The OS handles paging, so only accessed portions consume physical memory
- Efficient random access: Any offset in the file can be accessed with pointer arithmetic
- Shared memory: Multiple threads can read from the same mapped region concurrently
The memory-mapped file is wrapped in Arc<Mutex<Arc<Mmap>>> within the DataStore structure:
DataStore.mmap_arc: Arc<Mutex<Arc<Mmap>>>
The outer Arc allows the DataStore itself to be cloned and shared. The Mutex protects remapping operations (which occur after writes). The inner Arc<Mmap> is what gets cloned and handed out to readers, ensuring that even if a remap occurs, existing readers retain a valid view of the previous mapping.
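The layering can be sketched with standard-library types alone. This is an illustration, not the crate's code: `Vec<u8>` stands in for `memmap2::Mmap` so the example compiles without memmap2, and `FakeMmap`, `SharedMmap`, and `layering_demo` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};

// Stand-in for memmap2::Mmap (assumption): plain bytes, no real mapping.
pub type FakeMmap = Vec<u8>;

// Same nesting as the DataStore field: outer Arc -> Mutex -> inner Arc.
pub type SharedMmap = Arc<Mutex<Arc<FakeMmap>>>;

/// Returns (do both readers share one mapping?, strong count of the inner Arc).
pub fn layering_demo() -> (bool, usize) {
    let store_a: SharedMmap = Arc::new(Mutex::new(Arc::new(vec![1, 2, 3])));
    // Outer Arc: a second owner of the same Mutex, not a copy of the data.
    let store_b = Arc::clone(&store_a);
    // Inner Arc: each reader clones it under a short-lived lock
    // (the guard is a temporary dropped at the end of each statement).
    let reader1 = Arc::clone(&*store_a.lock().unwrap());
    let reader2 = Arc::clone(&*store_b.lock().unwrap());
    // The Mutex holds one reference and each reader holds one more.
    (Arc::ptr_eq(&reader1, &reader2), Arc::strong_count(&reader1))
}
```

Cloning the outer `Arc` hands out another owner of the same lock, while cloning the inner `Arc` gives a reader its own reference to whichever mapping is current at that moment.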
Sources: README.md:43-49 README.md:174-180 src/storage_engine/entry_iterator.rs:22-23
Mmap Lifecycle and Remapping
Diagram: Memory-mapped file lifecycle across write operations
When a write operation extends the file, the following sequence occurs:
- The file is extended and flushed to disk
- The Mutex<Arc<Mmap>> is locked
- A new Mmap is created from the extended file
- The old Arc<Mmap> is replaced with the new one
- Existing EntryHandle instances continue using their old Arc<Mmap> reference until dropped
This design ensures that readers never see an invalid memory mapping, even as writes extend the file.
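The sequence above can be modelled in memory to show why old readers stay valid. This is a hypothetical sketch: `FakeStore`, `current`, and `append` are illustrative names, a `Vec<u8>` plays both the file and the mapping, and step 3 copies bytes only because std has no mmap; a real remap maps the file's pages instead.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical store; Vec<u8> stands in for both the file and the Mmap.
#[derive(Default)]
pub struct FakeStore {
    file: Mutex<Vec<u8>>,      // write side (a real store uses a buffered File)
    mmap: Mutex<Arc<Vec<u8>>>, // plays the role of Mutex<Arc<Mmap>>
}

impl FakeStore {
    /// Reader side: clone the current mapping under a brief lock.
    pub fn current(&self) -> Arc<Vec<u8>> {
        Arc::clone(&*self.mmap.lock().unwrap())
    }

    /// Writer side, following the lifecycle steps listed above.
    pub fn append(&self, bytes: &[u8]) {
        // 1. Extend the file (a real store would also flush it to disk).
        let mut file = self.file.lock().unwrap();
        file.extend_from_slice(bytes);
        // 2. Lock the Mutex<Arc<Mmap>>.
        let mut mmap = self.mmap.lock().unwrap();
        // 3. Build a new "mapping" over the extended file (copy is a stand-in).
        let remapped = Arc::new(file.to_vec());
        // 4. Swap in the new Arc; readers holding the old Arc are unaffected.
        *mmap = remapped;
    }
}
```

A snapshot taken before the second `append` still sees only the first write, and once the store swaps in a new mapping, that snapshot's `Arc` is the only thing keeping the old bytes alive.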
Sources: README.md:174-180 src/storage_engine/data_store.rs:46-48 (implied from architecture)
EntryHandle: Zero-Copy Data Access
The EntryHandle struct provides a zero-copy view into a specific payload within the memory-mapped file. Each EntryHandle holds three fields:
- mmap_arc: A reference-counted pointer to the memory-mapped file (shared across all handles)
- range: The byte range (start..end) identifying the payload location within the mmap
- metadata: The EntryMetadata associated with the entry
The struct is marked #[derive(Debug, Clone)], allowing it to be cloned to create additional handles referencing the same underlying memory.
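A reduced, std-only model of these fields makes the sharing visible. It is a sketch under stated assumptions: `Arc<[u8]>` stands in for `Arc<Mmap>`, and this `EntryMetadata` with `key_hash`/`checksum` fields is a hypothetical simplification of the real type. The `clone_arc` body follows the table's description below ("creates new handle sharing same mmap Arc"), not the crate's source.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical, reduced metadata; the real EntryMetadata layout may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct EntryMetadata {
    pub key_hash: u64,
    pub checksum: u32,
}

/// Simplified EntryHandle model: Arc<[u8]> stands in for Arc<Mmap>.
#[derive(Debug, Clone)]
pub struct EntryHandle {
    pub mmap_arc: Arc<[u8]>,
    pub range: Range<usize>,
    pub metadata: EntryMetadata,
}

impl EntryHandle {
    /// Modelled on clone_arc(): a new handle over the same mapping.
    /// Only the reference count changes; no payload bytes are copied.
    pub fn clone_arc(&self) -> Self {
        Self {
            mmap_arc: Arc::clone(&self.mmap_arc),
            range: self.range.clone(),
            metadata: self.metadata,
        }
    }
}
```

Both `clone()` (from the derive) and `clone_arc()` produce handles whose `mmap_arc` points at the same allocation, so the cost of a copy is one atomic increment.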
Sources: simd-r-drive-entry-handle/src/entry_handle.rs:7-19
EntryHandle Methods
Diagram: EntryHandle method categories
Core Zero-Copy Access Methods:
| Method | Signature | Description | Lines |
|---|---|---|---|
as_slice() | &self -> &[u8] | Returns zero-copy slice into mmap | simd-r-drive-entry-handle/src/entry_handle.rs:151-155 |
Deref trait | &self -> &[u8] | Enables *handle syntax for slice access | simd-r-drive-entry-handle/src/entry_handle.rs:36-42 |
AsRef<[u8]> | &self -> &[u8] | Standard trait for byte slice conversion | simd-r-drive-entry-handle/src/entry_handle.rs:447-451 |
clone_arc() | &self -> EntryHandle | Creates new handle sharing same mmap Arc | simd-r-drive-entry-handle/src/entry_handle.rs:179-185 |
Metadata Access Methods:
| Method | Signature | Description | Lines |
|---|---|---|---|
metadata() | &self -> &EntryMetadata | Returns reference to entry metadata | simd-r-drive-entry-handle/src/entry_handle.rs:196-198 |
key_hash() | &self -> u64 | Returns 64-bit XXH3 key hash | simd-r-drive-entry-handle/src/entry_handle.rs:227-229 |
checksum() | &self -> u32 | Returns CRC32C checksum | simd-r-drive-entry-handle/src/entry_handle.rs:237-239 |
raw_checksum() | &self -> [u8; 4] | Returns raw checksum bytes | simd-r-drive-entry-handle/src/entry_handle.rs:247-249 |
is_valid_checksum() | &self -> bool | Validates payload integrity | simd-r-drive-entry-handle/src/entry_handle.rs:260-275 |
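is_valid_checksum() compares a stored CRC32C against one recomputed from the payload. The sketch below is a bitwise (table-free) CRC32C using the reflected Castagnoli polynomial 0x82F63B78; it assumes the checksum covers exactly the payload bytes, and `crc32c`/`is_valid_checksum` here are hypothetical free functions, not the crate's implementation, which may differ.

```rust
/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0; // initial value 0xFFFFFFFF
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc // final XOR
}

/// Hypothetical analogue of is_valid_checksum(): recompute and compare.
pub fn is_valid_checksum(payload: &[u8], stored: u32) -> bool {
    crc32c(payload) == stored
}
```

For reference, the standard CRC32C check value for the ASCII string "123456789" is 0xE3069283, which this function reproduces.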
Offset and Size Methods:
| Method | Signature | Description | Lines |
|---|---|---|---|
size() | &self -> usize | Returns payload size (bytes) | simd-r-drive-entry-handle/src/entry_handle.rs:204-206 |
file_size() | &self -> usize | Returns payload + metadata size | simd-r-drive-entry-handle/src/entry_handle.rs:212-214 |
start_offset() | &self -> usize | Returns absolute start offset in file | simd-r-drive-entry-handle/src/entry_handle.rs:283-285 |
end_offset() | &self -> usize | Returns absolute end offset in file | simd-r-drive-entry-handle/src/entry_handle.rs:293-295 |
offset_range() | &self -> Range<usize> | Returns byte range in file | simd-r-drive-entry-handle/src/entry_handle.rs:303-305 |
address_range() | &self -> Range<*const u8> | Returns virtual memory pointer range | simd-r-drive-entry-handle/src/entry_handle.rs:315-320 |
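The offset methods reduce to arithmetic on the stored range. This hypothetical `Handle` borrows the whole "mapped file" as a byte slice and assumes the range holds absolute file offsets with the mapping starting at offset 0; it illustrates how offset_range() and address_range() relate rather than reproducing the crate's code.

```rust
use std::ops::Range;

/// Hypothetical, reduced handle: the whole mapped file plus the payload's
/// absolute byte range (assumption: the mapping starts at file offset 0).
pub struct Handle<'a> {
    pub mmap: &'a [u8],
    pub range: Range<usize>,
}

impl Handle<'_> {
    /// Payload length in bytes.
    pub fn size(&self) -> usize {
        self.range.end - self.range.start
    }
    pub fn start_offset(&self) -> usize {
        self.range.start
    }
    pub fn end_offset(&self) -> usize {
        self.range.end
    }
    pub fn offset_range(&self) -> Range<usize> {
        self.range.clone()
    }
    /// Virtual addresses of the payload: mapping base pointer plus the offsets.
    pub fn address_range(&self) -> Range<*const u8> {
        self.mmap[self.range.clone()].as_ptr_range()
    }
}
```

Subtracting the mapping's base address from `address_range().start` recovers `start_offset()`, which is the relationship the two method families encode.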
Construction Methods:
| Method | Signature | Description | Lines |
|---|---|---|---|
from_owned_bytes_anon() | (bytes, key_hash) -> Result<Self> | Creates in-memory entry via anonymous mmap | simd-r-drive-entry-handle/src/entry_handle.rs:87-113 |
from_arc_mmap() | (Arc<Mmap>, Range, metadata) -> Self | Wraps region in existing mmap (zero-copy) | simd-r-drive-entry-handle/src/entry_handle.rs:129-139 |
Feature-Gated Methods:
| Method | Feature Flag | Description | Lines |
|---|---|---|---|
mmap_arc() | expose-internal-api | Returns reference to underlying Arc<Mmap> | simd-r-drive-entry-handle/src/entry_handle.rs:349-351 |
as_arrow_buffer() | arrow | Creates zero-copy Arrow Buffer | simd-r-drive-entry-handle/src/entry_handle.rs:385-404 |
into_arrow_buffer() | arrow | Consumes handle into Arrow Buffer | simd-r-drive-entry-handle/src/entry_handle.rs:425-444 |
The primary zero-copy access method, as_slice(), indexes the shared mapping with the handle's range (&mmap_arc[range.clone()]), returning a slice that references the memory-mapped region directly without copying data.
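A simplified model of as_slice(), Deref, and AsRef<[u8]> is sketched below. `Entry` is a hypothetical stand-in for EntryHandle with `Arc<[u8]>` in place of `Arc<Mmap>`; the indexing expression matches the `&mmap_arc[range.clone()]` form described above, and Deref and AsRef simply delegate to it.

```rust
use std::ops::{Deref, Range};
use std::sync::Arc;

/// Simplified stand-in for EntryHandle: Arc<[u8]> plays the role of Arc<Mmap>.
pub struct Entry {
    mmap_arc: Arc<[u8]>,
    range: Range<usize>,
}

impl Entry {
    pub fn new(mmap_arc: Arc<[u8]>, range: Range<usize>) -> Self {
        Entry { mmap_arc, range }
    }

    /// Zero-copy view: the slice borrows directly from the shared buffer.
    pub fn as_slice(&self) -> &[u8] {
        &self.mmap_arc[self.range.clone()]
    }
}

// Deref enables `&*entry` and calling slice methods (len, iter, ...) directly.
impl Deref for Entry {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        self.as_slice()
    }
}

// AsRef<[u8]> lets the handle flow into APIs generic over byte slices.
impl AsRef<[u8]> for Entry {
    fn as_ref(&self) -> &[u8] {
        self.as_slice()
    }
}
```

The returned slice's pointer equals the buffer's base pointer plus `range.start`, confirming that no bytes were copied.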
Sources: simd-r-drive-entry-handle/src/entry_handle.rs:36-451
Alignment Requirements
All non-tombstone payloads in SIMD R Drive begin on a fixed 64-byte aligned boundary. This alignment is defined by the PAYLOAD_ALIGNMENT constant:
PAYLOAD_ALIGN_LOG2 = 6
PAYLOAD_ALIGNMENT = 1 << 6 = 64 bytes
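The 64-byte alignment matches the cache-line size common on modern x86-64 and many ARM CPUs. Starting every payload on a cache-line boundary lets SIMD code (AVX, AVX-512, SVE) use aligned vector loads, and a load of up to 64 bytes taken at a multiple of its own width from the payload start never straddles two cache lines.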
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md:51-59
Alignment Calculation
The pre-padding required to align a payload is computed by the prepad_len() function in DataStore from offset, the file offset immediately after the previous entry's metadata (the prev_tail). The calculation ensures that:
- Payloads always start at multiples of PAYLOAD_ALIGNMENT (64 bytes)
- Pre-padding ranges from 0 to 63 bytes
- The calculation works for any power-of-two alignment value
Diagram: prepad_len() calculation with examples
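One formula with exactly these properties is sketched below. It is an illustration consistent with the description, assuming the constants listed earlier (PAYLOAD_ALIGN_LOG2 = 6, with the integer types chosen here as an assumption); the crate's own prepad_len() may be written differently.

```rust
// Constants as described above (integer types are an assumption).
pub const PAYLOAD_ALIGN_LOG2: u32 = 6;
pub const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2; // 64

// Compile-time guard: the masking step below requires a power of two.
const _: () = assert!(PAYLOAD_ALIGNMENT.is_power_of_two());

/// Zero bytes needed so the next payload starts on an aligned boundary.
/// `offset` is the file offset right after the previous entry's metadata.
pub fn prepad_len(offset: u64) -> usize {
    let a = PAYLOAD_ALIGNMENT;
    // (a - offset % a) lies in 1..=a; masking with (a - 1) maps a to 0.
    ((a - (offset % a)) & (a - 1)) as usize
}
```

For example, a previous tail at offset 100 gives 100 % 64 = 36, so 28 bytes of padding are written and the payload starts at offset 128; a tail already at 128 needs no padding.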
During write operations, the storage engine:
- Calls prepad_len(prev_tail) to determine the padding needed
- Writes zero bytes for the pre-padding region (if pad > 0)
- Writes the payload starting at the aligned boundary
- Writes metadata immediately after the payload
This is implemented in the write path at src/storage_engine/data_store.rs:765-771 and src/storage_engine/data_store.rs:908-914 for streaming and batch writes respectively.
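The four write steps can be modelled on an in-memory buffer. This is a hypothetical sketch: `append_entry` is not a crate function, the real write path targets a BufWriter<File> rather than a Vec<u8>, and metadata is passed in as opaque bytes because its layout is covered in Entry Structure and Metadata.

```rust
use std::ops::Range;

const PAYLOAD_ALIGNMENT: u64 = 64;

fn prepad_len(offset: u64) -> usize {
    let a = PAYLOAD_ALIGNMENT;
    ((a - (offset % a)) & (a - 1)) as usize
}

/// Hypothetical append: padding, payload, metadata. Returns the payload range.
pub fn append_entry(file: &mut Vec<u8>, payload: &[u8], metadata: &[u8]) -> Range<usize> {
    let prev_tail = file.len() as u64;
    // 1. Determine how much padding is needed.
    let pad = prepad_len(prev_tail);
    // 2. Write zero bytes for the pre-padding region, if any.
    if pad > 0 {
        file.resize(file.len() + pad, 0);
    }
    // 3. Write the payload at the aligned boundary.
    let start = file.len();
    debug_assert_eq!(start as u64 % PAYLOAD_ALIGNMENT, 0);
    file.extend_from_slice(payload);
    let end = file.len();
    // 4. Write metadata immediately after the payload.
    file.extend_from_slice(metadata);
    start..end
}
```

Because padding is computed from the tail left by the previous entry's metadata, each payload lands on a 64-byte boundary regardless of the sizes that came before it.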
Sources: src/storage_engine/data_store.rs:670-673 src/storage_engine/data_store.rs:765-771 README.md:112-124
Alignment Validation
The crate provides debug-only assertions to verify alignment invariants. These are defined in the simd-r-drive-entry-handle crate and used throughout the storage engine:
Assertion functions:
- debug_assert_aligned(ptr: *const u8, align: usize): Validates pointer alignment
- debug_assert_aligned_offset(offset: u64): Validates file offset alignment to PAYLOAD_ALIGNMENT
These assertions:
- Are compiled to no-ops in release builds (zero runtime cost)
- Execute in debug and test builds to catch alignment violations
- Panic with descriptive error messages if alignment is incorrect
Diagram: debug_assert_aligned_offset() execution flow
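A minimal sketch of how such helpers can be built on debug_assert!. The signatures follow the list above, but the bodies and panic messages are assumptions, not the crate's source.

```rust
/// Payload alignment in bytes (value from the section above).
pub const PAYLOAD_ALIGNMENT: u64 = 64;

/// Debug-only check that `ptr` is aligned to `align`, which must be a power of two.
#[inline]
pub fn debug_assert_aligned(ptr: *const u8, align: usize) {
    debug_assert!(align.is_power_of_two(), "alignment {} is not a power of two", align);
    debug_assert!(
        (ptr as usize) & (align - 1) == 0,
        "pointer {:p} is not {}-byte aligned",
        ptr,
        align
    );
}

/// Debug-only check that a file offset is a multiple of PAYLOAD_ALIGNMENT.
#[inline]
pub fn debug_assert_aligned_offset(offset: u64) {
    debug_assert!(
        offset % PAYLOAD_ALIGNMENT == 0,
        "offset {} is not a multiple of {}",
        offset,
        PAYLOAD_ALIGNMENT
    );
}
```

Since debug_assert! expands to a check guarded by cfg!(debug_assertions), both helpers are optimized away in release builds while still panicking with a descriptive message in debug and test builds.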
The assertions are used at key points in the codebase:
- During entry construction in read_entry_with_context(): src/storage_engine/data_store.rs:555-558
- During recovery in recover_valid_chain(): src/storage_engine/data_store.rs:410-413
- During parallel iteration in par_iter_entries(): src/storage_engine/data_store.rs:350-353
- In Arrow Buffer creation (when the arrow feature is enabled): simd-r-drive-entry-handle/src/entry_handle.rs:391-400 and simd-r-drive-entry-handle/src/entry_handle.rs:431-440
These assertions ensure that the alignment guarantees are maintained throughout the system, catching bugs early during development without impacting production performance.
Sources: src/storage_engine/data_store.rs:20-21 src/storage_engine/data_store.rs:350-353 src/storage_engine/data_store.rs:410-413 src/storage_engine/data_store.rs:555-558 simd-r-drive-entry-handle/src/entry_handle.rs:391-400
Typed Slice Access and Reinterpretation
Due to the 64-byte payload alignment guarantee, EntryHandle payloads can often be reinterpreted as typed slices without copying. The alignment is sufficient for common types:
| Type | Size | Alignment Requirement | Guaranteed by 64-byte boundary? |
|---|---|---|---|
u8 | 1 | 1 | ✓ |
u16 | 2 | 2 | ✓ |
u32 | 4 | 4 | ✓ |
f32 | 4 | 4 | ✓ |
u64 | 8 | 8 | ✓ |
f64 | 8 | 8 | ✓ |
u128 | 16 | 16 | ✓ |
Diagram: Typed slice reinterpretation decision flow
Example: Zero-Copy f32 Array Access
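A minimal std-only sketch of the idea. `as_f32_slice` is a hypothetical helper, not part of the crate's API; it checks pointer alignment and length before reinterpreting, and `AlignedBuf` (also an assumption for the example) emulates a payload that starts on a 64-byte boundary.

```rust
use std::mem::{align_of, size_of};

/// Emulates a payload starting on a 64-byte boundary (assumption for the example).
#[repr(C, align(64))]
pub struct AlignedBuf<const N: usize>(pub [u8; N]);

/// Hypothetical helper: view bytes as f32 values without copying.
/// Returns None if the pointer is not 4-byte aligned or the length is not a
/// multiple of 4. Values are read in native byte order, so little-endian
/// payloads decode correctly only on little-endian hosts.
pub fn as_f32_slice(bytes: &[u8]) -> Option<&[f32]> {
    let misaligned = bytes.as_ptr() as usize % align_of::<f32>() != 0;
    if misaligned || bytes.len() % size_of::<f32>() != 0 {
        return None;
    }
    // SAFETY: alignment and length were checked above, every bit pattern is a
    // valid f32, and the result borrows `bytes`, so it cannot outlive the data.
    Some(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr().cast::<f32>(), bytes.len() / size_of::<f32>())
    })
}
```

The pointer check matters because a 64-byte aligned file offset yields a 64-byte aligned address only when the mapping's base address is itself aligned; memory mappings start on page boundaries, which satisfies this.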
The 64-byte alignment boundary combined with standard type alignments means zero-copy reinterpretation typically succeeds for common numeric types, as long as:
- The payload was written in the expected binary format (e.g., little-endian)
- The payload length is an exact multiple of the type’s size
Apache Arrow Integration
When the arrow feature is enabled, EntryHandle can be converted to an arrow::buffer::Buffer with zero copying, either with as_arrow_buffer() or by consuming the handle with into_arrow_buffer().
This enables efficient integration with Apache Arrow’s columnar memory format without data copying. The Buffer keeps the Arc<EntryHandle> alive, which in turn keeps the underlying Arc<Mmap> valid.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/entry_handle.rs:385-404
Memory Sharing and Concurrency
The Arc<Mmap> design enables efficient memory sharing across threads while maintaining safety:
Diagram: Concurrent zero-copy reads via Arc sharing
```mermaid
sequenceDiagram
    participant T1 as "Thread 1"
    participant T2 as "Thread 2"
    participant DS as "DataStore"
    participant MmapMutex as "Mutex<Arc<Mmap>>"
    participant MmapArc as "Arc<Mmap>"
    participant File as "Physical File"
    Note over DS: DataStore.mmap at<br/>src/storage_engine/data_store.rs:29
    Note over DS: DataStore.get_mmap_arc() at<br/>src/storage_engine/data_store.rs:658-663
    T1->>DS: read(b"key1")
    DS->>MmapMutex: lock()
    MmapMutex->>MmapArc: Arc::clone()
    MmapMutex-->>DS: Arc<Mmap>
    DS->>DS: create EntryHandle
    DS-->>T1: EntryHandle { mmap_arc, range, metadata }
    par Concurrent Read
        T2->>DS: read(b"key2")
        DS->>MmapMutex: lock()
        MmapMutex->>MmapArc: Arc::clone()
        MmapMutex-->>DS: Arc<Mmap>
        DS->>DS: create EntryHandle
        DS-->>T2: EntryHandle { mmap_arc, range, metadata }
    end
    T1->>T1: entry.as_slice()
    Note over T1,MmapArc: Zero-copy access:<br/>&mmap_arc[range.clone()]
    T1->>File: Read memory pages (OS manages)
    T2->>T2: entry.as_slice()
    Note over T2,MmapArc: Both threads read<br/>same physical pages
    T2->>File: Read memory pages (shared)
    Note over T1,T2: No locks held during data access
```
Concurrency characteristics:
| Aspect | Implementation | Benefit |
|---|---|---|
| Lock-free reads | Once EntryHandle holds Arc<Mmap>, no further locks needed | Scales to thousands of concurrent readers |
| Write serialization | Writers acquire RwLock<BufWriter<File>> at src/storage_engine/data_store.rs:28 | Prevents write conflicts |
| Safe remapping | Mutex<Arc<Mmap>> at src/storage_engine/data_store.rs:29 protects remap operations | Old readers keep valid mmap reference |
| Memory efficiency | All EntryHandle instances share same Mmap physical pages | Minimal memory overhead per reader |
| Graceful updates | Old Arc<Mmap> valid until all handles dropped | No reader invalidation during writes |
The private get_mmap_arc() method implements a clone-and-release pattern: it locks the mutex, clones the inner Arc<Mmap>, and releases the lock before returning the clone.
This pattern ensures:
- The lock is held only long enough to clone the Arc
- Readers do not block each other while accessing payload data; they contend only for the brief clone
- Even if a write remaps the file, existing EntryHandle instances retain their valid mmap view
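The clone-and-release pattern can be sketched with std types. Here `get_mmap_arc` is a free function modelled on the description (the crate's version is a private DataStore method), `Arc<Vec<u8>>` stands in for `Arc<Mmap>`, and `parallel_sums` is a hypothetical workload. Each thread holds the lock only for the clone, then reads its snapshot with no lock held.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Clone-and-release: lock, clone the inner Arc, drop the guard on return.
/// Arc<Vec<u8>> stands in for Arc<Mmap> (assumption).
pub fn get_mmap_arc(mmap: &Mutex<Arc<Vec<u8>>>) -> Arc<Vec<u8>> {
    let guard = mmap.lock().expect("mmap mutex poisoned");
    Arc::clone(&*guard)
}

/// Hypothetical workload: each thread grabs a snapshot, then reads it lock-free.
pub fn parallel_sums(mmap: &Mutex<Arc<Vec<u8>>>, readers: usize) -> Vec<u64> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..readers)
            .map(|_| {
                s.spawn(move || {
                    let snapshot = get_mmap_arc(mmap); // lock held only for the clone
                    snapshot.iter().map(|&b| u64::from(b)).sum::<u64>() // no lock here
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Once all threads finish, their snapshots are dropped and only the reference held inside the mutex remains.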
Sources: src/storage_engine/data_store.rs:28-29 src/storage_engine/data_store.rs:658-663 README.md:172-206
Datasets Larger Than RAM
Memory mapping enables efficient access to storage files that exceed available physical memory:
- OS-managed paging: The operating system handles paging, loading only accessed regions into physical memory
- Transparent access: Application code uses the same EntryHandle API regardless of file size
- Efficient random access: Jumping between distant file offsets does not require explicit seeking
- Memory pressure handling: Unused pages are evicted by the OS when memory is needed elsewhere
For example, a storage file containing 100 GB of data on a system with 16 GB of RAM can be accessed efficiently. Only the portions actively being read will occupy physical memory, with the OS managing page faults transparently.
This design makes SIMD R Drive suitable for:
- Large-scale data analytics workloads
- Multi-gigabyte append-only logs
- Embedded databases on resource-constrained systems
- Archive storage with infrequent random access patterns
Sources: README.md:43-49 README.md:147-148