Write and Read Modes
Purpose and Scope
This document describes the three write modes and three read modes available in SIMD R Drive, implemented through the DataStoreWriter and DataStoreReader traits. Each mode provides specific operational characteristics, performance trade-offs, and concurrency behaviors suited to different workload patterns.
Write Modes (via DataStoreWriter trait):
- Single Entry : Atomic single key-value write with immediate flush
- Batch Entry : Multiple key-value pairs written in one atomic operation
- Streaming : Incremental write from a Read source for large payloads
Read Modes (via DataStoreReader trait):
- Direct Access : Zero-copy memory-mapped lookups returning EntryHandle
- Streaming : Buffered incremental reads via EntryStream implementing std::io::Read
- Parallel Iteration : Rayon-powered multi-threaded dataset scanning
For information about the underlying SIMD acceleration that optimizes these operations, see page 5.1. For details about the alignment strategy that enables efficient reads, see page 5.2.
Sources: README.md:29-36 README.md:208-246 src/traits.rs src/storage_engine/data_store.rs:752-1182
Write Modes
SIMD R Drive provides three distinct write modes, each optimized for different usage patterns. All write modes acquire an exclusive write lock (RwLock<BufWriter<File>>) to ensure thread safety and data consistency.
Single Entry Write
The single entry write mode writes one key-value pair atomically and flushes immediately to disk.
Trait: DataStoreWriter src/traits.rs
Primary Methods:
- write(key: &[u8], payload: &[u8]) -> Result<u64> src/storage_engine/data_store.rs:827-830
- write_with_key_hash(key_hash: u64, payload: &[u8]) -> Result<u64> src/storage_engine/data_store.rs:832-834
Both methods internally delegate to batch_write_with_key_hashes() with a single-element vector src/storage_engine/data_store.rs:833
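A sketch of that delegation, using the signatures documented on this page (the body is inferred rather than copied from the source, and Result is assumed to be std::io::Result):

```rust
impl DataStore {
    // Illustrative only: a single write is expressed as a one-element batch,
    // reusing the batch code path (the real code is at
    // src/storage_engine/data_store.rs:832-834).
    pub fn write_with_key_hash(&self, key_hash: u64, payload: &[u8]) -> std::io::Result<u64> {
        // allow_null_bytes = false: a plain write must not look like a tombstone.
        self.batch_write_with_key_hashes(vec![(key_hash, payload)], false)
    }
}
```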
Operation Flow:
Title: Single Entry Write Call Chain
Characteristics:
- Latency : Lowest for single operations (immediate flush)
- Throughput : Lower due to per-write overhead
- Disk I/O : One flush operation per write
- Use Case : Interactive operations, real-time updates, critical writes requiring immediate durability
Key Implementation Details:
| Component | Code Entity | Location |
|---|---|---|
| Hash computation | compute_hash(key) | src/storage_engine/digest.rs |
| Alignment calculation | DataStore::prepad_len(offset: u64) | src/storage_engine/data_store.rs:670-673 |
| Checksum | compute_checksum(payload) using CRC32C | src/storage_engine/digest.rs |
| Memory copy | simd_copy(dest, src) | src/storage_engine/mod.rs |
| Metadata | EntryMetadata::serialize() | simd-r-drive-entry-handle/src/lib.rs |
| Reindex | DataStore::reindex() | src/storage_engine/data_store.rs:224-259 |
Sources: README.md:212-215 src/storage_engine/data_store.rs:827-834 src/storage_engine/data_store.rs:847-939
Batch Write
Batch write mode writes multiple key-value pairs in a single atomic operation, flushing only once at the end.
Trait: DataStoreWriter src/traits.rs
Primary Methods:
- batch_write(entries: &[(&[u8], &[u8])]) -> Result<u64> src/storage_engine/data_store.rs:838-843
- batch_write_with_key_hashes(prehashed_keys: Vec<(u64, &[u8])>, allow_null_bytes: bool) -> Result<u64> src/storage_engine/data_store.rs:847-951
The batch_write() method computes hashes using compute_hash_batch() and delegates to batch_write_with_key_hashes() src/storage_engine/data_store.rs:839-842
Operation Flow:
Title: Batch Write Buffer Construction and Flush
Characteristics:
- Latency : Higher per-entry latency (amortized)
- Throughput : Significantly higher due to reduced disk I/O
- Disk I/O : Single flush for entire batch
- Memory : Builds entries in an in-memory buffer before writing src/storage_engine/data_store.rs:857-898
- Use Case : Bulk imports, batch processing, high-throughput ingestion
Performance Optimization:
The batch implementation constructs all entries in a single Vec<u8> buffer before any disk I/O src/storage_engine/data_store.rs:857-858, minimizing lock contention and maximizing sequential write performance.
Key optimization points:
- Single lock acquisition for entire batch src/storage_engine/data_store.rs:852-855
- Single write_all() call for all entries src/storage_engine/data_store.rs:933
- Single flush() call at end src/storage_engine/data_store.rs:934
- Single reindex() call updating all offsets atomically src/storage_engine/data_store.rs:936
- SIMD-accelerated copy via simd_copy() src/storage_engine/data_store.rs:925
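As a rough illustration of this single-buffer pattern (a sketch only; the real entry encoding, alignment pre-padding, checksums, and SIMD copy are elided):

```rust
use std::io::Write;

// Stage every entry in one Vec<u8>, then hit the disk once. The per-entry
// encoding below is a placeholder, not the actual on-disk format.
fn flush_batch<W: Write>(file: &mut W, entries: &[(u64, &[u8])]) -> std::io::Result<()> {
    let mut buf: Vec<u8> = Vec::new();
    for (key_hash, payload) in entries {
        buf.extend_from_slice(payload);                 // payload bytes
        buf.extend_from_slice(&key_hash.to_le_bytes()); // stand-in metadata
    }
    file.write_all(&buf)?; // single write_all() for the whole batch
    file.flush()           // single flush at the end
}
```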
Tombstone Support:
Batch writes support deletion markers when allow_null_bytes is true src/storage_engine/data_store.rs:850:
- Tombstones write a single NULL_BYTE (0x00) src/storage_engine/data_store.rs:889
- No pre-padding applied to tombstones src/storage_engine/data_store.rs:872
- Metadata links to previous tail as usual src/storage_engine/data_store.rs:873-886
- Used internally by delete() and batch_delete() operations
Sources: README.md:216-219 src/storage_engine/data_store.rs:838-951 benches/storage_benchmark.rs:85-92
Streaming Write
Streaming write mode writes large payloads incrementally from a Read source without requiring full in-memory buffering.
Trait: DataStoreWriter src/traits.rs
Primary Methods:
- write_stream<R: Read>(key: &[u8], reader: &mut R) -> Result<u64> src/storage_engine/data_store.rs:753-756
- write_stream_with_key_hash<R: Read>(key_hash: u64, reader: &mut R) -> Result<u64> src/storage_engine/data_store.rs:758-825
The generic R: Read parameter allows streaming from any Read implementation: files, network streams, in-memory buffers, etc.
Operation Flow:
Title: Streaming Write Incremental Read Loop
Characteristics:
- Memory Footprint : Constant (4096-byte buffer) src/storage_engine/constants.rs
- Payload Size : Unbounded (supports arbitrarily large entries)
- Disk I/O : Incremental writes, single flush at end
- Use Case : Large file storage, network streams, memory-constrained environments
Implementation Details:
The streaming write uses a fixed-size buffer and performs incremental writes while computing the checksum:
| Component | Code Entity | Size/Type | Location |
|---|---|---|---|
| Read Buffer | vec![0; WRITE_STREAM_BUFFER_SIZE] | 4096 bytes | src/storage_engine/data_store.rs:773 |
| Checksum State | crc32fast::Hasher | Incremental CRC32C | src/storage_engine/data_store.rs:775 |
| Pre-pad | Written via &pad[..prepad] | 0-63 bytes | src/storage_engine/data_store.rs:769-770 |
| Metadata | EntryMetadata::serialize() | 20 bytes | src/storage_engine/data_store.rs:808-813 |
| Buffer constant | WRITE_STREAM_BUFFER_SIZE | 4096 | src/storage_engine/constants.rs |
Validation:
The implementation validates payloads before writing:
- Empty payload check : Returns error if total_written == 0 src/storage_engine/data_store.rs:799-804
- NULL-byte-only check : Returns error if payload contains only NULL_BYTE (0x00) src/storage_engine/data_store.rs:792-797
- Both checks prevent conflicts with the tombstone format (single NULL byte)
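The core loop can be pictured like this (a simplified sketch; the real routine also handles pre-padding, metadata serialization, and the validation above):

```rust
use std::io::{Read, Write};

// Incremental copy with a fixed 4096-byte buffer and a streaming checksum.
fn stream_in<R: Read, W: Write>(reader: &mut R, file: &mut W) -> std::io::Result<(u64, u32)> {
    let mut buf = vec![0u8; 4096]; // WRITE_STREAM_BUFFER_SIZE
    let mut hasher = crc32fast::Hasher::new();
    let mut total_written = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // source exhausted
        }
        hasher.update(&buf[..n]);   // checksum computed as bytes flow through
        file.write_all(&buf[..n])?; // incremental write, constant memory
        total_written += n as u64;
    }
    Ok((total_written, hasher.finalize()))
}
```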
Sources: README.md:220-223 src/storage_engine/data_store.rs:753-825 src/storage_engine/constants.rs
Write Mode Comparison
Performance Table:
| Write Mode | Lock Duration | Disk Flushes | Memory Usage | Best For |
|---|---|---|---|---|
| Single | Short (per write) | 1 per write | Minimal | Interactive operations, real-time updates |
| Batch | Medium (entire batch) | 1 per batch | Buffer size × entries | Bulk imports, high throughput |
| Streaming | Long (entire stream) | 1 per stream | 4096 bytes (constant) | Large files, memory-constrained |
Throughput Characteristics:
Based on benchmark results benches/storage_benchmark.rs:52-83.
Sources: benches/storage_benchmark.rs:52-92 README.md:208-223
Read Modes
SIMD R Drive provides three read modes optimized for different access patterns. All read modes leverage zero-copy access through memory-mapped files.
Direct Read
Direct read mode provides immediate, zero-copy access to stored entries through EntryHandle.
Trait: DataStoreReader src/traits.rs
Primary Methods:
- read(key: &[u8]) -> Result<Option<EntryHandle>> src/storage_engine/data_store.rs:1040-1049
- read_with_key_hash(prehashed_key: u64) -> Result<Option<EntryHandle>> src/storage_engine/data_store.rs:1051-1059
- batch_read(keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>> src/storage_engine/data_store.rs:1105-1109
- batch_read_hashed_keys(prehashed_keys: &[u64], non_hashed_keys: Option<&[&[u8]]>) -> Result<Vec<Option<EntryHandle>>> src/storage_engine/data_store.rs:1111-1158
- exists(key: &[u8]) -> Result<bool> src/storage_engine/data_store.rs:1030-1032
Operation Flow:
Title: Direct Read Index Lookup and EntryHandle Construction
Characteristics:
- Latency : Minimal (single hash lookup + pointer arithmetic)
- Memory : Zero-copy (returns view into mmap)
- Concurrency : Lock-free reads (except brief index lock)
- Use Case : Random access, key-value lookups, real-time queries
Zero-Copy Guarantee:
The EntryHandle struct provides direct memory-mapped access without payload copying:
EntryHandle Structure simd-r-drive-entry-handle/src/lib.rs:
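The authoritative definition lives in that file; the sketch below infers a plausible shape from the fields this page references (field names are not guaranteed):

```rust
use std::ops::{Deref, Range};
use std::sync::Arc;
use memmap2::Mmap;

pub struct EntryMetadata { /* key hash, prev offset, checksum, ... */ }

// Inferred shape; see simd-r-drive-entry-handle/src/lib.rs for the real one.
pub struct EntryHandle {
    mmap_arc: Arc<Mmap>,     // shared view of the mapped file
    range: Range<usize>,     // payload start/end offsets within the mmap
    metadata: EntryMetadata, // deserialized once during construction
}

impl Deref for EntryHandle {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // Zero-copy: a slice directly into the memory map.
        &self.mmap_arc[self.range.clone()]
    }
}
```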
Key characteristics:
- Implements Deref<Target = [u8]> for direct slice access simd-r-drive-entry-handle/src/lib.rs
- Arc<Mmap> allows concurrent readers with the same data view
- Metadata deserialized once during construction src/storage_engine/data_store.rs:529
- No heap allocation for payload data
Batch Read Optimization:
The batch_read() method optimizes multiple key lookups src/storage_engine/data_store.rs:1105-1158:
- Single lock acquisition for entire batch src/storage_engine/data_store.rs:1117-1120
- Single get_mmap_arc() call shared across all entries src/storage_engine/data_store.rs:1116
- Vectorized hash computation via compute_hash_batch() src/storage_engine/data_store.rs:1106
- Iterator-based result collection src/storage_engine/data_store.rs:1134-1154
Returns Vec<Option<EntryHandle>> in the same order as the input keys.
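For example, a hedged sketch (crate import paths and store setup are assumed; batch_read matches the signature listed above):

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // assumed paths

// Fetch several keys under one lock acquisition; results align with input order.
fn read_many(store: &DataStore) -> std::io::Result<()> {
    let keys: Vec<&[u8]> = vec![b"user:1", b"user:2", b"user:3"];
    for (key, entry) in keys.iter().zip(store.batch_read(&keys)?) {
        match entry {
            Some(handle) => println!("{:?}: {} bytes", key, handle.len()),
            None => println!("{:?}: missing", key),
        }
    }
    Ok(())
}
```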
Sources: README.md:228-233 src/storage_engine/data_store.rs:502-565 src/storage_engine/data_store.rs:1105-1158 simd-r-drive-entry-handle/src/lib.rs
Streaming Read
Streaming read mode provides incremental, buffered access to large entries without loading them fully into memory.
Primary Structure:
- EntryStream struct src/storage_engine/entry_stream.rs
- Implements std::io::Read trait src/storage_engine/entry_stream.rs
- Constructed via EntryStream::from(entry_handle: EntryHandle) src/storage_engine/entry_stream.rs
Operation Flow:
Title: EntryStream Buffered Read Implementation
Characteristics:
- Memory Footprint : 8192-byte internal buffer src/storage_engine/entry_stream.rs
- Copy Behavior : Non-zero-copy (copies from mmap to buffer)
- Payload Size : Supports arbitrarily large entries
- Use Case : Processing large entries incrementally, network transmission, streaming transformations
Implementation Details:
| Component | Code Entity | Size/Type |
|---|---|---|
| Internal buffer | inner_buffer: Vec<u8> | 8192 bytes |
| Position tracker | position: usize | Tracks read progress in mmap |
| Source data | EntryHandle.mmap_arc | Zero-copy reference to mmap |
| Payload bounds | EntryHandle.range | Start/end offsets |
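A simplified sketch of such a buffered Read implementation (the real type at src/storage_engine/entry_stream.rs keeps the 8192-byte inner buffer listed above; this version inlines the copy for brevity, and field names are inferred):

```rust
use std::io::Read;
use std::ops::Range;
use std::sync::Arc;
use memmap2::Mmap;

pub struct EntryStream {
    mmap_arc: Arc<Mmap>, // zero-copy reference to the mapped file
    range: Range<usize>, // payload bounds
    position: usize,     // read progress within the payload
}

impl Read for EntryStream {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let payload = &self.mmap_arc[self.range.clone()];
        let remaining = payload.len() - self.position;
        let n = remaining.min(buf.len());
        // The buffered, non-zero-copy step: bytes move from mmap to buf.
        buf[..n].copy_from_slice(&payload[self.position..self.position + n]);
        self.position += n;
        Ok(n) // 0 signals end-of-stream once the payload is exhausted
    }
}
```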
Why Non-Zero-Copy:
Despite sourcing from mmap, EntryStream performs buffered copies:
- Controlled memory pressure : Constant 8KB footprint regardless of payload size
- std::io::Read compatibility : Standard interface for streaming I/O
- Incremental processing : Avoids loading multi-GB payloads into memory
- OS page management : Reduces page faults for sequential access patterns
Zero-Copy Alternative:
For true zero-copy access, use direct read mode: storage.read(key)?.map(|handle| handle.as_slice()). This provides &[u8] directly from mmap without intermediate copying.
Sources: README.md:234-241 src/storage_engine/entry_stream.rs
Parallel Iteration
Parallel iteration mode uses Rayon to process all valid entries across multiple threads (requires parallel feature).
Primary Methods:
- DataStore::iter_entries() -> EntryIterator src/storage_engine/data_store.rs:276-280
- DataStore::par_iter_entries() -> impl ParallelIterator<Item = EntryHandle> src/storage_engine/data_store.rs:296-361 (requires parallel feature)
- DataStore::into_iter() -> EntryIterator via IntoIterator impl src/storage_engine/data_store.rs:44-51
Related Structures:
- EntryIterator struct src/storage_engine/entry_iterator.rs
- Rayon's ParallelIterator trait [external dependency]
Operation Flow:
Title: Parallel Iteration Work Distribution
Characteristics:
- Throughput : Scales with CPU cores
- Concurrency : Work-stealing via Rayon
- Memory : Minimal overhead (offsets collected upfront)
- Use Case : Bulk analytics, dataset scanning, cache warming, batch transformations
Implementation Strategy:
The parallel iterator minimizes lock contention via upfront collection src/storage_engine/data_store.rs:296-361:
Phase 1: Preparation (Sequential, Locked)
1. Acquire read lock on RwLock<KeyIndexer> src/storage_engine/data_store.rs:300
2. Collect all packed values into a Vec<u64> src/storage_engine/data_store.rs:301
3. Release lock immediately src/storage_engine/data_store.rs:302
4. Clone Arc<Mmap> once, moved into the iterator src/storage_engine/data_store.rs:305
Phase 2: Processing (Parallel, Lock-Free)
5. Convert to Rayon parallel iterator: into_par_iter() src/storage_engine/data_store.rs:310
6. Each worker thread processes a subset of offsets src/storage_engine/data_store.rs:310-360:
- Unpacks (tag, offset) via KeyIndexer::unpack() src/storage_engine/data_store.rs:311
- Validates bounds: offset + METADATA_SIZE <= mmap_arc.len() src/storage_engine/data_store.rs:317-319
- Deserializes metadata: EntryMetadata::deserialize() src/storage_engine/data_store.rs:322
- Derives entry start from prev_offset + prepad_len() src/storage_engine/data_store.rs:328
- Filters tombstones (single NULL byte) src/storage_engine/data_store.rs:346-348
- Constructs EntryHandle with cloned mmap_arc src/storage_engine/data_store.rs:355-359
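In outline, the two-phase pattern looks like this (a sketch under assumed types; the KeyIndexer, unpacking, and EntryHandle construction are replaced by stand-ins):

```rust
use rayon::prelude::*;
use std::sync::{Arc, RwLock};
use memmap2::Mmap;

// Phase 1: copy packed values out under a brief shared read lock, then drop it.
// Phase 2: process entries in parallel with no locks held.
fn parallel_scan(index: &RwLock<Vec<u64>>, mmap: &Arc<Mmap>) -> usize {
    let packed: Vec<u64> = index.read().unwrap().clone(); // lock dropped here
    let mmap = Arc::clone(mmap); // cloned once, moved into the iterator

    packed
        .into_par_iter() // Rayon work-stealing across CPU cores
        .filter(move |&offset| {
            // Stand-in for the documented bounds check, metadata decode,
            // tombstone filter, and EntryHandle construction.
            (offset as usize) < mmap.len()
        })
        .count()
}
```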
Work Stealing: Rayon’s work-stealing scheduler automatically balances load across available CPU cores. No manual thread management required.
Sequential Iteration:
For single-threaded scanning, use iter_entries(), which returns EntryIterator src/storage_engine/data_store.rs:276-280. The DataStore also implements IntoIterator src/storage_engine/data_store.rs:44-51.
Sources: README.md:242-246 src/storage_engine/data_store.rs:276-361 benches/storage_benchmark.rs:98-118
Read Mode Comparison
Performance Table:
| Read Mode | Access Pattern | Memory Copy | Concurrency | Best For |
|---|---|---|---|---|
| Direct | Random/lookup | Zero-copy | Lock-free | Key-value queries, random access |
| Streaming | Sequential/buffered | Buffered copy | Single reader | Large entry processing |
| Parallel | Full scan | Zero-copy | Multi-threaded | Bulk analytics, dataset scanning |
Throughput Characteristics:
Based on benchmark measurements benches/storage_benchmark.rs:
| Operation | Throughput (1M entries, 8 bytes) | Notes |
|---|---|---|
| Sequential iteration | ~millions/s | Zero-copy, cache-friendly |
| Random single reads | ~1M reads/s | Hash lookup + bounds check |
| Batch reads | ~1M reads/s | Vectorized index access |
Sources: benches/storage_benchmark.rs:98-203 README.md:224-246
Performance Optimization Strategies
Write Optimization
Batching Strategy:
- Group writes into batches of 1024-10000 entries for optimal throughput
- Balance batch size against latency requirements
- Use streaming for payloads > 1 MB to avoid memory pressure
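For example (a hedged sketch; the DataStore handle, import paths, and the 4096-entry chunk size are assumptions):

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed paths

// Feed a large import through batch_write() in fixed-size chunks:
// one lock acquisition and one flush per chunk instead of per entry.
fn bulk_import(store: &DataStore, entries: &[(&[u8], &[u8])]) -> std::io::Result<()> {
    for chunk in entries.chunks(4096) {
        store.batch_write(chunk)?; // returns the new tail offset
    }
    Ok(())
}
```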
Lock Contention:
- All writes acquire the same RwLock<BufWriter<File>>
- Increase batch size to amortize lock overhead
- Consider application-level write queuing for highly concurrent workloads
Read Optimization
Access Pattern Matching:
| Access Pattern | Recommended Mode | Reason |
|---|---|---|
| Random lookups | Direct read | O(1) hash lookup, zero-copy |
| Known key sets | Batch read | Amortized lock overhead |
| Full dataset scan | Sequential iteration | Cache-friendly forward traversal |
| Parallel analytics | Parallel iteration | Scales with CPU cores |
| Large entry processing | Streaming read | Constant memory footprint |
Memory-Mapped File Behavior:
The OS manages mmap pages transparently:
- Working set : Only accessed regions loaded into RAM
- Large datasets : Can exceed available RAM (pages swapped on demand)
- Cache warming : Sequential iteration benefits from read-ahead
- Random access : May trigger page faults (disk I/O) on cold reads
Sources: benches/storage_benchmark.rs README.md:43-50
Concurrency Considerations
Write Concurrency
All write modes acquire the same exclusive lock and are mutually exclusive:
Title: Write Lock Contention Model
Lock acquisition: self.file.write() src/storage_engine/data_store.rs:759-762 src/storage_engine/data_store.rs:852-855
Implication: High write concurrency benefits from:
- Using batch_write() to amortize lock overhead
- Application-level write queuing/buffering
- Accepting increased per-write latency for higher throughput
Read Concurrency
Read operations acquire brief read locks and access shared Arc<Mmap> references:
Title: Read Lock Independence Model
```mermaid
graph TB
    read_thread1["Thread 1: read()"]
    read_thread2["Thread 2: batch_read()"]
    read_thread3["Thread 3: par_iter_entries()"]
    index_lock["Arc<RwLock<KeyIndexer>>\nself.key_indexer.read() - Shared"]
    get_mmap_arc["self.get_mmap_arc()"]
    mmap_lock["Mutex<Arc<Mmap>>"]
    mmap_arc["Arc<Mmap> (cloned)"]
    read_thread1 --> get_mmap_arc
    read_thread2 --> get_mmap_arc
    read_thread3 --> get_mmap_arc
    get_mmap_arc --> mmap_lock
    mmap_lock -->|brief lock| mmap_arc
    read_thread1 --> index_lock
    read_thread2 --> index_lock
    read_thread3 --> index_lock
    index_lock -->|concurrent read access| lookup["KeyIndexer::get_packed()"]
    subgraph "Lock-Free Phase"
        mmap_access["Direct &[u8] access via Arc<Mmap>"]
        entry_handle["Construct EntryHandle"]
    end
    mmap_arc --> mmap_access
    lookup --> entry_handle
    mmap_access --> entry_handle
    subgraph "Write Thread (Independent)"
        write_thread["Thread 4: write()"]
        file_lock_write["Arc<RwLock<BufWriter<File>>>.write()"]
        reindex_call["reindex()"]
        new_mmap["Creates new Arc<Mmap>"]
    end
    write_thread --> file_lock_write
    file_lock_write --> reindex_call
    reindex_call -.->|Replaces in Mutex| new_mmap
    new_mmap -.->|Old Arc<Mmap> refs remain valid| mmap_arc
```
Key characteristics:
- Multiple readers : Concurrent reads via shared RwLock read locks
- Lock independence : Write lock (RwLock<BufWriter>) separate from read locks (RwLock<KeyIndexer>)
- Mmap stability : Readers hold Arc<Mmap> clones; writes create a new mmap but old references remain valid
- Eventual consistency : New reads see updates after reindex() completes src/storage_engine/data_store.rs:224-259
Sources: README.md:170-207 src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:658-663
Usage Examples
Write Mode Selection
Single Write (Real-Time Updates):
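A hedged sketch (import paths and DataStore setup are assumed; write() matches the signature documented above, with Result assumed to be std::io::Result):

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed paths

// One atomic write, flushed before returning: durable as soon as it succeeds.
fn record_event(store: &DataStore, key: &[u8], payload: &[u8]) -> std::io::Result<u64> {
    store.write(key, payload) // returns the new tail offset
}
```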
Batch Write (Bulk Import):
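A sketch under the same assumptions; batch_write() takes all pairs in one atomic operation:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed paths

// All pairs share a single lock acquisition and a single flush.
fn import_users(store: &DataStore) -> std::io::Result<u64> {
    let entries: Vec<(&[u8], &[u8])> = vec![
        (&b"user:1"[..], &b"alice"[..]),
        (&b"user:2"[..], &b"bob"[..]),
    ];
    store.batch_write(&entries)
}
```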
Streaming Write (Large Files):
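A sketch under the same assumptions; any std::io::Read source works:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed paths
use std::fs::File;

// Stream a large file into the store through the constant 4096-byte buffer.
fn store_file(store: &DataStore, key: &[u8], path: &str) -> std::io::Result<u64> {
    let mut file = File::open(path)?;
    store.write_stream(key, &mut file)
}
```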
Read Mode Selection
Direct Read (Key Lookup):
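A sketch under the same assumptions; the returned handle derefs to the mmap-backed payload:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // assumed paths

// Zero-copy lookup: no payload bytes are copied out of the memory map.
fn payload_len(store: &DataStore, key: &[u8]) -> std::io::Result<Option<usize>> {
    Ok(store.read(key)?.map(|handle| handle.len()))
}
```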
Streaming Read (Large Entry Processing):
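A sketch under the same assumptions; EntryStream wraps the handle in a buffered std::io::Read:

```rust
use simd_r_drive::{DataStore, DataStoreReader, EntryStream}; // assumed paths
use std::io::Read;

// Process a large entry chunk-by-chunk with constant memory.
fn count_nonzero_bytes(store: &DataStore, key: &[u8]) -> std::io::Result<u64> {
    let Some(handle) = store.read(key)? else { return Ok(0) };
    let mut stream = EntryStream::from(handle); // buffered std::io::Read
    let mut buf = [0u8; 4096];
    let mut count = 0u64;
    loop {
        let n = stream.read(&mut buf)?;
        if n == 0 { break; }
        count += buf[..n].iter().filter(|&&b| b != 0).count() as u64;
    }
    Ok(count)
}
```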
Parallel Iteration (Dataset Analytics):
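A sketch under the same assumptions; requires the parallel feature:

```rust
use simd_r_drive::DataStore; // assumed path
use rayon::prelude::*;

// Scan every valid entry across all CPU cores and sum payload sizes.
fn total_payload_bytes(store: &DataStore) -> usize {
    store
        .par_iter_entries() // offsets collected under a brief lock, then lock-free
        .map(|handle| handle.len()) // EntryHandle derefs to &[u8]
        .sum()
}
```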
Sources: README.md src/storage_engine/data_store.rs benches/storage_benchmark.rs