
Write and Read Modes

Purpose and Scope

This document describes the three write modes and three read modes available in SIMD R Drive, implemented through the DataStoreWriter and DataStoreReader traits. Each mode provides specific operational characteristics, performance trade-offs, and concurrency behaviors suited to different workload patterns.

Write Modes (via DataStoreWriter trait):

  • Single Entry : Atomic single key-value write with immediate flush
  • Batch Entry : Multiple key-value pairs written in one atomic operation
  • Streaming : Incremental write from Read source for large payloads

Read Modes (via DataStoreReader trait):

  • Direct Access : Zero-copy memory-mapped lookups returning EntryHandle
  • Streaming : Buffered incremental reads via EntryStream implementing std::io::Read
  • Parallel Iteration : Rayon-powered multi-threaded dataset scanning

For information about the underlying SIMD acceleration that optimizes these operations, see page 5.1. For details about the alignment strategy that enables efficient reads, see page 5.2.

Sources: README.md:29-36 README.md:208-246 src/traits.rs src/storage_engine/data_store.rs:752-1182


Write Modes

SIMD R Drive provides three distinct write modes, each optimized for different usage patterns. All write modes acquire an exclusive write lock (RwLock<BufWriter<File>>) to ensure thread safety and data consistency.

Single Entry Write

The single entry write mode writes one key-value pair atomically and flushes immediately to disk.

Trait: DataStoreWriter src/traits.rs

Primary Methods:
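The method bodies are not reproduced on this page; the following is a hedged sketch of the two single-entry methods, with names and return types inferred from the delegation note below:

```rust
use std::io;

// Hedged sketch of the single-entry portion of DataStoreWriter; the
// real trait lives in src/traits.rs, and the offset return type is
// an assumption.
pub trait SingleWriteSketch {
    /// Writes one key-value pair, flushes, and returns the new tail offset.
    fn write(&self, key: &[u8], payload: &[u8]) -> io::Result<u64>;

    /// Variant taking a precomputed key hash, skipping compute_hash(key).
    fn write_with_key_hash(&self, key_hash: u64, payload: &[u8]) -> io::Result<u64>;
}
```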

Both methods internally delegate to batch_write_with_key_hashes() with a single-element vector src/storage_engine/data_store.rs:833.

Operation Flow:

Title: Single Entry Write Call Chain

Characteristics:

  • Latency : Lowest for single operations (immediate flush)
  • Throughput : Lower due to per-write overhead
  • Disk I/O : One flush operation per write
  • Use Case : Interactive operations, real-time updates, critical writes requiring immediate durability

Key Implementation Details:

| Component | Code Entity | Location |
|---|---|---|
| Hash computation | compute_hash(key) | src/storage_engine/digest.rs |
| Alignment calculation | DataStore::prepad_len(offset: u64) | src/storage_engine/data_store.rs:670-673 |
| Checksum | compute_checksum(payload) using CRC32C | src/storage_engine/digest.rs |
| Memory copy | simd_copy(dest, src) | src/storage_engine/mod.rs |
| Metadata | EntryMetadata::serialize() | simd-r-drive-entry-handle/src/lib.rs |
| Reindex | DataStore::reindex() | src/storage_engine/data_store.rs:224-259 |

Sources: README.md:212-215 src/storage_engine/data_store.rs:827-834 src/storage_engine/data_store.rs:847-939


Batch Write

Batch write mode writes multiple key-value pairs in a single atomic operation, flushing only once at the end.

Trait: DataStoreWriter src/traits.rs

Primary Methods:
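A hedged sketch of the batch methods, with signatures inferred from the surrounding text (the placement of the allow_null_bytes flag is an assumption):

```rust
use std::io;

// Hedged sketch of the batch portion of DataStoreWriter; the real
// trait lives in src/traits.rs.
pub trait BatchWriteSketch {
    /// Hashes every key via compute_hash_batch(), then delegates.
    fn batch_write(&self, entries: &[(&[u8], &[u8])]) -> io::Result<u64>;

    /// Lower-level entry point taking precomputed key hashes; when
    /// allow_null_bytes is true, null-byte payloads act as deletion markers.
    fn batch_write_with_key_hashes(
        &self,
        prehashed: Vec<(u64, &[u8])>,
        allow_null_bytes: bool,
    ) -> io::Result<u64>;
}
```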

The batch_write() method computes hashes using compute_hash_batch() and delegates to batch_write_with_key_hashes() src/storage_engine/data_store.rs:839-842

Operation Flow:

Title: Batch Write Buffer Construction and Flush

Characteristics:

  • Latency : Higher end-to-end (one flush for the whole batch), but lower amortized cost per entry
  • Throughput : Significantly higher due to reduced disk I/O
  • Disk I/O : Single flush for the entire batch
  • Memory : Builds all entries in an in-memory buffer before writing src/storage_engine/data_store.rs:857-898
  • Use Case : Bulk imports, batch processing, high-throughput ingestion

Performance Optimization:

The batch implementation constructs all entries in a single Vec<u8> buffer before any disk I/O src/storage_engine/data_store.rs:857-858, minimizing lock contention and maximizing sequential write performance.

Key optimization points:

  1. Single lock acquisition for the entire batch src/storage_engine/data_store.rs:852-855
  2. Single write_all() call for all entries src/storage_engine/data_store.rs:933
  3. Single flush() call at the end src/storage_engine/data_store.rs:934
  4. Single reindex() call updating all offsets atomically src/storage_engine/data_store.rs:936
  5. SIMD-accelerated copy via simd_copy() src/storage_engine/data_store.rs:925

Tombstone Support:

Batch writes support deletion markers (tombstones) when allow_null_bytes is true src/storage_engine/data_store.rs:850.

Sources: README.md:216-219 src/storage_engine/data_store.rs:838-951 benches/storage_benchmark.rs:85-92


Streaming Write

Streaming write mode writes large payloads incrementally from a Read source without requiring full in-memory buffering.

Trait: DataStoreWriter src/traits.rs

Primary Methods:
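A hedged sketch of the streaming entry point; the method name write_stream is an assumption, while the generic R: Read bound comes from the note below:

```rust
use std::io::{self, Read};

// Hedged sketch of the streaming portion of DataStoreWriter; the real
// trait lives in src/traits.rs.
pub trait StreamWriteSketch {
    /// Writes a payload incrementally from any Read source, returning
    /// the new tail offset (assumed).
    fn write_stream<R: Read>(&self, key: &[u8], reader: &mut R) -> io::Result<u64>;
}
```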

The generic R: Read parameter allows streaming from any Read implementation: files, network streams, in-memory buffers, etc.

Operation Flow:

Title: Streaming Write Incremental Read Loop

Characteristics:

  • Memory Footprint : Constant (4096-byte buffer) src/storage_engine/constants.rs
  • Payload Size : Unbounded (supports arbitrarily large entries)
  • Disk I/O : Incremental writes, single flush at end
  • Use Case : Large file storage, network streams, memory-constrained environments

Implementation Details:

The streaming write uses a fixed-size buffer and performs incremental writes while computing the checksum:

| Component | Code Entity | Size/Type | Location |
|---|---|---|---|
| Read buffer | vec![0; WRITE_STREAM_BUFFER_SIZE] | 4096 bytes | src/storage_engine/data_store.rs:773 |
| Checksum state | crc32fast::Hasher | Incremental CRC32C | src/storage_engine/data_store.rs:775 |
| Pre-pad | Written via &pad[..prepad] | 0-63 bytes | src/storage_engine/data_store.rs:769-770 |
| Metadata | EntryMetadata::serialize() | 20 bytes | src/storage_engine/data_store.rs:808-813 |
| Buffer constant | WRITE_STREAM_BUFFER_SIZE | 4096 | src/storage_engine/constants.rs |
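A minimal sketch of that incremental loop, assuming only what the table lists (a fixed 4096-byte buffer and an incremental crc32fast::Hasher); pre-padding, metadata serialization, and reindexing are omitted, so this is illustrative rather than the crate's actual code:

```rust
use std::io::{Read, Write};

const WRITE_STREAM_BUFFER_SIZE: usize = 4096;

// Illustrative inner loop only: read a chunk, feed it to the checksum
// state, write it out, repeat until EOF. Returns (bytes, checksum).
fn stream_payload<R: Read, W: Write>(mut src: R, dst: &mut W) -> std::io::Result<(u64, u32)> {
    let mut buf = vec![0u8; WRITE_STREAM_BUFFER_SIZE];
    let mut hasher = crc32fast::Hasher::new();
    let mut total: u64 = 0;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        hasher.update(&buf[..n]); // incremental checksum update
        dst.write_all(&buf[..n])?; // incremental write, no full buffering
        total += n as u64;
    }
    Ok((total, hasher.finalize()))
}
```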

Validation:

The implementation validates payloads before writing:
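A minimal sketch of the kind of guard implied here, assuming zero-length payloads are rejected so they cannot masquerade as deletion markers; the exact checks and error messages are assumptions:

```rust
use std::io::{Error, ErrorKind};

// Hedged sketch; the real checks live in the streaming write path.
fn validate_payload_len(len: u64) -> std::io::Result<()> {
    if len == 0 {
        // Assumed rule: empty payloads are invalid writes.
        return Err(Error::new(ErrorKind::InvalidInput, "empty payload"));
    }
    Ok(())
}
```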

Sources: README.md:220-223 src/storage_engine/data_store.rs:753-825 src/storage_engine/constants.rs


Write Mode Comparison

Performance Table:

| Write Mode | Lock Duration | Disk Flushes | Memory Usage | Best For |
|---|---|---|---|---|
| Single | Short (per write) | 1 per write | Minimal | Interactive operations, real-time updates |
| Batch | Medium (entire batch) | 1 per batch | Buffer size × entries | Bulk imports, high throughput |
| Streaming | Long (entire stream) | 1 per stream | 4096 bytes (constant) | Large files, memory-constrained |

Throughput Characteristics:

Based on benchmark results benches/storage_benchmark.rs:52-83, batch writes deliver significantly higher throughput than equivalent sequences of single writes, since disk flushes and lock acquisitions are amortized across the batch.

Sources: benches/storage_benchmark.rs:52-92 README.md:208-223


Read Modes

SIMD R Drive provides three read modes optimized for different access patterns. All read modes leverage zero-copy access through memory-mapped files.

Direct Read

Direct read mode provides immediate, zero-copy access to stored entries through EntryHandle.

Trait: DataStoreReader src/traits.rs

Primary Methods:
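A hedged sketch of the direct-access methods; read matches the read(key)?.map(|handle| handle.as_slice()) idiom shown later on this page, and batch_read is described below:

```rust
use std::io;
use simd_r_drive_entry_handle::EntryHandle; // crate path assumed

// Hedged sketch of the direct-access portion of DataStoreReader; the
// real trait lives in src/traits.rs.
pub trait DirectReadSketch {
    /// Zero-copy lookup; Ok(None) when the key is absent or deleted.
    fn read(&self, key: &[u8]) -> io::Result<Option<EntryHandle>>;

    /// Batched lookup; results are returned in input order.
    fn batch_read(&self, keys: &[&[u8]]) -> io::Result<Vec<Option<EntryHandle>>>;
}
```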

Operation Flow:

Title: Direct Read Index Lookup and EntryHandle Construction

Characteristics:

  • Latency : Minimal (single hash lookup + pointer arithmetic)
  • Memory : Zero-copy (returns view into mmap)
  • Concurrency : Lock-free reads (except brief index lock)
  • Use Case : Random access, key-value lookups, real-time queries

Zero-Copy Guarantee:

The EntryHandle struct provides direct memory-mapped access without payload copying:

EntryHandle Structure simd-r-drive-entry-handle/src/lib.rs:
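A hedged reconstruction from the fields this page names (mmap_arc, range); the authoritative definition in simd-r-drive-entry-handle/src/lib.rs also carries entry metadata:

```rust
use std::{ops::Range, sync::Arc};
use memmap2::Mmap;

// Hedged sketch, not the crate's actual definition.
pub struct EntryHandleSketch {
    mmap_arc: Arc<Mmap>, // keeps the mapping alive for the handle's lifetime
    range: Range<usize>, // payload bounds within the mmap
}

impl EntryHandleSketch {
    /// Zero-copy view: a borrow into the mmap, valid while self lives.
    pub fn as_slice(&self) -> &[u8] {
        &self.mmap_arc[self.range.clone()]
    }
}
```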

Key characteristics: the handle performs no locking after construction, holds an Arc<Mmap> clone that keeps the mapping alive even if a concurrent write installs a new mmap, and exposes the payload as &[u8] via as_slice() without copying.

Batch Read Optimization:

The batch_read() method optimizes multiple key lookups src/storage_engine/data_store.rs:1105-1158:

  1. Single lock acquisition for the entire batch src/storage_engine/data_store.rs:1117-1120
  2. Single get_mmap_arc() call shared across all entries src/storage_engine/data_store.rs:1116
  3. Vectorized hash computation via compute_hash_batch() src/storage_engine/data_store.rs:1106
  4. Iterator-based result collection src/storage_engine/data_store.rs:1134-1154

Returns Vec<Option<EntryHandle>> in the same order as the input keys.
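A hedged usage sketch of that ordering guarantee (import paths assumed):

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

// One index lock and one Arc<Mmap> clone serve the whole batch;
// results come back in input order.
fn lookup_many(store: &DataStore) -> std::io::Result<()> {
    let keys: Vec<&[u8]> = vec![b"a".as_slice(), b"b".as_slice(), b"missing".as_slice()];
    let results = store.batch_read(&keys)?;
    for (key, entry) in keys.iter().zip(results) {
        match entry {
            Some(handle) => println!("{:?} -> {} bytes", key, handle.as_slice().len()),
            None => println!("{:?} -> not found", key),
        }
    }
    Ok(())
}
```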

Sources: README.md:228-233 src/storage_engine/data_store.rs:502-565 src/storage_engine/data_store.rs:1105-1158 simd-r-drive-entry-handle/src/lib.rs


Streaming Read

Streaming read mode provides incremental, buffered access to large entries without loading them fully into memory.

Primary Structure:
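A hedged reconstruction of EntryStream from the field table below; the field holding the source entry is an assumption, and the real type in src/storage_engine/entry_stream.rs implements std::io::Read:

```rust
use simd_r_drive_entry_handle::EntryHandle; // crate path assumed

// Hedged sketch, not the crate's actual definition.
pub struct EntryStreamSketch {
    handle: EntryHandle,   // source entry: mmap_arc + payload range
    inner_buffer: Vec<u8>, // 8192-byte staging buffer
    position: usize,       // bytes already consumed from the payload
}
```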

Operation Flow:

Title: EntryStream Buffered Read Implementation

Characteristics:

  • Memory Footprint : 8192-byte internal buffer src/storage_engine/entry_stream.rs
  • Copy Behavior : Non-zero-copy (copies from mmap to buffer)
  • Payload Size : Supports arbitrarily large entries
  • Use Case : Processing large entries incrementally, network transmission, streaming transformations

Implementation Details:

| Component | Code Entity | Size/Type |
|---|---|---|
| Internal buffer | inner_buffer: Vec<u8> | 8192 bytes |
| Position tracker | position: usize | Tracks read progress in mmap |
| Source data | EntryHandle.mmap_arc | Zero-copy reference to mmap |
| Payload bounds | EntryHandle.range | Start/end offsets |

Why Non-Zero-Copy:

Despite sourcing from mmap, EntryStream performs buffered copies:

  1. Controlled memory pressure : Constant 8KB footprint regardless of payload size
  2. std::io::Read compatibility: Standard interface for streaming I/O
  3. Incremental processing : Avoids loading multi-GB payloads into memory
  4. OS page management : Reduces page faults for sequential access patterns

Zero-Copy Alternative:

For true zero-copy access, use direct read mode: storage.read(key)?.map(|handle| handle.as_slice()). This provides &[u8] directly from mmap without intermediate copying.

Sources: README.md:234-241 src/storage_engine/entry_stream.rs


Parallel Iteration

Parallel iteration mode uses Rayon to process all valid entries across multiple threads (requires parallel feature).

Primary Methods:
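The exact signatures are not reproduced here; a hedged sketch of the entry point named on this page, with the concrete return type elided to an impl Trait for illustration:

```rust
use rayon::iter::ParallelIterator;
use simd_r_drive_entry_handle::EntryHandle; // crate path assumed

// Hedged sketch; gated behind the `parallel` feature in the real crate.
pub trait ParallelIterationSketch {
    /// Yields every valid (non-tombstoned) entry across worker threads.
    fn par_iter_entries(&self) -> impl ParallelIterator<Item = EntryHandle>;
}
```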

Related Structures:

Operation Flow:

Title: Parallel Iteration Work Distribution

Characteristics:

  • Throughput : Scales with CPU cores
  • Concurrency : Work-stealing via Rayon
  • Memory : Minimal overhead (offsets collected upfront)
  • Use Case : Bulk analytics, dataset scanning, cache warming, batch transformations

Implementation Strategy:

The parallel iterator minimizes lock contention via upfront collection src/storage_engine/data_store.rs:296-361:

Phase 1: Preparation (Sequential, Locked)

  1. Acquire read lock on RwLock<KeyIndexer> src/storage_engine/data_store.rs:300
  2. Collect all packed values: Vec<u64> src/storage_engine/data_store.rs:301
  3. Release the lock immediately src/storage_engine/data_store.rs:302
  4. Clone Arc<Mmap> once, moved into the iterator src/storage_engine/data_store.rs:305

Phase 2: Processing (Parallel, Lock-Free)

  5. Convert to a Rayon parallel iterator via into_par_iter() src/storage_engine/data_store.rs:310
  6. Each worker thread processes a subset of offsets src/storage_engine/data_store.rs:310-360

Work Stealing: Rayon’s work-stealing scheduler automatically balances load across available CPU cores. No manual thread management required.

Sequential Iteration:

For single-threaded scanning, use iter_entries(), which returns EntryIterator src/storage_engine/data_store.rs:276-280. The DataStore also implements IntoIterator src/storage_engine/data_store.rs:44-51.
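A hedged usage sketch of a sequential scan (import paths assumed; as_slice() is the zero-copy accessor from the direct-read section):

```rust
use simd_r_drive::DataStore; // import path assumed

// Single-threaded, cache-friendly forward scan over all valid entries.
fn scan_all(store: &DataStore) -> (usize, usize) {
    let mut count = 0;
    let mut bytes = 0;
    for entry in store.iter_entries() {
        count += 1;
        bytes += entry.as_slice().len(); // zero-copy view into the mmap
    }
    (count, bytes)
}
```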

Sources: README.md:242-246 src/storage_engine/data_store.rs:276-361 benches/storage_benchmark.rs:98-118


Read Mode Comparison

Performance Table:

| Read Mode | Access Pattern | Memory Copy | Concurrency | Best For |
|---|---|---|---|---|
| Direct | Random/lookup | Zero-copy | Lock-free | Key-value queries, random access |
| Streaming | Sequential/buffered | Buffered copy | Single reader | Large entry processing |
| Parallel | Full scan | Zero-copy | Multi-threaded | Bulk analytics, dataset scanning |

Throughput Characteristics:

Based on benchmark measurements benches/storage_benchmark.rs:

| Operation | Throughput (1M entries, 8 bytes) | Notes |
|---|---|---|
| Sequential iteration | ~millions/s | Zero-copy, cache-friendly |
| Random single reads | ~1M reads/s | Hash lookup + bounds check |
| Batch reads | ~1M reads/s | Vectorized index access |

Sources: benches/storage_benchmark.rs:98-203 README.md:224-246


Performance Optimization Strategies

Write Optimization

Batching Strategy:

  • Group writes into batches of 1024-10000 entries for optimal throughput (see the chunking sketch after this list)
  • Balance batch size against latency requirements
  • Use streaming for payloads > 1 MB to avoid memory pressure
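A minimal sketch of the chunked-batching strategy above, assuming the batch_write signature from earlier sections; the batch size of 4096 is an arbitrary point in the suggested range:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

const BATCH_SIZE: usize = 4096; // arbitrary point in the 1024-10000 range

// Split a large import into bounded batches: one lock acquisition and
// one flush per chunk, with buffer memory capped at BATCH_SIZE entries.
fn import_all(store: &DataStore, pairs: &[(Vec<u8>, Vec<u8>)]) -> std::io::Result<()> {
    for chunk in pairs.chunks(BATCH_SIZE) {
        let batch: Vec<(&[u8], &[u8])> = chunk
            .iter()
            .map(|(k, v)| (k.as_slice(), v.as_slice()))
            .collect();
        store.batch_write(&batch)?;
    }
    Ok(())
}
```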

Lock Contention:

  • All writes acquire the same RwLock<BufWriter<File>>
  • Increase batch size to amortize lock overhead
  • Consider application-level write queuing for highly concurrent workloads

Read Optimization

Access Pattern Matching:

| Access Pattern | Recommended Mode | Reason |
|---|---|---|
| Random lookups | Direct read | O(1) hash lookup, zero-copy |
| Known key sets | Batch read | Amortized lock overhead |
| Full dataset scan | Sequential iteration | Cache-friendly forward traversal |
| Parallel analytics | Parallel iteration | Scales with CPU cores |
| Large entry processing | Streaming read | Constant memory footprint |

Memory-Mapped File Behavior:

The OS manages mmap pages transparently:

  • Working set : Only accessed regions loaded into RAM
  • Large datasets : Can exceed available RAM (pages swapped on demand)
  • Cache warming : Sequential iteration benefits from read-ahead
  • Random access : May trigger page faults (disk I/O) on cold reads

Sources: benches/storage_benchmark.rs README.md:43-50


Concurrency Considerations

Write Concurrency

All write modes acquire the same exclusive lock, so they are mutually exclusive:

Title: Write Lock Contention Model

Lock acquisition: self.file.write() src/storage_engine/data_store.rs:759-762 src/storage_engine/data_store.rs:852-855

Implication: High write concurrency benefits from:

  • Using batch_write() to amortize lock overhead
  • Application-level write queuing/buffering
  • Accepting increased per-write latency for higher throughput
Read Concurrency

Read operations acquire brief read locks and access shared Arc<Mmap> references:

Title: Read Lock Independence Model

```mermaid
graph TB
    read_thread1["Thread 1: read()"]
    read_thread2["Thread 2: batch_read()"]
    read_thread3["Thread 3: par_iter_entries()"]
    index_lock["Arc<RwLock<KeyIndexer>>\nself.key_indexer.read() - Shared"]
    get_mmap_arc["self.get_mmap_arc()"]
    mmap_lock["Mutex<Arc<Mmap>>"]
    mmap_arc["Arc<Mmap> (cloned)"]

    read_thread1 --> get_mmap_arc
    read_thread2 --> get_mmap_arc
    read_thread3 --> get_mmap_arc

    get_mmap_arc --> mmap_lock
    mmap_lock -->|brief lock| mmap_arc

    read_thread1 --> index_lock
    read_thread2 --> index_lock
    read_thread3 --> index_lock

    index_lock -->|concurrent read access| lookup["KeyIndexer::get_packed()"]

    subgraph "Lock-Free Phase"
        mmap_access["Direct &[u8] access via Arc<Mmap>"]
        entry_handle["Construct EntryHandle"]
    end

    mmap_arc --> mmap_access
    lookup --> entry_handle
    mmap_access --> entry_handle

    subgraph "Write Thread (Independent)"
        write_thread["Thread 4: write()"]
        file_lock_write["Arc<RwLock<BufWriter<File>>>.write()"]
        reindex_call["reindex()"]
        new_mmap["Creates new Arc<Mmap>"]
    end

    write_thread --> file_lock_write
    file_lock_write --> reindex_call
    reindex_call -.->|Replaces in Mutex| new_mmap
    new_mmap -.->|Old Arc<Mmap> refs remain valid| mmap_arc
```

Key characteristics:

  • Multiple readers : Concurrent reads via shared RwLock read locks
  • Lock independence : Write lock (RwLock<BufWriter>) separate from read locks (RwLock<KeyIndexer>)
  • Mmap stability : Readers hold Arc<Mmap> clones; writes create new mmap but old references remain valid
  • Eventual consistency : New reads see updates after reindex() completes src/storage_engine/data_store.rs:224-259

Sources: README.md:170-207 src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:658-663


Usage Examples

Write Mode Selection

Single Write (Real-Time Updates):
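A minimal sketch, assuming the write method and import paths from the sections above:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

// One atomic write with an immediate flush: durable once it returns.
fn record_reading(store: &DataStore) -> std::io::Result<()> {
    store.write(b"sensor:temp", b"21.5")?;
    Ok(())
}
```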

Batch Write (Bulk Import):
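A minimal sketch under the same assumptions:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

// Many pairs, one lock acquisition, one flush at the end.
fn import_users(store: &DataStore) -> std::io::Result<()> {
    let entries: Vec<(&[u8], &[u8])> = vec![
        (b"user:1".as_slice(), b"alice".as_slice()),
        (b"user:2".as_slice(), b"bob".as_slice()),
    ];
    store.batch_write(&entries)?;
    Ok(())
}
```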

Streaming Write (Large Files):
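A minimal sketch; the method name write_stream is an assumption:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed
use std::fs::File;

// Stream a large file into the store in 4096-byte chunks, without
// buffering the whole payload in memory.
fn store_video(store: &DataStore) -> std::io::Result<()> {
    let mut file = File::open("movie.mkv")?;
    store.write_stream(b"video:movie", &mut file)?;
    Ok(())
}
```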

Read Mode Selection

Direct Read (Key Lookup):
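A minimal sketch, mirroring the read(key)?.map(|handle| handle.as_slice()) idiom shown earlier:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

// Zero-copy lookup: as_slice() borrows straight from the mmap.
fn lookup(store: &DataStore) -> std::io::Result<()> {
    if let Some(handle) = store.read(b"user:1")? {
        println!("found {} bytes", handle.as_slice().len());
    }
    Ok(())
}
```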

Streaming Read (Large Entry Processing):
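A minimal sketch; the EntryStream::from(handle) conversion is an assumption, while the std::io::Read impl comes from the streaming-read section:

```rust
use simd_r_drive::{DataStore, DataStoreReader, EntryStream}; // paths assumed
use std::io::Read;

// Consume a large entry in 8 KB steps instead of loading it whole.
fn measure_large_entry(store: &DataStore) -> std::io::Result<u64> {
    let handle = store.read(b"video:movie")?.expect("entry exists");
    let mut stream = EntryStream::from(handle);
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break;
        }
        total += n as u64; // process buf[..n] here
    }
    Ok(total)
}
```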

Parallel Iteration (Dataset Analytics):
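A minimal sketch (requires the parallel feature; import paths assumed):

```rust
use rayon::iter::ParallelIterator;
use simd_r_drive::{DataStore, DataStoreReader}; // paths assumed

// Aggregate over every valid entry on all cores via Rayon work stealing.
fn total_payload_bytes(store: &DataStore) -> usize {
    store
        .par_iter_entries()
        .map(|entry| entry.as_slice().len())
        .sum()
}
```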

Sources: README.md src/storage_engine/data_store.rs benches/storage_benchmark.rs
