This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Write and Read Modes

Purpose and Scope

This document describes the different operation modes available in SIMD R Drive for writing and reading data. Each mode is optimized for specific use cases, offering different trade-offs between memory usage, I/O overhead, and concurrency. For information about SIMD acceleration used within these operations, see SIMD Acceleration. For details on payload alignment requirements, see Payload Alignment and Cache Efficiency.


Write Operation Modes

SIMD R Drive provides three distinct write modes, each optimized for different scenarios. All write operations acquire a write lock on the underlying file to ensure consistency.

Single Entry Write

The write() method writes a single key-value pair atomically and flushes it out of the write buffer immediately.

Method Signature: write(&self, key: &[u8], payload: &[u8]) -> Result<u64>

Characteristics:

  • Holds the write lock on RwLock<BufWriter<File>> for the entire operation
  • Writes are flushed immediately via file.flush()
  • Each write performs file remapping and index update
  • Suitable for individual, isolated write operations
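
The ordering in the list above can be pictured with a small, std-only Rust sketch. Everything in it is hypothetical: ToyStore is not the crate's type, a Cursor<Vec<u8>> stands in for the File, no metadata is written, and remapping is reduced to a comment. It only illustrates the per-call sequence: take the write lock, write, flush, update the index.

```rust
use std::collections::HashMap;
use std::io::{BufWriter, Cursor, Write};
use std::sync::RwLock;

/// Hypothetical toy store: a buffered writer behind an RwLock plus a key -> offset index.
/// `Cursor<Vec<u8>>` stands in for the real `File`; metadata and remapping are omitted.
struct ToyStore {
    writer: RwLock<BufWriter<Cursor<Vec<u8>>>>,
    index: RwLock<HashMap<Vec<u8>, u64>>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore {
            writer: RwLock::new(BufWriter::new(Cursor::new(Vec::new()))),
            index: RwLock::new(HashMap::new()),
        }
    }

    /// Writes one payload and returns the offset where it starts (this sketch's choice
    /// of return value, not necessarily what the crate's `u64` means).
    fn write(&self, key: &[u8], payload: &[u8]) -> std::io::Result<u64> {
        // The write lock is held for the entire operation.
        let mut w = self.writer.write().unwrap();
        // Logical end of data: bytes already in the sink plus bytes still buffered.
        let offset = w.get_ref().position() + w.buffer().len() as u64;
        w.write_all(payload)?;
        // One flush per call: the cost that batch_write() amortizes.
        w.flush()?;
        // Index update; the real engine also remaps the file at this point.
        self.index.write().unwrap().insert(key.to_vec(), offset);
        Ok(offset)
    }
}

fn main() -> std::io::Result<()> {
    let store = ToyStore::new();
    let a = store.write(b"alpha", b"hello")?;
    let b = store.write(b"beta", b"world!")?;
    println!("alpha at {a}, beta at {b}"); // alpha at 0, beta at 5
    Ok(())
}
```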

Sources: src/storage_engine/data_store.rs:827-834 src/storage_engine/data_store.rs:832-834

Batch Entry Write

The batch_write() method writes multiple key-value pairs in a single locked operation, reducing disk I/O overhead.

Method Signature: batch_write(&self, entries: &[(&[u8], &[u8])]) -> Result<u64>

Characteristics:

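  • Acquires the write lock on RwLock<BufWriter<File>> once for the entire batch
  • All entries are buffered in memory before writing
  • Single file.flush() at end of batch
  • Single remapping and index update operation
  • More efficient than repeated write() calls for bulk data, because locking, flushing, remapping, and reindexing happen once per batch instead of once per entry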

Internal Process:

Step | Operation                                  | Lock Held
---- | ------------------------------------------ | ---------
1    | Hash all keys with compute_hash_batch()    | No
2    | Acquire write lock                         | Yes
3    | Build in-memory buffer with all entries    | Yes
4    | Calculate alignment padding for each entry | Yes
5    | Copy payloads using simd_copy()            | Yes
6    | Append all metadata                        | Yes
7    | Write entire buffer with file.write_all()  | Yes
8    | Flush with file.flush()                    | Yes
9    | Call reindex() once                        | Yes
10   | Release write lock                         | No
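
A std-only sketch of the steps in the table, with several labeled substitutions: hash_keys() uses std's DefaultHasher in place of compute_hash_batch(), ALIGN = 16 is an assumed alignment rather than the crate's constant, extend_from_slice() replaces simd_copy(), and per-entry metadata (step 6) is skipped. ToyStore and its fields are hypothetical. What it shows is the shape of the operation: one lock, one contiguous buffer with alignment padding, one write_all(), one flush(), one index pass.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::io::{BufWriter, Cursor, Write};
use std::sync::RwLock;

/// Hypothetical payload alignment; the crate's real constant may differ.
const ALIGN: u64 = 16;

/// Number of zero bytes needed to move `offset` up to the next multiple of ALIGN.
fn padding_for(offset: u64) -> u64 {
    (ALIGN - offset % ALIGN) % ALIGN
}

/// Stand-in for compute_hash_batch(): std's DefaultHasher, not the crate's hash.
fn hash_keys(keys: &[&[u8]]) -> Vec<u64> {
    keys.iter()
        .map(|k| {
            let mut h = DefaultHasher::new();
            k.hash(&mut h);
            h.finish()
        })
        .collect()
}

struct ToyStore {
    writer: RwLock<BufWriter<Cursor<Vec<u8>>>>,
    index: RwLock<HashMap<u64, u64>>, // key hash -> payload offset
}

impl ToyStore {
    fn new() -> Self {
        ToyStore {
            writer: RwLock::new(BufWriter::new(Cursor::new(Vec::new()))),
            index: RwLock::new(HashMap::new()),
        }
    }

    /// Writes every entry under one lock, with one write_all() and one flush().
    fn batch_write(&self, entries: &[(&[u8], &[u8])]) -> std::io::Result<Vec<u64>> {
        // Step 1: hash all keys before taking the lock.
        let keys: Vec<&[u8]> = entries.iter().map(|(k, _)| *k).collect();
        let hashes = hash_keys(&keys);

        // Step 2: a single write-lock acquisition for the whole batch.
        let mut w = self.writer.write().unwrap();
        let base = w.get_ref().position() + w.buffer().len() as u64;

        // Steps 3-5: one contiguous buffer, padded so each payload starts aligned.
        let mut buf: Vec<u8> = Vec::new();
        let mut offsets = Vec::with_capacity(entries.len());
        for (_, payload) in entries {
            let pad = padding_for(base + buf.len() as u64) as usize;
            buf.resize(buf.len() + pad, 0);
            offsets.push(base + buf.len() as u64);
            buf.extend_from_slice(payload); // the crate uses simd_copy() here
            // Step 6 (per-entry metadata) is omitted in this sketch.
        }

        // Steps 7-8: one write_all() and one flush() for the whole batch.
        w.write_all(&buf)?;
        w.flush()?;

        // Step 9: a single index-update pass (stand-in for reindex()).
        let mut index = self.index.write().unwrap();
        for (h, off) in hashes.iter().zip(&offsets) {
            index.insert(*h, *off);
        }
        Ok(offsets)
        // Step 10: both locks are released when the guards go out of scope.
    }
}

fn main() -> std::io::Result<()> {
    let store = ToyStore::new();
    let entries = [(&b"a"[..], &b"hello"[..]), (&b"b"[..], &b"abc"[..])];
    let offsets = store.batch_write(&entries)?;
    println!("{offsets:?}"); // [0, 16]: "hello" ends at 5, so 11 padding bytes precede "abc"
    Ok(())
}
```

The outer `% ALIGN` in padding_for() matters: without it, an offset that is already aligned would receive a full ALIGN bytes of padding instead of zero.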

Sources: src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939 README.md:216-218

Streaming Write

The write_stream() method writes large data entries from a streaming Read source without holding the full payload in memory.

Method Signature: write_stream<R: Read>(&self, key: &[u8], reader: &mut R) -> Result<u64>

Characteristics:

  • Reads data in chunks of WRITE_STREAM_BUFFER_SIZE (8192 bytes)
  • Suitable for large files or data streams
  • Only one buffer’s worth of data in memory at a time
  • Computes CRC32 checksum incrementally
  • Single file.flush() after all chunks written
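
The chunked loop with an incremental checksum can be sketched as follows. CHUNK mirrors the documented 8192-byte WRITE_STREAM_BUFFER_SIZE; the bitwise CRC-32 (IEEE polynomial) is written out by hand only to keep the example dependency-free, and the crate may use a different CRC32 implementation. Because CRC-32 state carries across calls, hashing chunk by chunk yields the same value as hashing the whole payload at once.

```rust
use std::io::{self, Read, Write};

/// Chunk size mirroring WRITE_STREAM_BUFFER_SIZE as documented (8192 bytes).
const CHUNK: usize = 8192;

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320), fed one slice at a time.
/// Start with `state = 0xFFFF_FFFF` and XOR the final state with 0xFFFF_FFFF.
fn crc32_update(mut state: u32, bytes: &[u8]) -> u32 {
    for &b in bytes {
        state ^= b as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero
            let mask = (state & 1).wrapping_neg();
            state = (state >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    state
}

/// Copies `reader` into `sink` one chunk at a time, updating the checksum as it goes.
/// Returns (bytes written, CRC-32 of the whole payload).
fn write_stream<R: Read, W: Write>(reader: &mut R, sink: &mut W) -> io::Result<(u64, u32)> {
    let mut buf = [0u8; CHUNK]; // the only payload bytes held in memory at once
    let mut state = 0xFFFF_FFFFu32;
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        state = crc32_update(state, &buf[..n]);
        sink.write_all(&buf[..n])?;
        total += n as u64;
    }
    sink.flush()?; // a single flush after the last chunk
    Ok((total, state ^ 0xFFFF_FFFF))
}

fn main() -> io::Result<()> {
    let payload: Vec<u8> = (0..20_000u32).map(|i| (i % 251) as u8).collect();
    let mut sink: Vec<u8> = Vec::new();
    let (len, crc) = write_stream(&mut payload.as_slice(), &mut sink)?;
    println!("wrote {len} bytes, crc32 = {crc:08x}");
    Ok(())
}
```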

Sources: src/storage_engine/data_store.rs:753-825 README.md:220-222 src/lib.rs:66-115


Read Operation Modes

SIMD R Drive provides multiple read modes optimized for different access patterns and performance requirements.

Direct Memory Access

The read() method retrieves stored data using zero-copy memory mapping, providing the most efficient access for individual entries.

Method Signature: read(&self, key: &[u8]) -> Result<Option<EntryHandle>>

Characteristics:

  • Zero-copy access via mmap
  • Returns EntryHandle wrapping Arc<Mmap> and byte range
  • No data copying - direct pointer into memory-mapped region
  • O(1) lookup via KeyIndexer hash table
  • Lock-free after index lookup completes
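
Zero-copy access amounts to handing out a reference-counted pointer to the mapped region together with a byte range. In this hypothetical sketch an Arc<Vec<u8>> plays the role of Arc<Mmap>, and Handle and ToyReader are illustrative names, not the crate's EntryHandle or DataStore. The lock is held only during the hash-table lookup; the returned handle needs none.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for EntryHandle: a shared buffer plus the entry's byte range.
/// `Arc<Vec<u8>>` plays the role of `Arc<Mmap>`.
struct Handle {
    map: Arc<Vec<u8>>,
    range: Range<usize>,
}

impl Handle {
    /// Borrows the payload where it lives; no bytes are copied.
    fn as_slice(&self) -> &[u8] {
        &self.map[self.range.clone()]
    }
}

struct ToyReader {
    map: Arc<Vec<u8>>,
    index: RwLock<HashMap<Vec<u8>, Range<usize>>>,
}

impl ToyReader {
    fn read(&self, key: &[u8]) -> Option<Handle> {
        // The index lock is held only for the O(1) hash-table lookup...
        let range = self.index.read().unwrap().get(key)?.clone();
        // ...after which the handle is used without any lock.
        Some(Handle { map: Arc::clone(&self.map), range })
    }
}

fn main() {
    let reader = ToyReader {
        map: Arc::new(b"helloworld".to_vec()),
        index: RwLock::new(HashMap::from([(b"a".to_vec(), 0..5), (b"b".to_vec(), 5..10)])),
    };
    if let Some(h) = reader.read(b"b") {
        println!("{}", String::from_utf8_lossy(h.as_slice())); // world
    }
}
```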

Sources: src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:502-565 README.md:228-232

Batch Read

The batch_read() method efficiently retrieves multiple entries in a single operation, minimizing lock contention.

Method Signature: batch_read(&self, keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>>

Characteristics:

  • Hashes all keys in batch using compute_hash_batch()
  • Acquires index read lock once for entire batch
  • Clones Arc<Mmap> once and reuses for all entries
  • Returns vector of optional EntryHandle objects
  • More efficient than individual read() calls

Batch Processing:

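Operation               | Complexity   | Lock Duration
----------------------- | ------------ | -------------
Hash all keys           | O(n)         | No lock
Acquire index read lock | O(1)         | Begin
Clone Arc<Mmap> once    | O(1)         | Held
Lookup each hash        | O(n) average | Held
Verify tags             | O(n)         | Held
Create handles          | O(n)         | Held
Release lock            | O(1)         | End

Here n is the number of keys in the batch.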
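
A hypothetical sketch of the same sequence: hash first, take the index read lock once, snapshot the shared buffer once, then resolve every key. DefaultHasher stands in for compute_hash_batch(), comparing stored key bytes stands in for the crate's tag verification, and ToyStore/Handle are illustrative names; all of these are assumptions of the sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::ops::Range;
use std::sync::{Arc, RwLock};

/// Stand-in key hash (std's DefaultHasher, not the crate's hash function).
fn hash_key(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

/// Hypothetical EntryHandle stand-in: shared buffer (for Arc<Mmap>) plus byte range.
struct Handle {
    map: Arc<Vec<u8>>,
    range: Range<usize>,
}

impl Handle {
    fn as_slice(&self) -> &[u8] {
        &self.map[self.range.clone()]
    }
}

struct ToyStore {
    mmap: RwLock<Arc<Vec<u8>>>, // replaced wholesale when the file is remapped
    // key hash -> (stored key, payload range); the stored key stands in for a tag check
    index: RwLock<HashMap<u64, (Vec<u8>, Range<usize>)>>,
}

impl ToyStore {
    fn from_entries(entries: &[(&[u8], &[u8])]) -> Self {
        let mut data: Vec<u8> = Vec::new();
        let mut index = HashMap::new();
        for (key, payload) in entries {
            let start = data.len();
            data.extend_from_slice(payload);
            index.insert(hash_key(key), (key.to_vec(), start..data.len()));
        }
        ToyStore { mmap: RwLock::new(Arc::new(data)), index: RwLock::new(index) }
    }

    fn batch_read(&self, keys: &[&[u8]]) -> Vec<Option<Handle>> {
        // Hash every key before touching any lock.
        let hashes: Vec<u64> = keys.iter().map(|k| hash_key(k)).collect();
        // One index read lock and one snapshot of the mapping for the whole batch.
        let index = self.index.read().unwrap();
        let map: Arc<Vec<u8>> = Arc::clone(&*self.mmap.read().unwrap());
        keys.iter()
            .zip(&hashes)
            .map(|(key, h)| match index.get(h) {
                // Reject hash collisions: the stored key must match the requested one.
                Some((stored, range)) if stored.as_slice() == *key => Some(Handle {
                    map: Arc::clone(&map), // refcount bump only, no lock taken
                    range: range.clone(),
                }),
                _ => None,
            })
            .collect()
    }
}

fn main() {
    let store = ToyStore::from_entries(&[(&b"a"[..], &b"hello"[..]), (&b"b"[..], &b"world"[..])]);
    for (i, hit) in store.batch_read(&[&b"b"[..], &b"zzz"[..]]).iter().enumerate() {
        match hit {
            Some(h) => println!("{i}: {}", String::from_utf8_lossy(h.as_slice())),
            None => println!("{i}: not found"),
        }
    }
}
```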

Sources: src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158

Streaming Read

The EntryStream wrapper provides incremental reading of large entries, avoiding high memory overhead.

Characteristics:

  • Implements std::io::Read trait
  • Reads data in configurable buffer chunks
  • Not zero-copy: each read copies data into the caller’s buffer
  • Suitable for processing large entries incrementally
  • Useful when the full entry doesn’t fit in memory
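
Because EntryStream implements std::io::Read, the caller pulls bytes into a buffer it owns. The hypothetical ToyEntryStream below is a minimal Read implementation over a shared buffer (again standing in for Arc<Mmap>); it shows where the copy happens and how Ok(0) ends the stream, not how the crate's EntryStream is actually written.

```rust
use std::io::{self, Read};
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for EntryStream: yields one entry's bytes from a shared buffer
/// (playing the role of Arc<Mmap>) into whatever buffer the caller passes to read().
struct ToyEntryStream {
    map: Arc<Vec<u8>>,
    range: Range<usize>,
    pos: usize, // absolute offset of the next unread byte
}

impl ToyEntryStream {
    fn new(map: Arc<Vec<u8>>, range: Range<usize>) -> Self {
        let pos = range.start;
        ToyEntryStream { map, range, pos }
    }
}

impl Read for ToyEntryStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = (self.range.end - self.pos).min(buf.len());
        // This copy is why streaming reads are not zero-copy.
        buf[..n].copy_from_slice(&self.map[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n) // Ok(0) tells the caller the entry is exhausted
    }
}

fn main() -> io::Result<()> {
    let map = Arc::new(b"headerPAYLOAD-BYTEStrailer".to_vec());
    let mut stream = ToyEntryStream::new(map, 6..19);
    let mut chunk = [0u8; 4]; // deliberately tiny: at most 4 bytes are copied per call
    let mut total = 0;
    loop {
        let n = stream.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        total += n; // process &chunk[..n] here
    }
    println!("streamed {total} bytes"); // streamed 13 bytes
    Ok(())
}
```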

Sources: README.md:234-240 src/lib.rs:86-92 src/storage_engine.rs:10-11

Parallel Iteration

The par_iter_entries() method provides Rayon-powered parallel iteration over all valid entries.

Method Signature (requires the parallel feature): par_iter_entries(&self) -> impl ParallelIterator<Item = EntryHandle>

Characteristics:

  • Only available with parallel feature flag
  • Uses Rayon’s parallel iterator infrastructure
  • Acquires index lock briefly to collect offsets
  • Releases lock before parallel processing begins
  • Each thread receives Arc<Mmap> clone for safe access
  • Automatically filters tombstones and duplicates
  • Ideal for bulk processing and analytics workloads
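
The collect-then-release pattern can be sketched with std::thread::scope in place of Rayon, so the example needs no external crates. ToyStore, par_map_entries(), and the empty-range tombstone marker are hypothetical; keeping only the latest range per key in the index is how this sketch models duplicate filtering, which may differ from the crate's mechanism.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical store: `Arc<Vec<u8>>` stands in for `Arc<Mmap>`. The index keeps only
/// the latest range per key, so older duplicates never reach the workers; an empty
/// range marks a tombstone (an assumption of this sketch, not the crate's encoding).
struct ToyStore {
    mmap: Arc<Vec<u8>>,
    index: RwLock<HashMap<Vec<u8>, Range<usize>>>,
}

impl ToyStore {
    /// Applies `f` to every live payload on `workers` scoped threads (in place of Rayon).
    fn par_map_entries<T, F>(&self, workers: usize, f: F) -> Vec<T>
    where
        T: Send,
        F: Fn(&[u8]) -> T + Sync,
    {
        // Hold the index lock just long enough to snapshot the live ranges.
        let ranges: Vec<Range<usize>> = {
            let index = self.index.read().unwrap();
            index.values().filter(|r| !r.is_empty()).cloned().collect()
        }; // lock released here, before any worker starts

        let workers = workers.max(1);
        let chunk = ((ranges.len() + workers - 1) / workers).max(1);
        let f = &f;
        thread::scope(|s| {
            let handles: Vec<_> = ranges
                .chunks(chunk)
                .map(|part| {
                    let map = Arc::clone(&self.mmap); // each worker owns its own Arc
                    s.spawn(move || part.iter().map(|r| f(&map[r.clone()])).collect::<Vec<T>>())
                })
                .collect();
            // Results are grouped by chunk; overall order follows HashMap iteration order.
            handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
        })
    }
}

fn main() {
    let index = HashMap::from([
        (b"k1".to_vec(), 0..3),
        (b"gone".to_vec(), 3..3), // tombstone
        (b"k2".to_vec(), 3..7),
    ]);
    let store = ToyStore { mmap: Arc::new(b"aaabbbb".to_vec()), index: RwLock::new(index) };
    let total: usize = store.par_map_entries(4, |p| p.len()).into_iter().sum();
    println!("{total} live bytes"); // 7 live bytes
}
```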

Sources: src/storage_engine/data_store.rs:296-361 README.md:242-246


Performance Characteristics Comparison

Write Mode Comparison

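Mode         | Lock Duration | Flush Frequency | Memory Usage                  | Best For
------------ | ------------- | --------------- | ----------------------------- | ----------------------------------
Single Write | Per write     | Per write       | Low (single entry)            | Individual updates, low throughput
Batch Write  | Per batch     | Per batch       | Medium (all entries buffered) | Bulk imports, high throughput
Stream Write | Per stream    | Per stream      | Low (8 KB buffer)             | Large files, limited memory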

Read Mode Comparison

Mode          | Copy Behavior | Lock Contention        | Memory Overhead               | Best For
------------- | ------------- | ---------------------- | ----------------------------- | -------------------------------------
Direct Read   | Zero-copy     | Low (brief lock)       | Minimal (Arc<Mmap>)           | Individual lookups, hot path
Batch Read    | Zero-copy     | Very low (single lock) | Minimal (shared Arc<Mmap>)    | Multiple lookups at once
Stream Read   | Buffered copy | Low (brief lock)       | Medium (buffer size)          | Large entries, incremental processing
Parallel Iter | Zero-copy     | Very low (brief lock)  | Medium (per-thread Arc<Mmap>) | Full scans, analytics, multi-core

Sources: src/storage_engine/data_store.rs:753-939 src/storage_engine/data_store.rs:1040-1158 README.md:208-246


Code Entity Mapping

Write Mode Function References

Mode   | Trait Method                    | Implementation        | Key Helper
------ | ------------------------------- | --------------------- | -----------------------------
Single | DataStoreWriter::write()        | data_store.rs:827-830 | write_with_key_hash()
Batch  | DataStoreWriter::batch_write()  | data_store.rs:838-843 | batch_write_with_key_hashes()
Stream | DataStoreWriter::write_stream() | data_store.rs:753-756 | write_stream_with_key_hash()

Read Mode Function References

Mode     | Trait Method                  | Implementation          | Key Helper
-------- | ----------------------------- | ----------------------- | -------------------------
Direct   | DataStoreReader::read()       | data_store.rs:1040-1049 | read_entry_with_context()
Batch    | DataStoreReader::batch_read() | data_store.rs:1105-1109 | batch_read_hashed_keys()
Stream   | EntryStream::from()           | storage_engine.rs:10-11 | N/A
Parallel | DataStore::par_iter_entries() | data_store.rs:297-361   | KeyIndexer::unpack()

Sources: src/storage_engine/data_store.rs:1-1183 src/storage_engine.rs:1-25 src/storage_engine/entry_iterator.rs:1-128
