Write and Read Modes
Relevant source files
- README.md
- benches/storage_benchmark.rs
- src/main.rs
- src/storage_engine/data_store.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
Purpose and Scope
This document describes the three write modes and three read modes available in SIMD R Drive, detailing their operational characteristics, performance trade-offs, and appropriate use cases. These modes provide flexibility for different workload patterns, from single-key operations to bulk processing.
For information about the underlying SIMD acceleration that optimizes these operations, see SIMD Acceleration. For details about the alignment strategy that enables efficient reads, see Payload Alignment and Cache Efficiency.
Write Modes
SIMD R Drive provides three distinct write modes, each optimized for different usage patterns. All write modes acquire an exclusive write lock (RwLock<BufWriter<File>>) to ensure thread safety and data consistency.
Single Entry Write
The single entry write mode writes one key-value pair atomically and flushes immediately to disk.
Primary Methods:
- DataStoreWriter::write(key: &[u8], payload: &[u8]) -> Result<u64> src/storage_engine/data_store.rs:827-830
- DataStoreWriter::write_with_key_hash(key_hash: u64, payload: &[u8]) -> Result<u64> src/storage_engine/data_store.rs:832-834
Operation Flow: hash the key → acquire the write lock → append payload and metadata → flush → remap the mmap and update the index.
Characteristics:
- Latency: Lowest for single operations (immediate flush)
- Throughput: Lower due to per-write overhead
- Disk I/O: One flush operation per write
- Use Case: Interactive operations, real-time updates, critical writes requiring immediate durability
Implementation Detail: Single writes internally delegate to batch_write_with_key_hashes() with a single-element vector, ensuring consistent behavior across all write paths src/storage_engine/data_store.rs:832-834
Sources: README.md:212-215 src/storage_engine/data_store.rs:827-834
Batch Write
Batch write mode writes multiple key-value pairs in a single atomic operation, flushing only once at the end.
Primary Methods:
- DataStoreWriter::batch_write(entries: &[(&[u8], &[u8])]) -> Result<u64> src/storage_engine/data_store.rs:838-843
- DataStoreWriter::batch_write_with_key_hashes(prehashed_keys: Vec<(u64, &[u8])>, allow_null_bytes: bool) -> Result<u64> src/storage_engine/data_store.rs:847-951
Operation Flow: pre-hash all keys → acquire the write lock once → build every entry in an in-memory buffer → one sequential write → single flush → remap and reindex.
Characteristics:
- Latency: Higher per batch, but amortized per-entry latency is low
- Throughput: Significantly higher due to reduced disk I/O
- Disk I/O: Single flush for the entire batch
- Memory: Builds all entries in an in-memory buffer before writing src/storage_engine/data_store.rs:857-898
- Use Case: Bulk imports, batch processing, high-throughput ingestion
Performance Optimization: The batch implementation pre-allocates a buffer and constructs all entries before any disk I/O, minimizing lock contention and maximizing sequential write performance. The buffer construction happens at src/storage_engine/data_store.rs:857-918
Tombstone Support: Batch writes support deletion markers (tombstones) when allow_null_bytes is true, writing a single NULL byte followed by metadata src/storage_engine/data_store.rs:864-898
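For illustration, a minimal sketch of writing a tombstone through the pre-hashed batch path. The method signature comes from this page; the helper name, import paths, and error type are assumptions, not confirmed API:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed import paths

// Hypothetical helper: delete an entry by appending a tombstone.
// The error type (std::io) is an assumption.
fn delete_via_tombstone(store: &DataStore, key_hash: u64) -> std::io::Result<u64> {
    let tombstone: &[u8] = &[0u8]; // a single NULL byte marks a deletion
    // allow_null_bytes = true; otherwise NULL-byte-only payloads are rejected
    store.batch_write_with_key_hashes(vec![(key_hash, tombstone)], true)
}
```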
Sources: README.md:216-219 src/storage_engine/data_store.rs:838-951 benches/storage_benchmark.rs:85-92
Streaming Write
Streaming write mode writes large payloads incrementally from a Read source without requiring full in-memory buffering.
Primary Methods:
- DataStoreWriter::write_stream<R: Read>(key: &[u8], reader: &mut R) -> Result<u64> src/storage_engine/data_store.rs:753-756
- DataStoreWriter::write_stream_with_key_hash<R: Read>(key_hash: u64, reader: &mut R) -> Result<u64> src/storage_engine/data_store.rs:758-825
Operation Flow: acquire the write lock → read the source in 4096-byte chunks → write each chunk while updating the incremental checksum → append metadata → flush.
Characteristics:
- Memory Footprint: Constant (4096-byte buffer) src/storage_engine/constants.rs
- Payload Size: Unbounded (supports arbitrarily large entries)
- Disk I/O: Incremental writes, single flush at the end
- Use Case: Large file storage, network streams, memory-constrained environments
Implementation Details:
The streaming write uses a fixed-size buffer (WRITE_STREAM_BUFFER_SIZE) and performs incremental writes while computing the checksum:
| Component | Size/Type | Purpose |
|---|---|---|
| Read Buffer | 4096 bytes | Temporary staging for stream chunks |
| Checksum State | crc32fast::Hasher | Incremental CRC32 calculation |
| Pre-pad | 0-63 bytes | Alignment padding before payload |
| Metadata | 20 bytes | key_hash, prev_offset, checksum |
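The chunked read-hash-write loop behind this table follows a standard crc32fast pattern. A self-contained sketch of that pattern, not the engine's exact code:

```rust
use std::io::{Read, Write};

const WRITE_STREAM_BUFFER_SIZE: usize = 4096; // matches the documented buffer size

fn copy_with_checksum<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
) -> std::io::Result<u32> {
    let mut hasher = crc32fast::Hasher::new();
    let mut buf = [0u8; WRITE_STREAM_BUFFER_SIZE];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF: stream fully consumed
        }
        hasher.update(&buf[..n]);     // fold this chunk into the running CRC32
        writer.write_all(&buf[..n])?; // append the chunk to the data file
    }
    Ok(hasher.finalize()) // final checksum, stored in the entry metadata
}
```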
Validation:
- Rejects empty payloads src/storage_engine/data_store.rs:799-804
- Rejects NULL-byte-only streams (reserved for tombstones) src/storage_engine/data_store.rs:792-797
Sources: README.md:220-223 src/storage_engine/data_store.rs:753-825 tests/concurrency_tests.rs:16-109
Write Mode Comparison
Performance Table:
| Write Mode | Lock Duration | Disk Flushes | Memory Usage | Best For |
|---|---|---|---|---|
| Single | Short (per write) | 1 per write | Minimal | Interactive operations, real-time updates |
| Batch | Medium (entire batch) | 1 per batch | Buffer size × entries | Bulk imports, high throughput |
| Streaming | Long (entire stream) | 1 per stream | 4096 bytes (constant) | Large files, memory-constrained |
Throughput Characteristics:
Based on benchmark results (benches/storage_benchmark.rs:52-83), batch writes substantially outperform equivalent sequences of single writes, since one lock acquisition and one flush are amortized across the whole batch.
Sources: benches/storage_benchmark.rs:52-92 README.md:208-223
Read Modes
SIMD R Drive provides three read modes optimized for different access patterns. All read modes leverage zero-copy access through memory-mapped files.
Direct Read
Direct read mode provides immediate, zero-copy access to stored entries through EntryHandle.
Primary Methods:
- DataStoreReader::read(key: &[u8]) -> Result<Option<EntryHandle>> src/traits.rs
- DataStoreReader::batch_read(keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>> src/traits.rs
- DataStoreReader::exists(key: &[u8]) -> Result<bool> src/traits.rs
Operation Flow: hash the key → brief index lookup → compute the byte range → return an EntryHandle viewing the shared mmap.
Characteristics:
- Latency: Minimal (single hash lookup plus pointer arithmetic)
- Memory: Zero-copy (returns a view into the mmap)
- Concurrency: Lock-free reads (except a brief index lock)
- Use Case: Random access, key-value lookups, real-time queries
Zero-Copy Guarantee:
The EntryHandle provides direct access to the memory-mapped region without copying:
```rust
EntryHandle {
    mmap_arc: Arc<Mmap>,     // Shared reference to mmap
    range: Range<usize>,     // Byte range within mmap
    metadata: EntryMetadata, // Deserialized metadata (20 bytes)
}
```
The handle implements Deref<Target = [u8]>, allowing transparent access to payload bytes simd-r-drive-entry-handle/src/lib.rs
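Because of the Deref implementation, a handle can be passed anywhere a byte slice is expected. A minimal sketch; import paths and the error type are assumptions:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // assumed import paths

// `handle.len()` resolves through Deref to the payload slice's length;
// no bytes are copied out of the mmap.
fn payload_len(store: &DataStore, key: &[u8]) -> std::io::Result<usize> {
    Ok(store.read(key)?.map(|handle| handle.len()).unwrap_or(0))
}
```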
Batch Read Optimization:
batch_read() performs vectorized lookups, acquiring the index lock once for all keys and returning Vec<Option<EntryHandle>>. This reduces lock acquisition overhead for multiple keys src/storage_engine/data_store.rs
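A sketch of the batch pattern, under the same assumed imports and error type:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // assumed import paths

// One index-lock acquisition covers every key in the slice; missing keys
// come back as None and are skipped by `flatten`.
fn sum_payload_sizes(store: &DataStore, keys: &[&[u8]]) -> std::io::Result<usize> {
    let handles = store.batch_read(keys)?;
    Ok(handles.iter().flatten().map(|h| h.len()).sum())
}
```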
Sources: README.md:228-233 src/storage_engine/data_store.rs:502-565 benches/storage_benchmark.rs:124-149
Streaming Read
Streaming read mode provides incremental, buffered access to large entries without loading them fully into memory.
Primary Structure:
- EntryStream (src/storage_engine/entry_stream.rs), which implements the std::io::Read trait
Operation Flow: resolve the entry through an index lookup → wrap it in an EntryStream → serve std::io::Read calls through an internal 8192-byte buffer.
Characteristics:
- Memory Footprint: 8192-byte buffer src/storage_engine/entry_stream.rs
- Copy Behavior: Not zero-copy (reads go through a buffer)
- Payload Size: Supports arbitrarily large entries
- Use Case: Processing large entries incrementally, network transmission, streaming transformations
Implementation Notes:
The streaming read is not zero-copy despite using mmap as the source. This design choice enables:
- Controlled memory pressure (constant buffer size)
- Compatibility with the standard std::io::Read interface
- Incremental processing without loading the entire payload
For true zero-copy access to large entries, use direct read mode and process the EntryHandle slice directly.
Sources: README.md:234-241 src/storage_engine/entry_stream.rs
Parallel Iteration
Parallel iteration mode uses Rayon to process all valid entries across multiple threads (requires the parallel feature).
Primary Methods:
- DataStore::iter_entries() -> EntryIterator src/storage_engine/data_store.rs:276-280
- DataStore::par_iter_entries() -> impl ParallelIterator<Item = EntryHandle> src/storage_engine/data_store.rs:296-361
Operation Flow: snapshot all packed offsets under a brief read lock → release the lock → fan the offsets out across Rayon worker threads → each thread independently constructs EntryHandles.
Characteristics:
- Throughput: Scales with CPU cores
- Concurrency: Work-stealing via Rayon
- Memory: Minimal overhead (offsets collected upfront)
- Use Case: Bulk analytics, dataset scanning, cache warming, batch transformations
Implementation Strategy:
The parallel iterator optimizes for minimal lock contention:
1. Acquire a read lock on the KeyIndexer src/storage_engine/data_store.rs:300
2. Collect all packed offset values into a Vec<u64> src/storage_engine/data_store.rs:301
3. Release the lock immediately src/storage_engine/data_store.rs:302
4. Clone the Arc<Mmap> once src/storage_engine/data_store.rs:305
5. Run a parallel filter_map over the offsets src/storage_engine/data_store.rs:310-360
Each worker thread independently:
- Unpacks (tag, offset) from the packed value
- Validates bounds and metadata
- Constructs an EntryHandle with the cloned Arc<Mmap>
- Filters out tombstones
Sequential Iteration:
For sequential scanning without Rayon overhead, use iter_entries(), which returns an EntryIterator src/storage_engine/data_store.rs:276-280
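A minimal sequential sketch; the import path is assumed, and the iterator is assumed (mirroring the parallel path) to yield only valid, non-tombstoned entries:

```rust
use simd_r_drive::DataStore; // assumed import path

// Forward scan over the data file; no Rayon involved.
fn count_live_entries(store: &DataStore) -> usize {
    store.iter_entries().count()
}
```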
Sources: README.md:242-246 src/storage_engine/data_store.rs:276-361 benches/storage_benchmark.rs:98-118
Read Mode Comparison
Performance Table:
| Read Mode | Access Pattern | Memory Copy | Concurrency | Best For |
|---|---|---|---|---|
| Direct | Random/lookup | Zero-copy | Lock-free | Key-value queries, random access |
| Streaming | Sequential/buffered | Buffered copy | Single reader | Large entry processing |
| Parallel | Full scan | Zero-copy | Multi-threaded | Bulk analytics, dataset scanning |
Throughput Characteristics:
Based on benchmark measurements in benches/storage_benchmark.rs:
| Operation | Throughput (1M entries, 8 bytes) | Notes |
|---|---|---|
| Sequential iteration | ~millions of entries/s | Zero-copy, cache-friendly |
| Random single reads | ~1M reads/s | Hash lookup + bounds check |
| Batch reads | ~1M reads/s | Vectorized index access |
Sources: benches/storage_benchmark.rs:98-203 README.md:224-246
Performance Optimization Strategies
Write Optimization
Batching Strategy:
- Group writes into batches of 1,024-10,000 entries for optimal throughput (see the sketch after this list)
- Balance batch size against latency requirements
- Use streaming for payloads > 1 MB to avoid memory pressure
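A hedged sketch of the chunking strategy; the helper and import paths are illustrative, while batch_write is the documented method:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed import paths

// Hypothetical helper: split a large ingest into fixed-size batches so each
// batch_write call amortizes one lock acquisition and one flush.
fn ingest_chunked(store: &DataStore, entries: &[(&[u8], &[u8])]) -> std::io::Result<()> {
    const BATCH: usize = 4096; // within the 1,024-10,000 range suggested above
    for chunk in entries.chunks(BATCH) {
        store.batch_write(chunk)?;
    }
    Ok(())
}
```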
Lock Contention:
- All writes acquire the same RwLock<BufWriter<File>>
- Increase batch size to amortize lock overhead
- Consider application-level write queuing for highly concurrent workloads
Read Optimization
Access Pattern Matching:
| Access Pattern | Recommended Mode | Reason |
|---|---|---|
| Random lookups | Direct read | O(1) hash lookup, zero-copy |
| Known key sets | Batch read | Amortized lock overhead |
| Full dataset scan | Sequential iteration | Cache-friendly forward traversal |
| Parallel analytics | Parallel iteration | Scales with CPU cores |
| Large entry processing | Streaming read | Constant memory footprint |
Memory-Mapped File Behavior:
The OS manages mmap pages transparently:
- Working set: Only accessed regions are loaded into RAM
- Large datasets: Can exceed available RAM (pages are swapped in on demand)
- Cache warming: Sequential iteration benefits from OS read-ahead
- Random access: May trigger page faults (disk I/O) on cold reads
Sources: benches/storage_benchmark.rs README.md:43-50
Concurrency Considerations
Write Concurrency
All write modes acquire the same exclusive write lock and are therefore mutually exclusive: only one write, batch, or streaming operation can proceed at a time.
Implication: High write concurrency may benefit from application-level write buffering or queueing.
The relationship between the two locks: concurrent readers share the index read lock and the Arc<Mmap>, while a writer holds the independent write lock.

```mermaid
graph TB
    Read1["read()<br/>Thread 1"]
    Read2["read()<br/>Thread 2"]
    Read3["par_iter_entries()<br/>Thread 3"]
    IndexLock["RwLock&lt;KeyIndexer&gt;<br/>(Read Lock)"]
    Mmap["Arc&lt;Mmap&gt;<br/>(Shared Reference)"]
    Read1 --> IndexLock
    Read2 --> IndexLock
    Read3 --> IndexLock
    IndexLock -->|Concurrent read access| Mmap
    Write["write()<br/>Thread 4"]
    WriteLock["RwLock&lt;BufWriter&gt;<br/>(Exclusive Lock)"]
    Write -->|Independent lock| WriteLock
    WriteLock -.->|After flush: remaps and updates| Mmap
```
Read Concurrency
Read operations are lock-free after the index lookup and can occur concurrently with writes.
Characteristics:
- Multiple readers can access the mmap concurrently
- Reads do not block writes (the two paths use different locks)
- Writes remap the mmap after flushing, but readers retain their existing Arc<Mmap> reference
- New reads see updated data after reindexing completes (see the sketch after this list)
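A hedged sketch of concurrent access. It assumes DataStore is Send + Sync (as the concurrency tests imply) and that the import paths look like this:

```rust
use std::sync::Arc;
use std::thread;
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter}; // assumed import paths

// The reader proceeds against the shared mmap while the writer holds the
// independent write lock; neither blocks the other.
fn concurrent_access(store: Arc<DataStore>) {
    let reader = {
        let store = Arc::clone(&store);
        thread::spawn(move || {
            // Lock-free after the brief index lookup; keeps its own
            // Arc<Mmap> even if the writer remaps concurrently.
            let _ = store.read(b"hot-key");
        })
    };
    let writer = thread::spawn(move || {
        let _ = store.write(b"hot-key", b"new-value");
    });
    reader.join().unwrap();
    writer.join().unwrap();
}
```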
Sources: README.md:170-207 tests/concurrency_tests.rs src/storage_engine/data_store.rs:224-259
Usage Examples
Write Mode Selection
Single Write (Real-Time Updates):
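A minimal sketch. Here and in the examples below, the import paths and the io::Error-based Result are assumptions; the store is opened elsewhere (this section does not show the constructor):

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed import paths

// One entry, flushed to disk before the call returns.
fn update_counter(store: &DataStore) -> std::io::Result<u64> {
    store.write(b"metrics/request_count", b"12345")
}
```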
Batch Write (Bulk Import):
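A sketch using the documented batch_write signature; all entries land in one locked pass with a single flush:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed import paths

fn bulk_import(store: &DataStore) -> std::io::Result<u64> {
    // Three entries appended atomically, flushed once at the end.
    let entries: Vec<(&[u8], &[u8])> = vec![
        (b"user:1", b"alice"),
        (b"user:2", b"bob"),
        (b"user:3", b"carol"),
    ];
    store.batch_write(&entries)
}
```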
Streaming Write (Large Files):
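A sketch of streaming a file into the store; any std::io::Read source works, and the file name is illustrative:

```rust
use std::fs::File;
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed import paths

// The payload is consumed through a fixed 4096-byte buffer, so memory
// stays constant regardless of the file's size.
fn store_large_file(store: &DataStore) -> std::io::Result<u64> {
    let mut file = File::open("backup.tar.gz")?;
    store.write_stream(b"backups/latest", &mut file)
}
```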
Read Mode Selection
Direct Read (Key Lookup):
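A sketch combining the documented exists and read methods:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // assumed import paths

fn lookup(store: &DataStore) -> std::io::Result<()> {
    if store.exists(b"user:1")? {
        if let Some(handle) = store.read(b"user:1")? {
            // Zero-copy: `&handle[..]` is a view into the mmap.
            println!("payload: {:?}", &handle[..]);
        }
    }
    Ok(())
}
```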
Streaming Read (Large Entry Processing):
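A sketch of incremental processing through EntryStream. How an EntryStream is constructed from an EntryHandle is an assumption here (shown as From); check src/storage_engine/entry_stream.rs for the actual API:

```rust
use std::io::Read;
use simd_r_drive::{DataStore, DataStoreReader, EntryStream}; // assumed import paths

fn checksum_large_entry(store: &DataStore) -> std::io::Result<u32> {
    let handle = store.read(b"backups/latest")?.expect("entry exists");
    let mut stream = EntryStream::from(handle); // constructor assumed
    let mut hasher = crc32fast::Hasher::new();
    let mut buf = [0u8; 8192]; // matches the stream's internal buffer size
    loop {
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break; // entire payload consumed incrementally
        }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize())
}
```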
Parallel Iteration (Dataset Analytics):
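A sketch using the documented par_iter_entries method; requires the parallel feature and the rayon prelude:

```rust
use rayon::prelude::*;
use simd_r_drive::DataStore; // assumed import path

// Offsets are snapshotted once under a brief lock, then Rayon workers
// construct EntryHandles and sum payload sizes in parallel.
fn total_payload_bytes(store: &DataStore) -> usize {
    store.par_iter_entries().map(|handle| handle.len()).sum()
}
```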
Sources: README.md src/storage_engine/data_store.rs benches/storage_benchmark.rs