This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Write and Read Modes
Relevant source files
- README.md
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
Purpose and Scope
This document describes the different operation modes available in SIMD R Drive for writing and reading data. Each mode is optimized for specific use cases, offering different trade-offs between memory usage, I/O overhead, and concurrency. For information about SIMD acceleration used within these operations, see SIMD Acceleration. For details on payload alignment requirements, see Payload Alignment and Cache Efficiency.
Write Operation Modes
SIMD R Drive provides three distinct write modes, each optimized for different scenarios. All write operations acquire a write lock on the underlying file to ensure consistency.
Single Entry Write
The write() method writes a single key-value pair atomically with immediate disk flushing.
Method Signature: write(&self, key: &[u8], payload: &[u8]) -> Result<u64>
Characteristics:
- Acquires RwLock<BufWriter<File>> for the entire operation
- Writes are flushed immediately via file.flush()
- Each write performs file remapping and index update
- Suitable for individual, isolated write operations
Internal Flow:
Sources: src/storage_engine/data_store.rs:827-834 src/storage_engine/data_store.rs:832-834
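The lock, write, flush sequence described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: SketchStore, Inner, and with_sink are hypothetical names, a generic W: Write stands in for File, the returned u64 is assumed to be the new tail offset, and the remapping and index update steps are omitted.

```rust
use std::io::{self, BufWriter, Write};
use std::sync::RwLock;

/// State guarded by the write lock: the buffered writer and the current tail offset.
struct Inner<W: Write> {
    writer: BufWriter<W>,
    tail_offset: u64,
}

/// Hypothetical stand-in for the store; W replaces File so the sketch runs in memory.
pub struct SketchStore<W: Write> {
    inner: RwLock<Inner<W>>,
}

impl<W: Write> SketchStore<W> {
    pub fn new(sink: W) -> Self {
        let inner = Inner { writer: BufWriter::new(sink), tail_offset: 0 };
        Self { inner: RwLock::new(inner) }
    }

    /// Appends one payload and flushes before the lock is released.
    /// Returns the tail offset after the write (an assumed meaning for the u64).
    pub fn write(&self, payload: &[u8]) -> io::Result<u64> {
        let mut guard = self
            .inner
            .write()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "lock poisoned"))?;
        guard.writer.write_all(payload)?;
        // One flush per call: buffered bytes reach the sink before the lock drops.
        guard.writer.flush()?;
        guard.tail_offset += payload.len() as u64;
        Ok(guard.tail_offset)
    }

    /// Lets a caller inspect the underlying sink under a read lock.
    pub fn with_sink<T>(&self, f: impl FnOnce(&W) -> T) -> T {
        let guard = self.inner.read().expect("lock poisoned");
        f(guard.writer.get_ref())
    }
}
```

Because every call pays for its own lock acquisition and flush, calling this in a tight loop is where the batch mode below becomes attractive.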
Batch Entry Write
The batch_write() method writes multiple key-value pairs in a single locked operation, reducing disk I/O overhead.
Method Signature: batch_write(&self, entries: &[(&[u8], &[u8])]) -> Result<u64>
Characteristics:
- Acquires RwLock<BufWriter<File>> once for the entire batch
- All entries are buffered in memory before writing
- Single file.flush() at end of batch
- Single remapping and index update operation
- Significantly more efficient for bulk writes
Internal Process:
| Step | Operation | Lock Held |
|---|---|---|
| 1 | Hash all keys with compute_hash_batch() | No |
| 2 | Acquire write lock | Yes |
| 3 | Build in-memory buffer with all entries | Yes |
| 4 | Calculate alignment padding for each entry | Yes |
| 5 | Copy payloads using simd_copy() | Yes |
| 6 | Append all metadata | Yes |
| 7 | Write entire buffer with file.write_all() | Yes |
| 8 | Flush with file.flush() | Yes |
| 9 | Call reindex() once | Yes |
| 10 | Release write lock | No |
Sources: src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939 README.md:216-218
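Steps 2 through 8 of the table can be sketched as follows. This is an illustration under stated assumptions rather than the crate's code: batch_append and padding_for are hypothetical names, the 64-byte ALIGN value is assumed, padding is assumed to precede each payload so that payloads start on a boundary, and key hashing, metadata, simd_copy(), and reindex() are left out (a plain slice copy stands in for simd_copy()).

```rust
use std::io::{self, Write};
use std::sync::RwLock;

/// Assumed alignment boundary for payload starts; the real value is defined by the crate.
const ALIGN: u64 = 64;

/// Number of zero bytes needed so that `offset` lands on an ALIGN boundary.
pub fn padding_for(offset: u64) -> usize {
    ((ALIGN - offset % ALIGN) % ALIGN) as usize
}

/// Writes all payloads with one lock acquisition, one write_all, and one flush.
/// The lock guards the writer together with the current tail offset.
/// Returns the start offset of each payload.
pub fn batch_append<W: Write>(
    file: &RwLock<(W, u64)>,
    payloads: &[&[u8]],
) -> io::Result<Vec<u64>> {
    let mut guard = file
        .write()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "lock poisoned"))?;
    let (writer, tail) = &mut *guard;

    // Build the whole batch in memory first.
    let mut buffer = Vec::new();
    let mut starts = Vec::with_capacity(payloads.len());
    for payload in payloads {
        let offset = *tail + buffer.len() as u64;
        let pad = padding_for(offset);
        buffer.resize(buffer.len() + pad, 0); // zero padding up to the boundary
        starts.push(offset + pad as u64);
        buffer.extend_from_slice(payload); // plain copy in place of simd_copy()
    }

    // A single write and a single flush cover the entire batch.
    writer.write_all(&buffer)?;
    writer.flush()?;
    *tail += buffer.len() as u64;
    Ok(starts)
}
```

The trade-off is visible in the buffer: memory use grows with the total batch size, in exchange for paying the lock and flush cost once.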
Streaming Write
The write_stream() method writes large data entries using a streaming Read source without requiring full in-memory allocation.
Method Signature: write_stream<R: Read>(&self, key: &[u8], reader: &mut R) -> Result<u64>
Characteristics:
- Reads data in chunks of WRITE_STREAM_BUFFER_SIZE (8192 bytes)
- Suitable for large files or data streams
- Only one buffer's worth of data in memory at a time
- Computes CRC32 checksum incrementally
- Single file.flush() after all chunks are written
Streaming Flow:
Sources: src/storage_engine/data_store.rs:753-825 README.md:220-222 src/lib.rs:66-115
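The chunked copy with an incremental checksum can be sketched as below. The names copy_stream and crc32_update are hypothetical, and the CRC-32 variant (the common reflected polynomial 0xEDB88320, computed bit by bit) is an assumption; the source only states that a CRC32 is computed incrementally. Locking, key hashing, and metadata are omitted.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Chunk size quoted in the text above.
const WRITE_STREAM_BUFFER_SIZE: usize = 8192;

/// Feeds bytes into a running CRC-32 state (reflected polynomial 0xEDB88320).
/// Bitwise for brevity; a table-driven or hardware-accelerated routine is the usual choice.
pub fn crc32_update(mut state: u32, bytes: &[u8]) -> u32 {
    for &byte in bytes {
        state ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (state & 1).wrapping_neg();
            state = (state >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    state
}

/// Copies `reader` into `writer` one chunk at a time, flushing once at the end.
/// Returns the number of bytes copied and the finalized CRC-32.
pub fn copy_stream<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
) -> io::Result<(u64, u32)> {
    let mut buf = [0u8; WRITE_STREAM_BUFFER_SIZE];
    let mut state = !0u32; // CRC-32 initial value
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        state = crc32_update(state, &buf[..n]);
        total += n as u64;
    }
    writer.flush()?;
    Ok((total, !state)) // final XOR completes the checksum
}
```

Only the fixed-size buffer is resident at any time, so memory use stays constant regardless of how large the source is.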
Read Operation Modes
SIMD R Drive provides four read modes, each suited to a different access pattern: direct memory access, batch reads, streaming reads, and parallel iteration.
Direct Memory Access
The read() method retrieves stored data using zero-copy memory mapping, providing the most efficient access for individual entries.
Method Signature: read(&self, key: &[u8]) -> Result<Option<EntryHandle>>
Characteristics:
- Zero-copy access via mmap
- Returns EntryHandle wrapping Arc<Mmap> and byte range
- No data copying - direct pointer into memory-mapped region
- O(1) lookup via KeyIndexer hash table
- Lock-free after index lookup completes
Read Path:
Sources: src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:502-565 README.md:228-232
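The idea of a handle that shares the mapping and exposes only a byte range can be sketched as follows. Everything here is illustrative: SketchHandle and SketchReader are hypothetical types, Arc<[u8]> stands in for Arc<Mmap> so that no external crate is needed, and the index is keyed by a precomputed u64 hash with no tag verification.

```rust
use std::collections::HashMap;
use std::ops::{Deref, Range};
use std::sync::{Arc, RwLock};

/// Hypothetical zero-copy handle: a shared buffer plus the byte range of one entry.
#[derive(Clone, Debug)]
pub struct SketchHandle {
    map: Arc<[u8]>,
    range: Range<usize>,
}

impl Deref for SketchHandle {
    type Target = [u8];

    /// Borrows the entry's bytes directly from the shared buffer; nothing is copied.
    fn deref(&self) -> &[u8] {
        &self.map[self.range.clone()]
    }
}

/// Hypothetical reader: shared bytes plus a lock-protected hash index.
pub struct SketchReader {
    map: Arc<[u8]>,
    index: RwLock<HashMap<u64, Range<usize>>>,
}

impl SketchReader {
    pub fn new(data: Vec<u8>, index: HashMap<u64, Range<usize>>) -> Self {
        Self { map: Arc::from(data), index: RwLock::new(index) }
    }

    /// Looks up the range under a short read lock, then returns a handle that
    /// needs no lock at all: it owns its own reference to the buffer.
    pub fn read(&self, key_hash: u64) -> Option<SketchHandle> {
        let range = self.index.read().ok()?.get(&key_hash).cloned()?;
        Some(SketchHandle { map: Arc::clone(&self.map), range })
    }
}
```

Cloning the Arc only bumps a reference count, which is why the handle can outlive the lock guard without copying the payload.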
Batch Read
The batch_read() method efficiently retrieves multiple entries in a single operation, minimizing lock contention.
Method Signature: batch_read(&self, keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>>
Characteristics:
- Hashes all keys in batch using compute_hash_batch()
- Acquires index read lock once for the entire batch
- Clones Arc<Mmap> once and reuses it for all entries
- Returns vector of optional EntryHandle objects
- More efficient than individual read() calls
Batch Processing:
| Operation | Complexity | Lock Duration |
|---|---|---|
| Hash all keys | O(n) | No lock |
| Acquire index read lock | O(1) | Begin |
| Clone Arc<Mmap> once | O(1) | Held |
| Lookup each hash | O(n) average | Held |
| Verify tags | O(n) | Held |
| Create handles | O(n) | Held |
| Release lock | O(1) | End |
Sources: src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158
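The single-lock lookup in the table can be sketched like this. The function batch_lookup and the Index alias are hypothetical, Arc<[u8]> again stands in for Arc<Mmap>, keys are assumed to be hashed already, and tag verification is skipped.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::sync::{Arc, RwLock};

/// Hypothetical index type: key hash to byte range within the shared buffer.
pub type Index = HashMap<u64, Range<usize>>;

/// Resolves every hash under one read-lock acquisition.
/// Results keep the order of `key_hashes`; a missing key yields None.
pub fn batch_lookup(
    map: &Arc<[u8]>,
    index: &RwLock<Index>,
    key_hashes: &[u64],
) -> Vec<Option<(Arc<[u8]>, Range<usize>)>> {
    // The guard is taken once and held for the whole loop.
    let guard = index.read().expect("index lock poisoned");
    key_hashes
        .iter()
        .map(|hash| {
            guard
                .get(hash)
                // Each hit shares the same buffer through a cheap Arc clone.
                .map(|range| (Arc::clone(map), range.clone()))
        })
        .collect()
}
```

Compared with calling a single-key lookup in a loop, this takes the read lock once instead of once per key, which matters most when writers are competing for the same lock.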
Streaming Read
The EntryStream wrapper provides incremental reading of large entries, avoiding high memory overhead.
Characteristics:
- Implements the std::io::Read trait
- Reads data in configurable buffer chunks
- Non-zero-copy - data is read through a buffer
- Suitable for processing large entries incrementally
- Useful when the full entry doesn't fit in memory
Usage Pattern:
Sources: README.md:234-240 src/lib.rs:86-92 src/storage_engine.rs:10-11
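A reader of this kind can be sketched as a cursor over a byte range that implements std::io::Read. SketchStream is a hypothetical type written for illustration, with Arc<[u8]> standing in for the memory-mapped entry; it is not the crate's EntryStream.

```rust
use std::io::{self, Read};
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical streaming adapter: a cursor over one entry's byte range.
pub struct SketchStream {
    map: Arc<[u8]>,
    range: Range<usize>,
    pos: usize,
}

impl SketchStream {
    pub fn new(map: Arc<[u8]>, range: Range<usize>) -> Self {
        Self { pos: range.start, map, range }
    }
}

impl Read for SketchStream {
    /// Copies at most `buf.len()` bytes into the caller's buffer and advances
    /// the cursor. Returns Ok(0) once the entry is exhausted.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = &self.map[self.pos..self.range.end];
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.pos += n;
        Ok(n)
    }
}

/// Drains a reader in chunks of `chunk` bytes and reports each chunk's size.
pub fn chunk_sizes<R: Read>(reader: &mut R, chunk: usize) -> io::Result<Vec<usize>> {
    let mut buf = vec![0u8; chunk];
    let mut sizes = Vec::new();
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(sizes);
        }
        sizes.push(n);
    }
}
```

Since the adapter implements Read, it composes with standard helpers such as std::io::copy, at the cost of one copy per chunk into the caller's buffer.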
Parallel Iteration
The par_iter_entries() method provides Rayon-powered parallel iteration over all valid entries.
Method Signature (requires the parallel feature): par_iter_entries(&self) -> impl ParallelIterator<Item = EntryHandle>
Characteristics:
- Only available with the parallel feature flag
- Uses Rayon's parallel iterator infrastructure
- Acquires index lock briefly to collect offsets
- Releases lock before parallel processing begins
- Each thread receives an Arc<Mmap> clone for safe access
- Automatically filters tombstones and duplicates
- Ideal for bulk processing and analytics workloads
Parallel Execution Flow:
Sources: src/storage_engine/data_store.rs:296-361 README.md:242-246
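The snapshot-then-fan-out pattern can be sketched without Rayon by using std::thread::scope. This substitutes scoped threads for Rayon's work-stealing iterator purely to keep the example dependency-free. The function parallel_sum is hypothetical, Arc<[u8]> stands in for Arc<Mmap>, and tombstone and duplicate filtering are omitted.

```rust
use std::ops::Range;
use std::sync::{Arc, RwLock};
use std::thread;

/// Applies `f` to every entry on up to `workers` threads and sums the results.
pub fn parallel_sum<F>(
    map: &Arc<[u8]>,
    index: &RwLock<Vec<Range<usize>>>,
    workers: usize,
    f: F,
) -> u64
where
    F: Fn(&[u8]) -> u64 + Sync,
{
    // Hold the lock only long enough to snapshot the offsets; the guard is a
    // temporary that drops at the end of this statement.
    let ranges: Vec<Range<usize>> = index.read().expect("index lock poisoned").clone();

    let workers = workers.max(1);
    let chunk = ((ranges.len() + workers - 1) / workers).max(1);
    let f = &f;

    thread::scope(|scope| {
        let handles: Vec<_> = ranges
            .chunks(chunk)
            .map(|part| {
                // Each worker gets its own reference to the shared buffer.
                let map = Arc::clone(map);
                scope.spawn(move || {
                    part.iter().map(|r| f(&map[r.clone()])).sum::<u64>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<u64>()
    })
}
```

Because the offsets are copied out before any worker starts, writers are not blocked while the scan runs; the workers see the snapshot taken at the start.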
Performance Characteristics Comparison
Write Mode Comparison
| Mode | Lock Duration | Flush Frequency | Memory Usage | Best For |
|---|---|---|---|---|
| Single Write | Per write | Per write | Low (single entry) | Individual updates, low throughput |
| Batch Write | Per batch | Per batch | Medium (all entries buffered) | Bulk imports, high throughput |
| Stream Write | Per stream | Per stream | Low (8 KiB buffer) | Large files, limited memory |
Read Mode Comparison
| Mode | Copy Behavior | Lock Contention | Memory Overhead | Best For |
|---|---|---|---|---|
| Direct Read | Zero-copy | Low (brief lock) | Minimal (Arc<Mmap>) | Individual lookups, hot path |
| Batch Read | Zero-copy | Very low (single lock) | Minimal (shared Arc<Mmap>) | Multiple lookups at once |
| Stream Read | Buffered copy | Low (brief lock) | Medium (buffer size) | Large entries, incremental processing |
| Parallel Iter | Zero-copy | Very low (brief lock) | Medium (per-thread Arc<Mmap>) | Full scans, analytics, multi-core |
Lock Acquisition Patterns
Sources: src/storage_engine/data_store.rs:753-939 src/storage_engine/data_store.rs:1040-1158 README.md:208-246
Code Entity Mapping
Write Mode Function References
| Mode | Trait Method | Implementation | Key Helper |
|---|---|---|---|
| Single | DataStoreWriter::write() | data_store.rs:827-830 | write_with_key_hash() |
| Batch | DataStoreWriter::batch_write() | data_store.rs:838-843 | batch_write_with_key_hashes() |
| Stream | DataStoreWriter::write_stream() | data_store.rs:753-756 | write_stream_with_key_hash() |
Read Mode Function References
| Mode | Trait Method | Implementation | Key Helper |
|---|---|---|---|
| Direct | DataStoreReader::read() | data_store.rs:1040-1049 | read_entry_with_context() |
| Batch | DataStoreReader::batch_read() | data_store.rs:1105-1109 | batch_read_hashed_keys() |
| Stream | EntryStream::from() | storage_engine.rs:10-11 | N/A |
| Parallel | DataStore::par_iter_entries() | data_store.rs:297-361 | KeyIndexer::unpack() |
Core Types
- DataStore : data_store.rs:27-33 - Main storage engine struct
- EntryHandle : storage_engine.rs:24 - Zero-copy entry wrapper
- EntryStream : storage_engine.rs:10-11 - Streaming read adapter
- KeyIndexer : storage_engine.rs:13-14 - Hash index for O(1) lookups
- EntryIterator : entry_iterator.rs:21-25 - Sequential iterator
Sources: src/storage_engine/data_store.rs:1-1183 src/storage_engine.rs:1-25 src/storage_engine/entry_iterator.rs:1-128