Concurrency and Thread Safety
Relevant source files
- README.md
- benches/storage_benchmark.rs
- src/main.rs
- src/storage_engine/data_store.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
Purpose and Scope
This document describes the concurrency model and thread safety mechanisms used by the SIMD R Drive storage engine. It covers the synchronization primitives, locking strategies, and guarantees provided for concurrent access in single-process, multi-threaded environments. For information about the core storage architecture and data structures, see Storage Architecture. For details on the DataStore API methods, see DataStore API.
Sources: README.md:170-207 src/storage_engine/data_store.rs:1-33
Synchronization Architecture
The DataStore structure employs a combination of synchronization primitives to enable safe concurrent access while maintaining high performance for reads and writes.
```mermaid
graph TB
    subgraph DataStore["DataStore Structure"]
        File["file\nArc<RwLock<BufWriter<File>>>\nWrite synchronization"]
        Mmap["mmap\nArc<Mutex<Arc<Mmap>>>\nMemory-map coordination"]
        TailOffset["tail_offset\nAtomicU64\nSequential write ordering"]
        KeyIndexer["key_indexer\nArc<RwLock<KeyIndexer>>\nConcurrent index access"]
        Path["path\nPathBuf\nFile location"]
    end
    ReadOps["Read Operations"] --> KeyIndexer
    ReadOps --> Mmap
    WriteOps["Write Operations"] --> File
    WriteOps --> TailOffset
    WriteOps -.reindex.-> Mmap
    WriteOps -.reindex.-> KeyIndexer
    WriteOps -.reindex.-> TailOffset
    ParallelIter["par_iter_entries"] --> KeyIndexer
    ParallelIter --> Mmap
```
DataStore Synchronization Primitives
Synchronization Primitives by Field:
| Field | Type | Purpose | Concurrency Model |
|---|---|---|---|
| `file` | `Arc<RwLock<BufWriter<File>>>` | Write operations | Exclusive write lock required |
| `mmap` | `Arc<Mutex<Arc<Mmap>>>` | Memory-mapped view | Locked during remap operations |
| `tail_offset` | `AtomicU64` | Next write position | Lock-free atomic updates |
| `key_indexer` | `Arc<RwLock<KeyIndexer>>` | Hash-to-offset index | Read-write lock for lookups and updates |
| `path` | `PathBuf` | Storage file path | Immutable after creation |
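The struct below is a minimal Rust sketch assembled from the field table above (the actual definition lives in src/storage_engine/data_store.rs:27-33; the `memmap2` dependency and the opaque `KeyIndexer` body are assumptions for illustration):

```rust
use std::fs::File;
use std::io::BufWriter;
use std::path::PathBuf;
use std::sync::atomic::AtomicU64;
use std::sync::{Arc, Mutex, RwLock};

use memmap2::Mmap; // assumed memory-mapping crate

/// Hash-to-offset index; internals omitted in this sketch.
pub struct KeyIndexer { /* ... */ }

/// Simplified sketch of the DataStore fields from the table above.
pub struct DataStore {
    /// Serializes writers; BufWriter batches small writes.
    file: Arc<RwLock<BufWriter<File>>>,
    /// Shared mapping: the Mutex guards swaps during remapping,
    /// while readers clone the inner Arc and then read without locks.
    mmap: Arc<Mutex<Arc<Mmap>>>,
    /// Next append position, published with Release ordering.
    tail_offset: AtomicU64,
    /// Concurrent lookups via read locks; exclusive updates in reindex.
    key_indexer: Arc<RwLock<KeyIndexer>>,
    /// Storage file location; immutable after creation.
    path: PathBuf,
}
```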
Sources: src/storage_engine/data_store.rs:27-33 README.md:172-182
Read Path Concurrency
Read operations in SIMD R Drive are designed to be zero-copy and effectively lock-free: apart from a brief read lock on the index and a momentary mutex to clone the mmap handle, concurrent readers never block one another, enabling high throughput.
```mermaid
sequenceDiagram
    participant Thread1 as "Reader Thread 1"
    participant Thread2 as "Reader Thread 2"
    participant KeyIndexer as "key_indexer\nRwLock<KeyIndexer>"
    participant Mmap as "mmap\nMutex<Arc<Mmap>>"
    participant File as "Memory-Mapped File"
    Thread1->>KeyIndexer: read_lock()
    Thread2->>KeyIndexer: read_lock()
    Note over Thread1,Thread2: Multiple readers can acquire lock simultaneously
    KeyIndexer-->>Thread1: lookup hash → (tag, offset)
    KeyIndexer-->>Thread2: lookup hash → (tag, offset)
    Thread1->>Mmap: get_mmap_arc()
    Thread2->>Mmap: get_mmap_arc()
    Note over Mmap: Clone Arc, release Mutex immediately
    Mmap-->>Thread1: Arc<Mmap>
    Mmap-->>Thread2: Arc<Mmap>
    Thread1->>File: Direct memory access [offset..offset+len]
    Thread2->>File: Direct memory access [offset..offset+len]
    Note over Thread1,Thread2: Zero-copy reads, no locking required
```
Lock-Free Read Architecture
Read Method Implementation
The read_entry_with_context method (lines 501-565) centralizes the read logic:
1. Acquire Read Lock: Obtains a read lock on `key_indexer` via the provided `RwLockReadGuard`
2. Index Lookup: Retrieves the packed `(tag, offset)` value for the key hash
3. Tag Verification: Validates the tag to detect hash collisions (lines 513-521)
4. Memory Access: Reads metadata and payload directly from the memory-mapped region
5. Tombstone Check: Filters out deleted entries (single NULL byte)
6. Zero-Copy Handle: Returns an `EntryHandle` referencing the memory-mapped data
Key Performance Characteristics:
- No Write Lock: Reads never block on the file write lock
- Concurrent Readers: Multiple threads can read simultaneously via the `RwLock` read lock
- No Data Copy: The `EntryHandle` references the memory-mapped region directly
- Immediate Lock Release: The `get_mmap_arc` method (lines 658-663) clones the `Arc<Mmap>` and immediately releases the `Mutex`, as sketched below
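The clone-and-release pattern can be sketched as follows (building on the `DataStore` sketch above; a simplified illustration, not the exact method body):

```rust
impl DataStore {
    /// Clone the current mapping handle, holding the Mutex only for
    /// the duration of the clone. Callers then dereference their own
    /// Arc<Mmap> with no lock held at all.
    fn get_mmap_arc(&self) -> Arc<Mmap> {
        let guard = self.mmap.lock().expect("mmap mutex poisoned");
        Arc::clone(&*guard)
        // `guard` drops here, releasing the Mutex immediately.
    }
}
```

Because each reader owns an independent `Arc<Mmap>`, a concurrent remap simply swaps in a new mapping; in-flight readers keep a valid view until their handles drop.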
Sources: src/storage_engine/data_store.rs:501-565 src/storage_engine/data_store.rs:658-663 README.md:174-175
Write Path Synchronization
Write operations require exclusive access to the storage file and coordinate with the memory-mapped view and index.
```mermaid
sequenceDiagram
    participant Writer as "Writer Thread"
    participant FileRwLock as "file\nRwLock"
    participant BufWriter as "BufWriter<File>"
    participant TailOffset as "tail_offset\nAtomicU64"
    participant Reindex as "reindex()"
    participant MmapMutex as "mmap\nMutex"
    participant KeyIndexerRwLock as "key_indexer\nRwLock"
    Writer->>FileRwLock: write_lock()
    Note over FileRwLock: Exclusive lock acquired\nOnly one writer at a time
    Writer->>TailOffset: load(Ordering::Acquire)
    TailOffset-->>Writer: current tail offset
    Writer->>BufWriter: write pre-pad + payload + metadata
    Writer->>BufWriter: flush()
    Writer->>Reindex: reindex(&file, key_hash_offsets, tail_offset)
    Reindex->>BufWriter: init_mmap(&file)
    Note over Reindex: Create new memory-mapped view
    Reindex->>MmapMutex: lock()
    Reindex->>MmapMutex: Replace Arc<Mmap>
    Reindex->>MmapMutex: unlock()
    Reindex->>KeyIndexerRwLock: write_lock()
    Reindex->>KeyIndexerRwLock: insert(key_hash → offset)
    Reindex->>KeyIndexerRwLock: unlock()
    Reindex->>TailOffset: store(new_tail, Ordering::Release)
    Reindex-->>Writer: Ok(())
    Writer->>FileRwLock: unlock()
```
Write Operation Flow
Write Methods and Locking
Single Write (`write`):

- Delegates to `batch_write_with_key_hashes` with a single entry (lines 827-834)

Batch Write (`batch_write_with_key_hashes`; see the sketch after these lists):

1. Acquires write lock on `file` (lines 852-855)
2. Loads `tail_offset` atomically (line 858)
3. Writes all entries to a buffer (lines 863-922)
4. Writes the buffer to the file in a single operation (line 935)
5. Flushes to disk (line 936)
6. Calls `reindex` to update the mmap and index (lines 938-943)
7. Write lock automatically released on `file_guard` drop

Streaming Write (`write_stream_with_key_hash`):

1. Acquires write lock on `file` (lines 759-762)
2. Streams payload to the file via buffered reads (lines 778-790)
3. Flushes after streaming (line 814)
4. Calls `reindex` to update the mmap and index (lines 818-823)
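The batch path condenses to the following sketch (continuing the `DataStore` sketch above; entry serialization and pre-padding are elided, and `reindex_sketch`, shown in the next section, stands in for the real `reindex`):

```rust
use std::io::Write;
use std::sync::atomic::Ordering;

impl DataStore {
    fn batch_write_sketch(&self, entries: &[(u64, Vec<u8>)]) -> std::io::Result<()> {
        // 1. Exclusive write lock: only one writer at a time.
        let mut file_guard = self.file.write().expect("file lock poisoned");
        // 2. Atomic load of the current tail offset.
        let start = self.tail_offset.load(Ordering::Acquire);
        // 3. Serialize all entries into one buffer (metadata elided).
        let mut buf = Vec::new();
        let mut key_hash_offsets = Vec::with_capacity(entries.len());
        for (key_hash, payload) in entries {
            key_hash_offsets.push((*key_hash, start + buf.len() as u64));
            buf.extend_from_slice(payload);
        }
        // 4.-5. Single buffered write, then flush to disk.
        file_guard.write_all(&buf)?;
        file_guard.flush()?;
        // 6. Remap, update the index, and publish the new tail offset.
        self.reindex_sketch(&file_guard, &key_hash_offsets, start + buf.len() as u64)?;
        Ok(())
        // 7. `file_guard` drops here, releasing the write lock.
    }
}
```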
Sources: src/storage_engine/data_store.rs:752-843 src/storage_engine/data_store.rs:847-943
Memory Mapping Coordination
The reindex method coordinates the critical transition from write completion to read visibility by updating the memory-mapped view.
Reindex Operation
The reindex method (lines 224-259) performs four critical operations:
Step-by-Step Process:
1. Create New Memory Map (line 231):
   - Calls `init_mmap(&write_guard)` to create a fresh `Mmap` from the file
   - The file must be flushed before this point to ensure visibility
2. Update Shared Mmap (lines 232, 255):
   - Acquires the `Mutex` on `self.mmap`
   - Replaces the `Arc<Mmap>` with the new mapping
   - Existing readers retain old `Arc<Mmap>` references until they drop
3. Update Index (lines 233-253):
   - Acquires write lock on `self.key_indexer`
   - Inserts new `(key_hash → offset)` mappings
   - Removes tombstone entries if present
4. Update Tail Offset (line 256):
   - Stores new tail offset with `Ordering::Release`
   - Ensures all prior writes are visible before the offset update

Critical Ordering: The write lock on `file` is held throughout the entire reindex operation, preventing concurrent writes from starting until the index and mmap are fully updated.
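Under the assumptions of the earlier sketches, that ordering can be expressed as follows (the `init_mmap` helper and `KeyIndexer::insert` are assumed signatures):

```rust
use std::fs::File;
use std::io::BufWriter;
use std::sync::atomic::Ordering;

impl DataStore {
    fn reindex_sketch(
        &self,
        write_guard: &BufWriter<File>, // file write lock held by the caller
        key_hash_offsets: &[(u64, u64)],
        new_tail: u64,
    ) -> std::io::Result<()> {
        // 1. Fresh mapping; the file must already be flushed.
        let new_mmap = Arc::new(init_mmap(write_guard)?); // assumed helper
        // 2. Swap the shared mapping; readers holding the old Arc<Mmap>
        //    keep a valid view until their handles drop.
        *self.mmap.lock().expect("mmap mutex poisoned") = new_mmap;
        // 3. Exclusive index update.
        {
            let mut idx = self.key_indexer.write().expect("index lock poisoned");
            for &(key_hash, offset) in key_hash_offsets {
                idx.insert(key_hash, offset); // assumed KeyIndexer method
            }
        }
        // 4. Publish the new tail last; Release makes steps 1-3 visible
        //    to any thread that later loads it with Acquire ordering.
        self.tail_offset.store(new_tail, Ordering::Release);
        Ok(())
    }
}
```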
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:172-174
Index Concurrency Model
The key_indexer uses RwLock<KeyIndexer> to enable concurrent read access while providing exclusive write access during updates.
Index Access Patterns
| Operation | Lock Type | Held During | Purpose |
|---|---|---|---|
| `read` | `RwLock::read()` | Index lookup only (lines 507-508) | Multiple threads can look up concurrently |
| `batch_read` | `RwLock::read()` | All lookups in batch | Amortizes lock acquisition overhead |
| `par_iter_entries` | `RwLock::read()` | Collecting offsets only (lines 300-302) | Minimizes lock hold time; drops before parallel iteration |
| `write` / `batch_write` | `RwLock::write()` | Index updates in `reindex` (lines 233-236) | Exclusive access for inserting new entries |
| `delete` (tombstone) | `RwLock::write()` | Removing from index (line 243) | Exclusive access for deletion |
Parallel Iteration Strategy
The `par_iter_entries` method (lines 296-361, gated behind the `parallel` feature) demonstrates efficient lock usage; the snippet below sketches the pattern:
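This is a sketch under the same assumptions as earlier (rayon for parallelism; `iter_offsets` and `entry_at` are hypothetical helpers, not the method's actual body):

```rust
use rayon::prelude::*;

impl DataStore {
    fn par_iter_entries_sketch(&self) -> Vec<EntryHandle> {
        // Collect offsets while the read lock is held...
        let offsets: Vec<u64> = {
            let idx = self.key_indexer.read().expect("index lock poisoned");
            idx.iter_offsets().collect() // hypothetical accessor
        }; // ...and drop the lock here, before any parallel work begins.

        // Pin one mapping for the whole iteration; no locks held below.
        let mmap = self.get_mmap_arc();
        offsets
            .into_par_iter()
            .filter_map(|offset| entry_at(&mmap, offset)) // hypothetical helper
            .collect()
    }
}
```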
This approach:
- Minimizes the duration the read lock is held
- Avoids holding locks during expensive parallel processing
- Allows concurrent writes to proceed while parallel iteration is in progress
Sources: src/storage_engine/data_store.rs:296-361
Atomic Tail Offset
The tail_offset field uses AtomicU64 with acquire-release ordering to coordinate write sequencing without locks.
Atomic Operations Pattern
Load (Acquire):

- Used at the start of write operations (lines 763, 858)
- Ensures all prior writes are visible before loading the offset
- Provides the base offset for the new write

Store (Release):

- Used at the end of `reindex` after all updates (line 256)
- Ensures all writes (file, mmap, index) are visible before storing
- Prevents writes from being reordered past the offset update
Sequential Consistency: Even though multiple threads may acquire the write lock sequentially, the atomic tail offset ensures that:
- Each write observes the correct starting position
- The tail offset update happens after all associated data is committed
- Readers see a consistent view of the storage
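Isolated from the rest of the engine, the pairing looks like this (a self-contained sketch of the ordering discipline, not the engine's code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

fn append_sketch(tail_offset: &AtomicU64, entry_len: u64) {
    // Acquire: observe every write published by the previous Release store.
    let start = tail_offset.load(Ordering::Acquire);
    // ... write the entry at `start`, flush, remap, update the index ...
    // Release: publish the new tail only after all of the above is done.
    tail_offset.store(start + entry_len, Ordering::Release);
}
```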
Sources: src/storage_engine/data_store.rs:30 src/storage_engine/data_store.rs:256 src/storage_engine/data_store.rs:763 src/storage_engine/data_store.rs:858
Multi-Process Considerations
SIMD R Drive's concurrency mechanisms are designed for single-process, multi-threaded environments. Multi-process access requires external coordination.
Thread Safety Matrix
| Environment | Reads | Writes | Index Updates | Storage Safety |
|---|---|---|---|---|
| Single Process, Single Thread | ✅ Safe | ✅ Safe | ✅ Safe | ✅ Safe |
| Single Process, Multi-Threaded | ✅ Safe (lock-free, zero-copy) | ✅ Safe (RwLock<File>) | ✅ Safe (RwLock<KeyIndexer>) | ✅ Safe (Mutex<Arc<Mmap>>) |
| Multiple Processes, Shared File | ⚠️ Unsafe (no cross-process coordination) | ❌ Unsafe (no external locking) | ❌ Unsafe (separate memory spaces) | ❌ Unsafe (race conditions) |
Why Multi-Process is Unsafe
Separate Memory Spaces:

- Each process has its own `Arc<Mmap>` instance
- Index changes in one process are not visible to others
- The `tail_offset` atomic is process-local

No Cross-Process Synchronization:

- `RwLock` and `Mutex` only synchronize within a single process
- File writes from multiple processes can interleave arbitrarily
- Memory mapping updates are not coordinated

Risk of Corruption:

- Two processes can both acquire file write locks independently
- Both may attempt to write at the same `tail_offset`
- The storage file's append-only chain can be broken
External Locking Requirements
For multi-process access, applications must implement external file locking, as sketched after this list:

- Advisory Locks: Use `flock` (Unix) or `LockFile` (Windows)
- Mandatory Locks: File-system-level exclusive access
- Distributed Locks: For networked file systems
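For example, a hypothetical wrapper using the third-party `fs2` crate (an illustration only; this crate is not part of the SIMD R Drive API):

```rust
use std::fs::OpenOptions;

use fs2::FileExt; // cross-platform advisory file locking

fn with_exclusive_access(lock_path: &str) -> std::io::Result<()> {
    let lock_file = OpenOptions::new()
        .create(true)
        .write(true)
        .open(lock_path)?;
    // Blocks until no other process holds the lock.
    lock_file.lock_exclusive()?;
    // ... open the DataStore and perform reads/writes here ...
    lock_file.unlock()?; // also released automatically when the file closes
    Ok(())
}
```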
The core library intentionally does not provide cross-process locking to avoid platform-specific dependencies and maintain flexibility.
Sources: README.md:185-206 src/storage_engine/data_store.rs:27-33
Concurrency Testing
The test suite validates concurrent access patterns through multi-threaded integration tests.
Test Coverage
Concurrent Write Test (`concurrent_write_test`):

- Spawns 16 threads writing 10 entries each (lines 111-161)
- Uses `Arc<DataStore>` to share the storage instance
- Verifies all 160 entries are correctly written and retrievable
- Demonstrates write lock serialization

Interleaved Read-Write Test (`interleaved_read_write_test`):

- Two threads alternate reads and writes on the same key (lines 163-229)
- Uses `tokio::sync::Notify` for coordination
- Validates that writes are immediately visible to readers
- Confirms index updates are atomic

Concurrent Streamed Write Test (`concurrent_slow_streamed_write_test`):

- Multiple threads stream 1 MB payloads concurrently (lines 14-109)
- Simulates slow readers with artificial delays
- Verifies the write lock prevents interleaved streaming
- Validates payload integrity after concurrent writes
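The shape of the concurrent write test, condensed (a sketch; the `write` signature taking key and payload byte slices is an assumption):

```rust
use std::sync::Arc;
use std::thread;

fn concurrent_write_sketch(store: Arc<DataStore>) {
    // 16 threads, 10 entries each, sharing one DataStore via Arc.
    let handles: Vec<_> = (0..16)
        .map(|t| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for i in 0..10 {
                    let key = format!("key-{t}-{i}");
                    store.write(key.as_bytes(), b"payload").unwrap();
                }
            })
        })
        .collect();
    // All 160 writes land; the RwLock<File> serializes them.
    for handle in handles {
        handle.join().unwrap();
    }
}
```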
Sources: tests/concurrency_tests.rs:14-229
Locking Order and Deadlock Prevention

To prevent deadlocks, SIMD R Drive follows a consistent lock acquisition order:

```mermaid
graph TD
    A["1. file: RwLock (Write)"] --> B["2. mmap: Mutex"]
    B --> C["3. key_indexer: RwLock (Write)"]
    C --> D["4. tail_offset: AtomicU64"]
    style A fill:#f9f9f9
    style B fill:#f9f9f9
    style C fill:#f9f9f9
    style D fill:#f9f9f9
```

Lock Hierarchy

Acquisition Order:

1. File Write Lock: Acquired first in all write operations
2. Mmap Mutex: Locked during `reindex` to update the memory mapping
3. Key Indexer Write Lock: Locked during `reindex` to update the index
4. Tail Offset Atomic: Updated last with `Release` ordering
Lock Release: Locks are released in reverse order automatically through RAII (Rust's drop semantics).
Deadlock Prevention: This fixed order ensures no circular wait conditions. Read operations only acquire the key indexer read lock, which cannot deadlock with other readers.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-943
Performance Characteristics
Concurrent Read Performance
- Zero-Copy: No memory allocation or copying for payload access
- Lock-Free After Index Lookup: Once the `Arc<Mmap>` is cloned, no locks are held
- Scalable: Read throughput increases linearly with thread count
- Cache-Friendly: 64-byte alignment optimizes cache line access
Write Serialization
- Sequential Writes: All writes are serialized by the `RwLock<File>`
- Minimal Lock Hold Time: The lock is held only during actual I/O and the index update
- Batch Amortization: `batch_write` amortizes lock overhead across multiple entries (see the sketch below)
- No Write Starvation: A fair `RwLock` implementation prevents readers from starving writers
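For illustration, a hedged usage sketch (the `batch_write` signature shown here, key/payload byte-slice pairs, is an assumption):

```rust
// One lock acquisition, one flush, and one reindex for the whole batch,
// instead of once per entry.
let entries: Vec<(&[u8], &[u8])> = vec![
    (b"alpha".as_slice(), b"payload-1".as_slice()),
    (b"beta".as_slice(), b"payload-2".as_slice()),
];
store.batch_write(&entries)?;
```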
Benchmark Results
From `benches/storage_benchmark.rs`, typical single-threaded performance:
- Write Throughput: ~1M entries (8 bytes each) written per batch operation
- Sequential Read: Iterates all entries via zero-copy handles
- Random Read: ~1M random lookups complete in under 1 second
- Batch Read: Vectorized lookups provide additional speedup
For multi-threaded workloads, the concurrency tests demonstrate that read throughput scales with core count while write throughput is limited by the sequential append-only design.
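A minimal Criterion harness in the spirit of `benches/storage_benchmark.rs` might look like this (the benchmark name and elided setup are assumptions):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_random_read(c: &mut Criterion) {
    // Setup elided: open a DataStore pre-populated with entries.
    c.bench_function("random_read", |b| {
        b.iter(|| {
            // perform one zero-copy lookup per iteration
        });
    });
}

criterion_group!(benches, bench_random_read);
criterion_main!(benches);
```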
Sources: benches/storage_benchmark.rs:1-234 tests/concurrency_tests.rs:1-230