This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Concurrency and Thread Safety
Relevant source files
- README.md
- benches/storage_benchmark.rs
- src/main.rs
- src/storage_engine/data_store.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
Purpose and Scope
This document describes the concurrency model and thread safety mechanisms used by the SIMD R Drive storage engine. It covers the synchronization primitives, locking strategies, and guarantees provided for concurrent access in single-process, multi-threaded environments. For information about the core storage architecture and data structures, see Storage Architecture. For details on the DataStore API methods, see DataStore API.
Sources: README.md:170-207 src/storage_engine/data_store.rs:1-33
Synchronization Architecture
The DataStore structure employs a combination of synchronization primitives to enable safe concurrent access while maintaining high performance for reads and writes.
```mermaid
graph TB
    subgraph DataStore["DataStore Structure"]
        File["file\nArc<RwLock<BufWriter<File>>>\nWrite synchronization"]
        Mmap["mmap\nArc<Mutex<Arc<Mmap>>>\nMemory-map coordination"]
        TailOffset["tail_offset\nAtomicU64\nSequential write ordering"]
        KeyIndexer["key_indexer\nArc<RwLock<KeyIndexer>>\nConcurrent index access"]
        Path["path\nPathBuf\nFile location"]
    end
    ReadOps["Read Operations"] --> KeyIndexer
    ReadOps --> Mmap
    WriteOps["Write Operations"] --> File
    WriteOps --> TailOffset
    WriteOps -.reindex.-> Mmap
    WriteOps -.reindex.-> KeyIndexer
    WriteOps -.reindex.-> TailOffset
    ParallelIter["par_iter_entries"] --> KeyIndexer
    ParallelIter --> Mmap
```
DataStore Synchronization Primitives
Synchronization Primitives by Field:
| Field | Type | Purpose | Concurrency Model |
|---|---|---|---|
| file | Arc<RwLock<BufWriter<File>>> | Write operations | Exclusive write lock required |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped view | Locked during remap operations |
| tail_offset | AtomicU64 | Next write position | Lock-free atomic updates |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash-to-offset index | Read-write lock for lookups/updates |
| path | PathBuf | Storage file path | Immutable after creation |
Sources: src/storage_engine/data_store.rs:27-33 README.md:172-182
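For orientation, the sketch below mirrors the field layout from the table above, assuming the memmap2 crate for Mmap and a placeholder KeyIndexer type; it is illustrative, not the exact struct definition from data_store.rs.

```rust
use std::fs::File;
use std::io::BufWriter;
use std::path::PathBuf;
use std::sync::atomic::AtomicU64;
use std::sync::{Arc, Mutex, RwLock};
use memmap2::Mmap;

// Placeholder; the real KeyIndexer maps key hashes to packed (tag, offset) values.
struct KeyIndexer;

/// Illustrative shape of the DataStore fields described above.
pub struct DataStore {
    file: Arc<RwLock<BufWriter<File>>>,   // exclusive lock serializes writers
    mmap: Arc<Mutex<Arc<Mmap>>>,          // swapped atomically during reindex
    tail_offset: AtomicU64,               // next append position, lock-free
    key_indexer: Arc<RwLock<KeyIndexer>>, // shared for reads, exclusive for updates
    path: PathBuf,                        // immutable after creation
}
```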
Read Path Concurrency
Read operations in SIMD R Drive are designed to be lock-free and zero-copy, enabling high throughput for concurrent readers.
```mermaid
sequenceDiagram
    participant Thread1 as "Reader Thread 1"
    participant Thread2 as "Reader Thread 2"
    participant KeyIndexer as "key_indexer\nRwLock<KeyIndexer>"
    participant Mmap as "mmap\nMutex<Arc<Mmap>>"
    participant File as "Memory-Mapped File"
    Thread1->>KeyIndexer: read_lock()
    Thread2->>KeyIndexer: read_lock()
    Note over Thread1,Thread2: Multiple readers can acquire lock simultaneously
    KeyIndexer-->>Thread1: lookup hash → (tag, offset)
    KeyIndexer-->>Thread2: lookup hash → (tag, offset)
    Thread1->>Mmap: get_mmap_arc()
    Thread2->>Mmap: get_mmap_arc()
    Note over Mmap: Clone Arc, release Mutex immediately
    Mmap-->>Thread1: Arc<Mmap>
    Mmap-->>Thread2: Arc<Mmap>
    Thread1->>File: Direct memory access [offset..offset+len]
    Thread2->>File: Direct memory access [offset..offset+len]
    Note over Thread1,Thread2: Zero-copy reads, no locking required
```
Lock-Free Read Architecture
Read Method Implementation
The read_entry_with_context method (lines 501-565) centralizes the read logic:
- Acquire Read Lock: Obtains a read lock on key_indexer via the provided RwLockReadGuard
- Index Lookup: Retrieves the packed (tag, offset) value for the key hash
- Tag Verification: Validates the tag to detect hash collisions (lines 513-521)
- Memory Access: Reads metadata and payload directly from the memory-mapped region
- Tombstone Check: Filters out deleted entries (single NULL byte)
- Zero-Copy Handle: Returns an EntryHandle referencing the memory-mapped data
Key Performance Characteristics:
- No Write Lock: Reads never block on the file write lock
- Concurrent Readers: Multiple threads can read simultaneously via the RwLock read lock
- No Data Copy: The EntryHandle references the memory-mapped region directly
- Immediate Lock Release: The get_mmap_arc method (lines 658-663) clones the Arc<Mmap> and releases the Mutex immediately (see the sketch below)
Sources: src/storage_engine/data_store.rs:501-565 src/storage_engine/data_store.rs:658-663 README.md:174-175
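The "clone and release" pattern behind get_mmap_arc can be sketched as follows; this is a minimal sketch assuming the memmap2 crate, not the method's exact body.

```rust
use memmap2::Mmap;
use std::sync::{Arc, Mutex};

// Sketch of the immediate-release pattern: lock only long enough to clone the
// Arc, so readers never hold the Mutex while accessing mapped data.
fn get_mmap_arc(mmap: &Mutex<Arc<Mmap>>) -> Arc<Mmap> {
    // The MutexGuard is a temporary that drops at the end of this expression.
    Arc::clone(&mmap.lock().unwrap())
}
```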
Write Path Synchronization
Write operations require exclusive access to the storage file and coordinate with the memory-mapped view and index.
```mermaid
sequenceDiagram
    participant Writer as "Writer Thread"
    participant FileRwLock as "file\nRwLock"
    participant BufWriter as "BufWriter<File>"
    participant TailOffset as "tail_offset\nAtomicU64"
    participant Reindex as "reindex()"
    participant MmapMutex as "mmap\nMutex"
    participant KeyIndexerRwLock as "key_indexer\nRwLock"
    Writer->>FileRwLock: write_lock()
    Note over FileRwLock: Exclusive lock acquired\nOnly one writer at a time
    Writer->>TailOffset: load(Ordering::Acquire)
    TailOffset-->>Writer: current tail offset
    Writer->>BufWriter: write pre-pad + payload + metadata
    Writer->>BufWriter: flush()
    Writer->>Reindex: reindex(&file, key_hash_offsets, tail_offset)
    Reindex->>BufWriter: init_mmap(&file)
    Note over Reindex: Create new memory-mapped view
    Reindex->>MmapMutex: lock()
    Reindex->>MmapMutex: Replace Arc<Mmap>
    Reindex->>MmapMutex: unlock()
    Reindex->>KeyIndexerRwLock: write_lock()
    Reindex->>KeyIndexerRwLock: insert(key_hash → offset)
    Reindex->>KeyIndexerRwLock: unlock()
    Reindex->>TailOffset: store(new_tail, Ordering::Release)
    Reindex-->>Writer: Ok(())
    Writer->>FileRwLock: unlock()
```
Write Operation Flow
Write Methods and Locking
Single Write (write):
- Delegates to batch_write_with_key_hashes with a single entry (lines 827-834)
Batch Write (batch_write_with_key_hashes):
- Acquires the write lock on file (lines 852-855)
- Loads tail_offset atomically (line 858)
- Writes all entries to an in-memory buffer (lines 863-922)
- Writes the buffer to the file in a single operation (line 935)
- Flushes to disk (line 936)
- Calls reindex to update the mmap and index (lines 938-943)
- The write lock is released automatically when file_guard drops
Streaming Write (write_stream_with_key_hash):
- Acquires the write lock on file (lines 759-762)
- Streams the payload to the file via buffered reads (lines 778-790)
- Flushes after streaming (line 814)
- Calls reindex to update the mmap and index (lines 818-823)
Sources: src/storage_engine/data_store.rs:752-843 src/storage_engine/data_store.rs:847-943
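A condensed sketch of this locking and ordering is shown below, with a simplified signature; the real methods also write metadata, batch multiple entries, and rebuild the mmap and index via reindex.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

// Simplified write-path sketch: exclusive lock, append, flush, then publish
// the new tail with Release ordering (reindex() omitted for brevity).
fn append_payload(
    file: &RwLock<BufWriter<File>>,
    tail_offset: &AtomicU64,
    payload: &[u8],
) -> std::io::Result<()> {
    let mut writer = file.write().unwrap();          // exclusive write lock
    let start = tail_offset.load(Ordering::Acquire); // base offset for this write
    writer.write_all(payload)?;                      // append under the lock
    writer.flush()?;                                 // must reach the file before remapping
    // In the real engine, reindex() rebuilds the mmap and index here, and only
    // then publishes the new tail so readers never see an unbacked offset.
    tail_offset.store(start + payload.len() as u64, Ordering::Release);
    Ok(())                                           // write lock released on drop
}
```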
Memory Mapping Coordination
The reindex method coordinates the critical transition from write completion to read visibility by updating the memory-mapped view.
Reindex Operation
The reindex method (lines 224-259) performs four critical operations:
Step-by-Step Process:
1. Create New Memory Map (line 231):
   - Calls init_mmap(&write_guard) to create a fresh Mmap from the file
   - The file must be flushed before this point to ensure visibility
2. Update Shared Mmap (lines 232, 255):
   - Acquires the Mutex on self.mmap
   - Replaces the Arc<Mmap> with the new mapping
   - Existing readers retain old Arc<Mmap> references until they drop
3. Update Index (lines 233-253):
   - Acquires the write lock on self.key_indexer
   - Inserts new (key_hash → offset) mappings
   - Removes tombstone entries if present
4. Update Tail Offset (line 256):
   - Stores the new tail offset with Ordering::Release
   - Ensures all prior writes are visible before the offset update
Critical Ordering: The write lock on file is held throughout the entire reindex operation, preventing concurrent writes from starting until the index and mmap are fully updated.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:172-174
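Step 2, swapping the shared mapping, can be sketched as below, assuming the memmap2 crate; existing readers keep their old Arc<Mmap> clones until they drop.

```rust
use memmap2::Mmap;
use std::sync::{Arc, Mutex};

// Sketch of the mmap swap performed during reindex (assumed field shapes).
fn swap_mmap(shared: &Mutex<Arc<Mmap>>, new_map: Mmap) {
    let mut guard = shared.lock().unwrap();
    // Readers holding clones of the old Arc<Mmap> continue to use it safely;
    // the old mapping is unmapped once the last clone is dropped.
    *guard = Arc::new(new_map);
}
```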
Index Concurrency Model
The key_indexer uses RwLock<KeyIndexer> to enable concurrent read access while providing exclusive write access during updates.
Index Access Patterns
| Operation | Lock Type | Held During | Purpose |
|---|---|---|---|
| read | RwLock::read() | Index lookup only (lines 507-508) | Multiple threads can look up concurrently |
| batch_read | RwLock::read() | All lookups in the batch | Amortizes lock acquisition overhead |
| par_iter_entries | RwLock::read() | Collecting offsets only (lines 300-302) | Minimizes lock hold time; drops before parallel iteration |
| write / batch_write | RwLock::write() | Index updates in reindex (lines 233-236) | Exclusive access for inserting new entries |
| delete (tombstone) | RwLock::write() | Removing from index (line 243) | Exclusive access for deletion |
Parallel Iteration Strategy
The par_iter_entries method (lines 296-361, feature-gated by parallel) demonstrates efficient lock usage:
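The snippet below sketches that lock scoping under simplified types: a plain Vec<u64> stands in for the real KeyIndexer, and the per-entry work is elided (the real method yields EntryHandle values).

```rust
use rayon::prelude::*;
use std::sync::RwLock;

// Sketch of par_iter_entries lock scoping: copy offsets under the read lock,
// drop the guard, then do the expensive parallel pass without any lock held.
fn parallel_offset_sum(index: &RwLock<Vec<u64>>) -> u64 {
    // The read guard is a temporary dropped at the end of this statement.
    let offsets: Vec<u64> = index.read().unwrap().clone();
    // Writers are free to proceed while the parallel iteration runs.
    offsets.into_par_iter().sum()
}
```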
This approach:
- Minimizes the duration the read lock is held
- Avoids holding locks during expensive parallel processing
- Allows concurrent writes to proceed while parallel iteration is in progress
Sources: src/storage_engine/data_store.rs:296-361
Atomic Tail Offset
The tail_offset field uses AtomicU64 with acquire-release ordering to coordinate write sequencing without locks.
Atomic Operations Pattern
Load (Acquire):
- Used at start of write operations (lines 763, 858)
- Ensures all prior writes are visible before loading the offset
- Provides the base offset for the new write
Store (Release):
- Used at the end of reindex after all updates (line 256)
- Ensures all writes (file, mmap, index) are visible before the offset is stored
- Prevents reordering of writes after the offset update
Sequential Consistency: Even though writers acquire the write lock one at a time, the atomic tail offset ensures that:
- Each write observes the correct starting position
- The tail offset update happens after all associated data is committed
- Readers see a consistent view of the storage
Sources: src/storage_engine/data_store.rs:30 src/storage_engine/data_store.rs:256 src/storage_engine/data_store.rs:763 src/storage_engine/data_store.rs:858
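On the reader side, the same pairing looks like the sketch below (assumed usage): a load with Acquire synchronizes with the writer's Release store, so all file, mmap, and index updates made before that store are visible afterward.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Reader-side sketch: the Acquire load pairs with the writer's Release store.
fn visible_tail(tail_offset: &AtomicU64) -> u64 {
    tail_offset.load(Ordering::Acquire)
}
```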
Multi-Process Considerations
SIMD R Drive’s concurrency mechanisms are designed for single-process, multi-threaded environments. Multi-process access requires external coordination.
Thread Safety Matrix
| Environment | Reads | Writes | Index Updates | Storage Safety |
|---|---|---|---|---|
| Single Process, Single Thread | ✅ Safe | ✅ Safe | ✅ Safe | ✅ Safe |
| Single Process, Multi-Threaded | ✅ Safe (lock-free, zero-copy) | ✅ Safe (RwLock<File>) | ✅ Safe (RwLock<KeyIndexer>) | ✅ Safe (Mutex<Arc<Mmap>>) |
| Multiple Processes, Shared File | ⚠️ Unsafe (no cross-process coordination) | ❌ Unsafe (no external locking) | ❌ Unsafe (separate memory spaces) | ❌ Unsafe (race conditions) |
Why Multi-Process is Unsafe
Separate Memory Spaces:
- Each process has its own Arc<Mmap> instance
- Index changes in one process are not visible to others
- The tail_offset atomic is process-local
No Cross-Process Synchronization:
- RwLock and Mutex only synchronize within a single process
- File writes from multiple processes can interleave arbitrarily
- Memory mapping updates are not coordinated
Risk of Corruption:
- Two processes can both acquire file write locks independently
- Both may attempt to write at the same tail_offset
- The storage file's append-only chain can be broken
External Locking Requirements
For multi-process access, applications must implement external file locking:
- Advisory Locks: Use flock (Unix) or LockFile (Windows)
- Mandatory Locks: File-system-level exclusive access
- Distributed Locks: For networked file systems
The core library intentionally does not provide cross-process locking to avoid platform-specific dependencies and maintain flexibility.
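One possible application-level arrangement is sketched below, assuming the fs2 crate and a hypothetical sibling lock-file path; SIMD R Drive does not provide this, and any advisory or distributed lock can be substituted.

```rust
use fs2::FileExt; // assumed external crate wrapping flock / LockFile
use std::fs::OpenOptions;

// Hold an exclusive advisory lock around all DataStore access in this process.
fn with_exclusive_storage_access<T>(work: impl FnOnce() -> T) -> std::io::Result<T> {
    let lock_file = OpenOptions::new()
        .create(true)
        .write(true)
        .open("storage.bin.lock")?; // hypothetical lock-file path
    lock_file.lock_exclusive()?;    // blocks until no other process holds the lock
    let result = work();            // open and use the DataStore inside this scope
    lock_file.unlock()?;            // also released when the file handle is dropped
    Ok(result)
}
```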
Sources: README.md:185-206 src/storage_engine/data_store.rs:27-33
Concurrency Testing
The test suite validates concurrent access patterns through multi-threaded integration tests.
Test Coverage
Concurrent Write Test (concurrent_write_test):
- Spawns 16 threads writing 10 entries each (lines 111-161)
- Uses Arc<DataStore> to share the storage instance
- Verifies all 160 entries are correctly written and retrievable
- Demonstrates write lock serialization
Interleaved Read-Write Test (interleaved_read_write_test):
- Two threads alternating reads and writes on same key (lines 163-229)
- Uses tokio::sync::Notify for coordination
- Validates that writes are immediately visible to readers
- Confirms index updates are atomic
Concurrent Streamed Write Test (concurrent_slow_streamed_write_test):
- Multiple threads streaming 1MB payloads concurrently (lines 14-109)
- Simulates slow readers with artificial delays
- Verifies write lock prevents interleaved streaming
- Validates payload integrity after concurrent writes
Sources: tests/concurrency_tests.rs:14-229
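The concurrent write test's overall shape can be sketched generically as below, with the store-specific calls abstracted behind a closure (hypothetical helper; the real test uses the DataStore API directly and asserts on the results).

```rust
use std::sync::Arc;
use std::thread;

// 16 threads share one store via Arc and each write 10 entries; joining all
// handles before asserting mirrors the structure of concurrent_write_test.
fn spawn_sixteen_writers<S, F>(store: Arc<S>, write_entry: F)
where
    S: Send + Sync + 'static,
    F: Fn(&S, usize, usize) + Send + Sync + Copy + 'static,
{
    let handles: Vec<_> = (0..16)
        .map(|thread_id| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for i in 0..10 {
                    write_entry(&store, thread_id, i); // 10 entries per thread
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap(); // all 160 writes complete before verification
    }
}
```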
Locking Order and Deadlock Prevention
To prevent deadlocks, SIMD R Drive follows a consistent lock acquisition order:
Lock Hierarchy
```mermaid
graph TD
    A["1. file: RwLock (Write)"] --> B["2. mmap: Mutex"]
    B --> C["3. key_indexer: RwLock (Write)"]
    C --> D["4. tail_offset: AtomicU64"]
    style A fill:#f9f9f9
    style B fill:#f9f9f9
    style C fill:#f9f9f9
    style D fill:#f9f9f9
```
Acquisition Order:
- File Write Lock: Acquired first in all write operations
- Mmap Mutex: Locked during reindex to update the memory mapping
- Key Indexer Write Lock: Locked during reindex to update the index
- Tail Offset Atomic: Updated last with Release ordering
Lock Release: Locks are released in reverse order automatically through RAII (Rust’s drop semantics).
Deadlock Prevention: This fixed order ensures no circular wait conditions. Read operations only acquire the key indexer read lock, which cannot deadlock with other readers.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-943
Performance Characteristics
Concurrent Read Performance
- Zero-Copy: No memory allocation or copying for payload access
- Lock-Free After Index Lookup: Once the Arc<Mmap> is cloned, no locks are held
- Scalable: Read throughput increases linearly with thread count
- Cache-Friendly: 64-byte alignment optimizes cache line access
Write Serialization
- Sequential Writes: All writes are serialized by the RwLock<File>
- Minimal Lock Hold Time: The lock is held only during actual I/O and the index update
- Batch Amortization: batch_write amortizes lock overhead across multiple entries
- No Write Starvation: A fair RwLock implementation prevents readers from starving writers
Benchmark Results
From benches/storage_benchmark.rs, typical single-threaded benchmark behavior:
- Write Throughput: ~1M entries (8 bytes each) written per batch operation
- Sequential Read: Iterates all entries via zero-copy handles
- Random Read: ~1M random lookups complete in under 1 second
- Batch Read: Vectorized lookups provide additional speedup
For multi-threaded workloads, the concurrency tests demonstrate that read throughput scales with core count while write throughput is limited by the sequential append-only design.
Sources: benches/storage_benchmark.rs:1-234 tests/concurrency_tests.rs:1-230