This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Concurrency and Thread Safety
Relevant source files
- README.md
- benches/storage_benchmark.rs
- src/main.rs
- src/storage_engine/data_store.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
Purpose and Scope
This document describes the concurrency model and thread safety mechanisms used by the SIMD R Drive storage engine. It covers the synchronization primitives, locking strategies, and guarantees provided for concurrent access in single-process, multi-threaded environments. For information about the core storage architecture and data structures, see Storage Architecture. For details on the DataStore API methods, see DataStore API.
Sources: README.md:170-207 src/storage_engine/data_store.rs:1-33
Synchronization Architecture
The DataStore structure employs a combination of synchronization primitives to enable safe concurrent access while maintaining high performance for reads and writes.
```mermaid
graph TB
    subgraph DataStore["DataStore Structure"]
        File["file\nArc<RwLock<BufWriter<File>>>\nWrite synchronization"]
        Mmap["mmap\nArc<Mutex<Arc<Mmap>>>\nMemory-map coordination"]
        TailOffset["tail_offset\nAtomicU64\nSequential write ordering"]
        KeyIndexer["key_indexer\nArc<RwLock<KeyIndexer>>\nConcurrent index access"]
        Path["path\nPathBuf\nFile location"]
    end
    ReadOps["Read Operations"] --> KeyIndexer
    ReadOps --> Mmap
    WriteOps["Write Operations"] --> File
    WriteOps --> TailOffset
    WriteOps -.reindex.-> Mmap
    WriteOps -.reindex.-> KeyIndexer
    WriteOps -.reindex.-> TailOffset
    ParallelIter["par_iter_entries"] --> KeyIndexer
    ParallelIter --> Mmap
```
DataStore Synchronization Primitives
Synchronization Primitives by Field:
| Field | Type | Purpose | Concurrency Model |
|---|---|---|---|
| file | Arc<RwLock<BufWriter<File>>> | Write operations | Exclusive write lock required |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped view | Locked during remap operations |
| tail_offset | AtomicU64 | Next write position | Lock-free atomic updates |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash-to-offset index | Read-write lock for lookups/updates |
| path | PathBuf | Storage file path | Immutable after creation |
Sources: src/storage_engine/data_store.rs:27-33 README.md:172-182
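For orientation, the sketch below mirrors the field layout from the table above, assuming the memmap2 crate for Mmap and a placeholder KeyIndexer type; it is illustrative, not the exact struct definition from data_store.rs.

```rust
use std::fs::File;
use std::io::BufWriter;
use std::path::PathBuf;
use std::sync::atomic::AtomicU64;
use std::sync::{Arc, Mutex, RwLock};
use memmap2::Mmap;

// Placeholder; the real KeyIndexer maps key hashes to packed (tag, offset) values.
struct KeyIndexer;

/// Illustrative shape of the DataStore fields described above.
pub struct DataStore {
    file: Arc<RwLock<BufWriter<File>>>,   // exclusive lock serializes writers
    mmap: Arc<Mutex<Arc<Mmap>>>,          // swapped atomically during reindex
    tail_offset: AtomicU64,               // next append position, lock-free
    key_indexer: Arc<RwLock<KeyIndexer>>, // shared for reads, exclusive for updates
    path: PathBuf,                        // immutable after creation
}
```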
Read Path Concurrency
Read operations in SIMD R Drive are designed to be lock-free and zero-copy, enabling high throughput for concurrent readers.
```mermaid
sequenceDiagram
    participant Thread1 as "Reader Thread 1"
    participant Thread2 as "Reader Thread 2"
    participant KeyIndexer as "key_indexer\nRwLock<KeyIndexer>"
    participant Mmap as "mmap\nMutex<Arc<Mmap>>"
    participant File as "Memory-Mapped File"
    Thread1->>KeyIndexer: read_lock()
    Thread2->>KeyIndexer: read_lock()
    Note over Thread1,Thread2: Multiple readers can acquire lock simultaneously
    KeyIndexer-->>Thread1: lookup hash → (tag, offset)
    KeyIndexer-->>Thread2: lookup hash → (tag, offset)
    Thread1->>Mmap: get_mmap_arc()
    Thread2->>Mmap: get_mmap_arc()
    Note over Mmap: Clone Arc, release Mutex immediately
    Mmap-->>Thread1: Arc<Mmap>
    Mmap-->>Thread2: Arc<Mmap>
    Thread1->>File: Direct memory access [offset..offset+len]
    Thread2->>File: Direct memory access [offset..offset+len]
    Note over Thread1,Thread2: Zero-copy reads, no locking required
```
Lock-Free Read Architecture
Read Method Implementation
The read_entry_with_context method (lines 501-565) centralizes the read logic:
- Acquire Read Lock: Obtains a read lock on key_indexer via the provided RwLockReadGuard
- Index Lookup: Retrieves the packed (tag, offset) value for the key hash
- Tag Verification: Validates the tag to detect hash collisions (lines 513-521)
- Memory Access: Reads metadata and payload directly from the memory-mapped region
- Tombstone Check: Filters out deleted entries (single NULL byte)
- Zero-Copy Handle: Returns an EntryHandle referencing the memory-mapped data
Key Performance Characteristics:
- No Write Lock: Reads never block on the file write lock
- Concurrent Readers: Multiple threads can read simultaneously via the RwLock read lock
- No Data Copy: The EntryHandle references the memory-mapped region directly
- Immediate Lock Release: The get_mmap_arc method (lines 658-663) clones the Arc<Mmap> and releases the Mutex immediately (see the sketch below)
Sources: src/storage_engine/data_store.rs:501-565 src/storage_engine/data_store.rs:658-663 README.md:174-175
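The "clone and release" pattern behind get_mmap_arc can be sketched as follows; this is a minimal sketch assuming the memmap2 crate, not the method's exact body.

```rust
use memmap2::Mmap;
use std::sync::{Arc, Mutex};

// Sketch of the immediate-release pattern: lock only long enough to clone the
// Arc, so readers never hold the Mutex while accessing mapped data.
fn get_mmap_arc(mmap: &Mutex<Arc<Mmap>>) -> Arc<Mmap> {
    // The MutexGuard is a temporary that drops at the end of this expression.
    Arc::clone(&mmap.lock().unwrap())
}
```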
Write Path Synchronization
Write operations require exclusive access to the storage file and coordinate with the memory-mapped view and index.
```mermaid
sequenceDiagram
    participant Writer as "Writer Thread"
    participant FileRwLock as "file\nRwLock"
    participant BufWriter as "BufWriter<File>"
    participant TailOffset as "tail_offset\nAtomicU64"
    participant Reindex as "reindex()"
    participant MmapMutex as "mmap\nMutex"
    participant KeyIndexerRwLock as "key_indexer\nRwLock"
    Writer->>FileRwLock: write_lock()
    Note over FileRwLock: Exclusive lock acquired\nOnly one writer at a time
    Writer->>TailOffset: load(Ordering::Acquire)
    TailOffset-->>Writer: current tail offset
    Writer->>BufWriter: write pre-pad + payload + metadata
    Writer->>BufWriter: flush()
    Writer->>Reindex: reindex(&file, key_hash_offsets, tail_offset)
    Reindex->>BufWriter: init_mmap(&file)
    Note over Reindex: Create new memory-mapped view
    Reindex->>MmapMutex: lock()
    Reindex->>MmapMutex: Replace Arc<Mmap>
    Reindex->>MmapMutex: unlock()
    Reindex->>KeyIndexerRwLock: write_lock()
    Reindex->>KeyIndexerRwLock: insert(key_hash → offset)
    Reindex->>KeyIndexerRwLock: unlock()
    Reindex->>TailOffset: store(new_tail, Ordering::Release)
    Reindex-->>Writer: Ok(())
    Writer->>FileRwLock: unlock()
```
Write Operation Flow
Write Methods and Locking
Single Write (write):
- Delegates to batch_write_with_key_hashes with a single entry (lines 827-834)
Batch Write (batch_write_with_key_hashes):
- Acquires the write lock on file (lines 852-855)
- Loads tail_offset atomically (line 858)
- Writes all entries to an in-memory buffer (lines 863-922)
- Writes the buffer to the file in a single operation (line 935)
- Flushes to disk (line 936)
- Calls reindex to update the mmap and index (lines 938-943)
- The write lock is released automatically when file_guard drops
Streaming Write (write_stream_with_key_hash):
- Acquires the write lock on file (lines 759-762)
- Streams the payload to the file via buffered reads (lines 778-790)
- Flushes after streaming (line 814)
- Calls reindex to update the mmap and index (lines 818-823)
Sources: src/storage_engine/data_store.rs:752-843 src/storage_engine/data_store.rs:847-943
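A condensed sketch of this locking and ordering is shown below, with a simplified signature; the real methods also write metadata, batch multiple entries, and rebuild the mmap and index via reindex.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

// Simplified write-path sketch: exclusive lock, append, flush, then publish
// the new tail with Release ordering (reindex() omitted for brevity).
fn append_payload(
    file: &RwLock<BufWriter<File>>,
    tail_offset: &AtomicU64,
    payload: &[u8],
) -> std::io::Result<()> {
    let mut writer = file.write().unwrap();          // exclusive write lock
    let start = tail_offset.load(Ordering::Acquire); // base offset for this write
    writer.write_all(payload)?;                      // append under the lock
    writer.flush()?;                                 // must reach the file before remapping
    // In the real engine, reindex() rebuilds the mmap and index here, and only
    // then publishes the new tail so readers never see an unbacked offset.
    tail_offset.store(start + payload.len() as u64, Ordering::Release);
    Ok(())                                           // write lock released on drop
}
```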
Memory Mapping Coordination
The reindex method coordinates the critical transition from write completion to read visibility by updating the memory-mapped view.
Reindex Operation
The reindex method (lines 224-259) performs four critical operations:
Step-by-Step Process:
1. Create New Memory Map (line 231):
   - Calls init_mmap(&write_guard) to create a fresh Mmap from the file
   - The file must be flushed before this point to ensure visibility
2. Update Shared Mmap (lines 232, 255):
   - Acquires the Mutex on self.mmap
   - Replaces the Arc<Mmap> with the new mapping
   - Existing readers retain old Arc<Mmap> references until they drop
3. Update Index (lines 233-253):
   - Acquires the write lock on self.key_indexer
   - Inserts new (key_hash → offset) mappings
   - Removes tombstone entries if present
4. Update Tail Offset (line 256):
   - Stores the new tail offset with Ordering::Release
   - Ensures all prior writes are visible before the offset update
Critical Ordering: The write lock on file is held throughout the entire reindex operation, preventing concurrent writes from starting until the index and mmap are fully updated.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:172-174
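Step 2, swapping the shared mapping, can be sketched as below, assuming the memmap2 crate; existing readers keep their old Arc<Mmap> clones until they drop.

```rust
use memmap2::Mmap;
use std::sync::{Arc, Mutex};

// Sketch of the mmap swap performed during reindex (assumed field shapes).
fn swap_mmap(shared: &Mutex<Arc<Mmap>>, new_map: Mmap) {
    let mut guard = shared.lock().unwrap();
    // Readers holding clones of the old Arc<Mmap> continue to use it safely;
    // the old mapping is unmapped once the last clone is dropped.
    *guard = Arc::new(new_map);
}
```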
Index Concurrency Model
The key_indexer uses RwLock<KeyIndexer> to enable concurrent read access while providing exclusive write access during updates.
Index Access Patterns
| Operation | Lock Type | Held During | Purpose |
|---|---|---|---|
| read | RwLock::read() | Index lookup only (lines 507-508) | Multiple threads can look up concurrently |
| batch_read | RwLock::read() | All lookups in the batch | Amortizes lock acquisition overhead |
| par_iter_entries | RwLock::read() | Collecting offsets only (lines 300-302) | Minimizes lock hold time; drops before parallel iteration |
| write / batch_write | RwLock::write() | Index updates in reindex (lines 233-236) | Exclusive access for inserting new entries |
| delete (tombstone) | RwLock::write() | Removing from index (line 243) | Exclusive access for deletion |
Parallel Iteration Strategy
The par_iter_entries method (lines 296-361, feature-gated by parallel) demonstrates efficient lock usage:
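The snippet below sketches that lock scoping under simplified types: a plain Vec<u64> stands in for the real KeyIndexer, and the per-entry work is elided (the real method yields EntryHandle values).

```rust
use rayon::prelude::*;
use std::sync::RwLock;

// Sketch of par_iter_entries lock scoping: copy offsets under the read lock,
// drop the guard, then do the expensive parallel pass without any lock held.
fn parallel_offset_sum(index: &RwLock<Vec<u64>>) -> u64 {
    // The read guard is a temporary dropped at the end of this statement.
    let offsets: Vec<u64> = index.read().unwrap().clone();
    // Writers are free to proceed while the parallel iteration runs.
    offsets.into_par_iter().sum()
}
```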
This approach:
- Minimizes the duration the read lock is held
- Avoids holding locks during expensive parallel processing
- Allows concurrent writes to proceed while parallel iteration is in progress
Sources: src/storage_engine/data_store.rs:296-361
Atomic Tail Offset
The tail_offset field uses AtomicU64 with acquire-release ordering to coordinate write sequencing without locks.
Atomic Operations Pattern
Load (Acquire):
- Used at start of write operations (lines 763, 858)
- Ensures all prior writes are visible before loading the offset
- Provides the base offset for the new write
Store (Release):
- Used at the end of reindex after all updates (line 256)
- Ensures all writes (file, mmap, index) are visible before the offset is stored
- Prevents reordering of writes after the offset update
Sequential Consistency: Even though writers acquire the write lock one at a time, the atomic tail offset ensures that:
- Each write observes the correct starting position
- The tail offset update happens after all associated data is committed
- Readers see a consistent view of the storage
Sources: src/storage_engine/data_store.rs:30 src/storage_engine/data_store.rs:256 src/storage_engine/data_store.rs:763 src/storage_engine/data_store.rs:858
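On the reader side, the same pairing looks like the sketch below (assumed usage): a load with Acquire synchronizes with the writer's Release store, so all file, mmap, and index updates made before that store are visible afterward.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Reader-side sketch: the Acquire load pairs with the writer's Release store.
fn visible_tail(tail_offset: &AtomicU64) -> u64 {
    tail_offset.load(Ordering::Acquire)
}
```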
Multi-Process Considerations
SIMD R Drive’s concurrency mechanisms are designed for single-process, multi-threaded environments. Multi-process access requires external coordination.
Thread Safety Matrix
| Environment | Reads | Writes | Index Updates | Storage Safety |
|---|---|---|---|---|
| Single Process, Single Thread | ✅ Safe | ✅ Safe | ✅ Safe | ✅ Safe |
| Single Process, Multi-Threaded | ✅ Safe (lock-free, zero-copy) | ✅ Safe (RwLock<File>) | ✅ Safe (RwLock<KeyIndexer>) | ✅ Safe (Mutex<Arc<Mmap>>) |
| Multiple Processes, Shared File | ⚠️ Unsafe (no cross-process coordination) | ❌ Unsafe (no external locking) | ❌ Unsafe (separate memory spaces) | ❌ Unsafe (race conditions) |
Why Multi-Process is Unsafe
Separate Memory Spaces:
- Each process has its own Arc<Mmap> instance
- Index changes in one process are not visible to others
- The tail_offset atomic is process-local
No Cross-Process Synchronization:
- RwLock and Mutex only synchronize within a single process
- File writes from multiple processes can interleave arbitrarily
- Memory mapping updates are not coordinated
Risk of Corruption:
- Two processes can both acquire file write locks independently
- Both may attempt to write at the same tail_offset
- The storage file's append-only chain can be broken
External Locking Requirements
For multi-process access, applications must implement external file locking:
- Advisory Locks: Use flock (Unix) or LockFile (Windows)
- Mandatory Locks: File-system-level exclusive access
- Distributed Locks: For networked file systems
The core library intentionally does not provide cross-process locking to avoid platform-specific dependencies and maintain flexibility.
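One possible application-level arrangement is sketched below, assuming the fs2 crate and a hypothetical sibling lock-file path; SIMD R Drive does not provide this, and any advisory or distributed lock can be substituted.

```rust
use fs2::FileExt; // assumed external crate wrapping flock / LockFile
use std::fs::OpenOptions;

// Hold an exclusive advisory lock around all DataStore access in this process.
fn with_exclusive_storage_access<T>(work: impl FnOnce() -> T) -> std::io::Result<T> {
    let lock_file = OpenOptions::new()
        .create(true)
        .write(true)
        .open("storage.bin.lock")?; // hypothetical lock-file path
    lock_file.lock_exclusive()?;    // blocks until no other process holds the lock
    let result = work();            // open and use the DataStore inside this scope
    lock_file.unlock()?;            // also released when the file handle is dropped
    Ok(result)
}
```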
Sources: README.md:185-206 src/storage_engine/data_store.rs:27-33
Concurrency Testing
The test suite validates concurrent access patterns through multi-threaded integration tests.
Test Coverage
Concurrent Write Test (concurrent_write_test):
- Spawns 16 threads writing 10 entries each (lines 111-161)
- Uses Arc<DataStore> to share the storage instance
- Verifies all 160 entries are correctly written and retrievable
- Demonstrates write lock serialization
Interleaved Read-Write Test (interleaved_read_write_test):
- Two threads alternating reads and writes on same key (lines 163-229)
- Uses tokio::sync::Notify for coordination
- Validates that writes are immediately visible to readers
- Confirms index updates are atomic
Concurrent Streamed Write Test (concurrent_slow_streamed_write_test):
- Multiple threads streaming 1MB payloads concurrently (lines 14-109)
- Simulates slow readers with artificial delays
- Verifies write lock prevents interleaved streaming
- Validates payload integrity after concurrent writes
Sources: tests/concurrency_tests.rs:14-229
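The concurrent write test's overall shape can be sketched generically as below, with the store-specific calls abstracted behind a closure (hypothetical helper; the real test uses the DataStore API directly and asserts on the results).

```rust
use std::sync::Arc;
use std::thread;

// 16 threads share one store via Arc and each write 10 entries; joining all
// handles before asserting mirrors the structure of concurrent_write_test.
fn spawn_sixteen_writers<S, F>(store: Arc<S>, write_entry: F)
where
    S: Send + Sync + 'static,
    F: Fn(&S, usize, usize) + Send + Sync + Copy + 'static,
{
    let handles: Vec<_> = (0..16)
        .map(|thread_id| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for i in 0..10 {
                    write_entry(&store, thread_id, i); // 10 entries per thread
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap(); // all 160 writes complete before verification
    }
}
```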
Locking Order and Deadlock Prevention
To prevent deadlocks, SIMD R Drive follows a consistent lock acquisition order:
Lock Hierarchy
```mermaid
graph TD
    A["1. file: RwLock (Write)"] --> B["2. mmap: Mutex"]
    B --> C["3. key_indexer: RwLock (Write)"]
    C --> D["4. tail_offset: AtomicU64"]
    style A fill:#f9f9f9
    style B fill:#f9f9f9
    style C fill:#f9f9f9
    style D fill:#f9f9f9
```
Acquisition Order:
- File Write Lock: Acquired first in all write operations
- Mmap Mutex: Locked during reindex to update the memory mapping
- Key Indexer Write Lock: Locked during reindex to update the index
- Tail Offset Atomic: Updated last with Release ordering
Lock Release: Locks are released in reverse order automatically through RAII (Rust’s drop semantics).
Deadlock Prevention: This fixed order ensures no circular wait conditions. Read operations only acquire the key indexer read lock, which cannot deadlock with other readers.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-943
Performance Characteristics
Concurrent Read Performance
- Zero-Copy: No memory allocation or copying for payload access
- Lock-Free After Index Lookup: Once the Arc<Mmap> is cloned, no locks are held
- Scalable: Read throughput increases linearly with thread count
- Cache-Friendly: 64-byte alignment optimizes cache line access
Write Serialization
- Sequential Writes: All writes are serialized by the RwLock<File>
- Minimal Lock Hold Time: The lock is held only during actual I/O and the index update
- Batch Amortization: batch_write amortizes lock overhead across multiple entries
- No Write Starvation: A fair RwLock implementation prevents readers from starving writers
Benchmark Results
From benches/storage_benchmark.rs, typical single-threaded benchmark behavior:
- Write Throughput: ~1M entries (8 bytes each) written per batch operation
- Sequential Read: Iterates all entries via zero-copy handles
- Random Read: ~1M random lookups complete in under 1 second
- Batch Read: Vectorized lookups provide additional speedup
For multi-threaded workloads, the concurrency tests demonstrate that read throughput scales with core count while write throughput is limited by the sequential append-only design.
Sources: benches/storage_benchmark.rs:1-234 tests/concurrency_tests.rs:1-230