
Concurrency and Thread Safety


Purpose and Scope

This document describes the concurrency model and thread safety mechanisms used by the SIMD R Drive storage engine. It covers the synchronization primitives, locking strategies, and guarantees provided for concurrent access in single-process, multi-threaded environments. For information about the core storage architecture and data structures, see Storage Architecture. For details on the DataStore API methods, see DataStore API.

Sources: README.md:170-207 src/storage_engine/data_store.rs:1-33

Synchronization Architecture

The DataStore structure employs a combination of synchronization primitives to enable safe concurrent access while maintaining high performance for reads and writes.

graph TB
    subgraph DataStore["DataStore Structure"]
        File["file\nArc<RwLock<BufWriter<File>>>\nWrite synchronization"]
        Mmap["mmap\nArc<Mutex<Arc<Mmap>>>\nMemory-map coordination"]
        TailOffset["tail_offset\nAtomicU64\nSequential write ordering"]
        KeyIndexer["key_indexer\nArc<RwLock<KeyIndexer>>\nConcurrent index access"]
        Path["path\nPathBuf\nFile location"]
    end

    ReadOps["Read Operations"] --> KeyIndexer
    ReadOps --> Mmap

    WriteOps["Write Operations"] --> File
    WriteOps --> TailOffset
    WriteOps -.reindex.-> Mmap
    WriteOps -.reindex.-> KeyIndexer
    WriteOps -.reindex.-> TailOffset

    ParallelIter["par_iter_entries"] --> KeyIndexer
    ParallelIter --> Mmap

DataStore Synchronization Primitives

Synchronization Primitives by Field:

| Field | Type | Purpose | Concurrency Model |
|-------|------|---------|-------------------|
| `file` | `Arc<RwLock<BufWriter<File>>>` | Write operations | Exclusive write lock required |
| `mmap` | `Arc<Mutex<Arc<Mmap>>>` | Memory-mapped view | Locked during remap operations |
| `tail_offset` | `AtomicU64` | Next write position | Lock-free atomic updates |
| `key_indexer` | `Arc<RwLock<KeyIndexer>>` | Hash-to-offset index | Read-write lock for lookups/updates |
| `path` | `PathBuf` | Storage file path | Immutable after creation |
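
Taken together, these fields correspond roughly to the following struct layout. This is a simplified sketch based on the table above, not the exact definition in src/storage_engine/data_store.rs:27-33; the `Mmap` type is assumed to come from a memory-mapping crate such as memmap2, and `KeyIndexer` is shown as a stand-in.

```rust
use std::fs::File;
use std::io::BufWriter;
use std::path::PathBuf;
use std::sync::atomic::AtomicU64;
use std::sync::{Arc, Mutex, RwLock};

use memmap2::Mmap; // assumed memory-mapping dependency

struct KeyIndexer; // stand-in for the crate's hash-to-offset index type

/// Simplified sketch of the DataStore synchronization fields.
pub struct DataStore {
    /// Exclusive write access to the append-only file.
    file: Arc<RwLock<BufWriter<File>>>,
    /// Shared memory-mapped view; the Mutex is held only while swapping the inner Arc.
    mmap: Arc<Mutex<Arc<Mmap>>>,
    /// Next write position, updated lock-free with acquire-release ordering.
    tail_offset: AtomicU64,
    /// Hash-to-offset index guarded by a read-write lock.
    key_indexer: Arc<RwLock<KeyIndexer>>,
    /// Storage file location, immutable after creation.
    path: PathBuf,
}
```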

Sources: src/storage_engine/data_store.rs:27-33 README.md:172-182

Read Path Concurrency

Read operations in SIMD R Drive are designed to be lock-free and zero-copy, enabling high throughput for concurrent readers.

sequenceDiagram
    participant Thread1 as "Reader Thread 1"
    participant Thread2 as "Reader Thread 2"
    participant KeyIndexer as "key_indexer\nRwLock&lt;KeyIndexer&gt;"
    participant Mmap as "mmap\nMutex&lt;Arc&lt;Mmap&gt;&gt;"
    participant File as "Memory-Mapped File"
    
    Thread1->>KeyIndexer: read_lock()
    Thread2->>KeyIndexer: read_lock()
    Note over Thread1,Thread2: Multiple readers can acquire lock simultaneously
    
    KeyIndexer-->>Thread1: lookup hash → (tag, offset)
    KeyIndexer-->>Thread2: lookup hash → (tag, offset)
    
    Thread1->>Mmap: get_mmap_arc()
    Thread2->>Mmap: get_mmap_arc()
    Note over Mmap: Clone Arc, release Mutex immediately
    
    Mmap-->>Thread1: Arc&lt;Mmap&gt;
    Mmap-->>Thread2: Arc&lt;Mmap&gt;
    
    Thread1->>File: Direct memory access [offset..offset+len]
    Thread2->>File: Direct memory access [offset..offset+len]
    Note over Thread1,Thread2: Zero-copy reads, no locking required

Lock-Free Read Architecture

Read Method Implementation

The read_entry_with_context method (lines 501-565) centralizes the read logic:

  1. Acquire Read Lock: Obtains a read lock on key_indexer via the provided RwLockReadGuard
  2. Index Lookup: Retrieves the packed (tag, offset) value for the key hash
  3. Tag Verification: Validates the tag to detect hash collisions (lines 513-521)
  4. Memory Access: Reads metadata and payload directly from the memory-mapped region
  5. Tombstone Check: Filters out deleted entries (single NULL byte)
  6. Zero-Copy Handle: Returns an EntryHandle referencing the memory-mapped data

Key Performance Characteristics:

  • No Write Lock: Reads never block on the file write lock
  • Concurrent Readers: Multiple threads can read simultaneously via RwLock read lock
  • No Data Copy: The EntryHandle references the memory-mapped region directly
  • Immediate Lock Release: The get_mmap_arc method (lines 658-663) clones the Arc<Mmap> and immediately releases the Mutex
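
The steps above can be condensed into the following sketch. It is illustrative only: `lookup` is a hypothetical helper standing in for the KeyIndexer API, error handling is reduced to `Option`, and `Mmap` is assumed to come from memmap2.

```rust
use std::sync::{Arc, Mutex, RwLock};

use memmap2::Mmap; // assumed memory-mapping dependency

/// Stand-in for the crate's index type; `lookup` is a hypothetical helper.
struct KeyIndexer;

impl KeyIndexer {
    /// Hypothetical tag-checked lookup returning (payload offset, payload length).
    fn lookup(&self, _key_hash: u64) -> Option<(usize, usize)> {
        None
    }
}

/// Sketch of the lock-free read path: the only lock taken is the
/// key_indexer read lock, and it is dropped before the payload is touched.
fn read_sketch(
    key_indexer: &RwLock<KeyIndexer>,
    mmap: &Mutex<Arc<Mmap>>,
    key_hash: u64,
) -> Option<(Arc<Mmap>, usize, usize)> {
    // 1. Index lookup under a shared read lock; many readers run in parallel.
    let (offset, len) = {
        let index = key_indexer.read().ok()?;
        index.lookup(key_hash)?
    }; // read lock released here

    // 2. Clone the current Arc<Mmap>; the Mutex is released immediately
    //    (the get_mmap_arc pattern described above).
    let mmap_arc = Arc::clone(&*mmap.lock().ok()?);

    // 3. The caller slices &mmap_arc[offset..offset + len] directly:
    //    zero-copy, with no further locking.
    Some((mmap_arc, offset, len))
}
```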

Sources: src/storage_engine/data_store.rs:501-565 src/storage_engine/data_store.rs:658-663 README.md:174-175

Write Path Synchronization

Write operations require exclusive access to the storage file and coordinate with the memory-mapped view and index.

sequenceDiagram
    participant Writer as "Writer Thread"
    participant FileRwLock as "file\nRwLock"
    participant BufWriter as "BufWriter&lt;File&gt;"
    participant TailOffset as "tail_offset\nAtomicU64"
    participant Reindex as "reindex()"
    participant MmapMutex as "mmap\nMutex"
    participant KeyIndexerRwLock as "key_indexer\nRwLock"
    
    Writer->>FileRwLock: write_lock()
    Note over FileRwLock: Exclusive lock acquired\nOnly one writer at a time
    
    Writer->>TailOffset: load(Ordering::Acquire)
    TailOffset-->>Writer: current tail offset
    
    Writer->>BufWriter: write pre-pad + payload + metadata
    Writer->>BufWriter: flush()
    
    Writer->>Reindex: reindex(&file, key_hash_offsets, tail_offset)
    
    Reindex->>BufWriter: init_mmap(&file)
    Note over Reindex: Create new memory-mapped view
    
    Reindex->>MmapMutex: lock()
    Reindex->>MmapMutex: Replace Arc&lt;Mmap&gt;
    Reindex->>MmapMutex: unlock()
    
    Reindex->>KeyIndexerRwLock: write_lock()
    Reindex->>KeyIndexerRwLock: insert(key_hash → offset)
    Reindex->>KeyIndexerRwLock: unlock()
    
    Reindex->>TailOffset: store(new_tail, Ordering::Release)
    
    Reindex-->>Writer: Ok(())
    Writer->>FileRwLock: unlock()

Write Operation Flow

Write Methods and Locking

Single Write (write):

  • Delegates to batch_write_with_key_hashes with a single entry (lines 827-834)

Batch Write (batch_write_with_key_hashes):

  • Acquires write lock on file (lines 852-855)
  • Loads tail_offset atomically (line 858)
  • Writes all entries to buffer (lines 863-922)
  • Writes buffer to file in single operation (line 935)
  • Flushes to disk (line 936)
  • Calls reindex to update mmap and index (lines 938-943)
  • Write lock automatically released on file_guard drop

Streaming Write (write_stream_with_key_hash):

  • Acquires write lock on file (lines 759-762)
  • Streams payload to file via buffered reads (lines 778-790)
  • Flushes after streaming (line 814)
  • Calls reindex to update mmap and index (lines 818-823)
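
The locking pattern shared by these write paths can be sketched as follows. This is a simplified illustration, not the crate's code: `encode_entry` is a hypothetical serializer, and the reindex step is only indicated by a comment.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Sketch of the batch write locking pattern.
fn batch_write_sketch(
    file: &RwLock<BufWriter<File>>,
    tail_offset: &AtomicU64,
    entries: &[(u64, Vec<u8>)],
) -> std::io::Result<()> {
    // 1. Exclusive file lock: only one writer proceeds at a time.
    let mut file_guard = file.write().expect("file lock poisoned");

    // 2. Load the current tail with Acquire ordering as the base offset.
    let _tail = tail_offset.load(Ordering::Acquire);

    // 3. Serialize all entries into a single buffer, then write and flush once.
    let mut buf = Vec::new();
    for (key_hash, payload) in entries {
        encode_entry(&mut buf, *key_hash, payload);
    }
    file_guard.write_all(&buf)?;
    file_guard.flush()?;

    // 4. reindex() would run here (remap + index update + Release store of
    //    the new tail offset) while the file lock is still held.

    Ok(())
    // file_guard dropped here: the write lock is released via RAII.
}

/// Hypothetical serializer stand-in; the real format writes pre-pad,
/// payload, and metadata per entry.
fn encode_entry(buf: &mut Vec<u8>, key_hash: u64, payload: &[u8]) {
    buf.extend_from_slice(&key_hash.to_le_bytes());
    buf.extend_from_slice(payload);
}
```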

Sources: src/storage_engine/data_store.rs:752-843 src/storage_engine/data_store.rs:847-943

Memory Mapping Coordination

The reindex method coordinates the critical transition from write completion to read visibility by updating the memory-mapped view.

Reindex Operation

The reindex method (lines 224-259) performs four critical operations:

Step-by-Step Process:

  1. Create New Memory Map (line 231):

    • Calls init_mmap(&write_guard) to create fresh Mmap from the file
    • File must be flushed before this point to ensure visibility
  2. Update Shared Mmap (lines 232, 255):

    • Acquires Mutex on self.mmap
    • Replaces the Arc<Mmap> with the new mapping
    • Existing readers retain old Arc<Mmap> references until they drop
  3. Update Index (lines 233-253):

    • Acquires write lock on self.key_indexer
    • Inserts new (key_hash → offset) mappings
    • Removes tombstone entries if present
  4. Update Tail Offset (line 256):

    • Stores new tail offset with Ordering::Release
    • Ensures all prior writes are visible before offset update

Critical Ordering: The write lock on file is held throughout the entire reindex operation, preventing concurrent writes from starting until the index and mmap are fully updated.
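
A condensed sketch of that sequence follows. It is illustrative only: a plain `HashMap` stands in for `KeyIndexer`, the new mapping is passed in rather than created by `init_mmap`, and the caller is assumed to still hold the file write lock.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, RwLock};

use memmap2::Mmap; // assumed memory-mapping dependency

/// Sketch of the reindex sequence. No new write can begin until this
/// returns, because the caller holds the exclusive file lock throughout.
fn reindex_sketch(
    mmap_slot: &Mutex<Arc<Mmap>>,
    key_indexer: &RwLock<HashMap<u64, u64>>, // stand-in index
    tail_offset: &AtomicU64,
    new_mmap: Mmap, // produced from the freshly flushed file
    key_hash_offsets: &[(u64, u64)],
    new_tail: u64,
) {
    // 1-2. Swap the shared mapping; existing readers keep their old
    //      Arc<Mmap> until their handles are dropped.
    *mmap_slot.lock().expect("mmap mutex poisoned") = Arc::new(new_mmap);

    // 3. Insert the new key-hash -> offset pairs under the index write lock
    //    (tombstoned keys would be removed here instead of inserted).
    {
        let mut index = key_indexer.write().expect("index lock poisoned");
        for &(key_hash, offset) in key_hash_offsets {
            index.insert(key_hash, offset);
        }
    }

    // 4. Publish the new tail offset last, with Release ordering, so a reader
    //    that observes it also observes the data and index updates above.
    tail_offset.store(new_tail, Ordering::Release);
}
```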

Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:172-174

Index Concurrency Model

The key_indexer uses RwLock<KeyIndexer> to enable concurrent read access while providing exclusive write access during updates.

Index Access Patterns

| Operation | Lock Type | Held During | Purpose |
|-----------|-----------|-------------|---------|
| `read` | `RwLock::read()` | Index lookup only (lines 507-508) | Multiple threads can look up concurrently |
| `batch_read` | `RwLock::read()` | All lookups in batch | Amortizes lock acquisition overhead |
| `par_iter_entries` | `RwLock::read()` | Collecting offsets only (lines 300-302) | Minimizes lock hold time, drops before parallel iteration |
| `write` / `batch_write` | `RwLock::write()` | Index updates in reindex (lines 233-236) | Exclusive access for inserting new entries |
| `delete` (tombstone) | `RwLock::write()` | Removing from index (line 243) | Exclusive access for deletion |

Parallel Iteration Strategy

The par_iter_entries method (lines 296-361, gated behind the parallel feature) demonstrates efficient lock usage. A simplified sketch of the pattern follows (illustrative only: a plain offset list stands in for the index, and rayon is assumed as the parallel-iterator dependency):
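
```rust
use std::sync::{Arc, Mutex, RwLock};

use memmap2::Mmap;     // assumed memory-mapping dependency
use rayon::prelude::*; // assumed parallel-iterator dependency

/// Sketch of the par_iter_entries locking pattern: snapshot offsets under the
/// read lock, drop the lock, then fan out with rayon using a cloned Arc<Mmap>.
fn par_iter_sketch(
    key_indexer: &RwLock<Vec<u64>>, // stand-in: a flat list of entry offsets
    mmap: &Mutex<Arc<Mmap>>,
) -> usize {
    // 1. Collect offsets while holding the read lock as briefly as possible.
    let offsets: Vec<u64> = {
        let index = key_indexer.read().expect("index lock poisoned");
        (*index).clone()
    }; // read lock dropped here, before any parallel work starts

    // 2. Clone the current mapping once; the Mutex is released immediately.
    let mmap_arc = Arc::clone(&*mmap.lock().expect("mmap mutex poisoned"));

    // 3. Process entries in parallel with no locks held.
    offsets
        .par_iter()
        .map(|&offset| mmap_arc[offset as usize]) // e.g. peek at the first byte
        .filter(|&byte| byte != 0)                // skip tombstone-like entries
        .count()
}
```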

This approach:

  • Minimizes the duration the read lock is held
  • Avoids holding locks during expensive parallel processing
  • Allows concurrent writes to proceed while parallel iteration is in progress

Sources: src/storage_engine/data_store.rs:296-361

Atomic Tail Offset

The tail_offset field uses AtomicU64 with acquire-release ordering to coordinate write sequencing without locks.

Atomic Operations Pattern

Load (Acquire):

  • Used at start of write operations (lines 763, 858)
  • Ensures all prior writes are visible before loading the offset
  • Provides the base offset for the new write

Store (Release):

  • Used at end of reindex after all updates (line 256)
  • Ensures all writes (file, mmap, index) are visible before storing
  • Prevents reordering of writes after the offset update

Sequential Consistency: Writers acquire the write lock one at a time, and the acquire-release ordering on the atomic tail offset additionally ensures that:

  1. Each write observes the correct starting position
  2. The tail offset update happens after all associated data is committed
  3. Readers see a consistent view of the storage
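
In isolation, the acquire-release pairing looks like this. It is a minimal, self-contained illustration of the ordering contract rather than the engine's code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Writer side: publish the new tail only after all associated data,
/// mmap, and index updates are complete.
fn publish_tail(tail_offset: &AtomicU64, new_tail: u64) {
    // ... file flushed, mmap swapped, index updated ...
    tail_offset.store(new_tail, Ordering::Release);
}

/// Reader / next-writer side: any thread that observes new_tail here is
/// guaranteed to also observe every write made before the Release store.
fn observe_tail(tail_offset: &AtomicU64) -> u64 {
    tail_offset.load(Ordering::Acquire)
}
```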

Sources: src/storage_engine/data_store.rs:30 src/storage_engine/data_store.rs:256 src/storage_engine/data_store.rs:763 src/storage_engine/data_store.rs:858

Multi-Process Considerations

SIMD R Drive's concurrency mechanisms are designed for single-process, multi-threaded environments. Multi-process access requires external coordination.

Thread Safety Matrix

| Environment | Reads | Writes | Index Updates | Storage Safety |
|-------------|-------|--------|---------------|----------------|
| Single Process, Single Thread | ✅ Safe | ✅ Safe | ✅ Safe | ✅ Safe |
| Single Process, Multi-Threaded | ✅ Safe (lock-free, zero-copy) | ✅ Safe (`RwLock<File>`) | ✅ Safe (`RwLock<KeyIndexer>`) | ✅ Safe (`Mutex<Arc<Mmap>>`) |
| Multiple Processes, Shared File | ⚠️ Unsafe (no cross-process coordination) | ❌ Unsafe (no external locking) | ❌ Unsafe (separate memory spaces) | ❌ Unsafe (race conditions) |

Why Multi-Process is Unsafe

Separate Memory Spaces:

  • Each process has its own Arc<Mmap> instance
  • Index changes in one process are not visible to others
  • The tail_offset atomic is process-local

No Cross-Process Synchronization:

  • RwLock and Mutex only synchronize within a single process
  • File writes from multiple processes can interleave arbitrarily
  • Memory mapping updates are not coordinated

Risk of Corruption:

  • Two processes can both acquire file write locks independently
  • Both may attempt to write at the same tail_offset
  • The storage file's append-only chain can be broken

External Locking Requirements

For multi-process access, applications must implement external file locking:

  • Advisory Locks: Use flock (Unix) or LockFile (Windows)
  • Mandatory Locks: File system-level exclusive access
  • Distributed Locks: For networked file systems

The core library intentionally does not provide cross-process locking to avoid platform-specific dependencies and maintain flexibility.
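
As one possible approach, an application could layer an advisory lock around the store using a crate such as fs2. This is not part of SIMD R Drive; the sidecar lock-file name and wrapper below are purely illustrative.

```rust
use std::fs::OpenOptions;
use std::path::Path;

use fs2::FileExt; // assumed advisory-locking dependency (flock / LockFileEx)

/// Run `f` while holding an exclusive advisory lock on a sidecar lock file.
/// Only cooperating processes that use the same lock file are excluded.
fn with_exclusive_access<T>(store_path: &Path, f: impl FnOnce() -> T) -> std::io::Result<T> {
    let lock_path = store_path.with_extension("lock");
    let lock_file = OpenOptions::new()
        .create(true)
        .write(true)
        .open(lock_path)?;

    lock_file.lock_exclusive()?; // blocks until no other process holds the lock
    let result = f();
    lock_file.unlock()?;
    Ok(result)
}
```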

Sources: README.md:185-206 src/storage_engine/data_store.rs:27-33

Concurrency Testing

The test suite validates concurrent access patterns through multi-threaded integration tests.

Test Coverage

Concurrent Write Test (concurrent_write_test):

  • Spawns 16 threads writing 10 entries each (lines 111-161)
  • Uses Arc<DataStore> to share storage instance
  • Verifies all 160 entries are correctly written and retrievable
  • Demonstrates write lock serialization

Interleaved Read-Write Test (interleaved_read_write_test):

  • Two threads alternating reads and writes on same key (lines 163-229)
  • Uses tokio::sync::Notify for coordination
  • Validates that writes are immediately visible to readers
  • Confirms index updates are atomic

Concurrent Streamed Write Test (concurrent_slow_streamed_write_test):

  • Multiple threads streaming 1MB payloads concurrently (lines 14-109)
  • Simulates slow readers with artificial delays
  • Verifies write lock prevents interleaved streaming
  • Validates payload integrity after concurrent writes
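
A reduced version of the pattern these tests exercise might look like the following. It is illustrative only: the write and read calls are simplified stand-ins for the DataStore API, and the real tests in tests/concurrency_tests.rs also cover streamed writes and tokio-coordinated interleaving.

```rust
use std::sync::Arc;
use std::thread;

/// Sketch of the concurrent-write test pattern: 16 threads share one
/// Arc<DataStore>, writes serialize on the internal file lock, and every
/// entry must be retrievable afterwards.
fn concurrent_write_sketch(store: Arc<DataStore>) {
    let handles: Vec<_> = (0..16)
        .map(|thread_id| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for i in 0..10 {
                    let key = format!("thread{thread_id}-entry{i}");
                    // Simplified stand-in for the crate's write API.
                    store.write(key.as_bytes(), b"payload").expect("write failed");
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("writer thread panicked");
    }

    // All 160 entries should now be visible to readers.
    assert!(store.read(b"thread0-entry0").is_some());
}
```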

Sources: tests/concurrency_tests.rs:14-229

Locking Order and Deadlock Prevention

To prevent deadlocks, SIMD R Drive follows a consistent lock acquisition order:

Lock Hierarchy

graph TD
    A["1. file: RwLock (Write)"] --> B["2. mmap: Mutex"]
    B --> C["3. key_indexer: RwLock (Write)"]
    C --> D["4. tail_offset: AtomicU64"]
    style A fill:#f9f9f9
    style B fill:#f9f9f9
    style C fill:#f9f9f9
    style D fill:#f9f9f9

Acquisition Order:

  1. File Write Lock: Acquired first in all write operations
  2. Mmap Mutex: Locked during reindex to update memory mapping
  3. Key Indexer Write Lock: Locked during reindex to update index
  4. Tail Offset Atomic: Updated last with Release ordering

Lock Release: Locks are released in reverse order automatically through RAII (Rust's drop semantics).

Deadlock Prevention: This fixed order ensures no circular wait conditions. Read operations only acquire the key indexer read lock, which cannot deadlock with other readers.

Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-943

Performance Characteristics

Concurrent Read Performance

  • Zero-Copy: No memory allocation or copying for payload access
  • Lock-Free After Index Lookup: Once Arc<Mmap> is cloned, no locks held
  • Scalable: Read throughput increases linearly with thread count
  • Cache-Friendly: 64-byte alignment optimizes cache line access

Write Serialization

  • Sequential Writes: All writes are serialized by the RwLock<File>
  • Minimal Lock Hold Time: Lock held only during actual I/O and index update
  • Batch Amortization: batch_write amortizes lock overhead across multiple entries
  • No Write Starvation: A fair RwLock implementation prevents writers from being starved by readers

Benchmark Results

From benches/storage_benchmark.rs, typical single-threaded benchmark performance is:

  • Write Throughput: ~1M entries (8 bytes each) written per batch operation
  • Sequential Read: Iterates all entries via zero-copy handles
  • Random Read: ~1M random lookups complete in under 1 second
  • Batch Read: Vectorized lookups provide additional speedup

For multi-threaded workloads, the concurrency tests demonstrate that read throughput scales with core count while write throughput is limited by the sequential append-only design.

Sources: benches/storage_benchmark.rs:1-234 tests/concurrency_tests.rs:1-230