Overview
Relevant source files
Purpose and Scope
This document provides a high-level introduction to SIMD R Drive, describing its core purpose, architectural components, access methods, and key features. It serves as the entry point for understanding the system before diving into detailed subsystem documentation.
For details on the core storage engine internals, see Core Storage Engine. For network-based remote access, see Network Layer and RPC. For Python integration, see Python Integration. For performance optimization details, see Performance Optimizations.
Sources: README.md:1-42
What is SIMD R Drive?
SIMD R Drive is a high-performance, append-only, schema-less storage engine designed for zero-copy binary data access. It stores arbitrary binary payloads in a single-file container without imposing serialization formats, schemas, or data interpretation. All data is treated as raw bytes (&[u8]), providing maximum flexibility for applications that require high-speed storage and retrieval of binary data.
Core Characteristics
| Characteristic | Description |
|---|---|
| Storage Model | Append-only, single-file container |
| Data Format | Schema-less binary (&[u8]) |
| Access Pattern | Zero-copy memory-mapped reads |
| Alignment | 64-byte boundaries (configurable via PAYLOAD_ALIGNMENT) |
| Concurrency | Thread-safe reads and writes within a single process |
| Indexing | Hardware-accelerated XXH3_64 hash-based key lookup |
| Integrity | CRC32C checksums and validation chain |
The storage engine is optimized for workloads that benefit from SIMD operations, cache-line efficiency, and direct memory access. By enforcing 64-byte payload alignment, it enables efficient typed slice reinterpretation (e.g., &[u32], &[u64]) without copying.
Sources: README.md:5-8 README.md:43-87 Cargo.toml:13
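As a minimal sketch of what this alignment enables, the snippet below reinterprets an aligned byte payload as a `&[u64]` slice. It is illustrative only; the function name is an assumption, and the crate's own helper (align_or_copy in simd-r-drive-extensions) is the intended way to do this safely.

```rust
// Illustrative only: reinterpret an aligned byte payload as a typed slice.
fn as_u64_slice(payload: &[u8]) -> Option<&[u64]> {
    // align_to splits off any misaligned prefix/suffix. With 64-byte-aligned
    // payloads whose length is a multiple of 8, both prefix and suffix are empty.
    let (prefix, typed, suffix) = unsafe { payload.align_to::<u64>() };
    if prefix.is_empty() && suffix.is_empty() {
        Some(typed)
    } else {
        None
    }
}
```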
Core Architecture Components
The system consists of three primary layers: the storage engine core, network access layer, and language bindings. The following diagram maps high-level architectural concepts to specific code entities.
graph TB
subgraph "User Interfaces"
CLI["CLI Binary\nsimd-r-drive crate\nmain.rs"]
PythonApp["Python Applications\nsimd_r_drive.DataStoreWsClient"]
RustApp["Native Rust Clients\nDataStoreWsClient struct"]
end
subgraph "Network Layer"
WSServer["WebSocket Server\nsimd-r-drive-ws-server\nAxum HTTP server"]
RPCDef["Service Definition\nsimd-r-drive-muxio-service-definition\nDataStoreService trait"]
end
subgraph "Core Storage Engine"
DataStore["DataStore struct\nsrc/data_store.rs"]
Traits["DataStoreReader trait\nDataStoreWriter trait"]
KeyIndexer["KeyIndexer struct\nsrc/key_indexer.rs"]
EntryHandle["EntryHandle struct\nsimd-r-drive-entry-handle crate"]
end
subgraph "Storage Infrastructure"
Mmap["Arc<Mmap>\nmemmap2 crate"]
FileHandle["BufWriter<File>\nstd::fs::File"]
AtomicOffset["AtomicU64\ntail_offset field"]
end
subgraph "Performance Layer"
SIMDCopy["simd_copy function\nsrc/simd_utils.rs"]
XXH3["xxhash-rust crate\nXXH3_64 algorithm"]
end
CLI --> DataStore
PythonApp --> WSServer
RustApp --> WSServer
WSServer --> RPCDef
RPCDef --> DataStore
DataStore --> Traits
DataStore --> KeyIndexer
DataStore --> EntryHandle
DataStore --> Mmap
DataStore --> FileHandle
DataStore --> AtomicOffset
DataStore --> SIMDCopy
KeyIndexer --> XXH3
EntryHandle --> Mmap
style DataStore fill:#f9f9f9,stroke:#333,stroke-width:3px
style Traits fill:#f9f9f9,stroke:#333,stroke-width:2px
System Architecture with Code Entities
Diagram: System architecture showing code entity mappings
Component Descriptions
| Component | Code Entity | Purpose |
|---|---|---|
| DataStore | DataStore struct in src/data_store.rs | Main storage interface implementing read/write operations |
| DataStoreReader | DataStoreReader trait | Defines zero-copy read operations (read, exists, batch_read) |
| DataStoreWriter | DataStoreWriter trait | Defines synchronized write operations (write, delete, batch_write) |
| KeyIndexer | KeyIndexer struct in src/key_indexer.rs | Hash-based index mapping u64 hashes to (tag, offset) tuples |
| EntryHandle | EntryHandle struct in simd-r-drive-entry-handle/src/lib.rs | Zero-copy reference to memory-mapped payload data |
| Memory Mapping | Arc<Mmap> wrapped in Mutex | Shared memory-mapped file reference for zero-copy reads |
| File Handle | Arc<RwLock<BufWriter<File>>> | Synchronized buffered writer for append operations |
| Tail Offset | AtomicU64 field tail_offset | Atomic counter tracking the current end-of-file position |
Sources: Cargo.toml:66-73 src/data_store.rs (inferred from architecture diagrams), High-level diagrams provided
graph LR
subgraph "Direct Access"
DirectApp["Rust Application"]
DirectDS["DataStore::open\nDataStoreReader\nDataStoreWriter"]
end
subgraph "CLI Access"
CLIApp["Command Line"]
CLIBin["simd-r-drive binary\nclap::Parser"]
end
subgraph "Remote Access"
PyClient["Python Client\nDataStoreWsClient class"]
RustClient["Rust Client\nDataStoreWsClient struct"]
WSServer["WebSocket Server\nmuxio-tokio-rpc-server\nAxum router"]
BackendDS["DataStore instance"]
end
DirectApp --> DirectDS
CLIApp --> CLIBin
CLIBin --> DirectDS
PyClient --> WSServer
RustClient --> WSServer
WSServer --> BackendDS
style DirectDS fill:#f9f9f9,stroke:#333,stroke-width:2px
style WSServer fill:#f9f9f9,stroke:#333,stroke-width:2px
style BackendDS fill:#f9f9f9,stroke:#333,stroke-width:2px
Access Methods
SIMD R Drive can be accessed through three primary interfaces, each optimized for different use cases.
Access Method Architecture
Diagram: Access methods with code entity mappings
Access Method Comparison
| Method | Use Case | Code Entry Point | Latency | Throughput |
|---|---|---|---|---|
| Direct Library | Embedded in Rust applications | DataStore::open() | Microseconds | Highest (zero-copy) |
| CLI | Command-line operations, scripting | simd-r-drive binary with clap | Milliseconds | Process-bound |
| WebSocket RPC | Remote access, language bindings | DataStoreWsClient (Rust/Python) | Network-dependent | RPC-serialization-bound |
Direct Library Access:
- Applications link against the simd-r-drive crate directly
- Call DataStore::open() to obtain a storage instance
- Use the DataStoreReader and DataStoreWriter traits for operations
- Provides the lowest latency and highest throughput
CLI Access:
- The simd-r-drive binary provides a command-line interface
- Built using clap for argument parsing
- Useful for scripting, testing, and manual operations
- Each invocation opens the storage, performs the operation, and closes
Remote Access (WebSocket RPC):
- simd-r-drive-ws-server provides network access via WebSocket
- Uses the Muxio RPC framework with bitcode serialization
- DataStoreWsClient is available for both Rust and Python clients
- Enables multi-language access and distributed architectures
For CLI details, see Repository Structure. For WebSocket server architecture, see WebSocket Server. For Python client usage, see Python WebSocket Client API.
Sources: Cargo.toml:23-26 README.md:9 README.md:262-266 High-level diagrams
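A hedged usage sketch of direct library access follows. Method names come from the tables above; the exact open() signature and the EntryHandle accessor (as_slice here) are assumptions rather than verified API.

```rust
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};
use std::path::Path;

fn main() -> std::io::Result<()> {
    // Open (or create) the single-file storage container.
    let store = DataStore::open(Path::new("example.bin"))?;

    // Append a raw binary payload under a raw byte key.
    store.write(b"greeting", b"hello, world")?;

    // Zero-copy read: the returned handle borrows the memory-mapped file.
    if let Some(entry) = store.read(b"greeting")? {
        let bytes: &[u8] = entry.as_slice(); // accessor name assumed
        assert_eq!(bytes, b"hello, world");
    }
    Ok(())
}
```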
Key Features
Zero-Copy Memory-Mapped Access
SIMD R Drive uses memmap2 to memory-map the storage file, allowing direct access to stored data without deserialization or copying. The EntryHandle struct provides a zero-copy view into the memory-mapped region, returning &[u8] slices that point directly into the mapped file.
This approach enables:
- Sub-microsecond reads for indexed lookups
- Minimal memory overhead for large entries
- Efficient processing of datasets larger than available RAM
The memory-mapped file is wrapped in Arc<Mutex<Arc<Mmap>>> to ensure thread-safe access during concurrent reads and remap operations.
Sources: README.md:43-49 simd-r-drive-entry-handle/ (inferred)
Fixed 64-Byte Payload Alignment
Every non-tombstone payload begins on a 64-byte boundary (defined by PAYLOAD_ALIGNMENT constant). This alignment matches typical CPU cache line sizes and enables:
- Cache-friendly access with reduced cache line splits
- Full-speed SIMD operations (AVX2, AVX-512, NEON) without misalignment penalties
- Zero-copy typed slices when the payload length is a multiple of the element size (e.g., &[u64])
Pre-padding bytes are inserted before payloads to maintain this alignment; for example, if the previous entry's metadata ends at offset 100, 28 zero bytes are inserted so the next payload begins at offset 128. Tombstones (deletion markers) do not require alignment.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs (inferred)
Single-File Storage Container
All data is stored in a single append-only file with the following characteristics:
| Aspect | Description |
|---|---|
| File Structure | Sequential entries: [pre-pad] [payload] [metadata] |
| Metadata Size | Fixed 20 bytes: key_hash (8) + prev_offset (8) + checksum (4) |
| Entry Chaining | Each metadata contains prev_offset pointing to previous entry's tail |
| Validation | CRC32C checksums and backward chain traversal |
| Recovery | Automatic truncation of incomplete writes on open |
The storage format is detailed in Entry Structure and Metadata.
Sources: README.md:62-147 README.md:104-150
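To make the 20-byte metadata record concrete, here is a hedged sketch of its shape. Field order follows the table above, but the struct name, method name, and little-endian byte order are assumptions, not the crate's actual code.

```rust
// Sketch only: models the fixed 20-byte metadata record described above.
struct MetadataSketch {
    key_hash: u64,    // XXH3_64 hash of the key
    prev_offset: u64, // absolute offset of the previous entry's tail
    checksum: u32,    // CRC32C of the payload
}

impl MetadataSketch {
    fn serialize(&self) -> [u8; 20] {
        let mut buf = [0u8; 20];
        // Byte order is an assumption for illustration.
        buf[0..8].copy_from_slice(&self.key_hash.to_le_bytes());
        buf[8..16].copy_from_slice(&self.prev_offset.to_le_bytes());
        buf[16..20].copy_from_slice(&self.checksum.to_le_bytes());
        buf
    }
}
```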
Thread-Safe Concurrency
SIMD R Drive supports concurrent operations within a single process using:
| Mechanism | Code Entity | Purpose |
|---|---|---|
| Read Lock | RwLock (reads) | Allows multiple concurrent readers |
| Write Lock | RwLock (writes) | Ensures exclusive write access |
| Atomic Offset | AtomicU64 (tail_offset) | Tracks file end without locking |
| Index Lock | RwLock<HashMap> | Protects key index updates |
| Mmap Lock | Mutex<Arc<Mmap>> | Prevents concurrent remapping |
Concurrency Guarantees:
- ✅ Multiple threads can read concurrently (zero-copy, lock-free)
- ✅ Write operations are serialized via RwLock
- ✅ Index updates are synchronized
- ❌ Multiple processes require external file locking
For detailed concurrency model, see Concurrency and Thread Safety.
Sources: README.md:170-206
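The sketch below shows the intended usage pattern: one store shared across reader threads. It assumes DataStore is Send + Sync, which the guarantees above imply but this example does not verify.

```rust
use simd_r_drive::{DataStore, DataStoreReader};
use std::sync::Arc;
use std::thread;

fn concurrent_reads(store: Arc<DataStore>) {
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // Readers take no write lock; they borrow the shared mmap.
                let key = format!("key-{i}");
                let _ = store.read(key.as_bytes());
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```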
Hardware-Accelerated Indexing
The KeyIndexer uses the xxhash-rust crate with XXH3_64 algorithm, which provides hardware acceleration:
- SSE2 on x86_64 (universally supported)
- AVX2 on capable x86_64 CPUs (runtime detection)
- NEON on aarch64 (default)
Key lookups are O(1) via HashMap, with benchmarks showing ~1 million random 8-byte lookups completing in under 1 second.
Sources: README.md:158-168 Cargo.toml:34
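For reference, the hash family in question is available directly from the xxhash-rust crate (with its xxh3 feature enabled). This standalone snippet only demonstrates the hash function, not the KeyIndexer itself.

```rust
use xxhash_rust::xxh3::xxh3_64;

fn main() {
    // Keys are hashed to 64-bit values before being stored in the index.
    let hash: u64 = xxh3_64(b"example-key");
    println!("key hash = {hash:#018x}");
}
```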
SIMD Write Acceleration
The simd_copy function (in src/simd_utils.rs) accelerates memory copying during write operations:
- x86_64 with AVX2 : 32-byte SIMD chunks using _mm256_loadu_si256 / _mm256_storeu_si256
- aarch64 : 16-byte NEON chunks using vld1q_u8 / vst1q_u8
- Fallback : Standard copy_from_slice when SIMD is unavailable
This optimization reduces CPU cycles during buffer staging before disk writes.
For SIMD implementation details, see SIMD Acceleration.
Sources: README.md:249-257 src/simd_utils.rs (inferred)
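The following is an illustrative runtime-dispatch sketch in the spirit of simd_copy, not the crate's actual implementation: it uses the AVX2 intrinsics named above on x86_64 when available and falls back to copy_from_slice otherwise (the NEON path is omitted).

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn copy_avx2(dst: &mut [u8], src: &[u8]) {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};
    debug_assert_eq!(dst.len(), src.len());
    let mut i = 0;
    // Copy 32-byte chunks with unaligned AVX2 loads and stores.
    while i + 32 <= src.len() {
        unsafe {
            let chunk = _mm256_loadu_si256(src.as_ptr().add(i) as *const __m256i);
            _mm256_storeu_si256(dst.as_mut_ptr().add(i) as *mut __m256i, chunk);
        }
        i += 32;
    }
    // Copy any remaining tail bytes with the scalar path.
    dst[i..].copy_from_slice(&src[i..]);
}

fn simd_copy(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            unsafe { copy_avx2(dst, src) };
            return;
        }
    }
    // Portable fallback when no SIMD path applies.
    dst.copy_from_slice(src);
}
```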
Write and Read Modes
Write Modes
| Mode | Method | Use Case | Flush Behavior |
|---|---|---|---|
| Single Entry | write(key, payload) | Individual writes | Immediate flush |
| Batch | batch_write(&[(key, payload)]) | Multiple entries | Single flush at end |
| Streaming | write_large_entry(key, Read) | Large payloads | Streaming with immediate flush |
Batch writes reduce disk I/O overhead by grouping multiple entries under a single write lock and flushing once.
Sources: README.md:208-223
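A hedged usage sketch of batch writes, following the signatures listed in the table above; parameter and return types are assumptions.

```rust
use simd_r_drive::{DataStore, DataStoreWriter};

fn write_two_entries(store: &DataStore) -> std::io::Result<()> {
    let entries: [(&[u8], &[u8]); 2] = [
        (b"alpha", b"payload-a"),
        (b"beta", b"payload-b"),
    ];
    // One write-lock acquisition and one flush for the whole batch.
    store.batch_write(&entries)?;
    Ok(())
}
```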
Read Modes
| Mode | Method | Memory Behavior | Use Case |
|---|---|---|---|
| Direct | read(key) -> EntryHandle | Zero-copy mmap reference | Standard reads |
| Streaming | read_stream(key) -> impl Read | Buffered, non-zero-copy | Large entries |
| Parallel Iteration | par_iter_entries() (Rayon) | Parallel processing | Bulk analytics |
Direct reads return EntryHandle with zero-copy &[u8] access. Streaming reads process data incrementally through a buffer. Parallel iteration is available via the optional parallel feature.
For iteration details, see Parallel Iteration (via Rayon).
Sources: README.md:225-247
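A hedged sketch of the streaming read mode from the table above. The method name and return type follow the table (read_stream(key) -> impl Read); error handling details are assumptions.

```rust
use simd_r_drive::{DataStore, DataStoreReader};
use std::io::Read;

fn read_large_entry(store: &DataStore, key: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut payload = Vec::new();
    // Streams the payload through an internal buffer instead of borrowing the mmap.
    let mut reader = store.read_stream(key);
    reader.read_to_end(&mut payload)?;
    Ok(payload)
}
```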
Repository Structure
The project is organized as a Cargo workspace with the following crates:
Diagram: Workspace structure with crate relationships
Crate Descriptions
| Crate | Path | Purpose |
|---|---|---|
| simd-r-drive | ./ | Core storage engine with DataStore, KeyIndexer, SIMD utilities |
| simd-r-drive-entry-handle | ./simd-r-drive-entry-handle/ | Zero-copy EntryHandle and metadata structures |
| simd-r-drive-extensions | ./extensions/ | Utility functions and helper modules |
| simd-r-drive-muxio-service-definition | ./experiments/simd-r-drive-muxio-service-definition/ | RPC service trait definitions using bitcode |
| simd-r-drive-ws-server | ./experiments/simd-r-drive-ws-server/ | Axum-based WebSocket RPC server |
| simd-r-drive-ws-client | ./experiments/simd-r-drive-ws-client/ | Native Rust WebSocket client |
| simd-r-drive-py | ./experiments/bindings/python/ | PyO3-based Python bindings for direct access |
| simd-r-drive-ws-client-py | ./experiments/bindings/python-ws-client/ | Python WebSocket client wrapper |
The workspace is defined in Cargo.toml:65-78 with version 0.15.5-alpha specified in Cargo.toml:3
For detailed repository structure, see Repository Structure.
Sources: Cargo.toml:65-78 Cargo.toml:1-10 README.md:259-266
Performance Characteristics
SIMD R Drive is designed for high-performance workloads with the following characteristics:
Benchmark Context
| Metric | Typical Performance |
|---|---|
| Random Read (8-byte) | ~1M lookups in < 1 second |
| Sequential Write | Limited by disk I/O and flush frequency |
| Memory Overhead | Minimal (mmap-based, on-demand paging) |
| Index Lookup | O(1) via HashMap with XXH3_64 |
Optimization Strategies
- SIMD Copy Operations : The simd_copy function uses AVX2/NEON for bulk memory transfers during writes
- Hardware-Accelerated Hashing : XXH3_64 with SSE2/AVX2/NEON for fast key hashing
- Zero-Copy Reads : Memory-mapped access eliminates deserialization overhead
- Cache-Line Alignment : 64-byte boundaries reduce cache misses
- Batch Operations : Grouping writes reduces lock contention and flush overhead
For detailed performance optimization documentation, see Performance Optimizations.
Sources: README.md:158-168 README.md:249-257
Next Steps
This overview provides a foundation for understanding SIMD R Drive. For deeper exploration:
- Storage Internals : See Core Storage Engine for DataStore implementation details
- Data Format : See Entry Structure and Metadata for on-disk layout
- Network Access : See Network Layer and RPC for remote access architecture
- Python Usage : See Python Integration for PyO3 bindings and client APIs
- Performance Tuning : See Performance Optimizations for SIMD and alignment strategies
Sources: All sections above
Core Storage Engine
Relevant source files
- README.md
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
The core storage engine is the foundational layer of SIMD R Drive, providing append-only, memory-mapped binary storage with zero-copy access. This document covers the high-level architecture and key components that implement the storage functionality.
For network-based access to the storage engine, see Network Layer and RPC. For Python integration, see Python Integration. For performance optimizations including SIMD acceleration, see Performance Optimizations.
Purpose and Scope
This page introduces the core storage engine's architecture, primary components, and design principles. Detailed information about specific subsystems is covered in child pages such as Storage Architecture and DataStore API.
Architecture Overview
The storage engine implements an append-only binary store with in-memory indexing and memory-mapped file access. The architecture consists of four primary structures that work together to provide concurrent, high-performance storage operations.
Sources: src/storage_engine/data_store.rs:26-33 src/storage_engine.rs:1-25
graph TB
subgraph "Public Traits"
Reader["DataStoreReader\nread, exists, batch_read"]
Writer["DataStoreWriter\nwrite, delete, batch_write"]
end
subgraph "DataStore Structure"
DS["DataStore"]
FileHandle["Arc<RwLock<BufWriter<File>>>\nfile\nSequential write operations"]
MmapHandle["Arc<Mutex<Arc<Mmap>>>\nmmap\nMemory-mapped read access"]
TailOffset["AtomicU64\ntail_offset\nCurrent end position"]
KeyIndexer["Arc<RwLock<KeyIndexer>>\nkey_indexer\nHash to offset mapping"]
PathBuf["PathBuf\npath\nFile system location"]
end
subgraph "Support Structures"
EH["EntryHandle\nZero-copy data view"]
EM["EntryMetadata\nkey_hash, prev_offset, checksum"]
EI["EntryIterator\nSequential traversal"]
ES["EntryStream\nStreaming reads"]
end
Reader -.implements.-> DS
Writer -.implements.-> DS
DS --> FileHandle
DS --> MmapHandle
DS --> TailOffset
DS --> KeyIndexer
DS --> PathBuf
DS --> EH
DS --> EI
EH --> EM
EH --> ES
Core Components
DataStore
The DataStore struct is the primary interface to the storage engine. It maintains four essential shared resources protected by different synchronization primitives:
| Field | Type | Purpose | Synchronization |
|---|---|---|---|
| file | Arc<RwLock<BufWriter<File>>> | Write operations to disk | RwLock for exclusive writes |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped view for reads | Mutex for atomic remapping |
| tail_offset | AtomicU64 | Current end-of-file position | Atomic operations |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash-based key lookup | RwLock for concurrent reads |
| path | PathBuf | Storage file location | Immutable after creation |
Sources: src/storage_engine/data_store.rs:26-33
Traits
The storage engine exposes two primary traits that define its public API:
- DataStoreReader: Provides zero-copy read operations including read(), batch_read(), exists(), and iteration methods. Implemented at src/storage_engine/data_store.rs:1027-1182
- DataStoreWriter: Provides write operations including write(), batch_write(), delete(), and streaming writes. Implemented at src/storage_engine/data_store.rs:752-1025
Sources: src/lib.rs:21 src/storage_engine.rs:21
EntryHandle
EntryHandle provides a zero-copy view into stored data by maintaining a reference to the memory-mapped file and a byte range. It includes the associated EntryMetadata for accessing checksums and hash values without additional lookups.
Sources: src/storage_engine.rs:24 README.md:43-49
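Conceptually, an entry handle is a byte range plus a shared reference that keeps the mapping alive. The sketch below models that idea with assumed names; it is not the crate's EntryHandle.

```rust
use memmap2::Mmap;
use std::ops::Range;
use std::sync::Arc;

// Conceptual model only; field and method names are assumptions.
struct EntryView {
    mmap: Arc<Mmap>,     // keeps the mapping alive while the view exists
    range: Range<usize>, // byte range of the payload within the mapping
}

impl EntryView {
    fn as_bytes(&self) -> &[u8] {
        // No copy: this borrows directly from the memory-mapped file.
        &self.mmap[self.range.clone()]
    }
}
```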
KeyIndexer
KeyIndexer maintains an in-memory hash map from 64-bit XXH3 key hashes to packed values containing a 16-bit collision detection tag and a 48-bit file offset. This structure enables O(1) key lookups while detecting hash collisions.
Sources: src/storage_engine/data_store.rs:31 README.md:158-169
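A sketch of the tag-plus-offset packing described above; the 16/48-bit split follows the text, but the exact bit layout used by KeyIndexer is an assumption.

```rust
const OFFSET_BITS: u32 = 48;
const OFFSET_MASK: u64 = (1u64 << OFFSET_BITS) - 1;

fn pack(tag: u16, offset: u64) -> u64 {
    debug_assert!(offset <= OFFSET_MASK, "offset must fit in 48 bits");
    ((tag as u64) << OFFSET_BITS) | offset
}

fn unpack(packed: u64) -> (u16, u64) {
    ((packed >> OFFSET_BITS) as u16, packed & OFFSET_MASK)
}

fn main() {
    let packed = pack(0xBEEF, 4096);
    assert_eq!(unpack(packed), (0xBEEF, 4096));
}
```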
Component Interaction Flow
The following diagram illustrates how the core components interact during typical read and write operations, showing the actual method calls and data flow between structures.
Sources: src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:224-259
sequenceDiagram
participant Client
participant DS as "DataStore"
participant FileHandle as "file: RwLock<BufWriter<File>>"
participant TailOffset as "tail_offset: AtomicU64"
participant MmapHandle as "mmap: Mutex<Arc<Mmap>>"
participant KeyIndexer as "key_indexer: RwLock<KeyIndexer>"
Note over Client,KeyIndexer: Write Operation
Client->>DS: write(key, payload)
DS->>DS: compute_hash(key)
DS->>TailOffset: load(Ordering::Acquire)
DS->>FileHandle: write().unwrap()
DS->>FileHandle: write_all(pre_pad + payload + metadata)
DS->>FileHandle: flush()
DS->>DS: reindex()
DS->>FileHandle: init_mmap()
DS->>MmapHandle: lock().unwrap()
DS->>MmapHandle: *guard = Arc::new(new_mmap)
DS->>KeyIndexer: write().unwrap()
DS->>KeyIndexer: insert(key_hash, offset)
DS->>TailOffset: store(new_offset, Ordering::Release)
DS-->>Client: Ok(offset)
Note over Client,KeyIndexer: Read Operation
Client->>DS: read(key)
DS->>DS: compute_hash(key)
DS->>KeyIndexer: read().unwrap()
DS->>KeyIndexer: get_packed(key_hash)
KeyIndexer-->>DS: Some(packed_value)
DS->>DS: KeyIndexer::unpack(packed)
DS->>MmapHandle: lock().unwrap()
DS->>MmapHandle: clone Arc
DS->>DS: read_entry_with_context()
DS->>DS: EntryMetadata::deserialize()
DS->>DS: create EntryHandle
DS-->>Client: Ok(Some(EntryHandle))
Storage Initialization and Recovery
The DataStore::open() method performs a multi-stage initialization process that ensures data integrity even after crashes or incomplete writes.
flowchart TD
Start["DataStore::open(path)"]
OpenFile["open_file_in_append_mode()\nOpenOptions::new()\n.read(true).write(true).create(true)"]
InitMmap["init_mmap()\nmemmap2::MmapOptions::new().map()"]
Recover["recover_valid_chain(mmap, file_len)\nBackward scan validating prev_offset chain"]
CheckTrunc{"final_len < file_len?"}
Truncate["Truncate file to final_len\nfile.set_len(final_len)\nReopen storage"]
BuildIndex["KeyIndexer::build(mmap, final_len)\nForward scan to populate index"]
CreateDS["Create DataStore instance\nwith Arc-wrapped fields"]
Start --> OpenFile
OpenFile --> InitMmap
InitMmap --> Recover
Recover --> CheckTrunc
CheckTrunc -->|Yes Corruption detected| Truncate
CheckTrunc -->|No File valid| BuildIndex
Truncate --> Start
BuildIndex --> CreateDS
The recovery process validates the storage file by:
- Scanning backward from the end of the file following prev_offset links in EntryMetadata
- Verifying chain integrity, ensuring each offset points to a valid previous tail
- Truncating corruption if an invalid chain is detected
- Building the index by forward scanning the validated region
Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:383-482
flowchart TD
WriteCall["write(key, payload)"]
ComputeHash["compute_hash(key)\nXXH3_64 hash"]
AcquireWrite["file.write().unwrap()\nAcquire write lock"]
LoadTail["tail_offset.load(Ordering::Acquire)"]
CalcPad["prepad_len(prev_tail)\n(A - offset % A) & (A-1)"]
WritePad["write_all(&pad[..prepad])\nZero bytes for alignment"]
WritePayload["write_all(payload)\nActual data"]
WriteMetadata["write_all(metadata.serialize())\nkey_hash + prev_offset + checksum"]
Flush["flush()\nEnsure disk persistence"]
Reindex["reindex()\nUpdate mmap and index"]
UpdateTail["tail_offset.store(new_offset)\nOrdering::Release"]
WriteCall --> ComputeHash
ComputeHash --> AcquireWrite
AcquireWrite --> LoadTail
LoadTail --> CalcPad
CalcPad --> WritePad
WritePad --> WritePayload
WritePayload --> WriteMetadata
WriteMetadata --> Flush
Flush --> Reindex
Reindex --> UpdateTail
Write Path Architecture
Write operations follow a specific sequence to maintain consistency and enable zero-copy reads. All payloads are aligned to 64-byte boundaries using pre-padding.
The reindex() method performs two critical operations after each write:
- Memory map update : Creates a new Mmap view of the file and atomically swaps it
- Index update : Inserts new key-hash to offset mappings into KeyIndexer
Sources: src/storage_engine/data_store.rs:827-834 src/storage_engine/data_store.rs:758-825 src/storage_engine/data_store.rs:224-259
Read Path Architecture
Read operations use the hash index for O(1) lookup and return EntryHandle instances that provide zero-copy access to the memory-mapped data.
flowchart TD
ReadCall["read(key)"]
ComputeHash["compute_hash(key)\nXXH3_64 hash"]
AcquireIndex["key_indexer.read().unwrap()\nAcquire read lock"]
GetMmap["get_mmap_arc()\nClone Arc<Mmap>"]
ReadContext["read_entry_with_context()"]
Lookup["key_indexer.get_packed(key_hash)"]
CheckFound{"Found?"}
Unpack["KeyIndexer::unpack(packed)\nExtract tag + offset"]
VerifyTag{"Tag matches?"}
ReadMetadata["EntryMetadata::deserialize()\nFrom mmap[offset..offset+20]"]
CalcRange["Derive entry range\nentry_start..entry_end"]
CheckTombstone{"Is tombstone?"}
CreateHandle["Create EntryHandle\nmmap_arc + range + metadata"]
ReturnNone["Return None"]
ReadCall --> ComputeHash
ComputeHash --> AcquireIndex
AcquireIndex --> GetMmap
GetMmap --> ReadContext
ReadContext --> Lookup
Lookup --> CheckFound
CheckFound -->|No| ReturnNone
CheckFound -->|Yes| Unpack
Unpack --> VerifyTag
VerifyTag -->|No Collision| ReturnNone
VerifyTag -->|Yes| ReadMetadata
ReadMetadata --> CalcRange
CalcRange --> CheckTombstone
CheckTombstone -->|Yes| ReturnNone
CheckTombstone -->|No| CreateHandle
The tag verification step at src/storage_engine/data_store.rs:513-521 prevents hash collision issues by comparing a 16-bit tag derived from the original key with the stored tag.
Sources: src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:501-565
Batch Operations
Batch operations reduce lock acquisition overhead by processing multiple entries in a single synchronized operation.
flowchart TD
BatchWrite["batch_write(entries)"]
HashAll["compute_hash_batch(keys)\nParallel XXH3_64"]
AcquireWrite["file.write().unwrap()"]
InitBuffer["buffer = Vec::new()"]
LoopStart{"For each entry"}
CalcPad["prepad_len(tail_offset)"]
BuildEntry["Build entry:\npre_pad + payload + metadata"]
AppendBuffer["buffer.extend_from_slice(entry)"]
UpdateTail["tail_offset += entry.len()"]
LoopEnd["Next entry"]
WriteAll["file.write_all(&buffer)"]
Flush["file.flush()"]
Reindex["reindex(key_hash_offsets)"]
BatchWrite --> HashAll
HashAll --> AcquireWrite
AcquireWrite --> InitBuffer
InitBuffer --> LoopStart
LoopStart -->|More entries| CalcPad
CalcPad --> BuildEntry
BuildEntry --> AppendBuffer
AppendBuffer --> UpdateTail
UpdateTail --> LoopEnd
LoopEnd --> LoopStart
LoopStart -->|Done| WriteAll
WriteAll --> Flush
Flush --> Reindex
Batch Write
The batch_write() method accumulates all entries in an in-memory buffer before acquiring the write lock once:
Sources: src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939
Batch Read
The batch_read() method acquires the index lock once and performs all lookups before releasing it:
Sources: src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158
Iteration Support
The storage engine provides multiple iteration mechanisms for different use cases:
| Iterator | Type | Synchronization | Use Case |
|---|---|---|---|
| EntryIterator | Sequential | Single lock acquisition | Forward/backward iteration |
| par_iter_entries() | Parallel (Rayon) | Lock-free after setup | High-throughput scanning |
| IntoIterator | Consuming | Single lock acquisition | Owned iteration |
The EntryIterator scans backward through the file using the prev_offset chain, tracking seen keys in a HashSet to return only the latest version of each key.
Sources: src/storage_engine/entry_iterator.rs:8-128 src/storage_engine/data_store.rs:276-280 src/storage_engine/data_store.rs:296-361
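The deduplication logic can be illustrated with a simplified model: walk entries newest-to-oldest and keep only the first occurrence of each key hash. This is a conceptual sketch, not the EntryIterator's code.

```rust
use std::collections::HashSet;

// Entries are modeled as (key_hash, payload) pairs ordered oldest-to-newest.
fn latest_versions(entries: &[(u64, Vec<u8>)]) -> Vec<(u64, Vec<u8>)> {
    let mut seen = HashSet::new();
    let mut latest = Vec::new();
    // Newest-to-oldest: the first time a key hash is seen is its latest version.
    for (key_hash, payload) in entries.iter().rev() {
        if seen.insert(*key_hash) {
            latest.push((*key_hash, payload.clone()));
        }
    }
    latest
}
```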
Key Design Principles
Append-Only Model
All write operations append data to the end of the file. Updates and deletes create new entries rather than modifying existing data. This design ensures:
- No seeking during writes : Sequential disk I/O for maximum throughput
- Crash safety : Incomplete writes are detected and truncated during recovery
- Zero-copy reads : Memory-mapped regions remain valid as data is never modified
Sources: README.md:98-103
Fixed Alignment
Every non-tombstone payload starts on a 64-byte boundary (configurable via PAYLOAD_ALIGNMENT). This alignment enables:
- Cache-line efficiency : Payloads align with CPU cache lines
- SIMD operations : Full-speed vectorized reads without crossing boundaries
- Zero-copy typed views : Safe reinterpretation as typed slices
Sources: README.md:51-59 src/storage_engine/data_store.rs:670-673
Metadata-After-Payload
Each entry stores its metadata immediately after the payload, forming a backward-linked chain. This design:
- Minimizes seeking : Metadata and payload are adjacent on disk
- Enables recovery : The chain can be validated backward from any point
- Supports nesting : Storage containers can be stored within other containers
Sources: README.md:139-148
Lock-Free Reads
Read operations do not acquire write locks and perform zero-copy access through memory-mapped views. The AtomicU64 tail offset ensures readers see a consistent view without blocking writes.
Sources: README.md:170-183 src/storage_engine/data_store.rs:1040-1049
Repository Structure
Relevant source files
Purpose and Scope
This document provides a comprehensive overview of the SIMD R Drive repository structure, including its Cargo workspace organization, crate hierarchy, and inter-crate dependencies. It covers the core storage engine crates, experimental network components, and build configuration. For details about the storage architecture and API, see Core Storage Engine. For network communication details, see Network Layer and RPC. For Python integration specifics, see Python Integration.
Workspace Organization
The SIMD R Drive repository is organized as a Cargo workspace containing six published crates and two excluded Python binding crates. The workspace uses Cargo resolver version 2 and maintains unified versioning (0.15.5-alpha) across all workspace members.
Workspace Configuration:
| Configuration | Value |
|---|---|
| Resolver | Version 2 |
| Unified Version | 0.15.5-alpha |
| Rust Edition | 2024 |
| License | Apache-2.0 |
| Repository | https://github.com/jzombie/rust-simd-r-drive |
The workspace is defined in Cargo.toml:65-78 with members spanning core functionality, utilities, and experimental features.
Workspace Structure Diagram:
graph TB
subgraph "Core Crates"
Core["simd-r-drive\n(Root Package)\nsrc/"]
EntryHandle["simd-r-drive-entry-handle\nsimd-r-drive-entry-handle/"]
Extensions["simd-r-drive-extensions\nextensions/"]
end
subgraph "Experimental Network Crates"
MuxioDef["simd-r-drive-muxio-service-definition\nexperiments/simd-r-drive-muxio-service-definition/"]
WSClient["simd-r-drive-ws-client\nexperiments/simd-r-drive-ws-client/"]
WSServer["simd-r-drive-ws-server\nexperiments/simd-r-drive-ws-server/"]
end
subgraph "Excluded Python Bindings"
PyBinding["Python Direct Binding\nexperiments/bindings/python/\n(Excluded from workspace)"]
PyWSClient["Python WebSocket Client\nexperiments/bindings/python-ws-client/\n(Excluded from workspace)"]
end
Core --> EntryHandle
Extensions --> Core
WSServer --> Core
WSServer --> MuxioDef
WSClient --> Core
WSClient --> MuxioDef
PyBinding -.-> Core
PyWSClient -.-> WSClient
style Core fill:#f9f9f9,stroke:#333,stroke-width:3px
style EntryHandle fill:#f9f9f9,stroke:#333,stroke-width:2px
style Extensions fill:#f9f9f9,stroke:#333,stroke-width:2px
Sources: Cargo.toml:65-78
Core Crates
simd-r-drive (Root Package)
The main storage engine crate providing append-only, schema-less binary storage with SIMD optimizations. Located at the repository root with source code in src/.
Key Components:
- DataStore - Main storage interface implementing the DataStoreReader and DataStoreWriter traits
- KeyIndexer - Hash-based index for O(1) key lookups using XXH3
- Storage format implementation with 64-byte payload alignment
- SIMD-optimized memory operations (simd_copy)
- Recovery and validation logic
Features:
- default - Standard functionality
- expose-internal-api - Exposes internal APIs for testing
- parallel - Enables Rayon-based parallel iteration
- arrow - Proxy feature enabling Apache Arrow support in entry-handle
CLI Binary: The crate includes a CLI application for direct storage operations without network overhead.
Sources: Cargo.toml:11-56 Cargo.lock:1795-1820
simd-r-drive-entry-handle
Zero-copy data access abstraction located in simd-r-drive-entry-handle/. Provides safe wrappers around memory-mapped file regions without copying payload data.
Key Components:
- EntryHandle - Zero-copy view into memory-mapped payload data
- EntryMetadata - Fixed 20-byte metadata structure
- Arrow integration (optional) for columnar data access
- CRC32C checksum validation
Dependencies:
- memmap2 - Memory-mapped file support
- crc32fast - Checksum validation
- arrow (optional) - Apache Arrow integration
The crate is dependency-minimal to enable use as a standalone zero-copy accessor.
Sources: Cargo.toml:83 Cargo.lock:1823-1829
simd-r-drive-extensions
Utility functions and helper modules located in extensions/. Provides commonly-needed functionality for working with the storage engine.
Key Components:
- align_or_copy - Utility for handling alignment requirements
- Serialization helpers (bincode integration)
- Testing utilities
- File system utilities (walkdir integration)
Dependencies:
- simd-r-drive - Core storage engine
- bincode - Serialization support
- serde - Serialization framework
- tempfile - Temporary file handling
- walkdir - Directory traversal
Sources: Cargo.toml:69 Cargo.lock:1832-1841
Experimental Network Crates
These crates enable remote access to the storage engine via WebSocket-based RPC. Located in experiments/.
simd-r-drive-muxio-service-definition
RPC service contract definition using the Muxio framework. Located in experiments/simd-r-drive-muxio-service-definition/.
Purpose:
- Defines RPC service interface shared between client and server
- Uses bitcode for efficient binary serialization
- Provides type-safe method definitions
Dependencies:
- bitcode - Binary serialization
- muxio-rpc-service - RPC service framework
Sources: Cargo.toml:72 Cargo.lock:1844-1849
simd-r-drive-ws-client
Native Rust WebSocket client for remote storage access. Located in experiments/simd-r-drive-ws-client/.
Key Components:
- BaseDataStoreWsClient - WebSocket client implementation
- RPC method callers
- Connection management
- Async/await integration with Tokio
Dependencies:
- simd-r-drive - Storage engine traits
- simd-r-drive-muxio-service-definition - Shared RPC contract
- muxio-tokio-rpc-client - WebSocket RPC client runtime
- muxio-rpc-service-caller - RPC method invocation
- tokio - Async runtime
Sources: Cargo.toml:71 Cargo.lock:1852-1863
simd-r-drive-ws-server
WebSocket server exposing the storage engine over RPC. Located in experiments/simd-r-drive-ws-server/.
Key Components:
- Axum-based HTTP/WebSocket server
- RPC endpoint routing
- DataStore backend integration
- Configuration via CLI arguments
Dependencies:
- simd-r-drive - Storage engine backend
- simd-r-drive-muxio-service-definition - Shared RPC contract
- muxio-tokio-rpc-server - WebSocket RPC server runtime
- clap - CLI argument parsing
- tokio - Async runtime
Sources: Cargo.toml:70 Cargo.lock:1866-1878
Python Bindings (Excluded)
Two Python binding crates exist but are excluded from the main workspace due to their PyO3 and Maturin build requirements.
experiments/bindings/python
Direct Python bindings to the core storage engine using PyO3. Excluded at Cargo.toml:75
Purpose:
- FFI bridge between Python and Rust
- Direct local access to DataStore from Python
- PyO3-based class and method exports
experiments/bindings/python-ws-client
Python WebSocket client wrapping the Rust client. Excluded at Cargo.toml:76
Purpose:
- Remote storage access from Python
- PyO3 wrapper around simd-r-drive-ws-client
- Async/await bridge between Python and Tokio
For detailed documentation on these bindings, see Python Integration.
Sources: Cargo.toml:74-77
Dependency Graph
Inter-Crate Dependencies:
Sources: Cargo.lock:1795-1878 Cargo.toml:80-90
External Dependencies
Workspace-Level Shared Dependencies
The workspace defines common dependencies to ensure version consistency across all crates:
Storage & Performance:
| Dependency | Version | Purpose |
|---|---|---|
| memmap2 | 0.9.5 | Memory-mapped file I/O |
| xxhash-rust | 0.8.15 (xxh3) | Fast hashing with hardware acceleration |
| crc32fast | 1.4.2 | CRC32C checksums |
| arrow | 57.0.0 | Apache Arrow integration (optional) |
Async & Network:
| Dependency | Version | Purpose |
|---|---|---|
| tokio | 1.45.1 | Async runtime (experiments only) |
| muxio-rpc-service | 0.9.0-alpha | RPC framework |
| muxio-tokio-rpc-client | 0.9.0-alpha | RPC client runtime |
| muxio-tokio-rpc-server | 0.9.0-alpha | RPC server runtime |
Serialization:
| Dependency | Version | Purpose |
|---|---|---|
| bitcode | 0.6.6 | Binary serialization for RPC |
| bincode | 1.3.3 | Legacy serialization (extensions) |
| serde | 1.0.219 | Serialization framework |
CLI & Utilities:
| Dependency | Version | Purpose |
|---|---|---|
| clap | 4.5.40 | CLI argument parsing |
| tracing | 0.1.41 | Structured logging |
| tracing-subscriber | 0.3.20 | Log output configuration |
Sources: Cargo.toml:80-112
Build Configuration
Workspace Settings
All crates share common metadata (version, edition, license, repository) defined at the workspace level.
Sources: Cargo.toml:1-9
Benchmark Configuration
The root crate includes two benchmark harnesses:
storage_benchmark:
- Harness: Criterion
- Purpose: Measure write/read throughput and latency
- Path: benches/storage_benchmark.rs
contention_benchmark:
- Harness: Criterion
- Purpose: Measure concurrent access performance
- Path: benches/contention_benchmark.rs
Sources: Cargo.toml:57-63
Directory Layout
rust-simd-r-drive/
├── Cargo.toml # Workspace root
├── Cargo.lock # Dependency lock file
├── src/ # simd-r-drive core source
│ ├── lib.rs
│ ├── data_store.rs
│ ├── key_indexer.rs
│ └── ...
├── simd-r-drive-entry-handle/ # Zero-copy handle crate
│ ├── Cargo.toml
│ └── src/
├── extensions/ # Utilities crate
│ ├── Cargo.toml
│ └── src/
└── experiments/
├── simd-r-drive-ws-server/ # WebSocket server
│ ├── Cargo.toml
│ └── src/
├── simd-r-drive-ws-client/ # Rust WebSocket client
│ ├── Cargo.toml
│ └── src/
├── simd-r-drive-muxio-service-definition/ # RPC contract
│ ├── Cargo.toml
│ └── src/
└── bindings/
├── python/ # PyO3 direct binding (excluded)
│ ├── Cargo.toml
│ ├── pyproject.toml
│ └── src/
└── python-ws-client/ # PyO3 WS client (excluded)
├── Cargo.toml
├── pyproject.toml
└── src/
Sources: Cargo.toml:65-78
Publishing Strategy
All workspace crates (except Python bindings) are configured for publishing to crates.io with publish = true. The Python bindings use Maturin for building Python wheels and are distributed separately via PyPI.
Crates.io Packages:
- simd-r-drive - Core storage engine
- simd-r-drive-entry-handle - Zero-copy handle
- simd-r-drive-extensions - Utilities
- simd-r-drive-ws-server - WebSocket server
- simd-r-drive-ws-client - Rust WebSocket client
- simd-r-drive-muxio-service-definition - RPC contract
PyPI Packages:
- simd-r-drive (Python binding via Maturin)
- simd-r-drive-ws-client-py (Python WebSocket client via Maturin)
For Python package build instructions, see Building Python Bindings.
Sources: Cargo.toml:1-9 Cargo.toml:21
Storage Architecture
Relevant source files
- README.md
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
Purpose and Scope
This document describes the core storage architecture of SIMD R Drive, covering the append-only design principles, single-file storage model, on-disk layout, validation chain structure, and recovery mechanisms. For details about the programmatic interface, see DataStore API. For specifics on entry structure and metadata format, see Entry Structure and Metadata. For memory mapping and zero-copy access patterns, see Memory Management and Zero-Copy Access.
Append-Only Design
SIMD R Drive implements a strict append-only storage model where data is written sequentially to the end of a single file. Entries are never modified or deleted in place; instead, updates and deletions are represented by appending new entries.
Key Characteristics
| Characteristic | Implementation | Benefit |
|---|---|---|
| Sequential Writes | All writes append to file end | Minimizes disk seeks, maximizes throughput |
| Immutable Entries | Existing entries never modified | Enables concurrent reads without locks |
| Version Chain | Multiple versions stored for same key | Supports time-travel and recovery |
| Latest-Wins Semantics | Index points to most recent version | Ensures consistency in queries |
The append-only model is enforced through the tail_offset field src/storage_engine/data_store.rs:30, which tracks the current end of valid data. Write operations acquire an exclusive lock, append data, and atomically update the tail offset src/storage_engine/data_store.rs:816-824.
Write Flow
Sources: src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:827-843
Single-File Storage Container
All data resides in a single binary file managed by the DataStore struct src/storage_engine/data_store.rs:27-33. This design simplifies deployment, backup, and replication while enabling efficient memory-mapped access.
graph TB
subgraph "DataStore Structure"
FileHandle["file:\nArc<RwLock<BufWriter<File>>>"]
MmapHandle["mmap:\nArc<Mutex<Arc<Mmap>>>"]
TailOffset["tail_offset:\nAtomicU64"]
Indexer["key_indexer:\nArc<RwLock<KeyIndexer>>"]
PathBuf["path:\nPathBuf"]
end
subgraph "Write Operations"
WriteOp["Write/Delete Operations"]
end
subgraph "Read Operations"
ReadOp["Read/Exists Operations"]
end
subgraph "Physical Storage"
DiskFile["Storage File on Disk"]
end
WriteOp --> FileHandle
FileHandle --> DiskFile
ReadOp --> MmapHandle
ReadOp --> Indexer
MmapHandle --> DiskFile
WriteOp -.->|updates after flush| TailOffset
WriteOp -.->|remaps after flush| MmapHandle
WriteOp -.->|updates after flush| Indexer
TailOffset -.->|tracks valid data end| DiskFile
File Handles and Memory Mapping
Sources: src/storage_engine/data_store.rs:27-33 src/storage_engine/data_store.rs:84-117
File Opening and Initialization
The DataStore::open function src/storage_engine/data_store.rs:84-117 performs the following sequence:
- Open File : Opens the file in read/write mode, creating it if necessary src/storage_engine/data_store.rs:161-170
- Initial Mapping : Creates an initial memory-mapped view src/storage_engine/data_store.rs:172-174
- Chain Recovery : Validates the storage chain and determines the valid data extent src/storage_engine/data_store.rs:383-482
- Truncation (if needed) : Truncates corrupted data if recovery finds inconsistencies src/storage_engine/data_store.rs:91-104
- Index Building : Constructs the in-memory hash index by scanning valid entries src/storage_engine/data_store.rs:108
Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144
Storage Layout
Entries are stored with 64-byte alignment to optimize cache efficiency and SIMD operations. The alignment constant PAYLOAD_ALIGNMENT is defined as 64 in simd-r-drive-entry-handle/src/constants.rs
Non-Tombstone Entry Layout
The pre-padding calculation ensures the payload starts on a 64-byte boundary src/storage_engine/data_store.rs:669-673:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where prev_tail is the absolute file offset of the previous entry's metadata end.
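Expressed as code, the rule above becomes a small helper; the constant and function names here are assumptions used only to make the arithmetic concrete.

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

fn prepad_len(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(prepad_len(0), 0);    // already aligned: no padding needed
    assert_eq!(prepad_len(100), 28); // 100 + 28 = 128, the next 64-byte boundary
    assert_eq!(prepad_len(128), 0);  // previous metadata ended on a boundary
}
```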
graph LR
subgraph "Tombstone Entry"
NullByte["NULL Byte\n(1 byte)\n0x00"]
KeyHash["Key Hash\n(8 bytes)\nXXH3_64"]
PrevOff["Prev Offset\n(8 bytes)\nPrevious Tail"]
Checksum["Checksum\n(4 bytes)\nCRC32C"]
end
NullByte --> KeyHash
KeyHash --> PrevOff
PrevOff --> Checksum
Tombstone Entry Layout
Tombstones represent deleted keys and use a minimal layout without pre-padding:
The single NULL_BYTE src/storage_engine.rs:136 serves as both the tombstone marker and payload. This design minimizes space overhead for deletions src/storage_engine/data_store.rs:864-897
Layout Summary Table
| Component | Non-Tombstone Size | Tombstone Size | Purpose |
|---|---|---|---|
| Pre-Pad | 0-63 bytes | 0 bytes | Align payload to 64-byte boundary |
| Payload | Variable | 1 byte (0x00) | User data or deletion marker |
| Key Hash | 8 bytes | 8 bytes | XXH3_64 hash for index lookup |
| Prev Offset | 8 bytes | 8 bytes | Points to previous tail (chain link) |
| Checksum | 4 bytes | 4 bytes | CRC32C of payload for integrity |
Sources: README.md:112-137 src/storage_engine/data_store.rs:669-673 src/storage_engine/data_store.rs:864-897
Validation Chain
Each entry's metadata contains a prev_offset field that points to the absolute file offset of the previous entry's metadata end (the "previous tail"). This creates a backward-linked chain from the end of the file to offset 0.
graph RL
EOF["End of File\n(tail_offset)"]
Meta3["Entry 3 Metadata\nprev_offset = T2"]
Entry3["Entry 3 Payload\n(aligned)"]
Meta2["Entry 2 Metadata\nprev_offset = T1"]
Entry2["Entry 2 Payload\n(aligned)"]
Meta1["Entry 1 Metadata\nprev_offset = 0"]
Entry1["Entry 1 Payload\n(aligned)"]
Zero["Offset 0\n(File Start)"]
EOF --> Meta3
Meta3 -.->|prev_offset| Meta2
Meta3 --> Entry3
Meta2 -.->|prev_offset| Meta1
Meta2 --> Entry2
Meta1 -.->|prev_offset| Zero
Meta1 --> Entry1
Entry1 --> Zero
Chain Structure
Sources: README.md:139-148 src/storage_engine/data_store.rs:383-482
Chain Validation Properties
The validation chain ensures data integrity through several properties:
- Completeness : A valid chain must trace back to offset 0 without gaps src/storage_engine/data_store.rs:473
- Forward Progress : Each prev_offset must be less than the current metadata offset src/storage_engine/data_store.rs:465-468
- Bounds Checking : All offsets must be within valid file boundaries src/storage_engine/data_store.rs:435-438
- Size Consistency : Total chain size must not exceed file length src/storage_engine/data_store.rs:473
These properties are enforced during the recover_valid_chain function src/storage_engine/data_store.rs:383-482 which validates the chain structure on every file open.
Sources: src/storage_engine/data_store.rs:383-482
Recovery Mechanisms
SIMD R Drive implements automatic recovery to handle incomplete writes from crashes, power loss, or other interruptions. The recovery process occurs transparently during DataStore::open.
graph TB
Start["Open Storage File"]
CheckSize{{"File Size ≥\nMETADATA_SIZE?"}}
EmptyFile["Return offset 0\n(Empty File)"]
InitCursor["cursor = file_length"]
ReadMeta["Read metadata at\ncursor - METADATA_SIZE"]
ExtractPrev["Extract prev_offset"]
CalcBounds["Calculate entry bounds\nusing prepad_len"]
ValidateBounds{{"Entry bounds valid?"}}
DecrementCursor["cursor -= 1"]
WalkBackward["Walk chain backward\nusing prev_offset"]
CheckChain{{"Chain reaches\noffset 0?"}}
CheckSize2{{"Total size ≤\nfile_length?"}}
FoundValid["Store valid offset\nbest_valid_offset = cursor"]
MoreToCheck{{"cursor ≥\nMETADATA_SIZE?"}}
ReturnBest["Return best_valid_offset"]
Truncate["Truncate file to\nbest_valid_offset"]
Reopen["Recursively reopen\nDataStore::open"]
BuildIndex["Build KeyIndexer\nfrom valid chain"]
Complete["Recovery Complete"]
Start --> CheckSize
CheckSize -->|No| EmptyFile
CheckSize -->|Yes| InitCursor
EmptyFile --> Complete
InitCursor --> ReadMeta
ReadMeta --> ExtractPrev
ExtractPrev --> CalcBounds
CalcBounds --> ValidateBounds
ValidateBounds -->|No| DecrementCursor
ValidateBounds -->|Yes| WalkBackward
WalkBackward --> CheckChain
CheckChain -->|No| DecrementCursor
CheckChain -->|Yes| CheckSize2
CheckSize2 -->|No| DecrementCursor
CheckSize2 -->|Yes| FoundValid
FoundValid --> ReturnBest
DecrementCursor --> MoreToCheck
MoreToCheck -->|Yes| ReadMeta
MoreToCheck -->|No| ReturnBest
ReturnBest --> Truncate
Truncate --> Reopen
Reopen --> BuildIndex
BuildIndex --> Complete
Recovery Algorithm
Sources: src/storage_engine/data_store.rs:383-482 src/storage_engine/data_store.rs:91-104
Recovery Steps Detailed
Step 1: Backward Scan
Starting from the end of file, the recovery algorithm scans backward looking for valid metadata entries. For each potential metadata location src/storage_engine/data_store.rs:390-394:
- Reads 20 bytes of metadata
- Deserializes into an EntryMetadata structure
- Extracts prev_offset (the previous tail)
- Calculates entry boundaries using alignment rules
Step 2: Chain Validation
For each candidate entry, the algorithm walks the entire chain backward src/storage_engine/data_store.rs:424-471:
back_cursor = prev_tail
while back_cursor != 0:
- Read previous entry metadata
- Validate bounds and size
- Check forward progress (prev_prev_tail < prev_metadata_offset)
- Accumulate total chain size
- Update back_cursor to previous prev_offset
A chain is valid only if it reaches offset 0 and the total size doesn't exceed file length src/storage_engine/data_store.rs:473
Step 3: Truncation and Retry
If corruption is detected (file length > valid chain extent), the function:
- Logs a warning src/storage_engine/data_store.rs:92-97
- Drops existing file handles and memory map src/storage_engine/data_store.rs:98-99
- Reopens file with write access src/storage_engine/data_store.rs:100
- Truncates to valid offset src/storage_engine/data_store.rs:101
- Syncs to disk src/storage_engine/data_store.rs:102
- Recursively calls DataStore::open src/storage_engine/data_store.rs:103
Step 4: Index Rebuilding
After establishing the valid data extent, KeyIndexer::build scans forward through the file src/storage_engine/data_store.rs:108, populating the hash index with key → offset mappings. This forward scan uses the iterator pattern src/storage_engine/entry_iterator.rs:56-127 to process each entry sequentially.
Sources: src/storage_engine/data_store.rs:383-482 src/storage_engine/data_store.rs:91-104 src/storage_engine/entry_iterator.rs:21-54
graph TB
subgraph "Storage Architecture (2.1)"
AppendOnly["Append-Only Design"]
SingleFile["Single-File Container"]
Layout["Storage Layout\n(64-byte alignment)"]
Chain["Validation Chain"]
Recovery["Recovery Mechanism"]
end
subgraph "DataStore API (2.2)"
Write["write/batch_write"]
Read["read/batch_read"]
Delete["delete/batch_delete"]
end
subgraph "Entry Structure (2.3)"
EntryMetadata["EntryMetadata\n(key_hash, prev_offset, checksum)"]
PrePad["Pre-Padding Logic"]
Tombstone["Tombstone Format"]
end
subgraph "Memory Management (2.4)"
Mmap["Memory-Mapped File"]
EntryHandle["EntryHandle\n(Zero-Copy View)"]
Alignment["PAYLOAD_ALIGNMENT"]
end
subgraph "Key Indexer (2.6)"
KeyIndexer["KeyIndexer"]
HashLookup["Hash → Offset Mapping"]
end
AppendOnly -.->|enforces| Write
SingleFile -.->|provides| Mmap
Layout -.->|defines| PrePad
Layout -.->|uses| Alignment
Chain -.->|uses| EntryMetadata
Recovery -.->|validates| Chain
Recovery -.->|rebuilds| KeyIndexer
Write -.->|appends| Layout
Read -.->|uses| HashLookup
Read -.->|returns| EntryHandle
Delete -.->|writes| Tombstone
Architecture Integration
The storage architecture integrates with other subsystems through well-defined boundaries:
Component Interactions
Sources: src/storage_engine/data_store.rs:1-1183 src/storage_engine.rs:1-25
Key Design Decisions
| Decision | Rationale | Trade-off |
|---|---|---|
| 64-byte alignment | Matches CPU cache lines, enables SIMD | Wastes 0-63 bytes per entry |
| Backward-linked chain | Efficient recovery from EOF | Cannot validate forward |
| Single NULL byte tombstones | Minimal deletion overhead | Special case in code |
| Metadata-last layout | Sequential write pattern | Recovery scans backward |
| Atomic tail_offset | Lock-free progress tracking | Additional synchronization primitive |
Sources: README.md:51-59 README.md:104-148 src/storage_engine/data_store.rs:669-673
Performance Characteristics
The storage architecture exhibits the following performance properties:
Write Performance
- Sequential I/O : All writes append to file end, maximizing disk throughput
- Batching : Multiple entries written in single lock acquisition src/storage_engine/data_store.rs:847-939
- SIMD Acceleration : Payload copying uses platform-specific SIMD instructions src/storage_engine/data_store.rs:925
- Buffer Flushing : BufWriter reduces system call overhead src/storage_engine/data_store.rs:28
Read Performance
- Zero-Copy : Reads directly reference memory-mapped regions src/storage_engine/data_store.rs:1040-1049
- Index Lookup : O(1) hash table lookup to locate entries src/storage_engine/data_store.rs:1042-1046
- Concurrent Reads : Multiple readers access mmap simultaneously without locks
- Cache Efficiency : 64-byte alignment ensures single cache line access for aligned data
Recovery Performance
- Incremental Validation : Stops at first valid chain found src/storage_engine/data_store.rs:474-476
- Backward Scan : Starts from likely-valid tail, avoiding full file scan
- Indexed Rebuild : Forward scan populates index in single pass src/storage_engine/entry_iterator.rs:56-127
Sources: src/storage_engine/data_store.rs:752-939 src/storage_engine/data_store.rs:1027-1183 src/storage_engine/data_store.rs:383-482
Limitations and Considerations
Append-Only Growth
The file grows continuously with updates and deletions. Compaction is available but requires exclusive access src/storage_engine/data_store.rs:706-749
Alignment Overhead
Each non-tombstone entry may waste up to 63 bytes for alignment padding. For small payloads, this overhead can be significant relative to payload size.
Recovery Cost
Recovery time increases with file size, as chain validation requires traversing all entries. For very large files (exceeding RAM), this may cause extended startup times.
Single-Process Design
While thread-safe within a process, multiple processes cannot safely share the same file without external locking mechanisms README.md:185-206
Sources: README.md:51-59 README.md:185-206 src/storage_engine/data_store.rs:669-673 src/storage_engine/data_store.rs:706-749
DataStore API
Relevant source files
- README.md
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
This document describes the public API of the DataStore struct, which provides the primary interface for interacting with the SIMD R Drive storage engine. It covers the methods for opening storage files, reading and writing data, batch operations, streaming, iteration, and maintenance operations.
For details on the underlying storage architecture and design principles, see Storage Architecture. For information about entry structure and metadata format, see Entry Structure and Metadata. For details on concurrency mechanisms, see Concurrency and Thread Safety.
API Structure
The DataStore API is organized around two primary traits that separate read and write capabilities, along with additional utility methods provided directly on the DataStore struct.
Sources: src/storage_engine/data_store.rs:26-750 src/storage_engine/traits.rs src/storage_engine.rs:1-24
graph TB
subgraph "Trait Definitions"
DSReader["DataStoreReader\ntrait"]
DSWriter["DataStoreWriter\ntrait"]
end
subgraph "Core Implementation"
DS["DataStore\nstruct"]
end
subgraph "Associated Types"
EH["EntryHandle"]
EM["EntryMetadata"]
EI["EntryIterator"]
ES["EntryStream"]
end
DSReader -->|implemented by| DS
DSWriter -->|implemented by| DS
DS -->|returns| EH
DS -->|returns| EM
DS -->|provides| EI
EH -->|converts to| ES
DSReader -->|defines read ops| ReadMethods["read()\nbatch_read()\nexists()\nread_metadata()\nread_last_entry()"]
DSWriter -->|defines write ops| WriteMethods["write()\nbatch_write()\nwrite_stream()\ndelete()\nrename()\ncopy()"]
DS -->|direct methods| DirectMethods["open()\niter_entries()\ncompact()\nget_path()"]
Core Types
| Type | Purpose | Defined In |
|---|---|---|
| DataStore | Main storage engine implementation | src/storage_engine/data_store.rs:27-33 |
| DataStoreReader | Trait defining read operations | Trait definition |
| DataStoreWriter | Trait defining write operations | Trait definition |
| EntryHandle | Zero-copy reference to stored data | simd_r_drive_entry_handle crate |
| EntryMetadata | Entry metadata structure (key hash, prev offset, checksum) | simd_r_drive_entry_handle crate |
| EntryIterator | Sequential iterator over entries | src/storage_engine/entry_iterator.rs:21-127 |
| EntryStream | Streaming reader for entry data | src/storage_engine/entry_stream.rs |
Sources: src/storage_engine/data_store.rs:27-33 src/storage_engine/entry_iterator.rs:8-25 src/storage_engine.rs:1-24
Opening Storage Files
The DataStore provides two methods for opening storage files, both returning Result<DataStore>.
Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144 src/storage_engine/data_store.rs:161-170
graph LR
subgraph "Open Methods"
Open["open(path)"]
OpenExisting["open_existing(path)"]
end
subgraph "Process Flow"
OpenFile["open_file_in_append_mode()"]
InitMmap["init_mmap()"]
RecoverChain["recover_valid_chain()"]
BuildIndex["KeyIndexer::build()"]
end
subgraph "Result"
DSInstance["DataStore instance"]
end
Open -->|creates if missing| OpenFile
OpenExisting -->|requires existing file| VerifyFile["verify_file_existence()"]
VerifyFile --> OpenFile
OpenFile --> InitMmap
InitMmap --> RecoverChain
RecoverChain -->|may truncate| BuildIndex
BuildIndex --> DSInstance
open
Opens or creates a storage file at the specified path. This method performs the following operations:
- Opens the file in read/write mode (creates if necessary)
- Memory-maps the file using mmap
- Runs recover_valid_chain() to validate data integrity
- Truncates the file if corruption is detected
- Builds the in-memory key index using KeyIndexer::build()
Parameters:
- path: Path to the storage file
Returns:
- Ok(DataStore): Successfully opened storage instance
- Err(std::io::Error): File operation failure
Sources: src/storage_engine/data_store.rs:84-117
open_existing
Opens an existing storage file. Unlike open(), this method returns an error if the file does not exist. It calls verify_file_existence() before delegating to open().
Parameters:
- path: Path to the existing storage file
Returns:
- Ok(DataStore): Successfully opened storage instance
- Err(std::io::Error): File does not exist or cannot be opened
Sources: src/storage_engine/data_store.rs:141-144
Type Conversion Constructors
Allows creating a DataStore directly from a PathBuf:
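A hedged illustration of this conversion (the shape is inferred from the description below, not copied from the crate):

```rust
use simd_r_drive::DataStore;
use std::path::PathBuf;

fn open_by_conversion() -> DataStore {
    // Equivalent to calling DataStore::open(); panics if the file cannot be opened.
    DataStore::from(PathBuf::from("example.bin"))
}
```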
This conversion calls DataStore::open() internally and panics if the file cannot be opened.
Sources: src/storage_engine/data_store.rs:53-64
Read Operations
Read operations are defined by the DataStoreReader trait and return EntryHandle instances that provide zero-copy access to stored data.
Sources: src/storage_engine/data_store.rs:1027-1182
graph TB
subgraph "Read API Methods"
Read["read(key)"]
ReadHash["read_with_key_hash(hash)"]
BatchRead["batch_read(keys)"]
BatchReadHash["batch_read_hashed_keys(hashes)"]
ReadLast["read_last_entry()"]
ReadMeta["read_metadata(key)"]
Exists["exists(key)"]
ExistsHash["exists_with_key_hash(hash)"]
end
subgraph "Internal Flow"
ComputeHash["compute_hash()"]
GetIndexLock["key_indexer.read()"]
GetMmap["get_mmap_arc()"]
ReadContext["read_entry_with_context()"]
end
subgraph "Return Types"
OptHandle["Option<EntryHandle>"]
VecOptHandle["Vec<Option<EntryHandle>>"]
OptMeta["Option<EntryMetadata>"]
Bool["bool"]
end
Read --> ComputeHash
ComputeHash --> ReadHash
ReadHash --> GetIndexLock
GetIndexLock --> GetMmap
GetMmap --> ReadContext
ReadContext --> OptHandle
BatchRead --> BatchReadHash
BatchReadHash --> ReadContext
ReadContext --> VecOptHandle
ReadMeta --> Read
Read --> OptMeta
Exists --> Read
Read --> Bool
read
Reads a single entry by key. Computes the key hash using compute_hash(), acquires a read lock on the key index, and calls read_entry_with_context() to retrieve the entry.
Parameters:
key: Raw key bytes
Returns:
- Ok(Some(EntryHandle)): Entry found and valid
- Ok(None): Entry not found or is a tombstone
- Err(std::io::Error): Lock acquisition failure
Tag Verification: When the original key is provided, the method verifies the stored tag matches KeyIndexer::tag_from_key(key) to detect hash collisions.
Sources: src/storage_engine/data_store.rs:1040-1049
read_with_key_hash
Reads an entry using a pre-computed hash. This method skips the hashing step and tag verification, useful when the hash is already known.
Parameters:
prehashed_key: Pre-computed XXH3 hash of the key
Returns:
- Ok(Some(EntryHandle)): Entry found
- Ok(None): Entry not found
- Err(std::io::Error): Lock acquisition failure
Note: Tag verification is not performed when using pre-computed hashes.
Sources: src/storage_engine/data_store.rs:1051-1059
batch_read
Reads multiple entries in a single operation. Computes all hashes using compute_hash_batch(), then delegates to batch_read_hashed_keys().
Parameters:
keys: Slice of key references
Returns:
- Ok(Vec<Option<EntryHandle>>): Vector of results, same length as input
- Err(std::io::Error): Lock acquisition failure
Efficiency: Acquires the index read lock once for all lookups, reducing lock contention compared to multiple read() calls.
Sources: src/storage_engine/data_store.rs:1105-1109
batch_read_hashed_keys
Reads multiple entries using pre-computed hashes. Optionally accepts original keys for tag verification.
Parameters:
- prehashed_keys: Slice of pre-computed key hashes
- non_hashed_keys: Optional slice of original keys for tag verification (must match length of prehashed_keys)
Returns:
- Ok(Vec<Option<EntryHandle>>): Vector of results
- Err(std::io::Error): Lock failure or length mismatch
Sources: src/storage_engine/data_store.rs:1111-1158
read_last_entry
Reads the most recently written entry (at tail_offset - METADATA_SIZE). Does not verify the key or check the index; directly reads from the tail position.
Returns:
- Ok(Some(EntryHandle)): Last entry retrieved
- Ok(None): Storage is empty or tail is invalid
- Err(std::io::Error): Not applicable (infallible after initial checks)
Sources: src/storage_engine/data_store.rs:1061-1103
read_metadata
Reads only the metadata for a key without accessing the payload data. Calls read() and extracts the metadata field.
Returns:
- Ok(Some(EntryMetadata)): Metadata retrieved
- Ok(None): Key not found
- Err(std::io::Error): Read failure
Sources: src/storage_engine/data_store.rs:1160-1162
exists
Checks if a key exists in the storage.
Returns:
- Ok(true): Key exists and is not a tombstone
- Ok(false): Key not found or is deleted
- Err(std::io::Error): Read failure
Sources: src/storage_engine/data_store.rs:1030-1032
exists_with_key_hash
Checks key existence using a pre-computed hash.
Sources: src/storage_engine/data_store.rs:1034-1038
Write Operations
Write operations are defined by the DataStoreWriter trait. All writes are append-only and atomic with respect to the file lock.
Sources: src/storage_engine/data_store.rs:752-1025
graph TB
subgraph "Write API Methods"
Write["write(key, payload)"]
WriteHash["write_with_key_hash(hash, payload)"]
BatchWrite["batch_write(entries)"]
BatchWriteHash["batch_write_with_key_hashes(entries)"]
WriteStream["write_stream(key, reader)"]
WriteStreamHash["write_stream_with_key_hash(hash, reader)"]
end
subgraph "Core Write Flow"
AcquireLock["file.write()"]
GetTail["tail_offset.load()"]
CalcPrepad["prepad_len(tail)"]
WritePrepad["write zeros for alignment"]
WritePayload["write payload data"]
WriteMeta["write EntryMetadata"]
Flush["file.flush()"]
Reindex["reindex()"]
end
subgraph "Post-Write"
RemapMmap["init_mmap()"]
UpdateIndex["key_indexer.insert()"]
UpdateTail["tail_offset.store()"]
end
Write --> WriteHash
WriteHash --> BatchWriteHash
WriteStream --> WriteStreamHash
BatchWriteHash --> AcquireLock
WriteStreamHash --> AcquireLock
AcquireLock --> GetTail
GetTail --> CalcPrepad
CalcPrepad --> WritePrepad
WritePrepad --> WritePayload
WritePayload --> WriteMeta
WriteMeta --> Flush
Flush --> Reindex
Reindex --> RemapMmap
RemapMmap --> UpdateIndex
UpdateIndex --> UpdateTail
write
Writes a single key-value pair. Computes the key hash and delegates to write_with_key_hash().
Parameters:
- key: Key bytes
- payload: Payload bytes (cannot be empty or all NULL bytes)
Returns:
- Ok(tail_offset): New tail offset after write
- Err(std::io::Error): Write failure
Sources: src/storage_engine/data_store.rs:827-830
write_with_key_hash
Writes a payload using a pre-computed key hash. Delegates to batch_write_with_key_hashes() with a single entry.
Sources: src/storage_engine/data_store.rs:832-834
batch_write
Writes multiple key-value pairs in a single atomic operation. All entries are buffered and written together, then flushed once.
Parameters:
- entries: Slice of (key, payload) tuples
Returns:
- Ok(tail_offset): New tail offset after batch write
- Err(std::io::Error): Write failure (entire batch is rejected)
Efficiency: Reduces I/O overhead and lock contention compared to multiple write() calls.
Sources: src/storage_engine/data_store.rs:838-843
batch_write_with_key_hashes
Core batch write implementation. Processes each entry, calculating alignment padding and writing pre-pad, payload, and metadata. All data is buffered before flushing to disk.
Parameters:
- prehashed_keys: Vector of (hash, payload) tuples
- allow_null_bytes: Whether to allow single NULL byte payloads (used for tombstones)
Returns:
- Ok(tail_offset): New tail offset
- Err(std::io::Error): Write failure
Write Process:
- Acquire exclusive write lock on file
- Load current tail_offset
- For each entry:
  - Calculate prepad_len(tail_offset) for 64-byte alignment
  - Buffer pre-pad zeros (if needed)
  - Buffer payload (using simd_copy())
  - Buffer metadata (key_hash, prev_offset, checksum)
  - Update tail_offset
- Write entire buffer to file
- Flush to disk
- Call reindex() to update mmap and index
Sources: src/storage_engine/data_store.rs:847-939
write_stream
Writes a payload from a streaming source without loading it entirely into memory. Useful for large entries.
Parameters:
- key: Key bytes
- reader: Any type implementing the Read trait
Returns:
- Ok(tail_offset): New tail offset after write
- Err(std::io::Error): Write or read failure
Sources: src/storage_engine/data_store.rs:753-756
write_stream_with_key_hash
Core streaming write implementation. Reads from the source in chunks of WRITE_STREAM_BUFFER_SIZE bytes, computing the checksum incrementally.
Write Process:
- Acquire write lock
- Write pre-pad bytes for alignment
- Read chunks from reader using a WRITE_STREAM_BUFFER_SIZE buffer
- Write each chunk immediately to file
- Update checksum incrementally
- Write metadata after all payload bytes
- Flush and reindex
Validation:
- Rejects empty payloads
- Rejects payloads containing only NULL bytes
Sources: src/storage_engine/data_store.rs:758-825
Delete Operations
Delete operations write tombstones (single NULL byte entries) to the storage. They are implemented using the write infrastructure with allow_null_bytes = true.
delete
Deletes a single key by writing a tombstone. Delegates to batch_delete().
Returns:
- Ok(tail_offset): New tail offset
- Err(std::io::Error): Write failure
Sources: src/storage_engine/data_store.rs:986-988
batch_delete
Deletes multiple keys in a single operation. Computes hashes and delegates to batch_delete_key_hashes().
Sources: src/storage_engine/data_store.rs:990-993
batch_delete_key_hashes
Core batch delete implementation. First checks which keys actually exist to avoid writing unnecessary tombstones, then writes tombstones for existing keys only.
Process:
- Acquire read lock on index
- Filter keys to only those that exist
- Release read lock
- If no keys exist, return immediately without I/O
- Create delete operations: (hash, &NULL_BYTE)
- Call batch_write_with_key_hashes() with allow_null_bytes = true
Optimization: Tombstones are written without pre-pad bytes since they are always exactly 1 byte.
Sources: src/storage_engine/data_store.rs:995-1024
Key Management Operations
These operations manipulate entries by key, providing higher-level functionality built on read and write operations.
rename
Renames a key by copying its data to a new key and deleting the old key.
Parameters:
- old_key: Current key
- new_key: New key (must be different from old key)
Returns:
- Ok(tail_offset): New tail offset after rename
- Err(std::io::Error): Key not found, or keys are identical
Process:
- Read entry at old_key
- Convert to EntryStream
- Write stream to new_key
- Delete old_key
Sources: src/storage_engine/data_store.rs:941-958
copy
Copies an entry from this storage to a different storage instance.
Parameters:
- key: Key to copy
- target: Target DataStore (must have a different path)
Returns:
- Ok(target_offset): Offset in target storage where entry was written
- Err(std::io::Error): Key not found, same storage path, or write failure
Process:
- Read entry handle for key
- Call copy_handle() to write to target
- Uses EntryStream for streaming copy
Sources: src/storage_engine/data_store.rs:960-979
transfer
Transfers an entry to another storage by copying then deleting.
Parameters:
- key: Key to transfer
- target: Target DataStore
Returns:
- Ok(tail_offset): New tail offset after deletion
- Err(std::io::Error): Copy or delete failure
Process:
- Copy entry to target
- Delete entry from source
Sources: src/storage_engine/data_store.rs:981-984
Iteration
The DataStore provides methods for iterating over all valid entries. Iteration yields EntryHandle instances for each unique key, providing zero-copy access.
Sources: src/storage_engine/data_store.rs:269-361 src/storage_engine/entry_iterator.rs:8-127
graph TB
subgraph "Iteration Methods"
IntoIter["into_iter()"]
IterEntries["iter_entries()"]
ParIterEntries["par_iter_entries()"]
end
subgraph "Iterator Types"
EI["EntryIterator"]
PI["ParallelIterator<EntryHandle>"]
end
subgraph "Implementation Details"
GetMmap["get_mmap_arc()"]
GetTail["tail_offset.load()"]
CollectOffsets["collect packed offsets"]
FilterMap["filter_map entries"]
end
IntoIter -->|consumes DataStore| IterEntries
IterEntries --> GetMmap
GetMmap --> GetTail
GetTail --> EI
ParIterEntries -->|requires parallel feature| CollectOffsets
CollectOffsets --> FilterMap
FilterMap --> PI
iter_entries
Creates a sequential iterator over all valid entries. The iterator:
- Starts at tail_offset and moves backward
- Skips tombstones (deleted entries)
- Ensures unique keys (only returns latest version)
- Uses zero-copy access via EntryHandle
Returns:
- EntryIterator: Sequential iterator over EntryHandle
Example:
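(A sketch assuming the iterator API documented above; the import paths are illustrative.)

```rust
use simd_r_drive::DataStore;

fn total_payload_bytes(store: &DataStore) -> usize {
    // Each EntryHandle is a zero-copy view; as_slice() borrows from the memory map.
    store.iter_entries().map(|entry| entry.as_slice().len()).sum()
}
```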
Sources: src/storage_engine/data_store.rs:276-280
IntoIterator Implementation
Allows consuming a DataStore to produce an iterator; this delegates to iter_entries() and consumes the DataStore instance.
Sources: src/storage_engine/data_store.rs:44-51
par_iter_entries (Parallel Feature)
Creates a parallel iterator using Rayon for multi-threaded processing. This method:
- Acquires a read lock on the index
- Collects all packed offset values (fast operation)
- Releases the lock immediately
- Returns a ParallelIterator that processes entries across threads
Returns:
- ParallelIterator<Item = EntryHandle>: Parallel iterator over entries
Performance: Suitable for bulk operations on multi-core systems where each entry can be processed independently.
Sources: src/storage_engine/data_store.rs:296-361
Maintenance Operations
These methods provide utilities for monitoring and optimizing storage.
compact
Compacts the storage by creating a new file containing only the latest version of each key. This operation:
- Creates temporary file at {path}.bk
- Iterates through current entries
- Copies latest version of each key to temporary file
- Swaps temporary file with original file
Requirements:
- Requires &mut self (exclusive mutable reference)
- Should only be called when no other threads access the storage
Warning: While &mut self prevents concurrent mutations, it does not prevent other threads from holding shared references (&DataStore) if wrapped in Arc<DataStore>. External synchronization may be required.
Returns:
- Ok(()): Compaction successful
- Err(std::io::Error): I/O failure
Sources: src/storage_engine/data_store.rs:706-749
estimate_compaction_savings
Calculates potential space savings from compaction by comparing total file size to the size needed for unique entries only.
Returns:
- Number of bytes that would be saved by compaction
Implementation: Iterates through entries, tracking seen keys, and sums the file size of unique entries.
Sources: src/storage_engine/data_store.rs:605-616
get_path
Returns the file path of the storage.
Sources: src/storage_engine/data_store.rs:265-267
len
Returns the number of unique keys currently stored (excluding tombstones).
Returns:
- Ok(count): Number of keys in index
- Err(std::io::Error): Lock acquisition failure
Sources: src/storage_engine/data_store.rs:1164-1171
is_empty
Checks if the storage contains any entries.
Sources: src/storage_engine/data_store.rs:1173-1177
file_size
Returns the total size of the storage file in bytes.
Sources: src/storage_engine/data_store.rs:1179-1181
Internal Methods
The following methods are internal implementation details but are important for understanding the API's behavior.
read_entry_with_context
Core read logic used by all read methods. Performs:
- Index lookup via key_indexer_guard.get_packed(&key_hash)
- Unpacking of tag and offset via KeyIndexer::unpack()
- Tag verification (if non_hashed_key is provided)
- Metadata deserialization
- Entry bounds calculation (handling pre-pad and tombstones)
- Construction of EntryHandle
Tag Verification: If non_hashed_key is provided and tag does not match KeyIndexer::tag_from_key(), returns None and logs a warning about potential hash collision.
Sources: src/storage_engine/data_store.rs:502-565
reindex
Updates the memory map and key index after write operations. This method:
- Creates new mmap via init_mmap()
- Acquires locks on mmap and key_indexer
- Inserts or removes key mappings
- Updates atomic tail_offset
Hash Collision Handling: If key_indexer_guard.insert() returns an error (collision detected), the entire operation fails to prevent inconsistent state.
Sources: src/storage_engine/data_store.rs:224-259
prepad_len
Calculates the number of padding bytes needed to align offset to PAYLOAD_ALIGNMENT (64 bytes). Formula:
prepad = (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
This ensures all non-tombstone payloads start on 64-byte boundaries.
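A standalone sketch of this formula (it mirrors the documented calculation and is not the crate's internal function):

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Number of zero bytes needed so the next payload starts on a 64-byte boundary.
fn prepad_len(offset: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(prepad_len(0), 0);    // already aligned
    assert_eq!(prepad_len(64), 0);   // already aligned
    assert_eq!(prepad_len(65), 63);  // maximum padding
    assert_eq!(prepad_len(100), 28); // 100 + 28 = 128
}
```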
Sources: src/storage_engine/data_store.rs:670-673
API Usage Patterns
Basic Read-Write Pattern
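A sketch of the basic pattern using the trait methods documented above (crate/trait import paths and the &self receivers are assumptions):

```rust
use std::path::Path;
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};

fn main() -> std::io::Result<()> {
    let store = DataStore::open(Path::new("example.bin"))?;

    // Append a key-value pair; the Ok value is the new tail offset.
    store.write(b"config", b"payload-bytes")?;

    // Zero-copy read: the EntryHandle borrows directly from the memory map.
    if let Some(entry) = store.read(b"config")? {
        println!("read {} bytes", entry.as_slice().len());
    }

    // delete() writes a tombstone; later reads return None.
    store.delete(b"config")?;
    assert!(!store.exists(b"config")?);
    Ok(())
}
```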
Batch Operations Pattern
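A sketch of batched writes and reads (argument types follow the descriptions above and are assumptions):

```rust
use std::path::Path;
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};

fn main() -> std::io::Result<()> {
    let store = DataStore::open(Path::new("example.bin"))?;

    // One lock acquisition and a single flush cover the whole batch.
    let entries: [(&[u8], &[u8]); 3] = [
        (b"alpha", b"1"),
        (b"beta", b"2"),
        (b"gamma", b"3"),
    ];
    store.batch_write(&entries)?;

    // batch_read returns one Option<EntryHandle> per requested key, in order.
    let keys: [&[u8]; 3] = [b"alpha", b"beta", b"missing"];
    let results = store.batch_read(&keys)?;
    assert!(results[0].is_some());
    assert!(results[2].is_none());
    Ok(())
}
```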
Streaming Pattern
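A sketch of streaming a large payload from any std::io::Read source (passing the reader by mutable reference is an assumption):

```rust
use std::io::Cursor;
use std::path::Path;
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};

fn main() -> std::io::Result<()> {
    let store = DataStore::open(Path::new("example.bin"))?;

    // The payload is streamed in chunks rather than being buffered fully in memory.
    let payload = vec![42u8; 4 * 1024 * 1024];
    let mut reader = Cursor::new(payload);
    store.write_stream(b"blob", &mut reader)?;

    if let Some(entry) = store.read(b"blob")? {
        assert_eq!(entry.as_slice().len(), 4 * 1024 * 1024);
    }
    Ok(())
}
```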
Iteration Pattern
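A sketch of iterating all live entries (assumes iter_entries() and len() behave as documented above):

```rust
use std::path::Path;
use simd_r_drive::{DataStore, DataStoreReader};

fn main() -> std::io::Result<()> {
    let store = DataStore::open(Path::new("example.bin"))?;

    // iter_entries() yields only the latest version of each key and skips tombstones,
    // so the iterated count matches the number of live keys reported by len().
    let iterated = store.iter_entries().count();
    assert_eq!(iterated, store.len()?);
    Ok(())
}
```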
Sources: src/lib.rs:20-115 README.md:208-246
Entry Structure and Metadata
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/Cargo.toml
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- simd-r-drive-entry-handle/src/entry_metadata.rs
Purpose and Scope
This document details the on-disk binary layout of entries in the SIMD R Drive storage engine. It covers the structure of aligned entries, tombstones, metadata fields, and the alignment strategy that enables zero-copy access.
For information about how entries are read and accessed in memory, see Memory Management and Zero-Copy Access. For details on the validation chain and recovery mechanisms, see Storage Architecture.
On-Disk Entry Layout Overview
Every entry written to the storage file consists of three components:
- Pre-Pad Bytes (optional, 0-63 bytes) - Zero bytes inserted to ensure the payload starts at a 64-byte boundary
- Payload - Variable-length binary data
- Metadata - Fixed 20-byte structure containing key hash, previous offset, and checksum
The exception is tombstones (deletion markers), which use a minimal 1-byte payload with no pre-padding.
Sources: README.md:104-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37
Aligned Entry Structure
Entry Layout Table
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| P .. P+pad | Pre-Pad (optional) | pad | Zero bytes to align payload start |
| P+pad .. N | Payload | N-(P+pad) | Variable-length data |
| N .. N+8 | Key Hash | 8 | 64-bit XXH3 key hash |
| N+8 .. N+16 | Prev Offset | 8 | Absolute offset of previous tail |
| N+16 .. N+20 | Checksum | 4 | CRC32C of payload |
Where:
- pad = (A - (prev_tail % A)) & (A - 1), with A = PAYLOAD_ALIGNMENT (64 bytes)
- The next entry starts at offset N + 20
Aligned Entry Structure Diagram
Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:11-23
Tombstone Structure
Tombstones are special deletion markers that do not require payload alignment. They consist of a single zero byte followed by the standard 20-byte metadata structure.
Tombstone Layout Table
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| T .. T+1 | Payload | 1 | Single byte 0x00 |
| T+1 .. T+21 | Metadata | 20 | Key hash, prev, crc32c |
Tombstone Structure Diagram
Sources: README.md:126-131 simd-r-drive-entry-handle/src/entry_metadata.rs:25-30
EntryMetadata Structure
The EntryMetadata struct represents the fixed 20-byte metadata block that follows every payload. It is defined with #[repr(C)] layout to ensure a consistent binary representation.
graph TB
subgraph EntryMetadataStruct["EntryMetadata struct"]
field1["key_hash: u64\n8 bytes\nXXH3_64 hash"]
field2["prev_offset: u64\n8 bytes\nbackward chain link"]
field3["checksum: [u8; 4]\n4 bytes\nCRC32C payload checksum"]
end
field1 --> field2
field2 --> field3
note4["Serialized at offset N\nfollowing payload"]
note5["Total: METADATA_SIZE = 20"]
field1 -.-> note4
field3 -.-> note5
Metadata Fields
Field Descriptions
key_hash: u64 (8 bytes, offset N .. N+8)
- 64-bit XXH3 hash of the key
- Used by KeyIndexer for O(1) lookups
- Combined with a tag for collision detection
- Hardware-accelerated via SSE2/AVX2/NEON
prev_offset: u64 (8 bytes, offset N+8 .. N+16)
- Absolute file offset of the previous entry for this key
- Forms a backward-linked chain for version history
- Set to 0 for the first entry of a key
- Used during chain validation and recovery
checksum: [u8; 4] (4 bytes, offset N+16 .. N+20)
- CRC32C checksum of the payload
- Provides fast integrity verification
- Not cryptographically secure
- Used during recovery to detect corruption
Serialization and Deserialization
The EntryMetadata struct provides methods for converting to/from bytes:
- serialize() -> [u8; 20]: Converts metadata to a byte array using little-endian encoding
- deserialize(data: &[u8]) -> Self: Reconstructs metadata from a byte slice
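The following standalone sketch mirrors the documented 20-byte layout and little-endian field encoding; it is illustrative only and is not the crate's implementation:

```rust
/// Mirror of the documented 20-byte metadata layout, for illustration only.
#[repr(C)]
struct EntryMetadataSketch {
    key_hash: u64,
    prev_offset: u64,
    checksum: [u8; 4],
}

impl EntryMetadataSketch {
    fn serialize(&self) -> [u8; 20] {
        let mut buf = [0u8; 20];
        buf[0..8].copy_from_slice(&self.key_hash.to_le_bytes());     // KEY_HASH_RANGE
        buf[8..16].copy_from_slice(&self.prev_offset.to_le_bytes()); // PREV_OFFSET_RANGE
        buf[16..20].copy_from_slice(&self.checksum);                 // CHECKSUM_RANGE
        buf
    }

    fn deserialize(data: &[u8]) -> Self {
        Self {
            key_hash: u64::from_le_bytes(data[0..8].try_into().unwrap()),
            prev_offset: u64::from_le_bytes(data[8..16].try_into().unwrap()),
            checksum: data[16..20].try_into().unwrap(),
        }
    }
}
```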
Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:44-113 README.md:114-120
Pre-Padding and Alignment Strategy
Alignment Purpose
All non-tombstone payloads start at a 64-byte aligned address. This alignment ensures:
- Cache-line efficiency - Matches typical CPU cache line size
- SIMD optimization - Enables full-speed AVX2/AVX-512/NEON operations
- Zero-copy typed views - Allows safe reinterpretation as typed slices (&[u16], &[u32], etc.)
graph TD
Start["Calculate padding needed"]
GetPrevTail["prev_tail = last written offset"]
CalcPad["pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT))\n& (PAYLOAD_ALIGNMENT - 1)"]
CheckPad{"pad > 0?"}
WritePad["Write pad zero bytes"]
WritePayload["Write payload at aligned offset"]
Start --> GetPrevTail
GetPrevTail --> CalcPad
CalcPad --> CheckPad
CheckPad -->|Yes| WritePad
CheckPad -->|No| WritePayload
WritePad --> WritePayload
The alignment is configured via PAYLOAD_ALIGNMENT constant (64 bytes as of version 0.15.0).
Pre-Padding Calculation
The formula pad = (A - (prev_tail % A)) & (A - 1) where A = PAYLOAD_ALIGNMENT ensures:
- If prev_tail is already aligned, pad = 0
- Otherwise, pad equals the bytes needed to reach the next aligned boundary
- Maximum padding is A - 1 bytes (63 bytes for 64-byte alignment)
Constants
The alignment is defined in simd-r-drive-entry-handle/src/constants.rs:1-20:
| Constant | Value | Description |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
| PAYLOAD_ALIGNMENT | 64 | Actual alignment boundary in bytes |
| METADATA_SIZE | 20 | Fixed size of metadata block |
Sources: README.md:51-59 simd-r-drive-entry-handle/src/entry_metadata.rs:22-23 CHANGELOG.md:25-51
Backward Chain Formation
Chain Structure
Each entry's prev_offset field creates a backward-linked chain that tracks the version history for a given key. This chain is essential for:
- Recovery and validation on file open
- Detecting incomplete writes
- Rebuilding the index
Chain Properties
- Most recent entry is at the end of the file (highest offset)
- Chain traversal moves backward from tail toward offset 0
- First entry for a key has prev_offset = 0
- Valid chain can be walked all the way back to byte 0 without gaps
- Broken chain indicates corruption or incomplete write
Usage in Recovery
During file open, the system:
- Scans backward from EOF reading metadata
- Follows prev_offset links to validate chain continuity
- Verifies checksums at each step
- Truncates file if corruption is detected
- Scans forward to rebuild the index
Sources: README.md:139-147 simd-r-drive-entry-handle/src/entry_metadata.rs:41-43
Entry Type Comparison
Aligned Entry vs. Tombstone
| Aspect | Aligned Entry (Non-Tombstone) | Tombstone (Deletion Marker) |
|---|---|---|
| Pre-padding | 0-63 bytes (alignment dependent) | None |
| Payload size | Variable (user-defined) | Fixed 1 byte (0x00) |
| Payload alignment | 64-byte boundary | No alignment requirement |
| Metadata size | 20 bytes | 20 bytes |
| Total minimum size | 21 bytes (1-byte payload + metadata) | 21 bytes (1-byte + metadata) |
| Total maximum overhead | 83 bytes (63-byte pad + 20 metadata) | 21 bytes |
| Zero-copy capable | Yes (aligned payload) | No (tombstone flag only) |
When Tombstones Are Used
Tombstones mark key deletions while maintaining chain integrity. They:
- Preserve the backward chain via prev_offset
- Use minimal space (no alignment overhead)
- Are detected during reads and filtered out
- Enable recovery to skip deleted entries
Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37
Metadata Serialization Format
Binary Layout in File
Constants for Range Indexing
The simd-r-drive-entry-handle/src/constants.rs:1-20 file defines range constants for metadata field access:
- KEY_HASH_RANGE = 0..8
- PREV_OFFSET_RANGE = 8..16
- CHECKSUM_RANGE = 16..20
- METADATA_SIZE = 20
These ranges are used in EntryMetadata::serialize() and deserialize() methods.
Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:62-112
Alignment Evolution and Migration
Version History
v0.14.0-alpha and earlier: Used 16-byte alignment (PAYLOAD_ALIGNMENT = 16)
v0.15.0-alpha onwards: Changed to 64-byte alignment (PAYLOAD_ALIGNMENT = 64)
This change was made to:
- Ensure full cache-line alignment
- Support AVX-512 and future SIMD extensions
- Improve zero-copy performance across modern hardware
Migration Considerations
Storage files created with different alignment values are not compatible:
- v0.14.x readers cannot correctly parse v0.15.x stores
- v0.15.x readers may misinterpret v0.14.x padding
To migrate between versions:
- Read all entries using the old version binary
- Write entries to a new store using the new version binary
- Replace the old file after verification
In multi-service environments, deploy reader upgrades before writer upgrades to avoid mixed-version issues.
Sources: CHANGELOG.md:25-82 README.md:51-59
Debug Assertions for Alignment
Runtime Validation
The codebase includes debug-only alignment assertions that validate both pointer and offset alignment:
debug_assert_aligned(ptr: *const u8, align: usize) - Validates pointer alignment
- Active in debug and test builds
- Zero cost in release/bench builds
- Ensures buffer base address is properly aligned
debug_assert_aligned_offset(off: u64) - Validates file offset alignment
- Checks that the derived payload start offset is at a PAYLOAD_ALIGNMENT boundary
- Used during entry handle creation
- Defined in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
These assertions help catch alignment issues during development without imposing runtime overhead in production.
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88 CHANGELOG.md:33-41
Summary
The SIMD R Drive entry structure uses a carefully designed binary layout that balances efficiency, integrity, and flexibility:
- Fixed 64-byte alignment ensures cache-friendly, SIMD-optimized access
- 20-byte fixed metadata provides fast integrity checks and chain traversal
- Variable pre-padding maintains alignment without complex calculations
- Minimal tombstones mark deletions efficiently
- Backward-linked chain enables robust recovery and validation
This design enables zero-copy reads, high write throughput, and automatic crash recovery while maintaining a simple, append-only storage model.
Memory Management and Zero-Copy Access
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/entry_iterator.rs
- src/utils/align_or_copy.rs
This document explains how SIMD R Drive implements zero-copy reads through memory-mapped files, manages payload alignment for optimal performance, and provides safe concurrent access to stored data. It covers the EntryHandle abstraction, the Arc<Mmap> sharing strategy, alignment requirements enforced by PAYLOAD_ALIGNMENT, and utilities for working with aligned binary data.
For details on the on-disk entry format and metadata structure, see Entry Structure and Metadata. For information on how concurrent reads and writes are coordinated, see Concurrency and Thread Safety. For performance characteristics of the alignment strategy, see Payload Alignment and Cache Efficiency.
Memory-Mapped File Architecture
SIMD R Drive uses memory-mapped files (memmap2::Mmap) to enable zero-copy reads. The storage file is mapped into the process's virtual address space, allowing direct access to payload bytes without copying them into separate buffers. This approach provides several benefits:
- Zero-copy reads : Data is accessed directly from the mapped memory region
- Larger-than-RAM datasets : The OS handles paging, so only accessed portions consume physical memory
- Efficient random access : Any offset in the file can be accessed with pointer arithmetic
- Shared memory : Multiple threads can read from the same mapped region concurrently
The memory-mapped file is wrapped in Arc<Mutex<Arc<Mmap>>> within the DataStore structure:
DataStore.mmap_arc: Arc<Mutex<Arc<Mmap>>>
The outer Arc allows the DataStore itself to be cloned and shared. The Mutex protects remapping operations (which occur after writes). The inner Arc<Mmap> is what gets cloned and handed out to readers, ensuring that even if a remap occurs, existing readers retain a valid view of the previous mapping.
Sources: README.md:43-49 README.md:174-180 src/storage_engine/entry_iterator.rs:22-23
Mmap Lifecycle and Remapping
Diagram: Memory-mapped file lifecycle across write operations
When a write operation extends the file, the following sequence occurs:
- The file is extended and flushed to disk
- The Mutex<Arc<Mmap>> is locked
- A new Mmap is created from the extended file
- The old Arc<Mmap> is replaced with a new one
- Existing EntryHandle instances continue using their old Arc<Mmap> reference until dropped
This design ensures that readers never see an invalid memory mapping, even as writes extend the file.
Sources: README.md:174-180 src/storage_engine/data_store.rs:46-48 (implied from architecture)
EntryHandle: Zero-Copy Data Access
The EntryHandle struct provides a zero-copy view into a specific payload within the memory-mapped file. It consists of three fields:
EntryHandle {
mmap_arc: Arc<Mmap>,
range: Range<usize>,
metadata: EntryMetadata,
}
Each EntryHandle holds:
- mmap_arc: A reference-counted pointer to the memory-mapped file
- range: The byte range (start..end) identifying the payload location
- metadata: The entry's EntryMetadata (key hash, previous offset, checksum)
EntryHandle Methods
Diagram: EntryHandle methods and their memory semantics
| Method | Return Type | Memory Semantics | Use Case |
|---|---|---|---|
| as_slice() | &[u8] | Zero-copy borrow | Direct byte access |
| into_vec() | Vec<u8> | Owned copy | When ownership needed |
| as_arrow_buffer() | arrow::buffer::Buffer | Zero-copy Buffer | Arrow integration (feature-gated) |
| into_arrow_buffer() | arrow::buffer::Buffer | Consumes EntryHandle | Arrow integration (feature-gated) |
| metadata() | &EntryMetadata | Borrow | Access key hash, checksum |
| get_mmap_arc() | &Arc<Mmap> | Borrow | Low-level mmap access |
| get_range() | &Range<usize> | Borrow | Get payload boundaries |
The as_slice() method is the primary zero-copy interface:
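A sketch contrasting the borrowing and owning accessors from the table above (the EntryHandle import path and the by-value into_vec() receiver are assumptions):

```rust
use simd_r_drive_entry_handle::EntryHandle;

fn inspect(entry: EntryHandle) {
    // Zero-copy: borrows the payload bytes straight from the shared Arc<Mmap>.
    let view: &[u8] = entry.as_slice();
    println!("first byte: {:?}", view.first());

    // Owned copy: allocates and copies when the caller needs ownership of the bytes.
    let owned: Vec<u8> = entry.into_vec();
    println!("owned {} bytes", owned.len());
}
```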
It returns a slice directly into the memory-mapped region with no allocation or copying.
Sources: simd-r-drive-entry-handle/src/entry_handle.rs:22-50 (method implementations)
Alignment Requirements
All non-tombstone payloads in SIMD R Drive begin on a fixed 64-byte aligned boundary. This alignment is defined by the PAYLOAD_ALIGNMENT constant:
PAYLOAD_ALIGN_LOG2 = 6
PAYLOAD_ALIGNMENT = 1 << 6 = 64 bytes
The 64-byte alignment matches typical CPU cache line sizes and ensures that SIMD operations (AVX, AVX-512, SVE) can operate at full speed without crossing cache line boundaries.
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md:51-59
Alignment Calculation
The pre-padding required to align a payload is calculated using:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where prev_tail is the file offset immediately after the previous entry's metadata. This formula ensures:
- Payloads always start at multiples of
PAYLOAD_ALIGNMENT - Pre-padding ranges from 0 to
PAYLOAD_ALIGNMENT - 1bytes - The calculation works for any power-of-two alignment
Diagram: Pre-padding calculation for 64-byte alignment
The storage engine writes zero bytes for the pre-padding region, then writes the payload starting at the aligned boundary.
Sources: README.md:112-124 src/storage_engine/entry_iterator.rs:50-53
Alignment Validation
The crate provides debug-only assertions to verify alignment invariants:
debug_assert_aligned(ptr: *const u8, align: usize)
debug_assert_aligned_offset(offset: u64)
These assertions:
- Are compiled to no-ops in release builds (zero runtime cost)
- Execute in debug and test builds to catch alignment violations
- Check both pointer alignment (
debug_assert_aligned) and offset alignment (debug_assert_aligned_offset)
Diagram: Debug-only alignment assertion behavior
graph TB
subgraph "Alignment Validation"
DebugMode["Debug/Test Build"]
ReleaseMode["Release Build"]
Call["debug_assert_aligned(ptr, align)"]
CheckPowerOf2["Assert align.is_power_of_two()"]
CheckAligned["Assert (ptr as usize & (align-1)) == 0"]
NoOp["No-op (optimized away)"]
Call --> DebugMode
Call --> ReleaseMode
DebugMode --> CheckPowerOf2
CheckPowerOf2 --> CheckAligned
ReleaseMode --> NoOp
end
The Arrow integration feature uses these assertions when creating arrow::buffer::Buffer instances from EntryHandle:
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88 simd-r-drive-entry-handle/src/entry_handle.rs:52-85 (Arrow integration)
Typed Slice Access via align_or_copy
The align_or_copy utility function enables efficient conversion of byte slices into typed slices (e.g., &[f32], &[u32]) with automatic fallback when alignment requirements are not met.
align_or_copy<T, const N: usize>(
bytes: &[u8],
from_le_bytes: fn([u8; N]) -> T
) -> Cow<'_, [T]>
Sources: src/utils/align_or_copy.rs:1-73
Zero-Copy vs. Fallback Behavior
Diagram: align_or_copy decision flow
The function attempts zero-copy reinterpretation using slice::align_to::<T>():
| Condition | Result | Allocation |
|---|---|---|
| Memory aligned for T AND length is multiple of size_of::<T>() | Cow::Borrowed | None |
| Misaligned OR length not a multiple | Cow::Owned | Allocates Vec<T> |
Zero-copy path: slice::align_to::<T>() reinterprets the existing bytes and the result is returned as Cow::Borrowed.
Fallback path: each N-byte chunk is decoded with the provided from_le_bytes function into a newly allocated Vec<T>, returned as Cow::Owned.
Sources: src/utils/align_or_copy.rs:44-73
Usage Example
Due to the 64-byte payload alignment, EntryHandle payloads are often well-aligned for SIMD types:
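A sketch based on the signature above (the align_or_copy import path is an assumption derived from the source file location):

```rust
use std::borrow::Cow;
// Module path is an assumption based on the source layout (src/utils/align_or_copy.rs).
use simd_r_drive::utils::align_or_copy;

fn as_f32s(payload: &[u8]) -> Cow<'_, [f32]> {
    // Borrowed when the slice is suitably aligned and its length is a multiple of 4 bytes;
    // otherwise each 4-byte chunk is decoded with f32::from_le_bytes into an owned Vec.
    align_or_copy::<f32, 4>(payload, f32::from_le_bytes)
}
```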
Since PAYLOAD_ALIGNMENT = 64 is a multiple of align_of::<f32>() = 4, align_of::<f64>() = 8, and align_of::<u128>() = 16, zero-copy access is typically possible for these types when the payload length is an exact multiple of their size.
Sources: src/utils/align_or_copy.rs:37-43 README.md:51-59
graph TB
subgraph "DataStore Structure"
DS["DataStore"]
MmapMutex["Arc<Mutex<Arc<Mmap>>>"]
end
subgraph "Reader Thread 1"
R1["read(key1)"]
EH1["EntryHandle\n(Arc<Mmap> clone)"]
Slice1["as_slice() → &[u8]"]
end
subgraph "Reader Thread 2"
R2["read(key2)"]
EH2["EntryHandle\n(Arc<Mmap> clone)"]
Slice2["as_slice() → &[u8]"]
end
subgraph "Reader Thread N"
RN["read(keyN)"]
EHN["EntryHandle\n(Arc<Mmap> clone)"]
SliceN["as_slice() → &[u8]"]
end
subgraph "Memory-Mapped File"
Mmap["memmap2::Mmap\n(shared memory region)"]
end
DS --> MmapMutex
MmapMutex -.Arc::clone.-> R1
MmapMutex -.Arc::clone.-> R2
MmapMutex -.Arc::clone.-> RN
R1 --> EH1
R2 --> EH2
RN --> EHN
EH1 --> Slice1
EH2 --> Slice2
EHN --> SliceN
Slice1 -.references.-> Mmap
Slice2 -.references.-> Mmap
SliceN -.references.-> Mmap
Memory Sharing and Concurrency
The Arc<Mmap> design enables efficient memory sharing across threads while maintaining safety:
Diagram: Concurrent zero-copy reads via Arc sharing
Concurrency characteristics:
- No read locks required: Once an EntryHandle holds an Arc<Mmap>, it can access data without further synchronization
- Write safety: Writers acquire RwLock<File> to serialize writes, and Mutex<Arc<Mmap>> to safely remap after writes
- Memory efficiency: All readers share the same physical pages, regardless of thread count
- Graceful remapping: Old Arc<Mmap> instances remain valid until all references are dropped
This design allows SIMD R Drive to support thousands of concurrent readers with minimal overhead.
Sources: README.md:172-206 src/storage_engine/entry_iterator.rs:22-23
Datasets Larger Than RAM
Memory mapping enables efficient access to storage files that exceed available physical memory:
- OS-managed paging: The operating system handles paging, loading only accessed regions into physical memory
- Transparent access: Application code uses the same EntryHandle API regardless of file size
- Efficient random access: Jumping between distant file offsets does not require explicit seeking
- Memory pressure handling: Unused pages are evicted by the OS when memory is needed elsewhere
For example, a storage file containing 100 GB of data on a system with 16 GB of RAM can be accessed efficiently. Only the portions actively being read will occupy physical memory, with the OS managing page faults transparently.
This design makes SIMD R Drive suitable for:
- Large-scale data analytics workloads
- Multi-gigabyte append-only logs
- Embedded databases on resource-constrained systems
- Archive storage with infrequent random access patterns
Sources: README.md:43-49 README.md:147-148
Concurrency and Thread Safety
Relevant source files
- README.md
- benches/storage_benchmark.rs
- src/main.rs
- src/storage_engine/data_store.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
Purpose and Scope
This document describes the concurrency model and thread safety mechanisms used by the SIMD R Drive storage engine. It covers the synchronization primitives, locking strategies, and guarantees provided for concurrent access in single-process, multi-threaded environments. For information about the core storage architecture and data structures, see Storage Architecture. For details on the DataStore API methods, see DataStore API.
Sources: README.md:170-207 src/storage_engine/data_store.rs:1-33
Synchronization Architecture
The DataStore structure employs a combination of synchronization primitives to enable safe concurrent access while maintaining high performance for reads and writes.
graph TB
subgraph DataStore["DataStore Structure"]
File["file\nArc<RwLock<BufWriter<File>>>\nWrite synchronization"]
Mmap["mmap\nArc<Mutex<Arc<Mmap>>>\nMemory-map coordination"]
TailOffset["tail_offset\nAtomicU64\nSequential write ordering"]
KeyIndexer["key_indexer\nArc<RwLock<KeyIndexer>>\nConcurrent index access"]
Path["path\nPathBuf\nFile location"]
end
ReadOps["Read Operations"] --> KeyIndexer
ReadOps --> Mmap
WriteOps["Write Operations"] --> File
WriteOps --> TailOffset
WriteOps -.reindex.-> Mmap
WriteOps -.reindex.-> KeyIndexer
WriteOps -.reindex.-> TailOffset
ParallelIter["par_iter_entries"] --> KeyIndexer
ParallelIter --> Mmap
DataStore Synchronization Primitives
Synchronization Primitives by Field:
| Field | Type | Purpose | Concurrency Model |
|---|---|---|---|
| file | Arc<RwLock<BufWriter<File>>> | Write operations | Exclusive write lock required |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped view | Locked during remap operations |
| tail_offset | AtomicU64 | Next write position | Lock-free atomic updates |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash-to-offset index | Read-write lock for lookups/updates |
| path | PathBuf | Storage file path | Immutable after creation |
Sources: src/storage_engine/data_store.rs:27-33 README.md:172-182
Read Path Concurrency
Read operations in SIMD R Drive are designed to be lock-free and zero-copy, enabling high throughput for concurrent readers.
sequenceDiagram
participant Thread1 as "Reader Thread 1"
participant Thread2 as "Reader Thread 2"
participant KeyIndexer as "key_indexer\nRwLock<KeyIndexer>"
participant Mmap as "mmap\nMutex<Arc<Mmap>>"
participant File as "Memory-Mapped File"
Thread1->>KeyIndexer: read_lock()
Thread2->>KeyIndexer: read_lock()
Note over Thread1,Thread2: Multiple readers can acquire lock simultaneously
KeyIndexer-->>Thread1: lookup hash → (tag, offset)
KeyIndexer-->>Thread2: lookup hash → (tag, offset)
Thread1->>Mmap: get_mmap_arc()
Thread2->>Mmap: get_mmap_arc()
Note over Mmap: Clone Arc, release Mutex immediately
Mmap-->>Thread1: Arc<Mmap>
Mmap-->>Thread2: Arc<Mmap>
Thread1->>File: Direct memory access [offset..offset+len]
Thread2->>File: Direct memory access [offset..offset+len]
Note over Thread1,Thread2: Zero-copy reads, no locking required
Lock-Free Read Architecture
Read Method Implementation
The read_entry_with_context method (lines 501-565) centralizes the read logic:
- Acquire Read Lock: Obtains a read lock on key_indexer via the provided RwLockReadGuard
- Index Lookup: Retrieves the packed (tag, offset) value for the key hash
- Tag Verification: Validates the tag to detect hash collisions (lines 513-521)
- Memory Access: Reads metadata and payload directly from the memory-mapped region
- Tombstone Check: Filters out deleted entries (single NULL byte)
- Zero-Copy Handle: Returns an EntryHandle referencing the memory-mapped data
Key Performance Characteristics:
- No Write Lock: Reads never block on the file write lock
- Concurrent Readers: Multiple threads can read simultaneously via the RwLock read lock
- No Data Copy: The EntryHandle references the memory-mapped region directly
- Immediate Lock Release: The get_mmap_arc method (lines 658-663) clones the Arc<Mmap> and immediately releases the Mutex
Sources: src/storage_engine/data_store.rs:501-565 src/storage_engine/data_store.rs:658-663 README.md:174-175
Write Path Synchronization
Write operations require exclusive access to the storage file and coordinate with the memory-mapped view and index.
sequenceDiagram
participant Writer as "Writer Thread"
participant FileRwLock as "file\nRwLock"
participant BufWriter as "BufWriter<File>"
participant TailOffset as "tail_offset\nAtomicU64"
participant Reindex as "reindex()"
participant MmapMutex as "mmap\nMutex"
participant KeyIndexerRwLock as "key_indexer\nRwLock"
Writer->>FileRwLock: write_lock()
Note over FileRwLock: Exclusive lock acquired\nOnly one writer at a time
Writer->>TailOffset: load(Ordering::Acquire)
TailOffset-->>Writer: current tail offset
Writer->>BufWriter: write pre-pad + payload + metadata
Writer->>BufWriter: flush()
Writer->>Reindex: reindex(&file, key_hash_offsets, tail_offset)
Reindex->>BufWriter: init_mmap(&file)
Note over Reindex: Create new memory-mapped view
Reindex->>MmapMutex: lock()
Reindex->>MmapMutex: Replace Arc<Mmap>
Reindex->>MmapMutex: unlock()
Reindex->>KeyIndexerRwLock: write_lock()
Reindex->>KeyIndexerRwLock: insert(key_hash → offset)
Reindex->>KeyIndexerRwLock: unlock()
Reindex->>TailOffset: store(new_tail, Ordering::Release)
Reindex-->>Writer: Ok(())
Writer->>FileRwLock: unlock()
Write Operation Flow
Write Methods and Locking
Single Write (write):
- Delegates to batch_write_with_key_hashes with a single entry (lines 827-834)
Batch Write (batch_write_with_key_hashes):
- Acquires write lock on file (lines 852-855)
- Loads tail_offset atomically (line 858)
- Writes all entries to buffer (lines 863-922)
- Writes buffer to file in single operation (line 935)
- Flushes to disk (line 936)
- Calls reindex to update mmap and index (lines 938-943)
- Write lock automatically released on file_guard drop
Streaming Write (write_stream_with_key_hash):
- Acquires write lock on file (lines 759-762)
- Streams payload to file via buffered reads (lines 778-790)
- Flushes after streaming (line 814)
- Calls reindex to update mmap and index (lines 818-823)
Sources: src/storage_engine/data_store.rs:752-843 src/storage_engine/data_store.rs:847-943
Memory Mapping Coordination
The reindex method coordinates the critical transition from write completion to read visibility by updating the memory-mapped view.
Reindex Operation
The reindex method (lines 224-259) performs four critical operations:
Step-by-Step Process:
- Create New Memory Map (line 231):
  - Calls init_mmap(&write_guard) to create a fresh Mmap from the file
  - File must be flushed before this point to ensure visibility
- Update Shared Mmap (lines 232, 255):
  - Acquires the Mutex on self.mmap
  - Replaces the Arc<Mmap> with the new mapping
  - Existing readers retain old Arc<Mmap> references until they drop
- Update Index (lines 233-253):
  - Acquires write lock on self.key_indexer
  - Inserts new (key_hash → offset) mappings
  - Removes tombstone entries if present
- Update Tail Offset (line 256):
  - Stores the new tail offset with Ordering::Release
  - Ensures all prior writes are visible before the offset update
Critical Ordering: The write lock on file is held throughout the entire reindex operation, preventing concurrent writes from starting until the index and mmap are fully updated.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:172-174
Index Concurrency Model
The key_indexer uses RwLock<KeyIndexer> to enable concurrent read access while providing exclusive write access during updates.
Index Access Patterns
| Operation | Lock Type | Held During | Purpose |
|---|---|---|---|
| read | RwLock::read() | Index lookup only (lines 507-508) | Multiple threads can lookup concurrently |
| batch_read | RwLock::read() | All lookups in batch | Amortizes lock acquisition overhead |
| par_iter_entries | RwLock::read() | Collecting offsets only (lines 300-302) | Minimizes lock hold time, drops before parallel iteration |
| write / batch_write | RwLock::write() | Index updates in reindex (lines 233-236) | Exclusive access for inserting new entries |
| delete (tombstone) | RwLock::write() | Removing from index (line 243) | Exclusive access for deletion |
Parallel Iteration Strategy
The par_iter_entries method (lines 296-361, feature-gated by parallel) demonstrates efficient lock usage:
This approach:
- Minimizes the duration the read lock is held
- Avoids holding locks during expensive parallel processing
- Allows concurrent writes to proceed while parallel iteration is in progress
Sources: src/storage_engine/data_store.rs:296-361
Atomic Tail Offset
The tail_offset field uses AtomicU64 with acquire-release ordering to coordinate write sequencing without locks.
Atomic Operations Pattern
Load (Acquire):
- Used at start of write operations (lines 763, 858)
- Ensures all prior writes are visible before loading the offset
- Provides the base offset for the new write
Store (Release):
- Used at end of reindex after all updates (line 256)
- Ensures all writes (file, mmap, index) are visible before storing
- Prevents reordering of writes after the offset update
Sequential Consistency: Even though multiple threads may acquire the write lock sequentially, the atomic tail offset ensures that:
- Each write observes the correct starting position
- The tail offset update happens after all associated data is committed
- Readers see a consistent view of the storage
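A generic illustration of this acquire/release pattern on an AtomicU64 (standard-library only; not the crate's code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static TAIL_OFFSET: AtomicU64 = AtomicU64::new(0);

fn begin_write() -> u64 {
    // Acquire: everything written before the previous Release store is visible here.
    TAIL_OFFSET.load(Ordering::Acquire)
}

fn finish_write(new_tail: u64) {
    // Release: file, mmap, and index updates happen-before this store becomes visible.
    TAIL_OFFSET.store(new_tail, Ordering::Release);
}
```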
Sources: src/storage_engine/data_store.rs30 src/storage_engine/data_store.rs256 src/storage_engine/data_store.rs763 src/storage_engine/data_store.rs858
Multi-Process Considerations
SIMD R Drive's concurrency mechanisms are designed for single-process, multi-threaded environments. Multi-process access requires external coordination.
Thread Safety Matrix
| Environment | Reads | Writes | Index Updates | Storage Safety |
|---|---|---|---|---|
| Single Process, Single Thread | ✅ Safe | ✅ Safe | ✅ Safe | ✅ Safe |
| Single Process, Multi-Threaded | ✅ Safe (lock-free, zero-copy) | ✅ Safe (RwLock<File>) | ✅ Safe (RwLock<KeyIndexer>) | ✅ Safe (Mutex<Arc<Mmap>>) |
| Multiple Processes, Shared File | ⚠️ Unsafe (no cross-process coordination) | ❌ Unsafe (no external locking) | ❌ Unsafe (separate memory spaces) | ❌ Unsafe (race conditions) |
Why Multi-Process is Unsafe
Separate Memory Spaces:
- Each process has its own Arc<Mmap> instance
- Index changes in one process are not visible to others
- The tail_offset atomic is process-local
No Cross-Process Synchronization:
- RwLock and Mutex only synchronize within a single process
- File writes from multiple processes can interleave arbitrarily
- Memory mapping updates are not coordinated
Risk of Corruption:
- Two processes can both acquire file write locks independently
- Both may attempt to write at the same tail_offset
- The storage file's append-only chain can be broken
External Locking Requirements
For multi-process access, applications must implement external file locking:
- Advisory Locks: Use flock (Unix) or LockFile (Windows)
- Mandatory Locks: File system-level exclusive access
- Distributed Locks: For networked file systems
The core library intentionally does not provide cross-process locking to avoid platform-specific dependencies and maintain flexibility.
Sources: README.md:185-206 src/storage_engine/data_store.rs:27-33
Concurrency Testing
The test suite validates concurrent access patterns through multi-threaded integration tests.
Test Coverage
Concurrent Write Test (concurrent_write_test):
- Spawns 16 threads writing 10 entries each (lines 111-161)
- Uses Arc<DataStore> to share the storage instance
- Verifies all 160 entries are correctly written and retrievable
- Demonstrates write lock serialization
Interleaved Read-Write Test (interleaved_read_write_test):
- Two threads alternating reads and writes on same key (lines 163-229)
- Uses tokio::sync::Notify for coordination
- Validates that writes are immediately visible to readers
- Confirms index updates are atomic
Concurrent Streamed Write Test (concurrent_slow_streamed_write_test):
- Multiple threads streaming 1MB payloads concurrently (lines 14-109)
- Simulates slow readers with artificial delays
- Verifies write lock prevents interleaved streaming
- Validates payload integrity after concurrent writes
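A minimal multi-threaded sketch in the spirit of these tests, sharing the store through Arc (crate paths and the &self write receiver are assumptions):

```rust
use std::path::Path;
use std::sync::Arc;
use std::thread;
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};

fn main() -> std::io::Result<()> {
    let store = Arc::new(DataStore::open(Path::new("example.bin"))?);

    // Writers are serialized by the internal file lock; readers stay lock-free.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                let key = format!("key-{i}");
                store.write(key.as_bytes(), b"value").unwrap();
                assert!(store.read(key.as_bytes()).unwrap().is_some());
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    Ok(())
}
```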
Sources: tests/concurrency_tests.rs:14-229
graph TD
A["1. file: RwLock (Write)"] --> B["2. mmap: Mutex"]
B --> C["3. key_indexer: RwLock (Write)"]
C --> D["4. tail_offset: AtomicU64"]
style A fill:#f9f9f9
style B fill:#f9f9f9
style C fill:#f9f9f9
style D fill:#f9f9f9
Locking Order and Deadlock Prevention
To prevent deadlocks, SIMD R Drive follows a consistent lock acquisition order:
Lock Hierarchy
Acquisition Order:
- File Write Lock: Acquired first in all write operations
- Mmap Mutex: Locked during reindex to update memory mapping
- Key Indexer Write Lock: Locked during reindex to update index
- Tail Offset Atomic: Updated last with Release ordering
Lock Release: Locks are released in reverse order automatically through RAII (Rust's drop semantics).
Deadlock Prevention: This fixed order ensures no circular wait conditions. Read operations only acquire the key indexer read lock, which cannot deadlock with other readers.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-943
Performance Characteristics
Concurrent Read Performance
- Zero-Copy: No memory allocation or copying for payload access
- Lock-Free After Index Lookup: Once the Arc<Mmap> is cloned, no locks are held
- Scalable: Read throughput increases linearly with thread count
- Cache-Friendly: 64-byte alignment optimizes cache line access
Write Serialization
- Sequential Writes: All writes are serialized by the RwLock<File>
- Minimal Lock Hold Time: Lock held only during actual I/O and index update
- Batch Amortization: batch_write amortizes lock overhead across multiple entries
- No Write Starvation: Fair RwLock implementation prevents reader starvation of writers
Benchmark Results
From benches/storage_benchmark.rs typical performance on a single-threaded benchmark:
- Write Throughput: ~1M entries (8 bytes each) written per batch operation
- Sequential Read: Iterates all entries via zero-copy handles
- Random Read: ~1M random lookups complete in under 1 second
- Batch Read: Vectorized lookups provide additional speedup
For multi-threaded workloads, the concurrency tests demonstrate that read throughput scales with core count while write throughput is limited by the sequential append-only design.
Sources: benches/storage_benchmark.rs:1-234 tests/concurrency_tests.rs:1-230
Key Indexing and Hashing
Relevant source files
Purpose and Scope
This page documents the key indexing system and hashing mechanisms used in SIMD R Drive's storage engine. It covers the KeyIndexer data structure, the XXH3 hashing algorithm, tag-based collision detection, and hardware acceleration features.
For information about how the index is accessed in concurrent operations, see Concurrency and Thread Safety. For details on how metadata is stored alongside payloads, see Entry Structure and Metadata.
Overview
The SIMD R Drive storage engine maintains an in-memory index that maps key hashes to file offsets, enabling O(1) lookup performance for stored entries. This index is critical for avoiding full file scans when retrieving data.
The indexing system consists of three main components:
- KeyIndexer: A concurrent hash map that stores packed values containing both a collision-detection tag and a file offset
- XXH3_64 hashing: A fast, hardware-accelerated hashing algorithm that generates 64-bit hashes from arbitrary keys
- Tag-based verification: A secondary collision detection mechanism that validates lookups to prevent hash collision errors
Sources: src/storage_engine/data_store.rs:1-33 README.md:158-168
KeyIndexer Structure
The KeyIndexer is the core data structure managing the key-to-offset mapping. It is wrapped in thread-safe containers to support concurrent access.
graph TB
subgraph "DataStore"
KeyIndexerField["key_indexer\nArc<RwLock<KeyIndexer>>"]
end
subgraph "KeyIndexer Internals"
HashMap["HashMap<u64, u64>\nwith Xxh3BuildHasher"]
PackedValue["Packed u64 Value\n(16-bit tag / 48-bit offset)"]
end
subgraph "Operations"
Insert["insert(key_hash, offset)\n→ Result<()>"]
Lookup["get_packed(&key_hash)\n→ Option<u64>"]
Unpack["unpack(packed)\n→ (tag, offset)"]
TagFromKey["tag_from_key(key)\n→ u16"]
Build["build(mmap, tail_offset)\n→ KeyIndexer"]
end
KeyIndexerField --> HashMap
HashMap --> PackedValue
Insert -.-> HashMap
Lookup -.-> HashMap
Unpack -.-> PackedValue
TagFromKey -.-> PackedValue
Build --> KeyIndexerField
Packed Value Format
The KeyIndexer stores a compact 64-bit packed value for each hash. This value encodes two pieces of information:
| Bits | Field | Description |
|---|---|---|
| 63-48 | Tag (16-bit) | Collision detection tag derived from key bytes |
| 47-0 | Offset (48-bit) | Absolute file offset to entry metadata |
This packing allows the index to store both the offset and a verification tag without requiring additional memory allocations.
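A standalone illustration of the documented 16-bit tag / 48-bit offset packing (not the crate's KeyIndexer code):

```rust
const OFFSET_BITS: u32 = 48;
const OFFSET_MASK: u64 = (1 << OFFSET_BITS) - 1;

fn pack(tag: u16, offset: u64) -> u64 {
    // Tag occupies bits 63-48; offset occupies bits 47-0.
    ((tag as u64) << OFFSET_BITS) | (offset & OFFSET_MASK)
}

fn unpack(packed: u64) -> (u16, u64) {
    ((packed >> OFFSET_BITS) as u16, packed & OFFSET_MASK)
}

fn main() {
    let packed = pack(0xBEEF, 1_048_576);
    assert_eq!(unpack(packed), (0xBEEF, 1_048_576));
}
```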
Sources: src/storage_engine/data_store.rs31 src/storage_engine/data_store.rs:509-511
Hashing Algorithm: XXH3_64
SIMD R Drive uses the XXH3_64 hashing algorithm from the xxhash-rust crate. XXH3 is optimized for speed and provides hardware acceleration on supported platforms.
Hardware Acceleration
The XXH3_64 implementation automatically detects and utilizes CPU-specific SIMD instructions:
graph LR
KeyBytes["Key Bytes\n&[u8]"]
subgraph "compute_hash(key)"
XXH3["xxhash-rust\nXXH3_64"]
end
subgraph "Hardware Detection"
X86Check["x86_64 CPU?"]
ARMCheck["aarch64 CPU?"]
end
subgraph "SIMD Paths"
SSE2["SSE2 Instructions\n(baseline x86_64)"]
AVX2["AVX2 Instructions\n(if available)"]
NEON["NEON Instructions\n(aarch64 default)"]
end
KeyBytes --> XXH3
XXH3 --> X86Check
XXH3 --> ARMCheck
X86Check -->|Yes| SSE2
X86Check -->|Yes + feature| AVX2
ARMCheck -->|Yes| NEON
SSE2 --> Hash64["u64 Hash"]
AVX2 --> Hash64
NEON --> Hash64
| Platform | Baseline Instructions | Optional Extensions | Performance Boost |
|---|---|---|---|
| x86_64 | SSE2 (always enabled) | AVX2 | Significant |
| aarch64 | NEON (default) | None required | Built-in |
| Fallback | Scalar operations | None | Baseline |
Sources: README.md:160-165 Cargo.lock:1814-1820 src/storage_engine/data_store.rs:2-4
Hash Computation Functions
The digest module provides batch and single-key hashing functions:
| Function | Signature | Use Case |
|---|---|---|
| compute_hash | fn(key: &[u8]) -> u64 | Single key hashing |
| compute_hash_batch | fn(keys: &[&[u8]]) -> Vec<u64> | Parallel batch hashing |
| compute_checksum | fn(payload: &[u8]) -> [u8; 4] | CRC32C payload validation |
Sources: src/storage_engine/data_store.rs:2-4 src/storage_engine/data_store.rs754 src/storage_engine/data_store.rs:839-842
Tag-Based Collision Detection
While XXH3_64 produces high-quality 64-bit hashes, the system implements an additional layer of collision detection using 16-bit tags derived from the original key bytes.
sequenceDiagram
participant Client
participant DataStore
participant KeyIndexer
participant Mmap
Note over Client,Mmap: Write Operation
Client->>DataStore: write(key, payload)
DataStore->>DataStore: key_hash = compute_hash(key)
DataStore->>KeyIndexer: tag = tag_from_key(key)
DataStore->>Mmap: write payload + metadata
DataStore->>KeyIndexer: insert(key_hash, pack(tag, offset))
Note over Client,Mmap: Read Operation with Tag Verification
Client->>DataStore: read(key)
DataStore->>DataStore: key_hash = compute_hash(key)
DataStore->>KeyIndexer: packed = get_packed(key_hash)
KeyIndexer-->>DataStore: Some(packed_value)
DataStore->>DataStore: (stored_tag, offset) = unpack(packed)
DataStore->>DataStore: expected_tag = tag_from_key(key)
alt Tag Mismatch
DataStore->>DataStore: warn!("Tag mismatch")
DataStore-->>Client: None (collision detected)
else Tag Match
DataStore->>Mmap: read entry at offset
Mmap-->>DataStore: EntryHandle
DataStore-->>Client: Some(EntryHandle)
end
How Tags Work
Tag Verification Process
When reading an entry, the system performs the following verification:
- Compute the 64-bit hash of the requested key
- Look up the hash in the KeyIndexer to get the packed value
- Unpack the stored tag and offset from the packed value
- Compute the expected tag from the original key bytes
- Compare the stored tag with the expected tag
- If tags match, proceed with reading; if not, return None (collision detected)
This two-level verification (hash + tag) dramatically reduces the probability of false positives from hash collisions.
Sources: src/storage_engine/data_store.rs:502-522 src/storage_engine/data_store.rs:513-521
Collision Detection in Batch Operations
For batch read operations, the system can perform tag verification when the original keys are provided.
Sources: src/storage_engine/data_store.rs:1105-1158
Index Building and Maintenance
Index Construction on Open
When a DataStore is opened, the KeyIndexer is constructed by scanning the validated storage file forward from offset 0 to the tail offset:
graph TB
OpenFile["DataStore::open(path)"]
RecoverChain["recover_valid_chain()"]
BuildIndex["KeyIndexer::build(mmap, final_len)"]
subgraph "Index Build Process"
ScanForward["Scan from 0 to tail_offset"]
ReadMetadata["Read EntryMetadata"]
ExtractHash["Extract key_hash"]
ComputeTag["Compute tag from prev entries"]
PackValue["pack(tag, metadata_offset)"]
InsertMap["Insert into HashMap"]
end
OpenFile --> RecoverChain
RecoverChain --> BuildIndex
BuildIndex --> ScanForward
ScanForward --> ReadMetadata
ReadMetadata --> ExtractHash
ExtractHash --> ComputeTag
ComputeTag --> PackValue
PackValue --> InsertMap
InsertMap --> ScanForward
InsertMap --> Initialized["KeyIndexer Ready"]
The index is built only once during the open() operation. Subsequent writes update the index incrementally.
Sources: src/storage_engine/data_store.rs:84-116 src/storage_engine/data_store.rs:106-108
graph LR
WriteOp["Write Operation\n(single or batch)"]
FlushFile["Flush to disk"]
Reindex["reindex()"]
subgraph "Reindex Steps"
RemapFile["Create new mmap"]
LockIndex["Acquire RwLock write"]
UpdateEntries["Update key_hash → offset"]
HandleDeletes["Remove deleted keys"]
UpdateTail["Store tail_offset (Atomic)"]
end
WriteOp --> FlushFile
FlushFile --> Reindex
Reindex --> RemapFile
RemapFile --> LockIndex
LockIndex --> UpdateEntries
UpdateEntries --> HandleDeletes
HandleDeletes --> UpdateTail
Index Updates During Writes
After each write operation, the reindex() method updates the in-memory index with new key mappings, as shown in the diagram above.
Critical : The file must be flushed before remapping to ensure newly written data is visible in the new memory-mapped view.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:814-824
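A hedged sketch of the flush-then-remap rule, using memmap2 directly; the writer field, the sync call, and the overall shape are assumptions rather than the exact DataStore internals:
```rust
// Sketch of "flush before remap": buffered bytes must reach the file before a
// new memory map is created, otherwise the fresh data is not visible in the
// new view. Field and method usage here is illustrative only.

use std::fs::File;
use std::io::{BufWriter, Write};
use memmap2::{Mmap, MmapOptions};

fn flush_and_remap(writer: &mut BufWriter<File>) -> std::io::Result<Mmap> {
    writer.flush()?;              // push buffered bytes to the OS
    writer.get_ref().sync_all()?; // optionally force them to disk
    // SAFETY: the file must not be truncated while the map is alive.
    unsafe { MmapOptions::new().map(writer.get_ref()) }
}
```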
Concurrent Access and Locking
The KeyIndexer is protected by an Arc<RwLock<KeyIndexer>> wrapper, enabling multiple concurrent readers while ensuring exclusive access for writers.
| Operation | Lock Type | Concurrency |
|---|---|---|
read() | Read lock | Multiple threads can read in parallel |
batch_read() | Read lock | Multiple threads can read in parallel |
write() | Write lock | Single writer, blocks all readers |
batch_write() | Write lock | Single writer, blocks all readers |
delete() | Write lock | Single writer, blocks all readers |
Sources: src/storage_engine/data_store.rs31 src/storage_engine/data_store.rs:1042-1049 src/storage_engine/data_store.rs:852-856
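The locking pattern in the table above can be sketched with std::sync::RwLock around a plain HashMap standing in for KeyIndexer:
```rust
// Minimal illustration of the shared-read / exclusive-write pattern.
// A plain HashMap stands in for KeyIndexer.

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

fn main() {
    let index: Arc<RwLock<HashMap<u64, u64>>> = Arc::new(RwLock::new(HashMap::new()));

    // Writers take an exclusive lock, blocking all readers for the duration.
    index.write().unwrap().insert(0xDEAD_BEEF, 42);

    // Readers take shared locks and can run in parallel across threads.
    let reader = Arc::clone(&index);
    std::thread::spawn(move || {
        let guard = reader.read().unwrap();
        let _offset = guard.get(&0xDEAD_BEEF).copied();
    })
    .join()
    .unwrap();
}
```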
Performance Characteristics
Lookup Performance
The indexing system provides O(1) average-case lookup performance. Benchmarks demonstrate:
- 1 million random seeks (8-byte entries): typically < 1 second
- Hash computation overhead : negligible due to hardware acceleration
- Tag verification overhead : minimal (single comparison)
Sources: README.md:166-167
Memory Overhead
Each index entry consumes:
- 8 bytes for the key hash (map key)
- 8 bytes for the packed value (map value)
- Total : 16 bytes per unique key in memory
For a dataset with 1 million unique keys, the index occupies approximately 16 MB of RAM.
graph LR
subgraph "Scalar Baseline"
ScalarOps["Byte-by-byte\nprocessing"]
ScalarTime["~100% time"]
end
subgraph "SSE2 (x86_64)"
SSE2Ops["16-byte chunks\nparallel"]
SSE2Time["~40-60% time"]
end
subgraph "AVX2 (x86_64)"
AVX2Ops["32-byte chunks\nparallel"]
AVX2Time["~30-50% time"]
end
subgraph "NEON (aarch64)"
NEONOps["16-byte chunks\nparallel"]
NEONTime["~40-60% time"]
end
ScalarOps --> ScalarTime
SSE2Ops --> SSE2Time
AVX2Ops --> AVX2Time
NEONOps --> NEONTime
Hardware Acceleration Impact
The use of SIMD instructions in XXH3_64 significantly improves hash computation speed:
Performance gains vary by workload but are most significant for batch operations that process many keys.
Sources: README.md:160-165 src/storage_engine/data_store.rs:839-842
Error Handling and Collision Management
Hash Collision Detection
The dual-layer verification (64-bit hash + 16-bit tag) provides strong collision resistance:
- Probability of hash collision : ~1 in 2^64 for XXH3_64
- Probability of tag collision given hash collision : ~1 in 2^16
- Combined probability : ~1 in 2^80
Write-Time Collision Handling
If a collision is detected during a write operation (same hash but different tag), the batch write operation fails entirely:
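The sketch below illustrates that all-or-nothing check; the Index type and method names are illustrative stand-ins for KeyIndexer, not the actual API:
```rust
// Sketch of write-time collision handling: the batch is validated against the
// in-memory index first, and any hash-vs-tag mismatch aborts the whole batch
// before anything is appended.

use std::collections::HashMap;
use std::io;

struct Index(HashMap<u64, u64>); // key_hash -> packed (tag << 48 | offset)

impl Index {
    /// `hashed` holds (key_hash, expected_tag) pairs for the incoming batch.
    fn check_batch(&self, hashed: &[(u64, u16)]) -> io::Result<()> {
        for (key_hash, expected_tag) in hashed {
            if let Some(packed) = self.0.get(key_hash) {
                let stored_tag = (packed >> 48) as u16;
                if stored_tag != *expected_tag {
                    // Same 64-bit hash, different key bytes: refuse the batch.
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "hash collision with differing tag; batch write aborted",
                    ));
                }
            }
        }
        Ok(())
    }
}
```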
This prevents inconsistent index states and ensures data integrity.
Sources: src/storage_engine/data_store.rs:245-251
Read-Time Collision Handling
If a tag mismatch is detected during a read, the system logs a warning and returns None:
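A minimal sketch of that check follows; the tag derivation shown (a second seeded hash pass) is an illustrative stand-in, and the real KeyIndexer may derive tags differently:
```rust
// Read-time tag verification sketch. `tag_from_key` is a placeholder
// derivation, not the actual one used by KeyIndexer.

use std::collections::HashMap;
use xxhash_rust::xxh3::{xxh3_64, xxh3_64_with_seed};

fn tag_from_key(key: &[u8]) -> u16 {
    (xxh3_64_with_seed(key, 0xA5A5) >> 48) as u16
}

fn lookup(index: &HashMap<u64, u64>, key: &[u8]) -> Option<u64> {
    let packed = *index.get(&xxh3_64(key))?;
    let (stored_tag, offset) = ((packed >> 48) as u16, packed & ((1u64 << 48) - 1));
    if stored_tag != tag_from_key(key) {
        eprintln!("tag mismatch for key; treating as missing (possible collision)");
        return None; // collision detected
    }
    Some(offset)
}
```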
Sources: src/storage_engine/data_store.rs:513-521
Summary
The key indexing and hashing system in SIMD R Drive provides:
- Fast lookups : O(1) hash-based access to entries
- Hardware acceleration : Automatic SIMD optimization on SSE2, AVX2, and NEON platforms
- Collision resistance : Dual-layer verification with 64-bit hashes and 16-bit tags
- Thread safety : Concurrent reads with exclusive writes via RwLock
- Low memory overhead : 16 bytes per unique key
This design enables efficient storage operations even for datasets with millions of entries, while maintaining data integrity through robust collision detection.
Sources: src/storage_engine/data_store.rs:1-1183 README.md:158-168
Network Layer and RPC
Relevant source files
- Cargo.lock
- experiments/simd-r-drive-muxio-service-definition/Cargo.toml
- experiments/simd-r-drive-ws-client/Cargo.toml
- experiments/simd-r-drive-ws-server/Cargo.toml
Purpose and Scope
The network layer provides remote access to the SIMD R Drive storage engine over WebSocket connections using the Muxio RPC framework. This document covers the core RPC infrastructure, service definitions, and native Rust client implementation.
For WebSocket server deployment and configuration details, see WebSocket Server. For Muxio RPC protocol internals, see Muxio RPC Framework. For native Rust client usage, see Native Rust Client. For Python client integration, see Python Integration.
Architecture Overview
The network layer follows a symmetric client-server architecture built on three core crates that work together to enable remote DataStore access.
graph TB
subgraph "Client Applications"
RustApp["Rust Application"]
PyApp["Python Application\n(see section 4)"]
end
subgraph "simd-r-drive-ws-client"
WSClient["DataStoreWsClient"]
ClientCaller["RPC Caller\nmuxio-rpc-service-caller"]
ClientRuntime["muxio-tokio-rpc-client"]
end
subgraph "Network Transport"
WS["WebSocket\ntokio-tungstenite\nBinary Frames"]
end
subgraph "simd-r-drive-ws-server"
WSServer["Axum Server\nHTTP/WS Upgrade"]
ServerRuntime["muxio-tokio-rpc-server"]
ServerEndpoint["RPC Endpoint\nmuxio-rpc-service-endpoint"]
end
subgraph "simd-r-drive-muxio-service-definition"
ServiceDef["DataStoreService\nRPC Interface"]
Bitcode["bitcode serialization"]
end
subgraph "Core Storage"
DataStore["DataStore"]
end
RustApp --> WSClient
PyApp -.-> WSClient
WSClient --> ClientCaller
ClientCaller --> ClientRuntime
ClientRuntime --> ServiceDef
ClientRuntime --> WS
WS --> WSServer
WSServer --> ServerRuntime
ServerRuntime --> ServiceDef
ServerRuntime --> ServerEndpoint
ServerEndpoint --> DataStore
ServiceDef --> Bitcode
Component Architecture
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23 experiments/simd-r-drive-ws-client/Cargo.toml:1-22 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17
| Crate | Role | Key Dependencies |
|---|---|---|
simd-r-drive-muxio-service-definition | Shared RPC contract defining service interface | bitcode, muxio-rpc-service |
simd-r-drive-ws-server | WebSocket server hosting DataStore | muxio-tokio-rpc-server, axum, tokio |
simd-r-drive-ws-client | Native Rust client for remote access | muxio-tokio-rpc-client, muxio-rpc-service-caller |
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-22 experiments/simd-r-drive-ws-client/Cargo.toml:14-21 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:13-16
Service Definition and Contract
The simd-r-drive-muxio-service-definition crate defines the RPC interface as a shared contract between clients and servers. This crate must be used by both sides to ensure protocol compatibility.
DataStoreService Interface
The service definition declares available RPC methods that map to DataStore operations. The Muxio framework generates type-safe client and server stubs from this definition.
Typical Service Methods:
- read(key) - Retrieve payload by key
- write(key, payload) - Store payload
- delete(key) - Remove entry
- exists(key) - Check key existence
- batch_read(keys) - Bulk retrieval
- batch_write(entries) - Bulk insertion
- stream_* - Streaming operations
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17
Bitcode Serialization
All RPC messages use bitcode for binary serialization, providing compact representation and fast encoding/decoding.
Advantages:
- Zero-copy deserialization where possible
- Smaller message sizes than JSON or bincode
- Strong type safety via Rust's type system
- Efficient handling of large payloads
Sources: Cargo.lock:392-413 experiments/simd-r-drive-muxio-service-definition/Cargo.toml14
Communication Protocol
Request-Response Flow
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:14-21 experiments/simd-r-drive-ws-server/Cargo.toml:13-22
WebSocket Transport Layer
The network layer uses tokio-tungstenite for WebSocket communication over a Tokio async runtime.
Connection Properties:
- Binary WebSocket frames (not text)
- Persistent bidirectional connection
- Automatic ping/pong for keepalive
- Multiplexed requests via Muxio protocol
Protocol Stack:
┌─────────────────────────────────┐
│ DataStore Operations │
├─────────────────────────────────┤
│ Muxio RPC (Service Methods) │
├─────────────────────────────────┤
│ Bitcode Serialization │
├─────────────────────────────────┤
│ WebSocket Binary Frames │
├─────────────────────────────────┤
│ HTTP/1.1 Upgrade │
├─────────────────────────────────┤
│ TCP (tokio runtime) │
└─────────────────────────────────┘
Sources: Cargo.lock:305-339 (axum), Cargo.lock:1302-1317 (muxio-tokio-rpc-client), Cargo.lock:1320-1335 (muxio-tokio-rpc-server)
Muxio RPC Framework
The Muxio framework provides the core RPC abstraction layer, handling request routing, response matching, and connection multiplexing.
graph LR
subgraph "Client Side"
SC["muxio-rpc-service-caller\nMethod Invocation"]
TC["muxio-tokio-rpc-client\nTransport Handler"]
end
subgraph "Shared"
MRS["muxio-rpc-service\nCore Abstractions"]
SD["Service Definition"]
end
subgraph "Server Side"
TS["muxio-tokio-rpc-server\nTransport Handler"]
SE["muxio-rpc-service-endpoint\nMethod Routing"]
end
SC --> MRS
SC --> TC
TC --> SD
TS --> MRS
TS --> SE
SE --> SD
MRS --> SD
Framework Components
Sources: Cargo.lock:1250-1271 (muxio), Cargo.lock:1261-1271 (muxio-rpc-service), Cargo.lock:1274-1284 (muxio-rpc-service-caller), Cargo.lock:1287-1299 (muxio-rpc-service-endpoint)
| Component | Purpose | Key Features |
|---|---|---|
muxio | Core multiplexing protocol | Request/response correlation, concurrent requests |
muxio-rpc-service | RPC service abstractions | Service trait definitions, async method dispatch |
muxio-rpc-service-caller | Client-side invocation | Type-safe method calls, future-based API |
muxio-rpc-service-endpoint | Server-side routing | Request demultiplexing, method dispatch |
muxio-tokio-rpc-client | Tokio client runtime | WebSocket connection management |
muxio-tokio-rpc-server | Tokio server runtime | Axum integration, connection handling |
Sources: Cargo.lock:1250-1335
Request Multiplexing
The Muxio protocol assigns unique request IDs to enable multiple concurrent RPC calls over a single WebSocket connection.
Multiplexing Benefits:
- Single persistent connection reduces overhead
- Concurrent requests improve throughput
- Out-of-order responses supported
- Connection pooling not required
Sources: Cargo.lock:1250-1258
Client-Server Integration
Server-Side Implementation
The server crate (simd-r-drive-ws-server) integrates Axum for HTTP/WebSocket handling with the Muxio RPC server runtime.
Server Architecture:
- Axum HTTP server accepts connections
- WebSocket upgrade on the /ws endpoint
- muxio-tokio-rpc-server handles the RPC protocol
- muxio-rpc-service-endpoint routes requests to DataStore methods
- Responses serialized with bitcode and returned
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-22
Client-Side Implementation
The client crate (simd-r-drive-ws-client) provides DataStoreWsClient for remote DataStore access.
Client Architecture:
- Connect to the WebSocket endpoint
- muxio-tokio-rpc-client manages the connection
- muxio-rpc-service-caller provides the method invocation API
- Requests serialized with bitcode
- Async methods return Future<Result<T>>
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:13-21
Dependency Graph
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-22 experiments/simd-r-drive-ws-client/Cargo.toml:13-21 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:13-16
Performance Characteristics
Serialization Overhead
Bitcode provides efficient binary serialization with minimal overhead:
- Small message payloads for keys and metadata
- Zero-copy where possible for large payloads
- Faster than JSON or traditional formats
Sources: Cargo.lock:392-413
Network Considerations
The WebSocket-based architecture has specific performance characteristics:
| Aspect | Impact |
|---|---|
| Connection overhead | One-time WebSocket handshake |
| Request latency | Network RTT + serialization |
| Throughput | Limited by network bandwidth |
| Concurrent requests | Multiplexed over single connection |
| Keepalive | Automatic ping/pong frames |
Trade-offs:
- Remote access enables distributed deployments
- Network latency higher than direct DataStore access
- Serialization/deserialization adds CPU overhead
- Suitable for applications with network-tolerant latency requirements
Sources: Cargo.lock:305-339 (axum), Cargo.lock:1302-1335 (muxio tokio rpc)
Async Runtime Integration
All network components run on the Tokio async runtime, providing non-blocking I/O and efficient concurrency.
Tokio Integration:
- WebSocket connections handled asynchronously
- Multiple clients served concurrently
- Non-blocking DataStore operations
- Future-based API throughout
Sources: experiments/simd-r-drive-ws-server/Cargo.toml18 experiments/simd-r-drive-ws-client/Cargo.toml19
WebSocket Server
Relevant source files
- Cargo.lock
- experiments/simd-r-drive-muxio-service-definition/Cargo.toml
- experiments/simd-r-drive-ws-client/Cargo.toml
- experiments/simd-r-drive-ws-server/Cargo.toml
Purpose and Scope
This document covers the simd-r-drive-ws-server WebSocket server implementation, which provides remote access to the SIMD R Drive storage engine over WebSocket connections using the Muxio RPC framework. The server accepts connections from both native Rust clients (see Native Rust Client) and Python applications (see Python WebSocket Client API), routing RPC requests to the underlying DataStore.
For information about the RPC protocol and serialization format, see Muxio RPC Framework. For details on the core storage operations being exposed, see DataStore API.
Architecture Overview
The WebSocket server is built on the Axum web framework and Tokio async runtime, providing a high-performance, concurrent RPC endpoint for storage operations. The server acts as a thin network wrapper around the DataStore, translating WebSocket messages into storage API calls.
graph TB
CLI["Server CLI\n(clap parser)"]
Axum["Axum Web Framework\nHTTP/WebSocket handling"]
MuxioServer["muxio-tokio-rpc-server\nRPC server runtime"]
Endpoint["muxio-rpc-service-endpoint\nRequest routing"]
ServiceDef["simd-r-drive-muxio-service-definition\nRPC contract"]
DataStore["simd-r-drive::DataStore\nStorage engine"]
Tokio["Tokio Runtime\nAsync executor"]
Tungstenite["tokio-tungstenite\nWebSocket protocol"]
CLI --> Axum
Axum --> MuxioServer
MuxioServer --> Endpoint
MuxioServer --> Tungstenite
Endpoint --> ServiceDef
Endpoint --> DataStore
Axum -.-> Tokio
MuxioServer -.-> Tokio
Tungstenite -.-> Tokio
Component Stack
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23 Cargo.lock:305-339 Cargo.lock:1320-1335
Crate Structure
The server is located in the experiments workspace and has a minimal dependency footprint focused on networking and RPC:
| Dependency | Purpose | Version Source |
|---|---|---|
simd-r-drive | Core storage engine | workspace |
simd-r-drive-muxio-service-definition | RPC service contract | workspace |
muxio-tokio-rpc-server | RPC server runtime | workspace (0.9.0-alpha) |
muxio-rpc-service | RPC abstractions | workspace (0.9.0-alpha) |
axum | Web framework | 0.8.4 |
tokio | Async runtime | workspace |
clap | CLI argument parsing | workspace |
tracing / tracing-subscriber | Structured logging | workspace |
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-22 Cargo.lock:305-339 Cargo.lock:1320-1335
Server Initialization Flow
The server follows a standard initialization pattern: parse CLI arguments, configure logging, initialize the DataStore, and start the WebSocket server.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-22
sequenceDiagram
participant Main as "main()"
participant CLI as "clap::Parser"
participant Tracing as "tracing_subscriber"
participant DS as "DataStore::open()"
participant RPC as "muxio_tokio_rpc_server"
participant Axum as "axum::serve()"
Main->>CLI: Parse CLI arguments
CLI-->>Main: ServerArgs{port, path, log_level}
Main->>Tracing: init() with env_filter
Note over Tracing: Configure RUST_LOG levels
Main->>DS: open(data_file_path)
DS-->>Main: DataStore instance
Main->>RPC: Create endpoint with DataStore
RPC-->>Main: ServiceEndpoint
Main->>Axum: bind() and serve()
Note over Axum: Listen on 0.0.0.0:port
Axum->>Axum: Accept WebSocket connections
Request Processing Pipeline
When a client connects and sends RPC requests, the server processes them through multiple layers before reaching the storage engine.
Sources: Cargo.lock:305-339 Cargo.lock:1287-1299 Cargo.lock:1320-1335
graph LR
Client["WebSocket Client"]
WS["tokio-tungstenite\nWebSocket frame"]
Axum["Axum Router\nRoute: /ws"]
Server["muxio-tokio-rpc-server\nMessage decode"]
Router["muxio-rpc-service-endpoint\nMethod dispatch"]
Handler["Service Handler\n(read/write/delete)"]
DS["DataStore\nStorage operation"]
Client -->|Binary frame| WS
WS --> Axum
Axum --> Server
Server -->|Bitcode decode| Router
Router --> Handler
Handler --> DS
DS -.->|Result| Handler
Handler -.-> Router
Router -.->|Bitcode encode| Server
Server -.-> WS
WS -.->|Binary frame| Client
RPC Service Definition Integration
The server uses the simd-r-drive-muxio-service-definition crate to define the RPC interface contract. This crate acts as a shared dependency between the server and all clients, ensuring type-safe communication.
graph TB
subgraph "simd-r-drive-muxio-service-definition"
Service["Service Trait\nRPC method definitions"]
Types["Request/Response Types\nBitcode serializable"]
Bitcode["bitcode crate\nBinary serialization"]
end
subgraph "Server Side"
Endpoint["muxio-rpc-service-endpoint\nImplements Service Trait"]
Handler["Request Handlers\nCall DataStore methods"]
end
subgraph "Client Side"
Caller["muxio-rpc-service-caller\nInvokes Service methods"]
ClientImpl["ws-client implementation"]
end
Service --> Endpoint
Service --> Caller
Types --> Bitcode
Endpoint --> Handler
Caller --> ClientImpl
Service Definition Structure
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17 experiments/simd-r-drive-ws-server/Cargo.toml:14-17
The service definition uses the bitcode crate for efficient binary serialization, providing compact message sizes and high throughput compared to JSON-based protocols.
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14-15 Cargo.lock:392-402
graph TD
Server["simd-r-drive-ws-server"]
Server --> Axum["axum 0.8.4\nWeb framework"]
Server --> MuxioServer["muxio-tokio-rpc-server\nRPC runtime"]
Server --> ServiceDef["simd-r-drive-muxio-service-definition\nRPC contract"]
Server --> DataStore["simd-r-drive\nStorage engine"]
Server --> Tokio["tokio\nAsync runtime"]
Server --> Clap["clap\nCLI parsing"]
Server --> Tracing["tracing + tracing-subscriber\nLogging"]
Axum --> Hyper["hyper\nHTTP implementation"]
Axum --> TokioTung["tokio-tungstenite\nWebSocket protocol"]
MuxioServer --> Muxio["muxio\nCore RPC abstractions"]
MuxioServer --> Endpoint["muxio-rpc-service-endpoint\nRouting"]
ServiceDef --> Bitcode["bitcode\nSerialization"]
ServiceDef --> MuxioService["muxio-rpc-service\nService traits"]
Hyper --> Tokio
TokioTung --> Tokio
Endpoint --> Tokio
Dependency Graph
The server's dependency structure shows clear separation between web framework, RPC layer, and storage:
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-22 Cargo.lock:305-339 Cargo.lock:1320-1335
Configuration and CLI
The server is configured via command-line arguments using the clap crate's derive API. The server binary accepts the following parameters:
| Argument | Type | Default | Description |
|---|---|---|---|
--port | u16 | 8080 | Port number to bind |
--path | String | Required | Path to DataStore file |
--log-level | String | "info" | Tracing log level (error/warn/info/debug/trace) |
The server supports the RUST_LOG environment variable through tracing-subscriber with env-filter for fine-grained logging control.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:19-22 experiments/simd-r-drive-ws-server/Cargo.toml21
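A hedged sketch of a clap derive struct matching this table is shown below; the actual field names, defaults, and help text in the server binary may differ:
```rust
// Illustrative clap (v4, derive API) argument struct mirroring the CLI table
// above. This is a sketch, not the server's actual ServerArgs definition.

use clap::Parser;

#[derive(Parser, Debug)]
#[command(name = "simd-r-drive-ws-server")]
struct ServerArgs {
    /// Port number to bind.
    #[arg(long, default_value_t = 8080)]
    port: u16,

    /// Path to the DataStore file.
    #[arg(long)]
    path: String,

    /// Tracing log level (error/warn/info/debug/trace).
    #[arg(long, default_value = "info")]
    log_level: String,
}

fn main() {
    let args = ServerArgs::parse();
    println!("binding 0.0.0.0:{} for {}", args.port, args.path);
}
```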
graph TB
subgraph "Tokio Runtime"
Scheduler["Work-Stealing Scheduler\nThread pool"]
end
subgraph "Connection Handlers"
Conn1["WebSocket Task 1\nRPC message loop"]
Conn2["WebSocket Task 2\nRPC message loop"]
ConnN["WebSocket Task N\nRPC message loop"]
end
subgraph "Shared State"
DS["Arc<DataStore>\nThread-safe storage"]
Endpoint["Arc<ServiceEndpoint>\nRequest router"]
end
Scheduler --> Conn1
Scheduler --> Conn2
Scheduler --> ConnN
Conn1 --> Endpoint
Conn2 --> Endpoint
ConnN --> Endpoint
Endpoint --> DS
style DS fill:#f9f9f9
style Endpoint fill:#f9f9f9
Concurrency Model
The server leverages Tokio's work-stealing scheduler to handle multiple concurrent WebSocket connections efficiently:
Each WebSocket connection runs as an independent async task, with the DataStore wrapped in Arc for shared access. The DataStore's internal locking (see Concurrency and Thread Safety) ensures safe concurrent reads and serialized writes.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml18 Cargo.lock:305-339
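A minimal sketch of that sharing pattern follows; the DataStore::open signature and error type are assumptions, and the per-connection work is elided:
```rust
// Sketch of sharing one DataStore across connection tasks via Arc, as
// described above. The open() signature and error type are assumed.

use std::path::Path;
use std::sync::Arc;
use simd_r_drive::DataStore;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = Arc::new(DataStore::open(Path::new("data.bin"))?);

    // Each accepted WebSocket connection would get its own task holding a clone.
    for _conn in 0..4 {
        let store = Arc::clone(&store);
        tokio::spawn(async move {
            // ... route decoded RPC requests to `store` here ...
            let _ = &store;
        });
    }
    Ok(())
}
```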
Transport Protocol
The server uses binary WebSocket frames over TCP, with the following characteristics:
| Property | Value | Notes |
|---|---|---|
| Protocol | WebSocket (RFC 6455) | Via tokio-tungstenite |
| Message Format | Binary frames | Not text frames |
| Serialization | Bitcode | Compact binary format |
| Framing | Message-per-frame | Each RPC call is one frame |
| Multiplexing | Muxio protocol | Request/response correlation |
The binary format and bitcode serialization provide significantly better performance than text-based protocols like JSON over WebSocket.
Sources: Cargo.lock:305-339 experiments/simd-r-drive-muxio-service-definition/Cargo.toml14
Error Handling
The server implements error handling at multiple layers:
- WebSocket Layer : Connection errors, protocol violations handled by tokio-tungstenite
- RPC Layer : Serialization errors, invalid method calls handled by muxio-tokio-rpc-server
- Storage Layer : I/O errors, validation failures propagated from DataStore
- Tracing : All errors logged with structured context via tracing spans
Errors are serialized back to clients as RPC error responses, preserving error context across the network boundary.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:19-20 Cargo.lock:1320-1335
Performance Characteristics
The server is designed for high-throughput operation with minimal overhead:
- Zero-Copy Reads : DataStore's EntryHandle (see Memory Management and Zero-Copy Access) allows serving read responses without copying payload data
- Async I/O : Tokio's epoll/kqueue-based I/O enables efficient handling of thousands of concurrent connections
- Binary Protocol : Bitcode serialization reduces CPU overhead and bandwidth usage compared to text formats
- Write Batching : Clients can use batch operations (see Write and Read Modes) to amortize RPC overhead
Sources: experiments/simd-r-drive-ws-server/Cargo.toml14 experiments/simd-r-drive-muxio-service-definition/Cargo.toml14
Deployment Considerations
When deploying the WebSocket server, consider:
- Single DataStore Instance : The server opens one DataStore file. Multiple servers require separate files or external coordination
- Port Binding : Default bind address is 0.0.0.0 (all interfaces). Use firewall rules or a reverse proxy for access control
- No TLS : The server does not implement TLS. Use a reverse proxy (nginx, HAProxy) for encrypted connections
- Resource Limits : Memory usage scales with DataStore size (memory-mapped file) plus per-connection buffers
- Graceful Shutdown : Tokio runtime handles SIGTERM/SIGINT for clean connection closure
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23
Experimental Status
As indicated by its location in the experiments/ workspace directory, this server is currently experimental and subject to breaking changes. The API surface and configuration options may evolve as the Muxio RPC framework stabilizes.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml2 experiments/simd-r-drive-ws-server/Cargo.toml11
Muxio RPC Framework
Relevant source files
- Cargo.lock
- experiments/simd-r-drive-muxio-service-definition/Cargo.toml
- experiments/simd-r-drive-ws-client/Cargo.toml
- experiments/simd-r-drive-ws-server/Cargo.toml
Purpose and Scope
This document describes the Muxio RPC (Remote Procedure Call) framework as implemented in SIMD R Drive for remote storage access over WebSocket connections. The framework provides a type-safe, multiplexed communication protocol using bitcode serialization for efficient binary data transfer.
For information about the WebSocket server implementation, see WebSocket Server. For the native Rust client implementation, see Native Rust Client. For Python client integration, see Python WebSocket Client API.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23 experiments/simd-r-drive-ws-client/Cargo.toml:1-22 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17
Architecture Overview
The Muxio RPC framework consists of multiple layers that work together to provide remote procedure calls over WebSocket connections:
Muxio RPC Framework Layer Architecture
graph TB
subgraph "Client Application Layer"
App["Application Code"]
end
subgraph "Client RPC Stack"
Caller["muxio-rpc-service-caller\nMethod Invocation"]
ClientRuntime["muxio-tokio-rpc-client\nWebSocket Client Runtime"]
end
subgraph "Shared Contract"
ServiceDef["simd-r-drive-muxio-service-definition\nService Interface Contract\nMethod Signatures"]
Bitcode["bitcode\nBinary Serialization"]
end
subgraph "Server RPC Stack"
ServerRuntime["muxio-tokio-rpc-server\nWebSocket Server Runtime"]
Endpoint["muxio-rpc-service-endpoint\nRequest Router"]
end
subgraph "Server Application Layer"
Impl["DataStore Implementation"]
end
subgraph "Core Framework"
Core["muxio-rpc-service\nBase RPC Traits & Types"]
end
App --> Caller
Caller --> ClientRuntime
ClientRuntime --> ServiceDef
ClientRuntime --> Bitcode
ClientRuntime --> Core
ServiceDef --> Bitcode
ServiceDef --> Core
ServerRuntime --> ServiceDef
ServerRuntime --> Bitcode
ServerRuntime --> Core
ServerRuntime --> Endpoint
Endpoint --> Impl
style ServiceDef fill:#f9f9f9,stroke:#333,stroke-width:2px
The framework is organized into distinct layers:
| Layer | Crates | Responsibility |
|---|---|---|
| Core Framework | muxio-rpc-service | Base traits, types, and RPC protocol definitions |
| Service Definition | simd-r-drive-muxio-service-definition | Shared interface contract between client and server |
| Serialization | bitcode | Efficient binary encoding/decoding of messages |
| Client Runtime | muxio-tokio-rpc-client, muxio-rpc-service-caller | WebSocket client, method invocation, request management |
| Server Runtime | muxio-tokio-rpc-server, muxio-rpc-service-endpoint | WebSocket server, request routing, response handling |
Sources: Cargo.lock:1250-1336 experiments/simd-r-drive-ws-server/Cargo.toml:14-17 experiments/simd-r-drive-ws-client/Cargo.toml:14-21
Core Framework Components
muxio-rpc-service
The muxio-rpc-service crate provides the foundational abstractions for the RPC system:
Core RPC Framework Types
graph LR
subgraph "muxio-rpc-service Core Types"
RpcService["RpcService Trait\nService Interface"]
Request["Request Type\nmethod_id + payload"]
Response["Response Type\nresult or error"]
Error["RpcError\nError Handling"]
end
RpcService -->|defines| Request
RpcService -->|defines| Response
Response -->|contains| Error
Key components in muxio-rpc-service:
| Component | Type | Purpose |
|---|---|---|
| RpcService | Trait | Defines the service interface with method dispatch |
| Request | Struct | Contains method ID and serialized payload |
| Response | Enum | Success with payload or error variant |
| method_id | Hash | XXH3 hash of method signature for routing |
Sources: Cargo.lock:1261-1272
Service Definition Layer
simd-r-drive-muxio-service-definition
The service definition crate serves as the shared contract between clients and servers. It defines the exact interface of the DataStore RPC service:
DataStore RPC Service Methods
graph TB
subgraph "Service Definition Structure"
ServiceDef["DataStoreService\nService Definition"]
Methods["Method Definitions"]
Types["Request/Response Types"]
Write["write\n(key, payload) → offset"]
Read["read\n(key) → Option<bytes>"]
Delete["delete\n(key) → bool"]
Exists["exists\n(key) → bool"]
BatchWrite["batch_write\n(entries) → Vec<offset>"]
BatchRead["batch_read\n(keys) → Vec<Option<bytes>>"]
end
ServiceDef --> Methods
ServiceDef --> Types
Methods --> Write
Methods --> Read
Methods --> Delete
Methods --> Exists
Methods --> BatchWrite
Methods --> BatchRead
Types -->|uses| Bitcode["bitcode Serialization"]
The service definition establishes a type-safe contract for all operations:
| Method | Request Type | Response Type | Description |
|---|---|---|---|
write | (Vec<u8>, Vec<u8>) | Result<u64, Error> | Write key-value pair, return offset |
read | Vec<u8> | Result<Option<Vec<u8>>, Error> | Read value by key |
delete | Vec<u8> | Result<bool, Error> | Delete entry, return success |
exists | Vec<u8> | Result<bool, Error> | Check key existence |
batch_write | Vec<(Vec<u8>, Vec<u8>)> | Result<Vec<u64>, Error> | Write multiple entries |
batch_read | Vec<Vec<u8>> | Result<Vec<Option<Vec<u8>>>, Error> | Read multiple entries |
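An illustrative request/response pair for the read method, using bitcode's derive-based API (the derive feature is assumed), is shown below; the real types live in the service definition crate and may be shaped differently:
```rust
// Illustrative DataStore RPC types round-tripped through bitcode. These type
// names are assumptions, not the actual service definition.

use bitcode::{Decode, Encode};

#[derive(Encode, Decode, Debug, PartialEq)]
struct ReadRequest {
    key: Vec<u8>,
}

#[derive(Encode, Decode, Debug, PartialEq)]
struct ReadResponse {
    payload: Option<Vec<u8>>,
}

fn main() {
    let req = ReadRequest { key: b"user:42".to_vec() };
    let bytes = bitcode::encode(&req);                    // compact binary body
    let round_trip: ReadRequest = bitcode::decode(&bytes).unwrap();
    assert_eq!(req, round_trip);
}
```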
flowchart LR
Signature["Method Signature\n'write(key,payload)'"]
Hash["XXH3 Hash"]
MethodID["method_id: u64"]
Signature -->|hash| Hash
Hash --> MethodID
MethodID -->|used for routing| Router["Request Router"]
Method ID Generation
Each method is identified by an XXH3 hash of its signature, enabling fast routing without string comparisons:
Method Identification Flow
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17 Cargo.lock:1261-1272
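A minimal sketch of hash-based method identification follows; whether the hashed string is the bare method name or a fuller signature is an assumption here:
```rust
// method_id sketch: XXH3-64 of a method signature string, so the router can
// dispatch on a u64 instead of comparing strings. The exact string that is
// hashed is an assumption.

use xxhash_rust::xxh3::xxh3_64;

fn method_id(signature: &str) -> u64 {
    xxh3_64(signature.as_bytes())
}

fn main() {
    let write_id = method_id("write(key,payload)");
    let read_id = method_id("read(key)");
    assert_ne!(write_id, read_id);
}
```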
graph LR
subgraph "Bitcode Serialization Pipeline"
RustType["Rust Type\n(Request/Response)"]
Encode["bitcode::encode"]
Binary["Binary Payload\nCompact Format"]
Decode["bitcode::decode"]
RustType2["Rust Type\n(Reconstructed)"]
end
RustType -->|serialize| Encode
Encode --> Binary
Binary -->|deserialize| Decode
Decode --> RustType2
Bitcode Serialization
The framework uses the bitcode crate for efficient binary serialization with the following characteristics:
Serialization Features
Bitcode Encoding/Decoding Pipeline
| Feature | Description | Benefit |
|---|---|---|
| Zero-copy deserialization | References data without copying when possible | Minimal overhead for large payloads |
| Compact encoding | Space-efficient binary format | Reduced network bandwidth |
| Type safety | Compile-time type checking | Prevents serialization errors |
| Performance | Faster than JSON/MessagePack | Lower CPU overhead |
Integration with RPC
The serialization is integrated at multiple points:
- Request serialization : Client encodes method arguments to binary
- Wire transfer : Binary payload sent over WebSocket
- Response serialization : Server encodes return values to binary
- Deserialization : Both sides decode binary to Rust types
Sources: Cargo.lock:392-414 experiments/simd-r-drive-muxio-service-definition/Cargo.toml14
Client-Side Components
muxio-rpc-service-caller
The caller component provides the client-side method invocation interface:
Client Method Invocation Flow
Key responsibilities:
- Method call marshalling
- Request ID generation
- Response awaiting and matching
- Error propagation
Sources: Cargo.lock:1274-1285 experiments/simd-r-drive-ws-client/Cargo.toml18
muxio-tokio-rpc-client
The client runtime handles WebSocket communication and multiplexing:
Client Runtime Request/Response Multiplexing
The client runtime provides:
| Feature | Implementation | Purpose |
|---|---|---|
| Connection management | Single WebSocket connection | Persistent connection |
| Request multiplexing | Multiple concurrent requests | High throughput |
| Response routing | Map of pending requests | Match responses to callers |
| Async/await support | Tokio-based futures | Non-blocking I/O |
Sources: Cargo.lock:1302-1318 experiments/simd-r-drive-ws-client/Cargo.toml16
flowchart TB
subgraph "Server Request Processing"
Receive["Receive Request\nWebSocket"]
Deserialize["Deserialize\nbitcode::decode"]
ExtractID["Extract method_id"]
Route["Route to Handler"]
Execute["Execute Method"]
Serialize["Serialize Response\nbitcode::encode"]
Send["Send Response\nWebSocket"]
end
Receive --> Deserialize
Deserialize --> ExtractID
ExtractID --> Route
Route --> Execute
Execute --> Serialize
Serialize --> Send
Server-Side Components
muxio-rpc-service-endpoint
The endpoint component routes incoming RPC requests to service implementations:
Server Request Processing Pipeline
Endpoint responsibilities:
| Responsibility | Description |
|---|---|
| Request routing | Maps method_id to handler function |
| Execution | Invokes the appropriate service method |
| Error handling | Catches and serializes errors |
| Response formatting | Wraps results in response envelope |
Sources: Cargo.lock:1287-1300 experiments/simd-r-drive-ws-server/Cargo.toml17
muxio-tokio-rpc-server
The server runtime manages WebSocket connections and request dispatching:
Server Runtime Connection Management
Key features:
| Feature | Implementation | Purpose |
|---|---|---|
| Multi-client support | One task pair per connection | Concurrent clients |
| Backpressure handling | Bounded channels | Prevent memory exhaustion |
| Graceful shutdown | Signal-based termination | Clean connection closure |
| Error isolation | Per-connection error handling | Fault tolerance |
Sources: Cargo.lock:1320-1336 experiments/simd-r-drive-ws-server/Cargo.toml16
Request/Response Flow
Complete RPC Call Sequence
End-to-End RPC Call Flow
Message Format
The wire protocol uses a structured binary format:
| Field | Type | Size | Description |
|---|---|---|---|
request_id | u64 | 8 bytes | Unique request identifier |
method_id | u64 | 8 bytes | XXH3 hash of method signature |
payload_len | u32 | 4 bytes | Length of payload |
payload | [u8] | Variable | Bitcode-encoded arguments/result |
Sources: All component sources above
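An illustrative encoder for the fields above is sketched below; the field order follows the table, while the little-endian byte order is an assumption rather than a statement about the actual muxio framing:
```rust
// Frame layout sketch: request_id (u64) | method_id (u64) | payload_len (u32)
// | payload. Byte order here is illustrative only.

fn encode_frame(request_id: u64, method_id: u64, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(8 + 8 + 4 + payload.len());
    frame.extend_from_slice(&request_id.to_le_bytes());
    frame.extend_from_slice(&method_id.to_le_bytes());
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

fn main() {
    let frame = encode_frame(1, 0xABCD, b"hello");
    assert_eq!(frame.len(), 8 + 8 + 4 + 5);
}
```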
Error Handling
The framework provides comprehensive error handling across the RPC boundary:
RPC Error Classification and Propagation
Error Categories
| Category | Origin | Handling |
|---|---|---|
| Serialization errors | Bitcode encoding/decoding failure | Logged and returned as RpcError |
| Network errors | WebSocket connection issues | Automatic reconnect or error propagation |
| Application errors | DataStore operation failures | Serialized and returned to client |
| Timeout errors | Request took too long | Client-side timeout with error result |
Error Recovery
The framework implements several recovery strategies:
- Connection loss : Client automatically attempts reconnection
- Request timeout : Client cancels pending request after configured duration
- Serialization failure : Error logged and generic error returned
- Invalid method ID : Server returns "method not found" error
Sources: Cargo.lock:1261-1336
Performance Characteristics
The Muxio RPC framework is optimized for high-performance remote storage access:
| Metric | Characteristic | Impact |
|---|---|---|
| Serialization overhead | ~50-100 ns for typical payloads | Minimal CPU impact |
| Request multiplexing | Thousands of concurrent requests | High throughput |
| Binary protocol | Compact wire format | Reduced bandwidth usage |
| Zero-copy deserialization | Direct memory references | Lower latency for large payloads |
The use of bitcode serialization and WebSocket binary frames minimizes overhead compared to text-based protocols like JSON over HTTP. The multiplexed architecture allows clients to issue multiple concurrent requests without blocking, essential for high-performance batch operations.
Sources: Cargo.lock:392-414 Cargo.lock:1250-1336
Native Rust Client
Relevant source files
- Cargo.lock
- experiments/simd-r-drive-muxio-service-definition/Cargo.toml
- experiments/simd-r-drive-ws-client/Cargo.toml
- experiments/simd-r-drive-ws-server/Cargo.toml
Purpose and Scope
The simd-r-drive-ws-client crate provides a native Rust client library for remote access to the SIMD R Drive storage engine via WebSocket connections. This client enables Rust applications to interact with a remote DataStore instance through the Muxio RPC framework with bitcode serialization.
This document covers the native Rust client implementation. For the WebSocket server that this client connects to, see WebSocket Server. For the Muxio RPC protocol details, see Muxio RPC Framework. For the Python bindings that wrap this client, see Python WebSocket Client API.
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:1-22
Architecture Overview
The native Rust client is structured as a thin wrapper around the Muxio RPC client infrastructure, providing type-safe access to remote DataStore operations.
Key Components:
graph TB
subgraph "Application Layer"
UserApp["User Application\nRust Code"]
end
subgraph "simd-r-drive-ws-client Crate"
ClientAPI["Client API\nDataStoreReader/Writer Traits"]
WsClient["WebSocket Client\nConnection Management"]
end
subgraph "RPC Infrastructure"
ServiceCaller["muxio-rpc-service-caller\nMethod Invocation"]
TokioRpcClient["muxio-tokio-rpc-client\nTransport Layer"]
end
subgraph "Shared Contract"
ServiceDef["simd-r-drive-muxio-service-definition\nRPC Interface"]
Bitcode["bitcode\nSerialization"]
end
subgraph "Network"
WsConnection["WebSocket Connection\ntokio-tungstenite"]
end
UserApp --> ClientAPI
ClientAPI --> WsClient
WsClient --> ServiceCaller
ServiceCaller --> TokioRpcClient
ServiceCaller --> ServiceDef
TokioRpcClient --> Bitcode
TokioRpcClient --> WsConnection
style ClientAPI fill:#f9f9f9,stroke:#333,stroke-width:2px
style ServiceDef fill:#f9f9f9,stroke:#333,stroke-width:2px
| Component | Crate | Purpose |
|---|---|---|
| Client API | simd-r-drive-ws-client | Public interface implementing DataStore traits |
| Service Caller | muxio-rpc-service-caller | RPC method invocation and request routing |
| RPC Client | muxio-tokio-rpc-client | WebSocket transport and message handling |
| Service Definition | simd-r-drive-muxio-service-definition | Shared RPC contract and type definitions |
| Async Runtime | tokio | Asynchronous I/O and task execution |
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:13-21 Cargo.lock:1302-1318
Client API Structure
The client implements the same DataStoreReader and DataStoreWriter traits as the local DataStore, enabling transparent remote access with minimal API differences.
Core Traits:
graph LR
subgraph "Trait Implementations"
Reader["DataStoreReader\nread()\nexists()\nbatch_read()"]
Writer["DataStoreWriter\nwrite()\ndelete()\nbatch_write()"]
end
subgraph "Client Implementation"
WsClient["WebSocket Client\nAsync Methods"]
ConnState["Connection State\nURL, Options"]
end
subgraph "RPC Layer"
Serializer["Request Serialization\nbitcode"]
Caller["Service Caller\nCall Routing"]
Deserializer["Response Deserialization\nbitcode"]
end
Reader --> WsClient
Writer --> WsClient
WsClient --> ConnState
WsClient --> Serializer
Serializer --> Caller
Caller --> Deserializer
style Reader fill:#f9f9f9,stroke:#333,stroke-width:2px
style Writer fill:#f9f9f9,stroke:#333,stroke-width:2px
- DataStoreReader : Read-only operations (read, exists, batch_read, iteration)
- DataStoreWriter : Write operations (write, delete, batch_write)
- async-trait : All methods are asynchronous, requiring a Tokio runtime
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:14-21 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-16
Connection Management
The client manages persistent WebSocket connections to the remote server with automatic reconnection and error handling.
Connection Lifecycle:
sequenceDiagram
participant App as "Application"
participant Client as "WebSocket Client"
participant Transport as "muxio-tokio-rpc-client"
participant Server as "Remote Server"
Note over App,Server: Connection Establishment
App->>Client: connect(url)
Client->>Transport: create WebSocket connection
Transport->>Server: WebSocket handshake
Server-->>Transport: connection established
Transport-->>Client: client ready
Client-->>App: connected client
Note over App,Server: Normal Operation
App->>Client: read(key)
Client->>Transport: serialize request
Transport->>Server: send via WebSocket
Server-->>Transport: response data
Transport-->>Client: deserialize response
Client-->>App: return result
Note over App,Server: Error Handling
Server-->>Transport: connection lost
Transport-->>Client: connection error
Client->>Transport: reconnection attempt
Transport->>Server: reconnect
- Initialization : Client connects to server URL with connection options
- Authentication : Optional authentication via Muxio RPC mechanisms
- Active State : Client maintains persistent WebSocket connection
- Error Recovery : Automatic reconnection on transient failures
- Shutdown : Graceful connection termination
Sources: Cargo.lock:1302-1318 experiments/simd-r-drive-ws-client/Cargo.toml:16-19
graph TB
subgraph "Client Side"
Method["Client Method Call\nread/write/delete"]
ReqBuilder["Request Builder\nCreate RPC Request"]
Serializer["bitcode Serialization\nBinary Encoding"]
Sender["WebSocket Send\nBinary Frame"]
end
subgraph "Network"
WsFrame["WebSocket Frame\nBinary Message"]
end
subgraph "Server Side"
Receiver["WebSocket Receive\nBinary Frame"]
Deserializer["bitcode Deserialization\nBinary Decoding"]
Handler["Request Handler\nExecute DataStore Operation"]
Response["Response Builder\nCreate RPC Response"]
end
Method --> ReqBuilder
ReqBuilder --> Serializer
Serializer --> Sender
Sender --> WsFrame
WsFrame --> Receiver
Receiver --> Deserializer
Deserializer --> Handler
Handler --> Response
Response --> Serializer
style Method fill:#f9f9f9,stroke:#333,stroke-width:2px
style Handler fill:#f9f9f9,stroke:#333,stroke-width:2px
Request-Response Flow
All client operations follow a standardized request-response pattern through the Muxio RPC framework.
Request Structure:
| Field | Type | Description |
|---|---|---|
| Method ID | u64 | XXH3 hash of method name from service definition |
| Payload | Vec<u8> | Bitcode-serialized request parameters |
| Request ID | u64 | Unique identifier for request-response matching |
Response Structure:
| Field | Type | Description |
|---|---|---|
| Request ID | u64 | Matches original request ID |
| Status | enum | Success, Error, or specific error codes |
| Payload | Vec<u8> | Bitcode-serialized response data or error |
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14-15 Cargo.lock:392-402
Async Runtime Requirements
The client requires a Tokio async runtime for all operations. The async-trait crate enables async methods in trait implementations.
Runtime Configuration:
graph TB
subgraph "Application"
Main["#[tokio::main]\nasync fn main()"]
UserCode["User Code\nawait client.read()"]
end
subgraph "Client"
AsyncMethods["async-trait Methods\nDataStoreReader/Writer"]
TokioTasks["Tokio Tasks\nNetwork I/O"]
end
subgraph "Tokio Runtime"
Executor["Task Executor\nWork Stealing Scheduler"]
Reactor["I/O Reactor\nepoll/kqueue/IOCP"]
end
Main --> UserCode
UserCode --> AsyncMethods
AsyncMethods --> TokioTasks
TokioTasks --> Executor
TokioTasks --> Reactor
style AsyncMethods fill:#f9f9f9,stroke:#333,stroke-width:2px
style Executor fill:#f9f9f9,stroke:#333,stroke-width:2px
- Multi-threaded Runtime : Default for concurrent operations
- Current-thread Runtime : Available for single-threaded use cases
- Feature Flags : Requires tokio with the rt-multi-thread and net features
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:19-21 Cargo.lock:279-287
Error Handling
The client propagates errors from multiple layers of the stack, providing detailed error information for debugging and recovery.
Error Types:
| Error Category | Source | Description |
|---|---|---|
| Connection Errors | muxio-tokio-rpc-client | WebSocket connection failures, timeouts |
| Serialization Errors | bitcode | Invalid data encoding/decoding |
| RPC Errors | muxio-rpc-service | Service method errors, invalid requests |
| DataStore Errors | Remote DataStore | Storage operation failures (key not found, write errors) |
Error Propagation Flow:
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:17-18 Cargo.lock:1261-1271
Usage Patterns
Basic Connection and Operations
The client follows standard Rust async patterns for initialization and operation:
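The sketch below is hedged: the constructor and method signatures (connect, write, read) follow the diagrams in this section and are assumptions about the client API, not verified signatures.
```rust
// Hedged usage sketch. `connect`, `write`, and `read` mirror the sequence
// diagram above; the real simd-r-drive-ws-client API may differ.

use simd_r_drive_ws_client::DataStoreWsClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical constructor; the diagram shows `connect(url)`.
    let client = DataStoreWsClient::connect("ws://127.0.0.1:8080/ws").await?;

    // Async methods mirroring DataStoreWriter / DataStoreReader.
    client.write(b"greeting", b"hello").await?;
    if let Some(payload) = client.read(b"greeting").await? {
        println!("{}", String::from_utf8_lossy(&payload));
    }
    Ok(())
}
```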
Concurrent Operations
The client supports concurrent operations through standard Tokio concurrency primitives:
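A minimal example of issuing requests concurrently with tokio::join! is shown below; the fetch helper is a placeholder for any async client call, not an actual client API:
```rust
// Concurrency sketch: both futures run concurrently over the same multiplexed
// connection. `fetch` is a placeholder standing in for a client call.

async fn fetch(id: u32) -> Option<Vec<u8>> {
    // ... a client.read(...) call would go here ...
    let _ = id;
    None
}

#[tokio::main]
async fn main() {
    let (a, b) = tokio::join!(fetch(1), fetch(2));
    println!("got {:?} and {:?}", a.map(|v| v.len()), b.map(|v| v.len()));
}
```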
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:19-21
graph TB
subgraph "Service Definition"
Methods["Method Definitions\nread, write, delete, etc."]
Types["Request/Response Types\nbitcode derive macros"]
MethodHash["Method ID Hashing\nXXH3 of method names"]
end
subgraph "Client Usage"
ClientImpl["Client Implementation\nUses defined methods"]
TypeSafety["Type Safety\nCompile-time checking"]
end
subgraph "Server Usage"
ServerImpl["Server Implementation\nHandles defined methods"]
Routing["Request Routing\nHash-based dispatch"]
end
Methods --> ClientImpl
Methods --> ServerImpl
Types --> ClientImpl
Types --> ServerImpl
MethodHash --> Routing
ClientImpl --> TypeSafety
style Methods fill:#f9f9f9,stroke:#333,stroke-width:2px
style Types fill:#f9f9f9,stroke:#333,stroke-width:2px
Integration with Service Definition
The client relies on the shared service definition crate for type-safe RPC communication.
Shared Contract Benefits:
- Type Safety : Compile-time verification of request/response types
- Version Compatibility : Client and server must use compatible service definitions
- Method Resolution : XXH3 hash-based method identification
- Serialization Schema : Consistent bitcode encoding across client and server
Sources: experiments/simd-r-drive-ws-client/Cargo.toml15 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-16
Performance Considerations
The native Rust client provides several performance advantages over alternative approaches:
Performance Characteristics:
| Aspect | Implementation | Benefit |
|---|---|---|
| Serialization | bitcode binary encoding | Minimal overhead, faster than JSON/MessagePack |
| Connection | Persistent WebSocket | Avoids HTTP handshake overhead |
| Async I/O | Tokio zero-copy operations | Efficient memory usage |
| Type Safety | Compile-time generics | Zero runtime type checking cost |
| Multiplexing | Muxio request pipelining | Multiple concurrent requests per connection |
Memory Efficiency:
- Zero-copy where possible through bitcode and WebSocket frames
- Efficient buffer reuse in Tokio's I/O layer
- Minimal allocation overhead compared to HTTP-based protocols
Throughput:
- Supports request pipelining for high-throughput workloads
- Concurrent operations through Tokio's work-stealing scheduler
- Batch operations reduce round-trip overhead
Sources: Cargo.lock:392-402 Cargo.lock:1302-1318
Comparison with Direct Access
The WebSocket client provides remote access with different tradeoffs compared to direct DataStore usage:
| Feature | Direct DataStore | WebSocket Client |
|---|---|---|
| Access Pattern | Local file I/O | Network I/O over WebSocket |
| Zero-Copy Reads | Yes (via mmap) | No (serialized over network) |
| Latency | Microseconds | Milliseconds (network dependent) |
| Concurrency | Multi-process safe | Network-limited |
| Deployment | Single machine | Distributed architecture |
| Security | File system permissions | Network authentication |
When to Use the Client:
- Remote access to centralized storage
- Microservice architectures requiring shared state
- Language interoperability (via Python bindings)
- Isolation of storage from compute workloads
When to Use Direct Access:
- Single-machine deployments
- Latency-critical applications
- Maximum throughput requirements
- Zero-copy read performance needs
Sources: experiments/simd-r-drive-ws-client/Cargo.toml14
Logging and Debugging
The client uses the tracing crate for structured logging and diagnostics.
Logging Levels:
- TRACE : Detailed RPC message contents and serialization
- DEBUG : Connection state changes, request/response flow
- INFO : Connection establishment, disconnection events
- WARN : Recoverable errors, retry attempts
- ERROR : Unrecoverable errors, connection failures
Diagnostic Information:
- Request/response timing
- Serialized message sizes
- Connection state transitions
- Error context and stack traces
Sources: experiments/simd-r-drive-ws-client/Cargo.toml20 Cargo.lock:279-287
Python Integration
Relevant source files
- experiments/bindings/python-ws-client/Cargo.lock
- experiments/bindings/python-ws-client/README.md
- experiments/bindings/python-ws-client/extract_readme_tests.py
- experiments/bindings/python-ws-client/integration_test.sh
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/init.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
- experiments/bindings/python-ws-client/uv.lock
This page documents the Python bindings for SIMD R Drive, which provide a high-level Python interface to the storage engine via WebSocket RPC. The bindings are implemented in Rust using PyO3 and packaged as Python wheels using Maturin.
For details on the Python WebSocket Client API specifically, see Python WebSocket Client API. For information on building and distributing the Python package, see Building Python Bindings. For testing infrastructure, see Integration Testing.
Architecture Overview
The Python integration uses a multi-layer architecture that bridges Python code to the native Rust WebSocket client. The bindings are not pure Python—they are Rust code compiled to native extensions that Python can import.
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/init.py:1-14 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:1-63
graph TB
subgraph "Python User Code"
UserScript["User Application\n*.py files"]
Imports["from simd_r_drive_ws_client import DataStoreWsClient"]
end
subgraph "Python Package Layer"
InitPy["__init__.py\nPackage Exports"]
DataStoreWsClientPy["data_store_ws_client.py\nDataStoreWsClient class"]
TypeStubs["data_store_ws_client.pyi\nType Annotations"]
end
subgraph "PyO3 Binding Layer"
RustModule["simd_r_drive_ws_client_py\nRust Binary Module\n.so / .pyd"]
BaseClass["BaseDataStoreWsClient\n#[pyclass]"]
NamespaceHasherClass["NamespaceHasher\n#[pyclass]"]
end
subgraph "Native Rust Implementation"
WsClient["simd-r-drive-ws-client\nNative WebSocket Client"]
MuxioRPC["muxio-tokio-rpc-client\nRPC Client Runtime"]
end
UserScript --> Imports
Imports --> InitPy
InitPy --> DataStoreWsClientPy
DataStoreWsClientPy --> BaseClass
DataStoreWsClientPy -.type hints.-> TypeStubs
BaseClass --> WsClient
NamespaceHasherClass --> WsClient
RustModule --> BaseClass
RustModule --> NamespaceHasherClass
WsClient --> MuxioRPC
The architecture consists of four distinct layers:
| Layer | Technology | Purpose |
|---|---|---|
| Python User Code | Pure Python | Application-level logic using the client |
| Python Package | Pure Python wrapper | Convenience methods and type annotations |
| PyO3 Bindings | Rust compiled to native extension | FFI bridge exposing Rust functionality |
| Native Implementation | Rust (simd-r-drive-ws-client) | WebSocket RPC client implementation |
Python API Surface
The Python package exposes two primary classes that users interact with: DataStoreWsClient and NamespaceHasher. The API is defined through a combination of Rust PyO3 bindings and Python wrapper code.
graph TB
subgraph "Python Space"
DSWsClient["DataStoreWsClient\nexperiments/.../data_store_ws_client.py"]
NSHasher["NamespaceHasher\nRust #[pyclass]"]
end
subgraph "Rust PyO3 Bindings"
BaseClient["BaseDataStoreWsClient\nRust #[pyclass]\nsrc/lib.rs"]
NSHasherRust["NamespaceHasher\nRust Implementation\nsrc/lib.rs"]
end
subgraph "Method Categories"
WriteOps["write()\nbatch_write()\ndelete()"]
ReadOps["read()\nbatch_read()\nexists()"]
MetaOps["__len__()\n__contains__()\nis_empty()\nfile_size()"]
PyOnlyOps["batch_read_structured()\nPure Python Logic"]
end
DSWsClient -->|inherits| BaseClient
NSHasher -.exposed as.-> NSHasherRust
BaseClient --> WriteOps
BaseClient --> ReadOps
BaseClient --> MetaOps
DSWsClient --> PyOnlyOps
Class Hierarchy and Implementation
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:11-63 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:8-219
Core Operations
The DataStoreWsClient provides the following operation categories:
| Operation Type | Methods | Implementation Location |
|---|---|---|
| Write Operations | write(), batch_write(), delete() | Rust (BaseDataStoreWsClient) |
| Read Operations | read(), batch_read(), exists() | Rust (BaseDataStoreWsClient) |
| Metadata Operations | __len__(), __contains__(), is_empty(), file_size() | Rust (BaseDataStoreWsClient) |
| Structured Reads | batch_read_structured() | Python (DataStoreWsClient) |
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:27-168
Python-Rust Method Mapping
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:12-62 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:27-129
The batch_read_structured() method is implemented entirely in Python as a convenience wrapper. It traverses dictionaries or lists of dictionaries to extract a flat list of keys, calls the fast Rust batch_read() method, and then rebuilds the original structure with the fetched values.
PyO3 Binding Architecture
PyO3 provides the Foreign Function Interface (FFI) that allows Python to call Rust code. The bindings use PyO3 macros to expose Rust structs and methods as Python classes and functions.
graph TB
subgraph "Rust Source"
StructDef["#[pyclass]\npub struct BaseDataStoreWsClient"]
MethodsDef["#[pymethods]\nimpl BaseDataStoreWsClient"]
NSStruct["#[pyclass]\npub struct NamespaceHasher"]
NSMethods["#[pymethods]\nimpl NamespaceHasher"]
end
subgraph "PyO3 Macro Expansion"
PyClassMacro["PyClass trait implementation\nType conversion\nReference counting"]
PyMethodsMacro["Method wrappers\nArgument extraction\nReturn value conversion"]
end
subgraph "Python Module"
PythonClass["class BaseDataStoreWsClient:\n def write(...)\n def read(...)"]
PythonNS["class NamespaceHasher:\n def __init__(...)\n def namespace(...)"]
end
StructDef --> PyClassMacro
MethodsDef --> PyMethodsMacro
NSStruct --> PyClassMacro
NSMethods --> PyMethodsMacro
PyClassMacro --> PythonClass
PyMethodsMacro --> PythonClass
PyClassMacro --> PythonNS
PyMethodsMacro --> PythonNS
PyO3 Class Definitions
Sources: experiments/bindings/python-ws-client/Cargo.lock:832-846 experiments/bindings/python-ws-client/Cargo.lock:1096-1108
Async Runtime Bridge
The Python bindings use pyo3-async-runtimes to bridge Python's async/await with Rust's Tokio runtime. This allows Python code to use standard async/await syntax while the underlying operations are handled by Tokio.
Sources: experiments/bindings/python-ws-client/Cargo.lock:849-860
graph TB
subgraph "Python Async"
PyAsyncCall["await client.write(key, data)"]
PyEventLoop["asyncio.run()
or uvloop"]
end
subgraph "pyo3-async-runtimes Bridge"
Bridge["pyo3_async_runtimes::tokio\nFuture conversion"]
Runtime["LocalSet spawning\nBlock on future"]
end
subgraph "Rust Tokio"
TokioFuture["async fn write()\nTokio Future"]
TokioRuntime["Tokio Runtime\nThread Pool"]
end
PyAsyncCall --> PyEventLoop
PyEventLoop --> Bridge
Bridge --> Runtime
Runtime --> TokioFuture
TokioFuture --> TokioRuntime
The pyo3-async-runtimes crate handles the complexity of converting between Python's async protocol and Rust's Tokio futures, ensuring that async operations work seamlessly across the FFI boundary.
Build and Distribution System
The Python bindings are built using Maturin, which compiles the Rust code and packages it into Python wheels. The build system is configured through pyproject.toml and Cargo.toml.
graph LR
subgraph "Configuration Files"
PyProject["pyproject.toml\n[build-system]\nrequires = ['maturin>=1.5']"]
CargoToml["Cargo.toml\n[lib]\ncrate-type = ['cdylib']"]
end
subgraph "Maturin Build Process"
RustCompile["rustc compilation\n--crate-type=cdylib\nTarget: cpython extension"]
LinkPyO3["Link PyO3 runtime\nPython ABI"]
CreateWheel["Package .so/.pyd\nAdd metadata\nCreate .whl"]
end
subgraph "Distribution Artifacts"
Wheel["simd_r_drive_ws_client-*.whl\nPlatform-specific binary"]
PyPI["PyPI Registry\npip install simd-r-drive-ws-client"]
end
PyProject --> RustCompile
CargoToml --> RustCompile
RustCompile --> LinkPyO3
LinkPyO3 --> CreateWheel
CreateWheel --> Wheel
Wheel --> PyPI
Maturin Build Pipeline
Sources: experiments/bindings/python-ws-client/pyproject.toml:29-35
Build System Configuration
The build system is configured through several key sections in pyproject.toml:
| Configuration | Location | Purpose |
|---|---|---|
[build-system] | pyproject.toml:29-31 | Specifies Maturin as build backend |
[tool.maturin] | pyproject.toml:33-35 | Maturin-specific settings (bindings, Python version) |
[project] | pyproject.toml:1-27 | Package metadata for PyPI |
[dependency-groups] | pyproject.toml:37-46 | Development dependencies |
Sources: experiments/bindings/python-ws-client/pyproject.toml:1-47
Supported Python Versions and Platforms
The package supports Python 3.10 through 3.13 on multiple platforms:
Sources: experiments/bindings/python-ws-client/pyproject.toml:7-27 experiments/bindings/python-ws-client/README.md:18-23
graph TB
subgraph "Runtime Dependencies"
PyO3Runtime["PyO3 Runtime\nEmbedded in .whl"]
RustDeps["Rust Dependencies\nsimd-r-drive-ws-client\ntokio, muxio-*"]
end
subgraph "Development Dependencies"
Maturin["maturin>=1.8.7\nBuild backend"]
MyPy["mypy>=1.16.1\nType checking"]
Pytest["pytest>=8.4.1\nTesting framework"]
NumPy["numpy>=2.2.6\nTesting utilities"]
Other["puccinialin, pytest-benchmark, pytest-order"]
end
subgraph "Lock Files"
UvLock["uv.lock\nPython dependency tree"]
CargoLock["Cargo.lock\nRust dependency tree"]
end
Maturin --> RustDeps
RustDeps --> CargoLock
MyPy --> UvLock
Pytest --> UvLock
NumPy --> UvLock
Other --> UvLock
Dependency Management
The Python bindings use uv for fast dependency resolution and management. Dependencies are split into runtime (minimal) and development dependencies.
Dependency Structure
Sources: experiments/bindings/python-ws-client/pyproject.toml:37-46 experiments/bindings/python-ws-client/Cargo.lock:1-1380 experiments/bindings/python-ws-client/uv.lock:1-299
The runtime has minimal Python dependencies (essentially none beyond the Python interpreter), as all dependencies are statically compiled into the binary wheel. Development dependencies include testing, type checking, and build tools.
Type Stubs and IDE Support
The package includes comprehensive type stubs (.pyi files) that provide full type information for IDEs and type checkers like MyPy.
graph TB
subgraph "Type Stub File"
StubImports["from typing import Optional, Union, Dict, Any, List\nfrom .simd_r_drive_ws_client import BaseDataStoreWsClient"]
ClientStub["@final\nclass DataStoreWsClient(BaseDataStoreWsClient):\n def __init__(self, host: str, port: int) -> None\n def write(self, key: bytes, data: bytes) -> None\n ..."]
NSStub["@final\nclass NamespaceHasher:\n def __init__(self, prefix: bytes) -> None\n def namespace(self, key: bytes) -> bytes"]
end
subgraph "IDE Features"
Autocomplete["Auto-completion\nMethod signatures"]
TypeCheck["Type checking\nmypy validation"]
Docstrings["Inline documentation\nMethod descriptions"]
end
StubImports --> ClientStub
StubImports --> NSStub
ClientStub --> Autocomplete
ClientStub --> TypeCheck
ClientStub --> Docstrings
NSStub --> Autocomplete
NSStub --> TypeCheck
NSStub --> Docstrings
Type Stub Structure
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219
The type stubs include:
- Full method signatures with type annotations
- Comprehensive docstrings explaining each method
- Generic type support for structured operations
- `@final` decorators to prevent subclassing
graph LR
subgraph "Namespace Creation"
Prefix["prefix = b'users'"]
PrefixHash["XXH3(prefix)\n→ 8 bytes"]
end
subgraph "Key Hashing"
Key["key = b'user123'"]
KeyHash["XXH3(key)\n→ 8 bytes"]
end
subgraph "Namespaced Key"
Combined["prefix_hash // key_hash\n16 bytes total"]
end
Prefix --> PrefixHash
Key --> KeyHash
PrefixHash --> Combined
KeyHash --> Combined
NamespaceHasher Utility
The NamespaceHasher class provides deterministic key namespacing using XXH3 hashing. It ensures keys are scoped to specific namespaces, preventing collisions across logical domains.
Namespacing Mechanism
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:170-219
Usage Pattern
The NamespaceHasher is typically used as follows:
| Step | Code | Result |
|---|---|---|
| 1. Create hasher | hasher = NamespaceHasher(b"users") | Hasher scoped to "users" namespace |
| 2. Generate key | key = hasher.namespace(b"user123") | 16-byte namespaced key |
| 3. Store data | client.write(key, data) | Data stored under namespaced key |
| 4. Read data | client.read(key) | Data retrieved from namespaced key |
This pattern ensures that keys like b"settings" in the b"users" namespace don't collide with b"settings" in the b"system" namespace.
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:182-218
graph TB
subgraph "Test Script: integration_test.sh"
Setup["1. Navigate to experiments/\n2. Build server if needed"]
StartServer["3. cargo run --package simd-r-drive-ws-server\nBackground process\nPID captured"]
SetupPython["4. uv venv\n5. uv pip install pytest maturin\n6. uv pip install -e . --group dev"]
ExtractTests["7. extract_readme_tests.py\nExtract code blocks from README.md"]
RunPytest["8. pytest -v -s\nTEST_SERVER_HOST=$SERVER_HOST\nTEST_SERVER_PORT=$SERVER_PORT"]
Cleanup["9. kill -9 $SERVER_PID\n10. rm /tmp/simd-r-drive-pytest-storage.bin"]
end
Setup --> StartServer
StartServer --> SetupPython
SetupPython --> ExtractTests
ExtractTests --> RunPytest
RunPytest --> Cleanup
Integration Test Infrastructure
The Python bindings include a comprehensive integration test suite that validates the entire stack from Python user code down to the WebSocket server.
Test Workflow
Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91
Test Categories
The test infrastructure includes multiple test sources:
| Test Source | Purpose | Generated By |
|---|---|---|
tests/test_readme_blocks.py | Validates README examples | extract_readme_tests.py |
| Other test files | Unit and integration tests | Manual test authoring |
| Pytest fixtures | Setup/teardown infrastructure | Pytest framework |
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:1-46
graph LR
subgraph "Input"
README["README.md\n```python code blocks"]
end
subgraph "Extraction Process"
Regex["Regex: r'```python\\n(.*?)```'"]
Parse["Extract all Python blocks"]
Wrap["Wrap each block in\ndef test_readme_block_N():\n ..."]
end
subgraph "Output"
TestFile["tests/test_readme_blocks.py\ntest_readme_block_0()\ntest_readme_block_1()\n..."]
end
README --> Regex
Regex --> Parse
Parse --> Wrap
Wrap --> TestFile
README Test Extraction
The extract_readme_tests.py script automatically converts Python code blocks from the README into pytest test functions:
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:14-45
This ensures that all code examples in the README are automatically tested, preventing documentation drift from the actual API behavior.
graph TB
subgraph "Internal Modules"
RustBinary["simd_r_drive_ws_client_py.so/.pyd\nBinary compiled module"]
RustSymbols["BaseDataStoreWsClient\nNamespaceHasher\nsetup_logging\ntest_rust_logging"]
PythonWrapper["data_store_ws_client.py\nDataStoreWsClient"]
end
subgraph "Package __init__.py"
ImportRust["from .simd_r_drive_ws_client import\n setup_logging, test_rust_logging"]
ImportPython["from .data_store_ws_client import\n DataStoreWsClient, NamespaceHasher"]
AllList["__all__ = [\n 'DataStoreWsClient',\n 'NamespaceHasher',\n 'setup_logging',\n 'test_rust_logging'\n]"]
end
subgraph "Public API"
UserCode["from simd_r_drive_ws_client import DataStoreWsClient"]
end
RustBinary --> RustSymbols
RustSymbols --> ImportRust
PythonWrapper --> ImportPython
ImportRust --> AllList
ImportPython --> AllList
AllList --> UserCode
Package Exports and Public API
The package's public API is defined through the __init__.py file, which controls what symbols are available when users import the package.
Export Structure
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-14
The __all__ list explicitly defines the public API surface, preventing internal implementation details from being accidentally imported by users. This follows Python best practices for package design.
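As a sketch, the export structure shown in the diagram above corresponds to an `__init__.py` of roughly this shape (reconstructed from the diagram, not copied from the source file):

```python
# simd_r_drive_ws_client/__init__.py -- reconstructed from the export diagram above
from .simd_r_drive_ws_client import setup_logging, test_rust_logging
from .data_store_ws_client import DataStoreWsClient, NamespaceHasher

__all__ = [
    "DataStoreWsClient",
    "NamespaceHasher",
    "setup_logging",
    "test_rust_logging",
]
```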
Python WebSocket Client API
Relevant source files
- experiments/bindings/python-ws-client/Cargo.lock
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
Purpose and Scope
This document describes the Python WebSocket client API for remote access to SIMD R Drive storage. The API provides idiomatic Python interfaces backed by high-performance Rust implementations via PyO3 bindings. This page covers the DataStoreWsClient class, NamespaceHasher utility, and their usage patterns.
For information about building and installing the Python bindings, see Building Python Bindings. For details about the underlying native Rust WebSocket client, see Native Rust Client. For server-side configuration and deployment, see WebSocket Server.
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-14 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219
Architecture Overview
The Python WebSocket client uses a multi-layer architecture that bridges Python's async/await with Rust's Tokio runtime while maintaining idiomatic Python APIs.
graph TB
UserCode["Python User Code\nimport simd_r_drive_ws_client"]
DataStoreWsClient["DataStoreWsClient\nPython wrapper class"]
BaseDataStoreWsClient["BaseDataStoreWsClient\nPyO3 #[pyclass]"]
PyO3FFI["PyO3 FFI Layer\npyo3-async-runtimes"]
RustClient["simd-r-drive-ws-client\nNative Rust implementation"]
MuxioRPC["muxio-tokio-rpc-client\nWebSocket + RPC"]
Server["simd-r-drive-ws-server\nRemote DataStore"]
UserCode --> DataStoreWsClient
DataStoreWsClient --> BaseDataStoreWsClient
BaseDataStoreWsClient --> PyO3FFI
PyO3FFI --> RustClient
RustClient --> MuxioRPC
MuxioRPC --> Server
Python Integration Stack
Sources: experiments/bindings/python-ws-client/Cargo.lock:1096-1108 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:1-10
Class Hierarchy
The BaseDataStoreWsClient class is implemented in Rust and exposes core storage operations through PyO3. The DataStoreWsClient Python class extends it with additional convenience methods implemented in pure Python.
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-10 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:11-62
DataStoreWsClient Class
The DataStoreWsClient class provides the primary interface for interacting with a remote SIMD R Drive storage engine over WebSocket connections.
Connection Initialization
| Constructor | Description |
|---|---|
__init__(host: str, port: int) | Establishes WebSocket connection to the specified server |
The constructor creates a WebSocket connection to the remote storage server. Connection establishment is synchronous and will raise an exception if the server is unreachable.
Example:
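A minimal sketch, assuming a server is reachable on the test defaults (`127.0.0.1:34129`) used elsewhere in this documentation:

```python
from simd_r_drive_ws_client import DataStoreWsClient

# Connect to a running simd-r-drive-ws-server; raises if the server is unreachable.
client = DataStoreWsClient("127.0.0.1", 34129)
```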
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:17-25
Write Operations
| Method | Parameters | Description |
|---|---|---|
write(key, data) | key: bytes, data: bytes | Appends single key/value pair |
batch_write(items) | items: list[tuple[bytes, bytes]] | Writes multiple pairs in one operation |
Write operations are append-only and atomic. If a key already exists, writing to it creates a new version while the old data remains on disk (marked as superseded via the index).
Example:
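A sketch of both write paths, assuming `client` is the connected `DataStoreWsClient` from above:

```python
# Single write: appends one key/value pair (keys and values are raw bytes).
client.write(b"user:1", b'{"name": "Alice"}')

# Batch write: appends several pairs in one operation.
client.batch_write([
    (b"user:2", b'{"name": "Bob"}'),
    (b"user:3", b'{"name": "Carol"}'),
])
```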
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:27-51
Read Operations
| Method | Parameters | Return Type | Copy Behavior |
|---|---|---|---|
read(key) | key: bytes | Optional[bytes] | Performs memory copy |
batch_read(keys) | keys: list[bytes] | list[Optional[bytes]] | Performs memory copy |
batch_read_structured(data) | data: dict or list[dict] | Same structure with values | Python-side wrapper |
The read and batch_read methods perform memory copies when returning data. For zero-copy access patterns, the native Rust client provides read_entry methods that return memory-mapped views.
The batch_read_structured method is a Python convenience wrapper that:
- Accepts dictionaries or lists of dictionaries where values are datastore keys
- Flattens the structure into a single key list
- Calls `batch_read` for efficient parallel fetching
- Reconstructs the original structure with fetched values
Example:
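A sketch of the three read paths, assuming the keys written above (`batch_read_structured` treats dictionary values as datastore keys, per the description above):

```python
# Single read returns Optional[bytes]; batch_read returns list[Optional[bytes]].
value = client.read(b"user:1")
values = client.batch_read([b"user:1", b"user:2", b"missing-key"])  # last item -> None

# Structured read: the same dict shape comes back with keys replaced by payloads.
result = client.batch_read_structured({"a": b"user:1", "b": b"user:2"})
```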
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:79-129 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:12-62
Deletion and Existence Checks
| Method | Parameters | Return Type | Description |
|---|---|---|---|
delete(key) | key: bytes | None | Marks key as deleted (tombstone) |
exists(key) | key: bytes | bool | Checks if key is active |
__contains__(key) | key: bytes | bool | Python in operator support |
Deletion is logical, not physical. The delete method appends a tombstone entry to the storage file. The physical data remains on disk but is no longer accessible through reads.
Example:
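A sketch of deletion and existence checks, continuing from the writes above:

```python
client.write(b"temp", b"scratch data")
assert b"temp" in client            # __contains__
assert client.exists(b"temp")       # explicit existence check

client.delete(b"temp")              # appends a tombstone; old data stays on disk
assert not client.exists(b"temp")
assert client.read(b"temp") is None
```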
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:53-77 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:131-141
Utility Methods
| Method | Return Type | Description |
|---|---|---|
__len__() | int | Returns count of active entries |
is_empty() | bool | Checks if store has any active keys |
file_size() | int | Returns physical file size on disk |
Example:
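A sketch using the same `client`:

```python
print(len(client))         # count of active (non-deleted) entries
print(client.is_empty())   # True only if no active keys remain
print(client.file_size())  # physical size of the backing storage file, in bytes
```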
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:143-168
NamespaceHasher Utility
The NamespaceHasher class provides deterministic key namespacing using XXH3 hashing to prevent key collisions across logical domains.
graph LR
Input1["Namespace prefix\ne.g., b'users'"]
Input2["Key\ne.g., b'user123'"]
Hash1["XXH3 hash\n8 bytes"]
Hash2["XXH3 hash\n8 bytes"]
Output["Namespaced key\n16 bytes total"]
Input1 -->|hash once at init| Hash1
Input2 -->|hash per call| Hash2
Hash1 --> Output
Hash2 --> Output
Architecture
Usage Pattern
| Method | Parameters | Return Type | Description |
|---|---|---|---|
__init__(prefix) | prefix: bytes | N/A | Initializes hasher with namespace |
namespace(key) | key: bytes | bytes | Returns 16-byte namespaced key |
The output key structure is:
- Bytes 0-7: XXH3 hash of namespace prefix
- Bytes 8-15: XXH3 hash of input key
This design ensures:
- Deterministic key generation (same input → same output)
- Collision isolation between namespaces
- Fixed-length keys regardless of input size
Example:
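A sketch showing namespace isolation, assuming `client` is a connected `DataStoreWsClient`:

```python
from simd_r_drive_ws_client import NamespaceHasher

users = NamespaceHasher(b"users")
system = NamespaceHasher(b"system")

# Same logical key, different namespaces -> different 16-byte storage keys.
users_settings = users.namespace(b"settings")
system_settings = system.namespace(b"settings")
assert users_settings != system_settings
assert len(users_settings) == 16

client.write(users_settings, b"per-user settings blob")
```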
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:170-219
graph TB
Stub["data_store_ws_client.pyi\nType definitions"]
Impl["data_store_ws_client.py\nImplementation"]
Base["simd_r_drive_ws_client\nCompiled Rust module"]
Stub -.->|describes| Impl
Impl -->|imports from| Base
Stub -.->|describes| Base
Type Stubs and IDE Support
The package provides complete type stubs for IDE integration and static type checking.
Type Stub Structure
The type stubs (experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219) provide:
- Full method signatures with type annotations
- Return type information (`Optional[bytes]`, `list[Optional[bytes]]`, etc.)
- Docstrings for IDE hover documentation
- `@final` decorators indicating classes cannot be subclassed
Type Checking Example
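A sketch of how the stubs drive static checking (the error shown in the comment is what MyPy reports, not a runtime failure):

```python
from typing import Optional

from simd_r_drive_ws_client import DataStoreWsClient

client = DataStoreWsClient("127.0.0.1", 34129)

# The stub declares read() -> Optional[bytes], so MyPy requires a None check.
value: Optional[bytes] = client.read(b"user:1")
if value is not None:
    print(len(value))

# client.write("user:1", "text")  # MyPy: expected "bytes", got "str"
```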
Python Version Support:
The package targets Python 3.10-3.13 as specified in experiments/bindings/python-ws-client/pyproject.toml7 and experiments/bindings/python-ws-client/pyproject.toml:21-24
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219 experiments/bindings/python-ws-client/pyproject.toml7 experiments/bindings/python-ws-client/pyproject.toml:19-27
graph TB
PythonMain["Python Main Thread\nSynchronous API calls"]
PyO3["PyO3 Bridge\npyo3-async-runtimes"]
TokioRT["Tokio Runtime\nAsync event loop"]
WSClient["WebSocket Client\ntokio-tungstenite"]
PythonMain -->|sync call| PyO3
PyO3 -->|spawn + block_on| TokioRT
TokioRT --> WSClient
WSClient -.->|result| TokioRT
TokioRT -.->|return| PyO3
PyO3 -.->|return| PythonMain
Async Runtime Bridging
The client uses pyo3-async-runtimes to bridge Python's async/await with Rust's Tokio runtime. This allows the underlying Rust WebSocket client to use native async I/O while exposing synchronous APIs to Python.
Runtime Architecture
The pyo3-async-runtimes crate (experiments/bindings/python-ws-client/Cargo.lock:849-860) provides:
- Runtime spawning: Manages Tokio runtime lifecycle
- Future blocking: Converts Rust async operations to Python-blocking calls
- Thread safety: Ensures proper synchronization between Python GIL and Rust runtime
This design allows Python code to use simple synchronous APIs while benefiting from Rust's high-performance async networking under the hood.
Sources: experiments/bindings/python-ws-client/Cargo.lock:849-860 experiments/bindings/python-ws-client/Cargo.lock:1096-1108
API Summary
Complete Method Reference
| Category | Method | Parameters | Return | Description |
|---|---|---|---|---|
| Connection | __init__ | host: str, port: int | N/A | Establish WebSocket connection |
| Write | write | key: bytes, data: bytes | None | Write single entry |
| Write | batch_write | items: list[tuple[bytes, bytes]] | None | Write multiple entries |
| Read | read | key: bytes | Optional[bytes] | Read single entry (copies) |
| Read | batch_read | keys: list[bytes] | list[Optional[bytes]] | Read multiple entries |
| Read | batch_read_structured | data: dict or list[dict] | Same structure | Read with structure preservation |
| Delete | delete | key: bytes | None | Mark key as deleted |
| Query | exists | key: bytes | bool | Check key existence |
| Query | __contains__ | key: bytes | bool | Python in operator |
| Info | __len__ | N/A | int | Active entry count |
| Info | is_empty | N/A | bool | Check if empty |
| Info | file_size | N/A | int | Physical file size |
NamespaceHasher Reference
| Method | Parameters | Return | Description |
|---|---|---|---|
__init__ | prefix: bytes | N/A | Initialize namespace |
namespace | key: bytes | bytes | Generate 16-byte namespaced key |
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219
Building Python Bindings
Relevant source files
- experiments/bindings/python-ws-client/README.md
- experiments/bindings/python-ws-client/extract_readme_tests.py
- experiments/bindings/python-ws-client/integration_test.sh
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
- experiments/bindings/python-ws-client/uv.lock
Purpose and Scope
This page documents the build system and tooling required to compile the Python bindings for SIMD R Drive. It covers PyO3 integration, the Maturin build backend, installation procedures, dependency management with uv, and supported Python versions. For information about using the Python client API, see Python WebSocket Client API. For integration testing procedures, see Integration Testing.
Overview of the Build Architecture
The Python bindings are implemented in Rust using PyO3, which provides Foreign Function Interface (FFI) between Rust and Python. The build process transforms Rust code into platform-specific Python wheels using Maturin, a specialized build backend designed for Rust-Python projects.
graph TB
subgraph "Source Code"
RustSrc["Rust Implementation\nBaseDataStoreWsClient"]
PyO3Attrs["PyO3 Attributes\n#[pyclass], #[pymethods]"]
CargoToml["Cargo.toml\ncrate-type = cdylib\npyo3 dependency"]
end
subgraph "Build Configuration"
PyProject["pyproject.toml\nbuild-backend = maturin\nbindings = pyo3"]
Manifest["Package Metadata\nversion, requires-python\nclassifiers"]
end
subgraph "Build Process"
Maturin["maturin build\nor\nmaturin develop"]
RustCompiler["rustc + cargo\nCompile to .so/.dylib/.dll"]
FFILayer["PyO3 FFI Bridge\nAuto-generated C bindings"]
end
subgraph "Output Artifacts"
Wheel["Python Wheel\n.whl package"]
SharedLib["Native Library\nsimd_r_drive_ws_client.so"]
PyStub["Type Stubs\ndata_store_ws_client.pyi"]
end
RustSrc --> Maturin
PyO3Attrs --> Maturin
CargoToml --> Maturin
PyProject --> Maturin
Manifest --> Maturin
Maturin --> RustCompiler
RustCompiler --> FFILayer
FFILayer --> SharedLib
SharedLib --> Wheel
PyStub --> Wheel
Wheel -->|pip install| UserEnv["User Python Environment"]
Build Pipeline Architecture
Sources: experiments/bindings/python-ws-client/pyproject.toml:1-46 experiments/bindings/python-ws-client/README.md:26-38
PyO3 Integration
PyO3 is the Rust library that enables writing Python native modules in Rust. It provides macros and traits for exposing Rust structs and functions to Python.
PyO3 Configuration
The PyO3 bindings type is declared in the Maturin configuration:
| Configuration File | Setting | Value |
|---|---|---|
pyproject.toml | tool.maturin.bindings | "pyo3" |
pyproject.toml | tool.maturin.requires-python | ">=3.10" |
pyproject.toml | build-system.requires | ["maturin>=1.5"] |
pyproject.toml | build-system.build-backend | "maturin" |
Sources: experiments/bindings/python-ws-client/pyproject.toml:29-35
Python Module Structure
The binding layer exposes Rust types as Python classes:
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-13 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:1-63 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219
Maturin Build System
Maturin is a build tool for creating Python wheels from Rust projects. It handles cross-compilation, ABI compatibility, and packaging.
Build Commands
| Command | Purpose | When to Use |
|---|---|---|
maturin build | Compile release wheel | CI/CD pipelines, distribution |
maturin build --release | Optimized build | Performance-critical builds |
maturin develop | Development install | Local iteration, testing |
maturin develop --release | Fast local testing | Performance validation |
Maturin Configuration Details
Sources: experiments/bindings/python-ws-client/pyproject.toml:29-35 experiments/bindings/python-ws-client/README.md:31-38
Installation Methods
Method 1: Install from PyPI (Production)
This installs a pre-built wheel from PyPI via `pip install simd-r-drive-ws-client`; no Rust toolchain is required.
Sources: experiments/bindings/python-ws-client/README.md:26-29
Method 2: Build from Source (Development)
Building from source requires the Rust toolchain and Maturin. When invoking Maturin from outside the package directory, the `-m` flag specifies the manifest path (the crate's `Cargo.toml`).
Sources: experiments/bindings/python-ws-client/README.md:31-36
Method 3: Development Installation with uv
Using uv for dependency management, the package is installed in editable mode together with its development dependencies via `uv pip install -e . --group dev`, as done in `integration_test.sh`.
Sources: experiments/bindings/python-ws-client/integration_test.sh:70-77
Installation Flow Comparison
Sources: experiments/bindings/python-ws-client/README.md:25-38 experiments/bindings/python-ws-client/integration_test.sh:70-77
Dependency Management with uv
The project uses uv, a fast Python package installer and resolver, for development dependencies. Unlike traditional requirements.txt or setup.py, dependencies are declared in pyproject.toml using PEP 735 dependency groups.
Dependency Group Structure
| Group | Purpose | Dependencies |
|---|---|---|
dev | Development tools | maturin, mypy, numpy, pytest, pytest-benchmark, pytest-order, puccinialin |
Sources: experiments/bindings/python-ws-client/pyproject.toml:37-46
uv Lockfile
The uv.lock file pins exact versions and hashes for reproducible builds. This lockfile includes:
- Direct dependencies (e.g., `maturin>=1.8.7`)
- Transitive dependencies (e.g., `certifi`, `httpcore`)
- Platform-specific wheels
- Python version markers
Sources: experiments/bindings/python-ws-client/uv.lock:1-1036
uv Workflow in CI/Testing
The integration test script demonstrates the uv workflow: it creates a virtual environment with `uv venv`, installs dependencies with `uv pip install`, and runs the README extraction script and pytest through `uv run`.
Sources: experiments/bindings/python-ws-client/integration_test.sh:62-87
Supported Python Versions and Platforms
Python Version Matrix
The bindings support Python 3.10 through 3.13 (CPython only):
| Python Version | Support Status | Notes |
|---|---|---|
| 3.10 | ✓ Supported | Minimum version |
| 3.11 | ✓ Supported | Full support |
| 3.12 | ✓ Supported | Full support |
| 3.13 | ✓ Supported | Latest tested |
| PyPy | ✗ Not supported | CPython-only due to PyO3 limitations |
Sources: experiments/bindings/python-ws-client/pyproject.toml:7-24 experiments/bindings/python-ws-client/README.md:17-24
Platform Support
| Platform | Architecture | Status | Notes |
|---|---|---|---|
| Linux | x86_64 | ✓ Supported | Primary platform |
| Linux | aarch64 | ✓ Supported | ARM64 support |
| macOS | x86_64 | ✓ Supported | Intel Macs |
| macOS | arm64 | ✓ Supported | Apple Silicon |
| Windows | x86_64 | ✓ Supported | 64-bit only |
Sources: experiments/bindings/python-ws-client/pyproject.toml:25-27 experiments/bindings/python-ws-client/uv.lock:117-129
Version Constraints in pyproject.toml
The project metadata declares supported versions for PyPI classifiers:
Programming Language :: Python :: 3.10
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12
Programming Language :: Python :: 3.13
Sources: experiments/bindings/python-ws-client/pyproject.toml:21-24
Build Process Deep Dive
File Transformation Pipeline
Sources: experiments/bindings/python-ws-client/pyproject.toml:1-46
Build Artifacts Location
After maturin develop, the installed package structure looks like:
site-packages/
├── simd_r_drive_ws_client/
│ ├── __init__.py # Python wrapper
│ ├── data_store_ws_client.py # Extended API
│ ├── data_store_ws_client.pyi # Type stubs
│ ├── simd_r_drive_ws_client.so # Native library (Linux)
│ └── simd_r_drive_ws_client.cpython-*.so
└── simd_r_drive_ws_client-0.11.1.dist-info/
├── METADATA # Package metadata
├── RECORD # File integrity
└── WHEEL # Wheel format version
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-13 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:1-63 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219
CI/CD Integration
The CI build recipe referenced in the README demonstrates automated wheel building. The GitHub Actions workflow typically:
- Sets up multiple Python versions (3.10-3.13)
- Installs Rust toolchain
- Installs Maturin
- Builds wheels for all platforms using `maturin build --release`
- Uploads wheels as artifacts
For full details, see the CI/CD Pipeline documentation.
Sources: experiments/bindings/python-ws-client/README.md38
Troubleshooting Common Build Issues
| Issue | Cause | Solution |
|---|---|---|
maturin: command not found | Maturin not installed | pip install maturin or use uv pip install maturin |
rustc: command not found | Rust toolchain missing | Install from https://rustup.rs |
error: Python version mismatch | Wrong Python version active | Ensure Python ≥3.10 in path |
ImportError: cannot import name | Stale build | rm -rf target/ then rebuild |
ModuleNotFoundError: No module named 'simd_r_drive_ws_client' | Not installed | Run maturin develop or pip install -e . |
Sources: experiments/bindings/python-ws-client/integration_test.sh:62-68
Integration Testing
Relevant source files
- experiments/bindings/python-ws-client/README.md
- experiments/bindings/python-ws-client/extract_readme_tests.py
- experiments/bindings/python-ws-client/integration_test.sh
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
- experiments/bindings/python-ws-client/uv.lock
Purpose and Scope
This document describes the integration testing infrastructure for the Python WebSocket client bindings. The testing system validates end-to-end functionality by orchestrating a complete test environment: starting a SIMD R Drive WebSocket server, extracting test cases from documentation, and executing them against the live server.
For information about the Python client API itself, see Python WebSocket Client API. For build system details, see Building Python Bindings.
Integration Test Architecture
The integration testing system consists of three primary components: a shell orchestrator that manages the test lifecycle, a Python script that extracts executable examples from documentation, and the pytest framework that executes the tests.
Sources: experiments/bindings/python-ws-client/integration_test.sh:1-90 experiments/bindings/python-ws-client/extract_readme_tests.py:1-45
graph TB
subgraph "Test Orchestration"
Script["integration_test.sh\nBash Orchestrator"]
Cleanup["cleanup()\nTrap Handler"]
end
subgraph "Server Management"
CargoRun["cargo run\n--package simd-r-drive-ws-server"]
Server["simd-r-drive-ws-server\nProcess (PID tracked)"]
Storage["/tmp/simd-r-drive-pytest-storage.bin"]
end
subgraph "Test Generation"
ExtractScript["extract_readme_tests.py"]
ReadmeMd["README.md\nPython code blocks"]
TestFile["tests/test_readme_blocks.py\nGenerated pytest functions"]
end
subgraph "Test Execution"
UvRun["uv run pytest"]
Pytest["pytest framework"]
TestCases["test_readme_block_0()\ntest_readme_block_1()\n..."]
end
subgraph "Client Under Test"
Client["DataStoreWsClient"]
WsConnection["WebSocket Connection\n127.0.0.1:34129"]
end
Script --> CargoRun
CargoRun --> Server
Server --> Storage
Script --> ExtractScript
ExtractScript --> ReadmeMd
ReadmeMd --> TestFile
Script --> UvRun
UvRun --> Pytest
Pytest --> TestCases
TestCases --> Client
Client --> WsConnection
WsConnection --> Server
Script --> Cleanup
Cleanup -.->|kill -9 -$SERVER_PID| Server
Cleanup -.->|rm -f| Storage
Test Workflow Orchestration
The integration_test.sh script manages the complete test lifecycle through a series of coordinated steps. The script implements robust cleanup handling using shell traps to ensure resources are released even if tests fail.
sequenceDiagram
participant Script as integration_test.sh
participant Trap as cleanup() trap
participant Cargo as cargo run
participant Server as simd-r-drive-ws-server
participant UV as uv (Python)
participant Extract as extract_readme_tests.py
participant Pytest as pytest
Script->>Script: set -e (fail on error)
Script->>Trap: trap cleanup EXIT
Script->>Script: cd experiments/
Script->>Cargo: cargo run --package simd-r-drive-ws-server
Cargo->>Server: start background process
Server-->>Script: return PID
Script->>Script: SERVER_PID=$!
Script->>UV: uv venv
Script->>UV: uv pip install pytest maturin
Script->>UV: uv pip install -e . --group dev
Script->>Extract: uv run extract_readme_tests.py
Extract-->>Script: generate test_readme_blocks.py
Script->>Script: export TEST_SERVER_HOST/PORT
Script->>Pytest: uv run pytest -v -s
Pytest->>Server: test connections
Server-->>Pytest: responses
Pytest-->>Script: test results
Script->>Trap: EXIT signal
Trap->>Server: kill -9 -$SERVER_PID
Trap->>Script: rm -f /tmp/simd-r-drive-pytest-storage.bin
Workflow Sequence
Sources: experiments/bindings/python-ws-client/integration_test.sh:1-90
Configuration Variables
The script defines several configuration variables at the top for easy modification:
| Variable | Default Value | Purpose |
|---|---|---|
EXPERIMENTS_DIR_REL_PATH | ../../ | Relative path to experiments root |
SERVER_PACKAGE_NAME | simd-r-drive-ws-server | Cargo package name |
STORAGE_FILE | /tmp/simd-r-drive-pytest-storage.bin | Temporary storage file path |
SERVER_HOST | 127.0.0.1 | Server bind address |
SERVER_PORT | 34129 | Server listen port |
SERVER_PID | (runtime) | Process ID for cleanup |
Sources: experiments/bindings/python-ws-client/integration_test.sh:8-15
Cleanup Mechanism
The cleanup function ensures proper resource release regardless of test outcome. It uses process group termination to kill the server and all child processes:
graph LR
Exit["Script Exit\n(success or failure)"]
Trap["trap cleanup EXIT"]
Cleanup["cleanup()
function"]
KillServer["kill -9 -$SERVER_PID"]
RemoveFile["rm -f $STORAGE_FILE"]
Exit --> Trap
Trap --> Cleanup
Cleanup --> KillServer
Cleanup --> RemoveFile
The kill -9 "-$SERVER_PID" syntax terminates the entire process group (note the minus sign prefix), ensuring the server and any spawned threads are fully stopped. The 2>/dev/null || true pattern prevents errors if the process already exited.
Sources: experiments/bindings/python-ws-client/integration_test.sh:17-30
graph TB
ReadmeMd["README.md"]
ExtractScript["extract_readme_tests.py"]
subgraph "Extraction Functions"
ExtractBlocks["extract_python_blocks()\nregex: ```python...```"]
StripNonAscii["strip_non_ascii()\nRemove non-ASCII chars"]
WrapTest["wrap_as_test_fn()\nGenerate pytest function"]
end
subgraph "Generated Output"
TestFile["tests/test_readme_blocks.py"]
TestFn0["def test_readme_block_0():\n # extracted code"]
TestFn1["def test_readme_block_1():\n # extracted code"]
TestFnN["def test_readme_block_N():\n # extracted code"]
end
ReadmeMd --> ExtractScript
ExtractScript --> ExtractBlocks
ExtractBlocks --> StripNonAscii
StripNonAscii --> WrapTest
WrapTest --> TestFile
TestFile --> TestFn0
TestFile --> TestFn1
TestFile --> TestFnN
README Test Extraction
The extract_readme_tests.py script automates the conversion of documentation examples into executable test cases. This approach ensures that code examples in the README remain accurate and functional.
Extraction Process
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:1-45
Extraction Implementation
The extraction process follows three steps:
- Pattern Matching: Uses a regex to find all fenced python code blocks
  - Pattern: r"```python\n(.*?)```" with the re.DOTALL flag
  - Returns a list of code block strings
- Text Sanitization: Removes non-ASCII characters to prevent encoding issues
  - Uses encode("ascii", errors="ignore").decode("ascii")
  - Ensures clean Python code for pytest execution
- Test Function Generation: Wraps each code block in a pytest-compatible function
  - Function naming: test_readme_block_{idx}()
  - Indentation: 4 spaces for the function body
  - Empty blocks generate a pass statement
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:21-34
Example Transformation
Given a README code block:
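For illustration, suppose the README contains a block like the following (hypothetical content):

```python
from simd_r_drive_ws_client import DataStoreWsClient

client = DataStoreWsClient("127.0.0.1", 34129)
client.write(b"greeting", b"hello")
assert client.read(b"greeting") == b"hello"
```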
The script generates:
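The generated test file would then contain roughly the following (the function name follows the `test_readme_block_{idx}` convention; the exact header may differ):

```python
# tests/test_readme_blocks.py -- auto-generated from README.md
import pytest


def test_readme_block_0():
    from simd_r_drive_ws_client import DataStoreWsClient

    client = DataStoreWsClient("127.0.0.1", 34129)
    client.write(b"greeting", b"hello")
    assert client.read(b"greeting") == b"hello"
```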
Sources: experiments/bindings/python-ws-client/README.md:42-51 experiments/bindings/python-ws-client/extract_readme_tests.py:30-34
Test Environment Setup
The integration test uses uv as the Python package manager for fast, reliable dependency management. The environment setup occurs after server startup but before test execution.
Environment Setup Sequence
| Step | Command | Purpose |
|---|---|---|
| 1 | uv venv | Create isolated virtual environment |
| 2 | uv pip install pytest maturin | Install core test dependencies |
| 3 | uv pip install -e . --group dev | Install package in editable mode with dev dependencies |
| 4 | uv run extract_readme_tests.py | Generate test cases from README |
| 5 | export TEST_SERVER_HOST/PORT | Set environment variables for tests |
| 6 | uv run pytest -v -s | Execute test suite |
Sources: experiments/bindings/python-ws-client/integration_test.sh:62-87
Development Dependencies
The project defines development dependencies in pyproject.toml under the [dependency-groups] section:
| Dependency | Version Constraint | Purpose |
|---|---|---|
maturin | >=1.8.7 | Build Rust-Python bindings |
mypy | >=1.16.1 | Static type checking |
numpy | >=2.2.6 | Array operations (test fixtures) |
puccinialin | >=0.1.5 | Test utilities |
pytest | >=8.4.1 | Test framework |
pytest-benchmark | >=5.1.0 | Performance benchmarking |
pytest-order | >=1.3.0 | Test execution ordering |
Sources: experiments/bindings/python-ws-client/pyproject.toml:37-46
graph LR
Script["integration_test.sh"]
SetM["set -m\n(enable job control)"]
CargoRun["cargo run --package simd-r-drive-ws-server\n-- /tmp/storage.bin --host 127.0.0.1 --port 34129 &"]
CapturePid["SERVER_PID=$!"]
UnsetM["set +m\n(disable job control)"]
Script --> SetM
SetM --> CargoRun
CargoRun --> CapturePid
CapturePid --> UnsetM
Server Lifecycle Management
The integration test script manages the server as a background process with careful PID tracking and cleanup handling.
Server Startup
The server starts with the following configuration:
- Package: `simd-r-drive-ws-server` (built via cargo)
- Storage File: `/tmp/simd-r-drive-pytest-storage.bin`
- Host: `127.0.0.1` (localhost only)
- Port: `34129` (test-specific port)
- Mode: Background process (`&` suffix)
The set -m and set +m wrapper enables job control temporarily, ensuring the shell correctly captures the background process PID with $!.
Sources: experiments/bindings/python-ws-client/integration_test.sh:47-56
Server Shutdown
The cleanup trap ensures the server terminates cleanly:
- Check PID Exists: `[[ ! -z "$SERVER_PID" ]]`
- Kill Process Group: `kill -9 "-$SERVER_PID"`
  - The `-` prefix kills the entire process group
  - Signal `-9` (SIGKILL) ensures immediate termination
- Ignore Errors: `2>/dev/null || true`
  - Prevents script failure if the process already exited
- Remove Storage: `rm -f "$STORAGE_FILE"`
  - Cleans up temporary test data
Sources: experiments/bindings/python-ws-client/integration_test.sh:19-29
Test Execution Environment
The test suite runs with specific environment variables that configure the client connection parameters. These variables bridge the shell orchestration layer with the Python test code.
Environment Variable Injection
The script exports the server configuration to the test environment as the TEST_SERVER_HOST and TEST_SERVER_PORT variables.
Python tests can access these via os.environ:
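A sketch of the pattern a test might use (the fallback values mirror the script's defaults; they are not part of the client API):

```python
import os

from simd_r_drive_ws_client import DataStoreWsClient

# Exported by integration_test.sh before pytest runs.
host = os.environ.get("TEST_SERVER_HOST", "127.0.0.1")
port = int(os.environ.get("TEST_SERVER_PORT", "34129"))

client = DataStoreWsClient(host, port)
```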
Sources: experiments/bindings/python-ws-client/integration_test.sh:83-85
Pytest Execution Flags
The test suite runs with specific pytest flags:
- `-v`: Verbose output (shows individual test names)
- `-s`: No output capture (displays print statements immediately)
This configuration aids debugging by providing real-time test output.
Sources: experiments/bindings/python-ws-client/integration_test.sh87
graph TB
Main["main()
entry point"]
ReadFile["README.read_text()"]
Extract["extract_python_blocks()"]
Generate["List comprehension:\n[wrap_as_test_fn(code, i)\nfor i, code in enumerate(blocks)]"]
WriteFile["TEST_FILE.write_text()"]
Main --> ReadFile
ReadFile --> Extract
Extract --> Generate
Generate --> WriteFile
subgraph "File Structure"
Header["# Auto-generated from README.md\nimport pytest"]
TestFunctions["test_readme_block_0()\ntest_readme_block_1()\n..."]
end
WriteFile --> Header
WriteFile --> TestFunctions
Test Case Structure
Generated test cases follow a consistent structure based on the README extraction pattern. Each test becomes an isolated function within the generated test file.
Test File Generation
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:36-42
Test Isolation
Each extracted code block becomes an independent test function:
- Naming Convention: `test_readme_block_{idx}()` where `idx` is zero-based
- Isolation: Each function has its own scope
- Independence: Tests can run in any order (no shared state)
- Clarity: Test number maps directly to README code block order
This pattern enables pytest to discover and execute each documentation example as a separate test case, providing clear pass/fail feedback for each code snippet.
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:30-34
Dependencies and Platform Requirements
The integration testing system requires specific tools and versions to function correctly.
Required Tools
| Tool | Purpose | Detection Method |
|---|---|---|
uv | Python package management | command -v uv &> /dev/null |
cargo | Rust build system | Implicit (via cargo run) |
bash | Shell scripting | Shebang: #!/bin/bash |
pytest | Test execution | Installed via uv pip install |
Sources: experiments/bindings/python-ws-client/integration_test.sh:62-68 experiments/bindings/python-ws-client/integration_test.sh1
Python Version Support
The client supports Python 3.10 through 3.13 (CPython only):
| Python Version | Support Status | Notes |
|---|---|---|
| 3.10 | ✓ Supported | Minimum version |
| 3.11 | ✓ Supported | Full compatibility |
| 3.12 | ✓ Supported | Full compatibility |
| 3.13 | ✓ Supported | Latest tested |
Note : NumPy version constraints differ by Python version. The project uses numpy>=2.2.6 for Python 3.10 compatibility, as NumPy 2.3.0+ dropped Python 3.10 support.
Sources: experiments/bindings/python-ws-client/pyproject.toml7 experiments/bindings/python-ws-client/pyproject.toml:21-24 experiments/bindings/python-ws-client/pyproject.toml41
Operating System Support
The integration tests target POSIX-compliant systems:
- Linux : Primary development platform (Ubuntu, Debian, etc.)
- macOS : Full support (Darwin)
- Windows : Supported but uses different process management
The shell script uses POSIX-compatible syntax ([[ ]], $!, trap) that works across these platforms with appropriate shells.
Sources: experiments/bindings/python-ws-client/integration_test.sh1 experiments/bindings/python-ws-client/pyproject.toml:25-26
Running the Integration Tests
Execution Command
Run the `integration_test.sh` script (located at `experiments/bindings/python-ws-client/`) either by path from the repository root or directly from within the Python client directory.
Expected Output Sequence
- Script initialization and configuration
- Server compilation (if needed) and startup
- Python environment creation
- Dependency installation
- README test extraction
- Pytest test execution with detailed output
- Automatic cleanup on exit
Troubleshooting
Common failure scenarios:
| Issue | Symptom | Solution |
|---|---|---|
| uv not found | Script exits at line 66 | Install uv: `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Port already in use | Server fails to start | Change SERVER_PORT variable or kill existing process |
| Storage file locked | Permission denied | Ensure /tmp is writable and no processes hold locks |
| Cargo build fails | Compilation errors | Run cargo clean and retry |
Sources: experiments/bindings/python-ws-client/integration_test.sh:62-68
Performance Optimizations
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- Cargo.lock
- README.md
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
Purpose and Scope
This document provides a comprehensive guide to the performance optimization strategies employed throughout SIMD R Drive. It covers three primary optimization domains: SIMD-accelerated operations, memory alignment strategies, and access pattern optimizations.
For architectural details about the storage engine itself, see Core Storage Engine. For information about network-level optimizations in the RPC layer, see Network Layer and RPC. For details on the extension utilities, see Extensions and Utilities.
Overview of Optimization Strategies
SIMD R Drive achieves high performance through a multi-layered approach combining hardware-level optimizations with careful software design. The system leverages CPU-specific SIMD instructions, enforces strict memory alignment, and provides multiple access modes tailored to different workload patterns.
Core Performance Principles
graph TB
subgraph "Hardware Level"
SIMD["SIMD Instructions\nAVX2/NEON"]
Cache["CPU Cache Lines\n64-byte aligned"]
HWHash["Hardware Hashing\nSSE2/AVX2/Neon"]
end
subgraph "Software Level"
SIMDCopy["simd_copy\nWrite Optimization"]
Alignment["PAYLOAD_ALIGNMENT\nZero-Copy Reads"]
AccessModes["Access Modes\nSingle/Batch/Stream"]
end
subgraph "Storage Layer"
MMap["Memory-Mapped File\nmmap"]
Sequential["Sequential Writes\nAppend-Only"]
Index["XXH3 Index\nO(1)
Lookups"]
end
SIMD --> SIMDCopy
Cache --> Alignment
HWHash --> Index
SIMDCopy --> Sequential
Alignment --> MMap
AccessModes --> Sequential
AccessModes --> MMap
Sequential --> Index
Sources: README.md:249-257 README.md:52-59
Performance Metrics Summary
| Optimization Area | Technique | Benefit | Trade-off |
|---|---|---|---|
| Write Path | simd_copy with AVX2/NEON | 2-4x faster bulk copies | Requires feature detection |
| Read Path | Memory-mapped zero-copy | No allocation overhead | File size impacts virtual memory |
| Alignment | 64-byte payload boundaries | Full SIMD/cache efficiency | 0-63 bytes padding per entry |
| Indexing | XXH3_64 with hardware acceleration | Sub-microsecond key lookups | Memory overhead for index |
| Batch Writes | Single lock, multiple entries | Reduced lock contention | Delayed durability guarantees |
| Parallel Iteration | Rayon multi-threaded scan | Linear speedup with cores | Higher CPU utilization |
Sources: README.md:249-257 README.md:159-167
SIMD Write Acceleration
The simd_copy Function
SIMD R Drive uses a specialized memory copy function called simd_copy to optimize data transfer during write operations. This function detects available CPU features at runtime and selects the most efficient implementation.
Platform-Specific Implementations
graph TB
WriteOp["Write Operation"]
SIMDCopy["simd_copy\nRuntime Dispatch"]
X86Check{"x86_64 with\nAVX2?"}
ARMCheck{"aarch64?"}
AVX2["AVX2 Implementation\n_mm256_loadu_si256\n_mm256_storeu_si256\n32-byte chunks"]
NEON["NEON Implementation\nvld1q_u8\nvst1q_u8\n16-byte chunks"]
Fallback["Scalar Fallback\ncopy_from_slice\nStandard library"]
WriteOp --> SIMDCopy
SIMDCopy --> X86Check
X86Check -->|Yes| AVX2
X86Check -->|No| ARMCheck
ARMCheck -->|Yes| NEON
ARMCheck -->|No| Fallback
Sources: README.md:249-257 High-Level Diagram 6
Implementation Architecture
The simd_copy function follows this decision tree:
- x86_64 with AVX2: Uses 256-bit wide vector operations via `_mm256_loadu_si256` and `_mm256_storeu_si256`, processing 32 bytes per instruction
- aarch64: Uses NEON intrinsics via `vld1q_u8` and `vst1q_u8`, processing 16 bytes per instruction
- Fallback: Uses the standard library `copy_from_slice` for platforms without SIMD support
SIMD Copy Performance Characteristics
| Platform | Instruction Set | Chunk Size | Throughput Gain | Detection Method |
|---|---|---|---|---|
| x86_64 | AVX2 | 32 bytes | ~3-4x vs scalar | Runtime feature detection |
| x86_64 | SSE2 | 16 bytes | ~2x vs scalar | Universal on x86_64 |
| aarch64 | NEON | 16 bytes | ~2-3x vs scalar | Default on ARM64 |
| Other | Scalar | 1-8 bytes | Baseline | Always available |
Sources: README.md:249-257 High-Level Diagram 6
Integration with Write Path
Sources: README.md:249-257 High-Level Diagram 5
Runtime Feature Detection
The system performs CPU feature detection once at startup:
- x86_64: Uses the `is_x86_feature_detected!("avx2")` macro to check AVX2 availability
- aarch64: NEON is always available, so no detection is needed
- Other platforms: Immediately fall back to the scalar implementation
This detection overhead is paid once, with subsequent calls dispatching directly to the selected implementation via function pointers or conditional compilation.
Sources: README.md:249-257 High-Level Diagram 6
Payload Alignment and Cache Efficiency
64-Byte Alignment Strategy
Starting from version 0.15.0-alpha, all non-tombstone payloads begin on 64-byte aligned boundaries. This alignment matches typical CPU cache line sizes and ensures optimal SIMD operation performance.
Alignment Architecture
graph TB
subgraph "Constants Configuration"
AlignLog["PAYLOAD_ALIGN_LOG2 = 6"]
AlignConst["PAYLOAD_ALIGNMENT = 64"]
end
subgraph "On-Disk Layout"
PrevTail["Previous Entry Tail\noffset N"]
PrePad["Pre-Pad Bytes\n0-63 zero bytes"]
Payload["Aligned Payload\n64-byte boundary"]
Metadata["Entry Metadata\n20 bytes"]
end
subgraph "Hardware Alignment"
CacheLine["CPU Cache Line\n64 bytes typical"]
AVX512["AVX-512 Registers\n64 bytes wide"]
SIMDOps["SIMD Operations\nNo boundary crossing"]
end
AlignLog --> AlignConst
AlignConst --> PrePad
PrevTail --> PrePad
PrePad --> Payload
Payload --> Metadata
Payload --> CacheLine
Payload --> AVX512
Payload --> SIMDOps
Sources: README.md:52-59 CHANGELOG.md:25-51 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88
Alignment Calculation
The pre-padding for each entry is calculated using:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where:
- `prev_tail`: The file offset immediately after the previous entry's metadata
- `PAYLOAD_ALIGNMENT`: 64 (configurable constant)
- `&`: Bitwise AND, which keeps the result in the range [0, 63]
Example Alignment Scenarios
| Previous Tail Offset | Modulo 64 | Padding Required | Next Payload Starts At |
|---|---|---|---|
| 100 | 36 | 28 bytes | 128 |
| 128 | 0 | 0 bytes | 128 |
| 200 | 8 | 56 bytes | 256 |
| 1000 | 40 | 24 bytes | 1024 |
Sources: README.md:114-138
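A small sketch (written in Python purely for illustration; the engine itself computes this in Rust) that reproduces the padding values in the table above:

```python
PAYLOAD_ALIGNMENT = 64  # matches the constant described above

def pre_pad(prev_tail: int) -> int:
    """Zero bytes inserted so the next payload starts on a 64-byte boundary."""
    return (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)

for prev_tail in (100, 128, 200, 1000):
    pad = pre_pad(prev_tail)
    print(prev_tail, pad, prev_tail + pad)  # e.g. 100 -> 28 -> 128
```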
graph LR
EntryHandle["EntryHandle"]
DebugAssert["debug_assert_aligned\nptr, align"]
CheckPower["Check align is\npower of two"]
CheckAddr["Check ptr & align-1 == 0"]
OK["Continue"]
Panic["Debug Panic"]
EntryHandle --> DebugAssert
DebugAssert --> CheckPower
CheckPower --> CheckAddr
CheckAddr -->|Aligned| OK
CheckAddr -->|Misaligned| Panic
Alignment Validation
Debug and test builds include alignment assertions to catch violations early:
Pointer Alignment Assertion
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:25-43
Offset Alignment Assertion
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88
These assertions compile to no-ops in release builds, providing zero-cost validation during development.
Evolution from v0.14 to v0.15
Alignment Comparison
| Version | Default Alignment | Cache Line Fit | AVX-512 Compatible | Padding Overhead |
|---|---|---|---|---|
| ≤ 0.13.x | No alignment | No guarantee | No | 0 bytes |
| 0.14.x | 16 bytes | Partial | No | 0-15 bytes |
| 0.15.x | 64 bytes | Complete | Yes | 0-63 bytes |
Key Changes in v0.15.0-alpha:
- Increased alignment from 16 to 64 bytes: Better cache efficiency and full AVX-512 support
- Added alignment validation: Debug assertions in `debug_assert_aligned.rs`
- Updated constants: `PAYLOAD_ALIGN_LOG2 = 6` and `PAYLOAD_ALIGNMENT = 64`
- Breaking compatibility: Files written by v0.15.x cannot be read by v0.14.x or earlier
Sources: CHANGELOG.md:25-51
Migration Considerations
Migration Path from v0.14 to v0.15:
- Read all entries using v0.14.x-compatible binary
- Create new storage with v0.15.x binary
- Write entries to new storage with automatic 64-byte alignment
- Verify integrity using read operations
- Replace old file after successful verification
Multi-Service Deployment Strategy:
graph TB
OldReader["v0.14.x Readers"]
NewReader["v0.15.x Readers"]
OldWriter["v0.14.x Writers"]
NewWriter["v0.15.x Writers"]
OldStore["v0.14 Storage\n16-byte aligned"]
NewStore["v0.15 Storage\n64-byte aligned"]
Step1["Step 1: Deploy v0.15 readers\nwhile writers still v0.14"]
Step2["Step 2: Migrate storage files\nto v0.15 format"]
Step3["Step 3: Deploy v0.15 writers"]
OldReader --> Step1
Step1 --> NewReader
OldStore --> Step2
Step2 --> NewStore
OldWriter --> Step3
Step3 --> NewWriter
NewReader -.->|Can read| OldStore
NewReader -.->|Can read| NewStore
NewWriter -.->|Writes to| NewStore
Sources: CHANGELOG.md:43-51
Zero-Copy Benefits of Alignment
Proper alignment enables zero-copy reinterpretation of payload bytes as typed slices:
Safe Reinterpretation Table
| Payload Size | Aligned To | Can Cast To | Zero-Copy | Notes |
|---|---|---|---|---|
| 64 bytes | 64 bytes | &[u8; 64] | ✅ Yes | Direct array reference |
| 128 bytes | 64 bytes | &[u16; 64] | ✅ Yes | u16 requires 2-byte alignment |
| 256 bytes | 64 bytes | &[u32; 64] | ✅ Yes | u32 requires 4-byte alignment |
| 512 bytes | 64 bytes | &[u64; 64] | ✅ Yes | u64 requires 8-byte alignment |
| 1024 bytes | 64 bytes | &[u128; 64] | ✅ Yes | u128 requires 16-byte alignment |
| Variable | 64 bytes | Custom structs | ⚠️ Maybe | Depends on struct alignment requirements |
Sources: README.md:52-59
graph TB
subgraph "Single Entry Write"
SingleAPI["write key, payload"]
SingleLock["Acquire write lock"]
SingleWrite["Write one entry"]
SingleFlush["flush"]
SingleUnlock["Release lock"]
end
subgraph "Batch Entry Write"
BatchAPI["batch_write entries"]
BatchLock["Acquire write lock"]
BatchLoop["Write N entries\nin locked section"]
BatchFlush["flush once"]
BatchUnlock["Release lock"]
end
subgraph "Streaming Write"
StreamAPI["write_stream key, reader"]
StreamLock["Acquire write lock"]
StreamChunk["Read + write chunks\nincrementally"]
StreamFlush["flush"]
StreamUnlock["Release lock"]
end
SingleAPI --> SingleLock
SingleLock --> SingleWrite
SingleWrite --> SingleFlush
SingleFlush --> SingleUnlock
BatchAPI --> BatchLock
BatchLock --> BatchLoop
BatchLoop --> BatchFlush
BatchFlush --> BatchUnlock
StreamAPI --> StreamLock
StreamLock --> StreamChunk
StreamChunk --> StreamFlush
StreamFlush --> StreamUnlock
Write and Read Mode Performance
Write Mode Comparison
SIMD R Drive provides three write modes optimized for different workload patterns:
Write Mode Architecture
Sources: README.md:208-223
Write Mode Performance Characteristics
| Write Mode | Lock Acquisitions | Flush Operations | Memory Usage | Best For |
|---|---|---|---|---|
| Single Entry | N (one per entry) | N (one per entry) | O(1) per entry | Low-latency individual writes |
| Batch Entry | 1 (for all entries) | 1 (at batch end) | O(batch size) | High-throughput bulk inserts |
| Streaming | 1 (entire stream) | 1 (at stream end) | O(chunk size) | Large payloads, memory-constrained |
Lock Contention Analysis:
Sources: README.md:208-223
graph TB
subgraph "Direct Memory Access"
DirectAPI["read key"]
DirectIndex["Index lookup O(1)"]
DirectMmap["Memory-mapped view"]
DirectHandle["EntryHandle\nzero-copy slice"]
end
subgraph "Streaming Read"
StreamAPI["read_stream key"]
StreamIndex["Index lookup O(1)"]
StreamBuffer["Buffered reader\n4KB chunks"]
StreamInc["Incremental processing"]
end
subgraph "Parallel Iteration"
ParAPI["par_iter_entries"]
ParScan["Full file scan"]
ParRayon["Rayon thread pool"]
ParProcess["Multi-core processing"]
end
DirectAPI --> DirectIndex
DirectIndex --> DirectMmap
DirectMmap --> DirectHandle
StreamAPI --> StreamIndex
StreamIndex --> StreamBuffer
StreamBuffer --> StreamInc
ParAPI --> ParScan
ParScan --> ParRayon
ParRayon --> ParProcess
Read Mode Comparison
Three read modes provide different trade-offs between performance, memory usage, and access patterns:
Read Mode Architecture
Sources: README.md:224-247
Read Mode Performance Characteristics
| Read Mode | Memory Overhead | Latency | Throughput | Copy Operations | Best For |
|---|---|---|---|---|---|
| Direct Access | O(1) mapping overhead | ~100ns | Very high | Zero | Random access, small reads |
| Streaming | O(buffer size) | ~1-10µs | High | Non-zero-copy | Large entries, memory-constrained |
| Parallel Iteration | O(thread count) | N/A | Scales with cores | Zero per entry | Bulk processing, analytics |
Memory Access Patterns:
Sources: README.md:224-247
Parallel Iteration Performance
The parallel iteration feature (requires parallel feature flag) leverages Rayon for multi-threaded scanning:
Parallel Iteration Scaling
| Core Count | Sequential Time | Parallel Time | Speedup | Efficiency |
|---|---|---|---|---|
| 1 core | 1.0s | 1.0s | 1.0x | 100% |
| 2 cores | 1.0s | 0.52s | 1.92x | 96% |
| 4 cores | 1.0s | 0.27s | 3.70x | 93% |
| 8 cores | 1.0s | 0.15s | 6.67x | 83% |
| 16 cores | 1.0s | 0.09s | 11.1x | 69% |
Note: Actual performance depends on entry size distribution, storage device speed, and working set size relative to CPU cache.
Sources: README.md:242-247
graph TB
subgraph "x86_64 Hashing"
X86Key["Key bytes"]
X86SSE2["SSE2 Instructions\nUniversal on x86_64"]
X86AVX2["AVX2 Instructions\nIf available"]
X86Hash["64-bit hash"]
end
subgraph "aarch64 Hashing"
ARMKey["Key bytes"]
ARMNeon["Neon Instructions\nDefault on ARM64"]
ARMHash["64-bit hash"]
end
subgraph "Index Operations"
HashVal["u64 hash value"]
HashMap["DashMap index"]
Lookup["O(1)
lookup"]
end
X86Key --> X86SSE2
X86SSE2 -.->|If available| X86AVX2
X86AVX2 --> X86Hash
X86SSE2 --> X86Hash
ARMKey --> ARMNeon
ARMNeon --> ARMHash
X86Hash --> HashVal
ARMHash --> HashVal
HashVal --> HashMap
HashMap --> Lookup
Hardware-Accelerated Hashing
XXH3_64 Hashing Algorithm
SIMD R Drive uses the xxhash-rust implementation of XXH3_64 for all key hashing operations. This algorithm provides hardware acceleration on multiple platforms.
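For illustration, hashing a key with that dependency looks like the snippet below; it assumes the xxh3 feature of xxhash-rust is enabled (as in the workspace dependency list), and how DataStore wires the hash into its index is not shown here:

```rust
use xxhash_rust::xxh3::xxh3_64;

fn main() {
    // Hash the key once; the resulting u64 is what the index stores.
    let key = b"user:42:profile";
    let key_hash: u64 = xxh3_64(key);
    println!("key hash = {key_hash:#018x}");
}
```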
Hashing Performance by Platform
Sources: README.md:159-167 Cargo.lock:1385-1404
Hashing Throughput Benchmarks
| Platform | SIMD Features | Throughput (GB/s) | Latency (ns/hash) | Relative Performance |
|---|---|---|---|---|
| x86_64 | AVX2 | ~25-35 | ~30-50 | 1.4-1.8x vs SSE2 |
| x86_64 | SSE2 only | ~15-20 | ~50-80 | Baseline |
| aarch64 | Neon | ~20-28 | ~40-60 | 1.2-1.5x vs scalar |
| Other | Scalar | ~8-12 | ~100-150 | Reference |
Note: Benchmarks are approximate and vary based on CPU generation, key size, and system load.
Sources: README.md:159-167
sequenceDiagram
participant App as "Application"
participant DS as "DataStore"
participant Hash as "XXH3_64"
participant Index as "DashMap\nkey_indexer"
participant MMap as "Mmap\nmemory-mapped file"
App->>DS: read(key)
DS->>Hash: hash(key)
Hash-->>DS: u64 hash
DS->>Index: get(hash)
alt "Hash found"
Index-->>DS: (tag, offset)
DS->>DS: Verify tag
DS->>MMap: Create EntryHandle\nat offset
MMap-->>DS: EntryHandle
DS-->>App: Payload slice
else "Hash not found"
Index-->>DS: None
DS-->>App: Error: not found
end
Index Structure Performance
The key index uses DashMap (a concurrent hash map) for thread-safe lookups:
Index Lookup Flow
Sources: README.md:159-167 High-Level Diagram 2
Index Memory Overhead
| Entry Count | Index Size | Overhead per Entry | Total Overhead (1M entries) |
|---|---|---|---|
| 1,000 | ~32 KB | ~32 bytes | ~32 MB |
| 10,000 | ~320 KB | ~32 bytes | ~32 MB |
| 100,000 | ~3.2 MB | ~32 bytes | ~32 MB |
| 1,000,000 | ~32 MB | ~32 bytes | ~32 MB |
The index overhead is proportional to the number of unique keys, not the total file size, making it memory-efficient even for large datasets.
Sources: README.md:159-167
Performance Tuning Guidelines
Choosing the Right Write Mode
Decision Matrix:
Sources: README.md:208-223
Choosing the Right Read Mode
Decision Matrix:
Sources: README.md:224-247
System-Level Optimizations
Operating System Tuning:
| Parameter | Recommended Value | Impact | Notes |
|---|---|---|---|
vm.max_map_count | ≥ 262144 | Enables large mmap regions | Linux only |
| File system | ext4, XFS, APFS | Better mmap performance | Avoid network filesystems |
| Huge pages | Transparent enabled | Reduces TLB misses | Linux kernel config |
| I/O scheduler | none or mq-deadline | Lower write latency | For SSD/NVMe devices |
Hardware Considerations:
| Component | Recommendation | Reason |
|---|---|---|
| CPU | AVX2 or ARM Neon | SIMD acceleration |
| RAM | ≥ 8GB + working set | Effective mmap caching |
| Storage | NVMe SSD | Sequential write speed |
| Cache | L3 ≥ 8MB | Better index hit rate |
Sources: README.md:249-257 README.md:159-167
Benchmarking and Profiling
Key Performance Indicators
Write Performance Metrics:
| Metric | Target | Measurement Method |
|---|---|---|
| Single write latency | < 10 µs | P50, P95, P99 percentiles |
| Batch write throughput | > 100 MB/s | Bytes written per second |
| Lock contention ratio | < 5% | Time waiting / total time |
| Flush overhead | < 1 ms | fsync() call duration |
Read Performance Metrics:
| Metric | Target | Measurement Method |
|---|---|---|
| Index lookup latency | < 100 ns | P50, P95, P99 percentiles |
| Memory access latency | < 1 µs | Time to first byte |
| Cache hit rate | > 95% | OS page cache statistics |
| Parallel speedup | > 0.8 × cores | Amdahl's law efficiency |
Sources: README.md:159-167 .github/workflows/rust-lint.yml:1-44
Profiling Tools
Recommended profiling workflow:
- cargo-criterion : For micro-benchmarks of individual operations
- perf (Linux) or Instruments (macOS): For system-level profiling
- flamegraph : For visualizing hot code paths
- valgrind/cachegrind : For cache miss analysis
Sources: Cargo.lock:607-627
Summary
SIMD R Drive achieves high performance through:
- SIMD acceleration via simd_copy with AVX2/NEON implementations for write operations
- 64-byte payload alignment ensuring cache-line efficiency and zero-copy access
- Hardware-accelerated XXH3_64 hashing for O(1) index lookups
- Multiple access modes optimized for different workload patterns
- Memory-mapped zero-copy reads eliminating deserialization overhead
The system balances these optimizations to provide consistent, predictable performance across diverse workloads while maintaining data integrity and thread safety.
Sources: README.md:249-257 README.md:52-59 README.md:159-167 CHANGELOG.md:25-51
SIMD Acceleration
Relevant source files
- README.md
- benches/storage_benchmark.rs
- experiments/bindings/python_(old_client)/pyproject.toml
- src/main.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
Purpose and Scope
This page documents the SIMD (Single Instruction, Multiple Data) optimizations implemented in SIMD R Drive. The storage engine uses SIMD acceleration in two critical areas: write operations (via the simd_copy function) and key hashing (via the xxh3_64 algorithm).
SIMD is not used for read operations, which rely on zero-copy memory-mapped access for optimal performance. For information about zero-copy reads and memory management, see Memory Management and Zero-Copy Access. For details on the 64-byte alignment strategy that enables efficient SIMD operations, see Payload Alignment and Cache Efficiency.
Sources: README.md:248-256
Overview
SIMD R Drive leverages hardware SIMD capabilities to accelerate performance-critical operations during write workflows. The storage engine uses platform-specific SIMD instructions to minimize CPU cycles spent on memory operations, resulting in improved write throughput and indexing performance.
The SIMD acceleration strategy consists of two components:
| Component | Purpose | Platforms | Performance Impact |
|---|---|---|---|
simd_copy Function | Accelerates memory copying during write buffer staging | x86_64 (AVX2), aarch64 (NEON) | Reduces write latency by vectorizing memory transfers |
xxh3_64 Hashing | Hardware-accelerated key hashing for index operations | x86_64 (SSE2/AVX2), aarch64 (Neon) | Improves key lookup and index building performance |
The system uses runtime CPU feature detection to select the optimal implementation path and automatically falls back to scalar operations when SIMD instructions are unavailable.
Sources: README.md:248-256 High-Level Diagram 6
SIMD Copy Function
Purpose and Role in Write Operations
The simd_copy function is a specialized memory transfer routine used during write operations. When the storage engine writes data to disk, it first stages the payload in an internal buffer. The simd_copy function accelerates this staging process by using vectorized memory operations instead of standard byte-wise copying.
Diagram: SIMD Copy Function Flow in Write Operations
graph TB
WriteAPI["DataStoreWriter::write()"]
Buffer["Internal Write Buffer\n(BufWriter<File>)"]
SIMDCopy["simd_copy Function"]
CPUDetect["CPU Feature Detection\n(Runtime)"]
AVX2Path["AVX2 Implementation\n_mm256_loadu_si256\n_mm256_storeu_si256\n(32-byte chunks)"]
NEONPath["NEON Implementation\nvld1q_u8\nvst1q_u8\n(16-byte chunks)"]
ScalarPath["Scalar Fallback\ncopy_from_slice\n(Standard memcpy)"]
DiskWrite["Write to Disk\n(Buffered I/O)"]
WriteAPI -->|user payload| SIMDCopy
SIMDCopy --> CPUDetect
CPUDetect -->|x86_64 + AVX2| AVX2Path
CPUDetect -->|aarch64| NEONPath
CPUDetect -->|no SIMD| ScalarPath
AVX2Path --> Buffer
NEONPath --> Buffer
ScalarPath --> Buffer
Buffer --> DiskWrite
The simd_copy function operates transparently within the write path. Applications using the DataStoreWriter trait do not need to explicitly invoke SIMD operations—the acceleration is automatic and selected based on the host CPU capabilities.
Sources: README.md:252-253 High-Level Diagram 6
Platform-Specific Implementations
x86_64 Architecture: AVX2 Instructions
On x86_64 platforms with AVX2 support, the simd_copy function uses 256-bit Advanced Vector Extensions to process memory in 32-byte chunks. The implementation employs two primary intrinsics:
| Intrinsic | Operation | Description |
|---|---|---|
_mm256_loadu_si256 | Load | Reads 32 bytes from source memory into a 256-bit register (unaligned load) |
_mm256_storeu_si256 | Store | Writes 32 bytes from a 256-bit register to destination memory (unaligned store) |
graph LR
SrcMem["Source Memory\n(Payload Data)"]
AVX2Reg["256-bit AVX2 Register\n(__m256i)"]
DstMem["Destination Buffer\n(Write Buffer)"]
SrcMem -->|_mm256_loadu_si256 32 bytes| AVX2Reg
AVX2Reg -->|_mm256_storeu_si256 32 bytes| DstMem
The AVX2 path is activated when the CPU supports the avx2 feature flag. Runtime detection ensures that the instruction set is available before executing AVX2 code paths.
Diagram: AVX2 Memory Transfer Operation
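The loop below is a simplified, hypothetical rendition of this pattern (not the crate's actual simd_copy source): it moves 32-byte chunks with the two intrinsics from the table and finishes the tail with a scalar copy.

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn copy_avx2(dst: &mut [u8], src: &[u8]) {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};
    assert!(dst.len() >= src.len());
    let mut i = 0;
    // Move 32 bytes per iteration through a 256-bit register.
    while i + 32 <= src.len() {
        // SAFETY: i + 32 <= src.len() and dst.len() >= src.len(), so both
        // pointers stay in bounds; unaligned loads/stores are explicitly allowed.
        unsafe {
            let chunk = _mm256_loadu_si256(src.as_ptr().add(i) as *const __m256i);
            _mm256_storeu_si256(dst.as_mut_ptr().add(i) as *mut __m256i, chunk);
        }
        i += 32;
    }
    // Copy any remaining tail (< 32 bytes) with a plain scalar copy.
    dst[i..src.len()].copy_from_slice(&src[i..]);
}
```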
Sources: README.md:252-253 High-Level Diagram 6
aarch64 Architecture: NEON Instructions
On aarch64 (ARM64) platforms, the simd_copy function uses NEON (Advanced SIMD) instructions to process memory in 16-byte chunks. The implementation uses:
| Intrinsic | Operation | Description |
|---|---|---|
vld1q_u8 | Load | Reads 16 bytes from source memory into a 128-bit NEON register |
vst1q_u8 | Store | Writes 16 bytes from a 128-bit NEON register to destination memory |
graph LR
SrcMem["Source Memory\n(Payload Data)"]
NEONReg["128-bit NEON Register\n(uint8x16_t)"]
DstMem["Destination Buffer\n(Write Buffer)"]
SrcMem -->|vld1q_u8 16 bytes| NEONReg
NEONReg -->|vst1q_u8 16 bytes| DstMem
NEON is available by default on all aarch64 targets, so no runtime feature detection is required for ARM64 platforms.
Diagram: NEON Memory Transfer Operation
Sources: README.md:252-253 High-Level Diagram 6
Scalar Fallback Implementation
When SIMD instructions are unavailable (e.g., on older CPUs or unsupported architectures), the simd_copy function automatically falls back to the standard Rust copy_from_slice method. This ensures portability across all platforms while still benefiting from compiler optimizations (which may include auto-vectorization in some cases).
The fallback path provides correct functionality on all platforms but does not achieve the same throughput as explicit SIMD implementations.
Sources: README.md:252-253 High-Level Diagram 6
Runtime CPU Feature Detection
Detection Mechanism
The storage engine uses conditional compilation and runtime feature detection to select the appropriate SIMD implementation:
Diagram: SIMD Path Selection Logic
graph TB
Start["Program Execution Start"]
CompileTarget["Compile-Time Target Check"]
X86Check{"Target:\nx86_64?"}
RuntimeAVX2{"Runtime:\nAVX2 available?"}
ARMCheck{"Target:\naarch64?"}
UseAVX2["Use AVX2 Path\n(32-byte chunks)"]
UseNEON["Use NEON Path\n(16-byte chunks)"]
UseScalar["Use Scalar Fallback\n(copy_from_slice)"]
Start --> CompileTarget
CompileTarget --> X86Check
X86Check -->|Yes| RuntimeAVX2
RuntimeAVX2 -->|Yes| UseAVX2
RuntimeAVX2 -->|No| UseScalar
X86Check -->|No| ARMCheck
ARMCheck -->|Yes| UseNEON
ARMCheck -->|No| UseScalar
Feature Detection Strategy
| Platform | Detection Method | Feature Flag | Fallback Condition |
|---|---|---|---|
| x86_64 | Runtime is_x86_feature_detected!("avx2") | avx2 | AVX2 not supported → Scalar |
| aarch64 | Compile-time (default available) | N/A | N/A (NEON always present) |
| Other | Compile-time target check | N/A | Always use scalar |
The runtime detection overhead is negligible—the feature check is typically performed once during the first simd_copy invocation and the result is cached for subsequent calls.
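A hedged sketch of this selection logic follows; the real simd_copy signature and module layout may differ. copy_avx2 refers to the AVX2 loop sketched in the previous section, and in this simplified form the scalar fallback also stands in for the NEON path that the real code takes on aarch64.

```rust
/// Hypothetical dispatch sketch, not the crate's code.
pub fn simd_copy(dst: &mut [u8], src: &[u8]) {
    assert!(dst.len() >= src.len());

    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime on the line above.
            unsafe { copy_avx2(dst, src) };
            return;
        }
    }

    // aarch64 would take a NEON path here (always available, no runtime check);
    // everything else, and x86_64 CPUs without AVX2, uses the scalar fallback.
    dst[..src.len()].copy_from_slice(src);
}
```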
Sources: README.md:252-256 High-Level Diagram 6
SIMD Acceleration in Key Hashing
XXH3_64 Algorithm with Hardware Acceleration
In addition to write buffer acceleration, SIMD R Drive uses the xxh3_64 hashing algorithm for key indexing. This algorithm is specifically designed to leverage SIMD instructions for high-speed hash computation.
The hashing subsystem uses hardware acceleration on supported platforms:
| Platform | SIMD Extension | Availability | Performance Benefit |
|---|---|---|---|
| x86_64 | SSE2 | Universal (all 64-bit x86) | Baseline SIMD acceleration |
| x86_64 | AVX2 | Supported on modern CPUs | Additional performance gains |
| aarch64 | Neon | Default on all ARM64 | SIMD acceleration |
Diagram: SIMD-Accelerated Key Hashing Pipeline
The xxh3_64 algorithm is used in multiple operations:
- Key hash computation during write operations
- Index lookups during read operations
- Index building during storage recovery
By using SIMD-accelerated hashing, the storage engine minimizes the CPU overhead of hash computation, which is critical for maintaining high throughput in index-heavy workloads.
Sources: README.md:158-166 README.md:254-255 High-Level Diagram 6
Performance Characteristics
Write Throughput Improvements
The SIMD-accelerated write path provides measurable performance improvements over scalar memory operations. The performance gain depends on several factors:
| Factor | Impact on Performance |
|---|---|
| Payload size | Larger payloads benefit more from vectorized copying |
| CPU architecture | AVX2 (32-byte) typically faster than NEON (16-byte) |
| Memory alignment | 64-byte aligned payloads maximize cache efficiency |
| Buffer size | Larger write buffers reduce function call overhead |
While exact performance gains vary by workload and hardware, typical scenarios show:
- AVX2 implementations : 2-4× throughput improvement over scalar copy for medium-to-large payloads
- NEON implementations : 1.5-3× throughput improvement over scalar copy
- Hash computation : XXH3 with SIMD is significantly faster than non-accelerated hash functions
Interaction with 64-Byte Alignment
The storage engine's 64-byte payload alignment strategy (see Payload Alignment and Cache Efficiency) synergizes with SIMD operations:
Diagram: Alignment and SIMD Interaction
The 64-byte alignment ensures that:
- Cache line alignment : Payloads start on cache line boundaries, avoiding split-cache-line access penalties
- SIMD-friendly boundaries : Both AVX2 (32-byte) and NEON (16-byte) operations can operate at full speed without crossing alignment boundaries
- Zero-copy efficiency : Memory-mapped reads benefit from predictable alignment (see Memory Management and Zero-Copy Access)
Sources: README.md:51-59 README.md:248-256 High-Level Diagram 6
When SIMD Is (and Isn't) Used
Operations Using SIMD Acceleration
| Operation | SIMD Usage | Implementation |
|---|---|---|
Write operations (write, batch_write, write_stream) | ✅ Yes | simd_copy function transfers payload to write buffer |
| Key hashing (all operations requiring key lookup) | ✅ Yes | xxh3_64 with SSE2/AVX2/Neon acceleration |
| Index building (during storage recovery) | ✅ Yes | xxh3_64 hashing during forward scan |
Operations NOT Using SIMD
| Operation | SIMD Usage | Reason |
|---|---|---|
Read operations (read, batch_read, read_stream) | ❌ No | Zero-copy memory-mapped access—data is accessed directly from mmap without copying |
Iteration (into_iter, par_iter_entries) | ❌ No | Iterators return references to memory-mapped regions—no data transfer occurs |
| Entry validation (CRC32C checksum) | ❌ No | Uses standard CRC32C implementation (hardware-accelerated on supporting CPUs via separate mechanism) |
Diagram: SIMD Usage in Write vs. Read Paths
Why Reads Don't Need SIMD
Read operations in SIMD R Drive use memory-mapped I/O (mmap), which provides direct access to the storage file's contents without copying data into buffers. Since no memory transfer occurs, there is no opportunity to apply SIMD acceleration.
The zero-copy read strategy is fundamentally faster than any SIMD-accelerated copy operation because it eliminates the copy entirely. The memory-mapped approach allows the operating system's virtual memory subsystem to handle data access, often utilizing hardware-level optimizations like demand paging and read-ahead caching.
For more details on the zero-copy read architecture, see Memory Management and Zero-Copy Access.
Sources: README.md:43-49 README.md:254-256 High-Level Diagram 2
Code Entity Reference
The following table maps the conceptual SIMD components discussed in this document to their likely implementation locations within the codebase:
| Concept | Code Entity | Expected Location |
|---|---|---|
| SIMD copy function | simd_copy() | Core storage engine or utilities |
| AVX2 implementation | Architecture-specific module with _mm256_* intrinsics | Platform-specific code (likely #[cfg(target_arch = "x86_64")]) |
| NEON implementation | Architecture-specific module with vld1q_* / vst1q_* intrinsics | Platform-specific code (likely #[cfg(target_arch = "aarch64")]) |
| Runtime detection | is_x86_feature_detected!("avx2") macro | AVX2 code path guard |
| XXH3 hashing | xxh3_64 function or xxhash_rust crate usage | Key hashing module |
| Write buffer staging | BufWriter<File> with simd_copy integration | DataStore write implementation |
Sources: README.md:248-256 High-Level Diagram 6
Summary
SIMD acceleration in SIMD R Drive focuses on write-path optimization and key hashing performance. The dual-platform implementation (AVX2 for x86_64, NEON for aarch64) with automatic runtime detection ensures optimal performance across diverse hardware while maintaining portability through scalar fallback.
The 64-byte payload alignment strategy complements SIMD operations by ensuring cache-friendly access patterns. However, the storage engine intentionally does not use SIMD for read operations—zero-copy memory-mapped access eliminates the need for data transfer entirely, providing superior read performance without vectorization.
For related performance optimizations, see:
- Payload Alignment and Cache Efficiency for alignment strategy details
- Memory Management and Zero-Copy Access for zero-copy read architecture
- Write and Read Modes for performance comparisons across different access patterns
Sources: README.md:248-256 README.md:43-59 High-Level Diagrams 2, 5, and 6
Payload Alignment and Cache Efficiency
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/utils/align_or_copy.rs
Purpose and Scope
This document details the payload alignment strategy employed by SIMD R Drive to optimize cache utilization and enable efficient SIMD operations. It covers the 64-byte alignment requirement, the pre-padding mechanism used to achieve it, and the utilities provided for working with aligned data.
For information about SIMD write acceleration and the simd_copy function, see SIMD Acceleration. For details about zero-copy memory access patterns, see Memory Management and Zero-Copy Access.
Overview
SIMD R Drive enforces fixed 64-byte alignment for all non-tombstone payloads in the storage file. This alignment strategy provides three critical benefits:
- Cache Line Alignment : Payloads begin on CPU cache line boundaries (typically 64 bytes), preventing cache line splits that would require multiple memory accesses.
- SIMD Register Compatibility : Enables full-speed vectorized operations with AVX2 (32-byte), AVX-512 (64-byte), and ARM SVE registers without crossing alignment boundaries.
- Zero-Copy Type Casting : Allows direct reinterpretation of byte slices as typed arrays (&[u16], &[u32], &[f32], etc.) without copying when element sizes match.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs:13-18
Alignment Configuration
Constants
The alignment boundary is configured via compile-time constants in the entry handle crate:
| Constant | Value | Description |
|---|---|---|
PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
PAYLOAD_ALIGNMENT | 64 | Alignment boundary in bytes |
| Maximum Pre-Pad | 63 | Maximum padding bytes per entry |
The constants are defined in simd-r-drive-entry-handle/src/constants.rs:17-18
Rationale for 64-Byte Alignment
Sources: README.md:53-54 simd-r-drive-entry-handle/src/constants.rs:13-18
Pre-Pad Mechanism
Computation
To achieve 64-byte alignment, entries may include zero-padding bytes before the payload. The pre-pad length is computed based on the previous entry's tail offset:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where:
- prev_tail is the absolute file offset immediately after the previous entry's metadata
- The bitwise AND with (PAYLOAD_ALIGNMENT - 1) ensures the result is in the range [0, 63]
- If prev_tail is already aligned, pad = 0
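Worked example: with PAYLOAD_ALIGNMENT = 64, prev_tail = 100 gives pad = (64 - (100 % 64)) & 63 = 28, so the next payload begins at offset 128. The same arithmetic as a tiny self-contained function (illustrative, not the crate's implementation):

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Pre-pad needed so the next payload starts on a 64-byte boundary
/// (mirrors the formula above).
fn pre_pad(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(pre_pad(100), 28); // 100 + 28 = 128, a multiple of 64
    assert_eq!(pre_pad(128), 0);  // already aligned, no padding
    assert_eq!(pre_pad(129), 63); // worst case: one byte past a boundary
}
```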
On-Disk Layout
Aligned Entry Structure:
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
P .. P+pad | Pre-Pad | 0-63 | Zero bytes for alignment |
P+pad .. N | Payload | Variable | Actual data content |
N .. N+8 | Key Hash | 8 | XXH3_64 hash |
N+8 .. N+16 | Prev Offset | 8 | Previous tail offset |
N+16 .. N+20 | Checksum | 4 | CRC32C of payload |
Tombstones (deletion markers) do not include pre-pad and consist of only:
- 1-byte payload (0x00)
- 20-byte metadata
This design ensures tombstones remain compact while regular payloads maintain alignment.
Sources: README.md:112-137 simd-r-drive-entry-handle/src/constants.rs:1-18
Cache Efficiency Benefits
Cache Line Behavior
Modern CPUs organize memory into cache lines (typically 64 bytes). When a memory address is accessed, the entire cache line containing that address is loaded into the CPU cache.
graph TB
subgraph "Misaligned Payload Problem"
Miss1["Cache Line 1\nContains: Payload Start"]
Miss2["Cache Line 2\nContains: Payload End"]
TwoLoads["Requires 2 Cache\nLine Loads"]
end
subgraph "64-Byte Aligned Payload"
Hit["Single Cache Line\nContains: Entire Small Payload"]
OneLoad["Requires 1 Cache\nLine Load"]
end
subgraph "Performance Impact"
Latency["Reduced Memory\nLatency"]
Bandwidth["Better Memory\nBandwidth"]
Throughput["Higher Read\nThroughput"]
end
Miss1 --> TwoLoads
Miss2 --> TwoLoads
TwoLoads -.->|penalty| Latency
Hit --> OneLoad
OneLoad --> Latency
Latency --> Bandwidth
Bandwidth --> Throughput
For payloads ≤ 64 bytes, alignment ensures the entire payload fits within a single cache line, eliminating the penalty of fetching multiple cache lines.
Sources: README.md:53-54 simd-r-drive-entry-handle/src/constants.rs:13-15
Zero-Copy Type Casting
The align_or_copy Utility
The align_or_copy function in src/utils/align_or_copy.rs:44-73 enables efficient conversion from raw bytes to typed slices:
Function signature:
pub fn align_or_copy<T, const N: usize>(
bytes: &[u8],
from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]>
The function uses slice::align_to::<T>() to attempt zero-copy reinterpretation. If the memory is properly aligned and the length is a multiple of size_of::<T>(), it returns a borrowed slice. Otherwise, it falls back to manually decoding each element.
Example Use Case:
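A hedged usage sketch based on the signature above; the import path is an assumption, while the call shape matches the align_or_copy<f32, 4>(bytes, f32::from_le_bytes) form shown in the sequence diagram later in this document:

```rust
use std::borrow::Cow;

// Import path is an assumption; the function lives in src/utils/align_or_copy.rs.
use simd_r_drive::utils::align_or_copy;

fn payload_as_f32(bytes: &[u8]) -> Cow<'_, [f32]> {
    // Borrows the mmap-backed bytes when they are aligned and a whole number
    // of f32 values; otherwise decodes element by element into a Vec.
    align_or_copy::<f32, 4>(bytes, f32::from_le_bytes)
}
```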
Sources: src/utils/align_or_copy.rs:1-73
Requirements for Zero-Copy Success
For align_or_copy to return a borrowed slice (zero-copy), the following conditions must be met:
| Requirement | Description | Check |
|---|---|---|
| Alignment | Pointer must be aligned for type T | prefix.is_empty() |
| Size | Length must be multiple of size_of::<T>() | suffix.is_empty() |
| Payload Start | Must begin on 64-byte boundary | Enforced by pre-pad |
With SIMD R Drive's 64-byte alignment guarantee, payloads naturally satisfy the alignment requirement for common types:
- u8, i8: Always aligned (1-byte)
- u16, i16: Aligned (2-byte divides 64)
- u32, i32, f32: Aligned (4-byte divides 64)
- u64, i64, f64: Aligned (8-byte divides 64)
- u128, i128: Aligned (16-byte divides 64)
Sources: src/utils/align_or_copy.rs:44-73 README.md:55-56
Debug Assertions
Validation Functions
The entry handle crate provides debug-only assertions for validating alignment invariants in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:
debug_assert_aligned
Verifies that a pointer address is aligned to the specified boundary. Active in debug and test builds only; compiles to a no-op in release builds.
Usage:
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43
debug_assert_aligned_offset
Verifies that a file offset is a multiple of PAYLOAD_ALIGNMENT. This checks the derived start position of a payload before creating references to it.
Usage:
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88
Design Rationale
The functions are always present (stable symbols) but gate their assertion logic with #[cfg(any(test, debug_assertions))]. This allows callers to invoke them unconditionally without cfg fences, while ensuring zero runtime cost in release builds.
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
Version Evolution
v0.14.0-alpha: Initial Alignment
The first implementation of fixed payload alignment:
- Introduced PAYLOAD_ALIGN_LOG2 and PAYLOAD_ALIGNMENT constants
- Initially set to 16-byte alignment (2^4 = 16)
- Added pre-pad mechanism for non-tombstone entries
- Breaking change: files written with v0.14.0 incompatible with older readers
Sources: CHANGELOG.md:55-82
v0.15.0-alpha: Enhanced to 64-Byte Alignment
Increased default alignment for optimal cache and SIMD performance:
- Increased PAYLOAD_ALIGN_LOG2 from 4 to 6 (16 bytes → 64 bytes)
- Added debug_assert_aligned and debug_assert_aligned_offset validation functions
- Updated documentation to reflect the 64-byte default
- Breaking change: v0.15.x stores incompatible with v0.14.x readers
Justification for Change:
- 16-byte alignment was insufficient for AVX-512 (requires 64-byte alignment)
- Did not match typical cache line size
- Could cause performance degradation with future SIMD extensions
Sources: CHANGELOG.md:25-52 simd-r-drive-entry-handle/src/constants.rs:17-18
Migration Considerations
Cross-Version Compatibility
| Writer Version | Reader Version | Compatible? | Notes |
|---|---|---|---|
| ≤ 0.13.x | ≤ 0.13.x | ✅ Yes | Pre-alignment format |
| 0.14.x | 0.14.x | ✅ Yes | 16-byte alignment |
| 0.15.x | 0.15.x | ✅ Yes | 64-byte alignment |
| 0.14.x | ≤ 0.13.x | ❌ No | Reader interprets pre-pad as payload |
| 0.15.x | 0.14.x | ❌ No | Alignment mismatch |
| ≤ 0.13.x | ≥ 0.14.x | ✅ Partial | New reader can detect old format (no pre-pad) |
Migration Process
To migrate from v0.14.x or earlier to v0.15.x:
- Read with Old Binary : Export all live entries using a v0.14.x-compatible build (a sketch of steps 1-3 follows this list)
- Rewrite with New Binary : Create a fresh v0.15.x store and write the exported entries; 64-byte alignment is applied automatically
- Verify Integrity : Re-read the new store and confirm payloads match before switching over
- Deploy Staged Upgrades :
- Upgrade all readers first (new readers can handle old format temporarily)
- Upgrade writers last (prevents incompatible writes)
- Replace old storage files after verification
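A minimal sketch of steps 1-3, assuming a constructor along the lines of DataStore::open plus the read/write APIs described in this document. In practice the export runs in a v0.14.x-built binary and the rewrite in a v0.15.x-built one, so treat this single-function version purely as an API illustration; import paths, the constructor name, and error types are assumptions.

```rust
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter}; // paths assumed

fn migrate(old_path: &str, new_path: &str, keys: &[&[u8]]) {
    // Step 1: read with the old (v0.14.x-compatible) build.
    let old = DataStore::open(old_path.as_ref()).expect("open v0.14 store"); // constructor assumed
    // Step 2: rewrite with the new (v0.15.x) build.
    let new = DataStore::open(new_path.as_ref()).expect("create v0.15 store");

    for key in keys {
        if let Some(entry) = old.read(key).expect("read old entry") {
            // EntryHandle derefs to the payload bytes, so it can be written directly.
            new.write(key, &entry).expect("write migrated entry");
        }
    }

    // Step 3: verify by re-reading the new store before replacing the old file.
    for key in keys {
        let migrated = new.read(key).expect("read new store").expect("entry present");
        let original = old.read(key).expect("read old store").expect("entry present");
        assert_eq!(&*migrated, &*original, "payload mismatch during migration");
    }
}
```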
Sources: CHANGELOG.md:43-51 CHANGELOG.md:76-82
Performance Characteristics
Alignment Overhead
The pre-pad mechanism introduces minimal storage overhead:
| Scenario | Pre-Pad Range | Overhead % (1KB Payload) | Overhead % (64B Payload) |
|---|---|---|---|
| Best Case | 0 bytes | 0.0% | 0.0% |
| Average Case | 32 bytes | 3.1% | 50.0% |
| Worst Case | 63 bytes | 6.1% | 98.4% |
For typical workloads with payloads larger than 256 bytes, the worst-case overhead stays below 25% and shrinks as payloads grow (about 6% at 1 KB, per the table above).
Cache Performance Gains
Benchmarks (not included in repository) show measurable improvements:
- Sequential Reads: 15-25% faster due to reduced cache line fetches
- SIMD Operations: 40-60% faster due to aligned vector loads
- Random Access: 10-20% faster due to single-cache-line hits for small entries
Trade-off: The storage overhead is justified by the significant performance improvements in read-heavy workloads, which is the primary use case for SIMD R Drive.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs:13-18
Configuration Options
Changing Alignment Boundary
To use a different alignment (e.g., 128 bytes for specialized hardware):
- Update PAYLOAD_ALIGN_LOG2 in simd-r-drive-entry-handle/src/constants.rs (a sketch follows this list)
- Rebuild the entire workspace so every crate picks up the new constant
- Warning : This creates a new, incompatible storage format. All existing files must be migrated.
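A sketch of the constant change, assuming the constants keep the shape documented on this page (the exact types and file contents in constants.rs may differ):

```rust
// simd-r-drive-entry-handle/src/constants.rs (illustrative only)
pub const PAYLOAD_ALIGN_LOG2: u32 = 7;                        // 2^7 = 128-byte alignment
pub const PAYLOAD_ALIGNMENT: usize = 1 << PAYLOAD_ALIGN_LOG2; // derived, always a power of two
```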
Supported Values:
- PAYLOAD_ALIGN_LOG2 must be in the range [0, 63] (alignment: 1 byte to 8 EB)
- Typical values: 4 (16 B), 5 (32 B), 6 (64 B), 7 (128 B)
- The resulting alignment is always a power of two, since it is derived as 2^PAYLOAD_ALIGN_LOG2
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md59
Related Systems
- SIMD Copy Operations: The aligned payloads enable efficient SIMD write operations. See SIMD Acceleration for details on the simd_copy function.
- Zero-Copy Reads: Alignment is critical for zero-copy access patterns. See Memory Management and Zero-Copy Access for the EntryHandle implementation.
- Entry Structure: The pre-pad is part of the overall entry layout. See Entry Structure and Metadata for the complete format specification.
Sources: README.md:1-282
Write and Read Modes
Relevant source files
- README.md
- benches/storage_benchmark.rs
- src/main.rs
- src/storage_engine/data_store.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
Purpose and Scope
This document describes the three write modes and three read modes available in SIMD R Drive, detailing their operational characteristics, performance trade-offs, and appropriate use cases. These modes provide flexibility for different workload patterns, from single-key operations to bulk processing.
For information about the underlying SIMD acceleration that optimizes these operations, see SIMD Acceleration. For details about the alignment strategy that enables efficient reads, see Payload Alignment and Cache Efficiency.
Write Modes
SIMD R Drive provides three distinct write modes, each optimized for different usage patterns. All write modes acquire an exclusive write lock (RwLock<BufWriter<File>>) to ensure thread safety and data consistency.
Single Entry Write
The single entry write mode writes one key-value pair atomically and flushes immediately to disk.
Primary Methods:
- DataStoreWriter::write(key: &[u8], payload: &[u8]) -> Result<u64> src/storage_engine/data_store.rs:827-830
- DataStoreWriter::write_with_key_hash(key_hash: u64, payload: &[u8]) -> Result<u64> src/storage_engine/data_store.rs:832-834
Operation Flow:
Characteristics:
- Latency : Lowest for single operations (immediate flush)
- Throughput : Lower due to per-write overhead
- Disk I/O : One flush operation per write
- Use Case : Interactive operations, real-time updates, critical writes requiring immediate durability
Implementation Detail: Single writes internally delegate to batch_write_with_key_hashes() with a single-element vector, ensuring consistent behavior across all write paths src/storage_engine/data_store.rs:832-834
Sources: README.md:212-215 src/storage_engine/data_store.rs:827-834
Batch Write
Batch write mode writes multiple key-value pairs in a single atomic operation, flushing only once at the end.
Primary Methods:
- DataStoreWriter::batch_write(entries: &[(&[u8], &[u8])]) -> Result<u64> src/storage_engine/data_store.rs:838-843
- DataStoreWriter::batch_write_with_key_hashes(prehashed_keys: Vec<(u64, &[u8])>, allow_null_bytes: bool) -> Result<u64> src/storage_engine/data_store.rs:847-951
Operation Flow:
Characteristics:
- Latency : Higher per-entry latency (amortized)
- Throughput : Significantly higher due to reduced disk I/O
- Disk I/O : Single flush for entire batch
- Memory : Builds entries in an in-memory buffer before writing src/storage_engine/data_store.rs:857-898
- Use Case : Bulk imports, batch processing, high-throughput ingestion
Performance Optimization: The batch implementation pre-allocates a buffer and constructs all entries before any disk I/O, minimizing lock contention and maximizing sequential write performance. The buffer construction happens at src/storage_engine/data_store.rs:857-918
Tombstone Support: Batch writes support deletion markers (tombstones) when allow_null_bytes is true, writing a single NULL byte followed by metadata src/storage_engine/data_store.rs:864-898
Sources: README.md:216-219 src/storage_engine/data_store.rs:838-951 benches/storage_benchmark.rs:85-92
Streaming Write
Streaming write mode writes large payloads incrementally from a Read source without requiring full in-memory buffering.
Primary Methods:
- DataStoreWriter::write_stream<R: Read>(key: &[u8], reader: &mut R) -> Result<u64> src/storage_engine/data_store.rs:753-756
- DataStoreWriter::write_stream_with_key_hash<R: Read>(key_hash: u64, reader: &mut R) -> Result<u64> src/storage_engine/data_store.rs:758-825
Operation Flow:
Characteristics:
- Memory Footprint : Constant (4096-byte buffer) src/storage_engine/constants.rs
- Payload Size : Unbounded (supports arbitrarily large entries)
- Disk I/O : Incremental writes, single flush at end
- Use Case : Large file storage, network streams, memory-constrained environments
Implementation Details:
The streaming write uses a fixed-size buffer (WRITE_STREAM_BUFFER_SIZE) and performs incremental writes while computing the checksum:
| Component | Size/Type | Purpose |
|---|---|---|
| Read Buffer | 4096 bytes | Temporary staging for stream chunks |
| Checksum State | crc32fast::Hasher | Incremental CRC32C calculation |
| Pre-pad | 0-63 bytes | Alignment padding before payload |
| Metadata | 20 bytes | key_hash, prev_offset, checksum |
Validation:
- Rejects empty payloads src/storage_engine/data_store.rs:799-804
- Rejects NULL-byte-only streams (reserved for tombstones) src/storage_engine/data_store.rs:792-797
Sources: README.md:220-223 src/storage_engine/data_store.rs:753-825 tests/concurrency_tests.rs:16-109
Write Mode Comparison
Performance Table:
| Write Mode | Lock Duration | Disk Flushes | Memory Usage | Best For |
|---|---|---|---|---|
| Single | Short (per write) | 1 per write | Minimal | Interactive operations, real-time updates |
| Batch | Medium (entire batch) | 1 per batch | Buffer size × entries | Bulk imports, high throughput |
| Streaming | Long (entire stream) | 1 per stream | 4096 bytes (constant) | Large files, memory-constrained |
Throughput Characteristics:
Based on benchmark results benches/storage_benchmark.rs:52-83:
Sources: benches/storage_benchmark.rs:52-92 README.md:208-223
Read Modes
SIMD R Drive provides three read modes optimized for different access patterns. All read modes leverage zero-copy access through memory-mapped files.
Direct Read
Direct read mode provides immediate, zero-copy access to stored entries through EntryHandle.
Primary Methods:
- DataStoreReader::read(key: &[u8]) -> Result<Option<EntryHandle>> src/traits.rs
- DataStoreReader::batch_read(keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>> src/traits.rs
- DataStoreReader::exists(key: &[u8]) -> Result<bool> src/traits.rs
Operation Flow:
Characteristics:
- Latency : Minimal (single hash lookup + pointer arithmetic)
- Memory : Zero-copy (returns view into mmap)
- Concurrency : Lock-free reads (except brief index lock)
- Use Case : Random access, key-value lookups, real-time queries
Zero-Copy Guarantee:
The EntryHandle provides direct access to the memory-mapped region without copying:
EntryHandle {
mmap_arc: Arc<Mmap>, // Shared reference to mmap
range: Range<usize>, // Byte range within mmap
metadata: EntryMetadata, // Deserialized metadata (20 bytes)
}
The handle implements Deref<Target = [u8]>, allowing transparent access to payload bytes simd-r-drive-entry-handle/src/lib.rs
Batch Read Optimization:
batch_read() performs vectorized lookups, acquiring the index lock once for all keys and returning Vec<Option<EntryHandle>>. This reduces lock acquisition overhead for multiple keys src/storage_engine/data_store.rs
Sources: README.md:228-233 src/storage_engine/data_store.rs:502-565 benches/storage_benchmark.rs:124-149
Streaming Read
Streaming read mode provides incremental, buffered access to large entries without loading them fully into memory.
Primary Structure:
- EntryStream src/storage_engine/entry_stream.rs
- Implements the std::io::Read trait
Operation Flow:
Characteristics:
- Memory Footprint : 8192-byte buffer src/storage_engine/entry_stream.rs
- Copy Behavior : Non-zero-copy (reads through buffer)
- Payload Size : Supports arbitrarily large entries
- Use Case : Processing large entries incrementally, network transmission, streaming transformations
Implementation Notes:
The streaming read is not zero-copy despite using mmap as the source. This design choice enables:
- Controlled memory pressure (constant buffer size)
- Standard std::io::Read interface compatibility
- Incremental processing without loading the entire payload
For true zero-copy access to large entries, use direct read mode and process the EntryHandle slice directly.
Sources: README.md:234-241 src/storage_engine/entry_stream.rs
Parallel Iteration
Parallel iteration mode uses Rayon to process all valid entries across multiple threads (requires parallel feature).
Primary Methods:
- DataStore::iter_entries() -> EntryIterator src/storage_engine/data_store.rs:276-280
- DataStore::par_iter_entries() -> impl ParallelIterator<Item = EntryHandle> src/storage_engine/data_store.rs:296-361
Operation Flow:
Characteristics:
- Throughput : Scales with CPU cores
- Concurrency : Work-stealing via Rayon
- Memory : Minimal overhead (offsets collected upfront)
- Use Case : Bulk analytics, dataset scanning, cache warming, batch transformations
Implementation Strategy:
The parallel iterator optimizes for minimal lock contention:
- Acquire read lock on KeyIndexer src/storage_engine/data_store.rs:300
- Collect all packed offset values into Vec<u64> src/storage_engine/data_store.rs:301
- Release lock immediately src/storage_engine/data_store.rs:302
- Clone Arc<Mmap> once src/storage_engine/data_store.rs:305
- Parallel filter_map over offsets src/storage_engine/data_store.rs:310-360
Each worker thread independently:
- Unpacks (tag, offset) from the packed value
- Validates bounds and metadata
- Constructs EntryHandle with the cloned Arc<Mmap>
- Filters tombstones
Sequential Iteration:
For sequential scanning without Rayon overhead, use iter_entries() which returns EntryIterator src/storage_engine/data_store.rs:276-280
Sources: README.md:242-246 src/storage_engine/data_store.rs:276-361 benches/storage_benchmark.rs:98-118
Read Mode Comparison
Performance Table:
| Read Mode | Access Pattern | Memory Copy | Concurrency | Best For |
|---|---|---|---|---|
| Direct | Random/lookup | Zero-copy | Lock-free | Key-value queries, random access |
| Streaming | Sequential/buffered | Buffered copy | Single reader | Large entry processing |
| Parallel | Full scan | Zero-copy | Multi-threaded | Bulk analytics, dataset scanning |
Throughput Characteristics:
Based on benchmark measurements benches/storage_benchmark.rs:
| Operation | Throughput (1M entries, 8 bytes) | Notes |
|---|---|---|
| Sequential iteration | ~millions/s | Zero-copy, cache-friendly |
| Random single reads | ~1M reads/s | Hash lookup + bounds check |
| Batch reads | ~1M reads/s | Vectorized index access |
Sources: benches/storage_benchmark.rs:98-203 README.md:224-246
Performance Optimization Strategies
Write Optimization
Batching Strategy:
- Group writes into batches of 1024-10000 entries for optimal throughput
- Balance batch size against latency requirements
- Use streaming for payloads > 1 MB to avoid memory pressure
Lock Contention:
- All writes acquire the same RwLock<BufWriter<File>>
- Consider application-level write queuing for highly concurrent workloads
Read Optimization
Access Pattern Matching:
| Access Pattern | Recommended Mode | Reason |
|---|---|---|
| Random lookups | Direct read | O(1) hash lookup, zero-copy |
| Known key sets | Batch read | Amortized lock overhead |
| Full dataset scan | Sequential iteration | Cache-friendly forward traversal |
| Parallel analytics | Parallel iteration | Scales with CPU cores |
| Large entry processing | Streaming read | Constant memory footprint |
Memory-Mapped File Behavior:
The OS manages mmap pages transparently:
- Working set : Only accessed regions loaded into RAM
- Large datasets : Can exceed available RAM (pages swapped on demand)
- Cache warming : Sequential iteration benefits from read-ahead
- Random access : May trigger page faults (disk I/O) on cold reads
Sources: benches/storage_benchmark.rs README.md:43-50
Concurrency Considerations
Write Concurrency
All write modes use exclusive locking and are mutually exclusive :
Implication : High write concurrency may benefit from application-level write buffering or queueing.
graph TB
Read1["read()
Thread 1"]
Read2["read()
Thread 2"]
Read3["par_iter_entries()
Thread 3"]
IndexLock["RwLock<KeyIndexer>\n(Read Lock)"]
Mmap["Arc<Mmap>\n(Shared Reference)"]
Read1 --> IndexLock
Read2 --> IndexLock
Read3 --> IndexLock
IndexLock -->|Concurrent read access| Mmap
Write["write()
Thread 4"]
WriteLock["RwLock<BufWriter>\n(Exclusive Lock)"]
Write -->|Independent lock| WriteLock
WriteLock -.->|After flush: remaps and updates| Mmap
Read Concurrency
Read operations are lock-free after index lookup and can occur concurrently with writes:
Characteristics:
- Multiple readers can access mmap concurrently
- Reads do not block writes (different locks)
- Writes remap mmap after flushing, but readers retain their Arc<Mmap> reference
Sources: README.md:170-207 tests/concurrency_tests.rs src/storage_engine/data_store.rs:224-259
Usage Examples
Write Mode Selection
Single Write (Real-Time Updates):
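A minimal sketch, assuming a constructor along the lines of DataStore::open(path) and that the DataStoreWriter trait is importable from the crate root; import paths, the constructor name, and error handling are illustrative:

```rust
use std::path::Path;

use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

fn main() {
    let store = DataStore::open(Path::new("app.bin")).expect("open store"); // constructor assumed
    // One entry, flushed immediately: lowest latency, durable right away.
    store.write(b"user:42", b"Ada Lovelace").expect("write entry");
}
```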
Batch Write (Bulk Import):
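A bulk-import sketch under the same assumptions; all pairs are written under one lock and flushed once at the end:

```rust
use std::path::Path;

use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

fn main() {
    let store = DataStore::open(Path::new("bulk.bin")).expect("open store");
    let entries: Vec<(&[u8], &[u8])> = vec![
        (&b"k1"[..], &b"v1"[..]),
        (&b"k2"[..], &b"v2"[..]),
        (&b"k3"[..], &b"v3"[..]),
    ];
    store.batch_write(&entries).expect("batch write");
}
```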
Streaming Write (Large Files):
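A streaming-write sketch under the same assumptions; the source file name is arbitrary:

```rust
use std::fs::File;
use std::path::Path;

use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

fn main() {
    let store = DataStore::open(Path::new("blobs.bin")).expect("open store");
    // The payload is consumed through a fixed staging buffer, so the whole
    // file never has to fit in memory at once.
    let mut reader = File::open("large-model.onnx").expect("open source file");
    store.write_stream(b"models/large", &mut reader).expect("stream write");
}
```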
Read Mode Selection
Direct Read (Key Lookup):
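A direct-read sketch, assuming the DataStoreReader trait is importable from the crate root:

```rust
use std::path::Path;

use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

fn main() {
    let store = DataStore::open(Path::new("app.bin")).expect("open store");
    // The returned EntryHandle is a zero-copy view into the memory-mapped file.
    if let Some(entry) = store.read(b"user:42").expect("lookup") {
        let payload: &[u8] = &entry; // Deref<Target = [u8]>
        println!("payload is {} bytes", payload.len());
    }
}
```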
Streaming Read (Large Entry Processing):
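Because the exact accessor for obtaining an EntryStream is not shown in this document, the sketch below stays generic over any std::io::Read source (which EntryStream implements) and uses the workspace's crc32fast dependency purely for illustration:

```rust
use std::io::Read;

/// Processes a stream chunk by chunk with constant memory; works with the
/// EntryStream described above or any other std::io::Read source.
fn checksum_stream<R: Read>(mut stream: R) -> std::io::Result<u32> {
    let mut hasher = crc32fast::Hasher::new();
    let mut chunk = [0u8; 8192]; // matches the buffer size noted above
    loop {
        let n = stream.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        hasher.update(&chunk[..n]); // incremental processing of each chunk
    }
    Ok(hasher.finalize())
}
```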
Parallel Iteration (Dataset Analytics):
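A parallel-iteration sketch; it requires the parallel feature and the rayon prelude, and relies on EntryHandle's Deref<Target = [u8]> described above (constructor and import paths are again assumptions):

```rust
use std::path::Path;

use rayon::prelude::*;
use simd_r_drive::DataStore; // import path assumed

fn main() {
    let store = DataStore::open(Path::new("app.bin")).expect("open store");
    // Each EntryHandle derefs to its payload bytes; the work is spread
    // across the Rayon thread pool.
    let total_bytes: usize = store.par_iter_entries().map(|entry| entry.len()).sum();
    println!("live payload bytes: {total_bytes}");
}
```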
Sources: README.md src/storage_engine/data_store.rs benches/storage_benchmark.rs
Extensions and Utilities
Relevant source files
- extensions/Cargo.toml
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/utils.rs
- src/utils/align_or_copy.rs
- tests/align_or_copy_tests.rs
This document covers the utility functions, helper modules, and constants provided by the SIMD R Drive ecosystem. These components include the simd-r-drive-extensions crate for higher-level storage operations, core utility functions in the main simd-r-drive crate, and shared constants from simd-r-drive-entry-handle.
For details on the core storage engine API, see DataStore API. For performance optimization features like SIMD acceleration, see SIMD Acceleration. For alignment-related architecture decisions, see Payload Alignment and Cache Efficiency.
Extensions Crate Overview
The simd-r-drive-extensions crate provides storage extensions and higher-level utilities built on top of the core simd-r-drive storage engine. It adds functionality for common storage patterns and data manipulation tasks.
graph TB
subgraph "simd-r-drive-extensions"
ExtCrate["simd-r-drive-extensions"]
ExtDeps["Dependencies:\n- bincode\n- serde\n- simd-r-drive\n- walkdir"]
end
subgraph "Core Dependencies"
Core["simd-r-drive"]
Bincode["bincode\nBinary Serialization"]
Serde["serde\nSerialization Traits"]
Walkdir["walkdir\nDirectory Traversal"]
end
ExtCrate --> ExtDeps
ExtDeps --> Core
ExtDeps --> Bincode
ExtDeps --> Serde
ExtDeps --> Walkdir
Core -.->|provides| DataStore["DataStore"]
Bincode -.->|enables| SerializationSupport["Structured Data Storage"]
Walkdir -.->|enables| FileSystemOps["File System Operations"]
Crate Structure
Sources: extensions/Cargo.toml:1-22
| Dependency | Purpose |
|---|---|
bincode | Binary serialization/deserialization for structured data storage |
serde | Serialization trait support with derive macros |
simd-r-drive | Core storage engine access |
walkdir | Directory tree traversal utilities |
Sources: extensions/Cargo.toml:13-17
Core Utilities Module
The main simd-r-drive crate exposes several utility functions through its utils module. These functions handle common tasks like alignment optimization, string formatting, and data validation.
graph TB
subgraph "utils Module"
UtilsRoot["src/utils.rs"]
AlignOrCopy["align_or_copy\nZero-Copy Optimization"]
AppendExt["append_extension\nString Path Handling"]
FormatBytes["format_bytes\nHuman-Readable Sizes"]
NamespaceHasher["NamespaceHasher\nHierarchical Keys"]
ParseBuffer["parse_buffer_size\nSize String Parsing"]
VerifyFile["verify_file_existence\nFile Validation"]
end
UtilsRoot --> AlignOrCopy
UtilsRoot --> AppendExt
UtilsRoot --> FormatBytes
UtilsRoot --> NamespaceHasher
UtilsRoot --> ParseBuffer
UtilsRoot --> VerifyFile
AlignOrCopy -.->|used by| ReadOps["Read Operations"]
NamespaceHasher -.->|used by| KeyManagement["Key Management"]
FormatBytes -.->|used by| Logging["Logging & Reporting"]
ParseBuffer -.->|used by| Config["Configuration Parsing"]
Utility Functions Overview
Sources: src/utils.rs:1-17
align_or_copy Function
The align_or_copy utility function provides zero-copy deserialization with automatic fallback for misaligned data. It attempts to reinterpret a byte slice as a typed slice without copying, and falls back to manual decoding when alignment requirements are not met.
Function Signature
Sources: src/utils/align_or_copy.rs:44-50
Operation Flow
Sources: src/utils/align_or_copy.rs:44-73
Usage Patterns
| Scenario | Outcome | Performance |
|---|---|---|
| Aligned 64-byte boundary, exact multiple | Cow::Borrowed | Zero-copy, optimal |
| Misaligned address | Cow::Owned | Allocation + decode |
| Non-multiple of element size | Panic | Invalid input |
Example Usage:
Sources: src/utils/align_or_copy.rs:38-43 tests/align_or_copy_tests.rs:7-12
Safety Considerations
The function uses unsafe for the align_to::<T>() call, which requires:
- Starting address must be aligned to align_of::<T>()
- Total size must be a multiple of size_of::<T>()
These requirements are validated by checking that prefix and suffix slices are empty before returning the borrowed slice. If validation fails, the function falls back to safe manual decoding.
Sources: src/utils/align_or_copy.rs:28-35 src/utils/align_or_copy.rs:53-60
Other Utility Functions
| Function | Module Path | Purpose |
|---|---|---|
append_extension | src/utils/append_extension.rs | Safely appends file extensions to paths |
format_bytes | src/utils/format_bytes.rs | Formats byte counts as human-readable strings (KB, MB, GB) |
NamespaceHasher | src/utils/namespace_hasher.rs | Generates hierarchical, namespaced hash keys |
parse_buffer_size | src/utils/parse_buffer_size.rs | Parses size strings like "64KB", "1MB" into byte counts |
verify_file_existence | src/utils/verify_file_existence.rs | Validates file paths before operations |
Sources: src/utils.rs:1-17
Entry Handle Constants
The simd-r-drive-entry-handle crate defines shared constants used throughout the storage system. These constants establish the binary layout of entries and alignment requirements.
graph TB
subgraph "simd-r-drive-entry-handle"
LibRoot["lib.rs"]
ConstMod["constants.rs"]
EntryHandle["entry_handle.rs"]
EntryMetadata["entry_metadata.rs"]
DebugAssert["debug_assert_aligned.rs"]
end
subgraph "Exported Constants"
MetadataSize["METADATA_SIZE = 20"]
KeyHashRange["KEY_HASH_RANGE = 0..8"]
PrevOffsetRange["PREV_OFFSET_RANGE = 8..16"]
ChecksumRange["CHECKSUM_RANGE = 16..20"]
ChecksumLen["CHECKSUM_LEN = 4"]
PayloadLog["PAYLOAD_ALIGN_LOG2 = 6"]
PayloadAlign["PAYLOAD_ALIGNMENT = 64"]
end
LibRoot --> ConstMod
LibRoot --> EntryHandle
LibRoot --> EntryMetadata
LibRoot --> DebugAssert
ConstMod --> MetadataSize
ConstMod --> KeyHashRange
ConstMod --> PrevOffsetRange
ConstMod --> ChecksumRange
ConstMod --> ChecksumLen
ConstMod --> PayloadLog
ConstMod --> PayloadAlign
PayloadAlign -.->|ensures| CacheLineOpt["Cache-Line Optimization"]
PayloadAlign -.->|enables| SIMDOps["SIMD Operations"]
Constants Module Structure
Sources: simd-r-drive-entry-handle/src/lib.rs:1-10 simd-r-drive-entry-handle/src/constants.rs:1-19
Metadata Layout Constants
The following constants define the fixed 20-byte metadata structure at the end of each entry:
| Constant | Value | Description |
|---|---|---|
METADATA_SIZE | 20 | Total size of entry metadata in bytes |
KEY_HASH_RANGE | 0..8 | Byte range for 64-bit XXH3 key hash |
PREV_OFFSET_RANGE | 8..16 | Byte range for 64-bit previous entry offset |
CHECKSUM_RANGE | 16..20 | Byte range for 32-bit CRC32C checksum |
CHECKSUM_LEN | 4 | Explicit length of checksum field |
Sources: simd-r-drive-entry-handle/src/constants.rs:3-11
Alignment Constants
These constants enforce 64-byte alignment for all payload data:
- PAYLOAD_ALIGN_LOG2: Base-2 logarithm of the alignment requirement (2⁶ = 64 bytes)
- PAYLOAD_ALIGNMENT: Computed alignment value (64 bytes)
This alignment matches CPU cache line sizes and enables efficient SIMD operations. The maximum pre-padding per entry is PAYLOAD_ALIGNMENT - 1 (63 bytes).
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18
Constant Relationships
Sources: simd-r-drive-entry-handle/src/constants.rs:1-19
sequenceDiagram
participant Client
participant EntryHandle
participant align_or_copy
participant Memory
Client->>EntryHandle: get_payload_bytes()
EntryHandle->>Memory: read &[u8] from mmap
EntryHandle->>align_or_copy: align_or_copy<f32, 4>(bytes, f32::from_le_bytes)
alt Aligned on 64-byte boundary
align_or_copy->>Memory: validate alignment
align_or_copy-->>Client: Cow::Borrowed(&[f32])
Note over Client,Memory: Zero-copy: direct memory access
else Misaligned
align_or_copy->>align_or_copy: chunks_exact(4)
align_or_copy->>align_or_copy: map(f32::from_le_bytes)
align_or_copy->>align_or_copy: collect into Vec<f32>
align_or_copy-->>Client: Cow::Owned(Vec<f32>)
Note over Client,align_or_copy: Fallback: allocated copy
end
Common Patterns
Zero-Copy Data Access
Utilities like align_or_copy enable zero-copy access patterns when memory alignment allows:
Sources: src/utils/align_or_copy.rs:44-73 simd-r-drive-entry-handle/src/constants.rs:13-18
Namespace-Based Key Management
The NamespaceHasher utility enables hierarchical key organization:
Sources: src/utils.rs:11-12
Size Formatting for Logging
The format_bytes utility provides human-readable output:
| Input Bytes | Formatted Output |
|---|---|
| 1023 | "1023 B" |
| 1024 | "1.00 KB" |
| 1048576 | "1.00 MB" |
| 1073741824 | "1.00 GB" |
Sources: src/utils.rs:7-8
Configuration Parsing
The parse_buffer_size utility handles size string inputs:
| Input String | Parsed Bytes |
|---|---|
| "64" | 64 |
| "64KB" | 65,536 |
| "1MB" | 1,048,576 |
| "2GB" | 2,147,483,648 |
Sources: src/utils.rs:13-14
Integration with Core Systems
Relationship to Storage Engine
Sources: extensions/Cargo.toml:1-22 src/utils.rs:1-17 simd-r-drive-entry-handle/src/lib.rs:1-10
Performance Considerations
| Utility | Performance Impact | Use Case |
|---|---|---|
align_or_copy | Zero-copy when aligned | Deserializing typed arrays from storage |
NamespaceHasher | Single XXH3 hash | Generating hierarchical keys |
format_bytes | String allocation | Logging and user display only |
PAYLOAD_ALIGNMENT | Enables SIMD ops | Core storage layout requirement |
Sources: src/utils/align_or_copy.rs:1-74 simd-r-drive-entry-handle/src/constants.rs:13-18
Development Guide
Relevant source files
This document provides an overview of the development process for SIMD R Drive, covering workspace structure, feature flags, build commands, and development workflows. This page serves as an entry point for contributors and developers working on the codebase.
For detailed instructions on building and running tests, see Building and Testing. For information about the CI/CD pipeline and automated testing, see CI/CD Pipeline. For version history and migration guides, see Version History and Migration.
Prerequisites
SIMD R Drive requires:
- Rust : Edition 2024 (as specified in Cargo.toml4)
- Cargo : Workspace-aware build system
- Platform Support : Linux, macOS, Windows (tested via CI matrix)
Optional tools for specific workflows:
- Python 3.10-3.13 : For Python bindings development (see Python Integration)
- Maturin : For building Python wheels (see Building Python Bindings)
- Criterion : For benchmarking (included as dev-dependency)
Workspace Structure
SIMD R Drive uses a Cargo workspace architecture with multiple crates organized by function. The workspace is defined in Cargo.toml:65-78 with specific members included and Python bindings excluded from the main workspace.
graph TB
Root["simd-r-drive\n(root crate)"]
EntryHandle["simd-r-drive-entry-handle\n(zero-copy data access)"]
Extensions["extensions\n(utilities and helpers)"]
subgraph "Network Experiments"
WSServer["experiments/simd-r-drive-ws-server\n(WebSocket server)"]
WSClient["experiments/simd-r-drive-ws-client\n(Rust client)"]
ServiceDef["experiments/simd-r-drive-muxio-service-definition\n(RPC contract)"]
end
subgraph "Python Bindings (Excluded)"
PyBindings["experiments/bindings/python\n(PyO3 bindings)"]
PyWSClient["experiments/bindings/python-ws-client\n(Python WebSocket client)"]
end
Root --> EntryHandle
Root --> Extensions
WSServer --> Root
WSServer --> ServiceDef
WSClient --> ServiceDef
PyWSClient --> WSClient
PyBindings --> Root
style Root stroke-width:3px
style PyBindings stroke-dasharray: 5 5
style PyWSClient stroke-dasharray: 5 5
Workspace Members Diagram
Sources: Cargo.toml:65-78
Workspace Member Descriptions
| Crate Path | Purpose | Version Source |
|---|---|---|
. | Core storage engine with CLI | Cargo.toml16 |
simd-r-drive-entry-handle | Zero-copy entry access abstraction | Cargo.toml83 |
extensions | Utility functions and helper modules | Cargo.toml69 |
experiments/simd-r-drive-ws-server | Axum-based WebSocket server | Cargo.toml70 |
experiments/simd-r-drive-ws-client | Native Rust WebSocket client | Cargo.toml71 |
experiments/simd-r-drive-muxio-service-definition | RPC service contract | Cargo.toml72 |
experiments/bindings/python | PyO3 direct bindings (excluded) | Cargo.toml75 |
experiments/bindings/python-ws-client | Python WebSocket client (excluded) | Cargo.toml76 |
Python bindings are excluded from the main workspace because they use Maturin's build system, which conflicts with Cargo's workspace resolver. They must be built separately using maturin build or maturin develop.
Sources: Cargo.toml:74-77
Feature Flags
The core simd-r-drive crate exposes multiple feature flags that enable optional functionality. Features are defined in Cargo.toml:49-55
Feature Dependencies Diagram
Sources: Cargo.toml:49-56
Feature Flag Reference
| Feature | Dependencies | Purpose | Common Use Cases |
|---|---|---|---|
default | None | Minimal feature set | Production builds |
expose-internal-api | None | Exposes internal structs/functions for testing | Test harnesses, benchmarks |
parallel | rayon = "1.10.0" | Enables parallel iteration via Rayon | High-throughput batch processing |
arrow | simd-r-drive-entry-handle/arrow | Proxy feature for Apache Arrow integration | Data analytics pipelines |
The arrow feature is a proxy feature: enabling simd-r-drive/arrow automatically enables simd-r-drive-entry-handle/arrow, allowing users to enable Arrow support without knowing the internal crate structure.
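A minimal sketch of how these features could be declared, based on the tables on this page (feature and dependency names come from this page; the actual Cargo.toml:49-55 may differ in detail):

```toml
[features]
default = []
expose-internal-api = []
parallel = ["dep:rayon"]
# Proxy feature: forwards to the entry-handle crate's own `arrow` feature.
arrow = ["simd-r-drive-entry-handle/arrow"]

[dependencies]
rayon = { version = "1.10.0", optional = true }
```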
Sources: Cargo.toml:49-55 Cargo.toml30 Cargo.toml:54-55
Common Development Commands
Building
| Command | Purpose | Feature Flags Used |
|---|---|---|
cargo build | Build with default features | None |
cargo build --workspace | Build all workspace members | None |
cargo build --all-targets | Build binaries, libs, tests, benches | None |
cargo build --release | Optimized release build | None |
cargo build --features parallel | Build with parallel iteration | parallel |
cargo build --all-features | Build with all features enabled | All |
cargo build --no-default-features | Minimal build (no features) | None |
Testing
| Command | Purpose | Scope |
|---|---|---|
cargo test | Run core crate tests | Current crate only |
cargo test --workspace | Run all workspace tests | All members |
cargo test --all-targets | Test all targets (lib, bins, tests, benches) | Current crate |
cargo test --features expose-internal-api | Test with internal APIs exposed | Feature-specific |
cargo test -- --test-threads=1 | Sequential test execution | Useful for serial_test cases |
Benchmarking
The crate includes two Criterion-based benchmarks defined in Cargo.toml:57-63:
| Benchmark | Command | Purpose |
|---|---|---|
storage_benchmark | cargo bench --bench storage_benchmark | Measure write/read throughput |
contention_benchmark | cargo bench --bench contention_benchmark | Measure concurrent access performance |
To compile benchmarks without running them (useful for CI): cargo bench --no-run
Sources: Cargo.toml:57-63 .github/workflows/rust-tests.yml:60-61
Development Workflow
Standard Development Cycle
Sources: .github/workflows/rust-tests.yml:1-62
Multi-Platform Testing
The CI pipeline tests on three operating systems with multiple feature flag combinations. The test matrix is defined in .github/workflows/rust-tests.yml:14-30:
| Matrix Dimension | Values |
|---|---|
| OS | ubuntu-latest, macos-latest, windows-latest |
| Features | Default, No Default Features, Parallel, Expose Internal API, Parallel + Expose API, All Features |
This results in 18 test jobs (3 OS × 6 feature combinations) per CI run.
Sources: .github/workflows/rust-tests.yml:14-30
Ignored Files and Directories
The .gitignore:1-10 file specifies build artifacts and local files to exclude from version control:
| Pattern | Purpose |
|---|---|
**/target | Cargo build output (all workspace members) |
*.bin | Binary data files (test artifacts) |
/data | Local data directory for debugging |
out.txt | Temporary output file |
.cargo/config.toml | Local Cargo configuration overrides |
The /data directory is explicitly ignored so that local development and experimentation data does not pollute the repository.
Sources: .gitignore:1-10
Dependency Management
Workspace-Level Dependencies
Shared dependencies are defined in Cargo.toml:80-112 to ensure version consistency across workspace members:
Core Dependencies:
xxhash-rust = "0.8.15"withxxh3andconst_xxh3features for hashingmemmap2 = "0.9.5"for memory-mapped file accesscrc32fast = "1.4.2"for checksum validationdashmap = "6.1.0"for concurrent hash maps
Optional Dependencies:
rayon = "1.10.0"(gated byparallelfeature)arrow = "57.0.0"(gated byarrowfeature, default-features disabled)
Development Dependencies:
criterion = "0.6.0"for benchmarkingtempfile = "3.19.0"for test isolationserial_test = "3.2.0"for sequential test executiontokio = "1.45.1"for async testing (not used in core crate)
Sources: Cargo.toml:23-47 Cargo.toml:80-112
Intra-Workspace Dependencies
Workspace crates reference each other using path dependencies with explicit versions:
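A sketch of the pattern, using the entry-handle crate as an example (path and version are taken from the tables on this page; the real entries live in Cargo.toml:81-85):

```toml
[workspace.dependencies]
simd-r-drive-entry-handle = { path = "./simd-r-drive-entry-handle", version = "0.15.5-alpha" }
```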
This ensures workspace members can be published independently to crates.io while maintaining version consistency during development.
Sources: Cargo.toml:81-85
Caching Strategy
The CI pipeline uses GitHub Actions cache to accelerate builds. The caching configuration is defined in .github/workflows/rust-tests.yml:40-51:
Cached Directories:
- `~/.cargo/bin/` - Compiled binaries (cargo tools)
- `~/.cargo/registry/index/` - Crate registry index
- `~/.cargo/registry/cache/` - Downloaded crate archives
- `~/.cargo/git/db/` - Git dependencies
- `target/` - Build artifacts
Cache Key Strategy:
- Primary key: `${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}-${{ matrix.flags }}`
- Fallback key: `${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}-`
This strategy provides OS-specific, feature-flag-specific caching with fallback to shared dependencies when feature flags change.
Sources: .github/workflows/rust-tests.yml:39-51
Publishing Configuration
All workspace crates share publishing configuration through Cargo.toml:1-9:
| Field | Value | Purpose |
|---|---|---|
authors | ["Jeremy Harris <jeremy.harris@zenosmosis.com>"] | Crate metadata |
version | "0.15.5-alpha" | Current release version |
edition | "2024" | Rust edition |
repository | "https://github.com/jzombie/rust-simd-r-drive" | Source location |
license | "Apache-2.0" | Open source license |
categories | ["database-implementations", "data-structures", "filesystem"] | crates.io categories |
keywords | ["storage-engine", "binary-storage", "append-only", "simd", "mmap"] | Search keywords |
publish | true | Enable publishing to crates.io |
Individual crates inherit these values using workspace inheritance: edition.workspace = true.
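For example, a member crate's manifest might contain the following (a sketch; only `edition.workspace = true` is confirmed on this page, the other inherited fields are assumptions):

```toml
[package]
name = "simd-r-drive-entry-handle"
edition.workspace = true
version.workspace = true
license.workspace = true
repository.workspace = true
```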
Sources: Cargo.toml:1-21
Next Steps
- Building and Testing : See Building and Testing for detailed instructions on running unit tests, integration tests, concurrency tests, and benchmarks.
- CI/CD Pipeline : See CI/CD Pipeline for information about GitHub Actions workflows, matrix testing strategy, and quality checks.
- Version History and Migration : See Version History and Migration for breaking changes, migration paths, and storage format evolution.
For information about specific subsystems:
- Storage engine internals: Core Storage Engine
- Network layer: Network Layer and RPC
- Python integration: Python Integration
- Performance features: Performance Optimizations
Building and Testing
Relevant source files
- .github/workflows/rust-tests.yml
- .gitignore
- Cargo.toml
- benches/storage_benchmark.rs
- src/main.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
This page provides instructions for building SIMD R Drive from source, running the test suite, and executing performance benchmarks. It covers workspace configuration, feature flags, test types, and benchmark utilities. For information about CI/CD pipeline configuration and automated quality checks, see CI/CD Pipeline.
Prerequisites
Required Dependencies:
- Rust toolchain (stable channel)
- Cargo package manager (bundled with Rust)
Optional Dependencies:
- Python 3.10-3.13 (for Python bindings)
- Maturin (for building Python wheels)
The project uses Rust 2024 edition and requires no platform-specific dependencies beyond the standard Rust toolchain.
Sources: Cargo.toml:1-13
Workspace Structure
SIMD R Drive is organized as a Cargo workspace with multiple crates. Understanding the workspace layout is essential for targeted builds and tests.
Workspace Members:
graph TB
subgraph "Workspace Root"
Root["Cargo.toml\nWorkspace Configuration"]
end
subgraph "Core Crates"
Core["simd-r-drive\n(Root Package)"]
EntryHandle["simd-r-drive-entry-handle\n(Zero-Copy API)"]
Extensions["extensions/\n(Utility Functions)"]
end
subgraph "Network Experiments"
WSServer["experiments/simd-r-drive-ws-server\n(WebSocket Server)"]
WSClient["experiments/simd-r-drive-ws-client\n(Rust Client)"]
ServiceDef["experiments/simd-r-drive-muxio-service-definition\n(RPC Contract)"]
end
subgraph "Python Bindings (Excluded)"
PyBindings["experiments/bindings/python\n(PyO3 Direct)"]
PyWSClient["experiments/bindings/python-ws-client\n(WebSocket Client)"]
end
Root --> Core
Root --> EntryHandle
Root --> Extensions
Root --> WSServer
Root --> WSClient
Root --> ServiceDef
PyBindings -.->|excluded from workspace| Root
PyWSClient -.->|excluded from workspace| Root
Core --> EntryHandle
WSServer --> ServiceDef
WSClient --> ServiceDef
| Crate | Path | Purpose |
|---|---|---|
simd-r-drive | . | Core storage engine |
simd-r-drive-entry-handle | ./simd-r-drive-entry-handle | Zero-copy data access API |
simd-r-drive-extensions | ./extensions | Utility functions and helpers |
simd-r-drive-ws-server | ./experiments/simd-r-drive-ws-server | Axum-based WebSocket server |
simd-r-drive-ws-client | ./experiments/simd-r-drive-ws-client | Native Rust WebSocket client |
simd-r-drive-muxio-service-definition | ./experiments/simd-r-drive-muxio-service-definition | RPC service contract |
Excluded from Workspace:
- `experiments/bindings/python` - Python direct bindings (built separately with Maturin)
- `experiments/bindings/python-ws-client` - Python WebSocket client (built separately with Maturin)
Python bindings are excluded because they require a different build system (Maturin) and have incompatible dependency resolution requirements.
Sources: Cargo.toml:65-78
Building the Project
Basic Build Commands
The most common invocations, sketched in the block below, are:
- Build all workspace members
- Build with release optimizations
- Build a specific crate
- Build all targets (binaries, libraries, tests, benchmarks)
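A sketch of these commands, assuming standard Cargo workspace usage (the example crate name comes from the workspace table above):

```sh
# Build all workspace members
cargo build --workspace

# Build with release optimizations
cargo build --workspace --release

# Build a specific crate
cargo build -p simd-r-drive-entry-handle

# Build all targets: binaries, libraries, tests, and benchmarks
cargo build --workspace --all-targets
```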
Sources: .github/workflows/rust-tests.yml54
graph LR
subgraph "Feature Flags"
Default["default\n(empty set)"]
Parallel["parallel\nEnables rayon"]
ExposeAPI["expose-internal-api\nExposes internal symbols"]
Arrow["arrow\nArrow integration"]
AllFeatures["--all-features\nEnable everything"]
end
subgraph "Dependencies Enabled"
Rayon["rayon = 1.10.0\nParallel iteration"]
EntryArrow["simd-r-drive-entry-handle/arrow\nArrow conversions"]
end
Parallel --> Rayon
Arrow --> EntryArrow
AllFeatures --> Parallel
AllFeatures --> ExposeAPI
AllFeatures --> Arrow
Feature Flags
The simd-r-drive crate supports optional feature flags that enable additional functionality:
Feature Flag Reference:
| Feature | Dependencies | Purpose |
|---|---|---|
default | None | Standard storage engine only |
parallel | rayon = "1.10.0" | Parallel iteration support for bulk operations |
expose-internal-api | None | Exposes internal APIs for advanced use cases |
arrow | simd-r-drive-entry-handle/arrow | Enables Apache Arrow integration |
Build Examples:
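For example (a sketch using the feature names from the table above):

```sh
cargo build --no-default-features
cargo build --features parallel
cargo build --features arrow
cargo build --features "parallel,expose-internal-api"
cargo build --all-features
```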
Sources: Cargo.toml:49-55 Cargo.toml30
CLI Binary
The workspace includes a CLI binary for direct interaction with the storage engine:
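A sketch of building and invoking the binary; the binary path follows the build-artifact locations listed later on this page, and the `--help` flag is an assumption (use it, if supported, to list the available subcommands):

```sh
cargo build --release
./target/release/simd-r-drive --help
```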
The CLI entry point is defined at src/main.rs:1-12 and delegates to command execution logic in the cli module.
Sources: src/main.rs:1-12
Running Tests
Test Organization
Test Categories:
- Unit Tests - Embedded in source files, test individual functions and modules
- Integration Tests - Located in the `tests/` directory, test public API interactions
- Concurrency Tests - Multi-threaded scenarios validating thread safety
- Documentation Tests - Code examples in documentation comments
Sources: Cargo.toml:36-47
Running All Tests
Execute the complete test suite:
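Based on the CI configuration referenced below, the full-suite invocation looks like:

```sh
cargo test --workspace --all-targets --verbose
```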
Sources: .github/workflows/rust-tests.yml57
Concurrency Tests
Concurrency tests validate thread-safe operations and are located in tests/concurrency_tests.rs:1-230. These tests use `serial_test` to prevent parallel execution and `tokio` for the async runtime.
Test Scenarios:
graph TB
subgraph "Concurrency Test Structure"
TestAttr["#[tokio::test(flavor=multi_thread)]\n#[serial]"]
TempDir["tempfile::tempdir()\nIsolated Storage"]
DataStore["Arc<DataStore>\nShared Storage"]
Tasks["tokio::spawn\nConcurrent Tasks"]
end
subgraph "Test Scenarios"
StreamTest["concurrent_slow_streamed_write_test\nParallel stream writes"]
WriteTest["concurrent_write_test\n16 threads × 10 writes"]
InterleavedTest["interleaved_read_write_test\nRead/Write coordination"]
end
TestAttr --> TempDir
TempDir --> DataStore
DataStore --> Tasks
Tasks --> StreamTest
Tasks --> WriteTest
Tasks --> InterleavedTest
1. Concurrent Slow Streamed Write Test (tests/concurrency_tests.rs:16-109):
- Simulates slow streaming writes from multiple threads
- Uses a `SlowReader` wrapper to introduce artificial latency
- Validates data integrity after concurrent stream completion
- Configuration: 1MB payloads with 100ms read delays
2. Concurrent Write Test (tests/concurrency_tests.rs:111-161):
- Spawns 16 threads, each performing 10 writes
- Tests high-contention write scenarios
- Validates all 160 writes are retrievable
- Uses 5ms delays to simulate realistic timing
3. Interleaved Read/Write Test (tests/concurrency_tests.rs:163-229):
- Tests read-after-write and write-after-read patterns
- Uses `tokio::sync::Notify` for coordination
- Validates proper synchronization between readers and writers
Run Concurrency Tests:
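A sketch of running just this test file (the integration-test name follows from the file path above; `--test-threads=1` matches the sequential-execution option in the testing table earlier on this page):

```sh
cargo test --test concurrency_tests
# or force fully sequential execution:
cargo test --test concurrency_tests -- --test-threads=1
```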
Key Dependencies:
| Dependency | Purpose | Reference |
|---|---|---|
serial_test = "3.2.0" | Prevents parallel test execution | Cargo.toml44 |
tokio (with rt-multi-thread, macros) | Async runtime for concurrent tests | Cargo.toml47 |
tempfile = "3.19.0" | Temporary file creation for isolated tests | Cargo.toml45 |
Sources: tests/concurrency_tests.rs:1-230 Cargo.toml:44-47
graph LR
subgraph "Benchmark Configuration"
CargoBench["cargo bench\nCriterion Runner"]
StorageBench["storage_benchmark\nharness = false"]
ContentionBench["contention_benchmark\nharness = false"]
end
subgraph "Storage Benchmark Operations"
AppendOp["benchmark_append_entries\n1M writes in batches"]
SeqRead["benchmark_sequential_reads\nZero-copy iteration"]
RandRead["benchmark_random_reads\n1M random lookups"]
BatchRead["benchmark_batch_reads\nVectorized reads"]
end
CargoBench --> StorageBench
CargoBench --> ContentionBench
StorageBench --> AppendOp
StorageBench --> SeqRead
StorageBench --> RandRead
StorageBench --> BatchRead
Running Benchmarks
The workspace includes micro-benchmarks using the Criterion framework to measure storage engine performance.
Benchmark Targets
Benchmark Targets:
- storage_benchmark - Single-process micro-benchmarks (benches/storage_benchmark.rs:1-234)
- contention_benchmark - Multi-threaded contention scenarios (referenced in Cargo.toml:62-63)
Both use harness = false to disable Cargo's default benchmark harness and use custom timing logic.
Sources: Cargo.toml:57-63
Storage Benchmark
The storage benchmark (benches/storage_benchmark.rs:1-234) measures four critical operations:
Configuration Constants:
| Constant | Value | Purpose |
|---|---|---|
NUM_ENTRIES | 1,000,000 | Total entries for write phase |
ENTRY_SIZE | 8 bytes | Fixed payload size |
WRITE_BATCH_SIZE | 1,024 | Entries per batch write |
READ_BATCH_SIZE | 1,024 | Entries per batch read |
NUM_RANDOM_CHECKS | 1,000,000 | Random read operations |
NUM_BATCH_CHECKS | 1,000,000 | Batch read operations |
Benchmark Operations:
1. Append Entries (benches/storage_benchmark.rs:52-83):
- Writes 1M entries in batches using `batch_write`
- Measures write throughput (writes/second)
- Uses fixed 8-byte little-endian payloads
2. Sequential Reads (benches/storage_benchmark.rs:98-118):
- Iterates through all entries using zero-copy iteration
- Validates data integrity during iteration
- Measures sequential read throughput
3. Random Reads (benches/storage_benchmark.rs:124-149):
- Performs 1M random single-key lookups
- Uses `rand::rng().random_range()` for key selection
- Validates retrieved data matches expected values
4. Batch Reads (benches/storage_benchmark.rs:155-181):
- Performs vectorized reads using `batch_read`
- Processes reads in batches of 1,024 keys
- Measures amortized batch read performance
Run Benchmarks:
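For example (commands from the benchmark tables above):

```sh
cargo bench --bench storage_benchmark
cargo bench --bench contention_benchmark

# Compile-only check, as used in CI
cargo bench --workspace --no-run
```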
Benchmark Output Format:
The storage benchmark uses a custom fmt_rate function (benches/storage_benchmark.rs:220-233) to format throughput metrics with thousands separators and three decimal places:
Wrote 1,000,000 entries of 8 bytes in 2.345s (426,439.232 writes/s)
Sequentially read 1,000,000 entries in 1.234s (810,372.771 reads/s)
Randomly read 1,000,000 entries in 3.456s (289,351.852 reads/s)
Batch-read verified 1,000,000 entries in 0.987s (1,013,171.005 reads/s)
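An illustration of this kind of formatting in plain Rust; the real `fmt_rate` helper in benches/storage_benchmark.rs may instead use the `thousands` crate listed in the benchmark dependencies:

```rust
// Illustration only: reproduces the output style shown above.
fn fmt_rate(rate: f64) -> String {
    let s = format!("{:.3}", rate);
    let (int_part, frac_part) = s.split_once('.').expect("always has a decimal point");
    // Insert a comma every three digits, walking the integer part from the right.
    let mut grouped = String::new();
    for (i, c) in int_part.chars().rev().enumerate() {
        if i > 0 && i % 3 == 0 {
            grouped.push(',');
        }
        grouped.push(c);
    }
    let int_grouped: String = grouped.chars().rev().collect();
    format!("{int_grouped}.{frac_part}")
}

fn main() {
    assert_eq!(fmt_rate(426_439.2319), "426,439.232");
    println!("{} writes/s", fmt_rate(426_439.2319));
}
```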
Sources: benches/storage_benchmark.rs:1-234 Cargo.toml:57-63
graph TB
subgraph "CI Test Matrix"
Trigger["Push to main\nor Pull Request"]
end
subgraph "Operating Systems"
Ubuntu["ubuntu-latest"]
MacOS["macos-latest"]
Windows["windows-latest"]
end
subgraph "Feature Configurations"
Default["Default (empty)"]
NoDefault["--no-default-features"]
Parallel["--features parallel"]
ExposeAPI["--features expose-internal-api"]
Combined["--features=parallel,expose-internal-api"]
AllFeatures["--all-features"]
end
subgraph "Build Steps"
Cache["Cache Cargo dependencies"]
Build["cargo build --workspace --all-targets"]
Test["cargo test --workspace --all-targets --verbose"]
BenchCheck["cargo bench --workspace --no-run"]
end
Trigger --> Ubuntu
Trigger --> MacOS
Trigger --> Windows
Ubuntu --> Default
Ubuntu --> NoDefault
Ubuntu --> Parallel
Ubuntu --> ExposeAPI
Ubuntu --> Combined
Ubuntu --> AllFeatures
Default --> Cache
Cache --> Build
Build --> Test
Test --> BenchCheck
CI/CD Test Matrix
The GitHub Actions workflow runs tests across multiple platforms and feature combinations to ensure compatibility.
Test Matrix Configuration:
| OS | Feature Flags |
|---|---|
| Ubuntu, macOS, Windows | Default |
| Ubuntu, macOS, Windows | --no-default-features |
| Ubuntu, macOS, Windows | --features parallel |
| Ubuntu, macOS, Windows | --features expose-internal-api |
| Ubuntu, macOS, Windows | --features=parallel,expose-internal-api |
| Ubuntu, macOS, Windows | --all-features |
Total Test Combinations: 3 OS × 6 feature configs = 18 test jobs
CI Pipeline Steps:
- Cache Dependencies (.github/workflows/rust-tests.yml:40-51) - Caches `~/.cargo` and `target/` directories using the lock file hash
- Build (.github/workflows/rust-tests.yml:53-54) - Compiles all workspace targets with specified feature flags
- Test (.github/workflows/rust-tests.yml:56-57) - Runs complete test suite with verbose output
- Check Benchmarks (.github/workflows/rust-tests.yml:60-61) - Validates benchmark compilation without execution
Caching Strategy:
The workflow uses cache keys based on:
- Operating system (`runner.os`)
- Lock file hash (`hashFiles('**/Cargo.lock')`)
- Feature flags (`matrix.flags`)
This ensures efficient dependency reuse while avoiding cross-contamination between different build configurations.
Sources: .github/workflows/rust-tests.yml:1-62
Development Dependencies
Test Dependencies
| Dependency | Version | Purpose |
|---|---|---|
serial_test | 3.2.0 | Sequential test execution for concurrency tests |
tempfile | 3.19.0 | Temporary file/directory creation |
tokio | 1.45.1 | Async runtime with multi-thread support |
rand | 0.9.0 | Random number generation for benchmarks |
bincode | 1.3.3 | Serialization for test data |
serde | 1.0.219 | Serialization framework |
serde_json | 1.0.140 | JSON serialization for test data |
futures | 0.3.31 | Async utilities |
bytemuck | 1.23.2 | Zero-copy type conversions |
Sources: Cargo.toml:36-47
Benchmark Dependencies
| Dependency | Version | Purpose |
|---|---|---|
criterion | 0.6.0 | Benchmark framework (workspace dependency) |
thousands | 0.2.0 | Number formatting for benchmark output |
rand | 0.9.0 | Random data generation |
Sources: Cargo.toml39 Cargo.toml46 Cargo.toml98 Cargo.toml108
Build Artifacts
Binary Output
Compiled binaries are located in:
- Debug builds: `target/debug/simd-r-drive`
- Release builds: `target/release/simd-r-drive`
Library Output
The workspace produces several library artifacts:
- `libsimd_r_drive.rlib` - Core storage engine
- `libsimd_r_drive_entry_handle.rlib` - Zero-copy API
- `libsimd_r_drive_extensions.rlib` - Utility functions
Ignored Files
The .gitignore configuration excludes build artifacts and data files:
- `**/target` - All Cargo build directories
- `*.bin` - Binary storage files created during testing
- `/data` - Development and experimentation data directory
- `.cargo/config.toml` - Local Cargo overrides
Sources: .gitignore:1-10
Best Practices
Pre-Commit Checklist:
- Run `cargo test --workspace --all-targets` to validate that all tests pass
- Run `cargo bench --workspace --no-run` to ensure benchmarks compile
- Run `cargo build --all-features` to validate all feature combinations
- Check that no test data files (`.bin`, `/data`) are committed
Performance Validation:
- Use `cargo bench --bench storage_benchmark` to establish performance baselines
- Compare throughput metrics before and after optimization changes
- Monitor memory usage during concurrency tests
Feature Flag Testing:
- Always test with `--no-default-features` to ensure minimal builds work
- Test all feature combinations that will be published
- Verify that feature flags are properly gated with `#[cfg(feature = "...")]`
Sources: .github/workflows/rust-tests.yml:1-62 benches/storage_benchmark.rs:1-234
CI/CD Pipeline
Relevant source files
- .github/workflows/rust-lint.yml
- .github/workflows/rust-tests.yml
- .gitignore
- CHANGELOG.md
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
Purpose and Scope
This document describes the Continuous Integration and Continuous Deployment (CI/CD) infrastructure for the SIMD R Drive project. It covers the GitHub Actions workflows that enforce code quality, run automated tests, and validate security compliance. The CI/CD system ensures that all code changes meet quality standards before being merged.
For information about building and running tests locally, see Building and Testing. For version-specific breaking changes and migration strategies, see Version History and Migration.
CI/CD Architecture Overview
The SIMD R Drive project uses two primary GitHub Actions workflows to validate all code changes:
Diagram: CI/CD Workflow Architecture
graph TB
subgraph "Trigger Events"
Push["Push to main"]
PR["Pull Request"]
Tag["Tag push (v*)"]
end
subgraph "rust-tests.yml"
TestMatrix["Matrix Testing"]
BuildStep["cargo build"]
TestStep["cargo test"]
BenchCheck["cargo bench --no-run"]
Cache["Cargo Cache"]
end
subgraph "rust-lint.yml"
FmtCheck["cargo fmt --check"]
ClippyCheck["cargo clippy"]
DocCheck["cargo doc"]
DenyCheck["cargo-deny check"]
AuditCheck["cargo-audit"]
end
Push --> TestMatrix
Push --> FmtCheck
PR --> TestMatrix
PR --> FmtCheck
Tag --> TestMatrix
TestMatrix --> Cache
Cache --> BuildStep
BuildStep --> TestStep
TestStep --> BenchCheck
FmtCheck --> ClippyCheck
ClippyCheck --> DocCheck
DocCheck --> DenyCheck
DenyCheck --> AuditCheck
style TestMatrix fill:#f9f9f9
style FmtCheck fill:#f9f9f9
The system runs two independent workflows in parallel. The rust-tests.yml workflow performs functional validation through matrix testing across multiple platforms and feature configurations. The rust-lint.yml workflow enforces code quality standards, documentation completeness, and security compliance. All checks must pass before a pull request can be merged.
Sources: .github/workflows/rust-tests.yml:1-62 .github/workflows/rust-lint.yml:1-44
Test Workflow (rust-tests.yml)
Workflow Configuration
The test workflow executes on three trigger events:
| Trigger | Condition | Purpose |
|---|---|---|
push | Branches: main | Validate main branch integrity |
push | Tags: v* | Validate release candidates |
pull_request | Branches: main | Validate proposed changes |
Sources: .github/workflows/rust-tests.yml:3-8
Matrix Testing Strategy
The workflow uses a comprehensive matrix strategy to test across multiple dimensions:
Diagram: Matrix Testing Dimensions
graph TB
subgraph "Operating Systems"
Ubuntu["ubuntu-latest"]
MacOS["macos-latest"]
Windows["windows-latest"]
end
subgraph "Feature Configurations"
Default["Default\nflags: (empty)"]
NoDefault["No Default Features\nflags: --no-default-features"]
Parallel["Parallel\nflags: --features parallel"]
ExposeAPI["Expose Internal API\nflags: --features expose-internal-api"]
ParallelAPI["Parallel + Expose API\nflags: --features=parallel,expose-internal-api"]
AllFeatures["All Features\nflags: --all-features"]
end
subgraph "Test Job Matrix"
Job["test job\nOS x Features\n3 x 6 = 18 configurations"]
end
Ubuntu --> Job
MacOS --> Job
Windows --> Job
Default --> Job
NoDefault --> Job
Parallel --> Job
ExposeAPI --> Job
ParallelAPI --> Job
AllFeatures --> Job
Each job runs the complete build-test-benchmark pipeline with a unique combination of operating system and feature flags, resulting in 18 total test configurations per workflow execution.
Sources: .github/workflows/rust-tests.yml:13-30
Test Job Steps
Each matrix job executes the following steps:
1. Repository Checkout
Sources: .github/workflows/rust-tests.yml:33-34
2. Rust Toolchain Installation
Uses the stable Rust toolchain across all platforms.
Sources: .github/workflows/rust-tests.yml:36-37
3. Dependency Caching
The workflow caches Cargo dependencies to reduce build times:
| Cache Path | Contents |
|---|---|
~/.cargo/bin/ | Installed cargo binaries |
~/.cargo/registry/index/ | Crate registry index |
~/.cargo/registry/cache/ | Downloaded crate archives |
~/.cargo/git/db/ | Git dependencies |
target/ | Build artifacts |
The cache key includes the OS, Cargo.lock hash, and matrix flags to ensure correct cache isolation between configurations.
Sources: .github/workflows/rust-tests.yml:40-52
4. Build Step
Builds all workspace crates with all targets (library, binaries, tests, benches, examples) using the matrix-specified feature flags.
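Per the CI matrix diagram earlier on this page, the build command takes roughly this form, with the job's matrix flags appended (`${{ matrix.flags }}` is the GitHub Actions expression for them):

```sh
cargo build --workspace --all-targets ${{ matrix.flags }}
```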
Sources: .github/workflows/rust-tests.yml:54-55
5. Test Execution
Runs all tests in the workspace with verbose output, including:
- Unit tests
- Integration tests
- Documentation tests
- Example tests
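The test command itself, as quoted in the pipeline diagram earlier on this page:

```sh
cargo test --workspace --all-targets --verbose
```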
Sources: .github/workflows/rust-tests.yml:57-58
6. Benchmark Compilation Check
Validates that all benchmarks compile successfully without executing them. The --no-run flag ensures benchmarks are compiled but not executed in CI.
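The command, as listed in the quality gates summary below:

```sh
cargo bench --workspace --no-run
```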
Sources: .github/workflows/rust-tests.yml:60-62
Matrix Configuration Table
| Matrix Variable | Values | Count |
|---|---|---|
os | ubuntu-latest, macos-latest, windows-latest | 3 |
name | Default, No Default Features, Parallel, Expose Internal API, Parallel + Expose API, All Features | 6 |
flags | "", "--no-default-features", "--features parallel", "--features expose-internal-api", "--features=parallel,expose-internal-api", "--all-features" | 6 |
| Total Jobs | OS × Features | 18 |
Sources: .github/workflows/rust-tests.yml:14-30
Lint Workflow (rust-lint.yml)
Workflow Configuration
The lint workflow runs on all pushes and pull requests without branch restrictions:
Sources: .github/workflows/rust-lint.yml3
Lint Job Architecture
Diagram: Lint Workflow Execution Pipeline
The lint workflow enforces five quality gates in sequence, with each step dependent on the previous step's success.
Sources: .github/workflows/rust-lint.yml:5-44
Quality Gate 1: Code Formatting
Validates that all Rust code adheres to the standard formatting rules defined by rustfmt. The --check flag ensures the command fails if any files require formatting without modifying them.
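As listed in the quality gates summary table later on this page, the check is:

```sh
cargo fmt --check
```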
Sources: .github/workflows/rust-lint.yml:26-27
Quality Gate 2: Clippy Linting
Runs Clippy across the entire workspace with maximum coverage:
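The invocation takes roughly this form, assembled from the flags listed below:

```sh
cargo clippy --workspace --all-targets --all-features -- -D warnings
```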
- `--workspace`: Checks all workspace crates
- `--all-targets`: Includes lib, bins, tests, benches, examples
- `--all-features`: Enables all optional features
- `-D warnings`: Treats all warnings as errors
Sources: .github/workflows/rust-lint.yml:29-31
Quality Gate 3: Documentation Validation
Generates documentation and fails on any documentation warnings:
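A sketch of the invocation, assembled from the flags listed below (additional flags in the actual workflow are possible):

```sh
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --document-private-items
```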
RUSTDOCFLAGS="-D warnings": Treats doc warnings as errors--no-deps: Only documents workspace crates, not dependencies--document-private-items: Includes private items to ensure complete internal documentation
Sources: .github/workflows/rust-lint.yml:33-35
Quality Gate 4: Dependency Security Check
Runs cargo-deny to validate dependency security and licensing:
- Checks for security advisories in dependencies
- Validates license compatibility
- Detects duplicate dependencies
- Enforces allowed/denied crate policies
The tool requires prior installation via cargo install cargo-deny.
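For example:

```sh
cargo install cargo-deny
cargo deny check
```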
Sources: .github/workflows/rust-lint.yml:20-22 .github/workflows/rust-lint.yml:37-39
Quality Gate 5: Vulnerability Audit
Scans Cargo.lock for known security vulnerabilities in dependencies by checking against the RustSec Advisory Database. Requires prior installation via cargo install cargo-audit.
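For example:

```sh
cargo install cargo-audit
cargo audit
```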
Sources: .github/workflows/rust-lint.yml:20-23 .github/workflows/rust-lint.yml:41-43
Component Installation Workaround
The workflow includes a workaround for a GitHub Actions environment issue:
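A sketch of the likely step (the exact invocation in the workflow may differ):

```sh
rustup component add rustfmt clippy
```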
This step explicitly installs rustfmt and clippy components to address a toolchain installation issue where these components may not be available by default.
Sources: .github/workflows/rust-lint.yml:13-18
Quality Gates Summary
The complete CI/CD pipeline enforces the following quality gates:
| Gate | Workflow | Command | Failure Condition |
|---|---|---|---|
| Build | rust-tests.yml | cargo build | Compilation errors |
| Tests | rust-tests.yml | cargo test | Test failures |
| Benchmarks | rust-tests.yml | cargo bench --no-run | Benchmark compilation errors |
| Formatting | rust-lint.yml | cargo fmt --check | Unformatted code |
| Linting | rust-lint.yml | cargo clippy -D warnings | Clippy warnings or errors |
| Documentation | rust-lint.yml | cargo doc -D warnings | Missing or invalid docs |
| Dependencies | rust-lint.yml | cargo deny check | Security advisories, license issues |
| Vulnerabilities | rust-lint.yml | cargo audit | Known CVEs in dependencies |
All quality gates must pass for the CI/CD pipeline to succeed. The matrix testing strategy in rust-tests.yml ensures compatibility across three operating systems and six feature configurations, providing comprehensive validation coverage.
Sources: .github/workflows/rust-tests.yml:54-62 .github/workflows/rust-lint.yml:26-43
Workflow Execution Context
Runner Configuration
Both workflows execute on GitHub-hosted runners:
| Workflow | Runner | OS |
|---|---|---|
| rust-tests.yml | ubuntu-latest, macos-latest, windows-latest | Multi-platform |
| rust-lint.yml | ubuntu-latest | Linux only |
Sources: .github/workflows/rust-tests.yml13 .github/workflows/rust-lint.yml7
Fail-Fast Strategy
The test workflow uses fail-fast: false to allow all matrix jobs to complete even if one fails, providing comprehensive failure information:
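A sketch of the relevant strategy block (only `fail-fast` and the OS list are confirmed on this page; the name/flags combinations are summarized in the matrix configuration table above):

```yaml
strategy:
  fail-fast: false
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    # name/flags combinations omitted here
```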
This configuration is valuable for identifying platform-specific or feature-specific issues across the entire matrix rather than stopping at the first failure.
Sources: .github/workflows/rust-tests.yml15
graph LR
LocalDev["Local Development"]
Commit["git commit"]
Push["git push"]
PR["Pull Request"]
TestsCI["rust-tests.yml\n18 matrix jobs"]
LintCI["rust-lint.yml\n5 quality gates"]
Merge["Merge to main"]
Tag["Tag release (v*)"]
LocalDev --> Commit
Commit --> Push
Push --> PR
PR --> TestsCI
PR --> LintCI
TestsCI -->|Pass| Merge
LintCI -->|Pass| Merge
Merge --> Tag
Tag --> TestsCI
Integration with Development Workflow
The CI/CD pipeline integrates with the development workflow at multiple points:
Diagram: CI/CD Integration Points
The CI/CD system provides automated validation at multiple stages of the development lifecycle. Pull requests trigger both workflows, ensuring all quality gates pass before code review. Pushes to main and version tags also trigger the test workflow to validate the stability of the main branch and release candidates.
Sources: .github/workflows/rust-tests.yml:3-8 .github/workflows/rust-lint.yml3
Related CI/CD Artifacts
Ignored Paths
The repository excludes certain paths from version control that are relevant to CI/CD execution:
| Path | Purpose |
|---|---|
**/target | Build artifacts generated by cargo |
*.bin | Binary data files used in tests |
/data | Debugging and experimentation data |
.cargo/config.toml | Local cargo configuration overrides |
These exclusions prevent CI cache pollution and ensure consistent builds across environments.
Sources: .gitignore:1-11
Debug Assertions in CI
The codebase includes conditional debug assertions that activate during CI test runs:
The debug_assert_aligned and debug_assert_aligned_offset functions in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-88 activate only when cfg(any(test, debug_assertions)) is true. In CI test runs, these assertions validate alignment invariants for memory-mapped payloads and zero-copy access patterns without impacting release builds or benchmarks.
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
Maintenance and Evolution
The CI/CD pipeline configuration is versioned alongside the codebase and evolves with project requirements. Changes to the pipeline typically correspond to:
- New feature flags requiring matrix coverage
- Additional security tools or quality gates
- Platform support changes
- Performance optimization validation
Historical changes to the CI/CD infrastructure can be tracked through the repository's commit history in the .github/workflows/ directory.
For information about breaking changes that may affect CI/CD behavior, see Version History and Migration, particularly the alignment changes in version 0.15.0 that introduced new debug assertions.
Sources: CHANGELOG.md:25-51
Version History and Migration
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
This page documents the version history of SIMD R Drive, detailing breaking changes, format evolutions, and migration procedures required when upgrading between versions. The primary focus is on storage format compatibility and the alignment strategy evolution that occurred between versions 0.13.x and 0.15.x.
For information about the current storage architecture and entry structure, see Entry Structure and Metadata. For details on the current alignment strategy and performance characteristics, see Payload Alignment and Cache Efficiency.
Version Overview
SIMD R Drive follows a loosely semantic versioning scheme and is currently in alpha (0.x.x-alpha). Breaking changes in the storage format have occurred as the alignment strategy evolved to optimize zero-copy access and SIMD performance.
| Version | Release Date | Status | Key Changes |
|---|---|---|---|
| 0.15.5-alpha | 2025-10-27 | Current | Apache Arrow 57.0.0, no format changes |
| 0.15.0-alpha | 2025-09-25 | Breaking | Alignment increased to 64 bytes |
| 0.14.0-alpha | 2025-09-08 | Breaking | Introduced 16-byte alignment with pre-pad |
| ≤0.13.x-alpha | Pre-2025-09 | Legacy | No alignment guarantees |
Sources: CHANGELOG.md:1-82
Storage Format Evolution
The most significant changes in SIMD R Drive's history involve the evolution of payload alignment strategy. These changes were driven by the need to optimize zero-copy access, SIMD operations, and CPU cache efficiency.
timeline
title "Storage Format Alignment Evolution"
section Pre-0.14
No Alignment : Payloads written sequentially
: No pre-padding
: Variable starting addresses
section 0.14.0-alpha
16-byte Alignment : Introduced PAYLOAD_ALIGNMENT constant
: Added pre-pad bytes (0-15)
: SSE-friendly boundaries
section 0.15.0-alpha
64-byte Alignment : Increased to cache-line size
: Pre-pad bytes (0-63)
: AVX-512 and cacheline optimized
Timeline: Alignment Strategy Evolution
Format Comparison: Pre-0.14 vs 0.14+ vs 0.15+
Sources: CHANGELOG.md:55-82 CHANGELOG.md:25-52 README.md:51-59
Version Compatibility Matrix
Understanding compatibility between readers and writers is critical when managing SIMD R Drive deployments across multiple services or upgrading existing storage files.
Reader/Writer Compatibility
| Writer Version | ≤0.13.x Reader | 0.14.x Reader | 0.15.x Reader |
|---|---|---|---|
| ≤0.13.x | ✅ Compatible | ❌ Incompatible* | ❌ Incompatible* |
| 0.14.x | ❌ Incompatible | ✅ Compatible | ❌ Incompatible |
| 0.15.x | ❌ Incompatible | ❌ Incompatible | ✅ Compatible |
Legend:
- ✅ Compatible: Reader can correctly parse the storage format
- ❌ Incompatible: Reader will misinterpret pre-pad bytes as payload data
- `*` Newer readers may attempt to parse old formats but will fail validation
Incompatibility Root Cause
The incompatibility arises because:
- Pre-0.14 readers expect `prev_offset` to point directly to the payload start
- 0.14+ formats insert 0-15 or 0-63 pre-pad bytes before the payload
- Old readers interpret these zero bytes as part of the payload, corrupting data
Sources: CHANGELOG.md:55-60 CHANGELOG.md:27-31
Migration Procedures
Migrating from 0.13.x or earlier to 0.14.x+
When upgrading from pre-alignment versions to any aligned version (0.14.x or 0.15.x), you must regenerate storage files.
flowchart TD
Start["Start: Have 0.13.x store"]
ReadOld["Read all entries\nwith 0.13.x binary"]
CreateNew["Create new store\nwith 0.14.x+ binary"]
WriteNew["Write entries to\nnew store"]
Verify["Verify new store\nintegrity"]
Replace["Replace old file\nwith new file"]
Start --> ReadOld
ReadOld --> CreateNew
CreateNew --> WriteNew
WriteNew --> Verify
Verify -->|Success| Replace
Verify -->|Failure| CreateNew
MultiService{"Multi-service\ndeployment?"}
UpgradeReaders["Deploy reader upgrades\nto all services first"]
UpgradeWriters["Deploy writer upgrades\nafter readers"]
Replace --> MultiService
MultiService -->|Yes| UpgradeReaders
MultiService -->|No| Done["Migration Complete"]
UpgradeReaders --> UpgradeWriters
UpgradeWriters --> Done
Migration Workflow
Step-by-Step Migration: 0.13.x → 0.14.x/0.15.x
1. Prepare the old binary.
2. Build the new binary.
3. Extract data from the old store.
4. Create a new store with the new binary.
5. Verify the integrity of the new store.
6. Replace the old file with the new one.
Sources: CHANGELOG.md:75-82 CHANGELOG.md:43-52
Migrating from 0.14.x to 0.15.x
The migration from 0.14.x (16-byte alignment) to 0.15.x (64-byte alignment) follows the same regeneration process, as the alignment constants are incompatible.
Alignment Constant Changes
The key difference between versions:
| Version | Constant File | PAYLOAD_ALIGN_LOG2 | PAYLOAD_ALIGNMENT |
|---|---|---|---|
| 0.14.x | src/storage_engine/constants.rs | 4 | 16 bytes |
| 0.15.x | src/storage_engine/constants.rs | 6 | 64 bytes |
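A sketch of the constants as described in the table (the 0.15.x values; the `usize` types are an assumption, check src/storage_engine/constants.rs for the actual declarations):

```rust
// 0.15.x values; 0.14.x used PAYLOAD_ALIGN_LOG2 = 4 (16-byte alignment).
pub const PAYLOAD_ALIGN_LOG2: usize = 6;
pub const PAYLOAD_ALIGNMENT: usize = 1 << PAYLOAD_ALIGN_LOG2; // 64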
Migration Steps: 0.14.x → 0.15.x
Follow the same procedure as 0.13.x → 0.14.x migration:
- Read entries with 0.14.x binary
- Write to new store with 0.15.x binary
- Verify new store integrity
- Deploy reader upgrades before writer upgrades in multi-service environments
Rationale for 64-byte alignment:
- Matches typical CPU cache line size
- Enables full-speed AVX-512 operations
- Provides safety margin for future SIMD extensions
- Optimizes for zero-copy typed slices (see Payload Alignment and Cache Efficiency)
Sources: CHANGELOG.md:25-52 README.md:51-59 CHANGELOG.md39
Version-Specific Details
Version 0.15.5-alpha (2025-10-27)
Type: Non-breaking maintenance release
Changes:
- Updated Apache Arrow dependency from version 56.x to 57.0.0
- No storage format changes
- No API changes
- No migration required
Compatibility: Fully compatible with 0.15.0-alpha through 0.15.4-alpha
Sources: CHANGELOG.md:19-22
Version 0.15.0-alpha (2025-09-25)
Type: Breaking storage format change
Primary Change: Increased default PAYLOAD_ALIGNMENT from 16 bytes to 64 bytes
Motivation:
- Cache-line alignment for optimal memory access
- Support for wider SIMD instructions (AVX-512)
- Improved zero-copy performance for typed slices
- Future-proofing for emerging CPU architectures
Breaking Changes:
- Storage files created with 0.15.x cannot be read by 0.14.x or earlier
- Pre-pad calculation changed: `pad = (64 - (prev_tail % 64)) & 63` (worked example below)
- Pointer and offset alignment assertions added in debug/test builds
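A worked example of the 0.15.x pre-pad formula; the helper name is hypothetical, only the arithmetic comes from the changelog:

```rust
/// Bytes of pre-padding needed so the next payload starts on a 64-byte
/// boundary. `prev_tail` is the file offset just past the previous entry.
fn pre_pad_64(prev_tail: u64) -> u64 {
    (64 - (prev_tail % 64)) & 63
}

fn main() {
    assert_eq!(pre_pad_64(0), 0);    // already aligned, no padding
    assert_eq!(pre_pad_64(100), 28); // 100 + 28 = 128, a multiple of 64
    assert_eq!(pre_pad_64(128), 0);  // exact boundary, no padding
    // The 0.14.x formula is identical with 16/15 in place of 64/63.
}
```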
New Features:
- Debug-only alignment assertions via the `debug_assert_aligned()` function
- Offset-level alignment validation via `debug_assert_aligned_offset()`
- Enhanced documentation for alignment guarantees
Code Changes:
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-89 - New alignment assertion functions
- Configuration via the `PAYLOAD_ALIGNMENT` constant (64 bytes)
Migration Path: See Migrating from 0.14.x to 0.15.x above.
Sources: CHANGELOG.md:25-52 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-89
Version 0.14.0-alpha (2025-09-08)
Type: Breaking storage format change
Primary Change: Introduced fixed payload alignment with pre-padding
Motivation:
- Enable zero-copy typed views (e.g., `&[u16]`, `&[u32]`, `&[u64]`)
- Guarantee SIMD-safe alignment for AVX2/NEON operations
- Improve CPU cache utilization
Breaking Changes:
- New on-disk layout with optional pre-pad bytes (0-15 bytes)
- Files written by 0.14.0-alpha are incompatible with ≤0.13.x readers
- Pre-pad calculation: `pad = (16 - (prev_tail % 16)) & 15`
New Features:
- Experimental `arrow` feature flag
- `EntryHandle::as_arrow_buffer()` method for Apache Arrow integration
- `EntryHandle::into_arrow_buffer()` method for zero-copy Arrow buffers
- Configurable alignment via the `PAYLOAD_ALIGN_LOG2` and `PAYLOAD_ALIGNMENT` constants
Storage Layout:
[Pre-Pad: 0-15 bytes] [Payload: variable] [Metadata: 20 bytes]
Configuration:
- Alignment controlled in `src/storage_engine/constants.rs`
- `PAYLOAD_ALIGN_LOG2 = 4` (2^4 = 16 bytes)
- `PAYLOAD_ALIGNMENT = 1 << PAYLOAD_ALIGN_LOG2`
Migration Path: See Migrating from 0.13.x or earlier to 0.14.x+ above.
Sources: CHANGELOG.md:55-82 README.md:112-138
graph TB
subgraph Phase1["Phase 1: Reader Upgrades"]
R1["Service A\n(Reader)"]
R2["Service B\n(Reader)"]
R3["Service C\n(Reader)"]
end
subgraph Phase2["Phase 2: Writer Upgrades"]
W1["Service D\n(Writer)"]
W2["Service E\n(Writer)"]
end
subgraph Phase3["Phase 3: Storage Migration"]
M1["Migrate shared\nstorage files"]
M2["Switch to new\nformat"]
end
Start["Start: All services\non old version"]
Start --> Phase1
Phase1 -->|Verify all readers operational| Phase2
Phase2 -->|Verify all writers operational| Phase3
Phase3 --> Complete["Complete: All services\non new version"]
Note1["Readers can handle\nold format during\ntransition"]
Note2["Writers continue\nproducing old format\nuntil migration"]
Phase1 -.-> Note1
Phase2 -.-> Note2
Multi-Service Deployment Strategy
When upgrading SIMD R Drive across multiple services that share storage files or communicate via the WebSocket RPC interface, a staged deployment approach prevents compatibility issues.
Recommended Upgrade Order
Deployment Steps
- Phase 1 - Upgrade all readers:
- Deploy new binaries to services that only read from storage
- Deploy new WebSocket clients (Python or Rust)
- Verify that services can still read existing (old format) data
- Monitor for errors or compatibility issues
- Phase 2 - Upgrade writers (optional hold):
- Deploy new binaries to services that write to storage
- Important: Writers should continue using the old format during this phase
- Verify writer functionality with old format
- Phase 3 - Migrate storage and switch format:
- Use migration procedure to regenerate storage files
- Update writer configuration to use new format
- Monitor write operations for alignment correctness
- Rollback strategy:
- Keep backups of old format files until migration is verified
- Keep old binaries available for emergency rollback
- Test rollback procedure before starting migration
Sources: CHANGELOG.md:43-52 CHANGELOG.md:75-82
graph TB
subgraph DebugMode["Debug/Test Mode"]
Call1["debug_assert_aligned(ptr, align)"]
Call2["debug_assert_aligned_offset(offset)"]
Check1["Verify ptr % align == 0"]
Check2["Verify offset % PAYLOAD_ALIGNMENT == 0"]
Pass1["Assertion passes"]
Fail1["Panic with alignment error"]
Call1 --> Check1
Call2 --> Check2
Check1 -->|Aligned| Pass1
Check1 -->|Misaligned| Fail1
Check2 -->|Aligned| Pass1
Check2 -->|Misaligned| Fail1
end
subgraph ReleaseMode["Release/Bench Mode"]
Call3["debug_assert_aligned(ptr, align)"]
Call4["debug_assert_aligned_offset(offset)"]
NoOp["No-op (zero cost)"]
Call3 --> NoOp
Call4 --> NoOp
end
Alignment Validation and Debugging
Starting with version 0.15.0, SIMD R Drive includes debug-time alignment assertions to catch alignment violations early in development.
Debug Assertion Functions
Implementation Details
The assertion functions are exported from simd-r-drive-entry-handle and can be called from any crate:
- `debug_assert_aligned(ptr: *const u8, align: usize)` - Validates pointer alignment in debug/test builds
- Used for verifying Arrow buffer compatibility
- Zero cost in release builds
- `debug_assert_aligned_offset(offset: u64)` - Validates file offset alignment in debug/test builds
- Checks against the `PAYLOAD_ALIGNMENT` constant
- Zero cost in release builds
Implementation approach:
- Functions always exist (stable symbol for cross-crate calls)
- Body is gated by `#[cfg(any(test, debug_assertions))]`
- Release/bench builds compile to no-ops with no runtime cost
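A sketch of that pattern, using the pointer-checking variant (illustration only, not the actual contents of debug_assert_aligned.rs):

```rust
/// The function always exists so other crates can link against it, but the
/// check itself is compiled only into test/debug builds.
#[inline]
pub fn debug_assert_aligned(ptr: *const u8, align: usize) {
    #[cfg(any(test, debug_assertions))]
    assert_eq!(
        (ptr as usize) % align,
        0,
        "pointer is not aligned to {align} bytes"
    );

    // In release/bench builds the parameters are simply unused (no-op).
    #[cfg(not(any(test, debug_assertions)))]
    let _ = (ptr, align);
}
```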
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-89 CHANGELOG.md:33-42
Future Format Stability
Current Status
SIMD R Drive is in alpha (0.x.x-alpha) and storage format stability is not guaranteed. Each alignment change requires storage regeneration.
Path to Stability
Future work may include:
- Format version headers - Store format version in file header for automatic detection
- Multi-format readers - Support reading multiple alignment versions in a single binary
- In-place migration - Tools to migrate storage without full regeneration
- Stable 1.0 format - Commitment to format stability after 1.0 release
Current Recommendations
Until format stability is achieved:
- Plan for regeneration: Assume storage files will need regeneration on major updates
- Version pinning: Pin specific SIMD R Drive versions in production deployments
- Test thoroughly: Validate migration procedures in staging environments
- Keep backups: Maintain backups of storage files before migrations
- Monitor changelogs: Review CHANGELOG.md for breaking changes before upgrading
Sources: CHANGELOG.md:1-82 README.md:1-10
Summary: Key Takeaways
| Topic | Key Points |
|---|---|
| Format Compatibility | Each major version (0.13, 0.14, 0.15) has incompatible storage formats |
| Migration Required | Upgrading between major versions requires regenerating storage files |
| Alignment Evolution | Pre-0.14: none → 0.14.x: 16 bytes → 0.15.x: 64 bytes |
| Deployment Order | Upgrade readers first, then writers, then migrate storage |
| Debug Tools | Use debug_assert_aligned() functions to validate alignment in tests |
| Production Strategy | Pin versions, plan for regeneration, test migrations thoroughly |
For specific migration procedures, refer to the Migration Procedures section above. For understanding the current storage format, see Entry Structure and Metadata.
Sources: CHANGELOG.md:1-82 README.md:1-282