Overview
Purpose and Scope
This document provides a high-level introduction to SIMD R Drive, describing its core purpose, architectural components, access methods, and key features. It serves as the entry point for understanding the system before diving into detailed subsystem documentation.
For details on the core storage engine internals, see Core Storage Engine. For network-based remote access, see Network Layer and RPC. For Python integration, see Python Integration. For performance optimization details, see Performance Optimizations.
Sources: README.md:1-42
What is SIMD R Drive?
SIMD R Drive is a high-performance, append-only, schema-less storage engine designed for zero-copy binary data access. It stores arbitrary binary payloads in a single-file container without imposing serialization formats, schemas, or data interpretation. All data is treated as raw bytes (&[u8]), providing maximum flexibility for applications that require high-speed storage and retrieval of binary data.
Core Characteristics
| Characteristic | Description |
|---|---|
| Storage Model | Append-only, single-file container |
| Data Format | Schema-less binary (&[u8]) |
| Access Pattern | Zero-copy memory-mapped reads |
| Alignment | 64-byte boundaries (configurable via PAYLOAD_ALIGNMENT) |
| Concurrency | Thread-safe reads and writes within a single process |
| Indexing | Hardware-accelerated XXH3_64 hash-based key lookup |
| Integrity | CRC32C checksums and validation chain |
The storage engine is optimized for workloads that benefit from SIMD operations, cache-line efficiency, and direct memory access. By enforcing 64-byte payload alignment, it enables efficient typed slice reinterpretation (e.g., &[u32], &[u64]) without copying.
Sources: README.md:5-8 README.md:43-87 Cargo.toml:13
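The typical embedding pattern looks like the following minimal sketch. It assumes `DataStore::open()` takes a filesystem path, that `read` returns an optional `EntryHandle`, and that the handle exposes its bytes through an `as_slice`-style accessor; consult the crate documentation for the exact signatures.

```rust
use std::path::Path;
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};

fn main() -> std::io::Result<()> {
    // Open (or create) the single-file container.
    let store = DataStore::open(Path::new("data.bin"))?;

    // Append a raw binary payload under a raw binary key.
    store.write(b"sensor/1", b"\x01\x02\x03\x04")?;

    // Zero-copy read: the handle borrows directly from the mmap region.
    if let Some(entry) = store.read(b"sensor/1")? {
        let bytes: &[u8] = entry.as_slice(); // assumed accessor name
        assert_eq!(bytes, b"\x01\x02\x03\x04");
    }
    Ok(())
}
```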
Core Architecture Components
The system consists of three primary layers: the storage engine core, network access layer, and language bindings. The following diagram maps high-level architectural concepts to specific code entities.
System Architecture with Code Entities

```mermaid
graph TB
    subgraph "User Interfaces"
        CLI["CLI Binary\nsimd-r-drive crate\nmain.rs"]
        PythonApp["Python Applications\nsimd_r_drive.DataStoreWsClient"]
        RustApp["Native Rust Clients\nDataStoreWsClient struct"]
    end
    subgraph "Network Layer"
        WSServer["WebSocket Server\nsimd-r-drive-ws-server\nAxum HTTP server"]
        RPCDef["Service Definition\nsimd-r-drive-muxio-service-definition\nDataStoreService trait"]
    end
    subgraph "Core Storage Engine"
        DataStore["DataStore struct\nsrc/data_store.rs"]
        Traits["DataStoreReader trait\nDataStoreWriter trait"]
        KeyIndexer["KeyIndexer struct\nsrc/key_indexer.rs"]
        EntryHandle["EntryHandle struct\nsimd-r-drive-entry-handle crate"]
    end
    subgraph "Storage Infrastructure"
        Mmap["Arc<Mmap>\nmemmap2 crate"]
        FileHandle["BufWriter<File>\nstd::fs::File"]
        AtomicOffset["AtomicU64\ntail_offset field"]
    end
    subgraph "Performance Layer"
        SIMDCopy["simd_copy function\nsrc/simd_utils.rs"]
        XXH3["xxhash-rust crate\nXXH3_64 algorithm"]
    end
    CLI --> DataStore
    PythonApp --> WSServer
    RustApp --> WSServer
    WSServer --> RPCDef
    RPCDef --> DataStore
    DataStore --> Traits
    DataStore --> KeyIndexer
    DataStore --> EntryHandle
    DataStore --> Mmap
    DataStore --> FileHandle
    DataStore --> AtomicOffset
    DataStore --> SIMDCopy
    KeyIndexer --> XXH3
    EntryHandle --> Mmap
    style DataStore fill:#f9f9f9,stroke:#333,stroke-width:3px
    style Traits fill:#f9f9f9,stroke:#333,stroke-width:2px
```

Diagram: System architecture showing code entity mappings
Component Descriptions
| Component | Code Entity | Purpose |
|---|---|---|
| DataStore | DataStore struct in src/data_store.rs | Main storage interface implementing read/write operations |
| DataStoreReader | DataStoreReader trait | Defines zero-copy read operations (read, exists, batch_read) |
| DataStoreWriter | DataStoreWriter trait | Defines synchronized write operations (write, delete, batch_write) |
| KeyIndexer | KeyIndexer struct in src/key_indexer.rs | Hash-based index mapping u64 hashes to (tag, offset) tuples |
| EntryHandle | EntryHandle struct in simd-r-drive-entry-handle/src/lib.rs | Zero-copy reference to memory-mapped payload data |
| Memory Mapping | Arc<Mmap> wrapped in Mutex | Shared memory-mapped file reference for zero-copy reads |
| File Handle | Arc<RwLock<BufWriter<File>>> | Synchronized buffered writer for append operations |
| Tail Offset | AtomicU64 field tail_offset | Atomic counter tracking the current end-of-file position |
Sources: Cargo.toml:66-73 src/data_store.rs (inferred from architecture diagrams), High-level diagrams provided
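Reconstructed from the method names above, the two traits can be pictured roughly as follows. This is a hypothetical sketch; the real definitions may differ in signatures, error types, and return values.

```rust
use std::io;

// Hypothetical trait shapes inferred from the methods listed on this page.
pub trait DataStoreReader {
    type Entry; // e.g., EntryHandle

    fn read(&self, key: &[u8]) -> io::Result<Option<Self::Entry>>;
    fn exists(&self, key: &[u8]) -> io::Result<bool>;
    fn batch_read(&self, keys: &[&[u8]]) -> io::Result<Vec<Option<Self::Entry>>>;
}

pub trait DataStoreWriter {
    fn write(&self, key: &[u8], payload: &[u8]) -> io::Result<u64>;
    fn batch_write(&self, entries: &[(&[u8], &[u8])]) -> io::Result<u64>;
    fn delete(&self, key: &[u8]) -> io::Result<u64>;
}
```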
Access Methods

SIMD R Drive can be accessed through three primary interfaces, each optimized for different use cases.

Access Method Architecture

```mermaid
graph LR
    subgraph "Direct Access"
        DirectApp["Rust Application"]
        DirectDS["DataStore::open\nDataStoreReader\nDataStoreWriter"]
    end
    subgraph "CLI Access"
        CLIApp["Command Line"]
        CLIBin["simd-r-drive binary\nclap::Parser"]
    end
    subgraph "Remote Access"
        PyClient["Python Client\nDataStoreWsClient class"]
        RustClient["Rust Client\nDataStoreWsClient struct"]
        WSServer["WebSocket Server\nmuxio-tokio-rpc-server\nAxum router"]
        BackendDS["DataStore instance"]
    end
    DirectApp --> DirectDS
    CLIApp --> CLIBin
    CLIBin --> DirectDS
    PyClient --> WSServer
    RustClient --> WSServer
    WSServer --> BackendDS
    style DirectDS fill:#f9f9f9,stroke:#333,stroke-width:2px
    style WSServer fill:#f9f9f9,stroke:#333,stroke-width:2px
    style BackendDS fill:#f9f9f9,stroke:#333,stroke-width:2px
```

Diagram: Access methods with code entity mappings
Access Method Comparison
| Method | Use Case | Code Entry Point | Latency | Throughput |
|---|---|---|---|---|
| Direct Library | Embedded in Rust applications | DataStore::open() | Microseconds | Highest (zero-copy) |
| CLI | Command-line operations, scripting | simd-r-drive binary with clap | Milliseconds | Process-bound |
| WebSocket RPC | Remote access, language bindings | DataStoreWsClient (Rust/Python) | Network-dependent | RPC-serialization-bound |
Direct Library Access:

- Applications link against the `simd-r-drive` crate directly
- Call `DataStore::open()` to obtain a storage instance
- Use the `DataStoreReader` and `DataStoreWriter` traits for operations
- Provides the lowest latency and highest throughput

CLI Access:

- The `simd-r-drive` binary provides a command-line interface
- Built using `clap` for argument parsing
- Useful for scripting, testing, and manual operations
- Each invocation opens the storage, performs the operation, and closes

Remote Access (WebSocket RPC):

- `simd-r-drive-ws-server` provides network access via WebSocket
- Uses the Muxio RPC framework with `bitcode` serialization
- `DataStoreWsClient` is available for both Rust and Python clients
- Enables multi-language access and distributed architectures
For CLI details, see Repository Structure. For WebSocket server architecture, see WebSocket Server. For Python client usage, see Python WebSocket Client API.
Sources: Cargo.toml:23-26 README.md:9 README.md:262-266 High-level diagrams
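A remote client issues the same logical operations over the WebSocket RPC channel. The sketch below is hypothetical: the constructor, method names, and return types are assumptions, and each call additionally pays network round-trip and `bitcode` serialization costs.

```rust
use simd_r_drive_ws_client::DataStoreWsClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical connection API; the real constructor may differ.
    let client = DataStoreWsClient::new("127.0.0.1", 8080).await?;

    // Reads and writes mirror the local trait methods, but payloads are
    // copied across the RPC boundary rather than memory-mapped.
    client.write(b"key", b"value").await?;
    let value = client.read(b"key").await?;
    assert_eq!(value.as_deref(), Some(b"value".as_slice()));
    Ok(())
}
```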
Key Features
Zero-Copy Memory-Mapped Access
SIMD R Drive uses memmap2 to memory-map the storage file, allowing direct access to stored data without deserialization or copying. The EntryHandle struct provides a zero-copy view into the memory-mapped region, returning &[u8] slices that point directly into the mapped file.
This approach enables:
- Sub-microsecond reads for indexed lookups
- Minimal memory overhead for large entries
- Efficient processing of datasets larger than available RAM
The memory-mapped file is wrapped in Arc<Mutex<Arc<Mmap>>> to ensure thread-safe access during concurrent reads and remap operations.
Sources: README.md:43-49 simd-r-drive-entry-handle/ (inferred)
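The mechanism can be demonstrated with `memmap2` directly. This is a simplified sketch of what the engine does internally; the offsets are arbitrary, and the real code coordinates remaps behind the locks described below.

```rust
use memmap2::Mmap;
use std::fs::File;

fn main() -> std::io::Result<()> {
    let file = File::open("data.bin")?;
    // Safety: the underlying file must not be truncated or mutated while
    // the map is alive; the engine upholds this by serializing remaps
    // behind a Mutex.
    let mmap = unsafe { Mmap::map(&file)? };

    // A "read" is just a slice into the mapped region: no copy and no
    // deserialization; the OS pages bytes in on demand.
    let payload: &[u8] = &mmap[64..128];
    println!("first byte: {}", payload[0]);
    Ok(())
}
```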
Fixed 64-Byte Payload Alignment
Every non-tombstone payload begins on a 64-byte boundary (defined by the PAYLOAD_ALIGNMENT constant). This alignment matches typical CPU cache line sizes and enables:
- Cache-friendly access with reduced cache line splits
- Full-speed SIMD operations (AVX2, AVX-512, NEON) without misalignment penalties
- Zero-copy typed slices when the payload length is a multiple of the element size (e.g., `&[u64]`)
Pre-padding bytes are inserted before payloads to maintain this alignment. Tombstones (deletion markers) do not require alignment.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs (inferred)
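Because every payload starts on a 64-byte boundary, reinterpreting its bytes as wider lanes requires no copy. A minimal sketch using the standard library's `align_to` (assuming the payload length is an exact multiple of the element size):

```rust
/// Reinterpret an aligned byte payload as a `u64` slice without copying.
/// Returns `None` if the slice is misaligned or has a trailing remainder.
fn as_u64_slice(bytes: &[u8]) -> Option<&[u64]> {
    // Safety: every u64 bit pattern is valid, so reinterpreting
    // initialized bytes is sound; align_to performs the alignment math.
    let (prefix, lanes, suffix) = unsafe { bytes.align_to::<u64>() };
    // With 64-byte payload alignment the prefix is always empty; a
    // non-empty suffix means the length was not a multiple of 8.
    if prefix.is_empty() && suffix.is_empty() {
        Some(lanes)
    } else {
        None
    }
}
```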
Single-File Storage Container
All data is stored in a single append-only file with the following characteristics:
| Aspect | Description |
|---|---|
| File Structure | Sequential entries: [pre-pad] [payload] [metadata] |
| Metadata Size | Fixed 20 bytes: key_hash (8) + prev_offset (8) + checksum (4) |
| Entry Chaining | Each entry's metadata stores a prev_offset pointing to the previous entry's tail |
| Validation | CRC32C checksums and backward chain traversal |
| Recovery | Automatic truncation of incomplete writes on open |
The storage format is detailed in Entry Structure and Metadata.
Sources: README.md:62-147 README.md:104-150
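The 20-byte metadata block from the table can be pictured as the following struct (field names follow this page; the authoritative layout lives in the simd-r-drive-entry-handle crate):

```rust
/// On-disk entry metadata, written immediately after each payload.
/// `packed` keeps the layout at exactly 8 + 8 + 4 = 20 bytes.
#[repr(C, packed)]
struct EntryMetadata {
    key_hash: u64,    // XXH3_64 hash of the key
    prev_offset: u64, // tail offset of the previous entry (backward chain)
    checksum: u32,    // CRC32C checksum of the payload
}

const _: () = assert!(std::mem::size_of::<EntryMetadata>() == 20);
```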
Thread-Safe Concurrency
SIMD R Drive supports concurrent operations within a single process using:
| Mechanism | Code Entity | Purpose |
|---|---|---|
| Read Lock | RwLock (reads) | Allows multiple concurrent readers |
| Write Lock | RwLock (writes) | Ensures exclusive write access |
| Atomic Offset | AtomicU64 (tail_offset) | Tracks file end without locking |
| Index Lock | RwLock<HashMap> | Protects key index updates |
| Mmap Lock | Mutex<Arc<Mmap>> | Prevents concurrent remapping |
Concurrency Guarantees:
- ✅ Multiple threads can read concurrently (zero-copy, lock-free)
- ✅ Write operations are serialized via `RwLock`
- ❌ Multiple processes require external file locking
For detailed concurrency model, see Concurrency and Thread Safety.
Sources: README.md:170-206
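In practice these guarantees mean a store handle can be shared across threads, as in this sketch (it assumes `DataStore` is `Send + Sync`, which the locking scheme above implies, and uses the `read` signature assumed elsewhere on this page):

```rust
use simd_r_drive::{DataStore, DataStoreReader};
use std::sync::Arc;
use std::thread;

fn concurrent_reads(store: Arc<DataStore>) {
    // Readers proceed in parallel: each read is a zero-copy mmap view,
    // so no reader blocks another. Writers serialize via the RwLock.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                let key = format!("key-{i}");
                let _ = store.read(key.as_bytes()); // signature assumed
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```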
Hardware-Accelerated Indexing
The KeyIndexer uses the xxhash-rust crate with XXH3_64 algorithm, which provides hardware acceleration:
- SSE2 on x86_64 (universally supported)
- AVX2 on capable x86_64 CPUs (runtime detection)
- NEON on aarch64 (default)
Key lookups are O(1) via HashMap, with benchmarks showing ~1 million random 8-byte lookups completing in under 1 second.
Sources: README.md:158-168 Cargo.toml:34
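The same hash the index uses can be computed directly with the crate (xxhash-rust with its `xxh3` feature enabled):

```rust
use xxhash_rust::xxh3::xxh3_64;

fn main() {
    // KeyIndexer derives its u64 index slot from the raw key bytes.
    let key = b"sensor/1";
    let hash: u64 = xxh3_64(key);
    println!("XXH3_64(key) = {hash:#018x}");
}
```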
SIMD Write Acceleration
The simd_copy function (in src/simd_utils.rs) accelerates memory copying during write operations:
- x86_64 with AVX2: 32-byte SIMD chunks using `_mm256_loadu_si256` / `_mm256_storeu_si256`
- aarch64: 16-byte NEON chunks using `vld1q_u8` / `vst1q_u8`
- Fallback: standard `copy_from_slice` when SIMD is unavailable
This optimization reduces CPU cycles during buffer staging before disk writes.
For SIMD implementation details, see SIMD Acceleration.
Sources: README.md:249-257 src/simd_utils.rs (inferred)
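A simplified sketch of the AVX2 path using the intrinsics named above (illustrative only, not the project's exact implementation):

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn copy_avx2(dst: &mut [u8], src: &[u8]) {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};
    assert_eq!(dst.len(), src.len());
    let mut i = 0;
    // Move 32 bytes per iteration with unaligned SIMD loads/stores.
    while i + 32 <= src.len() {
        let v = _mm256_loadu_si256(src.as_ptr().add(i) as *const __m256i);
        _mm256_storeu_si256(dst.as_mut_ptr().add(i) as *mut __m256i, v);
        i += 32;
    }
    // Scalar tail, mirroring the portable fallback.
    dst[i..].copy_from_slice(&src[i..]);
}

fn simd_copy(dst: &mut [u8], src: &[u8]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: AVX2 support was just verified at runtime.
            unsafe { copy_avx2(dst, src) };
            return;
        }
    }
    dst.copy_from_slice(src); // portable fallback
}
```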
Write and Read Modes
Write Modes
| Mode | Method | Use Case | Flush Behavior |
|---|---|---|---|
| Single Entry | write(key, payload) | Individual writes | Immediate flush |
| Batch | batch_write(&[(key, payload)]) | Multiple entries | Single flush at end |
| Streaming | write_large_entry(key, Read) | Large payloads | Streaming with immediate flush |
Batch writes reduce disk I/O overhead by grouping multiple entries under a single write lock and flushing once.
Sources: README.md:208-223
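For example, loading several entries under one lock might look like the following sketch (the `batch_write` shape is taken from the table above; the return type is an assumption):

```rust
use simd_r_drive::{DataStore, DataStoreWriter};

fn bulk_load(store: &DataStore) -> std::io::Result<()> {
    // One lock acquisition and one flush for all three entries,
    // versus three of each with individual write() calls.
    let entries: Vec<(&[u8], &[u8])> = vec![
        (&b"k1"[..], &b"v1"[..]),
        (&b"k2"[..], &b"v2"[..]),
        (&b"k3"[..], &b"v3"[..]),
    ];
    store.batch_write(&entries)?;
    Ok(())
}
```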
Read Modes
| Mode | Method | Memory Behavior | Use Case |
|---|---|---|---|
| Direct | read(key) -> EntryHandle | Zero-copy mmap reference | Standard reads |
| Streaming | read_stream(key) -> impl Read | Buffered, non-zero-copy | Large entries |
| Parallel Iteration | par_iter_entries() (Rayon) | Parallel processing | Bulk analytics |
Direct reads return EntryHandle with zero-copy &[u8] access. Streaming reads process data incrementally through a buffer. Parallel iteration is available via the optional parallel feature.
For iteration details, see Parallel Iteration (via Rayon).
Sources: README.md:225-247
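Streaming trades zero-copy for bounded memory, as in this sketch (method name from the table; the exact signature and error type are assumptions):

```rust
use simd_r_drive::{DataStore, DataStoreReader};
use std::io::Read;

fn sum_large_entry(store: &DataStore, key: &[u8]) -> std::io::Result<u64> {
    // read_stream feeds the payload through a fixed buffer instead of
    // borrowing the mmap, so a multi-GB entry never fully resides in RAM.
    let mut reader = store.read_stream(key)?;
    let mut buf = [0u8; 64 * 1024];
    let mut sum: u64 = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        sum = sum.wrapping_add(buf[..n].iter().map(|&b| b as u64).sum());
    }
    Ok(sum)
}
```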
Repository Structure
The project is organized as a Cargo workspace with the following crates:
Diagram: Workspace structure with crate relationships
Crate Descriptions
| Crate | Path | Purpose |
|---|---|---|
| simd-r-drive | ./ | Core storage engine with DataStore, KeyIndexer, SIMD utilities |
| simd-r-drive-entry-handle | ./simd-r-drive-entry-handle/ | Zero-copy EntryHandle and metadata structures |
| simd-r-drive-extensions | ./extensions/ | Utility functions and helper modules |
| simd-r-drive-muxio-service-definition | ./experiments/simd-r-drive-muxio-service-definition/ | RPC service trait definitions using bitcode |
| simd-r-drive-ws-server | ./experiments/simd-r-drive-ws-server/ | Axum-based WebSocket RPC server |
| simd-r-drive-ws-client | ./experiments/simd-r-drive-ws-client/ | Native Rust WebSocket client |
| simd-r-drive-py | ./experiments/bindings/python/ | PyO3-based Python bindings for direct access |
| simd-r-drive-ws-client-py | ./experiments/bindings/python-ws-client/ | Python WebSocket client wrapper |
The workspace is defined in Cargo.toml:65-78, with version 0.15.5-alpha specified in Cargo.toml:3.
For detailed repository structure, see Repository Structure.
Sources: Cargo.toml:65-78 Cargo.toml:1-10 README.md:259-266
Performance Characteristics
SIMD R Drive is designed for high-performance workloads with the following characteristics:
Benchmark Context
| Metric | Typical Performance |
|---|---|
| Random Read (8-byte) | ~1M lookups in < 1 second |
| Sequential Write | Limited by disk I/O and flush frequency |
| Memory Overhead | Minimal (mmap-based, on-demand paging) |
| Index Lookup | O(1) via HashMap with XXH3_64 |
Optimization Strategies
- SIMD Copy Operations: The `simd_copy` function uses AVX2/NEON for bulk memory transfers during writes
- Hardware-Accelerated Hashing: XXH3_64 with SSE2/AVX2/NEON for fast key hashing
- Zero-Copy Reads: Memory-mapped access eliminates deserialization overhead
- Cache-Line Alignment: 64-byte boundaries reduce cache misses
- Batch Operations: Grouping writes reduces lock contention and flush overhead
For detailed performance optimization documentation, see Performance Optimizations.
Sources: README.md:158-168 README.md:249-257
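A rough harness for the lookup figure above (a sketch only; real numbers depend on hardware, page-cache state, and key distribution, and the `read` signature is assumed as elsewhere on this page):

```rust
use simd_r_drive::{DataStore, DataStoreReader};
use std::time::Instant;

fn bench_random_reads(store: &DataStore, keys: &[Vec<u8>]) {
    let start = Instant::now();
    for key in keys {
        // Each lookup is an O(1) index probe plus a zero-copy mmap slice.
        let _ = store.read(key);
    }
    let elapsed = start.elapsed();
    println!(
        "{} lookups in {:?} ({:.0} ops/s)",
        keys.len(),
        elapsed,
        keys.len() as f64 / elapsed.as_secs_f64()
    );
}
```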
Next Steps
This overview provides a foundation for understanding SIMD R Drive. For deeper exploration:
- Storage Internals: See Core Storage Engine for DataStore implementation details
- Data Format: See Entry Structure and Metadata for the on-disk layout
- Network Access: See Network Layer and RPC for the remote access architecture
- Python Usage: See Python Integration for PyO3 bindings and client APIs
- Performance Tuning: See Performance Optimizations for SIMD and alignment strategies
Sources: All sections above