This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Overview
Purpose and Scope
This document provides a high-level introduction to the SIMD R Drive codebase, explaining its purpose as a high-performance, append-only storage engine and outlining its major architectural components. For detailed information about specific subsystems, see the corresponding sections: Core Storage Engine, Network Layer and RPC, Python Integration, Performance Optimizations, and Extensions and Utilities.
Sources: README.md:1-40 Cargo.toml:1-21
What is SIMD R Drive?
SIMD R Drive is a high-performance, thread-safe, append-only storage engine designed for zero-copy binary access. It stores arbitrary binary data in a single-file storage container in which every payload starts on a fixed 64-byte-aligned boundary (set by the PAYLOAD_ALIGNMENT constant), which favors SIMD operations and cache-line efficiency.
The system is schema-less: it treats all stored data as raw bytes (&[u8]) and enforces no serialization format or endianness. This design gives applications maximum flexibility for high-speed storage and retrieval of structured or unstructured binary data.
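Aligning a payload comes down to rounding the current end-of-file offset up to the next multiple of the alignment and filling the gap with padding bytes. The Rust sketch below assumes a power-of-two alignment of 64 bytes; align_up and padding_needed are hypothetical helper names, not functions from the simd-r-drive source.

```rust
/// Alignment used in this sketch. The real value comes from the
/// PAYLOAD_ALIGNMENT constant; 64 is assumed here.
const ALIGN: u64 = 64;

/// Hypothetical helper: rounds `offset` up to the next multiple of ALIGN.
/// The mask trick is only valid because ALIGN is a power of two.
fn align_up(offset: u64) -> u64 {
    debug_assert!(ALIGN.is_power_of_two());
    (offset + ALIGN - 1) & !(ALIGN - 1)
}

/// Number of padding bytes a writer would emit before the next payload.
fn padding_needed(offset: u64) -> u64 {
    align_up(offset) - offset
}

fn main() {
    // A payload appended when the file ends at byte 100 starts at 128.
    println!("{} {}", align_up(100), padding_needed(100)); // 128 28
    // An offset that is already aligned needs no padding.
    println!("{} {}", align_up(128), padding_needed(128)); // 128 0
}
```

Keeping the alignment a power of two lets the rounding be a single add-and-mask instead of a division.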
Key characteristics:
| Feature | Description |
|---|---|
| Storage Model | Single-file, append-only, key-value store |
| Access Pattern | Zero-copy reads via memory-mapped files (memmap2) |
| Alignment | 64-byte payload boundaries (configurable) |
| Indexing | O(1) hash-based lookups using xxh3_64 with SIMD acceleration |
| Concurrency | Thread-safe reads/writes using RwLock, AtomicU64, DashMap |
| Language Support | Rust (native), Python (PyO3 bindings), WebSocket RPC (experimental) |
Sources: README.md:5-87 Cargo.toml:12-21
High-Level System Architecture
The following diagram shows the complete system architecture, mapping high-level concepts to concrete code entities:
Diagram: System Architecture - Mapping Concepts to Code Entities
```mermaid
graph TB
subgraph "User Interfaces"
CLI["CLI Application\n(main.rs)"]
PY_BIND["Python Direct Bindings\n(simd-r-drive-py)"]
PY_WS["Python WebSocket Client\n(simd-r-drive-ws-client-py)"]
end
subgraph "Core Storage Engine (simd-r-drive)"
DS["DataStore\n(data_store/mod.rs)"]
READER["DataStoreReader trait"]
WRITER["DataStoreWriter trait"]
INDEX["KeyIndexer\n(key_indexer.rs)"]
end
subgraph "Entry Abstraction (simd-r-drive-entry-handle)"
EH["EntryHandle\n(entry_handle.rs)"]
META["EntryMetadata\n(entry_metadata.rs)"]
end
subgraph "Network Layer (Experimental)"
WS_SERVER["simd-r-drive-ws-server\n(WebSocket Server)"]
WS_CLIENT["simd-r-drive-ws-client\n(Native Rust Client)"]
SERVICE_DEF["simd-r-drive-muxio-service-definition\n(RPC Contract)"]
end
subgraph "Storage Backend"
MMAP["Memory-Mapped File\n(Arc<Mmap>)"]
FILE["Single Binary File\n(.bin)"]
end
subgraph "Performance Layer"
SIMD["SIMD Operations\n(simd_copy)"]
XXH3["xxh3_64 Hashing\n(KeyIndexer)"]
end
CLI --> DS
PY_BIND --> DS
PY_WS --> WS_CLIENT
WS_CLIENT --> SERVICE_DEF
SERVICE_DEF --> WS_SERVER
WS_SERVER --> DS
DS --> READER
DS --> WRITER
DS --> INDEX
DS --> EH
DS --> MMAP
EH --> META
EH --> MMAP
MMAP --> FILE
DS --> SIMD
INDEX --> XXH3
style DS fill:#f9f9f9,stroke:#333,stroke-width:2px
style EH fill:#f9f9f9,stroke:#333,stroke-width:2px
```
This diagram illustrates how user-facing interfaces connect to the core storage engine and supporting subsystems, using actual code entity names.
Sources: Cargo.toml:66-77 README.md:11-40
Repository Structure
The codebase is organized as a Cargo workspace with the following packages:
| Package | Path | Purpose |
|---|---|---|
| simd-r-drive | ./ | Core storage engine library and CLI |
| simd-r-drive-entry-handle | ./simd-r-drive-entry-handle | Entry abstraction layer for zero-copy access |
| simd-r-drive-extensions | ./extensions | Utility functions and helper APIs |
| simd-r-drive-ws-server | ./experiments/simd-r-drive-ws-server | WebSocket RPC server (experimental) |
| simd-r-drive-ws-client | ./experiments/simd-r-drive-ws-client | WebSocket RPC client (experimental) |
| simd-r-drive-muxio-service-definition | ./experiments/simd-r-drive-muxio-service-definition | Shared RPC service contract |
| Python bindings | ./experiments/bindings/python | PyO3-based Python direct bindings |
| Python WS client | ./experiments/bindings/python-ws-client | Python WebSocket client bindings |
For detailed information about the repository layout and package relationships, see Repository Structure.
Sources: Cargo.toml:66-77 README.md:259-265
Core Storage Components
The following diagram maps storage concepts to their implementing code entities:
Diagram: Core Storage Components - Code Entity Mapping
```mermaid
graph TB
subgraph "Public API"
DST["DataStore struct\n(data_store/mod.rs)"]
READER_TRAIT["DataStoreReader trait\n(traits.rs)"]
WRITER_TRAIT["DataStoreWriter trait\n(traits.rs)"]
end
subgraph "Indexing Layer"
KI["KeyIndexer\n(key_indexer.rs)"]
DASHMAP["DashMap<u64, u64>\n(concurrent hash map)"]
XXH3_HASH["xxh3_64\n(key hashing)"]
end
subgraph "Entry Management"
EH_STRUCT["EntryHandle\n(entry_handle.rs)"]
EM_STRUCT["EntryMetadata\n(entry_metadata.rs)"]
PAYLOAD_ALIGN["PAYLOAD_ALIGNMENT\n(constants.rs)"]
end
subgraph "Storage Backend"
RWLOCK_FILE["RwLock<File>"]
MUTEX_MMAP["Mutex<Arc<Mmap>>"]
ATOMIC_TAIL["AtomicU64 tail_offset"]
end
subgraph "SIMD Acceleration"
SIMD_COPY["simd_copy\n(arch-specific impls)"]
AVX2["AVX2 impl (x86_64)"]
NEON["NEON impl (aarch64)"]
end
DST --> READER_TRAIT
DST --> WRITER_TRAIT
DST --> KI
DST --> RWLOCK_FILE
DST --> MUTEX_MMAP
DST --> ATOMIC_TAIL
KI --> DASHMAP
KI --> XXH3_HASH
READER_TRAIT --> EH_STRUCT
EH_STRUCT --> EM_STRUCT
EH_STRUCT --> PAYLOAD_ALIGN
WRITER_TRAIT --> SIMD_COPY
SIMD_COPY --> AVX2
SIMD_COPY --> NEON
style DST fill:#f9f9f9,stroke:#333,stroke-width:2px
style KI fill:#f9f9f9,stroke:#333,stroke-width:2px
```
This diagram shows the relationship between storage concepts and their concrete implementations in the codebase.
Sources: README.md:172-183 Cargo.toml:23-34
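The indexing layer can be pictured as a map from a 64-bit key hash to the file offset of the newest entry for that key. The sketch below is a std-only approximation under stated assumptions: std's DefaultHasher stands in for xxh3_64, a RwLock<HashMap<u64, u64>> stands in for DashMap<u64, u64>, and KeyIndexSketch is a hypothetical name. It also ignores hash collisions, which a real index has to account for.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Stand-in for xxh3_64: std's DefaultHasher, used only so the sketch
/// needs no external crates.
fn key_hash(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

/// Hypothetical index: key hash -> offset of the latest entry for that key.
/// RwLock<HashMap> replaces DashMap<u64, u64> to stay std-only.
struct KeyIndexSketch {
    map: RwLock<HashMap<u64, u64>>,
}

impl KeyIndexSketch {
    fn new() -> Self {
        KeyIndexSketch { map: RwLock::new(HashMap::new()) }
    }

    /// Records the offset for `key`; a later write overwrites the earlier one.
    fn insert(&self, key: &[u8], offset: u64) {
        self.map.write().unwrap().insert(key_hash(key), offset);
    }

    /// Average-case O(1) lookup by hash (collisions are not handled here).
    fn get(&self, key: &[u8]) -> Option<u64> {
        self.map.read().unwrap().get(&key_hash(key)).copied()
    }
}

fn main() {
    let idx = KeyIndexSketch::new();
    idx.insert(b"alpha", 0);
    idx.insert(b"alpha", 128); // newer version of the same key
    idx.insert(b"beta", 64);
    println!("{:?} {:?} {:?}", idx.get(b"alpha"), idx.get(b"beta"), idx.get(b"gamma"));
}
```

Because later inserts overwrite earlier ones, a lookup in this sketch always resolves to the most recently appended version of a key, which is the behavior an append-only store needs from its index.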
Key Features Summary
Storage and Access Patterns
| Feature | Implementation Details |
|---|---|
| Zero-Copy Reads | Memory-mapped file access via memmap2 crate, EntryHandle provides &[u8] views |
| Append-Only Writes | Sequential writes to RwLock<File>, metadata follows payload immediately |
| 64-Byte Alignment | Configurable via PAYLOAD_ALIGNMENT constant in simd-r-drive-entry-handle/src/constants.rs |
| Backward-Linked Chain | Each entry contains a prev_offset field, enabling recovery and validation |
| Tombstone Deletions | Single 0x00 byte + metadata marks deleted entries |
Sources: README.md:43-148
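To show how the append-only layout, the backward-linked chain, and tombstones fit together, here is an in-memory sketch. The entry layout it uses ([padding][payload][key_hash][prev_offset], with 16 bytes of metadata) is an assumption for illustration; the real EntryMetadata format may differ, for example by also carrying a checksum. LogSketch and its methods are hypothetical.

```rust
use std::collections::HashMap;

const ALIGN: usize = 64;
// Hypothetical metadata: key_hash (8 bytes) + prev_offset (8 bytes).
const META_LEN: usize = 16;

fn align_up(n: usize) -> usize {
    (n + ALIGN - 1) & !(ALIGN - 1)
}

fn read_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}

/// Hypothetical append-only log kept in a Vec<u8> instead of a file.
struct LogSketch {
    buf: Vec<u8>,
}

impl LogSketch {
    fn new() -> Self {
        LogSketch { buf: Vec::new() }
    }

    /// Appends [zero padding][payload][key_hash][prev_offset]. prev_offset is
    /// the tail before this entry, linking it to its predecessor.
    fn append(&mut self, key_hash: u64, payload: &[u8]) {
        let prev_tail = self.buf.len();
        self.buf.resize(align_up(prev_tail), 0); // pad so the payload is 64-byte aligned
        self.buf.extend_from_slice(payload);
        self.buf.extend_from_slice(&key_hash.to_le_bytes());
        self.buf.extend_from_slice(&(prev_tail as u64).to_le_bytes());
    }

    /// Deletion appends a tombstone: a payload of exactly one 0x00 byte.
    /// (In this sketch a genuine one-byte 0x00 value would be indistinguishable.)
    fn delete(&mut self, key_hash: u64) {
        self.append(key_hash, &[0x00]);
    }

    /// Follows the backward chain from the tail to offset 0. The first entry
    /// seen for a key is its newest version; a tombstone hides the key.
    fn live_entries(&self) -> HashMap<u64, Vec<u8>> {
        let mut newest: HashMap<u64, Option<Vec<u8>>> = HashMap::new();
        let mut tail = self.buf.len();
        while tail > 0 {
            let meta_start = tail - META_LEN;
            let key_hash = read_u64(&self.buf[meta_start..meta_start + 8]);
            let prev = read_u64(&self.buf[meta_start + 8..tail]) as usize;
            let payload = &self.buf[align_up(prev)..meta_start];
            let is_tombstone = payload.len() == 1 && payload[0] == 0x00;
            newest.entry(key_hash).or_insert_with(|| {
                if is_tombstone { None } else { Some(payload.to_vec()) }
            });
            tail = prev;
        }
        newest.into_iter().filter_map(|(k, v)| v.map(|p| (k, p))).collect()
    }
}

fn main() {
    let mut log = LogSketch::new();
    log.append(1, b"first");
    log.append(2, b"second");
    log.append(1, b"first-v2"); // newer version of key 1
    log.delete(2); // tombstone for key 2
    let live = log.live_entries();
    println!("{:?}", live.get(&1).map(|p| String::from_utf8_lossy(p).into_owned()));
    println!("{}", live.contains_key(&2));
}
```

In this sketch, walking backward from the tail means the newest entry for each key is seen first, so a single pass is enough to rebuild the set of live keys.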
Concurrency Model
| Component | Synchronization Primitive | Purpose |
|---|---|---|
| File Writes | RwLock<File> | Serializes write operations |
| Tail Offset | AtomicU64 | Lock-free offset tracking |
| Key Index | DashMap<u64, u64> | Concurrent hash map for lock-free reads |
| Memory Map | Mutex<Arc<Mmap>> | Safe shared access to mmap |
For detailed concurrency semantics, see Concurrency and Thread Safety.
Sources: README.md:170-200
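The write-side pattern in the table can be sketched with std primitives: an exclusive lock serializes appends, and an atomic tail offset is published after each append so other threads can read the current end of data without taking the write lock. In the hypothetical StoreSketch below, a RwLock<Vec<u8>> stands in for RwLock<File>, and the memory map and key index are omitted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical store: RwLock<Vec<u8>> stands in for RwLock<File>,
/// and `tail` mirrors the AtomicU64 tail offset.
struct StoreSketch {
    file: RwLock<Vec<u8>>,
    tail: AtomicU64,
}

impl StoreSketch {
    /// Writers take the exclusive lock, so appends never interleave.
    /// The new tail is published with Release ordering after the bytes land.
    fn append(&self, payload: &[u8]) -> u64 {
        let mut file = self.file.write().unwrap();
        let offset = file.len() as u64;
        file.extend_from_slice(payload);
        self.tail.store(file.len() as u64, Ordering::Release);
        offset
    }

    /// Readers load the tail with Acquire ordering and no lock.
    fn tail(&self) -> u64 {
        self.tail.load(Ordering::Acquire)
    }
}

fn main() {
    let store = Arc::new(StoreSketch {
        file: RwLock::new(Vec::new()),
        tail: AtomicU64::new(0),
    });
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let s = Arc::clone(&store);
            thread::spawn(move || {
                for _ in 0..100 {
                    s.append(&[0xAB; 8]);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // 4 threads x 100 appends x 8 bytes: no append was lost.
    println!("{}", store.tail()); // 3200
}
```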
Write and Read Modes
Write Modes:
- Single Entry: write() - atomic single key-value write
- Batch Entry: batch_write() - multiple writes with a single flush
- Streaming: write_stream() - large entries via the Read trait

Read Modes:
- Direct Memory Access: read() - zero-copy via EntryHandle
- Streaming: read_stream() - incremental reads for large entries
- Parallel Iteration: par_iter_entries() - Rayon-powered parallel scanning (requires the parallel feature)
For detailed read/write APIs, see DataStore API.
Sources: README.md:208-247
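The advantage of batch_write() over repeated write() calls is that the flush cost is paid once per batch. The sketch below stages several entries into one buffer and issues a single write_all and a single flush; CountingWriter and batch_write_sketch are hypothetical, and the length-prefixed framing is illustrative only, not the crate's on-disk format.

```rust
use std::io::{self, Write};

/// Hypothetical writer standing in for the data file; it counts calls.
struct CountingWriter {
    bytes: Vec<u8>,
    writes: usize,
    flushes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.flushes += 1;
        Ok(())
    }
}

/// Hypothetical batch write: stage every (key, payload) pair in memory,
/// then hand the whole buffer to the writer once and flush once.
fn batch_write_sketch<W: Write>(out: &mut W, entries: &[(&[u8], &[u8])]) -> io::Result<()> {
    let mut staged = Vec::new();
    for &(key, payload) in entries {
        staged.extend_from_slice(&(key.len() as u32).to_le_bytes());
        staged.extend_from_slice(key);
        staged.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        staged.extend_from_slice(payload);
    }
    out.write_all(&staged)?;
    out.flush()
}

fn main() {
    let mut file = CountingWriter { bytes: Vec::new(), writes: 0, flushes: 0 };
    let entries: [(&[u8], &[u8]); 3] = [(b"a", b"1"), (b"b", b"22"), (b"c", b"333")];
    batch_write_sketch(&mut file, &entries).unwrap();
    println!("writes={} flushes={}", file.writes, file.flushes); // writes=1 flushes=1
}
```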
SIMD and Performance Optimizations
SIMD R Drive employs multiple optimization strategies:
| Optimization | Implementation | Benefit |
|---|---|---|
| SIMD Memory Copy | simd_copy with AVX2/NEON | Faster buffer staging for writes |
| SIMD Hash Function | xxh3_64 with SSE2/AVX2/NEON | Accelerated key hashing |
| Cache-Line Alignment | 64-byte PAYLOAD_ALIGNMENT | Prevents cache-line splits |
| Lock-Free Reads | DashMap + Arc<Mmap> | Concurrent zero-copy reads |
| Sequential Writes | Append-only design | Minimized disk seeks |
For detailed performance information, see Performance Optimizations and SIMD Acceleration.
Sources: README.md:249-257 Cargo.toml:34
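Runtime SIMD dispatch of the kind described for simd_copy typically checks CPU features and then picks a vectorized path or a scalar fallback. The sketch below shows that pattern with AVX2 on x86_64 and copy_from_slice everywhere else; simd_copy_sketch and copy_avx2 are hypothetical names, the NEON path is omitted, and this is not the crate's actual implementation.

```rust
/// Hypothetical copy routine in the spirit of simd_copy.
pub fn simd_copy_sketch(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len(), "buffers must have equal length");

    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            unsafe { copy_avx2(dst, src) };
            return;
        }
    }

    // Portable fallback: other architectures, or x86_64 without AVX2.
    dst.copy_from_slice(src);
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn copy_avx2(dst: &mut [u8], src: &[u8]) {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};

    let len = src.len();
    let mut i = 0;
    // Copy 32 bytes per iteration with unaligned 256-bit loads and stores.
    while i + 32 <= len {
        unsafe {
            let v = _mm256_loadu_si256(src.as_ptr().add(i) as *const __m256i);
            _mm256_storeu_si256(dst.as_mut_ptr().add(i) as *mut __m256i, v);
        }
        i += 32;
    }
    // Scalar copy for the remaining tail (fewer than 32 bytes).
    dst[i..].copy_from_slice(&src[i..]);
}

fn main() {
    let src: Vec<u8> = (0..100u8).collect();
    let mut dst = vec![0u8; src.len()];
    simd_copy_sketch(&mut dst, &src);
    println!("{}", dst == src); // true
}
```

The feature check is done per call here for simplicity; a production routine might cache the result or select a function pointer once.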
Multi-Language Support
Native Rust
The core library is implemented in Rust and can be used directly as a Cargo dependency.
Sources: Cargo.toml:11-21
Python Bindings
Two experimental Python integration paths are available:
- Direct Bindings (simd-r-drive-py): PyO3-based bindings for direct access to DataStore
- WebSocket Client (simd-r-drive-ws-client-py): remote access via WebSocket RPC
For Python integration details, see Python Integration.
Sources: README.md:262-265 Cargo.toml:74-76
WebSocket RPC (Experimental)
The experimental network layer enables remote access:
- Server: simd-r-drive-ws-server - exposes DataStore over WebSocket
- Native Client: simd-r-drive-ws-client - Rust client for the WebSocket connection
- Service Definition: simd-r-drive-muxio-service-definition - shared RPC contract using bitcode serialization
For network layer details, see Network Layer and RPC.
Sources: Cargo.toml:70-72 Cargo.toml:85-89
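The point of a shared service-definition crate is that client and server compile against the same request types and encoding, so the two sides cannot drift apart. The sketch below illustrates that idea with a hypothetical Request enum and a hand-written length-prefixed encoding in place of bitcode; it does not reproduce the real muxio RPC method definitions.

```rust
/// Hypothetical request type that both client and server would import.
#[derive(Debug, PartialEq)]
enum Request {
    Write { key: Vec<u8>, payload: Vec<u8> },
    Read { key: Vec<u8> },
}

/// Appends a u32 little-endian length followed by the bytes.
fn put_bytes(out: &mut Vec<u8>, bytes: &[u8]) {
    out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    out.extend_from_slice(bytes);
}

/// Reads one length-prefixed field and advances `input` past it.
fn take_bytes(input: &mut &[u8]) -> Option<Vec<u8>> {
    let s: &[u8] = *input;
    if s.len() < 4 {
        return None;
    }
    let mut len_buf = [0u8; 4];
    len_buf.copy_from_slice(&s[..4]);
    let len = u32::from_le_bytes(len_buf) as usize;
    if s.len() - 4 < len {
        return None;
    }
    let (field, rest) = s[4..].split_at(len);
    *input = rest;
    Some(field.to_vec())
}

impl Request {
    /// Wire format: one tag byte, then length-prefixed fields.
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        match self {
            Request::Write { key, payload } => {
                out.push(0);
                put_bytes(&mut out, key);
                put_bytes(&mut out, payload);
            }
            Request::Read { key } => {
                out.push(1);
                put_bytes(&mut out, key);
            }
        }
        out
    }

    /// Inverse of `encode`; returns None on an unknown tag or truncated input.
    fn decode(mut input: &[u8]) -> Option<Request> {
        let (&tag, rest) = input.split_first()?;
        input = rest;
        match tag {
            0 => Some(Request::Write {
                key: take_bytes(&mut input)?,
                payload: take_bytes(&mut input)?,
            }),
            1 => Some(Request::Read { key: take_bytes(&mut input)? }),
            _ => None,
        }
    }
}

fn main() {
    let req = Request::Write { key: b"user:1".to_vec(), payload: b"hello".to_vec() };
    let wire = req.encode();
    // The server side would run the same decode on the received bytes.
    println!("{}", Request::decode(&wire) == Some(req)); // true
}
```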
Feature Flags
The core simd-r-drive package supports the following Cargo features:
| Feature | Description |
|---|---|
| parallel | Enables Rayon-powered parallel iteration via par_iter_entries() |
| arrow | Enables Apache Arrow integration in simd-r-drive-entry-handle for zero-copy typed views |
| expose-internal-api | Exposes internal APIs for advanced use cases (unstable) |
Sources: Cargo.toml:49-55
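Cargo features are usually consumed through cfg attributes, so optional dependencies only compile when requested. The sketch below, built around a hypothetical total_payload_bytes function, uses Rayon's par_iter when the parallel feature is enabled and an equivalent sequential iterator otherwise; the parallel branch assumes rayon is declared as an optional dependency activated by that feature.

```rust
/// Compiled only with `--features parallel` (assumes an optional rayon dependency).
#[cfg(feature = "parallel")]
fn total_payload_bytes(entries: &[Vec<u8>]) -> usize {
    use rayon::prelude::*;
    entries.par_iter().map(|e| e.len()).sum()
}

/// Default build: same result, computed sequentially.
#[cfg(not(feature = "parallel"))]
fn total_payload_bytes(entries: &[Vec<u8>]) -> usize {
    entries.iter().map(|e| e.len()).sum()
}

fn main() {
    let entries = vec![vec![0u8; 10], vec![0u8; 20], vec![0u8; 12]];
    println!("{}", total_payload_bytes(&entries)); // 42 in either build
}
```

Keeping both variants behind the same signature means callers do not change when the feature is toggled.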
Dependencies Overview
Core Dependencies
| Crate | Version | Purpose |
|---|---|---|
| memmap2 | 0.9.5 | Memory-mapped file access |
| xxhash-rust | 0.8.15 | SIMD-accelerated hashing (xxh3_64) |
| dashmap | 6.1.0 | Concurrent hash map for lock-free indexing |
| crc32fast | 1.4.2 | Payload integrity verification |
| rayon | 1.10.0 | Parallel iteration (optional, requires parallel feature) |
Network Layer Dependencies (Experimental)
| Crate | Version | Purpose |
|---|---|---|
| muxio-tokio-rpc-server | 0.9.0-alpha | WebSocket RPC server framework |
| muxio-tokio-rpc-client | 0.9.0-alpha | WebSocket RPC client framework |
| bitcode | 0.6.6 | Compact binary serialization for RPC |
| tokio | 1.45.1 | Async runtime for network operations |
Sources: Cargo.toml:23-34 Cargo.toml:80-112
Development and Testing
The repository includes:
- Unit Tests: inline tests in each module
- Integration Tests: tests/ directory with full system tests
- Benchmarks: Criterion-based benchmarks in benches/ (see Benchmarking)
- CI/CD: GitHub Actions workflows for cross-platform testing (see CI/CD Pipeline)
Sources: Cargo.toml:36-63
Next Steps
This overview introduces the high-level architecture and key components of SIMD R Drive. For deeper exploration:
- Core Storage Mechanics: See Core Storage Engine for detailed information about DataStore, storage format, and memory management
- API Usage: See DataStore API for method documentation and usage patterns
- Performance Tuning: See Performance Optimizations for SIMD usage, alignment, and benchmarking
- Python Integration: See Python Integration for binding usage and WebSocket client examples
- Building and Testing: See Development Guide for build instructions and contribution guidelines
Sources: README.md:1-285