This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Overview


Purpose and Scope

This document provides a high-level introduction to the SIMD R Drive codebase, explaining its purpose as a high-performance, append-only storage engine and outlining its major architectural components. For detailed information about specific subsystems, see the corresponding sections: Core Storage Engine, Network Layer and RPC, Python Integration, Performance Optimizations, and Extensions and Utilities.

Sources: README.md:1-40 Cargo.toml:1-21


What is SIMD R Drive?

SIMD R Drive is a high-performance, thread-safe, append-only storage engine designed for zero-copy binary access. It stores arbitrary binary data in a single-file storage container where all payloads are written at fixed 64-byte-aligned boundaries, optimizing for SIMD operations and cache-line efficiency.

The system is schema-less: it treats all stored data as raw bytes (&[u8]) and does not enforce serialization formats or endianness. Applications are free to choose their own encoding, which suits high-speed storage and retrieval of structured or unstructured binary data.
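
Because the engine only sees raw bytes, the choice of layout and byte order belongs to the caller. The following is a minimal, standard-library-only sketch of that idea; the `Reading` type and the `encode_reading` / `decode_reading` helpers are hypothetical and not part of SIMD R Drive.

```rust
use std::convert::TryInto;

/// Hypothetical application-level record; a byte-oriented store never sees this type.
#[derive(Debug, PartialEq)]
struct Reading {
    sensor_id: u32,
    value: f64,
}

/// Encode a reading as 12 little-endian bytes (the byte order is the caller's choice).
fn encode_reading(r: &Reading) -> Vec<u8> {
    let mut buf = Vec::with_capacity(12);
    buf.extend_from_slice(&r.sensor_id.to_le_bytes());
    buf.extend_from_slice(&r.value.to_le_bytes());
    buf
}

/// Decode the same layout from a raw byte slice, as it would come back from a read.
fn decode_reading(bytes: &[u8]) -> Option<Reading> {
    if bytes.len() != 12 {
        return None;
    }
    let sensor_id = u32::from_le_bytes(bytes[0..4].try_into().ok()?);
    let value = f64::from_le_bytes(bytes[4..12].try_into().ok()?);
    Some(Reading { sensor_id, value })
}

fn main() {
    let original = Reading { sensor_id: 7, value: 21.5 };
    let payload = encode_reading(&original); // these bytes are what a store would persist
    assert_eq!(decode_reading(&payload), Some(original));
}
```

Any other encoding (big-endian, a serialization crate, a packed struct) would work equally well, as long as writer and reader agree on it.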

Key characteristics:

| Feature | Description |
| --- | --- |
| Storage Model | Single-file, append-only, key-value store |
| Access Pattern | Zero-copy reads via memory-mapped files (memmap2) |
| Alignment | 64-byte payload boundaries (configurable) |
| Indexing | O(1) hash-based lookups using xxh3_64 with SIMD acceleration |
| Concurrency | Thread-safe reads/writes using RwLock, AtomicU64, DashMap |
| Language Support | Rust (native), Python (PyO3 bindings), WebSocket RPC (experimental) |

Sources: README.md:5-87 Cargo.toml:12-21
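
To make the alignment rule concrete, the sketch below computes how much padding precedes a payload so that it starts on a 64-byte boundary. It is an illustration only: the constant name mirrors `PAYLOAD_ALIGNMENT`, but the helper functions `pre_pad` and `align_up` are assumptions for this example, not the crate's API.

```rust
/// Assumed alignment for this sketch; the real constant lives in the entry-handle crate.
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Number of padding bytes needed so that a payload written after `offset` is aligned.
fn pre_pad(offset: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) % PAYLOAD_ALIGNMENT
}

/// First aligned offset at or after `offset`.
fn align_up(offset: u64) -> u64 {
    offset + pre_pad(offset)
}

fn main() {
    assert_eq!(align_up(0), 0); // already aligned
    assert_eq!(align_up(1), 64);
    assert_eq!(align_up(64), 64);
    assert_eq!(align_up(100), 128);
    assert_eq!(pre_pad(100), 28); // 100 + 28 = 128
}
```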


High-Level System Architecture

The following diagram shows the complete system architecture, mapping high-level concepts to concrete code entities:

Diagram: System Architecture - Mapping Concepts to Code Entities

graph TB
    subgraph "User Interfaces"
        CLI["CLI Application\n(main.rs)"]
        PY_BIND["Python Direct Bindings\n(simd-r-drive-py)"]
        PY_WS["Python WebSocket Client\n(simd-r-drive-ws-client-py)"]
    end

    subgraph "Core Storage Engine (simd-r-drive)"
        DS["DataStore\n(data_store/mod.rs)"]
        READER["DataStoreReader trait"]
        WRITER["DataStoreWriter trait"]
        INDEX["KeyIndexer\n(key_indexer.rs)"]
    end

    subgraph "Entry Abstraction (simd-r-drive-entry-handle)"
        EH["EntryHandle\n(entry_handle.rs)"]
        META["EntryMetadata\n(entry_metadata.rs)"]
    end

    subgraph "Network Layer (Experimental)"
        WS_SERVER["simd-r-drive-ws-server\n(WebSocket Server)"]
        WS_CLIENT["simd-r-drive-ws-client\n(Native Rust Client)"]
        SERVICE_DEF["simd-r-drive-muxio-service-definition\n(RPC Contract)"]
    end

    subgraph "Storage Backend"
        MMAP["Memory-Mapped File\n(Arc<Mmap>)"]
        FILE["Single Binary File\n(.bin)"]
    end

    subgraph "Performance Layer"
        SIMD["SIMD Operations\n(simd_copy)"]
        XXH3["xxh3_64 Hashing\n(KeyIndexer)"]
    end

    CLI --> DS
    PY_BIND --> DS
    PY_WS --> WS_CLIENT
    WS_CLIENT --> SERVICE_DEF
    SERVICE_DEF --> WS_SERVER
    WS_SERVER --> DS

    DS --> READER
    DS --> WRITER
    DS --> INDEX
    DS --> EH
    DS --> MMAP

    EH --> META
    EH --> MMAP
    MMAP --> FILE

    DS --> SIMD
    INDEX --> XXH3

    style DS fill:#f9f9f9,stroke:#333,stroke-width:2px
    style EH fill:#f9f9f9,stroke:#333,stroke-width:2px

This diagram illustrates how user-facing interfaces connect to the core storage engine and supporting subsystems, using actual code entity names.

Sources: Cargo.toml:66-77 README.md:11-40


Repository Structure

The codebase is organized as a Cargo workspace with the following packages:

| Package | Path | Purpose |
| --- | --- | --- |
| simd-r-drive | ./ | Core storage engine library and CLI |
| simd-r-drive-entry-handle | ./simd-r-drive-entry-handle | Entry abstraction layer for zero-copy access |
| simd-r-drive-extensions | ./extensions | Utility functions and helper APIs |
| simd-r-drive-ws-server | ./experiments/simd-r-drive-ws-server | WebSocket RPC server (experimental) |
| simd-r-drive-ws-client | ./experiments/simd-r-drive-ws-client | WebSocket RPC client (experimental) |
| simd-r-drive-muxio-service-definition | ./experiments/simd-r-drive-muxio-service-definition | Shared RPC service contract |
| Python bindings | ./experiments/bindings/python | PyO3-based Python direct bindings |
| Python WS client | ./experiments/bindings/python-ws-client | Python WebSocket client bindings |

For detailed information about the repository layout and package relationships, see Repository Structure.

Sources: Cargo.toml:66-77 README.md:259-265


Core Storage Components

The following diagram maps storage concepts to their implementing code entities:

Diagram: Core Storage Components - Code Entity Mapping

graph TB
    subgraph "Public API"
        DST["DataStore struct\n(data_store/mod.rs)"]
        READER_TRAIT["DataStoreReader trait\n(traits.rs)"]
        WRITER_TRAIT["DataStoreWriter trait\n(traits.rs)"]
    end

    subgraph "Indexing Layer"
        KI["KeyIndexer\n(key_indexer.rs)"]
        DASHMAP["DashMap<u64, u64>\n(concurrent hash map)"]
        XXH3_HASH["xxh3_64\n(key hashing)"]
    end

    subgraph "Entry Management"
        EH_STRUCT["EntryHandle\n(entry_handle.rs)"]
        EM_STRUCT["EntryMetadata\n(entry_metadata.rs)"]
        PAYLOAD_ALIGN["PAYLOAD_ALIGNMENT\n(constants.rs)"]
    end

    subgraph "Storage Backend"
        RWLOCK_FILE["RwLock<File>"]
        MUTEX_MMAP["Mutex<Arc<Mmap>>"]
        ATOMIC_TAIL["AtomicU64 tail_offset"]
    end

    subgraph "SIMD Acceleration"
        SIMD_COPY["simd_copy\n(arch-specific impls)"]
        AVX2["AVX2 impl (x86_64)"]
        NEON["NEON impl (aarch64)"]
    end

    DST --> READER_TRAIT
    DST --> WRITER_TRAIT
    DST --> KI
    DST --> RWLOCK_FILE
    DST --> MUTEX_MMAP
    DST --> ATOMIC_TAIL

    KI --> DASHMAP
    KI --> XXH3_HASH

    READER_TRAIT --> EH_STRUCT
    EH_STRUCT --> EM_STRUCT
    EH_STRUCT --> PAYLOAD_ALIGN

    WRITER_TRAIT --> SIMD_COPY
    SIMD_COPY --> AVX2
    SIMD_COPY --> NEON

    style DST fill:#f9f9f9,stroke:#333,stroke-width:2px
    style KI fill:#f9f9f9,stroke:#333,stroke-width:2px

This diagram shows the relationship between storage concepts and their concrete implementations in the codebase.

Sources: README.md:172-183 Cargo.toml:23-34
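
The indexing layer in the diagram pairs a 64-bit key hash with a file offset. The sketch below shows that idea with standard-library stand-ins only: `DefaultHasher` replaces xxh3_64 and a plain `HashMap` replaces `DashMap`. `ToyIndex` and `hash_key` are hypothetical names, and the real `KeyIndexer` may handle details such as hash collisions differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for xxh3_64: any 64-bit hash is enough for the purpose of this sketch.
fn hash_key(key: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Toy index: key hash -> file offset of the newest entry for that key.
#[derive(Default)]
struct ToyIndex {
    offsets: HashMap<u64, u64>,
}

impl ToyIndex {
    /// Record a new entry; a later write for the same key supersedes the earlier one.
    fn insert(&mut self, key: &[u8], offset: u64) {
        self.offsets.insert(hash_key(key), offset);
    }

    /// Average O(1) lookup of the latest offset for a key.
    fn lookup(&self, key: &[u8]) -> Option<u64> {
        self.offsets.get(&hash_key(key)).copied()
    }
}

fn main() {
    let mut index = ToyIndex::default();
    index.insert(b"user:1", 0);
    index.insert(b"user:2", 128);
    index.insert(b"user:1", 256); // an append-only update supersedes offset 0
    assert_eq!(index.lookup(b"user:1"), Some(256));
    assert_eq!(index.lookup(b"user:2"), Some(128));
    assert_eq!(index.lookup(b"missing"), None);
}
```

In an append-only file the old bytes at offset 0 stay where they are; only the index entry moves to the newer offset.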


Key Features Summary

Storage and Access Patterns

| Feature | Implementation Details |
| --- | --- |
| Zero-Copy Reads | Memory-mapped file access via memmap2 crate, EntryHandle provides &[u8] views |
| Append-Only Writes | Sequential writes to RwLock<File>, metadata follows payload immediately |
| 64-Byte Alignment | Configurable via PAYLOAD_ALIGNMENT constant in simd-r-drive-entry-handle/src/constants.rs |
| Backward-Linked Chain | Each entry contains prev_offset field, enabling recovery and validation |
| Tombstone Deletions | Single 0x00 byte + metadata marks deleted entries |

Sources: README.md:43-148
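
The append-only layout, the backward-linked chain, and tombstones can be sketched together with an in-memory log. Everything here is a simplified assumption made for illustration: the metadata is reduced to a key hash and a `prev_offset` (no checksum), tombstones are padded like any other payload, and `ToyLog` is a hypothetical type, not the on-disk format of SIMD R Drive.

```rust
use std::convert::TryInto;

const ALIGN: usize = 64; // assumed payload alignment
const META_LEN: usize = 16; // assumed metadata: key_hash (8 bytes) + prev_offset (8 bytes)

/// Toy append-only log held in memory; a stand-in for the single storage file.
#[derive(Default)]
struct ToyLog {
    bytes: Vec<u8>,
}

impl ToyLog {
    /// Append padding, payload, then metadata; returns the new tail offset.
    fn append(&mut self, key_hash: u64, payload: &[u8]) -> usize {
        let prev_tail = self.bytes.len();
        let payload_start = (prev_tail + ALIGN - 1) / ALIGN * ALIGN;
        self.bytes.resize(payload_start, 0); // zero padding up to the boundary
        self.bytes.extend_from_slice(payload);
        self.bytes.extend_from_slice(&key_hash.to_le_bytes());
        self.bytes.extend_from_slice(&(prev_tail as u64).to_le_bytes());
        self.bytes.len()
    }

    /// Mark a key as deleted by appending a single 0x00 byte as its payload.
    fn delete(&mut self, key_hash: u64) -> usize {
        self.append(key_hash, &[0x00])
    }

    /// Walk the chain from the tail back to offset 0, newest entry first.
    fn walk_back(&self) -> Vec<(u64, &[u8])> {
        let mut out = Vec::new();
        let mut tail = self.bytes.len();
        while tail >= META_LEN {
            let meta = &self.bytes[tail - META_LEN..tail];
            let key_hash = u64::from_le_bytes(meta[0..8].try_into().unwrap());
            let prev_tail = u64::from_le_bytes(meta[8..16].try_into().unwrap()) as usize;
            let payload_start = (prev_tail + ALIGN - 1) / ALIGN * ALIGN;
            out.push((key_hash, &self.bytes[payload_start..tail - META_LEN]));
            tail = prev_tail; // follow the backward link
        }
        out
    }
}

fn main() {
    let mut log = ToyLog::default();
    log.append(1, b"alpha");
    log.append(2, b"beta");
    log.delete(1);
    let entries = log.walk_back();
    assert_eq!(entries.len(), 3);
    assert_eq!(entries[0], (1, &[0x00u8][..])); // newest: tombstone for key hash 1
    assert_eq!(entries[1], (2, &b"beta"[..]));
    assert_eq!(entries[2], (1, &b"alpha"[..]));
}
```

Because every entry records where the previous one ended, a reader can start at the tail and rebuild an index without any separate directory structure. A production implementation would also validate offsets and checksums during the walk, which this sketch omits.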

Concurrency Model

| Component | Synchronization Primitive | Purpose |
| --- | --- | --- |
| File Writes | RwLock<File> | Serializes write operations |
| Tail Offset | AtomicU64 | Lock-free offset tracking |
| Key Index | DashMap<u64, u64> | Concurrent hash map for lock-free reads |
| Memory Map | Mutex<Arc<Mmap>> | Safe shared access to mmap |

For detailed concurrency semantics, see Concurrency and Thread Safety.

Sources: README.md:170-200
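
The way these primitives cooperate can be sketched with standard-library stand-ins. In the example below, `RwLock<Vec<u8>>` plays the role of `RwLock<File>`, `Mutex<Arc<Vec<u8>>>` plays the role of `Mutex<Arc<Mmap>>`, and `RwLock<HashMap>` substitutes for `DashMap`. `ToyStore` is hypothetical, its index stores a byte range rather than a single offset, and its read path copies bytes for simplicity, so it illustrates the locking pattern rather than the actual implementation.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

struct ToyStore {
    file: RwLock<Vec<u8>>,                    // stand-in for RwLock<File>
    snapshot: Mutex<Arc<Vec<u8>>>,            // stand-in for Mutex<Arc<Mmap>>
    tail_offset: AtomicU64,                   // lock-free tail tracking
    index: RwLock<HashMap<u64, (u64, u64)>>,  // key hash -> (start, end)
}

impl ToyStore {
    fn new() -> Self {
        ToyStore {
            file: RwLock::new(Vec::new()),
            snapshot: Mutex::new(Arc::new(Vec::new())),
            tail_offset: AtomicU64::new(0),
            index: RwLock::new(HashMap::new()),
        }
    }

    /// Writers are serialized by the write lock on the file stand-in.
    fn write(&self, key_hash: u64, payload: &[u8]) {
        let mut file = self.file.write().unwrap();
        let start = file.len() as u64;
        file.extend_from_slice(payload);
        let end = file.len() as u64;
        // Publish a fresh read-only snapshot before the index points at the new bytes.
        *self.snapshot.lock().unwrap() = Arc::new(file.clone());
        self.tail_offset.store(end, Ordering::Release);
        self.index.write().unwrap().insert(key_hash, (start, end));
    }

    /// Readers clone the Arc and release the mutex before touching the bytes.
    fn read(&self, key_hash: u64) -> Option<Vec<u8>> {
        let (start, end) = *self.index.read().unwrap().get(&key_hash)?;
        let snap: Arc<Vec<u8>> = self.snapshot.lock().unwrap().clone();
        Some(snap[start as usize..end as usize].to_vec())
    }
}

fn main() {
    let store = Arc::new(ToyStore::new());
    let handles: Vec<_> = (0..4u64)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.write(i, format!("value-{i}").as_bytes()))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    for i in 0..4u64 {
        assert_eq!(store.read(i), Some(format!("value-{i}").into_bytes()));
    }
    assert_eq!(store.tail_offset.load(Ordering::Acquire), 4 * 7);
}
```

Holding the snapshot mutex only long enough to clone the `Arc` keeps readers from blocking each other while they work with the data.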

Write and Read Modes

Write Modes:

  • Single Entry: write() - atomic single key-value write
  • Batch Entry: batch_write() - multiple writes with single flush
  • Streaming: write_stream() - large entries via Read trait

Read Modes:

  • Direct Memory Access: read() - zero-copy via EntryHandle
  • Streaming: read_stream() - incremental reads for large entries
  • Parallel Iteration: par_iter_entries() - Rayon-powered parallel scanning (requires parallel feature)

For detailed read/write APIs, see DataStore API.

Sources: README.md:208-247
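
The streaming write mode is built around the standard `Read` trait. As a rough illustration of why that helps with large entries, the sketch below copies from any reader into a sink in fixed-size chunks. `write_stream_sketch` is a hypothetical helper written for this example; it does not reproduce the signature or behavior of the crate's `write_stream()`.

```rust
use std::io::{self, Cursor, Read};

/// Copy everything from `reader` into `sink` in fixed-size chunks, so the
/// source never has to be materialized as one large caller-side buffer.
fn write_stream_sketch<R: Read>(sink: &mut Vec<u8>, reader: &mut R) -> io::Result<usize> {
    let mut chunk = [0u8; 4096];
    let mut total = 0;
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // end of stream
        }
        sink.extend_from_slice(&chunk[..n]);
        total += n;
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let data = vec![0xABu8; 10_000];
    let mut sink = Vec::new();
    let written = write_stream_sketch(&mut sink, &mut Cursor::new(&data))?;
    assert_eq!(written, 10_000);
    assert_eq!(sink, data);
    Ok(())
}
```

Any `Read` implementor, such as a file or a network socket, could take the place of the `Cursor` used here.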


SIMD and Performance Optimizations

SIMD R Drive employs multiple optimization strategies:

| Optimization | Implementation | Benefit |
| --- | --- | --- |
| SIMD Memory Copy | simd_copy with AVX2/NEON | Faster buffer staging for writes |
| SIMD Hash Function | xxh3_64 with SSE2/AVX2/NEON | Accelerated key hashing |
| Cache-Line Alignment | 64-byte PAYLOAD_ALIGNMENT | Prevents cache-line splits |
| Lock-Free Reads | DashMap + Arc<Mmap> | Concurrent zero-copy reads |
| Sequential Writes | Append-only design | Minimized disk seeks |

For detailed performance information, see Performance Optimizations and SIMD Acceleration.

Sources: README.md:249-257 Cargo.toml:34
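
An architecture-specific copy routine with a portable fallback can be structured roughly as follows. This is a sketch of the general runtime-dispatch pattern using `std::arch`, not the crate's `simd_copy`: the function names `copy_dispatch` and `copy_avx2` are assumptions, and only the x86_64 AVX2 path is shown (a NEON path would follow the same shape on aarch64).

```rust
/// Copy `src` into `dst` (equal lengths), using AVX2 on x86_64 when the CPU
/// supports it and a plain slice copy everywhere else.
fn copy_dispatch(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len(), "slices must have equal length");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            unsafe { copy_avx2(dst, src) };
            return;
        }
    }
    dst.copy_from_slice(src); // portable fallback
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn copy_avx2(dst: &mut [u8], src: &[u8]) {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};
    let lanes = src.len() / 32; // number of full 32-byte vectors
    for i in 0..lanes {
        let off = i * 32;
        // SAFETY: off + 32 <= len for both slices; unaligned load/store are used.
        unsafe {
            let v = _mm256_loadu_si256(src.as_ptr().add(off) as *const __m256i);
            _mm256_storeu_si256(dst.as_mut_ptr().add(off) as *mut __m256i, v);
        }
    }
    let done = lanes * 32;
    dst[done..].copy_from_slice(&src[done..]); // scalar tail
}

fn main() {
    let src: Vec<u8> = (0..100u8).collect();
    let mut dst = vec![0u8; 100];
    copy_dispatch(&mut dst, &src);
    assert_eq!(dst, src);
}
```

When payloads start on 64-byte boundaries, aligned vector loads and stores become an option as well; the unaligned variants are used here so the sketch is valid for arbitrary slices.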


Multi-Language Support

Native Rust

The core library is implemented in Rust and can be used directly as a Cargo dependency.

Sources: Cargo.toml:11-21

Python Bindings

Two experimental Python integration paths are available:

  1. Direct Bindings (simd-r-drive-py): PyO3-based bindings for direct access to DataStore
  2. WebSocket Client (simd-r-drive-ws-client-py): Remote access via WebSocket RPC

For Python integration details, see Python Integration.

Sources: README.md:262-265 Cargo.toml:74-76

WebSocket RPC (Experimental)

The experimental network layer enables remote access:

  • Server: simd-r-drive-ws-server - Exposes DataStore over WebSocket
  • Native Client: simd-r-drive-ws-client - Rust client for WebSocket connection
  • Service Definition: simd-r-drive-muxio-service-definition - Shared RPC contract using bitcode serialization

For network layer details, see Network Layer and RPC.

Sources: Cargo.toml:70-72 Cargo.toml:85-89


Feature Flags

The core simd-r-drive package supports the following Cargo features:

| Feature | Description |
| --- | --- |
| parallel | Enables Rayon-powered parallel iteration via par_iter_entries() |
| arrow | Enables Apache Arrow integration in simd-r-drive-entry-handle for zero-copy typed views |
| expose-internal-api | Exposes internal APIs for advanced use cases (unstable) |

Sources: Cargo.toml:49-55


Dependencies Overview

Core Dependencies

| Crate | Version | Purpose |
| --- | --- | --- |
| memmap2 | 0.9.5 | Memory-mapped file access |
| xxhash-rust | 0.8.15 | SIMD-accelerated hashing (xxh3_64) |
| dashmap | 6.1.0 | Concurrent hash map for lock-free indexing |
| crc32fast | 1.4.2 | Payload integrity verification |
| rayon | 1.10.0 | Parallel iteration (optional, requires parallel feature) |

Network Layer Dependencies (Experimental)

| Crate | Version | Purpose |
| --- | --- | --- |
| muxio-tokio-rpc-server | 0.9.0-alpha | WebSocket RPC server framework |
| muxio-tokio-rpc-client | 0.9.0-alpha | WebSocket RPC client framework |
| bitcode | 0.6.6 | Compact binary serialization for RPC |
| tokio | 1.45.1 | Async runtime for network operations |

Sources: Cargo.toml:23-34 Cargo.toml:80-112


Development and Testing

The repository includes:

  • Unit Tests: Inline tests in each module
  • Integration Tests: tests/ directory with full system tests
  • Benchmarks: Criterion-based benchmarks in benches/ (see Benchmarking)
  • CI/CD: GitHub Actions workflows for cross-platform testing (see CI/CD Pipeline)

Sources: Cargo.toml:36-63


Next Steps

This overview introduces the high-level architecture and key components of SIMD R Drive. For deeper exploration:

  • Core Storage Mechanics: See Core Storage Engine for detailed information about DataStore, storage format, and memory management
  • API Usage: See DataStore API for method documentation and usage patterns
  • Performance Tuning: See Performance Optimizations for SIMD usage, alignment, and benchmarking
  • Python Integration: See Python Integration for binding usage and WebSocket client examples
  • Building and Testing: See Development Guide for build instructions and contribution guidelines

Sources: README.md:1-285
