
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Overview

Purpose and Scope

This document provides a high-level introduction to the SIMD R Drive codebase, explaining its purpose as a high-performance, append-only storage engine and outlining its major architectural components. For detailed information about specific subsystems, see the corresponding sections: Core Storage Engine, Network Layer and RPC, Python Integration, Performance Optimizations, and Extensions and Utilities.

Sources: README.md:1-40 Cargo.toml:1-21


What is SIMD R Drive?

SIMD R Drive is a high-performance, thread-safe, append-only storage engine designed for zero-copy binary access. It stores arbitrary binary data in a single-file storage container where all payloads are written at fixed 64-byte-aligned boundaries, optimizing for SIMD operations and cache-line efficiency.

The system is schema-less: it treats all stored data as raw bytes (&[u8]) without enforcing serialization formats or endianness. This design provides maximum flexibility for applications requiring high-speed storage and retrieval of structured or unstructured binary data.

Key characteristics:

| Feature | Description |
|---|---|
| Storage Model | Single-file, append-only, key-value store |
| Access Pattern | Zero-copy reads via memory-mapped files (memmap2) |
| Alignment | 64-byte payload boundaries (configurable) |
| Indexing | O(1) hash-based lookups using xxh3_64 with SIMD acceleration |
| Concurrency | Thread-safe reads/writes using RwLock, AtomicU64, DashMap |
| Language Support | Rust (native), Python (PyO3 bindings), WebSocket RPC (experimental) |

Sources: README.md:5-87 Cargo.toml:12-21


High-Level System Architecture

The following diagram shows the complete system architecture, mapping high-level concepts to concrete code entities:

Diagram: System Architecture - Mapping Concepts to Code Entities

graph TB
    subgraph "User Interfaces"
        CLI["CLI Application\n(main.rs)"]
PY_BIND["Python Direct Bindings\n(simd-r-drive-py)"]
PY_WS["Python WebSocket Client\n(simd-r-drive-ws-client-py)"]
end
    
    subgraph "Core Storage Engine (simd-r-drive)"
        DS["DataStore\n(data_store/mod.rs)"]
READER["DataStoreReader trait"]
WRITER["DataStoreWriter trait"]
INDEX["KeyIndexer\n(key_indexer.rs)"]
end
    
    subgraph "Entry Abstraction (simd-r-drive-entry-handle)"
        EH["EntryHandle\n(entry_handle.rs)"]
META["EntryMetadata\n(entry_metadata.rs)"]
end
    
    subgraph "Network Layer (Experimental)"
        WS_SERVER["simd-r-drive-ws-server\n(WebSocket Server)"]
WS_CLIENT["simd-r-drive-ws-client\n(Native Rust Client)"]
SERVICE_DEF["simd-r-drive-muxio-service-definition\n(RPC Contract)"]
end
    
    subgraph "Storage Backend"
        MMAP["Memory-Mapped File\n(Arc<Mmap>)"]
FILE["Single Binary File\n(.bin)"]
end
    
    subgraph "Performance Layer"
        SIMD["SIMD Operations\n(simd_copy)"]
XXH3["xxh3_64 Hashing\n(KeyIndexer)"]
end
    
 
   CLI --> DS
 
   PY_BIND --> DS
 
   PY_WS --> WS_CLIENT
 
   WS_CLIENT --> SERVICE_DEF
 
   SERVICE_DEF --> WS_SERVER
 
   WS_SERVER --> DS
    
 
   DS --> READER
 
   DS --> WRITER
 
   DS --> INDEX
 
   DS --> EH
 
   DS --> MMAP
    
 
   EH --> META
 
   EH --> MMAP
 
   MMAP --> FILE
    
 
   DS --> SIMD
 
   INDEX --> XXH3
    
    style DS fill:#f9f9f9,stroke:#333,stroke-width:2px
    style EH fill:#f9f9f9,stroke:#333,stroke-width:2px

This diagram illustrates how user-facing interfaces connect to the core storage engine and supporting subsystems, using actual code entity names.

Sources: Cargo.toml:66-77 README.md:11-40


Repository Structure

The codebase is organized as a Cargo workspace with the following packages:

| Package | Path | Purpose |
|---|---|---|
| simd-r-drive | ./ | Core storage engine library and CLI |
| simd-r-drive-entry-handle | ./simd-r-drive-entry-handle | Entry abstraction layer for zero-copy access |
| simd-r-drive-extensions | ./extensions | Utility functions and helper APIs |
| simd-r-drive-ws-server | ./experiments/simd-r-drive-ws-server | WebSocket RPC server (experimental) |
| simd-r-drive-ws-client | ./experiments/simd-r-drive-ws-client | WebSocket RPC client (experimental) |
| simd-r-drive-muxio-service-definition | ./experiments/simd-r-drive-muxio-service-definition | Shared RPC service contract |
| Python bindings | ./experiments/bindings/python | PyO3-based Python direct bindings |
| Python WS client | ./experiments/bindings/python-ws-client | Python WebSocket client bindings |

For detailed information about the repository layout and package relationships, see Repository Structure.

Sources: Cargo.toml:66-77 README.md:259-265


graph TB
    subgraph "Public API"
        DST["DataStore struct\n(data_store/mod.rs)"]
READER_TRAIT["DataStoreReader trait\n(traits.rs)"]
WRITER_TRAIT["DataStoreWriter trait\n(traits.rs)"]
end
    
    subgraph "Indexing Layer"
        KI["KeyIndexer\n(key_indexer.rs)"]
DASHMAP["DashMap<u64, u64>\n(concurrent hash map)"]
XXH3_HASH["xxh3_64\n(key hashing)"]
end
    
    subgraph "Entry Management"
        EH_STRUCT["EntryHandle\n(entry_handle.rs)"]
EM_STRUCT["EntryMetadata\n(entry_metadata.rs)"]
PAYLOAD_ALIGN["PAYLOAD_ALIGNMENT\n(constants.rs)"]
end
    
    subgraph "Storage Backend"
        RWLOCK_FILE["RwLock<File>"]
MUTEX_MMAP["Mutex<Arc<Mmap>>"]
ATOMIC_TAIL["AtomicU64 tail_offset"]
end
    
    subgraph "SIMD Acceleration"
        SIMD_COPY["simd_copy\n(arch-specific impls)"]
AVX2["AVX2 impl (x86_64)"]
NEON["NEON impl (aarch64)"]
end
    
 
   DST --> READER_TRAIT
 
   DST --> WRITER_TRAIT
 
   DST --> KI
 
   DST --> RWLOCK_FILE
 
   DST --> MUTEX_MMAP
 
   DST --> ATOMIC_TAIL
    
 
   KI --> DASHMAP
 
   KI --> XXH3_HASH
    
 
   READER_TRAIT --> EH_STRUCT
 
   EH_STRUCT --> EM_STRUCT
 
   EH_STRUCT --> PAYLOAD_ALIGN
    
 
   WRITER_TRAIT --> SIMD_COPY
 
   SIMD_COPY --> AVX2
 
   SIMD_COPY --> NEON
    
    style DST fill:#f9f9f9,stroke:#333,stroke-width:2px
    style KI fill:#f9f9f9,stroke:#333,stroke-width:2px

Core Storage Components

The following diagram maps storage concepts to their implementing code entities:

Diagram: Core Storage Components - Code Entity Mapping

This diagram shows the relationship between storage concepts and their concrete implementations in the codebase.

Sources: README.md:172-183 Cargo.toml:23-34


Key Features Summary

Storage and Access Patterns

| Feature | Implementation Details |
|---|---|
| Zero-Copy Reads | Memory-mapped file access via memmap2 crate; EntryHandle provides &[u8] views |
| Append-Only Writes | Sequential writes to RwLock<File>; metadata follows payload immediately |
| 64-Byte Alignment | Configurable via PAYLOAD_ALIGNMENT constant in simd-r-drive-entry-handle/src/constants.rs |
| Backward-Linked Chain | Each entry contains a prev_offset field, enabling recovery and validation |
| Tombstone Deletions | Single 0x00 byte + metadata marks deleted entries |

Sources: README.md:43-148

Concurrency Model

| Component | Synchronization Primitive | Purpose |
|---|---|---|
| File Writes | RwLock<File> | Serializes write operations |
| Tail Offset | AtomicU64 | Lock-free offset tracking |
| Key Index | DashMap<u64, u64> | Concurrent hash map for lock-free reads |
| Memory Map | Mutex<Arc<Mmap>> | Safe shared access to mmap |

For detailed concurrency semantics, see Concurrency and Thread Safety.

Sources: README.md:170-200

Write and Read Modes

Write Modes:

  • Single Entry : write() - atomic single key-value write
  • Batch Entry : batch_write() - multiple writes with single flush
  • Streaming : write_stream() - large entries via Read trait

Read Modes:

  • Direct Memory Access : read() - zero-copy via EntryHandle
  • Streaming : read_stream() - incremental reads for large entries
  • Parallel Iteration : par_iter_entries() - Rayon-powered parallel scanning (requires parallel feature)

For detailed read/write APIs, see DataStore API.

Sources: README.md:208-247
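As a quick orientation, a minimal usage sketch of these modes is shown below. It is hedged: it assumes the crate root exports DataStore together with the DataStoreReader and DataStoreWriter traits, and error handling is reduced to expect().

```rust
use std::path::Path;

use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};

fn main() {
    // Open (or create) the single-file storage container.
    let store = DataStore::open(Path::new("example.bin")).expect("open failed");

    // Single-entry write: atomic, flushed immediately, returns the new tail offset.
    store.write(b"config/name", b"simd-r-drive").expect("write failed");

    // Batch write: multiple entries under one lock acquisition and one flush.
    let entries: Vec<(&[u8], &[u8])> = vec![
        (b"k1".as_slice(), b"v1".as_slice()),
        (b"k2".as_slice(), b"v2".as_slice()),
    ];
    store.batch_write(&entries).expect("batch_write failed");

    // Direct read: EntryHandle is a zero-copy view into the memory-mapped file.
    if let Some(entry) = store.read(b"config/name").expect("read failed") {
        println!("payload is {} bytes", entry.as_slice().len());
    }
}
```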


SIMD and Performance Optimizations

SIMD R Drive employs multiple optimization strategies:

| Optimization | Implementation | Benefit |
|---|---|---|
| SIMD Memory Copy | simd_copy with AVX2/NEON | Faster buffer staging for writes |
| SIMD Hash Function | xxh3_64 with SSE2/AVX2/NEON | Accelerated key hashing |
| Cache-Line Alignment | 64-byte PAYLOAD_ALIGNMENT | Prevents cache-line splits |
| Lock-Free Reads | DashMap + Arc<Mmap> | Concurrent zero-copy reads |
| Sequential Writes | Append-only design | Minimized disk seeks |

For detailed performance information, see Performance Optimizations and SIMD Acceleration.

Sources: README.md:249-257 Cargo.toml:34


Multi-Language Support

Native Rust

The core library is implemented in Rust and can be consumed directly as a Cargo dependency.

Sources: Cargo.toml:11-21

Python Bindings

Two experimental Python integration paths are available:

  1. Direct Bindings (simd-r-drive-py): PyO3-based bindings for direct access to DataStore
  2. WebSocket Client (simd-r-drive-ws-client-py): Remote access via WebSocket RPC

For Python integration details, see Python Integration.

Sources: README.md:262-265 Cargo.toml:74-76

WebSocket RPC (Experimental)

The experimental network layer enables remote access:

  • Server : simd-r-drive-ws-server - Exposes DataStore over WebSocket
  • Native Client : simd-r-drive-ws-client - Rust client for WebSocket connection
  • Service Definition : simd-r-drive-muxio-service-definition - Shared RPC contract using bitcode serialization

For network layer details, see Network Layer and RPC.

Sources: Cargo.toml:70-72 Cargo.toml:85-89


Feature Flags

The core simd-r-drive package supports the following Cargo features:

| Feature | Description |
|---|---|
| parallel | Enables Rayon-powered parallel iteration via par_iter_entries() |
| arrow | Enables Apache Arrow integration in simd-r-drive-entry-handle for zero-copy typed views |
| expose-internal-api | Exposes internal APIs for advanced use cases (unstable) |

Sources: Cargo.toml:49-55


Dependencies Overview

Core Dependencies

| Crate | Version | Purpose |
|---|---|---|
| memmap2 | 0.9.5 | Memory-mapped file access |
| xxhash-rust | 0.8.15 | SIMD-accelerated hashing (xxh3_64) |
| dashmap | 6.1.0 | Concurrent hash map for lock-free indexing |
| crc32fast | 1.4.2 | Payload integrity verification |
| rayon | 1.10.0 | Parallel iteration (optional, requires parallel feature) |

Network Layer Dependencies (Experimental)

| Crate | Version | Purpose |
|---|---|---|
| muxio-tokio-rpc-server | 0.9.0-alpha | WebSocket RPC server framework |
| muxio-tokio-rpc-client | 0.9.0-alpha | WebSocket RPC client framework |
| bitcode | 0.6.6 | Compact binary serialization for RPC |
| tokio | 1.45.1 | Async runtime for network operations |

Sources: Cargo.toml:23-34 Cargo.toml:80-112


Development and Testing

The repository includes:

  • Unit Tests : Inline tests in each module
  • Integration Tests : tests/ directory with full system tests
  • Benchmarks : Criterion-based benchmarks in benches/ (see Benchmarking)
  • CI/CD : GitHub Actions workflows for cross-platform testing (see CI/CD Pipeline)

Sources: Cargo.toml:36-63


Next Steps

This overview introduces the high-level architecture and key components of SIMD R Drive. For deeper exploration:

  • Core Storage Mechanics : See Core Storage Engine for detailed information about DataStore, storage format, and memory management
  • API Usage : See DataStore API for method documentation and usage patterns
  • Performance Tuning : See Performance Optimizations for SIMD usage, alignment, and benchmarking
  • Python Integration : See Python Integration for binding usage and WebSocket client examples
  • Building and Testing : See Development Guide for build instructions and contribution guidelines

Sources: README.md:1-285

Core Storage Engine

Purpose and Scope

This document provides an overview of the core storage engine architecture in SIMD R Drive. It covers the fundamental design principles, main components, and data flow patterns that enable high-performance, append-only binary storage.

For detailed information about specific aspects of the storage engine:

Sources: README.md:1-50 src/lib.rs:1-28


System Architecture

The core storage engine is implemented as a single-file, append-only key-value store. It consists of four primary components that work together to provide high-performance binary storage with zero-copy read access.

Diagram: Core Storage Engine Components

graph TB
    subgraph "Public API Layer"
        DS["DataStore"]
DSR["DataStoreReader trait"]
DSW["DataStoreWriter trait"]
end
    
    subgraph "Indexing Layer"
        KI["KeyIndexer"]
DASHMAP["DashMap&lt;u64, u64&gt;"]
XXH3["xxh3_64 hasher"]
end
    
    subgraph "Storage Layer"
        MMAP["Arc&lt;Mmap&gt;"]
FILE["Single binary file"]
RWLOCK["RwLock&lt;File&gt;"]
end
    
    subgraph "Access Layer"
        EH["EntryHandle"]
EI["EntryIterator"]
ES["EntryStream"]
end
    
 
   DS --> DSR
 
   DS --> DSW
 
   DS --> KI
 
   DS --> MMAP
 
   DS --> RWLOCK
    
 
   KI --> DASHMAP
 
   KI --> XXH3
    
 
   DSR --> EH
 
   DSR --> EI
 
   DSW --> RWLOCK
    
 
   EH --> MMAP
 
   EI --> MMAP
 
   ES --> EH
    
 
   MMAP --> FILE
 
   RWLOCK --> FILE

The diagram shows the relationship between the main code entities:

| Component | Code Entity | Purpose |
|---|---|---|
| Public API | DataStore | Main interface for storage operations |
| Traits | DataStoreReader, DataStoreWriter | Separate read/write capabilities |
| Indexing | KeyIndexer | Manages in-memory hash index |
| Hash Map | DashMap<u64, u64> | Lock-free concurrent hash map |
| Hashing | xxh3_64 | SIMD-accelerated hash function |
| Memory Mapping | Arc<Mmap> | Shared memory-mapped file reference |
| File Access | RwLock<File> | Synchronized write access |
| Entry Access | EntryHandle | Zero-copy view into mapped memory |
| Iteration | EntryIterator | Backward chain traversal |
| Streaming | EntryStream | Buffered reading for large entries |

Sources: src/storage_engine.rs:1-25 src/lib.rs:129-136 README.md:5-11


Append-Only Design

The storage engine follows a strict append-only model where data is never modified or deleted in place. All operations result in new entries being appended to the end of the file.

Diagram: Write Path Data Flow

graph LR
    subgraph "Write Operations"
        W["write()"]
BW["batch_write()"]
WS["write_stream()"]
DEL["delete()"]
end
    
    subgraph "Internal Write Path"
        HASH["Calculate xxh3_64 hash"]
ALIGN["Calculate prepad for\n64-byte alignment"]
COPY["simd_copy payload"]
APPEND["Append to file"]
META["Write metadata:\nkey_hash, prev_offset, crc32"]
end
    
    subgraph "File State"
        TAIL["AtomicU64 tail_offset"]
CHAIN["Backward-linked chain"]
end
    
 
   W --> HASH
 
   BW --> HASH
 
   WS --> HASH
 
   DEL --> HASH
    
 
   HASH --> ALIGN
 
   ALIGN --> COPY
 
   COPY --> APPEND
 
   APPEND --> META
 
   META --> TAIL
 
   META --> CHAIN

Key Characteristics

| Characteristic | Implementation |
|---|---|
| Immutability | Entries are never modified after writing |
| Overwrites | New entries with the same key supersede old ones |
| Deletions | Append tombstone marker (single 0x00 byte) |
| File Growth | File grows monotonically until compaction |
| Ordering | Maintains temporal order via prev_offset chain |
| Recovery | Incomplete writes detected via chain validation |

The append-only design provides several benefits:

  • Crash safety : Incomplete writes can be detected and discarded
  • Simplified concurrency : No in-place modifications to coordinate
  • Time-travel : Historical data remains until compaction
  • Write performance : Sequential I/O with no seek overhead

Sources: README.md:98-147 src/lib.rs:3-17


Single-File Storage Container

All data is stored in a single binary file with a specific structure designed for efficient access and validation.

Diagram: File Organization and Backward-Linked Chain

graph TB
    subgraph "File Structure"
        START["File Start: offset 0"]
E1["Entry 1\nprepad + payload + metadata"]
E2["Entry 2\nprepad + payload + metadata"]
E3["Entry 3\nprepad + payload + metadata"]
EN["Entry N\nprepad + payload + metadata"]
TAIL["tail_offset\nEnd of valid data"]
end
    
    subgraph "Metadata Chain"
        M1["metadata.prev_offset = 0"]
M2["metadata.prev_offset = end(E1)"]
M3["metadata.prev_offset = end(E2)"]
MN["metadata.prev_offset = end(E3)"]
end
    
 
   START --> E1
 
   E1 --> E2
 
   E2 --> E3
 
   E3 --> EN
 
   EN --> TAIL
    
 
   E1 -.-> M1
 
   E2 -.-> M2
 
   E3 -.-> M3
 
   EN -.-> MN
    
    MN -.backward chain.-> M3
    M3 -.backward chain.-> M2
    M2 -.backward chain.-> M1

Storage Properties

| Property | Value | Purpose |
|---|---|---|
| File Type | Single binary file | Simplified management and deployment |
| Entry Alignment | 64-byte boundaries | Cache-line and SIMD optimization |
| Metadata Size | 20 bytes | key_hash (8) + prev_offset (8) + crc32 (4) |
| Chain Direction | Backward (tail to head) | Fast validation and recovery |
| Maximum Size | 256 TiB | 48-bit offset support |
| Format | Schema-less | Raw binary with no interpretation |

Entry Composition

Each entry consists of three parts:

  1. Pre-padding (0-63 bytes): Zero bytes to align payload start to 64-byte boundary
  2. Payload (variable length): Raw binary data
  3. Metadata (20 bytes): Hash, previous offset, checksum

Tombstone entries (deletions) use a minimal format:

  • 1-byte payload (0x00)
  • 20-byte metadata

Sources: README.md:104-138 README.md:61-97


DataStore: Primary Interface

DataStore is the main public interface providing all storage operations. It implements the DataStoreReader and DataStoreWriter traits to separate read and write capabilities.

Diagram: DataStore Structure and Methods

graph TB
    subgraph "DataStore struct"
        FILE_LOCK["file: RwLock&lt;File&gt;"]
MMAP_LOCK["mmap: Mutex&lt;Arc&lt;Mmap&gt;&gt;"]
INDEXER["indexer: KeyIndexer"]
TAIL["tail_offset: AtomicU64"]
end
    
    subgraph "DataStoreWriter methods"
        W1["write(key, value)"]
W2["batch_write(entries)"]
W3["write_stream(key, reader)"]
W4["delete(key)"]
end
    
    subgraph "DataStoreReader methods"
        R1["read(key) -> Option&lt;EntryHandle&gt;"]
R2["batch_read(keys)"]
R3["iter_entries() -> EntryIterator"]
R4["contains_key(key) -> bool"]
end
    
    subgraph "Maintenance methods"
        M1["compact() -> Stats"]
M2["estimate_compaction_space()"]
M3["verify_file_integrity()"]
end
    
 
   FILE_LOCK --> W1
 
   FILE_LOCK --> W2
 
   FILE_LOCK --> W3
 
   FILE_LOCK --> W4
    
 
   MMAP_LOCK --> R1
 
   MMAP_LOCK --> R2
 
   MMAP_LOCK --> R3
    
 
   INDEXER --> R1
 
   INDEXER --> R2
 
   INDEXER --> R4
    
 
   TAIL --> W1
 
   TAIL --> W2
 
   TAIL --> W3

Core Fields

The DataStore struct maintains four critical fields:

| Field | Type | Purpose |
|---|---|---|
| file | RwLock<File> | Serializes write operations |
| mmap | Mutex<Arc<Mmap>> | Protects memory map updates |
| indexer | KeyIndexer | Provides O(1) key lookups |
| tail_offset | AtomicU64 | Tracks file end without locks |

Sources: src/storage_engine.rs:4-5 src/lib.rs:19-63


graph LR
    subgraph "Read Request"
        KEY["Key bytes"]
HASH_KEY["xxh3_64(key)"]
end
    
    subgraph "Index Lookup"
        DASHMAP_GET["DashMap.get(hash)"]
PACKED["Packed value:\n16-bit tag + 48-bit offset"]
TAG_CHECK["Verify 16-bit tag"]
end
    
    subgraph "Memory Access"
        OFFSET["File offset"]
MMAP_SLICE["Arc&lt;Mmap&gt; slice"]
METADATA_READ["Read EntryMetadata"]
PAYLOAD_RANGE["Calculate payload range"]
end
    
    subgraph "Result"
        EH["EntryHandle\n(zero-copy view)"]
end
    
 
   KEY --> HASH_KEY
 
   HASH_KEY --> DASHMAP_GET
 
   DASHMAP_GET --> PACKED
 
   PACKED --> TAG_CHECK
 
   TAG_CHECK --> OFFSET
 
   OFFSET --> MMAP_SLICE
 
   MMAP_SLICE --> METADATA_READ
 
   METADATA_READ --> PAYLOAD_RANGE
 
   PAYLOAD_RANGE --> EH

Zero-Copy Read Path

Read operations leverage memory-mapped files to provide zero-copy access to stored data without deserialization overhead.

Diagram: Zero-Copy Read Operation Flow

Read Performance Characteristics

| Operation | Complexity | Notes |
|---|---|---|
| Single read | O(1) | Hash index lookup + memory access |
| Batch read | O(n) | Independent lookups, parallelizable |
| Full iteration | O(m) | m = total entries, follows chain |
| Collision handling | O(1) | 16-bit tag check |

EntryHandle

The EntryHandle struct provides a zero-copy view into the memory-mapped file.

Methods like as_slice(), as_bytes(), and streaming conversion allow direct access to payload data without copying.
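A small sketch of reading through the handle, assuming the reader trait is in scope (method names follow the ones listed above; return types are simplified):

```rust
use simd_r_drive::{DataStore, DataStoreReader};

fn payload_len(store: &DataStore, key: &[u8]) -> usize {
    match store.read(key).expect("read failed") {
        // as_slice() yields a &[u8] view directly over the mmap; nothing is copied.
        Some(entry) => entry.as_slice().len(),
        // None covers both a missing key and a tombstoned (deleted) key.
        None => 0,
    }
}
```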

Sources: README.md:43-50 README.md:228-231 src/storage_engine/entry_iterator.rs:8-21


graph TB
    subgraph "Key to Hash"
        K1["Key: 'user:1234'"]
H1["xxh3_64 hash:\n0xABCDEF0123456789"]
end
    
    subgraph "Hash to Packed Value"
        TAG["Extract 16-bit tag:\n0xABCD"]
OFF["Extract file offset:\n0x00EF0123456789"]
PACKED_VAL["Packed 64-bit value:\ntag:16 / offset:48"]
end
    
    subgraph "DashMap Storage"
        DM["DashMap&lt;u64, u64&gt;"]
ENTRY["hash -> packed_value"]
end
    
    subgraph "Collision Detection"
        LOOKUP["On read: lookup hash"]
VERIFY["Verify 16-bit tag matches"]
RESOLVE["If mismatch: collision detected"]
end
    
 
   K1 --> H1
 
   H1 --> TAG
 
   H1 --> OFF
 
   TAG --> PACKED_VAL
 
   OFF --> PACKED_VAL
 
   PACKED_VAL --> ENTRY
 
   ENTRY --> DM
    
 
   DM --> LOOKUP
 
   LOOKUP --> VERIFY
 
   VERIFY --> RESOLVE

Key Indexing System

The KeyIndexer maintains an in-memory hash index for O(1) key lookups using the xxh3_64 hashing algorithm with SIMD acceleration.

Diagram: Key Indexing and Collision Detection

Index Structure

| Component | Type | Size | Purpose |
|---|---|---|---|
| Hash Map | DashMap<u64, u64> | Dynamic | Lock-free concurrent access |
| Key Hash | u64 | 8 bytes | Full xxh3_64 hash of key |
| Packed Value | u64 | 8 bytes | 16-bit tag + 48-bit offset |
| Tag | u16 | 2 bytes | Collision detection |
| Offset | u48 | 6 bytes | File location (0-256 TiB) |
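The packed-value layout can be illustrated with a standalone sketch (hedged: the real KeyIndexer packing helpers may order the bits differently; this only mirrors the 16-bit tag / 48-bit offset split described above):

```rust
const OFFSET_MASK: u64 = (1 << 48) - 1;

/// Pack a 16-bit collision tag and a 48-bit file offset into one u64.
fn pack(tag: u16, offset: u64) -> u64 {
    debug_assert!(offset <= OFFSET_MASK, "offset must fit in 48 bits");
    ((tag as u64) << 48) | (offset & OFFSET_MASK)
}

/// Split a packed value back into its (tag, offset) parts.
fn unpack(packed: u64) -> (u16, u64) {
    ((packed >> 48) as u16, packed & OFFSET_MASK)
}

fn main() {
    let key_hash: u64 = 0xABCD_EF01_2345_6789;
    let tag = (key_hash >> 48) as u16; // derive the tag from the high hash bits
    let packed = pack(tag, 4096);
    assert_eq!(unpack(packed), (tag, 4096));
}
```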

Hash Algorithm Properties

The xxh3_64 algorithm provides:

  • SIMD acceleration : Uses SSE2/AVX2 (x86_64) or NEON (ARM)
  • High quality : Low collision probability
  • Performance : Optimized for throughput
  • Stability : Consistent across platforms

Sources: README.md:158-168 src/storage_engine.rs:13-14


graph TB
    subgraph "Concurrent Reads"
        T1["Thread 1 read()"]
T2["Thread 2 read()"]
T3["Thread 3 read()"]
DM_READ["DashMap lock-free read"]
MMAP_SHARE["Shared Arc&lt;Mmap&gt;"]
end
    
    subgraph "Synchronized Writes"
        T4["Thread 4 write()"]
T5["Thread 5 write()"]
RWLOCK_ACQUIRE["RwLock write lock"]
FILE_WRITE["Exclusive file access"]
INDEX_UPDATE["Update DashMap"]
ATOMIC_UPDATE["AtomicU64 tail_offset"]
end
    
    subgraph "Memory Map Updates"
        REMAP["Remap after write"]
MUTEX_LOCK["Mutex&lt;Arc&lt;Mmap&gt;&gt;"]
NEW_ARC["Create new Arc&lt;Mmap&gt;"]
end
    
 
   T1 --> DM_READ
 
   T2 --> DM_READ
 
   T3 --> DM_READ
 
   DM_READ --> MMAP_SHARE
    
 
   T4 --> RWLOCK_ACQUIRE
 
   T5 --> RWLOCK_ACQUIRE
 
   RWLOCK_ACQUIRE --> FILE_WRITE
 
   FILE_WRITE --> INDEX_UPDATE
 
   FILE_WRITE --> ATOMIC_UPDATE
 
   FILE_WRITE --> REMAP
    
 
   REMAP --> MUTEX_LOCK
 
   MUTEX_LOCK --> NEW_ARC
 
   NEW_ARC --> MMAP_SHARE

Thread Safety Model

The storage engine provides thread-safe concurrent access within a single process using a combination of synchronization primitives.

Diagram: Concurrency Control Mechanisms

Synchronization Primitives

| Primitive | Protects | Access Pattern |
|---|---|---|
| RwLock<File> | File handle | Exclusive writes, no lock for reads |
| Mutex<Arc<Mmap>> | Memory mapping | Locked during remap, readers get Arc clone |
| DashMap<u64, u64> | Key index | Lock-free concurrent reads |
| AtomicU64 | Tail offset | Lock-free updates |

Thread Safety Guarantees

Within single process:

  • ✅ Multiple concurrent reads (zero-copy, lock-free)
  • ✅ Serialized writes (RwLock ensures ordering)
  • ✅ Consistent index updates (DashMap internal locks)
  • ✅ Safe memory mapping (Arc reference counting)

Across multiple processes:

  • ❌ No cross-process coordination
  • ❌ Requires external file locking (e.g., flock)

Sources: README.md:170-206
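A sketch of those single-process guarantees in use (hedged: it assumes DataStore is Send + Sync, which the guarantees above imply, and that the reader/writer traits are exported from the crate root):

```rust
use std::path::Path;
use std::sync::Arc;
use std::thread;

use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};

fn main() {
    let store = Arc::new(DataStore::open(Path::new("example.bin")).expect("open failed"));
    store.write(b"shared-key", b"shared-value").expect("write failed");

    // Readers share the same Arc<DataStore>; reads are lock-free and zero-copy.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                store
                    .read(b"shared-key")
                    .expect("read failed")
                    .map(|entry| entry.as_slice().len())
                    .unwrap_or(0)
            })
        })
        .collect();

    for handle in readers {
        println!("read {} bytes", handle.join().expect("reader panicked"));
    }
}
```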


graph LR
    subgraph "Write Path SIMD"
        PAYLOAD["Payload bytes"]
SIMD_COPY["simd_copy()"]
AVX2["x86_64: AVX2\n256-bit vectors"]
NEON["ARM: NEON\n128-bit vectors"]
BUFFER["Aligned buffer"]
end
    
    subgraph "Hash Path SIMD"
        KEY["Key bytes"]
XXH3_SIMD["xxh3_64 SIMD"]
SSE2["x86_64: SSE2/AVX2"]
NEON2["ARM: NEON"]
HASH_OUT["64-bit hash"]
end
    
 
   PAYLOAD --> SIMD_COPY
 
   SIMD_COPY --> AVX2
 
   SIMD_COPY --> NEON
 
   AVX2 --> BUFFER
 
   NEON --> BUFFER
    
 
   KEY --> XXH3_SIMD
 
   XXH3_SIMD --> SSE2
 
   XXH3_SIMD --> NEON2
 
   SSE2 --> HASH_OUT
 
   NEON2 --> HASH_OUT

Performance Optimizations

The storage engine incorporates several performance optimizations for high-throughput workloads.

SIMD Acceleration

Diagram: SIMD Operations in Write Path

Optimization Features

| Feature | Implementation | Benefit |
|---|---|---|
| SIMD Copy | simd_copy() with AVX2/NEON | Faster memory operations |
| Cache Alignment | 64-byte payload boundaries | Optimal cache-line usage |
| Zero-Copy Reads | mmap + EntryHandle | No deserialization overhead |
| Lock-Free Index | DashMap for reads | Concurrent read scaling |
| Atomic Tracking | AtomicU64 tail offset | No lock contention for offset |
| Sequential Writes | Append-only design | Optimal disk I/O patterns |

Alignment Benefits

The 64-byte PAYLOAD_ALIGNMENT constant ensures:

  1. Cache efficiency : Payloads align with CPU cache lines
  2. SIMD compatibility : Vector loads don’t cross boundaries
  3. Predictable performance : Consistent access patterns
  4. Type casting safety : Can reinterpret as typed slices

Sources: README.md:51-59 README.md:248-256


Operation Modes

The storage engine supports multiple operation modes optimized for different use cases.

Write Modes

| Mode | Method | Use Case | Characteristics |
|---|---|---|---|
| Single | write(key, value) | Individual entries | Immediate flush, atomic |
| Batch | batch_write(entries) | Multiple entries | Single lock, batch flush |
| Streaming | write_stream(key, reader) | Large payloads | No full memory allocation |

Read Modes

| Mode | Method | Use Case | Characteristics |
|---|---|---|---|
| Direct | read(key) | Single entry | Zero-copy EntryHandle |
| Batch | batch_read(keys) | Multiple entries | Independent lookups |
| Iteration | iter_entries() | Full scan | Follows chain backward |
| Parallel | par_iter_entries() | Bulk processing | Rayon-powered (optional) |
| Streaming | EntryStream | Large entries | Buffered, incremental |

Sources: README.md:208-246 src/lib.rs:20-115


Summary

The core storage engine provides a high-performance, append-only key-value store built on four main components:

  1. DataStore : Public API implementing reader and writer traits
  2. KeyIndexer : O(1) hash-based index with collision detection
  3. Arc<Mmap> : Zero-copy memory-mapped file access
  4. EntryHandle : View abstraction for payload data

Key architectural decisions:

  • Append-only : Simplifies concurrency and crash recovery
  • Single-file : Easy deployment and management
  • 64-byte alignment : Optimizes cache and SIMD performance
  • Backward chain : Enables fast validation and iteration
  • Zero-copy reads : Eliminates deserialization overhead
  • Lock-free index : Scales read throughput across threads

For implementation details on specific subsystems, refer to the child pages listed at the beginning of this document.

Sources: README.md:1-50 src/lib.rs:1-28 src/storage_engine.rs:1-25

Repository Structure

Purpose and Scope

This document describes the organization of the SIMD R Drive repository as a Cargo workspace, detailing the individual packages (crates) that comprise the system, their purposes, and their inter-dependencies. The repository is structured as a monorepo containing a core storage engine, supporting libraries, experimental network components, and language bindings.

For information about the core storage engine architecture and on-disk format, see Storage Architecture. For details on building and testing the codebase, see Building and Testing.


Workspace Organization

The SIMD R Drive repository is organized as a Cargo workspace defined in Cargo.toml:65-78. The workspace uses Cargo’s resolver version 2 and manages multiple interdependent packages with shared versioning and dependencies.

Sources: Cargo.toml:65-78


Workspace Configuration

The workspace defines common package metadata that all member crates inherit:

| Metadata Field | Value |
|---|---|
| Version | 0.15.5-alpha |
| Edition | 2024 |
| Repository | https://github.com/jzombie/rust-simd-r-drive |
| License | Apache-2.0 |
| Categories | database-implementations, data-structures, filesystem |
| Keywords | storage-engine, binary-storage, append-only, simd, mmap |

Sources: Cargo.toml:1-9


Package Structure Overview

Workspace Members

The workspace includes six member crates defined in Cargo.toml:66-73:

  1. "." - The root simd-r-drive package
  2. "simd-r-drive-entry-handle" - Entry abstraction library
  3. "extensions" - Utility extensions
  4. "experiments/simd-r-drive-ws-server" - WebSocket server
  5. "experiments/simd-r-drive-ws-client" - WebSocket client
  6. "experiments/simd-r-drive-muxio-service-definition" - RPC service contract

Excluded Members

Two Python binding packages are excluded from the workspace (Cargo.toml:74-77) because they use maturin with separate build systems:

  • "experiments/bindings/python" - Direct Rust-Python bindings
  • "experiments/bindings/python-ws-client" - Python WebSocket client bindings

Sources: Cargo.toml:65-78


Core Packages

simd-r-drive (Root Package)

Location: Root directory
Cargo Name: simd-r-drive
Description: “SIMD-optimized append-only schema-less storage engine. Key-based binary storage in a single-file storage container.”

This is the main storage engine package providing the DataStore API for append-only key-value storage with SIMD acceleration and memory-mapped file access.

Key Exports:

  • DataStore - Main storage interface
  • DataStoreReader / DataStoreWriter - Trait-based access patterns
  • KeyIndexer - Hash-based key indexing with xxh3_64

Dependencies:

  • simd-r-drive-entry-handle (workspace)
  • dashmap - Lock-free concurrent hash map
  • memmap2 - Memory-mapped file access
  • xxhash-rust - Fast hashing with SIMD support
  • rayon (optional, with parallel feature)

Features:

  • default - No features enabled by default
  • expose-internal-api - Exposes internal APIs for testing/extensions
  • parallel - Enables parallel iteration with rayon
  • arrow - Proxies to simd-r-drive-entry-handle/arrow

Sources: Cargo.toml:11-56


simd-r-drive-entry-handle

Location: simd-r-drive-entry-handle/
Cargo Name: simd-r-drive-entry-handle

Provides the EntryHandle abstraction for zero-copy access to stored entries via memory-mapped files. This package is separated to allow optional Apache Arrow integration without requiring arrow dependencies in the core package.

Key Exports:

  • EntryHandle - Zero-copy entry accessor
  • EntryMetadata - Entry metadata structure (key_hash, prev_offset, crc32)

Dependencies:

  • memmap2 - Memory-mapped file access
  • crc32fast - CRC32 checksum validation
  • arrow (optional, with arrow feature) - Apache Arrow buffer integration

Features:

  • arrow - Enables zero-copy integration with Apache Arrow buffers

Sources: Cargo.toml:83 Cargo.lock:1823-1829


simd-r-drive-extensions

Location: extensions/
Cargo Name: simd-r-drive-extensions

Utility functions and helpers built on top of the core storage engine, including alignment utilities, formatting helpers, and namespace hashing.

Key Exports:

  • align_or_copy - Memory alignment utilities
  • format_bytes - Human-readable byte formatting
  • NamespaceHasher - Namespace-based key hashing
  • File verification utilities

Dependencies:

  • simd-r-drive (workspace)
  • bincode - Serialization support
  • serde - Serialization framework

Sources: Cargo.toml:66-73 Cargo.lock:1832-1841


Experimental Network Components

simd-r-drive-muxio-service-definition

Location: experiments/simd-r-drive-muxio-service-definition/
Cargo Name: simd-r-drive-muxio-service-definition

Defines the RPC service contract (interface definition) for remote access to the storage engine. This serves as the shared contract between WebSocket clients and servers, ensuring type-safe communication.

Key Exports:

  • Service trait definitions for RPC operations
  • Request/response message types
  • Bitcode serialization schemas

Dependencies:

  • bitcode - Compact binary serialization
  • muxio-rpc-service - RPC service framework

Sources: Cargo.toml:66-73 Cargo.lock:1844-1849


simd-r-drive-ws-server

Location: experiments/simd-r-drive-ws-server/
Cargo Name: simd-r-drive-ws-server

WebSocket server implementation providing remote RPC access to a DataStore instance via the muxio framework.

Key Exports:

  • WebSocket server with RPC endpoint
  • Service implementation for simd-r-drive-muxio-service-definition

Dependencies:

  • simd-r-drive (workspace)
  • simd-r-drive-muxio-service-definition (workspace)
  • muxio-tokio-rpc-server - RPC server implementation
  • tokio - Async runtime
  • clap - CLI argument parsing

Sources: Cargo.toml:66-73 Cargo.lock:1866-1878


simd-r-drive-ws-client

Location: experiments/simd-r-drive-ws-client/
Cargo Name: simd-r-drive-ws-client

Rust WebSocket client for connecting to simd-r-drive-ws-server instances. Provides a native Rust client API matching the core DataStore interface but operating over the network.

Key Exports:

  • WsClient - WebSocket client implementation
  • Async methods mirroring DataStore API

Dependencies:

  • simd-r-drive (workspace)
  • simd-r-drive-muxio-service-definition (workspace)
  • muxio-tokio-rpc-client - RPC client implementation
  • tokio - Async runtime

Sources: Cargo.toml:66-73 Cargo.lock:1852-1863


Python Bindings (External Build System)

experiments/bindings/python

Location: experiments/bindings/python/
Build System: Maturin + PyO3

Direct Python bindings to the core simd-r-drive package using PyO3. Provides a Python API for local (in-process) access to the storage engine. This package is excluded from the Cargo workspace because it uses a separate pyproject.toml build configuration with maturin.

Key Exports:

  • Python DataStore class wrapping Rust implementation
  • Type stubs (.pyi files) for IDE support

Sources: Cargo.toml:74-77


experiments/bindings/python-ws-client

Location: experiments/bindings/python-ws-client/
Build System: Maturin + PyO3

Python bindings for the simd-r-drive-ws-client, enabling remote access to storage servers from Python via asyncio. Uses pyo3-async-runtimes to bridge Python’s asyncio with Rust’s tokio.

Key Exports:

  • DataStoreWsClient - Python async WebSocket client
  • Asyncio-compatible API
  • Type stubs for Python type checkers

Sources: Cargo.toml:74-77


Dependency Relationships

Sources: Cargo.toml:23-34 Cargo.toml:80-112 Cargo.lock:1795-1878


Workspace Dependency Management

The workspace defines shared dependencies in the [workspace.dependencies] section Cargo.toml:80-112 to ensure version consistency across all member crates:

Intra-Workspace Dependencies

Key External Dependencies

| Dependency | Version | Purpose |
|---|---|---|
| memmap2 | 0.9.5 | Memory-mapped file access |
| dashmap | 6.1.0 | Lock-free concurrent hashmap |
| xxhash-rust | 0.8.15 | Fast non-cryptographic hashing |
| crc32fast | 1.4.2 | CRC32 checksum validation |
| arrow | 57.0.0 | Apache Arrow integration (optional) |
| tokio | 1.45.1 | Async runtime (experimental features only) |
| bitcode | 0.6.6 | Compact binary serialization |
| rayon | 1.10.0 | Parallel iteration (optional) |

Sources: Cargo.toml:80-112


Feature Flags

The root simd-r-drive package defines its feature flags in Cargo.toml:49-55:

default

No features enabled by default. This keeps the core storage engine lightweight with minimal dependencies.

expose-internal-api

Exposes internal APIs that are normally private. Used for extension development and integration testing. Not intended for general use.

parallel

Enables parallel iteration capabilities using the rayon crate. When enabled, operations like iter_entries() can leverage multi-core parallelism for improved throughput on large datasets.

arrow

A proxy feature that enables simd-r-drive-entry-handle/arrow. This provides zero-copy integration with Apache Arrow buffers, allowing EntryHandle instances to be viewed as Arrow arrays without data copying.

Sources: Cargo.toml:49-55


Benchmarks

The root package defines two benchmark suites using Criterion.rs Cargo.toml:57-63:

storage_benchmark

Measures write throughput, read throughput, batch operations, and streaming performance for the core storage engine.

contention_benchmark

Measures performance under concurrent access patterns, testing the effectiveness of the lock-free index and concurrent read scalability.

Both benchmarks use harness = false to integrate with Criterion’s custom benchmark harness.

Sources: Cargo.toml:57-63


Version Management

All workspace members share a common version number, 0.15.5-alpha, managed through Cargo.toml:3. The -alpha suffix indicates this is pre-release software under active development. The workspace uses semantic versioning, where:

  • Major version (0): Pre-1.0 indicating API instability
  • Minor version (15): Feature releases and API changes
  • Patch version (5): Bug fixes and minor improvements
  • Suffix (-alpha): Pre-release stability indicator

Sources: Cargo.toml:3


File System Layout

The physical repository structure mirrors the logical package organization:

rust-simd-r-drive/
├── Cargo.toml                      # Workspace root
├── Cargo.lock                      # Dependency lock file
├── src/                            # simd-r-drive source
├── benches/                        # Benchmark suites
├── simd-r-drive-entry-handle/      # Entry handle crate
│   ├── Cargo.toml
│   └── src/
├── extensions/                     # Extensions crate
│   ├── Cargo.toml
│   └── src/
└── experiments/
    ├── simd-r-drive-muxio-service-definition/
    │   ├── Cargo.toml
    │   └── src/
    ├── simd-r-drive-ws-server/
    │   ├── Cargo.toml
    │   └── src/
    ├── simd-r-drive-ws-client/
    │   ├── Cargo.toml
    │   └── src/
    └── bindings/
        ├── python/                 # Excluded from workspace
        │   ├── pyproject.toml
        │   └── src/
        └── python-ws-client/       # Excluded from workspace
            ├── pyproject.toml
            └── src/

Sources: Cargo.toml:65-78


Summary Table: All Packages

| Package Name | Location | Type | Dependencies | Purpose |
|---|---|---|---|---|
| simd-r-drive | . | Core | simd-r-drive-entry-handle, dashmap, memmap2, xxhash-rust | Main storage engine |
| simd-r-drive-entry-handle | simd-r-drive-entry-handle/ | Library | memmap2, crc32fast, arrow (opt) | Entry abstraction |
| simd-r-drive-extensions | extensions/ | Library | simd-r-drive, bincode | Utility functions |
| simd-r-drive-muxio-service-definition | experiments/... | Library | bitcode, muxio-rpc-service | RPC contract |
| simd-r-drive-ws-server | experiments/... | Binary | Core + service-def + muxio-server | WebSocket server |
| simd-r-drive-ws-client | experiments/... | Library | Core + service-def + muxio-client | WebSocket client |
| Python bindings | experiments/bindings/python | PyO3 | simd-r-drive, pyo3 | Direct Python access |
| Python WS client | experiments/bindings/python-ws-client | PyO3 | simd-r-drive-ws-client, pyo3 | Remote Python access |

Sources: Cargo.toml:1-112 Cargo.lock:1795-1878

Storage Architecture

Purpose and Scope

This document describes the on-disk storage format used by SIMD R Drive, including the physical layout of entries, the alignment strategy, the backward-linked chain structure, and the recovery mechanism that ensures data integrity after crashes or incomplete writes.

For information about the in-memory data structures and API, see DataStore API. For details about entry metadata fields, see Entry Structure and Metadata. For information about memory-mapped access patterns, see Memory Management and Zero-Copy Access.

Single-File Storage Container

SIMD R Drive stores all data in a single binary file with an append-only design. The storage engine writes sequentially to minimize disk seeks and maximize throughput. Each write operation appends a new entry to the end of the file, and the file position is tracked using the AtomicU64 tail_offset field in DataStore.

Sources: README.md:61-97 src/storage_engine/data_store.rs:27-33

graph LR
    FILE["Single Binary File\n(*.simd-r-drive)"]
ENTRY1["Entry 1\n(Pre-pad + Payload + Metadata)"]
ENTRY2["Entry 2\n(Pre-pad + Payload + Metadata)"]
ENTRY3["Entry 3\n(Pre-pad + Payload + Metadata)"]
TAIL["tail_offset\n(AtomicU64)"]
FILE --> ENTRY1
 
   ENTRY1 --> ENTRY2
 
   ENTRY2 --> ENTRY3
 
   ENTRY3 --> TAIL

Entry Layout

Each entry in the storage file consists of three components: optional pre-padding for alignment, the payload data, and metadata. The layout differs between non-tombstone entries (data) and tombstone entries (deletion markers).

Non-Tombstone Entry Structure

Non-tombstone entries store actual data and are aligned to PAYLOAD_ALIGNMENT (64 bytes by default). The alignment ensures cache-line efficiency and enables zero-copy access for typed slices.

Physical Layout Table:

graph LR
    PREV_TAIL["Previous\ntail_offset"]
PREPAD["Pre-Pad\n0-63 bytes\n(zero bytes)"]
PAYLOAD["Payload\nVariable Length\n(actual data)"]
METADATA["EntryMetadata\n20 bytes"]
NEXT_TAIL["New\ntail_offset"]
PREV_TAIL --> PREPAD
 
   PREPAD --> PAYLOAD
 
   PAYLOAD --> METADATA
 
   METADATA --> NEXT_TAIL

| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| P .. P+pad | Pre-Pad | pad | Zero bytes calculated as (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1) |
| P+pad .. N | Payload | N-(P+pad) | Variable-length data, starts at aligned boundary |
| N .. N+8 | key_hash | 8 | 64-bit XXH3 hash of the key |
| N+8 .. N+16 | prev_offset | 8 | Absolute file offset of previous entry’s tail |
| N+16 .. N+20 | checksum | 4 | CRC32C checksum of the payload |

Where:

  • pad = DataStore::prepad_len(prev_tail) computed at write time
  • PAYLOAD_ALIGNMENT = 64 (defined in simd-r-drive-entry-handle/src/constants.rs)
  • Next entry starts at N + 20

Sources: README.md:112-125 simd-r-drive-entry-handle/src/entry_metadata.rs:11-23 src/storage_engine/data_store.rs:666-673

Entry Metadata Structure

The EntryMetadata struct stores three critical fields in exactly 20 bytes:

Field Purposes:

| Field | Type | Purpose |
|---|---|---|
| key_hash | u64 | XXH3 hash of the key for index lookups |
| prev_offset | u64 | Absolute file offset pointing to the previous entry’s tail (forms backward chain) |
| checksum | [u8; 4] | CRC32C checksum of payload for integrity verification |

The metadata is serialized using little-endian encoding via EntryMetadata::serialize() and deserialized via EntryMetadata::deserialize().
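A hedged sketch of that 20-byte little-endian layout (field order follows the table above; the real EntryMetadata::serialize() implementation may differ in details):

```rust
struct EntryMetadata {
    key_hash: u64,
    prev_offset: u64,
    checksum: [u8; 4],
}

impl EntryMetadata {
    /// Serialize to the fixed 20-byte on-disk form: key_hash | prev_offset | checksum.
    fn to_bytes(&self) -> [u8; 20] {
        let mut out = [0u8; 20];
        out[0..8].copy_from_slice(&self.key_hash.to_le_bytes());
        out[8..16].copy_from_slice(&self.prev_offset.to_le_bytes());
        out[16..20].copy_from_slice(&self.checksum);
        out
    }
}
```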

Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:44-113

Tombstone Entry Structure

Tombstone entries mark deleted keys. They consist of a single null byte (NULL_BYTE[0] = 0x00) followed by metadata, with no pre-padding.

Physical Layout Table:

graph LR
    PREV_TAIL["Previous\ntail_offset"]
NULL["NULL_BYTE\n1 byte\n(0x00)"]
METADATA["EntryMetadata\n20 bytes"]
NEXT_TAIL["New\ntail_offset"]
PREV_TAIL --> NULL
 
   NULL --> METADATA
 
   METADATA --> NEXT_TAIL

| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| T .. T+1 | Payload | 1 | Single byte 0x00 (NULL_BYTE) |
| T+1 .. T+9 | key_hash | 8 | Hash of the deleted key |
| T+9 .. T+17 | prev_offset | 8 | Previous entry’s tail offset |
| T+17 .. T+21 | checksum | 4 | CRC32C of the null byte |

Tombstones are written using DataStore::batch_write_with_key_hashes() with the allow_null_bytes parameter set to true. The deletion logic filters existing keys before writing tombstones to avoid unnecessary I/O.

Sources: README.md:126-132 src/storage_engine/data_store.rs:863-897 src/storage_engine/data_store.rs:990-1024

Alignment Strategy

The pre-padding mechanism ensures that every non-tombstone payload starts at a 64-byte aligned boundary. This alignment is critical for:

  1. Cache-line efficiency : Payloads align with CPU cache lines (typically 64 bytes)
  2. SIMD operations : Vectorized loads/stores can operate without crossing boundaries
  3. Zero-copy typed access : Enables safe reinterpretation as typed slices (e.g., &[u64])

Alignment Calculation

The DataStore::prepad_len() function implements this calculation:

```rust
fn prepad_len(offset: u64) -> usize {
    let a = PAYLOAD_ALIGNMENT;
    ((a - (offset % a)) & (a - 1)) as usize
}
```

During writes, the code checks whether pre-padding is needed and writes the required zero bytes before the payload.
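A minimal sketch of that step, assuming an in-memory staging buffer (names here are illustrative, not the engine's actual write path):

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

fn prepad_len(offset: u64) -> usize {
    let a = PAYLOAD_ALIGNMENT;
    ((a - (offset % a)) & (a - 1)) as usize
}

/// Stage pre-padding plus payload so the payload begins on a 64-byte boundary.
fn stage_aligned(buffer: &mut Vec<u8>, prev_tail: u64, payload: &[u8]) {
    let pad = prepad_len(prev_tail);
    buffer.extend(std::iter::repeat(0u8).take(pad)); // zero bytes first
    buffer.extend_from_slice(payload);
    // Example: prev_tail = 100 -> pad = 28, so the payload starts at offset 128.
}
```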

Sources: src/storage_engine/data_store.rs:666-673 README.md:52-59

Backward-Linked Chain Structure

Each entry’s prev_offset field in EntryMetadata points to the absolute file offset of the previous entry’s tail, forming a backward-linked chain. This chain enables:

  1. Iteration : Walking entries from end to beginning
  2. Recovery : Validating chain integrity
  3. Alignment derivation : Computing payload start from previous tail

Chain Traversal During Reads

When reading an entry from the index:

  1. Look up key_hash in KeyIndexer to get metadata offset
  2. Read EntryMetadata at that offset
  3. Extract prev_offset (previous tail)
  4. Calculate payload start: prev_offset + DataStore::prepad_len(prev_offset)
  5. Payload ends at metadata offset

This logic is implemented in DataStore::read_entry_with_context() at src/storage_engine/data_store.rs:501-565

Chain Traversal Diagram:

Sources: src/storage_engine/data_store.rs:501-565 simd-r-drive-entry-handle/src/entry_metadata.rs:40-43

graph TD
    START["Start at file_len"]
CHECK_SIZE{"file_len <\nMETADATA_SIZE?"}
RETURN_ZERO["Return Ok(0)"]
INIT["cursor = file_len\nbest_valid_offset = None"]
LOOP{"cursor >=\nMETADATA_SIZE?"}
READ_META["Read metadata at\n(cursor - METADATA_SIZE)"]
EXTRACT["Extract prev_offset\nDerive entry_start"]
VALIDATE{"entry_start <\nmetadata_offset?"}
WALK["Walk chain backward\nvia prev_offset"]
CHAIN_VALID{"Chain reaches\noffset 0?"}
SET_BEST["best_valid_offset =\ncursor"]
BREAK["Break loop"]
DECREMENT["cursor -= 1"]
RETURN_BEST["Return\nbest_valid_offset\nor 0"]
START --> CHECK_SIZE
 
   CHECK_SIZE -->|Yes| RETURN_ZERO
 
   CHECK_SIZE -->|No| INIT
 
   INIT --> LOOP
 
   LOOP -->|Yes| READ_META
 
   LOOP -->|No| RETURN_BEST
 
   READ_META --> EXTRACT
 
   EXTRACT --> VALIDATE
 
   VALIDATE -->|No| DECREMENT
 
   VALIDATE -->|Yes| WALK
 
   WALK --> CHAIN_VALID
 
   CHAIN_VALID -->|Yes| SET_BEST
 
   CHAIN_VALID -->|No| DECREMENT
 
   SET_BEST --> BREAK
 
   BREAK --> RETURN_BEST
 
   DECREMENT --> LOOP

Recovery Mechanism

The DataStore::recover_valid_chain() function validates chain integrity when opening a file. It scans backward from the file end to find the deepest valid chain that reaches offset 0, automatically recovering from incomplete writes.

Recovery Algorithm

Recovery Process Steps

  1. Initial Check : If file is smaller than METADATA_SIZE (20 bytes), return offset 0
  2. Backward Scan : Start from file_len and scan backward by 1 byte at a time
  3. Metadata Read : At each position, attempt to read metadata at cursor - METADATA_SIZE
  4. Entry Validation :
    • Extract prev_offset from metadata
    • Calculate expected entry start using DataStore::prepad_len(prev_offset)
    • Handle tombstone special case (single null byte without pre-pad)
    • Verify entry_start < metadata_offset
  5. Chain Walk : For valid entry, walk entire chain backward:
    • Follow prev_offset links
    • Validate each link points to earlier offset
    • Track total chain size
    • Stop when prev_offset = 0 (chain start)
  6. Validation : Chain is valid if:
    • All links point backward (no cycles)
    • Chain reaches offset 0
    • Total chain size ≤ file length
  7. Result : Return the first valid chain found (deepest chain)

Tombstone Handling in Recovery

Tombstones have special handling during recovery because they lack pre-padding:

```rust
if entry_end > prev_tail
    && entry_end - prev_tail == 1
    && mmap[prev_tail..entry_end] == NULL_BYTE
{
    // Tombstone: a single 0x00 payload is written without pre-padding.
    entry_start = prev_tail;
} else {
    entry_start = prev_tail + prepad_len(prev_tail);
}
```

This logic appears in DataStore::recover_valid_chain().

Sources: src/storage_engine/data_store.rs:363-482 README.md:139-148

Recovery on File Open

The DataStore::open() function performs recovery automatically.

If recovery detects corruption (incomplete chain), the file is truncated to the last valid offset and reopened. This ensures the storage is always in a consistent state.

Sources: src/storage_engine/data_store.rs:84-117 README.md:150-156

Entry Size Calculation

Each entry’s total file size includes pre-padding, payload, and metadata. The calculation depends on entry type:

Non-Tombstone Entry:

total_size = prepad_len(prev_tail) + payload.len() + METADATA_SIZE

Tombstone Entry:

total_size = 1 + METADATA_SIZE  // No pre-pad

The EntryHandle::file_size() method computes this from the entry’s range and metadata. For iteration and compaction, this allows precise tracking of storage usage.
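A small worked sketch of these formulas (constant values are taken from this page; the helper name is illustrative):

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;
const METADATA_SIZE: usize = 20;

/// Bytes a non-tombstone entry adds when appended at `prev_tail`
/// (pre-pad + payload + metadata).
fn data_entry_size(prev_tail: u64, payload_len: usize) -> usize {
    let pad = ((PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT))
        & (PAYLOAD_ALIGNMENT - 1)) as usize;
    pad + payload_len + METADATA_SIZE
}

/// Tombstones skip the pre-pad: one NULL byte plus the 20-byte metadata.
const TOMBSTONE_SIZE: usize = 1 + METADATA_SIZE;

fn main() {
    // Appending a 100-byte payload at tail offset 130: pad = 62, so 62 + 100 + 20 = 182.
    assert_eq!(data_entry_size(130, 100), 182);
    assert_eq!(TOMBSTONE_SIZE, 21);
}
```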

Sources: README.md:112-132 src/storage_engine/data_store.rs:705-749

File Growth and Tail Tracking

The tail_offset field tracks the current end of valid data.

The atomic store uses Ordering::Release to ensure visibility across threads. Writers acquire tail_offset with Ordering::Acquire before computing pre-padding.
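A hedged sketch of that Release/Acquire pairing in isolation (not the engine's code; it only demonstrates the ordering pattern described here):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct Tail {
    tail_offset: AtomicU64,
}

impl Tail {
    /// Writer publishes the new end-of-data after the entry bytes are written.
    fn publish(&self, new_tail: u64) {
        self.tail_offset.store(new_tail, Ordering::Release);
    }

    /// The next writer observes the published tail before computing pre-padding.
    fn current(&self) -> u64 {
        self.tail_offset.load(Ordering::Acquire)
    }
}
```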

Sources: src/storage_engine/data_store.rs:256 src/storage_engine/data_store.rs:763 src/storage_engine/data_store.rs:858

DataStore API

This page documents the public API of the DataStore struct and its associated traits DataStoreReader and DataStoreWriter. These interfaces provide the primary methods for interacting with the storage engine, including write, read, delete, batch operations, and streaming methods.

Scope : This page covers the application-level API methods available to users of the storage engine. For details on the underlying storage format, see Entry Structure and Metadata. For implementation details of concurrency mechanisms, see Concurrency and Thread Safety. For key hashing internals, see Key Indexing and Hashing.


API Architecture

The DataStore API is organized around a core DataStore struct with two trait-based interfaces:

Sources : src/storage_engine/data_store.rs:26-33 src/storage_engine/traits.rs src/storage_engine.rs:21

graph TB
    subgraph "Public API"
        DS["DataStore"]
DSR["DataStoreReader trait"]
DSW["DataStoreWriter trait"]
end
    
    subgraph "Core Operations"
        WRITE["Write Operations\nwrite()\nbatch_write()\nwrite_stream()"]
READ["Read Operations\nread()\nbatch_read()\nread_last_entry()"]
DELETE["Delete Operations\ndelete()\nbatch_delete()"]
MANAGE["Management Operations\nrename()\ncopy()\ntransfer()"]
ITER["Iteration\niter_entries()\npar_iter_entries()"]
end
    
    subgraph "Internal Components"
        FILE["Arc<RwLock<BufWriter<File>>>"]
MMAP["Arc<Mutex<Arc<Mmap>>>"]
INDEXER["Arc<RwLock<KeyIndexer>>"]
TAIL["AtomicU64 tail_offset"]
end
    
 
   DS --> DSR
 
   DS --> DSW
    
 
   DSR --> READ
 
   DSR --> ITER
 
   DSW --> WRITE
 
   DSW --> DELETE
 
   DSW --> MANAGE
    
 
   DS --> FILE
 
   DS --> MMAP
 
   DS --> INDEXER
 
   DS --> TAIL
    
 
   WRITE --> FILE
 
   WRITE --> INDEXER
 
   WRITE --> TAIL
 
   READ --> MMAP
 
   READ --> INDEXER
 
   DELETE --> FILE
 
   DELETE --> INDEXER

DataStore Struct

The DataStore struct is the primary interface for interacting with the storage engine. It encapsulates file I/O, memory mapping, key indexing, and concurrency control.

Core Fields

| Field | Type | Purpose |
|---|---|---|
| file | Arc<RwLock<BufWriter<File>>> | Buffered file writer protected by read-write lock for synchronized writes |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped file reference wrapped in a mutex to prevent unsafe remapping |
| tail_offset | AtomicU64 | Current end-of-file offset, atomically updated for lock-free reads |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash-based index mapping key hashes to file offsets |
| path | PathBuf | File system path to the storage file |

Sources : src/storage_engine/data_store.rs:26-33

Creation Methods

Sources : src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144

Opening Storage

| Method | Signature | Behavior |
|---|---|---|
| open() | pub fn open(path: &Path) -> Result<Self> | Opens existing storage or creates a new file if not present |
| open_existing() | pub fn open_existing(path: &Path) -> Result<Self> | Opens only existing files, returns an error if the file does not exist |
| from() | impl From<PathBuf> for DataStore | Convenience constructor, panics on failure |

Sources : src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144 src/storage_engine/data_store.rs:53-64
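A brief sketch of the two opening paths (hedged: it assumes DataStore is exported at the crate root and reduces error handling to expect()/Debug printing):

```rust
use std::path::Path;

use simd_r_drive::DataStore;

fn main() {
    // open(): creates the file if it does not exist yet.
    let _store = DataStore::open(Path::new("data.bin")).expect("open failed");

    // open_existing(): fails instead of creating a new file.
    match DataStore::open_existing(Path::new("missing.bin")) {
        Ok(_existing) => println!("opened existing store"),
        Err(err) => eprintln!("no existing store: {err:?}"),
    }
}
```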


Write Operations

Write operations are defined by the DataStoreWriter trait and implemented for DataStore. All write methods return Result<u64> where the u64 is the new tail offset after writing.

Sources : src/storage_engine/data_store.rs:752-939

graph TB
    subgraph "Write API Methods"
        W1["write(key, payload)"]
W2["batch_write(entries)"]
W3["write_stream(key, reader)"]
W4["write_with_key_hash(hash, payload)"]
W5["batch_write_with_key_hashes(entries)"]
W6["write_stream_with_key_hash(hash, reader)"]
end
    
    subgraph "Internal Write Path"
        LOCK["Acquire RwLock<File>"]
HASH["compute_hash()
or\ncompute_hash_batch()"]
ALIGN["Calculate prepad_len()"]
BUFFER["Buffer construction"]
SIMD["simd_copy()
for payload"]
META["EntryMetadata serialization"]
FLUSH["file.flush()"]
REINDEX["reindex()"]
end
    
 
   W1 --> HASH
 
   W2 --> HASH
 
   W3 --> HASH
 
   HASH --> W4
 
   HASH --> W5
 
   HASH --> W6
    
 
   W4 --> LOCK
 
   W5 --> LOCK
 
   W6 --> LOCK
    
 
   LOCK --> ALIGN
 
   ALIGN --> BUFFER
 
   BUFFER --> SIMD
 
   SIMD --> META
 
   META --> FLUSH
 
   FLUSH --> REINDEX
    
 
   REINDEX --> MMAP_UPDATE["Update mmap Arc"]
REINDEX --> INDEX_UPDATE["Update KeyIndexer"]
REINDEX --> TAIL_UPDATE["Update AtomicU64"]

Single Entry Write

Method : write(key: &[u8], payload: &[u8]) -> Result<u64>

Writes a single key-value pair atomically. The write is immediately flushed to disk.

Implementation details :

  • Computes XXH3 hash of key using compute_hash()
  • Delegates to write_with_key_hash()
  • Internally uses batch_write_with_key_hashes() with single entry
  • Calculates 64-byte alignment padding via prepad_len()
  • Uses simd_copy() for payload transfer
  • Appends EntryMetadata (20 bytes)
  • Calls reindex() to update mmap and index

Sources : src/storage_engine/data_store.rs:827-834

Batch Write

Method : batch_write(entries: &[(&[u8], &[u8])]) -> Result<u64>

Writes multiple key-value pairs in a single locked operation. Reduces disk I/O overhead by buffering all entries and flushing once at the end.

Process :

  1. Computes hashes for all keys via compute_hash_batch()
  2. Acquires write lock once for entire batch
  3. Builds buffer with aligned entries
  4. Writes buffer to file with single write_all()
  5. Updates index with all new mappings atomically

Sources : src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939

Streaming Write

Method : write_stream<R: Read>(key: &[u8], reader: &mut R) -> Result<u64>

Writes data from a Read source without requiring full in-memory allocation. Suitable for large payloads that exceed available memory.

Characteristics :

  • Uses fixed 8KB buffer (WRITE_STREAM_BUFFER_SIZE)
  • Reads chunks incrementally from source
  • Computes CRC32C checksum while streaming
  • Validates that payload is non-empty and not null-only
  • Immediately flushes after completion

Sources : src/storage_engine/data_store.rs:753-825
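A minimal sketch of the streaming path (the in-memory Cursor stands in for any Read source such as a file or socket):

```rust
use std::io::Cursor;

use simd_r_drive::{DataStore, DataStoreWriter};

fn store_large_blob(store: &DataStore, blob: Vec<u8>) {
    // Any `Read` implementor works; data is consumed in 8 KB chunks
    // (WRITE_STREAM_BUFFER_SIZE) rather than one large staging buffer.
    let mut reader = Cursor::new(blob);
    store
        .write_stream(b"blobs/report", &mut reader)
        .expect("write_stream failed");
}
```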

Pre-hashed Write Methods

For performance optimization when keys are reused, the API provides methods accepting pre-computed hashes:

| Method | Description |
|---|---|
| write_with_key_hash() | Single write with pre-computed hash |
| batch_write_with_key_hashes() | Batch write with pre-computed hashes |
| write_stream_with_key_hash() | Streaming write with pre-computed hash |

These skip the hashing step and proceed directly to storage operations.

Sources : src/storage_engine/data_store.rs:832-834 src/storage_engine/data_store.rs:847-939 src/storage_engine/data_store.rs:758-825
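A sketch of how the pre-hashed variants might be used (hedged: compute_hash is referenced on this page, but its module path and visibility are not shown here, so the import is an assumption):

```rust
use simd_r_drive::{compute_hash, DataStore, DataStoreWriter};

fn append_samples(store: &DataStore, key: &[u8], samples: &[&[u8]]) {
    // Hash the key once, then reuse the hash for every write to the same key.
    let key_hash = compute_hash(key);
    for sample in samples {
        store
            .write_with_key_hash(key_hash, sample)
            .expect("write_with_key_hash failed");
    }
}
```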


graph TB
    subgraph "Read API Methods"
        R1["read(key)"]
R2["batch_read(keys)"]
R3["read_last_entry()"]
R4["read_with_key_hash(hash)"]
R5["batch_read_hashed_keys(hashes)"]
R6["read_metadata(key)"]
R7["exists(key)"]
end
    
    subgraph "Internal Read Path"
        HASH_KEY["compute_hash()
or\ncompute_hash_batch()"]
INDEXER_READ["key_indexer.read()"]
MMAP_CLONE["get_mmap_arc()"]
UNPACK["KeyIndexer::unpack(packed)"]
TAG_CHECK["Verify 16-bit tag"]
BOUNDS["Bounds checking"]
PREPAD["Derive entry_start from\nprev_offset + prepad_len()"]
HANDLE["Construct EntryHandle"]
end
    
 
   R1 --> HASH_KEY
 
   R2 --> HASH_KEY
 
   HASH_KEY --> R4
 
   HASH_KEY --> R5
    
 
   R4 --> INDEXER_READ
 
   R5 --> INDEXER_READ
 
   R7 --> R1
    
 
   INDEXER_READ --> MMAP_CLONE
 
   MMAP_CLONE --> UNPACK
 
   UNPACK --> TAG_CHECK
 
   TAG_CHECK --> BOUNDS
 
   BOUNDS --> PREPAD
 
   PREPAD --> HANDLE
    
 
   R3 --> MMAP_CLONE
 
   R6 --> R1

Read Operations

Read operations are defined by the DataStoreReader trait. All reads are zero-copy when possible, returning EntryHandle references to memory-mapped regions.

Sources : src/storage_engine/data_store.rs:1027-1182

Single Entry Read

Method : read(key: &[u8]) -> Result<Option<EntryHandle>>

Retrieves a single entry by key. Returns None if key does not exist or is deleted (tombstone).

Implementation :

  • Computes key hash via compute_hash()
  • Acquires read lock on key_indexer
  • Looks up packed (tag, offset) value
  • Verifies 16-bit tag to detect hash collisions
  • Derives entry boundaries from prev_offset and prepad_len()
  • Returns EntryHandle with zero-copy access to payload

Sources : src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:501-565
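
A minimal read sketch, assuming the read method is brought in via the DataStoreReader trait; the key is illustrative.

```rust
use simd_r_drive::{DataStore, DataStoreReader};

fn read_one(store: &DataStore) -> std::io::Result<()> {
    match store.read(b"user:42")? {
        // as_slice() is a zero-copy view into the memory-mapped region.
        Some(entry) => println!("payload is {} bytes", entry.as_slice().len()),
        None => println!("key not found (or deleted)"),
    }
    Ok(())
}
```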

Batch Read

Method : batch_read(keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>>

Reads multiple entries in a single index lock acquisition. More efficient than individual reads when processing multiple keys.

Process :

  1. Computes all key hashes via compute_hash_batch()
  2. Acquires single read lock on indexer
  3. Performs lookup for each hash
  4. Verifies tags for collision detection
  5. Returns vector preserving input order

Sources : src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158
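
A hedged batch-read sketch; results come back in the same order as the input keys.

```rust
use simd_r_drive::{DataStore, DataStoreReader};

fn read_many(store: &DataStore) -> std::io::Result<()> {
    let keys: Vec<&[u8]> = vec![b"alpha".as_slice(), b"beta".as_slice(), b"missing".as_slice()];
    // One index read lock covers every lookup in the batch.
    for (key, entry) in keys.iter().zip(store.batch_read(&keys)?) {
        match entry {
            Some(handle) => println!("{:?}: {} bytes", key, handle.as_slice().len()),
            None => println!("{:?}: not found", key),
        }
    }
    Ok(())
}
```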

Read Last Entry

Method : read_last_entry() -> Result<Option<EntryHandle>>

Retrieves the most recently written entry without requiring a key lookup. Uses tail_offset to locate the last metadata block.

Use case : Useful for sequential processing or determining the latest state.

Sources : src/storage_engine/data_store.rs:1061-1103

Metadata Read

Method : read_metadata(key: &[u8]) -> Result<Option<EntryMetadata>>

Retrieves only the metadata (key hash, previous offset, checksum) without accessing the payload. More efficient when only metadata is needed.

Sources : src/storage_engine/data_store.rs:1160-1162

Existence Check

Method : exists(key: &[u8]) -> Result<bool>

Checks if a key exists without retrieving the full entry. Lightweight operation that only performs index lookup and tag verification.

Sources : src/storage_engine/data_store.rs:1030-1032

Read Operations Summary

| Method | Returns | Lock Duration | Use Case |
| --- | --- | --- | --- |
| read() | Option<EntryHandle> | Single index read lock | Standard single-key retrieval |
| batch_read() | Vec<Option<EntryHandle>> | Single index read lock | Multiple keys, order preserved |
| read_last_entry() | Option<EntryHandle> | No index lock required | Sequential or state check |
| read_metadata() | Option<EntryMetadata> | Single index read lock | Metadata only, no payload |
| exists() | bool | Single index read lock | Fast existence check |

Sources : src/storage_engine/data_store.rs:1027-1182


graph LR
 
   DELETE_API["delete(key)\nbatch_delete(keys)"] --> HASH["compute_hash_batch()"]
HASH --> CHECK_EXISTS["Filter existing keys\nvia key_indexer.read()"]
CHECK_EXISTS --> TOMBSTONE["Create (hash, NULL_BYTE)\npairs"]
TOMBSTONE --> BATCH_WRITE["batch_write_with_key_hashes()\nwith allow_null_bytes=true"]
BATCH_WRITE --> UPDATE_INDEX["reindex()\nremoves\nkeys from index"]

Delete Operations

Delete operations write tombstone entries (single null byte + metadata) to mark keys as deleted. The append-only model means deletions do not reclaim space until compaction.

Sources : src/storage_engine/data_store.rs:986-1024

Single Delete

Method : delete(key: &[u8]) -> Result<u64>

Deletes a single key by writing a tombstone entry. Internally delegates to batch_delete() with a single key.

Sources : src/storage_engine/data_store.rs:986-988

Batch Delete

Method : batch_delete(keys: &[&[u8]]) -> Result<u64>

Deletes multiple keys in a single operation. Optimized to skip keys that don’t exist, avoiding unnecessary tombstone writes.

Process :

  1. Hashes all keys via compute_hash_batch()
  2. Filters to only keys present in index
  3. Constructs tombstone entries (NULL_BYTE + metadata)
  4. Calls batch_write_with_key_hashes() with allow_null_bytes=true
  5. Index updated to remove deleted keys

Sources : src/storage_engine/data_store.rs:990-1024
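
A short sketch of a batch delete; keys that are not present are skipped rather than tombstoned.

```rust
use simd_r_drive::{DataStore, DataStoreWriter};

fn remove_keys(store: &DataStore) -> std::io::Result<u64> {
    let keys: Vec<&[u8]> = vec![b"alpha".as_slice(), b"beta".as_slice()];
    // Writes one tombstone per existing key and returns the new tail offset.
    store.batch_delete(&keys)
}
```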

Pre-hashed Delete

Method : batch_delete_key_hashes(prehashed_keys: &[u64]) -> Result<u64>

Deletes keys using pre-computed hashes. Useful when hashes are already available from previous operations.

Sources : src/storage_engine/data_store.rs:995-1024


Entry Management Operations

These operations combine read, write, and delete to provide higher-level functionality for managing entries across storage instances.

Rename

Method : rename(old_key: &[u8], new_key: &[u8]) -> Result<u64>

Renames a key by:

  1. Reading the entry at old_key
  2. Creating an EntryStream from it
  3. Writing to new_key via write_stream()
  4. Deleting old_key

Constraint : old_key must exist and must differ from new_key.

Sources : src/storage_engine/data_store.rs:941-958

Copy

Method : copy(key: &[u8], target: &DataStore) -> Result<u64>

Copies an entry from the current storage to a different DataStore instance. The source entry remains unchanged.

Process :

  1. Reads entry from source
  2. Extracts payload and metadata
  3. Writes to target using write_stream_with_key_hash()
  4. Preserves original key hash

Constraint : Source and target must be different storage files.

Sources : src/storage_engine/data_store.rs:960-979 src/storage_engine/data_store.rs:587-590

Transfer

Method : transfer(key: &[u8], target: &DataStore) -> Result<u64>

Moves an entry from the current storage to a different instance by copying then deleting from source.

Equivalent to : copy() followed by delete()

Sources : src/storage_engine/data_store.rs:981-984

Entry Management Summary

| Operation | Source Modified | Target Modified | Use Case |
| --- | --- | --- | --- |
| rename() | Yes (old deleted, new added) | N/A | Same storage, different key |
| copy() | No | Yes (entry added) | Cross-storage duplication |
| transfer() | Yes (entry deleted) | Yes (entry added) | Cross-storage migration |

Sources : src/storage_engine/data_store.rs:941-984
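
A combined sketch of the three operations above; the two stores and all keys are illustrative, and the source and target must be backed by different files for copy and transfer.

```rust
use simd_r_drive::{DataStore, DataStoreWriter};

fn reorganize(source: &DataStore, archive: &DataStore) -> std::io::Result<()> {
    source.rename(b"draft:1", b"final:1")?; // same store, new key
    source.copy(b"final:1", archive)?;      // source keeps its entry
    source.transfer(b"temp:1", archive)?;   // copy, then delete from source
    Ok(())
}
```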


graph TB
    subgraph "Iteration Methods"
        ITER_OWNED["into_iter()\n(consumes DataStore)"]
ITER_REF["iter_entries()\n(borrows DataStore)"]
PAR_ITER["par_iter_entries()\n(parallel, requires 'parallel' feature)"]
end
    
    subgraph "EntryIterator Implementation"
        CURSOR["cursor: u64 = tail_offset"]
SEEN_KEYS["seen_keys: HashSet<u64>"]
NEXT["next()
method"]
METADATA["Read EntryMetadata"]
PREPAD_CALC["Derive entry_start from\nprev_offset + prepad_len()"]
SKIP_DUPE["Skip if key_hash in seen_keys"]
SKIP_TOMB["Skip if entry is NULL_BYTE"]
EMIT["Emit EntryHandle"]
end
    
    subgraph "Parallel Iterator"
        COLLECT["Collect key_indexer offsets"]
PAR_MAP["Rayon par_iter()"]
FILTER_MAP["filter_map constructs\nEntryHandle per thread"]
end
    
 
   ITER_OWNED --> ITER_REF
 
   ITER_REF --> CURSOR
 
   ITER_REF --> SEEN_KEYS
    
 
   CURSOR --> NEXT
 
   NEXT --> METADATA
 
   METADATA --> PREPAD_CALC
 
   PREPAD_CALC --> SKIP_DUPE
 
   SKIP_DUPE --> SKIP_TOMB
 
   SKIP_TOMB --> EMIT
    
 
   PAR_ITER --> COLLECT
 
   COLLECT --> PAR_MAP
 
   PAR_MAP --> FILTER_MAP

Iteration and Traversal

The DataStore provides multiple methods for iterating over all valid entries in the storage.

Sources : src/storage_engine/data_store.rs:269-361 src/storage_engine/entry_iterator.rs:8-127

Sequential Iteration

Method : iter_entries() -> EntryIterator

Returns an iterator that traverses all valid entries sequentially. The iterator:

  • Starts at tail_offset and walks backward via prev_offset chain
  • Tracks seen key hashes to ensure only latest versions are returned
  • Filters out tombstone entries automatically
  • Returns EntryHandle objects with zero-copy access

Sources : src/storage_engine/data_store.rs:276-280 src/storage_engine/entry_iterator.rs:41-47
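
A minimal scan over the store; only the latest version of each key is yielded, and tombstones are skipped automatically.

```rust
use simd_r_drive::DataStore;

fn scan(store: &DataStore) {
    for entry in store.iter_entries() {
        // Each handle is a zero-copy view; no payload bytes are copied here.
        println!("{} byte payload", entry.as_slice().len());
    }
}
```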

Consuming Iteration

Trait : impl IntoIterator for DataStore

Allows consuming a DataStore instance to produce an iterator:
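
A hedged sketch of consuming iteration; the store is moved into the loop via its IntoIterator implementation.

```rust
use simd_r_drive::DataStore;

fn drain(store: DataStore) {
    // `for ... in store` takes ownership and yields EntryHandle values.
    for entry in store {
        println!("{} byte payload", entry.as_slice().len());
    }
}
```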

Internally delegates to iter_entries().

Sources : src/storage_engine/data_store.rs:44-50

Parallel Iteration

Method : par_iter_entries() -> impl ParallelIterator<Item = EntryHandle>

Feature gate : Requires parallel feature flag.

Provides Rayon-powered parallel iteration for high-throughput processing on multi-core systems.

Implementation strategy :

  1. Acquires read lock on key_indexer briefly
  2. Collects all packed offset values into a Vec<u64>
  3. Releases lock immediately
  4. Creates parallel iterator over collected offsets
  5. Constructs EntryHandle objects in parallel threads

Performance : Ideal for bulk operations like analytics, caching, or transformation pipelines.

Sources : src/storage_engine/data_store.rs:296-361
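
A sketch of parallel traversal, assuming the crate's parallel feature is enabled and rayon is available as a dependency.

```rust
use rayon::prelude::*;
use simd_r_drive::DataStore;

fn total_payload_bytes(store: &DataStore) -> usize {
    // Offsets are collected up front; EntryHandle construction happens per thread.
    store
        .par_iter_entries()
        .map(|entry| entry.as_slice().len())
        .sum()
}
```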

Iteration Comparison

| Method | Ownership | Concurrency | Lock Hold Time | Use Case |
| --- | --- | --- | --- | --- |
| iter_entries() | Borrows | Sequential | Lock per next() call | General-purpose scanning |
| into_iter() | Consumes | Sequential | Lock per next() call | One-time full traversal |
| par_iter_entries() | Borrows | Parallel | Brief upfront lock | High-throughput processing |

Sources : src/storage_engine/data_store.rs:44-50 src/storage_engine/data_store.rs:276-280 src/storage_engine/data_store.rs:296-361


Utility and Maintenance Methods

File Information

| Method | Returns | Description |
| --- | --- | --- |
| len() | Result<usize> | Number of unique keys in storage (excludes deleted) |
| is_empty() | Result<bool> | Returns true if no keys exist |
| file_size() | Result<u64> | Total size of storage file in bytes |
| get_path() | PathBuf | Returns path to storage file |

Sources : src/storage_engine/data_store.rs:1164-1181 src/storage_engine/data_store.rs:265-267

Compaction

Method : compact(&mut self) -> Result<()>

Reclaims disk space by creating a new storage file containing only the latest version of each key. Tombstone entries are excluded.

Process :

  1. Creates temporary backup file with .bk extension
  2. Iterates through iter_entries() (which returns only latest versions)
  3. Copies each entry via copy_handle()
  4. Swaps temporary file with original via std::fs::rename()

Thread safety warning : Should only be called when no other threads are accessing the storage. The &mut self requirement prevents concurrent mutations but does not prevent reads if the instance is wrapped in Arc<DataStore>.

Sources : src/storage_engine/data_store.rs:706-749

Compaction Estimation

Method : estimate_compaction_savings() -> u64

Calculates potential space savings from compaction without performing the operation. Returns the difference between total file size and the size needed for unique entries only.

Sources : src/storage_engine/data_store.rs:605-616
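
A small maintenance sketch combining the two methods above; the 1 MiB threshold is an arbitrary example value.

```rust
use simd_r_drive::DataStore;

fn maybe_compact(store: &mut DataStore) -> std::io::Result<()> {
    // Only rewrite the file if stale versions and tombstones waste real space.
    if store.estimate_compaction_savings() > 1024 * 1024 {
        store.compact()?;
    }
    Ok(())
}
```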


Internal Support Methods

These methods support the public API but are not directly exposed to users.

Reindexing

Method : reindex()

Called after every write operation to:

  1. Re-map the file via init_mmap() to include new data
  2. Update key_indexer with new key-to-offset mappings
  3. Update tail_offset atomically

Acquires locks on both mmap and key_indexer to ensure consistency.

Sources : src/storage_engine/data_store.rs:224-259

Entry Context Reading

Method : read_entry_with_context()

Internal helper centralizing read logic for both read() and batch_read(). Parameters include the key hash, mmap reference, and indexer guard. Performs:

  • Index lookup
  • Tag verification (if original key provided)
  • Bounds checking
  • Tombstone detection
  • EntryHandle construction

Sources : src/storage_engine/data_store.rs:501-565

Recovery Chain Validation

Method : recover_valid_chain()

Called during open() to validate storage file integrity. Walks backward through the file following prev_offset chains until reaching offset 0. Truncates file if incomplete write detected.

Sources : src/storage_engine/data_store.rs:383-482

Alignment Calculation

Method : prepad_len(offset: u64) -> usize

Computes padding bytes required to align offset to PAYLOAD_ALIGNMENT (64 bytes). Uses bitwise operations for efficiency:

pad = (A - (offset % A)) & (A - 1)

Sources : src/storage_engine/data_store.rs:669-673
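
A standalone sketch of the formula (the real method lives on DataStore and returns usize); the assertions show typical values for 64-byte alignment.

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

fn prepad_len(offset: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(prepad_len(0), 0);    // already aligned
    assert_eq!(prepad_len(64), 0);   // already aligned
    assert_eq!(prepad_len(100), 28); // 100 + 28 = 128, the next 64-byte boundary
}
```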


API Patterns and Conventions

Return Value Pattern

Most write operations return Result<u64> where the u64 is the new tail_offset after the operation. This allows chaining operations or validating expected file growth.

Error Handling

The API uses std::io::Result<T> consistently. Common error cases:

  • InvalidInput: Empty payloads, null-byte-only payloads, invalid rename
  • NotFound: Key does not exist (for operations requiring existing keys)
  • Lock poisoning errors (converted to std::io::Error)

Pre-hashed Key Methods

Many operations offer both standard and pre-hashed variants:

  • Standard: write(key, payload) - computes hash internally
  • Pre-hashed: write_with_key_hash(hash, payload) - uses provided hash

Pre-hashed methods enable optimization when keys are reused across multiple operations.

Batch Operations Benefit

Batch methods acquire locks once for the entire batch, significantly reducing overhead:

  • batch_write() vs multiple write() calls
  • batch_read() vs multiple read() calls
  • batch_delete() vs multiple delete() calls

Sources : src/storage_engine/data_store.rs:752-1182


Trait Implementations

DataStoreReader Trait

Defines read-only operations. Associated type EntryHandleType allows flexibility in handle implementation.

Implementors : DataStore

Key methods : read(), batch_read(), read_last_entry(), read_metadata(), exists(), len(), is_empty(), file_size()

Sources : src/storage_engine/traits.rs

DataStoreWriter Trait

Defines mutating operations. All methods take &self (not &mut self) because internal synchronization via RwLock enables safe concurrent access.

Implementors : DataStore

Key methods : write(), batch_write(), write_stream(), delete(), batch_delete(), rename(), copy(), transfer()

Sources : src/storage_engine/traits.rs

From Trait

Convenience constructor that panics on failure:

Sources : src/storage_engine/data_store.rs:53-64

IntoIterator Trait

Allows consuming iteration over storage entries. Returns EntryIterator as the iterator type.

Sources : src/storage_engine/data_store.rs:44-50




Entry Structure and Metadata

Relevant source files

Purpose and Scope

This document details the on-disk binary layout of entries in the SIMD R Drive storage engine. It covers the structure of aligned entries, tombstones, metadata fields, and the alignment strategy that enables zero-copy access.

For information about how entries are read and accessed in memory, see Memory Management and Zero-Copy Access. For details on the validation chain and recovery mechanisms, see Storage Architecture.


On-Disk Entry Layout Overview

Every entry written to the storage file consists of three components:

  1. Pre-Pad Bytes (optional, 0-63 bytes) - Zero bytes inserted to ensure the payload starts at a 64-byte boundary
  2. Payload - Variable-length binary data
  3. Metadata - Fixed 20-byte structure containing key hash, previous offset, and checksum

The exception is tombstones (deletion markers), which use a minimal 1-byte payload with no pre-padding.

Sources: README.md:104-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37


Aligned Entry Structure

Entry Layout Table

| Offset Range | Field | Size (Bytes) | Description |
| --- | --- | --- | --- |
| P .. P+pad | Pre-Pad (optional) | pad | Zero bytes to align payload start |
| P+pad .. N | Payload | N-(P+pad) | Variable-length data |
| N .. N+8 | Key Hash | 8 | 64-bit XXH3 key hash |
| N+8 .. N+16 | Prev Offset | 8 | Absolute offset of previous tail |
| N+16 .. N+20 | Checksum | 4 | CRC32C of payload |

Where:

  • pad = (A - (prev_tail % A)) & (A - 1), with A = PAYLOAD_ALIGNMENT (64 bytes)
  • The next entry starts at offset N + 20

Aligned Entry Structure Diagram

Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:11-23


Tombstone Structure

Tombstones are special deletion markers that do not require payload alignment. They consist of a single zero byte followed by the standard 20-byte metadata structure.

Tombstone Layout Table

| Offset Range | Field | Size (Bytes) | Description |
| --- | --- | --- | --- |
| T .. T+1 | Payload | 1 | Single byte 0x00 |
| T+1 .. T+21 | Metadata | 20 | Key hash, prev offset, CRC32C |

Tombstone Structure Diagram

Sources: README.md:126-131 simd-r-drive-entry-handle/src/entry_metadata.rs:25-30


EntryMetadata Structure

The EntryMetadata struct represents the fixed 20-byte metadata block that follows every payload. It is declared with #[repr(C)] layout to ensure a consistent binary representation.

graph TB
    subgraph EntryMetadataStruct["EntryMetadata struct"]
field1["key_hash: u64\n8 bytes\nXXH3_64 hash"]
field2["prev_offset: u64\n8 bytes\nbackward chain link"]
field3["checksum: [u8; 4]\n4 bytes\nCRC32C payload checksum"]
end
    
 
   field1 --> field2
 
   field2 --> field3
    
    note4["Serialized at offset N\nfollowing payload"]
note5["Total: METADATA_SIZE = 20"]
field1 -.-> note4
 
   field3 -.-> note5

Metadata Fields

Field Descriptions

key_hash: u64 (8 bytes, offset N .. N+8)

  • 64-bit XXH3 hash of the key
  • Used by KeyIndexer for O(1) lookups
  • Combined with a tag for collision detection
  • Hardware-accelerated via SSE2/AVX2/NEON

prev_offset: u64 (8 bytes, offset N+8 .. N+16)

  • Absolute file offset of the previous entry for this key
  • Forms a backward-linked chain for version history
  • Set to 0 for the first entry of a key
  • Used during chain validation and recovery

checksum: [u8; 4] (4 bytes, offset N+16 .. N+20)

  • CRC32C checksum of the payload
  • Provides fast integrity verification
  • Not cryptographically secure
  • Used during recovery to detect corruption

Serialization and Deserialization

The EntryMetadata struct provides methods for converting to/from bytes:

  • serialize() -> [u8; 20] - Converts metadata to byte array using little-endian encoding
  • deserialize(data: &[u8]) -> Self - Reconstructs metadata from a byte slice

Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:44-113 README.md:114-120
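
A simplified sketch of the serialization layout described above (field order, little-endian encoding, 20-byte total); it is not the crate's actual implementation.

```rust
struct EntryMetadata {
    key_hash: u64,     // offset 0..8
    prev_offset: u64,  // offset 8..16
    checksum: [u8; 4], // offset 16..20
}

impl EntryMetadata {
    fn serialize(&self) -> [u8; 20] {
        let mut buf = [0u8; 20];
        buf[0..8].copy_from_slice(&self.key_hash.to_le_bytes());
        buf[8..16].copy_from_slice(&self.prev_offset.to_le_bytes());
        buf[16..20].copy_from_slice(&self.checksum);
        buf
    }
}

fn main() {
    let meta = EntryMetadata { key_hash: 0xABCD, prev_offset: 0, checksum: [0; 4] };
    assert_eq!(meta.serialize().len(), 20);
}
```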


Pre-Padding and Alignment Strategy

Alignment Purpose

All non-tombstone payloads start at a 64-byte aligned address. This alignment ensures:

  • Cache-line efficiency - Matches typical CPU cache line size
  • SIMD optimization - Enables full-speed AVX2/AVX-512/NEON operations
  • Zero-copy typed views - Allows safe reinterpretation as typed slices (&[u16], &[u32], etc.)
graph TD
    Start["Calculate padding needed"]
GetPrevTail["prev_tail = last written offset"]
CalcPad["pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT))\n& (PAYLOAD_ALIGNMENT - 1)"]
CheckPad{"pad > 0?"}
WritePad["Write pad zero bytes"]
WritePayload["Write payload at aligned offset"]
Start --> GetPrevTail
 
   GetPrevTail --> CalcPad
 
   CalcPad --> CheckPad
 
   CheckPad -->|Yes| WritePad
 
   CheckPad -->|No| WritePayload
 
   WritePad --> WritePayload

The alignment is configured via PAYLOAD_ALIGNMENT constant (64 bytes as of version 0.15.0).

Pre-Padding Calculation

The formula pad = (A - (prev_tail % A)) & (A - 1) where A = PAYLOAD_ALIGNMENT ensures:

  • If prev_tail is already aligned, pad = 0
  • Otherwise, pad equals the bytes needed to reach the next aligned boundary
  • Maximum padding is A - 1 bytes (63 bytes for 64-byte alignment)

Constants

The alignment is defined in simd-r-drive-entry-handle/src/constants.rs:1-20:

| Constant | Value | Description |
| --- | --- | --- |
| PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
| PAYLOAD_ALIGNMENT | 64 | Actual alignment boundary in bytes |
| METADATA_SIZE | 20 | Fixed size of metadata block |

Sources: README.md:51-59 simd-r-drive-entry-handle/src/entry_metadata.rs:22-23 CHANGELOG.md:25-51


Backward Chain Formation

Chain Structure

Each entry’s prev_offset field creates a backward-linked chain that tracks the version history for a given key. This chain is essential for:

  • Recovery and validation on file open
  • Detecting incomplete writes
  • Rebuilding the index

Chain Properties

  • Most recent entry is at the end of the file (highest offset)
  • Chain traversal moves backward from tail toward offset 0
  • First entry for a key has prev_offset = 0
  • Valid chain can be walked all the way back to byte 0 without gaps
  • Broken chain indicates corruption or incomplete write

Usage in Recovery

During file open, the system:

  1. Scans backward from EOF reading metadata
  2. Follows prev_offset links to validate chain continuity
  3. Verifies checksums at each step
  4. Truncates file if corruption is detected
  5. Scans forward to rebuild the index

Sources: README.md:139-147 simd-r-drive-entry-handle/src/entry_metadata.rs:41-43


Entry Type Comparison

Aligned Entry vs. Tombstone

| Aspect | Aligned Entry (Non-Tombstone) | Tombstone (Deletion Marker) |
| --- | --- | --- |
| Pre-padding | 0-63 bytes (alignment dependent) | None |
| Payload size | Variable (user-defined) | Fixed 1 byte (0x00) |
| Payload alignment | 64-byte boundary | No alignment requirement |
| Metadata size | 20 bytes | 20 bytes |
| Total minimum size | 21 bytes (1-byte payload + metadata) | 21 bytes (1-byte + metadata) |
| Total maximum overhead | 83 bytes (63-byte pad + 20 metadata) | 21 bytes |
| Zero-copy capable | Yes (aligned payload) | No (tombstone flag only) |

When Tombstones Are Used

Tombstones mark key deletions while maintaining chain integrity. They:

  • Preserve the backward chain via prev_offset
  • Use minimal space (no alignment overhead)
  • Are detected during reads and filtered out
  • Enable recovery to skip deleted entries

Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37


Metadata Serialization Format

Binary Layout in File

Constants for Range Indexing

The simd-r-drive-entry-handle/src/constants.rs:1-20 file defines range constants for metadata field access:

  • KEY_HASH_RANGE = 0..8
  • PREV_OFFSET_RANGE = 8..16
  • CHECKSUM_RANGE = 16..20
  • METADATA_SIZE = 20

These ranges are used in EntryMetadata::serialize() and deserialize() methods.

Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:62-112


Alignment Evolution and Migration

Version History

v0.14.0-alpha and earlier: Used 16-byte alignment (PAYLOAD_ALIGNMENT = 16)

v0.15.0-alpha onwards: Changed to 64-byte alignment (PAYLOAD_ALIGNMENT = 64)

This change was made to:

  • Ensure full cache-line alignment
  • Support AVX-512 and future SIMD extensions
  • Improve zero-copy performance across modern hardware

Migration Considerations

Storage files created with different alignment values are not compatible :

  • v0.14.x readers cannot correctly parse v0.15.x stores
  • v0.15.x readers may misinterpret v0.14.x padding

To migrate between versions:

  1. Read all entries using the old version binary
  2. Write entries to a new store using the new version binary
  3. Replace the old file after verification

In multi-service environments, deploy reader upgrades before writer upgrades to avoid mixed-version issues.

Sources: CHANGELOG.md:25-82 README.md:51-59


Debug Assertions for Alignment

Runtime Validation

The codebase includes debug-only alignment assertions that validate both pointer and offset alignment:

debug_assert_aligned(ptr: *const u8, align: usize) - Validates pointer alignment

  • Active in debug and test builds
  • Zero cost in release/bench builds
  • Ensures buffer base address is properly aligned

debug_assert_aligned_offset(off: u64) - Validates file offset alignment

These assertions help catch alignment issues during development without imposing runtime overhead in production.

Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88 CHANGELOG.md:33-41


Summary

The SIMD R Drive entry structure uses a carefully designed binary layout that balances efficiency, integrity, and flexibility:

  • Fixed 64-byte alignment ensures cache-friendly, SIMD-optimized access
  • 20-byte fixed metadata provides fast integrity checks and chain traversal
  • Variable pre-padding maintains alignment without complex calculations
  • Minimal tombstones mark deletions efficiently
  • Backward-linked chain enables robust recovery and validation

This design enables zero-copy reads, high write throughput, and automatic crash recovery while maintaining a simple, append-only storage model.




Memory Management and Zero-Copy Access

Relevant source files

Purpose and Scope

This document describes the memory management strategy used by SIMD R Drive’s core storage engine, focusing on memory-mapped file access and zero-copy read patterns. It covers the memmap2 crate integration, the Arc<Mmap> shared reference architecture, and how EntryHandle provides zero-copy views into stored data.

For details on entry structure and metadata organization, see Entry Structure and Metadata. For concurrency mechanisms that protect memory-mapped access, see Concurrency and Thread Safety.


Memory-Mapped File Architecture

Core mmap Integration

The storage engine uses the memmap2 crate to memory-map the entire storage file, allowing direct access to file contents without explicit read system calls. The memory-mapped region is managed through a layered reference-counting structure:

Arc<Mutex<Arc<Mmap>>>
    │
    ├─ Outer Arc: Shared across DataStore clones
    ├─ Mutex: Serializes remapping operations
    └─ Inner Arc<Mmap>: Shared across readers

Sources: src/storage_engine/data_store.rs:1-30

DataStore mmap Field Structure

The DataStore struct maintains the memory map using nested Arc wrappers:

| Layer | Type | Purpose |
| --- | --- | --- |
| Outer | Arc<Mutex<...>> | Allows shared ownership of the mutex across DataStore instances |
| Mutex | Mutex<...> | Serializes remapping operations during writes |
| Inner | Arc<Mmap> | Enables zero-cost cloning for concurrent readers |
| Core | Mmap | The actual memory-mapped file region from memmap2 |

This structure enables:

  • Multiple readers to hold Arc<Mmap> references simultaneously
  • Safe remapping after writes without invalidating existing reader references
  • Lock-free reads once an Arc<Mmap> is obtained

Sources: src/storage_engine/data_store.rs:26-33 README.md:174-183


Memory Map Initialization and Remapping

graph TB
    Open["DataStore::open()"]
OpenFile["open_file_in_append_mode()"]
InitMmap["init_mmap()"]
UnsafeMap["unsafe memmap2::MmapOptions::new().map()"]
ArcWrap["Arc::new(mmap)"]
Open --> OpenFile
 
   OpenFile --> InitMmap
 
   InitMmap --> UnsafeMap
 
   UnsafeMap --> ArcWrap
    
    OpenFile -.returns.-> File
    UnsafeMap -.returns.-> Mmap
    ArcWrap -.stored in.-> DataStore

Initial Mapping

When a DataStore is opened, the storage file is memory-mapped using unsafe code that delegates to the OS:

Diagram: Initial memory map creation flow

The init_mmap function wraps the unsafe memmap2::MmapOptions::new().map() call, which asks the OS to map the file into the process address space. The resulting Mmap is immediately wrapped in an Arc for shared access.

Sources: src/storage_engine/data_store.rs:172-174 src/storage_engine/data_store.rs:84-117

sequenceDiagram
    participant Writer as "Write Operation"
    participant File as "RwLock<BufWriter<File>>"
    participant Reindex as "reindex()"
    participant MmapMutex as "Mutex<Arc<Mmap>>"
    participant Indexer as "RwLock<KeyIndexer>"
    
    Writer->>File: Acquire write lock
    Writer->>File: Append data + metadata
    Writer->>File: flush()
    Writer->>Reindex: reindex(&write_guard, offsets, tail)
    
    Reindex->>File: init_mmap(&write_guard)
    Note over Reindex,File: Create new Mmap from flushed file
    
    Reindex->>MmapMutex: lock()
    Reindex->>MmapMutex: *guard = Arc::new(new_mmap)
    Note over MmapMutex: Old Arc<Mmap> still valid for readers
    
    Reindex->>Indexer: write().insert(key_hash, offset)
    Reindex->>Indexer: Release lock
    
    Reindex->>MmapMutex: Release lock
    
    Note over Writer: New reads see updated mmap

Remapping After Writes

After write operations extend the file, the memory map must be refreshed to make new data visible. The reindex method handles this critical operation:

Diagram: Memory map remapping sequence during writes

The reindex method performs three synchronized updates:

  1. Creates a new Mmap from the extended file
  2. Atomically replaces the Arc<Mmap> in the mutex
  3. Updates the key indexer with new offsets

Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:176-186


Zero-Copy Read Patterns

graph LR
    subgraph "DataStore"
        MmapContainer["Mutex<Arc<Mmap>>"]
end
    
    subgraph "EntryHandle"
        MmapRef["Arc<Mmap>"]
Range["range: Range<usize>"]
Metadata["metadata: EntryMetadata"]
end
    
    subgraph "User Code"
        Slice["&[u8] payload slice"]
end
    
 
   MmapContainer -->|get_mmap_arc| MmapRef
 
   MmapRef -->|&mmap[range]| Slice
    Range -.defines region.-> Slice
    
    Note1["Zero-copy: slice points\ndirectly into mmap"]
Slice -.-> Note1

EntryHandle Architecture

EntryHandle is the primary abstraction for zero-copy reads. It holds an Arc<Mmap> reference and a byte range, providing direct slice access without copying:

Diagram: EntryHandle zero-copy architecture

When EntryHandle::as_slice() is called, it returns &self.mmap_arc[self.range.clone()], which is a direct reference into the memory-mapped region. No data is copied; the slice is a view into the OS page cache.

Sources: simd-r-drive-entry-handle crate, src/storage_engine/data_store.rs:560-565

graph TB
    Read["read(key)"]
ComputeHash["compute_hash(key)"]
GetMmap["get_mmap_arc()"]
LockIndex["key_indexer.read()"]
ReadContext["read_entry_with_context()"]
IndexLookup["key_indexer.get_packed(key_hash)"]
Unpack["KeyIndexer::unpack(packed)"]
CreateHandle["EntryHandle { mmap_arc, range, metadata }"]
AsSlice["entry.as_slice()"]
DirectRef["&mmap[range]"]
Read --> ComputeHash
 
   Read --> GetMmap
 
   Read --> LockIndex
 
   ComputeHash --> ReadContext
 
   GetMmap --> ReadContext
 
   LockIndex --> ReadContext
 
   ReadContext --> IndexLookup
 
   IndexLookup --> Unpack
 
   Unpack --> CreateHandle
 
   CreateHandle --> AsSlice
 
   AsSlice --> DirectRef
    
    DirectRef -.zero-copy.-> OSPageCache["OS Page Cache"]

Read Operation Flow

The zero-copy read flow demonstrates how data moves from disk to user code without intermediate buffers:

Diagram: Zero-copy read operation flow from key lookup to slice access

Key points:

  • get_mmap_arc() obtains an Arc<Mmap> clone (cheap atomic increment)
  • Index lookup finds the file offset
  • EntryHandle is constructed with the Arc<Mmap> and byte range
  • as_slice() returns a reference directly into the mapped memory

Sources: src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:502-565 src/storage_engine/data_store.rs:658-663


Shared Access with Arc

Thread-Safe Reference Counting

The Arc<Mmap> enables multiple threads to hold references to the same memory-mapped region simultaneously. Each clone increments an atomic reference count:

| Operation | Cost | Thread Safety |
| --- | --- | --- |
| Arc::clone() | Single atomic increment | Lock-free |
| Holding Arc<Mmap> | No synchronization needed | Fully safe |
| Dropping Arc<Mmap> | Single atomic decrement | Lock-free |
| Last reference drops | Mmap unmapped by OS | Safe |

When a writer remaps the file, it replaces the Arc<Mmap> inside the mutex. Old Arc<Mmap> references remain valid until all readers drop them, at which point the OS automatically unmaps the old region.

Sources: src/storage_engine/data_store.rs:658-663 README.md:174-183

Clone Semantics in Iteration

EntryIterator demonstrates efficient Arc<Mmap> usage. The iterator holds one Arc<Mmap> and clones it for each EntryHandle it yields:

Diagram: Arc cloning pattern in EntryIterator

graph TB
    IterNew["EntryIterator::new(mmap_arc, tail)"]
IterField["EntryIterator { mmap: Arc<Mmap>, ... }"]
Next["next()
called"]
CreateHandle["EntryHandle { mmap_arc: Arc::clone(&self.mmap), ... }"]
UserCode["User processes EntryHandle"]
Drop["EntryHandle dropped"]
IterNew --> IterField
 
   IterField --> Next
 
   Next --> CreateHandle
    CreateHandle -.cheap clone.-> UserCode
 
   UserCode --> Drop
    Drop -.atomic decrement.-> RefCount["Reference count"]
Note["Iterator holds 1 Arc\nEach EntryHandle clones it\nAll point to same Mmap"]
IterField -.-> Note

This design allows the iterator and all yielded handles to coexist safely. The cloning overhead is minimal—just an atomic operation—while providing complete memory safety.

Sources: src/storage_engine/entry_iterator.rs:21-47 src/storage_engine/entry_iterator.rs:121-125


Memory Management Flow

graph TB
    subgraph "Initialization"
        OpenFile["open_file_in_append_mode()"]
InitMmap1["init_mmap(&file)"]
Recovery["recover_valid_chain()"]
ReinitMmap["Remap if truncation needed"]
BuildIndex["KeyIndexer::build()"]
StoreMmap["Store Arc<Mutex<Arc<Mmap>>>"]
end
    
    subgraph "Read Path"
        GetArc["get_mmap_arc()"]
ReadLock["key_indexer.read()"]
Lookup["Index lookup"]
ConstructHandle["EntryHandle { Arc::clone(mmap_arc), range, ... }"]
AsSlice["as_slice() → &mmap[range]"]
end
    
    subgraph "Write Path"
        WriteLock["file.write()"]
AppendData["Append payload + metadata"]
Flush["flush()"]
Reindex["reindex()"]
NewMmap["init_mmap() → new Mmap"]
SwapMmap["Mutex: *guard = Arc::new(new_mmap)"]
UpdateIndex["KeyIndexer: insert offsets"]
end
    
    subgraph "Iterator Path"
        IterCreate["iter_entries()"]
CloneMmap["get_mmap_arc()"]
IterNew["EntryIterator::new(mmap_arc, tail)"]
IterNext["next() → EntryHandle"]
end
    
 
   OpenFile --> InitMmap1
 
   InitMmap1 --> Recovery
 
   Recovery --> ReinitMmap
 
   ReinitMmap --> BuildIndex
 
   BuildIndex --> StoreMmap
    
    StoreMmap -.available for.-> GetArc
 
   GetArc --> ReadLock
 
   ReadLock --> Lookup
 
   Lookup --> ConstructHandle
 
   ConstructHandle --> AsSlice
    
    StoreMmap -.available for.-> WriteLock
 
   WriteLock --> AppendData
 
   AppendData --> Flush
 
   Flush --> Reindex
 
   Reindex --> NewMmap
 
   NewMmap --> SwapMmap
 
   SwapMmap --> UpdateIndex
    
    StoreMmap -.available for.-> IterCreate
 
   IterCreate --> CloneMmap
 
   CloneMmap --> IterNew
 
   IterNew --> IterNext

Complete Lifecycle

The following diagram maps the complete lifecycle of memory-mapped access, from initial file open through reads and writes to iterator cleanup:

Diagram: Complete memory management lifecycle

Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:276-280

Code Entity Mapping

The following table maps high-level concepts to specific code entities:

| Concept | Code Entity | Location |
| --- | --- | --- |
| Memory-mapped file | memmap2::Mmap | src/storage_engine/data_store.rs:9 |
| Shared mmap reference | Arc<Mmap> | Throughout codebase |
| Mmap container | Arc<Mutex<Arc<Mmap>>> | src/storage_engine/data_store.rs:29 |
| Mmap initialization | init_mmap(file: &BufWriter<File>) | src/storage_engine/data_store.rs:172-174 |
| Mmap retrieval | get_mmap_arc(&self) | src/storage_engine/data_store.rs:658-663 |
| Remapping operation | reindex(&self, write_guard, offsets, tail, deleted) | src/storage_engine/data_store.rs:224-259 |
| Zero-copy handle | simd_r_drive_entry_handle::EntryHandle | Separate crate |
| Iterator with mmap | EntryIterator { mmap: Arc<Mmap>, ... } | src/storage_engine/entry_iterator.rs:21-25 |
| Raw mmap pointer (testing) | arc_ptr(&self) → *const u8 | src/storage_engine/data_store.rs:653-655 |

Sources: src/storage_engine/data_store.rs:1-33 src/storage_engine/entry_iterator.rs:21-25


Safety Considerations

OS Page Cache Integration

The memory-mapped approach delegates memory management to the OS page cache:

Diagram: OS page cache interaction with memory-mapped region

Key benefits:

  • Pages loaded on-demand (lazy loading)
  • OS handles eviction when memory is tight
  • Multiple processes can share the same page cache entries
  • No explicit memory allocation in application code

Sources: README.md:43-50 README.md174

Large File Handling

The system is designed to handle datasets larger than available RAM. The memory mapping does not load the entire file into RAM:

| File Size | RAM Usage | Behavior |
| --- | --- | --- |
| < Available RAM | Entire file may be cached | Fast access, no swapping |
| ≈ Available RAM | Only accessed pages cached | OS loads pages on-demand |
| > Available RAM | LRU page eviction active | Older pages evicted as needed |

When iterating or reading, only the accessed byte ranges are loaded into physical memory. The OS automatically evicts least-recently-used pages under memory pressure.

Sources: README.md:45-50

Unsafe Code Boundaries

Memory mapping inherently requires unsafe code:

DataStore::init_mmap()
    └─> unsafe { memmap2::MmapOptions::new().map(file) }

The memmap2 crate provides safe abstractions over this unsafe operation, ensuring:

  • The file descriptor remains valid while mapped
  • The mapped region respects file size boundaries
  • Concurrent modifications to the file (outside the mmap) are handled correctly

SIMD R Drive’s architecture ensures safety by:

  • Never resizing the file while an mmap exists
  • Remapping after writes extend the file
  • Using Arc<Mmap> to prevent use-after-unmap bugs

Sources: src/storage_engine/data_store.rs:172-174 src/lib.rs:123-124

Thread Safety Guarantees

The nested Arc<Mutex<Arc<Mmap>>> structure provides these guarantees:

| Operation | Synchronization | Safety Property |
| --- | --- | --- |
| Reading from Arc<Mmap> | None (lock-free) | Safe: immutable data |
| Cloning Arc<Mmap> | Atomic refcount | Safe: no data race |
| Remapping | Mutex held | Safe: serialized with other remaps |
| Old mmap still referenced | Independent Arc | Safe: won't be unmapped |
| Concurrent reads + remap | Separate Arc instances | Safe: readers use old or new mmap |

The key insight is that remapping creates a new Arc<Mmap> without invalidating existing references. Readers holding old Arc<Mmap> instances continue accessing the old mapping until they drop their references.

Sources: src/storage_engine/data_store.rs:26-33 README.md:174-183 README.md:196-206


Memory Pressure and Resource Management

Automatic Resource Cleanup

When memory pressure increases, the OS automatically evicts pages from the page cache. However, the Mmap object itself is small—it only holds file descriptor information and address space pointers. The actual memory is managed by the kernel.

Arc<Mmap> ensures that:

  • The file is not unmapped while any thread holds a reference
  • When the last Arc is dropped, the Mmap destructor unmaps the region
  • The OS then reclaims the virtual address space

Sources: src/storage_engine/data_store.rs:658-663

Testing Hooks

For validation and testing, the system exposes mmap internals in debug builds:

| Method | Purpose | Availability |
| --- | --- | --- |
| get_mmap_arc_for_testing() | Returns Arc<Mmap> for inspection | #[cfg(any(test, debug_assertions))] |
| arc_ptr() | Returns raw *const u8 pointer | #[cfg(any(test, debug_assertions))] |

These methods allow tests to verify zero-copy behavior by comparing pointer addresses and validating that slices point directly into the mapped region.

Sources: src/storage_engine/data_store.rs:631-656




Concurrency and Thread Safety

Relevant source files

Purpose and Scope

This document describes the concurrency model and thread safety guarantees of the SIMD R Drive storage engine. It covers the synchronization primitives used to enable safe multi-threaded access within a single process, including lock strategies for reads and writes, atomic operations, and memory map management.

For information about the core storage architecture and data structures, see Storage Architecture. For details on memory-mapped file usage, see Memory Management and Zero-Copy Access.

Key Limitation : The concurrency mechanisms described here apply only to single-process, multi-threaded environments. Multiple processes accessing the same storage file simultaneously are not supported and require external file locking mechanisms.


Concurrency Model Overview

The DataStore structure uses a combination of read-write locks, atomic operations, and mutexes to enable safe concurrent access across multiple threads while maintaining data consistency.

Diagram: DataStore Synchronization Architecture

graph TB
    subgraph "DataStore Synchronization Primitives"
        FILE["Arc&lt;RwLock&lt;BufWriter&lt;File&gt;&gt;&gt;\nfile"]
MMAP["Arc&lt;Mutex&lt;Arc&lt;Mmap&gt;&gt;&gt;\nmmap"]
TAIL["AtomicU64\ntail_offset"]
INDEX["Arc&lt;RwLock&lt;KeyIndexer&gt;&gt;\nkey_indexer"]
end
    
    subgraph "Write Operations"
        W_STREAM["write_stream"]
W_SINGLE["write"]
W_BATCH["batch_write"]
end
    
    subgraph "Read Operations"
        R_SINGLE["read"]
R_BATCH["batch_read"]
R_ITER["iter_entries"]
end
    
 
   W_STREAM --> FILE
 
   W_SINGLE --> FILE
 
   W_BATCH --> FILE
    
 
   W_STREAM --> TAIL
 
   W_SINGLE --> TAIL
 
   W_BATCH --> TAIL
    
    W_STREAM -.updates.-> MMAP
    W_SINGLE -.updates.-> MMAP
    W_BATCH -.updates.-> MMAP
    
    W_STREAM -.updates.-> INDEX
    W_SINGLE -.updates.-> INDEX
    W_BATCH -.updates.-> INDEX
    
 
   R_SINGLE --> INDEX
 
   R_BATCH --> INDEX
 
   R_ITER --> MMAP
    
 
   R_SINGLE --> MMAP
 
   R_BATCH --> MMAP

Sources: src/storage_engine/data_store.rs:27-33 README.md:172-183


Synchronization Primitives

DataStore Field Overview

The DataStore struct contains four primary fields that implement concurrency control:

| Field | Type | Purpose | Lock Type |
| --- | --- | --- | --- |
| file | Arc<RwLock<BufWriter<File>>> | File handle for writes | Read-write lock |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped view | Exclusive mutex |
| tail_offset | AtomicU64 | Current file end position | Lock-free atomic |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash index for lookups | Read-write lock |

Sources: src/storage_engine/data_store.rs:27-33


RwLock for File Writes

All write operations acquire an exclusive write lock on the file handle to prevent concurrent modifications.

Diagram: Write Lock Serialization

sequenceDiagram
    participant T1 as "Thread 1"
    participant T2 as "Thread 2"
    participant FILE as "RwLock&lt;File&gt;"
    
    T1->>FILE: write.lock() - acquire
    Note over T1,FILE: Thread 1 holds write lock
    T2->>FILE: write.lock() - blocks
    Note over T2: Thread 2 waits
    T1->>FILE: write data + flush
    T1->>FILE: release lock
    Note over FILE: Lock released
    T2->>FILE: acquire lock
    Note over T2,FILE: Thread 2 now writes
    T2->>FILE: write data + flush
    T2->>FILE: release lock

Write Lock Acquisition

Write operations acquire the lock at the start of the write process:

This pattern appears in:

The write lock ensures that only one thread can append data to the file at any given time, preventing:

  • Race conditions on file position
  • Interleaved writes corrupting the append-only chain
  • Inconsistent metadata ordering

Sources: src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-945 README.md176
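
A standalone sketch of the serialized append pattern, using the field type listed earlier (RwLock<BufWriter<File>>); it is not the engine's exact code.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::sync::RwLock;

fn append(file: &RwLock<BufWriter<File>>, bytes: &[u8]) -> std::io::Result<()> {
    // Exclusive write lock: only one thread appends at a time.
    let mut guard = file.write().expect("file lock poisoned");
    guard.write_all(bytes)?;
    guard.flush() // data reaches disk before the tail offset is published
}
```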


AtomicU64 for Tail Offset

The tail_offset field tracks the current end of the valid data in the storage file using atomic operations, enabling lock-free reads of the current file position.

Atomic Operations Used

| Operation | Method | Purpose |
| --- | --- | --- |
| Load | load(Ordering::Acquire) | Read current tail position |
| Store | store(offset, Ordering::Release) | Update tail after write |

Load Operation

Reads use Acquire ordering to ensure they see all previous writes:

Examples:

Store Operation

Writes use Release ordering to ensure all previous writes are visible:

Location: src/storage_engine/data_store.rs256

This atomic coordination ensures that:

  • Readers always see a consistent tail offset
  • Writers update the tail only after data is flushed
  • No locks are needed for reading the tail position

Sources: src/storage_engine/data_store.rs30 src/storage_engine/data_store.rs256 src/storage_engine/data_store.rs278 README.md182
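
A standalone sketch of the Acquire/Release pairing on the tail offset; the struct and method names are illustrative, not the crate's own.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct Tail {
    tail_offset: AtomicU64,
}

impl Tail {
    fn current(&self) -> u64 {
        // Acquire pairs with the Release store, so readers observe flushed data.
        self.tail_offset.load(Ordering::Acquire)
    }

    fn advance(&self, new_tail: u64) {
        // Release publishes the offset only after payload + metadata are flushed.
        self.tail_offset.store(new_tail, Ordering::Release);
    }
}

fn main() {
    let tail = Tail { tail_offset: AtomicU64::new(0) };
    tail.advance(128);
    assert_eq!(tail.current(), 128);
}
```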


graph LR
    subgraph "Memory Map Management"
        MUTEX["Mutex&lt;Arc&lt;Mmap&gt;&gt;"]
MMAP1["Arc&lt;Mmap&gt; v1"]
MMAP2["Arc&lt;Mmap&gt; v2"]
end
    
    subgraph "Readers"
        R1["Reader Thread 1"]
R2["Reader Thread 2"]
end
    
    subgraph "Writer"
        W["Writer Thread"]
end
    
    R1 -.clones.-> MMAP1
    R2 -.clones.-> MMAP1
 
   W -->|1. Lock mutex| MUTEX
 
   W -->|2. Create new| MMAP2
 
   W -->|3. Replace| MUTEX
 
   W -->|4. Release| MUTEX
    
    MMAP1 -.remains valid.-> R1
    MMAP1 -.remains valid.-> R2

Mutex for Memory Map

The memory-mapped file reference is protected by a Mutex<Arc<Mmap>> to prevent concurrent remapping during reads.

Diagram: Memory Map Arc Cloning Pattern

Accessing the Memory Map

Read operations clone the Arc<Mmap> to obtain a stable reference:

Source: src/storage_engine/data_store.rs:658-663
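
A minimal sketch of the clone-and-release pattern, assuming the memmap2 crate is available; the mutex is held only long enough to clone the inner Arc.

```rust
use std::sync::{Arc, Mutex};
use memmap2::Mmap;

fn get_mmap_arc(mmap: &Mutex<Arc<Mmap>>) -> Arc<Mmap> {
    let guard = mmap.lock().expect("mmap lock poisoned");
    Arc::clone(&guard) // cheap atomic increment; the guard drops immediately after
}
```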

This pattern ensures:

  • Readers hold a reference to a specific memory map version
  • Writers can create a new memory map without invalidating existing readers
  • The Arc reference counting prevents premature deallocation
  • The mutex is held only briefly during the clone operation

Remapping After Writes

After writing and flushing data, the reindex function creates a new memory map:

Source: src/storage_engine/data_store.rs:231-255

Sources: src/storage_engine/data_store.rs29 src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:658-663 README.md180


RwLock for Key Index

The KeyIndexer is protected by a read-write lock, allowing multiple concurrent readers but exclusive writers.

Read Access Pattern

Multiple threads can acquire read locks simultaneously for lookups:

Example: src/storage_engine/data_store.rs509

Write Access Pattern

Index updates require exclusive write access:

Source: src/storage_engine/data_store.rs:233-253

Parallel Iterator Lock Strategy

The parallel iterator minimizes lock holding time by collecting offsets first:

Source: src/storage_engine/data_store.rs:300-302

Sources: src/storage_engine/data_store.rs31 src/storage_engine/data_store.rs:233-253 src/storage_engine/data_store.rs:300-302 README.md178


sequenceDiagram
    participant R1 as "Reader 1"
    participant R2 as "Reader 2"
    participant R3 as "Reader 3"
    participant INDEX as "RwLock&lt;KeyIndexer&gt;"
    participant MMAP as "Arc&lt;Mmap&gt;"
    
    par Concurrent Reads
        R1->>INDEX: read().lock() - acquire
        R2->>INDEX: read().lock() - acquire
        R3->>INDEX: read().lock() - acquire
    end
    
    par Index Lookups
        R1->>INDEX: get_packed(key_hash_1)
        R2->>INDEX: get_packed(key_hash_2)
        R3->>INDEX: get_packed(key_hash_3)
    end
    
    Note over R1,R3: All readers release index lock
    
    par Zero-Copy Access
        R1->>MMAP: Access offset_1
        R2->>MMAP: Access offset_2
        R3->>MMAP: Access offset_3
    end
    
    Note over R1,MMAP: No locks during data access

Lock-Free Read Operations

Read operations achieve lock-free access through memory-mapped files and atomic operations.

Diagram: Concurrent Lock-Free Read Pattern

Zero-Copy Read Implementation

Once the offset is obtained from the index, data access is lock-free:

Source: src/storage_engine/data_store.rs:502-565

Benefits of Lock-Free Reads

  1. No Read Contention : Multiple readers access different memory regions simultaneously
  2. Zero-Copy : Data is accessed directly from the memory map without copying
  3. Scalability : Read throughput scales linearly with CPU cores
  4. Low Latency : No lock acquisition overhead after index lookup

Sources: README.md174 src/storage_engine/data_store.rs:502-565 tests/concurrency_tests.rs:163-229


graph TB
    subgraph "Write Operation Phases"
        ACQUIRE["1. Acquire RwLock&lt;File&gt;"]
LOAD["2. Load tail_offset\n(Atomic)"]
CALC["3. Calculate pre-padding"]
WRITE["4. Write payload + metadata"]
FLUSH["5. Flush to disk"]
REMAP["6. Remap file (Mutex)"]
UPDATE["7. Update index (RwLock)"]
STORE["8. Store new tail_offset\n(Atomic)"]
RELEASE["9. Release file lock"]
end
    
 
   ACQUIRE --> LOAD
 
   LOAD --> CALC
 
   CALC --> WRITE
 
   WRITE --> FLUSH
 
   FLUSH --> REMAP
 
   REMAP --> UPDATE
 
   UPDATE --> STORE
 
   STORE --> RELEASE

Write Synchronization

Write operations are fully serialized through the file lock, ensuring consistency.

Diagram: Write Operation Synchronization Flow

Single Write Flow

Source: src/storage_engine/data_store.rs:758-825

Batch Write Optimization

Batch writes hold the lock once for multiple entries:

Source: src/storage_engine/data_store.rs:847-945

Sources: src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-945 README.md176


Thread Safety Guarantees

Thread Safety Matrix

The following table summarizes thread safety guarantees for different environments:

| Environment | Reads | Writes | Index Updates | Storage Safety |
| --- | --- | --- | --- | --- |
| Single Process, Single Thread | ✅ Safe | ✅ Safe | ✅ Safe | ✅ Safe |
| Single Process, Multi-Threaded | ✅ Safe (lock-free, zero-copy) | ✅ Safe (RwLock<File>) | ✅ Safe (RwLock<KeyIndexer>) | ✅ Safe (Mutex<Arc<Mmap>>) |
| Multiple Processes, Shared File | ⚠️ Unsafe (no cross-process coordination) | ❌ Unsafe (no external locking) | ❌ Unsafe (separate memory spaces) | ❌ Unsafe (risk of race conditions) |

Source: README.md:196-200

graph TB
    subgraph "Thread Safety Properties"
        P1["Atomic Tail Updates\nNo torn reads/writes"]
P2["Serialized File Writes\nNo interleaved data"]
P3["Consistent Index View\nRwLock guarantees"]
P4["Valid Memory Maps\nArc prevents premature free"]
P5["Backward Chain Integrity\nSequential offsets"]
end
    
 
   P1 --> SAFE["Thread-Safe\nMulti-Reader/Single-Writer"]
P2 --> SAFE
 
   P3 --> SAFE
 
   P4 --> SAFE
 
   P5 --> SAFE

Safe Concurrency Properties

The design ensures the following properties in single-process, multi-threaded environments:

Diagram: Thread Safety Property Dependencies

  1. Atomicity : All operations on shared state are atomic or properly locked
  2. Visibility : Changes made by one thread are visible to others through Release/Acquire semantics
  3. Ordering : The append-only design ensures writes happen in a strict sequence
  4. Isolation : Readers see a consistent snapshot via Arc<Mmap> cloning

Sources: README.md:172-206 src/storage_engine/data_store.rs:27-33


Single-Process vs Multi-Process

Single-Process Multi-Threaded (Supported)

All synchronization primitives work correctly within a single process:

Diagram: Single-Process Shared State

Example from concurrency tests:

Source: tests/concurrency_tests.rs:117-137

graph TB
    subgraph "Process 1"
        P1_DS["DataStore"]
P1_INDEX["KeyIndexer\n(separate)"]
P1_MMAP["Mmap\n(separate)"]
end
    
    subgraph "Process 2"
        P2_DS["DataStore"]
P2_INDEX["KeyIndexer\n(separate)"]
P2_MMAP["Mmap\n(separate)"]
end
    
    FILE["storage.bin\n(shared file)"]
P1_DS --> P1_INDEX
 
   P1_DS --> P1_MMAP
 
   P2_DS --> P2_INDEX
 
   P2_DS --> P2_MMAP
    
    P1_MMAP -.unsafe.-> FILE
    P2_MMAP -.unsafe.-> FILE

Multi-Process (Not Supported)

Multiple processes have separate address spaces and cannot share the in-memory synchronization primitives:

Diagram: Multi-Process Unsafe Access

Why Multi-Process is Unsafe

  1. Separate Index State : Each process has its own KeyIndexer in memory
  2. Independent Mmap Views : Memory maps are not synchronized across processes
  3. No Lock Coordination : RwLock and Mutex are process-local, not system-wide
  4. Race Conditions : Concurrent writes can corrupt the file structure

Recommendation : Use external file locking (e.g., flock, advisory locks) if multi-process access is required.

Sources: README.md:186-206 README.md:189-191


Testing Concurrency

The test suite validates concurrent access patterns to ensure thread safety guarantees.

Concurrent Write Test

Tests multiple threads writing simultaneously:

Source: tests/concurrency_tests.rs:111-161

Interleaved Read-Write Test

Tests read-after-write consistency with coordinated threads:

Source: tests/concurrency_tests.rs:163-229

Concurrent Streamed Write Test

Tests slow, streaming writes that hold the lock for extended periods:

Source: tests/concurrency_tests.rs:14-109

Sources: tests/concurrency_tests.rs:1-230


Summary

The SIMD R Drive concurrency model provides thread-safe access through a carefully coordinated set of synchronization primitives:

  • RwLock<BufWriter<File>> : Serializes file writes while still allowing concurrent read access to the lock
  • AtomicU64 : Provides lock-free tail offset tracking
  • Mutex<Arc<Mmap>> : Protects memory map updates without blocking existing readers
  • RwLock<KeyIndexer> : Enables highly concurrent index reads with exclusive write access

This design achieves:

  • Zero-copy concurrent reads via memory mapping
  • Serialized writes preventing data corruption
  • Linear read scalability across CPU cores
  • Consistent snapshots through atomic operations

However, these guarantees apply only within a single process. Multi-process access requires external coordination mechanisms.

Sources: README.md:170-206 src/storage_engine/data_store.rs:26-33 tests/concurrency_tests.rs:1-230




Key Indexing and Hashing

Relevant source files

Purpose and Scope

This page documents the key indexing system and hashing mechanisms used in SIMD R Drive’s storage engine. It covers the KeyIndexer data structure, the XXH3 hashing algorithm, tag-based collision detection, and hardware acceleration features.

For information about how the index is accessed in concurrent operations, see Concurrency and Thread Safety. For details on how metadata is stored alongside payloads, see Entry Structure and Metadata.


Overview

The SIMD R Drive storage engine maintains an in-memory index that maps key hashes to file offsets, enabling O(1) lookup performance for stored entries. This index is critical for avoiding full file scans when retrieving data.

The indexing system consists of three main components:

  1. KeyIndexer : A concurrent hash map that stores packed values containing both a collision-detection tag and a file offset
  2. XXH3_64 hashing : A fast, hardware-accelerated hashing algorithm that generates 64-bit hashes from arbitrary keys
  3. Tag-based verification : A secondary collision detection mechanism that validates lookups to prevent hash collision errors

Sources: src/storage_engine/data_store.rs:1-33 README.md:158-168


KeyIndexer Structure

The KeyIndexer struct is defined in src/storage_engine/key_indexer.rs:56-59 and manages the in-memory hash index. It wraps a HashMap<u64, u64, Xxh3BuildHasher> where keys are XXH3 key hashes and values are packed 64-bit integers containing both a collision-detection tag and file offset.

graph TB
    subgraph DataStore["DataStore struct"]
KeyIndexerField["key_indexer: Arc&lt;RwLock&lt;KeyIndexer&gt;&gt;"]
end
    
    subgraph KeyIndexer["KeyIndexer struct"]
IndexField["index: HashMap&lt;u64, u64, Xxh3BuildHasher&gt;"]
MapKey["Key: u64\n(key_hash from compute_hash)"]
MapValue["Value: u64\n(packed tag / offset)"]
end
    
    subgraph Constants["Key Constants"]
TagBits["TAG_BITS = 16"]
OffsetMask["OFFSET_MASK = (1 << 48) - 1"]
end
    
    subgraph Methods["Public Methods"]
TagFromHash["tag_from_hash(key_hash) -> u16"]
TagFromKey["tag_from_key(key) -> u16"]
Pack["pack(tag, offset) -> u64"]
Unpack["unpack(packed) -> (u16, u64)"]
Build["build(mmap, tail_offset) -> Self"]
Insert["insert(key_hash, offset) -> Result"]
GetPacked["get_packed(&key_hash) -> Option<&u64>"]
GetOffset["get_offset(&key_hash) -> Option<u64>"]
Remove["remove(&key_hash) -> Option<u64>"]
end
    
 
   KeyIndexerField --> IndexField
 
   IndexField --> MapKey
 
   IndexField --> MapValue
 
   MapValue --> TagBits
 
   MapValue --> OffsetMask
    
    Pack -.uses.-> TagBits
    Pack -.uses.-> OffsetMask
    Unpack -.uses.-> TagBits
    Unpack -.uses.-> OffsetMask

Packed Value Format

The KeyIndexer stores a compact 64-bit packed value for each hash. This value is constructed by the pack function src/storage_engine/key_indexer.rs:79-85 and decoded by unpack src/storage_engine/key_indexer.rs:88-93:

| Bits | Field | Description | Constant Used |
| --- | --- | --- | --- |
| 63-48 | Tag (16-bit) | Collision detection tag from upper hash bits | TAG_BITS = 16 |
| 47-0 | Offset (48-bit) | Absolute file offset to entry metadata | OFFSET_MASK = 0xFFFFFFFFFFFF |

The packing formula is: packed = (tag << (64 - TAG_BITS)) | offset

The unpacking extracts: tag = (packed >> (64 - TAG_BITS)) as u16 and offset = packed & OFFSET_MASK

Maximum Addressable File Size : The 48-bit offset field supports files up to 256 TiB (2^48 bytes). Attempting to use larger offsets will panic in debug builds due to debug_assert! checks src/storage_engine/key_indexer.rs:80-83
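To make the bit layout concrete, here is a minimal sketch of the pack/unpack scheme; it follows the documented formulas and constants above rather than the project's exact code.

```rust
// Minimal sketch of the packed tag/offset format, assuming TAG_BITS = 16 and a
// 48-bit offset mask as documented above; not the project's exact implementation.
const TAG_BITS: u32 = 16;
const OFFSET_MASK: u64 = (1u64 << 48) - 1;

fn pack(tag: u16, offset: u64) -> u64 {
    debug_assert!(offset <= OFFSET_MASK, "offset exceeds 48-bit range");
    ((tag as u64) << (64 - TAG_BITS)) | offset
}

fn unpack(packed: u64) -> (u16, u64) {
    ((packed >> (64 - TAG_BITS)) as u16, packed & OFFSET_MASK)
}

fn main() {
    let packed = pack(0xABCD, 4096);
    assert_eq!(unpack(packed), (0xABCD, 4096));
}
```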

Sources: src/storage_engine/key_indexer.rs:9-15 src/storage_engine/key_indexer.rs:56-59 src/storage_engine/key_indexer.rs:79-93 src/storage_engine/data_store.rs:31


Hashing Algorithm: XXH3_64

SIMD R Drive uses the XXH3_64 hashing algorithm from the xxhash-rust crate src/storage_engine/digest.rs. XXH3 is optimized for speed and provides automatic hardware acceleration through SIMD instructions.

graph TB
    subgraph DigestModule["storage_engine::digest module"]
ComputeHash["compute_hash(key: &[u8]) -> u64"]
ComputeHashBatch["compute_hash_batch(keys: &[&[u8]]) -> Vec&lt;u64&gt;"]
ComputeChecksum["compute_checksum(payload: &[u8]) -> [u8; 4]"]
Xxh3BuildHasher["Xxh3BuildHasher struct"]
end
    
    subgraph Callers["Usage in DataStore"]
WriteMethod["write(key, payload)"]
BatchWrite["batch_write(entries)"]
KeyIndexerHashMap["HashMap&lt;u64, u64, Xxh3BuildHasher&gt;"]
end
    
 
   WriteMethod -->|calls| ComputeHash
 
   BatchWrite -->|calls| ComputeHashBatch
 
   WriteMethod -->|calls| ComputeChecksum
 
   KeyIndexerHashMap -->|uses hasher| Xxh3BuildHasher
    
 
   ComputeHash -->|produces| KeyHash["key_hash: u64"]
ComputeHashBatch -->|produces| KeyHashes["Vec&lt;u64&gt;"]

Hash Function API

The digest module exports the following functions used throughout the codebase:

| Function | Signature | Implementation | Use Case |
| --- | --- | --- | --- |
| compute_hash | fn(key: &[u8]) -> u64 | Wraps xxhash_rust::xxh3::xxh3_64 | Single key hashing |
| compute_hash_batch | fn(keys: &[&[u8]]) -> Vec<u64> | Parallel iterator over keys | Batch write operations |
| compute_checksum | fn(payload: &[u8]) -> [u8; 4] | CRC32C from crc32fast | Payload integrity checks |
| Xxh3BuildHasher | struct implementing BuildHasher | Custom hasher for HashMap | KeyIndexer HashMap hasher |

The compute_hash_batch function leverages Rayon for parallel hashing when processing multiple keys simultaneously src/storage_engine/data_store.rs:839-842

graph LR
    KeyInput["Input: &[u8]"]
subgraph XXH3Crate["xxhash-rust crate"]
CPUDetect["Runtime CPU\nFeature Detection"]
subgraph x86Implementation["x86_64 Implementation"]
SSE2Path["SSE2 Path\n(always available)"]
AVX2Path["AVX2 Path\n(if cpuid detects)"]
end
        
        subgraph ARMImplementation["aarch64 Implementation"]
NEONPath["NEON Path\n(default on ARM)"]
end
        
        subgraph FallbackImplementation["Fallback"]
ScalarPath["Scalar Operations"]
end
    end
    
    Output["Output: u64 hash"]
KeyInput --> CPUDetect
 
   CPUDetect --> SSE2Path
 
   CPUDetect --> AVX2Path
 
   CPUDetect --> NEONPath
 
   CPUDetect --> ScalarPath
    
 
   SSE2Path --> Output
 
   AVX2Path --> Output
 
   NEONPath --> Output
 
   ScalarPath --> Output

Hardware Acceleration

The XXH3_64 implementation automatically detects and utilizes CPU-specific SIMD instructions at runtime:

| Platform | Default SIMD | Optional Features | Detection Method |
| --- | --- | --- | --- |
| x86_64 | SSE2 (baseline) | AVX2 | Runtime cpuid instruction |
| aarch64 | NEON (always) | None | Compile-time default |
| Other | Scalar fallback | None | Compile-time detection |

The hardware acceleration is transparent to the application code. The compute_hash function signature remains the same regardless of which SIMD path is taken README.md:160-165

Sources: src/storage_engine/digest.rs src/storage_engine/data_store.rs:2-4 src/storage_engine/key_indexer.rs:2 README.md:158-168


Tag-Based Collision Detection

While XXH3_64 produces high-quality 64-bit hashes, the system implements an additional collision detection layer using 16-bit tags. The tag is derived from the upper 16 bits of the key hash src/storage_engine/key_indexer.rs:64-66

Tag Computation Methods

Two methods generate tags for collision detection:

| Method | Signature | Source | Usage Context |
| --- | --- | --- | --- |
| tag_from_hash | fn(key_hash: u64) -> u16 | src/storage_engine/key_indexer.rs:64-66 | When hash is known |
| tag_from_key | fn(key: &[u8]) -> u16 | src/storage_engine/key_indexer.rs:69-72 | Direct from key bytes |

The tag_from_hash function extracts the tag: (key_hash >> (64 - TAG_BITS)) as u16

The tag_from_key function computes: tag_from_hash(compute_hash(key))
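A small sketch of the two helpers, assuming the xxhash-rust crate with the xxh3 feature; it mirrors the formulas above, not the exact source.

```rust
// Sketch of the tag helpers described above; assumes the xxhash-rust crate
// (with the "xxh3" feature) and TAG_BITS = 16. Illustrative, not the exact code.
use xxhash_rust::xxh3::xxh3_64;

const TAG_BITS: u32 = 16;

fn tag_from_hash(key_hash: u64) -> u16 {
    // Upper 16 bits of the 64-bit key hash.
    (key_hash >> (64 - TAG_BITS)) as u16
}

fn tag_from_key(key: &[u8]) -> u16 {
    tag_from_hash(xxh3_64(key))
}

fn main() {
    let key = b"user:42";
    assert_eq!(tag_from_key(key), tag_from_hash(xxh3_64(key)));
}
```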

graph TB
    subgraph WriteFlow["Write Operation Flow"]
Key["key: &[u8]"]
ComputeHash["compute_hash(key)"]
KeyHash["key_hash: u64"]
TagFromHash["KeyIndexer::tag_from_hash(key_hash)"]
Tag["tag: u16"]
WriteData["Write payload + metadata to file"]
Offset["metadata_offset: u64"]
Pack["KeyIndexer::pack(tag, offset)"]
PackedValue["packed: u64"]
Insert["key_indexer.insert(key_hash, offset)"]
end
    
 
   Key --> ComputeHash
 
   ComputeHash --> KeyHash
 
   KeyHash --> TagFromHash
 
   TagFromHash --> Tag
 
   WriteData --> Offset
 
   Tag --> Pack
 
   Offset --> Pack
 
   Pack --> PackedValue
 
   KeyHash --> Insert
 
   PackedValue --> Insert

Write-Time Tag Storage

During write operations, the tag is packed with the offset before insertion into the index:

The insert method in KeyIndexer src/storage_engine/key_indexer.rs:135-160 performs collision detection at write time by verifying that the new tag matches any existing tag for the same key hash.

Read-Time Tag Verification

The read_entry_with_context method src/storage_engine/data_store.rs:501-565 implements tag verification during reads:

The verification logic src/storage_engine/data_store.rs:513-521:

Collision Probability Analysis

The dual-layer verification provides strong collision resistance:

| Layer | Bits | Collision Probability | Description |
| --- | --- | --- | --- |
| XXH3_64 Hash | 64 | ~2^-64 | Primary hash collision |
| Tag Verification | 16 | ~2^-16 | Secondary tag collision given hash collision |
| Combined | 80 | ~2^-80 | Both hash and tag must collide |

With 2^16 = 65,536 possible tag values, the tag check provides sufficient discrimination for practical workloads. The KeyIndexer documentation src/storage_engine/key_indexer.rs:20-56 notes this can distinguish over 4 billion keys with ~50% collision probability (birthday bound).
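For reference, a short worked form of the birthday bound, assuming it is applied to the 64-bit hash space:

$$P_{\text{collision}}(n) \approx 1 - e^{-n^{2}/(2 \cdot 2^{64})}, \qquad n_{50\%} \approx \sqrt{2\ln 2 \cdot 2^{64}} \approx 1.18 \cdot 2^{32} \approx 5 \times 10^{9}$$

The rougher rule of thumb $\sqrt{2^{64}} = 2^{32} \approx 4.3 \times 10^{9}$ gives the "over 4 billion keys" figure quoted above.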

Sources: src/storage_engine/key_indexer.rs:9-72 src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:501-565

Write-Time Collision Rejection

The KeyIndexer::insert method src/storage_engine/key_indexer.rs:135-160 enforces collision detection at write time:

If a collision is detected (same hash, different tag), the write operation fails with an error src/storage_engine/data_store.rs:245-251. This prevents index corruption and ensures data integrity.

Sources: src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:238-252


Index Building and Maintenance

Index Construction on Open

When DataStore::open src/storage_engine/data_store.rs:84-117 is called, the KeyIndexer is constructed by the static build method src/storage_engine/key_indexer.rs:98-124 which scans backward through the validated storage file:

The backward scan ensures only the most recent version of each key is indexed. Keys seen earlier in the scan (which represent newer entries) are added to the seen HashSet to skip older versions src/storage_engine/key_indexer.rs:108-111
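The skip-older-versions logic can be illustrated with a simplified, self-contained sketch that works over an already-decoded list of (key_hash, offset) pairs instead of the raw memory-mapped file:

```rust
// Conceptual sketch of the backward index build described above: keep only the
// newest offset per key hash by walking entries from the tail toward offset 0.
// Works over an already-decoded entry list; the real build() walks the mmap.
use std::collections::{HashMap, HashSet};

fn build_index(entries_in_file_order: &[(u64, u64)]) -> HashMap<u64, u64> {
    let mut seen = HashSet::new();
    let mut index = HashMap::new();
    // Newest entries sit at the end of the file, so scan in reverse.
    for &(key_hash, offset) in entries_in_file_order.iter().rev() {
        if seen.insert(key_hash) {
            // First sighting while scanning backward == latest version of this key.
            index.insert(key_hash, offset);
        }
    }
    index
}

fn main() {
    // Key 1 was written twice; only its newest offset (300) should end up indexed.
    let entries = [(1u64, 100u64), (2, 200), (1, 300)];
    let index = build_index(&entries);
    assert_eq!(index[&1], 300);
    assert_eq!(index[&2], 200);
    assert_eq!(index.len(), 2);
}
```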

Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/key_indexer.rs:98-124

Index Updates During Writes

After each write operation, the reindex method src/storage_engine/data_store.rs:224-259 updates the in-memory index with new key mappings:

Critical : The file must be flushed before calling reindex src/storage_engine/data_store.rs:814 to ensure newly written data is visible in the new memory-mapped view. The flush guarantees that the OS has persisted the data to disk before the mmap is recreated.

The key_indexer.insert call may return an error if a hash collision is detected at write time src/storage_engine/data_store.rs:246-250. In this case, the entire batch operation is aborted to prevent an inconsistent index state.

Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:818-824


Concurrent Access and Locking

The KeyIndexer is protected by an Arc<RwLock<KeyIndexer>> wrapper src/storage_engine/data_store.rs:31, enabling multiple concurrent readers while ensuring exclusive access for writers.

graph TB
    subgraph DataStoreField["DataStore struct field"]
KeyIndexerArc["key_indexer: Arc&lt;RwLock&lt;KeyIndexer&gt;&gt;"]
end
    
    subgraph ReadOperations["Read Operations (Shared Lock)"]
ReadOp["read(key)"]
BatchReadOp["batch_read(keys)"]
IterEntries["iter_entries()"]
ParIterEntries["par_iter_entries() (parallel feature)"]
end
    
    subgraph WriteOperations["Write Operations (Exclusive Lock)"]
WriteOp["write(key, payload)"]
BatchWriteOp["batch_write(entries)"]
DeleteOp["delete(key)"]
ReindexOp["reindex() (internal)"]
end
    
    subgraph LockAcquisition["Lock Acquisition"]
ReadLock["key_indexer.read().unwrap()"]
WriteLock["key_indexer.write().map_err(...)"]
end
    
 
   ReadOp --> ReadLock
 
   BatchReadOp --> ReadLock
 
   IterEntries --> ReadLock
 
   ParIterEntries --> ReadLock
    
 
   WriteOp --> WriteLock
 
   BatchWriteOp --> WriteLock
 
   DeleteOp --> WriteLock
 
   ReindexOp --> WriteLock
    
 
   ReadLock -.-> KeyIndexerArc
 
   WriteLock -.-> KeyIndexerArc

Lock Granularity

| Operation | Lock on key_indexer | Lock on file | Atomicity |
| --- | --- | --- | --- |
| read(key) | Read (shared) | None | Lock-free read |
| batch_read(keys) | Read (shared) | None | Lock-free reads in batch |
| write(key, data) | Write (exclusive) | Write (exclusive) | Full write atomicity |
| batch_write(...) | Write (exclusive) | Write (exclusive) | Batch atomicity |
| delete(key) | Write (exclusive) | Write (exclusive) | Tombstone write atomicity |

The reindex method acquires both the mmap mutex src/storage_engine/data_store.rs:232 and the key_indexer write lock src/storage_engine/data_store.rs:233-236 to atomically update both structures after a write operation.

Parallel Iteration : When the parallel feature is enabled, par_iter_entries src/storage_engine/data_store.rs:296-361 clones all packed values under a read lock, then releases the lock before parallel processing. This allows concurrent reads during parallel iteration.
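The overall pattern can be illustrated with a plain HashMap standing in for KeyIndexer: shared read locks for lookups and snapshots, an exclusive write lock for updates. A minimal sketch, not the project's exact code:

```rust
// Minimal sketch of the locking pattern described above (shared reads, exclusive
// writes, snapshot-then-release before heavy work); a plain HashMap stands in
// for KeyIndexer here.
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

fn main() {
    let index: Arc<RwLock<HashMap<u64, u64>>> = Arc::new(RwLock::new(HashMap::new()));

    // Write path: exclusive lock while inserting a new mapping.
    index.write().unwrap().insert(42, 4096);

    // Read path: shared lock; many threads may do this concurrently.
    let offset = index.read().unwrap().get(&42).copied();
    assert_eq!(offset, Some(4096));

    // Parallel-iteration style snapshot: clone packed values under the read
    // lock, then release the lock before doing any heavy processing.
    let snapshot: Vec<u64> = index.read().unwrap().values().copied().collect();
    assert_eq!(snapshot, vec![4096]);
}
```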

Sources: src/storage_engine/data_store.rs:31 src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:296-361


Performance Characteristics

Lookup Performance

The KeyIndexer provides O(1) average-case lookup performance using the HashMap src/storage_engine/key_indexer.rs:58. The get_packed method src/storage_engine/key_indexer.rs:163-166 performs a single hash table lookup.

Empirical performance from README.md:166-167:

  • 1 million random seeks (8-byte entries): typically < 1 second
  • Hash computation overhead : negligible due to SIMD acceleration
  • Tag verification overhead : minimal (one bit shift + one comparison)

Memory Overhead

Each index entry in the HashMap<u64, u64, Xxh3BuildHasher> consumes:

| Component | Size (bytes) | Description |
| --- | --- | --- |
| Key (u64) | 8 | XXH3 hash of the key |
| Value (u64) | 8 | Packed (tag, offset) value |
| HashMap overhead | ~16-24 | Bucket pointers and metadata |
| Total per entry | 32-40 | Approximate overhead per unique key |

For a dataset with 1 million unique keys , the KeyIndexer occupies approximately 32-40 MB of RAM. This is a small fraction of typical system memory, enabling efficient indexing even for large datasets.

Batch Operation Performance

The compute_hash_batch function src/storage_engine/data_store.rs:839-842 leverages Rayon for parallel hashing:
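A hedged sketch of what such a Rayon-backed batch hasher can look like (assuming the rayon crate and xxhash-rust with the xxh3 feature); the real function lives in the digest module:

```rust
// Hedged sketch of a Rayon-parallel batch hasher; assumes the rayon crate and
// xxhash-rust with the "xxh3" feature. Illustrative, not the project's exact code.
use rayon::prelude::*;
use xxhash_rust::xxh3::xxh3_64;

fn compute_hash_batch(keys: &[&[u8]]) -> Vec<u64> {
    // Each key hash is independent, so Rayon can spread the work across cores.
    keys.par_iter().map(|key| xxh3_64(key)).collect()
}

fn main() {
    let keys: Vec<&[u8]> = vec![&b"alpha"[..], &b"beta"[..], &b"gamma"[..]];
    let hashes = compute_hash_batch(&keys);
    assert_eq!(hashes.len(), 3);
    assert_eq!(hashes[0], xxh3_64(b"alpha"));
}
```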

This parallel hashing provides near-linear speedup with CPU core count for large batches, as each key hash is computed independently.

Hardware Acceleration Impact

SIMD acceleration in XXH3_64 provides measurable performance improvements for hash-intensive workloads:

| Platform | SIMD Instructions | Relative Performance | Speedup vs Scalar |
| --- | --- | --- | --- |
| x86_64 | SSE2 | ~2-3x faster | Baseline |
| x86_64 | AVX2 | ~3-4x faster | 1.5x over SSE2 |
| aarch64 | NEON | ~2-3x faster | Baseline |
| Fallback | Scalar | 1x (baseline) | N/A |

Performance gains are most significant for:

  • batch_write operations with many keys
  • compute_hash_batch calls processing large key sets
  • Workloads with small payload sizes where hashing dominates

Sources: src/storage_engine/key_indexer.rs:56-59 src/storage_engine/data_store.rs:838-843 README.md:160-167


Error Handling and Collision Management

Collision Probability Analysis

The dual-layer verification system (64-bit hash + 16-bit tag) provides strong collision resistance as documented in src/storage_engine/key_indexer.rs:17-56:

| Verification Layer | Bits | Probability | Description |
| --- | --- | --- | --- |
| XXH3_64 Hash Collision | 64 | 2^-64 | Two different keys produce same hash |
| Tag Collision (given hash collision) | 16 | 2^-16 | Tags match despite different keys |
| Combined Collision | 80 | 2^-80 | Both hash and tag must collide simultaneously |

The 16-bit tag provides 65,536 distinct values. According to the birthday paradox, this supports over 4 billion keys with ~50% collision probability src/storage_engine/key_indexer.rs:48-49

Write-Time Collision Handling

The KeyIndexer::insert method src/storage_engine/key_indexer.rs:135-160 enforces strict collision detection during writes. If a tag mismatch occurs, the insert returns Err:

The reindex method propagates this error and aborts the entire batch operation src/storage_engine/data_store.rs:246-250:

This fail-fast approach ensures:

  • No partial writes that could corrupt the index
  • Deterministic error handling (writes either fully succeed or fully fail)
  • Index consistency is maintained across all operations

Read-Time Collision Handling

The read_entry_with_context method src/storage_engine/data_store.rs:501-565 detects collisions during reads when the original key is provided for verification src/storage_engine/data_store.rs:513-521:

When a read-time collision is detected:

  1. A warning is logged to help diagnose the issue
  2. None is returned to the caller (key not found)
  3. The index remains unchanged (reads do not modify state)

Read operations without the original key (e.g., when using pre-hashed keys) cannot perform tag verification and may return incorrect results if a hash collision exists. This is a tradeoff for performance in batch operations.

Sources: src/storage_engine/key_indexer.rs:17-56 src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:238-252 src/storage_engine/data_store.rs:501-565


Summary

The key indexing and hashing system in SIMD R Drive provides:

  1. Fast lookups : O(1) hash-based access to entries
  2. Hardware acceleration : Automatic SIMD optimization on SSE2, AVX2, and NEON platforms
  3. Collision resistance : Dual-layer verification with 64-bit hashes and 16-bit tags
  4. Thread safety : Concurrent reads with exclusive writes via RwLock
  5. Low memory overhead : 16 bytes of key and packed-value data per unique key (roughly 32-40 bytes including HashMap overhead)

This design enables efficient storage operations even for datasets with millions of entries, while maintaining data integrity through robust collision detection.

Sources: src/storage_engine/data_store.rs:1-1183 README.md:158-168


Compaction and Maintenance

Relevant source files

Purpose and Scope

This page documents the maintenance operations available in the DataStore, focusing on compaction for space reclamation and automatic file recovery mechanisms. These operations ensure the storage engine remains efficient and resilient despite its append-only architecture.

For information about the underlying append-only storage model, see Storage Architecture. For details on entry structure that affects compaction, see Entry Structure and Metadata.

Sources: src/storage_engine/data_store.rs:1-1183


Compaction Process

Overview

The compaction process eliminates space waste caused by outdated entry versions. In the append-only model, updating a key creates a new entry while leaving the old version in the file. Compaction creates a new file containing only the latest version of each key, then atomically swaps it with the original.

graph TB
    subgraph "Original DataStore"
        DS["DataStore\n(self)"]
FILE["file: RwLock&lt;BufWriter&lt;File&gt;&gt;"]
MMAP["mmap: Mutex&lt;Arc&lt;Mmap&gt;&gt;"]
IDX["key_indexer: RwLock&lt;KeyIndexer&gt;"]
PATH["path: PathBuf"]
end
    
    subgraph "Compaction Process"
        COMPACT["compact()"]
ITER["iter_entries()"]
COPY["copy_handle()"]
end
    
    subgraph "Temporary DataStore"
        TEMP_DS["DataStore\n(compacted_storage)"]
TEMP_FILE["file (path + .bk)"]
TEMP_PATH["compacted_path"]
end
    
    subgraph "Final Operation"
        RENAME["std::fs::rename()"]
SWAP["Atomic File Swap"]
end
    
 
   DS --> COMPACT
 
   COMPACT --> TEMP_PATH
 
   TEMP_PATH --> TEMP_DS
 
   DS --> ITER
 
   ITER --> COPY
 
   COPY --> TEMP_DS
 
   TEMP_DS --> TEMP_FILE
 
   TEMP_FILE --> RENAME
 
   RENAME --> SWAP
 
   SWAP --> PATH
    
    style COMPACT fill:#f9f9f9
    style TEMP_DS fill:#f9f9f9
    style SWAP fill:#f9f9f9

Architecture

Sources: src/storage_engine/data_store.rs:706-749

Implementation Details

The compact() method at src/storage_engine/data_store.rs:706-749 performs the following sequence:

Sources: src/storage_engine/data_store.rs:706-749 src/storage_engine/data_store.rs:587-590 src/storage_engine/data_store.rs:269-280

Key Implementation Characteristics

| Aspect | Implementation | Location |
| --- | --- | --- |
| Temporary File | Appends .bk extension to original path | src/storage_engine/data_store.rs:707 |
| Entry Selection | Uses iter_entries(), which returns only latest versions | src/storage_engine/data_store.rs:714 |
| Copy Mechanism | copy_handle() → EntryStream → write_stream_with_key_hash() | src/storage_engine/data_store.rs:587-590 |
| Atomic Swap | std::fs::rename() provides atomic replacement | src/storage_engine/data_store.rs:746 |
| Space Optimization | Skips static index if it wouldn't save space | src/storage_engine/data_store.rs:727-741 |

Sources: src/storage_engine/data_store.rs:706-749

Thread Safety Considerations

The compact() method has critical thread safety limitations documented at src/storage_engine/data_store.rs:681-693:

Important Constraints:

  • Requires &mut self: Prevents concurrent mutations but does NOT prevent concurrent reads
  • Arc-wrapped risk : If DataStore is wrapped in Arc<DataStore>, other threads may hold read references during compaction
  • Recommendation : Only compact when you have exclusive access (single thread or external synchronization)
  • No automatic locking : External synchronization must be enforced by the caller

Sources: src/storage_engine/data_store.rs:679-705

EntryHandle Copying Mechanism

The copy_handle() method at src/storage_engine/data_store.rs:587-590 creates an EntryStream from the entry’s memory-mapped data and writes it to the target store using the original key hash. This preserves the entry’s identity while creating a new physical copy.

Sources: src/storage_engine/data_store.rs:567-590


Space Reclamation

Estimating Compaction Savings

The estimate_compaction_savings() method at src/storage_engine/data_store.rs:605-616 calculates potential space savings without performing actual compaction:

Algorithm:

  1. Get total file size via file_size()
  2. Iterate through all valid entries (latest versions only)
  3. Track seen keys using HashSet<u64> with Xxh3BuildHasher
  4. Sum the file_size() of each unique entry
  5. Return total_size - unique_entry_size

Note: The iter_entries() method already filters to the latest versions, so the summed unique-entry size is the minimum achievable file size after compaction, and the returned difference is the maximum reclaimable space.
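A simplified, self-contained sketch of the same calculation, using plain (key_hash, entry_size) pairs in place of iter_entries() over the mmap:

```rust
// Simplified sketch of the savings estimate described above: total file size
// minus the summed size of the latest entry versions. Sizes are plain numbers
// here; the real method walks iter_entries() over the memory-mapped file.
use std::collections::HashSet;

fn estimate_compaction_savings(total_file_size: u64, latest_entries: &[(u64, u64)]) -> u64 {
    // `latest_entries` holds (key_hash, entry_size) pairs for the newest version of each key.
    let mut seen = HashSet::new();
    let unique_size: u64 = latest_entries
        .iter()
        .filter(|entry| seen.insert(entry.0)) // ignore duplicates, should any slip through
        .map(|entry| entry.1)
        .sum();
    total_file_size.saturating_sub(unique_size)
}

fn main() {
    // A 1 KiB file whose live entries only account for 384 bytes => 640 bytes reclaimable.
    let savings = estimate_compaction_savings(1024, &[(1, 128), (2, 256)]);
    assert_eq!(savings, 640);
}
```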

Sources: src/storage_engine/data_store.rs:592-616

When to Compact

Compaction should be considered when:

| Condition | Description | Detection Method |
| --- | --- | --- |
| High Waste Ratio | Significant difference between total and unique entry sizes | estimate_compaction_savings() / file_size() > threshold |
| Frequent Updates | Many keys updated multiple times | Application-level tracking |
| Long-Running Storage | File has been in use for extended periods | Time-based policy |
| Before Backup | Minimize backup size | Pre-backup maintenance |

The compaction process at src/storage_engine/data_store.rs:727-741 includes logic to skip static index generation if it wouldn’t save space, indicating an awareness of space optimization trade-offs.

Sources: src/storage_engine/data_store.rs:605-616 src/storage_engine/data_store.rs:727-741


File Recovery

Chain Validation

The recover_valid_chain() method at src/storage_engine/data_store.rs:383-482 ensures data integrity by validating the backward-linked chain of entries:

Sources: src/storage_engine/data_store.rs:363-482

Validation Algorithm Details

The recovery process validates multiple constraints:

| Validation | Check | Purpose |
| --- | --- | --- |
| Metadata Size | cursor >= METADATA_SIZE | Ensure enough space for metadata read |
| Prev Offset Bounds | prev_tail < metadata_offset | Prevent circular references |
| Entry Start | entry_start < metadata_offset | Ensure valid entry range |
| Pre-padding | prepad_len(prev_tail) | Account for 64-byte alignment |
| Tombstone Detection | Single NULL byte check | Handle deletion markers |
| Chain Depth | Walk to offset 0 | Ensure complete chain |
| Total Size | total_size <= file_len | Prevent overflow |

Sources: src/storage_engine/data_store.rs:383-482

Corruption Detection and Handling

When DataStore::open() is called at src/storage_engine/data_store.rs:84-117, it automatically invokes recovery:

Sources: src/storage_engine/data_store.rs:84-117

Automatic Recovery Steps

The recovery process at src/storage_engine/data_store.rs:91-103:

  1. Detection : recover_valid_chain() returns final_len < file_len
  2. Warning : Logs truncation message with offsets
  3. Cleanup : Drops mmap and file handles
  4. Truncation : Re-opens file and calls set_len(final_len)
  5. Sync : Forces sync_all() to persist truncation
  6. Retry : Recursively calls open() with clean file

This ensures corrupted tail data is removed and the file is left in a valid state.
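The truncation step can be sketched with standard library calls only; the path, file contents, and validated length below are placeholders, not values taken from the real engine:

```rust
// Hedged sketch of the truncation step above using only std; the path, sizes,
// and validated length are placeholders, not values from the real engine.
use std::fs::OpenOptions;
use std::io;

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("recovery_demo.bin");
    std::fs::write(&path, vec![0u8; 1024])?; // pretend: 1 KiB file with a corrupted tail

    let final_len: u64 = 640; // last offset confirmed valid by the chain walk
    let file = OpenOptions::new().read(true).write(true).open(&path)?;
    file.set_len(final_len)?; // drop the corrupted tail
    file.sync_all()?;         // persist the truncation before re-opening the store

    assert_eq!(std::fs::metadata(&path)?.len(), final_len);
    Ok(())
}
```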

Sources: src/storage_engine/data_store.rs:89-103


Maintenance Operations

Re-indexing After Writes

The reindex() method at src/storage_engine/data_store.rs:224-259 updates internal structures after writes:

Key Operations:

  1. Re-map file : Creates new Mmap reflecting written data
  2. Update index : Inserts or removes key-hash-to-offset mappings
  3. Collision check : insert() returns error on hash collision (tag mismatch)
  4. Atomic update : Stores new tail_offset with Release ordering
  5. Visibility : New mmap makes written data visible to readers

Sources: src/storage_engine/data_store.rs:176-259

File Path and Metadata Access

| Method | Return Type | Purpose | Location |
| --- | --- | --- | --- |
| get_path() | PathBuf | Returns storage file path | src/storage_engine/data_store.rs:265-267 |
| file_size() | Result<u64> | Returns current file size | src/storage_engine/data_store.rs:1179-1181 |
| len() | Result<usize> | Returns count of unique keys | src/storage_engine/data_store.rs:1164-1171 |
| is_empty() | Result<bool> | Checks if storage has no entries | src/storage_engine/data_store.rs:1173-1177 |

These methods provide essential metadata for maintenance decision-making.

Sources: src/storage_engine/data_store.rs:261-267 src/storage_engine/data_store.rs:1164-1181

Iterators for Maintenance

The storage provides multiple iteration mechanisms useful for maintenance:

Iteration Characteristics:

Sources: src/storage_engine/data_store.rs:269-361


Summary

The maintenance system provides:

| Component | Purpose | Key Method |
| --- | --- | --- |
| Compaction | Reclaim space from outdated entries | compact() |
| Space Estimation | Calculate potential savings | estimate_compaction_savings() |
| Recovery | Validate and repair corrupted files | recover_valid_chain() |
| Re-indexing | Update structures after writes | reindex() |
| Iteration | Scan for maintenance operations | iter_entries(), par_iter_entries() |

All operations maintain the append-only guarantee while ensuring data integrity and efficient space utilization.

Sources: src/storage_engine/data_store.rs:1-1183


Network Layer and RPC

Relevant source files

Purpose and Scope

This document describes the network layer and Remote Procedure Call (RPC) system that enables remote access to the DataStore over WebSocket connections. The system is built on the Muxio framework and provides a standardized interface for clients in any language to communicate with a DataStore server.

For information about the core storage engine that the network layer wraps, see Core Storage Engine. For details on language-specific client implementations, see Python Integration and Native Rust Client.

Overview

The network layer provides a WebSocket-based RPC protocol that allows remote clients to perform DataStore operations. The architecture consists of three main components:

| Component | Purpose | Location |
| --- | --- | --- |
| Service Definition | Defines the RPC contract (methods and data types) | experiments/simd-r-drive-muxio-service-definition |
| WebSocket Server | Exposes DataStore operations over WebSocket | experiments/simd-r-drive-ws-server |
| WebSocket Client | Connects to remote servers and invokes RPC methods | experiments/simd-r-drive-ws-client |

The system uses bitcode for efficient binary serialization and tokio-tungstenite for WebSocket transport. The Muxio framework handles message framing, multiplexing, and concurrent request processing over a single WebSocket connection.

Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17 experiments/simd-r-drive-ws-server/Cargo.toml:1-23 experiments/simd-r-drive-ws-client/Cargo.toml:1-22

Architecture Overview

Diagram: Network Layer Component Architecture

The client and server both depend on simd-r-drive-muxio-service-definition to ensure they share the same RPC contract. The Muxio framework components (muxio-rpc-service-caller and muxio-rpc-service-endpoint) handle the RPC mechanics, while muxio-tokio-rpc-client and muxio-tokio-rpc-server provide the WebSocket transport layer.

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:13-21 experiments/simd-r-drive-ws-server/Cargo.toml:13-22 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:13-16

Service Definition Layer

The simd-r-drive-muxio-service-definition crate defines the RPC contract shared between client and server. This contract specifies:

  • RPC Methods : Operations that can be invoked remotely (e.g., read, write, compact)
  • Request/Response Types : Data structures for method parameters and return values
  • Error Types : Standardized error representations for network and storage errors

Key Components

The service definition uses the muxio-rpc-service framework to define typed RPC interfaces. All data types use bitcode for serialization, which provides:

  • Compact Binary Encoding : Efficient wire format optimized for small message sizes
  • Zero-Copy Deserialization : Where possible, data is accessed without copying
  • Type Safety : Compile-time guarantees that client and server speak the same protocol

Diagram: Service Definition Structure

The service trait defines method signatures, while request/response structs define the data exchanged. The bitcode derive macros generate serialization code automatically.

Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14-15 experiments/bindings/python-ws-client/Cargo.lock:133-143

Transport Layer

WebSocket Protocol

The network layer uses WebSocket as the transport protocol for several reasons:

| Feature | Benefit |
| --- | --- |
| Full Duplex | Server can push notifications to clients |
| Single Connection | Reduces connection overhead and latency |
| Firewall Friendly | Works through HTTP/HTTPS proxies |
| Binary Frames | Efficient for bitcode-encoded messages |

The transport is implemented using tokio-tungstenite, which provides async WebSocket support integrated with the Tokio runtime.

Sources: experiments/bindings/python-ws-client/Cargo.lock:1213-1222 experiments/bindings/python-ws-client/Cargo.lock:1303-1317

Muxio Framework

The Muxio framework provides the RPC layer on top of WebSocket:

Diagram: RPC Message Flow Through Muxio

The caller assigns a unique request_id to each RPC call, enabling multiplexing : multiple concurrent requests can be in flight over the same WebSocket connection. The endpoint matches responses to requests using these IDs.

Sources: experiments/bindings/python-ws-client/Cargo.lock:659-682 experiments/bindings/python-ws-client/Cargo.lock:685-700

Connection Management

Diagram: WebSocket Connection State Machine

The client maintains connection state and can implement automatic reconnection logic. The Muxio layer handles connection interruptions gracefully, returning errors for in-flight requests when the connection drops.

Sources: experiments/bindings/python-ws-client/Cargo.lock:685-700

Concurrency Model

The network layer is fully asynchronous and built on Tokio:

| Component | Concurrency Mechanism |
| --- | --- |
| Client | Multiple concurrent RPC calls multiplexed over one connection |
| Server | One task per connected client; concurrent request handling within each connection |
| Request Processing | Each RPC method invocation runs as a separate async task |

Client-Side Concurrency

The muxio-rpc-service-caller manages request IDs and matches responses to the correct awaiting future, enabling concurrent operations without blocking.

Server-Side Concurrency

The server spawns a new Tokio task for each incoming RPC request, allowing concurrent processing of multiple client requests. The underlying DataStore handles concurrent reads efficiently through its shared memory-mapped file access pattern (see Concurrency and Thread Safety).

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:19 experiments/simd-r-drive-ws-server/Cargo.toml:18 experiments/bindings/python-ws-client/Cargo.lock:1184-1199

Error Handling

The RPC layer distinguishes between different error categories:

Diagram: Error Type Hierarchy

Each error category provides different information to help diagnose issues:

  • Transport Errors : Indicate network connectivity problems
  • Protocol Errors : Suggest version mismatches or implementation bugs
  • Application Errors : Represent normal error conditions from DataStore operations

Sources: experiments/bindings/python-ws-client/Cargo.lock:659-669

Security Considerations

The current implementation provides:

| Security Feature | Status | Notes |
| --- | --- | --- |
| Encryption | Not implemented | Uses plain WebSocket (ws://) |
| Authentication | Not implemented | No built-in auth mechanism |
| Authorization | Not implemented | All connected clients have full access |

For production deployments, consider:

  1. Use WSS (WebSocket Secure) : Implement TLS encryption by placing the server behind a reverse proxy (nginx, Caddy, etc.)
  2. Implement Authentication : Add token-based auth in the WebSocket handshake
  3. Add Authorization : Implement per-key access controls in the server handler
  4. Rate Limiting : Protect against DoS by limiting request rates per client

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23

Performance Characteristics

Serialization Overhead

The bitcode serialization format is optimized for performance:

  • Small Message Size : Typically 30-50% smaller than JSON
  • Fast Encoding : Zero-copy for many types, SIMD-optimized where applicable
  • Predictable Layout : Fixed-size types don’t require length prefixes

Network Latency

RPC call latency consists of:

Total Latency = Serialization + Network RTT + Deserialization + Processing

For typical operations:

  • Serialization/Deserialization : <1ms for small payloads
  • Network RTT : Depends on network conditions (LAN: <1ms, WAN: 10-100ms)
  • Processing : Varies by operation (read: <1ms, write: ~1-5ms with flush)

Connection Multiplexing

A single WebSocket connection can handle thousands of concurrent RPC calls, limited only by:

  • Available memory for buffering requests
  • Server processing capacity
  • Network bandwidth

Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14 experiments/bindings/python-ws-client/Cargo.lock:133-143

Dependency Graph

Diagram: Complete Dependency Graph for Network Layer

The layered architecture ensures clean separation of concerns:

  • Application Layer : Client and server applications
  • simd-r-drive Components : Project-specific RPC wrappers and service definitions
  • Muxio Framework : Generic RPC infrastructure
  • Transport & Runtime: Low-level WebSocket and async runtime

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:13-21 experiments/simd-r-drive-ws-server/Cargo.toml:13-22 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:13-16 experiments/bindings/python-ws-client/Cargo.lock:648-700


WebSocket Server

Relevant source files

Purpose and Scope

The WebSocket Server provides remote network access to the SIMD R Drive storage engine through an RPC-based interface. This experimental component enables clients to perform storage operations over WebSocket connections using the muxio RPC framework with bitcode serialization.

This page covers the server implementation, configuration, and connection handling. For information about the RPC protocol and service definitions, see Muxio RPC Framework. For client-side implementation, see Native Rust Client. For the underlying storage operations, see DataStore API.

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23


Server Architecture

The WebSocket Server acts as a bridge between remote clients and the local storage engine, handling WebSocket connections, deserializing RPC requests, executing storage operations, and returning serialized responses.

graph TB
    subgraph "simd-r-drive-ws-server"
        MAIN["main.rs\nServer Entry Point"]
CLI["clap Parser\nCLI Arguments"]
SERVER["muxio-tokio-rpc-server\nWebSocket Server"]
SERVICE["Service Implementation\nRPC Handler"]
end
    
    subgraph "Dependencies"
        DEFINITION["simd-r-drive-muxio-service-definition\nService Contract"]
CORE["simd-r-drive\nDataStore"]
TOKIO["tokio Runtime\nAsync Executor"]
TRACING["tracing-subscriber\nLogging"]
end
    
    subgraph "Network"
        WS_CONN["WebSocket Connection\ntokio-tungstenite"]
CLIENT["Remote Client"]
end
    
 
   MAIN --> CLI
 
   MAIN --> TRACING
 
   MAIN --> SERVER
 
   SERVER --> SERVICE
 
   SERVICE --> DEFINITION
 
   SERVICE --> CORE
 
   SERVER --> TOKIO
 
   SERVER --> WS_CONN
 
   WS_CONN --> CLIENT
    
    style SERVER fill:#f9f9f9
    style SERVICE fill:#f9f9f9
    style CORE fill:#e8e8e8

Component Overview

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-23


Server Implementation

Package Structure

The simd-r-drive-ws-server package implements the WebSocket server as an experimental component in the workspace. It provides a binary that can be executed to start the server.

| Component | Purpose |
| --- | --- |
| main.rs | Server entry point, CLI parsing, initialization |
| Service Handler | Implements the RPC service interface defined in simd-r-drive-muxio-service-definition |
| DataStore Wrapper | Manages the local DataStore instance and access |

Service Implementation Flow

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-18


Configuration and Startup

Command-Line Arguments

The server uses clap for CLI argument parsing with the derive feature, providing a structured interface for configuration.

Expected Configuration Options

| Option | Type | Purpose |
| --- | --- | --- |
| --host / -h | String | Bind address (e.g., 127.0.0.1, 0.0.0.0) |
| --port / -p | u16 | Port number (e.g., 9001) |
| --path | PathBuf | Storage file path for DataStore |
| --log-level | String | Logging level (trace, debug, info, warn, error) |
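A hedged sketch of what a clap derive interface for these options could look like; the flag names mirror the "expected" options above and are illustrative, not the server's confirmed CLI:

```rust
// Hedged sketch of CLI parsing with clap's derive feature; the flags mirror the
// "expected" options above and are illustrative, not the server's confirmed CLI.
use clap::Parser;
use std::path::PathBuf;

#[derive(Parser, Debug)]
#[command(name = "simd-r-drive-ws-server")]
struct Args {
    /// Bind address, e.g. 127.0.0.1 or 0.0.0.0
    #[arg(long, default_value = "127.0.0.1")]
    host: String,

    /// Port number to listen on
    #[arg(long, default_value_t = 9001)]
    port: u16,

    /// Path to the single-file storage container
    #[arg(long)]
    path: PathBuf,

    /// Logging level (trace, debug, info, warn, error)
    #[arg(long, default_value = "info")]
    log_level: String,
}

fn main() {
    let args = Args::parse();
    println!("binding {}:{}, store at {}", args.host, args.port, args.path.display());
}
```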

Initialization Sequence

The server initialization creates a local DataStore instance which is then accessed by the service handler for all RPC operations. The tracing-subscriber dependency with env-filter feature allows runtime configuration of logging levels.

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:19-22


Connection Handling

WebSocket Lifecycle

The muxio-tokio-rpc-server package handles the WebSocket protocol details, including:

  1. Connection Upgrade : HTTP to WebSocket protocol upgrade
  2. Message Framing : Binary message framing over WebSocket
  3. Multiplexing : Multiple concurrent RPC calls over a single connection
  4. Error Handling : Connection errors and RPC-level errors

Connection State Management

Each WebSocket connection is handled by a separate tokio task, allowing concurrent client connections without blocking.

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:16-18


Service Implementation Details

RPC Service Interface

The service implementation must implement the interface defined in simd-r-drive-muxio-service-definition. This shared contract ensures type-safe communication between client and server.

Service Methods

Based on the DataStore API and typical RPC patterns, the service likely implements these methods:

| Method | Request Type | Response Type | Purpose |
| --- | --- | --- | --- |
| write | (Vec<u8>, Vec<u8>) | Result<(), Error> | Write key-value pair |
| read | Vec<u8> | Result<Option<Vec<u8>>, Error> | Read value by key |
| delete | Vec<u8> | Result<(), Error> | Mark key as deleted |
| batch_write | Vec<(Vec<u8>, Vec<u8>)> | Result<(), Error> | Write multiple pairs |
| batch_read | Vec<Vec<u8>> | Result<Vec<Option<Vec<u8>>>, Error> | Read multiple values |
| compact | () | Result<(), Error> | Trigger compaction |

Service Handler Structure

The service handler wraps a shared reference to the DataStore (likely Arc<DataStore>) to allow concurrent read access across multiple RPC calls while serializing writes through the DataStore’s internal concurrency control.
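The sharing pattern itself is plain Rust: one store behind an Arc, cloned into each handler task. The sketch below uses a stand-in Store type and OS threads in place of the real DataStore and tokio tasks:

```rust
// Minimal sketch of the Arc-sharing pattern described above. `Store` is a
// stand-in for simd_r_drive::DataStore, and OS threads stand in for tokio tasks.
use std::sync::Arc;

struct Store; // placeholder for the real DataStore

impl Store {
    fn read(&self, _key: &[u8]) -> Option<Vec<u8>> {
        None // placeholder; the real store returns a zero-copy entry handle
    }
}

fn main() {
    let store = Arc::new(Store);

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let store = Arc::clone(&store); // cheap pointer clone per connection handler
            std::thread::spawn(move || store.read(b"some-key"))
        })
        .collect();

    for handle in handles {
        assert!(handle.join().unwrap().is_none());
    }
}
```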

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-17 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17


Dependencies and Runtime

Core Dependencies

muxio-tokio-rpc-server

The muxio-tokio-rpc-server package provides the WebSocket server implementation built on top of:

  • axum for HTTP/WebSocket handling
  • tokio for async runtime
  • muxio-rpc-service-endpoint for RPC dispatch
graph TB
    subgraph "tokio Runtime"
        MULTI["Multi-threaded\nExecutor"]
REACTOR["Reactor\nI/O Events"]
TIMER["Timer\nTimeouts"]
end
    
    subgraph "Server Tasks"
        LISTENER["Listener Task\nAccept Connections"]
CONN1["Connection Task 1\nClient 1"]
CONN2["Connection Task 2\nClient 2"]
CONNN["Connection Task N\nClient N"]
end
    
 
   MULTI --> LISTENER
 
   MULTI --> CONN1
 
   MULTI --> CONN2
 
   MULTI --> CONNN
    
 
   REACTOR --> LISTENER
 
   REACTOR --> CONN1
 
   REACTOR --> CONN2
 
   REACTOR --> CONNN

Serialization

The service uses bitcode for binary serialization, shared through the simd-r-drive-muxio-service-definition package. Bitcode provides compact binary encoding with zero-copy deserialization where possible.

Async Runtime

The server runs on the tokio multi-threaded runtime, with each WebSocket connection handled by an independent task. This allows efficient concurrent handling of multiple clients.

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:16-20


Logging and Observability

tracing Integration

The server uses tracing with tracing-subscriber for structured logging. The env-filter feature allows configuration via environment variables:
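A minimal initialization sketch, assuming tracing-subscriber with the env-filter feature (so an environment variable such as RUST_LOG=debug controls verbosity):

```rust
// Minimal sketch of environment-driven log configuration; assumes the tracing and
// tracing-subscriber crates (env-filter feature), e.g. run with RUST_LOG=debug.
fn main() {
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .init();

    tracing::info!("server starting");
}
```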

Log Levels

| Level | Use Case |
| --- | --- |
| trace | Detailed RPC message tracing |
| debug | Connection lifecycle events |
| info | Server startup, configuration, client connections |
| warn | Non-fatal errors, retries |
| error | Fatal errors, connection failures |

Key Trace Points

The server likely includes trace instrumentation at:

  • Server initialization and configuration
  • WebSocket connection establishment
  • RPC method dispatch
  • DataStore operation execution
  • Error conditions and recovery

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:19-20


Building and Running

Build Command

Run Command Example

Development Mode

For development with detailed logging:

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23


Security Considerations

Network Exposure

The WebSocket server exposes the DataStore over the network. Important considerations:

  1. Authentication : The current implementation does not include authentication (experimental status)
  2. Encryption : WebSocket connections are not TLS-encrypted by default
  3. Access Control : No per-key or per-operation access control
  4. Network Binding : Binding to 0.0.0.0 exposes the server to all network interfaces

For production deployment, additional layers would be needed:

  • TLS/SSL termination (via reverse proxy or native support)
  • Authentication middleware
  • Rate limiting
  • Request validation
  • Network-level access control (firewall rules)

Note: This is an experimental component and should not be used in production without additional security hardening.

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-11


Integration with Core Storage

DataStore Access Pattern

The server maintains a single DataStore instance that is shared across all RPC handlers. Write operations serialize through the DataStore’s internal RwLock, while read operations can proceed concurrently through the lock-free DashMap index.

Thread Safety

The server implementation relies on the DataStore’s thread-safe design:

  • Multiple concurrent reads via DashMap
  • Serialized writes via RwLock
  • Atomic tail offset tracking via AtomicU64

This allows multiple WebSocket connections to safely access the same DataStore instance without additional synchronization.

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:14-15


Muxio RPC Framework

Relevant source files

Purpose and Scope

This document describes the Muxio RPC (Remote Procedure Call) framework as implemented in SIMD R Drive for remote storage access over WebSocket connections. The framework provides a type-safe, multiplexed communication protocol using bitcode serialization for efficient binary data transfer.

For information about the WebSocket server implementation, see WebSocket Server. For the native Rust client implementation, see Native Rust Client. For Python client integration, see Python WebSocket Client API.

Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23 experiments/simd-r-drive-ws-client/Cargo.toml:1-22 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17


Architecture Overview

The Muxio RPC framework consists of multiple layers that work together to provide remote procedure calls over WebSocket connections:

Muxio RPC Framework Layer Architecture

graph TB
    subgraph "Client Application Layer"
        App["Application Code"]
end
    
    subgraph "Client RPC Stack"
        Caller["muxio-rpc-service-caller\nMethod Invocation"]
ClientRuntime["muxio-tokio-rpc-client\nWebSocket Client Runtime"]
end
    
    subgraph "Shared Contract"
        ServiceDef["simd-r-drive-muxio-service-definition\nService Interface Contract\nMethod Signatures"]
Bitcode["bitcode\nBinary Serialization"]
end
    
    subgraph "Server RPC Stack"
        ServerRuntime["muxio-tokio-rpc-server\nWebSocket Server Runtime"]
Endpoint["muxio-rpc-service-endpoint\nRequest Router"]
end
    
    subgraph "Server Application Layer"
        Impl["DataStore Implementation"]
end
    
    subgraph "Core Framework"
        Core["muxio-rpc-service\nBase RPC Traits & Types"]
end
    
 
   App --> Caller
 
   Caller --> ClientRuntime
 
   ClientRuntime --> ServiceDef
 
   ClientRuntime --> Bitcode
 
   ClientRuntime --> Core
    
 
   ServiceDef --> Bitcode
 
   ServiceDef --> Core
    
 
   ServerRuntime --> ServiceDef
 
   ServerRuntime --> Bitcode
 
   ServerRuntime --> Core
 
   ServerRuntime --> Endpoint
 
   Endpoint --> Impl
    
    style ServiceDef fill:#f9f9f9,stroke:#333,stroke-width:2px

The framework is organized into distinct layers:

| Layer | Crates | Responsibility |
| --- | --- | --- |
| Core Framework | muxio-rpc-service | Base traits, types, and RPC protocol definitions |
| Service Definition | simd-r-drive-muxio-service-definition | Shared interface contract between client and server |
| Serialization | bitcode | Efficient binary encoding/decoding of messages |
| Client Runtime | muxio-tokio-rpc-client, muxio-rpc-service-caller | WebSocket client, method invocation, request management |
| Server Runtime | muxio-tokio-rpc-server, muxio-rpc-service-endpoint | WebSocket server, request routing, response handling |

Sources: Cargo.lock:1250-1336 experiments/simd-r-drive-ws-server/Cargo.toml:14-17 experiments/simd-r-drive-ws-client/Cargo.toml:14-21


Core Framework Components

muxio-rpc-service

The muxio-rpc-service crate provides the foundational abstractions for the RPC system. This crate defines the core traits and types that both client and server components build upon.

Core RPC Framework Message Structure and Dependencies

graph TB
    subgraph "muxio-rpc-service Crate"
        RpcService["#[async_trait]\nRpcService Trait"]
Request["RpcRequest\nStruct"]
Response["RpcResponse\nStruct"]
ServiceDef["Service Definition\nInfrastructure"]
end
    
    subgraph "RpcRequest Fields"
        ReqID["request_id: u64\n(unique per call)"]
MethodID["method_id: u64\n(xxhash-rust XXH3)"]
Payload["payload: Vec&lt;u8&gt;\n(bitcode serialized)"]
end
    
    subgraph "RpcResponse Fields"
        RespID["request_id: u64\n(matches request)"]
Result["result: Result&lt;Vec&lt;u8&gt;, Error&gt;\n(bitcode serialized)"]
end
    
    subgraph "Dependencies"
        AsyncTrait["async-trait"]
Futures["futures"]
NumEnum["num_enum"]
XXHash["xxhash-rust"]
end
    
 
   RpcService -->|defines| ServiceDef
 
   Request -->|contains| ReqID
 
   Request -->|contains| MethodID
 
   Request -->|contains| Payload
 
   Response -->|contains| RespID
 
   Response -->|contains| Result
    
    RpcService -.uses.- AsyncTrait
    MethodID -.hashed with.- XXHash

The muxio-rpc-service crate provides:

| Component | Type | Purpose |
| --- | --- | --- |
| RpcService | #[async_trait] trait | Defines async service interface with method dispatch |
| RpcRequest | Struct | Contains request_id, method_id (XXH3 hash from xxhash-rust), and bitcode payload |
| RpcResponse | Struct | Contains request_id and Result<Vec<u8>, Error> variant |
| Method ID hashing | xxhash-rust XXH3 | Generates stable 64-bit method identifiers |
| Enum conversion | num_enum | Converts between numeric and enum representations |

The framework uses async-trait to enable async methods in traits, and XXH3 hashing (via xxhash-rust) for method identification, allowing fast O(1) method dispatch without string comparisons.

Sources: Cargo.lock:1261-1272 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:15


Service Definition Layer

simd-r-drive-muxio-service-definition

The simd-r-drive-muxio-service-definition crate serves as the shared RPC contract between clients and servers. This crate is compiled into both client and server binaries, ensuring type-safe method signatures on both sides.

Service Definition Compilation Model

graph TB
    subgraph "simd-r-drive-muxio-service-definition"
        Contract["RPC Service Contract"]
Methods["Method Signatures"]
Types["Shared Types"]
end
    
    subgraph "Client Binary"
        ClientStub["Generated Client Stubs"]
end
    
    subgraph "Server Binary"
        ServerImpl["Generated Server Handlers"]
end
    
 
   Contract --> Methods
 
   Contract --> Types
 
   Methods -->|compiled into| ClientStub
 
   Methods -->|compiled into| ServerImpl
 
   Types -->|used by| ClientStub
 
   Types -->|used by| ServerImpl
    
 
   ClientStub -->|invokes via| WS["WebSocket"]
WS -->|routes to| ServerImpl

The service definition provides the RPC interface contract. Both client and server depend on this crate, which defines:

| Component | Description | Implementation |
| --- | --- | --- |
| Method signatures | DataStore operations (write, read, delete, etc.) | Uses muxio-rpc-service traits |
| Request types | Bitcode-serializable structs for each method | Implements bitcode::Encode |
| Response types | Bitcode-serializable result types | Implements bitcode::Decode |
| Error types | Shared error definitions | Serializable across RPC boundary |

Method ID Generation

Each RPC method is identified by a stable method_id computed as the XXH3 hash of its signature string. This enables O(1) method routing:

Method ID Computation and Routing with Code Entities

flowchart LR
    Sig["Method Signature\n'write(key: &[u8], value: &[u8])\n-> Result&lt;u64&gt;'"]
XXH3["xxhash_rust::xxh3\nxxh3_64(sig.as_bytes())"]
ID["method_id: u64\ne.g., 0x1a2b3c4d5e6f7890"]
HashMap["HashMap&lt;u64,\nBox&lt;dyn Fn&gt;&gt;\nin RpcServiceEndpoint"]
Lookup["HashMap::get\n(&method_id)"]
Handler["async fn handler\n(decoded args)"]
Sig -->|hash at compile time| XXH3
 
   XXH3 --> ID
 
   ID -->|stored in| HashMap
 
   HashMap -->|O 1 lookup| Lookup
 
   Lookup --> Handler

The XXH3 hash (via xxhash-rust crate) ensures:

| Property | Implementation | Benefit |
| --- | --- | --- |
| Deterministic routing | xxh3_64(signature.as_bytes()) | Same signature → same ID |
| Fast dispatch | HashMap::get(&method_id) | O(1) integer key lookup |
| Version compatibility | Different signatures → different IDs | Breaking changes detected |
| Collision resistance | 64-bit hash space (2^64 values) | Negligible collision probability |
| Compile-time computation | const or build-time hashing | No runtime overhead |

The xxhash-rust dependency provides the xxh3_64 function used by muxio-rpc-service for method ID generation. The server’s RpcServiceEndpoint struct maintains the HashMap<u64, Box<dyn Fn>> dispatcher.
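A small sketch of this dispatch idea, assuming xxhash-rust for the hash; the signature string and handler map below are illustrative, not the framework's actual wire format:

```rust
// Hedged sketch of method-ID generation and O(1) dispatch as described above:
// hash a method signature string with XXH3 and key a handler map by the result.
use std::collections::HashMap;
use xxhash_rust::xxh3::xxh3_64;

fn main() {
    // Illustrative signature string; the real format is defined by the framework.
    let signature = "write(key: &[u8], value: &[u8]) -> Result<u64>";
    let method_id: u64 = xxh3_64(signature.as_bytes());

    // The endpoint keeps a map from method_id to its handler for O(1) dispatch.
    let mut handlers: HashMap<u64, fn(&[u8]) -> Vec<u8>> = HashMap::new();
    handlers.insert(method_id, |_payload| Vec::new());

    assert!(handlers.contains_key(&xxh3_64(signature.as_bytes())));
}
```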

Sources: Cargo.lock:1261-1272 Cargo.lock:1905-1915 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17


Bitcode Serialization

The framework uses the bitcode crate (version 0.6.6) for efficient binary serialization with the following characteristics:

graph LR
    subgraph "Bitcode Serialization Pipeline"
        RustType["Rust Type\n#[derive(Encode, Decode)]"]
Encode["bitcode::encode\n&lt;T&gt;(&value)"]
Binary["Vec&lt;u8&gt;\nCompact Binary"]
Decode["bitcode::decode\n&lt;T&gt;(&bytes)"]
RustType2["Rust Type\nReconstructed"]
end
    
    subgraph "bitcode Dependencies"
        BitcodeDerve["bitcode_derive\nproc macros"]
Bytemuck["bytemuck\nzero-copy casts"]
Arrayvec["arrayvec\nstack arrays"]
Glam["glam\nSIMD vectors"]
end
    
 
   RustType -->|serialize| Encode
 
   Encode --> Binary
 
   Binary -->|deserialize| Decode
 
   Decode --> RustType2
    
    Encode -.uses.- BitcodeDerve
    Encode -.uses.- Bytemuck
    Decode -.uses.- BitcodeDerve
    Decode -.uses.- Bytemuck

Serialization Features

Bitcode Encoding/Decoding Pipeline with Dependencies

| Feature | Implementation | Benefit |
| --- | --- | --- |
| Zero-copy deserialization | bytemuck for Pod types | Minimal overhead for aligned data |
| Compact encoding | Variable-length integers, bit packing | Smaller than bincode/MessagePack |
| Type safety | #[derive(Encode, Decode)] proc macros | Compile-time serialization code |
| Performance | ~50ns per small struct | Lower CPU than JSON/CBOR |
| SIMD support | glam integration | Efficient vector serialization |

Integration with RPC

The serialization is integrated at multiple points:

| Integration Point | Operation | Code Path |
| --- | --- | --- |
| Request serialization | bitcode::encode(&args) → Vec<u8> | Client RpcServiceCaller::call |
| Wire transfer | Vec<u8> in RpcRequest.payload | WebSocket binary message |
| Request deserialization | bitcode::decode::<Args>(&payload) | Server RpcServiceEndpoint::dispatch |
| Response serialization | bitcode::encode(&result) → Vec<u8> | Server after method execution |
| Response deserialization | bitcode::decode::<Result>(&payload) | Client response handler |

The use of #[derive(Encode, Decode)] on request/response types ensures compile-time validation of serialization compatibility.
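A minimal round-trip sketch with the bitcode derive macros; the WriteRequest type is illustrative, not the service definition's actual struct:

```rust
// Hedged sketch of a bitcode-serializable request type and an encode/decode
// round trip; assumes the bitcode crate with its derive feature.
use bitcode::{Decode, Encode};

#[derive(Encode, Decode, Debug, PartialEq)]
struct WriteRequest {
    key: Vec<u8>,
    value: Vec<u8>,
}

fn main() {
    let request = WriteRequest {
        key: b"user:42".to_vec(),
        value: b"hello".to_vec(),
    };

    let bytes: Vec<u8> = bitcode::encode(&request); // compact binary payload
    let decoded: WriteRequest = bitcode::decode(&bytes).expect("round trip");

    assert_eq!(decoded, request);
}
```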

Sources: Cargo.lock:392-414 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14


Client-Side Components

flowchart TB
    subgraph "Client Call Flow"
        ClientApp["Client Application"]
Caller["RpcServiceCaller\nStruct"]
GenID["Generate request_id\n(AtomicU64::fetch_add)"]
Request["Create RpcRequest\nStruct"]
Serialize["bitcode::encode\n(method args)"]
Send["Send via\ntokio::sync::mpsc"]
Await["tokio::sync::oneshot\nawait response"]
Deserialize["bitcode::decode\n(response payload)"]
Return["Return Result\nto caller"]
end
    
 
   ClientApp -->|async fn call| Caller
 
   Caller --> GenID
 
   GenID --> Request
 
   Request --> Serialize
 
   Serialize --> Send
 
   Send --> Await
 
   Await --> Deserialize
 
   Deserialize --> Return
 
   Return --> ClientApp

muxio-rpc-service-caller

The muxio-rpc-service-caller crate provides the client-side method invocation interface:

Client Method Invocation Flow with tokio Primitives

Key responsibilities and implementation:

| Responsibility | Implementation | Purpose |
| --- | --- | --- |
| Method call marshalling | RpcServiceCaller struct | Provides typed interface to remote methods |
| Request ID generation | AtomicU64::fetch_add(1, Ordering::Relaxed) | Unique, monotonic request identifiers |
| Response awaiting | tokio::sync::oneshot::Receiver | Single-use channel for response delivery |
| Request queuing | tokio::sync::mpsc::Sender | Sends requests to send loop |
| Error propagation | Result<T, RpcError> return types | Type-safe error handling |

The caller uses tokio’s async primitives to coordinate between the application thread and the WebSocket send/receive loops.

Sources: Cargo.lock:1274-1285 experiments/simd-r-drive-ws-client/Cargo.toml:18

graph TB
    subgraph "muxio-tokio-rpc-client Crate"
        Client["RpcClient\nStruct"]
SendLoop["send_loop\ntokio::task::spawn"]
RecvLoop["recv_loop\ntokio::task::spawn"]
PendingMap["Arc&lt;DashMap&lt;u64,\noneshot::Sender&lt;Result&gt;&gt;&gt;\nShared state"]
ReqChan["mpsc::Receiver\n&lt;RpcRequest&gt;"]
end
    
    subgraph "tokio-tungstenite Integration"
        WS["WebSocketStream\n&lt;MaybeTlsStream&gt;"]
Split["ws.split()"]
WSRead["SplitStream\n(read half)"]
WSWrite["SplitSink\n(write half)"]
end
    
    subgraph "Application Layer"
        AppCall["async fn call()"]
Future["impl Future\n&lt;Output=Result&gt;"]
end
    
 
   AppCall -->|1. create oneshot| Client
 
   Client -->|2. insert into| PendingMap
 
   Client -->|3. mpsc::send| ReqChan
 
   ReqChan -->|4. recv request| SendLoop
 
   SendLoop -->|5. bitcode::encode| SendLoop
 
   SendLoop -->|6. send binary| WSWrite
 
   WSRead -->|7. next binary| RecvLoop
 
   RecvLoop -->|8. bitcode::decode| RecvLoop
 
   RecvLoop -->|9. lookup by id| PendingMap
 
   PendingMap -->|10. oneshot::send| Future
 
   Future -->|11. return| AppCall
    
 
   WS --> Split
 
   Split --> WSRead
 
   Split --> WSWrite

muxio-tokio-rpc-client

The muxio-tokio-rpc-client crate implements the WebSocket client runtime with request multiplexing and response routing:

Client Runtime Request Multiplexing with tokio and tungstenite

Implementation details:

| Component | Type | Purpose |
| --- | --- | --- |
| RpcClient | Struct | Main client interface, owns WebSocket and spawns tasks |
| send_loop | tokio::task | Receives from mpsc, serializes, writes to SplitSink |
| recv_loop | tokio::task | Reads from SplitStream, deserializes, routes via DashMap |
| Pending requests | Arc<DashMap<u64, oneshot::Sender>> | Thread-safe map for response routing |
| Request channel | mpsc::Sender/Receiver<RpcRequest> | Queue for outbound requests |
| WebSocket | tokio_tungstenite::WebSocketStream | Binary WebSocket with TLS support |
| Split streams | futures::stream::SplitStream/SplitSink | Separate read/write halves |

The multiplexing architecture uses DashMap for lock-free concurrent access to pending requests. The WebSocket stream is split into read and write halves, allowing the send_loop and recv_loop tasks to operate independently. Each request gets a unique request_id, and the recv_loop task matches responses back to waiting callers via oneshot channels.
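The routing pattern can be sketched in isolation, assuming the dashmap and tokio crates; a spawned task stands in for the WebSocket recv loop:

```rust
// Hedged sketch of the response-routing pattern above; assumes the dashmap crate
// and tokio (rt, macros, sync). A spawned task stands in for the WebSocket recv loop.
use dashmap::DashMap;
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::oneshot;

#[tokio::main]
async fn main() {
    let pending: Arc<DashMap<u64, oneshot::Sender<Vec<u8>>>> = Arc::new(DashMap::new());
    let next_id = AtomicU64::new(0);

    // Caller side: register a oneshot sender under a fresh request_id.
    let request_id = next_id.fetch_add(1, Ordering::Relaxed);
    let (tx, rx) = oneshot::channel();
    pending.insert(request_id, tx);

    // recv-loop side: on a response carrying this request_id, remove the entry
    // and fire the channel so the awaiting caller wakes up.
    let pending_clone = Arc::clone(&pending);
    tokio::spawn(async move {
        if let Some((_, tx)) = pending_clone.remove(&request_id) {
            let _ = tx.send(b"response payload".to_vec());
        }
    });

    let response = rx.await.expect("response delivered");
    assert_eq!(response, b"response payload".to_vec());
}
```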

Sources: Cargo.lock:1302-1318 experiments/simd-r-drive-ws-client/Cargo.toml:16 Cargo.lock:681-693


Server-Side Components

graph TB
    subgraph "muxio-tokio-rpc-server Crate"
        Server["RpcServer\nStruct"]
AxumApp["axum::Router\nwith WebSocket route"]
AcceptLoop["tokio::spawn\n(per connection)"]
ConnHandler["handle_connection\nasync fn"]
Dispatcher["RpcServiceEndpoint\n&lt;ServiceImpl&gt;"]
end
    
    subgraph "axum WebSocket Integration"
        Route["GET /ws\nWebSocket upgrade"]
WSUpgrade["axum::extract::ws\nWebSocketUpgrade"]
WSStream["axum::extract::ws\nWebSocket"]
end
    
    subgraph "Service Implementation"
        ServiceImpl["Arc&lt;ServiceImpl&gt;\n(e.g., DataStore)"]
Methods["#[async_trait]\nRpcService methods"]
end
    
    subgraph "Method Dispatch"
        MethodMap["HashMap&lt;u64,\nBox&lt;dyn Fn&gt;&gt;\n(method_id → handler)"]
end
    
 
   AxumApp -->|upgrade| WSUpgrade
 
   WSUpgrade -->|on_upgrade| WSStream
 
   WSStream -->|tokio::spawn| AcceptLoop
 
   AcceptLoop --> ConnHandler
 
   ConnHandler -->|recv Message::Binary| ConnHandler
 
   ConnHandler -->|bitcode::decode| ConnHandler
 
   ConnHandler -->|dispatch by id| MethodMap
 
   MethodMap -->|invoke| Methods
    Methods -.implemented by.- ServiceImpl
 
   Methods -->|return Result| ConnHandler
 
   ConnHandler -->|bitcode::encode| ConnHandler
 
   ConnHandler -->|send Message::Binary| WSStream
    
 
   Dispatcher -->|owns| MethodMap
 
   Dispatcher -->|holds Arc| ServiceImpl

muxio-tokio-rpc-server

The muxio-tokio-rpc-server crate implements the WebSocket server runtime with connection management and request dispatching:

Server Runtime with axum WebSocket Integration

The server runtime architecture:

| Component | Type | Purpose |
|---|---|---|
| RpcServer | Struct | Main server, creates axum::Router with WebSocket route |
| axum::Router | HTTP router | Handles WebSocket upgrade at /ws endpoint |
| WebSocketUpgrade | axum::extract | Performs HTTP → WebSocket protocol upgrade |
| Connection handler | async fn per client | Spawned via tokio::spawn for each connection |
| RpcServiceEndpoint | Generic struct | Routes method_id to service methods via HashMap |
| Method dispatcher | HashMap<u64, Box<dyn Fn>> | O(1) lookup and async invocation of methods |
| Service implementation | Arc<ServiceImpl> | Shared DataStore instance across connections |

Request Processing Pipeline

Each incoming request follows this pipeline:

Server Request Processing Pipeline with Code Entities

The dispatcher performs an O(1) lookup of the incoming method_id in the HashMap, then invokes the corresponding service implementation. All service methods use #[async_trait], allowing concurrent request handling, and the Arc<ServiceImpl> wrapper enables safe sharing of the DataStore across multiple client connections.

Sources: Cargo.lock:1320-1336 experiments/simd-r-drive-ws-server/Cargo.toml:16 Cargo.lock:305-340


Request/Response Flow

Complete RPC Call Sequence

End-to-End RPC Call Flow

Message Format

The Muxio RPC wire protocol uses WebSocket binary frames with bitcode-encoded messages. The exact frame structure is managed by the muxio framework, but the logical message structure is:

| Component | Encoding | Description |
|---|---|---|
| Request message | bitcode | Contains request_id, method_id, and method arguments |
| Response message | bitcode | Contains request_id and result (success/error) |
| WebSocket frame | Binary | Single frame per request/response for small messages |
| Fragmentation | Automatic | Large payloads may use multiple frames |

The use of WebSocket binary frames and bitcode serialization provides:

  • Compact encoding : Smaller than JSON or MessagePack
  • Zero-copy potential : bitcode can deserialize without copying
  • Type safety : Compile-time verification of message structure

Sources: Cargo.lock:133-143 Cargo.lock:648-656 Cargo.lock:1213-1222


Error Handling

The framework provides comprehensive error handling across the RPC boundary:

RPC Error Classification and Propagation

Error Categories

| Category | Origin | Handling |
|---|---|---|
| Serialization errors | Bitcode encoding/decoding failure | Logged and returned as RpcError |
| Network errors | WebSocket connection issues | Automatic reconnect or error propagation |
| Application errors | DataStore operation failures | Serialized and returned to client |
| Timeout errors | Request took too long | Client-side timeout with error result |

Error Recovery

The framework implements several recovery strategies:

  • Connection loss : Client automatically attempts reconnection
  • Request timeout : Client cancels pending request after configured duration
  • Serialization failure : Error logged and generic error returned
  • Invalid method ID : Server returns “method not found” error

Sources: Cargo.lock:1261-1336


Performance Characteristics

The Muxio RPC framework is optimized for high-performance remote storage access:

| Metric | Characteristic | Impact |
|---|---|---|
| Serialization overhead | ~50-100 ns for typical payloads | Minimal CPU impact |
| Request multiplexing | Thousands of concurrent requests | High throughput |
| Binary protocol | Compact wire format | Reduced bandwidth usage |
| Zero-copy deserialization | Direct memory references | Lower latency for large payloads |

The use of bitcode serialization and WebSocket binary frames minimizes overhead compared to text-based protocols like JSON over HTTP. The multiplexed architecture allows clients to issue multiple concurrent requests without blocking, essential for high-performance batch operations.

Sources: Cargo.lock:392-414 Cargo.lock:1250-1336


Native Rust Client

Relevant source files

Purpose and Scope

The simd-r-drive-ws-client crate provides a native Rust client library for remote access to the SIMD R Drive storage engine via WebSocket connections. This client enables Rust applications to interact with a remote DataStore instance through the Muxio RPC framework with bitcode serialization.

This document covers the native Rust client implementation. For the WebSocket server that this client connects to, see WebSocket Server. For the Muxio RPC protocol details, see Muxio RPC Framework. For the Python bindings that wrap this client, see Python WebSocket Client API.

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:1-22

Architecture Overview

The native Rust client is structured as a thin wrapper around the Muxio RPC client infrastructure, providing type-safe access to remote DataStore operations.

Key Components:

graph TB
    subgraph "Application Layer"
        UserApp["User Application\nRust Code"]
end
    
    subgraph "simd-r-drive-ws-client Crate"
        ClientAPI["Client API\nDataStoreReader/Writer Traits"]
WsClient["WebSocket Client\nConnection Management"]
end
    
    subgraph "RPC Infrastructure"
        ServiceCaller["muxio-rpc-service-caller\nMethod Invocation"]
TokioRpcClient["muxio-tokio-rpc-client\nTransport Layer"]
end
    
    subgraph "Shared Contract"
        ServiceDef["simd-r-drive-muxio-service-definition\nRPC Interface"]
Bitcode["bitcode\nSerialization"]
end
    
    subgraph "Network"
        WsConnection["WebSocket Connection\ntokio-tungstenite"]
end
    
 
   UserApp --> ClientAPI
 
   ClientAPI --> WsClient
 
   WsClient --> ServiceCaller
 
   ServiceCaller --> TokioRpcClient
 
   ServiceCaller --> ServiceDef
 
   TokioRpcClient --> Bitcode
 
   TokioRpcClient --> WsConnection
    
    style ClientAPI fill:#f9f9f9,stroke:#333,stroke-width:2px
    style ServiceDef fill:#f9f9f9,stroke:#333,stroke-width:2px

| Component | Crate | Purpose |
|---|---|---|
| Client API | simd-r-drive-ws-client | Public interface implementing DataStore traits |
| Service Caller | muxio-rpc-service-caller | RPC method invocation and request routing |
| RPC Client | muxio-tokio-rpc-client | WebSocket transport and message handling |
| Service Definition | simd-r-drive-muxio-service-definition | Shared RPC contract and type definitions |
| Async Runtime | tokio | Asynchronous I/O and task execution |

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:13-21 Cargo.lock:1302-1318

Client API Structure

The client implements the same DataStoreReader and DataStoreWriter traits as the local DataStore, enabling transparent remote access with minimal API differences.

Core Traits:

graph LR
    subgraph "Trait Implementations"
        Reader["DataStoreReader\nread()\nexists()\nbatch_read()"]
Writer["DataStoreWriter\nwrite()\ndelete()\nbatch_write()"]
end
    
    subgraph "Client Implementation"
        WsClient["WebSocket Client\nAsync Methods"]
ConnState["Connection State\nURL, Options"]
end
    
    subgraph "RPC Layer"
        Serializer["Request Serialization\nbitcode"]
Caller["Service Caller\nCall Routing"]
Deserializer["Response Deserialization\nbitcode"]
end
    
 
   Reader --> WsClient
 
   Writer --> WsClient
 
   WsClient --> ConnState
 
   WsClient --> Serializer
 
   Serializer --> Caller
 
   Caller --> Deserializer
    
    style Reader fill:#f9f9f9,stroke:#333,stroke-width:2px
    style Writer fill:#f9f9f9,stroke:#333,stroke-width:2px
  • DataStoreReader : Read-only operations (read, exists, batch_read, iteration)
  • DataStoreWriter : Write operations (write, delete, batch_write)
  • async-trait : All methods are asynchronous, requiring a Tokio runtime

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:14-21 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-16

Connection Management

The client manages persistent WebSocket connections to the remote server with automatic reconnection and error handling.

Connection Lifecycle:

sequenceDiagram
    participant App as "Application"
    participant Client as "WebSocket Client"
    participant Transport as "muxio-tokio-rpc-client"
    participant Server as "Remote Server"
    
    Note over App,Server: Connection Establishment
    App->>Client: connect(url)
    Client->>Transport: create WebSocket connection
    Transport->>Server: WebSocket handshake
    Server-->>Transport: connection established
    Transport-->>Client: client ready
    Client-->>App: connected client
    
    Note over App,Server: Normal Operation
    App->>Client: read(key)
    Client->>Transport: serialize request
    Transport->>Server: send via WebSocket
    Server-->>Transport: response data
    Transport-->>Client: deserialize response
    Client-->>App: return result
    
    Note over App,Server: Error Handling
    Server-->>Transport: connection lost
    Transport-->>Client: connection error
    Client->>Transport: reconnection attempt
    Transport->>Server: reconnect
  1. Initialization : Client connects to server URL with connection options
  2. Authentication : Optional authentication via Muxio RPC mechanisms
  3. Active State : Client maintains persistent WebSocket connection
  4. Error Recovery : Automatic reconnection on transient failures
  5. Shutdown : Graceful connection termination

Sources: Cargo.lock:1302-1318 experiments/simd-r-drive-ws-client/Cargo.toml:16-19

graph TB
    subgraph "Client Side"
        Method["Client Method Call\nread/write/delete"]
ReqBuilder["Request Builder\nCreate RPC Request"]
Serializer["bitcode Serialization\nBinary Encoding"]
Sender["WebSocket Send\nBinary Frame"]
end
    
    subgraph "Network"
        WsFrame["WebSocket Frame\nBinary Message"]
end
    
    subgraph "Server Side"
        Receiver["WebSocket Receive\nBinary Frame"]
Deserializer["bitcode Deserialization\nBinary Decoding"]
Handler["Request Handler\nExecute DataStore Operation"]
Response["Response Builder\nCreate RPC Response"]
end
    
 
   Method --> ReqBuilder
 
   ReqBuilder --> Serializer
 
   Serializer --> Sender
 
   Sender --> WsFrame
 
   WsFrame --> Receiver
 
   Receiver --> Deserializer
 
   Deserializer --> Handler
 
   Handler --> Response
 
   Response --> Serializer
    
    style Method fill:#f9f9f9,stroke:#333,stroke-width:2px
    style Handler fill:#f9f9f9,stroke:#333,stroke-width:2px

Request-Response Flow

All client operations follow a standardized request-response pattern through the Muxio RPC framework.

Request Structure:

| Field | Type | Description |
|---|---|---|
| Method ID | u64 | XXH3 hash of method name from service definition |
| Payload | Vec | Bitcode-serialized request parameters |
| Request ID | u64 | Unique identifier for request-response matching |

Response Structure:

| Field | Type | Description |
|---|---|---|
| Request ID | u64 | Matches original request ID |
| Status | enum | Success, Error, or specific error codes |
| Payload | Vec | Bitcode-serialized response data or error |

Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14-15 Cargo.lock:392-402

Async Runtime Requirements

The client requires a Tokio async runtime for all operations. The async-trait crate enables async methods in trait implementations.

Runtime Configuration:

graph TB
    subgraph "Application"
        Main["#[tokio::main]\nasync fn main()"]
UserCode["User Code\nawait client.read()"]
end
    
    subgraph "Client"
        AsyncMethods["async-trait Methods\nDataStoreReader/Writer"]
TokioTasks["Tokio Tasks\nNetwork I/O"]
end
    
    subgraph "Tokio Runtime"
        Executor["Task Executor\nWork Stealing Scheduler"]
Reactor["I/O Reactor\nepoll/kqueue/IOCP"]
end
    
 
   Main --> UserCode
 
   UserCode --> AsyncMethods
 
   AsyncMethods --> TokioTasks
 
   TokioTasks --> Executor
 
   TokioTasks --> Reactor
    
    style AsyncMethods fill:#f9f9f9,stroke:#333,stroke-width:2px
    style Executor fill:#f9f9f9,stroke:#333,stroke-width:2px
  • Multi-threaded Runtime : Default for concurrent operations
  • Current-thread Runtime : Available for single-threaded use cases
  • Feature Flags : Requires tokio with rt-multi-thread and net features

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:19-21 Cargo.lock:279-287

Error Handling

The client propagates errors from multiple layers of the stack, providing detailed error information for debugging and recovery.

Error Types:

| Error Category | Source | Description |
|---|---|---|
| Connection Errors | muxio-tokio-rpc-client | WebSocket connection failures, timeouts |
| Serialization Errors | bitcode | Invalid data encoding/decoding |
| RPC Errors | muxio-rpc-service | Service method errors, invalid requests |
| DataStore Errors | Remote DataStore | Storage operation failures (key not found, write errors) |

Error Propagation Flow:

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:17-18 Cargo.lock:1261-1271

Usage Patterns

Basic Connection and Operations

The client follows standard Rust async patterns for initialization and operation.

Concurrent Operations

The client supports concurrent operations through standard Tokio concurrency primitives.

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:19-21

graph TB
    subgraph "Service Definition"
        Methods["Method Definitions\nread, write, delete, etc."]
Types["Request/Response Types\nbitcode derive macros"]
MethodHash["Method ID Hashing\nXXH3 of method names"]
end
    
    subgraph "Client Usage"
        ClientImpl["Client Implementation\nUses defined methods"]
TypeSafety["Type Safety\nCompile-time checking"]
end
    
    subgraph "Server Usage"
        ServerImpl["Server Implementation\nHandles defined methods"]
Routing["Request Routing\nHash-based dispatch"]
end
    
 
   Methods --> ClientImpl
 
   Methods --> ServerImpl
 
   Types --> ClientImpl
 
   Types --> ServerImpl
 
   MethodHash --> Routing
 
   ClientImpl --> TypeSafety
    
    style Methods fill:#f9f9f9,stroke:#333,stroke-width:2px
    style Types fill:#f9f9f9,stroke:#333,stroke-width:2px

Integration with Service Definition

The client relies on the shared service definition crate for type-safe RPC communication.

Shared Contract Benefits:

  • Type Safety : Compile-time verification of request/response types
  • Version Compatibility : Client and server must use compatible service definitions
  • Method Resolution : XXH3 hash-based method identification
  • Serialization Schema : Consistent bitcode encoding across client and server

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:15 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-16

Performance Considerations

The native Rust client provides several performance advantages over alternative approaches:

Performance Characteristics:

| Aspect | Implementation | Benefit |
|---|---|---|
| Serialization | bitcode binary encoding | Minimal overhead, faster than JSON/MessagePack |
| Connection | Persistent WebSocket | Avoids HTTP handshake overhead |
| Async I/O | Tokio zero-copy operations | Efficient memory usage |
| Type Safety | Compile-time generics | Zero runtime type checking cost |
| Multiplexing | Muxio request pipelining | Multiple concurrent requests per connection |

Memory Efficiency:

  • Zero-copy where possible through bitcode and WebSocket frames
  • Efficient buffer reuse in Tokio’s I/O layer
  • Minimal allocation overhead compared to HTTP-based protocols

Throughput:

  • Supports request pipelining for high-throughput workloads
  • Concurrent operations through Tokio’s work-stealing scheduler
  • Batch operations reduce round-trip overhead

Sources: Cargo.lock:392-402 Cargo.lock:1302-1318

Comparison with Direct Access

The WebSocket client provides remote access with different tradeoffs compared to direct DataStore usage:

| Feature | Direct DataStore | WebSocket Client |
|---|---|---|
| Access Pattern | Local file I/O | Network I/O over WebSocket |
| Zero-Copy Reads | Yes (via mmap) | No (serialized over network) |
| Latency | Microseconds | Milliseconds (network dependent) |
| Concurrency | Multi-process safe | Network-limited |
| Deployment | Single machine | Distributed architecture |
| Security | File system permissions | Network authentication |

When to Use the Client:

  • Remote access to centralized storage
  • Microservice architectures requiring shared state
  • Language interoperability (via Python bindings)
  • Isolation of storage from compute workloads

When to Use Direct Access:

  • Single-machine deployments
  • Latency-critical applications
  • Maximum throughput requirements
  • Zero-copy read performance needs

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:14

Logging and Debugging

The client uses the tracing crate for structured logging and diagnostics.

Logging Levels:

  • TRACE : Detailed RPC message contents and serialization
  • DEBUG : Connection state changes, request/response flow
  • INFO : Connection establishment, disconnection events
  • WARN : Recoverable errors, retry attempts
  • ERROR : Unrecoverable errors, connection failures

Diagnostic Information:

  • Request/response timing
  • Serialized message sizes
  • Connection state transitions
  • Error context and stack traces

Sources: experiments/simd-r-drive-ws-client/Cargo.toml:20 Cargo.lock:279-287


Python Integration

Relevant source files

This page provides an overview of the Python bindings for SIMD R Drive. The system offers two approaches for Python integration:

  1. Modern WebSocket Client (simd-r-drive-ws-client-py): Communicates with a remote simd-r-drive-ws-server via WebSocket RPC. This is the primary, recommended approach documented in this section.
  2. Legacy Direct Bindings (simd-r-drive-py): Directly embeds the Rust storage engine into Python. This approach is deprecated and not covered in detail here.

The WebSocket client bindings are implemented in Rust using PyO3, compiled to native Python extension modules (.so/.pyd), and distributed as platform-specific wheels via Maturin. The package is published as simd-r-drive-ws-client on PyPI.

Related Pages:

Sources: experiments/bindings/python-ws-client/README.md:1-60 experiments/bindings/python-ws-client/pyproject.toml:1-6

Architecture Overview

The WebSocket client bindings use a layered architecture that bridges Python user code to the native Rust WebSocket client implementation. The package consists of pure Python wrapper code, PyO3-compiled Rust bindings, and the underlying simd-r-drive-ws-client Rust crate.

Diagram: Python Binding Architecture with Code Entities

graph TB
    subgraph "Python_Layer"
        UserCode["user_script.py"]
Import["from simd_r_drive_ws_client import DataStoreWsClient"]
end
    
    subgraph "Package_simd_r_drive_ws_client"
        InitPy["__init__.py"]
DataStoreWsClientPy["data_store_ws_client.py::DataStoreWsClient"]
TypeStubs["data_store_ws_client.pyi"]
end
    
    subgraph "PyO3_Native_Extension"
        BinaryModule["simd_r_drive_ws_client.so / .pyd"]
BaseDataStoreWsClient["BaseDataStoreWsClient"]
NamespaceHasher["NamespaceHasher"]
end
    
    subgraph "Rust_Dependencies"
        WsClient["simd-r-drive-ws-client crate"]
MuxioClient["muxio-tokio-rpc-client"]
ServiceDef["simd-r-drive-muxio-service-definition"]
end
    
 
   UserCode --> Import
 
   Import --> InitPy
 
   InitPy --> DataStoreWsClientPy
    DataStoreWsClientPy -.inherits.-> BaseDataStoreWsClient
    DataStoreWsClientPy -.types.-> TypeStubs
    
 
   BaseDataStoreWsClient --> WsClient
 
   NamespaceHasher --> WsClient
 
   BinaryModule --> BaseDataStoreWsClient
 
   BinaryModule --> NamespaceHasher
    
 
   WsClient --> MuxioClient
 
   WsClient --> ServiceDef

Architecture Layers

| Layer | Components | Technology | Location |
|---|---|---|---|
| Python User Code | Application scripts | Pure Python | User-provided |
| Python Package | DataStoreWsClient, __init__.py | Pure Python | experiments/bindings/python-ws-client/simd_r_drive_ws_client/ |
| PyO3 Bindings | BaseDataStoreWsClient, NamespaceHasher | Rust → compiled .so/.pyd | experiments/bindings/python-ws-client/src/lib.rs |
| Rust Implementation | simd-r-drive-ws-client, muxio-* | Native Rust crates | experiments/ws-client/ |

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-14 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:1-63 experiments/bindings/python-ws-client/README.md:12-15

Python API Surface

The simd-r-drive-ws-client package exposes two primary classes:

  1. DataStoreWsClient - Main client for read/write operations
  2. NamespaceHasher - Utility for generating collision-free namespaced keys
graph TB
    subgraph "Python_Wrapper"
        DSWsClient["data_store_ws_client.py::DataStoreWsClient"]
NSHasher["NamespaceHasher"]
end
    
    subgraph "PyO3_Bindings"
        BaseClient["BaseDataStoreWsClient"]
NSHasherImpl["NamespaceHasher_impl"]
end
    
    subgraph "Method_Sources"
        RustMethods["write()\nbatch_write()\ndelete()\nread()\nbatch_read()\nexists()\n__len__()\n__contains__()\nis_empty()\nfile_size()"]
PythonMethods["batch_read_structured()"]
end
    
 
   DSWsClient -->|inherits| BaseClient
 
   NSHasher -->|exposed via PyO3| NSHasherImpl
    
 
   BaseClient --> RustMethods
 
   DSWsClient --> PythonMethods
    
    RustMethods -.implemented in.-> WsClientCrate["simd-r-drive-ws-client"]

The API is implemented through a combination of Rust PyO3 bindings (BaseDataStoreWsClient) and Python wrapper code that adds convenience methods.

Diagram: Class Hierarchy and Method Implementation

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:11-63 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:8-219

Core Operations

DataStoreWsClient provides operations organized by implementation layer:

| Operation Type | Methods | Implementation | File Reference |
|---|---|---|---|
| Write Operations | write(), batch_write(), delete() | Rust (BaseDataStoreWsClient) | data_store_ws_client.pyi:27-141 |
| Read Operations | read(), batch_read(), exists() | Rust (BaseDataStoreWsClient) | data_store_ws_client.pyi:53-107 |
| Metadata Operations | __len__(), __contains__(), is_empty(), file_size() | Rust (BaseDataStoreWsClient) | data_store_ws_client.pyi:143-168 |
| Structured Reads | batch_read_structured() | Python wrapper | data_store_ws_client.py:12-62 |

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:27-168 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:11-63

Python-Rust Method Mapping

Diagram: Method Call Flow from Python to Rust

The batch_read_structured() method demonstrates the hybrid approach:

| Step | Layer | Action |
|---|---|---|
| 1. Decompile | Python | Extract flat list of keys from nested dict/list structure |
| 2. Batch read | Rust | Call fast batch_read() via PyO3 |
| 3. Rebuild | Python | Reconstruct original structure with fetched values |
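
A hypothetical sketch of the wrapper logic implied by these steps; the real code in data_store_ws_client.py may differ in details such as argument validation and error handling.

```python
from typing import Any, Dict, List, Optional, Union

def batch_read_structured(
    client,
    data: Union[Dict[Any, bytes], List[Dict[Any, bytes]]],
) -> Union[Dict[Any, Optional[bytes]], List[Dict[Any, Optional[bytes]]]]:
    dicts = data if isinstance(data, list) else [data]

    # 1. Decompile: flatten every value (a datastore key) into one key list
    flat_keys = [key for d in dicts for key in d.values()]

    # 2. Batch read: a single round trip through the Rust binding
    values = client.batch_read(flat_keys)

    # 3. Rebuild: restore the original structure with the fetched values
    it = iter(values)
    rebuilt = [{field: next(it) for field in d} for d in dicts]
    return rebuilt if isinstance(data, list) else rebuilt[0]
```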

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:12-62 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:109-129

PyO3 Binding Architecture

PyO3 provides the FFI layer that exposes Rust structs and methods as Python classes. The binding layer uses PyO3 procedural macros (#[pyclass], #[pymethods]) to generate Python-compatible wrappers around Rust types.

Diagram: PyO3 Macro Transformation Pipeline

graph TB
    subgraph "Rust_Source_Code"
        StructDef["#[pyclass]\nstruct BaseDataStoreWsClient"]
MethodsDef["#[pymethods]\nimpl BaseDataStoreWsClient"]
NSStruct["#[pyclass]\nstruct NamespaceHasher"]
NSMethods["#[pymethods]\nimpl NamespaceHasher"]
end
    
    subgraph "PyO3_Macro_Expansion"
        PyClassTrait["PyClass trait\nPyTypeInfo\nPyObjectProtocol"]
PyMethodsWrap["Method wrappers\nPyArg extraction\nResult conversion"]
end
    
    subgraph "Python_Extension_Module"
        PythonClass["BaseDataStoreWsClient\nwrite()\nread()\nbatch_write()"]
PythonNS["NamespaceHasher\n__init__()\nnamespace()"]
end
    
 
   StructDef --> PyClassTrait
 
   MethodsDef --> PyMethodsWrap
 
   NSStruct --> PyClassTrait
 
   NSMethods --> PyMethodsWrap
    
 
   PyClassTrait --> PythonClass
 
   PyMethodsWrap --> PythonClass
 
   PyClassTrait --> PythonNS
 
   PyMethodsWrap --> PythonNS

PyO3 Macro Functions

| Macro | Purpose | Generated Code |
|---|---|---|
| #[pyclass] | Mark Rust struct as Python class | Implements PyTypeInfo, PyClass, reference counting |
| #[pymethods] | Expose Rust methods to Python | Generates wrapper functions with argument extraction and error handling |
| #[pyfunction] | Expose standalone Rust functions | Module-level function bindings |

Sources: experiments/bindings/python-ws-client/Cargo.lock:832-846 experiments/bindings/python-ws-client/Cargo.lock:1096-1108

graph TB
    subgraph "Python_Async_Layer"
        PyAsyncCall["await client.write(key, data)"]
PyEventLoop["asyncio event loop"]
end
    
    subgraph "pyo3_async_runtimes_Bridge"
        Bridge["pyo3_async_runtimes::tokio"]
FutureConv["Future<Output=T> → PyObject"]
LocalSet["LocalSet spawning"]
end
    
    subgraph "Tokio_Runtime"
        TokioFuture["async fn write() → Future"]
TokioExecutor["Tokio thread pool"]
end
    
 
   PyAsyncCall --> PyEventLoop
 
   PyEventLoop --> Bridge
 
   Bridge --> FutureConv
 
   FutureConv --> LocalSet
 
   LocalSet --> TokioFuture
 
   TokioFuture --> TokioExecutor

Async Runtime Bridge

The Python bindings use pyo3-async-runtimes to bridge Python’s async/await model with Rust’s Tokio runtime. This enables Python code to call async Rust methods transparently.

Diagram: Python-Tokio Async Bridge

Runtime Bridge Components

| Component | Crate | Function |
|---|---|---|
| pyo3-async-runtimes | Cargo.lock:849-860 | Async bridge between Python and Tokio |
| tokio | Cargo.lock:1287-1308 | Rust async runtime |
| PyO3 | Cargo.lock:832-846 | FFI layer for Python-Rust interop |

The bridge automatically converts Rust Future<Output=T> values to Python awaitables, handling the differences in execution models between Python’s single-threaded async and Tokio’s work-stealing scheduler.

Sources: experiments/bindings/python-ws-client/Cargo.lock:849-860 experiments/bindings/python-ws-client/Cargo.lock:1287-1308

graph LR
    subgraph "Configuration"
        PyProject["pyproject.toml\n[build-system]\nbuild-backend = maturin"]
CargoToml["Cargo.toml\n[lib]\ncrate-type = ['cdylib']"]
end
    
    subgraph "Build_Process"
        RustcCompile["rustc\n--crate-type=cdylib\nPyO3 linking"]
CreateExtension["simd_r_drive_ws_client.so\nor .pyd"]
PackageWheel["maturin build\nAdd Python files\nAdd metadata"]
end
    
    subgraph "Artifacts"
        Wheel["simd_r_drive_ws_client-0.11.1-cp310-linux_x86_64.whl"]
PyPI["PyPI\npip install simd-r-drive-ws-client"]
end
    
 
   PyProject --> RustcCompile
 
   CargoToml --> RustcCompile
 
   RustcCompile --> CreateExtension
 
   CreateExtension --> PackageWheel
 
   PackageWheel --> Wheel
 
   Wheel --> PyPI

Build and Distribution System

The Python package is built using Maturin, which compiles Rust code to native extensions and packages them as platform-specific wheels. The build process produces binary wheels containing the compiled .so (Linux/macOS) or .pyd (Windows) extension module.

Diagram: Maturin Build and Distribution Pipeline

Sources: experiments/bindings/python-ws-client/pyproject.toml:29-35 experiments/bindings/python-ws-client/README.md:25-38

Build Configuration

pyproject.toml configures the build system and package metadata:

| Section | Lines | Configuration |
|---|---|---|
| [project] | pyproject.toml:1-27 | Package name, version, description, PyPI classifiers |
| [build-system] | pyproject.toml:29-31 | requires = ["maturin>=1.5"], build-backend = "maturin" |
| [tool.maturin] | pyproject.toml:33-35 | bindings = "pyo3", requires-python = ">=3.10" |
| [dependency-groups] | pyproject.toml:37-46 | Development dependencies: maturin, pytest, mypy, numpy |

Build Commands

The common commands are maturin develop for a local editable build and maturin build --release for distributable wheels; see Building Python Bindings for the full workflow.

Sources: experiments/bindings/python-ws-client/pyproject.toml:1-47 experiments/bindings/python-ws-client/README.md:31-36

Platform and Python Version Support

The package is distributed as pre-compiled wheels for multiple Python versions and platforms.

Supported Configurations

| Component | Supported Versions/Platforms |
|---|---|
| Python | 3.10, 3.11, 3.12, 3.13 (CPython only) |
| Operating Systems | Linux (x86_64, aarch64), macOS (x86_64, arm64), Windows (x86_64) |
| Architectures | 64-bit only |

Wheel Naming Convention

simd_r_drive_ws_client-{version}-{python_tag}-{platform_tag}.whl

Examples:
- simd_r_drive_ws_client-0.11.1-cp310-cp310-manylinux_2_17_x86_64.whl
- simd_r_drive_ws_client-0.11.1-cp312-cp312-macosx_11_0_arm64.whl
- simd_r_drive_ws_client-0.11.1-cp313-cp313-win_amd64.whl

Sources: experiments/bindings/python-ws-client/pyproject.toml:19-27 experiments/bindings/python-ws-client/README.md:18-23

Dependency Management

The package uses uv for Python dependency management and cargo for Rust dependencies. There are no runtime Python dependencies; all Rust code is statically compiled into the wheel.

Dependency Categories

| Category | Tools | Lock File | Purpose |
|---|---|---|---|
| Python Development | uv pip, pytest, mypy | uv.lock:1-299 | Testing, type checking, benchmarking |
| Rust Dependencies | cargo | Cargo.lock:1-1380 | Core functionality, WebSocket RPC, serialization |
| Build Tools | maturin | Both lock files | Compiles Rust → Python extension |

Key Development Dependencies

The development group consists of maturin, pytest, mypy, and numpy, as declared in the [dependency-groups] table of pyproject.toml.

Key Rust Dependencies

| Crate | Version | Purpose |
|---|---|---|
| pyo3 | Cargo.lock:832-846 | Python FFI |
| pyo3-async-runtimes | Cargo.lock:849-860 | Async bridge |
| tokio | Cargo.lock:1287-1308 | Async runtime |
| simd-r-drive-ws-client | (workspace) | WebSocket RPC client |

Sources: experiments/bindings/python-ws-client/pyproject.toml:37-46 experiments/bindings/python-ws-client/uv.lock:1-299 experiments/bindings/python-ws-client/Cargo.lock:1-1380

Type Stubs and IDE Support

The package includes .pyi type stub files that provide complete type information for IDEs and static type checkers like mypy.

Type Stub File: data_store_ws_client.pyi

Type Stub Features

| Feature | Description | Example |
|---|---|---|
| Full signatures | Complete method signatures with types | def write(self, key: bytes, data: bytes) -> None |
| Docstrings | Comprehensive documentation | data_store_ws_client.pyi:27-94 |
| Generic types | Support for complex types | Union[Dict[Any, bytes], List[Dict[Any, bytes]]] |
| Final classes | Prevent subclassing | @final class DataStoreWsClient |

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219

graph LR
    Input1["Namespace prefix\ne.g., b'users'"]
Input2["Key\ne.g., b'user123'"]
Hash1["XXH3 hash\n8 bytes"]
Hash2["XXH3 hash\n8 bytes"]
Output["Namespaced key\n16 bytes total"]
Input1 -->|hash once at init| Hash1
 
   Input2 -->|hash per call| Hash2
 
   Hash1 --> Output
 
   Hash2 --> Output
graph LR
    subgraph "Input"
        Prefix["prefix: bytes\ne.g. b'users'"]
Key["key: bytes\ne.g. b'user123'"]
end
    
    subgraph "Hashing"
        XXH3_Prefix["XXH3(prefix)"]
XXH3_Key["XXH3(key)"]
end
    
    subgraph "Output"
        PrefixHash["8 bytes\nprefix_hash"]
KeyHash["8 bytes\nkey_hash"]
Combined["16 bytes total\nprefix_hash // key_hash"]
end
    
 
   Prefix --> XXH3_Prefix
 
   Key --> XXH3_Key
 
   XXH3_Prefix --> PrefixHash
 
   XXH3_Key --> KeyHash
 
   PrefixHash --> Combined
 
   KeyHash --> Combined

NamespaceHasher Utility

NamespaceHasher provides deterministic key namespacing using XXH3 hashing to prevent key collisions across logical domains.

Diagram: Namespace Key Derivation

Usage Example
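
A minimal sketch of the derivation shown in the diagram; the derived key bytes are opaque hashes, so only length and determinism are asserted here.

```python
from simd_r_drive_ws_client import NamespaceHasher

hasher = NamespaceHasher(b"users")
key = hasher.namespace(b"user123")

assert isinstance(key, bytes) and len(key) == 16   # prefix hash + key hash
# Deterministic: the same prefix and key always produce the same output
assert key == NamespaceHasher(b"users").namespace(b"user123")
```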

Key Properties

| Property | Value | Description |
|---|---|---|
| Output length | 16 bytes | Fixed-size namespaced key |
| Hash function | XXH3 | Fast, high-quality 64-bit hash |
| Collision resistance | High | XXH3 provides strong distribution |
| Deterministic | Yes | Same input always produces same output |

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:170-219 src/storage_engine/key_indexer.rs:64-72

graph TB
    subgraph "integration_test.sh"
        Setup["cd experiments/\nBuild if needed"]
StartServer["cargo run --package\nsimd-r-drive-ws-server\n/tmp/simd-r-drive-pytest-storage.bin\n--host 127.0.0.1 --port 34129"]
SetupPython["uv venv\nuv pip install pytest maturin\nuv pip install -e . --group dev"]
ExtractTests["python extract_readme_tests.py"]
RunPytest["pytest -v -s\nTEST_SERVER_HOST=127.0.0.1\nTEST_SERVER_PORT=34129"]
Cleanup["kill -9 $SERVER_PID\nrm /tmp/simd-r-drive-pytest-storage.bin"]
end
    
 
   Setup --> StartServer
 
   StartServer --> SetupPython
 
   SetupPython --> ExtractTests
 
   ExtractTests --> RunPytest
 
   RunPytest --> Cleanup

Integration Test Infrastructure

The Python bindings include comprehensive integration tests that validate the entire stack, from Python client code to the WebSocket server and storage engine.

Diagram: Integration Test Workflow

Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91

Test Components

The test infrastructure consists of multiple components working together:

| Component | File | Purpose |
|---|---|---|
| Integration script | integration_test.sh:1-91 | Orchestrates full-stack test execution |
| README test extractor | extract_readme_tests.py:1-46 | Converts README code blocks to pytest functions |
| Generated tests | tests/test_readme_blocks.py | Executable tests from README examples |
| Manual tests | tests/test_*.py | Hand-written unit and integration tests |

Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91 experiments/bindings/python-ws-client/extract_readme_tests.py:1-46

graph LR
    subgraph "Input_File"
        README["README.md"]
CodeBlocks["```python\ncode\n```"]
end
    
    subgraph "Extraction_Logic"
        Regex["re.compile(r'```python\\n(.*?)```', re.DOTALL)"]
Extract["pattern.findall(text)"]
Wrap["def test_readme_block_{i}():\n {indented_code}"]
end
    
    subgraph "Output_File"
        TestFile["tests/test_readme_blocks.py"]
TestFunctions["test_readme_block_0()\ntest_readme_block_1()\ntest_readme_block_N()"]
end
    
 
   README --> CodeBlocks
 
   CodeBlocks --> Regex
 
   Regex --> Extract
 
   Extract --> Wrap
 
   Wrap --> TestFile
 
   TestFile --> TestFunctions

README Test Extraction

extract_readme_tests.py automatically extracts Python code blocks from the README and generates pytest test functions, ensuring documentation examples remain accurate.

Diagram: README to Pytest Pipeline

Extraction Process

| Step | Function | Action |
|---|---|---|
| 1. Read | README.read_text() | Load README.md as string |
| 2. Extract | re.findall(r'```python\n(.*?)```') | Find all Python code blocks |
| 3. Wrap | wrap_as_test_fn(code, idx) | Convert each block to test_readme_block_N() |
| 4. Write | TEST_FILE.write_text() | Write tests/test_readme_blocks.py |

This ensures documentation examples are automatically validated on every test run, preventing drift between documentation and implementation.
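
A hypothetical sketch of the extraction logic assembled from the steps in the table above; README, TEST_FILE, and wrap_as_test_fn are the names used there, while the remaining details are assumptions.

```python
import re
from pathlib import Path

README = Path("README.md")
TEST_FILE = Path("tests/test_readme_blocks.py")

def wrap_as_test_fn(code: str, idx: int) -> str:
    # Indent the extracted block inside a pytest-discoverable function
    body = "\n".join(f"    {line}" for line in code.splitlines())
    return f"def test_readme_block_{idx}():\n{body}\n"

text = README.read_text()
blocks = re.findall(r"```python\n(.*?)```", text, re.DOTALL)
TEST_FILE.write_text("\n\n".join(wrap_as_test_fn(b, i) for i, b in enumerate(blocks)))
```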

Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:14-45


graph TB
    subgraph "Internal Modules"
        RustBinary["simd_r_drive_ws_client_py.so/.pyd\nBinary compiled module"]
RustSymbols["BaseDataStoreWsClient\nNamespaceHasher\nsetup_logging\ntest_rust_logging"]
PythonWrapper["data_store_ws_client.py\nDataStoreWsClient"]
end
    
    subgraph "Package __init__.py"
        ImportRust["from .simd_r_drive_ws_client import\n setup_logging, test_rust_logging"]
ImportPython["from .data_store_ws_client import\n DataStoreWsClient, NamespaceHasher"]
AllList["__all__ = [\n 'DataStoreWsClient',\n 'NamespaceHasher',\n 'setup_logging',\n 'test_rust_logging'\n]"]
end
    
    subgraph "Public API"
        UserCode["from simd_r_drive_ws_client import DataStoreWsClient"]
end
    
 
   RustBinary --> RustSymbols
 
   RustSymbols --> ImportRust
 
   PythonWrapper --> ImportPython
    
 
   ImportRust --> AllList
 
   ImportPython --> AllList
 
   AllList --> UserCode

Package Exports and Public API

The package’s public API is defined through the __init__.py file, which controls what symbols are available when users import the package.

Export Structure

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-14

The __all__ list explicitly defines the public API surface, preventing internal implementation details from being accidentally imported by users. This follows Python best practices for package design.
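
A sketch of the package's __init__.py, reconstructed from the export structure shown above; ordering and comments in the real file may differ.

```python
from .simd_r_drive_ws_client import setup_logging, test_rust_logging
from .data_store_ws_client import DataStoreWsClient, NamespaceHasher

__all__ = [
    "DataStoreWsClient",
    "NamespaceHasher",
    "setup_logging",
    "test_rust_logging",
]
```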


Python WebSocket Client API

Relevant source files

Purpose and Scope

This document describes the Python WebSocket client API for remote access to SIMD R Drive storage. The API provides idiomatic Python interfaces backed by high-performance Rust implementations via PyO3 bindings. This page covers the DataStoreWsClient class, NamespaceHasher utility, and their usage patterns.

For information about building and installing the Python bindings, see Building Python Bindings. For details about the underlying native Rust WebSocket client, see Native Rust Client. For server-side configuration and deployment, see WebSocket Server.

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-14 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219

Architecture Overview

The Python WebSocket client uses a multi-layer architecture that bridges Python’s async/await with Rust’s Tokio runtime while maintaining idiomatic Python APIs.

graph TB
    UserCode["Python User Code\nimport simd_r_drive_ws_client"]
DataStoreWsClient["DataStoreWsClient\nPython wrapper class"]
BaseDataStoreWsClient["BaseDataStoreWsClient\nPyO3 #[pyclass]"]
PyO3FFI["PyO3 FFI Layer\npyo3-async-runtimes"]
RustClient["simd-r-drive-ws-client\nNative Rust implementation"]
MuxioRPC["muxio-tokio-rpc-client\nWebSocket + RPC"]
Server["simd-r-drive-ws-server\nRemote DataStore"]
UserCode --> DataStoreWsClient
 
   DataStoreWsClient --> BaseDataStoreWsClient
 
   BaseDataStoreWsClient --> PyO3FFI
 
   PyO3FFI --> RustClient
 
   RustClient --> MuxioRPC
 
   MuxioRPC --> Server

Python Integration Stack

Sources: experiments/bindings/python-ws-client/Cargo.lock:1096-1108 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:1-10

Class Hierarchy

The BaseDataStoreWsClient class is implemented in Rust and exposes core storage operations through PyO3. The DataStoreWsClient Python class extends it with additional convenience methods implemented in pure Python.

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-10 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:11-62

DataStoreWsClient Class

The DataStoreWsClient class provides the primary interface for interacting with a remote SIMD R Drive storage engine over WebSocket connections.

Connection Initialization

| Constructor | Description |
|---|---|
| __init__(host: str, port: int) | Establishes WebSocket connection to the specified server |

The constructor creates a WebSocket connection to the remote storage server. Connection establishment is synchronous and will raise an exception if the server is unreachable.

Example:
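
A minimal connection sketch; the host and port are illustrative (they match the values used by the integration tests), and the constructor signature is the one documented above.

```python
from simd_r_drive_ws_client import DataStoreWsClient

# Connect to a running simd-r-drive-ws-server instance.
# Raises if the server is unreachable, since connection setup is synchronous.
client = DataStoreWsClient("127.0.0.1", 34129)
```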

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:17-25

Write Operations

| Method | Parameters | Description |
|---|---|---|
| write(key, data) | key: bytes, data: bytes | Appends single key/value pair |
| batch_write(items) | items: list[tuple[bytes, bytes]] | Writes multiple pairs in one operation |

Write operations are append-only and atomic. If a key already exists, writing to it creates a new version while the old data remains on disk (marked as superseded via the index).

Example:
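
A short sketch based on the signatures above; it assumes client is the connected DataStoreWsClient from the previous example.

```python
# Append a single key/value pair
client.write(b"user:1", b"alice")

# Write several pairs in one round trip
client.batch_write([
    (b"user:2", b"bob"),
    (b"user:3", b"carol"),
])
```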

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:27-51

Read Operations

| Method | Parameters | Return Type | Copy Behavior |
|---|---|---|---|
| read(key) | key: bytes | Optional[bytes] | Performs memory copy |
| batch_read(keys) | keys: list[bytes] | list[Optional[bytes]] | Performs memory copy |
| batch_read_structured(data) | data: dict or list[dict] | Same structure with values | Python-side wrapper |

The read and batch_read methods perform memory copies when returning data. For zero-copy access patterns, the native Rust client provides read_entry methods that return memory-mapped views.

The batch_read_structured method is a Python convenience wrapper that:

  1. Accepts dictionaries or lists of dictionaries where values are datastore keys
  2. Flattens the structure into a single key list
  3. Calls batch_read for efficient parallel fetching
  4. Reconstructs the original structure with fetched values

Example:
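
A sketch of the three read methods; the concrete return values assume the entries written in the previous example.

```python
# Single read: returns the payload bytes, or None if the key is missing
value = client.read(b"user:1")                        # b"alice"

# Batch read: one result slot per requested key, in order
values = client.batch_read([b"user:2", b"missing"])   # [b"bob", None]

# Structured read: the input shape (dict or list of dicts) is preserved,
# with each datastore key replaced by its fetched value
profile = client.batch_read_structured({"name": b"user:1", "friend": b"user:3"})
# -> {"name": b"alice", "friend": b"carol"}
```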

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:79-129 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:12-62

Deletion and Existence Checks

| Method | Parameters | Return Type | Description |
|---|---|---|---|
| delete(key) | key: bytes | None | Marks key as deleted (tombstone) |
| exists(key) | key: bytes | bool | Checks if key is active |
| __contains__(key) | key: bytes | bool | Python in operator support |

Deletion is logical, not physical. The delete method appends a tombstone entry to the storage file. The physical data remains on disk but is no longer accessible through reads.

Example:
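
A sketch illustrating tombstone deletion and the existence checks described above.

```python
client.write(b"temp", b"scratch")
assert client.exists(b"temp")
assert b"temp" in client             # __contains__ support

client.delete(b"temp")               # appends a tombstone entry
assert not client.exists(b"temp")
assert client.read(b"temp") is None  # superseded data is no longer readable
```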

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:53-77 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:131-141

Utility Methods

| Method | Return Type | Description |
|---|---|---|
| __len__() | int | Returns count of active entries |
| is_empty() | bool | Checks if store has any active keys |
| file_size() | int | Returns physical file size on disk |

Example:
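
A sketch of the utility methods; since deletion only appends tombstones, file_size() reflects the growth of the append-only file rather than the number of active entries.

```python
print(len(client))          # count of active (non-deleted) entries
print(client.is_empty())    # True only when no active keys remain
print(client.file_size())   # physical size of the storage file in bytes
```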

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:143-168

NamespaceHasher Utility

The NamespaceHasher class provides deterministic key namespacing using XXH3 hashing to prevent key collisions across logical domains.

graph LR
    Input1["Namespace prefix\ne.g., b'users'"]
Input2["Key\ne.g., b'user123'"]
Hash1["XXH3 hash\n8 bytes"]
Hash2["XXH3 hash\n8 bytes"]
Output["Namespaced key\n16 bytes total"]
Input1 -->|hash once at init| Hash1
 
   Input2 -->|hash per call| Hash2
 
   Hash1 --> Output
 
   Hash2 --> Output

Architecture

Usage Pattern

| Method | Parameters | Return Type | Description |
|---|---|---|---|
| __init__(prefix) | prefix: bytes | N/A | Initializes hasher with namespace |
| namespace(key) | key: bytes | bytes | Returns 16-byte namespaced key |

The output key structure is:

  • Bytes 0-7: XXH3 hash of namespace prefix
  • Bytes 8-15: XXH3 hash of input key

This design ensures:

  • Deterministic key generation (same input → same output)
  • Collision isolation between namespaces
  • Fixed-length keys regardless of input size

Example:
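
A sketch combining NamespaceHasher with the client; the prefixes, keys, and connection details are illustrative.

```python
from simd_r_drive_ws_client import DataStoreWsClient, NamespaceHasher

client = DataStoreWsClient("127.0.0.1", 34129)
users = NamespaceHasher(b"users")
sessions = NamespaceHasher(b"sessions")

# The same logical key, isolated per namespace via 16-byte derived keys
client.write(users.namespace(b"42"), b"alice")
client.write(sessions.namespace(b"42"), b"token-abc")

assert client.read(users.namespace(b"42")) == b"alice"
assert client.read(sessions.namespace(b"42")) == b"token-abc"
```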

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:170-219

graph TB
    Stub["data_store_ws_client.pyi\nType definitions"]
Impl["data_store_ws_client.py\nImplementation"]
Base["simd_r_drive_ws_client\nCompiled Rust module"]
Stub -.->|describes| Impl
 
   Impl -->|imports from| Base
 
   Stub -.->|describes| Base

Type Stubs and IDE Support

The package provides complete type stubs for IDE integration and static type checking.

Type Stub Structure

The type stubs (experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219) provide:

  • Full method signatures with type annotations
  • Return type information (Optional[bytes], list[Optional[bytes]], etc.)
  • Docstrings for IDE hover documentation
  • @final decorators indicating classes cannot be subclassed

Type Checking Example
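
An illustrative, mypy-checkable snippet; the annotations mirror the signatures in data_store_ws_client.pyi, and the connection values are placeholders.

```python
from typing import Optional

from simd_r_drive_ws_client import DataStoreWsClient

client: DataStoreWsClient = DataStoreWsClient("127.0.0.1", 34129)

# The stub declares read() as returning Optional[bytes],
# so mypy requires a None check before the value is used.
value: Optional[bytes] = client.read(b"user:1")
if value is not None:
    print(value.decode())
```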

Python Version Support:

The package targets Python 3.10-3.13 as specified in experiments/bindings/python-ws-client/pyproject.toml:7 and experiments/bindings/python-ws-client/pyproject.toml:21-24

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219 experiments/bindings/python-ws-client/pyproject.toml:7 experiments/bindings/python-ws-client/pyproject.toml:19-27

graph TB
    PythonMain["Python Main Thread\nSynchronous API calls"]
PyO3["PyO3 Bridge\npyo3-async-runtimes"]
TokioRT["Tokio Runtime\nAsync event loop"]
WSClient["WebSocket Client\ntokio-tungstenite"]
PythonMain -->|sync call| PyO3
 
   PyO3 -->|spawn + block_on| TokioRT
 
   TokioRT --> WSClient
 
   WSClient -.->|result| TokioRT
 
   TokioRT -.->|return| PyO3
 
   PyO3 -.->|return| PythonMain

Async Runtime Bridging

The client uses pyo3-async-runtimes to bridge Python’s async/await with Rust’s Tokio runtime. This allows the underlying Rust WebSocket client to use native async I/O while exposing synchronous APIs to Python.

Runtime Architecture

The pyo3-async-runtimes crate (experiments/bindings/python-ws-client/Cargo.lock:849-860) provides:

  • Runtime spawning: Manages Tokio runtime lifecycle
  • Future blocking: Converts Rust async operations to Python-blocking calls
  • Thread safety: Ensures proper synchronization between Python GIL and Rust runtime

This design allows Python code to use simple synchronous APIs while benefiting from Rust’s high-performance async networking under the hood.

Sources: experiments/bindings/python-ws-client/Cargo.lock:849-860 experiments/bindings/python-ws-client/Cargo.lock:1096-1108

API Summary

Complete Method Reference

| Category | Method | Parameters | Return | Description |
|---|---|---|---|---|
| Connection | __init__ | host: str, port: int | N/A | Establish WebSocket connection |
| Write | write | key: bytes, data: bytes | None | Write single entry |
| Write | batch_write | items: list[tuple[bytes, bytes]] | None | Write multiple entries |
| Read | read | key: bytes | Optional[bytes] | Read single entry (copies) |
| Read | batch_read | keys: list[bytes] | list[Optional[bytes]] | Read multiple entries |
| Read | batch_read_structured | data: dict or list[dict] | Same structure | Read with structure preservation |
| Delete | delete | key: bytes | None | Mark key as deleted |
| Query | exists | key: bytes | bool | Check key existence |
| Query | __contains__ | key: bytes | bool | Python in operator |
| Info | __len__ | N/A | int | Active entry count |
| Info | is_empty | N/A | bool | Check if empty |
| Info | file_size | N/A | int | Physical file size |

NamespaceHasher Reference

| Method | Parameters | Return | Description |
|---|---|---|---|
| __init__ | prefix: bytes | N/A | Initialize namespace |
| namespace | key: bytes | bytes | Generate 16-byte namespaced key |

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219


Building Python Bindings

Relevant source files

Purpose and Scope

This page describes the build system, tooling, and workflow for generating Python bindings for the SIMD R Drive storage engine. It covers the PyO3/Maturin build pipeline, dependency management with uv, local development workflows, and wheel distribution. For the Python API surface and usage patterns, see Python WebSocket Client API. For CI/CD workflows and automated release processes, see CI/CD Pipeline.


Build System Architecture

The Python bindings are built using PyO3 (Rust-Python FFI) and Maturin (build backend and wheel generator). The uv tool replaces traditional pip and venv for faster, more reliable dependency management.

Build Pipeline Overview

graph TB
    subgraph "Source Code"
        RUST_LIB["experiments/bindings/python-ws-client/src/lib.rs\nPyO3 FFI Layer"]
PY_WRAPPER["simd_r_drive_ws_client/data_store_ws_client.py\nPython Wrapper"]
TYPE_STUBS["simd_r_drive_ws_client/data_store_ws_client.pyi\nType Hints"]
end
    
    subgraph "Build Configuration"
        PYPROJECT["pyproject.toml\nProject Metadata + Build Backend"]
CARGO_TOML["Cargo.toml\nRust Dependencies"]
UV_LOCK["uv.lock\nPinned Python Deps"]
end
    
    subgraph "Build Tools"
        PYO3["PyO3 0.25.1\nRust-Python Bridge"]
MATURIN["Maturin 1.8.7\nBuild System"]
UV["uv\nDependency Manager"]
end
    
    subgraph "Build Outputs"
        NATIVE_LIB["simd_r_drive_ws_client.so/.pyd\nNative Extension Module"]
WHEEL["simd_r_drive_ws_client-*.whl\nDistributable Package"]
end
    
 
   RUST_LIB --> PYO3
 
   PY_WRAPPER --> MATURIN
 
   TYPE_STUBS --> MATURIN
 
   PYPROJECT --> MATURIN
 
   CARGO_TOML --> PYO3
 
   UV_LOCK --> UV
    
 
   PYO3 --> MATURIN
 
   MATURIN --> NATIVE_LIB
 
   MATURIN --> WHEEL
 
   UV --> MATURIN
    
    NATIVE_LIB -.packaged into.-> WHEEL

Sources: experiments/bindings/python-ws-client/pyproject.toml:29-36 experiments/bindings/python-ws-client/Cargo.lock:832-905


Project Configuration Files

pyproject.toml Structure

The pyproject.toml file defines project metadata, build system requirements, and development dependencies.

| Section | Purpose | Key Configuration |
|---|---|---|
| [project] | Package metadata | name, version, requires-python = ">=3.10" |
| [build-system] | Build backend | requires = ["maturin>=1.5"], build-backend = "maturin" |
| [tool.maturin] | Maturin settings | bindings = "pyo3", requires-python = ">=3.10" |
| [dependency-groups] | Dev dependencies | maturin, pytest, mypy, numpy |

Key Configuration Entries:

The bindings = "pyo3" directive tells Maturin to compile Rust code using PyO3’s FFI macros and generate a native Python extension module.

Sources: experiments/bindings/python-ws-client/pyproject.toml:1-46

Cargo Dependencies

The Rust side declares dependencies for PyO3, async runtime bridging, and core storage functionality:

| Dependency | Version | Purpose |
|---|---|---|
| pyo3 | 0.25.1 | Rust-Python FFI with #[pyclass], #[pyfunction] macros |
| pyo3-async-runtimes | 0.25.0 | Bridges Python asyncio with Rust tokio |
| tokio | 1.45.1 | Async runtime for WebSocket client |
| simd-r-drive-ws-client | 0.15.5-alpha | Native Rust WebSocket client |

Sources: experiments/bindings/python-ws-client/Cargo.lock:832-905 experiments/bindings/python-ws-client/Cargo.lock:849-860


Dependency Management with uv

The project uses uv instead of traditional pip for significantly faster dependency resolution and installation. uv is an all-in-one replacement for pip, pip-tools, and virtualenv.

uv Workflow Diagram

graph LR
    subgraph "Traditional pip"
        PIP_VENV["python -m venv"]
PIP_INSTALL["pip install"]
PIP_LOCK["pip freeze > requirements.txt"]
end
    
    subgraph "uv Workflow"
        UV_VENV["uv venv"]
UV_SYNC["uv pip install -e . --group dev"]
UV_LOCK["uv.lock\nAuto-generated"]
UV_RUN["uv run pytest"]
end
    
 
   UV_VENV --> UV_SYNC
 
   UV_SYNC --> UV_LOCK
 
   UV_LOCK --> UV_RUN

Development Dependencies

Development dependencies (maturin, pytest, mypy, numpy) are specified in the [dependency-groups] table of pyproject.toml. The group can be installed with uv pip install -e . --group dev.

Sources: experiments/bindings/python-ws-client/pyproject.toml:37-46 experiments/bindings/python-ws-client/integration_test.sh:70-77


Building from Source

Local Development Build

The fastest way to build and install the bindings for local development is to run maturin develop.

Build Process Details:

The develop command creates an editable installation , meaning changes to Python wrapper code (experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py) take effect immediately without reinstalling. Rust code changes require re-running maturin develop.

Sources: experiments/bindings/python-ws-client/README.md:32-36 experiments/bindings/python-ws-client/integration_test.sh:74-77

Release Wheel Build

To build distributable wheels for PyPI, run maturin build --release.

Maturin automatically:

  1. Compiles Rust code with --release optimizations
  2. Generates platform-specific wheel filename (e.g., cp310-cp310-linux_x86_64)
  3. Bundles native extension, Python wrappers, and type stubs
  4. Creates wheel in target/wheels/

Platform-Specific Wheels:

| Platform | Wheel Tag Example | Notes |
|---|---|---|
| Linux x86_64 | cp310-cp310-manylinux_2_17_x86_64 | Built with manylinux2014 for compatibility |
| macOS x86_64 | cp310-cp310-macosx_10_12_x86_64 | Requires macOS 10.12+ |
| macOS ARM64 | cp310-cp310-macosx_11_0_arm64 | M1/M2 Macs |
| Windows x64 | cp310-cp310-win_amd64 | MSVC toolchain |

Sources: experiments/bindings/python-ws-client/uv.lock:117-130


Integration Testing Infrastructure

The project includes an automated integration test system that:

  1. Extracts code examples from README.md
  2. Starts the WebSocket server
  3. Runs pytest against live server
  4. Cleans up resources
graph TB
    subgraph "integration_test.sh"
        START["Start script"]
REGISTER_CLEANUP["Register cleanup trap"]
BUILD_SERVER["cargo run --package simd-r-drive-ws-server"]
START_SERVER["Start server in background\nPID captured"]
SETUP_UV["uv venv\nuv pip install"]
EXTRACT_TESTS["uv run extract_readme_tests.py"]
RUN_PYTEST["uv run pytest -v -s"]
CLEANUP["Kill server PID\nRemove storage file"]
end
    
    subgraph "extract_readme_tests.py"
        READ_README["Read README.md"]
EXTRACT_BLOCKS["Regex: ```python...```"]
WRAP_TEST_FN["Wrap as test_readme_block_N()"]
WRITE_TEST_FILE["Write tests/test_readme_blocks.py"]
end
    
 
   START --> REGISTER_CLEANUP
 
   REGISTER_CLEANUP --> BUILD_SERVER
 
   BUILD_SERVER --> START_SERVER
 
   START_SERVER --> SETUP_UV
 
   SETUP_UV --> EXTRACT_TESTS
    
 
   EXTRACT_TESTS --> READ_README
 
   READ_README --> EXTRACT_BLOCKS
 
   EXTRACT_BLOCKS --> WRAP_TEST_FN
 
   WRAP_TEST_FN --> WRITE_TEST_FILE
    
 
   WRITE_TEST_FILE --> RUN_PYTEST
 
   RUN_PYTEST --> CLEANUP

Test Orchestration Flow

Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91

README Test Extraction

The extract_readme_tests.py script converts documentation examples into executable tests:

Extraction Logic:

  1. Pattern Matching: Uses regex r"```python\n(.*?)```" to extract fenced code blocks (experiments/bindings/python-ws-client/extract_readme_tests.py:22-24)
  2. Sanitization: Strips non-ASCII characters to avoid encoding issues (experiments/bindings/python-ws-client/extract_readme_tests.py:26-28)
  3. Test Wrapping: Wraps each block in a test_readme_block_{i}() function (experiments/bindings/python-ws-client/extract_readme_tests.py:30-34)
  4. File Generation: Writes to tests/test_readme_blocks.py (experiments/bindings/python-ws-client/extract_readme_tests.py:36-42)

Example Transformation: a fenced Python block in README.md (the input) is rewritten as a test function in tests/test_readme_blocks.py (the output), as sketched below.
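
A hypothetical illustration of the transformation; the README snippet is invented for the example, while the wrapper follows the test_readme_block_{i} pattern described above.

```python
# Input: a fenced ```python block in README.md
#
#     from simd_r_drive_ws_client import DataStoreWsClient
#     client = DataStoreWsClient("127.0.0.1", 34129)
#     client.write(b"greeting", b"hello")

# Output: the generated function in tests/test_readme_blocks.py
def test_readme_block_0():
    from simd_r_drive_ws_client import DataStoreWsClient
    client = DataStoreWsClient("127.0.0.1", 34129)
    client.write(b"greeting", b"hello")
```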

Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:14-45

Integration Test Server Management

The test script manages the WebSocket server lifecycle. It builds and starts simd-r-drive-ws-server in the background via cargo run, binding to 127.0.0.1:34129 with a temporary storage file, and captures the server PID for later teardown.

Cleanup Trap:

The script registers a cleanup() function that executes on exit (success or failure), killing the server process group and removing the temporary storage file. This ensures the server never remains running after tests complete, even if pytest crashes.

Sources: experiments/bindings/python-ws-client/integration_test.sh:17-33 experiments/bindings/python-ws-client/integration_test.sh:47-56


Wheel Distribution and CI Integration

Maturin Wheel Building

Maturin generates platform-specific binary wheels that include the compiled Rust extension. Each wheel is tagged with Python version, ABI, and platform identifiers.

Wheel Naming Convention:

{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl

Example: simd_r_drive_ws_client-0.11.1-cp310-cp310-manylinux_2_17_x86_64.whl

| Component | Value | Meaning |
|---|---|---|
| cp310 | CPython 3.10 | Python implementation and version |
| cp310 | CPython 3.10 ABI | ABI compatibility tag |
| manylinux_2_17 | glibc 2.17+ | Minimum Linux C library version |
| x86_64 | x86-64 | CPU architecture |

Sources: experiments/bindings/python-ws-client/uv.lock:117-130

Build Matrix Configuration

The CI system (see CI/CD Pipeline) builds wheels for multiple platforms using a matrix strategy.

This produces wheels for:

  • Linux: manylinux2014 x86_64, aarch64
  • macOS: x86_64, arm64 (universal2)
  • Windows: win_amd64, win32, win_arm64

CI Workflow Reference:

The GitHub Actions workflow at .github/workflows/python-net-release.yml orchestrates multi-platform builds using Maturin’s maturin build --release command across matrix configurations.

Sources: experiments/bindings/python-ws-client/README.md:38


PyO3 Feature Configuration

Async Runtime Bridge

The bindings use pyo3-async-runtimes to bridge Python's event loop with Rust's tokio runtime. The underlying Rust client methods are async; the PyO3 wrapper methods drive those futures to completion internally, presenting a synchronous interface to Python users.

Sources: experiments/bindings/python-ws-client/Cargo.lock:849-860
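The sketch below illustrates the general blocking-wrapper pattern only: it builds a plain tokio runtime by hand rather than using pyo3-async-runtimes, and the type and method shown are illustrative placeholders, not the crate's actual implementation.

```rust
use pyo3::prelude::*;

/// Illustrative wrapper type; the real client lives in the PyO3 crate.
#[pyclass]
struct DataStoreWsClient {
    runtime: tokio::runtime::Runtime,
}

#[pymethods]
impl DataStoreWsClient {
    /// Blocks the calling Python thread until the async RPC completes,
    /// so Python sees an ordinary synchronous method.
    fn exists(&self, _key: Vec<u8>) -> PyResult<bool> {
        self.runtime.block_on(async move {
            // ... issue the WebSocket RPC here ...
            Ok(false)
        })
    }
}
```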

Type Stub Generation

Type stubs (.pyi files) provide IDE autocomplete and mypy type checking. They are manually maintained to match the PyO3 API.

These stubs are packaged into the wheel alongside the native extension, enabling static type checking without runtime overhead.

Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:7-169


Development Workflow Summary

Recommended Development Cycle:

Quick Command Reference:

| Task | Command |
|---|---|
| Setup environment | uv venv && uv pip install -e . --group dev |
| Build debug | maturin develop |
| Build release | maturin develop --release |
| Run tests | uv run pytest -v |
| Type check | uv run mypy . |
| Build wheel | maturin build --release |
| Full integration test | ./integration_test.sh |

Sources: experiments/bindings/python-ws-client/integration_test.sh:70-87 experiments/bindings/python-ws-client/README.md:32-36


Integration Testing

Relevant source files

Purpose and Scope

This document describes the integration testing infrastructure for the Python WebSocket client bindings. The integration test suite validates that the Python client (simd-r-drive-ws-client) can successfully communicate with the WebSocket server (simd-r-drive-ws-server) over a real network connection. The tests automatically extract and validate code examples from the README documentation to ensure documentation accuracy.

For information about building Python bindings, see Building Python Bindings. For details on the Python WebSocket Client API itself, see Python WebSocket Client API.


Test Architecture Overview

Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91 experiments/bindings/python-ws-client/extract_readme_tests.py:1-46


Test Orchestration Script

The integration test suite is orchestrated by the integration_test.sh Bash script, which manages the complete test lifecycle including server startup, environment setup, test execution, and cleanup.

Script Configuration

| Configuration Variable | Default Value | Purpose |
|---|---|---|
| EXPERIMENTS_DIR_REL_PATH | ../../ | Relative path to experiments directory |
| SERVER_PACKAGE_NAME | simd-r-drive-ws-server | Cargo package name for server |
| STORAGE_FILE | /tmp/simd-r-drive-pytest-storage.bin | Temporary storage file path |
| SERVER_HOST | 127.0.0.1 | Server bind address |
| SERVER_PORT | 34129 | Server listen port |

Sources: experiments/bindings/python-ws-client/integration_test.sh:8-14

Execution Flow

Sources: experiments/bindings/python-ws-client/integration_test.sh:35-90

Cleanup Mechanism

The script registers a cleanup() function with trap cleanup EXIT to ensure resources are released regardless of how the script terminates:

  1. Process Group Termination : Uses kill -9 "-$SERVER_PID" to kill the entire process group, ensuring the server and any child processes are stopped
  2. Storage File Removal : Deletes the temporary storage file at /tmp/simd-r-drive-pytest-storage.bin
  3. Error Suppression : Uses || true to prevent cleanup failures from failing the script

Sources: experiments/bindings/python-ws-client/integration_test.sh:18-30 experiments/bindings/python-ws-client/integration_test.sh:32-33


README Example Extraction

The extract_readme_tests.py script automatically generates pytest test functions from Python code blocks in the README documentation, ensuring that documented examples remain functional.

graph LR
    subgraph "Input"
        README[README.md\nPython code blocks]
    end
    
    subgraph "Extraction Functions"
        EXTRACT[extract_python_blocks]
        REGEX["re.compile pattern\n```python...```"]
STRIP[strip_non_ascii]
        WRAP[wrap_as_test_fn]
    end
    
    subgraph "Output"
        TEST_FN["def test_readme_block_{i}"]
TEST_FILE[tests/test_readme_blocks.py]
    end
    
 
   README --> EXTRACT
 
   EXTRACT --> REGEX
 
   REGEX --> STRIP
 
   STRIP --> WRAP
 
   WRAP --> TEST_FN
 
   TEST_FN --> TEST_FILE

Extraction Process

Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:15-42

Key Functions

| Function | Input | Output | Purpose |
|---|---|---|---|
| extract_python_blocks | README text | list[str] | Uses regex r"```python\n(.*?)```" to extract code blocks |
| strip_non_ascii | Code string | ASCII string | Removes non-ASCII characters using encode("ascii", errors="ignore") |
| wrap_as_test_fn | Code string, index | Test function string | Wraps code in def test_readme_block_{idx}(): with proper indentation |

Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:21-34

File Paths and Constants

The script uses fixed file paths defined at module level:

  • README = Path("README.md") - Source documentation file
  • TEST_FILE = Path("tests/test_readme_blocks.py") - Generated test file

The generated test file includes a header comment indicating it is auto-generated and imports pytest.

Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:18-19 experiments/bindings/python-ws-client/extract_readme_tests.py:40-41


Test Execution with pytest

The test suite uses pytest as the test runner, executed through the uv run command to ensure correct virtual environment activation.

pytest Configuration

The test execution command uses specific flags:

| Flag | Purpose |
|---|---|
| -v | Verbose output showing individual test names |
| -s | Disable output capture (show print statements) |

Sources: experiments/bindings/python-ws-client/integration_test.sh:87

Environment Variables

The test suite relies on environment variables to locate the running server:

| Variable | Source | Usage |
|---|---|---|
| TEST_SERVER_HOST | $SERVER_HOST from script | Server IP address for client connection |
| TEST_SERVER_PORT | $SERVER_PORT from script | Server port for client connection |

These variables are exported before pytest execution so test code can access the server endpoint.

Sources: experiments/bindings/python-ws-client/integration_test.sh:84-85

Test Function Generation

Each Python code block from README.md becomes an isolated test function following the pattern:

def test_readme_block_0():
    <indented code from README>

def test_readme_block_1():
    <indented code from README>

The test functions are numbered sequentially starting from 0. Each test runs independently with its own test context.

Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:30-34


graph TB
    subgraph "Server Startup Sequence"
        CD[cd to experiments dir]
        SET_M["set -m\nEnable job control"]
CARGO["cargo run --package simd-r-drive-ws-server"]
ARGS["-- $STORAGE_FILE --host $HOST --port $PORT"]
BG["& (background)"]
CAPTURE["SERVER_PID=$!"]
UNSET_M["set +m\nDisable job control"]
end
    
 
   CD --> SET_M
 
   SET_M --> CARGO
 
   CARGO --> ARGS
 
   ARGS --> BG
 
   BG --> CAPTURE
 
   CAPTURE --> UNSET_M
    
    style CARGO fill:#f9f9f9
    style CAPTURE fill:#f9f9f9

Server Lifecycle Management

The integration test script manages the WebSocket server lifecycle to provide a clean test environment for each run.

Server Startup

The script uses set -m to enable job control before starting the server, which allows proper PID capture of background processes. After capturing the PID, job control is disabled with set +m.

Sources: experiments/bindings/python-ws-client/integration_test.sh:47-56

Server Configuration

The server is started with the following arguments:

  • Storage File : Positional argument specifying the data file path (/tmp/simd-r-drive-pytest-storage.bin)
  • --host : Bind address (127.0.0.1 for localhost-only access)
  • --port : Listen port (34129 for test isolation)

These arguments are passed after the -- separator to distinguish cargo arguments from application arguments.

Sources: experiments/bindings/python-ws-client/integration_test.sh:53

Server Termination

The cleanup function terminates the server using process group kill:

  • -9 : SIGKILL signal for forceful termination
  • "-$SERVER_PID" : Negative PID to kill entire process group
  • 2>/dev/null: Suppress error messages
  • || true : Prevent script failure if process already exited

Sources: experiments/bindings/python-ws-client/integration_test.sh:25


graph TB
    subgraph "uv Environment Setup"
        CHECK["command -v uv"]
VENV["uv venv"]
INSTALL_BASE["uv pip install pytest maturin"]
INSTALL_DEV["uv pip install -e . --group dev"]
end
    
    subgraph "Python Dependencies"
        PYTEST[pytest]
        MATURIN[maturin]
        DEV_DEPS[Development dependencies\nfrom pyproject.toml]
    end
    
    subgraph "Lock File"
        UV_LOCK[uv.lock]
        ANYIO[anyio]
        HTTPX[httpx]
        HTTPCORE[httpcore]
        NUMPY[numpy]
        MYPY[mypy]
    end
    
 
   CHECK --> VENV
 
   VENV --> INSTALL_BASE
 
   INSTALL_BASE --> PYTEST
 
   INSTALL_BASE --> MATURIN
 
   INSTALL_BASE --> INSTALL_DEV
 
   INSTALL_DEV --> DEV_DEPS
    
    UV_LOCK -.resolves.-> ANYIO
    UV_LOCK -.resolves.-> HTTPX
    UV_LOCK -.resolves.-> HTTPCORE
    UV_LOCK -.resolves.-> NUMPY
    UV_LOCK -.resolves.-> MYPY
    
    style CHECK fill:#f9f9f9
    style VENV fill:#f9f9f9
    style UV_LOCK fill:#f9f9f9

Environment Setup with uv

The test suite uses the uv Python package manager for fast, reliable dependency management and virtual environment creation.

Dependency Resolution

Sources: experiments/bindings/python-ws-client/integration_test.sh:62-77 experiments/bindings/python-ws-client/uv.lock:1-7

uv Commands

| Command | Purpose |
|---|---|
| uv venv | Creates a virtual environment in the .venv directory |
| uv pip install --quiet pytest maturin | Installs test runner and build tool |
| uv pip install -e . --group dev | Installs package in editable mode with dev dependencies |
| uv run <command> | Executes command in virtual environment context |

The --quiet flag suppresses installation progress output for cleaner logs.

Sources: experiments/bindings/python-ws-client/integration_test.sh:70-80

Dependency Lock File

The uv.lock file pins exact versions and hashes for all dependencies:

| Package | Version | Purpose |
|---|---|---|
| pytest | Latest | Test framework |
| maturin | 1.8.7+ | PyO3 build system |
| anyio | 4.9.0+ | Async I/O foundation |
| httpx | 0.28.1+ | HTTP client (WebSocket support) |
| mypy | 1.16.1+ | Static type checking |
| numpy | 2.2.6+/2.3.0+ | Numerical computing (conditional) |

The lock file uses resolution markers to handle different Python versions (e.g., python_full_version >= '3.11').

Sources: experiments/bindings/python-ws-client/uv.lock:1-7 experiments/bindings/python-ws-client/uv.lock:110-130 experiments/bindings/python-ws-client/uv.lock:133-169

uv Availability Check

Before proceeding with environment setup, the script validates that uv is installed using command -v uv. This check ensures a clear error message if the prerequisite tool is missing.

Sources: experiments/bindings/python-ws-client/integration_test.sh:62-68


graph TB
    START["integration_test.sh start"]
subgraph "Phase 1: Setup"
        CD_EXPERIMENTS[cd experiments/]
        BUILD_SERVER["cargo run --package simd-r-drive-ws-server &"]
CAPTURE_PID[Capture SERVER_PID]
    end
    
    subgraph "Phase 2: Environment"
        CD_CLIENT[cd bindings/python-ws-client]
        CHECK_UV[Check uv availability]
        CREATE_VENV[uv venv]
        INSTALL_DEPS[uv pip install pytest maturin]
        INSTALL_EDITABLE["uv pip install -e . --group dev"]
end
    
    subgraph "Phase 3: Test Generation"
        RUN_EXTRACT[uv run extract_readme_tests.py]
        PARSE_README[Parse README.md]
        GENERATE_TESTS[Generate tests/test_readme_blocks.py]
    end
    
    subgraph "Phase 4: Test Execution"
        EXPORT_ENV[Export TEST_SERVER_HOST/PORT]
        RUN_PYTEST["uv run pytest -v -s"]
EXECUTE_TESTS[Execute test_readme_block_* functions]
    end
    
    subgraph "Phase 5: Cleanup"
        TRAP_EXIT[trap EXIT triggers]
        KILL_SERVER["kill -9 -$SERVER_PID"]
REMOVE_FILE["rm -f $STORAGE_FILE"]
end
    
 
   START --> CD_EXPERIMENTS
 
   CD_EXPERIMENTS --> BUILD_SERVER
 
   BUILD_SERVER --> CAPTURE_PID
 
   CAPTURE_PID --> CD_CLIENT
    
 
   CD_CLIENT --> CHECK_UV
 
   CHECK_UV --> CREATE_VENV
 
   CREATE_VENV --> INSTALL_DEPS
 
   INSTALL_DEPS --> INSTALL_EDITABLE
    
 
   INSTALL_EDITABLE --> RUN_EXTRACT
 
   RUN_EXTRACT --> PARSE_README
 
   PARSE_README --> GENERATE_TESTS
    
 
   GENERATE_TESTS --> EXPORT_ENV
 
   EXPORT_ENV --> RUN_PYTEST
 
   RUN_PYTEST --> EXECUTE_TESTS
    
 
   EXECUTE_TESTS --> TRAP_EXIT
 
   TRAP_EXIT --> KILL_SERVER
 
   KILL_SERVER --> REMOVE_FILE
    
    style START fill:#f9f9f9
    style TRAP_EXIT fill:#f9f9f9

Test Execution Workflow Summary

The complete integration test workflow coordinates multiple tools and processes:

Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91 experiments/bindings/python-ws-client/extract_readme_tests.py:1-46


Performance Optimizations

Relevant source files

Purpose and Scope

This document describes the performance optimization strategies employed by SIMD R Drive to achieve high-throughput storage operations. It covers hardware acceleration through SIMD instructions, cache-efficient memory alignment, zero-copy access patterns, lock-free concurrent operations, and the benchmarking infrastructure used to validate these optimizations.

For implementation details of specific SIMD operations, see SIMD Acceleration. For cache-line alignment specifics, see Payload Alignment and Cache Efficiency. For operation mode characteristics, see Write and Read Modes. For benchmark execution and analysis, see Benchmarking.


Performance Architecture Overview

The performance optimization stack consists of multiple layers, from hardware acceleration at the bottom to application-level operation modes at the top. Each layer contributes to the overall system throughput.

Performance Architecture Stack

graph TB
    subgraph Hardware["Hardware Features"]
CPU["CPU Architecture"]
AVX2["AVX2 Instructions\nx86_64"]
NEON["NEON Instructions\naarch64"]
CACHE["64-byte Cache Lines"]
end
    
    subgraph SIMD["SIMD Optimization Layer"]
SIMD_COPY["simd_copy Function"]
XXH3["xxh3_64 Hashing"]
FEATURE_DETECT["Runtime Feature Detection"]
end
    
    subgraph Memory["Memory Management"]
MMAP["memmap2::Mmap"]
ALIGN["PAYLOAD_ALIGNMENT = 64"]
DASHMAP["DashMap Index"]
end
    
    subgraph Concurrency["Concurrency Primitives"]
ATOMIC["AtomicU64 tail_offset"]
RWLOCK["RwLock File"]
ARC["Arc Mmap Sharing"]
end
    
    subgraph Operations["Operation Modes"]
WRITE_SINGLE["Single Write"]
WRITE_BATCH["Batch Write"]
WRITE_STREAM["Stream Write"]
READ_DIRECT["Direct Read"]
READ_STREAM["Stream Read"]
READ_PARALLEL["Parallel Iteration"]
end
    
 
   CPU --> AVX2
 
   CPU --> NEON
 
   CPU --> CACHE
    
 
   AVX2 --> SIMD_COPY
 
   NEON --> SIMD_COPY
 
   AVX2 --> XXH3
 
   NEON --> XXH3
    
 
   FEATURE_DETECT --> SIMD_COPY
    
 
   CACHE --> ALIGN
 
   SIMD_COPY --> ALIGN
    
 
   ALIGN --> MMAP
 
   MMAP --> ARC
 
   MMAP --> DASHMAP
    
 
   DASHMAP --> ATOMIC
 
   RWLOCK --> ATOMIC
    
 
   SIMD_COPY --> WRITE_SINGLE
 
   SIMD_COPY --> WRITE_BATCH
 
   MMAP --> READ_DIRECT
 
   ARC --> READ_PARALLEL

The diagram shows how hardware features enable SIMD operations, which work with aligned memory to maximize cache efficiency. The memory management layer uses zero-copy access patterns, while concurrency primitives enable safe multi-threaded operations. Application-level operation modes leverage these lower layers for optimal performance.

Sources: README.md:5-7 README.md:249-256 src/storage_engine/simd_copy.rs:1-139


SIMD Acceleration Components

SIMD R Drive uses vectorized instructions to accelerate two critical operations: memory copying during writes and key hashing for indexing. The system detects available CPU features at runtime and selects the optimal implementation.

SIMD Component Architecture

graph TB
    subgraph Detection["Feature Detection"]
RUNTIME["std::is_x86_feature_detected!"]
X86_CHECK["Check avx2 on x86_64"]
ARM_DEFAULT["Default neon on aarch64"]
end
    
    subgraph Implementations["SIMD Implementations"]
SIMD_COPY_X86["simd_copy_x86\n_mm256_loadu_si256\n_mm256_storeu_si256"]
SIMD_COPY_ARM["simd_copy_arm\nvld1q_u8\nvst1q_u8"]
FALLBACK["Scalar copy_from_slice"]
end
    
    subgraph Operations["Accelerated Operations"]
WRITE_OP["Write Operations"]
HASH_OP["xxh3_64 Hashing"]
end
    
    subgraph Characteristics["Performance Characteristics"]
AVX2_SIZE["32-byte chunks AVX2"]
NEON_SIZE["16-byte chunks NEON"]
REMAINDER["Scalar remainder"]
end
    
 
   RUNTIME --> X86_CHECK
 
   RUNTIME --> ARM_DEFAULT
    
 
   X86_CHECK -->|Detected| SIMD_COPY_X86
 
   X86_CHECK -->|Not Detected| FALLBACK
 
   ARM_DEFAULT --> SIMD_COPY_ARM
    
 
   SIMD_COPY_X86 --> AVX2_SIZE
 
   SIMD_COPY_ARM --> NEON_SIZE
 
   AVX2_SIZE --> REMAINDER
 
   NEON_SIZE --> REMAINDER
    
 
   SIMD_COPY_X86 --> WRITE_OP
 
   SIMD_COPY_ARM --> WRITE_OP
 
   FALLBACK --> WRITE_OP
    
 
   HASH_OP --> WRITE_OP

The simd_copy function performs runtime feature detection to select the appropriate SIMD implementation. On x86_64, it checks for AVX2 support and processes 32 bytes per iteration. On aarch64, it uses NEON instructions to process 16 bytes per iteration. A scalar fallback handles unsupported architectures and the remainder bytes after vectorized processing.

Sources: src/storage_engine/simd_copy.rs:110-138 src/storage_engine/simd_copy.rs:16-62 src/storage_engine/simd_copy.rs:64-108

SIMD Copy Function Details

The simd_copy function provides the core memory copying optimization used during write operations:

| Architecture | SIMD Extension | Chunk Size | Load Instruction | Store Instruction | Feature Detection |
|---|---|---|---|---|---|
| x86_64 | AVX2 | 32 bytes | _mm256_loadu_si256 | _mm256_storeu_si256 | is_x86_feature_detected!("avx2") |
| aarch64 | NEON | 16 bytes | vld1q_u8 | vst1q_u8 | Always enabled |
| Fallback | None | Variable | N/A | N/A | Other architectures |

The function is defined in src/storage_engine/simd_copy.rs:110-138 with platform-specific implementations in src/storage_engine/simd_copy.rs:33-62 (x86_64) and src/storage_engine/simd_copy.rs:81-108 (aarch64).

Hardware-Accelerated Hashing

The xxh3_64 hashing algorithm used by KeyIndexer leverages SIMD extensions to accelerate key hashing operations. The dependency is configured in Cargo.toml:34 with features ["xxh3", "const_xxh3"].

Hash acceleration characteristics:

  • SSE2 : Universally supported on x86_64, enabled by default
  • AVX2 : Additional performance gains on capable CPUs
  • NEON : Default on aarch64 targets, providing ARM SIMD acceleration

Sources: README.md:158-166 Cargo.toml:34
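As a small illustration (not code from this repository), hashing a key with the xxhash-rust crate's xxh3_64 function looks like this:

```rust
use xxhash_rust::xxh3::xxh3_64;

fn main() {
    // The same key always hashes to the same 64-bit value, which KeyIndexer
    // uses as the lookup key in its DashMap.
    let hash: u64 = xxh3_64(b"user:42");
    println!("key hash = {hash:#018x}");
}
```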


Cache-Line Aligned Memory Layout

The storage engine aligns all non-tombstone payloads to 64-byte boundaries, matching typical CPU cache-line sizes. This alignment strategy ensures that SIMD operations operate efficiently without crossing cache-line boundaries.

64-byte Alignment Strategy

graph LR
    subgraph Entry1["Entry N"]
PREPAD1["Pre-Padding\n0-63 bytes"]
PAYLOAD1["Payload\nAligned Start"]
META1["Metadata\n20 bytes"]
end
    
    subgraph Entry2["Entry N+1"]
PREPAD2["Pre-Padding\n0-63 bytes"]
PAYLOAD2["Payload\nAligned Start"]
META2["Metadata\n20 bytes"]
end
    
    subgraph Alignment["Alignment Calculation"]
PREV_TAIL["prev_tail_offset"]
CALC["pad = A - prev_tail mod A mod A\nA = PAYLOAD_ALIGNMENT = 64"]
NEXT_START["payload_start mod 64 = 0"]
end
    
 
   META1 --> PREV_TAIL
 
   PREV_TAIL --> CALC
 
   CALC --> PREPAD2
 
   PREPAD2 --> PAYLOAD2
 
   PAYLOAD2 --> NEXT_START

Each payload is preceded by 0-63 bytes of padding to ensure the payload itself starts on a 64-byte boundary. The padding length is calculated based on the previous entry’s tail offset. This enables efficient SIMD loads/stores and ensures optimal cache-line utilization.

Alignment Formula

The pre-padding calculation ensures proper alignment:

pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)

Where:

  • PAYLOAD_ALIGNMENT = 64 (defined in simd-r-drive-entry-handle/src/constants.rs)
  • prev_tail is the absolute file offset after the previous entry’s metadata
  • The bitwise AND with (PAYLOAD_ALIGNMENT - 1) handles the modulo operation efficiently since 64 is a power of 2

Sources: README.md:51-59 README.md:114-124
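The formula above can be checked with a few lines of Rust; this is an illustrative sketch, not the engine's internal code:

```rust
// Because PAYLOAD_ALIGNMENT (64) is a power of two, the modulo can be
// replaced by a bitwise AND with (PAYLOAD_ALIGNMENT - 1).
const PAYLOAD_ALIGNMENT: u64 = 64;

fn pre_pad(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(pre_pad(0), 0);   // already aligned: no padding
    assert_eq!(pre_pad(20), 44); // next payload starts at offset 64
    assert_eq!(pre_pad(64), 0);  // already aligned
    assert_eq!(pre_pad(65), 63); // worst case: 63 bytes of padding
}
```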

Alignment Benefits

| Benefit | Description | Impact |
|---|---|---|
| SIMD Efficiency | Vectorized operations don’t cross cache-line boundaries | 2-4x speedup on bulk copies |
| Cache Performance | Single payload typically fits within contiguous cache lines | Reduced cache misses |
| Zero-Copy Casting | Aligned payloads can be safely cast to typed slices (&[u32], &[u64]) | No buffer allocation needed |
| Predictable Performance | Consistent access patterns regardless of payload size | Stable latency characteristics |

The alignment is enforced during write operations and verified during entry access through EntryHandle.

Sources: README.md:51-59


graph TB
    subgraph File["Storage File"]
DISK["simd-r-drive.bin"]
end
    
    subgraph Mapping["Memory Mapping"]
MMAP_CREATE["Mmap::map"]
MMAP_INSTANCE["Mmap Instance"]
ARC_MMAP["Arc Mmap\nShared Reference"]
end
    
    subgraph Index["KeyIndexer"]
DASHMAP_STRUCT["DashMap key_hash -> packed_value"]
PACKED["Packed u64\n16-bit tag\n48-bit offset"]
end
    
    subgraph Access["Zero-Copy Access"]
LOOKUP["read key"]
GET_OFFSET["Extract offset from packed_value"]
SLICE["Arc Mmap byte range"]
HANDLE["EntryHandle\nDirect payload reference"]
end
    
    subgraph Threading["Multi-threaded Access"]
CLONE["Arc::clone"]
THREAD1["Thread 1 read"]
THREAD2["Thread 2 read"]
THREAD3["Thread N read"]
end
    
 
   DISK --> MMAP_CREATE
 
   MMAP_CREATE --> MMAP_INSTANCE
 
   MMAP_INSTANCE --> ARC_MMAP
    
 
   ARC_MMAP --> DASHMAP_STRUCT
 
   DASHMAP_STRUCT --> PACKED
    
 
   LOOKUP --> DASHMAP_STRUCT
 
   DASHMAP_STRUCT --> GET_OFFSET
 
   GET_OFFSET --> SLICE
 
   ARC_MMAP --> SLICE
 
   SLICE --> HANDLE
    
 
   ARC_MMAP --> CLONE
 
   CLONE --> THREAD1
 
   CLONE --> THREAD2
 
   CLONE --> THREAD3
 
   THREAD1 --> HANDLE
 
   THREAD2 --> HANDLE
 
   THREAD3 --> HANDLE

Zero-Copy Memory Access Patterns

SIMD R Drive achieves zero-copy reads by memory-mapping the entire storage file and providing direct byte slice access to payloads. This eliminates deserialization overhead and reduces memory pressure for large datasets.

Zero-Copy Read Architecture

The storage file is memory-mapped once and shared via Arc<Mmap> across threads. Read operations perform hash lookups in DashMap to get file offsets, then return EntryHandle instances that provide direct views into the mapped memory. Multiple threads can safely read concurrently without copying data.

Sources: README.md:43-49 README.md:173-175

Memory-Mapped File Management

The memmap2 crate provides the memory mapping functionality:

  • Configured as a workspace dependency in Cargo.toml:102
  • Used in the DataStore implementation
  • Protected by Mutex<Arc<Mmap>> to prevent unsafe remapping during active reads
  • Automatically remapped when file grows beyond current mapping size

EntryHandle Zero-Copy Interface

The EntryHandle type provides zero-copy access to stored payloads without allocating intermediate buffers:

| Method | Return Type | Copy Behavior | Use Case |
|---|---|---|---|
| payload() | &[u8] | Zero-copy reference | Direct access to full payload |
| payload_reader() | impl Read | Buffered reads | Streaming large payloads |
| as_arrow_buffer() | arrow::Buffer | Zero-copy view | Apache Arrow integration |

The handle maintains a reference to the memory-mapped region and calculates the payload range based on entry metadata.

Sources: README.md:228-233
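A hedged usage sketch of the two non-Arrow access paths follows; the crate path and exact signatures are assumptions, while payload() and payload_reader() are the methods listed in the table above.

```rust
use std::io::Read;

use simd_r_drive_entry_handle::EntryHandle; // crate path assumed

fn inspect(entry: &EntryHandle) -> std::io::Result<()> {
    // Zero-copy: a borrowed view directly into the memory-mapped file.
    let bytes: &[u8] = entry.payload();
    println!("payload is {} bytes", bytes.len());

    // Buffered streaming for very large payloads.
    let mut reader = entry.payload_reader();
    let mut chunk = [0u8; 8192];
    let n = reader.read(&mut chunk)?;
    println!("first chunk read: {n} bytes");
    Ok(())
}
```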


graph TB
    subgraph Writes["Write Path Synchronized"]
WRITE_LOCK["RwLock File write"]
APPEND["Append entry to file"]
UPDATE_INDEX["DashMap::insert"]
UPDATE_TAIL["AtomicU64::store tail_offset"]
end
    
    subgraph Reads["Read Path Lock-Free"]
READ1["Thread 1 read"]
READ2["Thread 2 read"]
READN["Thread N read"]
LOOKUP1["DashMap::get"]
LOOKUP2["DashMap::get"]
LOOKUPN["DashMap::get"]
MMAP_ACCESS["Arc Mmap shared access"]
end
    
    subgraph Synchronization["Concurrency Control"]
RWLOCK_STRUCT["RwLock File"]
ATOMIC_STRUCT["AtomicU64 tail_offset"]
DASHMAP_STRUCT["DashMap Index"]
ARC_STRUCT["Arc Mmap"]
end
    
 
   WRITE_LOCK --> APPEND
 
   APPEND --> UPDATE_INDEX
 
   UPDATE_INDEX --> UPDATE_TAIL
    
 
   READ1 --> LOOKUP1
 
   READ2 --> LOOKUP2
 
   READN --> LOOKUPN
    
 
   LOOKUP1 --> MMAP_ACCESS
 
   LOOKUP2 --> MMAP_ACCESS
 
   LOOKUPN --> MMAP_ACCESS
    
    RWLOCK_STRUCT -.controls.-> WRITE_LOCK
    ATOMIC_STRUCT -.updated by.-> UPDATE_TAIL
    DASHMAP_STRUCT -.provides.-> LOOKUP1
    DASHMAP_STRUCT -.provides.-> LOOKUP2
    DASHMAP_STRUCT -.provides.-> LOOKUPN
    ARC_STRUCT -.enables.-> MMAP_ACCESS

Lock-Free Concurrent Read Operations

The storage engine enables multiple threads to perform concurrent reads without acquiring locks, using DashMap for the in-memory index and atomic operations for metadata tracking.

Lock-Free Read Architecture

Write operations acquire an RwLock to ensure sequential appends, but read operations access the DashMap index without locking. The DashMap data structure provides lock-free reads through internal sharding and fine-grained locking. The memory-mapped file is shared via Arc<Mmap>, allowing concurrent zero-copy access.

Sources: README.md:172-183 Cargo.toml:27

DashMap Index Characteristics

The DashMap dependency is configured in Cargo.toml:27 and provides these characteristics:

  • Lock-free reads : Read operations don’t block each other
  • Sharded locking : Write operations only lock specific shards
  • Concurrent inserts : Multiple threads can update different shards simultaneously
  • Memory overhead : Approximately 64 bytes per entry for hash table overhead

Atomic Operations

The AtomicU64 for tail_offset tracking provides:

  • Ordering guarantees : SeqCst ordering ensures consistency across threads
  • Lock-free updates : Writes update the tail without blocking reads
  • Single-word operations : 64-bit atomic operations are efficient on modern CPUs

Sources: README.md:182-183


Operation Mode Performance Characteristics

SIMD R Drive provides multiple operation modes optimized for different workload patterns. Each mode has specific performance characteristics and resource usage profiles.

Write Operation Modes

| Mode | Method | Lock Duration | I/O Pattern | Flush Behavior | Best For |
|---|---|---|---|---|---|
| Single | write() | Per-entry | Sequential | Immediate | Low-latency single writes |
| Batch | batch_write() | Per-batch | Sequential | After batch | High-throughput bulk writes |
| Stream | write_stream() | Per-entry | Sequential | Immediate | Large entries (>1MB) |

Single Write (README.md:213-215):

  • Acquires RwLock for each entry
  • Flushes to disk immediately after write
  • Suitable for applications requiring durability guarantees per operation

Batch Write (README.md:217-219):

  • Acquires RwLock once for entire batch
  • Flushes to disk after all entries written
  • Reduces syscall overhead for bulk operations
  • Can write thousands of entries in single lock acquisition

Stream Write (README.md:221-223):

  • Accepts impl Read source for payload data
  • Copies data in chunks to avoid full in-memory allocation
  • Suitable for writing multi-megabyte or gigabyte-sized entries

Sources: README.md:208-223

Read Operation Modes

| Mode | Method | Memory Behavior | Parallelism | Best For |
|---|---|---|---|---|
| Direct | read() | Zero-copy | Single-threaded | Small to medium entries |
| Stream | payload_reader() | Buffered | Single-threaded | Large entries (>10MB) |
| Parallel | par_iter_entries() | Zero-copy | Multi-threaded | Bulk processing entire dataset |

Direct Read (README.md:228-233):

  • Returns EntryHandle with direct memory-mapped payload reference
  • Zero allocation for payload access
  • Full entry must fit in virtual address space
  • Fastest for entries under 10MB

Stream Read (README.md:234-241):

  • Reads payload incrementally through Read trait
  • Uses 8KB buffer internally
  • Avoids memory pressure for large entries
  • Non-zero-copy but memory-efficient

Parallel Iteration (README.md:242-247):

  • Requires parallel feature flag in Cargo.toml:52
  • Uses Rayon for multi-threaded iteration
  • Processes all valid entries across CPU cores
  • Ideal for building in-memory caches or analytics workloads

Sources: README.md:224-247 Cargo.toml:30 Cargo.toml:52


graph TB
    subgraph Benchmark_Suite["Benchmark Suite"]
STORAGE_BENCH["storage_benchmark"]
CONTENTION_BENCH["contention_benchmark"]
end
    
    subgraph Storage_Tests["Storage Benchmark Tests"]
WRITE_SINGLE["Single Write Throughput"]
WRITE_BATCH["Batch Write Throughput"]
READ_SEQ["Sequential Read Throughput"]
READ_RAND["Random Read Throughput"]
end
    
    subgraph Contention_Tests["Contention Benchmark Tests"]
MULTI_THREAD["Multi-threaded Read Contention"]
PARALLEL_ITER["Parallel Iteration Performance"]
end
    
    subgraph Criterion["Criterion.rs Framework"]
STATISTICAL["Statistical Analysis"]
COMPARISON["Baseline Comparison"]
PLOTS["Performance Plots"]
REPORTS["HTML Reports"]
end
    
 
   STORAGE_BENCH --> WRITE_SINGLE
 
   STORAGE_BENCH --> WRITE_BATCH
 
   STORAGE_BENCH --> READ_SEQ
 
   STORAGE_BENCH --> READ_RAND
    
 
   CONTENTION_BENCH --> MULTI_THREAD
 
   CONTENTION_BENCH --> PARALLEL_ITER
    
 
   WRITE_SINGLE --> STATISTICAL
 
   WRITE_BATCH --> STATISTICAL
 
   READ_SEQ --> STATISTICAL
 
   READ_RAND --> STATISTICAL
 
   MULTI_THREAD --> STATISTICAL
 
   PARALLEL_ITER --> STATISTICAL
    
 
   STATISTICAL --> COMPARISON
 
   COMPARISON --> PLOTS
 
   PLOTS --> REPORTS

Benchmarking Infrastructure

SIMD R Drive uses Criterion.rs for statistical benchmarking of performance-critical operations. The benchmark suite validates the effectiveness of SIMD optimizations and concurrent access patterns.

Benchmarking Architecture

The benchmark suite consists of two main harnesses: storage_benchmark measures fundamental read/write throughput, while contention_benchmark measures concurrent access performance. Criterion.rs provides statistical analysis, baseline comparisons, and HTML reports for each benchmark run.

Sources: Cargo.toml:57-63 Cargo.toml:98

Benchmark Configuration

The benchmark harnesses are defined in Cargo.toml:57-63.

The harness = false setting disables Rust’s default benchmark harness, allowing Criterion.rs to provide its own test runner with statistical analysis capabilities.

Criterion.rs Integration

The Criterion.rs framework is configured as a development dependency in Cargo.toml:39 and provides:

  • Statistical rigor : Multiple iterations with outlier detection
  • Baseline comparison : Compare performance across code changes
  • Regression detection : Automatically detect performance regressions
  • Visualization : Generate performance plots and HTML reports
  • Reproducibility : Consistent measurement methodology across environments

Benchmarks are executed with cargo bench, and the results are stored in target/criterion/ for historical comparison.

Sources: Cargo.toml:39 Cargo.toml:57-63


Performance Feature Summary

The following table summarizes the key performance features and their implementation locations:

| Feature | Implementation | Benefit | Configuration |
|---|---|---|---|
| AVX2 SIMD | simd_copy_x86 in src/storage_engine/simd_copy.rs:33-62 | 32-byte vectorized copies | Runtime feature detection |
| NEON SIMD | simd_copy_arm in src/storage_engine/simd_copy.rs:81-108 | 16-byte vectorized copies | Always enabled on aarch64 |
| 64-byte Alignment | PAYLOAD_ALIGNMENT constant | Cache-line efficiency | Build-time constant |
| Zero-Copy Reads | memmap2::Mmap | No deserialization overhead | Always enabled |
| Lock-Free Reads | DashMap in Cargo.toml:27 | Concurrent read scaling | Always enabled |
| Parallel Iteration | Rayon in Cargo.toml:30 | Multi-core bulk processing | parallel feature flag |
| Hardware Hashing | xxhash-rust in Cargo.toml:34 | SIMD-accelerated indexing | Always enabled |

For detailed information on each feature, see the corresponding child sections 5.1, 5.2, 5.3, and 5.4.

Sources: README.md:5-7 README.md:249-256 Cargo.toml:27 Cargo.toml:30 Cargo.toml:34


SIMD Acceleration

Relevant source files

Purpose and Scope

This document describes the SIMD (Single Instruction, Multiple Data) acceleration layer used in the SIMD R Drive storage engine. SIMD acceleration provides vectorized memory copy operations that process multiple bytes simultaneously, improving throughput for data write operations. The implementation supports AVX2 instructions on x86_64 architectures and NEON instructions on ARM AArch64 architectures.

For information about payload alignment considerations that complement SIMD operations, see Payload Alignment and Cache Efficiency. For details on how SIMD operations are measured, see Benchmarking.

Architecture Support Matrix

The SIMD acceleration layer provides platform-specific implementations based on available hardware features:

| Architecture | SIMD Technology | Vector Width | Bytes per Operation | Runtime Detection |
|---|---|---|---|---|
| x86_64 | AVX2 | 256-bit | 32 bytes | Yes (is_x86_feature_detected!) |
| aarch64 (ARM) | NEON | 128-bit | 16 bytes | No (always enabled) |
| Other | Scalar fallback | N/A | 1 byte | N/A |

Sources: src/storage_engine/simd_copy.rs:10-138

SIMD Copy Architecture

The simd_copy function serves as the unified entry point for SIMD-accelerated memory operations, dispatching to architecture-specific implementations based on compile-time and runtime feature detection.

SIMD Copy Dispatch Flow

graph TB
    Entry["simd_copy(dst, src)"]
Check_x86["#[cfg(target_arch = 'x86_64')]\nCompile-time check"]
Check_arm["#[cfg(target_arch = 'aarch64')]\nCompile-time check"]
Detect_AVX2["is_x86_feature_detected!('avx2')\nRuntime detection"]
AVX2_Impl["simd_copy_x86(dst, src)\n32-byte chunks\n_mm256_loadu_si256\n_mm256_storeu_si256"]
NEON_Impl["simd_copy_arm(dst, src)\n16-byte chunks\nvld1q_u8\nvst1q_u8"]
Scalar_Fallback["copy_from_slice\nStandard Rust memcpy"]
Warning["LOG_ONCE.call_once\nWarn: AVX2 not detected"]
Entry --> Check_x86
 
   Entry --> Check_arm
 
   Check_x86 --> Detect_AVX2
 
   Detect_AVX2 -->|true| AVX2_Impl
 
   Detect_AVX2 -->|false| Warning
 
   Warning --> Scalar_Fallback
 
   Check_arm --> NEON_Impl
 
   Entry --> Scalar_Fallback
    
    style Entry fill:#f9f9f9,stroke:#333,stroke-width:2px
    style AVX2_Impl fill:#f0f0f0
    style NEON_Impl fill:#f0f0f0
    style Scalar_Fallback fill:#f0f0f0

Sources: src/storage_engine/simd_copy.rs:110-138

x86_64 AVX2 Implementation

The simd_copy_x86 function leverages AVX2 instructions for vectorized memory operations on x86_64 processors.

Function Signature and Safety

src/storage_engine/simd_copy.rs:32-35 defines the function with the #[target_feature(enable = "avx2")] attribute, which enables AVX2 code generation and marks the function as unsafe.

Chunked Copy Strategy

The implementation processes data in 32-byte chunks corresponding to the 256-bit AVX2 register width:

| Step | Operation | Source Location | Description |
|---|---|---|---|
| 1. Calculate chunks | len / 32 | N/A | Determines number of full 32-byte iterations |
| 2. Load from source | _mm256_loadu_si256 | src/storage_engine/simd_copy.rs:47 | Unaligned load of 256 bits |
| 3. Store to destination | _mm256_storeu_si256 | src/storage_engine/simd_copy.rs:55 | Unaligned store of 256 bits |
| 4. Handle remainder | copy_from_slice | src/storage_engine/simd_copy.rs:61 | Scalar copy for remaining bytes |

Memory Safety Guarantees

The implementation includes detailed safety comments (src/storage_engine/simd_copy.rs:42-56) documenting:

  • Buffer bounds validation (len calculated as minimum of dst.len() and src.len())
  • Pointer arithmetic guarantees (i bounded by chunks * 32 <= len)
  • Alignment handling via unaligned load/store instructions

Sources: src/storage_engine/simd_copy.rs:32-62
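A simplified sketch of this chunked strategy is shown below; it mirrors the table above but is not the verbatim source, and the function name avx2_copy is illustrative.

```rust
// Unaligned load/store intrinsics are used so the source slice need not be
// 32-byte aligned; the remainder is finished with a scalar copy.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn avx2_copy(dst: &mut [u8], src: &[u8]) {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};

    let len = dst.len().min(src.len());
    let chunks = len / 32;
    for c in 0..chunks {
        let i = c * 32;
        // SAFETY: i + 32 <= len, so both pointers stay inside their slices.
        let v: __m256i = _mm256_loadu_si256(src.as_ptr().add(i) as *const __m256i);
        _mm256_storeu_si256(dst.as_mut_ptr().add(i) as *mut __m256i, v);
    }
    // Scalar copy for the tail that does not fill a full 32-byte chunk.
    dst[chunks * 32..len].copy_from_slice(&src[chunks * 32..len]);
}
```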

ARM NEON Implementation

The simd_copy_arm function provides vectorized operations for ARM AArch64 processors using the NEON instruction set.

Function Signature

src/storage_engine/simd_copy.rs:80-83 defines the ARM-specific implementation.

NEON Operation Pattern

NEON 16-byte Copy Cycle

The implementation (src/storage_engine/simd_copy.rs:83-108):

  1. Chunk Calculation : Divides length by 16 (NEON register width)
  2. Load Operation : Uses vld1q_u8 to read 16 bytes into a NEON register (src/storage_engine/simd_copy.rs:94)
  3. Store Operation : Uses vst1q_u8 to write 16 bytes from register to destination (src/storage_engine/simd_copy.rs:101)
  4. Remainder Handling : Scalar copy for any bytes not fitting in 16-byte chunks (src/storage_engine/simd_copy.rs:107)

Sources: src/storage_engine/simd_copy.rs:80-108

Runtime Feature Detection

x86_64 Detection Mechanism

The x86_64 implementation uses Rust’s standard library feature detection.

x86_64 AVX2 Runtime Detection Flow

src/storage_engine/simd_copy.rs:114-124 implements the detection with logging:

  • The std::is_x86_feature_detected!("avx2") macro performs runtime CPUID checks
  • The LOG_ONCE static variable (src/storage_engine/simd_copy.rs:8) ensures the warning is emitted only once
  • Fallback to scalar copy occurs transparently when AVX2 is unavailable

ARM Detection Strategy

ARM AArch64 does not provide standard runtime feature detection. The implementation assumes NEON availability on all AArch64 targets (src/storage_engine/simd_copy.rs:127-133), which is guaranteed by the ARMv8 architecture specification.

Sources: src/storage_engine/simd_copy.rs:4-8 src/storage_engine/simd_copy.rs:110-138
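A hedged sketch of this dispatch logic follows; the names are illustrative, and the avx2_copy body is a stand-in for the 32-byte intrinsic loop shown earlier.

```rust
use std::sync::Once;

static LOG_ONCE: Once = Once::new();

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn avx2_copy(dst: &mut [u8], src: &[u8]) {
    // Stand-in for the 32-byte-chunk intrinsic loop sketched above.
    dst.copy_from_slice(src);
}

/// Copies min(dst.len(), src.len()) bytes, preferring AVX2 when available.
pub fn simd_copy(dst: &mut [u8], src: &[u8]) {
    let len = dst.len().min(src.len());

    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            unsafe { avx2_copy(&mut dst[..len], &src[..len]) };
            return;
        }
        // Emit the "no AVX2" warning only once per process.
        LOG_ONCE.call_once(|| eprintln!("AVX2 not detected; using scalar copy"));
    }

    // Scalar fallback: also the only path on unsupported architectures.
    dst[..len].copy_from_slice(&src[..len]);
}
```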

graph TD
    Start["simd_copy invoked"]
Layer1["Layer 1: Platform-specific SIMD\nAVX2 or NEON if available"]
Layer2["Layer 2: Runtime detection failure\nAVX2 not detected on x86_64"]
Layer3["Layer 3: Unsupported architecture\nNeither x86_64 nor aarch64"]
Scalar["copy_from_slice\nStandard Rust memcpy\nCompiler-optimized"]
Start --> Layer1
 
   Layer1 -->|No SIMD available| Layer2
 
   Layer2 -->|No runtime support| Layer3
 
   Layer3 --> Scalar
    
    style Scalar fill:#f0f0f0

Fallback Behavior

The system provides three fallback layers for environments without SIMD support:

Fallback Hierarchy

Fallback Decision Tree

Scalar Copy Implementation

src/storage_engine/simd_copy.rs:136-137 implements the final fallback.

This uses Rust’s standard library copy_from_slice, which:

  • Relies on LLVM’s optimized memcpy implementation
  • May use SIMD instructions if the compiler determines it’s beneficial
  • Provides a safe, portable baseline for all platforms

Sources: src/storage_engine/simd_copy.rs:136-137

graph TB
    subgraph "DataStore Write Path"
        Write["write(key, value)"]
Align["Calculate 64-byte alignment padding"]
Allocate["Allocate file space"]
Copy["simd_copy(dst, src)"]
Metadata["Write metadata\n(hash, prev_offset, crc32)"]
end
    
    subgraph "simd_copy Function"
        Dispatch["Platform dispatch"]
AVX2["AVX2: 32-byte chunks"]
NEON["NEON: 16-byte chunks"]
Scalar["Scalar fallback"]
end
    
    subgraph "Storage File"
        MMap["Memory-mapped region"]
Payload["64-byte aligned payload"]
end
    
 
   Write --> Align
 
   Align --> Allocate
 
   Allocate --> Copy
 
   Copy --> Dispatch
 
   Dispatch --> AVX2
 
   Dispatch --> NEON
 
   Dispatch --> Scalar
 
   AVX2 --> MMap
 
   NEON --> MMap
 
   Scalar --> MMap
 
   MMap --> Payload
 
   Copy --> Metadata
    
    style Copy fill:#f9f9f9,stroke:#333,stroke-width:2px
    style Dispatch fill:#f0f0f0

Integration with Storage Engine

The simd_copy function is invoked during write operations to efficiently copy user data into the memory-mapped storage file.

Usage Context

SIMD Integration in Write Path

The storage engine’s write operations leverage SIMD acceleration when copying payload data into the memory-mapped file. The 64-byte payload alignment (see Payload Alignment and Cache Efficiency) ensures that SIMD operations work with naturally aligned memory boundaries, maximizing cache efficiency.

Performance Impact

SIMD acceleration provides measurable benefits:

  • AVX2 (x86_64) : Processes 32 bytes per instruction vs. scalar’s 8 bytes (or less)
  • NEON (ARM) : Processes 16 bytes per instruction vs. scalar’s 8 bytes (or less)
  • Cache Efficiency : Larger transfer granularity reduces memory access overhead
  • Write Throughput : Directly improves write, batch_write, and write_stream performance

The actual performance gains are measured using the Criterion.rs benchmark suite (see Benchmarking).

Sources: src/storage_engine/simd_copy.rs:1-139 Cargo.toml:8

Dependencies and Compiler Support

Architecture-Specific Intrinsics

The implementation imports platform-specific SIMD intrinsics:

| Architecture | Import Statement | Intrinsics Used |
|---|---|---|
| x86_64 | use std::arch::x86_64::*; (src/storage_engine/simd_copy.rs:11) | __m256i, _mm256_loadu_si256, _mm256_storeu_si256 |
| aarch64 | use std::arch::aarch64::*; (src/storage_engine/simd_copy.rs:14) | vld1q_u8, vst1q_u8 |

Build Configuration

The SIMD implementation requires no special feature flags in Cargo.toml:1-113. The code uses:

  • Compile-time conditional compilation (#[cfg(target_arch = "...")])
  • Runtime feature detection (x86_64 only)
  • Standard Rust toolchain support (no nightly features required)

The #[inline] attribute on all SIMD functions encourages the compiler to inline these hot-path operations, reducing function call overhead.

Sources: src/storage_engine/simd_copy.rs:10-14 src/storage_engine/simd_copy.rs:32-35 src/storage_engine/simd_copy.rs:80-83


Payload Alignment and Cache Efficiency

Relevant source files

Purpose and Scope

This document explains the payload alignment strategy used by SIMD R Drive to optimize cache efficiency and enable zero-copy SIMD operations. It covers the PAYLOAD_ALIGNMENT constant, the pre-padding mechanism that ensures alignment, cache line optimization, and the testing infrastructure that validates alignment invariants.

For information about SIMD-accelerated operations themselves (vectorized copying and hashing), see SIMD Acceleration. For details on zero-copy memory access patterns, see Memory Management and Zero-Copy Access.


Overview

SIMD R Drive aligns all non-tombstone payloads to a fixed boundary defined by PAYLOAD_ALIGNMENT, currently set to 64 bytes. This alignment ensures that:

  • Payloads begin on CPU cache line boundaries (typically 64 bytes)
  • SIMD vector loads (SSE, AVX, AVX-512, NEON) can operate without crossing alignment boundaries
  • Zero-copy typed views (&[u32], &[u64], &[u128]) can be safely cast without additional copying

Sources: README.md:51-59 README.md:110-137


The PAYLOAD_ALIGNMENT Constant

Definition and Configuration

The alignment is controlled by two constants in the entry handle package:

| Constant | Value | Purpose |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
| PAYLOAD_ALIGNMENT | 64 | Actual alignment in bytes |

The PAYLOAD_ALIGNMENT value is calculated as 1 << PAYLOAD_ALIGN_LOG2, ensuring it is always a power of two. This constant determines where each payload begins in the storage file.

Diagram: PAYLOAD_ALIGNMENT Configuration and Storage Layout

graph LR
    subgraph "Configuration"
        LOG2["PAYLOAD_ALIGN_LOG2\n(constant = 6)"]
ALIGN["PAYLOAD_ALIGNMENT\n(1 &lt;&lt; 6 = 64)"]
end
    
    subgraph "Storage File"
        ENTRY1["Entry 1\n@ offset 0"]
PAD1["Pre-Pad\n(0-63 bytes)"]
PAYLOAD1["Payload 1\n@ 64-byte boundary"]
META1["Metadata\n(20 bytes)"]
PAD2["Pre-Pad"]
PAYLOAD2["Payload 2\n@ next 64-byte boundary"]
end
    
 
   LOG2 --> ALIGN
    ALIGN -.determines.-> PAD1
    ALIGN -.determines.-> PAD2
    
 
   ENTRY1 --> PAD1
 
   PAD1 --> PAYLOAD1
 
   PAYLOAD1 --> META1
 
   META1 --> PAD2
 
   PAD2 --> PAYLOAD2

The alignment constant can be modified by changing PAYLOAD_ALIGN_LOG2 in the constants file and rebuilding all components. However, this creates incompatibility with files written using different alignment values.

Sources: README.md:59 CHANGELOG.md:64-67 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:69-70
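An approximate sketch of how these constants relate is shown below; the types are assumptions, and the exact declarations live in the constants file referenced above.

```rust
/// Log2 of the payload alignment: 2^6 = 64 bytes.
pub const PAYLOAD_ALIGN_LOG2: u32 = 6;
/// Every non-tombstone payload starts at a multiple of this many bytes.
pub const PAYLOAD_ALIGNMENT: usize = 1 << PAYLOAD_ALIGN_LOG2;
```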


Cache Line Optimization

CPU Cache Architecture

Modern CPU cache lines are typically 64 bytes wide. When data is loaded from memory, the CPU fetches entire cache lines at once. Aligning payloads to 64-byte boundaries ensures:

  1. No cache line splits : Each payload begins at a cache line boundary, preventing a single logical read from spanning two cache lines
  2. Predictable cache behavior : Sequential reads traverse cache lines in order without fragmentation
  3. Reduced memory bandwidth : The CPU can prefetch entire cache lines efficiently

Alignment Benefit Matrix

| Scenario | 16-byte Alignment | 64-byte Alignment |
|---|---|---|
| Cache line splits per payload | Likely (3.75x less aligned) | Never (boundary-aligned) |
| SIMD load efficiency | Good for SSE | Optimal for AVX/AVX-512 |
| Prefetcher effectiveness | Moderate | High |
| Memory bandwidth utilization | ~85-90% | ~95-98% |

Sources: README.md:53 CHANGELOG.md:27-30


SIMD Compatibility

Vector Instruction Requirements

Different SIMD instruction sets have varying alignment requirements:

| SIMD Extension | Vector Size | Typical Alignment | Supported By 64-byte Alignment |
|---|---|---|---|
| SSE2 | 128 bits (16 bytes) | 16-byte | ✅ Yes |
| AVX2 | 256 bits (32 bytes) | 32-byte | ✅ Yes |
| AVX-512 | 512 bits (64 bytes) | 64-byte | ✅ Yes |
| NEON (ARM) | 128 bits (16 bytes) | 16-byte | ✅ Yes |
| SVE (ARM) | Variable (128-2048 bits) | 16-byte minimum | ✅ Yes |

SIMD Load Operations

The alignment tests demonstrate safe SIMD operations using aligned loads.

Diagram: SIMD 64-byte Lane Loading

The test implementation at tests/alignment_tests.rs:69-95 demonstrates x86_64 SIMD loads using _mm_load_si128, while tests/alignment_tests.rs:97-122 shows aarch64 using vld1q_u8. Both safely load four 16-byte lanes from a 64-byte aligned payload.

Sources: tests/alignment_tests.rs:69-122 README.md:53-54


Pre-Padding Mechanism

Padding Calculation

To ensure each payload starts at a 64-byte boundary, the system inserts zero-filled pre-padding bytes before the payload. The padding length is calculated as:

pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)

Where:

  • prev_tail is the absolute file offset immediately after the previous entry’s metadata
  • The bitwise AND with (PAYLOAD_ALIGNMENT - 1) ensures the result is in range [0, PAYLOAD_ALIGNMENT - 1]
graph TB
    subgraph "Entry N-1"
        PREV_PAYLOAD["Payload\n(variable length)"]
PREV_META["Metadata\n(20 bytes)"]
end
    
    subgraph "Entry N Structure"
        PREPAD["Pre-Pad\n(0-63 zero bytes)"]
PAYLOAD["Payload\n(starts at 64-byte boundary)"]
KEYHASH["key_hash\n(8 bytes)"]
PREVOFF["prev_offset\n(8 bytes)"]
CRC["crc32c\n(4 bytes)"]
end
    
    subgraph "Alignment Validation"
        CHECK["payload_start %\nPAYLOAD_ALIGNMENT == 0"]
end
    
 
   PREV_PAYLOAD --> PREV_META
 
   PREV_META --> PREPAD
 
   PREPAD --> PAYLOAD
 
   PAYLOAD --> KEYHASH
 
   KEYHASH --> PREVOFF
 
   PREVOFF --> CRC
    
    PAYLOAD -.verified by.-> CHECK

Entry Structure with Pre-Padding

Diagram: Entry Structure with Pre-Padding

The prev_offset field stores the absolute file offset of the previous entry’s tail (end of metadata), allowing readers to calculate the pre-padding length by examining where the previous entry ended.

Sources: README.md:112-137 README.md:133-137


Alignment Evolution: From 16 to 64 Bytes

Version History

The payload alignment was increased in version 0.15.0-alpha:

| Version | Alignment | Rationale |
|---|---|---|
| ≤ 0.13.x-alpha | Variable (no alignment) | Minimal storage overhead |
| 0.14.0-alpha | 16 bytes | SSE compatibility, basic alignment |
| 0.15.0-alpha | 64 bytes | Cache line + AVX-512 optimization |
graph TB
    subgraph "Pre-0.15 (16-byte)"
        OLD_WRITE["Writer\n(16-byte align)"]
OLD_FILE["Storage File\n(16-byte boundaries)"]
OLD_READ["Reader\n(expects 16-byte)"]
end
    
    subgraph "Post-0.15 (64-byte)"
        NEW_WRITE["Writer\n(64-byte align)"]
NEW_FILE["Storage File\n(64-byte boundaries)"]
NEW_READ["Reader\n(expects 64-byte)"]
end
    
    subgraph "Incompatibility"
        MISMATCH["Old reader\n+ New file\n= Parse Error"]
MISMATCH2["New reader\n+ Old file\n= Parse Error"]
end
    
 
   OLD_WRITE --> OLD_FILE
 
   OLD_FILE --> OLD_READ
    
 
   NEW_WRITE --> NEW_FILE
 
   NEW_FILE --> NEW_READ
    
    OLD_READ -.cannot read.-> NEW_FILE
    NEW_READ -.cannot read.-> OLD_FILE
    
 
   NEW_FILE --> MISMATCH
 
   OLD_FILE --> MISMATCH2

Breaking Change Impact

The alignment change in 0.15.0-alpha is a breaking change that affects file compatibility:

Diagram: Alignment Version Incompatibility

Migration Strategy

The changelog specifies a migration path at CHANGELOG.md:43-51:

  1. Read all entries using the old binary (with old alignment)
  2. Write entries into a fresh store using the new binary (with 64-byte alignment)
  3. Replace the old file after verification
  4. In multi-service environments, upgrade readers before writers to prevent parse errors

Sources: CHANGELOG.md:19-51 CHANGELOG.md:55-81


Alignment Testing and Validation

Debug-Only Assertions

The system includes two debug-only alignment validation functions that compile to no-ops in release builds:

Pointer Alignment Assertion

debug_assert_aligned(ptr: *const u8, align: usize) validates that a pointer is aligned to the specified boundary. Implementation at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43

Behavior:

  • Debug/test builds : Uses debug_assert! to verify (ptr as usize & (align - 1)) == 0
  • Release/bench builds : No-op with zero runtime cost

Offset Alignment Assertion

debug_assert_aligned_offset(off: u64) validates that a file offset is aligned to PAYLOAD_ALIGNMENT. Implementation at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88

Behavior:

  • Debug/test builds : Verifies off.is_multiple_of(PAYLOAD_ALIGNMENT)
  • Release/bench builds : No-op with zero runtime cost
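A minimal sketch of what these assertions do, assuming u64 offsets and a plain modulo in place of is_multiple_of; as described above, the checks compile away in release and bench builds.

```rust
/// Stand-in for PAYLOAD_ALIGNMENT from the entry-handle crate.
const ALIGNMENT: u64 = 64;

/// Debug-only pointer check; compiles to a no-op in release/bench builds.
#[inline]
fn debug_assert_aligned(ptr: *const u8, align: usize) {
    debug_assert!(
        (ptr as usize) & (align - 1) == 0,
        "pointer {ptr:p} is not aligned to {align} bytes"
    );
}

/// Debug-only file-offset check against the payload alignment.
#[inline]
fn debug_assert_aligned_offset(off: u64) {
    debug_assert!(off % ALIGNMENT == 0, "offset {off} is not 64-byte aligned");
}
```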

Comprehensive Alignment Test Suite

The alignment test at tests/alignment_tests.rs:1-245 validates multiple alignment scenarios:

Diagram: Alignment Test Coverage and Validation Flow

Test Implementation Details

The test verifies:

  1. Address alignment at tests/alignment_tests.rs:24-32: Confirms payload pointer is multiple of 64
  2. Type alignment at tests/alignment_tests.rs:35-56: Validates alignment sufficient for u32, u64, u128
  3. Bytemuck casting at tests/alignment_tests.rs:59-67: Proves zero-copy typed views work
  4. SIMD operations at tests/alignment_tests.rs:69-133: Executes actual SIMD loads on aligned data
  5. Iterator consistency at tests/alignment_tests.rs:236-243: Ensures all iterated entries are aligned

Sources: tests/alignment_tests.rs:1-245 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-89


Performance Benefits

Zero-Copy Typed Views

The 64-byte alignment enables safe zero-copy reinterpretation of byte slices as typed slices without additional validation or copying:

| Source Type | Target Type | Requirement | Satisfied by 64-byte Alignment |
|---|---|---|---|
| &[u8] | &[u16] | 2-byte aligned | ✅ Yes (64 % 2 = 0) |
| &[u8] | &[u32] | 4-byte aligned | ✅ Yes (64 % 4 = 0) |
| &[u8] | &[u64] | 8-byte aligned | ✅ Yes (64 % 8 = 0) |
| &[u8] | &[u128] | 16-byte aligned | ✅ Yes (64 % 16 = 0) |

The README states at README.md:55-56: “When your payload length matches the element size, you can safely reinterpret the bytes as typed slices (e.g., &[u16], &[u32], &[u64], &[u128]) without copying.”
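For example, a hedged sketch of such a cast with the bytemuck crate (assumed here as an external dependency) might look like:

```rust
/// Reinterprets an aligned payload as a u32 slice without copying.
/// bytemuck::cast_slice panics if the pointer or length is unsuitable,
/// which the 64-byte alignment guarantee is designed to prevent.
fn as_u32_view(payload: &[u8]) -> &[u32] {
    assert_eq!(payload.len() % 4, 0, "payload length must be a multiple of 4");
    bytemuck::cast_slice(payload)
}
```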

Practical Benefits Summary

From README.md:59:

  • Cache-friendly zero-copy reads : Payloads align with CPU cache lines
  • Predictable SIMD performance : Vector operations never cross alignment boundaries
  • Simpler casting : No runtime alignment checks needed for typed views
  • Fewer fallback copies : Libraries like bytemuck can cast without allocation

Storage Overhead

The pre-padding mechanism adds variable overhead:

  • Worst case : 63 bytes of padding per entry (when previous tail is 1 byte before boundary)
  • Average case : ~31.5 bytes per entry (uniform distribution assumption)
  • Best case : 0 bytes (when previous tail already aligns)

For small payloads, this overhead can be significant. For large payloads (>>64 bytes), the overhead becomes negligible relative to payload size.

Sources: README.md:53-59 README.md:110 tests/alignment_tests.rs:215-221


Integration with Arrow Buffers

When the arrow feature is enabled, EntryHandle provides methods to create Apache Arrow buffers that leverage alignment:

  • as_arrow_buffer(): Creates an Arrow buffer view without copying
  • into_arrow_buffer(): Converts into an Arrow buffer with alignment validation

Both methods include debug assertions to verify pointer and offset alignment at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88 ensuring Arrow’s alignment requirements are met.

Sources: CHANGELOG.md:67-68 README.md:59


CI/CD Validation

The GitHub Actions workflow at .github/workflows/rust-lint.yml:1-43 ensures alignment-related code passes:

  • Clippy lints : Validates unsafe SIMD code and alignment assertions
  • Format checks : Ensures consistent style in alignment-critical code
  • Documentation warnings : Catches missing docs for alignment APIs

The test workflow (referenced in the CI setup) runs alignment tests across multiple platforms (x86_64, aarch64) to verify SIMD compatibility on different architectures.

Sources: .github/workflows/rust-lint.yml:1-43


Summary

The 64-byte PAYLOAD_ALIGNMENT is a foundational design choice that:

  1. Aligns payloads with CPU cache lines for optimal memory access
  2. Satisfies alignment requirements for SSE, AVX, AVX-512, and NEON SIMD instructions
  3. Enables safe zero-copy casting to typed slices (&[u32], &[u64], etc.)
  4. Integrates seamlessly with Apache Arrow’s buffer requirements

The pre-padding mechanism transparently maintains this alignment while preserving the append-only storage model. Comprehensive testing validates alignment across write, delete, and overwrite scenarios, ensuring both correctness and performance optimization.


Write and Read Modes

Relevant source files

Purpose and Scope

This document describes the different operation modes available in SIMD R Drive for writing and reading data. Each mode is optimized for specific use cases, offering different trade-offs between memory usage, I/O overhead, and concurrency. For information about SIMD acceleration used within these operations, see SIMD Acceleration. For details on payload alignment requirements, see Payload Alignment and Cache Efficiency.


Write Operation Modes

SIMD R Drive provides three distinct write modes, each optimized for different scenarios. All write operations acquire a write lock on the underlying file to ensure consistency.

Single Entry Write

The write() method writes a single key-value pair atomically with immediate disk flushing.

Method Signature: write(&self, key: &[u8], payload: &[u8]) -> Result<u64>

Characteristics:

  • Acquires RwLock<BufWriter<File>> for entire operation
  • Writes are flushed immediately via file.flush()
  • Each write performs file remapping and index update
  • Suitable for individual, isolated write operations

Internal Flow:

Sources: src/storage_engine/data_store.rs:827-834 src/storage_engine/data_store.rs:832-834

Batch Entry Write

The batch_write() method writes multiple key-value pairs in a single locked operation, reducing disk I/O overhead.

Method Signature: batch_write(&self, entries: &[(&[u8], &[u8])]) -> Result<u64>

Characteristics:

  • Acquires RwLock<BufWriter<File>> once for entire batch
  • All entries are buffered in memory before writing
  • Single file.flush() at end of batch
  • Single remapping and index update operation
  • Significantly more efficient for bulk writes

Internal Process:

| Step | Operation | Lock Held |
|---|---|---|
| 1 | Hash all keys with compute_hash_batch() | No |
| 2 | Acquire write lock | Yes |
| 3 | Build in-memory buffer with all entries | Yes |
| 4 | Calculate alignment padding for each entry | Yes |
| 5 | Copy payloads using simd_copy() | Yes |
| 6 | Append all metadata | Yes |
| 7 | Write entire buffer with file.write_all() | Yes |
| 8 | Flush with file.flush() | Yes |
| 9 | Call reindex() once | Yes |
| 10 | Release write lock | No |

Sources: src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939 README.md:216-218

Streaming Write

The write_stream() method writes large data entries using a streaming Read source without requiring full in-memory allocation.

Method Signature: write_stream<R: Read>(&self, key: &[u8], reader: &mut R) -> Result<u64>

Characteristics:

  • Reads data in chunks of WRITE_STREAM_BUFFER_SIZE (8192 bytes)
  • Suitable for large files or data streams
  • Only one buffer’s worth of data in memory at a time
  • Computes CRC32 checksum incrementally
  • Single file.flush() after all chunks written

Streaming Flow:

Sources: src/storage_engine/data_store.rs:753-825 README.md:220-222 src/lib.rs:66-115
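
A sketch of streaming a large file into the store through write_stream(); import paths and the error type are assumptions.

```rust
use std::fs::File;
use simd_r_drive::{DataStore, DataStoreWriter}; // assumed import paths

// Stream a file into the store; only one 8 KiB chunk
// (WRITE_STREAM_BUFFER_SIZE) is buffered in memory at a time.
fn import_large_file(store: &DataStore, path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = File::open(path)?;
    let tail_offset = store.write_stream(b"blob:large-file", &mut reader)?;
    println!("streamed entry written, tail offset = {tail_offset}");
    Ok(())
}
```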


Read Operation Modes

SIMD R Drive provides multiple read modes optimized for different access patterns and performance requirements.

Direct Memory Access

The read() method retrieves stored data using zero-copy memory mapping, providing the most efficient access for individual entries.

Method Signature: read(&self, key: &[u8]) -> Result<Option<EntryHandle>>

Characteristics:

  • Zero-copy access via mmap
  • Returns EntryHandle wrapping Arc<Mmap> and byte range
  • No data copying - direct pointer into memory-mapped region
  • O(1) lookup via KeyIndexer hash table
  • Lock-free after index lookup completes

Read Path:

Sources: src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:502-565 README.md:228-232
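
A sketch of a zero-copy lookup that parses the payload as a little-endian u64 (as the benchmarks in this wiki do); it assumes the payload is at least 8 bytes and that the import paths and error type match.

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // assumed import paths

// Zero-copy lookup: the EntryHandle borrows the memory-mapped region,
// so dereferencing it yields the payload bytes without copying.
fn read_u64(store: &DataStore, key: &[u8]) -> Result<Option<u64>, Box<dyn std::error::Error>> {
    Ok(match store.read(key)? {
        Some(entry) => {
            let bytes: &[u8] = &*entry; // direct view into the mmap
            Some(u64::from_le_bytes(bytes[..8].try_into()?))
        }
        None => None,
    })
}
```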

Batch Read

The batch_read() method efficiently retrieves multiple entries in a single operation, minimizing lock contention.

Method Signature: batch_read(&self, keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>>

Characteristics:

  • Hashes all keys in batch using compute_hash_batch()
  • Acquires index read lock once for entire batch
  • Clones Arc<Mmap> once and reuses for all entries
  • Returns vector of optional EntryHandle objects
  • More efficient than individual read() calls

Batch Processing:

| Operation | Complexity | Lock Duration |
| --- | --- | --- |
| Hash all keys | O(n) | No lock |
| Acquire index read lock | O(1) | Begin |
| Clone Arc<Mmap> once | O(1) | Held |
| Lookup each hash | O(n) average | Held |
| Verify tags | O(n) | Held |
| Create handles | O(n) | Held |
| Release lock | O(1) | End |

Sources: src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158
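
A minimal sketch of batch_read(), counting how many of the requested keys are present; import paths and error type are assumptions.

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // assumed import paths

// Fetch many keys with a single index-lock acquisition and a shared mmap clone.
fn count_present(store: &DataStore, keys: &[&[u8]]) -> Result<usize, Box<dyn std::error::Error>> {
    let handles = store.batch_read(keys)?;
    Ok(handles.iter().filter(|handle| handle.is_some()).count())
}
```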

Streaming Read

The EntryStream wrapper provides incremental reading of large entries, avoiding high memory overhead.

Characteristics:

  • Implements std::io::Read trait
  • Reads data in configurable buffer chunks
  • Non-zero-copy - data is read through a buffer
  • Suitable for processing large entries incrementally
  • Useful when full entry doesn’t fit in memory

Usage Pattern:
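A sketch of incremental reading, assuming EntryStream is constructed from an EntryHandle via From (as the function reference table below suggests) and is re-exported from the crate root; adjust the import path to the actual module layout.

```rust
use std::io::Read;
use simd_r_drive::{DataStore, DataStoreReader, EntryStream}; // assumed import paths

// Read a large entry in fixed-size chunks instead of materializing it at once.
fn stream_out(store: &DataStore, key: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
    if let Some(entry) = store.read(key)? {
        let mut stream = EntryStream::from(entry); // assumed From<EntryHandle> conversion
        let mut chunk = [0u8; 4096];
        loop {
            let n = stream.read(&mut chunk)?;
            if n == 0 {
                break;
            }
            // process &chunk[..n] incrementally here
        }
    }
    Ok(())
}
```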

Sources: README.md:234-240 src/lib.rs:86-92 src/storage_engine.rs:10-11

Parallel Iteration

The par_iter_entries() method provides Rayon-powered parallel iteration over all valid entries.

Method Signature (requires the parallel feature): par_iter_entries(&self) -> impl ParallelIterator<Item = EntryHandle>

Characteristics:

  • Only available with parallel feature flag
  • Uses Rayon’s parallel iterator infrastructure
  • Acquires index lock briefly to collect offsets
  • Releases lock before parallel processing begins
  • Each thread receives Arc<Mmap> clone for safe access
  • Automatically filters tombstones and duplicates
  • Ideal for bulk processing and analytics workloads

Parallel Execution Flow:

Sources: src/storage_engine/data_store.rs:296-361 README.md:242-246
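
A sketch of a parallel full scan with the `parallel` feature enabled; the Rayon prelude import and the byte-slice view of EntryHandle are assumptions consistent with the rest of this page.

```rust
use rayon::prelude::*; // ParallelIterator combinators
use simd_r_drive::DataStore; // assumed import path

// Sum payload sizes across all live entries using Rayon worker threads.
// Each thread reads through its own Arc<Mmap> clone, so no data is copied.
fn total_payload_bytes(store: &DataStore) -> usize {
    store
        .par_iter_entries()
        .map(|entry| (&*entry).len()) // byte length of each payload
        .sum()
}
```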


Performance Characteristics Comparison

Write Mode Comparison

| Mode | Lock Duration | Flush Frequency | Memory Usage | Best For |
| --- | --- | --- | --- | --- |
| Single Write | Per write | Per write | Low (single entry) | Individual updates, low throughput |
| Batch Write | Per batch | Per batch | Medium (all entries buffered) | Bulk imports, high throughput |
| Stream Write | Per stream | Per stream | Low (8KB buffer) | Large files, limited memory |

Read Mode Comparison

| Mode | Copy Behavior | Lock Contention | Memory Overhead | Best For |
| --- | --- | --- | --- | --- |
| Direct Read | Zero-copy | Low (brief lock) | Minimal (Arc<Mmap>) | Individual lookups, hot path |
| Batch Read | Zero-copy | Very low (single lock) | Minimal (shared Arc<Mmap>) | Multiple lookups at once |
| Stream Read | Buffered copy | Low (brief lock) | Medium (buffer size) | Large entries, incremental processing |
| Parallel Iter | Zero-copy | Very low (brief lock) | Medium (per-thread Arc<Mmap>) | Full scans, analytics, multi-core |

Lock Acquisition Patterns

Sources: src/storage_engine/data_store.rs:753-939 src/storage_engine/data_store.rs:1040-1158 README.md:208-246


Code Entity Mapping

Write Mode Function References

| Mode | Trait Method | Implementation | Key Helper |
| --- | --- | --- | --- |
| Single | DataStoreWriter::write() | data_store.rs:827-830 | write_with_key_hash() |
| Batch | DataStoreWriter::batch_write() | data_store.rs:838-843 | batch_write_with_key_hashes() |
| Stream | DataStoreWriter::write_stream() | data_store.rs:753-756 | write_stream_with_key_hash() |

Read Mode Function References

| Mode | Trait Method | Implementation | Key Helper |
| --- | --- | --- | --- |
| Direct | DataStoreReader::read() | data_store.rs:1040-1049 | read_entry_with_context() |
| Batch | DataStoreReader::batch_read() | data_store.rs:1105-1109 | batch_read_hashed_keys() |
| Stream | EntryStream::from() | storage_engine.rs:10-11 | N/A |
| Parallel | DataStore::par_iter_entries() | data_store.rs:297-361 | KeyIndexer::unpack() |

Core Types

Sources: src/storage_engine/data_store.rs:1-1183 src/storage_engine.rs:1-25 src/storage_engine/entry_iterator.rs:1-128

Benchmarking

Relevant source files

This document describes the performance benchmark suite for the SIMD R Drive storage engine. The benchmarks measure write throughput, read throughput (sequential, random, and batch), and concurrent access patterns under contention. For general performance optimization features, see Performance Optimizations. For information about SIMD acceleration techniques, see SIMD Acceleration.


Benchmark Suite Overview

The SIMD R Drive project includes two primary benchmark suites that measure different aspects of storage engine performance:

| Benchmark | File | Purpose | Key Metrics |
| --- | --- | --- | --- |
| Storage Benchmark | benches/storage_benchmark.rs | Single-process throughput testing | Writes/sec, Reads/sec for sequential, random, and batch operations |
| Contention Benchmark | benches/contention_benchmark.rs | Multi-threaded concurrent access | Performance degradation under write contention |

Both benchmarks are configured in Cargo.toml:57-63 with Criterion.rs available as a dependency; they report throughput directly (see Metrics and Analysis below), making it straightforward to detect performance regressions across code changes.

Sources: Cargo.toml:36-63 benches/storage_benchmark.rs:1-234

graph TB
    subgraph "Benchmark Configuration"
        CARGO["Cargo.toml"]
CRITERION["criterion = 0.6.0"]
HARNESS["harness = false"]
end
    
    subgraph "Storage Benchmark"
        STORAGE_BIN["benches/storage_benchmark.rs"]
WRITE_BENCH["benchmark_append_entries()"]
SEQ_BENCH["benchmark_sequential_reads()"]
RAND_BENCH["benchmark_random_reads()"]
BATCH_BENCH["benchmark_batch_reads()"]
end
    
    subgraph "Contention Benchmark"
        CONTENTION_BIN["benches/contention_benchmark.rs"]
CONCURRENT_TESTS["Multi-threaded write tests"]
end
    
    subgraph "Core Operations Measured"
        BATCH_WRITE["DataStoreWriter::batch_write()"]
READ["DataStoreReader::read()"]
BATCH_READ["DataStoreReader::batch_read()"]
ITER["into_iter()"]
end
    
 
   CARGO --> CRITERION
 
   CARGO --> STORAGE_BIN
 
   CARGO --> CONTENTION_BIN
 
   CARGO --> HARNESS
    
 
   STORAGE_BIN --> WRITE_BENCH
 
   STORAGE_BIN --> SEQ_BENCH
 
   STORAGE_BIN --> RAND_BENCH
 
   STORAGE_BIN --> BATCH_BENCH
    
 
   WRITE_BENCH --> BATCH_WRITE
 
   SEQ_BENCH --> ITER
 
   RAND_BENCH --> READ
 
   BATCH_BENCH --> BATCH_READ
    
 
   CONTENTION_BIN --> CONCURRENT_TESTS
 
   CONCURRENT_TESTS --> BATCH_WRITE

Storage Benchmark Architecture

The storage benchmark (storage_benchmark.rs) is a standalone binary that measures single-process throughput across four operation types. It writes 1,000,000 entries and then exercises different read access patterns.

Benchmark Configuration Constants

The benchmark behavior is controlled by tunable constants defined at the top of the file:

| Constant | Value | Purpose |
| --- | --- | --- |
| ENTRY_SIZE | 8 bytes | Size of each value payload (stores u64 in little-endian) |
| WRITE_BATCH_SIZE | 1,024 entries | Number of entries per batch_write call |
| READ_BATCH_SIZE | 1,024 entries | Number of entries per batch_read call |
| NUM_ENTRIES | 1,000,000 | Total entries written during setup phase |
| NUM_RANDOM_CHECKS | 1,000,000 | Number of random single-key lookups |
| NUM_BATCH_CHECKS | 1,000,000 | Total entries verified via batch reads |

Sources: benches/storage_benchmark.rs:16-26

Write Benchmark: benchmark_append_entries()

The write benchmark measures append-only write throughput using batched operations.

graph LR
    subgraph "Write Benchmark Flow"
        START["Start: Instant::now()"]
LOOP["Loop: 0..NUM_ENTRIES"]
BATCH["Accumulate in batch Vec"]
FLUSH["flush_batch()
every 1,024 entries"]
BATCH_WRITE["storage.batch_write(&refs)"]
CALC["Calculate writes/sec"]
end
    
 
   START --> LOOP
 
   LOOP --> BATCH
 
   BATCH --> FLUSH
 
   FLUSH --> BATCH_WRITE
 
   LOOP --> CALC
    
    subgraph "Data Generation"
        KEY["Key: bench-key-{i}"]
VALUE["Value: i.to_le_bytes()
in 8-byte buffer"]
end
    
 
   LOOP --> KEY
 
   LOOP --> VALUE
 
   KEY --> BATCH
 
   VALUE --> BATCH

The benchmark creates fixed-width 8-byte payloads containing the loop index as a little-endian u64. Entries are accumulated in a batch and flushed every 1,024 entries via batch_write(). The elapsed time is used to calculate writes per second.

Sources: benches/storage_benchmark.rs:52-83 benches/storage_benchmark.rs:85-92

Sequential Read Benchmark: benchmark_sequential_reads()

The sequential read benchmark measures zero-copy iteration performance by walking the entire storage file from newest to oldest entry.

graph TB
    subgraph "Sequential Read Pattern"
        ITER_START["storage.into_iter()"]
ENTRY["For each EntryHandle"]
DEREF["Dereference: &*entry"]
PARSE["u64::from_le_bytes()"]
VERIFY["assert_eq!(stored, expected)"]
end
    
    subgraph "Memory Access"
        MMAP["Memory-mapped file access"]
ZERO_COPY["Zero-copy EntryHandle"]
BACKWARD_CHAIN["Follow prev_offset backward"]
end
    
 
   ITER_START --> ENTRY
 
   ENTRY --> DEREF
 
   DEREF --> PARSE
 
   PARSE --> VERIFY
    
 
   ITER_START --> BACKWARD_CHAIN
 
   BACKWARD_CHAIN --> MMAP
 
   MMAP --> ZERO_COPY
 
   ZERO_COPY --> DEREF

This benchmark uses storage.into_iter() which implements the IntoIterator trait, traversing the backward-linked chain via prev_offset fields. Each entry is accessed via zero-copy memory mapping and validated by parsing the stored u64 value.

Sources: benches/storage_benchmark.rs:98-118

Random Read Benchmark: benchmark_random_reads()

The random read benchmark measures hash index lookup performance by performing 1,000,000 random single-key reads.

graph LR
    subgraph "Random Read Flow"
        RNG["rng.random_range(0..NUM_ENTRIES)"]
KEY_GEN["Generate key: bench-key-{i}"]
LOOKUP["storage.read(key.as_bytes())"]
UNWRAP["unwrap() -> EntryHandle"]
VALIDATE["Parse u64 and assert_eq!"]
end
    
    subgraph "Index Lookup Path"
        XXH3["XXH3 hash of key"]
DASHMAP["DashMap index lookup"]
TAG_CHECK["16-bit tag collision check"]
OFFSET["Extract 48-bit offset"]
MMAP_ACCESS["Memory-mapped access at offset"]
end
    
 
   RNG --> KEY_GEN
 
   KEY_GEN --> LOOKUP
 
   LOOKUP --> UNWRAP
 
   UNWRAP --> VALIDATE
    
 
   LOOKUP --> XXH3
 
   XXH3 --> DASHMAP
 
   DASHMAP --> TAG_CHECK
 
   TAG_CHECK --> OFFSET
 
   OFFSET --> MMAP_ACCESS
 
   MMAP_ACCESS --> UNWRAP

Each iteration generates a random index, constructs the corresponding key, and performs a single read() operation. This exercises the O(1) hash index lookup path including XXH3 hashing, DashMap access, tag-based collision detection, and memory-mapped file access.

Sources: benches/storage_benchmark.rs:124-149

Batch Read Benchmark: benchmark_batch_reads()

The batch read benchmark measures vectorized lookup performance by reading 1,024 keys at a time via batch_read().

graph TB
    subgraph "Batch Read Flow"
        ACCUMULATE["Accumulate 1,024 keys in Vec"]
CONVERT["Convert to Vec<&[u8]>"]
BATCH_CALL["storage.batch_read(&key_refs)"]
RESULTS["Process Vec<Option<EntryHandle>>"]
VERIFY["Verify each entry's payload"]
end
    
    subgraph "Batch Read Implementation"
        PARALLEL["Optional: Rayon parallel iterator"]
MULTI_LOOKUP["Multiple hash lookups"]
COLLECT["Collect into result Vec"]
end
    
    subgraph "Verification Logic"
        PARSE_KEY["Extract index from key suffix"]
PARSE_VALUE["u64::from_le_bytes(handle)"]
ASSERT["assert_eq!(stored, idx)"]
end
    
 
   ACCUMULATE --> CONVERT
 
   CONVERT --> BATCH_CALL
 
   BATCH_CALL --> RESULTS
 
   RESULTS --> VERIFY
    
 
   BATCH_CALL --> PARALLEL
 
   PARALLEL --> MULTI_LOOKUP
 
   MULTI_LOOKUP --> COLLECT
 
   COLLECT --> RESULTS
    
 
   VERIFY --> PARSE_KEY
 
   VERIFY --> PARSE_VALUE
 
   PARSE_KEY --> ASSERT
 
   PARSE_VALUE --> ASSERT

The benchmark accumulates keys in batches of 1,024 and invokes batch_read() which can optionally use the parallel feature to perform lookups concurrently via Rayon. The verification phase includes a fast numeric suffix parser that extracts the index from the key string without heap allocation.

Sources: benches/storage_benchmark.rs:155-181 benches/storage_benchmark.rs:183-202


graph LR
    subgraph "Rate Formatting Logic"
        INPUT["Input: f64 rate"]
SPLIT["Split into whole and fractional parts"]
ROUND["Round fractional to 3 decimals"]
CARRY["Handle 1000 rounding carry"]
SEPARATE["Comma-separate thousands"]
FORMAT["Output: 4,741,483.464"]
end
    
 
   INPUT --> SPLIT
 
   SPLIT --> ROUND
 
   ROUND --> CARRY
 
   CARRY --> SEPARATE
 
   SEPARATE --> FORMAT

Output Formatting

The benchmark produces human-readable output with formatted throughput numbers using the fmt_rate() utility function.

The fmt_rate() function formats rates with comma-separated thousands and exactly three decimal places. It uses the thousands crate’s separate_with_commas() method and handles edge cases where rounding produces 1000 in the fractional part.

Example output:

Wrote 1,000,000 entries of 8 bytes in 0.234s (4,273,504.273 writes/s)
Sequentially read 1,000,000 entries in 0.089s (11,235,955.056 reads/s)
Randomly read 1,000,000 entries in 0.532s (1,879,699.248 reads/s)
Batch-read verified 1,000,000 entries in 0.156s (6,410,256.410 reads/s)

Sources: benches/storage_benchmark.rs:204-233


Contention Benchmark

The contention benchmark (contention_benchmark.rs) measures performance degradation under concurrent write load. Its source is not covered in detail here, but it is referenced in Cargo.toml:61-63 and is designed to complement the concurrency tests in tests/concurrency_tests.rs:1-230.

Expected Contention Scenarios

Based on the concurrency test patterns, the contention benchmark likely measures:

| Scenario | Description | Measured Metric |
| --- | --- | --- |
| Concurrent Writes | Multiple threads writing different keys simultaneously | Throughput degradation under RwLock contention |
| Write Serialization | Effect of RwLock serializing write operations | Comparison vs. theoretical maximum (single-threaded) |
| Index Contention | DashMap update performance under concurrent load | Lock-free read performance maintained |
| Streaming Writes | Concurrent write_stream() calls with slow readers | I/O bottleneck vs. lock contention |

Concurrency Test Patterns

The tests/concurrency_tests.rs:1-230 file demonstrates three concurrency patterns that inform contention benchmarking:

Sources: tests/concurrency_tests.rs:111-161 tests/concurrency_tests.rs:163-229 tests/concurrency_tests.rs:14-109

graph TB
    subgraph "Concurrent Write Test"
        THREADS_WRITE["16 threads × 10 writes each"]
RWLOCK_WRITE["RwLock serializes writes"]
ATOMIC_UPDATE["AtomicU64 tail_offset updated"]
DASHMAP_UPDATE["DashMap index updated"]
end
    
    subgraph "Interleaved Read/Write Test"
        THREAD_A["Thread A: write → notify → read"]
THREAD_B["Thread B: wait → read → write → notify"]
SYNC["Tokio Notify synchronization"]
end
    
    subgraph "Streaming Write Test"
        SLOW_READER["SlowReader with artificial delay"]
CONCURRENT_STREAMS["2 concurrent write_stream()
calls"]
IO_BOUND["Tests I/O vs. lock contention"]
end
    
 
   THREADS_WRITE --> RWLOCK_WRITE
 
   RWLOCK_WRITE --> ATOMIC_UPDATE
 
   ATOMIC_UPDATE --> DASHMAP_UPDATE
    
 
   THREAD_A --> SYNC
 
   THREAD_B --> SYNC
    
 
   SLOW_READER --> CONCURRENT_STREAMS
 
   CONCURRENT_STREAMS --> IO_BOUND

Running Benchmarks

Command-Line Usage

Execute benchmarks using Cargo’s benchmark runner:
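```sh
# Run both benchmark binaries; with `harness = false` they print their own results
cargo bench

# Or run a single suite by name (the [[bench]] targets defined in Cargo.toml)
cargo bench --bench storage_benchmark
cargo bench --bench contention_benchmark
```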

Benchmark Execution Flow

The benchmarks set harness = false in Cargo.toml:59-63, meaning they execute as standalone binaries rather than through Cargo's default test harness. This allows custom output formatting and fine-grained control over benchmark structure.

Sources: Cargo.toml:57-63 benches/storage_benchmark.rs:32-46


Metrics and Analysis

Performance Indicators

The benchmark suite measures the following key performance indicators:

| Metric | Calculation | Typical Range | Optimization Focus |
| --- | --- | --- | --- |
| Write Throughput | NUM_ENTRIES / elapsed_time | 2-10M writes/sec | SIMD copy, batch sizing |
| Sequential Read Throughput | NUM_ENTRIES / elapsed_time | 5-20M reads/sec | Memory mapping, iterator overhead |
| Random Read Throughput | NUM_RANDOM_CHECKS / elapsed_time | 1-5M reads/sec | Hash index lookup, cache efficiency |
| Batch Read Throughput | NUM_BATCH_CHECKS / elapsed_time | 3-12M reads/sec | Parallel lookup, vectorization |

Factors Affecting Performance

Sources: benches/storage_benchmark.rs:16-26 Cargo.toml:49-55


Integration with Development Workflow

Performance Regression Detection

While Criterion.rs is included as a dependency (Cargo.toml:39), the current benchmark implementations use custom timing via std::time::Instant rather than Criterion's statistical framework. This provides:

  • Immediate feedback : Results printed directly to stdout during execution
  • Reproducibility : Fixed workload sizes and patterns for consistent comparison
  • Simplicity : No statistical overhead for quick performance checks

Benchmark Data for Tuning

The benchmark results inform optimization decisions:

| Optimization | Benchmark Validation | Expected Impact |
| --- | --- | --- |
| SIMD copy implementation | Write throughput increase | 2-4x improvement on AVX2 systems |
| 64-byte alignment change | All operations improve | 10-30% from cache-line alignment |
| parallel feature | Batch read throughput | 2-4x on multi-core systems |
| DashMap vs RwLock | Random read throughput | Eliminates read lock contention |

Sources: benches/storage_benchmark.rs:1-234 Cargo.toml:36-55

Extensions and Utilities

Relevant source files

This document covers the utility functions, helper modules, and constants provided by the SIMD R Drive ecosystem. These components include the simd-r-drive-extensions crate for higher-level storage operations, core utility functions in the main simd-r-drive crate, and shared constants from simd-r-drive-entry-handle.

For details on the core storage engine API, see DataStore API. For performance optimization features like SIMD acceleration, see SIMD Acceleration. For alignment-related architecture decisions, see Payload Alignment and Cache Efficiency.

Extensions Crate Overview

The simd-r-drive-extensions crate provides storage extensions and higher-level utilities built on top of the core simd-r-drive storage engine. It adds functionality for common storage patterns and data manipulation tasks.

graph TB
    subgraph "simd-r-drive-extensions"
        ExtCrate["simd-r-drive-extensions"]
ExtDeps["Dependencies:\n- bincode\n- serde\n- simd-r-drive\n- walkdir"]
end
    
    subgraph "Core Dependencies"
        Core["simd-r-drive"]
Bincode["bincode\nBinary Serialization"]
Serde["serde\nSerialization Traits"]
Walkdir["walkdir\nDirectory Traversal"]
end
    
 
   ExtCrate --> ExtDeps
 
   ExtDeps --> Core
 
   ExtDeps --> Bincode
 
   ExtDeps --> Serde
 
   ExtDeps --> Walkdir
    
 
   Core -.->|provides| DataStore["DataStore"]
Bincode -.->|enables| SerializationSupport["Structured Data Storage"]
Walkdir -.->|enables| FileSystemOps["File System Operations"]

Crate Structure

Sources: extensions/Cargo.toml:1-22

| Dependency | Purpose |
| --- | --- |
| bincode | Binary serialization/deserialization for structured data storage |
| serde | Serialization trait support with derive macros |
| simd-r-drive | Core storage engine access |
| walkdir | Directory tree traversal utilities |

Sources: extensions/Cargo.toml:13-17

Core Utilities Module

The main simd-r-drive crate exposes several utility functions through its utils module. These functions handle common tasks like alignment optimization, string formatting, and data validation.

graph TB
    subgraph "utils Module"
        UtilsRoot["src/utils.rs"]
AlignOrCopy["align_or_copy\nZero-Copy Optimization"]
AppendExt["append_extension\nString Path Handling"]
FormatBytes["format_bytes\nHuman-Readable Sizes"]
NamespaceHasher["NamespaceHasher\nHierarchical Keys"]
ParseBuffer["parse_buffer_size\nSize String Parsing"]
VerifyFile["verify_file_existence\nFile Validation"]
end
    
 
   UtilsRoot --> AlignOrCopy
 
   UtilsRoot --> AppendExt
 
   UtilsRoot --> FormatBytes
 
   UtilsRoot --> NamespaceHasher
 
   UtilsRoot --> ParseBuffer
 
   UtilsRoot --> VerifyFile
    
 
   AlignOrCopy -.->|used by| ReadOps["Read Operations"]
NamespaceHasher -.->|used by| KeyManagement["Key Management"]
FormatBytes -.->|used by| Logging["Logging & Reporting"]
ParseBuffer -.->|used by| Config["Configuration Parsing"]

Utility Functions Overview

Sources: src/utils.rs:1-17

align_or_copy Function

The align_or_copy utility function provides zero-copy deserialization with automatic fallback for misaligned data. It attempts to reinterpret a byte slice as a typed slice without copying, and falls back to manual decoding when alignment requirements are not met.

Function Signature

Sources: src/utils/align_or_copy.rs:44-50

Operation Flow

Sources: src/utils/align_or_copy.rs:44-73

Usage Patterns

| Scenario | Outcome | Performance |
| --- | --- | --- |
| Aligned 64-byte boundary, exact multiple | Cow::Borrowed | Zero-copy, optimal |
| Misaligned address | Cow::Owned | Allocation + decode |
| Non-multiple of element size | Panic | Invalid input |

Example Usage:
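A hedged sketch based on the call shape shown in the zero-copy diagram later on this page; the exact import path and generic parameters are assumptions.

```rust
use std::borrow::Cow;
use simd_r_drive::utils::align_or_copy; // assumed path (src/utils/align_or_copy.rs)

// Reinterpret a little-endian byte payload as f32 values:
// Cow::Borrowed when the slice is properly aligned, Cow::Owned otherwise.
fn decode_f32s(bytes: &[u8]) -> Cow<'_, [f32]> {
    align_or_copy::<f32, 4>(bytes, f32::from_le_bytes)
}
```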

Sources: src/utils/align_or_copy.rs:38-43 tests/align_or_copy_tests.rs:7-12

Safety Considerations

The function uses unsafe for the align_to::<T>() call, which requires:

  1. Starting address must be aligned to align_of::<T>()
  2. Total size must be a multiple of size_of::<T>()

These requirements are validated by checking that prefix and suffix slices are empty before returning the borrowed slice. If validation fails, the function falls back to safe manual decoding.

Sources: src/utils/align_or_copy.rs:28-35 src/utils/align_or_copy.rs:53-60

Other Utility Functions

| Function | Module Path | Purpose |
| --- | --- | --- |
| append_extension | src/utils/append_extension.rs | Safely appends file extensions to paths |
| format_bytes | src/utils/format_bytes.rs | Formats byte counts as human-readable strings (KB, MB, GB) |
| NamespaceHasher | src/utils/namespace_hasher.rs | Generates hierarchical, namespaced hash keys |
| parse_buffer_size | src/utils/parse_buffer_size.rs | Parses size strings like "64KB", "1MB" into byte counts |
| verify_file_existence | src/utils/verify_file_existence.rs | Validates file paths before operations |

Sources: src/utils.rs:1-17

Entry Handle Constants

The simd-r-drive-entry-handle crate defines shared constants used throughout the storage system. These constants establish the binary layout of entries and alignment requirements.

graph TB
    subgraph "simd-r-drive-entry-handle"
        LibRoot["lib.rs"]
ConstMod["constants.rs"]
EntryHandle["entry_handle.rs"]
EntryMetadata["entry_metadata.rs"]
DebugAssert["debug_assert_aligned.rs"]
end
    
    subgraph "Exported Constants"
        MetadataSize["METADATA_SIZE = 20"]
KeyHashRange["KEY_HASH_RANGE = 0..8"]
PrevOffsetRange["PREV_OFFSET_RANGE = 8..16"]
ChecksumRange["CHECKSUM_RANGE = 16..20"]
ChecksumLen["CHECKSUM_LEN = 4"]
PayloadLog["PAYLOAD_ALIGN_LOG2 = 6"]
PayloadAlign["PAYLOAD_ALIGNMENT = 64"]
end
    
 
   LibRoot --> ConstMod
 
   LibRoot --> EntryHandle
 
   LibRoot --> EntryMetadata
 
   LibRoot --> DebugAssert
    
 
   ConstMod --> MetadataSize
 
   ConstMod --> KeyHashRange
 
   ConstMod --> PrevOffsetRange
 
   ConstMod --> ChecksumRange
 
   ConstMod --> ChecksumLen
 
   ConstMod --> PayloadLog
 
   ConstMod --> PayloadAlign
    
 
   PayloadAlign -.->|ensures| CacheLineOpt["Cache-Line Optimization"]
PayloadAlign -.->|enables| SIMDOps["SIMD Operations"]

Constants Module Structure

Sources: simd-r-drive-entry-handle/src/lib.rs:1-10 simd-r-drive-entry-handle/src/constants.rs:1-19

Metadata Layout Constants

The following constants define the fixed 20-byte metadata structure at the end of each entry:

| Constant | Value | Description |
| --- | --- | --- |
| METADATA_SIZE | 20 | Total size of entry metadata in bytes |
| KEY_HASH_RANGE | 0..8 | Byte range for 64-bit XXH3 key hash |
| PREV_OFFSET_RANGE | 8..16 | Byte range for 64-bit previous entry offset |
| CHECKSUM_RANGE | 16..20 | Byte range for 32-bit CRC32C checksum |
| CHECKSUM_LEN | 4 | Explicit length of checksum field |

Sources: simd-r-drive-entry-handle/src/constants.rs:3-11

Alignment Constants

These constants enforce 64-byte alignment for all payload data:

  • PAYLOAD_ALIGN_LOG2 : Base-2 logarithm of alignment requirement (6 = 64 bytes)
  • PAYLOAD_ALIGNMENT : Computed alignment value (64 bytes)

This alignment matches CPU cache line sizes and enables efficient SIMD operations. The maximum pre-padding per entry is PAYLOAD_ALIGNMENT - 1 (63 bytes).

Sources: simd-r-drive-entry-handle/src/constants.rs:13-18

Constant Relationships

Sources: simd-r-drive-entry-handle/src/constants.rs:1-19

sequenceDiagram
    participant Client
    participant EntryHandle
    participant align_or_copy
    participant Memory
    
    Client->>EntryHandle: get_payload_bytes()
    EntryHandle->>Memory: read &[u8] from mmap
    EntryHandle->>align_or_copy: align_or_copy<f32, 4>(bytes, f32::from_le_bytes)
    
    alt Aligned on 64-byte boundary
        align_or_copy->>Memory: validate alignment
        align_or_copy-->>Client: Cow::Borrowed(&[f32])
        Note over Client,Memory: Zero-copy: direct memory access
    else Misaligned
        align_or_copy->>align_or_copy: chunks_exact(4)
        align_or_copy->>align_or_copy: map(f32::from_le_bytes)
        align_or_copy->>align_or_copy: collect into Vec<f32>
        align_or_copy-->>Client: Cow::Owned(Vec<f32>)
        Note over Client,align_or_copy: Fallback: allocated copy
    end

Common Patterns

Zero-Copy Data Access

Utilities like align_or_copy enable zero-copy access patterns when memory alignment allows:

Sources: src/utils/align_or_copy.rs:44-73 simd-r-drive-entry-handle/src/constants.rs:13-18

Namespace-Based Key Management

The NamespaceHasher utility enables hierarchical key organization:

Sources: src/utils.rs:11-12

Size Formatting for Logging

The format_bytes utility provides human-readable output:

| Input Bytes | Formatted Output |
| --- | --- |
| 1023 | "1023 B" |
| 1024 | "1.00 KB" |
| 1048576 | "1.00 MB" |
| 1073741824 | "1.00 GB" |

Sources: src/utils.rs:7-8

Configuration Parsing

The parse_buffer_size utility handles size string inputs:

| Input String | Parsed Bytes |
| --- | --- |
| "64" | 64 |
| "64KB" | 65,536 |
| "1MB" | 1,048,576 |
| "2GB" | 2,147,483,648 |

Sources: src/utils.rs:13-14

Integration with Core Systems

Relationship to Storage Engine

Sources: extensions/Cargo.toml:1-22 src/utils.rs:1-17 simd-r-drive-entry-handle/src/lib.rs:1-10

Performance Considerations

| Utility | Performance Impact | Use Case |
| --- | --- | --- |
| align_or_copy | Zero-copy when aligned | Deserializing typed arrays from storage |
| NamespaceHasher | Single XXH3 hash | Generating hierarchical keys |
| format_bytes | String allocation | Logging and user display only |
| PAYLOAD_ALIGNMENT | Enables SIMD ops | Core storage layout requirement |

Sources: src/utils/align_or_copy.rs:1-74 simd-r-drive-entry-handle/src/constants.rs:13-18

Development Guide

Relevant source files

This document provides an overview of development practices for the SIMD R Drive repository. It covers workspace organization, build processes, feature configuration, testing strategies, and CI/CD integration. This guide is intended for contributors and maintainers working on the codebase.

For detailed instructions on building and testing specific features, see Building and Testing. For CI/CD pipeline details, see CI/CD Pipeline. For version history and migration information, see Version History and Migration.


Workspace Organization

The repository uses a Cargo workspace structure with multiple interdependent packages. The workspace is defined in the root Cargo.toml and includes both core components and experimental features.

graph TB
    subgraph "Root Workspace"
        ROOT["Cargo.toml\nworkspace root"]
end
    
    subgraph "Core Packages"
        CORE["simd-r-drive\nmain storage engine"]
HANDLE["simd-r-drive-entry-handle\nentry abstraction"]
EXT["extensions\nutility functions"]
end
    
    subgraph "Experimental Packages"
        WS_SERVER["experiments/simd-r-drive-ws-server\nWebSocket RPC server"]
WS_CLIENT["experiments/simd-r-drive-ws-client\nWebSocket RPC client"]
SERVICE_DEF["experiments/simd-r-drive-muxio-service-definition\nRPC contract"]
end
    
    subgraph "Excluded from Workspace"
        PY_DIRECT["experiments/bindings/python\nPyO3 direct bindings"]
PY_WS["experiments/bindings/python-ws-client\nPython WebSocket client"]
end
    
 
   ROOT --> CORE
 
   ROOT --> HANDLE
 
   ROOT --> EXT
 
   ROOT --> WS_SERVER
 
   ROOT --> WS_CLIENT
 
   ROOT --> SERVICE_DEF
    
 
   CORE --> HANDLE
 
   WS_SERVER --> CORE
 
   WS_SERVER --> SERVICE_DEF
 
   WS_CLIENT --> SERVICE_DEF
    
    PY_DIRECT -.excluded.-> ROOT
    PY_WS -.excluded.-> ROOT

Workspace Structure

Sources: Cargo.toml:65-77

The workspace includes six member packages Cargo.toml:66-73 and excludes two Python binding packages Cargo.toml:74-77 which use their own build systems via maturin.

Package Dependencies

| Package | Purpose | Key Dependencies |
| --- | --- | --- |
| simd-r-drive | Core storage engine | memmap2, dashmap, xxhash-rust, simd-r-drive-entry-handle |
| simd-r-drive-entry-handle | Entry abstraction layer | arrow (optional), memmap2 |
| extensions | Utility functions | simd-r-drive, alignment helpers |
| simd-r-drive-ws-server | WebSocket server | simd-r-drive, muxio-tokio-rpc-server, tokio |
| simd-r-drive-ws-client | WebSocket client | muxio-tokio-rpc-client, tokio |
| simd-r-drive-muxio-service-definition | RPC contract | muxio-rpc-service, bitcode |

Sources: Cargo.toml:23-34 Cargo.toml:80-112


Feature Flags and Configuration

The core simd-r-drive package provides three optional feature flags that enable additional capabilities or internal API access.

graph LR
    subgraph "Available Features"
        DEFAULT["default = []\nbaseline features"]
PARALLEL["parallel\nenables rayon"]
EXPOSE["expose-internal-api\nexposes internals"]
ARROW["arrow\nproxy to entry-handle"]
end
    
    subgraph "Dependencies Enabled"
        RAYON["rayon = '1.10.0'"]
ARROW_DEP["arrow = '57.0.0'"]
end
    
    subgraph "Use Cases"
        UC_DEFAULT["Standard storage operations"]
UC_PARALLEL["Parallel batch operations"]
UC_EXPOSE["Testing/benchmarking access"]
UC_ARROW["Zero-copy Arrow buffers"]
end
    
 
   DEFAULT --> UC_DEFAULT
 
   PARALLEL --> RAYON
 
   RAYON --> UC_PARALLEL
 
   EXPOSE --> UC_EXPOSE
 
   ARROW --> ARROW_DEP
 
   ARROW_DEP --> UC_ARROW

Feature Flag Overview

Sources: Cargo.toml:49-55

Feature Flag Definitions

The feature flags are defined in Cargo.toml:49-55:

| Feature | Purpose | Enables |
| --- | --- | --- |
| default | Baseline configuration | No additional dependencies |
| parallel | Parallel batch operations | rayon dependency for multi-threaded processing |
| expose-internal-api | Internal API access | Exposes internal types for testing/benchmarking |
| arrow | Apache Arrow integration | Proxy flag to simd-r-drive-entry-handle/arrow |

The arrow feature is a proxy feature Cargo.toml:54-55 that forwards to the simd-r-drive-entry-handle package, enabling zero-copy Apache Arrow buffer integration.

Sources: Cargo.toml:49-55


Development Workflow

The typical development workflow involves building, testing, and validating changes across multiple feature combinations before committing.

graph TB
    subgraph "Local Development"
        BUILD["cargo build --workspace"]
TEST["cargo test --workspace"]
BENCH_CHECK["cargo bench --workspace --no-run"]
CHECK["cargo check --workspace"]
end
    
    subgraph "Feature Testing"
        TEST_DEFAULT["cargo test"]
TEST_PARALLEL["cargo test --features parallel"]
TEST_EXPOSE["cargo test --features expose-internal-api"]
TEST_ALL["cargo test --all-features"]
end
    
    subgraph "Code Quality"
        FMT["cargo fmt --all -- --check"]
CLIPPY["cargo clippy --workspace"]
DENY["cargo deny check"]
AUDIT["cargo audit"]
end
    
 
   BUILD --> TEST
 
   TEST --> BENCH_CHECK
    
 
   TEST --> TEST_DEFAULT
 
   TEST --> TEST_PARALLEL
 
   TEST --> TEST_EXPOSE
 
   TEST --> TEST_ALL
    
 
   TEST --> FMT
 
   TEST --> CLIPPY
 
   CLIPPY --> DENY
 
   DENY --> AUDIT

Standard Development Commands

Sources: .github/workflows/rust-tests.yml:54-61

Workspace Commands

All workspace members can be built, tested, and checked simultaneously using workspace flags:
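```sh
# Build, test, and type-check every workspace member in one invocation
cargo build --workspace
cargo test --workspace
cargo check --workspace
```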

Individual packages can be built by navigating to their directory or using the -p flag:
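```sh
# Build or test a single package from the workspace root
cargo build -p simd-r-drive
cargo test -p simd-r-drive-entry-handle
```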

Sources: .github/workflows/rust-tests.yml:54-57


Testing Strategy

The repository employs multiple testing approaches to ensure correctness and performance across different configurations and platforms.

graph TB
    subgraph "Unit Tests"
        UNIT_CORE["Core storage tests\nsrc/ inline #[test]"]
UNIT_HANDLE["EntryHandle tests\nsimd-r-drive-entry-handle"]
UNIT_EXT["Extension utility tests"]
end
    
    subgraph "Integration Tests"
        INT_STORAGE["Storage operations\ntests/ directory"]
INT_CONCURRENCY["Concurrency tests\nserial_test"]
INT_RPC["RPC integration\nexperiments/"]
end
    
    subgraph "Benchmarks"
        BENCH_STORAGE["storage_benchmark\nCriterion harness:false"]
BENCH_CONTENTION["contention_benchmark\nCriterion harness:false"]
end
    
    subgraph "Python Tests"
        PY_UNIT["pytest unit tests"]
PY_README["README example tests"]
PY_INT["Integration tests"]
end
    
 
   UNIT_CORE --> INT_STORAGE
 
   UNIT_HANDLE --> INT_STORAGE
 
   UNIT_EXT --> INT_STORAGE
    
 
   INT_STORAGE --> BENCH_STORAGE
 
   INT_CONCURRENCY --> BENCH_CONTENTION
    
 
   INT_RPC --> PY_UNIT
 
   PY_UNIT --> PY_README
 
   PY_README --> PY_INT

Test Hierarchy

Sources: Cargo.toml:36-63

Test Types

| Test Type | Location | Purpose |
| --- | --- | --- |
| Unit tests | src/ modules with #[test] | Verify individual function correctness |
| Integration tests | tests/ directory | Verify component interactions |
| Benchmarks | benches/ directory | Measure performance characteristics |
| Python tests | experiments/bindings/python*/tests/ | Verify Python bindings |

The benchmark suite uses Criterion.rs with custom harness configuration Cargo.toml:57-63 to disable the default test harness and enable statistical analysis.

Sources: Cargo.toml:36-63


CI/CD Integration

The repository uses GitHub Actions for continuous integration, running tests across multiple operating systems and feature combinations.

CI Matrix Strategy

The test pipeline .github/workflows/rust-tests.yml:1-62 executes a matrix build strategy:

Sources: .github/workflows/rust-tests.yml:10-61

Matrix Configuration

The CI pipeline tests 18 total combinations (3 OS × 6 feature sets) .github/workflows/rust-tests.yml:14-30:

| Feature Set | Flags |
| --- | --- |
| Default | "" (empty) |
| No Default Features | --no-default-features |
| Parallel | --features parallel |
| Expose Internal API | --features expose-internal-api |
| Parallel + Expose API | --features=parallel,expose-internal-api |
| All Features | --all-features |

The fail-fast: false configuration (.github/workflows/rust-tests.yml:15) ensures all matrix jobs complete even if one fails, providing comprehensive test coverage feedback.

Sources: .github/workflows/rust-tests.yml:14-30

Caching Strategy

The CI pipeline caches Cargo dependencies .github/workflows/rust-tests.yml:40-51 to reduce build times:

Sources: .github/workflows/rust-tests.yml:40-51

The cache key includes the OS, Cargo.lock hash, and feature flags, ensuring separate caches for different configurations while enabling reuse across builds with identical dependencies.


Version Management

The workspace uses unified versioning across all packages Cargo.toml:1-6:

| Field | Value |
| --- | --- |
| Version | 0.15.5-alpha |
| Edition | 2024 |
| Repository | https://github.com/jzombie/rust-simd-r-drive |
| License | Apache-2.0 |

All workspace members inherit these values via workspace inheritance Cargo.toml:14-21:

This ensures consistent versioning across the entire project. For detailed version history and migration guides, see Version History and Migration.

Sources: Cargo.toml:1-21


Ignored Files and Directories

The repository excludes certain files and directories from version control .gitignore:1-11:

| Pattern | Purpose |
| --- | --- |
| **/target | Rust build artifacts |
| *.bin | Binary data files (test/debug) |
| /data | Local data directory for debugging |
| out.txt | Output file for experimentation |
| .cargo/config.toml | Local Cargo configuration overrides |

The /data directory (.gitignore:5) is specifically noted for debugging and experimentation purposes, allowing developers to maintain local test data without committing it.

Sources: .gitignore:1-11


Summary

The SIMD R Drive development environment is organized as a Cargo workspace with multiple packages, optional feature flags, comprehensive testing across platforms and configurations, and automated CI/CD validation. The workspace structure separates core functionality from experimental features, while unified versioning ensures consistency across all packages.

Key development practices include:

  • Building and testing all workspace members simultaneously
  • Testing multiple feature flag combinations locally before committing
  • Leveraging CI/CD matrix builds for comprehensive platform coverage
  • Using benchmarks with statistical analysis via Criterion.rs
  • Maintaining separate build systems for Python bindings

For specific build instructions and test execution details, refer to Building and Testing. For CI/CD pipeline configuration details, refer to CI/CD Pipeline. For version history and migration guidance, refer to Version History and Migration.

Building and Testing

Relevant source files

This document covers building the Rust workspace, running tests, and using feature flags for the simd-r-drive storage engine. It provides instructions for building individual crates, executing test suites, and running benchmarks.

For information about CI/CD automation, see CI/CD Pipeline. For building Python bindings specifically, see Building Python Bindings.


Prerequisites

The project requires:

  • Rust toolchain (edition 2024, as specified in Cargo.toml:4)
  • Cargo (workspace resolver version 2)
  • Platform support : Linux, macOS, Windows (all tested in CI)

Optional dependencies for specific features:

  • Rayon (for parallel feature)
  • Apache Arrow libraries (for arrow feature)
  • Tokio runtime (for async tests and network experiments)

Sources: Cargo.toml:1-113


Workspace Structure

The project is organized as a Cargo workspace with multiple member crates:

Workspace Members (built together with --workspace):

  • simd-r-drive - Core storage engine
  • simd-r-drive-entry-handle - Entry data structures
  • extensions - Helper utilities
  • Network experiment crates

Excluded Members (must be built separately):

  • Python binding directories use separate build systems (maturin/PyO3)

Sources: Cargo.toml:65-78


Building the Core Library

Basic Build

Build the core library with default features:
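```sh
# Core crate only, default features
cargo build -p simd-r-drive
```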

Build in release mode for production use:
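```sh
cargo build -p simd-r-drive --release
```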

Building the Entire Workspace

Build all workspace members:
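```sh
cargo build --workspace
```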

Build all targets (lib, bins, tests, benches):
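```sh
cargo build --workspace --all-targets
```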

Sources: Cargo.toml:11-21 .github/workflows/rust-tests.yml:53-54


Feature Flags

The project uses Cargo feature flags to enable optional functionality:

| Feature Flag | Purpose | Dependencies Added |
| --- | --- | --- |
| default | Base functionality (empty) | None |
| parallel | Multi-threaded operations | rayon = "1.10.0" |
| arrow | Apache Arrow columnar data | arrow = "57.0.0" (via entry-handle) |
| expose-internal-api | Expose internal APIs for testing | None |

Building with Features

Build with specific features:
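```sh
# Individual features
cargo build --features parallel
cargo build --features expose-internal-api
cargo build --features arrow

# Combinations
cargo build --features "parallel,expose-internal-api"
cargo build --all-features
```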

Sources: Cargo.toml:49-55 Cargo.toml:30


Running Tests

Test Suite Organization

The project includes multiple test types:

| Test Type | Location | Purpose |
| --- | --- | --- |
| Unit tests | src/**/*.rs (inline) | Test individual functions/modules |
| Integration tests | tests/*.rs | Test public API interactions |
| Concurrency tests | tests/concurrency_tests.rs | Test thread-safety under load |
| Benchmarks | benches/*.rs | Performance measurements |

Basic Test Execution

Run all tests in the workspace:
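```sh
cargo test --workspace
```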

Run tests with verbose output:
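```sh
cargo test --workspace --verbose
```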

Run tests for a specific crate:
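```sh
cargo test -p simd-r-drive
```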

Sources: .github/workflows/rust-tests.yml:56-57

Testing with Different Feature Combinations

The CI matrix tests 6 feature combinations across 3 operating systems:

Run tests matching CI configurations:
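```sh
# Mirror the CI feature matrix locally
cargo test --workspace --all-targets
cargo test --workspace --all-targets --no-default-features
cargo test --workspace --all-targets --features parallel
cargo test --workspace --all-targets --features expose-internal-api
cargo test --workspace --all-targets --features "parallel,expose-internal-api"
cargo test --workspace --all-targets --all-features
```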

Sources: .github/workflows/rust-tests.yml:14-31


Concurrency Tests

The concurrency test suite validates thread-safety under high contention:

Concurrency Test Cases

The test suite includes three primary concurrency tests:

| Test Function | Configuration | Purpose |
| --- | --- | --- |
| concurrent_write_test | 16 threads × 10 writes | Validates concurrent writes don't corrupt data |
| concurrent_slow_streamed_write_test | Multi-threaded streaming | Tests concurrent write_stream with simulated latency |
| interleaved_read_write_test | Synchronized read/write | Validates read-after-write consistency |

Running Concurrency Tests

The tests use #[serial] annotation to prevent parallel execution (since they test shared state):
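```sh
cargo test --test concurrency_tests
```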

Test Requirements:

Sources: tests/concurrency_tests.rs:1-230


Benchmarking

Benchmark Suite

The project includes two benchmark suites:

Benchmark Configuration:

| Benchmark | Harness | Purpose |
| --- | --- | --- |
| storage_benchmark | false (custom) | Micro-benchmarks for core operations |
| contention_benchmark | false (custom) | Multi-threaded contention scenarios |

Running Benchmarks

Compile benchmarks without running:
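```sh
cargo bench --workspace --no-run
```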

Run all benchmarks:
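```sh
cargo bench --workspace
```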

Run specific benchmark:
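```sh
cargo bench --bench storage_benchmark
```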

Storage Benchmark Operations

The storage_benchmark suite measures four operation types:

| Operation | Test Size | Batch Size | Purpose |
| --- | --- | --- | --- |
| Append entries | 1,000,000 entries | 1,024 entries/batch | Write throughput |
| Sequential reads | All entries | N/A (iterator) | Zero-copy iteration |
| Random reads | 1,000,000 lookups | 1 entry/lookup | Random access latency |
| Batch reads | 1,000,000 entries | 1,024 entries/batch | Vectorized read throughput |

Key Constants:

Sources: Cargo.toml:57-63 benches/storage_benchmark.rs:1-234


Building the CLI Binary

The project includes a command-line interface:

Building the CLI

Build the CLI binary:
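```sh
# The CLI is the package's default binary target (src/main.rs)
cargo build --release
```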

Run the CLI directly:
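```sh
cargo run -- --help
```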

The CLI binary will be located at:

  • Debug: target/debug/simd-r-drive (or .exe on Windows)
  • Release: target/release/simd-r-drive

Sources: src/main.rs:1-12 Cargo.toml:25


Cross-Platform Considerations

Platform-Specific Testing

The CI pipeline tests on three operating systems:

| OS | Runner | Notes |
| --- | --- | --- |
| Linux | ubuntu-latest | Primary development platform |
| macOS | macos-latest | Darwin/BSD compatibility |
| Windows | windows-latest | MSVC toolchain |

Known Platform Differences

  1. File Locking : The BufWriter<File> uses platform-specific file locking Cargo.toml:23-24
  2. Memory Mapping : memmap2 handles platform differences in mmap APIs Cargo.toml:29
  3. Path Separators : Tests use tempfile for cross-platform temporary paths Cargo.toml:45

Running Platform-Specific Tests

Test on your local platform:

Sources: .github/workflows/rust-tests.yml:14-18


Development Workflow

Quick Test Commands

Common development commands:

| Command | Purpose |
| --- | --- |
| cargo check | Fast syntax/type checking |
| cargo test | Run all tests |
| cargo test --test concurrency_tests | Run concurrency tests only |
| cargo bench --no-run | Verify benchmarks compile |
| cargo run -- --help | Test CLI binary |

Caching Dependencies

The CI uses cargo caching to speed up builds:

Local development automatically uses Cargo’s built-in caching in these directories.

Sources: .github/workflows/rust-tests.yml:40-51

Test Data Management

Test isolation strategies:

  1. Temporary Files : Use tempfile::tempdir() for isolated test storage tests/concurrency_tests.rs:37
  2. Serial Execution : Use #[serial] for tests sharing state tests/concurrency_tests.rs:15
  3. Cleanup : Temporary files are automatically removed on drop tests/concurrency_tests.rs:37

Sources: tests/concurrency_tests.rs:37-40 .gitignore:1-11


Test Debugging

Enabling Trace Logs

The CLI and tests use tracing for structured logging:

The main CLI initializes tracing at the info level by default (src/main.rs:7).

Test Output Verbosity

Capture test output:
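```sh
cargo test -- --nocapture
```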

Sources: src/main.rs:7 Cargo.toml:32-33


Verifying Build Artifacts

Checking Benchmark Compilation

The CI ensures benchmarks compile even though they’re not run in CI:
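```sh
cargo bench --workspace --no-run
```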

This is important because benchmark compilation uses different code paths (Criterion harness disabled, Cargo.toml:59).

Build All Targets

Verify all code compiles:
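```sh
# --all-targets covers libs, bins, tests, benches, and examples
cargo build --workspace --all-targets
```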

This builds:

  • Library crates (--lib)
  • Binary targets (--bins)
  • Test targets (--tests)
  • Benchmark targets (--benches)
  • Example code (--examples)

Sources: .github/workflows/rust-tests.yml:53-61


CI/CD Pipeline

Relevant source files

Purpose and Scope

This document describes the continuous integration and continuous delivery (CI/CD) infrastructure for the SIMD R Drive project. The CI/CD pipeline is implemented using GitHub Actions workflows that automatically validate code quality, run tests across multiple platforms and feature combinations, and perform security audits on every commit and pull request.

For information about building and testing the project locally, see Building and Testing. For details about version management and breaking changes that may affect CI/CD configuration, see Version History and Migration.

Sources: .github/workflows/rust-tests.yml:1-62 .github/workflows/rust-lint.yml:1-44


Workflow Overview

The CI/CD pipeline consists of two primary GitHub Actions workflows:

| Workflow | File | Primary Purpose | Trigger Events |
| --- | --- | --- | --- |
| Rust Tests | .github/workflows/rust-tests.yml | Multi-platform testing across OS and feature combinations | Push to main, tags (v*), PRs to main |
| Rust Lint | .github/workflows/rust-lint.yml | Code quality, formatting, and security checks | All pushes and pull requests |

Both workflows run in parallel to provide comprehensive validation before code is merged.

Sources: .github/workflows/rust-tests.yml:1-9 .github/workflows/rust-lint.yml:1-3


CI/CD Architecture

CI/CD Architecture Overview

This diagram shows the complete CI/CD pipeline structure. The test workflow creates a matrix of 18 job combinations (3 operating systems × 6 feature configurations), while the lint workflow runs a sequential series of quality checks.

Sources: .github/workflows/rust-tests.yml:10-31 .github/workflows/rust-lint.yml:5-43


Test Workflow (rust-tests.yml)

Test Matrix Configuration

The test workflow uses a GitHub Actions matrix strategy to validate the codebase across multiple dimensions:

Test Matrix Execution Flow

graph LR
    subgraph "Operating Systems"
        OS_U["ubuntu-latest"]
OS_M["macos-latest"]
OS_W["windows-latest"]
end
    
    subgraph "Feature Configurations"
        F1["flags: empty\nDefault"]
F2["flags: --no-default-features\nNo Default Features"]
F3["flags: --features parallel\nParallel"]
F4["flags: --features expose-internal-api\nExpose Internal API"]
F5["flags: --features=parallel,expose-internal-api\nParallel + Expose API"]
F6["flags: --all-features\nAll Features"]
end
    
    subgraph "Test Steps"
        CHECKOUT["actions/checkout@v4"]
RUST_INSTALL["dtolnay/rust-toolchain@stable"]
CACHE["actions/cache@v4\nCargo dependencies"]
BUILD["cargo build --workspace --all-targets"]
TEST["cargo test --workspace --all-targets"]
BENCH["cargo bench --workspace --no-run"]
end
    
 
   OS_U --> F1
 
   OS_U --> F2
 
   OS_U --> F3
 
   OS_U --> F4
 
   OS_U --> F5
 
   OS_U --> F6
    
 
   F1 --> CHECKOUT
 
   CHECKOUT --> RUST_INSTALL
 
   RUST_INSTALL --> CACHE
 
   CACHE --> BUILD
 
   BUILD --> TEST
 
   TEST --> BENCH

Each of the 18 matrix combinations follows the same execution flow: checkout code, install Rust toolchain, restore cached dependencies, build all workspace targets, run all tests, and verify benchmarks compile.

Sources: .github/workflows/rust-tests.yml:14-31 .github/workflows/rust-tests.yml:33-61

Matrix Strategy Details

| Parameter | Values | Purpose |
| --- | --- | --- |
| os | ubuntu-latest, macos-latest, windows-latest | Validate cross-platform compatibility |
| fail-fast | false | Continue running all matrix jobs even if one fails |
| include.name | See feature list below | Descriptive name for each feature combination |
| include.flags | Cargo command-line flags | Feature flags passed to cargo build and cargo test |

Feature Combinations:

  1. Default (flags: ""): Standard feature set with default features enabled
  2. No Default Features (flags: "--no-default-features"): Minimal build without optional features
  3. Parallel (flags: "--features parallel"): Enables parallel processing capabilities
  4. Expose Internal API (flags: "--features expose-internal-api"): Exposes internal APIs for testing/experimentation
  5. Parallel + Expose API (flags: "--features=parallel,expose-internal-api"): Combination of parallel and internal API features
  6. All Features (flags: "--all-features"): Enables all available features including arrow integration

Sources: .github/workflows/rust-tests.yml:18-30

Test Execution Steps

The test workflow executes the following steps for each matrix combination:

1. Repository Checkout

Uses GitHub’s official checkout action to clone the repository.

Sources: .github/workflows/rust-tests.yml:33-34

2. Rust Toolchain Installation

Installs the stable Rust toolchain using the dtolnay/rust-toolchain action.

Sources: .github/workflows/rust-tests.yml:36-37

3. Dependency Caching

Caches Cargo dependencies and build artifacts to speed up subsequent runs. The cache key includes:

  • Operating system (runner.os)
  • Cargo.lock file hash for dependency versioning
  • Matrix flags to separate caches for different feature combinations

Sources: .github/workflows/rust-tests.yml:39-51

4. Build

Builds all workspace members and all target types (lib, bin, tests, benches, examples) with the specified feature flags.

Sources: .github/workflows/rust-tests.yml:53-54

5. Test Execution

Runs all tests across the entire workspace with verbose output enabled.

Sources: .github/workflows/rust-tests.yml:56-57

6. Benchmark Compilation Check

Verifies that all benchmarks compile successfully without actually executing them. The --no-run flag ensures benchmarks are only compiled, not executed, which would be time-consuming in CI.

Sources: .github/workflows/rust-tests.yml:59-61


graph TB
    TRIGGER["Push or Pull Request"]
subgraph "Setup Steps"
        CHECKOUT["actions/checkout@v3"]
RUST_INSTALL["dtolnay/rust-toolchain@stable"]
COMPONENTS["rustup component add\nrustfmt clippy"]
TOOLS["cargo install\ncargo-deny cargo-audit"]
end
    
    subgraph "Quality Checks"
        FMT["cargo fmt --all -- --check\nVerify formatting"]
CLIPPY["cargo clippy --workspace\n--all-targets --all-features\nLint warnings"]
DOC["RUSTDOCFLAGS=-D warnings\ncargo doc --workspace\nDocumentation quality"]
end
    
    subgraph "Security Checks"
        DENY["cargo deny check\nLicense/dependency policy"]
AUDIT["cargo audit\nKnown vulnerabilities"]
end
    
 
   TRIGGER --> CHECKOUT
 
   CHECKOUT --> RUST_INSTALL
 
   RUST_INSTALL --> COMPONENTS
 
   COMPONENTS --> TOOLS
 
   TOOLS --> FMT
 
   FMT --> CLIPPY
 
   CLIPPY --> DOC
 
   DOC --> DENY
 
   DENY --> AUDIT

Lint Workflow (rust-lint.yml)

The lint workflow performs comprehensive code quality and security checks on a single platform (Ubuntu):

Lint Workflow Execution Graph

The lint workflow runs sequentially through setup, quality checks, and security audits. All checks must pass for the workflow to succeed.

Sources: .github/workflows/rust-lint.yml:1-44

Lint Steps Breakdown

1. Component Installation Workaround

This step addresses a GitHub Actions environment issue where rustfmt and clippy may not be automatically available. The workflow explicitly installs these components to ensure consistent behavior.

Sources: .github/workflows/rust-lint.yml:13-18

2. Tool Installation

Installs third-party Cargo subcommands:

  • cargo-deny : Validates dependency licenses, sources, and advisories against policy rules
  • cargo-audit : Checks dependencies against the RustSec Advisory Database for known security vulnerabilities

Sources: .github/workflows/rust-lint.yml:20-23

3. Format Verification

Verifies that all code follows Rust’s standard formatting conventions using rustfmt. The --check flag ensures the command fails if any files need reformatting without modifying them.

Sources: .github/workflows/rust-lint.yml:25-27

4. Clippy Linting

Runs Clippy, Rust’s official linter, with the following configuration:

  • --workspace: Lint all workspace members
  • --all-targets: Lint library, binaries, tests, benchmarks, and examples
  • --all-features: Enable all features when linting
  • -D warnings: Treat all warnings as errors, failing the build if any issues are found

Sources: .github/workflows/rust-lint.yml:29-31

5. Documentation Verification

Generates and validates documentation with strict checks:

  • RUSTDOCFLAGS="-D warnings": Treats documentation warnings as errors
  • --workspace: Document all workspace members
  • --no-deps: Only document workspace crates, not dependencies
  • --document-private-items: Include documentation for private items to ensure comprehensive coverage

Sources: .github/workflows/rust-lint.yml:33-35

6. Dependency Policy Enforcement

Validates dependencies against policy rules defined in a deny.toml configuration file (if present). This checks:

  • License compatibility
  • Banned/allowed crates
  • Advisory database for security issues
  • Source verification (crates.io, git repositories)

Sources: .github/workflows/rust-lint.yml:37-39

7. Security Audit

Scans Cargo.lock against the RustSec Advisory Database to identify dependencies with known security vulnerabilities. This provides early warning of security issues in the dependency tree.

Sources: .github/workflows/rust-lint.yml:41-43


Caching Strategy

The test workflow implements an intelligent caching strategy to reduce build times:

Cache Key Structure

${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}-${{ matrix.flags }}

Components:

  1. runner.os : Operating system (Linux, macOS, Windows)
  2. hashFiles('**/Cargo.lock') : Hash of all Cargo.lock files in the repository
  3. matrix.flags : Feature flag combination being tested

This multi-dimensional key ensures:

  • Different operating systems maintain separate caches
  • Cache invalidation occurs when dependencies change
  • Different feature combinations don’t share potentially incompatible build artifacts

Cached Directories

| Directory | Contents | Purpose |
| --- | --- | --- |
| ~/.cargo/bin/ | Installed Cargo binaries | Reuse installed tools across runs |
| ~/.cargo/registry/index/ | Crates.io registry index | Avoid re-downloading registry metadata |
| ~/.cargo/registry/cache/ | Downloaded crate archives | Skip re-downloading crate source code |
| ~/.cargo/git/db/ | Git dependencies | Reuse git repository clones |
| target/ | Compiled artifacts | Skip recompiling unchanged dependencies |

Cache Restore Fallback

If an exact cache match is not found, the workflow attempts to restore a cache with a partial key match (same OS and Cargo.lock hash but different flags). This provides some benefit even when testing different feature combinations.

Sources: .github/workflows/rust-tests.yml:39-51


Workflow Triggers and Conditions

Test Workflow Triggers

The test workflow (rust-tests.yml) activates on:

| Event Type | Condition | Purpose |
| --- | --- | --- |
| Push | Branch: main | Validate main branch commits |
| Push | Tag: v* | Validate release tag creation |
| Pull Request | Target: main | Pre-merge validation |

This configuration ensures:

  • All changes to the main branch are tested
  • Release tags trigger comprehensive validation
  • Pull requests are validated before merge

Sources: .github/workflows/rust-tests.yml:3-8

Lint Workflow Triggers

The lint workflow (rust-lint.yml) activates on:

| Event Type | Condition | Purpose |
| --- | --- | --- |
| Push | All branches | Immediate feedback on all commits |
| Pull Request | All pull requests | Pre-merge code quality validation |

This broader trigger ensures code quality checks run on all development branches, not just main.

Sources: .github/workflows/rust-lint.yml:3


Integration with Repository Configuration

Ignored Files and Directories

The CI/CD workflows respect the repository’s .gitignore configuration:

Key exclusions:

  • **/target : Build artifacts (handled by caching)
  • *.bin : Binary data files created by storage engine tests
  • /data : Debugging and experimentation directory
  • .cargo/config.toml : Local Cargo configuration overrides

Sources: .gitignore:1-11

Alignment Changes and CI Impact

The CI/CD pipeline automatically validates alignment-sensitive code across all platforms. Version 0.15.0 introduced a breaking change increasing PAYLOAD_ALIGNMENT from 16 to 64 bytes, which the CI validates through:

  1. Debug assertions in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:

    • debug_assert_aligned(): Validates pointer alignment
    • debug_assert_aligned_offset(): Validates file offset alignment
  2. Cross-platform testing ensures alignment works correctly on:

    • x86_64 (AVX2 256-bit)
    • ARM (NEON 128-bit)
    • Both 32-bit and 64-bit architectures

The debug assertions compile to no-ops in release builds but provide comprehensive validation in CI test runs:
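A minimal sketch of how such a check might be exercised in a test (illustrative only; the import path is an assumption, since this page only states that the helpers live in simd-r-drive-entry-handle/src/debug_assert_aligned.rs):

// Illustrative test only; the import path below is assumed from the crate name.
#[cfg(test)]
mod alignment_checks {
    use simd_r_drive_entry_handle::debug_assert_aligned_offset;

    #[test]
    fn sample_offsets_are_64_byte_aligned() {
        // In debug/test builds the helper panics on a misaligned offset;
        // in release builds it compiles to a no-op.
        for off in [0u64, 64, 128, 4096] {
            debug_assert_aligned_offset(off);
        }
    }
}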

Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43 CHANGELOG.md:25-51


Failure Modes and Debugging

Matrix Job Independence

The test workflow sets fail-fast: false, which means:

  • If one OS/feature combination fails, others continue to completion
  • Developers can see all failure patterns at once
  • Useful for identifying platform-specific or feature-specific issues

Sources: .github/workflows/rust-tests.yml:15

Common Failure Scenarios

| Check | Failure Cause | Resolution |
|---|---|---|
| cargo fmt | Code not formatted | Run cargo fmt --all locally |
| cargo clippy | Linting violations | Address warnings or allow with #[allow(clippy::...)] |
| cargo doc | Documentation errors | Fix broken doc comments or missing documentation |
| cargo deny | Dependency policy violation | Update dependencies or adjust policy |
| cargo audit | Known vulnerability | Update affected dependency or acknowledge advisory |
| Test matrix job | Platform/feature-specific bug | Debug locally with same OS and feature flags |
| Benchmark compilation | Benchmark code error | Fix benchmark code or feature gate |

Debugging Failed Matrix Jobs

To reproduce a specific matrix job failure locally:

  1. Identify the failing OS and feature combination from the GitHub Actions log

  2. Use the exact command shown in the workflow:

  3. For cross-platform issues, use Docker or a VM matching the CI environment

Sources: .github/workflows/rust-tests.yml:53-57


CI/CD Pipeline Maintenance

Adding New Feature Flags

To add a new feature flag to the test matrix:

  1. Add a new entry to the matrix.include section in rust-tests.yml:

  2. Consider whether the feature should be included in the “All Features” test

  3. Update documentation if the feature has platform-specific behavior

Sources: .github/workflows/rust-tests.yml:18-30

Updating Toolchain Versions

Both workflows use GitHub Actions to manage Rust toolchain versions:

  • Stable toolchain : dtolnay/rust-toolchain@stable automatically tracks the latest stable release
  • Pinning a specific version : Replace @stable with @1.XX.X if needed
  • Nightly features : Change @stable to @nightly (may require additional stability considerations)

Sources: .github/workflows/rust-tests.yml:36-37 .github/workflows/rust-lint.yml:11

Monitoring CI Performance

Key metrics to monitor:

  • Cache hit rate : Check if Cargo caches are being restored effectively
  • Build time trends : Monitor for increases that might indicate dependency bloat
  • Test execution time : Identify slow tests that could benefit from optimization
  • Matrix job duration : Ensure no single OS/feature combination becomes a bottleneck

GitHub Actions provides timing information for each step and job in the workflow run logs.


Version History and Migration


Relevant source files

This page documents the version history of simd-r-drive, tracking breaking changes and on-disk format evolution, and provides actionable migration guides for upgrading between versions. For information about building and testing procedures, see Building and Testing. For CI/CD pipeline details, see CI/CD Pipeline.

Versioning Strategy

The project follows a modified Semantic Versioning approach while in alpha status:

| Version Component | Meaning |
|---|---|
| 0.MINOR.PATCH-alpha | Current alpha phase format |
| MINOR increment | Breaking changes, including on-disk format changes |
| PATCH increment | Non-breaking changes, bug fixes, dependency updates |
| -alpha suffix | Indicates pre-1.0 status with possible instability |

Breaking Change Policy : Any change to the on-disk storage format or core API that prevents backward compatibility results in a MINOR version bump. The project maintains a strict policy of documenting all breaking changes in the changelog with migration instructions.

Sources : CHANGELOG.md:1-5

Version Timeline

The following diagram shows the evolution of major versions and their breaking changes:

Sources : CHANGELOG.md:19-82

timeline
    title "simd-r-drive Version History"
    section 0.14.x Series
        0.14.0-alpha : Introduced payload alignment
                     : Added pre-padding mechanism
                     : 16-byte default alignment
                     : BREAKING - Format incompatible with 0.13.x
    section 0.15.x Series
        0.15.0-alpha : Increased alignment to 64 bytes
                     : Added debug alignment assertions
                     : BREAKING - Format incompatible with 0.14.x
        0.15.5-alpha : Arrow dependency bump to 57.0.0
                     : No format changes

Breaking Changes History

Version 0.15.5-alpha (2025-10-27)

Type : Non-breaking maintenance release

Changes :

  • Apache Arrow dependency updated to version 57.0.0
  • Affects arrow feature flag only
  • No changes to DataStore, EntryHandle, or storage format
  • No migration required

Sources : CHANGELOG.md:19-22


Version 0.15.0-alpha (2025-09-25)

Type : BREAKING - On-disk format incompatible with 0.14.x

Critical Changes :

The PAYLOAD_ALIGNMENT constant in src/storage_engine/constants.rs increased from 16 bytes (log₂ = 4) to 64 bytes (log₂ = 6). This ensures safe zero-copy access for:

  • SSE: 16-byte operations
  • AVX2: 32-byte operations
  • AVX-512: 64-byte operations
  • CPU cache lines: 64 bytes on modern x86_64/ARM

Diagram: On-Disk Format Comparison: 0.14.x vs 0.15.x

Added Features :

  • Debug alignment assertion helpers (debug_assert_aligned(), debug_assert_aligned_offset()) in simd-r-drive-entry-handle, compiled to no-ops in release builds

Affected Code Entities :

| Entity | Location | Change |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | src/storage_engine/constants.rs | 4 → 6 |
| PAYLOAD_ALIGNMENT | src/storage_engine/constants.rs | 16 → 64 |
| Write path pre-padding calculation | Storage engine | Uses new alignment value |
| Read path offset calculation | Storage engine | Expects new alignment |
| EntryMetadata parsing | Storage engine | Unchanged (20 bytes) |

Technical Incompatibility Details :

When a 0.14.x reader opens a 0.15.x file:

  1. Incorrect offset calculation : the reader computes payload_start = metadata_end + (16 - metadata_end % 16) % 16
  2. Actual layout : the 0.15.x writer placed the payload at payload_start = metadata_end + (64 - metadata_end % 64) % 64
  3. Result : the reader may consume pre-pad bytes or miss the payload start (a worked example follows this list), causing:
    • CRC32 checksum failures
    • Deserialization errors
    • Silent data corruption if payload happens to parse
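To make the mismatch concrete, here is a small worked example (not taken from the codebase) that applies both padding formulas to the same metadata_end value:

// Worked example: the same metadata_end yields different payload offsets
// under 16- vs 64-byte alignment.
fn payload_start(metadata_end: u64, alignment: u64) -> u64 {
    let padding = (alignment - (metadata_end % alignment)) % alignment;
    metadata_end + padding
}

fn main() {
    let metadata_end = 100u64; // e.g. offset just past an EntryMetadata block
    let old = payload_start(metadata_end, 16); // where a 0.14.x reader looks
    let new = payload_start(metadata_end, 64); // where the 0.15.x writer put it
    assert_eq!(old, 112);
    assert_eq!(new, 128);
    // The 0.14.x reader starts 16 bytes too early, consuming pre-pad bytes
    // as payload and failing CRC32 validation.
}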

Diagram: Read Operation Failure in Mixed Versions

Sources : CHANGELOG.md:25-52 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88


Version 0.14.0-alpha (2025-09-08)

Type : BREAKING - On-disk format incompatible with 0.13.x

Critical Changes :

First version to introduce configurable payload alignment via a pre-padding mechanism. Payloads are now guaranteed to start at offsets that are multiples of PAYLOAD_ALIGNMENT.

Diagram: Introduction of Pre-Padding in 0.14.0-alpha

New Constants Introduced :

| Constant | Value (0.14.0) | Type | Description |
|---|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 4 | u32 | Log₂ of alignment (2⁴ = 16) |
| PAYLOAD_ALIGNMENT | 16 | u64 | Derived as 1 << PAYLOAD_ALIGN_LOG2 |
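Expressed as Rust constants, the 0.14.0 values look like this (a sketch based on the table above; the exact layout of src/storage_engine/constants.rs may differ):

// Sketch only; names, types, and values are taken from the table above.
pub const PAYLOAD_ALIGN_LOG2: u32 = 4;                      // 2^4 = 16 in 0.14.x
pub const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2; // 16 bytes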

New Feature Flags :

  • arrow: Enables Apache Arrow integration via EntryHandle::as_arrow_buffer() and into_arrow_buffer()

Added Methods (requires the arrow feature; a usage sketch follows this list):

  • EntryHandle::as_arrow_buffer(&self) -> arrow_buffer::Buffer
    • Creates zero-copy Arrow buffer view over payload
    • Requires payload alignment to satisfy Arrow’s requirements
  • EntryHandle::into_arrow_buffer(self) -> arrow_buffer::Buffer
    • Converts EntryHandle into Arrow buffer (consumes handle)
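A minimal usage sketch, assuming the arrow feature is enabled and that EntryHandle is importable from the simd-r-drive-entry-handle crate root. Only the two methods above are taken from the changelog; obtaining the EntryHandle from a read is omitted because the read API is not shown on this page:

// Sketch only. The import paths and feature name are assumptions; the two
// method calls are the ones documented above.
#[cfg(feature = "arrow")]
fn payload_as_arrow(entry: simd_r_drive_entry_handle::EntryHandle) {
    // Zero-copy view over the mmap-backed payload; `entry` remains usable.
    let view: arrow_buffer::Buffer = entry.as_arrow_buffer();

    // Consuming conversion: `entry` is moved into the returned buffer.
    let owned: arrow_buffer::Buffer = entry.into_arrow_buffer();
    assert_eq!(owned.len(), view.len());
}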

Pre-Padding Calculation Algorithm :

Given:
  - metadata_end: u64  // offset after EntryMetadata
  - PAYLOAD_ALIGNMENT: u64 = 16

Calculate:
  padding = (PAYLOAD_ALIGNMENT - (metadata_end % PAYLOAD_ALIGNMENT)) % PAYLOAD_ALIGNMENT
  payload_start = metadata_end + padding
  
Assert:
  payload_start % PAYLOAD_ALIGNMENT == 0

Technical Incompatibility Details :

0.13.x readers do not skip pre-padding bytes. When reading 0.14.x files:

  1. Reader expects payload immediately after metadata
  2. Reads pre-pad zero bytes as payload start
  3. Payload deserialization fails or produces corrupt data
  4. CRC32 checksum computed over wrong byte range

Sources : CHANGELOG.md:55-82

Migration Procedures

Migrating from 0.14.x to 0.15.x

Diagram: Migration Process: 0.14.x to 0.15.x

Detailed Migration Steps :

Step 1: Environment Setup

  • Compile 0.14.x binary: git checkout v0.14.0-alpha && cargo build --release
  • Compile 0.15.x binary: git checkout v0.15.0-alpha && cargo build --release
  • Verify disk space: new file size ≈ old file size × 1.1 (due to increased padding)
  • Backup: cp data.store data.store.backup

Step 2: Extract Data with 0.14.x Binary

Using the DataStore API (a consolidated sketch of Steps 2-4 follows Step 4):

Step 3: Create New Store with 0.15.x Binary

Using DataStore::create() (see the consolidated sketch after Step 4):

Step 4: Verification Pass

Check all entries are readable and valid:
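The loop structure of Steps 2-4, written store-agnostically: the closures stand in for the actual DataStore write and read calls, which this page does not reproduce, so this is a sketch rather than the real API:

// Store-agnostic sketch of Steps 2-4. `extracted` holds (key, payload)
// pairs read out with the 0.14.x binary (Step 2); the closures wrap
// whatever DataStore write/read calls the real migration tool uses.
fn migrate_entries<E>(
    extracted: Vec<(Vec<u8>, Vec<u8>)>,
    mut write_new: impl FnMut(&[u8], &[u8]) -> Result<(), E>,
    mut read_new: impl FnMut(&[u8]) -> Result<Vec<u8>, E>,
) -> Result<(), E> {
    // Step 3: re-insert every entry into the 0.15.x store (64-byte aligned).
    for (key, payload) in &extracted {
        write_new(key.as_slice(), payload.as_slice())?;
    }
    // Step 4: verification pass; every payload must round-trip byte-for-byte.
    for (key, payload) in &extracted {
        assert_eq!(&read_new(key.as_slice())?, payload);
    }
    Ok(())
}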

Step 5: Deployment Strategy

For single-process deployments:

  1. Stop process
  2. Run migration script
  3. Atomic file replacement: mv new.store data.store
  4. Restart process with 0.15.x binary

For multi-service deployments:

  1. Deploy all reader services to 0.15.x first (backward compatible reads)
  2. Stop all writer services
  3. Run migration on primary data store
  4. Verify migrated store
  5. Deploy writer services to 0.15.x
  6. Resume writes

Sources : CHANGELOG.md:43-51


Migrating from 0.13.x to 0.14.x

Step-by-Step Migration :

Technical Changes :

  • 0.13.x: No pre-padding, payload immediately follows EntryMetadata
  • 0.14.x: Pre-padding inserted, PAYLOAD_ALIGNMENT = 16 guaranteed

Migration Process (mirrors the 0.14.x → 0.15.x procedure above, using the 0.13.x binary for extraction and the 0.14.x binary for re-insertion and verification):

  1. Data extraction (using 0.13.x binary):

  2. Store creation (using 0.14.x binary):

  3. Verification (using 0.14.x binary):

    • Read each key and validate CRC32
    • Compare payloads with original data

Service Deployment Order :

  • Stage 1 : Deploy readers with 0.14.x (can read old format)
  • Stage 2 : Migrate data store to new format
  • Stage 3 : Deploy writers with 0.14.x
  • Rationale : Prevents writers from creating 0.14.x files before readers can handle them

Sources : CHANGELOG.md:75-81

Compatibility Matrix

The following table shows version compatibility for readers and writers:

| Writer Version | Reader 0.13.x | Reader 0.14.x | Reader 0.15.x |
|---|---|---|---|
| 0.13.x | ✅ Compatible | ✅ Compatible | ✅ Compatible |
| 0.14.x | ❌ Breaks | ✅ Compatible | ✅ Compatible |
| 0.15.x | ❌ Breaks | ❌ Breaks | ✅ Compatible |

Legend :

  • ✅ Compatible: Reader can correctly parse writer’s format
  • ❌ Breaks: Reader cannot correctly parse writer’s format (data corruption risk)

Key Observations :

  • Newer readers are backward-compatible (can read older formats)
  • Older readers cannot read newer formats (forward compatibility not guaranteed)
  • Each MINOR version bump introduces a new on-disk format

Sources : CHANGELOG.md:25-82

File Format Version Detection

The storage engine does not embed a file format version marker in the data file. Version compatibility must be managed externally through:

  1. Deployment tracking : Maintain records of which binary version wrote each store
  2. File naming conventions : Include version in filename (e.g., data-v0.15.store)
  3. Metadata sidecars : Store version information in separate metadata files (a minimal sketch follows this list)
  4. Service configuration : Configure services with expected format version
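As an illustration of option 3, a version sidecar can be as simple as a plain-text file written next to the store; the sidecar name and format below are illustrative, not a project convention:

// Minimal sketch of a metadata sidecar recording which binary version
// wrote a store. Sidecar naming here is illustrative only.
use std::{fs, io, path::Path};

fn write_format_sidecar(store_path: &Path, version: &str) -> io::Result<()> {
    fs::write(store_path.with_extension("format-version"), version)
}

fn read_format_sidecar(store_path: &Path) -> io::Result<String> {
    fs::read_to_string(store_path.with_extension("format-version"))
}

// Usage: write_format_sidecar(Path::new("data.store"), "0.15.0-alpha")?;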

Limitations :

  • No automatic format detection at runtime
  • Mixed-version deployment requires careful orchestration
  • Checksum validation alone cannot detect version mismatches

Sources : CHANGELOG.md:1-82

Upgrade Strategies

Strategy 1: Blue-Green Deployment

Advantages :

  • Clean separation of old and new versions
  • Easy rollback if issues discovered
  • No mixed-version complexity

Disadvantages :

  • Requires duplicate infrastructure during migration
  • Data must be fully copied
  • Higher resource cost

Strategy 2: Rolling Upgrade (Reader-First)

Advantages :

  • Minimal infrastructure duplication
  • Gradual rollout reduces risk
  • Reader compatibility maintained throughout

Disadvantages :

  • Requires maintenance window for data migration
  • More complex orchestration
  • Must coordinate across multiple services

Sources : CHANGELOG.md:43-51 CHANGELOG.md:75-81

Alignment Configuration Reference

Diagram: Code Entity Mapping: Alignment System

Alignment Constants :

| Constant | Location | Type | Version History | Description |
|---|---|---|---|---|
| PAYLOAD_ALIGN_LOG2 | src/storage_engine/constants.rs | u32 | 0.14: 4, 0.15: 6 | Log₂ of alignment (2^n) |
| PAYLOAD_ALIGNMENT | src/storage_engine/constants.rs | u64 | 0.14: 16, 0.15: 64 | Computed as 1 << PAYLOAD_ALIGN_LOG2 |

Debug Assertion Functions :

| Function | Location | Signature | Behavior |
|---|---|---|---|
| debug_assert_aligned() | simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43 | fn(ptr: *const u8, align: usize) | Asserts (ptr as usize) & (align - 1) == 0 in debug/test; no-op in release |
| debug_assert_aligned_offset() | simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88 | fn(off: u64) | Asserts off.is_multiple_of(PAYLOAD_ALIGNMENT) in debug/test; no-op in release |

Function Implementation Details :

Both assertion functions use conditional compilation to ensure zero runtime cost in release builds:
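A sketch of that pattern (the assumed shape, not the actual source; the real implementation lives in simd-r-drive-entry-handle/src/debug_assert_aligned.rs and may differ):

// Sketch of the conditional-compilation pattern for the offset assertion.
const PAYLOAD_ALIGNMENT: u64 = 64; // 0.15.x value, defined in constants.rs in the real code

#[inline]
pub fn debug_assert_aligned_offset(off: u64) {
    // Compiled in only for debug and test builds; release builds see an
    // empty function body, so the check has zero runtime cost.
    #[cfg(any(debug_assertions, test))]
    assert!(
        off % PAYLOAD_ALIGNMENT == 0,
        "offset {off} is not {PAYLOAD_ALIGNMENT}-byte aligned"
    );

    #[cfg(not(any(debug_assertions, test)))]
    let _ = off;
}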

Pre-Padding Calculation (used in write path):

Given:
  metadata_end: u64
  PAYLOAD_ALIGNMENT: u64 (16 or 64)

Compute:
  padding = (PAYLOAD_ALIGNMENT - (metadata_end % PAYLOAD_ALIGNMENT)) % PAYLOAD_ALIGNMENT
  payload_start = metadata_end + padding

Invariant:
  payload_start % PAYLOAD_ALIGNMENT == 0

Version-Specific Alignment Values :

| Version | PAYLOAD_ALIGN_LOG2 | PAYLOAD_ALIGNMENT | Max Pre-Pad | Rationale |
|---|---|---|---|---|
| 0.13.x | N/A | N/A | 0 | No alignment guarantees |
| 0.14.x | 4 | 16 | 15 bytes | SSE compatibility (128-bit) |
| 0.15.x | 6 | 64 | 63 bytes | AVX-512 + cache-line optimization |

Sources : CHANGELOG.md:25-42 CHANGELOG.md:55-74 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88

CI/CD Integration for Version Management

Diagram: CI/CD Pipeline Steps for Version Validation

Linting Steps (from .github/workflows/rust-lint.yml:1-44):

| Step | Command | Purpose | Version Impact |
|---|---|---|---|
| Format check | cargo fmt --all -- --check | Enforces formatting consistency | Prevents style regressions |
| Clippy | cargo clippy --workspace --all-targets --all-features -- -D warnings | Static analysis for bugs and anti-patterns | Catches breaking API changes |
| Documentation | RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps --document-private-items | Ensures all public APIs documented | Documents version-specific changes |
| Dependency check | cargo deny check | Validates licenses and bans | Prevents supply chain issues |
| Security audit | cargo audit | Scans for CVEs in dependencies | Ensures security compliance |

Pre-Release Checklist :

Before incrementing version number in Cargo.toml:

  1. Run full lint suite: cargo fmt && cargo clippy --all-features
  2. Test all feature combinations (see CI/CD Pipeline)
  3. Update CHANGELOG.md:1-82 with changes:
    • List breaking changes under ### Breaking
    • Provide migration steps under ### Migration
    • Document affected code entities
  4. Verify backward compatibility claims
  5. Test migration procedure on sample data store

Testing Matrix Coverage : See CI/CD Pipeline for full matrix of OS and feature combinations tested.

Sources : .github/workflows/rust-lint.yml:1-44 CHANGELOG.md:1-16

Future Considerations

Toward 1.0 Release :

  • Embed format version marker in file header
  • Implement automatic format detection
  • Support multi-version reader capability
  • Define stable API surface with backward compatibility guarantees

Deprecation Policy (Post-1.0):

  • Major version bumps for breaking changes
  • Deprecated features maintained for one major version
  • Clear migration paths documented before removals

Sources : CHANGELOG.md:1-5
