This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Overview
Purpose and Scope
This document provides a high-level introduction to the SIMD R Drive codebase, explaining its purpose as a high-performance, append-only storage engine and outlining its major architectural components. For detailed information about specific subsystems, see the corresponding sections: Core Storage Engine, Network Layer and RPC, Python Integration, Performance Optimizations, and Extensions and Utilities.
Sources: README.md:1-40 Cargo.toml:1-21
What is SIMD R Drive?
SIMD R Drive is a high-performance, thread-safe, append-only storage engine designed for zero-copy binary access. It stores arbitrary binary data in a single-file storage container where all payloads are written at fixed 64-byte aligned boundaries, optimizing for SIMD operations and cache-line efficiency.
The system is schema-less: it treats all stored data as raw bytes (&[u8]) without enforcing serialization formats or endianness. This design provides maximum flexibility for applications requiring high-speed storage and retrieval of structured or unstructured binary data.
Key characteristics:
| Feature | Description |
|---|---|
| Storage Model | Single-file, append-only, key-value store |
| Access Pattern | Zero-copy reads via memory-mapped files (memmap2) |
| Alignment | 64-byte payload boundaries (configurable) |
| Indexing | O(1) hash-based lookups using xxh3_64 with SIMD acceleration |
| Concurrency | Thread-safe reads/writes using RwLock, AtomicU64, DashMap |
| Language Support | Rust (native), Python (PyO3 bindings), WebSocket RPC (experimental) |
Sources: README.md:5-87 Cargo.toml:12-21
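The following minimal sketch shows the basic open, write, and read cycle using the DataStore, DataStoreWriter, and DataStoreReader names documented later in this wiki; the import paths, trait-import requirement, and std::io::Error error type are assumptions rather than confirmed API details:

```rust
// Sketch only: import paths and error type are assumed, not taken from the crate docs.
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter};
use std::path::Path;

fn main() -> std::io::Result<()> {
    // Open (or create) the single-file storage container.
    let store = DataStore::open(Path::new("example.bin"))?;

    // Append a key-value pair; both keys and payloads are raw bytes.
    store.write(b"user:1234", b"arbitrary binary payload")?;

    // Zero-copy read: the EntryHandle borrows directly from the memory-mapped file.
    if let Some(entry) = store.read(b"user:1234")? {
        println!("payload length: {}", entry.as_slice().len());
    }
    Ok(())
}
```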
High-Level System Architecture
The following diagram shows the complete system architecture, mapping high-level concepts to concrete code entities:
Diagram: System Architecture - Mapping Concepts to Code Entities
graph TB
subgraph "User Interfaces"
CLI["CLI Application\n(main.rs)"]
PY_BIND["Python Direct Bindings\n(simd-r-drive-py)"]
PY_WS["Python WebSocket Client\n(simd-r-drive-ws-client-py)"]
end
subgraph "Core Storage Engine (simd-r-drive)"
DS["DataStore\n(data_store/mod.rs)"]
READER["DataStoreReader trait"]
WRITER["DataStoreWriter trait"]
INDEX["KeyIndexer\n(key_indexer.rs)"]
end
subgraph "Entry Abstraction (simd-r-drive-entry-handle)"
EH["EntryHandle\n(entry_handle.rs)"]
META["EntryMetadata\n(entry_metadata.rs)"]
end
subgraph "Network Layer (Experimental)"
WS_SERVER["simd-r-drive-ws-server\n(WebSocket Server)"]
WS_CLIENT["simd-r-drive-ws-client\n(Native Rust Client)"]
SERVICE_DEF["simd-r-drive-muxio-service-definition\n(RPC Contract)"]
end
subgraph "Storage Backend"
MMAP["Memory-Mapped File\n(Arc<Mmap>)"]
FILE["Single Binary File\n(.bin)"]
end
subgraph "Performance Layer"
SIMD["SIMD Operations\n(simd_copy)"]
XXH3["xxh3_64 Hashing\n(KeyIndexer)"]
end
CLI --> DS
PY_BIND --> DS
PY_WS --> WS_CLIENT
WS_CLIENT --> SERVICE_DEF
SERVICE_DEF --> WS_SERVER
WS_SERVER --> DS
DS --> READER
DS --> WRITER
DS --> INDEX
DS --> EH
DS --> MMAP
EH --> META
EH --> MMAP
MMAP --> FILE
DS --> SIMD
INDEX --> XXH3
style DS fill:#f9f9f9,stroke:#333,stroke-width:2px
style EH fill:#f9f9f9,stroke:#333,stroke-width:2px
This diagram illustrates how user-facing interfaces connect to the core storage engine and supporting subsystems, using actual code entity names.
Sources: Cargo.toml:66-77 README.md:11-40
Repository Structure
The codebase is organized as a Cargo workspace with the following packages:
| Package | Path | Purpose |
|---|---|---|
| simd-r-drive | ./ | Core storage engine library and CLI |
| simd-r-drive-entry-handle | ./simd-r-drive-entry-handle | Entry abstraction layer for zero-copy access |
| simd-r-drive-extensions | ./extensions | Utility functions and helper APIs |
| simd-r-drive-ws-server | ./experiments/simd-r-drive-ws-server | WebSocket RPC server (experimental) |
| simd-r-drive-ws-client | ./experiments/simd-r-drive-ws-client | WebSocket RPC client (experimental) |
| simd-r-drive-muxio-service-definition | ./experiments/simd-r-drive-muxio-service-definition | Shared RPC service contract |
| Python bindings | ./experiments/bindings/python | PyO3-based Python direct bindings |
| Python WS client | ./experiments/bindings/python-ws-client | Python WebSocket client bindings |
For detailed information about the repository layout and package relationships, see Repository Structure.
Sources: Cargo.toml:66-77 README.md:259-265
graph TB
subgraph "Public API"
DST["DataStore struct\n(data_store/mod.rs)"]
READER_TRAIT["DataStoreReader trait\n(traits.rs)"]
WRITER_TRAIT["DataStoreWriter trait\n(traits.rs)"]
end
subgraph "Indexing Layer"
KI["KeyIndexer\n(key_indexer.rs)"]
DASHMAP["DashMap<u64, u64>\n(concurrent hash map)"]
XXH3_HASH["xxh3_64\n(key hashing)"]
end
subgraph "Entry Management"
EH_STRUCT["EntryHandle\n(entry_handle.rs)"]
EM_STRUCT["EntryMetadata\n(entry_metadata.rs)"]
PAYLOAD_ALIGN["PAYLOAD_ALIGNMENT\n(constants.rs)"]
end
subgraph "Storage Backend"
RWLOCK_FILE["RwLock<File>"]
MUTEX_MMAP["Mutex<Arc<Mmap>>"]
ATOMIC_TAIL["AtomicU64 tail_offset"]
end
subgraph "SIMD Acceleration"
SIMD_COPY["simd_copy\n(arch-specific impls)"]
AVX2["AVX2 impl (x86_64)"]
NEON["NEON impl (aarch64)"]
end
DST --> READER_TRAIT
DST --> WRITER_TRAIT
DST --> KI
DST --> RWLOCK_FILE
DST --> MUTEX_MMAP
DST --> ATOMIC_TAIL
KI --> DASHMAP
KI --> XXH3_HASH
READER_TRAIT --> EH_STRUCT
EH_STRUCT --> EM_STRUCT
EH_STRUCT --> PAYLOAD_ALIGN
WRITER_TRAIT --> SIMD_COPY
SIMD_COPY --> AVX2
SIMD_COPY --> NEON
style DST fill:#f9f9f9,stroke:#333,stroke-width:2px
style KI fill:#f9f9f9,stroke:#333,stroke-width:2px
Core Storage Components
The following diagram maps storage concepts to their implementing code entities:
Diagram: Core Storage Components - Code Entity Mapping
This diagram shows the relationship between storage concepts and their concrete implementations in the codebase.
Sources: README.md:172-183 Cargo.toml:23-34
Key Features Summary
Storage and Access Patterns
| Feature | Implementation Details |
|---|---|
| Zero-Copy Reads | Memory-mapped file access via memmap2 crate, EntryHandle provides &[u8] views |
| Append-Only Writes | Sequential writes to RwLock<File>, metadata follows payload immediately |
| 64-Byte Alignment | Configurable via PAYLOAD_ALIGNMENT constant in simd-r-drive-entry-handle/src/constants.rs |
| Backward-Linked Chain | Each entry contains prev_offset field, enabling recovery and validation |
| Tombstone Deletions | Single 0x00 byte + metadata marks deleted entries |
Sources: README.md:43-148
Concurrency Model
| Component | Synchronization Primitive | Purpose |
|---|---|---|
| File Writes | RwLock<File> | Serializes write operations |
| Tail Offset | AtomicU64 | Lock-free offset tracking |
| Key Index | DashMap<u64, u64> | Concurrent hash map for lock-free reads |
| Memory Map | Mutex<Arc<Mmap>> | Safe shared access to mmap |
For detailed concurrency semantics, see Concurrency and Thread Safety.
Sources: README.md:170-200
Write and Read Modes
Write Modes:
- Single Entry: write() - atomic single key-value write
- Batch Entry: batch_write() - multiple writes with single flush
- Streaming: write_stream() - large entries via Read trait
Read Modes:
- Direct Memory Access: read() - zero-copy via EntryHandle
- Streaming: read_stream() - incremental reads for large entries
- Parallel Iteration: par_iter_entries() - Rayon-powered parallel scanning (requires parallel feature)
For detailed read/write APIs, see DataStore API.
Sources: README.md:208-247
SIMD and Performance Optimizations
SIMD R Drive employs multiple optimization strategies:
| Optimization | Implementation | Benefit |
|---|---|---|
| SIMD Memory Copy | simd_copy with AVX2/NEON | Faster buffer staging for writes |
| SIMD Hash Function | xxh3_64 with SSE2/AVX2/NEON | Accelerated key hashing |
| Cache-Line Alignment | 64-byte PAYLOAD_ALIGNMENT | Prevents cache-line splits |
| Lock-Free Reads | DashMap + Arc<Mmap> | Concurrent zero-copy reads |
| Sequential Writes | Append-only design | Minimized disk seeks |
For detailed performance information, see Performance Optimizations and SIMD Acceleration.
Sources: README.md:249-257 Cargo.toml:34
Multi-Language Support
Native Rust
The core library is implemented in Rust and can be used directly via Cargo:
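A dependent crate's Cargo.toml might declare the following (a sketch; the version string mirrors the workspace version noted in Repository Structure and should be pinned to whatever release you actually target):

```toml
[dependencies]
simd-r-drive = "0.15.5-alpha"
```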
Sources: Cargo.toml:11-21
Python Bindings
Two experimental Python integration paths are available:
- Direct Bindings (simd-r-drive-py): PyO3-based bindings for direct access to DataStore
- WebSocket Client (simd-r-drive-ws-client-py): Remote access via WebSocket RPC
For Python integration details, see Python Integration.
Sources: README.md:262-265 Cargo.toml:74-76
WebSocket RPC (Experimental)
The experimental network layer enables remote access:
- Server: simd-r-drive-ws-server - Exposes DataStore over WebSocket
- Native Client: simd-r-drive-ws-client - Rust client for WebSocket connections
- Service Definition: simd-r-drive-muxio-service-definition - Shared RPC contract using bitcode serialization
For network layer details, see Network Layer and RPC.
Sources: Cargo.toml:70-72 Cargo.toml:85-89
Feature Flags
The core simd-r-drive package supports the following Cargo features:
| Feature | Description |
|---|---|
| parallel | Enables Rayon-powered parallel iteration via par_iter_entries() |
| arrow | Enables Apache Arrow integration in simd-r-drive-entry-handle for zero-copy typed views |
| expose-internal-api | Exposes internal APIs for advanced use cases (unstable) |
Sources: Cargo.toml:49-55
Dependencies Overview
Core Dependencies
| Crate | Version | Purpose |
|---|---|---|
| memmap2 | 0.9.5 | Memory-mapped file access |
| xxhash-rust | 0.8.15 | SIMD-accelerated hashing (xxh3_64) |
| dashmap | 6.1.0 | Concurrent hash map for lock-free indexing |
| crc32fast | 1.4.2 | Payload integrity verification |
| rayon | 1.10.0 | Parallel iteration (optional, requires parallel feature) |
Network Layer Dependencies (Experimental)
| Crate | Version | Purpose |
|---|---|---|
| muxio-tokio-rpc-server | 0.9.0-alpha | WebSocket RPC server framework |
| muxio-tokio-rpc-client | 0.9.0-alpha | WebSocket RPC client framework |
| bitcode | 0.6.6 | Compact binary serialization for RPC |
| tokio | 1.45.1 | Async runtime for network operations |
Sources: Cargo.toml:23-34 Cargo.toml:80-112
Development and Testing
The repository includes:
- Unit Tests: Inline tests in each module
- Integration Tests: tests/ directory with full system tests
- Benchmarks: Criterion-based benchmarks in benches/ (see Benchmarking)
- CI/CD: GitHub Actions workflows for cross-platform testing (see CI/CD Pipeline)
Sources: Cargo.toml:36-63
Next Steps
This overview introduces the high-level architecture and key components of SIMD R Drive. For deeper exploration:
- Core Storage Mechanics: See Core Storage Engine for detailed information about DataStore, storage format, and memory management
- API Usage: See DataStore API for method documentation and usage patterns
- Performance Tuning: See Performance Optimizations for SIMD usage, alignment, and benchmarking
- Python Integration: See Python Integration for binding usage and WebSocket client examples
- Building and Testing: See Development Guide for build instructions and contribution guidelines
Sources: README.md:1-285
Core Storage Engine
Purpose and Scope
This document provides an overview of the core storage engine architecture in SIMD R Drive. It covers the fundamental design principles, main components, and data flow patterns that enable high-performance, append-only binary storage.
For detailed information about specific aspects of the storage engine:
Sources: README.md:1-50 src/lib.rs:1-28
System Architecture
The core storage engine is implemented as a single-file, append-only key-value store. It consists of four primary components that work together to provide high-performance binary storage with zero-copy read access.
Diagram: Core Storage Engine Components
graph TB
subgraph "Public API Layer"
DS["DataStore"]
DSR["DataStoreReader trait"]
DSW["DataStoreWriter trait"]
end
subgraph "Indexing Layer"
KI["KeyIndexer"]
DASHMAP["DashMap<u64, u64>"]
XXH3["xxh3_64 hasher"]
end
subgraph "Storage Layer"
MMAP["Arc<Mmap>"]
FILE["Single binary file"]
RWLOCK["RwLock<File>"]
end
subgraph "Access Layer"
EH["EntryHandle"]
EI["EntryIterator"]
ES["EntryStream"]
end
DS --> DSR
DS --> DSW
DS --> KI
DS --> MMAP
DS --> RWLOCK
KI --> DASHMAP
KI --> XXH3
DSR --> EH
DSR --> EI
DSW --> RWLOCK
EH --> MMAP
EI --> MMAP
ES --> EH
MMAP --> FILE
RWLOCK --> FILE
The diagram shows the relationship between the main code entities:
| Component | Code Entity | Purpose |
|---|---|---|
| Public API | DataStore | Main interface for storage operations |
| Traits | DataStoreReader, DataStoreWriter | Separate read/write capabilities |
| Indexing | KeyIndexer | Manages in-memory hash index |
| Hash Map | DashMap<u64, u64> | Lock-free concurrent hash map |
| Hashing | xxh3_64 | SIMD-accelerated hash function |
| Memory Mapping | Arc<Mmap> | Shared memory-mapped file reference |
| File Access | RwLock<File> | Synchronized write access |
| Entry Access | EntryHandle | Zero-copy view into mapped memory |
| Iteration | EntryIterator | Backward chain traversal |
| Streaming | EntryStream | Buffered reading for large entries |
Sources: src/storage_engine.rs:1-25 src/lib.rs:129-136 README.md:5-11
Append-Only Design
The storage engine follows a strict append-only model where data is never modified or deleted in place. All operations result in new entries being appended to the end of the file.
Diagram: Write Path Data Flow
graph LR
subgraph "Write Operations"
W["write()"]
BW["batch_write()"]
WS["write_stream()"]
DEL["delete()"]
end
subgraph "Internal Write Path"
HASH["Calculate xxh3_64 hash"]
ALIGN["Calculate prepad for\n64-byte alignment"]
COPY["simd_copy payload"]
APPEND["Append to file"]
META["Write metadata:\nkey_hash, prev_offset, crc32"]
end
subgraph "File State"
TAIL["AtomicU64 tail_offset"]
CHAIN["Backward-linked chain"]
end
W --> HASH
BW --> HASH
WS --> HASH
DEL --> HASH
HASH --> ALIGN
ALIGN --> COPY
COPY --> APPEND
APPEND --> META
META --> TAIL
META --> CHAIN
Key Characteristics
| Characteristic | Implementation |
|---|---|
| Immutability | Entries are never modified after writing |
| Overwrites | New entries with same key supersede old ones |
| Deletions | Append tombstone marker (single 0x00 byte) |
| File Growth | File grows monotonically until compaction |
| Ordering | Maintains temporal order via prev_offset chain |
| Recovery | Incomplete writes detected via chain validation |
The append-only design provides several benefits:
- Crash safety : Incomplete writes can be detected and discarded
- Simplified concurrency : No in-place modifications to coordinate
- Time-travel : Historical data remains until compaction
- Write performance : Sequential I/O with no seek overhead
Sources: README.md:98-147 src/lib.rs:3-17
Single-File Storage Container
All data is stored in a single binary file with a specific structure designed for efficient access and validation.
Diagram: File Organization and Backward-Linked Chain
graph TB
subgraph "File Structure"
START["File Start: offset 0"]
E1["Entry 1\nprepad + payload + metadata"]
E2["Entry 2\nprepad + payload + metadata"]
E3["Entry 3\nprepad + payload + metadata"]
EN["Entry N\nprepad + payload + metadata"]
TAIL["tail_offset\nEnd of valid data"]
end
subgraph "Metadata Chain"
M1["metadata.prev_offset = 0"]
M2["metadata.prev_offset = end(E1)"]
M3["metadata.prev_offset = end(E2)"]
MN["metadata.prev_offset = end(E3)"]
end
START --> E1
E1 --> E2
E2 --> E3
E3 --> EN
EN --> TAIL
E1 -.-> M1
E2 -.-> M2
E3 -.-> M3
EN -.-> MN
MN -.backward chain.-> M3
M3 -.backward chain.-> M2
M2 -.backward chain.-> M1
Storage Properties
| Property | Value | Purpose |
|---|---|---|
| File Type | Single binary file | Simplified management and deployment |
| Entry Alignment | 64-byte boundaries | Cache-line and SIMD optimization |
| Metadata Size | 20 bytes | key_hash (8) + prev_offset (8) + crc32 (4) |
| Chain Direction | Backward (tail to head) | Fast validation and recovery |
| Maximum Size | 256 TiB | 48-bit offset support |
| Format | Schema-less | Raw binary with no interpretation |
Entry Composition
Each entry consists of three parts:
- Pre-padding (0-63 bytes): Zero bytes to align payload start to 64-byte boundary
- Payload (variable length): Raw binary data
- Metadata (20 bytes): Hash, previous offset, checksum
Tombstone entries (deletions) use a minimal format:
- 1-byte payload (0x00)
- 20-byte metadata
Sources: README.md:104-138 README.md:61-97
DataStore: Primary Interface
DataStore is the main public interface providing all storage operations. It implements the DataStoreReader and DataStoreWriter traits to separate read and write capabilities.
Diagram: DataStore Structure and Methods
graph TB
subgraph "DataStore struct"
FILE_LOCK["file: RwLock<File>"]
MMAP_LOCK["mmap: Mutex<Arc<Mmap>>"]
INDEXER["indexer: KeyIndexer"]
TAIL["tail_offset: AtomicU64"]
end
subgraph "DataStoreWriter methods"
W1["write(key, value)"]
W2["batch_write(entries)"]
W3["write_stream(key, reader)"]
W4["delete(key)"]
end
subgraph "DataStoreReader methods"
R1["read(key) -> Option<EntryHandle>"]
R2["batch_read(keys)"]
R3["iter_entries() -> EntryIterator"]
R4["contains_key(key) -> bool"]
end
subgraph "Maintenance methods"
M1["compact() -> Stats"]
M2["estimate_compaction_space()"]
M3["verify_file_integrity()"]
end
FILE_LOCK --> W1
FILE_LOCK --> W2
FILE_LOCK --> W3
FILE_LOCK --> W4
MMAP_LOCK --> R1
MMAP_LOCK --> R2
MMAP_LOCK --> R3
INDEXER --> R1
INDEXER --> R2
INDEXER --> R4
TAIL --> W1
TAIL --> W2
TAIL --> W3
Core Fields
The DataStore struct maintains four critical fields:
| Field | Type | Purpose |
|---|---|---|
| file | RwLock<File> | Serializes write operations |
| mmap | Mutex<Arc<Mmap>> | Protects memory map updates |
| indexer | KeyIndexer | Provides O(1) key lookups |
| tail_offset | AtomicU64 | Tracks file end without locks |
Sources: src/storage_engine.rs:4-5 src/lib.rs:19-63
graph LR
subgraph "Read Request"
KEY["Key bytes"]
HASH_KEY["xxh3_64(key)"]
end
subgraph "Index Lookup"
DASHMAP_GET["DashMap.get(hash)"]
PACKED["Packed value:\n16-bit tag + 48-bit offset"]
TAG_CHECK["Verify 16-bit tag"]
end
subgraph "Memory Access"
OFFSET["File offset"]
MMAP_SLICE["Arc<Mmap> slice"]
METADATA_READ["Read EntryMetadata"]
PAYLOAD_RANGE["Calculate payload range"]
end
subgraph "Result"
EH["EntryHandle\n(zero-copy view)"]
end
KEY --> HASH_KEY
HASH_KEY --> DASHMAP_GET
DASHMAP_GET --> PACKED
PACKED --> TAG_CHECK
TAG_CHECK --> OFFSET
OFFSET --> MMAP_SLICE
MMAP_SLICE --> METADATA_READ
METADATA_READ --> PAYLOAD_RANGE
PAYLOAD_RANGE --> EH
Zero-Copy Read Path
Read operations leverage memory-mapped files to provide zero-copy access to stored data without deserialization overhead.
Diagram: Zero-Copy Read Operation Flow
Read Performance Characteristics
| Operation | Complexity | Notes |
|---|---|---|
| Single read | O(1) | Hash index lookup + memory access |
| Batch read | O(n) | Independent lookups, parallelizable |
| Full iteration | O(m) | m = total entries, follows chain |
| Collision handling | O(1) | 16-bit tag check |
EntryHandle
The EntryHandle struct provides a zero-copy view into the memory-mapped file:
Methods like as_slice(), as_bytes(), and streaming conversion allow direct access to payload data without copying.
Sources: README.md:43-50 README.md:228-231 src/storage_engine/entry_iterator.rs:8-21
graph TB
subgraph "Key to Hash"
K1["Key: 'user:1234'"]
H1["xxh3_64 hash:\n0xABCDEF0123456789"]
end
subgraph "Hash to Packed Value"
TAG["Extract 16-bit tag:\n0xABCD"]
OFF["Extract file offset:\n0x00EF0123456789"]
PACKED_VAL["Packed 64-bit value:\ntag:16 / offset:48"]
end
subgraph "DashMap Storage"
DM["DashMap<u64, u64>"]
ENTRY["hash -> packed_value"]
end
subgraph "Collision Detection"
LOOKUP["On read: lookup hash"]
VERIFY["Verify 16-bit tag matches"]
RESOLVE["If mismatch: collision detected"]
end
K1 --> H1
H1 --> TAG
H1 --> OFF
TAG --> PACKED_VAL
OFF --> PACKED_VAL
PACKED_VAL --> ENTRY
ENTRY --> DM
DM --> LOOKUP
LOOKUP --> VERIFY
VERIFY --> RESOLVE
Key Indexing System
The KeyIndexer maintains an in-memory hash index for O(1) key lookups using the xxh3_64 hashing algorithm with SIMD acceleration.
Diagram: Key Indexing and Collision Detection
Index Structure
| Component | Type | Size | Purpose |
|---|---|---|---|
| Hash Map | DashMap<u64, u64> | Dynamic | Lock-free concurrent access |
| Key Hash | u64 | 8 bytes | Full xxh3_64 hash of key |
| Packed Value | u64 | 8 bytes | 16-bit tag + 48-bit offset |
| Tag | u16 | 2 bytes | Collision detection |
| Offset | u48 | 6 bytes | File location (0-256 TiB) |
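The packing scheme can be pictured with a small sketch; the helper functions below are hypothetical and only illustrate the 16-bit tag plus 48-bit offset layout described above, not the actual KeyIndexer code:

```rust
// Hypothetical illustration of the packed index value; not the real KeyIndexer.
const OFFSET_BITS: u32 = 48;
const OFFSET_MASK: u64 = (1 << OFFSET_BITS) - 1;

fn pack(tag: u16, offset: u64) -> u64 {
    debug_assert!(offset <= OFFSET_MASK, "offsets are limited to 48 bits (256 TiB)");
    ((tag as u64) << OFFSET_BITS) | (offset & OFFSET_MASK)
}

fn unpack(packed: u64) -> (u16, u64) {
    ((packed >> OFFSET_BITS) as u16, packed & OFFSET_MASK)
}

fn lookup_matches(expected_tag: u16, packed: u64) -> bool {
    // On read, the stored tag is compared against a tag re-derived from the
    // key hash; a mismatch indicates a hash collision rather than a hit.
    unpack(packed).0 == expected_tag
}
```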
Hash Algorithm Properties
The xxh3_64 algorithm provides:
- SIMD acceleration : Uses SSE2/AVX2 (x86_64) or NEON (ARM)
- High quality : Low collision probability
- Performance : Optimized for throughput
- Stability : Consistent across platforms
Sources: README.md:158-168 src/storage_engine.rs:13-14
graph TB
subgraph "Concurrent Reads"
T1["Thread 1 read()"]
T2["Thread 2 read()"]
T3["Thread 3 read()"]
DM_READ["DashMap lock-free read"]
MMAP_SHARE["Shared Arc<Mmap>"]
end
subgraph "Synchronized Writes"
T4["Thread 4 write()"]
T5["Thread 5 write()"]
RWLOCK_ACQUIRE["RwLock write lock"]
FILE_WRITE["Exclusive file access"]
INDEX_UPDATE["Update DashMap"]
ATOMIC_UPDATE["AtomicU64 tail_offset"]
end
subgraph "Memory Map Updates"
REMAP["Remap after write"]
MUTEX_LOCK["Mutex<Arc<Mmap>>"]
NEW_ARC["Create new Arc<Mmap>"]
end
T1 --> DM_READ
T2 --> DM_READ
T3 --> DM_READ
DM_READ --> MMAP_SHARE
T4 --> RWLOCK_ACQUIRE
T5 --> RWLOCK_ACQUIRE
RWLOCK_ACQUIRE --> FILE_WRITE
FILE_WRITE --> INDEX_UPDATE
FILE_WRITE --> ATOMIC_UPDATE
FILE_WRITE --> REMAP
REMAP --> MUTEX_LOCK
MUTEX_LOCK --> NEW_ARC
NEW_ARC --> MMAP_SHARE
Thread Safety Model
The storage engine provides thread-safe concurrent access within a single process using a combination of synchronization primitives.
Diagram: Concurrency Control Mechanisms
Synchronization Primitives
| Primitive | Protects | Access Pattern |
|---|---|---|
| RwLock<File> | File handle | Exclusive writes, no lock for reads |
| Mutex<Arc<Mmap>> | Memory mapping | Locked during remap, readers get Arc clone |
| DashMap<u64, u64> | Key index | Lock-free concurrent reads |
| AtomicU64 | Tail offset | Lock-free updates |
Thread Safety Guarantees
Within single process:
- ✅ Multiple concurrent reads (zero-copy, lock-free)
- ✅ Serialized writes (RwLock ensures ordering)
- ✅ Consistent index updates (DashMap internal locks)
- ✅ Safe memory mapping (Arc reference counting)
Across multiple processes:
- ❌ No cross-process coordination
- ❌ Requires external file locking (e.g., flock)
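A minimal sketch of concurrent readers within one process follows; it assumes DataStore is Sync (as the single-process guarantees above imply) and that the reader trait must be imported, which is an assumption:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // trait import assumed

// Sketch: several threads read from the same store concurrently.
fn concurrent_reads(store: &DataStore) {
    std::thread::scope(|s| {
        for i in 0..4 {
            s.spawn(move || {
                let key = format!("key-{i}");
                // Each read is a lock-free index lookup plus a zero-copy
                // view into the shared memory map.
                if let Ok(Some(entry)) = store.read(key.as_bytes()) {
                    assert!(!entry.as_slice().is_empty());
                }
            });
        }
    });
}
```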
Sources: README.md:170-206
graph LR
subgraph "Write Path SIMD"
PAYLOAD["Payload bytes"]
SIMD_COPY["simd_copy()"]
AVX2["x86_64: AVX2\n256-bit vectors"]
NEON["ARM: NEON\n128-bit vectors"]
BUFFER["Aligned buffer"]
end
subgraph "Hash Path SIMD"
KEY["Key bytes"]
XXH3_SIMD["xxh3_64 SIMD"]
SSE2["x86_64: SSE2/AVX2"]
NEON2["ARM: NEON"]
HASH_OUT["64-bit hash"]
end
PAYLOAD --> SIMD_COPY
SIMD_COPY --> AVX2
SIMD_COPY --> NEON
AVX2 --> BUFFER
NEON --> BUFFER
KEY --> XXH3_SIMD
XXH3_SIMD --> SSE2
XXH3_SIMD --> NEON2
SSE2 --> HASH_OUT
NEON2 --> HASH_OUT
Performance Optimizations
The storage engine incorporates several performance optimizations for high-throughput workloads.
SIMD Acceleration
Diagram: SIMD Operations in Write Path
Optimization Features
| Feature | Implementation | Benefit |
|---|---|---|
| SIMD Copy | simd_copy() with AVX2/NEON | Faster memory operations |
| Cache Alignment | 64-byte payload boundaries | Optimal cache-line usage |
| Zero-Copy Reads | mmap + EntryHandle | No deserialization overhead |
| Lock-Free Index | DashMap for reads | Concurrent read scaling |
| Atomic Tracking | AtomicU64 tail offset | No lock contention for offset |
| Sequential Writes | Append-only design | Optimal disk I/O patterns |
Alignment Benefits
The 64-byte PAYLOAD_ALIGNMENT constant ensures:
- Cache efficiency : Payloads align with CPU cache lines
- SIMD compatibility : Vector loads don’t cross boundaries
- Predictable performance : Consistent access patterns
- Type casting safety : Can reinterpret as typed slices
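As a hypothetical illustration of the type-casting benefit (not code from the crate), an aligned payload whose length is a multiple of 8 can be viewed as a &[u64] without copying:

```rust
/// Sketch: reinterpret an aligned payload as a slice of u64.
/// With 64-byte aligned payloads, `align_to` yields empty head/tail slices,
/// so the whole payload is visible through the typed view with no copy.
fn as_u64_slice(payload: &[u8]) -> Option<&[u64]> {
    // SAFETY: u64 has no invalid bit patterns, and align_to only places
    // correctly aligned, in-bounds bytes into the middle slice.
    let (head, body, tail) = unsafe { payload.align_to::<u64>() };
    if head.is_empty() && tail.is_empty() {
        Some(body)
    } else {
        None
    }
}
```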
Sources: README.md:51-59 README.md:248-256
Operation Modes
The storage engine supports multiple operation modes optimized for different use cases.
Write Modes
| Mode | Method | Use Case | Characteristics |
|---|---|---|---|
| Single | write(key, value) | Individual entries | Immediate flush, atomic |
| Batch | batch_write(entries) | Multiple entries | Single lock, batch flush |
| Streaming | write_stream(key, reader) | Large payloads | No full memory allocation |
Read Modes
| Mode | Method | Use Case | Characteristics |
|---|---|---|---|
| Direct | read(key) | Single entry | Zero-copy EntryHandle |
| Batch | batch_read(keys) | Multiple entries | Independent lookups |
| Iteration | iter_entries() | Full scan | Follows chain backward |
| Parallel | par_iter_entries() | Bulk processing | Rayon-powered (optional) |
| Streaming | EntryStream | Large entries | Buffered, incremental |
Sources: README.md:208-246 src/lib.rs:20-115
Summary
The core storage engine provides a high-performance, append-only key-value store built on four main components:
- DataStore : Public API implementing reader and writer traits
- KeyIndexer : O(1) hash-based index with collision detection
- Arc<Mmap> : Zero-copy memory-mapped file access
- EntryHandle : View abstraction for payload data
Key architectural decisions:
- Append-only : Simplifies concurrency and crash recovery
- Single-file : Easy deployment and management
- 64-byte alignment : Optimizes cache and SIMD performance
- Backward chain : Enables fast validation and iteration
- Zero-copy reads : Eliminates deserialization overhead
- Lock-free index : Scales read throughput across threads
For implementation details on specific subsystems, refer to the child pages listed at the beginning of this document.
Sources: README.md:1-50 src/lib.rs:1-28 src/storage_engine.rs:1-25
Repository Structure
Purpose and Scope
This document describes the organization of the SIMD R Drive repository as a Cargo workspace, detailing the individual packages (crates) that comprise the system, their purposes, and their inter-dependencies. The repository is structured as a monorepo containing a core storage engine, supporting libraries, experimental network components, and language bindings.
For information about the core storage engine architecture and on-disk format, see Storage Architecture. For details on building and testing the codebase, see Building and Testing.
Workspace Organization
The SIMD R Drive repository is organized as a Cargo workspace defined in Cargo.toml:65-78. The workspace uses Cargo's resolver version 2 and manages multiple interdependent packages with shared versioning and dependencies.
Sources: Cargo.toml:65-78
Workspace Configuration
The workspace defines common package metadata that all member crates inherit:
| Metadata Field | Value |
|---|---|
| Version | 0.15.5-alpha |
| Edition | 2024 |
| Repository | https://github.com/jzombie/rust-simd-r-drive |
| License | Apache-2.0 |
| Categories | database-implementations, data-structures, filesystem |
| Keywords | storage-engine, binary-storage, append-only, simd, mmap |
Sources: Cargo.toml:1-9
Package Structure Overview
Workspace Members
The workspace includes six member crates defined in Cargo.toml:66-73:
"."- The rootsimd-r-drivepackage"simd-r-drive-entry-handle"- Entry abstraction library"extensions"- Utility extensions"experiments/simd-r-drive-ws-server"- WebSocket server"experiments/simd-r-drive-ws-client"- WebSocket client"experiments/simd-r-drive-muxio-service-definition"- RPC service contract
Excluded Members
Two Python binding packages are excluded from the workspace Cargo.toml:74-77 because they use maturin with separate build systems:
"experiments/bindings/python"- Direct Rust-Python bindings"experiments/bindings/python-ws-client"- Python WebSocket client bindings
Sources: Cargo.toml:65-78
Core Packages
simd-r-drive (Root Package)
Location: Root directory
Cargo Name: simd-r-drive
Description: “SIMD-optimized append-only schema-less storage engine. Key-based binary storage in a single-file storage container.”
This is the main storage engine package providing the DataStore API for append-only key-value storage with SIMD acceleration and memory-mapped file access.
Key Exports:
- DataStore - Main storage interface
- DataStoreReader / DataStoreWriter - Trait-based access patterns
- KeyIndexer - Hash-based key indexing with xxh3_64
Dependencies:
- simd-r-drive-entry-handle (workspace)
- dashmap - Lock-free concurrent hash map
- memmap2 - Memory-mapped file access
- xxhash-rust - Fast hashing with SIMD support
- rayon (optional, with parallel feature)
Features:
- default - No features enabled by default
- expose-internal-api - Exposes internal APIs for testing/extensions
- parallel - Enables parallel iteration with rayon
- arrow - Proxies to simd-r-drive-entry-handle/arrow
Sources: Cargo.toml:11-56
simd-r-drive-entry-handle
Location: simd-r-drive-entry-handle/
Cargo Name: simd-r-drive-entry-handle
Provides the EntryHandle abstraction for zero-copy access to stored entries via memory-mapped files. This package is separated to allow optional Apache Arrow integration without requiring arrow dependencies in the core package.
Key Exports:
- EntryHandle - Zero-copy entry accessor
- EntryMetadata - Entry metadata structure (key_hash, prev_offset, crc32)
Dependencies:
- memmap2 - Memory-mapped file access
- crc32fast - CRC32 checksum validation
- arrow (optional, with arrow feature) - Apache Arrow buffer integration
Features:
- arrow - Enables zero-copy integration with Apache Arrow buffers
Sources: Cargo.toml:83 Cargo.lock:1823-1829
simd-r-drive-extensions
Location: extensions/
Cargo Name: simd-r-drive-extensions
Utility functions and helpers built on top of the core storage engine, including alignment utilities, formatting helpers, and namespace hashing.
Key Exports:
- align_or_copy - Memory alignment utilities
- format_bytes - Human-readable byte formatting
- NamespaceHasher - Namespace-based key hashing
- File verification utilities
Dependencies:
- simd-r-drive (workspace)
- bincode - Serialization support
- serde - Serialization framework
Sources: Cargo.toml:66-73 Cargo.lock:1832-1841
Experimental Network Components
simd-r-drive-muxio-service-definition
Location: experiments/simd-r-drive-muxio-service-definition/
Cargo Name: simd-r-drive-muxio-service-definition
Defines the RPC service contract (interface definition) for remote access to the storage engine. This serves as the shared contract between WebSocket clients and servers, ensuring type-safe communication.
Key Exports:
- Service trait definitions for RPC operations
- Request/response message types
- Bitcode serialization schemas
Dependencies:
- bitcode - Compact binary serialization
- muxio-rpc-service - RPC service framework
Sources: Cargo.toml:66-73 Cargo.lock:1844-1849
simd-r-drive-ws-server
Location: experiments/simd-r-drive-ws-server/
Cargo Name: simd-r-drive-ws-server
WebSocket server implementation providing remote RPC access to a DataStore instance via the muxio framework.
Key Exports:
- WebSocket server with RPC endpoint
- Service implementation for simd-r-drive-muxio-service-definition
Dependencies:
- simd-r-drive (workspace)
- simd-r-drive-muxio-service-definition (workspace)
- muxio-tokio-rpc-server - RPC server implementation
- tokio - Async runtime
- clap - CLI argument parsing
Sources: Cargo.toml:66-73 Cargo.lock:1866-1878
simd-r-drive-ws-client
Location: experiments/simd-r-drive-ws-client/
Cargo Name: simd-r-drive-ws-client
Rust WebSocket client for connecting to simd-r-drive-ws-server instances. Provides a native Rust client API matching the core DataStore interface but operating over the network.
Key Exports:
- WsClient - WebSocket client implementation
- Async methods mirroring the DataStore API
Dependencies:
- simd-r-drive (workspace)
- simd-r-drive-muxio-service-definition (workspace)
- muxio-tokio-rpc-client - RPC client implementation
- tokio - Async runtime
Sources: Cargo.toml:66-73 Cargo.lock:1852-1863
Python Bindings (External Build System)
experiments/bindings/python
Location: experiments/bindings/python/
Build System: Maturin + PyO3
Direct Python bindings to the core simd-r-drive package using PyO3. Provides a Python API for local (in-process) access to the storage engine. This package is excluded from the Cargo workspace because it uses a separate pyproject.toml build configuration with maturin.
Key Exports:
- Python DataStore class wrapping the Rust implementation
- Type stubs (.pyi files) for IDE support
Sources: Cargo.toml:74-77
experiments/bindings/python-ws-client
Location: experiments/bindings/python-ws-client/
Build System: Maturin + PyO3
Python bindings for the simd-r-drive-ws-client, enabling remote access to storage servers from Python via asyncio. Uses pyo3-async-runtimes to bridge Python’s asyncio with Rust’s tokio.
Key Exports:
DataStoreWsClient- Python async WebSocket client- Asyncio-compatible API
- Type stubs for Python type checkers
Sources: Cargo.toml:74-77
Dependency Relationships
Sources: Cargo.toml:23-34 Cargo.toml:80-112 Cargo.lock:1795-1878
Workspace Dependency Management
The workspace defines shared dependencies in the [workspace.dependencies] section Cargo.toml:80-112 to ensure version consistency across all member crates:
Intra-Workspace Dependencies
Key External Dependencies
| Dependency | Version | Purpose |
|---|---|---|
| memmap2 | 0.9.5 | Memory-mapped file access |
| dashmap | 6.1.0 | Lock-free concurrent hashmap |
| xxhash-rust | 0.8.15 | Fast non-cryptographic hashing |
| crc32fast | 1.4.2 | CRC32 checksum validation |
| arrow | 57.0.0 | Apache Arrow integration (optional) |
| tokio | 1.45.1 | Async runtime (experimental features only) |
| bitcode | 0.6.6 | Compact binary serialization |
| rayon | 1.10.0 | Parallel iteration (optional) |
Sources: Cargo.toml:80-112
Feature Flags
The root simd-r-drive package defines the following feature flags Cargo.toml:49-55:
default
No features enabled by default. This keeps the core storage engine lightweight with minimal dependencies.
expose-internal-api
Exposes internal APIs that are normally private. Used for extension development and integration testing. Not intended for general use.
parallel
Enables parallel iteration capabilities using the rayon crate. When enabled, operations like par_iter_entries() can leverage multi-core parallelism for improved throughput on large datasets.
arrow
A proxy feature that enables simd-r-drive-entry-handle/arrow. This provides zero-copy integration with Apache Arrow buffers, allowing EntryHandle instances to be viewed as Arrow arrays without data copying.
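For example, a consumer crate could opt into the optional capabilities from its own Cargo.toml (a sketch; the version string is an assumption and should match the release in use):

```toml
[dependencies]
# Enables Rayon-powered iteration and the Arrow integration on top of the
# default (empty) feature set.
simd-r-drive = { version = "0.15.5-alpha", features = ["parallel", "arrow"] }
```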
Sources: Cargo.toml:49-55
Benchmarks
The root package defines two benchmark suites using Criterion.rs Cargo.toml:57-63:
storage_benchmark
Measures write throughput, read throughput, batch operations, and streaming performance for the core storage engine.
contention_benchmark
Measures performance under concurrent access patterns, testing the effectiveness of the lock-free index and concurrent read scalability.
Both benchmarks use harness = false to integrate with Criterion’s custom benchmark harness.
Sources: Cargo.toml:57-63
Version Management
All workspace members share a common version number 0.15.5-alpha managed through Cargo.toml:3. The -alpha suffix indicates this is pre-release software under active development. The workspace uses semantic versioning, where:
- Major version (0): Pre-1.0 indicating API instability
- Minor version (15): Feature releases and API changes
- Patch version (5): Bug fixes and minor improvements
- Suffix (-alpha): Pre-release stability indicator
Sources: Cargo.toml:3
File System Layout
The physical repository structure mirrors the logical package organization:
rust-simd-r-drive/
├── Cargo.toml # Workspace root
├── Cargo.lock # Dependency lock file
├── src/ # simd-r-drive source
├── benches/ # Benchmark suites
├── simd-r-drive-entry-handle/ # Entry handle crate
│ ├── Cargo.toml
│ └── src/
├── extensions/ # Extensions crate
│ ├── Cargo.toml
│ └── src/
└── experiments/
├── simd-r-drive-muxio-service-definition/
│ ├── Cargo.toml
│ └── src/
├── simd-r-drive-ws-server/
│ ├── Cargo.toml
│ └── src/
├── simd-r-drive-ws-client/
│ ├── Cargo.toml
│ └── src/
└── bindings/
├── python/ # Excluded from workspace
│ ├── pyproject.toml
│ └── src/
└── python-ws-client/ # Excluded from workspace
├── pyproject.toml
└── src/
Sources: Cargo.toml:65-78
Summary Table: All Packages
| Package Name | Location | Type | Dependencies | Purpose |
|---|---|---|---|---|
| simd-r-drive | . | Core | simd-r-drive-entry-handle, dashmap, memmap2, xxhash-rust | Main storage engine |
| simd-r-drive-entry-handle | simd-r-drive-entry-handle/ | Library | memmap2, crc32fast, arrow (opt) | Entry abstraction |
| simd-r-drive-extensions | extensions/ | Library | simd-r-drive, bincode | Utility functions |
| simd-r-drive-muxio-service-definition | experiments/... | Library | bitcode, muxio-rpc-service | RPC contract |
| simd-r-drive-ws-server | experiments/... | Binary | Core + service-def + muxio-server | WebSocket server |
| simd-r-drive-ws-client | experiments/... | Library | Core + service-def + muxio-client | WebSocket client |
| Python bindings | experiments/bindings/python | PyO3 | simd-r-drive, pyo3 | Direct Python access |
| Python WS client | experiments/bindings/python-ws-client | PyO3 | simd-r-drive-ws-client, pyo3 | Remote Python access |
Sources: Cargo.toml:1-112 Cargo.lock:1795-1878
Storage Architecture
Relevant source files
- README.md
- simd-r-drive-entry-handle/Cargo.toml
- simd-r-drive-entry-handle/src/entry_metadata.rs
- src/storage_engine/data_store.rs
Purpose and Scope
This document describes the on-disk storage format used by SIMD R Drive, including the physical layout of entries, the alignment strategy, the backward-linked chain structure, and the recovery mechanism that ensures data integrity after crashes or incomplete writes.
For information about the in-memory data structures and API, see DataStore API. For details about entry metadata fields, see Entry Structure and Metadata. For information about memory-mapped access patterns, see Memory Management and Zero-Copy Access.
Single-File Storage Container
SIMD R Drive stores all data in a single binary file with an append-only design. The storage engine writes sequentially to minimize disk seeks and maximize throughput. Each write operation appends a new entry to the end of the file, and the file position is tracked using the AtomicU64 tail_offset field in DataStore.
Sources: README.md:61-97 src/storage_engine/data_store.rs:27-33
graph LR
FILE["Single Binary File\n(*.simd-r-drive)"]
ENTRY1["Entry 1\n(Pre-pad + Payload + Metadata)"]
ENTRY2["Entry 2\n(Pre-pad + Payload + Metadata)"]
ENTRY3["Entry 3\n(Pre-pad + Payload + Metadata)"]
TAIL["tail_offset\n(AtomicU64)"]
FILE --> ENTRY1
ENTRY1 --> ENTRY2
ENTRY2 --> ENTRY3
ENTRY3 --> TAIL
Entry Layout
Each entry in the storage file consists of three components: optional pre-padding for alignment, the payload data, and metadata. The layout differs between non-tombstone entries (data) and tombstone entries (deletion markers).
Non-Tombstone Entry Structure
Non-tombstone entries store actual data and are aligned to PAYLOAD_ALIGNMENT (64 bytes by default). The alignment ensures cache-line efficiency and enables zero-copy access for typed slices.
Physical Layout Table:
graph LR
PREV_TAIL["Previous\ntail_offset"]
PREPAD["Pre-Pad\n0-63 bytes\n(zero bytes)"]
PAYLOAD["Payload\nVariable Length\n(actual data)"]
METADATA["EntryMetadata\n20 bytes"]
NEXT_TAIL["New\ntail_offset"]
PREV_TAIL --> PREPAD
PREPAD --> PAYLOAD
PAYLOAD --> METADATA
METADATA --> NEXT_TAIL
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| P .. P+pad | Pre-Pad | pad | Zero bytes calculated as (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1) |
| P+pad .. N | Payload | N-(P+pad) | Variable-length data, starts at aligned boundary |
| N .. N+8 | key_hash | 8 | 64-bit XXH3 hash of the key |
| N+8 .. N+16 | prev_offset | 8 | Absolute file offset of previous entry's tail |
| N+16 .. N+20 | checksum | 4 | CRC32C checksum of the payload |
Where:
- pad = DataStore::prepad_len(prev_tail), computed at write time
- PAYLOAD_ALIGNMENT = 64 (defined in simd-r-drive-entry-handle/src/constants.rs)
- Next entry starts at N + 20
Sources: README.md:112-125 simd-r-drive-entry-handle/src/entry_metadata.rs:11-23 src/storage_engine/data_store.rs:666-673
Entry Metadata Structure
The EntryMetadata struct stores three critical fields in exactly 20 bytes:
Field Purposes:
| Field | Type | Purpose |
|---|---|---|
| key_hash | u64 | XXH3 hash of the key for index lookups |
| prev_offset | u64 | Absolute file offset pointing to the previous entry's tail (forms backward chain) |
| checksum | [u8; 4] | CRC32C checksum of payload for integrity verification |
The metadata is serialized using little-endian encoding via EntryMetadata::serialize() and deserialized via EntryMetadata::deserialize().
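A rough sketch of that 20-byte little-endian layout follows; it is illustrative only and mirrors the field table above rather than the actual implementation in simd-r-drive-entry-handle/src/entry_metadata.rs:

```rust
// Illustrative only: field order and encoding follow the table above.
struct EntryMetadataSketch {
    key_hash: u64,     // bytes 0..8, little-endian
    prev_offset: u64,  // bytes 8..16, little-endian
    checksum: [u8; 4], // bytes 16..20, CRC32C of the payload
}

impl EntryMetadataSketch {
    fn serialize(&self) -> [u8; 20] {
        let mut buf = [0u8; 20];
        buf[0..8].copy_from_slice(&self.key_hash.to_le_bytes());
        buf[8..16].copy_from_slice(&self.prev_offset.to_le_bytes());
        buf[16..20].copy_from_slice(&self.checksum);
        buf
    }
}
```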
Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:44-113
Tombstone Entry Structure
Tombstone entries mark deleted keys. They consist of a single null byte (NULL_BYTE[0] = 0x00) followed by metadata, with no pre-padding.
Physical Layout Table:
graph LR
PREV_TAIL["Previous\ntail_offset"]
NULL["NULL_BYTE\n1 byte\n(0x00)"]
METADATA["EntryMetadata\n20 bytes"]
NEXT_TAIL["New\ntail_offset"]
PREV_TAIL --> NULL
NULL --> METADATA
METADATA --> NEXT_TAIL
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| T .. T+1 | Payload | 1 | Single byte 0x00 (NULL_BYTE) |
| T+1 .. T+9 | key_hash | 8 | Hash of the deleted key |
| T+9 .. T+17 | prev_offset | 8 | Previous entry's tail offset |
| T+17 .. T+21 | checksum | 4 | CRC32C of the null byte |
Tombstones are written using DataStore::batch_write_with_key_hashes() with the allow_null_bytes parameter set to true. The deletion logic filters existing keys before writing tombstones to avoid unnecessary I/O.
Sources: README.md:126-132 src/storage_engine/data_store.rs:863-897 src/storage_engine/data_store.rs:990-1024
Alignment Strategy
The pre-padding mechanism ensures that every non-tombstone payload starts at a 64-byte aligned boundary. This alignment is critical for:
- Cache-line efficiency : Payloads align with CPU cache lines (typically 64 bytes)
- SIMD operations : Vectorized loads/stores can operate without crossing boundaries
- Zero-copy typed access : Enables safe reinterpretation as typed slices (e.g., &[u64])
Alignment Calculation
The DataStore::prepad_len() function implements this calculation:
fn prepad_len(offset: u64) -> usize {
let a = PAYLOAD_ALIGNMENT;
((a - (offset % a)) & (a - 1)) as usize
}
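To make the effect concrete, here is a standalone sketch of the same formula with PAYLOAD_ALIGNMENT written out as 64:

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

// Same arithmetic as DataStore::prepad_len, reproduced for illustration.
fn prepad_len(offset: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(prepad_len(0), 0);    // already aligned, no padding
    assert_eq!(prepad_len(1), 63);   // pad up to the next 64-byte boundary
    assert_eq!(prepad_len(100), 28); // 100 + 28 = 128, the next boundary
    assert_eq!(prepad_len(128), 0);
}
```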
During writes, the code checks if pre-padding is needed and writes zero bytes before the payload:
- At src/storage_engine/data_store.rs:765-771: Streaming write path
- At src/storage_engine/data_store.rs:909-914: Batch write path
Sources: src/storage_engine/data_store.rs:666-673 README.md:52-59
Backward-Linked Chain Structure
Each entry’s prev_offset field in EntryMetadata points to the absolute file offset of the previous entry’s tail, forming a backward-linked chain. This chain enables:
- Iteration : Walking entries from end to beginning
- Recovery : Validating chain integrity
- Alignment derivation : Computing payload start from previous tail
Chain Traversal During Reads
When reading an entry from the index:
- Look up key_hash in KeyIndexer to get the metadata offset
- Read EntryMetadata at that offset
- Extract prev_offset (previous tail)
- Calculate payload start: prev_offset + DataStore::prepad_len(prev_offset)
- Payload ends at metadata offset
This logic is implemented in DataStore::read_entry_with_context() at src/storage_engine/data_store.rs:501-565
Chain Traversal Diagram:
Sources: src/storage_engine/data_store.rs:501-565 simd-r-drive-entry-handle/src/entry_metadata.rs:40-43
graph TD
START["Start at file_len"]
CHECK_SIZE{"file_len <\nMETADATA_SIZE?"}
RETURN_ZERO["Return Ok(0)"]
INIT["cursor = file_len\nbest_valid_offset = None"]
LOOP{"cursor >=\nMETADATA_SIZE?"}
READ_META["Read metadata at\n(cursor - METADATA_SIZE)"]
EXTRACT["Extract prev_offset\nDerive entry_start"]
VALIDATE{"entry_start <\nmetadata_offset?"}
WALK["Walk chain backward\nvia prev_offset"]
CHAIN_VALID{"Chain reaches\noffset 0?"}
SET_BEST["best_valid_offset =\ncursor"]
BREAK["Break loop"]
DECREMENT["cursor -= 1"]
RETURN_BEST["Return\nbest_valid_offset\nor 0"]
START --> CHECK_SIZE
CHECK_SIZE -->|Yes| RETURN_ZERO
CHECK_SIZE -->|No| INIT
INIT --> LOOP
LOOP -->|Yes| READ_META
LOOP -->|No| RETURN_BEST
READ_META --> EXTRACT
EXTRACT --> VALIDATE
VALIDATE -->|No| DECREMENT
VALIDATE -->|Yes| WALK
WALK --> CHAIN_VALID
CHAIN_VALID -->|Yes| SET_BEST
CHAIN_VALID -->|No| DECREMENT
SET_BEST --> BREAK
BREAK --> RETURN_BEST
DECREMENT --> LOOP
Recovery Mechanism
The DataStore::recover_valid_chain() function validates chain integrity when opening a file. It scans backward from the file end to find the deepest valid chain that reaches offset 0, automatically recovering from incomplete writes.
Recovery Algorithm
Recovery Process Steps
- Initial Check: If file is smaller than METADATA_SIZE (20 bytes), return offset 0
- Backward Scan: Start from file_len and scan backward by 1 byte at a time
- Metadata Read: At each position, attempt to read metadata at cursor - METADATA_SIZE
- Entry Validation:
  - Extract prev_offset from metadata
  - Calculate expected entry start using DataStore::prepad_len(prev_offset)
  - Handle tombstone special case (single null byte without pre-pad)
  - Verify entry_start < metadata_offset
- Chain Walk: For a valid entry, walk the entire chain backward:
  - Follow prev_offset links
  - Validate each link points to an earlier offset
  - Track total chain size
  - Stop when prev_offset = 0 (chain start)
- Validation: Chain is valid if:
  - All links point backward (no cycles)
  - Chain reaches offset 0
  - Total chain size ≤ file length
- Result: Return the first valid chain found (deepest chain)
Tombstone Handling in Recovery
Tombstones have special handling during recovery because they lack pre-padding:
if entry_end > prev_tail
&& entry_end - prev_tail == 1
&& mmap[prev_tail..entry_end] == NULL_BYTE
{
entry_start = prev_tail // No pre-pad for tombstone
} else {
entry_start = prev_tail + prepad_len(prev_tail)
}
This logic appears at:
- src/storage_engine/data_store.rs:404-416: Recovery path
- src/storage_engine/data_store.rs:447-454: Chain walking
Sources: src/storage_engine/data_store.rs:363-482 README.md:139-148
Recovery on File Open
The DataStore::open() function performs recovery automatically.
If recovery detects corruption (incomplete chain), the file is truncated to the last valid offset and reopened. This ensures the storage is always in a consistent state.
Sources: src/storage_engine/data_store.rs:84-117 README.md:150-156
Entry Size Calculation
Each entry’s total file size includes pre-padding, payload, and metadata. The calculation depends on entry type:
Non-Tombstone Entry:
total_size = prepad_len(prev_tail) + payload.len() + METADATA_SIZE
Tombstone Entry:
total_size = 1 + METADATA_SIZE // No pre-pad
The EntryHandle::file_size() method computes this from the entry’s range and metadata. For iteration and compaction, this allows precise tracking of storage usage.
Sources: README.md:112-132 src/storage_engine/data_store.rs:705-749
File Growth and Tail Tracking
The tail_offset field tracks the current end of valid data.
The atomic store uses Ordering::Release to ensure visibility across threads. Writers acquire tail_offset with Ordering::Acquire before computing pre-padding.
Sources: src/storage_engine/data_store.rs:256 src/storage_engine/data_store.rs:763 src/storage_engine/data_store.rs:858
DataStore API
Relevant source files
- README.md
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
This page documents the public API of the DataStore struct and its associated traits DataStoreReader and DataStoreWriter. These interfaces provide the primary methods for interacting with the storage engine, including write, read, delete, batch operations, and streaming methods.
Scope : This page covers the application-level API methods available to users of the storage engine. For details on the underlying storage format, see Entry Structure and Metadata. For implementation details of concurrency mechanisms, see Concurrency and Thread Safety. For key hashing internals, see Key Indexing and Hashing.
API Architecture
The DataStore API is organized around a core DataStore struct with two trait-based interfaces:
Sources : src/storage_engine/data_store.rs:26-33 src/storage_engine/traits.rs src/storage_engine.rs:21
graph TB
subgraph "Public API"
DS["DataStore"]
DSR["DataStoreReader trait"]
DSW["DataStoreWriter trait"]
end
subgraph "Core Operations"
WRITE["Write Operations\nwrite()\nbatch_write()\nwrite_stream()"]
READ["Read Operations\nread()\nbatch_read()\nread_last_entry()"]
DELETE["Delete Operations\ndelete()\nbatch_delete()"]
MANAGE["Management Operations\nrename()\ncopy()\ntransfer()"]
ITER["Iteration\niter_entries()\npar_iter_entries()"]
end
subgraph "Internal Components"
FILE["Arc<RwLock<BufWriter<File>>>"]
MMAP["Arc<Mutex<Arc<Mmap>>>"]
INDEXER["Arc<RwLock<KeyIndexer>>"]
TAIL["AtomicU64 tail_offset"]
end
DS --> DSR
DS --> DSW
DSR --> READ
DSR --> ITER
DSW --> WRITE
DSW --> DELETE
DSW --> MANAGE
DS --> FILE
DS --> MMAP
DS --> INDEXER
DS --> TAIL
WRITE --> FILE
WRITE --> INDEXER
WRITE --> TAIL
READ --> MMAP
READ --> INDEXER
DELETE --> FILE
DELETE --> INDEXER
DataStore Struct
The DataStore struct is the primary interface for interacting with the storage engine. It encapsulates file I/O, memory mapping, key indexing, and concurrency control.
Core Fields
| Field | Type | Purpose |
|---|---|---|
| file | Arc<RwLock<BufWriter<File>>> | Buffered file writer protected by read-write lock for synchronized writes |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped file reference wrapped in mutex to prevent unsafe remapping |
| tail_offset | AtomicU64 | Current end-of-file offset, atomically updated for lock-free reads |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash-based index mapping key hashes to file offsets |
| path | PathBuf | File system path to the storage file |
Sources : src/storage_engine/data_store.rs:26-33
Creation Methods
Sources : src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144
Opening Storage
| Method | Signature | Behavior |
|---|---|---|
| open() | pub fn open(path: &Path) -> Result<Self> | Opens existing storage or creates new file if not present |
| open_existing() | pub fn open_existing(path: &Path) -> Result<Self> | Opens only existing files, returns error if file does not exist |
| from() | impl From<PathBuf> for DataStore | Convenience constructor, panics on failure |
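A brief sketch of the difference between the two open paths (the std::io::Error error type and the exact failure mode for a missing file are assumptions):

```rust
use simd_r_drive::DataStore;
use std::path::Path;

fn open_examples() -> std::io::Result<()> {
    // open(): creates the file if it does not exist yet.
    let store = DataStore::open(Path::new("data.bin"))?;
    drop(store);

    // open_existing(): refuses to create a new file when the path is missing.
    match DataStore::open_existing(Path::new("missing.bin")) {
        Ok(_) => println!("opened existing store"),
        Err(e) => eprintln!("did not create a new file: {e}"),
    }
    Ok(())
}
```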
Sources : src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144 src/storage_engine/data_store.rs:53-64
Write Operations
Write operations are defined by the DataStoreWriter trait and implemented for DataStore. All write methods return Result<u64> where the u64 is the new tail offset after writing.
Sources : src/storage_engine/data_store.rs:752-939
graph TB
subgraph "Write API Methods"
W1["write(key, payload)"]
W2["batch_write(entries)"]
W3["write_stream(key, reader)"]
W4["write_with_key_hash(hash, payload)"]
W5["batch_write_with_key_hashes(entries)"]
W6["write_stream_with_key_hash(hash, reader)"]
end
subgraph "Internal Write Path"
LOCK["Acquire RwLock<File>"]
HASH["compute_hash()
or\ncompute_hash_batch()"]
ALIGN["Calculate prepad_len()"]
BUFFER["Buffer construction"]
SIMD["simd_copy()
for payload"]
META["EntryMetadata serialization"]
FLUSH["file.flush()"]
REINDEX["reindex()"]
end
W1 --> HASH
W2 --> HASH
W3 --> HASH
HASH --> W4
HASH --> W5
HASH --> W6
W4 --> LOCK
W5 --> LOCK
W6 --> LOCK
LOCK --> ALIGN
ALIGN --> BUFFER
BUFFER --> SIMD
SIMD --> META
META --> FLUSH
FLUSH --> REINDEX
REINDEX --> MMAP_UPDATE["Update mmap Arc"]
REINDEX --> INDEX_UPDATE["Update KeyIndexer"]
REINDEX --> TAIL_UPDATE["Update AtomicU64"]
Single Entry Write
Method : write(key: &[u8], payload: &[u8]) -> Result<u64>
Writes a single key-value pair atomically. The write is immediately flushed to disk.
Implementation details :
- Computes XXH3 hash of key using compute_hash()
- Delegates to write_with_key_hash()
- Internally uses batch_write_with_key_hashes() with single entry
- Calculates 64-byte alignment padding via prepad_len()
- Uses simd_copy() for payload transfer
- Appends EntryMetadata (20 bytes)
- Calls reindex() to update mmap and index
Sources : src/storage_engine/data_store.rs:827-834
Batch Write
Method : batch_write(entries: &[(&[u8], &[u8])]) -> Result<u64>
Writes multiple key-value pairs in a single locked operation. Reduces disk I/O overhead by buffering all entries and flushing once at the end.
Process :
- Computes hashes for all keys via compute_hash_batch()
- Acquires write lock once for entire batch
- Builds buffer with aligned entries
- Writes buffer to file with single write_all()
- Updates index with all new mappings atomically
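A minimal sketch of a batch write using the tuple-slice signature above (imports and error type are assumptions):

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // trait import assumed

fn write_batch(store: &DataStore) -> std::io::Result<u64> {
    let entries: Vec<(&[u8], &[u8])> = vec![
        (b"sensor/1".as_slice(), b"\x01\x02\x03".as_slice()),
        (b"sensor/2".as_slice(), b"\x04\x05\x06".as_slice()),
    ];
    // One lock acquisition and one flush cover every entry in the batch.
    store.batch_write(&entries)
}
```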
Sources : src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939
Streaming Write
Method : write_stream<R: Read>(key: &[u8], reader: &mut R) -> Result<u64>
Writes data from a Read source without requiring full in-memory allocation. Suitable for large payloads that exceed available memory.
Characteristics :
- Uses fixed 8KB buffer (WRITE_STREAM_BUFFER_SIZE)
- Reads chunks incrementally from source
- Computes CRC32C checksum while streaming
- Validates that payload is non-empty and not null-only
- Immediately flushes after completion
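For example, streaming a payload from any Read implementation might look like the following sketch; the in-memory Cursor stands in for a real source such as a file or socket:

```rust
use std::io::Cursor;
use simd_r_drive::{DataStore, DataStoreWriter}; // trait import assumed

fn stream_large_value(store: &DataStore) -> std::io::Result<u64> {
    // Any std::io::Read source works; data is consumed in fixed-size chunks,
    // so the full payload never has to fit in memory at once.
    let mut reader = Cursor::new(vec![0u8; 32 * 1024 * 1024]);
    store.write_stream(b"blob/large-object", &mut reader)
}
```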
Sources : src/storage_engine/data_store.rs:753-825
Pre-hashed Write Methods
For performance optimization when keys are reused, the API provides methods accepting pre-computed hashes:
| Method | Description |
|---|---|
| write_with_key_hash() | Single write with pre-computed hash |
| batch_write_with_key_hashes() | Batch write with pre-computed hashes |
| write_stream_with_key_hash() | Streaming write with pre-computed hash |
These skip the hashing step and proceed directly to storage operations.
Sources : src/storage_engine/data_store.rs:832-834 src/storage_engine/data_store.rs:847-939 src/storage_engine/data_store.rs:758-825
graph TB
subgraph "Read API Methods"
R1["read(key)"]
R2["batch_read(keys)"]
R3["read_last_entry()"]
R4["read_with_key_hash(hash)"]
R5["batch_read_hashed_keys(hashes)"]
R6["read_metadata(key)"]
R7["exists(key)"]
end
subgraph "Internal Read Path"
HASH_KEY["compute_hash()
or\ncompute_hash_batch()"]
INDEXER_READ["key_indexer.read()"]
MMAP_CLONE["get_mmap_arc()"]
UNPACK["KeyIndexer::unpack(packed)"]
TAG_CHECK["Verify 16-bit tag"]
BOUNDS["Bounds checking"]
PREPAD["Derive entry_start from\nprev_offset + prepad_len()"]
HANDLE["Construct EntryHandle"]
end
R1 --> HASH_KEY
R2 --> HASH_KEY
HASH_KEY --> R4
HASH_KEY --> R5
R4 --> INDEXER_READ
R5 --> INDEXER_READ
R7 --> R1
INDEXER_READ --> MMAP_CLONE
MMAP_CLONE --> UNPACK
UNPACK --> TAG_CHECK
TAG_CHECK --> BOUNDS
BOUNDS --> PREPAD
PREPAD --> HANDLE
R3 --> MMAP_CLONE
R6 --> R1
Read Operations
Read operations are defined by the DataStoreReader trait. All reads are zero-copy when possible, returning EntryHandle references to memory-mapped regions.
Sources : src/storage_engine/data_store.rs:1027-1182
Single Entry Read
Method : read(key: &[u8]) -> Result<Option<EntryHandle>>
Retrieves a single entry by key. Returns None if key does not exist or is deleted (tombstone).
Implementation :
- Computes key hash via compute_hash()
- Acquires read lock on key_indexer
- Looks up packed (tag, offset) value
- Verifies 16-bit tag to detect hash collisions
- Derives entry boundaries from prev_offset and prepad_len()
- Returns EntryHandle with zero-copy access to payload
Sources : src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:501-565
Batch Read
Method : batch_read(keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>>
Reads multiple entries in a single index lock acquisition. More efficient than individual reads when processing multiple keys.
Process :
- Computes all key hashes via compute_hash_batch()
- Acquires a single read lock on the indexer
- Performs a lookup for each hash
- Verifies tags for collision detection
- Returns a vector preserving input order
Sources : src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158
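A sketch of a batch read; results come back in the same order as the input keys. Import paths are assumed as before.
```rust
use simd_r_drive::{DataStore, DataStoreReader}; // assumed paths

fn read_many(store: &DataStore) -> std::io::Result<()> {
    let keys: Vec<&[u8]> = vec![b"alpha".as_slice(), b"beta".as_slice(), b"gamma".as_slice()];
    let results = store.batch_read(&keys)?; // one index lock for all lookups

    for (key, entry) in keys.iter().zip(results) {
        match entry {
            Some(handle) => println!("{:?}: {} bytes", key, handle.as_slice().len()),
            None => println!("{:?}: not found", key),
        }
    }
    Ok(())
}
```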
Read Last Entry
Method : read_last_entry() -> Result<Option<EntryHandle>>
Retrieves the most recently written entry without requiring a key lookup. Uses tail_offset to locate the last metadata block.
Use case : Useful for sequential processing or determining the latest state.
Sources : src/storage_engine/data_store.rs:1061-1103
Metadata Read
Method : read_metadata(key: &[u8]) -> Result<Option<EntryMetadata>>
Retrieves only the metadata (key hash, previous offset, checksum) without accessing the payload. More efficient when only metadata is needed.
Sources : src/storage_engine/data_store.rs:1160-1162
Existence Check
Method : exists(key: &[u8]) -> Result<bool>
Checks if a key exists without retrieving the full entry. Lightweight operation that only performs index lookup and tag verification.
Sources : src/storage_engine/data_store.rs:1030-1032
Read Operations Summary
| Method | Returns | Lock Duration | Use Case |
|---|---|---|---|
| read() | Option<EntryHandle> | Single index read lock | Standard single-key retrieval |
| batch_read() | Vec<Option<EntryHandle>> | Single index read lock | Multiple keys, order preserved |
| read_last_entry() | Option<EntryHandle> | No index lock required | Sequential or state check |
| read_metadata() | Option<EntryMetadata> | Single index read lock | Metadata only, no payload |
| exists() | bool | Single index read lock | Fast existence check |
Sources : src/storage_engine/data_store.rs:1027-1182
graph LR
DELETE_API["delete(key)\nbatch_delete(keys)"] --> HASH["compute_hash_batch()"]
HASH --> CHECK_EXISTS["Filter existing keys\nvia key_indexer.read()"]
CHECK_EXISTS --> TOMBSTONE["Create (hash, NULL_BYTE)\npairs"]
TOMBSTONE --> BATCH_WRITE["batch_write_with_key_hashes()\nwith allow_null_bytes=true"]
BATCH_WRITE --> UPDATE_INDEX["reindex() removes\nkeys from index"]
Delete Operations
Delete operations write tombstone entries (single null byte + metadata) to mark keys as deleted. The append-only model means deletions do not reclaim space until compaction.
Sources : src/storage_engine/data_store.rs:986-1024
Single Delete
Method : delete(key: &[u8]) -> Result<u64>
Deletes a single key by writing a tombstone entry. Internally delegates to batch_delete() with a single key.
Sources : src/storage_engine/data_store.rs:986-988
Batch Delete
Method : batch_delete(keys: &[&[u8]]) -> Result<u64>
Deletes multiple keys in a single operation. Optimized to skip keys that don’t exist, avoiding unnecessary tombstone writes.
Process :
- Hashes all keys via compute_hash_batch()
- Filters to only the keys present in the index
- Constructs tombstone entries (NULL_BYTE + metadata)
- Calls batch_write_with_key_hashes() with allow_null_bytes=true
- Updates the index to remove the deleted keys
Sources : src/storage_engine/data_store.rs:990-1024
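A sketch combining the delete calls with an existence check; imports are assumed as in the earlier sketches.
```rust
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter}; // assumed paths

fn remove_keys(store: &DataStore) -> std::io::Result<()> {
    store.delete(b"alpha")?; // writes a single tombstone

    // One lock acquisition; keys that are already absent are skipped.
    store.batch_delete(&[b"beta".as_slice(), b"gamma".as_slice()])?;

    assert!(!store.exists(b"alpha")?);
    Ok(())
}
```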
Pre-hashed Delete
Method : batch_delete_key_hashes(prehashed_keys: &[u64]) -> Result<u64>
Deletes keys using pre-computed hashes. Useful when hashes are already available from previous operations.
Sources : src/storage_engine/data_store.rs:995-1024
Entry Management Operations
These operations combine read, write, and delete to provide higher-level functionality for managing entries across storage instances.
Rename
Method : rename(old_key: &[u8], new_key: &[u8]) -> Result<u64>
Renames a key by:
- Reading the entry at old_key
- Creating an EntryStream from it
- Writing to new_key via write_stream()
- Deleting old_key
Constraint : old_key must exist and must differ from new_key.
Sources : src/storage_engine/data_store.rs:941-958
Copy
Method : copy(key: &[u8], target: &DataStore) -> Result<u64>
Copies an entry from the current storage to a different DataStore instance. The source entry remains unchanged.
Process :
- Reads entry from source
- Extracts payload and metadata
- Writes to the target using write_stream_with_key_hash()
- Preserves the original key hash
Constraint : Source and target must be different storage files.
Sources : src/storage_engine/data_store.rs:960-979 src/storage_engine/data_store.rs:587-590
Transfer
Method : transfer(key: &[u8], target: &DataStore) -> Result<u64>
Moves an entry from the current storage to a different instance by copying then deleting from source.
Equivalent to : copy() followed by delete()
Sources : src/storage_engine/data_store.rs:981-984
Entry Management Summary
| Operation | Source Modified | Target Modified | Use Case |
|---|---|---|---|
| rename() | Yes (old deleted, new added) | N/A | Same storage, different key |
| copy() | No | Yes (entry added) | Cross-storage duplication |
| transfer() | Yes (entry deleted) | Yes (entry added) | Cross-storage migration |
Sources : src/storage_engine/data_store.rs:941-984
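A sketch of the three entry-management calls against a second store; the open signature and import paths are assumptions carried over from the earlier sketches.
```rust
use std::path::PathBuf;

use simd_r_drive::{DataStore, DataStoreWriter}; // assumed paths

fn shuffle_entries(primary: &DataStore) -> std::io::Result<()> {
    let archive = DataStore::open(&PathBuf::from("archive_store.bin"))?;

    primary.rename(b"old-name", b"new-name")?;   // same store, new key
    primary.copy(b"new-name", &archive)?;        // source entry left intact
    primary.transfer(b"new-name", &archive)?;    // copy, then delete from source
    Ok(())
}
```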
graph TB
subgraph "Iteration Methods"
ITER_OWNED["into_iter()\n(consumes DataStore)"]
ITER_REF["iter_entries()\n(borrows DataStore)"]
PAR_ITER["par_iter_entries()\n(parallel, requires 'parallel' feature)"]
end
subgraph "EntryIterator Implementation"
CURSOR["cursor: u64 = tail_offset"]
SEEN_KEYS["seen_keys: HashSet<u64>"]
NEXT["next() method"]
METADATA["Read EntryMetadata"]
PREPAD_CALC["Derive entry_start from\nprev_offset + prepad_len()"]
SKIP_DUPE["Skip if key_hash in seen_keys"]
SKIP_TOMB["Skip if entry is NULL_BYTE"]
EMIT["Emit EntryHandle"]
end
subgraph "Parallel Iterator"
COLLECT["Collect key_indexer offsets"]
PAR_MAP["Rayon par_iter()"]
FILTER_MAP["filter_map constructs\nEntryHandle per thread"]
end
ITER_OWNED --> ITER_REF
ITER_REF --> CURSOR
ITER_REF --> SEEN_KEYS
CURSOR --> NEXT
NEXT --> METADATA
METADATA --> PREPAD_CALC
PREPAD_CALC --> SKIP_DUPE
SKIP_DUPE --> SKIP_TOMB
SKIP_TOMB --> EMIT
PAR_ITER --> COLLECT
COLLECT --> PAR_MAP
PAR_MAP --> FILTER_MAP
Iteration and Traversal
The DataStore provides multiple methods for iterating over all valid entries in the storage.
Sources : src/storage_engine/data_store.rs:269-361 src/storage_engine/entry_iterator.rs:8-127
Sequential Iteration
Method : iter_entries() -> EntryIterator
Returns an iterator that traverses all valid entries sequentially. The iterator:
- Starts at tail_offset and walks backward via the prev_offset chain
- Tracks seen key hashes so that only the latest version of each key is returned
- Filters out tombstone entries automatically
- Returns EntryHandle objects with zero-copy access
Sources : src/storage_engine/data_store.rs:276-280 src/storage_engine/entry_iterator.rs:41-47
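A sketch of sequential iteration; only the latest, non-tombstone version of each key is yielded, and the crate path is assumed.
```rust
use simd_r_drive::DataStore; // assumed path

fn total_payload_bytes(store: &DataStore) -> usize {
    store
        .iter_entries()
        .map(|entry| entry.as_slice().len()) // zero-copy view per entry
        .sum()
}
```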
Consuming Iteration
Trait : impl IntoIterator for DataStore
Allows consuming a DataStore instance to produce an iterator; internally this delegates to iter_entries().
Sources : src/storage_engine/data_store.rs:44-50
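A sketch of consuming iteration; the store is moved into the loop and dropped when iteration finishes. The crate path is assumed.
```rust
use simd_r_drive::DataStore; // assumed path

fn drain(store: DataStore) {
    for entry in store {
        // The DataStore has been consumed; each EntryHandle still holds an
        // Arc<Mmap>, so payload access remains valid inside the loop.
        let _payload = entry.as_slice();
    }
}
```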
Parallel Iteration
Method : par_iter_entries() -> impl ParallelIterator<Item = EntryHandle>
Feature gate : Requires parallel feature flag.
Provides Rayon-powered parallel iteration for high-throughput processing on multi-core systems.
Implementation strategy :
- Briefly acquires a read lock on key_indexer
- Collects all packed offset values into a Vec<u64>
- Releases the lock immediately
- Creates a parallel iterator over the collected offsets
- Constructs EntryHandle objects in parallel threads
Performance : Ideal for bulk operations like analytics, caching, or transformation pipelines.
Sources : src/storage_engine/data_store.rs:296-361
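A sketch using the parallel iterator behind the parallel feature flag; Rayon's prelude is assumed to be needed for the combinators, and the crate path is illustrative.
```rust
#[cfg(feature = "parallel")]
fn parallel_byte_count(store: &simd_r_drive::DataStore) -> usize {
    use rayon::prelude::*;

    store
        .par_iter_entries()
        .map(|entry| entry.as_slice().len()) // each handle is built on a worker thread
        .sum()
}
```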
Iteration Comparison
| Method | Ownership | Concurrency | Lock Hold Time | Use Case |
|---|---|---|---|---|
| iter_entries() | Borrows | Sequential | Lock per next() call | General-purpose scanning |
| into_iter() | Consumes | Sequential | Lock per next() call | One-time full traversal |
| par_iter_entries() | Borrows | Parallel | Brief upfront lock | High-throughput processing |
Sources : src/storage_engine/data_store.rs:44-50 src/storage_engine/data_store.rs:276-280 src/storage_engine/data_store.rs:296-361
Utility and Maintenance Methods
File Information
| Method | Returns | Description |
|---|---|---|
| len() | Result<usize> | Number of unique keys in storage (excludes deleted) |
| is_empty() | Result<bool> | Returns true if no keys exist |
| file_size() | Result<u64> | Total size of the storage file in bytes |
| get_path() | PathBuf | Path to the storage file |
Sources : src/storage_engine/data_store.rs:1164-1181 src/storage_engine/data_store.rs:265-267
Compaction
Method : compact(&mut self) -> Result<()>
Reclaims disk space by creating a new storage file containing only the latest version of each key. Tombstone entries are excluded.
Process :
- Creates a temporary backup file with a .bk extension
- Iterates through iter_entries() (which yields only the latest versions)
- Copies each entry via copy_handle()
- Swaps the temporary file with the original via std::fs::rename()
Thread safety warning : Should only be called when no other threads are accessing the storage. The &mut self requirement prevents concurrent mutations but does not prevent reads if the instance is wrapped in Arc<DataStore>.
Sources : src/storage_engine/data_store.rs:706-749
Compaction Estimation
Method : estimate_compaction_savings() -> u64
Calculates potential space savings from compaction without performing the operation. Returns the difference between total file size and the size needed for unique entries only.
Sources : src/storage_engine/data_store.rs:605-616
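A sketch of estimating and then running compaction; compact takes &mut self, so the caller needs exclusive access. The crate path is assumed.
```rust
use simd_r_drive::DataStore; // assumed path

fn compact_if_worthwhile(store: &mut DataStore) -> std::io::Result<()> {
    // The estimate is computed without rewriting any data.
    if store.estimate_compaction_savings() > 0 {
        store.compact()?; // rewrites only the latest version of each key
    }
    Ok(())
}
```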
Internal Support Methods
These methods support the public API but are not directly exposed to users.
Reindexing
Method : reindex()
Called after every write operation to:
- Re-map the file via init_mmap() to include the new data
- Update key_indexer with the new key-to-offset mappings
- Update tail_offset atomically
Acquires locks on both mmap and key_indexer to ensure consistency.
Sources : src/storage_engine/data_store.rs:224-259
Entry Context Reading
Method : read_entry_with_context()
Internal helper centralizing read logic for both read() and batch_read(). Parameters include the key hash, mmap reference, and indexer guard. Performs:
- Index lookup
- Tag verification (if original key provided)
- Bounds checking
- Tombstone detection
- EntryHandle construction
Sources : src/storage_engine/data_store.rs:501-565
Recovery Chain Validation
Method : recover_valid_chain()
Called during open() to validate storage file integrity. Walks backward through the file following prev_offset chains until reaching offset 0. Truncates file if incomplete write detected.
Sources : src/storage_engine/data_store.rs:383-482
Alignment Calculation
Method : prepad_len(offset: u64) -> usize
Computes padding bytes required to align offset to PAYLOAD_ALIGNMENT (64 bytes). Uses bitwise operations for efficiency:
pad = (A - (offset % A)) & (A - 1)
Sources : src/storage_engine/data_store.rs:669-673
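A standalone re-implementation of the documented formula, useful for sanity-checking offsets; this mirrors the described calculation rather than the crate's actual function.
```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Padding needed so the next payload starts on a 64-byte boundary.
/// Mirrors the documented formula: pad = (A - (offset % A)) & (A - 1).
fn prepad_len(offset: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(prepad_len(0), 0);    // already aligned
    assert_eq!(prepad_len(64), 0);   // already aligned
    assert_eq!(prepad_len(65), 63);  // worst case: one byte past a boundary
    assert_eq!(prepad_len(100), 28); // 100 + 28 = 128
}
```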
API Patterns and Conventions
Return Value Pattern
Most write operations return Result<u64> where the u64 is the new tail_offset after the operation. This allows chaining operations or validating expected file growth.
Error Handling
The API uses std::io::Result<T> consistently. Common error cases:
- InvalidInput: empty payloads, null-byte-only payloads, invalid rename
- NotFound: key does not exist (for operations requiring existing keys)
- Lock poisoning errors (converted to std::io::Error)
Pre-hashed Key Methods
Many operations offer both standard and pre-hashed variants:
- Standard: write(key, payload) computes the hash internally
- Pre-hashed: write_with_key_hash(hash, payload) uses the provided hash
Pre-hashed methods enable optimization when keys are reused across multiple operations.
Batch Operations Benefit
Batch methods acquire locks once for the entire batch, significantly reducing overhead:
- batch_write() vs. multiple write() calls
- batch_read() vs. multiple read() calls
- batch_delete() vs. multiple delete() calls
Sources : src/storage_engine/data_store.rs:752-1182
Trait Implementations
DataStoreReader Trait
Defines read-only operations. Associated type EntryHandleType allows flexibility in handle implementation.
Implementors : DataStore
Key methods : read(), batch_read(), read_last_entry(), read_metadata(), exists(), len(), is_empty(), file_size()
Sources : src/storage_engine/traits.rs
DataStoreWriter Trait
Defines mutating operations. All methods take &self (not &mut self) because internal synchronization via RwLock enables safe concurrent access.
Implementors : DataStore
Key methods : write(), batch_write(), write_stream(), delete(), batch_delete(), rename(), copy(), transfer()
Sources : src/storage_engine/traits.rs
From Trait
Convenience constructor that panics on failure.
Sources : src/storage_engine/data_store.rs:53-64
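The exact source type of the conversion is not shown on this page; the sketch below assumes a filesystem path and is purely illustrative.
```rust
use std::path::PathBuf;

use simd_r_drive::DataStore; // assumed path

fn open_or_panic() -> DataStore {
    // Panics instead of returning an error if the file cannot be opened.
    DataStore::from(PathBuf::from("example_store.bin"))
}
```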
IntoIterator Trait
Allows consuming iteration over storage entries. Returns EntryIterator as the iterator type.
Sources : src/storage_engine/data_store.rs:44-50
Entry Structure and Metadata
Relevant source files
- README.md
- simd-r-drive-entry-handle/Cargo.toml
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/entry_metadata.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/utils/align_or_copy.rs
Purpose and Scope
This document details the on-disk binary layout of entries in the SIMD R Drive storage engine. It covers the structure of aligned entries, tombstones, metadata fields, and the alignment strategy that enables zero-copy access.
For information about how entries are read and accessed in memory, see Memory Management and Zero-Copy Access. For details on the validation chain and recovery mechanisms, see Storage Architecture.
On-Disk Entry Layout Overview
Every entry written to the storage file consists of three components:
- Pre-Pad Bytes (optional, 0-63 bytes) - Zero bytes inserted to ensure the payload starts at a 64-byte boundary
- Payload - Variable-length binary data
- Metadata - Fixed 20-byte structure containing key hash, previous offset, and checksum
The exception is tombstones (deletion markers), which use a minimal 1-byte payload with no pre-padding.
Sources: README.md:104-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37
Aligned Entry Structure
Entry Layout Table
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| P .. P+pad | Pre-Pad (optional) | pad | Zero bytes to align payload start |
| P+pad .. N | Payload | N-(P+pad) | Variable-length data |
| N .. N+8 | Key Hash | 8 | 64-bit XXH3 key hash |
| N+8 .. N+16 | Prev Offset | 8 | Absolute offset of previous tail |
| N+16 .. N+20 | Checksum | 4 | CRC32C of payload |
Where:
- pad = (A - (prev_tail % A)) & (A - 1), with A = PAYLOAD_ALIGNMENT (64 bytes)
- The next entry starts at offset N + 20
Aligned Entry Structure Diagram
Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:11-23
Tombstone Structure
Tombstones are special deletion markers that do not require payload alignment. They consist of a single zero byte followed by the standard 20-byte metadata structure.
Tombstone Layout Table
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| T .. T+1 | Payload | 1 | Single byte 0x00 |
| T+1 .. T+21 | Metadata | 20 | Key hash, prev offset, CRC32C |
Tombstone Structure Diagram
Sources: README.md:126-131 simd-r-drive-entry-handle/src/entry_metadata.rs:25-30
EntryMetadata Structure
The EntryMetadata struct represents the fixed 20-byte metadata block that follows every payload. It is defined in #[repr(C)] layout to ensure consistent binary representation.
graph TB
subgraph EntryMetadataStruct["EntryMetadata struct"]
field1["key_hash: u64\n8 bytes\nXXH3_64 hash"]
field2["prev_offset: u64\n8 bytes\nbackward chain link"]
field3["checksum: [u8; 4]\n4 bytes\nCRC32C payload checksum"]
end
field1 --> field2
field2 --> field3
note4["Serialized at offset N\nfollowing payload"]
note5["Total: METADATA_SIZE = 20"]
field1 -.-> note4
field3 -.-> note5
Metadata Fields
Field Descriptions
key_hash: u64 (8 bytes, offset N .. N+8)
- 64-bit XXH3 hash of the key
- Used by KeyIndexer for O(1) lookups
- Combined with a tag for collision detection
- Hardware-accelerated via SSE2/AVX2/NEON
prev_offset: u64 (8 bytes, offset N+8 .. N+16)
- Absolute file offset of the previous entry for this key
- Forms a backward-linked chain for version history
- Set to 0 for the first entry of a key
- Used during chain validation and recovery
checksum: [u8; 4] (4 bytes, offset N+16 .. N+20)
- CRC32C checksum of the payload
- Provides fast integrity verification
- Not cryptographically secure
- Used during recovery to detect corruption
Serialization and Deserialization
The EntryMetadata struct provides methods for converting to/from bytes:
- serialize() -> [u8; 20] - Converts metadata to a byte array using little-endian encoding
- deserialize(data: &[u8]) -> Self - Reconstructs metadata from a byte slice
Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:44-113 README.md:114-120
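A self-contained sketch that mirrors the documented layout and little-endian encoding; the field ranges follow the constants listed later on this page, and this is not the crate's actual implementation.
```rust
#[repr(C)]
#[derive(Debug, PartialEq)]
struct EntryMetadata {
    key_hash: u64,     // bytes 0..8
    prev_offset: u64,  // bytes 8..16
    checksum: [u8; 4], // bytes 16..20
}

impl EntryMetadata {
    fn serialize(&self) -> [u8; 20] {
        let mut buf = [0u8; 20];
        buf[0..8].copy_from_slice(&self.key_hash.to_le_bytes());
        buf[8..16].copy_from_slice(&self.prev_offset.to_le_bytes());
        buf[16..20].copy_from_slice(&self.checksum);
        buf
    }

    fn deserialize(data: &[u8]) -> Self {
        Self {
            key_hash: u64::from_le_bytes(data[0..8].try_into().unwrap()),
            prev_offset: u64::from_le_bytes(data[8..16].try_into().unwrap()),
            checksum: data[16..20].try_into().unwrap(),
        }
    }
}

fn main() {
    let meta = EntryMetadata { key_hash: 0xDEAD_BEEF, prev_offset: 4096, checksum: [1, 2, 3, 4] };
    assert_eq!(EntryMetadata::deserialize(&meta.serialize()), meta); // round-trip check
}
```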
Pre-Padding and Alignment Strategy
Alignment Purpose
All non-tombstone payloads start at a 64-byte aligned address. This alignment ensures:
- Cache-line efficiency - Matches typical CPU cache line size
- SIMD optimization - Enables full-speed AVX2/AVX-512/NEON operations
- Zero-copy typed views - Allows safe reinterpretation as typed slices (&[u16], &[u32], etc.)
graph TD
Start["Calculate padding needed"]
GetPrevTail["prev_tail = last written offset"]
CalcPad["pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT))\n& (PAYLOAD_ALIGNMENT - 1)"]
CheckPad{"pad > 0?"}
WritePad["Write pad zero bytes"]
WritePayload["Write payload at aligned offset"]
Start --> GetPrevTail
GetPrevTail --> CalcPad
CalcPad --> CheckPad
CheckPad -->|Yes| WritePad
CheckPad -->|No| WritePayload
WritePad --> WritePayload
The alignment is configured via PAYLOAD_ALIGNMENT constant (64 bytes as of version 0.15.0).
Pre-Padding Calculation
The formula pad = (A - (prev_tail % A)) & (A - 1) where A = PAYLOAD_ALIGNMENT ensures:
- If prev_tail is already aligned, pad = 0
- Otherwise, pad equals the number of bytes needed to reach the next aligned boundary
- The maximum padding is A - 1 bytes (63 bytes for 64-byte alignment)
Constants
The alignment is defined in simd-r-drive-entry-handle/src/constants.rs:1-20:
| Constant | Value | Description |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
| PAYLOAD_ALIGNMENT | 64 | Actual alignment boundary in bytes |
| METADATA_SIZE | 20 | Fixed size of metadata block |
Sources: README.md:51-59 simd-r-drive-entry-handle/src/entry_metadata.rs:22-23 CHANGELOG.md:25-51
Backward Chain Formation
Chain Structure
Each entry’s prev_offset field creates a backward-linked chain that tracks the version history for a given key. This chain is essential for:
- Recovery and validation on file open
- Detecting incomplete writes
- Rebuilding the index
Chain Properties
- Most recent entry is at the end of the file (highest offset)
- Chain traversal moves backward from tail toward offset 0
- The first entry for a key has prev_offset = 0
- A valid chain can be walked all the way back to byte 0 without gaps
- Broken chain indicates corruption or incomplete write
Usage in Recovery
During file open, the system:
- Scans backward from EOF reading metadata
- Follows prev_offset links to validate chain continuity
- Verifies checksums at each step
- Truncates file if corruption is detected
- Scans forward to rebuild the index
Sources: README.md:139-147 simd-r-drive-entry-handle/src/entry_metadata.rs:41-43
Entry Type Comparison
Aligned Entry vs. Tombstone
| Aspect | Aligned Entry (Non-Tombstone) | Tombstone (Deletion Marker) |
|---|---|---|
| Pre-padding | 0-63 bytes (alignment dependent) | None |
| Payload size | Variable (user-defined) | Fixed 1 byte (0x00) |
| Payload alignment | 64-byte boundary | No alignment requirement |
| Metadata size | 20 bytes | 20 bytes |
| Total minimum size | 21 bytes (1-byte payload + metadata) | 21 bytes (1-byte + metadata) |
| Total maximum overhead | 83 bytes (63-byte pad + 20 metadata) | 21 bytes |
| Zero-copy capable | Yes (aligned payload) | No (tombstone flag only) |
When Tombstones Are Used
Tombstones mark key deletions while maintaining chain integrity. They:
- Preserve the backward chain via prev_offset
- Use minimal space (no alignment overhead)
- Are detected during reads and filtered out
- Enable recovery to skip deleted entries
Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37
Metadata Serialization Format
Binary Layout in File
Constants for Range Indexing
The simd-r-drive-entry-handle/src/constants.rs:1-20 file defines range constants for metadata field access:
- KEY_HASH_RANGE = 0..8
- PREV_OFFSET_RANGE = 8..16
- CHECKSUM_RANGE = 16..20
- METADATA_SIZE = 20
These ranges are used in EntryMetadata::serialize() and deserialize() methods.
Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:62-112
Alignment Evolution and Migration
Version History
v0.14.0-alpha and earlier: Used 16-byte alignment (PAYLOAD_ALIGNMENT = 16)
v0.15.0-alpha onwards: Changed to 64-byte alignment (PAYLOAD_ALIGNMENT = 64)
This change was made to:
- Ensure full cache-line alignment
- Support AVX-512 and future SIMD extensions
- Improve zero-copy performance across modern hardware
Migration Considerations
Storage files created with different alignment values are not compatible :
- v0.14.x readers cannot correctly parse v0.15.x stores
- v0.15.x readers may misinterpret v0.14.x padding
To migrate between versions:
- Read all entries using the old version binary
- Write entries to a new store using the new version binary
- Replace the old file after verification
In multi-service environments, deploy reader upgrades before writer upgrades to avoid mixed-version issues.
Sources: CHANGELOG.md:25-82 README.md:51-59
Debug Assertions for Alignment
Runtime Validation
The codebase includes debug-only alignment assertions that validate both pointer and offset alignment:
debug_assert_aligned(ptr: *const u8, align: usize) - Validates pointer alignment
- Active in debug and test builds
- Zero cost in release/bench builds
- Ensures buffer base address is properly aligned
debug_assert_aligned_offset(off: u64) - Validates file offset alignment
- Checks that the derived payload start offset is at a PAYLOAD_ALIGNMENT boundary
- Used during entry handle creation
- Defined in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
These assertions help catch alignment issues during development without imposing runtime overhead in production.
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88 CHANGELOG.md:33-41
Summary
The SIMD R Drive entry structure uses a carefully designed binary layout that balances efficiency, integrity, and flexibility:
- Fixed 64-byte alignment ensures cache-friendly, SIMD-optimized access
- 20-byte fixed metadata provides fast integrity checks and chain traversal
- Variable pre-padding maintains alignment without complex calculations
- Minimal tombstones mark deletions efficiently
- Backward-linked chain enables robust recovery and validation
This design enables zero-copy reads, high write throughput, and automatic crash recovery while maintaining a simple, append-only storage model.
Memory Management and Zero-Copy Access
Relevant source files
- README.md
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
Purpose and Scope
This document describes the memory management strategy used by SIMD R Drive’s core storage engine, focusing on memory-mapped file access and zero-copy read patterns. It covers the memmap2 crate integration, the Arc<Mmap> shared reference architecture, and how EntryHandle provides zero-copy views into stored data.
For details on entry structure and metadata organization, see Entry Structure and Metadata. For concurrency mechanisms that protect memory-mapped access, see Concurrency and Thread Safety.
Memory-Mapped File Architecture
Core mmap Integration
The storage engine uses the memmap2 crate to memory-map the entire storage file, allowing direct access to file contents without explicit read system calls. The memory-mapped region is managed through a layered reference-counting structure:
Arc<Mutex<Arc<Mmap>>>
│
├─ Outer Arc: Shared across DataStore clones
├─ Mutex: Serializes remapping operations
└─ Inner Arc<Mmap>: Shared across readers
Sources: src/storage_engine/data_store.rs:1-30
DataStore mmap Field Structure
The DataStore struct maintains the memory map using nested Arc wrappers:
| Layer | Type | Purpose |
|---|---|---|
| Outer | Arc<Mutex<...>> | Allows shared ownership of the mutex across DataStore instances |
| Mutex | Mutex<...> | Serializes remapping operations during writes |
| Inner | Arc<Mmap> | Enables zero-cost cloning for concurrent readers |
| Core | Mmap | The actual memory-mapped file region from memmap2 |
This structure enables:
- Multiple readers to hold Arc<Mmap> references simultaneously
- Safe remapping after writes without invalidating existing reader references
- Lock-free reads once an Arc<Mmap> is obtained
Sources: src/storage_engine/data_store.rs:26-33 README.md:174-183
Memory Map Initialization and Remapping
graph TB
Open["DataStore::open()"]
OpenFile["open_file_in_append_mode()"]
InitMmap["init_mmap()"]
UnsafeMap["unsafe memmap2::MmapOptions::new().map()"]
ArcWrap["Arc::new(mmap)"]
Open --> OpenFile
OpenFile --> InitMmap
InitMmap --> UnsafeMap
UnsafeMap --> ArcWrap
OpenFile -.returns.-> File
UnsafeMap -.returns.-> Mmap
ArcWrap -.stored in.-> DataStore
Initial Mapping
When a DataStore is opened, the storage file is memory-mapped using unsafe code that delegates to the OS:
Diagram: Initial memory map creation flow
The init_mmap function wraps the unsafe memmap2::MmapOptions::new().map() call, which asks the OS to map the file into the process address space. The resulting Mmap is immediately wrapped in an Arc for shared access.
Sources: src/storage_engine/data_store.rs:172-174 src/storage_engine/data_store.rs:84-117
sequenceDiagram
participant Writer as "Write Operation"
participant File as "RwLock<BufWriter<File>>"
participant Reindex as "reindex()"
participant MmapMutex as "Mutex<Arc<Mmap>>"
participant Indexer as "RwLock<KeyIndexer>"
Writer->>File: Acquire write lock
Writer->>File: Append data + metadata
Writer->>File: flush()
Writer->>Reindex: reindex(&write_guard, offsets, tail)
Reindex->>File: init_mmap(&write_guard)
Note over Reindex,File: Create new Mmap from flushed file
Reindex->>MmapMutex: lock()
Reindex->>MmapMutex: *guard = Arc::new(new_mmap)
Note over MmapMutex: Old Arc<Mmap> still valid for readers
Reindex->>Indexer: write().insert(key_hash, offset)
Reindex->>Indexer: Release lock
Reindex->>MmapMutex: Release lock
Note over Writer: New reads see updated mmap
Remapping After Writes
After write operations extend the file, the memory map must be refreshed to make new data visible. The reindex method handles this critical operation:
Diagram: Memory map remapping sequence during writes
The reindex method performs three synchronized updates:
- Creates a new Mmap from the extended file
- Atomically replaces the Arc<Mmap> in the mutex
- Updates the key indexer with the new offsets
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:176-186
Zero-Copy Read Patterns
graph LR
subgraph "DataStore"
MmapContainer["Mutex<Arc<Mmap>>"]
end
subgraph "EntryHandle"
MmapRef["Arc<Mmap>"]
Range["range: Range<usize>"]
Metadata["metadata: EntryMetadata"]
end
subgraph "User Code"
Slice["&[u8] payload slice"]
end
MmapContainer -->|get_mmap_arc| MmapRef
MmapRef -->|&mmap[range]| Slice
Range -.defines region.-> Slice
Note1["Zero-copy: slice points\ndirectly into mmap"]
Slice -.-> Note1
EntryHandle Architecture
EntryHandle is the primary abstraction for zero-copy reads. It holds an Arc<Mmap> reference and a byte range, providing direct slice access without copying:
Diagram: EntryHandle zero-copy architecture
When EntryHandle::as_slice() is called, it returns &self.mmap_arc[self.range.clone()], which is a direct reference into the memory-mapped region. No data is copied; the slice is a view into the OS page cache.
Sources: [simd-r-drive-entry-handle crate](https://github.com/jzombie/rust-simd-r-drive/blob/0299fd5d/simd-r-drive-entry-handle crate) src/storage_engine/data_store.rs:560-565
graph TB
Read["read(key)"]
ComputeHash["compute_hash(key)"]
GetMmap["get_mmap_arc()"]
LockIndex["key_indexer.read()"]
ReadContext["read_entry_with_context()"]
IndexLookup["key_indexer.get_packed(key_hash)"]
Unpack["KeyIndexer::unpack(packed)"]
CreateHandle["EntryHandle { mmap_arc, range, metadata }"]
AsSlice["entry.as_slice()"]
DirectRef["&mmap[range]"]
Read --> ComputeHash
Read --> GetMmap
Read --> LockIndex
ComputeHash --> ReadContext
GetMmap --> ReadContext
LockIndex --> ReadContext
ReadContext --> IndexLookup
IndexLookup --> Unpack
Unpack --> CreateHandle
CreateHandle --> AsSlice
AsSlice --> DirectRef
DirectRef -.zero-copy.-> OSPageCache["OS Page Cache"]
Read Operation Flow
The zero-copy read flow demonstrates how data moves from disk to user code without intermediate buffers:
Diagram: Zero-copy read operation flow from key lookup to slice access
Key points:
- get_mmap_arc() obtains an Arc<Mmap> clone (a cheap atomic increment)
- Index lookup finds the file offset
- EntryHandle is constructed with the Arc<Mmap> and the byte range
- as_slice() returns a reference directly into the mapped memory
Sources: src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:502-565 src/storage_engine/data_store.rs:658-663
Shared Access with Arc
Thread-Safe Reference Counting
The Arc<Mmap> enables multiple threads to hold references to the same memory-mapped region simultaneously. Each clone increments an atomic reference count:
| Operation | Cost | Thread Safety |
|---|---|---|
| Arc::clone() | Single atomic increment | Lock-free |
| Holding Arc<Mmap> | No synchronization needed | Fully safe |
| Dropping Arc<Mmap> | Single atomic decrement | Lock-free |
| Last reference drops | Mmap unmapped by OS | Safe |
When a writer remaps the file, it replaces the Arc<Mmap> inside the mutex. Old Arc<Mmap> references remain valid until all readers drop them, at which point the OS automatically unmaps the old region.
Sources: src/storage_engine/data_store.rs:658-663 README.md:174-183
Clone Semantics in Iteration
EntryIterator demonstrates efficient Arc<Mmap> usage. The iterator holds one Arc<Mmap> and clones it for each EntryHandle it yields:
Diagram: Arc cloning pattern in EntryIterator
graph TB
IterNew["EntryIterator::new(mmap_arc, tail)"]
IterField["EntryIterator { mmap: Arc<Mmap>, ... }"]
Next["next() called"]
CreateHandle["EntryHandle { mmap_arc: Arc::clone(&self.mmap), ... }"]
UserCode["User processes EntryHandle"]
Drop["EntryHandle dropped"]
IterNew --> IterField
IterField --> Next
Next --> CreateHandle
CreateHandle -.cheap clone.-> UserCode
UserCode --> Drop
Drop -.atomic decrement.-> RefCount["Reference count"]
Note["Iterator holds 1 Arc\nEach EntryHandle clones it\nAll point to same Mmap"]
IterField -.-> Note
This design allows the iterator and all yielded handles to coexist safely. The cloning overhead is minimal—just an atomic operation—while providing complete memory safety.
Sources: src/storage_engine/entry_iterator.rs:21-47 src/storage_engine/entry_iterator.rs:121-125
Memory Management Flow
graph TB
subgraph "Initialization"
OpenFile["open_file_in_append_mode()"]
InitMmap1["init_mmap(&file)"]
Recovery["recover_valid_chain()"]
ReinitMmap["Remap if truncation needed"]
BuildIndex["KeyIndexer::build()"]
StoreMmap["Store Arc<Mutex<Arc<Mmap>>>"]
end
subgraph "Read Path"
GetArc["get_mmap_arc()"]
ReadLock["key_indexer.read()"]
Lookup["Index lookup"]
ConstructHandle["EntryHandle { Arc::clone(mmap_arc), range, ... }"]
AsSlice["as_slice() → &mmap[range]"]
end
subgraph "Write Path"
WriteLock["file.write()"]
AppendData["Append payload + metadata"]
Flush["flush()"]
Reindex["reindex()"]
NewMmap["init_mmap() → new Mmap"]
SwapMmap["Mutex: *guard = Arc::new(new_mmap)"]
UpdateIndex["KeyIndexer: insert offsets"]
end
subgraph "Iterator Path"
IterCreate["iter_entries()"]
CloneMmap["get_mmap_arc()"]
IterNew["EntryIterator::new(mmap_arc, tail)"]
IterNext["next() → EntryHandle"]
end
OpenFile --> InitMmap1
InitMmap1 --> Recovery
Recovery --> ReinitMmap
ReinitMmap --> BuildIndex
BuildIndex --> StoreMmap
StoreMmap -.available for.-> GetArc
GetArc --> ReadLock
ReadLock --> Lookup
Lookup --> ConstructHandle
ConstructHandle --> AsSlice
StoreMmap -.available for.-> WriteLock
WriteLock --> AppendData
AppendData --> Flush
Flush --> Reindex
Reindex --> NewMmap
NewMmap --> SwapMmap
SwapMmap --> UpdateIndex
StoreMmap -.available for.-> IterCreate
IterCreate --> CloneMmap
CloneMmap --> IterNew
IterNew --> IterNext
Complete Lifecycle
The following diagram maps the complete lifecycle of memory-mapped access, from initial file open through reads and writes to iterator cleanup:
Diagram: Complete memory management lifecycle
Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:276-280
Code Entity Mapping
The following table maps high-level concepts to specific code entities:
| Concept | Code Entity | Location |
|---|---|---|
| Memory-mapped file | memmap2::Mmap | src/storage_engine/data_store.rs9 |
| Shared mmap reference | Arc<Mmap> | Throughout codebase |
| Mmap container | Arc<Mutex<Arc<Mmap>>> | src/storage_engine/data_store.rs29 |
| Mmap initialization | init_mmap(file: &BufWriter<File>) | src/storage_engine/data_store.rs:172-174 |
| Mmap retrieval | get_mmap_arc(&self) | src/storage_engine/data_store.rs:658-663 |
| Remapping operation | reindex(&self, write_guard, offsets, tail, deleted) | src/storage_engine/data_store.rs:224-259 |
| Zero-copy handle | simd_r_drive_entry_handle::EntryHandle | Separate crate |
| Iterator with mmap | EntryIterator { mmap: Arc<Mmap>, ... } | src/storage_engine/entry_iterator.rs:21-25 |
| Raw mmap pointer (testing) | arc_ptr(&self) → *const u8 | src/storage_engine/data_store.rs:653-655 |
Sources: src/storage_engine/data_store.rs:1-33 src/storage_engine/entry_iterator.rs:21-25
Safety Considerations
OS Page Cache Integration
The memory-mapped approach delegates memory management to the OS page cache:
Diagram: OS page cache interaction with memory-mapped region
Key benefits:
- Pages loaded on-demand (lazy loading)
- OS handles eviction when memory is tight
- Multiple processes can share the same page cache entries
- No explicit memory allocation in application code
Sources: README.md:43-50 README.md174
Large File Handling
The system is designed to handle datasets larger than available RAM. The memory mapping does not load the entire file into RAM:
| File Size | RAM Usage | Behavior |
|---|---|---|
| < Available RAM | Entire file may be cached | Fast access, no swapping |
| > Available RAM | Only accessed pages cached | OS loads pages on-demand |
| >> Available RAM | LRU page eviction active | Older pages evicted as needed |
When iterating or reading, only the accessed byte ranges are loaded into physical memory. The OS automatically evicts least-recently-used pages under memory pressure.
Sources: README.md:45-50
Unsafe Code Boundaries
Memory mapping inherently requires unsafe code:
DataStore::init_mmap()
└─> unsafe { memmap2::MmapOptions::new().map(file) }
The memmap2 crate provides safe abstractions over this unsafe operation, ensuring:
- The file descriptor remains valid while mapped
- The mapped region respects file size boundaries
- Concurrent modifications to the file (outside the mmap) are handled correctly
SIMD R Drive’s architecture ensures safety by:
- Never resizing the file while an mmap exists
- Remapping after writes extend the file
- Using Arc<Mmap> to prevent use-after-unmap bugs
Sources: src/storage_engine/data_store.rs:172-174 src/lib.rs:123-124
Thread Safety Guarantees
The nested Arc<Mutex<Arc<Mmap>>> structure provides these guarantees:
| Operation | Synchronization | Safety Property |
|---|---|---|
| Reading from Arc<Mmap> | None (lock-free) | Safe: immutable data |
| Cloning Arc<Mmap> | Atomic refcount | Safe: no data race |
| Remapping | Mutex held | Safe: serialized with other remaps |
| Old mmap still referenced | Independent Arc | Safe: won’t be unmapped |
| Concurrent reads + remap | Separate Arc instances | Safe: readers use old or new mmap |
The key insight is that remapping creates a new Arc<Mmap> without invalidating existing references. Readers holding old Arc<Mmap> instances continue accessing the old mapping until they drop their references.
Sources: src/storage_engine/data_store.rs:26-33 README.md:174-183 README.md:196-206
Memory Pressure and Resource Management
Automatic Resource Cleanup
When memory pressure increases, the OS automatically evicts pages from the page cache. However, the Mmap object itself is small—it only holds file descriptor information and address space pointers. The actual memory is managed by the kernel.
Arc<Mmap> ensures that:
- The file is not unmapped while any thread holds a reference
- When the last Arc is dropped, the Mmap destructor unmaps the region
- The OS then reclaims the virtual address space
Sources: src/storage_engine/data_store.rs:658-663
Testing Hooks
For validation and testing, the system exposes mmap internals in debug builds:
| Method | Purpose | Availability |
|---|---|---|
| get_mmap_arc_for_testing() | Returns Arc<Mmap> for inspection | #[cfg(any(test, debug_assertions))] |
| arc_ptr() | Returns raw *const u8 pointer | #[cfg(any(test, debug_assertions))] |
These methods allow tests to verify zero-copy behavior by comparing pointer addresses and validating that slices point directly into the mapped region.
Sources: src/storage_engine/data_store.rs:631-656
Concurrency and Thread Safety
Relevant source files
- README.md
- benches/storage_benchmark.rs
- src/main.rs
- src/storage_engine/data_store.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
Purpose and Scope
This document describes the concurrency model and thread safety guarantees of the SIMD R Drive storage engine. It covers the synchronization primitives used to enable safe multi-threaded access within a single process, including lock strategies for reads and writes, atomic operations, and memory map management.
For information about the core storage architecture and data structures, see Storage Architecture. For details on memory-mapped file usage, see Memory Management and Zero-Copy Access.
Key Limitation : The concurrency mechanisms described here apply only to single-process, multi-threaded environments. Multiple processes accessing the same storage file simultaneously are not supported and require external file locking mechanisms.
Concurrency Model Overview
The DataStore structure uses a combination of read-write locks, atomic operations, and mutexes to enable safe concurrent access across multiple threads while maintaining data consistency.
Diagram: DataStore Synchronization Architecture
graph TB
subgraph "DataStore Synchronization Primitives"
FILE["Arc<RwLock<BufWriter<File>>>\nfile"]
MMAP["Arc<Mutex<Arc<Mmap>>>\nmmap"]
TAIL["AtomicU64\ntail_offset"]
INDEX["Arc<RwLock<KeyIndexer>>\nkey_indexer"]
end
subgraph "Write Operations"
W_STREAM["write_stream"]
W_SINGLE["write"]
W_BATCH["batch_write"]
end
subgraph "Read Operations"
R_SINGLE["read"]
R_BATCH["batch_read"]
R_ITER["iter_entries"]
end
W_STREAM --> FILE
W_SINGLE --> FILE
W_BATCH --> FILE
W_STREAM --> TAIL
W_SINGLE --> TAIL
W_BATCH --> TAIL
W_STREAM -.updates.-> MMAP
W_SINGLE -.updates.-> MMAP
W_BATCH -.updates.-> MMAP
W_STREAM -.updates.-> INDEX
W_SINGLE -.updates.-> INDEX
W_BATCH -.updates.-> INDEX
R_SINGLE --> INDEX
R_BATCH --> INDEX
R_ITER --> MMAP
R_SINGLE --> MMAP
R_BATCH --> MMAP
Sources: src/storage_engine/data_store.rs:27-33 README.md:172-183
Synchronization Primitives
DataStore Field Overview
The DataStore struct contains four primary fields that implement concurrency control:
| Field | Type | Purpose | Lock Type |
|---|---|---|---|
| file | Arc<RwLock<BufWriter<File>>> | File handle for writes | Read-write lock |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped view | Exclusive mutex |
| tail_offset | AtomicU64 | Current file end position | Lock-free atomic |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash index for lookups | Read-write lock |
Sources: src/storage_engine/data_store.rs:27-33
RwLock for File Writes
All write operations acquire an exclusive write lock on the file handle to prevent concurrent modifications.
Diagram: Write Lock Serialization
sequenceDiagram
participant T1 as "Thread 1"
participant T2 as "Thread 2"
participant FILE as "RwLock<File>"
T1->>FILE: write.lock() - acquire
Note over T1,FILE: Thread 1 holds write lock
T2->>FILE: write.lock() - blocks
Note over T2: Thread 2 waits
T1->>FILE: write data + flush
T1->>FILE: release lock
Note over FILE: Lock released
T2->>FILE: acquire lock
Note over T2,FILE: Thread 2 now writes
T2->>FILE: write data + flush
T2->>FILE: release lock
Write Lock Acquisition
Write operations acquire the exclusive file lock at the start of the write process.
This pattern appears in:
- write_stream_with_key_hash: src/storage_engine/data_store.rs:759-762
- batch_write_with_key_hashes: src/storage_engine/data_store.rs:852-855
The write lock ensures that only one thread can append data to the file at any given time, preventing:
- Race conditions on file position
- Interleaved writes corrupting the append-only chain
- Inconsistent metadata ordering
Sources: src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-945 README.md176
AtomicU64 for Tail Offset
The tail_offset field tracks the current end of the valid data in the storage file using atomic operations, enabling lock-free reads of the current file position.
Atomic Operations Used
| Operation | Method | Purpose |
|---|---|---|
| Load | load(Ordering::Acquire) | Read current tail position |
| Store | store(offset, Ordering::Release) | Update tail after write |
Load Operation
Reads use Acquire ordering to ensure they see all previous writes.
Examples:
- iter_entries: src/storage_engine/data_store.rs:278
- Write operations reading the previous tail: src/storage_engine/data_store.rs:763, src/storage_engine/data_store.rs:858
Store Operation
Writes use Release ordering to ensure all previous writes are visible.
Location: src/storage_engine/data_store.rs256
This atomic coordination ensures that:
- Readers always see a consistent tail offset
- Writers update the tail only after data is flushed
- No locks are needed for reading the tail position
Sources: src/storage_engine/data_store.rs30 src/storage_engine/data_store.rs256 src/storage_engine/data_store.rs278 README.md182
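A minimal illustration of the documented ordering pattern, independent of the actual DataStore internals.
```rust
use std::sync::atomic::{AtomicU64, Ordering};

fn main() {
    let tail_offset = AtomicU64::new(0);

    // Reader side: an Acquire load observes everything published by the
    // Release store that wrote the value it reads.
    let current_tail = tail_offset.load(Ordering::Acquire);

    // Writer side: the Release store is issued only after the appended
    // data has been flushed, publishing the new end of the file.
    tail_offset.store(current_tail + 128, Ordering::Release);
}
```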
graph LR
subgraph "Memory Map Management"
MUTEX["Mutex<Arc<Mmap>>"]
MMAP1["Arc<Mmap> v1"]
MMAP2["Arc<Mmap> v2"]
end
subgraph "Readers"
R1["Reader Thread 1"]
R2["Reader Thread 2"]
end
subgraph "Writer"
W["Writer Thread"]
end
R1 -.clones.-> MMAP1
R2 -.clones.-> MMAP1
W -->|1. Lock mutex| MUTEX
W -->|2. Create new| MMAP2
W -->|3. Replace| MUTEX
W -->|4. Release| MUTEX
MMAP1 -.remains valid.-> R1
MMAP1 -.remains valid.-> R2
Mutex for Memory Map
The memory-mapped file reference is protected by a Mutex<Arc<Mmap>> to prevent concurrent remapping during reads.
Diagram: Memory Map Arc Cloning Pattern
Accessing the Memory Map
Read operations clone the Arc<Mmap> to obtain a stable reference.
Source: src/storage_engine/data_store.rs:658-663
This pattern ensures:
- Readers hold a reference to a specific memory map version
- Writers can create a new memory map without invalidating existing readers
- The Arc reference counting prevents premature deallocation
- The mutex is held only briefly during the clone operation
Remapping After Writes
After writing and flushing data, the reindex function creates a new memory map.
Source: src/storage_engine/data_store.rs:231-255
Sources: src/storage_engine/data_store.rs29 src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:658-663 README.md180
RwLock for Key Index
The KeyIndexer is protected by a read-write lock, allowing multiple concurrent readers but exclusive writers.
Read Access Pattern
Multiple threads can acquire read locks simultaneously for lookups.
Example: src/storage_engine/data_store.rs:509
Write Access Pattern
Index updates require exclusive write access.
Source: src/storage_engine/data_store.rs:233-253
Parallel Iterator Lock Strategy
The parallel iterator minimizes lock holding time by collecting offsets first.
Source: src/storage_engine/data_store.rs:300-302
Sources: src/storage_engine/data_store.rs31 src/storage_engine/data_store.rs:233-253 src/storage_engine/data_store.rs:300-302 README.md178
sequenceDiagram
participant R1 as "Reader 1"
participant R2 as "Reader 2"
participant R3 as "Reader 3"
participant INDEX as "RwLock<KeyIndexer>"
participant MMAP as "Arc<Mmap>"
par Concurrent Reads
R1->>INDEX: read().lock() - acquire
R2->>INDEX: read().lock() - acquire
R3->>INDEX: read().lock() - acquire
end
par Index Lookups
R1->>INDEX: get_packed(key_hash_1)
R2->>INDEX: get_packed(key_hash_2)
R3->>INDEX: get_packed(key_hash_3)
end
Note over R1,R3: All readers release index lock
par Zero-Copy Access
R1->>MMAP: Access offset_1
R2->>MMAP: Access offset_2
R3->>MMAP: Access offset_3
end
Note over R1,MMAP: No locks during data access
Lock-Free Read Operations
Read operations achieve lock-free access through memory-mapped files and atomic operations.
Diagram: Concurrent Lock-Free Read Pattern
Zero-Copy Read Implementation
Once the offset is obtained from the index, data access is lock-free.
Source: src/storage_engine/data_store.rs:502-565
Benefits of Lock-Free Reads
- No Read Contention : Multiple readers access different memory regions simultaneously
- Zero-Copy : Data is accessed directly from the memory map without copying
- Scalability : Read throughput scales linearly with CPU cores
- Low Latency : No lock acquisition overhead after index lookup
Sources: README.md174 src/storage_engine/data_store.rs:502-565 tests/concurrency_tests.rs:163-229
graph TB
subgraph "Write Operation Phases"
ACQUIRE["1. Acquire RwLock<File>"]
LOAD["2. Load tail_offset\n(Atomic)"]
CALC["3. Calculate pre-padding"]
WRITE["4. Write payload + metadata"]
FLUSH["5. Flush to disk"]
REMAP["6. Remap file (Mutex)"]
UPDATE["7. Update index (RwLock)"]
STORE["8. Store new tail_offset\n(Atomic)"]
RELEASE["9. Release file lock"]
end
ACQUIRE --> LOAD
LOAD --> CALC
CALC --> WRITE
WRITE --> FLUSH
FLUSH --> REMAP
REMAP --> UPDATE
UPDATE --> STORE
STORE --> RELEASE
Write Synchronization
Write operations are fully serialized through the file lock, ensuring consistency.
Diagram: Write Operation Synchronization Flow
Single Write Flow
Source: src/storage_engine/data_store.rs:758-825
Batch Write Optimization
Batch writes hold the lock once for multiple entries.
Source: src/storage_engine/data_store.rs:847-945
Sources: src/storage_engine/data_store.rs:752-825 src/storage_engine/data_store.rs:847-945 README.md176
Thread Safety Guarantees
Thread Safety Matrix
The following table summarizes thread safety guarantees for different environments:
| Environment | Reads | Writes | Index Updates | Storage Safety |
|---|---|---|---|---|
| Single Process, Single Thread | ✅ Safe | ✅ Safe | ✅ Safe | ✅ Safe |
| Single Process, Multi-Threaded | ✅ Safe (lock-free, zero-copy) | ✅ Safe (RwLock<File>) | ✅ Safe (RwLock<KeyIndexer>) | ✅ Safe (Mutex<Arc<Mmap>>) |
| Multiple Processes, Shared File | ⚠️ Unsafe (no cross-process coordination) | ❌ Unsafe (no external locking) | ❌ Unsafe (separate memory spaces) | ❌ Unsafe (risk of race conditions) |
Source: README.md:196-200
graph TB
subgraph "Thread Safety Properties"
P1["Atomic Tail Updates\nNo torn reads/writes"]
P2["Serialized File Writes\nNo interleaved data"]
P3["Consistent Index View\nRwLock guarantees"]
P4["Valid Memory Maps\nArc prevents premature free"]
P5["Backward Chain Integrity\nSequential offsets"]
end
P1 --> SAFE["Thread-Safe\nMulti-Reader/Single-Writer"]
P2 --> SAFE
P3 --> SAFE
P4 --> SAFE
P5 --> SAFE
Safe Concurrency Properties
The design ensures the following properties in single-process, multi-threaded environments:
Diagram: Thread Safety Property Dependencies
- Atomicity : All operations on shared state are atomic or properly locked
- Visibility : Changes made by one thread are visible to others through Release/Acquire semantics
- Ordering : The append-only design ensures writes happen in a strict sequence
- Isolation : Readers see a consistent snapshot via Arc<Mmap> cloning
Sources: README.md:172-206 src/storage_engine/data_store.rs:27-33
Single-Process vs Multi-Process
Single-Process Multi-Threaded (Supported)
All synchronization primitives work correctly within a single process:
Diagram: Single-Process Shared State
Example from concurrency tests:
Source: tests/concurrency_tests.rs:117-137
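A hedged re-creation of the shared-store pattern the tests exercise, not the test source itself. It assumes DataStore is Send + Sync as the thread-safety matrix implies, and the import paths and open signature are illustrative.
```rust
use std::path::PathBuf;
use std::sync::Arc;
use std::thread;

use simd_r_drive::{DataStore, DataStoreWriter}; // assumed paths

fn main() -> std::io::Result<()> {
    let store = Arc::new(DataStore::open(&PathBuf::from("concurrent_store.bin"))?);

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // Writes from different threads are serialized by the file RwLock.
                let key = format!("thread-{i}");
                store.write(key.as_bytes(), b"payload").expect("write failed");
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    Ok(())
}
```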
graph TB
subgraph "Process 1"
P1_DS["DataStore"]
P1_INDEX["KeyIndexer\n(separate)"]
P1_MMAP["Mmap\n(separate)"]
end
subgraph "Process 2"
P2_DS["DataStore"]
P2_INDEX["KeyIndexer\n(separate)"]
P2_MMAP["Mmap\n(separate)"]
end
FILE["storage.bin\n(shared file)"]
P1_DS --> P1_INDEX
P1_DS --> P1_MMAP
P2_DS --> P2_INDEX
P2_DS --> P2_MMAP
P1_MMAP -.unsafe.-> FILE
P2_MMAP -.unsafe.-> FILE
Multi-Process (Not Supported)
Multiple processes have separate address spaces and cannot share the in-memory synchronization primitives:
Diagram: Multi-Process Unsafe Access
Why Multi-Process is Unsafe
- Separate Index State : Each process has its own KeyIndexer in memory
- Independent Mmap Views : Memory maps are not synchronized across processes
- No Lock Coordination : RwLock and Mutex are process-local, not system-wide
- Race Conditions : Concurrent writes can corrupt the file structure
Recommendation : Use external file locking (e.g., flock, advisory locks) if multi-process access is required.
Sources: README.md:186-206 README.md:189-191
Testing Concurrency
The test suite validates concurrent access patterns to ensure thread safety guarantees.
Concurrent Write Test
Tests multiple threads writing simultaneously:
Source: tests/concurrency_tests.rs:111-161
Interleaved Read-Write Test
Tests read-after-write consistency with coordinated threads:
Source: tests/concurrency_tests.rs:163-229
Concurrent Streamed Write Test
Tests slow, streaming writes that hold the lock for extended periods:
Source: tests/concurrency_tests.rs:14-109
Sources: tests/concurrency_tests.rs:1-230
Summary
The SIMD R Drive concurrency model provides thread-safe access through a carefully coordinated set of synchronization primitives:
- RwLock<BufWriter<File>> : Serializes file writes
- AtomicU64 : Provides lock-free tail offset tracking
- Mutex<Arc<Mmap>> : Protects memory map updates without blocking existing readers
- RwLock<KeyIndexer> : Enables highly concurrent index reads with exclusive write access
This design achieves:
- Zero-copy concurrent reads via memory mapping
- Serialized writes preventing data corruption
- Linear read scalability across CPU cores
- Consistent snapshots through atomic operations
However, these guarantees apply only within a single process. Multi-process access requires external coordination mechanisms.
Sources: README.md:170-206 src/storage_engine/data_store.rs:26-33 tests/concurrency_tests.rs:1-230
Key Indexing and Hashing
Relevant source files
- README.md
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/init.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
- src/storage_engine/data_store.rs
- src/storage_engine/key_indexer.rs
Purpose and Scope
This page documents the key indexing system and hashing mechanisms used in SIMD R Drive’s storage engine. It covers the KeyIndexer data structure, the XXH3 hashing algorithm, tag-based collision detection, and hardware acceleration features.
For information about how the index is accessed in concurrent operations, see Concurrency and Thread Safety. For details on how metadata is stored alongside payloads, see Entry Structure and Metadata.
Overview
The SIMD R Drive storage engine maintains an in-memory index that maps key hashes to file offsets, enabling O(1) lookup performance for stored entries. This index is critical for avoiding full file scans when retrieving data.
The indexing system consists of three main components:
KeyIndexer: A concurrent hash map that stores packed values containing both a collision-detection tag and a file offset- XXH3_64 hashing : A fast, hardware-accelerated hashing algorithm that generates 64-bit hashes from arbitrary keys
- Tag-based verification : A secondary collision detection mechanism that validates lookups to prevent hash collision errors
Sources: src/storage_engine/data_store.rs:1-33 README.md:158-168
KeyIndexer Structure
The KeyIndexer struct is defined in src/storage_engine/key_indexer.rs:56-59 and manages the in-memory hash index. It wraps a HashMap<u64, u64, Xxh3BuildHasher> where keys are XXH3 key hashes and values are packed 64-bit integers containing both a collision-detection tag and file offset.
graph TB
subgraph DataStore["DataStore struct"]
KeyIndexerField["key_indexer: Arc<RwLock<KeyIndexer>>"]
end
subgraph KeyIndexer["KeyIndexer struct"]
IndexField["index: HashMap<u64, u64, Xxh3BuildHasher>"]
MapKey["Key: u64\n(key_hash from compute_hash)"]
MapValue["Value: u64\n(packed tag / offset)"]
end
subgraph Constants["Key Constants"]
TagBits["TAG_BITS = 16"]
OffsetMask["OFFSET_MASK = (1 << 48) - 1"]
end
subgraph Methods["Public Methods"]
TagFromHash["tag_from_hash(key_hash) -> u16"]
TagFromKey["tag_from_key(key) -> u16"]
Pack["pack(tag, offset) -> u64"]
Unpack["unpack(packed) -> (u16, u64)"]
Build["build(mmap, tail_offset) -> Self"]
Insert["insert(key_hash, offset) -> Result"]
GetPacked["get_packed(&key_hash) -> Option<&u64>"]
GetOffset["get_offset(&key_hash) -> Option<u64>"]
Remove["remove(&key_hash) -> Option<u64>"]
end
KeyIndexerField --> IndexField
IndexField --> MapKey
IndexField --> MapValue
MapValue --> TagBits
MapValue --> OffsetMask
Pack -.uses.-> TagBits
Pack -.uses.-> OffsetMask
Unpack -.uses.-> TagBits
Unpack -.uses.-> OffsetMask
Packed Value Format
The KeyIndexer stores a compact 64-bit packed value for each hash. This value is constructed by the pack function src/storage_engine/key_indexer.rs:79-85 and decoded by unpack src/storage_engine/key_indexer.rs:88-93:
| Bits | Field | Description | Constant Used |
|---|---|---|---|
| 63-48 | Tag (16-bit) | Collision detection tag from upper hash bits | TAG_BITS = 16 |
| 47-0 | Offset (48-bit) | Absolute file offset to entry metadata | OFFSET_MASK = 0xFFFFFFFFFFFF |
The packing formula is: packed = (tag << (64 - TAG_BITS)) | offset
The unpacking extracts: tag = (packed >> (64 - TAG_BITS)) as u16 and offset = packed & OFFSET_MASK
Maximum Addressable File Size : The 48-bit offset field supports files up to 256 TiB (2^48 bytes). Attempting to use larger offsets will panic in debug builds due to debug_assert! checks src/storage_engine/key_indexer.rs:80-83
Sources: src/storage_engine/key_indexer.rs:9-15 src/storage_engine/key_indexer.rs:56-59 src/storage_engine/key_indexer.rs:79-93 src/storage_engine/data_store.rs:31
Hashing Algorithm: XXH3_64
SIMD R Drive uses the XXH3_64 hashing algorithm from the xxhash-rust crate src/storage_engine/digest.rs XXH3 is optimized for speed and provides automatic hardware acceleration through SIMD instructions.
graph TB
subgraph DigestModule["storage_engine::digest module"]
ComputeHash["compute_hash(key: &[u8]) -> u64"]
ComputeHashBatch["compute_hash_batch(keys: &[&[u8]]) -> Vec<u64>"]
ComputeChecksum["compute_checksum(payload: &[u8]) -> [u8; 4]"]
Xxh3BuildHasher["Xxh3BuildHasher struct"]
end
subgraph Callers["Usage in DataStore"]
WriteMethod["write(key, payload)"]
BatchWrite["batch_write(entries)"]
KeyIndexerHashMap["HashMap<u64, u64, Xxh3BuildHasher>"]
end
WriteMethod -->|calls| ComputeHash
BatchWrite -->|calls| ComputeHashBatch
WriteMethod -->|calls| ComputeChecksum
KeyIndexerHashMap -->|uses hasher| Xxh3BuildHasher
ComputeHash -->|produces| KeyHash["key_hash: u64"]
ComputeHashBatch -->|produces| KeyHashes["Vec<u64>"]
Hash Function API
The digest module exports the following functions used throughout the codebase:
| Function | Signature | Implementation | Use Case |
|---|---|---|---|
compute_hash | fn(key: &[u8]) -> u64 | Wraps xxhash_rust::xxh3::xxh3_64 | Single key hashing |
compute_hash_batch | fn(keys: &[&[u8]]) -> Vec<u64> | Parallel iterator over keys | Batch write operations |
compute_checksum | fn(payload: &[u8]) -> [u8; 4] | CRC32C from crc32fast | Payload integrity checks |
Xxh3BuildHasher | struct implementing BuildHasher | Custom hasher for HashMap | KeyIndexer HashMap hasher |
The compute_hash_batch function leverages Rayon for parallel hashing when processing multiple keys simultaneously src/storage_engine/data_store.rs:839-842
graph LR
KeyInput["Input: &[u8]"]
subgraph XXH3Crate["xxhash-rust crate"]
CPUDetect["Runtime CPU\nFeature Detection"]
subgraph x86Implementation["x86_64 Implementation"]
SSE2Path["SSE2 Path\n(always available)"]
AVX2Path["AVX2 Path\n(if cpuid detects)"]
end
subgraph ARMImplementation["aarch64 Implementation"]
NEONPath["NEON Path\n(default on ARM)"]
end
subgraph FallbackImplementation["Fallback"]
ScalarPath["Scalar Operations"]
end
end
Output["Output: u64 hash"]
KeyInput --> CPUDetect
CPUDetect --> SSE2Path
CPUDetect --> AVX2Path
CPUDetect --> NEONPath
CPUDetect --> ScalarPath
SSE2Path --> Output
AVX2Path --> Output
NEONPath --> Output
ScalarPath --> Output
Hardware Acceleration
The XXH3_64 implementation automatically detects and utilizes CPU-specific SIMD instructions at runtime:
| Platform | Default SIMD | Optional Features | Detection Method |
|---|---|---|---|
| x86_64 | SSE2 (baseline) | AVX2 | Runtime cpuid instruction |
| aarch64 | NEON (always) | None | Compile-time default |
| Other | Scalar fallback | None | Compile-time detection |
The hardware acceleration is transparent to the application code. The compute_hash function signature remains the same regardless of which SIMD path is taken README.md:160-165
Sources: src/storage_engine/digest.rs src/storage_engine/data_store.rs:2-4 src/storage_engine/key_indexer.rs:2 README.md:158-168
Tag-Based Collision Detection
While XXH3_64 produces high-quality 64-bit hashes, the system implements an additional collision detection layer using 16-bit tags. The tag is derived from the upper 16 bits of the key hash src/storage_engine/key_indexer.rs:64-66
Tag Computation Methods
Two methods generate tags for collision detection:
| Method | Signature | Source | Usage Context |
|---|---|---|---|
tag_from_hash | fn(key_hash: u64) -> u16 | src/storage_engine/key_indexer.rs:64-66 | When hash is known |
tag_from_key | fn(key: &[u8]) -> u16 | src/storage_engine/key_indexer.rs:69-72 | Direct from key bytes |
The tag_from_hash function extracts the tag: (key_hash >> (64 - TAG_BITS)) as u16
The tag_from_key function computes: tag_from_hash(compute_hash(key))
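A minimal sketch of the two tag helpers, assuming compute_hash wraps xxhash_rust::xxh3::xxh3_64 as described above; the names mirror the documented methods, but the bodies are illustrative.

```rust
use xxhash_rust::xxh3::xxh3_64;

const TAG_BITS: u32 = 16;

/// Upper 16 bits of the 64-bit hash become the collision-detection tag.
fn tag_from_hash(key_hash: u64) -> u16 {
    (key_hash >> (64 - TAG_BITS)) as u16
}

/// Hash the raw key bytes first, then take the tag from the hash.
fn tag_from_key(key: &[u8]) -> u16 {
    tag_from_hash(xxh3_64(key))
}
```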
graph TB
subgraph WriteFlow["Write Operation Flow"]
Key["key: &[u8]"]
ComputeHash["compute_hash(key)"]
KeyHash["key_hash: u64"]
TagFromHash["KeyIndexer::tag_from_hash(key_hash)"]
Tag["tag: u16"]
WriteData["Write payload + metadata to file"]
Offset["metadata_offset: u64"]
Pack["KeyIndexer::pack(tag, offset)"]
PackedValue["packed: u64"]
Insert["key_indexer.insert(key_hash, offset)"]
end
Key --> ComputeHash
ComputeHash --> KeyHash
KeyHash --> TagFromHash
TagFromHash --> Tag
WriteData --> Offset
Tag --> Pack
Offset --> Pack
Pack --> PackedValue
KeyHash --> Insert
PackedValue --> Insert
Write-Time Tag Storage
During write operations, the tag is packed with the offset before insertion into the index:
The insert method in KeyIndexer src/storage_engine/key_indexer.rs:135-160 performs collision detection at write time by verifying that the new tag matches any existing tag for the same key hash.
Read-Time Tag Verification
The read_entry_with_context method src/storage_engine/data_store.rs:501-565 implements tag verification during reads:
The verification logic src/storage_engine/data_store.rs:513-521:
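A hedged sketch of the read-time check, assuming the packed-value layout above; the function name and return shape are illustrative, not the actual read_entry_with_context signature.

```rust
use xxhash_rust::xxh3::xxh3_64;

const TAG_BITS: u32 = 16;
const OFFSET_MASK: u64 = (1u64 << 48) - 1;

/// Verify a packed index value against the caller-supplied key bytes.
/// The entry offset is returned only when the stored tag matches the key's tag.
fn verify_lookup(packed: u64, key: &[u8]) -> Option<u64> {
    let stored_tag = (packed >> (64 - TAG_BITS)) as u16;
    let expected_tag = (xxh3_64(key) >> (64 - TAG_BITS)) as u16;
    if stored_tag != expected_tag {
        // Tag mismatch: treat it as a hash collision and report "not found".
        return None;
    }
    Some(packed & OFFSET_MASK)
}
```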
Collision Probability Analysis
The dual-layer verification provides strong collision resistance:
| Layer | Bits | Collision Probability | Description |
|---|---|---|---|
| XXH3_64 Hash | 64 | ~2^-64 | Primary hash collision |
| Tag Verification | 16 | ~2^-16 | Secondary tag collision given hash collision |
| Combined | 80 | ~2^-80 | Both hash and tag must collide |
With 2^16 = 65,536 possible tag values, the tag check provides sufficient discrimination for practical workloads. The KeyIndexer documentation src/storage_engine/key_indexer.rs:20-56 notes this can distinguish over 4 billion keys with ~50% collision probability (birthday bound).
Sources: src/storage_engine/key_indexer.rs:9-72 src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:501-565
Write-Time Collision Rejection
The KeyIndexer::insert method src/storage_engine/key_indexer.rs:135-160 enforces collision detection at write time:
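A hedged sketch of the write-time rejection path follows; the index is simplified to a plain HashMap of packed values, and the two-pass structure and error type are illustrative, not the exact KeyIndexer::insert code.

```rust
use std::collections::HashMap;

const TAG_SHIFT: u32 = 48; // upper 16 bits of a packed value hold the tag

/// Apply a batch of (key_hash, packed tag/offset) updates, rejecting the
/// whole batch if any update carries a different tag for an existing hash.
fn apply_index_updates(
    index: &mut HashMap<u64, u64>,
    updates: &[(u64, u64)],
) -> Result<(), String> {
    // Pass 1: detect "same hash, different tag" collisions up front.
    for &(key_hash, packed) in updates {
        if let Some(&existing) = index.get(&key_hash) {
            if existing >> TAG_SHIFT != packed >> TAG_SHIFT {
                return Err(format!("hash collision on key hash {key_hash:#x}"));
            }
        }
    }
    // Pass 2: only apply once the whole batch is known to be safe.
    for &(key_hash, packed) in updates {
        index.insert(key_hash, packed);
    }
    Ok(())
}
```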
If a collision is detected (same hash, different tag), the write operation fails with an error src/storage_engine/data_store.rs:245-251 This prevents index corruption and ensures data integrity.
Sources: src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:238-252
Index Building and Maintenance
Index Construction on Open
When DataStore::open src/storage_engine/data_store.rs:84-117 is called, the KeyIndexer is constructed by the static build method src/storage_engine/key_indexer.rs:98-124 which scans backward through the validated storage file:
The backward scan ensures only the most recent version of each key is indexed. Keys seen earlier in the scan (which represent newer entries) are added to the seen HashSet to skip older versions src/storage_engine/key_indexer.rs:108-111
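A simplified sketch of the backward scan, assuming each metadata record can report its key hash and the offset of the previous entry; EntryMeta and read_meta_at are illustrative stand-ins for the real on-disk format.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative view of one metadata record in the backward-linked chain.
struct EntryMeta {
    key_hash: u64,
    prev_offset: u64, // offset of the previous entry's metadata, 0 at the head
}

/// Walk the chain from the tail toward offset 0, keeping only the newest
/// version of each key (the first one seen during the backward scan).
fn build_index(read_meta_at: impl Fn(u64) -> EntryMeta, tail_offset: u64) -> HashMap<u64, u64> {
    let mut index = HashMap::new();
    let mut seen: HashSet<u64> = HashSet::new();
    let mut cursor = tail_offset;
    while cursor > 0 {
        let meta = read_meta_at(cursor);
        if seen.insert(meta.key_hash) {
            index.insert(meta.key_hash, cursor); // newest version wins
        }
        cursor = meta.prev_offset; // older versions of seen keys are skipped
    }
    index
}
```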
Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/key_indexer.rs:98-124
Index Updates During Writes
After each write operation, the reindex method src/storage_engine/data_store.rs:224-259 updates the in-memory index with new key mappings:
Critical : The file must be flushed before calling reindex src/storage_engine/data_store.rs:814 to ensure newly written data is visible in the new memory-mapped view. The flush guarantees that the OS has persisted the data to disk before the mmap is recreated.
The key_indexer.insert call may return an error if a hash collision is detected at write time src/storage_engine/data_store.rs:246-250 In this case, the entire batch operation is aborted to prevent an inconsistent index state.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:818-824
Concurrent Access and Locking
The KeyIndexer is protected by an Arc<RwLock<KeyIndexer>> wrapper src/storage_engine/data_store.rs:31, enabling multiple concurrent readers while ensuring exclusive access for writers.
graph TB
subgraph DataStoreField["DataStore struct field"]
KeyIndexerArc["key_indexer: Arc<RwLock<KeyIndexer>>"]
end
subgraph ReadOperations["Read Operations (Shared Lock)"]
ReadOp["read(key)"]
BatchReadOp["batch_read(keys)"]
IterEntries["iter_entries()"]
ParIterEntries["par_iter_entries() (parallel feature)"]
end
subgraph WriteOperations["Write Operations (Exclusive Lock)"]
WriteOp["write(key, payload)"]
BatchWriteOp["batch_write(entries)"]
DeleteOp["delete(key)"]
ReindexOp["reindex() (internal)"]
end
subgraph LockAcquisition["Lock Acquisition"]
ReadLock["key_indexer.read().unwrap()"]
WriteLock["key_indexer.write().map_err(...)"]
end
ReadOp --> ReadLock
BatchReadOp --> ReadLock
IterEntries --> ReadLock
ParIterEntries --> ReadLock
WriteOp --> WriteLock
BatchWriteOp --> WriteLock
DeleteOp --> WriteLock
ReindexOp --> WriteLock
ReadLock -.-> KeyIndexerArc
WriteLock -.-> KeyIndexerArc
Lock Granularity
| Operation | Lock on key_indexer | Lock on file | Atomicity |
|---|---|---|---|
read(key) | Read (shared) | None | Lock-free read |
batch_read(keys) | Read (shared) | None | Lock-free reads in batch |
write(key, data) | Write (exclusive) | Write (exclusive) | Full write atomicity |
batch_write(...) | Write (exclusive) | Write (exclusive) | Batch atomicity |
delete(key) | Write (exclusive) | Write (exclusive) | Tombstone write atomicity |
The reindex method acquires both the mmap mutex src/storage_engine/data_store.rs:232 and the key_indexer write lock src/storage_engine/data_store.rs:233-236 to atomically update both structures after a write operation.
Parallel Iteration : When the parallel feature is enabled, par_iter_entries src/storage_engine/data_store.rs:296-361 clones all packed values under a read lock, then releases the lock before parallel processing. This allows concurrent reads during parallel iteration.
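A minimal sketch of the shared/exclusive pattern around the index; the KeyIndexer is reduced to a plain HashMap here, so this is a simplified illustration rather than the DataStore internals.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Index = HashMap<u64, u64>; // key_hash -> packed tag/offset

/// Readers take the shared lock; many may hold it at once.
fn lookup(index: &Arc<RwLock<Index>>, key_hash: u64) -> Option<u64> {
    index.read().unwrap().get(&key_hash).copied()
}

/// Writers take the exclusive lock; they block readers and other writers.
fn update(index: &Arc<RwLock<Index>>, key_hash: u64, packed: u64) {
    index.write().unwrap().insert(key_hash, packed);
}
```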
Sources: src/storage_engine/data_store.rs:31 src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:296-361
Performance Characteristics
Lookup Performance
The KeyIndexer provides O(1) average-case lookup performance using the HashMap src/storage_engine/key_indexer.rs:58. The get_packed method src/storage_engine/key_indexer.rs:163-166 performs a single hash table lookup.
Empirical performance from README README.md:166-167:
- 1 million random seeks (8-byte entries): typically < 1 second
- Hash computation overhead : negligible due to SIMD acceleration
- Tag verification overhead : minimal (one bit shift + one comparison)
Memory Overhead
Each index entry in the HashMap<u64, u64, Xxh3BuildHasher> consumes:
| Component | Size (bytes) | Description |
|---|---|---|
Key (u64) | 8 | XXH3 hash of the key |
| Value (u64) | 8 | Packed (tag, offset) value |
| HashMap overhead | ~16-24 | Bucket pointers and metadata |
| Total per entry | 32-40 | Approximate overhead per unique key |
For a dataset with 1 million unique keys , the KeyIndexer occupies approximately 32-40 MB of RAM. This is a small fraction of typical system memory, enabling efficient indexing even for large datasets.
Batch Operation Performance
The compute_hash_batch function src/storage_engine/data_store.rs:839-842 leverages Rayon for parallel hashing:
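A minimal sketch of this path is shown below, assuming the rayon and xxhash-rust crates; the function names mirror the digest module but the bodies are illustrative.

```rust
use rayon::prelude::*;
use xxhash_rust::xxh3::xxh3_64;

/// Hash a single key with XXH3 (the documented single-key path).
fn compute_hash(key: &[u8]) -> u64 {
    xxh3_64(key)
}

/// Hash many keys in parallel; each hash is independent, so Rayon can
/// spread the work across all available cores.
fn compute_hash_batch(keys: &[&[u8]]) -> Vec<u64> {
    keys.par_iter().map(|key| compute_hash(key)).collect()
}
```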
This parallel hashing provides near-linear speedup with CPU core count for large batches, as each key hash is computed independently.
Hardware Acceleration Impact
SIMD acceleration in XXH3_64 provides measurable performance improvements for hash-intensive workloads:
| Platform | SIMD Instructions | Speedup vs Scalar | Notes |
|---|---|---|---|
| x86_64 | SSE2 | ~2-3x | Baseline SIMD path |
| x86_64 | AVX2 | ~3-4x | ~1.5x over SSE2 |
| aarch64 | NEON | ~2-3x | Baseline SIMD path on ARM |
| Fallback | Scalar | 1x (baseline) | N/A |
Performance gains are most significant for:
- batch_write operations with many keys
- compute_hash_batch calls processing large key sets
- Workloads with small payload sizes where hashing dominates
Sources: src/storage_engine/key_indexer.rs:56-59 src/storage_engine/data_store.rs:838-843 README.md:160-167
Error Handling and Collision Management
Collision Probability Analysis
The dual-layer verification system (64-bit hash + 16-bit tag) provides strong collision resistance as documented in src/storage_engine/key_indexer.rs:17-56:
| Verification Layer | Bits | Probability | Description |
|---|---|---|---|
| XXH3_64 Hash Collision | 64 | 2^-64 | Two different keys produce same hash |
| Tag Collision (given hash collision) | 16 | 2^-16 | Tags match despite different keys |
| Combined Collision | 80 | 2^-80 | Both hash and tag must collide simultaneously |
The 16-bit tag provides 65,536 distinct values. According to the birthday paradox, this supports over 4 billion keys with ~50% collision probability src/storage_engine/key_indexer.rs:48-49
Write-Time Collision Handling
The KeyIndexer::insert method src/storage_engine/key_indexer.rs:135-160 enforces strict collision detection during writes. If a tag mismatch occurs, the insert returns Err:
The reindex method propagates this error and aborts the entire batch operation src/storage_engine/data_store.rs:246-250:
This fail-fast approach ensures:
- No partial writes that could corrupt the index
- Deterministic error handling (writes either fully succeed or fully fail)
- Index consistency is maintained across all operations
Read-Time Collision Handling
The read_entry_with_context method src/storage_engine/data_store.rs:501-565 detects collisions during reads when the original key is provided for verification src/storage_engine/data_store.rs:513-521:
When a read-time collision is detected:
- A warning is logged to help diagnose the issue
- None is returned to the caller (key not found)
- The index remains unchanged (reads do not modify state)
Read operations without the original key (e.g., when using pre-hashed keys) cannot perform tag verification and may return incorrect results if a hash collision exists. This is a tradeoff for performance in batch operations.
Sources: src/storage_engine/key_indexer.rs:17-56 src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:238-252 src/storage_engine/data_store.rs:501-565
Summary
The key indexing and hashing system in SIMD R Drive provides:
- Fast lookups : O(1) hash-based access to entries
- Hardware acceleration : Automatic SIMD optimization on SSE2, AVX2, and NEON platforms
- Collision resistance : Dual-layer verification with 64-bit hashes and 16-bit tags
- Thread safety : Concurrent reads with exclusive writes via RwLock
- Low memory overhead : 16 bytes of packed index data per unique key (plus HashMap overhead)
This design enables efficient storage operations even for datasets with millions of entries, while maintaining data integrity through robust collision detection.
Sources: src/storage_engine/data_store.rs:1-1183 README.md:158-168
Compaction and Maintenance
Relevant source files
Purpose and Scope
This page documents the maintenance operations available in the DataStore, focusing on compaction for space reclamation and automatic file recovery mechanisms. These operations ensure the storage engine remains efficient and resilient despite its append-only architecture.
For information about the underlying append-only storage model, see Storage Architecture. For details on entry structure that affects compaction, see Entry Structure and Metadata.
Sources: src/storage_engine/data_store.rs:1-1183
Compaction Process
Overview
The compaction process eliminates space waste caused by outdated entry versions. In the append-only model, updating a key creates a new entry while leaving the old version in the file. Compaction creates a new file containing only the latest version of each key, then atomically swaps it with the original.
graph TB
subgraph "Original DataStore"
DS["DataStore\n(self)"]
FILE["file: RwLock<BufWriter<File>>"]
MMAP["mmap: Mutex<Arc<Mmap>>"]
IDX["key_indexer: RwLock<KeyIndexer>"]
PATH["path: PathBuf"]
end
subgraph "Compaction Process"
COMPACT["compact()"]
ITER["iter_entries()"]
COPY["copy_handle()"]
end
subgraph "Temporary DataStore"
TEMP_DS["DataStore\n(compacted_storage)"]
TEMP_FILE["file (path + .bk)"]
TEMP_PATH["compacted_path"]
end
subgraph "Final Operation"
RENAME["std::fs::rename()"]
SWAP["Atomic File Swap"]
end
DS --> COMPACT
COMPACT --> TEMP_PATH
TEMP_PATH --> TEMP_DS
DS --> ITER
ITER --> COPY
COPY --> TEMP_DS
TEMP_DS --> TEMP_FILE
TEMP_FILE --> RENAME
RENAME --> SWAP
SWAP --> PATH
style COMPACT fill:#f9f9f9
style TEMP_DS fill:#f9f9f9
style SWAP fill:#f9f9f9
Architecture
Sources: src/storage_engine/data_store.rs:706-749
Implementation Details
The compact() method at src/storage_engine/data_store.rs:706-749 performs the following sequence:
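A hedged sketch of that sequence follows; Store is a stand-in trait for the documented DataStore API, so the method names and signatures here are assumptions rather than the real ones.

```rust
use std::path::{Path, PathBuf};

// Stand-in for the parts of the DataStore API that compaction touches.
trait Store: Sized {
    fn open(path: &Path) -> std::io::Result<Self>;
    fn path(&self) -> PathBuf;
    fn latest_entries(&self) -> Vec<(u64, Vec<u8>)>; // (key_hash, payload)
    fn write_with_key_hash(&mut self, key_hash: u64, payload: &[u8]) -> std::io::Result<()>;
}

fn compact<S: Store>(store: &mut S) -> std::io::Result<()> {
    // 1. Temporary file: the original path with ".bk" appended.
    let original = store.path();
    let mut tmp = original.clone().into_os_string();
    tmp.push(".bk");
    let tmp = PathBuf::from(tmp);

    // 2. Copy only the latest version of each key into a fresh store.
    let mut compacted = S::open(&tmp)?;
    for (key_hash, payload) in store.latest_entries() {
        compacted.write_with_key_hash(key_hash, &payload)?;
    }

    // 3. Atomic swap: rename replaces the original file in a single step.
    std::fs::rename(&tmp, &original)?;
    Ok(())
}
```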
Sources: src/storage_engine/data_store.rs:706-749 src/storage_engine/data_store.rs:587-590 src/storage_engine/data_store.rs:269-280
Key Implementation Characteristics
| Aspect | Implementation | Location |
|---|---|---|
| Temporary File | Appends .bk extension to original path | src/storage_engine/data_store.rs:707 |
| Entry Selection | Uses iter_entries() which returns only latest versions | src/storage_engine/data_store.rs:714 |
| Copy Mechanism | copy_handle() → EntryStream → write_stream_with_key_hash() | src/storage_engine/data_store.rs:587-590 |
| Atomic Swap | std::fs::rename() provides atomic replacement | src/storage_engine/data_store.rs:746 |
| Space Optimization | Skips static index if it wouldn’t save space | src/storage_engine/data_store.rs:727-741 |
Sources: src/storage_engine/data_store.rs:706-749
Thread Safety Considerations
The compact() method has critical thread safety limitations documented at src/storage_engine/data_store.rs:681-693:
Important Constraints:
- Requires &mut self : Prevents concurrent mutations but does NOT prevent concurrent reads
- Arc-wrapped risk : If DataStore is wrapped in Arc<DataStore>, other threads may hold read references during compaction
- Recommendation : Only compact when you have exclusive access (single thread or external synchronization)
- No automatic locking : External synchronization must be enforced by the caller
Sources: src/storage_engine/data_store.rs:679-705
EntryHandle Copying Mechanism
The copy_handle() method at src/storage_engine/data_store.rs:587-590 creates an EntryStream from the entry’s memory-mapped data and writes it to the target store using the original key hash. This preserves the entry’s identity while creating a new physical copy.
Sources: src/storage_engine/data_store.rs:567-590
Space Reclamation
Estimating Compaction Savings
The estimate_compaction_savings() method at src/storage_engine/data_store.rs:605-616 calculates potential space savings without performing actual compaction:
Algorithm:
- Get total file size via file_size()
- Iterate through all valid entries (latest versions only)
- Track seen keys using HashSet<u64> with Xxh3BuildHasher
- Sum the file_size() of each unique entry
- Return total_size - unique_entry_size
Note: The iter_entries() method already filters to latest versions, so this calculation represents the minimum achievable size through compaction.
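A hedged sketch of the same calculation, with entries reduced to (key_hash, size) pairs and the iterator assumed to already yield only the latest version of each key:

```rust
use std::collections::HashSet;

/// Estimate reclaimable space: total file size minus the bytes occupied
/// by the latest version of each unique key.
fn estimate_compaction_savings(
    total_file_size: u64,
    latest_entries: impl Iterator<Item = (u64, u64)>, // (key_hash, entry size)
) -> u64 {
    let mut seen: HashSet<u64> = HashSet::new();
    let mut unique_entry_size: u64 = 0;
    for (key_hash, size) in latest_entries {
        if seen.insert(key_hash) {
            unique_entry_size += size;
        }
    }
    total_file_size.saturating_sub(unique_entry_size)
}
```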
Sources: src/storage_engine/data_store.rs:592-616
When to Compact
Compaction should be considered when:
| Condition | Description | Detection Method |
|---|---|---|
| High Waste Ratio | Significant difference between total and unique entry sizes | estimate_compaction_savings() / file_size() > threshold |
| Frequent Updates | Many keys updated multiple times | Application-level tracking |
| Long-Running Storage | File has been in use for extended periods | Time-based policy |
| Before Backup | Minimize backup size | Pre-backup maintenance |
The compaction process at src/storage_engine/data_store.rs:727-741 includes logic to skip static index generation if it wouldn’t save space, indicating an awareness of space optimization trade-offs.
Sources: src/storage_engine/data_store.rs:605-616 src/storage_engine/data_store.rs:727-741
File Recovery
Chain Validation
The recover_valid_chain() method at src/storage_engine/data_store.rs:383-482 ensures data integrity by validating the backward-linked chain of entries:
Sources: src/storage_engine/data_store.rs:363-482
Validation Algorithm Details
The recovery process validates multiple constraints:
| Validation | Check | Purpose |
|---|---|---|
| Metadata Size | cursor >= METADATA_SIZE | Ensure enough space for metadata read |
| Prev Offset Bounds | prev_tail < metadata_offset | Prevent circular references |
| Entry Start | entry_start < metadata_offset | Ensure valid entry range |
| Pre-padding | prepad_len(prev_tail) | Account for 64-byte alignment |
| Tombstone Detection | Single NULL byte check | Handle deletion markers |
| Chain Depth | Walk to offset 0 | Ensure complete chain |
| Total Size | total_size <= file_len | Prevent overflow |
Sources: src/storage_engine/data_store.rs:383-482
Corruption Detection and Handling
When DataStore::open() is called at src/storage_engine/data_store.rs:84-117 it automatically invokes recovery:
Sources: src/storage_engine/data_store.rs:84-117
Automatic Recovery Steps
The recovery process at src/storage_engine/data_store.rs:91-103:
- Detection : recover_valid_chain() returns final_len < file_len
- Warning : Logs truncation message with offsets
- Cleanup : Drops mmap and file handles
- Truncation : Re-opens file and calls set_len(final_len)
- Sync : Forces sync_all() to persist truncation
- Retry : Recursively calls open() with clean file
This ensures corrupted tail data is removed and the file is left in a valid state.
Sources: src/storage_engine/data_store.rs:89-103
Maintenance Operations
Re-indexing After Writes
The reindex() method at src/storage_engine/data_store.rs:224-259 updates internal structures after writes:
Key Operations:
- Re-map file : Creates new Mmap reflecting written data
- Update index : Inserts or removes key-hash-to-offset mappings
- Collision check : insert() returns error on hash collision (tag mismatch)
- Atomic update : Stores new tail_offset with Release ordering
- Visibility : New mmap makes written data visible to readers
Sources: src/storage_engine/data_store.rs:176-259
File Path and Metadata Access
| Method | Return Type | Purpose | Location |
|---|---|---|---|
get_path() | PathBuf | Returns storage file path | src/storage_engine/data_store.rs:265-267 |
file_size() | Result<u64> | Returns current file size | src/storage_engine/data_store.rs:1179-1181 |
len() | Result<usize> | Returns count of unique keys | src/storage_engine/data_store.rs:1164-1171 |
is_empty() | Result<bool> | Checks if storage has no entries | src/storage_engine/data_store.rs:1173-1177 |
These methods provide essential metadata for maintenance decision-making.
Sources: src/storage_engine/data_store.rs:261-267 src/storage_engine/data_store.rs:1164-1181
Iterators for Maintenance
The storage provides multiple iteration mechanisms useful for maintenance:
Iteration Characteristics:
- Sequential : iter_entries() at src/storage_engine/data_store.rs:269-280
- Parallel : par_iter_entries() at src/storage_engine/data_store.rs:283-361 (requires parallel feature)
- Latest Only : Both return only the most recent version of each key
- Zero-Copy : Return EntryHandle with memory-mapped data references
Sources: src/storage_engine/data_store.rs:269-361
Summary
The maintenance system provides:
| Component | Purpose | Key Method |
|---|---|---|
| Compaction | Reclaim space from outdated entries | compact() |
| Space Estimation | Calculate potential savings | estimate_compaction_savings() |
| Recovery | Validate and repair corrupted files | recover_valid_chain() |
| Re-indexing | Update structures after writes | reindex() |
| Iteration | Scan for maintenance operations | iter_entries(), par_iter_entries() |
All operations maintain the append-only guarantee while ensuring data integrity and efficient space utilization.
Sources: src/storage_engine/data_store.rs:1-1183
Network Layer and RPC
Relevant source files
- Cargo.lock
- experiments/simd-r-drive-muxio-service-definition/Cargo.toml
- experiments/simd-r-drive-ws-client/Cargo.toml
- experiments/simd-r-drive-ws-server/Cargo.toml
Purpose and Scope
This document describes the network layer and Remote Procedure Call (RPC) system that enables remote access to the DataStore over WebSocket connections. The system is built on the Muxio framework and provides a standardized interface for clients in any language to communicate with a DataStore server.
For information about the core storage engine that the network layer wraps, see Core Storage Engine. For details on language-specific client implementations, see Python Integration and Native Rust Client.
Overview
The network layer provides a WebSocket-based RPC protocol that allows remote clients to perform DataStore operations. The architecture consists of three main components:
| Component | Purpose | Location |
|---|---|---|
| Service Definition | Defines the RPC contract (methods and data types) | experiments/simd-r-drive-muxio-service-definition |
| WebSocket Server | Exposes DataStore operations over WebSocket | experiments/simd-r-drive-ws-server |
| WebSocket Client | Connects to remote servers and invokes RPC methods | experiments/simd-r-drive-ws-client |
The system uses bitcode for efficient binary serialization and tokio-tungstenite for WebSocket transport. The Muxio framework handles message framing, multiplexing, and concurrent request processing over a single WebSocket connection.
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17 experiments/simd-r-drive-ws-server/Cargo.toml:1-23 experiments/simd-r-drive-ws-client/Cargo.toml:1-22
Architecture Overview
Diagram: Network Layer Component Architecture
The client and server both depend on simd-r-drive-muxio-service-definition to ensure they share the same RPC contract. The Muxio framework components (muxio-rpc-service-caller and muxio-rpc-service-endpoint) handle the RPC mechanics, while muxio-tokio-rpc-client and muxio-tokio-rpc-server provide the WebSocket transport layer.
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:13-21 experiments/simd-r-drive-ws-server/Cargo.toml:13-22 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:13-16
Service Definition Layer
The simd-r-drive-muxio-service-definition crate defines the RPC contract shared between client and server. This contract specifies:
- RPC Methods : Operations that can be invoked remotely (e.g., read, write, compact)
- Request/Response Types : Data structures for method parameters and return values
- Error Types : Standardized error representations for network and storage errors
Key Components
The service definition uses the muxio-rpc-service framework to define typed RPC interfaces. All data types use bitcode for serialization, which provides:
- Compact Binary Encoding : Efficient wire format optimized for small message sizes
- Zero-Copy Deserialization : Where possible, data is accessed without copying
- Type Safety : Compile-time guarantees that client and server speak the same protocol
Diagram: Service Definition Structure
The service trait defines method signatures, while request/response structs define the data exchanged. The bitcode derive macros generate serialization code automatically.
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14-15 experiments/bindings/python-ws-client/Cargo.lock:133-143
Transport Layer
WebSocket Protocol
The network layer uses WebSocket as the transport protocol for several reasons:
| Feature | Benefit |
|---|---|
| Full Duplex | Server can push notifications to clients |
| Single Connection | Reduces connection overhead and latency |
| Firewall Friendly | Works through HTTP/HTTPS proxies |
| Binary Frames | Efficient for bitcode-encoded messages |
The transport is implemented using tokio-tungstenite, which provides async WebSocket support integrated with the Tokio runtime.
Sources: experiments/bindings/python-ws-client/Cargo.lock:1213-1222 experiments/bindings/python-ws-client/Cargo.lock:1303-1317
Muxio Framework
The Muxio framework provides the RPC layer on top of WebSocket:
Diagram: RPC Message Flow Through Muxio
The caller assigns a unique request_id to each RPC call, enabling multiplexing : multiple concurrent requests can be in flight over the same WebSocket connection. The endpoint matches responses to requests using these IDs.
Sources: experiments/bindings/python-ws-client/Cargo.lock:659-682 experiments/bindings/python-ws-client/Cargo.lock:685-700
Connection Management
Diagram: WebSocket Connection State Machine
The client maintains connection state and can implement automatic reconnection logic. The Muxio layer handles connection interruptions gracefully, returning errors for in-flight requests when the connection drops.
Sources: experiments/bindings/python-ws-client/Cargo.lock:685-700
Concurrency Model
The network layer is fully asynchronous and built on Tokio:
| Component | Concurrency Mechanism |
|---|---|
| Client | Multiple concurrent RPC calls multiplexed over one connection |
| Server | One task per connected client; concurrent request handling within each connection |
| Request Processing | Each RPC method invocation runs as a separate async task |
Client-Side Concurrency
The muxio-rpc-service-caller manages request IDs and matches responses to the correct awaiting future, enabling concurrent operations without blocking.
Server-Side Concurrency
The server spawns a new Tokio task for each incoming RPC request, allowing concurrent processing of multiple client requests. The underlying DataStore handles concurrent reads efficiently through its shared memory-mapped file access pattern (see Concurrency and Thread Safety).
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:19 experiments/simd-r-drive-ws-server/Cargo.toml:18 experiments/bindings/python-ws-client/Cargo.lock:1184-1199
Error Handling
The RPC layer distinguishes between different error categories:
Diagram: Error Type Hierarchy
Each error category provides different information to help diagnose issues:
- Transport Errors : Indicate network connectivity problems
- Protocol Errors : Suggest version mismatches or implementation bugs
- Application Errors : Represent normal error conditions from DataStore operations
Sources: experiments/bindings/python-ws-client/Cargo.lock:659-669
Security Considerations
The current implementation provides:
| Security Feature | Status | Notes |
|---|---|---|
| Encryption | Not implemented | Uses plain WebSocket (ws://) |
| Authentication | Not implemented | No built-in auth mechanism |
| Authorization | Not implemented | All connected clients have full access |
For production deployments, consider:
- Use WSS (WebSocket Secure) : Implement TLS encryption by placing the server behind a reverse proxy (nginx, Caddy, etc.)
- Implement Authentication : Add token-based auth in the WebSocket handshake
- Add Authorization : Implement per-key access controls in the server handler
- Rate Limiting : Protect against DoS by limiting request rates per client
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23
Performance Characteristics
Serialization Overhead
The bitcode serialization format is optimized for performance:
- Small Message Size : Typically 30-50% smaller than JSON
- Fast Encoding : Zero-copy for many types, SIMD-optimized where applicable
- Predictable Layout : Fixed-size types don’t require length prefixes
Network Latency
RPC call latency consists of:
Total Latency = Serialization + Network RTT + Deserialization + Processing
For typical operations:
- Serialization/Deserialization : <1ms for small payloads
- Network RTT : Depends on network conditions (LAN: <1ms, WAN: 10-100ms)
- Processing : Varies by operation (read: <1ms, write: ~1-5ms with flush)
Connection Multiplexing
A single WebSocket connection can handle thousands of concurrent RPC calls, limited only by:
- Available memory for buffering requests
- Server processing capacity
- Network bandwidth
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14 experiments/bindings/python-ws-client/Cargo.lock:133-143
Dependency Graph
Diagram: Complete Dependency Graph for Network Layer
The layered architecture ensures clean separation of concerns:
- Application Layer : Client and server applications
- simd-r-drive Components : Project-specific RPC wrappers and service definitions
- Muxio Framework : Generic RPC infrastructure
- Transport & Runtime: Low-level WebSocket and async runtime
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:13-21 experiments/simd-r-drive-ws-server/Cargo.toml:13-22 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:13-16 experiments/bindings/python-ws-client/Cargo.lock:648-700
WebSocket Server
Relevant source files
- Cargo.lock
- experiments/simd-r-drive-muxio-service-definition/Cargo.toml
- experiments/simd-r-drive-ws-client/Cargo.toml
- experiments/simd-r-drive-ws-server/Cargo.toml
Purpose and Scope
The WebSocket Server provides remote network access to the SIMD R Drive storage engine through an RPC-based interface. This experimental component enables clients to perform storage operations over WebSocket connections using the muxio RPC framework with bitcode serialization.
This page covers the server implementation, configuration, and connection handling. For information about the RPC protocol and service definitions, see Muxio RPC Framework. For client-side implementation, see Native Rust Client. For the underlying storage operations, see DataStore API.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23
Server Architecture
The WebSocket Server acts as a bridge between remote clients and the local storage engine, handling WebSocket connections, deserializing RPC requests, executing storage operations, and returning serialized responses.
graph TB
subgraph "simd-r-drive-ws-server"
MAIN["main.rs\nServer Entry Point"]
CLI["clap Parser\nCLI Arguments"]
SERVER["muxio-tokio-rpc-server\nWebSocket Server"]
SERVICE["Service Implementation\nRPC Handler"]
end
subgraph "Dependencies"
DEFINITION["simd-r-drive-muxio-service-definition\nService Contract"]
CORE["simd-r-drive\nDataStore"]
TOKIO["tokio Runtime\nAsync Executor"]
TRACING["tracing-subscriber\nLogging"]
end
subgraph "Network"
WS_CONN["WebSocket Connection\ntokio-tungstenite"]
CLIENT["Remote Client"]
end
MAIN --> CLI
MAIN --> TRACING
MAIN --> SERVER
SERVER --> SERVICE
SERVICE --> DEFINITION
SERVICE --> CORE
SERVER --> TOKIO
SERVER --> WS_CONN
WS_CONN --> CLIENT
style SERVER fill:#f9f9f9
style SERVICE fill:#f9f9f9
style CORE fill:#e8e8e8
Component Overview
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-23
Server Implementation
Package Structure
The simd-r-drive-ws-server package implements the WebSocket server as an experimental component in the workspace. It provides a binary that can be executed to start the server.
| Component | Purpose |
|---|---|
main.rs | Server entry point, CLI parsing, initialization |
| Service Handler | Implements the RPC service interface defined in simd-r-drive-muxio-service-definition |
| DataStore Wrapper | Manages the local DataStore instance and access |
Service Implementation Flow
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-18
Configuration and Startup
Command-Line Arguments
The server uses clap for CLI argument parsing with the derive feature, providing a structured interface for configuration.
Expected Configuration Options
| Option | Type | Purpose |
|---|---|---|
--host / -h | String | Bind address (e.g., 127.0.0.1, 0.0.0.0) |
--port / -p | u16 | Port number (e.g., 9001) |
--path | PathBuf | Storage file path for DataStore |
--log-level | String | Logging level (trace, debug, info, warn, error) |
Initialization Sequence
The server initialization creates a local DataStore instance which is then accessed by the service handler for all RPC operations. The tracing-subscriber dependency with env-filter feature allows runtime configuration of logging levels.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:19-22
Connection Handling
WebSocket Lifecycle
The muxio-tokio-rpc-server package handles the WebSocket protocol details, including:
- Connection Upgrade : HTTP to WebSocket protocol upgrade
- Message Framing : Binary message framing over WebSocket
- Multiplexing : Multiple concurrent RPC calls over a single connection
- Error Handling : Connection errors and RPC-level errors
Connection State Management
Each WebSocket connection is handled by a separate tokio task, allowing concurrent client connections without blocking.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:16-18
Service Implementation Details
RPC Service Interface
The service implementation must implement the interface defined in simd-r-drive-muxio-service-definition. This shared contract ensures type-safe communication between client and server.
Service Methods
Based on the DataStore API and typical RPC patterns, the service likely implements these methods:
| Method | Request Type | Response Type | Purpose |
|---|---|---|---|
write | (Vec<u8>, Vec<u8>) | Result<(), Error> | Write key-value pair |
read | Vec<u8> | Result<Option<Vec<u8>>, Error> | Read value by key |
delete | Vec<u8> | Result<(), Error> | Mark key as deleted |
batch_write | Vec<(Vec<u8>, Vec<u8>)> | Result<(), Error> | Write multiple pairs |
batch_read | Vec<Vec<u8>> | Result<Vec<Option<Vec<u8>>>, Error> | Read multiple values |
compact | () | Result<(), Error> | Trigger compaction |
Service Handler Structure
The service handler wraps a shared reference to the DataStore (likely Arc<DataStore>) to allow concurrent read access across multiple RPC calls while serializing writes through the DataStore’s internal concurrency control.
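A hedged sketch of that shape; the handler name, the stand-in store type, and its methods are illustrative, not the actual server code.

```rust
use std::sync::Arc;

// Stand-in for the thread-safe DataStore described elsewhere in this wiki.
struct StoreStandIn;
impl StoreStandIn {
    fn read(&self, _key: &[u8]) -> Option<Vec<u8>> { None }
    fn write(&self, _key: &[u8], _value: &[u8]) {}
}

/// The RPC handler holds a shared reference so every connection task can
/// use the same store; the store's own locking serializes writes.
struct DataStoreService {
    store: Arc<StoreStandIn>,
}

impl DataStoreService {
    fn handle_read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.store.read(key)
    }
    fn handle_write(&self, key: &[u8], value: &[u8]) {
        self.store.write(key, value);
    }
}
```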
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:13-17 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17
Dependencies and Runtime
Core Dependencies
muxio-tokio-rpc-server
The muxio-tokio-rpc-server package provides the WebSocket server implementation built on top of:
- axum for HTTP/WebSocket handling
- tokio for the async runtime
- muxio-rpc-service-endpoint for RPC dispatch
graph TB
subgraph "tokio Runtime"
MULTI["Multi-threaded\nExecutor"]
REACTOR["Reactor\nI/O Events"]
TIMER["Timer\nTimeouts"]
end
subgraph "Server Tasks"
LISTENER["Listener Task\nAccept Connections"]
CONN1["Connection Task 1\nClient 1"]
CONN2["Connection Task 2\nClient 2"]
CONNN["Connection Task N\nClient N"]
end
MULTI --> LISTENER
MULTI --> CONN1
MULTI --> CONN2
MULTI --> CONNN
REACTOR --> LISTENER
REACTOR --> CONN1
REACTOR --> CONN2
REACTOR --> CONNN
Serialization
The service uses bitcode for binary serialization, shared through the simd-r-drive-muxio-service-definition package. Bitcode provides compact binary encoding with zero-copy deserialization where possible.
Async Runtime
The server runs on the tokio multi-threaded runtime, with each WebSocket connection handled by an independent task. This allows efficient concurrent handling of multiple clients.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:16-20
Logging and Observability
tracing Integration
The server uses tracing with tracing-subscriber for structured logging. The env-filter feature allows configuration via environment variables:
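A minimal initialization sketch using the env-filter feature; the exact setup in main.rs may differ, and the RUST_LOG value shown is only an example.

```rust
use tracing_subscriber::EnvFilter;

fn init_logging() {
    // Honors RUST_LOG, e.g. `RUST_LOG=info` or a per-target filter string.
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();
}
```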
Log Levels
| Level | Use Case |
|---|---|
trace | Detailed RPC message tracing |
debug | Connection lifecycle events |
info | Server startup, configuration, client connections |
warn | Non-fatal errors, retries |
error | Fatal errors, connection failures |
Key Trace Points
The server likely includes trace instrumentation at:
- Server initialization and configuration
- WebSocket connection establishment
- RPC method dispatch
- DataStore operation execution
- Error conditions and recovery
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:19-20
Building and Running
Build Command
Run Command Example
Development Mode
For development with detailed logging:
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23
Security Considerations
Network Exposure
The WebSocket server exposes the DataStore over the network. Important considerations:
- Authentication : The current implementation does not include authentication (experimental status)
- Encryption : WebSocket connections are not TLS-encrypted by default
- Access Control : No per-key or per-operation access control
- Network Binding : Binding to 0.0.0.0 exposes the server to all network interfaces
Recommended Production Practices
For production deployment, additional layers would be needed:
- TLS/SSL termination (via reverse proxy or native support)
- Authentication middleware
- Rate limiting
- Request validation
- Network-level access control (firewall rules)
Note: This is an experimental component and should not be used in production without additional security hardening.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-11
Integration with Core Storage
DataStore Access Pattern
The server maintains a single DataStore instance that is shared across all RPC handlers. Write operations serialize through the DataStore’s internal RwLock, while read operations can proceed concurrently through the lock-free DashMap index.
Thread Safety
The server implementation relies on the DataStore’s thread-safe design:
- Multiple concurrent reads via DashMap
- Serialized writes via RwLock
- Atomic tail offset tracking via AtomicU64
This allows multiple WebSocket connections to safely access the same DataStore instance without additional synchronization.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:14-15
Muxio RPC Framework
Relevant source files
- Cargo.lock
- experiments/simd-r-drive-muxio-service-definition/Cargo.toml
- experiments/simd-r-drive-ws-client/Cargo.toml
- experiments/simd-r-drive-ws-server/Cargo.toml
Purpose and Scope
This document describes the Muxio RPC (Remote Procedure Call) framework as implemented in SIMD R Drive for remote storage access over WebSocket connections. The framework provides a type-safe, multiplexed communication protocol using bitcode serialization for efficient binary data transfer.
For information about the WebSocket server implementation, see WebSocket Server. For the native Rust client implementation, see Native Rust Client. For Python client integration, see Python WebSocket Client API.
Sources: experiments/simd-r-drive-ws-server/Cargo.toml:1-23 experiments/simd-r-drive-ws-client/Cargo.toml:1-22 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17
Architecture Overview
The Muxio RPC framework consists of multiple layers that work together to provide remote procedure calls over WebSocket connections:
Muxio RPC Framework Layer Architecture
graph TB
subgraph "Client Application Layer"
App["Application Code"]
end
subgraph "Client RPC Stack"
Caller["muxio-rpc-service-caller\nMethod Invocation"]
ClientRuntime["muxio-tokio-rpc-client\nWebSocket Client Runtime"]
end
subgraph "Shared Contract"
ServiceDef["simd-r-drive-muxio-service-definition\nService Interface Contract\nMethod Signatures"]
Bitcode["bitcode\nBinary Serialization"]
end
subgraph "Server RPC Stack"
ServerRuntime["muxio-tokio-rpc-server\nWebSocket Server Runtime"]
Endpoint["muxio-rpc-service-endpoint\nRequest Router"]
end
subgraph "Server Application Layer"
Impl["DataStore Implementation"]
end
subgraph "Core Framework"
Core["muxio-rpc-service\nBase RPC Traits & Types"]
end
App --> Caller
Caller --> ClientRuntime
ClientRuntime --> ServiceDef
ClientRuntime --> Bitcode
ClientRuntime --> Core
ServiceDef --> Bitcode
ServiceDef --> Core
ServerRuntime --> ServiceDef
ServerRuntime --> Bitcode
ServerRuntime --> Core
ServerRuntime --> Endpoint
Endpoint --> Impl
style ServiceDef fill:#f9f9f9,stroke:#333,stroke-width:2px
The framework is organized into distinct layers:
| Layer | Crates | Responsibility |
|---|---|---|
| Core Framework | muxio-rpc-service | Base traits, types, and RPC protocol definitions |
| Service Definition | simd-r-drive-muxio-service-definition | Shared interface contract between client and server |
| Serialization | bitcode | Efficient binary encoding/decoding of messages |
| Client Runtime | muxio-tokio-rpc-client, muxio-rpc-service-caller | WebSocket client, method invocation, request management |
| Server Runtime | muxio-tokio-rpc-server, muxio-rpc-service-endpoint | WebSocket server, request routing, response handling |
Sources: Cargo.lock:1250-1336 experiments/simd-r-drive-ws-server/Cargo.toml:14-17 experiments/simd-r-drive-ws-client/Cargo.toml:14-21
Core Framework Components
muxio-rpc-service
The muxio-rpc-service crate provides the foundational abstractions for the RPC system. This crate defines the core traits and types that both client and server components build upon.
Core RPC Framework Message Structure and Dependencies
graph TB
subgraph "muxio-rpc-service Crate"
RpcService["#[async_trait]\nRpcService Trait"]
Request["RpcRequest\nStruct"]
Response["RpcResponse\nStruct"]
ServiceDef["Service Definition\nInfrastructure"]
end
subgraph "RpcRequest Fields"
ReqID["request_id: u64\n(unique per call)"]
MethodID["method_id: u64\n(xxhash-rust XXH3)"]
Payload["payload: Vec<u8>\n(bitcode serialized)"]
end
subgraph "RpcResponse Fields"
RespID["request_id: u64\n(matches request)"]
Result["result: Result<Vec<u8>, Error>\n(bitcode serialized)"]
end
subgraph "Dependencies"
AsyncTrait["async-trait"]
Futures["futures"]
NumEnum["num_enum"]
XXHash["xxhash-rust"]
end
RpcService -->|defines| ServiceDef
Request -->|contains| ReqID
Request -->|contains| MethodID
Request -->|contains| Payload
Response -->|contains| RespID
Response -->|contains| Result
RpcService -.uses.- AsyncTrait
MethodID -.hashed with.- XXHash
The muxio-rpc-service crate provides:
| Component | Type | Purpose |
|---|---|---|
RpcService | #[async_trait] trait | Defines async service interface with method dispatch |
RpcRequest | Struct | Contains request_id, method_id (XXH3 hash from xxhash-rust), and bitcode payload |
RpcResponse | Struct | Contains request_id and Result<Vec<u8>, Error> variant |
| Method ID hashing | xxhash-rust XXH3 | Generates stable 64-bit method identifiers |
| Enum conversion | num_enum | Converts between numeric and enum representations |
The framework uses async-trait to enable async methods in traits, and XXH3 hashing (via xxhash-rust) for method identification, allowing fast O(1) method dispatch without string comparisons.
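Illustrative shapes for the two message structs; the field names follow the table above, but the error type is simplified and nothing here is copied from the muxio-rpc-service source.

```rust
/// One in-flight call from client to server.
struct RpcRequest {
    request_id: u64,  // unique per call, used to match the response
    method_id: u64,   // XXH3 hash of the method signature
    payload: Vec<u8>, // bitcode-serialized arguments
}

/// The server's reply, routed back by request_id.
struct RpcResponse {
    request_id: u64,
    result: Result<Vec<u8>, String>, // bitcode payload or a simplified error
}
```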
Sources: Cargo.lock:1261-1272 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:15
Service Definition Layer
simd-r-drive-muxio-service-definition
The simd-r-drive-muxio-service-definition crate serves as the shared RPC contract between clients and servers. This crate is compiled into both client and server binaries, ensuring type-safe method signatures on both sides.
Service Definition Compilation Model
graph TB
subgraph "simd-r-drive-muxio-service-definition"
Contract["RPC Service Contract"]
Methods["Method Signatures"]
Types["Shared Types"]
end
subgraph "Client Binary"
ClientStub["Generated Client Stubs"]
end
subgraph "Server Binary"
ServerImpl["Generated Server Handlers"]
end
Contract --> Methods
Contract --> Types
Methods -->|compiled into| ClientStub
Methods -->|compiled into| ServerImpl
Types -->|used by| ClientStub
Types -->|used by| ServerImpl
ClientStub -->|invokes via| WS["WebSocket"]
WS -->|routes to| ServerImpl
The service definition provides the RPC interface contract. Both client and server depend on this crate, which defines:
| Component | Description | Implementation |
|---|---|---|
| Method signatures | DataStore operations (write, read, delete, etc.) | Uses muxio-rpc-service traits |
| Request types | Bitcode-serializable structs for each method | Implements bitcode::Encode |
| Response types | Bitcode-serializable result types | Implements bitcode::Decode |
| Error types | Shared error definitions | Serializable across RPC boundary |
Method ID Generation
Each RPC method is identified by a stable method_id computed as the XXH3 hash of its signature string. This enables O(1) method routing:
Method ID Computation and Routing with Code Entities
flowchart LR
Sig["Method Signature\n'write(key: &[u8], value: &[u8])\n-> Result<u64>'"]
XXH3["xxhash_rust::xxh3\nxxh3_64(sig.as_bytes())"]
ID["method_id: u64\ne.g., 0x1a2b3c4d5e6f7890"]
HashMap["HashMap<u64,\nBox<dyn Fn>>\nin RpcServiceEndpoint"]
Lookup["HashMap::get\n(&method_id)"]
Handler["async fn handler\n(decoded args)"]
Sig -->|hash at compile time| XXH3
XXH3 --> ID
ID -->|stored in| HashMap
HashMap -->|O 1 lookup| Lookup
Lookup --> Handler
The XXH3 hash (via xxhash-rust crate) ensures:
| Property | Implementation | Benefit |
|---|---|---|
| Deterministic routing | xxh3_64(signature.as_bytes()) | Same signature → same ID |
| Fast dispatch | HashMap::get(&method_id) | O(1) integer key lookup |
| Version compatibility | Different signatures → different IDs | Breaking changes detected |
| Collision resistance | 64-bit hash space (2^64 values) | Negligible collision probability |
| Compile-time computation | const or build-time hashing | No runtime overhead |
The xxhash-rust dependency provides the xxh3_64 function used by muxio-rpc-service for method ID generation. The server’s RpcServiceEndpoint struct maintains the HashMap<u64, Box<dyn Fn>> dispatcher.
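A sketch of signature hashing and HashMap dispatch, assuming the xxhash-rust crate; the handler type and the example signature string are illustrative.

```rust
use std::collections::HashMap;
use xxhash_rust::xxh3::xxh3_64;

type Handler = Box<dyn Fn(&[u8]) -> Vec<u8> + Send + Sync>;

/// The same signature string always hashes to the same 64-bit method_id.
fn method_id(signature: &str) -> u64 {
    xxh3_64(signature.as_bytes())
}

/// O(1) integer-keyed lookup replaces string comparison at dispatch time.
fn dispatch(handlers: &HashMap<u64, Handler>, id: u64, payload: &[u8]) -> Option<Vec<u8>> {
    handlers.get(&id).map(|handler| handler(payload))
}
```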
Sources: Cargo.lock:1261-1272 Cargo.lock:1905-1915 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-17
Bitcode Serialization
The framework uses the bitcode crate (version 0.6.6) for efficient binary serialization with the following characteristics:
graph LR
subgraph "Bitcode Serialization Pipeline"
RustType["Rust Type\n#[derive(Encode, Decode)]"]
Encode["bitcode::encode\n<T>(&value)"]
Binary["Vec<u8>\nCompact Binary"]
Decode["bitcode::decode\n<T>(&bytes)"]
RustType2["Rust Type\nReconstructed"]
end
subgraph "bitcode Dependencies"
BitcodeDerve["bitcode_derive\nproc macros"]
Bytemuck["bytemuck\nzero-copy casts"]
Arrayvec["arrayvec\nstack arrays"]
Glam["glam\nSIMD vectors"]
end
RustType -->|serialize| Encode
Encode --> Binary
Binary -->|deserialize| Decode
Decode --> RustType2
Encode -.uses.- BitcodeDerve
Encode -.uses.- Bytemuck
Decode -.uses.- BitcodeDerve
Decode -.uses.- Bytemuck
Serialization Features
Bitcode Encoding/Decoding Pipeline with Dependencies
| Feature | Implementation | Benefit |
|---|---|---|
| Zero-copy deserialization | bytemuck for Pod types | Minimal overhead for aligned data |
| Compact encoding | Variable-length integers, bit packing | Smaller than bincode/MessagePack |
| Type safety | #[derive(Encode, Decode)] proc macros | Compile-time serialization code |
| Performance | ~50ns per small struct | Lower CPU than JSON/CBOR |
| SIMD support | glam integration | Efficient vector serialization |
Integration with RPC
The serialization is integrated at multiple points:
| Integration Point | Operation | Code Path |
|---|---|---|
| Request serialization | bitcode::encode(&args) → Vec<u8> | Client RpcServiceCaller::call |
| Wire transfer | Vec<u8> in RpcRequest.payload | WebSocket binary message |
| Request deserialization | bitcode::decode::<Args>(&payload) | Server RpcServiceEndpoint::dispatch |
| Response serialization | bitcode::encode(&result) → Vec<u8> | Server after method execution |
| Response deserialization | bitcode::decode::<Result>(&payload) | Client response handler |
The use of #[derive(Encode, Decode)] on request/response types ensures compile-time validation of serialization compatibility.
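A minimal round-trip sketch with bitcode's derive macros; the WriteArgs struct is illustrative and assumes bitcode 0.6 with the derive feature enabled.

```rust
use bitcode::{Decode, Encode};

#[derive(Encode, Decode, PartialEq, Debug)]
struct WriteArgs {
    key: Vec<u8>,
    value: Vec<u8>,
}

fn main() {
    let args = WriteArgs { key: b"user:1".to_vec(), value: b"payload".to_vec() };
    let bytes: Vec<u8> = bitcode::encode(&args);               // compact binary
    let decoded: WriteArgs = bitcode::decode(&bytes).unwrap(); // Result<T, Error>
    assert_eq!(args, decoded);
}
```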
Sources: Cargo.lock:392-414 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14
Client-Side Components
flowchart TB
subgraph "Client Call Flow"
ClientApp["Client Application"]
Caller["RpcServiceCaller\nStruct"]
GenID["Generate request_id\n(AtomicU64::fetch_add)"]
Request["Create RpcRequest\nStruct"]
Serialize["bitcode::encode\n(method args)"]
Send["Send via\ntokio::sync::mpsc"]
Await["tokio::sync::oneshot\nawait response"]
Deserialize["bitcode::decode\n(response payload)"]
Return["Return Result\nto caller"]
end
ClientApp -->|async fn call| Caller
Caller --> GenID
GenID --> Request
Request --> Serialize
Serialize --> Send
Send --> Await
Await --> Deserialize
Deserialize --> Return
Return --> ClientApp
muxio-rpc-service-caller
The muxio-rpc-service-caller crate provides the client-side method invocation interface:
Client Method Invocation Flow with tokio Primitives
Key responsibilities and implementation:
| Responsibility | Implementation | Purpose |
|---|---|---|
| Method call marshalling | RpcServiceCaller struct | Provides typed interface to remote methods |
| Request ID generation | AtomicU64::fetch_add(1, Ordering::Relaxed) | Unique, monotonic request identifiers |
| Response awaiting | tokio::sync::oneshot::Receiver | Single-use channel for response delivery |
| Request queuing | tokio::sync::mpsc::Sender | Sends requests to send loop |
| Error propagation | Result<T, RpcError> return types | Type-safe error handling |
The caller uses tokio’s async primitives to coordinate between the application thread and the WebSocket send/receive loops.
Sources: Cargo.lock:1274-1285 experiments/simd-r-drive-ws-client/Cargo.toml:18
graph TB
subgraph "muxio-tokio-rpc-client Crate"
Client["RpcClient\nStruct"]
SendLoop["send_loop\ntokio::task::spawn"]
RecvLoop["recv_loop\ntokio::task::spawn"]
PendingMap["Arc<DashMap<u64,\noneshot::Sender<Result>>>\nShared state"]
ReqChan["mpsc::Receiver\n<RpcRequest>"]
end
subgraph "tokio-tungstenite Integration"
WS["WebSocketStream\n<MaybeTlsStream>"]
Split["ws.split()"]
WSRead["SplitStream\n(read half)"]
WSWrite["SplitSink\n(write half)"]
end
subgraph "Application Layer"
AppCall["async fn call()"]
Future["impl Future\n<Output=Result>"]
end
AppCall -->|1. create oneshot| Client
Client -->|2. insert into| PendingMap
Client -->|3. mpsc::send| ReqChan
ReqChan -->|4. recv request| SendLoop
SendLoop -->|5. bitcode::encode| SendLoop
SendLoop -->|6. send binary| WSWrite
WSRead -->|7. next binary| RecvLoop
RecvLoop -->|8. bitcode::decode| RecvLoop
RecvLoop -->|9. lookup by id| PendingMap
PendingMap -->|10. oneshot::send| Future
Future -->|11. return| AppCall
WS --> Split
Split --> WSRead
Split --> WSWrite
muxio-tokio-rpc-client
The muxio-tokio-rpc-client crate implements the WebSocket client runtime with request multiplexing and response routing:
Client Runtime Request Multiplexing with tokio and tungstenite
Implementation details:
| Component | Type | Purpose |
|---|---|---|
RpcClient | Struct | Main client interface, owns WebSocket and spawns tasks |
send_loop | tokio::task | Receives from mpsc, serializes, writes to SplitSink |
recv_loop | tokio::task | Reads from SplitStream, deserializes, routes via DashMap |
| Pending requests | Arc<DashMap<u64, oneshot::Sender>> | Thread-safe map for response routing |
| Request channel | mpsc::Sender/Receiver<RpcRequest> | Queue for outbound requests |
| WebSocket | tokio_tungstenite::WebSocketStream | Binary WebSocket with TLS support |
| Split streams | futures::stream::SplitStream/SplitSink | Separate read/write halves |
The multiplexing architecture uses DashMap for lock-free concurrent access to pending requests. The WebSocket stream is split into read and write halves, allowing the send_loop and recv_loop tasks to operate independently. Each request gets a unique request_id, and the recv_loop task matches responses back to waiting callers via oneshot channels.
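A hedged sketch of the pending-request bookkeeping, assuming the dashmap and tokio crates; the actual send/receive loops and types in muxio-tokio-rpc-client are more involved.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use dashmap::DashMap;
use tokio::sync::oneshot;

type Pending = Arc<DashMap<u64, oneshot::Sender<Vec<u8>>>>;

/// Register a fresh request_id and wait for the receive loop to route the
/// matching response back through the oneshot channel.
async fn await_response(pending: Pending, next_id: &AtomicU64) -> Option<Vec<u8>> {
    let request_id = next_id.fetch_add(1, Ordering::Relaxed);
    let (tx, rx) = oneshot::channel();
    pending.insert(request_id, tx);
    // ... the send loop would serialize and transmit the request here ...
    rx.await.ok()
}
```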
Sources: Cargo.lock:1302-1318 experiments/simd-r-drive-ws-client/Cargo.toml:16 Cargo.lock:681-693
Server-Side Components
graph TB
subgraph "muxio-tokio-rpc-server Crate"
Server["RpcServer\nStruct"]
AxumApp["axum::Router\nwith WebSocket route"]
AcceptLoop["tokio::spawn\n(per connection)"]
ConnHandler["handle_connection\nasync fn"]
Dispatcher["RpcServiceEndpoint\n<ServiceImpl>"]
end
subgraph "axum WebSocket Integration"
Route["GET /ws\nWebSocket upgrade"]
WSUpgrade["axum::extract::ws\nWebSocketUpgrade"]
WSStream["axum::extract::ws\nWebSocket"]
end
subgraph "Service Implementation"
ServiceImpl["Arc<ServiceImpl>\n(e.g., DataStore)"]
Methods["#[async_trait]\nRpcService methods"]
end
subgraph "Method Dispatch"
MethodMap["HashMap<u64,\nBox<dyn Fn>>\n(method_id → handler)"]
end
AxumApp -->|upgrade| WSUpgrade
WSUpgrade -->|on_upgrade| WSStream
WSStream -->|tokio::spawn| AcceptLoop
AcceptLoop --> ConnHandler
ConnHandler -->|recv Message::Binary| ConnHandler
ConnHandler -->|bitcode::decode| ConnHandler
ConnHandler -->|dispatch by id| MethodMap
MethodMap -->|invoke| Methods
Methods -.implemented by.- ServiceImpl
Methods -->|return Result| ConnHandler
ConnHandler -->|bitcode::encode| ConnHandler
ConnHandler -->|send Message::Binary| WSStream
Dispatcher -->|owns| MethodMap
Dispatcher -->|holds Arc| ServiceImpl
muxio-tokio-rpc-server
The muxio-tokio-rpc-server crate implements the WebSocket server runtime with connection management and request dispatching:
Server Runtime with axum WebSocket Integration
The server runtime architecture:
| Component | Type | Purpose |
|---|---|---|
| RpcServer | Struct | Main server, creates axum::Router with WebSocket route |
| axum::Router | HTTP router | Handles WebSocket upgrade at /ws endpoint |
| WebSocketUpgrade | axum::extract | Performs HTTP → WebSocket protocol upgrade |
| Connection handler | async fn per client | Spawned via tokio::spawn for each connection |
| RpcServiceEndpoint | Generic struct | Routes method_id to service methods via HashMap |
| Method dispatcher | HashMap<u64, Box<dyn Fn>> | O(1) lookup and async invocation of methods |
| Service implementation | Arc<ServiceImpl> | Shared DataStore instance across connections |
Request Processing Pipeline
Each incoming request follows this pipeline:
Server Request Processing Pipeline with Code Entities
The dispatcher performs O(1) method lookup using the method_id hash from the HashMap, then invokes the corresponding service implementation. All service methods use #[async_trait], allowing concurrent request handling. The use of Arc<ServiceImpl> enables safe sharing of the DataStore across multiple client connections.
Sources: Cargo.lock:1320-1336 experiments/simd-r-drive-ws-server/Cargo.toml:16 Cargo.lock:305-340
Request/Response Flow
Complete RPC Call Sequence
End-to-End RPC Call Flow
Message Format
The Muxio RPC wire protocol uses WebSocket binary frames with bitcode-encoded messages. The exact frame structure is managed by the muxio framework, but the logical message structure is:
| Component | Encoding | Description |
|---|---|---|
| Request message | bitcode | Contains request_id, method_id, and method arguments |
| Response message | bitcode | Contains request_id and result (success/error) |
| WebSocket frame | Binary | Single frame per request/response for small messages |
| Fragmentation | Automatic | Large payloads may use multiple frames |
The use of WebSocket binary frames and bitcode serialization provides:
- Compact encoding : Smaller than JSON or MessagePack
- Zero-copy potential : bitcode can deserialize without copying
- Type safety : Compile-time verification of message structure
Sources: Cargo.lock:133-143 Cargo.lock:648-656 Cargo.lock:1213-1222
Error Handling
The framework provides comprehensive error handling across the RPC boundary:
RPC Error Classification and Propagation
Error Categories
| Category | Origin | Handling |
|---|---|---|
| Serialization errors | Bitcode encoding/decoding failure | Logged and returned as RpcError |
| Network errors | WebSocket connection issues | Automatic reconnect or error propagation |
| Application errors | DataStore operation failures | Serialized and returned to client |
| Timeout errors | Request took too long | Client-side timeout with error result |
Error Recovery
The framework implements several recovery strategies:
- Connection loss : Client automatically attempts reconnection
- Request timeout : Client cancels pending request after configured duration
- Serialization failure : Error logged and generic error returned
- Invalid method ID : Server returns “method not found” error
Sources: Cargo.lock:1261-1336
Performance Characteristics
The Muxio RPC framework is optimized for high-performance remote storage access:
| Metric | Characteristic | Impact |
|---|---|---|
| Serialization overhead | ~50-100 ns for typical payloads | Minimal CPU impact |
| Request multiplexing | Thousands of concurrent requests | High throughput |
| Binary protocol | Compact wire format | Reduced bandwidth usage |
| Zero-copy deserialization | Direct memory references | Lower latency for large payloads |
The use of bitcode serialization and WebSocket binary frames minimizes overhead compared to text-based protocols like JSON over HTTP. The multiplexed architecture allows clients to issue multiple concurrent requests without blocking, essential for high-performance batch operations.
Sources: Cargo.lock:392-414 Cargo.lock:1250-1336
Native Rust Client
Relevant source files
- Cargo.lock
- experiments/simd-r-drive-muxio-service-definition/Cargo.toml
- experiments/simd-r-drive-ws-client/Cargo.toml
- experiments/simd-r-drive-ws-server/Cargo.toml
Purpose and Scope
The simd-r-drive-ws-client crate provides a native Rust client library for remote access to the SIMD R Drive storage engine via WebSocket connections. This client enables Rust applications to interact with a remote DataStore instance through the Muxio RPC framework with bitcode serialization.
This document covers the native Rust client implementation. For the WebSocket server that this client connects to, see WebSocket Server. For the Muxio RPC protocol details, see Muxio RPC Framework. For the Python bindings that wrap this client, see Python WebSocket Client API.
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:1-22
Architecture Overview
The native Rust client is structured as a thin wrapper around the Muxio RPC client infrastructure, providing type-safe access to remote DataStore operations.
Key Components:
graph TB
subgraph "Application Layer"
UserApp["User Application\nRust Code"]
end
subgraph "simd-r-drive-ws-client Crate"
ClientAPI["Client API\nDataStoreReader/Writer Traits"]
WsClient["WebSocket Client\nConnection Management"]
end
subgraph "RPC Infrastructure"
ServiceCaller["muxio-rpc-service-caller\nMethod Invocation"]
TokioRpcClient["muxio-tokio-rpc-client\nTransport Layer"]
end
subgraph "Shared Contract"
ServiceDef["simd-r-drive-muxio-service-definition\nRPC Interface"]
Bitcode["bitcode\nSerialization"]
end
subgraph "Network"
WsConnection["WebSocket Connection\ntokio-tungstenite"]
end
UserApp --> ClientAPI
ClientAPI --> WsClient
WsClient --> ServiceCaller
ServiceCaller --> TokioRpcClient
ServiceCaller --> ServiceDef
TokioRpcClient --> Bitcode
TokioRpcClient --> WsConnection
style ClientAPI fill:#f9f9f9,stroke:#333,stroke-width:2px
style ServiceDef fill:#f9f9f9,stroke:#333,stroke-width:2px
| Component | Crate | Purpose |
|---|---|---|
| Client API | simd-r-drive-ws-client | Public interface implementing DataStore traits |
| Service Caller | muxio-rpc-service-caller | RPC method invocation and request routing |
| RPC Client | muxio-tokio-rpc-client | WebSocket transport and message handling |
| Service Definition | simd-r-drive-muxio-service-definition | Shared RPC contract and type definitions |
| Async Runtime | tokio | Asynchronous I/O and task execution |
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:13-21 Cargo.lock:1302-1318
Client API Structure
The client implements the same DataStoreReader and DataStoreWriter traits as the local DataStore, enabling transparent remote access with minimal API differences.
Core Traits:
graph LR
subgraph "Trait Implementations"
Reader["DataStoreReader\nread()\nexists()\nbatch_read()"]
Writer["DataStoreWriter\nwrite()\ndelete()\nbatch_write()"]
end
subgraph "Client Implementation"
WsClient["WebSocket Client\nAsync Methods"]
ConnState["Connection State\nURL, Options"]
end
subgraph "RPC Layer"
Serializer["Request Serialization\nbitcode"]
Caller["Service Caller\nCall Routing"]
Deserializer["Response Deserialization\nbitcode"]
end
Reader --> WsClient
Writer --> WsClient
WsClient --> ConnState
WsClient --> Serializer
Serializer --> Caller
Caller --> Deserializer
style Reader fill:#f9f9f9,stroke:#333,stroke-width:2px
style Writer fill:#f9f9f9,stroke:#333,stroke-width:2px
- DataStoreReader : Read-only operations (read, exists, batch_read, iteration)
- DataStoreWriter : Write operations (write, delete, batch_write)
- async-trait : All methods are asynchronous, requiring a Tokio runtime
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:14-21 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-16
Connection Management
The client manages persistent WebSocket connections to the remote server with automatic reconnection and error handling.
Connection Lifecycle:
sequenceDiagram
participant App as "Application"
participant Client as "WebSocket Client"
participant Transport as "muxio-tokio-rpc-client"
participant Server as "Remote Server"
Note over App,Server: Connection Establishment
App->>Client: connect(url)
Client->>Transport: create WebSocket connection
Transport->>Server: WebSocket handshake
Server-->>Transport: connection established
Transport-->>Client: client ready
Client-->>App: connected client
Note over App,Server: Normal Operation
App->>Client: read(key)
Client->>Transport: serialize request
Transport->>Server: send via WebSocket
Server-->>Transport: response data
Transport-->>Client: deserialize response
Client-->>App: return result
Note over App,Server: Error Handling
Server-->>Transport: connection lost
Transport-->>Client: connection error
Client->>Transport: reconnection attempt
Transport->>Server: reconnect
- Initialization : Client connects to server URL with connection options
- Authentication : Optional authentication via Muxio RPC mechanisms
- Active State : Client maintains persistent WebSocket connection
- Error Recovery : Automatic reconnection on transient failures
- Shutdown : Graceful connection termination
Sources: Cargo.lock:1302-1318 experiments/simd-r-drive-ws-client/Cargo.toml:16-19
graph TB
subgraph "Client Side"
Method["Client Method Call\nread/write/delete"]
ReqBuilder["Request Builder\nCreate RPC Request"]
Serializer["bitcode Serialization\nBinary Encoding"]
Sender["WebSocket Send\nBinary Frame"]
end
subgraph "Network"
WsFrame["WebSocket Frame\nBinary Message"]
end
subgraph "Server Side"
Receiver["WebSocket Receive\nBinary Frame"]
Deserializer["bitcode Deserialization\nBinary Decoding"]
Handler["Request Handler\nExecute DataStore Operation"]
Response["Response Builder\nCreate RPC Response"]
end
Method --> ReqBuilder
ReqBuilder --> Serializer
Serializer --> Sender
Sender --> WsFrame
WsFrame --> Receiver
Receiver --> Deserializer
Deserializer --> Handler
Handler --> Response
Response --> Serializer
style Method fill:#f9f9f9,stroke:#333,stroke-width:2px
style Handler fill:#f9f9f9,stroke:#333,stroke-width:2px
Request-Response Flow
All client operations follow a standardized request-response pattern through the Muxio RPC framework.
Request Structure:
| Field | Type | Description |
|---|---|---|
| Method ID | u64 | XXH3 hash of method name from service definition |
| Payload | Vec<u8> | Bitcode-serialized request parameters |
| Request ID | u64 | Unique identifier for request-response matching |
Response Structure:
| Field | Type | Description |
|---|---|---|
| Request ID | u64 | Matches original request ID |
| Status | enum | Success, Error, or specific error codes |
| Payload | Vec<u8> | Bitcode-serialized response data or error |
Sources: experiments/simd-r-drive-muxio-service-definition/Cargo.toml:14-15 Cargo.lock:392-402
Async Runtime Requirements
The client requires a Tokio async runtime for all operations. The async-trait crate enables async methods in trait implementations.
Runtime Configuration:
graph TB
subgraph "Application"
Main["#[tokio::main]\nasync fn main()"]
UserCode["User Code\nawait client.read()"]
end
subgraph "Client"
AsyncMethods["async-trait Methods\nDataStoreReader/Writer"]
TokioTasks["Tokio Tasks\nNetwork I/O"]
end
subgraph "Tokio Runtime"
Executor["Task Executor\nWork Stealing Scheduler"]
Reactor["I/O Reactor\nepoll/kqueue/IOCP"]
end
Main --> UserCode
UserCode --> AsyncMethods
AsyncMethods --> TokioTasks
TokioTasks --> Executor
TokioTasks --> Reactor
style AsyncMethods fill:#f9f9f9,stroke:#333,stroke-width:2px
style Executor fill:#f9f9f9,stroke:#333,stroke-width:2px
- Multi-threaded Runtime : Default for concurrent operations
- Current-thread Runtime : Available for single-threaded use cases
- Feature Flags : Requires tokio with the rt-multi-thread and net features
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:19-21 Cargo.lock:279-287
Error Handling
The client propagates errors from multiple layers of the stack, providing detailed error information for debugging and recovery.
Error Types:
| Error Category | Source | Description |
|---|---|---|
| Connection Errors | muxio-tokio-rpc-client | WebSocket connection failures, timeouts |
| Serialization Errors | bitcode | Invalid data encoding/decoding |
| RPC Errors | muxio-rpc-service | Service method errors, invalid requests |
| DataStore Errors | Remote DataStore | Storage operation failures (key not found, write errors) |
Error Propagation Flow:
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:17-18 Cargo.lock:1261-1271
Usage Patterns
Basic Connection and Operations
The client follows standard Rust async patterns for initialization and operation:
Concurrent Operations
The client supports concurrent operations through standard Tokio concurrency primitives:
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:19-21
graph TB
subgraph "Service Definition"
Methods["Method Definitions\nread, write, delete, etc."]
Types["Request/Response Types\nbitcode derive macros"]
MethodHash["Method ID Hashing\nXXH3 of method names"]
end
subgraph "Client Usage"
ClientImpl["Client Implementation\nUses defined methods"]
TypeSafety["Type Safety\nCompile-time checking"]
end
subgraph "Server Usage"
ServerImpl["Server Implementation\nHandles defined methods"]
Routing["Request Routing\nHash-based dispatch"]
end
Methods --> ClientImpl
Methods --> ServerImpl
Types --> ClientImpl
Types --> ServerImpl
MethodHash --> Routing
ClientImpl --> TypeSafety
style Methods fill:#f9f9f9,stroke:#333,stroke-width:2px
style Types fill:#f9f9f9,stroke:#333,stroke-width:2px
Integration with Service Definition
The client relies on the shared service definition crate for type-safe RPC communication.
Shared Contract Benefits:
- Type Safety : Compile-time verification of request/response types
- Version Compatibility : Client and server must use compatible service definitions
- Method Resolution : XXH3 hash-based method identification
- Serialization Schema : Consistent bitcode encoding across client and server
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:15 experiments/simd-r-drive-muxio-service-definition/Cargo.toml:1-16
Performance Considerations
The native Rust client provides several performance advantages over alternative approaches:
Performance Characteristics:
| Aspect | Implementation | Benefit |
|---|---|---|
| Serialization | bitcode binary encoding | Minimal overhead, faster than JSON/MessagePack |
| Connection | Persistent WebSocket | Avoids HTTP handshake overhead |
| Async I/O | Tokio zero-copy operations | Efficient memory usage |
| Type Safety | Compile-time generics | Zero runtime type checking cost |
| Multiplexing | Muxio request pipelining | Multiple concurrent requests per connection |
Memory Efficiency:
- Zero-copy where possible through bitcode and WebSocket frames
- Efficient buffer reuse in Tokio’s I/O layer
- Minimal allocation overhead compared to HTTP-based protocols
Throughput:
- Supports request pipelining for high-throughput workloads
- Concurrent operations through Tokio’s work-stealing scheduler
- Batch operations reduce round-trip overhead
Sources: Cargo.lock:392-402 Cargo.lock:1302-1318
Comparison with Direct Access
The WebSocket client provides remote access with different tradeoffs compared to direct DataStore usage:
| Feature | Direct DataStore | WebSocket Client |
|---|---|---|
| Access Pattern | Local file I/O | Network I/O over WebSocket |
| Zero-Copy Reads | Yes (via mmap) | No (serialized over network) |
| Latency | Microseconds | Milliseconds (network dependent) |
| Concurrency | Multi-process safe | Network-limited |
| Deployment | Single machine | Distributed architecture |
| Security | File system permissions | Network authentication |
When to Use the Client:
- Remote access to centralized storage
- Microservice architectures requiring shared state
- Language interoperability (via Python bindings)
- Isolation of storage from compute workloads
When to Use Direct Access:
- Single-machine deployments
- Latency-critical applications
- Maximum throughput requirements
- Zero-copy read performance needs
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:14
Logging and Debugging
The client uses the tracing crate for structured logging and diagnostics.
Logging Levels:
- TRACE : Detailed RPC message contents and serialization
- DEBUG : Connection state changes, request/response flow
- INFO : Connection establishment, disconnection events
- WARN : Recoverable errors, retry attempts
- ERROR : Unrecoverable errors, connection failures
Diagnostic Information:
- Request/response timing
- Serialized message sizes
- Connection state transitions
- Error context and stack traces
Sources: experiments/simd-r-drive-ws-client/Cargo.toml:20 Cargo.lock:279-287
Python Integration
Relevant source files
- experiments/bindings/python-ws-client/Cargo.lock
- experiments/bindings/python-ws-client/README.md
- experiments/bindings/python-ws-client/extract_readme_tests.py
- experiments/bindings/python-ws-client/integration_test.sh
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
- experiments/bindings/python-ws-client/uv.lock
- src/storage_engine/key_indexer.rs
This page provides an overview of the Python bindings for SIMD R Drive. The system offers two approaches for Python integration:
- Modern WebSocket Client (simd-r-drive-ws-client-py): Communicates with a remote simd-r-drive-ws-server via WebSocket RPC. This is the primary, recommended approach documented in this section.
- Legacy Direct Bindings (simd-r-drive-py): Directly embeds the Rust storage engine into Python. This approach is deprecated and not covered in detail here.
The WebSocket client bindings are implemented in Rust using PyO3, compiled to native Python extension modules (.so/.pyd), and distributed as platform-specific wheels via Maturin. The package is published as simd-r-drive-ws-client on PyPI.
Related Pages:
- Python WebSocket Client API - Detailed API reference
- Building Python Bindings - Build and packaging instructions
- Integration Testing - Testing infrastructure and workflows
Sources: experiments/bindings/python-ws-client/README.md:1-60 experiments/bindings/python-ws-client/pyproject.toml:1-6
Architecture Overview
The WebSocket client bindings use a layered architecture that bridges Python user code to the native Rust WebSocket client implementation. The package consists of pure Python wrapper code, PyO3-compiled Rust bindings, and the underlying simd-r-drive-ws-client Rust crate.
Diagram: Python Binding Architecture with Code Entities
graph TB
subgraph "Python_Layer"
UserCode["user_script.py"]
Import["from simd_r_drive_ws_client import DataStoreWsClient"]
end
subgraph "Package_simd_r_drive_ws_client"
InitPy["__init__.py"]
DataStoreWsClientPy["data_store_ws_client.py::DataStoreWsClient"]
TypeStubs["data_store_ws_client.pyi"]
end
subgraph "PyO3_Native_Extension"
BinaryModule["simd_r_drive_ws_client.so / .pyd"]
BaseDataStoreWsClient["BaseDataStoreWsClient"]
NamespaceHasher["NamespaceHasher"]
end
subgraph "Rust_Dependencies"
WsClient["simd-r-drive-ws-client crate"]
MuxioClient["muxio-tokio-rpc-client"]
ServiceDef["simd-r-drive-muxio-service-definition"]
end
UserCode --> Import
Import --> InitPy
InitPy --> DataStoreWsClientPy
DataStoreWsClientPy -.inherits.-> BaseDataStoreWsClient
DataStoreWsClientPy -.types.-> TypeStubs
BaseDataStoreWsClient --> WsClient
NamespaceHasher --> WsClient
BinaryModule --> BaseDataStoreWsClient
BinaryModule --> NamespaceHasher
WsClient --> MuxioClient
WsClient --> ServiceDef
Architecture Layers
| Layer | Components | Technology | Location |
|---|---|---|---|
| Python User Code | Application scripts | Pure Python | User-provided |
| Python Package | DataStoreWsClient, __init__.py | Pure Python | experiments/bindings/python-ws-client/simd_r_drive_ws_client/ |
| PyO3 Bindings | BaseDataStoreWsClient, NamespaceHasher | Rust → compiled .so/.pyd | experiments/bindings/python-ws-client/src/lib.rs |
| Rust Implementation | simd-r-drive-ws-client, muxio-* | Native Rust crates | experiments/ws-client/ |
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-14 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:1-63 experiments/bindings/python-ws-client/README.md:12-15
Python API Surface
The simd-r-drive-ws-client package exposes two primary classes:
- DataStoreWsClient - Main client for read/write operations
- NamespaceHasher - Utility for generating collision-free namespaced keys
graph TB
subgraph "Python_Wrapper"
DSWsClient["data_store_ws_client.py::DataStoreWsClient"]
NSHasher["NamespaceHasher"]
end
subgraph "PyO3_Bindings"
BaseClient["BaseDataStoreWsClient"]
NSHasherImpl["NamespaceHasher_impl"]
end
subgraph "Method_Sources"
RustMethods["write()\nbatch_write()\ndelete()\nread()\nbatch_read()\nexists()\n__len__()\n__contains__()\nis_empty()\nfile_size()"]
PythonMethods["batch_read_structured()"]
end
DSWsClient -->|inherits| BaseClient
NSHasher -->|exposed via PyO3| NSHasherImpl
BaseClient --> RustMethods
DSWsClient --> PythonMethods
RustMethods -.implemented in.-> WsClientCrate["simd-r-drive-ws-client"]
The API is implemented through a combination of Rust PyO3 bindings (BaseDataStoreWsClient) and Python wrapper code that adds convenience methods.
Diagram: Class Hierarchy and Method Implementation
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:11-63 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:8-219
Core Operations
DataStoreWsClient provides operations organized by implementation layer:
| Operation Type | Methods | Implementation | File Reference |
|---|---|---|---|
| Write Operations | write(), batch_write(), delete() | Rust (BaseDataStoreWsClient) | data_store_ws_client.pyi:27-141 |
| Read Operations | read(), batch_read(), exists() | Rust (BaseDataStoreWsClient) | data_store_ws_client.pyi:53-107 |
| Metadata Operations | __len__(), __contains__(), is_empty(), file_size() | Rust (BaseDataStoreWsClient) | data_store_ws_client.pyi:143-168 |
| Structured Reads | batch_read_structured() | Python wrapper | data_store_ws_client.py:12-62 |
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:27-168 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:11-63
Python-Rust Method Mapping
Diagram: Method Call Flow from Python to Rust
The batch_read_structured() method demonstrates the hybrid approach:
| Step | Layer | Action |
|---|---|---|
| 1. Decompile | Python | Extract flat list of keys from nested dict/list structure |
| 2. Batch read | Rust | Call fast batch_read() via PyO3 |
| 3. Rebuild | Python | Reconstruct original structure with fetched values |
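As a rough Python sketch of this flow (illustrative only; the real wrapper in data_store_ws_client.py also accepts lists of dictionaries and handles additional edge cases), the hybrid approach looks roughly like:

```python
def batch_read_structured_sketch(client, data: dict) -> dict:
    """Illustrative sketch of the decompile / batch-read / rebuild steps."""
    # 1. Decompile: collect the datastore keys referenced by the dict values
    fields = list(data.keys())
    keys = [data[field] for field in fields]
    # 2. Batch read: a single round trip through the Rust batch_read() binding
    values = client.batch_read(keys)
    # 3. Rebuild: reattach the fetched payloads to the original field names
    return {field: value for field, value in zip(fields, values)}
```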
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:12-62 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:109-129
PyO3 Binding Architecture
PyO3 provides the FFI layer that exposes Rust structs and methods as Python classes. The binding layer uses PyO3 procedural macros (#[pyclass], #[pymethods]) to generate Python-compatible wrappers around Rust types.
Diagram: PyO3 Macro Transformation Pipeline
graph TB
subgraph "Rust_Source_Code"
StructDef["#[pyclass]\nstruct BaseDataStoreWsClient"]
MethodsDef["#[pymethods]\nimpl BaseDataStoreWsClient"]
NSStruct["#[pyclass]\nstruct NamespaceHasher"]
NSMethods["#[pymethods]\nimpl NamespaceHasher"]
end
subgraph "PyO3_Macro_Expansion"
PyClassTrait["PyClass trait\nPyTypeInfo\nPyObjectProtocol"]
PyMethodsWrap["Method wrappers\nPyArg extraction\nResult conversion"]
end
subgraph "Python_Extension_Module"
PythonClass["BaseDataStoreWsClient\nwrite()\nread()\nbatch_write()"]
PythonNS["NamespaceHasher\n__init__()\nnamespace()"]
end
StructDef --> PyClassTrait
MethodsDef --> PyMethodsWrap
NSStruct --> PyClassTrait
NSMethods --> PyMethodsWrap
PyClassTrait --> PythonClass
PyMethodsWrap --> PythonClass
PyClassTrait --> PythonNS
PyMethodsWrap --> PythonNS
PyO3 Macro Functions
| Macro | Purpose | Generated Code |
|---|---|---|
| #[pyclass] | Mark Rust struct as Python class | Implements PyTypeInfo, PyClass, reference counting |
| #[pymethods] | Expose Rust methods to Python | Generates wrapper functions with argument extraction and error handling |
| #[pyfunction] | Expose standalone Rust functions | Module-level function bindings |
Sources: experiments/bindings/python-ws-client/Cargo.lock:832-846 experiments/bindings/python-ws-client/Cargo.lock:1096-1108
graph TB
subgraph "Python_Async_Layer"
PyAsyncCall["await client.write(key, data)"]
PyEventLoop["asyncio event loop"]
end
subgraph "pyo3_async_runtimes_Bridge"
Bridge["pyo3_async_runtimes::tokio"]
FutureConv["Future<Output=T> → PyObject"]
LocalSet["LocalSet spawning"]
end
subgraph "Tokio_Runtime"
TokioFuture["async fn write() → Future"]
TokioExecutor["Tokio thread pool"]
end
PyAsyncCall --> PyEventLoop
PyEventLoop --> Bridge
Bridge --> FutureConv
FutureConv --> LocalSet
LocalSet --> TokioFuture
TokioFuture --> TokioExecutor
Async Runtime Bridge
The Python bindings use pyo3-async-runtimes to bridge Python’s async/await model with Rust’s Tokio runtime. This enables Python code to call async Rust methods transparently.
Diagram: Python-Tokio Async Bridge
Runtime Bridge Components
| Component | Crate | Function |
|---|---|---|
| pyo3-async-runtimes | Cargo.lock:849-860 | Async bridge between Python and Tokio |
| tokio | Cargo.lock:1287-1308 | Rust async runtime |
| PyO3 | Cargo.lock:832-846 | FFI layer for Python-Rust interop |
The bridge automatically converts Rust Future<Output=T> values to Python awaitables, handling the differences in execution models between Python’s single-threaded async and Tokio’s work-stealing scheduler.
Sources: experiments/bindings/python-ws-client/Cargo.lock:849-860 experiments/bindings/python-ws-client/Cargo.lock:1287-1308
graph LR
subgraph "Configuration"
PyProject["pyproject.toml\n[build-system]\nbuild-backend = maturin"]
CargoToml["Cargo.toml\n[lib]\ncrate-type = ['cdylib']"]
end
subgraph "Build_Process"
RustcCompile["rustc\n--crate-type=cdylib\nPyO3 linking"]
CreateExtension["simd_r_drive_ws_client.so\nor .pyd"]
PackageWheel["maturin build\nAdd Python files\nAdd metadata"]
end
subgraph "Artifacts"
Wheel["simd_r_drive_ws_client-0.11.1-cp310-linux_x86_64.whl"]
PyPI["PyPI\npip install simd-r-drive-ws-client"]
end
PyProject --> RustcCompile
CargoToml --> RustcCompile
RustcCompile --> CreateExtension
CreateExtension --> PackageWheel
PackageWheel --> Wheel
Wheel --> PyPI
Build and Distribution System
The Python package is built using Maturin, which compiles Rust code to native extensions and packages them as platform-specific wheels. The build process produces binary wheels containing the compiled .so (Linux/macOS) or .pyd (Windows) extension module.
Diagram: Maturin Build and Distribution Pipeline
Sources: experiments/bindings/python-ws-client/pyproject.toml:29-35 experiments/bindings/python-ws-client/README.md:25-38
Build Configuration
pyproject.toml configures the build system and package metadata:
| Section | Lines | Configuration |
|---|---|---|
| [project] | pyproject.toml:1-27 | Package name, version, description, PyPI classifiers |
| [build-system] | pyproject.toml:29-31 | requires = ["maturin>=1.5"], build-backend = "maturin" |
| [tool.maturin] | pyproject.toml:33-35 | bindings = "pyo3", requires-python = ">=3.10" |
| [dependency-groups] | pyproject.toml:37-46 | Development dependencies: maturin, pytest, mypy, numpy |
Build Commands
Sources: experiments/bindings/python-ws-client/pyproject.toml:1-47 experiments/bindings/python-ws-client/README.md:31-36
Platform and Python Version Support
The package is distributed as pre-compiled wheels for multiple Python versions and platforms.
Supported Configurations
| Component | Supported Versions/Platforms |
|---|---|
| Python | 3.10, 3.11, 3.12, 3.13 (CPython only) |
| Operating Systems | Linux (x86_64, aarch64), macOS (x86_64, arm64), Windows (x86_64) |
| Architectures | 64-bit only |
Wheel Naming Convention
simd_r_drive_ws_client-{version}-{python_tag}-{platform_tag}.whl
Examples:
- simd_r_drive_ws_client-0.11.1-cp310-cp310-manylinux_2_17_x86_64.whl
- simd_r_drive_ws_client-0.11.1-cp312-cp312-macosx_11_0_arm64.whl
- simd_r_drive_ws_client-0.11.1-cp313-cp313-win_amd64.whl
Sources: experiments/bindings/python-ws-client/pyproject.toml:19-27 experiments/bindings/python-ws-client/README.md:18-23
Dependency Management
The package uses uv for Python dependency management and cargo for Rust dependencies. Runtime Python dependencies are zero—all Rust dependencies are statically compiled into the wheel.
Dependency Categories
| Category | Tools | Lock File | Purpose |
|---|---|---|---|
| Python Development | uv pip, pytest, mypy | uv.lock:1-299 | Testing, type checking, benchmarking |
| Rust Dependencies | cargo | Cargo.lock:1-1380 | Core functionality, WebSocket RPC, serialization |
| Build Tools | maturin | Both lock files | Compiles Rust → Python extension |
Key Development Dependencies
Key Rust Dependencies
| Crate | Version | Purpose |
|---|---|---|
| pyo3 | Cargo.lock:832-846 | Python FFI |
| pyo3-async-runtimes | Cargo.lock:849-860 | Async bridge |
| tokio | Cargo.lock:1287-1308 | Async runtime |
| simd-r-drive-ws-client | (workspace) | WebSocket RPC client |
Sources: experiments/bindings/python-ws-client/pyproject.toml:37-46 experiments/bindings/python-ws-client/uv.lock:1-299 experiments/bindings/python-ws-client/Cargo.lock:1-1380
Type Stubs and IDE Support
The package includes .pyi type stub files that provide complete type information for IDEs and static type checkers like mypy.
Type Stub File: data_store_ws_client.pyi
Type Stub Features
| Feature | Description | Example |
|---|---|---|
| Full signatures | Complete method signatures with types | def write(self, key: bytes, data: bytes) -> None |
| Docstrings | Comprehensive documentation | data_store_ws_client.pyi:27-94 |
| Generic types | Support for complex types | Union[Dict[Any, bytes], List[Dict[Any, bytes]]] |
| Final classes | Prevent subclassing | @final class DataStoreWsClient |
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219
graph LR
Input1["Namespace prefix\ne.g., b'users'"]
Input2["Key\ne.g., b'user123'"]
Hash1["XXH3 hash\n8 bytes"]
Hash2["XXH3 hash\n8 bytes"]
Output["Namespaced key\n16 bytes total"]
Input1 -->|hash once at init| Hash1
Input2 -->|hash per call| Hash2
Hash1 --> Output
Hash2 --> Output
graph LR
subgraph "Input"
Prefix["prefix: bytes\ne.g. b'users'"]
Key["key: bytes\ne.g. b'user123'"]
end
subgraph "Hashing"
XXH3_Prefix["XXH3(prefix)"]
XXH3_Key["XXH3(key)"]
end
subgraph "Output"
PrefixHash["8 bytes\nprefix_hash"]
KeyHash["8 bytes\nkey_hash"]
Combined["16 bytes total\nprefix_hash // key_hash"]
end
Prefix --> XXH3_Prefix
Key --> XXH3_Key
XXH3_Prefix --> PrefixHash
XXH3_Key --> KeyHash
PrefixHash --> Combined
KeyHash --> Combined
NamespaceHasher Utility
NamespaceHasher provides deterministic key namespacing using XXH3 hashing to prevent key collisions across logical domains.
Diagram: Namespace Key Derivation
Usage Example
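A minimal usage sketch combining NamespaceHasher with DataStoreWsClient; the host, port, and keys below are placeholders:

```python
from simd_r_drive_ws_client import DataStoreWsClient, NamespaceHasher

client = DataStoreWsClient("127.0.0.1", 34129)  # placeholder address

users = NamespaceHasher(b"users")        # prefix is hashed once at init
sessions = NamespaceHasher(b"sessions")

# The same raw key maps to different 16-byte namespaced keys,
# so the two logical domains cannot collide in the shared store.
client.write(users.namespace(b"id-42"), b"alice")
client.write(sessions.namespace(b"id-42"), b"token-abc")
```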
Key Properties
| Property | Value | Description |
|---|---|---|
| Output length | 16 bytes | Fixed-size namespaced key |
| Hash function | XXH3 | Fast, high-quality 64-bit hash |
| Collision resistance | High | XXH3 provides strong distribution |
| Deterministic | Yes | Same input always produces same output |
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:170-219 src/storage_engine/key_indexer.rs:64-72
graph TB
subgraph "integration_test.sh"
Setup["cd experiments/\nBuild if needed"]
StartServer["cargo run --package\nsimd-r-drive-ws-server\n/tmp/simd-r-drive-pytest-storage.bin\n--host 127.0.0.1 --port 34129"]
SetupPython["uv venv\nuv pip install pytest maturin\nuv pip install -e . --group dev"]
ExtractTests["python extract_readme_tests.py"]
RunPytest["pytest -v -s\nTEST_SERVER_HOST=127.0.0.1\nTEST_SERVER_PORT=34129"]
Cleanup["kill -9 $SERVER_PID\nrm /tmp/simd-r-drive-pytest-storage.bin"]
end
Setup --> StartServer
StartServer --> SetupPython
SetupPython --> ExtractTests
ExtractTests --> RunPytest
RunPytest --> Cleanup
Integration Test Infrastructure
The Python bindings include comprehensive integration tests that validate the entire stack, from Python client code to the WebSocket server and storage engine.
Diagram: Integration Test Workflow
Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91
Test Components
The test infrastructure consists of multiple components working together:
| Component | File | Purpose |
|---|---|---|
| Integration script | integration_test.sh:1-91 | Orchestrates full-stack test execution |
| README test extractor | extract_readme_tests.py:1-46 | Converts README code blocks to pytest functions |
| Generated tests | tests/test_readme_blocks.py | Executable tests from README examples |
| Manual tests | tests/test_*.py | Hand-written unit and integration tests |
Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91 experiments/bindings/python-ws-client/extract_readme_tests.py:1-46
graph LR
subgraph "Input_File"
README["README.md"]
CodeBlocks["```python\ncode\n```"]
end
subgraph "Extraction_Logic"
Regex["re.compile(r'```python\\n(.*?)```', re.DOTALL)"]
Extract["pattern.findall(text)"]
Wrap["def test_readme_block_{i}():\n {indented_code}"]
end
subgraph "Output_File"
TestFile["tests/test_readme_blocks.py"]
TestFunctions["test_readme_block_0()\ntest_readme_block_1()\ntest_readme_block_N()"]
end
README --> CodeBlocks
CodeBlocks --> Regex
Regex --> Extract
Extract --> Wrap
Wrap --> TestFile
TestFile --> TestFunctions
README Test Extraction
extract_readme_tests.py automatically extracts Python code blocks from the README and generates pytest test functions, ensuring documentation examples remain accurate.
Diagram: README to Pytest Pipeline
Extraction Process
| Step | Function | Action |
|---|---|---|
| 1. Read | README.read_text() | Load README.md as string |
| 2. Extract | re.findall(r'```python\n(.*?)```') | Find all Python code blocks |
| 3. Wrap | wrap_as_test_fn(code, idx) | Convert each block to test_readme_block_N() |
| 4. Write | TEST_FILE.write_text() | Write tests/test_readme_blocks.py |
This ensures documentation examples are automatically validated on every test run, preventing drift between documentation and implementation.
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:14-45
graph TB
subgraph "Internal Modules"
RustBinary["simd_r_drive_ws_client_py.so/.pyd\nBinary compiled module"]
RustSymbols["BaseDataStoreWsClient\nNamespaceHasher\nsetup_logging\ntest_rust_logging"]
PythonWrapper["data_store_ws_client.py\nDataStoreWsClient"]
end
subgraph "Package __init__.py"
ImportRust["from .simd_r_drive_ws_client import\n setup_logging, test_rust_logging"]
ImportPython["from .data_store_ws_client import\n DataStoreWsClient, NamespaceHasher"]
AllList["__all__ = [\n 'DataStoreWsClient',\n 'NamespaceHasher',\n 'setup_logging',\n 'test_rust_logging'\n]"]
end
subgraph "Public API"
UserCode["from simd_r_drive_ws_client import DataStoreWsClient"]
end
RustBinary --> RustSymbols
RustSymbols --> ImportRust
PythonWrapper --> ImportPython
ImportRust --> AllList
ImportPython --> AllList
AllList --> UserCode
Package Exports and Public API
The package’s public API is defined through the __init__.py file, which controls what symbols are available when users import the package.
Export Structure
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-14
The __all__ list explicitly defines the public API surface, preventing internal implementation details from being accidentally imported by users. This follows Python best practices for package design.
Python WebSocket Client API
Relevant source files
- experiments/bindings/python-ws-client/README.md
- experiments/bindings/python-ws-client/extract_readme_tests.py
- experiments/bindings/python-ws-client/integration_test.sh
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
- experiments/bindings/python-ws-client/uv.lock
- src/storage_engine/key_indexer.rs
Purpose and Scope
This document describes the Python WebSocket client API for remote access to SIMD R Drive storage. The API provides idiomatic Python interfaces backed by high-performance Rust implementations via PyO3 bindings. This page covers the DataStoreWsClient class, NamespaceHasher utility, and their usage patterns.
For information about building and installing the Python bindings, see Building Python Bindings. For details about the underlying native Rust WebSocket client, see Native Rust Client. For server-side configuration and deployment, see WebSocket Server.
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py:1-14 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219
Architecture Overview
The Python WebSocket client uses a multi-layer architecture that bridges Python’s async/await with Rust’s Tokio runtime while maintaining idiomatic Python APIs.
graph TB
UserCode["Python User Code\nimport simd_r_drive_ws_client"]
DataStoreWsClient["DataStoreWsClient\nPython wrapper class"]
BaseDataStoreWsClient["BaseDataStoreWsClient\nPyO3 #[pyclass]"]
PyO3FFI["PyO3 FFI Layer\npyo3-async-runtimes"]
RustClient["simd-r-drive-ws-client\nNative Rust implementation"]
MuxioRPC["muxio-tokio-rpc-client\nWebSocket + RPC"]
Server["simd-r-drive-ws-server\nRemote DataStore"]
UserCode --> DataStoreWsClient
DataStoreWsClient --> BaseDataStoreWsClient
BaseDataStoreWsClient --> PyO3FFI
PyO3FFI --> RustClient
RustClient --> MuxioRPC
MuxioRPC --> Server
Python Integration Stack
Sources: experiments/bindings/python-ws-client/Cargo.lock:1096-1108 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:1-10
Class Hierarchy
The BaseDataStoreWsClient class is implemented in Rust and exposes core storage operations through PyO3. The DataStoreWsClient Python class extends it with additional convenience methods implemented in pure Python.
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-10 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:11-62
DataStoreWsClient Class
The DataStoreWsClient class provides the primary interface for interacting with a remote SIMD R Drive storage engine over WebSocket connections.
Connection Initialization
| Constructor | Description |
|---|---|
| __init__(host: str, port: int) | Establishes WebSocket connection to the specified server |
The constructor creates a WebSocket connection to the remote storage server. Connection establishment is synchronous and will raise an exception if the server is unreachable.
Example:
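A minimal connection sketch; the address below is a placeholder for a running simd-r-drive-ws-server instance:

```python
from simd_r_drive_ws_client import DataStoreWsClient

# Raises an exception if no server is listening at this address
client = DataStoreWsClient("127.0.0.1", 34129)
```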
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:17-25
Write Operations
| Method | Parameters | Description |
|---|---|---|
| write(key, data) | key: bytes, data: bytes | Appends single key/value pair |
| batch_write(items) | items: list[tuple[bytes, bytes]] | Writes multiple pairs in one operation |
Write operations are append-only and atomic. If a key already exists, writing to it creates a new version while the old data remains on disk (marked as superseded via the index).
Example:
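A short sketch of both write paths, assuming the client from the previous example; keys and values are arbitrary placeholders:

```python
# Single append-only write
client.write(b"config:theme", b"dark")

# Batch write: several key/value pairs in one operation
client.batch_write([
    (b"user:1", b"alice"),
    (b"user:2", b"bob"),
])

# Writing an existing key appends a new version; the old payload
# stays on disk but is superseded in the index.
client.write(b"config:theme", b"light")
```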
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:27-51
Read Operations
| Method | Parameters | Return Type | Copy Behavior |
|---|---|---|---|
| read(key) | key: bytes | Optional[bytes] | Performs memory copy |
| batch_read(keys) | keys: list[bytes] | list[Optional[bytes]] | Performs memory copy |
| batch_read_structured(data) | data: dict or list[dict] | Same structure with values | Python-side wrapper |
The read and batch_read methods perform memory copies when returning data. For zero-copy access patterns, the native Rust client provides read_entry methods that return memory-mapped views.
The batch_read_structured method is a Python convenience wrapper that:
- Accepts dictionaries or lists of dictionaries where values are datastore keys
- Flattens the structure into a single key list
- Calls batch_read for efficient parallel fetching
- Reconstructs the original structure with fetched values
Example:
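A sketch of the read paths, continuing the placeholder data written above:

```python
value = client.read(b"user:1")                       # b"alice", or None if absent
values = client.batch_read([b"user:1", b"user:9"])   # e.g. [b"alice", None]

# Structured read: dict values are datastore keys; the same structure
# is returned with the fetched payloads substituted in.
profile = client.batch_read_structured({
    "name": b"user:1",
    "theme": b"config:theme",
})
# e.g. {"name": b"alice", "theme": b"light"}
```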
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:79-129 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py:12-62
Deletion and Existence Checks
| Method | Parameters | Return Type | Description |
|---|---|---|---|
| delete(key) | key: bytes | None | Marks key as deleted (tombstone) |
| exists(key) | key: bytes | bool | Checks if key is active |
| __contains__(key) | key: bytes | bool | Python in operator support |
Deletion is logical, not physical. The delete method appends a tombstone entry to the storage file. The physical data remains on disk but is no longer accessible through reads.
Example:
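A sketch of tombstone deletion and the existence checks, using placeholder keys:

```python
client.write(b"temp", b"scratch")
assert client.exists(b"temp")
assert b"temp" in client              # __contains__

client.delete(b"temp")                # appends a tombstone entry
assert not client.exists(b"temp")
assert client.read(b"temp") is None   # physical data remains on disk
```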
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:53-77 experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:131-141
Utility Methods
| Method | Return Type | Description |
|---|---|---|
| __len__() | int | Returns count of active entries |
| is_empty() | bool | Checks if store has any active keys |
| file_size() | int | Returns physical file size on disk |
Example:
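A short sketch of the metadata helpers:

```python
print(len(client))          # count of active (non-deleted) entries
print(client.is_empty())    # True only if no active keys remain
print(client.file_size())   # physical size of the backing file, in bytes
```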
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:143-168
NamespaceHasher Utility
The NamespaceHasher class provides deterministic key namespacing using XXH3 hashing to prevent key collisions across logical domains.
graph LR
Input1["Namespace prefix\ne.g., b'users'"]
Input2["Key\ne.g., b'user123'"]
Hash1["XXH3 hash\n8 bytes"]
Hash2["XXH3 hash\n8 bytes"]
Output["Namespaced key\n16 bytes total"]
Input1 -->|hash once at init| Hash1
Input2 -->|hash per call| Hash2
Hash1 --> Output
Hash2 --> Output
Architecture
Usage Pattern
| Method | Parameters | Return Type | Description |
|---|---|---|---|
| __init__(prefix) | prefix: bytes | N/A | Initializes hasher with namespace |
| namespace(key) | key: bytes | bytes | Returns 16-byte namespaced key |
The output key structure is:
- Bytes 0-7: XXH3 hash of namespace prefix
- Bytes 8-15: XXH3 hash of input key
This design ensures:
- Deterministic key generation (same input → same output)
- Collision isolation between namespaces
- Fixed-length keys regardless of input size
Example:
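A sketch of the properties above; prefixes and keys are placeholders:

```python
from simd_r_drive_ws_client import NamespaceHasher

users = NamespaceHasher(b"users")

key = users.namespace(b"user123")
assert len(key) == 16                         # 8-byte prefix hash + 8-byte key hash
assert key == users.namespace(b"user123")     # deterministic

other = NamespaceHasher(b"orders").namespace(b"user123")
assert key != other                           # namespaces are isolated
```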
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:170-219
graph TB
Stub["data_store_ws_client.pyi\nType definitions"]
Impl["data_store_ws_client.py\nImplementation"]
Base["simd_r_drive_ws_client\nCompiled Rust module"]
Stub -.->|describes| Impl
Impl -->|imports from| Base
Stub -.->|describes| Base
Type Stubs and IDE Support
The package provides complete type stubs for IDE integration and static type checking.
Type Stub Structure
The type stubs (experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219) provide:
- Full method signatures with type annotations
- Return type information (Optional[bytes], list[Optional[bytes]], etc.)
- Docstrings for IDE hover documentation
- @final decorators indicating classes cannot be subclassed
Type Checking Example
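A small sketch of how the stubs interact with a static type checker such as mypy; the address and key are placeholders:

```python
from typing import Optional

from simd_r_drive_ws_client import DataStoreWsClient

client = DataStoreWsClient("127.0.0.1", 34129)

value: Optional[bytes] = client.read(b"some-key")  # matches the stubbed return type
if value is not None:
    print(value.decode())

# An annotation such as `value2: bytes = client.read(b"some-key")`
# would be flagged by mypy, since read() may return None.
```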
Python Version Support:
The package targets Python 3.10-3.13 as specified in experiments/bindings/python-ws-client/pyproject.toml:7 and experiments/bindings/python-ws-client/pyproject.toml:21-24
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219 experiments/bindings/python-ws-client/pyproject.toml:7 experiments/bindings/python-ws-client/pyproject.toml:19-27
graph TB
PythonMain["Python Main Thread\nSynchronous API calls"]
PyO3["PyO3 Bridge\npyo3-async-runtimes"]
TokioRT["Tokio Runtime\nAsync event loop"]
WSClient["WebSocket Client\ntokio-tungstenite"]
PythonMain -->|sync call| PyO3
PyO3 -->|spawn + block_on| TokioRT
TokioRT --> WSClient
WSClient -.->|result| TokioRT
TokioRT -.->|return| PyO3
PyO3 -.->|return| PythonMain
Async Runtime Bridging
The client uses pyo3-async-runtimes to bridge Python’s async/await with Rust’s Tokio runtime. This allows the underlying Rust WebSocket client to use native async I/O while exposing synchronous APIs to Python.
Runtime Architecture
The pyo3-async-runtimes crate (experiments/bindings/python-ws-client/Cargo.lock:849-860) provides:
- Runtime spawning: Manages Tokio runtime lifecycle
- Future blocking: Converts Rust async operations to Python-blocking calls
- Thread safety: Ensures proper synchronization between Python GIL and Rust runtime
This design allows Python code to use simple synchronous APIs while benefiting from Rust’s high-performance async networking under the hood.
Sources: experiments/bindings/python-ws-client/Cargo.lock:849-860 experiments/bindings/python-ws-client/Cargo.lock:1096-1108
API Summary
Complete Method Reference
| Category | Method | Parameters | Return | Description |
|---|---|---|---|---|
| Connection | __init__ | host: str, port: int | N/A | Establish WebSocket connection |
| Write | write | key: bytes, data: bytes | None | Write single entry |
| Write | batch_write | items: list[tuple[bytes, bytes]] | None | Write multiple entries |
| Read | read | key: bytes | Optional[bytes] | Read single entry (copies) |
| Read | batch_read | keys: list[bytes] | list[Optional[bytes]] | Read multiple entries |
| Read | batch_read_structured | data: dict or list[dict] | Same structure | Read with structure preservation |
| Delete | delete | key: bytes | None | Mark key as deleted |
| Query | exists | key: bytes | bool | Check key existence |
| Query | __contains__ | key: bytes | bool | Python in operator |
| Info | __len__ | N/A | int | Active entry count |
| Info | is_empty | N/A | bool | Check if empty |
| Info | file_size | N/A | int | Physical file size |
NamespaceHasher Reference
| Method | Parameters | Return | Description |
|---|---|---|---|
| __init__ | prefix: bytes | N/A | Initialize namespace |
| namespace | key: bytes | bytes | Generate 16-byte namespaced key |
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:1-219
Building Python Bindings
Relevant source files
- experiments/bindings/python-ws-client/Cargo.lock
- experiments/bindings/python-ws-client/README.md
- experiments/bindings/python-ws-client/extract_readme_tests.py
- experiments/bindings/python-ws-client/integration_test.sh
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
- experiments/bindings/python-ws-client/uv.lock
- src/storage_engine/key_indexer.rs
Purpose and Scope
This page describes the build system, tooling, and workflow for generating Python bindings for the SIMD R Drive storage engine. It covers the PyO3/Maturin build pipeline, dependency management with uv, local development workflows, and wheel distribution. For the Python API surface and usage patterns, see Python WebSocket Client API. For CI/CD workflows and automated release processes, see CI/CD Pipeline.
Build System Architecture
The Python bindings are built using PyO3 (Rust-Python FFI) and Maturin (build backend and wheel generator). The uv tool replaces traditional pip and venv for faster, more reliable dependency management.
Build Pipeline Overview
graph TB
subgraph "Source Code"
RUST_LIB["experiments/bindings/python-ws-client/src/lib.rs\nPyO3 FFI Layer"]
PY_WRAPPER["simd_r_drive_ws_client/data_store_ws_client.py\nPython Wrapper"]
TYPE_STUBS["simd_r_drive_ws_client/data_store_ws_client.pyi\nType Hints"]
end
subgraph "Build Configuration"
PYPROJECT["pyproject.toml\nProject Metadata + Build Backend"]
CARGO_TOML["Cargo.toml\nRust Dependencies"]
UV_LOCK["uv.lock\nPinned Python Deps"]
end
subgraph "Build Tools"
PYO3["PyO3 0.25.1\nRust-Python Bridge"]
MATURIN["Maturin 1.8.7\nBuild System"]
UV["uv\nDependency Manager"]
end
subgraph "Build Outputs"
NATIVE_LIB["simd_r_drive_ws_client.so/.pyd\nNative Extension Module"]
WHEEL["simd_r_drive_ws_client-*.whl\nDistributable Package"]
end
RUST_LIB --> PYO3
PY_WRAPPER --> MATURIN
TYPE_STUBS --> MATURIN
PYPROJECT --> MATURIN
CARGO_TOML --> PYO3
UV_LOCK --> UV
PYO3 --> MATURIN
MATURIN --> NATIVE_LIB
MATURIN --> WHEEL
UV --> MATURIN
NATIVE_LIB -.packaged into.-> WHEEL
Sources: experiments/bindings/python-ws-client/pyproject.toml:29-36 experiments/bindings/python-ws-client/Cargo.lock:832-905
Project Configuration Files
pyproject.toml Structure
The pyproject.toml file defines project metadata, build system requirements, and development dependencies.
| Section | Purpose | Key Configuration |
|---|---|---|
| [project] | Package metadata | name, version, requires-python = ">=3.10" |
| [build-system] | Build backend | requires = ["maturin>=1.5"], build-backend = "maturin" |
| [tool.maturin] | Maturin settings | bindings = "pyo3", requires-python = ">=3.10" |
| [dependency-groups] | Dev dependencies | maturin, pytest, mypy, numpy |
Key Configuration Entries:
The bindings = "pyo3" directive tells Maturin to compile Rust code using PyO3’s FFI macros and generate a native Python extension module.
Sources: experiments/bindings/python-ws-client/pyproject.toml:1-46
Cargo Dependencies
The Rust side declares dependencies for PyO3, async runtime bridging, and core storage functionality:
| Dependency | Version | Purpose |
|---|---|---|
| pyo3 | 0.25.1 | Rust-Python FFI with #[pyclass], #[pyfunction] macros |
| pyo3-async-runtimes | 0.25.0 | Bridges Python asyncio with Rust tokio |
| tokio | 1.45.1 | Async runtime for WebSocket client |
| simd-r-drive-ws-client | 0.15.5-alpha | Native Rust WebSocket client |
Sources: experiments/bindings/python-ws-client/Cargo.lock:832-905 experiments/bindings/python-ws-client/Cargo.lock:849-860
Dependency Management with uv
The project uses uv instead of traditional pip for significantly faster dependency resolution and installation. uv is an all-in-one replacement for pip, pip-tools, and virtualenv.
uv Workflow Diagram
graph LR
subgraph "Traditional pip"
PIP_VENV["python -m venv"]
PIP_INSTALL["pip install"]
PIP_LOCK["pip freeze > requirements.txt"]
end
subgraph "uv Workflow"
UV_VENV["uv venv"]
UV_SYNC["uv pip install -e . --group dev"]
UV_LOCK["uv.lock\nAuto-generated"]
UV_RUN["uv run pytest"]
end
UV_VENV --> UV_SYNC
UV_SYNC --> UV_LOCK
UV_LOCK --> UV_RUN
Development Dependencies
Development dependencies are specified using the [dependency-groups] table in pyproject.toml:
This group can be installed with:
Sources: experiments/bindings/python-ws-client/pyproject.toml:37-46 experiments/bindings/python-ws-client/integration_test.sh:70-77
Building from Source
Local Development Build
The fastest way to build and install bindings for local development is using maturin develop:
Build Process Details:
The develop command creates an editable installation , meaning changes to Python wrapper code (experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py) take effect immediately without reinstalling. Rust code changes require re-running maturin develop.
Sources: experiments/bindings/python-ws-client/README.md:32-36 experiments/bindings/python-ws-client/integration_test.sh:74-77
Release Wheel Build
To build distributable wheels for PyPI:
Maturin automatically:
- Compiles Rust code with --release optimizations
- Generates a platform-specific wheel filename (e.g., cp310-cp310-linux_x86_64)
- Bundles the native extension, Python wrappers, and type stubs
- Creates the wheel in target/wheels/
Platform-Specific Wheels:
| Platform | Wheel Tag Example | Notes |
|---|---|---|
| Linux x86_64 | cp310-cp310-manylinux_2_17_x86_64 | Built with manylinux2014 for compatibility |
| macOS x86_64 | cp310-cp310-macosx_10_12_x86_64 | Requires macOS 10.12+ |
| macOS ARM64 | cp310-cp310-macosx_11_0_arm64 | M1/M2 Macs |
| Windows x64 | cp310-cp310-win_amd64 | MSVC toolchain |
Sources: experiments/bindings/python-ws-client/uv.lock:117-130
Integration Testing Infrastructure
The project includes an automated integration test system that:
- Extracts code examples from
README.md - Starts the WebSocket server
- Runs pytest against live server
- Cleans up resources
graph TB
subgraph "integration_test.sh"
START["Start script"]
REGISTER_CLEANUP["Register cleanup trap"]
BUILD_SERVER["cargo run --package simd-r-drive-ws-server"]
START_SERVER["Start server in background\nPID captured"]
SETUP_UV["uv venv\nuv pip install"]
EXTRACT_TESTS["uv run extract_readme_tests.py"]
RUN_PYTEST["uv run pytest -v -s"]
CLEANUP["Kill server PID\nRemove storage file"]
end
subgraph "extract_readme_tests.py"
READ_README["Read README.md"]
EXTRACT_BLOCKS["Regex: ```python...```"]
WRAP_TEST_FN["Wrap as test_readme_block_N()"]
WRITE_TEST_FILE["Write tests/test_readme_blocks.py"]
end
START --> REGISTER_CLEANUP
REGISTER_CLEANUP --> BUILD_SERVER
BUILD_SERVER --> START_SERVER
START_SERVER --> SETUP_UV
SETUP_UV --> EXTRACT_TESTS
EXTRACT_TESTS --> READ_README
READ_README --> EXTRACT_BLOCKS
EXTRACT_BLOCKS --> WRAP_TEST_FN
WRAP_TEST_FN --> WRITE_TEST_FILE
WRITE_TEST_FILE --> RUN_PYTEST
RUN_PYTEST --> CLEANUP
Test Orchestration Flow
Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91
README Test Extraction
The extract_readme_tests.py script converts documentation examples into executable tests:
Extraction Logic:
- Pattern Matching: Uses the regex r"```python\n(.*?)```" to extract fenced code blocks (experiments/bindings/python-ws-client/extract_readme_tests.py:22-24)
- Sanitization: Strips non-ASCII characters to avoid encoding issues (experiments/bindings/python-ws-client/extract_readme_tests.py:26-28)
- Test Wrapping: Wraps each block in a test_readme_block_{i}() function (experiments/bindings/python-ws-client/extract_readme_tests.py:30-34)
- File Generation: Writes to tests/test_readme_blocks.py (experiments/bindings/python-ws-client/extract_readme_tests.py:36-42)
Example Transformation: each fenced Python block in README.md (the input) is wrapped into a numbered test function in tests/test_readme_blocks.py (the output).
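For a hypothetical README block that writes and reads back one key, the generated output would look roughly like this (contents are illustrative, not taken from the actual README):

```python
# tests/test_readme_blocks.py (generated)
def test_readme_block_0():
    from simd_r_drive_ws_client import DataStoreWsClient

    client = DataStoreWsClient("127.0.0.1", 34129)
    client.write(b"example-key", b"example-value")
    assert client.read(b"example-key") == b"example-value"
```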
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:14-45
Integration Test Server Management
The test script manages the WebSocket server lifecycle:
Server Startup Sequence:
Cleanup Trap:
The script registers a cleanup() function that executes on exit (success or failure):
This ensures the server never remains running after tests complete, even if pytest crashes.
Sources: experiments/bindings/python-ws-client/integration_test.sh:17-33 experiments/bindings/python-ws-client/integration_test.sh:47-56
Wheel Distribution and CI Integration
Maturin Wheel Building
Maturin generates platform-specific binary wheels that include the compiled Rust extension. Each wheel is tagged with Python version, ABI, and platform identifiers.
Wheel Naming Convention:
{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
Example: simd_r_drive_ws_client-0.11.1-cp310-cp310-manylinux_2_17_x86_64.whl
| Component | Value | Meaning |
|---|---|---|
| cp310 | CPython 3.10 | Python implementation and version |
| cp310 | CPython 3.10 ABI | ABI compatibility tag |
| manylinux_2_17 | glibc 2.17+ | Minimum Linux C library version |
| x86_64 | x86-64 | CPU architecture |
Sources: experiments/bindings/python-ws-client/uv.lock:117-130
Build Matrix Configuration
The CI system (see CI/CD Pipeline) builds wheels for multiple platforms using a matrix strategy, producing wheels for:
- Linux: manylinux2014 x86_64, aarch64
- macOS: x86_64, arm64 (universal2)
- Windows: win_amd64, win32, win_arm64
CI Workflow Reference:
The GitHub Actions workflow at .github/workflows/python-net-release.yml orchestrates multi-platform builds, running maturin build --release across the matrix configurations.
Sources: experiments/bindings/python-ws-client/README.md38
PyO3 Feature Configuration
Async Runtime Bridge
The bindings use pyo3-async-runtimes to bridge Python’s asyncio event loop with Rust’s tokio runtime:
This bridges methods that internally execute Rust async code into the Python API; the wrapper methods handle the await on the Rust side, presenting a synchronous interface to Python users.
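A minimal sketch of this pattern is shown below. It uses a plain tokio runtime owned by the wrapper type instead of the pyo3-async-runtimes helpers, and the WsClient and read names are illustrative rather than the actual binding API.

```rust
use pyo3::prelude::*;

/// Illustrative wrapper type; the real bindings expose a richer client.
#[pyclass]
struct WsClient {
    runtime: tokio::runtime::Runtime,
}

#[pymethods]
impl WsClient {
    /// Synchronous from Python's point of view: the Rust side blocks on
    /// the async request and returns the finished result.
    fn read(&self, _key: Vec<u8>) -> PyResult<Option<Vec<u8>>> {
        self.runtime.block_on(async {
            // ... await the WebSocket RPC call against the server here ...
            Ok::<Option<Vec<u8>>, PyErr>(None)
        })
    }
}
```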
Sources: experiments/bindings/python-ws-client/Cargo.lock:849-860
Type Stub Generation
Type stubs (.pyi files) provide IDE autocomplete and mypy type checking. They are manually maintained to match the PyO3 API.
These stubs are packaged into the wheel alongside the native extension, enabling static type checking without runtime overhead.
Sources: experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi:7-169
Development Workflow Summary
Recommended Development Cycle:
Quick Command Reference:
| Task | Command |
|---|---|
| Setup environment | uv venv && uv pip install -e . --group dev |
| Build debug | maturin develop |
| Build release | maturin develop --release |
| Run tests | uv run pytest -v |
| Type check | uv run mypy . |
| Build wheel | maturin build --release |
| Full integration test | ./integration_test.sh |
Sources: experiments/bindings/python-ws-client/integration_test.sh:70-87 experiments/bindings/python-ws-client/README.md:32-36
Integration Testing
Relevant source files
- experiments/bindings/python-ws-client/README.md
- experiments/bindings/python-ws-client/extract_readme_tests.py
- experiments/bindings/python-ws-client/integration_test.sh
- experiments/bindings/python-ws-client/uv.lock
Purpose and Scope
This document describes the integration testing infrastructure for the Python WebSocket client bindings. The integration test suite validates that the Python client (simd-r-drive-ws-client) can successfully communicate with the WebSocket server (simd-r-drive-ws-server) over a real network connection. The tests automatically extract and validate code examples from the README documentation to ensure documentation accuracy.
For information about building Python bindings, see Building Python Bindings. For details on the Python WebSocket Client API itself, see Python WebSocket Client API.
Test Architecture Overview
Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91 experiments/bindings/python-ws-client/extract_readme_tests.py:1-46
Test Orchestration Script
The integration test suite is orchestrated by the integration_test.sh Bash script, which manages the complete test lifecycle including server startup, environment setup, test execution, and cleanup.
Script Configuration
| Configuration Variable | Default Value | Purpose |
|---|---|---|
EXPERIMENTS_DIR_REL_PATH | ../../ | Relative path to experiments directory |
SERVER_PACKAGE_NAME | simd-r-drive-ws-server | Cargo package name for server |
STORAGE_FILE | /tmp/simd-r-drive-pytest-storage.bin | Temporary storage file path |
SERVER_HOST | 127.0.0.1 | Server bind address |
SERVER_PORT | 34129 | Server listen port |
Sources: experiments/bindings/python-ws-client/integration_test.sh:8-14
Execution Flow
Sources: experiments/bindings/python-ws-client/integration_test.sh:35-90
Cleanup Mechanism
The script registers a cleanup() function with trap cleanup EXIT to ensure resources are released regardless of how the script terminates:
- Process Group Termination: Uses kill -9 "-$SERVER_PID" to kill the entire process group, ensuring the server and any child processes are stopped
- Storage File Removal: Deletes the temporary storage file at /tmp/simd-r-drive-pytest-storage.bin
- Error Suppression: Uses || true to prevent cleanup failures from failing the script
Sources: experiments/bindings/python-ws-client/integration_test.sh:18-30 experiments/bindings/python-ws-client/integration_test.sh:32-33
README Example Extraction
The extract_readme_tests.py script automatically generates pytest test functions from Python code blocks in the README documentation, ensuring that documented examples remain functional.
graph LR
subgraph "Input"
README[README.md\nPython code blocks]
end
subgraph "Extraction Functions"
EXTRACT[extract_python_blocks]
REGEX["re.compile pattern\n```python...```"]
STRIP[strip_non_ascii]
WRAP[wrap_as_test_fn]
end
subgraph "Output"
TEST_FN["def test_readme_block_{i}"]
TEST_FILE[tests/test_readme_blocks.py]
end
README --> EXTRACT
EXTRACT --> REGEX
REGEX --> STRIP
STRIP --> WRAP
WRAP --> TEST_FN
TEST_FN --> TEST_FILE
Extraction Process
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:15-42
Key Functions
| Function | Input | Output | Purpose |
|---|---|---|---|
extract_python_blocks | README text | list[str] | Uses regex r"```python\n(.*?)```" to extract code blocks |
strip_non_ascii | Code string | ASCII string | Removes non-ASCII characters using encode("ascii", errors="ignore") |
wrap_as_test_fn | Code string, index | Test function string | Wraps code in def test_readme_block_{idx}(): with proper indentation |
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:21-34
File Paths and Constants
The script uses fixed file paths defined at module level:
README = Path("README.md")- Source documentation fileTEST_FILE = Path("tests/test_readme_blocks.py")- Generated test file
The generated test file includes a header comment indicating it is auto-generated and imports pytest.
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:18-19 experiments/bindings/python-ws-client/extract_readme_tests.py:40-41
Test Execution with pytest
The test suite uses pytest as the test runner, executed through the uv run command to ensure correct virtual environment activation.
pytest Configuration
The test execution command uses specific flags:
| Flag | Purpose |
|---|---|
-v | Verbose output showing individual test names |
-s | Disable output capture (show print statements) |
Sources: experiments/bindings/python-ws-client/integration_test.sh87
Environment Variables
The test suite relies on environment variables to locate the running server:
| Variable | Source | Usage |
|---|---|---|
TEST_SERVER_HOST | $SERVER_HOST from script | Server IP address for client connection |
TEST_SERVER_PORT | $SERVER_PORT from script | Server port for client connection |
These variables are exported before pytest execution so test code can access the server endpoint.
Sources: experiments/bindings/python-ws-client/integration_test.sh:84-85
Test Function Generation
Each Python code block from README.md becomes an isolated test function following the pattern:
def test_readme_block_0():
<indented code from README>
def test_readme_block_1():
<indented code from README>
The test functions are numbered sequentially starting from 0. Each test runs independently with its own test context.
Sources: experiments/bindings/python-ws-client/extract_readme_tests.py:30-34
graph TB
subgraph "Server Startup Sequence"
CD[cd to experiments dir]
SET_M["set -m\nEnable job control"]
CARGO["cargo run --package simd-r-drive-ws-server"]
ARGS["-- $STORAGE_FILE --host $HOST --port $PORT"]
BG["& (background)"]
CAPTURE["SERVER_PID=$!"]
UNSET_M["set +m\nDisable job control"]
end
CD --> SET_M
SET_M --> CARGO
CARGO --> ARGS
ARGS --> BG
BG --> CAPTURE
CAPTURE --> UNSET_M
style CARGO fill:#f9f9f9
style CAPTURE fill:#f9f9f9
Server Lifecycle Management
The integration test script manages the WebSocket server lifecycle to provide a clean test environment for each run.
Server Startup
The script uses set -m to enable job control before starting the server, which allows proper PID capture of background processes. After capturing the PID, job control is disabled with set +m.
Sources: experiments/bindings/python-ws-client/integration_test.sh:47-56
Server Configuration
The server is started with the following arguments:
- Storage File: Positional argument specifying the data file path (/tmp/simd-r-drive-pytest-storage.bin)
- --host: Bind address (127.0.0.1 for localhost-only access)
- --port: Listen port (34129 for test isolation)
These arguments are passed after the -- separator to distinguish cargo arguments from application arguments.
Sources: experiments/bindings/python-ws-client/integration_test.sh53
Server Termination
The cleanup function terminates the server using process group kill:
- -9: SIGKILL signal for forceful termination
- "-$SERVER_PID": Negative PID to kill the entire process group
- 2>/dev/null: Suppress error messages
- || true: Prevent script failure if the process has already exited
Sources: experiments/bindings/python-ws-client/integration_test.sh25
graph TB
subgraph "uv Environment Setup"
CHECK["command -v uv"]
VENV["uv venv"]
INSTALL_BASE["uv pip install pytest maturin"]
INSTALL_DEV["uv pip install -e . --group dev"]
end
subgraph "Python Dependencies"
PYTEST[pytest]
MATURIN[maturin]
DEV_DEPS[Development dependencies\nfrom pyproject.toml]
end
subgraph "Lock File"
UV_LOCK[uv.lock]
ANYIO[anyio]
HTTPX[httpx]
HTTPCORE[httpcore]
NUMPY[numpy]
MYPY[mypy]
end
CHECK --> VENV
VENV --> INSTALL_BASE
INSTALL_BASE --> PYTEST
INSTALL_BASE --> MATURIN
INSTALL_BASE --> INSTALL_DEV
INSTALL_DEV --> DEV_DEPS
UV_LOCK -.resolves.-> ANYIO
UV_LOCK -.resolves.-> HTTPX
UV_LOCK -.resolves.-> HTTPCORE
UV_LOCK -.resolves.-> NUMPY
UV_LOCK -.resolves.-> MYPY
style CHECK fill:#f9f9f9
style VENV fill:#f9f9f9
style UV_LOCK fill:#f9f9f9
Environment Setup with uv
The test suite uses the uv Python package manager for fast, reliable dependency management and virtual environment creation.
Dependency Resolution
Sources: experiments/bindings/python-ws-client/integration_test.sh:62-77 experiments/bindings/python-ws-client/uv.lock:1-7
uv Commands
| Command | Purpose |
|---|---|
uv venv | Creates a virtual environment in .venv directory |
uv pip install --quiet pytest maturin | Installs test runner and build tool |
uv pip install -e . --group dev | Installs package in editable mode with dev dependencies |
uv run <command> | Executes command in virtual environment context |
The --quiet flag suppresses installation progress output for cleaner logs.
Sources: experiments/bindings/python-ws-client/integration_test.sh:70-80
Dependency Lock File
The uv.lock file pins exact versions and hashes for all dependencies:
| Package | Version | Purpose |
|---|---|---|
pytest | Latest | Test framework |
maturin | 1.8.7+ | PyO3 build system |
anyio | 4.9.0+ | Async I/O foundation |
httpx | 0.28.1+ | HTTP client (WebSocket support) |
mypy | 1.16.1+ | Static type checking |
numpy | 2.2.6+/2.3.0+ | Numerical computing (conditional) |
The lock file uses resolution markers to handle different Python versions (e.g., python_full_version >= '3.11').
Sources: experiments/bindings/python-ws-client/uv.lock:1-7 experiments/bindings/python-ws-client/uv.lock:110-130 experiments/bindings/python-ws-client/uv.lock:133-169
uv Availability Check
Before proceeding with environment setup, the script validates that uv is installed using command -v uv.
This check ensures clear error messages if the prerequisite tool is missing.
Sources: experiments/bindings/python-ws-client/integration_test.sh:62-68
graph TB
START["integration_test.sh start"]
subgraph "Phase 1: Setup"
CD_EXPERIMENTS[cd experiments/]
BUILD_SERVER["cargo run --package simd-r-drive-ws-server &"]
CAPTURE_PID[Capture SERVER_PID]
end
subgraph "Phase 2: Environment"
CD_CLIENT[cd bindings/python-ws-client]
CHECK_UV[Check uv availability]
CREATE_VENV[uv venv]
INSTALL_DEPS[uv pip install pytest maturin]
INSTALL_EDITABLE["uv pip install -e . --group dev"]
end
subgraph "Phase 3: Test Generation"
RUN_EXTRACT[uv run extract_readme_tests.py]
PARSE_README[Parse README.md]
GENERATE_TESTS[Generate tests/test_readme_blocks.py]
end
subgraph "Phase 4: Test Execution"
EXPORT_ENV[Export TEST_SERVER_HOST/PORT]
RUN_PYTEST["uv run pytest -v -s"]
EXECUTE_TESTS[Execute test_readme_block_* functions]
end
subgraph "Phase 5: Cleanup"
TRAP_EXIT[trap EXIT triggers]
KILL_SERVER["kill -9 -$SERVER_PID"]
REMOVE_FILE["rm -f $STORAGE_FILE"]
end
START --> CD_EXPERIMENTS
CD_EXPERIMENTS --> BUILD_SERVER
BUILD_SERVER --> CAPTURE_PID
CAPTURE_PID --> CD_CLIENT
CD_CLIENT --> CHECK_UV
CHECK_UV --> CREATE_VENV
CREATE_VENV --> INSTALL_DEPS
INSTALL_DEPS --> INSTALL_EDITABLE
INSTALL_EDITABLE --> RUN_EXTRACT
RUN_EXTRACT --> PARSE_README
PARSE_README --> GENERATE_TESTS
GENERATE_TESTS --> EXPORT_ENV
EXPORT_ENV --> RUN_PYTEST
RUN_PYTEST --> EXECUTE_TESTS
EXECUTE_TESTS --> TRAP_EXIT
TRAP_EXIT --> KILL_SERVER
KILL_SERVER --> REMOVE_FILE
style START fill:#f9f9f9
style TRAP_EXIT fill:#f9f9f9
Test Execution Workflow Summary
The complete integration test workflow coordinates multiple tools and processes:
Sources: experiments/bindings/python-ws-client/integration_test.sh:1-91 experiments/bindings/python-ws-client/extract_readme_tests.py:1-46
Performance Optimizations
Relevant source files
Purpose and Scope
This document describes the performance optimization strategies employed by SIMD R Drive to achieve high-throughput storage operations. It covers hardware acceleration through SIMD instructions, cache-efficient memory alignment, zero-copy access patterns, lock-free concurrent operations, and the benchmarking infrastructure used to validate these optimizations.
For implementation details of specific SIMD operations, see SIMD Acceleration. For cache-line alignment specifics, see Payload Alignment and Cache Efficiency. For operation mode characteristics, see Write and Read Modes. For benchmark execution and analysis, see Benchmarking.
Performance Architecture Overview
The performance optimization stack consists of multiple layers, from hardware acceleration at the bottom to application-level operation modes at the top. Each layer contributes to the overall system throughput.
Performance Architecture Stack
graph TB
subgraph Hardware["Hardware Features"]
CPU["CPU Architecture"]
AVX2["AVX2 Instructions\nx86_64"]
NEON["NEON Instructions\naarch64"]
CACHE["64-byte Cache Lines"]
end
subgraph SIMD["SIMD Optimization Layer"]
SIMD_COPY["simd_copy Function"]
XXH3["xxh3_64 Hashing"]
FEATURE_DETECT["Runtime Feature Detection"]
end
subgraph Memory["Memory Management"]
MMAP["memmap2::Mmap"]
ALIGN["PAYLOAD_ALIGNMENT = 64"]
DASHMAP["DashMap Index"]
end
subgraph Concurrency["Concurrency Primitives"]
ATOMIC["AtomicU64 tail_offset"]
RWLOCK["RwLock File"]
ARC["Arc Mmap Sharing"]
end
subgraph Operations["Operation Modes"]
WRITE_SINGLE["Single Write"]
WRITE_BATCH["Batch Write"]
WRITE_STREAM["Stream Write"]
READ_DIRECT["Direct Read"]
READ_STREAM["Stream Read"]
READ_PARALLEL["Parallel Iteration"]
end
CPU --> AVX2
CPU --> NEON
CPU --> CACHE
AVX2 --> SIMD_COPY
NEON --> SIMD_COPY
AVX2 --> XXH3
NEON --> XXH3
FEATURE_DETECT --> SIMD_COPY
CACHE --> ALIGN
SIMD_COPY --> ALIGN
ALIGN --> MMAP
MMAP --> ARC
MMAP --> DASHMAP
DASHMAP --> ATOMIC
RWLOCK --> ATOMIC
SIMD_COPY --> WRITE_SINGLE
SIMD_COPY --> WRITE_BATCH
MMAP --> READ_DIRECT
ARC --> READ_PARALLEL
The diagram shows how hardware features enable SIMD operations, which work with aligned memory to maximize cache efficiency. The memory management layer uses zero-copy access patterns, while concurrency primitives enable safe multi-threaded operations. Application-level operation modes leverage these lower layers for optimal performance.
Sources: README.md:5-7 README.md:249-256 src/storage_engine/simd_copy.rs:1-139
SIMD Acceleration Components
SIMD R Drive uses vectorized instructions to accelerate two critical operations: memory copying during writes and key hashing for indexing. The system detects available CPU features at runtime and selects the optimal implementation.
SIMD Component Architecture
graph TB
subgraph Detection["Feature Detection"]
RUNTIME["std::is_x86_feature_detected!"]
X86_CHECK["Check avx2 on x86_64"]
ARM_DEFAULT["Default neon on aarch64"]
end
subgraph Implementations["SIMD Implementations"]
SIMD_COPY_X86["simd_copy_x86\n_mm256_loadu_si256\n_mm256_storeu_si256"]
SIMD_COPY_ARM["simd_copy_arm\nvld1q_u8\nvst1q_u8"]
FALLBACK["Scalar copy_from_slice"]
end
subgraph Operations["Accelerated Operations"]
WRITE_OP["Write Operations"]
HASH_OP["xxh3_64 Hashing"]
end
subgraph Characteristics["Performance Characteristics"]
AVX2_SIZE["32-byte chunks AVX2"]
NEON_SIZE["16-byte chunks NEON"]
REMAINDER["Scalar remainder"]
end
RUNTIME --> X86_CHECK
RUNTIME --> ARM_DEFAULT
X86_CHECK -->|Detected| SIMD_COPY_X86
X86_CHECK -->|Not Detected| FALLBACK
ARM_DEFAULT --> SIMD_COPY_ARM
SIMD_COPY_X86 --> AVX2_SIZE
SIMD_COPY_ARM --> NEON_SIZE
AVX2_SIZE --> REMAINDER
NEON_SIZE --> REMAINDER
SIMD_COPY_X86 --> WRITE_OP
SIMD_COPY_ARM --> WRITE_OP
FALLBACK --> WRITE_OP
HASH_OP --> WRITE_OP
The simd_copy function performs runtime feature detection to select the appropriate SIMD implementation. On x86_64, it checks for AVX2 support and processes 32 bytes per iteration. On aarch64, it uses NEON instructions to process 16 bytes per iteration. A scalar fallback handles unsupported architectures and the remainder bytes after vectorized processing.
Sources: src/storage_engine/simd_copy.rs:110-138 src/storage_engine/simd_copy.rs:16-62 src/storage_engine/simd_copy.rs:64-108
SIMD Copy Function Details
The simd_copy function provides the core memory copying optimization used during write operations:
| Architecture | SIMD Extension | Chunk Size | Load Instruction | Store Instruction | Feature Detection |
|---|---|---|---|---|---|
| x86_64 | AVX2 | 32 bytes | _mm256_loadu_si256 | _mm256_storeu_si256 | is_x86_feature_detected!("avx2") |
| aarch64 | NEON | 16 bytes | vld1q_u8 | vst1q_u8 | Always enabled |
| Fallback | None | Variable | N/A | N/A | Other architectures |
The function is defined in src/storage_engine/simd_copy.rs:110-138 with platform-specific implementations in src/storage_engine/simd_copy.rs:33-62 (x86_64) and src/storage_engine/simd_copy.rs:81-108 (aarch64).
Hardware-Accelerated Hashing
The xxh3_64 hashing algorithm used by KeyIndexer leverages SIMD extensions to accelerate key hashing operations. The dependency is configured in Cargo.toml:34 with features ["xxh3", "const_xxh3"].
Hash acceleration characteristics:
- SSE2 : Universally supported on x86_64, enabled by default
- AVX2 : Additional performance gains on capable CPUs
- NEON : Default on aarch64 targets, providing ARM SIMD acceleration
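As a point of reference, hashing a key with the xxhash-rust crate's xxh3_64 function (the crate and features named above) looks like the following; this is a generic usage sketch, not code lifted from KeyIndexer.

```rust
// Requires the xxhash-rust crate with the "xxh3" feature enabled.
use xxhash_rust::xxh3::xxh3_64;

fn main() {
    // One-shot 64-bit hash of a key, as used for index lookups.
    let key = b"user:42";
    let hash: u64 = xxh3_64(key);
    println!("{hash:#018x}");
}
```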
Sources: README.md:158-166 Cargo.toml34
Cache-Line Aligned Memory Layout
The storage engine aligns all non-tombstone payloads to 64-byte boundaries, matching typical CPU cache-line sizes. This alignment strategy ensures that SIMD operations operate efficiently without crossing cache-line boundaries.
64-byte Alignment Strategy
graph LR
subgraph Entry1["Entry N"]
PREPAD1["Pre-Padding\n0-63 bytes"]
PAYLOAD1["Payload\nAligned Start"]
META1["Metadata\n20 bytes"]
end
subgraph Entry2["Entry N+1"]
PREPAD2["Pre-Padding\n0-63 bytes"]
PAYLOAD2["Payload\nAligned Start"]
META2["Metadata\n20 bytes"]
end
subgraph Alignment["Alignment Calculation"]
PREV_TAIL["prev_tail_offset"]
CALC["pad = A - prev_tail mod A mod A\nA = PAYLOAD_ALIGNMENT = 64"]
NEXT_START["payload_start mod 64 = 0"]
end
META1 --> PREV_TAIL
PREV_TAIL --> CALC
CALC --> PREPAD2
PREPAD2 --> PAYLOAD2
PAYLOAD2 --> NEXT_START
Each payload is preceded by 0-63 bytes of padding to ensure the payload itself starts on a 64-byte boundary. The padding length is calculated based on the previous entry’s tail offset. This enables efficient SIMD loads/stores and ensures optimal cache-line utilization.
Alignment Formula
The pre-padding calculation ensures proper alignment:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where:
- PAYLOAD_ALIGNMENT = 64 (defined in simd-r-drive-entry-handle/src/constants.rs)
- prev_tail is the absolute file offset after the previous entry's metadata
- The bitwise AND with (PAYLOAD_ALIGNMENT - 1) handles the modulo operation efficiently since 64 is a power of 2
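A minimal sketch of this calculation, assuming the constants above (it mirrors the formula, not the project's exact source):

```rust
const PAYLOAD_ALIGNMENT: u64 = 64; // 1 << PAYLOAD_ALIGN_LOG2

/// Zero bytes inserted so the next payload starts on a 64-byte boundary.
fn pre_pad(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(pre_pad(64), 0);   // already aligned: no padding
    assert_eq!(pre_pad(65), 63);  // worst case: 63 bytes of padding
    assert_eq!(pre_pad(100), 28); // 100 + 28 = 128, the next boundary
}
```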
Sources: README.md:51-59 README.md:114-124
Alignment Benefits
| Benefit | Description | Impact |
|---|---|---|
| SIMD Efficiency | Vectorized operations don’t cross cache-line boundaries | 2-4x speedup on bulk copies |
| Cache Performance | Single payload typically fits within contiguous cache lines | Reduced cache misses |
| Zero-Copy Casting | Aligned payloads can be safely cast to typed slices (&[u32], &[u64]) | No buffer allocation needed |
| Predictable Performance | Consistent access patterns regardless of payload size | Stable latency characteristics |
The alignment is enforced during write operations and verified during entry access through EntryHandle.
Sources: README.md:51-59
graph TB
subgraph File["Storage File"]
DISK["simd-r-drive.bin"]
end
subgraph Mapping["Memory Mapping"]
MMAP_CREATE["Mmap::map"]
MMAP_INSTANCE["Mmap Instance"]
ARC_MMAP["Arc Mmap\nShared Reference"]
end
subgraph Index["KeyIndexer"]
DASHMAP_STRUCT["DashMap key_hash -> packed_value"]
PACKED["Packed u64\n16-bit tag\n48-bit offset"]
end
subgraph Access["Zero-Copy Access"]
LOOKUP["read key"]
GET_OFFSET["Extract offset from packed_value"]
SLICE["Arc Mmap byte range"]
HANDLE["EntryHandle\nDirect payload reference"]
end
subgraph Threading["Multi-threaded Access"]
CLONE["Arc::clone"]
THREAD1["Thread 1 read"]
THREAD2["Thread 2 read"]
THREAD3["Thread N read"]
end
DISK --> MMAP_CREATE
MMAP_CREATE --> MMAP_INSTANCE
MMAP_INSTANCE --> ARC_MMAP
ARC_MMAP --> DASHMAP_STRUCT
DASHMAP_STRUCT --> PACKED
LOOKUP --> DASHMAP_STRUCT
DASHMAP_STRUCT --> GET_OFFSET
GET_OFFSET --> SLICE
ARC_MMAP --> SLICE
SLICE --> HANDLE
ARC_MMAP --> CLONE
CLONE --> THREAD1
CLONE --> THREAD2
CLONE --> THREAD3
THREAD1 --> HANDLE
THREAD2 --> HANDLE
THREAD3 --> HANDLE
Zero-Copy Memory Access Patterns
SIMD R Drive achieves zero-copy reads by memory-mapping the entire storage file and providing direct byte slice access to payloads. This eliminates deserialization overhead and reduces memory pressure for large datasets.
Zero-Copy Read Architecture
The storage file is memory-mapped once and shared via Arc<Mmap> across threads. Read operations perform hash lookups in DashMap to get file offsets, then return EntryHandle instances that provide direct views into the mapped memory. Multiple threads can safely read concurrently without copying data.
Sources: README.md:43-49 README.md:173-175
Memory-Mapped File Management
The memmap2 crate provides the memory mapping functionality:
- Configured as a workspace dependency in Cargo.toml:102
- Used in the DataStore implementation
- Protected by Mutex<Arc<Mmap>> to prevent unsafe remapping during active reads
- Automatically remapped when the file grows beyond the current mapping size
EntryHandle Zero-Copy Interface
The EntryHandle type provides zero-copy access to stored payloads without allocating intermediate buffers:
| Method | Return Type | Copy Behavior | Use Case |
|---|---|---|---|
payload() | &[u8] | Zero-copy reference | Direct access to full payload |
payload_reader() | impl Read | Buffered reads | Streaming large payloads |
as_arrow_buffer() | arrow::Buffer | Zero-copy view | Apache Arrow integration |
The handle maintains a reference to the memory-mapped region and calculates the payload range based on entry metadata.
Sources: README.md:228-233
graph TB
subgraph Writes["Write Path Synchronized"]
WRITE_LOCK["RwLock File write"]
APPEND["Append entry to file"]
UPDATE_INDEX["DashMap::insert"]
UPDATE_TAIL["AtomicU64::store tail_offset"]
end
subgraph Reads["Read Path Lock-Free"]
READ1["Thread 1 read"]
READ2["Thread 2 read"]
READN["Thread N read"]
LOOKUP1["DashMap::get"]
LOOKUP2["DashMap::get"]
LOOKUPN["DashMap::get"]
MMAP_ACCESS["Arc Mmap shared access"]
end
subgraph Synchronization["Concurrency Control"]
RWLOCK_STRUCT["RwLock File"]
ATOMIC_STRUCT["AtomicU64 tail_offset"]
DASHMAP_STRUCT["DashMap Index"]
ARC_STRUCT["Arc Mmap"]
end
WRITE_LOCK --> APPEND
APPEND --> UPDATE_INDEX
UPDATE_INDEX --> UPDATE_TAIL
READ1 --> LOOKUP1
READ2 --> LOOKUP2
READN --> LOOKUPN
LOOKUP1 --> MMAP_ACCESS
LOOKUP2 --> MMAP_ACCESS
LOOKUPN --> MMAP_ACCESS
RWLOCK_STRUCT -.controls.-> WRITE_LOCK
ATOMIC_STRUCT -.updated by.-> UPDATE_TAIL
DASHMAP_STRUCT -.provides.-> LOOKUP1
DASHMAP_STRUCT -.provides.-> LOOKUP2
DASHMAP_STRUCT -.provides.-> LOOKUPN
ARC_STRUCT -.enables.-> MMAP_ACCESS
Lock-Free Concurrent Read Operations
The storage engine enables multiple threads to perform concurrent reads without acquiring locks, using DashMap for the in-memory index and atomic operations for metadata tracking.
Lock-Free Read Architecture
Write operations acquire an RwLock to ensure sequential appends, but read operations access the DashMap index without locking. The DashMap data structure provides lock-free reads through internal sharding and fine-grained locking. The memory-mapped file is shared via Arc<Mmap>, allowing concurrent zero-copy access.
Sources: README.md:172-183 Cargo.toml27
DashMap Index Characteristics
The DashMap dependency is configured in Cargo.toml:27 and provides these characteristics:
- Lock-free reads : Read operations don’t block each other
- Sharded locking : Write operations only lock specific shards
- Concurrent inserts : Multiple threads can update different shards simultaneously
- Memory overhead : Approximately 64 bytes per entry for hash table overhead
Atomic Operations
The AtomicU64 for tail_offset tracking provides:
- Ordering guarantees: SeqCst ordering ensures consistency across threads
- Lock-free updates: Writes update the tail without blocking reads
- Single-word operations: 64-bit atomic operations are efficient on modern CPUs
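A hedged sketch of how these two primitives cooperate on the read and write paths (the field names are illustrative, not the actual DataStore layout):

```rust
use dashmap::DashMap;
use std::sync::atomic::{AtomicU64, Ordering};

struct IndexState {
    /// key_hash -> packed (tag | offset) value, shared by all readers.
    index: DashMap<u64, u64>,
    /// Logical end of file, advanced by the writer.
    tail_offset: AtomicU64,
}

impl IndexState {
    fn lookup(&self, key_hash: u64) -> Option<u64> {
        // Readers never take the file write lock; DashMap shards internally.
        self.index.get(&key_hash).map(|entry| *entry)
    }

    fn record_append(&self, key_hash: u64, packed: u64, new_tail: u64) {
        self.index.insert(key_hash, packed);
        // SeqCst store makes the new tail visible to concurrent readers.
        self.tail_offset.store(new_tail, Ordering::SeqCst);
    }
}
```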
Sources: README.md:182-183
Operation Mode Performance Characteristics
SIMD R Drive provides multiple operation modes optimized for different workload patterns. Each mode has specific performance characteristics and resource usage profiles.
Write Operation Modes
| Mode | Method | Lock Duration | I/O Pattern | Flush Behavior | Best For |
|---|---|---|---|---|---|
| Single | write() | Per-entry | Sequential | Immediate | Low-latency single writes |
| Batch | batch_write() | Per-batch | Sequential | After batch | High-throughput bulk writes |
| Stream | write_stream() | Per-entry | Sequential | Immediate | Large entries (>1MB) |
Single Write (README.md:213-215):
- Acquires RwLock for each entry
- Flushes to disk immediately after each write
- Suitable for applications requiring durability guarantees per operation
Batch Write (README.md:217-219):
- Acquires RwLock once for the entire batch
- Flushes to disk after all entries are written
- Reduces syscall overhead for bulk operations
- Can write thousands of entries in single lock acquisition
Stream Write (README.md:221-223):
- Accepts an impl Read source for payload data
- Copies data in chunks to avoid full in-memory allocation
- Suitable for writing multi-megabyte or gigabyte-sized entries
Sources: README.md:208-223
Read Operation Modes
| Mode | Method | Memory Behavior | Parallelism | Best For |
|---|---|---|---|---|
| Direct | read() | Zero-copy | Single-threaded | Small to medium entries |
| Stream | payload_reader() | Buffered | Single-threaded | Large entries (>10MB) |
| Parallel | par_iter_entries() | Zero-copy | Multi-threaded | Bulk processing entire dataset |
Direct Read (README.md:228-233):
- Returns EntryHandle with a direct memory-mapped payload reference
- Zero allocation for payload access
- Full entry must fit in virtual address space
- Fastest for entries under 10MB
Stream Read (README.md:234-241):
- Reads the payload incrementally through the Read trait
- Uses an 8KB buffer internally
- Avoids memory pressure for large entries
- Non-zero-copy but memory-efficient
Parallel Iteration (README.md:242-247):
- Requires the parallel feature flag in Cargo.toml:52
- Uses Rayon for multi-threaded iteration
- Processes all valid entries across CPU cores
- Ideal for building in-memory caches or analytics workloads
Sources: README.md:224-247 Cargo.toml30 Cargo.toml52
graph TB
subgraph Benchmark_Suite["Benchmark Suite"]
STORAGE_BENCH["storage_benchmark"]
CONTENTION_BENCH["contention_benchmark"]
end
subgraph Storage_Tests["Storage Benchmark Tests"]
WRITE_SINGLE["Single Write Throughput"]
WRITE_BATCH["Batch Write Throughput"]
READ_SEQ["Sequential Read Throughput"]
READ_RAND["Random Read Throughput"]
end
subgraph Contention_Tests["Contention Benchmark Tests"]
MULTI_THREAD["Multi-threaded Read Contention"]
PARALLEL_ITER["Parallel Iteration Performance"]
end
subgraph Criterion["Criterion.rs Framework"]
STATISTICAL["Statistical Analysis"]
COMPARISON["Baseline Comparison"]
PLOTS["Performance Plots"]
REPORTS["HTML Reports"]
end
STORAGE_BENCH --> WRITE_SINGLE
STORAGE_BENCH --> WRITE_BATCH
STORAGE_BENCH --> READ_SEQ
STORAGE_BENCH --> READ_RAND
CONTENTION_BENCH --> MULTI_THREAD
CONTENTION_BENCH --> PARALLEL_ITER
WRITE_SINGLE --> STATISTICAL
WRITE_BATCH --> STATISTICAL
READ_SEQ --> STATISTICAL
READ_RAND --> STATISTICAL
MULTI_THREAD --> STATISTICAL
PARALLEL_ITER --> STATISTICAL
STATISTICAL --> COMPARISON
COMPARISON --> PLOTS
PLOTS --> REPORTS
Benchmarking Infrastructure
SIMD R Drive uses Criterion.rs for statistical benchmarking of performance-critical operations. The benchmark suite validates the effectiveness of SIMD optimizations and concurrent access patterns.
Benchmarking Architecture
The benchmark suite consists of two main harnesses: storage_benchmark measures fundamental read/write throughput, while contention_benchmark measures concurrent access performance. Criterion.rs provides statistical analysis, baseline comparisons, and HTML reports for each benchmark run.
Sources: Cargo.toml:57-63 Cargo.toml98
Benchmark Configuration
The benchmark harnesses, storage_benchmark and contention_benchmark, are defined in Cargo.toml:57-63.
The harness = false setting disables Rust’s default benchmark harness, allowing Criterion.rs to provide its own test runner with statistical analysis capabilities.
Criterion.rs Integration
The Criterion.rs framework is configured as a development dependency in Cargo.toml:39 and provides:
- Statistical rigor : Multiple iterations with outlier detection
- Baseline comparison : Compare performance across code changes
- Regression detection : Automatically detect performance regressions
- Visualization : Generate performance plots and HTML reports
- Reproducibility : Consistent measurement methodology across environments
Benchmarks are executed with cargo bench, and the results are stored in target/criterion/ for historical comparison.
Sources: Cargo.toml39 Cargo.toml:57-63
Performance Feature Summary
The following table summarizes the key performance features and their implementation locations:
| Feature | Implementation | Benefit | Configuration |
|---|---|---|---|
| AVX2 SIMD | simd_copy_x86 in src/storage_engine/simd_copy.rs:33-62 | 32-byte vectorized copies | Runtime feature detection |
| NEON SIMD | simd_copy_arm in src/storage_engine/simd_copy.rs:81-108 | 16-byte vectorized copies | Always enabled on aarch64 |
| 64-byte Alignment | PAYLOAD_ALIGNMENT constant | Cache-line efficiency | Build-time constant |
| Zero-Copy Reads | memmap2::Mmap | No deserialization overhead | Always enabled |
| Lock-Free Reads | DashMap in Cargo.toml27 | Concurrent read scaling | Always enabled |
| Parallel Iteration | Rayon in Cargo.toml30 | Multi-core bulk processing | parallel feature flag |
| Hardware Hashing | xxhash-rust in Cargo.toml34 | SIMD-accelerated indexing | Always enabled |
For detailed information on each feature, see the corresponding child sections 5.1, 5.2, 5.3, and 5.4.
Sources: README.md:5-7 README.md:249-256 Cargo.toml27 Cargo.toml30 Cargo.toml34
SIMD Acceleration
Relevant source files
Purpose and Scope
This document describes the SIMD (Single Instruction, Multiple Data) acceleration layer used in the SIMD R Drive storage engine. SIMD acceleration provides vectorized memory copy operations that process multiple bytes simultaneously, improving throughput for data write operations. The implementation supports AVX2 instructions on x86_64 architectures and NEON instructions on ARM AArch64 architectures.
For information about payload alignment considerations that complement SIMD operations, see Payload Alignment and Cache Efficiency. For details on how SIMD operations are measured, see Benchmarking.
Architecture Support Matrix
The SIMD acceleration layer provides platform-specific implementations based on available hardware features:
| Architecture | SIMD Technology | Vector Width | Bytes per Operation | Runtime Detection |
|---|---|---|---|---|
| x86_64 | AVX2 | 256-bit | 32 bytes | Yes (is_x86_feature_detected!) |
| aarch64 (ARM) | NEON | 128-bit | 16 bytes | No (always enabled) |
| Other | Scalar fallback | N/A | 1 byte | N/A |
Sources: src/storage_engine/simd_copy.rs:10-138
SIMD Copy Architecture
The simd_copy function serves as the unified entry point for SIMD-accelerated memory operations, dispatching to architecture-specific implementations based on compile-time and runtime feature detection.
SIMD Copy Dispatch Flow
graph TB
Entry["simd_copy(dst, src)"]
Check_x86["#[cfg(target_arch = 'x86_64')]\nCompile-time check"]
Check_arm["#[cfg(target_arch = 'aarch64')]\nCompile-time check"]
Detect_AVX2["is_x86_feature_detected!('avx2')\nRuntime detection"]
AVX2_Impl["simd_copy_x86(dst, src)\n32-byte chunks\n_mm256_loadu_si256\n_mm256_storeu_si256"]
NEON_Impl["simd_copy_arm(dst, src)\n16-byte chunks\nvld1q_u8\nvst1q_u8"]
Scalar_Fallback["copy_from_slice\nStandard Rust memcpy"]
Warning["LOG_ONCE.call_once\nWarn: AVX2 not detected"]
Entry --> Check_x86
Entry --> Check_arm
Check_x86 --> Detect_AVX2
Detect_AVX2 -->|true| AVX2_Impl
Detect_AVX2 -->|false| Warning
Warning --> Scalar_Fallback
Check_arm --> NEON_Impl
Entry --> Scalar_Fallback
style Entry fill:#f9f9f9,stroke:#333,stroke-width:2px
style AVX2_Impl fill:#f0f0f0
style NEON_Impl fill:#f0f0f0
style Scalar_Fallback fill:#f0f0f0
Sources: src/storage_engine/simd_copy.rs:110-138
x86_64 AVX2 Implementation
The simd_copy_x86 function leverages AVX2 instructions for vectorized memory operations on x86_64 processors.
Function Signature and Safety
src/storage_engine/simd_copy.rs:32-35 defines the function with the #[target_feature(enable = "avx2")] attribute, which enables AVX2 code generation and marks the function as unsafe; a hedged sketch of the full copy loop appears after the table below.
Chunked Copy Strategy
The implementation processes data in 32-byte chunks corresponding to the 256-bit AVX2 register width:
| Step | Operation | Intrinsic | Description |
|---|---|---|---|
| 1. Calculate chunks | len / 32 | N/A | Determines number of full 32-byte iterations |
| 2. Load from source | _mm256_loadu_si256 | src/storage_engine/simd_copy.rs47 | Unaligned load of 256 bits |
| 3. Store to destination | _mm256_storeu_si256 | src/storage_engine/simd_copy.rs55 | Unaligned store of 256 bits |
| 4. Handle remainder | copy_from_slice | src/storage_engine/simd_copy.rs61 | Scalar copy for remaining bytes |
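The following is an illustrative reconstruction of that loop based on the table above; it is a sketch, not the project's exact source.

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn simd_copy_x86(dst: &mut [u8], src: &[u8]) {
    use std::arch::x86_64::*;

    let len = dst.len().min(src.len());
    let chunks = len / 32;
    for i in 0..chunks {
        // Unaligned 256-bit load from the source slice...
        let v = _mm256_loadu_si256(src.as_ptr().add(i * 32) as *const __m256i);
        // ...followed by an unaligned 256-bit store into the destination.
        _mm256_storeu_si256(dst.as_mut_ptr().add(i * 32) as *mut __m256i, v);
    }
    // Scalar copy for the remaining tail bytes.
    dst[chunks * 32..len].copy_from_slice(&src[chunks * 32..len]);
}
```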
Memory Safety Guarantees
The implementation includes detailed safety comments (src/storage_engine/simd_copy.rs:42-56) documenting:
- Buffer bounds validation (len calculated as the minimum of dst.len() and src.len())
- Pointer arithmetic guarantees (i bounded by chunks * 32 <= len)
- Alignment handling via unaligned load/store instructions
Sources: src/storage_engine/simd_copy.rs:32-62
ARM NEON Implementation
The simd_copy_arm function provides vectorized operations for ARM AArch64 processors using the NEON instruction set.
Function Signature
src/storage_engine/simd_copy.rs:80-83 defines the ARM-specific implementation, taking the same (dst: &mut [u8], src: &[u8]) parameters as the x86_64 variant.
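An illustrative sketch of that shape, mirroring the x86_64 sketch above but with 16-byte NEON lanes (again not the exact source):

```rust
#[cfg(target_arch = "aarch64")]
unsafe fn simd_copy_arm(dst: &mut [u8], src: &[u8]) {
    use std::arch::aarch64::*;

    let len = dst.len().min(src.len());
    let chunks = len / 16;
    for i in 0..chunks {
        // 128-bit NEON load and store: 16 bytes per iteration.
        let v = vld1q_u8(src.as_ptr().add(i * 16));
        vst1q_u8(dst.as_mut_ptr().add(i * 16), v);
    }
    // Scalar copy for the remaining tail bytes.
    dst[chunks * 16..len].copy_from_slice(&src[chunks * 16..len]);
}
```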
NEON Operation Pattern
NEON 16-byte Copy Cycle
The implementation (src/storage_engine/simd_copy.rs:83-108):
- Chunk Calculation: Divides the length by 16 (the NEON register width)
- Load Operation: Uses vld1q_u8 to read 16 bytes into a NEON register (src/storage_engine/simd_copy.rs:94)
- Store Operation: Uses vst1q_u8 to write 16 bytes from the register to the destination (src/storage_engine/simd_copy.rs:101)
- Remainder Handling: Scalar copy for any bytes not fitting in 16-byte chunks (src/storage_engine/simd_copy.rs:107)
Sources: src/storage_engine/simd_copy.rs:80-108
Runtime Feature Detection
x86_64 Detection Mechanism
The x86_64 implementation uses Rust’s standard library feature detection:
x86_64 AVX2 Runtime Detection Flow
src/storage_engine/simd_copy.rs:114-124 implements the detection with logging:
- The std::is_x86_feature_detected!("avx2") macro performs runtime CPUID checks
- The LOG_ONCE static variable (src/storage_engine/simd_copy.rs:8) ensures the warning is emitted only once
- Fallback to scalar copy occurs transparently when AVX2 is unavailable
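Putting the pieces together, an x86_64-only sketch of the dispatch path might look like this, reusing the simd_copy_x86 sketch from the previous section and assuming LOG_ONCE is a std::sync::Once (other targets omitted for brevity):

```rust
use std::sync::Once;

static LOG_ONCE: Once = Once::new();

#[cfg(target_arch = "x86_64")]
pub fn simd_copy(dst: &mut [u8], src: &[u8]) {
    if std::is_x86_feature_detected!("avx2") {
        // SAFETY: AVX2 support was verified at runtime just above.
        unsafe { simd_copy_x86(dst, src) };
        return;
    }
    // Warn once, then fall back to the scalar path transparently.
    LOG_ONCE.call_once(|| eprintln!("simd_copy: AVX2 not detected, using scalar copy"));
    let len = dst.len().min(src.len());
    dst[..len].copy_from_slice(&src[..len]);
}
```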
ARM Detection Strategy
ARM AArch64 does not provide standard runtime feature detection. The implementation assumes NEON availability on all AArch64 targets (src/storage_engine/simd_copy.rs:127-133), which is guaranteed by the ARMv8 architecture specification.
Sources: src/storage_engine/simd_copy.rs:4-8 src/storage_engine/simd_copy.rs:110-138
graph TD
Start["simd_copy invoked"]
Layer1["Layer 1: Platform-specific SIMD\nAVX2 or NEON if available"]
Layer2["Layer 2: Runtime detection failure\nAVX2 not detected on x86_64"]
Layer3["Layer 3: Unsupported architecture\nNeither x86_64 nor aarch64"]
Scalar["copy_from_slice\nStandard Rust memcpy\nCompiler-optimized"]
Start --> Layer1
Layer1 -->|No SIMD available| Layer2
Layer2 -->|No runtime support| Layer3
Layer3 --> Scalar
style Scalar fill:#f0f0f0
Fallback Behavior
The system provides three fallback layers for environments without SIMD support:
Fallback Hierarchy
Fallback Decision Tree
Scalar Copy Implementation
src/storage_engine/simd_copy.rs:136-137 implements the final fallback using Rust's standard library copy_from_slice, which:
- Relies on LLVM's optimized memcpy implementation
- May use SIMD instructions if the compiler determines it's beneficial
- Provides a safe, portable baseline for all platforms
Sources: src/storage_engine/simd_copy.rs:136-137
graph TB
subgraph "DataStore Write Path"
Write["write(key, value)"]
Align["Calculate 64-byte alignment padding"]
Allocate["Allocate file space"]
Copy["simd_copy(dst, src)"]
Metadata["Write metadata\n(hash, prev_offset, crc32)"]
end
subgraph "simd_copy Function"
Dispatch["Platform dispatch"]
AVX2["AVX2: 32-byte chunks"]
NEON["NEON: 16-byte chunks"]
Scalar["Scalar fallback"]
end
subgraph "Storage File"
MMap["Memory-mapped region"]
Payload["64-byte aligned payload"]
end
Write --> Align
Align --> Allocate
Allocate --> Copy
Copy --> Dispatch
Dispatch --> AVX2
Dispatch --> NEON
Dispatch --> Scalar
AVX2 --> MMap
NEON --> MMap
Scalar --> MMap
MMap --> Payload
Copy --> Metadata
style Copy fill:#f9f9f9,stroke:#333,stroke-width:2px
style Dispatch fill:#f0f0f0
Integration with Storage Engine
The simd_copy function is invoked during write operations to efficiently copy user data into the memory-mapped storage file.
Usage Context
SIMD Integration in Write Path
The storage engine’s write operations leverage SIMD acceleration when copying payload data into the memory-mapped file. The 64-byte payload alignment (see Payload Alignment and Cache Efficiency) ensures that SIMD operations work with naturally aligned memory boundaries, maximizing cache efficiency.
Performance Impact
SIMD acceleration provides measurable benefits:
- AVX2 (x86_64) : Processes 32 bytes per instruction vs. scalar’s 8 bytes (or less)
- NEON (ARM) : Processes 16 bytes per instruction vs. scalar’s 8 bytes (or less)
- Cache Efficiency : Larger transfer granularity reduces memory access overhead
- Write Throughput : Directly improves
write,batch_write, andwrite_streamperformance
The actual performance gains are measured using the Criterion.rs benchmark suite (see Benchmarking).
Sources: src/storage_engine/simd_copy.rs:1-139 Cargo.toml8
Dependencies and Compiler Support
Architecture-Specific Intrinsics
The implementation imports platform-specific SIMD intrinsics:
| Architecture | Import Statement | Intrinsics Used |
|---|---|---|
| x86_64 | use std::arch::x86_64::*; (src/storage_engine/simd_copy.rs11) | __m256i, _mm256_loadu_si256, _mm256_storeu_si256 |
| aarch64 | use std::arch::aarch64::*; (src/storage_engine/simd_copy.rs14) | vld1q_u8, vst1q_u8 |
Build Configuration
The SIMD implementation requires no special feature flags in Cargo.toml:1-113. The code uses:
- Compile-time conditional compilation (#[cfg(target_arch = "...")])
- Runtime feature detection (x86_64 only)
- Standard Rust toolchain support (no nightly features required)
The #[inline] attribute on all SIMD functions encourages the compiler to inline these hot-path operations, reducing function call overhead.
Sources: src/storage_engine/simd_copy.rs:10-14 src/storage_engine/simd_copy.rs:32-35 src/storage_engine/simd_copy.rs:80-83
Payload Alignment and Cache Efficiency
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- tests/alignment_tests.rs
Purpose and Scope
This document explains the payload alignment strategy used by SIMD R Drive to optimize cache efficiency and enable zero-copy SIMD operations. It covers the PAYLOAD_ALIGNMENT constant, the pre-padding mechanism that ensures alignment, cache line optimization, and the testing infrastructure that validates alignment invariants.
For information about SIMD-accelerated operations themselves (vectorized copying and hashing), see SIMD Acceleration. For details on zero-copy memory access patterns, see Memory Management and Zero-Copy Access.
Overview
SIMD R Drive aligns all non-tombstone payloads to a fixed boundary defined by PAYLOAD_ALIGNMENT, currently set to 64 bytes. This alignment ensures that:
- Payloads begin on CPU cache line boundaries (typically 64 bytes)
- SIMD vector loads (SSE, AVX, AVX-512, NEON) can operate without crossing alignment boundaries
- Zero-copy typed views (&[u32], &[u64], &[u128]) can be safely cast without additional copying
Sources: README.md:51-59 README.md:110-137
The PAYLOAD_ALIGNMENT Constant
Definition and Configuration
The alignment is controlled by two constants in the entry handle package:
| Constant | Value | Purpose |
|---|---|---|
PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
PAYLOAD_ALIGNMENT | 64 | Actual alignment in bytes |
The PAYLOAD_ALIGNMENT value is calculated as 1 << PAYLOAD_ALIGN_LOG2, ensuring it is always a power of two. This constant determines where each payload begins in the storage file.
Diagram: PAYLOAD_ALIGNMENT Configuration and Storage Layout
graph LR
subgraph "Configuration"
LOG2["PAYLOAD_ALIGN_LOG2\n(constant = 6)"]
ALIGN["PAYLOAD_ALIGNMENT\n(1 << 6 = 64)"]
end
subgraph "Storage File"
ENTRY1["Entry 1\n@ offset 0"]
PAD1["Pre-Pad\n(0-63 bytes)"]
PAYLOAD1["Payload 1\n@ 64-byte boundary"]
META1["Metadata\n(20 bytes)"]
PAD2["Pre-Pad"]
PAYLOAD2["Payload 2\n@ next 64-byte boundary"]
end
LOG2 --> ALIGN
ALIGN -.determines.-> PAD1
ALIGN -.determines.-> PAD2
ENTRY1 --> PAD1
PAD1 --> PAYLOAD1
PAYLOAD1 --> META1
META1 --> PAD2
PAD2 --> PAYLOAD2
The alignment constant can be modified by changing PAYLOAD_ALIGN_LOG2 in the constants file and rebuilding all components. However, this creates incompatibility with files written using different alignment values.
Sources: README.md59 CHANGELOG.md:64-67 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:69-70
Cache Line Optimization
CPU Cache Architecture
Modern CPU cache lines are typically 64 bytes wide. When data is loaded from memory, the CPU fetches entire cache lines at once. Aligning payloads to 64-byte boundaries ensures:
- No cache line splits : Each payload begins at a cache line boundary, preventing a single logical read from spanning two cache lines
- Predictable cache behavior : Sequential reads traverse cache lines in order without fragmentation
- Reduced memory bandwidth : The CPU can prefetch entire cache lines efficiently
Alignment Benefit Matrix
| Scenario | 16-byte Alignment | 64-byte Alignment |
|---|---|---|
| Cache line splits per payload | Likely (3.75x less aligned) | Never (boundary-aligned) |
| SIMD load efficiency | Good for SSE | Optimal for AVX/AVX-512 |
| Prefetcher effectiveness | Moderate | High |
| Memory bandwidth utilization | ~85-90% | ~95-98% |
Sources: README.md53 CHANGELOG.md:27-30
SIMD Compatibility
Vector Instruction Requirements
Different SIMD instruction sets have varying alignment requirements:
| SIMD Extension | Vector Size | Typical Alignment | Supported By 64-byte Alignment |
|---|---|---|---|
| SSE2 | 128 bits (16 bytes) | 16-byte | ✅ Yes |
| AVX2 | 256 bits (32 bytes) | 32-byte | ✅ Yes |
| AVX-512 | 512 bits (64 bytes) | 64-byte | ✅ Yes |
| NEON (ARM) | 128 bits (16 bytes) | 16-byte | ✅ Yes |
| SVE (ARM) | Variable (128-2048 bits) | 16-byte minimum | ✅ Yes |
SIMD Load Operations
The alignment tests demonstrate safe SIMD operations using aligned loads:
Diagram: SIMD 64-byte Lane Loading
The test implementation at tests/alignment_tests.rs:69-95 demonstrates x86_64 SIMD loads using _mm_load_si128, while tests/alignment_tests.rs:97-122 shows aarch64 using vld1q_u8. Both safely load four 16-byte lanes from a 64-byte aligned payload.
Sources: tests/alignment_tests.rs:69-122 README.md:53-54
Pre-Padding Mechanism
Padding Calculation
To ensure each payload starts at a 64-byte boundary, the system inserts zero-filled pre-padding bytes before the payload. The padding length is calculated as:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where:
- prev_tail is the absolute file offset immediately after the previous entry's metadata
- The bitwise AND with (PAYLOAD_ALIGNMENT - 1) ensures the result is in the range [0, PAYLOAD_ALIGNMENT - 1]
graph TB
subgraph "Entry N-1"
PREV_PAYLOAD["Payload\n(variable length)"]
PREV_META["Metadata\n(20 bytes)"]
end
subgraph "Entry N Structure"
PREPAD["Pre-Pad\n(0-63 zero bytes)"]
PAYLOAD["Payload\n(starts at 64-byte boundary)"]
KEYHASH["key_hash\n(8 bytes)"]
PREVOFF["prev_offset\n(8 bytes)"]
CRC["crc32c\n(4 bytes)"]
end
subgraph "Alignment Validation"
CHECK["payload_start %\nPAYLOAD_ALIGNMENT == 0"]
end
PREV_PAYLOAD --> PREV_META
PREV_META --> PREPAD
PREPAD --> PAYLOAD
PAYLOAD --> KEYHASH
KEYHASH --> PREVOFF
PREVOFF --> CRC
PAYLOAD -.verified by.-> CHECK
Entry Structure with Pre-Padding
Diagram: Entry Structure with Pre-Padding
The prev_offset field stores the absolute file offset of the previous entry’s tail (end of metadata), allowing readers to calculate the pre-padding length by examining where the previous entry ended.
Sources: README.md:112-137 README.md:133-137
Alignment Evolution: From 16 to 64 Bytes
Version History
The payload alignment was increased in version 0.15.0-alpha:
| Version | Alignment | Rationale |
|---|---|---|
| ≤ 0.13.x-alpha | Variable (no alignment) | Minimal storage overhead |
| 0.14.0-alpha | 16 bytes | SSE compatibility, basic alignment |
| 0.15.0-alpha | 64 bytes | Cache line + AVX-512 optimization |
graph TB
subgraph "Pre-0.15 (16-byte)"
OLD_WRITE["Writer\n(16-byte align)"]
OLD_FILE["Storage File\n(16-byte boundaries)"]
OLD_READ["Reader\n(expects 16-byte)"]
end
subgraph "Post-0.15 (64-byte)"
NEW_WRITE["Writer\n(64-byte align)"]
NEW_FILE["Storage File\n(64-byte boundaries)"]
NEW_READ["Reader\n(expects 64-byte)"]
end
subgraph "Incompatibility"
MISMATCH["Old reader\n+ New file\n= Parse Error"]
MISMATCH2["New reader\n+ Old file\n= Parse Error"]
end
OLD_WRITE --> OLD_FILE
OLD_FILE --> OLD_READ
NEW_WRITE --> NEW_FILE
NEW_FILE --> NEW_READ
OLD_READ -.cannot read.-> NEW_FILE
NEW_READ -.cannot read.-> OLD_FILE
NEW_FILE --> MISMATCH
OLD_FILE --> MISMATCH2
Breaking Change Impact
The alignment change in 0.15.0-alpha is a breaking change that affects file compatibility:
Diagram: Alignment Version Incompatibility
Migration Strategy
The changelog specifies a migration path at CHANGELOG.md:43-51:
- Read all entries using the old binary (with old alignment)
- Write entries into a fresh store using the new binary (with 64-byte alignment)
- Replace the old file after verification
- In multi-service environments, upgrade readers before writers to prevent parse errors
Sources: CHANGELOG.md:19-51 CHANGELOG.md:55-81
Alignment Testing and Validation
Debug-Only Assertions
The system includes two debug-only alignment validation functions that compile to no-ops in release builds:
Pointer Alignment Assertion
debug_assert_aligned(ptr: *const u8, align: usize) validates that a pointer is aligned to the specified boundary. Implementation at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43
Behavior:
- Debug/test builds: Uses debug_assert! to verify (ptr as usize & (align - 1)) == 0
- Release/bench builds: No-op with zero runtime cost
Offset Alignment Assertion
debug_assert_aligned_offset(off: u64) validates that a file offset is aligned to PAYLOAD_ALIGNMENT. Implementation at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88
Behavior:
- Debug/test builds: Verifies off.is_multiple_of(PAYLOAD_ALIGNMENT)
- Release/bench builds: No-op with zero runtime cost
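A sketch of what such a debug-only check can look like, following the behavior described above (not the crate's exact source):

```rust
/// Debug-only pointer alignment check; compiles to nothing in release builds.
#[inline]
pub fn debug_assert_aligned(ptr: *const u8, align: usize) {
    debug_assert!(
        align.is_power_of_two() && ((ptr as usize) & (align - 1)) == 0,
        "pointer {ptr:p} is not aligned to {align} bytes"
    );
}
```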
Comprehensive Alignment Test Suite
The alignment test at tests/alignment_tests.rs:1-245 validates multiple alignment scenarios:
Diagram: Alignment Test Coverage and Validation Flow
Test Implementation Details
The test verifies:
- Address alignment at tests/alignment_tests.rs:24-32: Confirms payload pointer is multiple of 64
- Type alignment at tests/alignment_tests.rs:35-56: Validates that the alignment is sufficient for u32, u64, and u128
- Bytemuck casting at tests/alignment_tests.rs:59-67: Proves zero-copy typed views work
- SIMD operations at tests/alignment_tests.rs:69-133: Executes actual SIMD loads on aligned data
- Iterator consistency at tests/alignment_tests.rs:236-243: Ensures all iterated entries are aligned
Sources: tests/alignment_tests.rs:1-245 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-89
Performance Benefits
Zero-Copy Typed Views
The 64-byte alignment enables safe zero-copy reinterpretation of byte slices as typed slices without additional validation or copying:
| Source Type | Target Type | Requirement | Satisfied by 64-byte Alignment |
|---|---|---|---|
&[u8] | &[u16] | 2-byte aligned | ✅ Yes (64 % 2 = 0) |
&[u8] | &[u32] | 4-byte aligned | ✅ Yes (64 % 4 = 0) |
&[u8] | &[u64] | 8-byte aligned | ✅ Yes (64 % 8 = 0) |
&[u8] | &[u128] | 16-byte aligned | ✅ Yes (64 % 16 = 0) |
The README states at README.md:55-56: “When your payload length matches the element size, you can safely reinterpret the bytes as typed slices (e.g., &[u16], &[u32], &[u64], &[u128]) without copying.”
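For example, with the bytemuck crate a 64-byte-aligned payload whose length is a multiple of the element size can be viewed as &[u64] without copying; payload here stands in for a slice obtained from EntryHandle::payload() (sketch only):

```rust
/// Reinterpret an aligned byte payload as a typed slice without copying.
/// bytemuck::cast_slice validates length and alignment at runtime and
/// panics on mismatch; the 64-byte payload alignment easily satisfies
/// u64's 8-byte requirement.
fn as_u64_slice(payload: &[u8]) -> &[u64] {
    bytemuck::cast_slice(payload)
}
```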
Practical Benefits Summary
From README.md59:
- Cache-friendly zero-copy reads : Payloads align with CPU cache lines
- Predictable SIMD performance : Vector operations never cross alignment boundaries
- Simpler casting : No runtime alignment checks needed for typed views
- Fewer fallback copies: Libraries like bytemuck can cast without allocation
Storage Overhead
The pre-padding mechanism adds variable overhead:
- Worst case: 63 bytes of padding per entry (when the previous tail ends 1 byte past a boundary)
- Average case : ~31.5 bytes per entry (uniform distribution assumption)
- Best case : 0 bytes (when previous tail already aligns)
For small payloads, this overhead can be significant. For large payloads (>>64 bytes), the overhead becomes negligible relative to payload size.
Sources: README.md:53-59 README.md110 tests/alignment_tests.rs:215-221
Integration with Arrow Buffers
When the arrow feature is enabled, EntryHandle provides methods to create Apache Arrow buffers that leverage alignment:
- as_arrow_buffer(): Creates an Arrow buffer view without copying
- into_arrow_buffer(): Converts into an Arrow buffer with alignment validation
Both methods include debug assertions to verify pointer and offset alignment at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88 ensuring Arrow’s alignment requirements are met.
Sources: CHANGELOG.md:67-68 README.md59
CI/CD Validation
The GitHub Actions workflow at .github/workflows/rust-lint.yml:1-43 ensures alignment-related code passes:
- Clippy lints : Validates unsafe SIMD code and alignment assertions
- Format checks : Ensures consistent style in alignment-critical code
- Documentation warnings : Catches missing docs for alignment APIs
The test workflow (referenced in the CI setup) runs alignment tests across multiple platforms (x86_64, aarch64) to verify SIMD compatibility on different architectures.
Sources: .github/workflows/rust-lint.yml:1-43
Summary
The 64-byte PAYLOAD_ALIGNMENT is a foundational design choice that:
- Aligns payloads with CPU cache lines for optimal memory access
- Satisfies alignment requirements for SSE, AVX, AVX-512, and NEON SIMD instructions
- Enables safe zero-copy casting to typed slices (&[u32], &[u64], etc.)
- Integrates seamlessly with Apache Arrow's buffer requirements
The pre-padding mechanism transparently maintains this alignment while preserving the append-only storage model. Comprehensive testing validates alignment across write, delete, and overwrite scenarios, ensuring both correctness and performance optimization.
Write and Read Modes
Relevant source files
- README.md
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
Purpose and Scope
This document describes the different operation modes available in SIMD R Drive for writing and reading data. Each mode is optimized for specific use cases, offering different trade-offs between memory usage, I/O overhead, and concurrency. For information about SIMD acceleration used within these operations, see SIMD Acceleration. For details on payload alignment requirements, see Payload Alignment and Cache Efficiency.
Write Operation Modes
SIMD R Drive provides three distinct write modes, each optimized for different scenarios. All write operations acquire a write lock on the underlying file to ensure consistency.
Single Entry Write
The write() method writes a single key-value pair atomically with immediate disk flushing.
Method Signature: write(&self, key: &[u8], payload: &[u8]) -> Result<u64>
Characteristics:
- Acquires RwLock<BufWriter<File>> for the entire operation
- Writes are flushed immediately via file.flush()
- Each write performs file remapping and an index update
- Suitable for individual, isolated write operations
Internal Flow:
Sources: src/storage_engine/data_store.rs:827-834 src/storage_engine/data_store.rs:832-834
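A hedged usage sketch follows. Only the write() signature above is taken from this page; the DataStore::open constructor and any trait imports it may need are assumptions made for illustration.

```rust
use simd_r_drive::DataStore;
use std::path::Path;

fn main() {
    // Assumed constructor name; see the DataStore docs for the actual API.
    let store = DataStore::open(Path::new("/tmp/example.bin")).expect("open store");

    // Single write: one lock acquisition, immediate flush, index update.
    let tail = store.write(b"config/retries", b"3").expect("append entry");
    println!("tail offset after write: {tail}");
}
```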
Batch Entry Write
The batch_write() method writes multiple key-value pairs in a single locked operation, reducing disk I/O overhead.
Method Signature: batch_write(&self, entries: &[(&[u8], &[u8])]) -> Result<u64>
Characteristics:
- Acquires RwLock<BufWriter<File>> once for the entire batch
- All entries are buffered in memory before writing
- Single file.flush() at the end of the batch
- Single remapping and index update operation
- Significantly more efficient for bulk writes
Internal Process:
| Step | Operation | Lock Held |
|---|---|---|
| 1 | Hash all keys with compute_hash_batch() | No |
| 2 | Acquire write lock | Yes |
| 3 | Build in-memory buffer with all entries | Yes |
| 4 | Calculate alignment padding for each entry | Yes |
| 5 | Copy payloads using simd_copy() | Yes |
| 6 | Append all metadata | Yes |
| 7 | Write entire buffer with file.write_all() | Yes |
| 8 | Flush with file.flush() | Yes |
| 9 | Call reindex() once | Yes |
| 10 | Release write lock | No |
Sources: src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939 README.md:216-218
Streaming Write
The write_stream() method writes large data entries using a streaming Read source without requiring full in-memory allocation.
Method Signature: write_stream<R: Read>(&self, key: &[u8], reader: &mut R) -> Result<u64>
Characteristics:
- Reads data in chunks of WRITE_STREAM_BUFFER_SIZE (8192 bytes)
- Suitable for large files or data streams
- Only one buffer's worth of data in memory at a time
- Computes CRC32 checksum incrementally
- Single file.flush() after all chunks are written
Streaming Flow:
Sources: src/storage_engine/data_store.rs:753-825 README.md:220-222 src/lib.rs:66-115
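A hedged sketch of streaming a large file into the store, following the signature quoted above (error type assumed to be std::io::Error):

```rust
use std::fs::File;
use std::io::BufReader;
use simd_r_drive::{DataStore, DataStoreWriter};

// Hedged sketch: stream a large file into the store without loading it
// fully into memory; write_stream consumes the reader in
// WRITE_STREAM_BUFFER_SIZE-sized chunks and flushes once at the end.
fn store_large_file(store: &DataStore, key: &[u8], path: &str) -> std::io::Result<u64> {
    let mut reader = BufReader::new(File::open(path)?);
    store.write_stream(key, &mut reader)
}
```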
Read Operation Modes
SIMD R Drive provides multiple read modes optimized for different access patterns and performance requirements.
Direct Memory Access
The read() method retrieves stored data using zero-copy memory mapping, providing the most efficient access for individual entries.
Method Signature: read(&self, key: &[u8]) -> Result<Option<EntryHandle>>
Characteristics:
- Zero-copy access via mmap
- Returns an EntryHandle wrapping Arc<Mmap> and a byte range
- No data copying - direct pointer into the memory-mapped region
- O(1) lookup via the KeyIndexer hash table
- Lock-free after the index lookup completes
Read Path:
Sources: src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:502-565 README.md:228-232
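A hedged sketch of a zero-copy lookup. It relies only on the documented facts that read() returns an optional EntryHandle and that the handle dereferences to the payload bytes; the error type is assumed:

```rust
use simd_r_drive::{DataStore, DataStoreReader};

// Hedged sketch: zero-copy lookup. The EntryHandle dereferences to the
// payload bytes inside the shared memory map, so nothing is copied here.
fn payload_len(store: &DataStore, key: &[u8]) -> std::io::Result<Option<usize>> {
    Ok(store.read(key)?.map(|entry| {
        let bytes: &[u8] = &entry; // deref into the mmap-backed slice
        bytes.len()
    }))
}
```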
Batch Read
The batch_read() method efficiently retrieves multiple entries in a single operation, minimizing lock contention.
Method Signature: batch_read(&self, keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>>
Characteristics:
- Hashes all keys in batch using compute_hash_batch()
- Acquires the index read lock once for the entire batch
- Clones Arc<Mmap> once and reuses it for all entries
- Returns a vector of optional EntryHandle objects
- More efficient than individual read() calls
Batch Processing:
| Operation | Complexity | Lock Duration |
|---|---|---|
| Hash all keys | O(n) | No lock |
| Acquire index read lock | O(1) | Begin |
| Clone Arc<Mmap> once | O(1) | Held |
| Lookup each hash | O(n) average | Held |
| Verify tags | O(n) | Held |
| Create handles | O(n) | Held |
| Release lock | O(1) | End |
Sources: src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158
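A hedged sketch of a multi-key lookup under a single index lock, following the signature quoted above (error type assumed). The copy at the end is for illustration only; callers that keep the handles alive stay zero-copy:

```rust
use simd_r_drive::{DataStore, DataStoreReader};

// Hedged sketch: resolve several keys with one index lock acquisition and
// one shared Arc<Mmap>, then copy the payloads out for the caller.
fn read_many(store: &DataStore, keys: &[&[u8]]) -> std::io::Result<Vec<Option<Vec<u8>>>> {
    let handles = store.batch_read(keys)?;
    Ok(handles
        .into_iter()
        .map(|maybe_entry| maybe_entry.map(|entry| (&*entry).to_vec()))
        .collect())
}
```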
Streaming Read
The EntryStream wrapper provides incremental reading of large entries, avoiding high memory overhead.
Characteristics:
- Implements the std::io::Read trait
- Reads data in configurable buffer chunks
- Non-zero-copy - data is read through a buffer
- Suitable for processing large entries incrementally
- Useful when full entry doesn’t fit in memory
Usage Pattern:
Sources: README.md:234-240 src/lib.rs:86-92 src/storage_engine.rs:10-11
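A hedged usage sketch: the EntryStream import path and its From<EntryHandle> conversion are assumptions based on the code entity mapping later in this page, and the byte-summing loop is a stand-in for real chunk processing:

```rust
use std::io::Read;
use simd_r_drive::{DataStore, DataStoreReader, EntryStream};

// Hedged sketch: wrap an EntryHandle in an EntryStream and consume it
// incrementally through std::io::Read, 4 KiB at a time.
fn checksum_large_entry(store: &DataStore, key: &[u8]) -> std::io::Result<u64> {
    let mut byte_sum = 0u64;
    if let Some(entry) = store.read(key)? {
        let mut stream = EntryStream::from(entry);
        let mut buf = [0u8; 4096];
        loop {
            let n = stream.read(&mut buf)?;
            if n == 0 {
                break;
            }
            // Process the chunk; summing the bytes stands in for real work.
            byte_sum += buf[..n].iter().map(|&b| b as u64).sum::<u64>();
        }
    }
    Ok(byte_sum)
}
```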
Parallel Iteration
The par_iter_entries() method provides Rayon-powered parallel iteration over all valid entries.
Method Signature (requires the parallel feature): par_iter_entries(&self) -> impl ParallelIterator<Item = EntryHandle>
Characteristics:
- Only available with the parallel feature flag
- Uses Rayon's parallel iterator infrastructure
- Acquires the index lock briefly to collect offsets
- Releases the lock before parallel processing begins
- Each thread receives an Arc<Mmap> clone for safe access
- Automatically filters tombstones and duplicates
- Ideal for bulk processing and analytics workloads
Parallel Execution Flow:
Sources: src/storage_engine/data_store.rs:296-361 README.md:242-246
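A hedged sketch of a bulk scan with the parallel feature enabled. It assumes only the documented par_iter_entries() signature and the Deref-to-bytes behavior of EntryHandle:

```rust
use rayon::iter::ParallelIterator;
use simd_r_drive::DataStore;

// Hedged sketch (requires the `parallel` feature): sum payload sizes over
// all live entries, with Rayon fanning the work out across cores. Each
// worker sees its own Arc<Mmap> clone, so no copying is needed.
fn total_payload_bytes(store: &DataStore) -> usize {
    store
        .par_iter_entries()
        .map(|entry| {
            let bytes: &[u8] = &entry;
            bytes.len()
        })
        .sum()
}
```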
Performance Characteristics Comparison
Write Mode Comparison
| Mode | Lock Duration | Flush Frequency | Memory Usage | Best For |
|---|---|---|---|---|
| Single Write | Per write | Per write | Low (single entry) | Individual updates, low throughput |
| Batch Write | Per batch | Per batch | Medium (all entries buffered) | Bulk imports, high throughput |
| Stream Write | Per stream | Per stream | Low (8KB buffer) | Large files, limited memory |
Read Mode Comparison
| Mode | Copy Behavior | Lock Contention | Memory Overhead | Best For |
|---|---|---|---|---|
| Direct Read | Zero-copy | Low (brief lock) | Minimal (Arc<Mmap>) | Individual lookups, hot path |
| Batch Read | Zero-copy | Very low (single lock) | Minimal (shared Arc<Mmap>) | Multiple lookups at once |
| Stream Read | Buffered copy | Low (brief lock) | Medium (buffer size) | Large entries, incremental processing |
| Parallel Iter | Zero-copy | Very low (brief lock) | Medium (per-thread Arc<Mmap>) | Full scans, analytics, multi-core |
Lock Acquisition Patterns
Sources: src/storage_engine/data_store.rs:753-939 src/storage_engine/data_store.rs:1040-1158 README.md:208-246
Code Entity Mapping
Write Mode Function References
| Mode | Trait Method | Implementation | Key Helper |
|---|---|---|---|
| Single | DataStoreWriter::write() | data_store.rs:827-830 | write_with_key_hash() |
| Batch | DataStoreWriter::batch_write() | data_store.rs:838-843 | batch_write_with_key_hashes() |
| Stream | DataStoreWriter::write_stream() | data_store.rs:753-756 | write_stream_with_key_hash() |
Read Mode Function References
| Mode | Trait Method | Implementation | Key Helper |
|---|---|---|---|
| Direct | DataStoreReader::read() | data_store.rs:1040-1049 | read_entry_with_context() |
| Batch | DataStoreReader::batch_read() | data_store.rs:1105-1109 | batch_read_hashed_keys() |
| Stream | EntryStream::from() | storage_engine.rs:10-11 | N/A |
| Parallel | DataStore::par_iter_entries() | data_store.rs:297-361 | KeyIndexer::unpack() |
Core Types
- DataStore : data_store.rs:27-33 - Main storage engine struct
- EntryHandle : storage_engine.rs24 - Zero-copy entry wrapper
- EntryStream : storage_engine.rs:10-11 - Streaming read adapter
- KeyIndexer : storage_engine.rs:13-14 - Hash index for O(1) lookups
- EntryIterator : entry_iterator.rs:21-25 - Sequential iterator
Sources: src/storage_engine/data_store.rs:1-1183 src/storage_engine.rs:1-25 src/storage_engine/entry_iterator.rs:1-128
Benchmarking
Relevant source files
- Cargo.toml
- benches/storage_benchmark.rs
- src/main.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
This document describes the performance benchmark suite for the SIMD R Drive storage engine. The benchmarks measure write throughput, read throughput (sequential, random, and batch), and concurrent access patterns under contention. For general performance optimization features, see Performance Optimizations. For information about SIMD acceleration techniques, see SIMD Acceleration.
Benchmark Suite Overview
The SIMD R Drive project includes two primary benchmark suites that measure different aspects of storage engine performance:
| Benchmark | File | Purpose | Key Metrics |
|---|---|---|---|
| Storage Benchmark | benches/storage_benchmark.rs | Single-process throughput testing | Writes/sec, Reads/sec for sequential, random, and batch operations |
| Contention Benchmark | benches/contention_benchmark.rs | Multi-threaded concurrent access | Performance degradation under write contention |
Both benchmarks are configured in Cargo.toml:57-63 and use Criterion.rs for statistical analysis, enabling detection of performance regressions across code changes.
Sources: Cargo.toml:36-63 benches/storage_benchmark.rs:1-234
graph TB
subgraph "Benchmark Configuration"
CARGO["Cargo.toml"]
CRITERION["criterion = 0.6.0"]
HARNESS["harness = false"]
end
subgraph "Storage Benchmark"
STORAGE_BIN["benches/storage_benchmark.rs"]
WRITE_BENCH["benchmark_append_entries()"]
SEQ_BENCH["benchmark_sequential_reads()"]
RAND_BENCH["benchmark_random_reads()"]
BATCH_BENCH["benchmark_batch_reads()"]
end
subgraph "Contention Benchmark"
CONTENTION_BIN["benches/contention_benchmark.rs"]
CONCURRENT_TESTS["Multi-threaded write tests"]
end
subgraph "Core Operations Measured"
BATCH_WRITE["DataStoreWriter::batch_write()"]
READ["DataStoreReader::read()"]
BATCH_READ["DataStoreReader::batch_read()"]
ITER["into_iter()"]
end
CARGO --> CRITERION
CARGO --> STORAGE_BIN
CARGO --> CONTENTION_BIN
CARGO --> HARNESS
STORAGE_BIN --> WRITE_BENCH
STORAGE_BIN --> SEQ_BENCH
STORAGE_BIN --> RAND_BENCH
STORAGE_BIN --> BATCH_BENCH
WRITE_BENCH --> BATCH_WRITE
SEQ_BENCH --> ITER
RAND_BENCH --> READ
BATCH_BENCH --> BATCH_READ
CONTENTION_BIN --> CONCURRENT_TESTS
CONCURRENT_TESTS --> BATCH_WRITE
Storage Benchmark Architecture
The storage benchmark (storage_benchmark.rs) is a standalone binary that measures single-process throughput across four operation types. It writes 1,000,000 entries and then exercises different read access patterns.
Benchmark Configuration Constants
The benchmark behavior is controlled by tunable constants defined at the top of the file:
| Constant | Value | Purpose |
|---|---|---|
| ENTRY_SIZE | 8 bytes | Size of each value payload (stores u64 in little-endian) |
| WRITE_BATCH_SIZE | 1,024 entries | Number of entries per batch_write call |
| READ_BATCH_SIZE | 1,024 entries | Number of entries per batch_read call |
| NUM_ENTRIES | 1,000,000 | Total entries written during setup phase |
| NUM_RANDOM_CHECKS | 1,000,000 | Number of random single-key lookups |
| NUM_BATCH_CHECKS | 1,000,000 | Total entries verified via batch reads |
Sources: benches/storage_benchmark.rs:16-26
Write Benchmark: benchmark_append_entries()
The write benchmark measures append-only write throughput using batched operations.
graph LR
subgraph "Write Benchmark Flow"
START["Start: Instant::now()"]
LOOP["Loop: 0..NUM_ENTRIES"]
BATCH["Accumulate in batch Vec"]
FLUSH["flush_batch()
every 1,024 entries"]
BATCH_WRITE["storage.batch_write(&refs)"]
CALC["Calculate writes/sec"]
end
START --> LOOP
LOOP --> BATCH
BATCH --> FLUSH
FLUSH --> BATCH_WRITE
LOOP --> CALC
subgraph "Data Generation"
KEY["Key: bench-key-{i}"]
VALUE["Value: i.to_le_bytes()
in 8-byte buffer"]
end
LOOP --> KEY
LOOP --> VALUE
KEY --> BATCH
VALUE --> BATCH
The benchmark creates fixed-width 8-byte payloads containing the loop index as a little-endian u64. Entries are accumulated in a batch and flushed every 1,024 entries via batch_write(). The elapsed time is used to calculate writes per second.
Sources: benches/storage_benchmark.rs:52-83 benches/storage_benchmark.rs:85-92
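The data-generation pattern is small enough to sketch. This is a hedged approximation of the write phase rather than the benchmark's literal code: constant values come from the table above, helper names are illustrative, and the error type is assumed to be std::io::Error:

```rust
use simd_r_drive::{DataStore, DataStoreWriter};

const WRITE_BATCH_SIZE: usize = 1024;
const NUM_ENTRIES: u64 = 1_000_000;

// Keys "bench-key-{i}", values = i as an 8-byte little-endian u64,
// flushed through batch_write every WRITE_BATCH_SIZE entries.
fn append_entries(store: &DataStore) -> std::io::Result<()> {
    let mut batch: Vec<(Vec<u8>, Vec<u8>)> = Vec::with_capacity(WRITE_BATCH_SIZE);
    for i in 0..NUM_ENTRIES {
        batch.push((format!("bench-key-{i}").into_bytes(), i.to_le_bytes().to_vec()));
        if batch.len() == WRITE_BATCH_SIZE {
            flush_batch(store, &mut batch)?;
        }
    }
    flush_batch(store, &mut batch) // final partial batch
}

fn flush_batch(store: &DataStore, batch: &mut Vec<(Vec<u8>, Vec<u8>)>) -> std::io::Result<()> {
    if batch.is_empty() {
        return Ok(());
    }
    let refs: Vec<(&[u8], &[u8])> = batch
        .iter()
        .map(|(k, v)| (k.as_slice(), v.as_slice()))
        .collect();
    store.batch_write(&refs)?;
    batch.clear();
    Ok(())
}
```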
Sequential Read Benchmark: benchmark_sequential_reads()
The sequential read benchmark measures zero-copy iteration performance by walking the entire storage file from newest to oldest entry.
graph TB
subgraph "Sequential Read Pattern"
ITER_START["storage.into_iter()"]
ENTRY["For each EntryHandle"]
DEREF["Dereference: &*entry"]
PARSE["u64::from_le_bytes()"]
VERIFY["assert_eq!(stored, expected)"]
end
subgraph "Memory Access"
MMAP["Memory-mapped file access"]
ZERO_COPY["Zero-copy EntryHandle"]
BACKWARD_CHAIN["Follow prev_offset backward"]
end
ITER_START --> ENTRY
ENTRY --> DEREF
DEREF --> PARSE
PARSE --> VERIFY
ITER_START --> BACKWARD_CHAIN
BACKWARD_CHAIN --> MMAP
MMAP --> ZERO_COPY
ZERO_COPY --> DEREF
This benchmark uses storage.into_iter() which implements the IntoIterator trait, traversing the backward-linked chain via prev_offset fields. Each entry is accessed via zero-copy memory mapping and validated by parsing the stored u64 value.
Sources: benches/storage_benchmark.rs:98-118
Random Read Benchmark: benchmark_random_reads()
The random read benchmark measures hash index lookup performance by performing 1,000,000 random single-key reads.
graph LR
subgraph "Random Read Flow"
RNG["rng.random_range(0..NUM_ENTRIES)"]
KEY_GEN["Generate key: bench-key-{i}"]
LOOKUP["storage.read(key.as_bytes())"]
UNWRAP["unwrap() -> EntryHandle"]
VALIDATE["Parse u64 and assert_eq!"]
end
subgraph "Index Lookup Path"
XXH3["XXH3 hash of key"]
DASHMAP["DashMap index lookup"]
TAG_CHECK["16-bit tag collision check"]
OFFSET["Extract 48-bit offset"]
MMAP_ACCESS["Memory-mapped access at offset"]
end
RNG --> KEY_GEN
KEY_GEN --> LOOKUP
LOOKUP --> UNWRAP
UNWRAP --> VALIDATE
LOOKUP --> XXH3
XXH3 --> DASHMAP
DASHMAP --> TAG_CHECK
TAG_CHECK --> OFFSET
OFFSET --> MMAP_ACCESS
MMAP_ACCESS --> UNWRAP
Each iteration generates a random index, constructs the corresponding key, and performs a single read() operation. This exercises the O(1) hash index lookup path including XXH3 hashing, DashMap access, tag-based collision detection, and memory-mapped file access.
Sources: benches/storage_benchmark.rs:124-149
Batch Read Benchmark: benchmark_batch_reads()
The batch read benchmark measures vectorized lookup performance by reading 1,024 keys at a time via batch_read().
graph TB
subgraph "Batch Read Flow"
ACCUMULATE["Accumulate 1,024 keys in Vec"]
CONVERT["Convert to Vec<&[u8]>"]
BATCH_CALL["storage.batch_read(&key_refs)"]
RESULTS["Process Vec<Option<EntryHandle>>"]
VERIFY["Verify each entry's payload"]
end
subgraph "Batch Read Implementation"
PARALLEL["Optional: Rayon parallel iterator"]
MULTI_LOOKUP["Multiple hash lookups"]
COLLECT["Collect into result Vec"]
end
subgraph "Verification Logic"
PARSE_KEY["Extract index from key suffix"]
PARSE_VALUE["u64::from_le_bytes(handle)"]
ASSERT["assert_eq!(stored, idx)"]
end
ACCUMULATE --> CONVERT
CONVERT --> BATCH_CALL
BATCH_CALL --> RESULTS
RESULTS --> VERIFY
BATCH_CALL --> PARALLEL
PARALLEL --> MULTI_LOOKUP
MULTI_LOOKUP --> COLLECT
COLLECT --> RESULTS
VERIFY --> PARSE_KEY
VERIFY --> PARSE_VALUE
PARSE_KEY --> ASSERT
PARSE_VALUE --> ASSERT
The benchmark accumulates keys in batches of 1,024 and invokes batch_read() which can optionally use the parallel feature to perform lookups concurrently via Rayon. The verification phase includes a fast numeric suffix parser that extracts the index from the key string without heap allocation.
Sources: benches/storage_benchmark.rs:155-181 benches/storage_benchmark.rs:183-202
graph LR
subgraph "Rate Formatting Logic"
INPUT["Input: f64 rate"]
SPLIT["Split into whole and fractional parts"]
ROUND["Round fractional to 3 decimals"]
CARRY["Handle 1000 rounding carry"]
SEPARATE["Comma-separate thousands"]
FORMAT["Output: 4,741,483.464"]
end
INPUT --> SPLIT
SPLIT --> ROUND
ROUND --> CARRY
CARRY --> SEPARATE
SEPARATE --> FORMAT
Output Formatting
The benchmark produces human-readable output with formatted throughput numbers using the fmt_rate() utility function.
The fmt_rate() function formats rates with comma-separated thousands and exactly three decimal places. It uses the thousands crate’s separate_with_commas() method and handles edge cases where rounding produces 1000 in the fractional part.
Example output:
Wrote 1,000,000 entries of 8 bytes in 0.234s (4,273,504.273 writes/s)
Sequentially read 1,000,000 entries in 0.089s (11,235,955.056 reads/s)
Randomly read 1,000,000 entries in 0.532s (1,879,699.248 reads/s)
Batch-read verified 1,000,000 entries in 0.156s (6,410,256.410 reads/s)
Sources: benches/storage_benchmark.rs:204-233
Contention Benchmark
The contention benchmark (contention_benchmark.rs) measures performance degradation under concurrent write load. While the source file is not included in the provided files, it is referenced in Cargo.toml:61-63 and is designed to complement the concurrency tests shown in tests/concurrency_tests.rs:1-230.
Expected Contention Scenarios
Based on the concurrency test patterns, the contention benchmark likely measures:
| Scenario | Description | Measured Metric |
|---|---|---|
| Concurrent Writes | Multiple threads writing different keys simultaneously | Throughput degradation under RwLock contention |
| Write Serialization | Effect of RwLock serializing write operations | Comparison vs. theoretical maximum (single-threaded) |
| Index Contention | DashMap update performance under concurrent load | Lock-free read performance maintained |
| Streaming Writes | Concurrent write_stream() calls with slow readers | I/O bottleneck vs. lock contention |
Concurrency Test Patterns
The tests/concurrency_tests.rs:1-230 file demonstrates three concurrency patterns that inform contention benchmarking:
Sources: tests/concurrency_tests.rs:111-161 tests/concurrency_tests.rs:163-229 tests/concurrency_tests.rs:14-109
graph TB
subgraph "Concurrent Write Test"
THREADS_WRITE["16 threads × 10 writes each"]
RWLOCK_WRITE["RwLock serializes writes"]
ATOMIC_UPDATE["AtomicU64 tail_offset updated"]
DASHMAP_UPDATE["DashMap index updated"]
end
subgraph "Interleaved Read/Write Test"
THREAD_A["Thread A: write → notify → read"]
THREAD_B["Thread B: wait → read → write → notify"]
SYNC["Tokio Notify synchronization"]
end
subgraph "Streaming Write Test"
SLOW_READER["SlowReader with artificial delay"]
CONCURRENT_STREAMS["2 concurrent write_stream()
calls"]
IO_BOUND["Tests I/O vs. lock contention"]
end
THREADS_WRITE --> RWLOCK_WRITE
RWLOCK_WRITE --> ATOMIC_UPDATE
ATOMIC_UPDATE --> DASHMAP_UPDATE
THREAD_A --> SYNC
THREAD_B --> SYNC
SLOW_READER --> CONCURRENT_STREAMS
CONCURRENT_STREAMS --> IO_BOUND
Running Benchmarks
Command-Line Usage
Execute benchmarks using Cargo’s benchmark runner:
Benchmark Execution Flow
The benchmarks use harness = false in Cargo.toml:59-63, meaning they execute as standalone binaries rather than using Criterion's default test harness. This allows for custom output formatting and fine-grained control over benchmark structure.
Sources: Cargo.toml:57-63 benches/storage_benchmark.rs:32-46
Metrics and Analysis
Performance Indicators
The benchmark suite measures the following key performance indicators:
| Metric | Calculation | Typical Range | Optimization Focus |
|---|---|---|---|
| Write Throughput | NUM_ENTRIES / elapsed_time | 2-10M writes/sec | SIMD copy, batch sizing |
| Sequential Read Throughput | NUM_ENTRIES / elapsed_time | 5-20M reads/sec | Memory mapping, iterator overhead |
| Random Read Throughput | NUM_RANDOM_CHECKS / elapsed_time | 1-5M reads/sec | Hash index lookup, cache efficiency |
| Batch Read Throughput | NUM_BATCH_CHECKS / elapsed_time | 3-12M reads/sec | Parallel lookup, vectorization |
Factors Affecting Performance
Sources: benches/storage_benchmark.rs:16-26 Cargo.toml:49-55
Integration with Development Workflow
Performance Regression Detection
While Criterion.rs is included as a dependency Cargo.toml39, the current benchmark implementations use custom timing via std::time::Instant rather than Criterion's statistical framework. This provides:
- Immediate feedback : Results printed directly to stdout during execution
- Reproducibility : Fixed workload sizes and patterns for consistent comparison
- Simplicity : No statistical overhead for quick performance checks
Benchmark Data for Tuning
The benchmark results inform optimization decisions:
| Optimization | Benchmark Validation | Expected Impact |
|---|---|---|
| SIMD copy implementation | Write throughput increase | 2-4x improvement on AVX2 systems |
| 64-byte alignment change | All operations improve | 10-30% from cache-line alignment |
| parallel feature | Batch read throughput | 2-4x on multi-core systems |
| DashMap vs RwLock | Random read throughput | Eliminates read lock contention |
Sources: benches/storage_benchmark.rs:1-234 Cargo.toml:36-55
Extensions and Utilities
Relevant source files
- extensions/Cargo.toml
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/utils.rs
- src/utils/align_or_copy.rs
- src/utils/verify_file_existence.rs
- tests/align_or_copy_tests.rs
This document covers the utility functions, helper modules, and constants provided by the SIMD R Drive ecosystem. These components include the simd-r-drive-extensions crate for higher-level storage operations, core utility functions in the main simd-r-drive crate, and shared constants from simd-r-drive-entry-handle.
For details on the core storage engine API, see DataStore API. For performance optimization features like SIMD acceleration, see SIMD Acceleration. For alignment-related architecture decisions, see Payload Alignment and Cache Efficiency.
Extensions Crate Overview
The simd-r-drive-extensions crate provides storage extensions and higher-level utilities built on top of the core simd-r-drive storage engine. It adds functionality for common storage patterns and data manipulation tasks.
graph TB
subgraph "simd-r-drive-extensions"
ExtCrate["simd-r-drive-extensions"]
ExtDeps["Dependencies:\n- bincode\n- serde\n- simd-r-drive\n- walkdir"]
end
subgraph "Core Dependencies"
Core["simd-r-drive"]
Bincode["bincode\nBinary Serialization"]
Serde["serde\nSerialization Traits"]
Walkdir["walkdir\nDirectory Traversal"]
end
ExtCrate --> ExtDeps
ExtDeps --> Core
ExtDeps --> Bincode
ExtDeps --> Serde
ExtDeps --> Walkdir
Core -.->|provides| DataStore["DataStore"]
Bincode -.->|enables| SerializationSupport["Structured Data Storage"]
Walkdir -.->|enables| FileSystemOps["File System Operations"]
Crate Structure
Sources: extensions/Cargo.toml:1-22
| Dependency | Purpose |
|---|---|
| bincode | Binary serialization/deserialization for structured data storage |
| serde | Serialization trait support with derive macros |
| simd-r-drive | Core storage engine access |
| walkdir | Directory tree traversal utilities |
Sources: extensions/Cargo.toml:13-17
Core Utilities Module
The main simd-r-drive crate exposes several utility functions through its utils module. These functions handle common tasks like alignment optimization, string formatting, and data validation.
graph TB
subgraph "utils Module"
UtilsRoot["src/utils.rs"]
AlignOrCopy["align_or_copy\nZero-Copy Optimization"]
AppendExt["append_extension\nString Path Handling"]
FormatBytes["format_bytes\nHuman-Readable Sizes"]
NamespaceHasher["NamespaceHasher\nHierarchical Keys"]
ParseBuffer["parse_buffer_size\nSize String Parsing"]
VerifyFile["verify_file_existence\nFile Validation"]
end
UtilsRoot --> AlignOrCopy
UtilsRoot --> AppendExt
UtilsRoot --> FormatBytes
UtilsRoot --> NamespaceHasher
UtilsRoot --> ParseBuffer
UtilsRoot --> VerifyFile
AlignOrCopy -.->|used by| ReadOps["Read Operations"]
NamespaceHasher -.->|used by| KeyManagement["Key Management"]
FormatBytes -.->|used by| Logging["Logging & Reporting"]
ParseBuffer -.->|used by| Config["Configuration Parsing"]
Utility Functions Overview
Sources: src/utils.rs:1-17
align_or_copy Function
The align_or_copy utility function provides zero-copy deserialization with automatic fallback for misaligned data. It attempts to reinterpret a byte slice as a typed slice without copying, and falls back to manual decoding when alignment requirements are not met.
Function Signature
Sources: src/utils/align_or_copy.rs:44-50
Operation Flow
Sources: src/utils/align_or_copy.rs:44-73
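The following is a hedged reconstruction of both the signature and the flow described above, based only on the behavior documented on this page (the f32 example below, the panic on invalid lengths, and the align_to fast path). Parameter names and exact bounds are assumptions, not the crate's literal code:

```rust
use std::borrow::Cow;

// Hedged reconstruction: reinterpret in place when alignment allows,
// otherwise decode element-by-element into an owned Vec<T>.
pub fn align_or_copy<T: Copy, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]> {
    // Documented behavior: a length that is not a multiple of the element
    // size is invalid input and panics.
    assert_eq!(bytes.len() % N, 0, "input length must be a multiple of the element size");

    // Fast path: zero-copy reinterpretation when the start address is aligned.
    let (prefix, typed, suffix) = unsafe { bytes.align_to::<T>() };
    if prefix.is_empty() && suffix.is_empty() {
        return Cow::Borrowed(typed);
    }

    // Fallback: allocate and decode each element from little-endian bytes.
    Cow::Owned(
        bytes
            .chunks_exact(N)
            .map(|chunk| from_le_bytes(chunk.try_into().unwrap()))
            .collect(),
    )
}
```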
Usage Patterns
| Scenario | Outcome | Performance |
|---|---|---|
| Aligned 64-byte boundary, exact multiple | Cow::Borrowed | Zero-copy, optimal |
| Misaligned address | Cow::Owned | Allocation + decode |
| Non-multiple of element size | Panic | Invalid input |
Example Usage:
Sources: src/utils/align_or_copy.rs:38-43 tests/align_or_copy_tests.rs:7-12
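A hedged usage sketch, mirroring the align_or_copy<f32, 4> call shown in the sequence diagram further down this page; the utils import path is an assumption:

```rust
use std::borrow::Cow;
use simd_r_drive::utils::align_or_copy;

// Hedged usage sketch: decode an entry payload as f32 values, borrowing
// when alignment allows and falling back to an owned Vec<f32> otherwise.
fn payload_as_f32(payload: &[u8]) -> Cow<'_, [f32]> {
    align_or_copy::<f32, 4>(payload, f32::from_le_bytes)
}
```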
Safety Considerations
The function uses unsafe for the align_to::<T>() call, which requires:
- The starting address must be aligned to align_of::<T>()
- The total size must be a multiple of size_of::<T>()
These requirements are validated by checking that prefix and suffix slices are empty before returning the borrowed slice. If validation fails, the function falls back to safe manual decoding.
Sources: src/utils/align_or_copy.rs:28-35 src/utils/align_or_copy.rs:53-60
Other Utility Functions
| Function | Module Path | Purpose |
|---|---|---|
| append_extension | src/utils/append_extension.rs | Safely appends file extensions to paths |
| format_bytes | src/utils/format_bytes.rs | Formats byte counts as human-readable strings (KB, MB, GB) |
| NamespaceHasher | src/utils/namespace_hasher.rs | Generates hierarchical, namespaced hash keys |
| parse_buffer_size | src/utils/parse_buffer_size.rs | Parses size strings like “64KB”, “1MB” into byte counts |
| verify_file_existence | src/utils/verify_file_existence.rs | Validates file paths before operations |
Sources: src/utils.rs:1-17
Entry Handle Constants
The simd-r-drive-entry-handle crate defines shared constants used throughout the storage system. These constants establish the binary layout of entries and alignment requirements.
graph TB
subgraph "simd-r-drive-entry-handle"
LibRoot["lib.rs"]
ConstMod["constants.rs"]
EntryHandle["entry_handle.rs"]
EntryMetadata["entry_metadata.rs"]
DebugAssert["debug_assert_aligned.rs"]
end
subgraph "Exported Constants"
MetadataSize["METADATA_SIZE = 20"]
KeyHashRange["KEY_HASH_RANGE = 0..8"]
PrevOffsetRange["PREV_OFFSET_RANGE = 8..16"]
ChecksumRange["CHECKSUM_RANGE = 16..20"]
ChecksumLen["CHECKSUM_LEN = 4"]
PayloadLog["PAYLOAD_ALIGN_LOG2 = 6"]
PayloadAlign["PAYLOAD_ALIGNMENT = 64"]
end
LibRoot --> ConstMod
LibRoot --> EntryHandle
LibRoot --> EntryMetadata
LibRoot --> DebugAssert
ConstMod --> MetadataSize
ConstMod --> KeyHashRange
ConstMod --> PrevOffsetRange
ConstMod --> ChecksumRange
ConstMod --> ChecksumLen
ConstMod --> PayloadLog
ConstMod --> PayloadAlign
PayloadAlign -.->|ensures| CacheLineOpt["Cache-Line Optimization"]
PayloadAlign -.->|enables| SIMDOps["SIMD Operations"]
Constants Module Structure
Sources: simd-r-drive-entry-handle/src/lib.rs:1-10 simd-r-drive-entry-handle/src/constants.rs:1-19
Metadata Layout Constants
The following constants define the fixed 20-byte metadata structure at the end of each entry:
| Constant | Value | Description |
|---|---|---|
| METADATA_SIZE | 20 | Total size of entry metadata in bytes |
| KEY_HASH_RANGE | 0..8 | Byte range for 64-bit XXH3 key hash |
| PREV_OFFSET_RANGE | 8..16 | Byte range for 64-bit previous entry offset |
| CHECKSUM_RANGE | 16..20 | Byte range for 32-bit CRC32C checksum |
| CHECKSUM_LEN | 4 | Explicit length of checksum field |
Sources: simd-r-drive-entry-handle/src/constants.rs:3-11
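A hedged sketch of how these ranges carve up a metadata block. The constant values and ranges come from the table above; the little-endian field encoding and the helper name are assumptions for illustration:

```rust
use std::ops::Range;

const METADATA_SIZE: usize = 20;
const KEY_HASH_RANGE: Range<usize> = 0..8;
const PREV_OFFSET_RANGE: Range<usize> = 8..16;
const CHECKSUM_RANGE: Range<usize> = 16..20;

// Hedged sketch: slice a 20-byte metadata block into its three fields.
fn parse_metadata(meta: &[u8; METADATA_SIZE]) -> (u64, u64, u32) {
    let key_hash = u64::from_le_bytes(meta[KEY_HASH_RANGE].try_into().unwrap());
    let prev_offset = u64::from_le_bytes(meta[PREV_OFFSET_RANGE].try_into().unwrap());
    let checksum = u32::from_le_bytes(meta[CHECKSUM_RANGE].try_into().unwrap());
    (key_hash, prev_offset, checksum)
}
```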
Alignment Constants
These constants enforce 64-byte alignment for all payload data:
- PAYLOAD_ALIGN_LOG2: Base-2 logarithm of the alignment requirement (6, so 2^6 = 64 bytes)
- PAYLOAD_ALIGNMENT: Computed alignment value (64 bytes)
This alignment matches CPU cache line sizes and enables efficient SIMD operations. The maximum pre-padding per entry is PAYLOAD_ALIGNMENT - 1 (63 bytes).
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18
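A minimal sketch of the padding arithmetic implied by these constants; the constant values match the documentation, while the helper name is illustrative:

```rust
const PAYLOAD_ALIGN_LOG2: u64 = 6;
const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2; // 64 bytes

// Between 0 and PAYLOAD_ALIGNMENT - 1 (63) padding bytes are inserted so
// the payload begins on the next 64-byte boundary.
fn pre_pad_len(write_offset: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (write_offset % PAYLOAD_ALIGNMENT)) % PAYLOAD_ALIGNMENT
}
```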
Constant Relationships
Sources: simd-r-drive-entry-handle/src/constants.rs:1-19
sequenceDiagram
participant Client
participant EntryHandle
participant align_or_copy
participant Memory
Client->>EntryHandle: get_payload_bytes()
EntryHandle->>Memory: read &[u8] from mmap
EntryHandle->>align_or_copy: align_or_copy<f32, 4>(bytes, f32::from_le_bytes)
alt Aligned on 64-byte boundary
align_or_copy->>Memory: validate alignment
align_or_copy-->>Client: Cow::Borrowed(&[f32])
Note over Client,Memory: Zero-copy: direct memory access
else Misaligned
align_or_copy->>align_or_copy: chunks_exact(4)
align_or_copy->>align_or_copy: map(f32::from_le_bytes)
align_or_copy->>align_or_copy: collect into Vec<f32>
align_or_copy-->>Client: Cow::Owned(Vec<f32>)
Note over Client,align_or_copy: Fallback: allocated copy
end
Common Patterns
Zero-Copy Data Access
Utilities like align_or_copy enable zero-copy access patterns when memory alignment allows:
Sources: src/utils/align_or_copy.rs:44-73 simd-r-drive-entry-handle/src/constants.rs:13-18
Namespace-Based Key Management
The NamespaceHasher utility enables hierarchical key organization:
Sources: src/utils.rs:11-12
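A conceptual sketch only, not the crate's actual NamespaceHasher API: it illustrates the idea of combining a namespace hash with a key hash (using the same xxh3_64 primitive the engine uses) so identical keys in different namespaces map to distinct stored keys:

```rust
use xxhash_rust::xxh3::xxh3_64;

// Conceptual sketch: derive a 16-byte namespaced key from a namespace hash
// and a key hash. The real NamespaceHasher type likely differs.
fn namespaced_key(namespace: &[u8], key: &[u8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&xxh3_64(namespace).to_le_bytes());
    out[8..].copy_from_slice(&xxh3_64(key).to_le_bytes());
    out
}
```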
Size Formatting for Logging
The format_bytes utility provides human-readable output:
| Input Bytes | Formatted Output |
|---|---|
| 1023 | “1023 B” |
| 1024 | “1.00 KB” |
| 1048576 | “1.00 MB” |
| 1073741824 | “1.00 GB” |
Sources: src/utils.rs:7-8
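A conceptual sketch that reproduces the table above; the crate's own format_bytes may differ in rounding details or unit boundaries:

```rust
// Conceptual sketch: binary units with two decimal places above 1 KB.
fn format_bytes(bytes: u64) -> String {
    const KB: f64 = 1024.0;
    const MB: f64 = KB * 1024.0;
    const GB: f64 = MB * 1024.0;
    let b = bytes as f64;
    if b >= GB {
        format!("{:.2} GB", b / GB)
    } else if b >= MB {
        format!("{:.2} MB", b / MB)
    } else if b >= KB {
        format!("{:.2} KB", b / KB)
    } else {
        format!("{bytes} B")
    }
}
```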
Configuration Parsing
The parse_buffer_size utility handles size string inputs:
| Input String | Parsed Bytes |
|---|---|
| “64” | 64 |
| “64KB” | 65,536 |
| “1MB” | 1,048,576 |
| “2GB” | 2,147,483,648 |
Sources: src/utils.rs:13-14
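A conceptual sketch matching the table above (binary units); the crate's parse_buffer_size may accept additional suffixes or report errors differently:

```rust
// Conceptual sketch: strip a KB/MB/GB suffix, then multiply the numeric
// prefix by the corresponding binary unit.
fn parse_buffer_size(input: &str) -> Option<u64> {
    let s = input.trim().to_ascii_uppercase();
    let (digits, multiplier) = if let Some(n) = s.strip_suffix("GB") {
        (n, 1024u64 * 1024 * 1024)
    } else if let Some(n) = s.strip_suffix("MB") {
        (n, 1024 * 1024)
    } else if let Some(n) = s.strip_suffix("KB") {
        (n, 1024)
    } else {
        (s.as_str(), 1)
    };
    digits.trim().parse::<u64>().ok().map(|n| n * multiplier)
}
```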
Integration with Core Systems
Relationship to Storage Engine
Sources: extensions/Cargo.toml:1-22 src/utils.rs:1-17 simd-r-drive-entry-handle/src/lib.rs:1-10
Performance Considerations
| Utility | Performance Impact | Use Case |
|---|---|---|
| align_or_copy | Zero-copy when aligned | Deserializing typed arrays from storage |
| NamespaceHasher | Single XXH3 hash | Generating hierarchical keys |
| format_bytes | String allocation | Logging and user display only |
| PAYLOAD_ALIGNMENT | Enables SIMD ops | Core storage layout requirement |
Sources: src/utils/align_or_copy.rs:1-74 simd-r-drive-entry-handle/src/constants.rs:13-18
Development Guide
Relevant source files
This document provides an overview of development practices for the SIMD R Drive repository. It covers workspace organization, build processes, feature configuration, testing strategies, and CI/CD integration. This guide is intended for contributors and maintainers working on the codebase.
For detailed instructions on building and testing specific features, see Building and Testing. For CI/CD pipeline details, see CI/CD Pipeline. For version history and migration information, see Version History and Migration.
Workspace Organization
The repository uses a Cargo workspace structure with multiple interdependent packages. The workspace is defined in the root Cargo.toml and includes both core components and experimental features.
graph TB
subgraph "Root Workspace"
ROOT["Cargo.toml\nworkspace root"]
end
subgraph "Core Packages"
CORE["simd-r-drive\nmain storage engine"]
HANDLE["simd-r-drive-entry-handle\nentry abstraction"]
EXT["extensions\nutility functions"]
end
subgraph "Experimental Packages"
WS_SERVER["experiments/simd-r-drive-ws-server\nWebSocket RPC server"]
WS_CLIENT["experiments/simd-r-drive-ws-client\nWebSocket RPC client"]
SERVICE_DEF["experiments/simd-r-drive-muxio-service-definition\nRPC contract"]
end
subgraph "Excluded from Workspace"
PY_DIRECT["experiments/bindings/python\nPyO3 direct bindings"]
PY_WS["experiments/bindings/python-ws-client\nPython WebSocket client"]
end
ROOT --> CORE
ROOT --> HANDLE
ROOT --> EXT
ROOT --> WS_SERVER
ROOT --> WS_CLIENT
ROOT --> SERVICE_DEF
CORE --> HANDLE
WS_SERVER --> CORE
WS_SERVER --> SERVICE_DEF
WS_CLIENT --> SERVICE_DEF
PY_DIRECT -.excluded.-> ROOT
PY_WS -.excluded.-> ROOT
Workspace Structure
Sources: Cargo.toml:65-77
The workspace includes six member packages Cargo.toml:66-73 and excludes two Python binding packages Cargo.toml:74-77 which use their own build systems via maturin.
Package Dependencies
| Package | Purpose | Key Dependencies |
|---|---|---|
| simd-r-drive | Core storage engine | memmap2, dashmap, xxhash-rust, simd-r-drive-entry-handle |
| simd-r-drive-entry-handle | Entry abstraction layer | arrow (optional), memmap2 |
| extensions | Utility functions | simd-r-drive, alignment helpers |
| simd-r-drive-ws-server | WebSocket server | simd-r-drive, muxio-tokio-rpc-server, tokio |
| simd-r-drive-ws-client | WebSocket client | muxio-tokio-rpc-client, tokio |
| simd-r-drive-muxio-service-definition | RPC contract | muxio-rpc-service, bitcode |
Sources: Cargo.toml:23-34 Cargo.toml:80-112
Feature Flags and Configuration
The core simd-r-drive package provides three optional feature flags that enable additional capabilities or internal API access.
graph LR
subgraph "Available Features"
DEFAULT["default = []\nbaseline features"]
PARALLEL["parallel\nenables rayon"]
EXPOSE["expose-internal-api\nexposes internals"]
ARROW["arrow\nproxy to entry-handle"]
end
subgraph "Dependencies Enabled"
RAYON["rayon = '1.10.0'"]
ARROW_DEP["arrow = '57.0.0'"]
end
subgraph "Use Cases"
UC_DEFAULT["Standard storage operations"]
UC_PARALLEL["Parallel batch operations"]
UC_EXPOSE["Testing/benchmarking access"]
UC_ARROW["Zero-copy Arrow buffers"]
end
DEFAULT --> UC_DEFAULT
PARALLEL --> RAYON
RAYON --> UC_PARALLEL
EXPOSE --> UC_EXPOSE
ARROW --> ARROW_DEP
ARROW_DEP --> UC_ARROW
Feature Flag Overview
Sources: Cargo.toml:49-55
Feature Flag Definitions
The feature flags are defined in Cargo.toml:49-55:
| Feature | Purpose | Enables |
|---|---|---|
| default | Baseline configuration | No additional dependencies |
| parallel | Parallel batch operations | rayon dependency for multi-threaded processing |
| expose-internal-api | Internal API access | Exposes internal types for testing/benchmarking |
| arrow | Apache Arrow integration | Proxy flag to simd-r-drive-entry-handle/arrow |
The arrow feature is a proxy feature Cargo.toml:54-55 that forwards to the simd-r-drive-entry-handle package, enabling zero-copy Apache Arrow buffer integration.
Sources: Cargo.toml:49-55
Development Workflow
The typical development workflow involves building, testing, and validating changes across multiple feature combinations before committing.
graph TB
subgraph "Local Development"
BUILD["cargo build --workspace"]
TEST["cargo test --workspace"]
BENCH_CHECK["cargo bench --workspace --no-run"]
CHECK["cargo check --workspace"]
end
subgraph "Feature Testing"
TEST_DEFAULT["cargo test"]
TEST_PARALLEL["cargo test --features parallel"]
TEST_EXPOSE["cargo test --features expose-internal-api"]
TEST_ALL["cargo test --all-features"]
end
subgraph "Code Quality"
FMT["cargo fmt --all -- --check"]
CLIPPY["cargo clippy --workspace"]
DENY["cargo deny check"]
AUDIT["cargo audit"]
end
BUILD --> TEST
TEST --> BENCH_CHECK
TEST --> TEST_DEFAULT
TEST --> TEST_PARALLEL
TEST --> TEST_EXPOSE
TEST --> TEST_ALL
TEST --> FMT
TEST --> CLIPPY
CLIPPY --> DENY
DENY --> AUDIT
Standard Development Commands
Sources: .github/workflows/rust-tests.yml:54-61
Workspace Commands
All workspace members can be built, tested, and checked simultaneously using workspace flags:
Individual packages can be built by navigating to their directory or using the -p flag:
Sources: .github/workflows/rust-tests.yml:54-57
Testing Strategy
The repository employs multiple testing approaches to ensure correctness and performance across different configurations and platforms.
graph TB
subgraph "Unit Tests"
UNIT_CORE["Core storage tests\nsrc/ inline #[test]"]
UNIT_HANDLE["EntryHandle tests\nsimd-r-drive-entry-handle"]
UNIT_EXT["Extension utility tests"]
end
subgraph "Integration Tests"
INT_STORAGE["Storage operations\ntests/ directory"]
INT_CONCURRENCY["Concurrency tests\nserial_test"]
INT_RPC["RPC integration\nexperiments/"]
end
subgraph "Benchmarks"
BENCH_STORAGE["storage_benchmark\nCriterion harness:false"]
BENCH_CONTENTION["contention_benchmark\nCriterion harness:false"]
end
subgraph "Python Tests"
PY_UNIT["pytest unit tests"]
PY_README["README example tests"]
PY_INT["Integration tests"]
end
UNIT_CORE --> INT_STORAGE
UNIT_HANDLE --> INT_STORAGE
UNIT_EXT --> INT_STORAGE
INT_STORAGE --> BENCH_STORAGE
INT_CONCURRENCY --> BENCH_CONTENTION
INT_RPC --> PY_UNIT
PY_UNIT --> PY_README
PY_README --> PY_INT
Test Hierarchy
Sources: Cargo.toml:36-63
Test Types
| Test Type | Location | Purpose |
|---|---|---|
| Unit tests | src/ modules with #[test] | Verify individual function correctness |
| Integration tests | tests/ directory | Verify component interactions |
| Benchmarks | benches/ directory | Measure performance characteristics |
| Python tests | experiments/bindings/python*/tests/ | Verify Python bindings |
The benchmark suite uses Criterion.rs with custom harness configuration Cargo.toml:57-63 to disable the default test harness and enable statistical analysis.
Sources: Cargo.toml:36-63
CI/CD Integration
The repository uses GitHub Actions for continuous integration, running tests across multiple operating systems and feature combinations.
CI Matrix Strategy
The test pipeline .github/workflows/rust-tests.yml:1-62 executes a matrix build strategy:
Sources: .github/workflows/rust-tests.yml:10-61
Matrix Configuration
The CI pipeline tests 18 total combinations (3 OS × 6 feature sets) .github/workflows/rust-tests.yml:14-30:
| Feature Set | Flags |
|---|---|
| Default | "" (empty) |
| No Default Features | --no-default-features |
| Parallel | --features parallel |
| Expose Internal API | --features expose-internal-api |
| Parallel + Expose API | --features=parallel,expose-internal-api |
| All Features | --all-features |
The fail-fast: false configuration .github/workflows/rust-tests.yml15 ensures all matrix jobs complete even if one fails, providing comprehensive test coverage feedback.
Sources: .github/workflows/rust-tests.yml:14-30
Caching Strategy
The CI pipeline caches Cargo dependencies .github/workflows/rust-tests.yml:40-51 to reduce build times:
Sources: .github/workflows/rust-tests.yml:40-51
The cache key includes the OS, Cargo.lock hash, and feature flags, ensuring separate caches for different configurations while enabling reuse across builds with identical dependencies.
Version Management
The workspace uses unified versioning across all packages Cargo.toml:1-6:
| Field | Value |
|---|---|
| Version | 0.15.5-alpha |
| Edition | 2024 |
| Repository | https://github.com/jzombie/rust-simd-r-drive |
| License | Apache-2.0 |
All workspace members inherit these values via workspace inheritance Cargo.toml:14-21:
This ensures consistent versioning across the entire project. For detailed version history and migration guides, see Version History and Migration.
Sources: Cargo.toml:1-21
Ignored Files and Directories
The repository excludes certain files and directories from version control .gitignore:1-11:
| Pattern | Purpose |
|---|---|
| **/target | Rust build artifacts |
| *.bin | Binary data files (test/debug) |
| /data | Local data directory for debugging |
| out.txt | Output file for experimentation |
| .cargo/config.toml | Local Cargo configuration overrides |
The /data directory .gitignore5 is specifically noted for debugging and experimentation purposes, allowing developers to maintain local test data without committing it.
Sources: .gitignore:1-11
Summary
The SIMD R Drive development environment is organized as a Cargo workspace with multiple packages, optional feature flags, comprehensive testing across platforms and configurations, and automated CI/CD validation. The workspace structure separates core functionality from experimental features, while unified versioning ensures consistency across all packages.
Key development practices include:
- Building and testing all workspace members simultaneously
- Testing multiple feature flag combinations locally before committing
- Leveraging CI/CD matrix builds for comprehensive platform coverage
- Using benchmarks with statistical analysis via Criterion.rs
- Maintaining separate build systems for Python bindings
For specific build instructions and test execution details, refer to Building and Testing. For CI/CD pipeline configuration details, refer to CI/CD Pipeline. For version history and migration guidance, refer to Version History and Migration.
Building and Testing
Relevant source files
- .github/workflows/rust-tests.yml
- .gitignore
- Cargo.toml
- benches/storage_benchmark.rs
- src/main.rs
- src/utils/format_bytes.rs
- tests/concurrency_tests.rs
This document covers building the Rust workspace, running tests, and using feature flags for the simd-r-drive storage engine. It provides instructions for building individual crates, executing test suites, and running benchmarks.
For information about CI/CD automation, see CI/CD Pipeline. For building Python bindings specifically, see Building Python Bindings.
Prerequisites
The project requires:
- Rust toolchain (edition 2024, as specified in Cargo.toml4)
- Cargo (workspace resolver version 2)
- Platform support : Linux, macOS, Windows (all tested in CI)
Optional dependencies for specific features:
- Rayon (for the parallel feature)
- Apache Arrow libraries (for the arrow feature)
- Tokio runtime (for async tests and network experiments)
Sources: Cargo.toml:1-113
Workspace Structure
The project is organized as a Cargo workspace with multiple member crates:
Workspace Members (built together with --workspace):
- simd-r-drive - Core storage engine
- simd-r-drive-entry-handle - Entry data structures
- extensions - Helper utilities
- Network experiment crates
Excluded Members (must be built separately):
- Python binding directories use separate build systems (maturin/PyO3)
Sources: Cargo.toml:65-78
Building the Core Library
Basic Build
Build the core library with default features:
Build in release mode for production use:
Building the Entire Workspace
Build all workspace members:
Build all targets (lib, bins, tests, benches):
Sources: Cargo.toml:11-21 .github/workflows/rust-tests.yml:53-54
Feature Flags
The project uses Cargo feature flags to enable optional functionality:
| Feature Flag | Purpose | Dependencies Added |
|---|---|---|
| default | Base functionality (empty) | None |
| parallel | Multi-threaded operations | rayon = "1.10.0" |
| arrow | Apache Arrow columnar data | arrow = "57.0.0" (via entry-handle) |
| expose-internal-api | Expose internal APIs for testing | None |
Building with Features
Build with specific features:
Sources: Cargo.toml:49-55 Cargo.toml30
Running Tests
Test Suite Organization
The project includes multiple test types:
| Test Type | Location | Purpose |
|---|---|---|
| Unit tests | src/**/*.rs (inline) | Test individual functions/modules |
| Integration tests | tests/*.rs | Test public API interactions |
| Concurrency tests | tests/concurrency_tests.rs | Test thread-safety under load |
| Benchmarks | benches/*.rs | Performance measurements |
Basic Test Execution
Run all tests in the workspace:
Run tests with verbose output:
Run tests for a specific crate:
Sources: .github/workflows/rust-tests.yml:56-57
Testing with Different Feature Combinations
The CI matrix tests 6 feature combinations across 3 operating systems:
Run tests matching CI configurations:
Sources: .github/workflows/rust-tests.yml:14-31
Concurrency Tests
The concurrency test suite validates thread-safety under high contention:
Concurrency Test Cases
The test suite includes three primary concurrency tests:
| Test Function | Configuration | Purpose |
|---|---|---|
| concurrent_write_test | 16 threads × 10 writes | Validates concurrent writes don’t corrupt data |
| concurrent_slow_streamed_write_test | Multi-threaded streaming | Tests concurrent write_stream with simulated latency |
| interleaved_read_write_test | Synchronized read/write | Validates read-after-write consistency |
Running Concurrency Tests
The tests use #[serial] annotation to prevent parallel execution (since they test shared state):
Test Requirements:
- Uses serial_test::serial to serialize test execution tests/concurrency_tests.rs1
- Requires the Tokio multi-threaded runtime tests/concurrency_tests.rs14
- Uses tempfile for isolated storage instances tests/concurrency_tests.rs9
Sources: tests/concurrency_tests.rs:1-230
Benchmarking
Benchmark Suite
The project includes two benchmark suites:
Benchmark Configuration:
| Benchmark | Harness | Purpose |
|---|---|---|
| storage_benchmark | false (custom) | Micro-benchmarks for core operations |
| contention_benchmark | false (custom) | Multi-threaded contention scenarios |
Running Benchmarks
Compile benchmarks without running:
Run all benchmarks:
Run specific benchmark:
Storage Benchmark Operations
The storage_benchmark suite measures four operation types:
| Operation | Test Size | Batch Size | Purpose |
|---|---|---|---|
| Append entries | 1,000,000 entries | 1,024 entries/batch | Write throughput |
| Sequential reads | All entries | N/A (iterator) | Zero-copy iteration |
| Random reads | 1,000,000 lookups | 1 entry/lookup | Random access latency |
| Batch reads | 1,000,000 entries | 1,024 entries/batch | Vectorized read throughput |
Key Constants:
- ENTRY_SIZE = 8 bytes benches/storage_benchmark.rs20
- WRITE_BATCH_SIZE = 1024 benches/storage_benchmark.rs21
- READ_BATCH_SIZE = 1024 benches/storage_benchmark.rs22
- NUM_ENTRIES = 1_000_000 benches/storage_benchmark.rs24
Sources: Cargo.toml:57-63 benches/storage_benchmark.rs:1-234
Building the CLI Binary
The project includes a command-line interface:
Building the CLI
Build the CLI binary:
Run the CLI directly:
The CLI binary will be located at:
- Debug: target/debug/simd-r-drive (or .exe on Windows)
- Release: target/release/simd-r-drive
Sources: src/main.rs:1-12 Cargo.toml25
Cross-Platform Considerations
Platform-Specific Testing
The CI pipeline tests on three operating systems:
| OS | Runner | Notes |
|---|---|---|
| Linux | ubuntu-latest | Primary development platform |
| macOS | macos-latest | Darwin/BSD compatibility |
| Windows | windows-latest | MSVC toolchain |
Known Platform Differences
- File Locking : The BufWriter<File> uses platform-specific file locking Cargo.toml:23-24
- Memory Mapping : memmap2 handles platform differences in mmap APIs Cargo.toml29
- Path Separators : Tests use tempfile for cross-platform temporary paths Cargo.toml45
Test on your local platform:
Sources: .github/workflows/rust-tests.yml:14-18
Development Workflow
Quick Test Commands
Common development commands:
| Command | Purpose |
|---|---|
| cargo check | Fast syntax/type checking |
| cargo test | Run all tests |
| cargo test --test concurrency_tests | Run concurrency tests only |
| cargo bench --no-run | Verify benchmarks compile |
| cargo run -- --help | Test CLI binary |
Caching Dependencies
The CI uses cargo caching to speed up builds:
Local development automatically uses Cargo’s built-in caching in these directories.
Sources: .github/workflows/rust-tests.yml:40-51
Test Data Management
Test isolation strategies:
- Temporary Files : Use tempfile::tempdir() for isolated test storage tests/concurrency_tests.rs37
- Serial Execution : Use #[serial] for tests sharing state tests/concurrency_tests.rs15
- Cleanup : Temporary files are automatically removed on drop tests/concurrency_tests.rs37
Sources: tests/concurrency_tests.rs:37-40 .gitignore:1-11
Test Debugging
Enabling Trace Logs
The CLI and tests use tracing for structured logging:
The main CLI initializes tracing with info level by default src/main.rs7
Test Output Verbosity
Capture test output:
Sources: src/main.rs7 Cargo.toml:32-33
Verifying Build Artifacts
Checking Benchmark Compilation
The CI ensures benchmarks compile even though they’re not run in CI:
This is important because benchmark compilation uses different code paths (criterion harness disabled) Cargo.toml59
Build All Targets
Verify all code compiles:
This builds:
- Library crates (--lib)
- Binary targets (--bins)
- Test targets (--tests)
- Benchmark targets (--benches)
- Example code (--examples)
Sources: .github/workflows/rust-tests.yml:53-61
Document Sources:
- Cargo.toml:1-113
- .github/workflows/rust-tests.yml:1-62
- tests/concurrency_tests.rs:1-230
- benches/storage_benchmark.rs:1-234
- src/main.rs:1-12
- .gitignore:1-11
- src/utils/format_bytes.rs:1-31
CI/CD Pipeline
Relevant source files
- .github/workflows/rust-lint.yml
- .github/workflows/rust-tests.yml
- .gitignore
- CHANGELOG.md
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
Purpose and Scope
This document describes the continuous integration and continuous delivery (CI/CD) infrastructure for the SIMD R Drive project. The CI/CD pipeline is implemented using GitHub Actions workflows that automatically validate code quality, run tests across multiple platforms and feature combinations, and perform security audits on every commit and pull request.
For information about building and testing the project locally, see Building and Testing. For details about version management and breaking changes that may affect CI/CD configuration, see Version History and Migration.
Sources: .github/workflows/rust-tests.yml:1-62 .github/workflows/rust-lint.yml:1-44
Workflow Overview
The CI/CD pipeline consists of two primary GitHub Actions workflows:
| Workflow | File | Primary Purpose | Trigger Events |
|---|---|---|---|
| Rust Tests | .github/workflows/rust-tests.yml | Multi-platform testing across OS and feature combinations | Push to main, tags (v*), PRs to main |
| Rust Lint | .github/workflows/rust-lint.yml | Code quality, formatting, and security checks | All pushes and pull requests |
Both workflows run in parallel to provide comprehensive validation before code is merged.
Sources: .github/workflows/rust-tests.yml:1-9 .github/workflows/rust-lint.yml:1-3
CI/CD Architecture
CI/CD Architecture Overview
This diagram shows the complete CI/CD pipeline structure. The test workflow creates a matrix of 18 job combinations (3 operating systems × 6 feature configurations), while the lint workflow runs a sequential series of quality checks.
Sources: .github/workflows/rust-tests.yml:10-31 .github/workflows/rust-lint.yml:5-43
Test Workflow (rust-tests.yml)
Test Matrix Configuration
The test workflow uses a GitHub Actions matrix strategy to validate the codebase across multiple dimensions:
Test Matrix Execution Flow
graph LR
subgraph "Operating Systems"
OS_U["ubuntu-latest"]
OS_M["macos-latest"]
OS_W["windows-latest"]
end
subgraph "Feature Configurations"
F1["flags: empty\nDefault"]
F2["flags: --no-default-features\nNo Default Features"]
F3["flags: --features parallel\nParallel"]
F4["flags: --features expose-internal-api\nExpose Internal API"]
F5["flags: --features=parallel,expose-internal-api\nParallel + Expose API"]
F6["flags: --all-features\nAll Features"]
end
subgraph "Test Steps"
CHECKOUT["actions/checkout@v4"]
RUST_INSTALL["dtolnay/rust-toolchain@stable"]
CACHE["actions/cache@v4\nCargo dependencies"]
BUILD["cargo build --workspace --all-targets"]
TEST["cargo test --workspace --all-targets"]
BENCH["cargo bench --workspace --no-run"]
end
OS_U --> F1
OS_U --> F2
OS_U --> F3
OS_U --> F4
OS_U --> F5
OS_U --> F6
F1 --> CHECKOUT
CHECKOUT --> RUST_INSTALL
RUST_INSTALL --> CACHE
CACHE --> BUILD
BUILD --> TEST
TEST --> BENCH
Each of the 18 matrix combinations follows the same execution flow: checkout code, install Rust toolchain, restore cached dependencies, build all workspace targets, run all tests, and verify benchmarks compile.
Sources: .github/workflows/rust-tests.yml:14-31 .github/workflows/rust-tests.yml:33-61
Matrix Strategy Details
| Parameter | Values | Purpose |
|---|---|---|
| os | ubuntu-latest, macos-latest, windows-latest | Validate cross-platform compatibility |
| fail-fast | false | Continue running all matrix jobs even if one fails |
| include.name | See feature list below | Descriptive name for each feature combination |
| include.flags | Cargo command-line flags | Feature flags passed to cargo build and cargo test |
Feature Combinations:
- Default (flags: ""): Standard feature set with default features enabled
- No Default Features (flags: "--no-default-features"): Minimal build without optional features
- Parallel (flags: "--features parallel"): Enables parallel processing capabilities
- Expose Internal API (flags: "--features expose-internal-api"): Exposes internal APIs for testing/experimentation
- Parallel + Expose API (flags: "--features=parallel,expose-internal-api"): Combination of parallel and internal API features
- All Features (flags: "--all-features"): Enables all available features including arrow integration
Sources: .github/workflows/rust-tests.yml:18-30
Test Execution Steps
The test workflow executes the following steps for each matrix combination:
1. Repository Checkout
Uses GitHub’s official checkout action to clone the repository.
Sources: .github/workflows/rust-tests.yml:33-34
2. Rust Toolchain Installation
Installs the stable Rust toolchain using the dtolnay/rust-toolchain action.
Sources: .github/workflows/rust-tests.yml:36-37
3. Dependency Caching
Caches Cargo dependencies and build artifacts to speed up subsequent runs. The cache key includes:
- Operating system (
runner.os) - Cargo.lock file hash for dependency versioning
- Matrix flags to separate caches for different feature combinations
Sources: .github/workflows/rust-tests.yml:39-51
4. Build
Builds all workspace members and all target types (lib, bin, tests, benches, examples) with the specified feature flags.
Sources: .github/workflows/rust-tests.yml:53-54
5. Test Execution
Runs all tests across the entire workspace with verbose output enabled.
Sources: .github/workflows/rust-tests.yml:56-57
6. Benchmark Compilation Check
Verifies that all benchmarks compile successfully without actually executing them. The --no-run flag ensures benchmarks are only compiled, not executed, which would be time-consuming in CI.
Sources: .github/workflows/rust-tests.yml:59-61
graph TB
TRIGGER["Push or Pull Request"]
subgraph "Setup Steps"
CHECKOUT["actions/checkout@v3"]
RUST_INSTALL["dtolnay/rust-toolchain@stable"]
COMPONENTS["rustup component add\nrustfmt clippy"]
TOOLS["cargo install\ncargo-deny cargo-audit"]
end
subgraph "Quality Checks"
FMT["cargo fmt --all -- --check\nVerify formatting"]
CLIPPY["cargo clippy --workspace\n--all-targets --all-features\nLint warnings"]
DOC["RUSTDOCFLAGS=-D warnings\ncargo doc --workspace\nDocumentation quality"]
end
subgraph "Security Checks"
DENY["cargo deny check\nLicense/dependency policy"]
AUDIT["cargo audit\nKnown vulnerabilities"]
end
TRIGGER --> CHECKOUT
CHECKOUT --> RUST_INSTALL
RUST_INSTALL --> COMPONENTS
COMPONENTS --> TOOLS
TOOLS --> FMT
FMT --> CLIPPY
CLIPPY --> DOC
DOC --> DENY
DENY --> AUDIT
Lint Workflow (rust-lint.yml)
The lint workflow performs comprehensive code quality and security checks on a single platform (Ubuntu):
Lint Workflow Execution Graph
The lint workflow runs sequentially through setup, quality checks, and security audits. All checks must pass for the workflow to succeed.
Sources: .github/workflows/rust-lint.yml:1-44
Lint Steps Breakdown
1. Component Installation Workaround
This step addresses a GitHub Actions environment issue where rustfmt and clippy may not be automatically available. The workflow explicitly installs these components to ensure consistent behavior.
Sources: .github/workflows/rust-lint.yml:13-18
2. Tool Installation
Installs third-party Cargo subcommands:
- cargo-deny: Validates dependency licenses, sources, and advisories against policy rules
- cargo-audit: Checks dependencies against the RustSec Advisory Database for known security vulnerabilities
Sources: .github/workflows/rust-lint.yml:20-23
3. Format Verification
Verifies that all code follows Rust’s standard formatting conventions using rustfmt. The --check flag ensures the command fails if any files need reformatting without modifying them.
Sources: .github/workflows/rust-lint.yml:25-27
4. Clippy Linting
Runs Clippy, Rust’s official linter, with the following configuration:
- --workspace: Lint all workspace members
- --all-targets: Lint library, binaries, tests, benchmarks, and examples
- --all-features: Enable all features when linting
- -D warnings: Treat all warnings as errors, failing the build if any issues are found
Sources: .github/workflows/rust-lint.yml:29-31
5. Documentation Verification
Generates and validates documentation with strict checks:
- RUSTDOCFLAGS="-D warnings": Treats documentation warnings as errors
- --workspace: Document all workspace members
- --no-deps: Only document workspace crates, not dependencies
- --document-private-items: Include documentation for private items to ensure comprehensive coverage
Sources: .github/workflows/rust-lint.yml:33-35
6. Dependency Policy Enforcement
Validates dependencies against policy rules defined in a deny.toml configuration file (if present). This checks:
- License compatibility
- Banned/allowed crates
- Advisory database for security issues
- Source verification (crates.io, git repositories)
Sources: .github/workflows/rust-lint.yml:37-39
7. Security Audit
Scans Cargo.lock against the RustSec Advisory Database to identify dependencies with known security vulnerabilities. This provides early warning of security issues in the dependency tree.
Sources: .github/workflows/rust-lint.yml:41-43
Caching Strategy
The test workflow implements an intelligent caching strategy to reduce build times:
Cache Key Structure
${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}-${{ matrix.flags }}
Components:
- runner.os: Operating system (Linux, macOS, Windows)
- hashFiles('**/Cargo.lock'): Hash of all Cargo.lock files in the repository
- matrix.flags: Feature flag combination being tested
This multi-dimensional key ensures:
- Different operating systems maintain separate caches
- Cache invalidation occurs when dependencies change
- Different feature combinations don’t share potentially incompatible build artifacts
Cached Directories
| Directory | Contents | Purpose |
|---|---|---|
| ~/.cargo/bin/ | Installed Cargo binaries | Reuse installed tools across runs |
| ~/.cargo/registry/index/ | Crates.io registry index | Avoid re-downloading registry metadata |
| ~/.cargo/registry/cache/ | Downloaded crate archives | Skip re-downloading crate source code |
| ~/.cargo/git/db/ | Git dependencies | Reuse git repository clones |
| target/ | Compiled artifacts | Skip recompiling unchanged dependencies |
Cache Restore Fallback
If an exact cache match is not found, the workflow attempts to restore a cache with a partial key match (same OS and Cargo.lock hash but different flags). This provides some benefit even when testing different feature combinations.
Sources: .github/workflows/rust-tests.yml:39-51
Workflow Triggers and Conditions
Test Workflow Triggers
The test workflow (rust-tests.yml) activates on:
| Event Type | Condition | Purpose |
|---|---|---|
| Push | Branch: main | Validate main branch commits |
| Push | Tag: v* | Validate release tag creation |
| Pull Request | Target: main | Pre-merge validation |
This configuration ensures:
- All changes to the main branch are tested
- Release tags trigger comprehensive validation
- Pull requests are validated before merge
Sources: .github/workflows/rust-tests.yml:3-8
Lint Workflow Triggers
The lint workflow (rust-lint.yml) activates on:
| Event Type | Condition | Purpose |
|---|---|---|
| Push | All branches | Immediate feedback on all commits |
| Pull Request | All pull requests | Pre-merge code quality validation |
This broader trigger ensures code quality checks run on all development branches, not just main.
Sources: .github/workflows/rust-lint.yml:3
Integration with Repository Configuration
Ignored Files and Directories
The CI/CD workflows respect the repository’s .gitignore configuration:
Key exclusions:
- **/target: Build artifacts (handled by caching)
- *.bin: Binary data files created by storage engine tests
- /data: Debugging and experimentation directory
- .cargo/config.toml: Local Cargo configuration overrides
Sources: .gitignore:1-11
Alignment Changes and CI Impact
The CI/CD pipeline automatically validates alignment-sensitive code across all platforms. Version 0.15.0 introduced a breaking change increasing PAYLOAD_ALIGNMENT from 16 to 64 bytes, which the CI validates through:
1. Debug assertions in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:
   - debug_assert_aligned(): Validates pointer alignment
   - debug_assert_aligned_offset(): Validates file offset alignment
2. Cross-platform testing ensures alignment works correctly on:
   - x86_64 (AVX2 256-bit)
   - ARM (NEON 128-bit)
   - Both 32-bit and 64-bit architectures
The debug assertions compile to no-ops in release builds but provide comprehensive validation in CI test runs.
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43 CHANGELOG.md:25-51
Failure Modes and Debugging
Matrix Job Independence
The test workflow sets fail-fast: false, which means:
- If one OS/feature combination fails, others continue to completion
- Developers can see all failure patterns at once
- Useful for identifying platform-specific or feature-specific issues
Sources: .github/workflows/rust-tests.yml:15
Common Failure Scenarios
| Check | Failure Cause | Resolution |
|---|---|---|
| cargo fmt | Code not formatted | Run cargo fmt --all locally |
| cargo clippy | Linting violations | Address warnings or allow with #[allow(clippy::...)] |
| cargo doc | Documentation errors | Fix broken doc comments or missing documentation |
| cargo deny | Dependency policy violation | Update dependencies or adjust policy |
| cargo audit | Known vulnerability | Update affected dependency or acknowledge advisory |
| Test matrix job | Platform/feature-specific bug | Debug locally with same OS and feature flags |
| Benchmark compilation | Benchmark code error | Fix benchmark code or feature gate |
Debugging Failed Matrix Jobs
To reproduce a specific matrix job failure locally:
1. Identify the failing OS and feature combination from the GitHub Actions log
2. Use the exact command shown in the workflow for that matrix entry
3. For cross-platform issues, use Docker or a VM matching the CI environment
Sources: .github/workflows/rust-tests.yml:53-57
CI/CD Pipeline Maintenance
Adding New Feature Flags
To add a new feature flag to the test matrix:
1. Add a new entry to the matrix.include section in rust-tests.yml
2. Consider whether the feature should be included in the “All Features” test
3. Update documentation if the feature has platform-specific behavior
Sources: .github/workflows/rust-tests.yml:18-30
Updating Toolchain Versions
Both workflows use GitHub Actions to manage Rust toolchain versions:
- Stable toolchain : dtolnay/rust-toolchain@stable automatically tracks the latest stable release
- Pinning a specific version : Replace @stable with @1.XX.X if needed
- Nightly features : Change @stable to @nightly (may require additional stability considerations)
Sources: .github/workflows/rust-tests.yml:36-37 .github/workflows/rust-lint.yml:11
Monitoring CI Performance
Key metrics to monitor:
- Cache hit rate : Check if Cargo caches are being restored effectively
- Build time trends : Monitor for increases that might indicate dependency bloat
- Test execution time : Identify slow tests that could benefit from optimization
- Matrix job duration : Ensure no single OS/feature combination becomes a bottleneck
GitHub Actions provides timing information for each step and job in the workflow run logs.
Version History and Migration
Relevant source files
This page documents the version history of simd-r-drive, tracking breaking changes, file format evolution, and providing actionable migration guides for upgrading between versions. For information about building and testing procedures, see Building and Testing. For CI/CD pipeline details, see CI/CD Pipeline.
Versioning Strategy
The project follows a modified Semantic Versioning approach while in alpha status:
| Version Component | Meaning |
|---|---|
| 0.MINOR.PATCH-alpha | Current alpha phase format |
| MINOR increment | Breaking changes, including on-disk format changes |
| PATCH increment | Non-breaking changes, bug fixes, dependency updates |
| -alpha suffix | Indicates pre-1.0 status with possible instability |
Breaking Change Policy : Any change to the on-disk storage format or core API that prevents backward compatibility results in a MINOR version bump. The project maintains a strict policy of documenting all breaking changes in the changelog with migration instructions.
Sources : CHANGELOG.md:1-5
Version Timeline
The following diagram shows the evolution of major versions and their breaking changes:
Sources : CHANGELOG.md:19-82
timeline
title "simd-r-drive Version History"
section 0.14.x Series
0.14.0-alpha : Introduced payload alignment
: Added pre-padding mechanism
: 16-byte default alignment
: BREAKING: Format incompatible with 0.13.x
section 0.15.x Series
0.15.0-alpha : Increased alignment to 64 bytes
: Added debug alignment assertions
: BREAKING: Format incompatible with 0.14.x
0.15.5-alpha : Arrow dependency bump to 57.0.0
: No format changes
Breaking Changes History
Version 0.15.5-alpha (2025-10-27)
Type : Non-breaking maintenance release
Changes :
- Apache Arrow dependency updated to version 57.0.0
- Affects the arrow feature flag only
- No changes to DataStore, EntryHandle, or storage format
- No migration required
Sources : CHANGELOG.md:19-22
Version 0.15.0-alpha (2025-09-25)
Type : BREAKING - On-disk format incompatible with 0.14.x
Critical Changes :
The PAYLOAD_ALIGNMENT constant in src/storage_engine/constants.rs increased from 16 bytes (log₂ = 4) to 64 bytes (log₂ = 6). This ensures safe zero-copy access for:
- SSE: 16-byte operations
- AVX2: 32-byte operations
- AVX-512: 64-byte operations
- CPU cache lines: 64 bytes on modern x86_64/ARM
Diagram: On-Disk Format Comparison: 0.14.x vs 0.15.x
Added Features :
- debug_assert_aligned(ptr: *const u8, align: usize) in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43
  - Validates pointer alignment at runtime in debug/test builds
  - No-op in release builds (zero cost abstraction)
- debug_assert_aligned_offset(off: u64) in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88
  - Validates file offset alignment using the PAYLOAD_ALIGNMENT constant
  - Checks off.is_multiple_of(PAYLOAD_ALIGNMENT)
- Enhanced EntryHandle::as_arrow_buffer() with alignment assertions
- Enhanced EntryHandle::into_arrow_buffer() with alignment assertions
Affected Code Entities :
| Entity | Location | Change |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | src/storage_engine/constants.rs | 4 → 6 |
| PAYLOAD_ALIGNMENT | src/storage_engine/constants.rs | 16 → 64 |
| Write path pre-padding calculation | Storage engine | Uses new alignment value |
| Read path offset calculation | Storage engine | Expects new alignment |
| EntryMetadata parsing | Storage engine | Unchanged (20 bytes) |
Technical Incompatibility Details :
When a 0.14.x reader opens a 0.15.x file:
- Incorrect offset calculation : Reader calculates payload_start = metadata_end + (16 - offset % 16) % 16
- Actual offset : File contains payload_start = metadata_end + (64 - offset % 64) % 64
- Result : Reader may access pre-pad bytes or miss payload start, causing:
- CRC32 checksum failures
- Deserialization errors
- Silent data corruption if payload happens to parse
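As a concrete (hypothetical) example: with metadata_end = 100, a 0.14.x reader computes payload_start = 100 + (16 - 100 % 16) % 16 = 112, while a 0.15.x writer actually placed the payload at 100 + (64 - 100 % 64) % 64 = 128. The reader therefore treats 16 bytes of pre-pad zeros as the start of the payload.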
Diagram: Read Operation Failure in Mixed Versions
Sources : CHANGELOG.md:25-52 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
Version 0.14.0-alpha (2025-09-08)
Type : BREAKING - On-disk format incompatible with 0.13.x
Critical Changes :
This is the first version to introduce configurable payload alignment via a pre-padding mechanism. Payloads are now guaranteed to start at addresses that are multiples of PAYLOAD_ALIGNMENT.
Diagram: Introduction of Pre-Padding in 0.14.0-alpha
New Constants Introduced :
| Constant | Value (0.14.0) | Type | Description |
|---|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 4 | u32 | Log₂ of alignment (2⁴ = 16) |
| PAYLOAD_ALIGNMENT | 16 | u64 | Derived as 1 << PAYLOAD_ALIGN_LOG2 |
New Feature Flags :
- arrow: Enables Apache Arrow integration via EntryHandle::as_arrow_buffer() and into_arrow_buffer()
Added Methods (requires arrow feature):
- EntryHandle::as_arrow_buffer(&self) -> arrow_buffer::Buffer
  - Creates zero-copy Arrow buffer view over payload
  - Requires payload alignment to satisfy Arrow’s requirements
- EntryHandle::into_arrow_buffer(self) -> arrow_buffer::Buffer
  - Converts EntryHandle into Arrow buffer (consumes handle)
Pre-Padding Calculation Algorithm :
Given:
- metadata_end: u64 // offset after EntryMetadata
- PAYLOAD_ALIGNMENT: u64 = 16
Calculate:
padding = (PAYLOAD_ALIGNMENT - (metadata_end % PAYLOAD_ALIGNMENT)) % PAYLOAD_ALIGNMENT
payload_start = metadata_end + padding
Assert:
payload_start % PAYLOAD_ALIGNMENT == 0
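The same rule as a minimal Rust sketch, with the alignment passed in as a parameter so both the 0.14.x (16) and 0.15.x (64) values can be exercised; the actual write-path code may be structured differently:

```rust
/// Number of zero bytes inserted after the metadata so that the payload starts
/// on an `alignment`-byte boundary (16 in 0.14.x, 64 in 0.15.x).
fn pre_padding(metadata_end: u64, alignment: u64) -> u64 {
    (alignment - (metadata_end % alignment)) % alignment
}

/// First byte of the payload; always a multiple of `alignment`.
fn payload_start(metadata_end: u64, alignment: u64) -> u64 {
    let start = metadata_end + pre_padding(metadata_end, alignment);
    debug_assert_eq!(start % alignment, 0); // the invariant asserted above
    start
}

#[test]
fn pads_to_a_16_byte_boundary() {
    assert_eq!(payload_start(100, 16), 112); // 12 bytes of pre-padding
    assert_eq!(payload_start(96, 16), 96);   // already aligned: no padding needed
}
```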
Technical Incompatibility Details :
0.13.x readers do not skip pre-padding bytes. When reading 0.14.x files:
- Reader expects payload immediately after metadata
- Reads pre-pad zero bytes as payload start
- Payload deserialization fails or produces corrupt data
- CRC32 checksum computed over wrong byte range
Sources : CHANGELOG.md:55-82
Migration Procedures
Migrating from 0.14.x to 0.15.x
Diagram: Migration Process: 0.14.x to 0.15.x
Detailed Migration Steps :
Step 1: Environment Setup
- Compile 0.14.x binary: git checkout v0.14.0-alpha && cargo build --release
- Compile 0.15.x binary: git checkout v0.15.0-alpha && cargo build --release
- Verify disk space: new file size ≈ old file size × 1.1 (due to increased padding)
- Backup: cp data.store data.store.backup
Step 2: Extract Data with 0.14.x Binary
Read every entry out of the existing store through the DataStore API.
Step 3: Create New Store with 0.15.x Binary
Create a fresh store with DataStore::create() and re-write each extracted entry; the new write path applies the 64-byte pre-padding automatically.
Step 4: Verification Pass
Check that all entries in the new store are readable and valid; a combined sketch of Steps 2-4 follows below.
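The following minimal sketch ties Steps 2-4 together. It is hypothetical: apart from DataStore::create(), the method names (open, iter_entries, write, read) are illustrative placeholders rather than confirmed APIs, and in practice the export and import halves would be compiled against the 0.14.x and 0.15.x crates respectively (for example, as two separate binaries exchanging an intermediate dump).

```rust
// Hypothetical migration sketch. Except for DataStore::create(), the method names
// below are illustrative placeholders; consult the DataStoreReader and
// DataStoreWriter traits for the real signatures.
use simd_r_drive::DataStore;
use std::path::Path;

/// Step 2 (compiled against 0.14.x): dump every key/payload pair from the old store.
fn export_entries(old_path: &Path) -> std::io::Result<Vec<(Vec<u8>, Vec<u8>)>> {
    let old = DataStore::open(old_path)?; // placeholder constructor
    Ok(old
        .iter_entries() // placeholder iterator over stored entries
        .map(|entry| (entry.key().to_vec(), entry.payload().to_vec())) // placeholder accessors
        .collect())
}

/// Steps 3-4 (compiled against 0.15.x): re-write every entry, then verify it reads back.
fn import_and_verify(new_path: &Path, entries: &[(Vec<u8>, Vec<u8>)]) -> std::io::Result<()> {
    let new = DataStore::create(new_path)?; // named in Step 3 above
    for (key, payload) in entries {
        new.write(key, payload)?; // placeholder writer call; pre-padding is applied here
    }
    for (key, payload) in entries {
        let handle = new.read(key)?; // placeholder reader call returning an entry view
        assert_eq!(handle.as_ref(), payload.as_slice()); // byte-for-byte comparison
    }
    Ok(())
}
```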
Step 5: Deployment Strategy
For single-process deployments:
- Stop process
- Run migration script
- Atomic file replacement: mv new.store data.store
- Restart process with 0.15.x binary
For multi-service deployments:
- Deploy all reader services to 0.15.x first (backward compatible reads)
- Stop all writer services
- Run migration on primary data store
- Verify migrated store
- Deploy writer services to 0.15.x
- Resume writes
Sources : CHANGELOG.md:43-51
Migrating from 0.13.x to 0.14.x
Step-by-Step Migration :
Technical Changes :
- 0.13.x: No pre-padding, payload immediately follows EntryMetadata
- 0.14.x: Pre-padding inserted, PAYLOAD_ALIGNMENT = 16 guaranteed
Migration Process :
1. Data extraction (using 0.13.x binary): read every entry from the old store
2. Store creation (using 0.14.x binary): write each entry into a new store (the procedure mirrors the 0.14.x to 0.15.x sketch shown earlier, with 16-byte alignment instead of 64)
3. Verification (using 0.14.x binary):
   - Read each key and validate CRC32
   - Compare payloads with original data
Service Deployment Order :
- Stage 1 : Deploy readers with 0.14.x (can read old format)
- Stage 2 : Migrate data store to new format
- Stage 3 : Deploy writers with 0.14.x
- Rationale : Prevents writers from creating 0.14.x files before readers can handle them
Sources : CHANGELOG.md:75-81
Compatibility Matrix
The following table shows version compatibility for readers and writers:
| Writer Version | Reader 0.13.x | Reader 0.14.x | Reader 0.15.x |
|---|---|---|---|
| 0.13.x | ✅ Compatible | ✅ Compatible | ✅ Compatible |
| 0.14.x | ❌ Breaks | ✅ Compatible | ✅ Compatible |
| 0.15.x | ❌ Breaks | ❌ Breaks | ✅ Compatible |
Legend :
- ✅ Compatible: Reader can correctly parse writer’s format
- ❌ Breaks: Reader cannot correctly parse writer’s format (data corruption risk)
Key Observations :
- Newer readers are backward-compatible (can read older formats)
- Older readers cannot read newer formats (forward compatibility not guaranteed)
- Each MINOR version bump introduces a new on-disk format
Sources : CHANGELOG.md:25-82
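These observations reduce to a one-line rule over MINOR version numbers. The function below is only a sketch that encodes the matrix above; it is not part of the crate's API:

```rust
/// Returns true when a reader at `reader_minor` can parse files written by
/// `writer_minor` (0.x series), per the compatibility matrix above.
fn can_read(reader_minor: u32, writer_minor: u32) -> bool {
    // Newer readers are backward-compatible; older readers cannot parse newer formats.
    reader_minor >= writer_minor
}

fn main() {
    assert!(can_read(15, 13));  // 0.15.x reader, 0.13.x writer: compatible
    assert!(!can_read(14, 15)); // 0.14.x reader, 0.15.x writer: breaks
}
```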
File Format Version Detection
The storage engine does not embed a file format version marker in the data file. Version compatibility must be managed externally through:
- Deployment tracking : Maintain records of which binary version wrote each store
- File naming conventions : Include version in filename (e.g., data-v0.15.store)
- Service configuration : Configure services with expected format version
Limitations :
- No automatic format detection at runtime
- Mixed-version deployment requires careful orchestration
- Checksum validation alone cannot detect version mismatches
Sources : CHANGELOG.md:1-82
Upgrade Strategies
Strategy 1: Blue-Green Deployment
Advantages :
- Clean separation of old and new versions
- Easy rollback if issues discovered
- No mixed-version complexity
Disadvantages :
- Requires duplicate infrastructure during migration
- Data must be fully copied
- Higher resource cost
Strategy 2: Rolling Upgrade (Reader-First)
Advantages :
- Minimal infrastructure duplication
- Gradual rollout reduces risk
- Reader compatibility maintained throughout
Disadvantages :
- Requires maintenance window for data migration
- More complex orchestration
- Must coordinate across multiple services
Sources : CHANGELOG.md:43-51 CHANGELOG.md:75-81
Alignment Configuration Reference
Diagram: Code Entity Mapping: Alignment System
Alignment Constants :
| Constant | Location | Type | Version History | Description |
|---|---|---|---|---|
| PAYLOAD_ALIGN_LOG2 | src/storage_engine/constants.rs | u32 | 0.14: 4, 0.15: 6 | Log₂ of alignment (2^n) |
| PAYLOAD_ALIGNMENT | src/storage_engine/constants.rs | u64 | 0.14: 16, 0.15: 64 | Computed as 1 << PAYLOAD_ALIGN_LOG2 |
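A sketch of how the two constants relate (0.15.x values shown; the real definitions live in src/storage_engine/constants.rs):

```rust
// 0.15.x values; 0.14.x used PAYLOAD_ALIGN_LOG2 = 4, giving a 16-byte alignment.
pub const PAYLOAD_ALIGN_LOG2: u32 = 6;
pub const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2; // 64 bytes
```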
Debug Assertion Functions :
| Function | Location | Signature | Behavior |
|---|---|---|---|
| debug_assert_aligned() | simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43 | fn(ptr: *const u8, align: usize) | Asserts (ptr as usize) & (align - 1) == 0 in debug/test, no-op in release |
| debug_assert_aligned_offset() | simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88 | fn(off: u64) | Asserts off.is_multiple_of(PAYLOAD_ALIGNMENT) in debug/test, no-op in release |
Function Implementation Details :
Both assertion functions use conditional compilation to ensure zero runtime cost in release builds:
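A minimal sketch of that pattern is shown below. It assumes PAYLOAD_ALIGNMENT is in scope (a stand-in constant is declared here); the real functions in simd-r-drive-entry-handle/src/debug_assert_aligned.rs may be organized differently, but the checks mirror the table above:

```rust
// Illustrative sketch only. debug_assert! / debug_assert_eq! expand to no-ops when
// debug assertions are disabled, so both functions cost nothing in release builds.
pub const PAYLOAD_ALIGNMENT: u64 = 64; // stand-in for the constants-module value

#[inline]
pub fn debug_assert_aligned(ptr: *const u8, align: usize) {
    debug_assert!(align.is_power_of_two());
    debug_assert_eq!(
        (ptr as usize) & (align - 1),
        0,
        "pointer {ptr:p} is not aligned to {align} bytes"
    );
}

#[inline]
pub fn debug_assert_aligned_offset(off: u64) {
    // u64::is_multiple_of is the check named in the table above (Rust 1.87+).
    debug_assert!(
        off.is_multiple_of(PAYLOAD_ALIGNMENT),
        "offset {off} is not a multiple of {} bytes",
        PAYLOAD_ALIGNMENT
    );
}
```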
Pre-Padding Calculation (used in write path):
Given:
metadata_end: u64
PAYLOAD_ALIGNMENT: u64 (16 or 64)
Compute:
padding = (PAYLOAD_ALIGNMENT - (metadata_end % PAYLOAD_ALIGNMENT)) % PAYLOAD_ALIGNMENT
payload_start = metadata_end + padding
Invariant:
payload_start % PAYLOAD_ALIGNMENT == 0
Version-Specific Alignment Values :
| Version | PAYLOAD_ALIGN_LOG2 | PAYLOAD_ALIGNMENT | Max Pre-Pad | Rationale |
|---|---|---|---|---|
| 0.13.x | N/A | N/A | 0 | No alignment guarantees |
| 0.14.x | 4 | 16 | 15 bytes | SSE compatibility (128-bit) |
| 0.15.x | 6 | 64 | 63 bytes | AVX-512 + cache-line optimization |
Sources : CHANGELOG.md:25-42 CHANGELOG.md:55-74 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
CI/CD Integration for Version Management
Diagram: CI/CD Pipeline Steps for Version Validation
Linting Steps (from .github/workflows/rust-lint.yml:1-44):
| Step | Command | Purpose | Version Impact |
|---|---|---|---|
| Format check | cargo fmt --all -- --check | Enforces formatting consistency | Prevents style regressions |
| Clippy | cargo clippy --workspace --all-targets --all-features -- -D warnings | Static analysis for bugs and anti-patterns | Catches breaking API changes |
| Documentation | RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps --document-private-items | Ensures all public APIs documented | Documents version-specific changes |
| Dependency check | cargo deny check | Validates licenses and bans | Prevents supply chain issues |
| Security audit | cargo audit | Scans for CVEs in dependencies | Ensures security compliance |
Pre-Release Checklist :
Before incrementing version number in Cargo.toml:
- Run full lint suite: cargo fmt && cargo clippy --all-features
- Test all feature combinations (see CI/CD Pipeline)
- Update CHANGELOG.md:1-82 with changes:
  - List breaking changes under ### Breaking
  - Provide migration steps under ### Migration
- Verify backward compatibility claims
- Test migration procedure on sample data store
Testing Matrix Coverage : See CI/CD Pipeline for full matrix of OS and feature combinations tested.
Sources : .github/workflows/rust-lint.yml:1-44 CHANGELOG.md:1-16
Future Considerations
Toward 1.0 Release :
- Embed format version marker in file header
- Implement automatic format detection
- Support multi-version reader capability
- Define stable API surface with backward compatibility guarantees
Deprecation Policy (Post-1.0):
- Major version bumps for breaking changes
- Deprecated features maintained for one major version
- Clear migration paths documented before removals
Sources : CHANGELOG.md:1-5