
Benchmarking

This document describes the performance benchmark suite for the SIMD R Drive storage engine. The benchmarks measure write throughput, read throughput (sequential, random, and batch), and concurrent access patterns under contention. For general performance optimization features, see Performance Optimizations. For information about SIMD acceleration techniques, see SIMD Acceleration.


Benchmark Suite Overview

The SIMD R Drive project includes two primary benchmark suites that measure different aspects of storage engine performance:

| Benchmark | File | Purpose | Key Metrics |
|---|---|---|---|
| Storage Benchmark | benches/storage_benchmark.rs | Single-process throughput testing | Writes/sec and reads/sec for sequential, random, and batch operations |
| Contention Benchmark | benches/contention_benchmark.rs | Multi-threaded concurrent access | Performance degradation under write contention |

Both benchmarks are configured in Cargo.toml:57-63 and pull in Criterion.rs as a dependency, enabling detection of performance regressions across code changes (the current implementations use their own timing; see Performance Regression Detection below).

Sources: Cargo.toml:36-63 benches/storage_benchmark.rs:1-234

graph TB
    subgraph "Benchmark Configuration"
        CARGO["Cargo.toml"]
        CRITERION["criterion = 0.6.0"]
        HARNESS["harness = false"]
    end

    subgraph "Storage Benchmark"
        STORAGE_BIN["benches/storage_benchmark.rs"]
        WRITE_BENCH["benchmark_append_entries()"]
        SEQ_BENCH["benchmark_sequential_reads()"]
        RAND_BENCH["benchmark_random_reads()"]
        BATCH_BENCH["benchmark_batch_reads()"]
    end

    subgraph "Contention Benchmark"
        CONTENTION_BIN["benches/contention_benchmark.rs"]
        CONCURRENT_TESTS["Multi-threaded write tests"]
    end

    subgraph "Core Operations Measured"
        BATCH_WRITE["DataStoreWriter::batch_write()"]
        READ["DataStoreReader::read()"]
        BATCH_READ["DataStoreReader::batch_read()"]
        ITER["into_iter()"]
    end

    CARGO --> CRITERION
    CARGO --> STORAGE_BIN
    CARGO --> CONTENTION_BIN
    CARGO --> HARNESS

    STORAGE_BIN --> WRITE_BENCH
    STORAGE_BIN --> SEQ_BENCH
    STORAGE_BIN --> RAND_BENCH
    STORAGE_BIN --> BATCH_BENCH

    WRITE_BENCH --> BATCH_WRITE
    SEQ_BENCH --> ITER
    RAND_BENCH --> READ
    BATCH_BENCH --> BATCH_READ

    CONTENTION_BIN --> CONCURRENT_TESTS
    CONCURRENT_TESTS --> BATCH_WRITE

Storage Benchmark Architecture

The storage benchmark (storage_benchmark.rs) is a standalone binary that measures single-process throughput across four operation types. It writes 1,000,000 entries and then exercises different read access patterns.

Benchmark Configuration Constants

The benchmark behavior is controlled by tunable constants defined at the top of the file:

| Constant | Value | Purpose |
|---|---|---|
| ENTRY_SIZE | 8 bytes | Size of each value payload (stores a u64 in little-endian) |
| WRITE_BATCH_SIZE | 1,024 entries | Number of entries per batch_write call |
| READ_BATCH_SIZE | 1,024 entries | Number of entries per batch_read call |
| NUM_ENTRIES | 1,000,000 | Total entries written during the setup phase |
| NUM_RANDOM_CHECKS | 1,000,000 | Number of random single-key lookups |
| NUM_BATCH_CHECKS | 1,000,000 | Total entries verified via batch reads |
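
The table values translate directly into plain Rust constants. A sketch of how they might be declared (types and exact formatting are assumptions, not a verbatim excerpt of the benchmark source):

```rust
// Illustrative declarations mirroring the tunable constants in the table above.
const ENTRY_SIZE: usize = 8;                // each value holds a u64 in little-endian form
const WRITE_BATCH_SIZE: usize = 1_024;      // entries accumulated per batch_write() call
const READ_BATCH_SIZE: usize = 1_024;       // keys requested per batch_read() call
const NUM_ENTRIES: usize = 1_000_000;       // total entries written during the setup phase
const NUM_RANDOM_CHECKS: usize = 1_000_000; // random single-key lookups performed
const NUM_BATCH_CHECKS: usize = 1_000_000;  // entries verified through batch reads
```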

Sources: benches/storage_benchmark.rs:16-26

Write Benchmark: benchmark_append_entries()

The write benchmark measures append-only write throughput using batched operations.

graph LR
    subgraph "Write Benchmark Flow"
        START["Start: Instant::now()"]
        LOOP["Loop: 0..NUM_ENTRIES"]
        BATCH["Accumulate in batch Vec"]
        FLUSH["flush_batch() every 1,024 entries"]
        BATCH_WRITE["storage.batch_write(&refs)"]
        CALC["Calculate writes/sec"]
    end

    subgraph "Data Generation"
        KEY["Key: bench-key-{i}"]
        VALUE["Value: i.to_le_bytes() in 8-byte buffer"]
    end

    START --> LOOP
    LOOP --> BATCH
    BATCH --> FLUSH
    FLUSH --> BATCH_WRITE
    LOOP --> CALC

    LOOP --> KEY
    LOOP --> VALUE
    KEY --> BATCH
    VALUE --> BATCH

The benchmark creates fixed-width 8-byte payloads containing the loop index as a little-endian u64. Entries are accumulated in a batch and flushed every 1,024 entries via batch_write(). The elapsed time is used to calculate writes per second.
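
A condensed sketch of that loop is shown below. The `batch_write(&[(&[u8], &[u8])])` signature and its `Result` return are assumptions based on the operations named on this page; `storage` stands in for the opened data store.

```rust
use std::time::Instant;

let start = Instant::now();
let mut batch: Vec<(Vec<u8>, Vec<u8>)> = Vec::with_capacity(WRITE_BATCH_SIZE);

for i in 0..NUM_ENTRIES as u64 {
    let key = format!("bench-key-{i}").into_bytes();
    let value = i.to_le_bytes().to_vec(); // fixed 8-byte little-endian payload

    batch.push((key, value));
    if batch.len() == WRITE_BATCH_SIZE {
        // Borrow the owned buffers as byte slices for the batched append.
        let refs: Vec<(&[u8], &[u8])> = batch
            .iter()
            .map(|(k, v)| (k.as_slice(), v.as_slice()))
            .collect();
        storage.batch_write(&refs).expect("batch write failed");
        batch.clear();
    }
}

let elapsed = start.elapsed().as_secs_f64();
println!("{} writes/s", NUM_ENTRIES as f64 / elapsed);
```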

Sources: benches/storage_benchmark.rs:52-83 benches/storage_benchmark.rs:85-92

Sequential Read Benchmark: benchmark_sequential_reads()

The sequential read benchmark measures zero-copy iteration performance by walking the entire storage file from newest to oldest entry.

graph TB
    subgraph "Sequential Read Pattern"
        ITER_START["storage.into_iter()"]
        ENTRY["For each EntryHandle"]
        DEREF["Dereference: &*entry"]
        PARSE["u64::from_le_bytes()"]
        VERIFY["assert_eq!(stored, expected)"]
    end

    subgraph "Memory Access"
        MMAP["Memory-mapped file access"]
        ZERO_COPY["Zero-copy EntryHandle"]
        BACKWARD_CHAIN["Follow prev_offset backward"]
    end

    ITER_START --> ENTRY
    ENTRY --> DEREF
    DEREF --> PARSE
    PARSE --> VERIFY

    ITER_START --> BACKWARD_CHAIN
    BACKWARD_CHAIN --> MMAP
    MMAP --> ZERO_COPY
    ZERO_COPY --> DEREF

This benchmark uses storage.into_iter(), provided by the storage engine's IntoIterator implementation, which traverses the backward-linked chain via prev_offset fields. Each entry is accessed through zero-copy memory mapping and validated by parsing the stored u64 value.
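
A minimal sketch of that pass, assuming the iterator yields zero-copy EntryHandle values that dereference to `&[u8]` (not a verbatim excerpt; the validity check is simplified):

```rust
use std::time::Instant;

let start = Instant::now();
let mut count: u64 = 0;

for entry in storage.into_iter() {
    let bytes: &[u8] = &*entry; // zero-copy view into the memory-mapped file
    let stored = u64::from_le_bytes(bytes[..8].try_into().unwrap());
    assert!(stored < NUM_ENTRIES as u64); // simplified validity check
    count += 1;
}

println!("{} reads/s", count as f64 / start.elapsed().as_secs_f64());
```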

Sources: benches/storage_benchmark.rs:98-118

Random Read Benchmark: benchmark_random_reads()

The random read benchmark measures hash index lookup performance by performing 1,000,000 random single-key reads.

graph LR
    subgraph "Random Read Flow"
        RNG["rng.random_range(0..NUM_ENTRIES)"]
        KEY_GEN["Generate key: bench-key-{i}"]
        LOOKUP["storage.read(key.as_bytes())"]
        UNWRAP["unwrap() -> EntryHandle"]
        VALIDATE["Parse u64 and assert_eq!"]
    end

    subgraph "Index Lookup Path"
        XXH3["XXH3 hash of key"]
        DASHMAP["DashMap index lookup"]
        TAG_CHECK["16-bit tag collision check"]
        OFFSET["Extract 48-bit offset"]
        MMAP_ACCESS["Memory-mapped access at offset"]
    end

    RNG --> KEY_GEN
    KEY_GEN --> LOOKUP
    LOOKUP --> UNWRAP
    UNWRAP --> VALIDATE

    LOOKUP --> XXH3
    XXH3 --> DASHMAP
    DASHMAP --> TAG_CHECK
    TAG_CHECK --> OFFSET
    OFFSET --> MMAP_ACCESS
    MMAP_ACCESS --> UNWRAP

Each iteration generates a random index, constructs the corresponding key, and performs a single read() operation. This exercises the O(1) hash index lookup path including XXH3 hashing, DashMap access, tag-based collision detection, and memory-mapped file access.
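
The loop can be sketched as follows. The rand 0.9-style `rand::rng()` / `random_range()` calls follow the diagram; the shape of `read()` and its entry handle are assumptions based on the description above:

```rust
use rand::Rng;
use std::time::Instant;

let mut rng = rand::rng();
let start = Instant::now();

for _ in 0..NUM_RANDOM_CHECKS {
    let i: u64 = rng.random_range(0..NUM_ENTRIES as u64);
    let key = format!("bench-key-{i}");

    let handle = storage.read(key.as_bytes()).expect("key should exist");
    let stored = u64::from_le_bytes((&*handle)[..8].try_into().unwrap());
    assert_eq!(stored, i); // payload must match the index encoded at write time
}

println!("{} reads/s", NUM_RANDOM_CHECKS as f64 / start.elapsed().as_secs_f64());
```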

Sources: benches/storage_benchmark.rs:124-149

Batch Read Benchmark: benchmark_batch_reads()

The batch read benchmark measures vectorized lookup performance by reading 1,024 keys at a time via batch_read().

graph TB
    subgraph "Batch Read Flow"
        ACCUMULATE["Accumulate 1,024 keys in Vec"]
        CONVERT["Convert to Vec<&[u8]>"]
        BATCH_CALL["storage.batch_read(&key_refs)"]
        RESULTS["Process Vec<Option<EntryHandle>>"]
        VERIFY["Verify each entry's payload"]
    end

    subgraph "Batch Read Implementation"
        PARALLEL["Optional: Rayon parallel iterator"]
        MULTI_LOOKUP["Multiple hash lookups"]
        COLLECT["Collect into result Vec"]
    end

    subgraph "Verification Logic"
        PARSE_KEY["Extract index from key suffix"]
        PARSE_VALUE["u64::from_le_bytes(handle)"]
        ASSERT["assert_eq!(stored, idx)"]
    end

    ACCUMULATE --> CONVERT
    CONVERT --> BATCH_CALL
    BATCH_CALL --> RESULTS
    RESULTS --> VERIFY

    BATCH_CALL --> PARALLEL
    PARALLEL --> MULTI_LOOKUP
    MULTI_LOOKUP --> COLLECT
    COLLECT --> RESULTS

    VERIFY --> PARSE_KEY
    VERIFY --> PARSE_VALUE
    PARSE_KEY --> ASSERT
    PARSE_VALUE --> ASSERT

The benchmark accumulates keys in batches of 1,024 and invokes batch_read(), which, when the parallel feature is enabled, performs the lookups concurrently via Rayon. The verification phase includes a fast numeric suffix parser that extracts the index from the key string without heap allocation.
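
A sketch of one batch and its verification step is shown below. The allocation-free suffix parser mirrors the description above, while `batch_read(&[&[u8]]) -> Vec<Option<EntryHandle>>` is an assumption based on this page rather than a verbatim signature:

```rust
// Extract the numeric suffix from keys like b"bench-key-123" without allocating.
fn index_from_key(key: &[u8]) -> u64 {
    key.iter()
        .skip_while(|b| !b.is_ascii_digit())
        .fold(0u64, |acc, b| acc * 10 + (*b - b'0') as u64)
}

let keys: Vec<Vec<u8>> = (0..READ_BATCH_SIZE as u64)
    .map(|i| format!("bench-key-{i}").into_bytes())
    .collect();
let key_refs: Vec<&[u8]> = keys.iter().map(|k| k.as_slice()).collect();

for (key, maybe_entry) in key_refs.iter().zip(storage.batch_read(&key_refs)) {
    let entry = maybe_entry.expect("entry should exist");
    let stored = u64::from_le_bytes((&*entry)[..8].try_into().unwrap());
    assert_eq!(stored, index_from_key(key)); // payload must match the key suffix
}
```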

Sources: benches/storage_benchmark.rs:155-181 benches/storage_benchmark.rs:183-202


Output Formatting

The benchmark produces human-readable output with formatted throughput numbers using the fmt_rate() utility function.

graph LR
    subgraph "Rate Formatting Logic"
        INPUT["Input: f64 rate"]
        SPLIT["Split into whole and fractional parts"]
        ROUND["Round fractional to 3 decimals"]
        CARRY["Handle 1000 rounding carry"]
        SEPARATE["Comma-separate thousands"]
        FORMAT["Output: 4,741,483.464"]
    end

    INPUT --> SPLIT
    SPLIT --> ROUND
    ROUND --> CARRY
    CARRY --> SEPARATE
    SEPARATE --> FORMAT

The fmt_rate() function formats rates with comma-separated thousands and exactly three decimal places. It uses the thousands crate's separate_with_commas() method and handles the edge case where rounding produces 1000 in the fractional part.
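
A dependency-free sketch of the same steps (split, round, carry, group); the real implementation delegates the grouping to the thousands crate, so this is an approximation of the described behaviour rather than the actual function:

```rust
/// Format a rate with three fixed decimal places, comma-grouped integer digits,
/// and a carry when the fractional part rounds up to 1000.
fn fmt_rate(rate: f64) -> String {
    let mut whole = rate.trunc() as u64;
    let mut frac = (rate.fract() * 1000.0).round() as u64;
    if frac == 1000 {
        whole += 1; // rounding carried into the integer part
        frac = 0;
    }

    // Group the integer digits in threes, separated by commas.
    let digits = whole.to_string();
    let mut grouped = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            grouped.push(',');
        }
        grouped.push(ch);
    }
    format!("{grouped}.{frac:03}")
}

fn main() {
    assert_eq!(fmt_rate(4_741_483.4641), "4,741,483.464");
}
```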

Example output:

Wrote 1,000,000 entries of 8 bytes in 0.234s (4,273,504.273 writes/s)
Sequentially read 1,000,000 entries in 0.089s (11,235,955.056 reads/s)
Randomly read 1,000,000 entries in 0.532s (1,879,699.248 reads/s)
Batch-read verified 1,000,000 entries in 0.156s (6,410,256.410 reads/s)

Sources: benches/storage_benchmark.rs:204-233


Contention Benchmark

The contention benchmark (contention_benchmark.rs) measures performance degradation under concurrent write load. While its source is not reproduced here, it is referenced in Cargo.toml:61-63 and is designed to complement the concurrency tests in tests/concurrency_tests.rs:1-230.

Expected Contention Scenarios

Based on the concurrency test patterns, the contention benchmark likely measures:

| Scenario | Description | Measured Metric |
|---|---|---|
| Concurrent Writes | Multiple threads writing different keys simultaneously | Throughput degradation under RwLock contention |
| Write Serialization | Effect of RwLock serializing write operations | Comparison vs. theoretical single-threaded maximum |
| Index Contention | DashMap update performance under concurrent load | Whether lock-free read performance is maintained |
| Streaming Writes | Concurrent write_stream() calls with slow readers | I/O bottleneck vs. lock contention |

Concurrency Test Patterns

The tests/concurrency_tests.rs:1-230 file demonstrates three concurrency patterns that inform contention benchmarking:

Sources: tests/concurrency_tests.rs:111-161 tests/concurrency_tests.rs:163-229 tests/concurrency_tests.rs:14-109

graph TB
    subgraph "Concurrent Write Test"
        THREADS_WRITE["16 threads × 10 writes each"]
        RWLOCK_WRITE["RwLock serializes writes"]
        ATOMIC_UPDATE["AtomicU64 tail_offset updated"]
        DASHMAP_UPDATE["DashMap index updated"]
    end

    subgraph "Interleaved Read/Write Test"
        THREAD_A["Thread A: write → notify → read"]
        THREAD_B["Thread B: wait → read → write → notify"]
        SYNC["Tokio Notify synchronization"]
    end

    subgraph "Streaming Write Test"
        SLOW_READER["SlowReader with artificial delay"]
        CONCURRENT_STREAMS["2 concurrent write_stream() calls"]
        IO_BOUND["Tests I/O vs. lock contention"]
    end

    THREADS_WRITE --> RWLOCK_WRITE
    RWLOCK_WRITE --> ATOMIC_UPDATE
    ATOMIC_UPDATE --> DASHMAP_UPDATE

    THREAD_A --> SYNC
    THREAD_B --> SYNC

    SLOW_READER --> CONCURRENT_STREAMS
    CONCURRENT_STREAMS --> IO_BOUND
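
The first pattern can be approximated with a plain std::thread fan-out. A minimal sketch, assuming the store is Sync behind an Arc and exposes a single-entry write() method (names are illustrative, not taken from the crate):

```rust
use std::sync::Arc;
use std::thread;

// 16 writer threads, 10 appends each, sharing one store handle.
let storage = Arc::new(storage); // an opened, thread-safe data store

let writers: Vec<_> = (0..16)
    .map(|t| {
        let storage = Arc::clone(&storage);
        thread::spawn(move || {
            for i in 0..10u64 {
                let key = format!("thread-{t}-key-{i}");
                // Each append is serialized by the engine's internal RwLock, while
                // the DashMap index and AtomicU64 tail offset are updated per write.
                storage
                    .write(key.as_bytes(), &i.to_le_bytes())
                    .expect("write failed");
            }
        })
    })
    .collect();

for handle in writers {
    handle.join().expect("writer thread panicked");
}
```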

Running Benchmarks

Command-Line Usage

Execute benchmarks using Cargo’s benchmark runner:
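
For example, assuming the bench targets are named after their files in benches/:

cargo bench --bench storage_benchmark
cargo bench --bench contention_benchmark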

Benchmark Execution Flow

The benchmarks use harness = false in Cargo.toml:59-63, meaning they execute as standalone binaries rather than under Criterion's default test harness. This allows custom output formatting and fine-grained control over benchmark structure.

Sources: Cargo.toml:57-63 benches/storage_benchmark.rs:32-46


Metrics and Analysis

Performance Indicators

The benchmark suite measures the following key performance indicators:

| Metric | Calculation | Typical Range | Optimization Focus |
|---|---|---|---|
| Write Throughput | NUM_ENTRIES / elapsed_time | 2-10M writes/sec | SIMD copy, batch sizing |
| Sequential Read Throughput | NUM_ENTRIES / elapsed_time | 5-20M reads/sec | Memory mapping, iterator overhead |
| Random Read Throughput | NUM_RANDOM_CHECKS / elapsed_time | 1-5M reads/sec | Hash index lookup, cache efficiency |
| Batch Read Throughput | NUM_BATCH_CHECKS / elapsed_time | 3-12M reads/sec | Parallel lookup, vectorization |

Factors Affecting Performance

Sources: benches/storage_benchmark.rs:16-26 Cargo.toml:49-55


Integration with Development Workflow

Performance Regression Detection

While Criterion.rs is included as a dependency (Cargo.toml:39), the current benchmark implementations use custom timing via std::time::Instant rather than Criterion's statistical framework. This provides:

  • Immediate feedback: results are printed directly to stdout during execution
  • Reproducibility: fixed workload sizes and patterns allow consistent comparison
  • Simplicity: no statistical overhead for quick performance checks

Benchmark Data for Tuning

The benchmark results inform optimization decisions:

| Optimization | Benchmark Validation | Expected Impact |
|---|---|---|
| SIMD copy implementation | Write throughput increase | 2-4x improvement on AVX2 systems |
| 64-byte alignment change | All operations improve | 10-30% from cache-line alignment |
| parallel feature | Batch read throughput | 2-4x on multi-core systems |
| DashMap vs. RwLock | Random read throughput | Eliminates read-lock contention |

Sources: benches/storage_benchmark.rs:1-234 Cargo.toml:36-55
