
Payload Alignment and Cache Efficiency


Purpose and Scope

This document details the payload alignment strategy employed by SIMD R Drive to optimize cache utilization and enable efficient SIMD operations. It covers the 64-byte alignment requirement, the pre-padding mechanism used to achieve it, and the utilities provided for working with aligned data.

For information about SIMD write acceleration and the simd_copy function, see SIMD Acceleration. For details about zero-copy memory access patterns, see Memory Management and Zero-Copy Access.

Overview

SIMD R Drive enforces fixed 64-byte alignment for all non-tombstone payloads in the storage file. This alignment strategy provides three critical benefits:

  1. Cache Line Alignment: Payloads begin on CPU cache line boundaries (typically 64 bytes), so a payload that fits in one cache line never straddles two lines, which would require two memory accesses.
  2. SIMD Register Compatibility: Enables full-speed vectorized operations with AVX2 (32-byte), AVX-512 (64-byte), and ARM SVE registers, because aligned vector loads do not cross alignment boundaries.
  3. Zero-Copy Type Casting: Allows direct reinterpretation of byte slices as typed slices (&[u16], &[u32], &[f32], etc.) without copying, provided the payload length is a multiple of the element size.

Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs:13-18

Alignment Configuration

Constants

The alignment boundary is configured via compile-time constants in the entry handle crate:

| Constant | Value | Description |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
| PAYLOAD_ALIGNMENT | 64 | Alignment boundary in bytes |
| Maximum Pre-Pad | 63 | Maximum padding bytes per entry |

The constants are defined in simd-r-drive-entry-handle/src/constants.rs:17-18.
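The relationship between the table rows can be sketched as follows. This is a minimal illustration assuming the alignment is derived from the log₂ value by a shift; the integer types (u8, u64) and the MAX_PRE_PAD name are assumptions, not necessarily what constants.rs declares.

```rust
/// Log2 of the payload alignment (2^6 = 64). Integer type assumed for illustration.
pub const PAYLOAD_ALIGN_LOG2: u8 = 6;

/// Alignment boundary in bytes, derived from the log2 value.
pub const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2;

/// Hypothetical helper constant: the largest pre-pad an entry can ever need.
pub const MAX_PRE_PAD: u64 = PAYLOAD_ALIGNMENT - 1;
```

Deriving the alignment from a log₂ value keeps it a power of two by construction, which the pre-pad formula relies on.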

Rationale for 64-Byte Alignment

Sources: README.md:53-54 simd-r-drive-entry-handle/src/constants.rs:13-18

Pre-Pad Mechanism

Computation

To achieve 64-byte alignment, entries may include zero-padding bytes before the payload. The pre-pad length is computed based on the previous entry's tail offset:

pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)

Where:

  • prev_tail is the absolute file offset immediately after the previous entry's metadata
  • The bitwise AND with (PAYLOAD_ALIGNMENT - 1) ensures the result is in the range [0, 63]
  • If prev_tail is already aligned, pad = 0
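The formula translates directly into a small function. In this sketch the function name pre_pad_len is hypothetical, and PAYLOAD_ALIGNMENT is redeclared locally to mirror the documented 64-byte value.

```rust
/// Mirrors the documented PAYLOAD_ALIGNMENT (64 bytes).
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Hypothetical helper: number of zero bytes to write before a payload so it
/// starts on a PAYLOAD_ALIGNMENT boundary. `prev_tail` is the file offset
/// immediately after the previous entry's metadata.
fn pre_pad_len(prev_tail: u64) -> u64 {
    // The mask `& (A - 1)` only acts like `% A` when A is a power of two.
    debug_assert!(PAYLOAD_ALIGNMENT.is_power_of_two());
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}
```

For example, prev_tail = 100 gives 100 % 64 = 36 and 64 - 36 = 28, so the payload starts at 128. When prev_tail is already a multiple of 64 the subtraction yields 64, and the mask folds it back to 0.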

On-Disk Layout

Aligned Entry Structure:

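| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| P .. P+pad | Pre-Pad | 0-63 | Zero bytes for alignment |
| P+pad .. N | Payload | Variable | Actual data content |
| N .. N+8 | Key Hash | 8 | XXH3_64 hash |
| N+8 .. N+16 | Prev Offset | 8 | Previous tail offset |
| N+16 .. N+20 | Checksum | 4 | CRC32C of payload |

Here P is the previous entry's tail offset (prev_tail), P+pad is the 64-byte-aligned payload start, and N is the offset immediately after the payload.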

Tombstones (deletion markers) do not include pre-pad and consist of only:

  • 1-byte payload (0x00)
  • 20-byte metadata

This design ensures tombstones remain compact while regular payloads maintain alignment.
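The offsets in the layout table can be computed from prev_tail and the payload length. The sketch below is illustrative only: the EntryLayout struct and entry_layout function are hypothetical names, and the 20-byte metadata size is taken from the table (8 + 8 + 4).

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;
/// Key hash (8) + prev offset (8) + checksum (4), per the layout table.
const METADATA_SIZE: u64 = 20;

/// Hypothetical description of where one entry's fields land in the file.
#[derive(Debug, PartialEq)]
struct EntryLayout {
    pad: u64,           // zero bytes written before the payload
    payload_start: u64, // P + pad
    payload_end: u64,   // N
    new_tail: u64,      // N + 20; becomes the next entry's prev_tail
}

fn entry_layout(prev_tail: u64, payload_len: u64, is_tombstone: bool) -> EntryLayout {
    let (pad, len) = if is_tombstone {
        // Tombstones skip the pre-pad and store a single 0x00 payload byte.
        (0, 1)
    } else {
        let pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1);
        (pad, payload_len)
    };
    let payload_start = prev_tail + pad;
    let payload_end = payload_start + len;
    EntryLayout { pad, payload_start, payload_end, new_tail: payload_end + METADATA_SIZE }
}
```

Writing a 100-byte payload after a tail at offset 30 inserts 34 pad bytes, places the payload at 64..164, and leaves the new tail at 184; a tombstone written next adds exactly 21 bytes.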

Sources: README.md:112-137 simd-r-drive-entry-handle/src/constants.rs:1-18

Cache Efficiency Benefits

Cache Line Behavior

Modern CPUs organize memory into cache lines (typically 64 bytes). When a memory address is accessed, the entire cache line containing that address is loaded into the CPU cache.

Figure: Misaligned vs. 64-byte aligned payloads. A misaligned payload whose start and end fall in two different cache lines requires two cache line loads, adding a latency penalty. A small payload that starts on a 64-byte boundary fits in a single cache line and requires one load, which reduces memory latency, improves effective memory bandwidth, and raises read throughput.

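For payloads of at most 64 bytes, alignment guarantees that the entire payload fits within a single 64-byte cache line, eliminating the penalty of fetching two lines. Larger payloads touch only the minimum number of lines, ⌈len / 64⌉, rather than potentially one extra line.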
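The effect of alignment on cache line count can be checked with a short calculation. This sketch assumes a fixed cache line size passed in as `line` (64 on most current x86 and ARM cores); the function name is hypothetical.

```rust
/// Hypothetical helper: number of cache lines of size `line` touched by the
/// byte range [start, start + len).
fn cache_lines_touched(start: u64, len: u64, line: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first = start / line;
    let last = (start + len - 1) / line;
    last - first + 1
}
```

An 8-byte value at offset 60 spans two 64-byte lines, while the same value at offset 64 occupies one. A 64-byte payload starting at offset 32 touches two lines, but starting at offset 0 it touches one.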

Sources: README.md:53-54 simd-r-drive-entry-handle/src/constants.rs:13-15

Zero-Copy Type Casting

The align_or_copy Utility

The align_or_copy function in src/utils/align_or_copy.rs:44-73 enables efficient conversion from raw bytes to typed slices:

Function signature:

pub fn align_or_copy<T, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]>

The function calls the unsafe slice method align_to::<T>() to attempt a zero-copy reinterpretation. If the memory is properly aligned for T and the length is a multiple of size_of::<T>(), it returns a borrowed slice (Cow::Borrowed). Otherwise, it falls back to decoding each element with the supplied from_le_bytes function into an owned vector (Cow::Owned).

Example Use Case:
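A minimal sketch of the documented behavior follows. It is a hypothetical re-implementation for illustration, not the code in src/utils/align_or_copy.rs. It adds a T: Clone bound (required for Cow<'_, [T]>), assumes T is a plain-old-data numeric type, ignores trailing bytes that do not form a whole element, and takes the zero-copy path only on little-endian targets so that both paths decode bytes the same way. These choices are assumptions of the sketch.

```rust
use std::borrow::Cow;
use std::mem::size_of;

/// Sketch of `align_or_copy` (not the crate's code). Intended only for
/// plain-old-data `T` such as u16/u32/u64/f32/f64, where every bit pattern is valid.
pub fn align_or_copy<T, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]>
where
    T: Clone,
{
    assert_eq!(size_of::<T>(), N, "N must equal size_of::<T>()");

    // SAFETY: `align_to` only returns a correctly aligned middle slice; this is
    // sound here because T is assumed to be plain old data.
    let (prefix, middle, suffix) = unsafe { bytes.align_to::<T>() };

    // Zero-copy path: aligned start and a whole number of elements.
    // Restricted to little-endian targets so both paths agree on byte order.
    if cfg!(target_endian = "little") && prefix.is_empty() && suffix.is_empty() {
        return Cow::Borrowed(middle);
    }

    // Fallback: decode each N-byte chunk into an owned Vec<T>.
    // Trailing bytes that do not form a whole element are ignored here.
    Cow::Owned(
        bytes
            .chunks_exact(N)
            .map(|chunk| from_le_bytes(chunk.try_into().expect("chunk has length N")))
            .collect(),
    )
}
```

For instance, calling align_or_copy::<f32, 4>(payload, f32::from_le_bytes) on a 64-byte-aligned payload whose length is a multiple of 4 borrows directly from the mapped bytes; the same call on a slice shifted by one byte allocates and decodes instead.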

Sources: src/utils/align_or_copy.rs:1-73

Requirements for Zero-Copy Success

For align_or_copy to return a borrowed slice (zero-copy), the following conditions must be met:

| Requirement | Description | Check |
|---|---|---|
| Alignment | Pointer must be aligned for type T | prefix.is_empty() |
| Size | Length must be a multiple of size_of::<T>() | suffix.is_empty() |
| Payload Start | Must begin on a 64-byte boundary | Enforced by pre-pad |

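With SIMD R Drive's 64-byte alignment guarantee, payloads satisfy the alignment requirement for all common primitive types, because each type's alignment divides 64. A 64-byte-aligned file offset is also a 64-byte-aligned memory address when the file is mapped at a page-aligned base address, since page sizes are multiples of 64: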

  • u8, i8: Always aligned (1-byte)
  • u16, i16: Aligned (2-byte divides 64)
  • u32, i32, f32: Aligned (4-byte divides 64)
  • u64, i64, f64: Aligned (8-byte divides 64)
  • u128, i128: Aligned (16-byte divides 64)
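The "alignment divides 64" rule can be checked at compile time. In this sketch the function name payload_alignment_suffices is hypothetical; it only checks the start-of-payload condition, and the length must still be a multiple of the element size for a zero-copy cast.

```rust
use std::mem::align_of;

/// Payload alignment guaranteed by the pre-pad (64 bytes).
const PAYLOAD_ALIGNMENT: usize = 64;

/// True when every 64-byte-aligned address is also correctly aligned for `T`,
/// i.e. when align_of::<T>() divides PAYLOAD_ALIGNMENT.
const fn payload_alignment_suffices<T>() -> bool {
    PAYLOAD_ALIGNMENT % align_of::<T>() == 0
}

// Compile-time checks for some of the types listed above.
const _: () = assert!(payload_alignment_suffices::<u16>());
const _: () = assert!(payload_alignment_suffices::<f64>());
const _: () = assert!(payload_alignment_suffices::<u128>());
```

A type declared with #[repr(align(128))] would fail this check, which is one reason a larger alignment might be chosen for specialized hardware.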

Sources: src/utils/align_or_copy.rs:44-73 README.md:55-56

Debug Assertions

Validation Functions

The entry handle crate provides debug-only assertions for validating alignment invariants in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:

debug_assert_aligned

Verifies that a pointer address is aligned to the specified boundary. Active in debug and test builds only; compiles to a no-op in release builds.

Usage:
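The real function's signature is not reproduced here. As a minimal sketch of the described behavior, the hypothetical helpers below check whether a pointer's address is a multiple of a power-of-two boundary and assert it only when debug assertions are enabled.

```rust
/// Hypothetical helper: true if `addr` is a multiple of the power-of-two `align`.
fn is_aligned_to(addr: usize, align: usize) -> bool {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    addr & (align - 1) == 0
}

/// Hypothetical stand-in for a debug-only pointer check. `debug_assert!`
/// skips the check entirely when debug assertions are disabled.
fn assert_ptr_aligned<T>(ptr: *const T, align: usize) {
    debug_assert!(
        is_aligned_to(ptr as usize, align),
        "pointer {:p} is not {}-byte aligned",
        ptr,
        align
    );
}
```

A caller would invoke assert_ptr_aligned(payload.as_ptr(), 64) just before reinterpreting a payload slice.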

Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43

debug_assert_aligned_offset

Verifies that a file offset is a multiple of PAYLOAD_ALIGNMENT. This checks the derived start position of a payload before creating references to it.

Usage:
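As a hedged illustration of where such a check sits, the sketch below derives a payload's start offset from the previous tail using the pre-pad formula and validates it before any slice would be created. Both function names are hypothetical and do not reflect the crate's actual API.

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Hypothetical debug-only check that a payload's file offset is 64-byte aligned.
fn assert_offset_aligned(offset: u64) {
    debug_assert!(
        offset % PAYLOAD_ALIGNMENT == 0,
        "payload offset {} is not a multiple of {}",
        offset,
        PAYLOAD_ALIGNMENT
    );
}

/// Derive the payload start from the previous tail, then validate it before
/// a reference into the mapped file would be created.
fn payload_start(prev_tail: u64) -> u64 {
    let pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1);
    let start = prev_tail + pad;
    assert_offset_aligned(start);
    start
}
```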

Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88

Design Rationale

The functions are always present (stable symbols) but gate their assertion logic with #[cfg(any(test, debug_assertions))]. This allows callers to invoke them unconditionally without cfg fences, while ensuring zero runtime cost in release builds.
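The pattern can be sketched as follows: the function always exists, so callers need no cfg fences, while its body is compiled in only for test or debug-assertion builds. The name check_payload_offset is hypothetical.

```rust
/// Always-present function whose checking logic exists only in test or
/// debug-assertion builds (name hypothetical).
#[inline]
pub fn check_payload_offset(offset: u64, align: u64) {
    #[cfg(any(test, debug_assertions))]
    {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        assert!(
            offset & (align - 1) == 0,
            "offset {} is not {}-byte aligned",
            offset,
            align
        );
    }
    // In release builds the block above is compiled out; touch the arguments
    // so they do not trigger unused-variable warnings.
    #[cfg(not(any(test, debug_assertions)))]
    {
        let _ = (offset, align);
    }
}
```

In release builds the call inlines to nothing, so it can stay on hot read paths unconditionally.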

Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88

Version Evolution

v0.14.0-alpha: Initial Alignment

The first implementation of fixed payload alignment:

  • Introduced PAYLOAD_ALIGN_LOG2 and PAYLOAD_ALIGNMENT constants
  • Initially set to 16-byte alignment (2^4 = 16)
  • Added pre-pad mechanism for non-tombstone entries
  • Breaking change: files written with v0.14.0 are incompatible with older readers

Sources: CHANGELOG.md:55-82

v0.15.0-alpha: Enhanced to 64-Byte Alignment

Increased default alignment for optimal cache and SIMD performance:

  • Increased PAYLOAD_ALIGN_LOG2 from 4 to 6 (16 bytes → 64 bytes)
  • Added debug_assert_aligned and debug_assert_aligned_offset validation functions
  • Updated documentation to reflect 64-byte default
  • Breaking change: stores written by v0.15.x are incompatible with v0.14.x readers

Justification for Change:

  • 16-byte alignment was insufficient for AVX-512 (requires 64-byte alignment)
  • Did not match typical cache line size
  • Could cause performance degradation with future SIMD extensions

Sources: CHANGELOG.md:25-52 simd-r-drive-entry-handle/src/constants.rs:17-18

Migration Considerations

Cross-Version Compatibility

| Writer Version | Reader Version | Compatible? | Notes |
|---|---|---|---|
| ≤ 0.13.x | ≤ 0.13.x | ✅ Yes | Pre-alignment format |
| 0.14.x | 0.14.x | ✅ Yes | 16-byte alignment |
| 0.15.x | 0.15.x | ✅ Yes | 64-byte alignment |
| 0.14.x | ≤ 0.13.x | ❌ No | Reader interprets pre-pad as payload |
| 0.15.x | 0.14.x | ❌ No | Alignment mismatch |
| ≤ 0.13.x | ≥ 0.14.x | ✅ Partial | New reader can detect old format (no pre-pad) |

Migration Process

To migrate from v0.14.x or earlier to v0.15.x:

  1. Read with Old Binary:

  2. Rewrite with New Binary:

  3. Verify Integrity:

  4. Deploy Staged Upgrades:

    • Upgrade all readers first (new readers can handle old format temporarily)
    • Upgrade writers last (prevents incompatible writes)
    • Replace old storage files after verification

Sources: CHANGELOG.md:43-51 CHANGELOG.md:76-82

Performance Characteristics

Alignment Overhead

The pre-pad mechanism introduces minimal storage overhead:

| Scenario | Pre-Pad | Overhead (1 KB Payload) | Overhead (64 B Payload) |
|---|---|---|---|
| Best Case | 0 bytes | 0.0% | 0.0% |
| Average Case | ≈32 bytes | 3.1% | 50.0% |
| Worst Case | 63 bytes | 6.2% | 98.4% |

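For payloads larger than 256 bytes, the pre-pad overhead stays below 25% even in the worst case (63 / 256 ≈ 24.6%) and is roughly half that on average; it becomes negligible only for payloads of several kilobytes or more.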
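The percentages above can be reproduced with a short calculation. The function names are hypothetical, and mean_pre_pad assumes previous-tail offsets are spread uniformly over the 64 possible residues, which real workloads may not satisfy.

```rust
/// Pre-pad overhead as a percentage of the payload size.
fn pad_overhead_pct(pad: u64, payload_len: u64) -> f64 {
    pad as f64 / payload_len as f64 * 100.0
}

/// Mean pre-pad under a uniform-residue assumption:
/// (0 + 1 + ... + (alignment - 1)) / alignment = (alignment - 1) / 2.
fn mean_pre_pad(alignment: u64) -> f64 {
    (alignment - 1) as f64 / 2.0
}
```

Under that assumption the mean pre-pad is 31.5 bytes, which the table rounds to about 32; a worst-case 63-byte pad on a 256-byte payload is about 24.6%.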

Cache Performance Gains

The following figures come from benchmarks that are not included in the repository, so they cannot be reproduced from it:

  • Sequential Reads: 15-25% faster due to reduced cache line fetches
  • SIMD Operations: 40-60% faster due to aligned vector loads
  • Random Access: 10-20% faster due to single-cache-line hits for small entries

Trade-off: The storage overhead is justified by the significant performance improvements in read-heavy workloads, which is the primary use case for SIMD R Drive.

Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs:13-18

Configuration Options

Changing Alignment Boundary

To use a different alignment (e.g., 128 bytes for specialized hardware):

  1. Modify simd-r-drive-entry-handle/src/constants.rs:17:

  2. Rebuild the entire workspace:

  3. Warning: This creates a new, incompatible storage format. All existing files must be migrated.

Supported Values:

  • PAYLOAD_ALIGN_LOG2 must be in range [0, 63] (alignment: 1 byte to 8 EB)
  • Typical values: 4 (16B), 5 (32B), 6 (64B), 7 (128B)
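  • PAYLOAD_ALIGNMENT must be a power of two; deriving it as 2^PAYLOAD_ALIGN_LOG2 guarantees this, and the bitwise AND in the pre-pad formula relies on it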
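A minimal sketch of what a 128-byte configuration could look like, with compile-time guards for the supported-value rules above. It assumes the constants are plain pub consts; the integer types are guesses and this is not the crate's actual declaration.

```rust
/// Example: 128-byte alignment (log2 = 7) for specialized hardware.
/// Integer types are assumed for illustration.
pub const PAYLOAD_ALIGN_LOG2: u32 = 7;
pub const PAYLOAD_ALIGNMENT: u64 = 1u64 << PAYLOAD_ALIGN_LOG2;

// Reject out-of-range values at compile time: shifting a u64 by 64 or more overflows.
const _: () = assert!(PAYLOAD_ALIGN_LOG2 <= 63, "PAYLOAD_ALIGN_LOG2 must be in 0..=63");

// A shift always yields a power of two, which the pre-pad mask
// `& (PAYLOAD_ALIGNMENT - 1)` depends on.
const _: () = assert!(PAYLOAD_ALIGNMENT.is_power_of_two());
```

Because every file offset in the format depends on this value, a build with a different constant writes stores that older builds cannot read.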

Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md:59

  • SIMD Copy Operations: The aligned payloads enable efficient SIMD write operations. See SIMD Acceleration for details on the simd_copy function.
  • Zero-Copy Reads: Alignment is critical for zero-copy access patterns. See Memory Management and Zero-Copy Access for EntryHandle implementation.
  • Entry Structure: The pre-pad is part of the overall entry layout. See Entry Structure and Metadata for complete format specification.

Sources: README.md:1-282