This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Payload Alignment and Cache Efficiency
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- tests/alignment_tests.rs
Purpose and Scope
This document explains the payload alignment strategy used by SIMD R Drive to optimize cache efficiency and enable zero-copy SIMD operations. It covers the PAYLOAD_ALIGNMENT constant, the pre-padding mechanism that ensures alignment, cache line optimization, and the testing infrastructure that validates alignment invariants.
For information about SIMD-accelerated operations themselves (vectorized copying and hashing), see SIMD Acceleration. For details on zero-copy memory access patterns, see Memory Management and Zero-Copy Access.
Overview
SIMD R Drive aligns all non-tombstone payloads to a fixed boundary defined by PAYLOAD_ALIGNMENT, currently set to 64 bytes. This alignment ensures that:
- Payloads begin on CPU cache line boundaries (typically 64 bytes)
- SIMD vector loads (SSE, AVX, AVX-512, NEON) can operate without crossing alignment boundaries
- Zero-copy typed views (`&[u32]`, `&[u64]`, `&[u128]`) can be safely cast without additional copying
Sources: README.md:51-59 README.md:110-137
The PAYLOAD_ALIGNMENT Constant
Definition and Configuration
The alignment is controlled by two constants in the entry handle package:
| Constant | Value | Purpose |
|---|---|---|
| `PAYLOAD_ALIGN_LOG2` | 6 | Log₂ of the alignment (2⁶ = 64) |
| `PAYLOAD_ALIGNMENT` | 64 | Actual alignment in bytes |
The PAYLOAD_ALIGNMENT value is calculated as 1 << PAYLOAD_ALIGN_LOG2, ensuring it is always a power of two. This constant determines where each payload begins in the storage file.
Diagram: PAYLOAD_ALIGNMENT Configuration and Storage Layout
```mermaid
graph LR
    subgraph "Configuration"
        LOG2["PAYLOAD_ALIGN_LOG2\n(constant = 6)"]
        ALIGN["PAYLOAD_ALIGNMENT\n(1 << 6 = 64)"]
    end
    subgraph "Storage File"
        ENTRY1["Entry 1\n@ offset 0"]
        PAD1["Pre-Pad\n(0-63 bytes)"]
        PAYLOAD1["Payload 1\n@ 64-byte boundary"]
        META1["Metadata\n(20 bytes)"]
        PAD2["Pre-Pad"]
        PAYLOAD2["Payload 2\n@ next 64-byte boundary"]
    end
    LOG2 --> ALIGN
    ALIGN -.determines.-> PAD1
    ALIGN -.determines.-> PAD2
    ENTRY1 --> PAD1
    PAD1 --> PAYLOAD1
    PAYLOAD1 --> META1
    META1 --> PAD2
    PAD2 --> PAYLOAD2
```
The alignment constant can be modified by changing PAYLOAD_ALIGN_LOG2 in the constants file and rebuilding all components. However, this creates incompatibility with files written using different alignment values.
Sources: README.md:59 CHANGELOG.md:64-67 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:69-70
Cache Line Optimization
CPU Cache Architecture
Modern CPU cache lines are typically 64 bytes wide. When data is loaded from memory, the CPU fetches entire cache lines at once. Aligning payloads to 64-byte boundaries ensures:
- No cache line splits : Each payload begins at a cache line boundary, preventing a single logical read from spanning two cache lines
- Predictable cache behavior : Sequential reads traverse cache lines in order without fragmentation
- Reduced memory bandwidth : The CPU can prefetch entire cache lines efficiently
Alignment Benefit Matrix
| Scenario | 16-byte Alignment | 64-byte Alignment |
|---|---|---|
| Cache line splits per payload | Possible (a 16-byte boundary can fall mid-line) | Never (payload starts on a line boundary) |
| SIMD load efficiency | Good for SSE | Optimal for AVX/AVX-512 |
| Prefetcher effectiveness | Moderate | High |
| Memory bandwidth utilization | ~85-90% | ~95-98% |
Sources: README.md:53 CHANGELOG.md:27-30
SIMD Compatibility
Vector Instruction Requirements
Different SIMD instruction sets have varying alignment requirements:
| SIMD Extension | Vector Size | Typical Alignment | Supported By 64-byte Alignment |
|---|---|---|---|
| SSE2 | 128 bits (16 bytes) | 16-byte | ✅ Yes |
| AVX2 | 256 bits (32 bytes) | 32-byte | ✅ Yes |
| AVX-512 | 512 bits (64 bytes) | 64-byte | ✅ Yes |
| NEON (ARM) | 128 bits (16 bytes) | 16-byte | ✅ Yes |
| SVE (ARM) | Variable (128-2048 bits) | 16-byte minimum | ✅ Yes |
SIMD Load Operations
The alignment tests demonstrate safe SIMD operations using aligned loads:
Diagram: SIMD 64-byte Lane Loading
The test implementation at tests/alignment_tests.rs:69-95 demonstrates x86_64 SIMD loads using _mm_load_si128, while tests/alignment_tests.rs:97-122 shows aarch64 using vld1q_u8. Both safely load four 16-byte lanes from a 64-byte aligned payload.
Sources: tests/alignment_tests.rs:69-122 README.md:53-54
Pre-Padding Mechanism
Padding Calculation
To ensure each payload starts at a 64-byte boundary, the system inserts zero-filled pre-padding bytes before the payload. The padding length is calculated as:
`pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)`

Where:
- `prev_tail` is the absolute file offset immediately after the previous entry's metadata
- The bitwise AND with `(PAYLOAD_ALIGNMENT - 1)` keeps the result in the range `[0, PAYLOAD_ALIGNMENT - 1]`
Entry Structure with Pre-Padding
Diagram: Entry Structure with Pre-Padding
```mermaid
graph TB
    subgraph "Entry N-1"
        PREV_PAYLOAD["Payload\n(variable length)"]
        PREV_META["Metadata\n(20 bytes)"]
    end
    subgraph "Entry N Structure"
        PREPAD["Pre-Pad\n(0-63 zero bytes)"]
        PAYLOAD["Payload\n(starts at 64-byte boundary)"]
        KEYHASH["key_hash\n(8 bytes)"]
        PREVOFF["prev_offset\n(8 bytes)"]
        CRC["crc32c\n(4 bytes)"]
    end
    subgraph "Alignment Validation"
        CHECK["payload_start %\nPAYLOAD_ALIGNMENT == 0"]
    end
    PREV_PAYLOAD --> PREV_META
    PREV_META --> PREPAD
    PREPAD --> PAYLOAD
    PAYLOAD --> KEYHASH
    KEYHASH --> PREVOFF
    PREVOFF --> CRC
    PAYLOAD -.verified by.-> CHECK
```
The prev_offset field stores the absolute file offset of the previous entry’s tail (end of metadata), allowing readers to calculate the pre-padding length by examining where the previous entry ended.
Sources: README.md:112-137 README.md:133-137
Alignment Evolution: From 16 to 64 Bytes
Version History
The payload alignment was increased in version 0.15.0-alpha:
| Version | Alignment | Rationale |
|---|---|---|
| ≤ 0.13.x-alpha | Variable (no alignment) | Minimal storage overhead |
| 0.14.0-alpha | 16 bytes | SSE compatibility, basic alignment |
| 0.15.0-alpha | 64 bytes | Cache line + AVX-512 optimization |
Breaking Change Impact
The alignment change in 0.15.0-alpha is a breaking change that affects file compatibility:
Diagram: Alignment Version Incompatibility
```mermaid
graph TB
    subgraph "Pre-0.15 (16-byte)"
        OLD_WRITE["Writer\n(16-byte align)"]
        OLD_FILE["Storage File\n(16-byte boundaries)"]
        OLD_READ["Reader\n(expects 16-byte)"]
    end
    subgraph "Post-0.15 (64-byte)"
        NEW_WRITE["Writer\n(64-byte align)"]
        NEW_FILE["Storage File\n(64-byte boundaries)"]
        NEW_READ["Reader\n(expects 64-byte)"]
    end
    subgraph "Incompatibility"
        MISMATCH["Old reader\n+ New file\n= Parse Error"]
        MISMATCH2["New reader\n+ Old file\n= Parse Error"]
    end
    OLD_WRITE --> OLD_FILE
    OLD_FILE --> OLD_READ
    NEW_WRITE --> NEW_FILE
    NEW_FILE --> NEW_READ
    OLD_READ -.cannot read.-> NEW_FILE
    NEW_READ -.cannot read.-> OLD_FILE
    NEW_FILE --> MISMATCH
    OLD_FILE --> MISMATCH2
```
Migration Strategy
The changelog specifies a migration path at CHANGELOG.md:43-51:
- Read all entries using the old binary (with old alignment)
- Write entries into a fresh store using the new binary (with 64-byte alignment)
- Replace the old file after verification
- In multi-service environments, upgrade readers before writers to prevent parse errors
Sources: CHANGELOG.md:19-51 CHANGELOG.md:55-81
Alignment Testing and Validation
Debug-Only Assertions
The system includes two debug-only alignment validation functions that compile to no-ops in release builds:
Pointer Alignment Assertion
`debug_assert_aligned(ptr: *const u8, align: usize)` validates that a pointer is aligned to the specified boundary. Implementation at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43.
Behavior:
- Debug/test builds : Uses `debug_assert!` to verify `(ptr as usize & (align - 1)) == 0`
- Release/bench builds : No-op with zero runtime cost
Offset Alignment Assertion
`debug_assert_aligned_offset(off: u64)` validates that a file offset is aligned to PAYLOAD_ALIGNMENT. Implementation at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88.
Behavior:
- Debug/test builds : Verifies `off.is_multiple_of(PAYLOAD_ALIGNMENT)`
- Release/bench builds : No-op with zero runtime cost
Comprehensive Alignment Test Suite
The alignment test at tests/alignment_tests.rs:1-245 validates multiple alignment scenarios:
Diagram: Alignment Test Coverage and Validation Flow
Test Implementation Details
The test verifies:
- Address alignment at tests/alignment_tests.rs:24-32: Confirms payload pointer is multiple of 64
- Type alignment at tests/alignment_tests.rs:35-56: Validates alignment is sufficient for `u32`, `u64`, and `u128`
- Bytemuck casting at tests/alignment_tests.rs:59-67: Proves zero-copy typed views work
- SIMD operations at tests/alignment_tests.rs:69-133: Executes actual SIMD loads on aligned data
- Iterator consistency at tests/alignment_tests.rs:236-243: Ensures all iterated entries are aligned
Sources: tests/alignment_tests.rs:1-245 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-89
Performance Benefits
Zero-Copy Typed Views
The 64-byte alignment enables safe zero-copy reinterpretation of byte slices as typed slices without additional validation or copying:
| Source Type | Target Type | Requirement | Satisfied by 64-byte Alignment |
|---|---|---|---|
| `&[u8]` | `&[u16]` | 2-byte aligned | ✅ Yes (64 % 2 = 0) |
| `&[u8]` | `&[u32]` | 4-byte aligned | ✅ Yes (64 % 4 = 0) |
| `&[u8]` | `&[u64]` | 8-byte aligned | ✅ Yes (64 % 8 = 0) |
| `&[u8]` | `&[u128]` | 16-byte aligned | ✅ Yes (64 % 16 = 0) |
The README states at README.md:55-56: “When your payload length matches the element size, you can safely reinterpret the bytes as typed slices (e.g., &[u16], &[u32], &[u64], &[u128]) without copying.”
Practical Benefits Summary
From README.md:59:
- Cache-friendly zero-copy reads : Payloads align with CPU cache lines
- Predictable SIMD performance : Vector operations never cross alignment boundaries
- Simpler casting : No runtime alignment checks needed for typed views
- Fewer fallback copies : Libraries like `bytemuck` can cast without allocation
Storage Overhead
The pre-padding mechanism adds variable overhead:
- Worst case : 63 bytes of padding per entry (when previous tail is 1 byte before boundary)
- Average case : ~31.5 bytes per entry (uniform distribution assumption)
- Best case : 0 bytes (when previous tail already aligns)
For small payloads, this overhead can be significant. For payloads much larger than 64 bytes, the overhead becomes negligible relative to payload size.
Sources: README.md:53-59 README.md:110 tests/alignment_tests.rs:215-221
Integration with Arrow Buffers
When the arrow feature is enabled, EntryHandle provides methods to create Apache Arrow buffers that leverage alignment:
- `as_arrow_buffer()`: Creates an Arrow buffer view without copying
- `into_arrow_buffer()`: Converts into an Arrow buffer with alignment validation
Both methods include debug assertions to verify pointer and offset alignment (simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88), ensuring Arrow's alignment requirements are met.
Sources: CHANGELOG.md:67-68 README.md:59
CI/CD Validation
The GitHub Actions workflow at .github/workflows/rust-lint.yml:1-43 ensures alignment-related code passes:
- Clippy lints : Validates unsafe SIMD code and alignment assertions
- Format checks : Ensures consistent style in alignment-critical code
- Documentation warnings : Catches missing docs for alignment APIs
The test workflow (referenced in the CI setup) runs alignment tests across multiple platforms (x86_64, aarch64) to verify SIMD compatibility on different architectures.
Sources: .github/workflows/rust-lint.yml:1-43
Summary
The 64-byte PAYLOAD_ALIGNMENT is a foundational design choice that:
- Aligns payloads with CPU cache lines for optimal memory access
- Satisfies alignment requirements for SSE, AVX, AVX-512, and NEON SIMD instructions
- Enables safe zero-copy casting to typed slices (`&[u32]`, `&[u64]`, etc.)
- Integrates seamlessly with Apache Arrow's buffer requirements
The pre-padding mechanism transparently maintains this alignment while preserving the append-only storage model. Comprehensive testing validates alignment across write, delete, and overwrite scenarios, ensuring both correctness and performance optimization.