Payload Alignment and Cache Efficiency
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/utils/align_or_copy.rs
Purpose and Scope
This document details the payload alignment strategy employed by SIMD R Drive to optimize cache utilization and enable efficient SIMD operations. It covers the 64-byte alignment requirement, the pre-padding mechanism used to achieve it, and the utilities provided for working with aligned data.
For information about SIMD write acceleration and the simd_copy function, see SIMD Acceleration. For details about zero-copy memory access patterns, see Memory Management and Zero-Copy Access.
Overview
SIMD R Drive enforces fixed 64-byte alignment for all non-tombstone payloads in the storage file. This alignment strategy provides three critical benefits:
- Cache Line Alignment: Payloads begin on CPU cache-line boundaries (typically 64 bytes), so a payload that fits in one line is never split across two, which would require multiple memory accesses.
- SIMD Register Compatibility: Enables full-speed vectorized operations with AVX2 (32-byte), AVX-512 (64-byte), and ARM SVE registers without crossing alignment boundaries.
- Zero-Copy Type Casting: Allows direct reinterpretation of byte slices as typed slices (&[u16], &[u32], &[f32], etc.) without copying, provided the payload length is a multiple of the element size.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs:13-18
Alignment Configuration
Constants
The alignment boundary is configured via compile-time constants in the entry handle crate:
| Constant | Value | Description |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of the alignment (2⁶ = 64) |
| PAYLOAD_ALIGNMENT | 64 | Alignment boundary in bytes |
| Maximum Pre-Pad | 63 | Maximum padding bytes per entry (PAYLOAD_ALIGNMENT - 1) |

The constants are defined in simd-r-drive-entry-handle/src/constants.rs:17-18.
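Rationale for 64-Byte Alignment
A 64-byte boundary matches both the typical CPU cache-line size and the width of an AVX-512 register, so a single alignment value serves cache efficiency and full-width SIMD loads. Because 64 is a multiple of every smaller power-of-two alignment, it also covers 32-byte AVX2 vectors.
Sources: README.md:53-54 simd-r-drive-entry-handle/src/constants.rs:13-18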
Pre-Pad Mechanism
Computation
To achieve 64-byte alignment, entries may include zero-padding bytes before the payload. The pre-pad length is computed based on the previous entry's tail offset:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where:
- prev_tail is the absolute file offset immediately after the previous entry's metadata.
- The bitwise AND with (PAYLOAD_ALIGNMENT - 1) keeps the result in the range [0, 63]. This works as a cheap modulo because PAYLOAD_ALIGNMENT is a power of two.
- If prev_tail is already aligned, pad = 0.
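A minimal Rust sketch of the pre-pad computation follows. The constant names mirror those in constants.rs, but their types (u8 for the log2, u64 for the alignment) and the helper name compute_pre_pad are assumptions for illustration.

```rust
// Assumed to mirror simd-r-drive-entry-handle/src/constants.rs (types are a guess).
const PAYLOAD_ALIGN_LOG2: u8 = 6;
const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2; // 64

/// Hypothetical helper: number of zero bytes needed after `prev_tail` so the
/// payload starts on a PAYLOAD_ALIGNMENT boundary. Always in [0, PAYLOAD_ALIGNMENT - 1].
fn compute_pre_pad(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    for prev_tail in [0u64, 1, 20, 63, 64, 100] {
        let pad = compute_pre_pad(prev_tail);
        // Every payload start must land on a 64-byte boundary.
        assert_eq!((prev_tail + pad) % PAYLOAD_ALIGNMENT, 0);
        println!("prev_tail = {prev_tail:>3}  pad = {pad:>2}  payload_start = {}", prev_tail + pad);
    }
}
```

The final mask matters only for the already-aligned case: without it, a prev_tail of 64 would yield a pad of 64 instead of 0.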
On-Disk Layout
Aligned Entry Structure:
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| P .. P+pad | Pre-Pad | 0-63 | Zero bytes for alignment |
| P+pad .. N | Payload | Variable | Actual data content |
| N .. N+8 | Key Hash | 8 | XXH3_64 hash of the key |
| N+8 .. N+16 | Prev Offset | 8 | Previous tail offset |
| N+16 .. N+20 | Checksum | 4 | CRC32C of payload |

Here P is prev_tail (the offset where the entry begins) and N is the offset immediately after the payload.
Tombstones (deletion markers) do not include a pre-pad and consist of only:
- a 1-byte payload (0x00)
- the 20-byte metadata
This design ensures tombstones remain compact while regular payloads maintain alignment.
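To make the layout concrete, the sketch below computes where each region of an entry lands for a given prev_tail. EntryLayout, layout_entry, and METADATA_SIZE are hypothetical names; hashing and checksumming are left out, so only offsets are computed.

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;
/// Key hash (8) + prev offset (8) + checksum (4), per the layout table.
const METADATA_SIZE: u64 = 20;

/// Hypothetical description of where one entry's regions sit in the file.
#[derive(Debug, PartialEq)]
struct EntryLayout {
    pad_start: u64,      // P: first byte after the previous entry
    payload_start: u64,  // P + pad (64-byte aligned unless tombstone)
    metadata_start: u64, // N: first metadata byte
    next_tail: u64,      // N + 20: where the next entry begins
}

fn layout_entry(prev_tail: u64, payload_len: u64, is_tombstone: bool) -> EntryLayout {
    // Tombstones skip the pre-pad entirely.
    let pad = if is_tombstone {
        0
    } else {
        (PAYLOAD_ALIGNMENT - prev_tail % PAYLOAD_ALIGNMENT) & (PAYLOAD_ALIGNMENT - 1)
    };
    let payload_start = prev_tail + pad;
    let metadata_start = payload_start + payload_len;
    EntryLayout {
        pad_start: prev_tail,
        payload_start,
        metadata_start,
        next_tail: metadata_start + METADATA_SIZE,
    }
}

fn main() {
    let a = layout_entry(0, 100, false); // first entry: already aligned, no pad
    let b = layout_entry(a.next_tail, 10, false); // padded from 120 up to 128
    let t = layout_entry(b.next_tail, 1, true); // tombstone: 1-byte payload, no pad
    println!("{a:?}\n{b:?}\n{t:?}");
}
```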
Sources: README.md:112-137 simd-r-drive-entry-handle/src/constants.rs:1-18
Cache Efficiency Benefits
Cache Line Behavior
Modern CPUs move data between main memory and cache in fixed-size cache lines (typically 64 bytes). When an address is accessed, the entire cache line containing it is loaded into the CPU cache.
```mermaid
graph TB
    subgraph "Misaligned Payload Problem"
        Miss1["Cache Line 1\nContains: Payload Start"]
        Miss2["Cache Line 2\nContains: Payload End"]
        TwoLoads["Requires 2 Cache\nLine Loads"]
    end
    subgraph "64-Byte Aligned Payload"
        Hit["Single Cache Line\nContains: Entire Small Payload"]
        OneLoad["Requires 1 Cache\nLine Load"]
    end
    subgraph "Performance Impact"
        Latency["Reduced Memory\nLatency"]
        Bandwidth["Better Memory\nBandwidth"]
        Throughput["Higher Read\nThroughput"]
    end
    Miss1 --> TwoLoads
    Miss2 --> TwoLoads
    TwoLoads -.->|penalty| Latency
    Hit --> OneLoad
    OneLoad --> Latency
    Latency --> Bandwidth
    Bandwidth --> Throughput
```
For payloads ≤ 64 bytes, alignment ensures the entire payload fits within a single cache line, eliminating the penalty of fetching multiple cache lines.
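The cache-line claim reduces to simple arithmetic. This sketch counts the lines a byte range touches; the 64-byte line size is an assumption (it is common on x86-64 and many ARM cores), and cache_lines_touched is a hypothetical helper.

```rust
/// Assumed cache-line size; 64 bytes is typical but not universal.
const CACHE_LINE: u64 = 64;

/// Hypothetical helper: number of cache lines touched by bytes [start, start + len).
fn cache_lines_touched(start: u64, len: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first_line = start / CACHE_LINE;
    let last_line = (start + len - 1) / CACHE_LINE;
    last_line - first_line + 1
}

fn main() {
    // A 64-byte payload: aligned start vs. 8 bytes past a boundary.
    println!("{}", cache_lines_touched(128, 64)); // 1
    println!("{}", cache_lines_touched(136, 64)); // 2
    // A 256-byte payload: aligned vs. unaligned start.
    println!("{}", cache_lines_touched(128, 256)); // 4
    println!("{}", cache_lines_touched(100, 256)); // 5
}
```

In general, an aligned payload of len bytes touches exactly ceil(len / 64) lines, while the same payload at an unaligned start can touch one more.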
Sources: README.md:53-54 simd-r-drive-entry-handle/src/constants.rs:13-15
Zero-Copy Type Casting
The align_or_copy Utility
The align_or_copy function in src/utils/align_or_copy.rs:44-73 enables efficient conversion from raw bytes to typed slices:
Function signature:
```rust
pub fn align_or_copy<T, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]>
```
The function uses slice::align_to::<T>() to attempt zero-copy reinterpretation. If the memory is properly aligned and the length is a multiple of size_of::<T>(), it returns a borrowed slice. Otherwise, it falls back to manually decoding each element.
Example Use Case:
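The sketch below is a self-contained approximation of align_or_copy written against the signature shown above; it is not the crate's source. The T: Copy bound, the length assertion, and the little-endian guard on the borrowed path are assumptions added so the sketch is sound and portable. AlignedBuf is a hypothetical stand-in for a 64-byte-aligned payload.

```rust
use std::borrow::Cow;
use std::convert::TryInto;
use std::mem::size_of;

/// Sketch of align_or_copy: borrow the bytes as &[T] when possible,
/// otherwise decode each element into an owned Vec<T>.
/// Intended only for plain numeric T (u16, u32, f32, ...), where every bit
/// pattern is a valid value.
pub fn align_or_copy<T: Copy, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]> {
    assert_eq!(size_of::<T>(), N, "N must equal size_of::<T>()");
    assert_eq!(bytes.len() % N, 0, "length must be a multiple of size_of::<T>()");

    // SAFETY: sound for numeric T; do not use types with invalid bit
    // patterns (bool, char, enums, references).
    let (prefix, mid, suffix) = unsafe { bytes.align_to::<T>() };

    // Borrow only when nothing was split off and native order is little-endian,
    // so the reinterpreted values match from_le_bytes decoding.
    if prefix.is_empty() && suffix.is_empty() && cfg!(target_endian = "little") {
        Cow::Borrowed(mid)
    } else {
        Cow::Owned(
            bytes
                .chunks_exact(N)
                .map(|chunk| from_le_bytes(chunk.try_into().expect("chunk is N bytes")))
                .collect(),
        )
    }
}

/// Hypothetical 64-byte-aligned buffer standing in for a pre-padded payload.
#[repr(C, align(64))]
struct AlignedBuf([u8; 64]);

fn main() {
    // Fill the buffer with the little-endian u32 values 0..16.
    let mut buf = AlignedBuf([0u8; 64]);
    for (i, chunk) in buf.0.chunks_exact_mut(4).enumerate() {
        chunk.copy_from_slice(&(i as u32).to_le_bytes());
    }

    // Aligned start: a borrowed view, no copy.
    let view = align_or_copy::<u32, 4>(&buf.0, u32::from_le_bytes);
    println!("aligned: borrowed = {}, first = {:?}", matches!(view, Cow::Borrowed(_)), &view[..4]);

    // Starting 1 byte in breaks u32 alignment, so the bytes are decoded into a copy.
    let shifted = align_or_copy::<u32, 4>(&buf.0[1..61], u32::from_le_bytes);
    println!("misaligned: borrowed = {}", matches!(shifted, Cow::Borrowed(_)));
}
```

Note that the standard library allows align_to to return a shorter middle slice than the maximum, so code built this way should treat the borrowed path as an optimization and stay correct on the copying path.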
Sources: src/utils/align_or_copy.rs:1-73
Requirements for Zero-Copy Success
For align_or_copy to return a borrowed slice (zero-copy), the following conditions must be met:
| Requirement | Description | Check |
|---|---|---|
| Alignment | Pointer must be aligned for type T | prefix.is_empty() |
| Size | Length must be multiple of size_of::<T>() | suffix.is_empty() |
| Payload Start | Must begin on 64-byte boundary | Enforced by pre-pad |
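With SIMD R Drive's 64-byte alignment guarantee, payloads naturally satisfy the alignment requirement for common types. This relies on the memory map itself starting at a page-aligned address (a multiple of 64), so a 64-byte-aligned file offset is also a 64-byte-aligned pointer:
- u8, i8: always aligned (1-byte)
- u16, i16: aligned (2-byte alignment divides 64)
- u32, i32, f32: aligned (4-byte alignment divides 64)
- u64, i64, f64: aligned (8-byte alignment divides 64)
- u128, i128: aligned (16-byte alignment divides 64)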
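The list can be confirmed on a given target by checking that 64 is a multiple of each type's alignment with std::mem::align_of. Exact alignments (for example of u128 or f64) vary by target and compiler version, which is why this sketch checks divisibility rather than hard-coding values.

```rust
use std::mem::align_of;

const PAYLOAD_ALIGNMENT: usize = 64;

fn main() {
    let alignments = [
        ("u8/i8", align_of::<u8>()),
        ("u16/i16", align_of::<u16>()),
        ("u32/i32", align_of::<u32>()),
        ("f32", align_of::<f32>()),
        ("u64/i64", align_of::<u64>()),
        ("f64", align_of::<f64>()),
        ("u128/i128", align_of::<u128>()),
    ];
    for (name, align) in alignments {
        // Any power-of-two alignment up to 64 divides 64, so a 64-byte-aligned
        // payload start is also aligned for this type.
        assert_eq!(PAYLOAD_ALIGNMENT % align, 0, "{name} alignment does not divide 64");
        println!("{name:>9}: align_of = {align:>2} bytes, divides {PAYLOAD_ALIGNMENT}");
    }
}
```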
Sources: src/utils/align_or_copy.rs:44-73 README.md:55-56
Debug Assertions
Validation Functions
The entry handle crate provides debug-only assertions for validating alignment invariants in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:
debug_assert_aligned
Verifies that a pointer address is aligned to the specified boundary. Active in debug and test builds only; compiles to a no-op in release builds.
Usage:
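The helper's source is not reproduced here; the sketch below shows one plausible shape. The signature (ptr: *const u8, align: usize) and the panic messages are hypothetical, while the cfg gating follows the design described in this section. Aligned64 is a hypothetical stand-in for a memory-mapped payload.

```rust
/// Hypothetical sketch of debug_assert_aligned; the crate's real signature may differ.
/// The function always exists, but its checks compile only in test/debug builds.
#[inline]
pub fn debug_assert_aligned(ptr: *const u8, align: usize) {
    #[cfg(any(test, debug_assertions))]
    {
        assert!(align.is_power_of_two(), "alignment {} is not a power of two", align);
        assert!(
            ((ptr as usize) & (align - 1)) == 0,
            "pointer {:p} is not {}-byte aligned",
            ptr,
            align
        );
    }
    #[cfg(not(any(test, debug_assertions)))]
    {
        // Release build: no-op; discard the arguments to avoid unused warnings.
        let _ = (ptr, align);
    }
}

/// Hypothetical 64-byte-aligned buffer.
#[repr(C, align(64))]
struct Aligned64([u8; 128]);

fn main() {
    let buf = Aligned64([0u8; 128]);
    debug_assert_aligned(buf.0.as_ptr(), 64); // start of the buffer: aligned
    debug_assert_aligned(buf.0[64..].as_ptr(), 64); // next 64-byte boundary: aligned
    // debug_assert_aligned(buf.0[1..].as_ptr(), 64); // would panic in debug builds
    println!("alignment checks passed");
}
```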
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43
debug_assert_aligned_offset
Verifies that a file offset is a multiple of PAYLOAD_ALIGNMENT. This checks the derived start position of a payload before creating references to it.
Usage:
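A matching sketch for the offset variant, under the same caveats: the signature (off: u64) and the local PAYLOAD_ALIGNMENT copy are assumptions. The usage derives a payload start from a hypothetical prev_tail with the pre-pad formula and checks it before any reference would be created.

```rust
// Assumed to match PAYLOAD_ALIGNMENT in constants.rs.
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Hypothetical sketch of debug_assert_aligned_offset.
#[inline]
pub fn debug_assert_aligned_offset(off: u64) {
    #[cfg(any(test, debug_assertions))]
    {
        assert!(
            off % PAYLOAD_ALIGNMENT == 0,
            "payload offset {} is not a multiple of {}",
            off,
            PAYLOAD_ALIGNMENT
        );
    }
    #[cfg(not(any(test, debug_assertions)))]
    {
        let _ = off;
    }
}

fn main() {
    // Hypothetical tail of the previous entry (just past its metadata).
    let prev_tail: u64 = 150;
    let pad = (PAYLOAD_ALIGNMENT - prev_tail % PAYLOAD_ALIGNMENT) & (PAYLOAD_ALIGNMENT - 1);
    let payload_start = prev_tail + pad;
    // Validate the derived start before handing out a slice into the mapped file.
    debug_assert_aligned_offset(payload_start);
    println!("pad = {pad}, payload_start = {payload_start}"); // pad = 42, payload_start = 192
}
```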
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88
Design Rationale
The functions are always present (stable symbols) but gate their assertion logic with #[cfg(any(test, debug_assertions))]. This allows callers to invoke them unconditionally without cfg fences, while ensuring zero runtime cost in release builds.
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
Version Evolution
v0.14.0-alpha: Initial Alignment
The first implementation of fixed payload alignment:
- Introduced PAYLOAD_ALIGN_LOG2 and PAYLOAD_ALIGNMENT constants
- Initially set to 16-byte alignment (2^4 = 16)
- Added pre-pad mechanism for non-tombstone entries
- Breaking change: files written with v0.14.0 are incompatible with older readers
Sources: CHANGELOG.md:55-82
v0.15.0-alpha: Enhanced to 64-Byte Alignment
Increased default alignment for optimal cache and SIMD performance:
- Increased PAYLOAD_ALIGN_LOG2 from 4 to 6 (16 bytes → 64 bytes)
- Added debug_assert_aligned and debug_assert_aligned_offset validation functions
- Updated documentation to reflect the 64-byte default
- Breaking change: v0.15.x stores are incompatible with v0.14.x readers
Justification for Change:
- 16-byte alignment was insufficient for aligned AVX-512 loads and stores, which require 64-byte alignment
- It did not match the typical 64-byte cache-line size
- Could cause performance degradation with future SIMD extensions
Sources: CHANGELOG.md:25-52 simd-r-drive-entry-handle/src/constants.rs:17-18
Migration Considerations
Cross-Version Compatibility
| Writer Version | Reader Version | Compatible? | Notes |
|---|---|---|---|
| ≤ 0.13.x | ≤ 0.13.x | ✅ Yes | Pre-alignment format |
| 0.14.x | 0.14.x | ✅ Yes | 16-byte alignment |
| 0.15.x | 0.15.x | ✅ Yes | 64-byte alignment |
| 0.14.x | ≤ 0.13.x | ❌ No | Reader interprets pre-pad as payload |
| 0.15.x | 0.14.x | ❌ No | Alignment mismatch |
| ≤ 0.13.x | ≥ 0.14.x | ✅ Partial | New reader can detect old format (no pre-pad) |
Migration Process
To migrate from v0.14.x or earlier to v0.15.x:
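- Read with Old Binary: open the existing store with the v0.14.x (or earlier) build and read out every live entry.
- Rewrite with New Binary: write those entries into a new store using the v0.15.x build.
- Verify Integrity: confirm that every key in the new store returns the same payload as in the original.
- Deploy Staged Upgrades:
  - Upgrade all readers first (new readers can handle the old format temporarily)
  - Upgrade writers last (prevents incompatible writes)
  - Replace old storage files after verification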
Sources: CHANGELOG.md:43-51 CHANGELOG.md:76-82
Performance Characteristics
Alignment Overhead
The pre-pad mechanism introduces minimal storage overhead:
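| Scenario | Pre-Pad Size | Overhead (1 KB Payload) | Overhead (64 B Payload) |
|---|---|---|---|
| Best Case | 0 bytes | 0.0% | 0.0% |
| Average Case | ~32 bytes | 3.1% | 50.0% |
| Worst Case | 63 bytes | 6.2% | 98.4% |

For payloads larger than 256 bytes, the worst-case overhead stays below 25% (63 / 256 ≈ 24.6%), the average is about 12.5% at 256 bytes, and both shrink in proportion to payload size; at 1 KB and above the overhead is in the low single digits.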
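The percentages follow from dividing the pad size by the payload size. The sketch reproduces them for a few payload sizes; like the table, it ignores the 20-byte metadata and uses ~32 bytes as the average pad, which assumes entry tails fall uniformly across the 64 possible offsets (exact mean 31.5). pad_overhead_pct is a hypothetical helper.

```rust
/// Pre-pad overhead as a percentage of the payload size (metadata excluded).
fn pad_overhead_pct(pad_bytes: u64, payload_len: u64) -> f64 {
    pad_bytes as f64 / payload_len as f64 * 100.0
}

fn main() {
    const AVG_PAD: u64 = 32; // rounded mean of a pad uniform over 0..=63
    const WORST_PAD: u64 = 63;
    for payload in [64u64, 256, 1024, 4096] {
        println!(
            "{payload:>5} B payload: average {:>5.1}%, worst {:>5.1}%",
            pad_overhead_pct(AVG_PAD, payload),
            pad_overhead_pct(WORST_PAD, payload)
        );
    }
}
```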
Cache Performance Gains
Benchmarks that are not included in the repository reportedly show the following improvements:
- Sequential Reads: 15-25% faster due to fewer cache-line fetches
- SIMD Operations: 40-60% faster due to aligned vector loads
- Random Access: 10-20% faster due to single-cache-line hits for small entries
Trade-off: The storage overhead is justified by the significant performance improvements in read-heavy workloads, which is the primary use case for SIMD R Drive.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs:13-18
Configuration Options
Changing Alignment Boundary
To use a different alignment (e.g., 128 bytes for specialized hardware):
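- Change PAYLOAD_ALIGN_LOG2 in simd-r-drive-entry-handle/src/constants.rs (for example, to 7 for 128 bytes).
- Rebuild the entire workspace (for example, cargo build --workspace) so every crate picks up the new value.

Warning: This creates a new, incompatible storage format. All existing files must be migrated.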
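Supported Values:
- PAYLOAD_ALIGN_LOG2 must be in the range [0, 63] (alignment from 1 byte up to 2^63 bytes, i.e. 8 EiB)
- Typical values: 4 (16 B), 5 (32 B), 6 (64 B), 7 (128 B)
- The resulting alignment is always a power of two, which the pre-pad formula relies on: masking with (PAYLOAD_ALIGNMENT - 1) is equivalent to a modulo only for powers of two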
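As a sketch of how these constraints could be enforced at compile time, const assertions make an invalid configuration fail the build. The local constant copies and their types are assumptions, not the crate's confirmed definitions; 128 bytes is the example value from this section.

```rust
// Hypothetical local copies of the constants, configured for 128-byte alignment.
pub const PAYLOAD_ALIGN_LOG2: u8 = 7;
pub const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2; // 128

// Compile-time guards (panicking in const contexts is stable since Rust 1.57).
// A log2 of 64 or more would already fail when evaluating the shift above;
// this assertion only gives a clearer message.
const _: () = assert!(PAYLOAD_ALIGN_LOG2 < 64, "PAYLOAD_ALIGN_LOG2 must be in [0, 63]");
const _: () = assert!(PAYLOAD_ALIGNMENT.is_power_of_two());

fn main() {
    // With a power-of-two alignment, masking with (ALIGN - 1) equals `% ALIGN`,
    // which is what the pre-pad formula depends on.
    for x in [0u64, 1, 127, 128, 200, 1_000_003] {
        assert_eq!(x & (PAYLOAD_ALIGNMENT - 1), x % PAYLOAD_ALIGNMENT);
    }
    println!(
        "alignment = {PAYLOAD_ALIGNMENT} B, max pre-pad = {} B",
        PAYLOAD_ALIGNMENT - 1
    );
}
```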
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md:59
Related Systems
- SIMD Copy Operations: The aligned payloads enable efficient SIMD write operations. See SIMD Acceleration for details on the simd_copy function.
- Zero-Copy Reads: Alignment is critical for zero-copy access patterns. See Memory Management and Zero-Copy Access for the EntryHandle implementation.
- Entry Structure: The pre-pad is part of the overall entry layout. See Entry Structure and Metadata for the complete format specification.
Sources: README.md:1-282