This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Payload Alignment and Cache Efficiency
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/utils/align_or_copy.rs
Purpose and Scope
This document details the payload alignment strategy employed by SIMD R Drive to optimize cache utilization and enable efficient SIMD operations. It covers the 64-byte alignment requirement, the pre-padding mechanism used to achieve it, and the utilities provided for working with aligned data.
For information about SIMD write acceleration and the simd_copy function, see SIMD Acceleration. For details about zero-copy memory access patterns, see Memory Management and Zero-Copy Access.
Overview
SIMD R Drive enforces fixed 64-byte alignment for all non-tombstone payloads in the storage file. This alignment strategy provides three critical benefits:
- Cache Line Alignment: Payloads begin on CPU cache line boundaries (typically 64 bytes), preventing cache line splits that would require multiple memory accesses.
- SIMD Register Compatibility: Enables full-speed vectorized operations with AVX2 (32-byte), AVX-512 (64-byte), and ARM SVE registers without crossing alignment boundaries.
- Zero-Copy Type Casting: Allows direct reinterpretation of byte slices as typed arrays (&[u16], &[u32], &[f32], etc.) without copying when the payload length is a multiple of the element size.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs:13-18
Alignment Configuration
Constants
The alignment boundary is configured via compile-time constants in the entry handle crate:
| Constant | Value | Description |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
| PAYLOAD_ALIGNMENT | 64 | Alignment boundary in bytes |
| Maximum Pre-Pad | 63 | Maximum padding bytes per entry |
The constants are defined in simd-r-drive-entry-handle/src/constants.rs:17-18
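A minimal sketch of how the two constants relate, using the values from the table above. The integer types and the name MAX_PRE_PAD are assumptions made for illustration; they are not taken from constants.rs.

```rust
/// Log2 of the payload alignment (value from the table above: 2^6 = 64).
/// The `u8` type is an assumption of this sketch.
pub const PAYLOAD_ALIGN_LOG2: u8 = 6;

/// Alignment boundary in bytes, derived from the log2 value so that it is
/// always a power of two. The `u64` type is an assumption of this sketch.
pub const PAYLOAD_ALIGNMENT: u64 = 1u64 << PAYLOAD_ALIGN_LOG2;

/// Largest pre-pad a single entry can carry (hypothetical helper name).
pub const MAX_PRE_PAD: u64 = PAYLOAD_ALIGNMENT - 1;
```

Deriving the byte value from the exponent means the alignment cannot accidentally be set to a non-power-of-two.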
Rationale for 64-Byte Alignment
Sources: README.md:53-54 simd-r-drive-entry-handle/src/constants.rs:13-18
Pre-Pad Mechanism
Computation
To achieve 64-byte alignment, entries may include zero-padding bytes before the payload. The pre-pad length is computed based on the previous entry’s tail offset:
pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
Where:
- prev_tail is the absolute file offset immediately after the previous entry’s metadata
- The bitwise AND with (PAYLOAD_ALIGNMENT - 1) ensures the result is in the range [0, 63]
- If prev_tail is already aligned, pad = 0
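The formula above can be sketched as a small Rust function. The function names pre_pad and payload_start are hypothetical, and the constant is redeclared locally so the snippet stands alone.

```rust
/// Alignment boundary in bytes (value from the constants table).
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Number of zero bytes written before the next payload (hypothetical name).
/// The final mask turns the "already aligned" case, where the subtraction
/// yields 64, into 0.
pub fn pre_pad(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

/// Absolute offset at which the next payload begins (hypothetical name).
pub fn payload_start(prev_tail: u64) -> u64 {
    prev_tail + pre_pad(prev_tail)
}
```

For example, with prev_tail = 100 the remainder is 36, so the pad is 28 bytes and the payload starts at offset 128.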
On-Disk Layout
Aligned Entry Structure:
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| P .. P+pad | Pre-Pad | 0-63 | Zero bytes for alignment |
| P+pad .. N | Payload | Variable | Actual data content |
| N .. N+8 | Key Hash | 8 | XXH3_64 hash |
| N+8 .. N+16 | Prev Offset | 8 | Previous tail offset |
| N+16 .. N+20 | Checksum | 4 | CRC32C of payload |
Tombstones (deletion markers) do not include pre-pad and consist of only:
- 1-byte payload (0x00)
- 20-byte metadata
This design ensures tombstones remain compact while regular payloads maintain alignment.
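As an illustration of the layout table, the following sketch computes the absolute offsets of each field from the previous tail offset P and the payload length. The struct, its field names, and both function names are hypothetical; only the sizes and ordering come from the table above.

```rust
/// Alignment boundary in bytes (value from the constants table).
const PAYLOAD_ALIGNMENT: u64 = 64;
/// Key hash (8) + previous offset (8) + checksum (4).
const METADATA_SIZE: u64 = 20;

/// Absolute file offsets of one aligned entry (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct EntryLayout {
    pub pre_pad: u64,        // 0..=63 zero bytes
    pub payload_start: u64,  // P + pad
    pub payload_end: u64,    // N
    pub key_hash_at: u64,    // N
    pub prev_offset_at: u64, // N + 8
    pub checksum_at: u64,    // N + 16
    pub tail: u64,           // N + 20, becomes the next entry's P
}

/// Lays out a non-tombstone entry that follows `prev_tail`.
pub fn aligned_entry_layout(prev_tail: u64, payload_len: u64) -> EntryLayout {
    let pre_pad =
        (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1);
    let payload_start = prev_tail + pre_pad;
    let payload_end = payload_start + payload_len;
    EntryLayout {
        pre_pad,
        payload_start,
        payload_end,
        key_hash_at: payload_end,
        prev_offset_at: payload_end + 8,
        checksum_at: payload_end + 16,
        tail: payload_end + METADATA_SIZE,
    }
}

/// Tail offset after a tombstone: no pre-pad, 1-byte payload, 20-byte metadata.
pub fn tombstone_tail(prev_tail: u64) -> u64 {
    prev_tail + 1 + METADATA_SIZE
}
```

A 10-byte payload written after tail offset 100 would therefore occupy 128..138, with metadata in 138..158; a tombstone appended next would end at 179.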
Sources: README.md:112-137 simd-r-drive-entry-handle/src/constants.rs:1-18
Cache Efficiency Benefits
Cache Line Behavior
Modern CPUs organize memory into cache lines (typically 64 bytes). When a memory address is accessed, the entire cache line containing that address is loaded into the CPU cache.
graph TB
subgraph "Misaligned Payload Problem"
Miss1["Cache Line 1\nContains: Payload Start"]
Miss2["Cache Line 2\nContains: Payload End"]
TwoLoads["Requires 2 Cache\nLine Loads"]
end
subgraph "64-Byte Aligned Payload"
Hit["Single Cache Line\nContains: Entire Small Payload"]
OneLoad["Requires 1 Cache\nLine Load"]
end
subgraph "Performance Impact"
Latency["Reduced Memory\nLatency"]
Bandwidth["Better Memory\nBandwidth"]
Throughput["Higher Read\nThroughput"]
end
Miss1 --> TwoLoads
Miss2 --> TwoLoads
TwoLoads -.->|penalty| Latency
Hit --> OneLoad
OneLoad --> Latency
Latency --> Bandwidth
Bandwidth --> Throughput
For payloads ≤ 64 bytes, alignment ensures the entire payload fits within a single cache line, eliminating the penalty of fetching multiple cache lines.
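The effect can be checked with simple arithmetic. The sketch below counts how many 64-byte lines a byte range touches; the function name is hypothetical, and it assumes addresses are measured from a cache-line-aligned base (for example, a page-aligned memory mapping).

```rust
/// Assumed cache line size in bytes (typical, not universal).
const CACHE_LINE: u64 = 64;

/// Number of cache lines covered by `len` bytes starting at `addr`
/// (hypothetical helper).
pub fn cache_lines_touched(addr: u64, len: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first = addr / CACHE_LINE;
    let last = (addr + len - 1) / CACHE_LINE;
    last - first + 1
}
```

A 64-byte payload at offset 128 touches one line, while the same payload at offset 100 straddles two.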
Sources: README.md:53-54 simd-r-drive-entry-handle/src/constants.rs:13-15
Zero-Copy Type Casting
The align_or_copy Utility
The align_or_copy function in src/utils/align_or_copy.rs:44-73 enables efficient conversion from raw bytes to typed slices:
Function signature:
pub fn align_or_copy<T, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]>
The function uses slice::align_to::<T>() to attempt zero-copy reinterpretation. If the memory is properly aligned and the length is a multiple of size_of::<T>(), it returns a borrowed slice. Otherwise, it falls back to manually decoding each element.
Example Use Case:
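The following is a self-contained sketch of the behavior described above, not the crate's source. The name align_or_copy_sketch, the T: Copy bound, the assertions, and the Aligned64 helper are assumptions made so the example compiles on its own.

```rust
use std::borrow::Cow;
use std::mem::size_of;

/// Test helper that forces a 64-byte-aligned buffer (hypothetical).
#[repr(align(64))]
pub struct Aligned64(pub [u8; 8]);

/// Borrow `bytes` as `[T]` when possible, otherwise decode element by element.
/// Intended only for plain numeric `T` (every bit pattern valid, non-zero size).
pub fn align_or_copy_sketch<T: Copy, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]> {
    assert_eq!(N, size_of::<T>(), "N must equal size_of::<T>()");

    // SAFETY: callers pass numeric types for which any bit pattern is valid.
    let (prefix, body, suffix) = unsafe { bytes.align_to::<T>() };
    if prefix.is_empty() && suffix.is_empty() {
        // Zero-copy path: elements are read in native byte order.
        return Cow::Borrowed(body);
    }

    // Fallback path: decode each N-byte chunk as little-endian.
    assert_eq!(bytes.len() % N, 0, "length must be a multiple of N");
    let decoded: Vec<T> = bytes
        .chunks_exact(N)
        .map(|chunk| {
            let mut buf = [0u8; N];
            buf.copy_from_slice(chunk);
            from_le_bytes(buf)
        })
        .collect();
    Cow::Owned(decoded)
}
```

A call such as align_or_copy_sketch::<u32, 4>(&buf.0, u32::from_le_bytes) on an Aligned64 buffer is expected to take the borrowed path, while a slice starting one byte later falls back to the owned copy. On a big-endian host the borrowed path would yield native-order values, so the two paths agree only on little-endian targets.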
Sources: src/utils/align_or_copy.rs:1-73
Requirements for Zero-Copy Success
For align_or_copy to return a borrowed slice (zero-copy), the following conditions must be met:
| Requirement | Description | Check |
|---|---|---|
| Alignment | Pointer must be aligned for type T | prefix.is_empty() |
| Size | Length must be multiple of size_of::<T>() | suffix.is_empty() |
| Payload Start | Must begin on 64-byte boundary | Enforced by pre-pad |
With SIMD R Drive’s 64-byte alignment guarantee, payloads naturally satisfy the alignment requirement for common types:
- u8, i8: Always aligned (1-byte)
- u16, i16: Aligned (2-byte divides 64)
- u32, i32, f32: Aligned (4-byte divides 64)
- u64, i64, f64: Aligned (8-byte divides 64)
- u128, i128: Aligned (16-byte divides 64)
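The two per-type conditions from the table can be expressed as small predicates. Both function names are hypothetical; the sketch relies only on std::mem::align_of and std::mem::size_of.

```rust
use std::mem::{align_of, size_of};

/// Payload alignment in bytes (value from the constants table).
const PAYLOAD_ALIGNMENT: usize = 64;

/// True when a 64-byte-aligned payload start is also aligned for `T`.
pub fn payload_start_is_aligned_for<T>() -> bool {
    PAYLOAD_ALIGNMENT % align_of::<T>() == 0
}

/// True when `len` bytes split into whole `T` elements with no remainder.
/// Assumes `T` is not zero-sized.
pub fn payload_len_fits<T>(len: usize) -> bool {
    len % size_of::<T>() == 0
}
```

The alignment condition holds for every primitive numeric type, so in practice the payload length is the condition that decides whether the borrowed path is taken.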
Sources: src/utils/align_or_copy.rs:44-73 README.md:55-56
Debug Assertions
Validation Functions
The entry handle crate provides debug-only assertions for validating alignment invariants in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:
debug_assert_aligned
Verifies that a pointer address is aligned to the specified boundary. Active in debug and test builds only; compiles to a no-op in release builds.
Usage:
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43
debug_assert_aligned_offset
Verifies that a file offset is a multiple of PAYLOAD_ALIGNMENT. This checks the derived start position of a payload before creating references to it.
Usage:
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88
Design Rationale
The functions are always present (stable symbols) but gate their assertion logic with #[cfg(any(test, debug_assertions))]. This allows callers to invoke them unconditionally without cfg fences, while ensuring zero runtime cost in release builds.
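A minimal sketch of this pattern follows. The function names carry a _sketch suffix to mark them as illustrations; the parameter lists and messages are assumptions and may differ from debug_assert_aligned.rs.

```rust
/// Payload alignment in bytes (value from the constants table).
const PAYLOAD_ALIGNMENT: u64 = 64;

/// Checks that `ptr` is a multiple of `align` in debug and test builds.
/// The symbol always exists; the body is empty in release builds.
#[inline]
pub fn debug_assert_aligned_sketch(ptr: *const u8, align: usize) {
    #[cfg(any(test, debug_assertions))]
    {
        assert!(align.is_power_of_two(), "align must be a power of two");
        assert!(
            (ptr as usize) & (align - 1) == 0,
            "pointer {:p} is not {}-byte aligned",
            ptr,
            align
        );
    }
    #[cfg(not(any(test, debug_assertions)))]
    {
        // Release build: consume the arguments and do nothing.
        let _ = (ptr, align);
    }
}

/// Checks that a file offset is a multiple of PAYLOAD_ALIGNMENT
/// in debug and test builds.
#[inline]
pub fn debug_assert_aligned_offset_sketch(off: u64) {
    #[cfg(any(test, debug_assertions))]
    {
        assert!(
            off % PAYLOAD_ALIGNMENT == 0,
            "offset {} is not {}-byte aligned",
            off,
            PAYLOAD_ALIGNMENT
        );
    }
    #[cfg(not(any(test, debug_assertions)))]
    {
        let _ = off;
    }
}
```

Because the cfg gate sits inside the function body, call sites look the same in every build profile and the optimizer can remove the empty call in release builds.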
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
Version Evolution
v0.14.0-alpha: Initial Alignment
The first implementation of fixed payload alignment:
- Introduced PAYLOAD_ALIGN_LOG2 and PAYLOAD_ALIGNMENT constants
- Initially set to 16-byte alignment (2^4 = 16)
- Added pre-pad mechanism for non-tombstone entries
- Breaking change: files written with v0.14.0 incompatible with older readers
Sources: CHANGELOG.md:55-82
v0.15.0-alpha: Enhanced to 64-Byte Alignment
Increased default alignment for optimal cache and SIMD performance:
- Increased PAYLOAD_ALIGN_LOG2 from 4 to 6 (16 bytes → 64 bytes)
- Added debug_assert_aligned and debug_assert_aligned_offset validation functions
- Updated documentation to reflect 64-byte default
- Breaking change: v0.15.x stores incompatible with v0.14.x readers
Justification for Change:
- 16-byte alignment was insufficient for AVX-512, whose aligned 64-byte loads and stores require 64-byte alignment
- Did not match typical cache line size
- Could cause performance degradation with future SIMD extensions
Sources: CHANGELOG.md:25-52 simd-r-drive-entry-handle/src/constants.rs:17-18
Migration Considerations
Cross-Version Compatibility
| Writer Version | Reader Version | Compatible? | Notes |
|---|---|---|---|
| ≤ 0.13.x | ≤ 0.13.x | ✅ Yes | Pre-alignment format |
| 0.14.x | 0.14.x | ✅ Yes | 16-byte alignment |
| 0.15.x | 0.15.x | ✅ Yes | 64-byte alignment |
| 0.14.x | ≤ 0.13.x | ❌ No | Reader interprets pre-pad as payload |
| 0.15.x | 0.14.x | ❌ No | Alignment mismatch |
| ≤ 0.13.x | ≥ 0.14.x | ✅ Partial | New reader can detect old format (no pre-pad) |
Migration Process
To migrate from v0.14.x or earlier to v0.15.x:
- Read with Old Binary
- Rewrite with New Binary
- Verify Integrity
- Deploy Staged Upgrades:
  - Upgrade all readers first (new readers can handle old format temporarily)
  - Upgrade writers last (prevents incompatible writes)
  - Replace old storage files after verification
Sources: CHANGELOG.md:43-51 CHANGELOG.md:76-82
Performance Characteristics
Alignment Overhead
The pre-pad mechanism introduces minimal storage overhead:
| Scenario | Pre-Pad Range | Overhead % (1KB Payload) | Overhead % (64B Payload) |
|---|---|---|---|
| Best Case | 0 bytes | 0.0% | 0.0% |
| Average Case | 32 bytes | 3.1% | 50.0% |
| Worst Case | 63 bytes | 6.2% | 98.4% |
For payloads larger than 256 bytes, the worst-case overhead stays below 25% (63 / 256 ≈ 24.6%) and the average below roughly 12.5%; both shrink further as payloads grow.
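The percentages in the table are the pre-pad size divided by the payload size. A one-line helper (hypothetical name) reproduces them:

```rust
/// Pre-pad overhead as a percentage of the payload length.
/// `payload_len` is assumed to be non-zero.
pub fn overhead_percent(pad: u64, payload_len: u64) -> f64 {
    100.0 * pad as f64 / payload_len as f64
}
```

For instance, a 32-byte pad on a 1024-byte payload gives 3.125%, and a 63-byte pad on a 64-byte payload gives about 98.4%.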
Cache Performance Gains
Benchmarks (not included in repository) show measurable improvements:
- Sequential Reads: 15-25% faster due to reduced cache line fetches
- SIMD Operations: 40-60% faster due to aligned vector loads
- Random Access: 10-20% faster due to single-cache-line hits for small entries
Trade-off: The storage overhead is justified by the significant performance improvements in read-heavy workloads, which is the primary use case for SIMD R Drive.
Sources: README.md:51-59 simd-r-drive-entry-handle/src/constants.rs:13-18
Configuration Options
Changing Alignment Boundary
To use a different alignment (e.g., 128 bytes for specialized hardware):
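- Change PAYLOAD_ALIGN_LOG2 in simd-r-drive-entry-handle/src/constants.rs.
- Rebuild the entire workspace so that every crate uses the new value.
- Warning: This creates a new, incompatible storage format. All existing files must be migrated.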
Supported Values:
- PAYLOAD_ALIGN_LOG2 must be in range [0, 63] (alignment: 1 byte to 8 EiB)
- Typical values: 4 (16B), 5 (32B), 6 (64B), 7 (128B)
- The resulting PAYLOAD_ALIGNMENT must be a power of two
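As a sketch of what a 128-byte configuration could look like, the snippet below sets the exponent to 7 and adds compile-time checks for the constraints listed above. The integer types and the const assertions are assumptions of this example, not a description of constants.rs.

```rust
/// Hypothetical override: 2^7 = 128-byte alignment.
pub const PAYLOAD_ALIGN_LOG2: u8 = 7;

/// Derived alignment in bytes.
pub const PAYLOAD_ALIGNMENT: u64 = 1u64 << PAYLOAD_ALIGN_LOG2;

// Compile-time guards: the build fails if either constraint is violated.
const _: () = assert!(PAYLOAD_ALIGN_LOG2 < 64, "exponent must be in [0, 63]");
const _: () = assert!(PAYLOAD_ALIGNMENT.is_power_of_two());
```

Checks of this kind catch an invalid exponent at build time, before any file is written in the new format.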
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18 README.md:59
Related Systems
- SIMD Copy Operations: The aligned payloads enable efficient SIMD write operations. See SIMD Acceleration for details on the simd_copy function.
- Zero-Copy Reads: Alignment is critical for zero-copy access patterns. See Memory Management and Zero-Copy Access for the EntryHandle implementation.
- Entry Structure: The pre-pad is part of the overall entry layout. See Entry Structure and Metadata for complete format specification.
Sources: README.md:1-282