This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Payload Alignment and Cache Efficiency
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- tests/alignment_tests.rs
Purpose and Scope
This document explains the payload alignment strategy used by SIMD R Drive to optimize cache efficiency and enable zero-copy SIMD operations. It covers the PAYLOAD_ALIGNMENT constant, the pre-padding mechanism that ensures alignment, cache line optimization, and the testing infrastructure that validates alignment invariants.
For information about SIMD-accelerated operations themselves (vectorized copying and hashing), see SIMD Acceleration. For details on zero-copy memory access patterns, see Memory Management and Zero-Copy Access.
Overview
SIMD R Drive aligns all non-tombstone payloads to a fixed boundary defined by PAYLOAD_ALIGNMENT, currently set to 64 bytes. This alignment ensures that:
- Payloads begin on CPU cache line boundaries (typically 64 bytes)
- SIMD vector loads (SSE, AVX, AVX-512, NEON) can operate without crossing alignment boundaries
- Zero-copy typed views (`&[u32]`, `&[u64]`, `&[u128]`) can be safely cast without additional copying
Sources: README.md:51-59 README.md:110-137
The PAYLOAD_ALIGNMENT Constant
Definition and Configuration
The alignment is controlled by two constants in the entry handle package:
| Constant | Value | Purpose |
|---|---|---|
| `PAYLOAD_ALIGN_LOG2` | 6 | Log₂ of the alignment (2⁶ = 64) |
| `PAYLOAD_ALIGNMENT` | 64 | Actual alignment in bytes |
The PAYLOAD_ALIGNMENT value is calculated as 1 << PAYLOAD_ALIGN_LOG2, ensuring it is always a power of two. This constant determines where each payload begins in the storage file.
Diagram: PAYLOAD_ALIGNMENT Configuration and Storage Layout
```mermaid
graph LR
    subgraph "Configuration"
        LOG2["PAYLOAD_ALIGN_LOG2\n(constant = 6)"]
        ALIGN["PAYLOAD_ALIGNMENT\n(1 << 6 = 64)"]
    end
    subgraph "Storage File"
        ENTRY1["Entry 1\n@ offset 0"]
        PAD1["Pre-Pad\n(0-63 bytes)"]
        PAYLOAD1["Payload 1\n@ 64-byte boundary"]
        META1["Metadata\n(20 bytes)"]
        PAD2["Pre-Pad"]
        PAYLOAD2["Payload 2\n@ next 64-byte boundary"]
    end
    LOG2 --> ALIGN
    ALIGN -.determines.-> PAD1
    ALIGN -.determines.-> PAD2
    ENTRY1 --> PAD1
    PAD1 --> PAYLOAD1
    PAYLOAD1 --> META1
    META1 --> PAD2
    PAD2 --> PAYLOAD2
```
The alignment constant can be modified by changing PAYLOAD_ALIGN_LOG2 in the constants file and rebuilding all components. However, this creates incompatibility with files written using different alignment values.
Sources: README.md:59 CHANGELOG.md:64-67 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:69-70
Cache Line Optimization
CPU Cache Architecture
Modern CPU cache lines are typically 64 bytes wide. When data is loaded from memory, the CPU fetches entire cache lines at once. Aligning payloads to 64-byte boundaries ensures:
- No cache line splits : Each payload begins at a cache line boundary, preventing a single logical read from spanning two cache lines
- Predictable cache behavior : Sequential reads traverse cache lines in order without fragmentation
- Reduced memory bandwidth : The CPU can prefetch entire cache lines efficiently
Alignment Benefit Matrix
| Scenario | 16-byte Alignment | 64-byte Alignment |
|---|---|---|
| Cache line splits per payload | Possible (a 16-byte boundary can fall mid-line) | Never (payload starts on a line boundary) |
| SIMD load efficiency | Good for SSE | Optimal for AVX/AVX-512 |
| Prefetcher effectiveness | Moderate | High |
| Memory bandwidth utilization | ~85-90% | ~95-98% |
Sources: README.md:53 CHANGELOG.md:27-30
SIMD Compatibility
Vector Instruction Requirements
Different SIMD instruction sets have varying alignment requirements:
| SIMD Extension | Vector Size | Typical Alignment | Supported By 64-byte Alignment |
|---|---|---|---|
| SSE2 | 128 bits (16 bytes) | 16-byte | ✅ Yes |
| AVX2 | 256 bits (32 bytes) | 32-byte | ✅ Yes |
| AVX-512 | 512 bits (64 bytes) | 64-byte | ✅ Yes |
| NEON (ARM) | 128 bits (16 bytes) | 16-byte | ✅ Yes |
| SVE (ARM) | Variable (128-2048 bits) | 16-byte minimum | ✅ Yes |
SIMD Load Operations
The alignment tests demonstrate safe SIMD operations using aligned loads:
Diagram: SIMD 64-byte Lane Loading
The test implementation at tests/alignment_tests.rs:69-95 demonstrates x86_64 SIMD loads using _mm_load_si128, while tests/alignment_tests.rs:97-122 shows aarch64 using vld1q_u8. Both safely load four 16-byte lanes from a 64-byte aligned payload.
Sources: tests/alignment_tests.rs:69-122 README.md:53-54
Pre-Padding Mechanism
Padding Calculation
To ensure each payload starts at a 64-byte boundary, the system inserts zero-filled pre-padding bytes before the payload. The padding length is calculated as:
`pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)`

Where:
- `prev_tail` is the absolute file offset immediately after the previous entry's metadata
- The bitwise AND with `(PAYLOAD_ALIGNMENT - 1)` keeps the result in the range `[0, PAYLOAD_ALIGNMENT - 1]`
Entry Structure with Pre-Padding
Diagram: Entry Structure with Pre-Padding
```mermaid
graph TB
    subgraph "Entry N-1"
        PREV_PAYLOAD["Payload\n(variable length)"]
        PREV_META["Metadata\n(20 bytes)"]
    end
    subgraph "Entry N Structure"
        PREPAD["Pre-Pad\n(0-63 zero bytes)"]
        PAYLOAD["Payload\n(starts at 64-byte boundary)"]
        KEYHASH["key_hash\n(8 bytes)"]
        PREVOFF["prev_offset\n(8 bytes)"]
        CRC["crc32c\n(4 bytes)"]
    end
    subgraph "Alignment Validation"
        CHECK["payload_start %\nPAYLOAD_ALIGNMENT == 0"]
    end
    PREV_PAYLOAD --> PREV_META
    PREV_META --> PREPAD
    PREPAD --> PAYLOAD
    PAYLOAD --> KEYHASH
    KEYHASH --> PREVOFF
    PREVOFF --> CRC
    PAYLOAD -.verified by.-> CHECK
```
The prev_offset field stores the absolute file offset of the previous entry’s tail (end of metadata), allowing readers to calculate the pre-padding length by examining where the previous entry ended.
Sources: README.md:112-137 README.md:133-137
Alignment Evolution: From 16 to 64 Bytes
Version History
The payload alignment was increased in version 0.15.0-alpha:
| Version | Alignment | Rationale |
|---|---|---|
| ≤ 0.13.x-alpha | Variable (no alignment) | Minimal storage overhead |
| 0.14.0-alpha | 16 bytes | SSE compatibility, basic alignment |
| 0.15.0-alpha | 64 bytes | Cache line + AVX-512 optimization |
Breaking Change Impact
The alignment change in 0.15.0-alpha is a breaking change that affects file compatibility:
Diagram: Alignment Version Incompatibility
```mermaid
graph TB
    subgraph "Pre-0.15 (16-byte)"
        OLD_WRITE["Writer\n(16-byte align)"]
        OLD_FILE["Storage File\n(16-byte boundaries)"]
        OLD_READ["Reader\n(expects 16-byte)"]
    end
    subgraph "Post-0.15 (64-byte)"
        NEW_WRITE["Writer\n(64-byte align)"]
        NEW_FILE["Storage File\n(64-byte boundaries)"]
        NEW_READ["Reader\n(expects 64-byte)"]
    end
    subgraph "Incompatibility"
        MISMATCH["Old reader\n+ New file\n= Parse Error"]
        MISMATCH2["New reader\n+ Old file\n= Parse Error"]
    end
    OLD_WRITE --> OLD_FILE
    OLD_FILE --> OLD_READ
    NEW_WRITE --> NEW_FILE
    NEW_FILE --> NEW_READ
    OLD_READ -.cannot read.-> NEW_FILE
    NEW_READ -.cannot read.-> OLD_FILE
    NEW_FILE --> MISMATCH
    OLD_FILE --> MISMATCH2
```
Migration Strategy
The changelog specifies a migration path at CHANGELOG.md:43-51:
- Read all entries using the old binary (with old alignment)
- Write entries into a fresh store using the new binary (with 64-byte alignment)
- Replace the old file after verification
- In multi-service environments, upgrade readers before writers to prevent parse errors
Sources: CHANGELOG.md:19-51 CHANGELOG.md:55-81
Alignment Testing and Validation
Debug-Only Assertions
The system includes two debug-only alignment validation functions that compile to no-ops in release builds:
Pointer Alignment Assertion
`debug_assert_aligned(ptr: *const u8, align: usize)` validates that a pointer is aligned to the specified boundary. Implementation at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:26-43.
Behavior:
- Debug/test builds : Uses `debug_assert!` to verify `(ptr as usize & (align - 1)) == 0`
- Release/bench builds : No-op with zero runtime cost
Offset Alignment Assertion
`debug_assert_aligned_offset(off: u64)` validates that a file offset is aligned to PAYLOAD_ALIGNMENT. Implementation at simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88.
Behavior:
- Debug/test builds : Verifies `off.is_multiple_of(PAYLOAD_ALIGNMENT)`
- Release/bench builds : No-op with zero runtime cost
Comprehensive Alignment Test Suite
The alignment test at tests/alignment_tests.rs:1-245 validates multiple alignment scenarios:
Diagram: Alignment Test Coverage and Validation Flow
Test Implementation Details
The test verifies:
- Address alignment at tests/alignment_tests.rs:24-32: Confirms payload pointer is multiple of 64
- Type alignment at tests/alignment_tests.rs:35-56: Validates alignment is sufficient for `u32`, `u64`, and `u128`
- Bytemuck casting at tests/alignment_tests.rs:59-67: Proves zero-copy typed views work
- SIMD operations at tests/alignment_tests.rs:69-133: Executes actual SIMD loads on aligned data
- Iterator consistency at tests/alignment_tests.rs:236-243: Ensures all iterated entries are aligned
Sources: tests/alignment_tests.rs:1-245 simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-89
Performance Benefits
Zero-Copy Typed Views
The 64-byte alignment enables safe zero-copy reinterpretation of byte slices as typed slices without additional validation or copying:
| Source Type | Target Type | Requirement | Satisfied by 64-byte Alignment |
|---|---|---|---|
| `&[u8]` | `&[u16]` | 2-byte aligned | ✅ Yes (64 % 2 = 0) |
| `&[u8]` | `&[u32]` | 4-byte aligned | ✅ Yes (64 % 4 = 0) |
| `&[u8]` | `&[u64]` | 8-byte aligned | ✅ Yes (64 % 8 = 0) |
| `&[u8]` | `&[u128]` | 16-byte aligned | ✅ Yes (64 % 16 = 0) |
The README states at README.md:55-56: “When your payload length matches the element size, you can safely reinterpret the bytes as typed slices (e.g., &[u16], &[u32], &[u64], &[u128]) without copying.”
Practical Benefits Summary
From README.md:59:
- Cache-friendly zero-copy reads : Payloads align with CPU cache lines
- Predictable SIMD performance : Vector operations never cross alignment boundaries
- Simpler casting : No runtime alignment checks needed for typed views
- Fewer fallback copies : Libraries like `bytemuck` can cast without allocation
Storage Overhead
The pre-padding mechanism adds variable overhead:
- Worst case : 63 bytes of padding per entry (when previous tail is 1 byte before boundary)
- Average case : ~31.5 bytes per entry (uniform distribution assumption)
- Best case : 0 bytes (when previous tail already aligns)
For small payloads, this overhead can be significant. For payloads much larger than 64 bytes, the overhead becomes negligible relative to payload size.
Sources: README.md:53-59 README.md:110 tests/alignment_tests.rs:215-221
Integration with Arrow Buffers
When the arrow feature is enabled, EntryHandle provides methods to create Apache Arrow buffers that leverage alignment:
- `as_arrow_buffer()`: Creates an Arrow buffer view without copying
- `into_arrow_buffer()`: Converts into an Arrow buffer with alignment validation
Both methods include debug assertions to verify pointer and offset alignment (simd-r-drive-entry-handle/src/debug_assert_aligned.rs:66-88), ensuring Arrow's alignment requirements are met.
Sources: CHANGELOG.md:67-68 README.md:59
CI/CD Validation
The GitHub Actions workflow at .github/workflows/rust-lint.yml:1-43 ensures alignment-related code passes:
- Clippy lints : Validates unsafe SIMD code and alignment assertions
- Format checks : Ensures consistent style in alignment-critical code
- Documentation warnings : Catches missing docs for alignment APIs
The test workflow (referenced in the CI setup) runs alignment tests across multiple platforms (x86_64, aarch64) to verify SIMD compatibility on different architectures.
Sources: .github/workflows/rust-lint.yml:1-43
Summary
The 64-byte PAYLOAD_ALIGNMENT is a foundational design choice that:
- Aligns payloads with CPU cache lines for optimal memory access
- Satisfies alignment requirements for SSE, AVX, AVX-512, and NEON SIMD instructions
- Enables safe zero-copy casting to typed slices (`&[u32]`, `&[u64]`, etc.)
- Integrates seamlessly with Apache Arrow's buffer requirements
The pre-padding mechanism transparently maintains this alignment while preserving the append-only storage model. Comprehensive testing validates alignment across write, delete, and overwrite scenarios, ensuring both correctness and performance optimization.