Entry Structure and Metadata
Relevant source files
- .github/workflows/rust-lint.yml
- CHANGELOG.md
- README.md
- simd-r-drive-entry-handle/Cargo.toml
- simd-r-drive-entry-handle/src/debug_assert_aligned.rs
- simd-r-drive-entry-handle/src/entry_metadata.rs
Purpose and Scope
This document details the on-disk binary layout of entries in the SIMD R Drive storage engine. It covers the structure of aligned entries, tombstones, metadata fields, and the alignment strategy that enables zero-copy access.
For information about how entries are read and accessed in memory, see Memory Management and Zero-Copy Access. For details on the validation chain and recovery mechanisms, see Storage Architecture.
On-Disk Entry Layout Overview
Every entry written to the storage file consists of three components:
- Pre-Pad Bytes (optional, 0-63 bytes) - Zero bytes inserted to ensure the payload starts at a 64-byte boundary
- Payload - Variable-length binary data
- Metadata - Fixed 20-byte structure containing key hash, previous offset, and checksum
The exception is tombstones (deletion markers), which use a minimal 1-byte payload with no pre-padding.
Sources: README.md:104-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37
Aligned Entry Structure
Entry Layout Table
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| P .. P+pad | Pre-Pad (optional) | pad | Zero bytes to align payload start |
| P+pad .. N | Payload | N-(P+pad) | Variable-length data |
| N .. N+8 | Key Hash | 8 | 64-bit XXH3 key hash |
| N+8 .. N+16 | Prev Offset | 8 | Absolute offset of previous tail |
| N+16 .. N+20 | Checksum | 4 | CRC32C of payload |
Where:
- pad = (A - (prev_tail % A)) & (A - 1), with A = PAYLOAD_ALIGNMENT (64 bytes)
- The next entry starts at offset N + 20
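The offsets in the table can be derived from the previous tail and the payload length. The following is a minimal sketch of that arithmetic, not the crate's code: the EntryLayout type and the aligned_entry_layout helper are hypothetical names introduced here for illustration, and only the constants and the pad formula come from the text above.

```rust
/// Payload alignment in bytes (A in the formula above).
pub const PAYLOAD_ALIGNMENT: u64 = 64;
/// Size of the metadata block that follows every payload.
pub const METADATA_SIZE: u64 = 20;

/// Offsets of one aligned (non-tombstone) entry.
/// Hypothetical helper type, not part of the documented API.
#[derive(Debug, PartialEq, Eq)]
pub struct EntryLayout {
    /// P + pad: first payload byte, always a multiple of 64.
    pub payload_start: u64,
    /// N: first metadata byte, immediately after the payload.
    pub metadata_start: u64,
    /// N + 20: where the next entry (or its pre-pad) begins.
    pub next_tail: u64,
}

/// pad = (A - (prev_tail % A)) & (A - 1)
pub fn pre_pad(prev_tail: u64) -> u64 {
    let a = PAYLOAD_ALIGNMENT;
    (a - (prev_tail % a)) & (a - 1)
}

/// Computes where each part of an aligned entry lands when it is appended
/// after `prev_tail` (P in the table).
pub fn aligned_entry_layout(prev_tail: u64, payload_len: u64) -> EntryLayout {
    let payload_start = prev_tail + pre_pad(prev_tail);
    let metadata_start = payload_start + payload_len;
    EntryLayout {
        payload_start,
        metadata_start,
        next_tail: metadata_start + METADATA_SIZE,
    }
}
```

For example, appending a 10-byte payload after a tail at offset 120 needs 8 pad bytes, so the payload occupies 128 .. 138, the metadata 138 .. 158, and the next entry starts at 158.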
Aligned Entry Structure Diagram
Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:11-23
Tombstone Structure
Tombstones are special deletion markers that do not require payload alignment. They consist of a single zero byte followed by the standard 20-byte metadata structure.
Tombstone Layout Table
| Offset Range | Field | Size (Bytes) | Description |
|---|---|---|---|
| T .. T+1 | Payload | 1 | Single byte 0x00 |
| T+1 .. T+21 | Metadata | 20 | Key hash, prev, crc32c |
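As a rough illustration of this layout, the sketch below assembles the 21 bytes of a tombstone from an already-serialized metadata block. The function name tombstone_record and the TOMBSTONE_PAYLOAD constant are hypothetical and chosen only for this example.

```rust
/// Size of the metadata block that follows every payload.
pub const METADATA_SIZE: usize = 20;
/// The single zero byte that serves as a tombstone's payload.
/// Hypothetical constant name, used only in this sketch.
pub const TOMBSTONE_PAYLOAD: [u8; 1] = [0x00];

/// Builds the on-disk bytes of a tombstone: one 0x00 byte at T .. T+1,
/// then the 20-byte metadata at T+1 .. T+21. No pre-pad is emitted.
pub fn tombstone_record(metadata: &[u8; METADATA_SIZE]) -> [u8; 1 + METADATA_SIZE] {
    let mut record = [0u8; 1 + METADATA_SIZE];
    record[..1].copy_from_slice(&TOMBSTONE_PAYLOAD);
    record[1..].copy_from_slice(metadata);
    record
}
```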
Tombstone Structure Diagram
Sources: README.md:126-131 simd-r-drive-entry-handle/src/entry_metadata.rs:25-30
EntryMetadata Structure
The EntryMetadata struct represents the fixed 20-byte metadata block that follows every payload. It is declared with a #[repr(C)] layout to ensure a consistent binary representation.
graph TB
subgraph EntryMetadataStruct["EntryMetadata struct"]
field1["key_hash: u64\n8 bytes\nXXH3_64 hash"]
field2["prev_offset: u64\n8 bytes\nbackward chain link"]
field3["checksum: [u8; 4]\n4 bytes\nCRC32C payload checksum"]
end
field1 --> field2
field2 --> field3
note4["Serialized at offset N\nfollowing payload"]
note5["Total: METADATA_SIZE = 20"]
field1 -.-> note4
field3 -.-> note5
Metadata Fields
Field Descriptions
key_hash: u64 (8 bytes, offset N .. N+8)
- 64-bit XXH3 hash of the key
- Used by KeyIndexer for O(1) lookups
- Combined with a tag for collision detection
- Hardware-accelerated via SSE2/AVX2/NEON
prev_offset: u64 (8 bytes, offset N+8 .. N+16)
- Absolute file offset of the previous entry for this key
- Forms a backward-linked chain for version history
- Set to 0 for the first entry of a key
- Used during chain validation and recovery
checksum: [u8; 4] (4 bytes, offset N+16 .. N+20)
- CRC32C checksum of the payload
- Provides fast integrity verification
- Not cryptographically secure
- Used during recovery to detect corruption
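To make the checksum field concrete, here is a small bit-at-a-time CRC32C (Castagnoli) routine. It is a plain software sketch for illustration only: production code would normally use a table-driven or hardware-accelerated implementation, and nothing here is taken from the crate's source. Storing the 32-bit value as little-endian bytes in the [u8; 4] field is an assumption of this example.

```rust
/// Reflected form of the Castagnoli polynomial used by CRC32C.
const CRC32C_POLY_REFLECTED: u32 = 0x82F6_3B78;

/// Bitwise CRC32C: initial value and final XOR are both 0xFFFF_FFFF.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    !crc
}

/// Checksum bytes as they might be stored in the metadata block.
/// ASSUMPTION: little-endian byte order; the text above does not specify it.
pub fn payload_checksum(payload: &[u8]) -> [u8; 4] {
    crc32c(payload).to_le_bytes()
}

/// Returns true when `stored` matches the checksum recomputed from `payload`.
pub fn checksum_matches(payload: &[u8], stored: [u8; 4]) -> bool {
    payload_checksum(payload) == stored
}
```

The standard CRC32C check value applies: the ASCII string "123456789" yields 0xE3069283.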
Serialization and Deserialization
The EntryMetadata struct provides methods for converting to/from bytes:
- serialize() -> [u8; 20] - Converts metadata to a byte array using little-endian encoding
- deserialize(data: &[u8]) -> Self - Reconstructs metadata from a byte slice
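The two methods can be pictured with the following re-creation, written from the field list and signatures on this page rather than copied from entry_metadata.rs. The panic on slices shorter than 20 bytes is an assumption of this sketch; the real method may handle short input differently.

```rust
pub const METADATA_SIZE: usize = 20;

/// Illustrative stand-in for the struct described above.
#[repr(C)]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EntryMetadata {
    pub key_hash: u64,
    pub prev_offset: u64,
    pub checksum: [u8; 4],
}

impl EntryMetadata {
    /// Packs the fields into 20 bytes; integers are written little-endian.
    pub fn serialize(&self) -> [u8; METADATA_SIZE] {
        let mut buf = [0u8; METADATA_SIZE];
        buf[0..8].copy_from_slice(&self.key_hash.to_le_bytes());
        buf[8..16].copy_from_slice(&self.prev_offset.to_le_bytes());
        buf[16..20].copy_from_slice(&self.checksum);
        buf
    }

    /// Rebuilds the struct from the first 20 bytes of `data`.
    /// ASSUMPTION: panics if `data` is shorter than METADATA_SIZE.
    pub fn deserialize(data: &[u8]) -> Self {
        let mut key_hash = [0u8; 8];
        let mut prev_offset = [0u8; 8];
        let mut checksum = [0u8; 4];
        key_hash.copy_from_slice(&data[0..8]);
        prev_offset.copy_from_slice(&data[8..16]);
        checksum.copy_from_slice(&data[16..20]);
        Self {
            key_hash: u64::from_le_bytes(key_hash),
            prev_offset: u64::from_le_bytes(prev_offset),
            checksum,
        }
    }
}
```

Writing the integers with to_le_bytes, instead of copying the struct's memory, keeps the file format independent of the host's endianness.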
Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:44-113 README.md:114-120
Pre-Padding and Alignment Strategy
Alignment Purpose
All non-tombstone payloads start at a 64-byte aligned address. This alignment ensures:
- Cache-line efficiency - Matches typical CPU cache line size
- SIMD optimization - Enables full-speed AVX2/AVX-512/NEON operations
- Zero-copy typed views - Allows safe reinterpretation as typed slices (&[u16], &[u32], etc.)
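The typed-view point can be illustrated with a short sketch that reinterprets payload bytes as &[u32] only when the pointer and length allow it. The AlignedBlock type and as_u32_slice function are hypothetical names for this example; they stand in for a 64-byte-aligned payload and do not reflect the crate's actual accessor API.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for a payload that starts on a 64-byte boundary.
#[repr(C, align(64))]
pub struct AlignedBlock(pub [u8; 64]);

/// Views `payload` as u32 words without copying.
/// Returns None if the start address is not 4-byte aligned or the length
/// is not a whole number of words.
pub fn as_u32_slice(payload: &[u8]) -> Option<&[u32]> {
    let aligned = payload.as_ptr() as usize % align_of::<u32>() == 0;
    let whole_words = payload.len() % size_of::<u32>() == 0;
    if !(aligned && whole_words) {
        return None;
    }
    // SAFETY: the pointer is aligned for u32, the length covers exactly
    // len / 4 words inside the borrowed slice, and every bit pattern is a
    // valid u32. The returned slice borrows from `payload`.
    Some(unsafe {
        std::slice::from_raw_parts(
            payload.as_ptr().cast::<u32>(),
            payload.len() / size_of::<u32>(),
        )
    })
}
```

Because a 64-byte boundary is also a 2-, 4-, 8-, 16- and 32-byte boundary, a payload that starts on one passes the alignment check for every primitive integer and float type.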
graph TD
Start["Calculate padding needed"]
GetPrevTail["prev_tail = last written offset"]
CalcPad["pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT))\n& (PAYLOAD_ALIGNMENT - 1)"]
CheckPad{"pad > 0?"}
WritePad["Write pad zero bytes"]
WritePayload["Write payload at aligned offset"]
Start --> GetPrevTail
GetPrevTail --> CalcPad
CalcPad --> CheckPad
CheckPad -->|Yes| WritePad
CheckPad -->|No| WritePayload
WritePad --> WritePayload
The alignment is configured via the PAYLOAD_ALIGNMENT constant (64 bytes as of version 0.15.0).
Pre-Padding Calculation
The formula pad = (A - (prev_tail % A)) & (A - 1), where A = PAYLOAD_ALIGNMENT, ensures:
- If prev_tail is already aligned, pad = 0
- Otherwise, pad equals the bytes needed to reach the next aligned boundary
- Maximum padding is A - 1 bytes (63 bytes for 64-byte alignment)
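The role of the trailing mask is easiest to see in code. A - (prev_tail % A) falls in the range 1..=A, and the & (A - 1) folds the single value A back to 0 for the already-aligned case. The sketch below writes the formula out, together with an equivalent two's-complement form that is valid only because A is a power of two; pre_pad_masked is a hypothetical name added here for comparison.

```rust
/// Payload alignment in bytes; must be a power of two for the mask to work.
pub const PAYLOAD_ALIGNMENT: u64 = 64;

/// The formula from the text: (A - (prev_tail % A)) & (A - 1).
pub fn pre_pad(prev_tail: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

/// Equivalent form: the low bits of -prev_tail are the distance to the
/// next multiple of A. Shown only as a cross-check of the formula.
pub fn pre_pad_masked(prev_tail: u64) -> u64 {
    prev_tail.wrapping_neg() & (PAYLOAD_ALIGNMENT - 1)
}
```

A few spot values: a tail at 0 or 64 needs no padding, a tail at 63 needs 1 byte, a tail at 120 needs 8, and a tail at 1 needs the maximum of 63.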
Constants
The alignment is defined in simd-r-drive-entry-handle/src/constants.rs:1-20:
| Constant | Value | Description |
|---|---|---|
| PAYLOAD_ALIGN_LOG2 | 6 | Log₂ of alignment (2⁶ = 64) |
| PAYLOAD_ALIGNMENT | 64 | Actual alignment boundary in bytes |
| METADATA_SIZE | 20 | Fixed size of metadata block |
Sources: README.md:51-59 simd-r-drive-entry-handle/src/entry_metadata.rs:22-23 CHANGELOG.md:25-51
Backward Chain Formation
Chain Structure
Each entry's prev_offset field creates a backward-linked chain that tracks the version history for a given key. This chain is essential for:
- Recovery and validation on file open
- Detecting incomplete writes
- Rebuilding the index
Chain Properties
- Most recent entry is at the end of the file (highest offset)
- Chain traversal moves backward from tail toward offset 0
- First entry for a key has prev_offset = 0
- Valid chain can be walked all the way back to byte 0 without gaps
- Broken chain indicates corruption or incomplete write
Usage in Recovery
During file open, the system:
- Scans backward from EOF reading metadata
- Follows prev_offset links to validate chain continuity
- Verifies checksums at each step
- Truncates file if corruption is detected
- Scans forward to rebuild the index
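The chain-continuity part of these steps can be sketched as a backward walk over an in-memory copy of the file. This is a simplified illustration under stated assumptions, not the engine's recovery code: it treats prev_offset as the previous tail (as in the entry layout table), reads it as a little-endian u64, and leaves out checksum verification, truncation, and the forward index rebuild. The function names are hypothetical.

```rust
/// Size of the metadata block that ends every entry.
pub const METADATA_SIZE: u64 = 20;

/// Reads prev_offset from the metadata block that ends at `tail`.
/// Returns None if the block does not fit inside `file`.
fn read_prev_offset(file: &[u8], tail: u64) -> Option<u64> {
    if tail > file.len() as u64 {
        return None;
    }
    let meta_start = tail.checked_sub(METADATA_SIZE)?;
    // prev_offset sits at bytes 8..16 of the metadata block.
    let start = (meta_start + 8) as usize; // lossless: bounded by file.len()
    let mut raw = [0u8; 8];
    raw.copy_from_slice(file.get(start..start + 8)?);
    Some(u64::from_le_bytes(raw))
}

/// Walks the chain backward from `tail` and reports whether it reaches
/// offset 0 without leaving the file or pointing past its own metadata.
pub fn chain_is_valid(file: &[u8], mut tail: u64) -> bool {
    while tail != 0 {
        let prev = match read_prev_offset(file, tail) {
            Some(prev) => prev,
            None => return false,
        };
        // The previous tail must lie at or before this entry's metadata,
        // which also guarantees the walk moves strictly backward.
        if prev > tail - METADATA_SIZE {
            return false;
        }
        tail = prev;
    }
    true
}
```

Each iteration moves the tail strictly toward 0, so the loop terminates even on corrupted input.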
Sources: README.md:139-147 simd-r-drive-entry-handle/src/entry_metadata.rs:41-43
Entry Type Comparison
Aligned Entry vs. Tombstone
| Aspect | Aligned Entry (Non-Tombstone) | Tombstone (Deletion Marker) |
|---|---|---|
| Pre-padding | 0-63 bytes (alignment dependent) | None |
| Payload size | Variable (user-defined) | Fixed 1 byte (0x00) |
| Payload alignment | 64-byte boundary | No alignment requirement |
| Metadata size | 20 bytes | 20 bytes |
| Total minimum size | 21 bytes (1-byte payload + metadata) | 21 bytes (1-byte + metadata) |
| Total maximum overhead | 83 bytes (63-byte pad + 20 metadata) | 21 bytes |
| Zero-copy capable | Yes (aligned payload) | No (tombstone flag only) |
When Tombstones Are Used
Tombstones mark key deletions while maintaining chain integrity. They:
- Preserve the backward chain via prev_offset
- Use minimal space (no alignment overhead)
- Are detected during reads and filtered out
- Enable recovery to skip deleted entries
Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37
Metadata Serialization Format
Binary Layout in File
Constants for Range Indexing
The simd-r-drive-entry-handle/src/constants.rs:1-20 file defines range constants for metadata field access:
- KEY_HASH_RANGE = 0..8
- PREV_OFFSET_RANGE = 8..16
- CHECKSUM_RANGE = 16..20
- METADATA_SIZE = 20
These ranges are used in EntryMetadata::serialize() and deserialize() methods.
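As an illustration of how such range constants can be used, the sketch below slices individual fields out of a 20-byte metadata buffer. The constant values come from the list above; declaring them as Range<usize> and the two accessor functions are assumptions made for this example.

```rust
use std::ops::Range;

// Values as listed above. ASSUMPTION: the Range<usize> type is a guess;
// constants.rs may declare them differently.
pub const KEY_HASH_RANGE: Range<usize> = 0..8;
pub const PREV_OFFSET_RANGE: Range<usize> = 8..16;
pub const CHECKSUM_RANGE: Range<usize> = 16..20;
pub const METADATA_SIZE: usize = 20;

/// Borrows the 4 checksum bytes from a metadata buffer (hypothetical helper).
pub fn checksum_field(metadata: &[u8; METADATA_SIZE]) -> &[u8] {
    &metadata[CHECKSUM_RANGE]
}

/// Decodes prev_offset from a metadata buffer as a little-endian u64
/// (hypothetical helper).
pub fn prev_offset_field(metadata: &[u8; METADATA_SIZE]) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&metadata[PREV_OFFSET_RANGE]);
    u64::from_le_bytes(raw)
}
```

Naming the ranges once keeps the writer and the reader in agreement about where each field lives, and the three ranges tile the 20 bytes with no gaps.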
Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:62-112
Alignment Evolution and Migration
Version History
v0.14.0-alpha and earlier: Used 16-byte alignment (PAYLOAD_ALIGNMENT = 16)
v0.15.0-alpha onwards: Changed to 64-byte alignment (PAYLOAD_ALIGNMENT = 64)
This change was made to:
- Ensure full cache-line alignment
- Support AVX-512 and future SIMD extensions
- Improve zero-copy performance across modern hardware
Migration Considerations
Storage files created with different alignment values are not compatible:
- v0.14.x readers cannot correctly parse v0.15.x stores
- v0.15.x readers may misinterpret v0.14.x padding
To migrate between versions:
- Read all entries using the old version binary
- Write entries to a new store using the new version binary
- Replace the old file after verification
In multi-service environments, deploy reader upgrades before writer upgrades to avoid mixed-version issues.
Sources: CHANGELOG.md:25-82 README.md:51-59
Debug Assertions for Alignment
Runtime Validation
The codebase includes debug-only alignment assertions that validate both pointer and offset alignment:
debug_assert_aligned(ptr: *const u8, align: usize) - Validates pointer alignment
- Active in debug and test builds
- Zero cost in release/bench builds
- Ensures buffer base address is properly aligned
debug_assert_aligned_offset(off: u64) - Validates file offset alignment
- Checks that derived payload start offset is at PAYLOAD_ALIGNMENT boundary
- Used during entry handle creation
- Defined in simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88
These assertions help catch alignment issues during development without imposing runtime overhead in production.
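A minimal sketch of what such helpers might look like is shown below. The two signatures follow the ones quoted above, but the bodies and messages are assumptions written for illustration and are not copied from debug_assert_aligned.rs.

```rust
/// Payload alignment in bytes.
pub const PAYLOAD_ALIGNMENT: u64 = 64;

/// Checks that `ptr` is a multiple of `align`. The checks only run when
/// debug assertions are enabled (debug and test profiles by default).
#[inline]
pub fn debug_assert_aligned(ptr: *const u8, align: usize) {
    debug_assert!(align.is_power_of_two(), "alignment must be a power of two");
    debug_assert!(
        (ptr as usize) & (align.wrapping_sub(1)) == 0,
        "pointer {:p} is not {}-byte aligned",
        ptr,
        align
    );
}

/// Checks that a derived payload start offset sits on a PAYLOAD_ALIGNMENT
/// boundary. Skipped when debug assertions are disabled.
#[inline]
pub fn debug_assert_aligned_offset(off: u64) {
    debug_assert!(
        off % PAYLOAD_ALIGNMENT == 0,
        "offset {} is not {}-byte aligned",
        off,
        PAYLOAD_ALIGNMENT
    );
}
```

debug_assert! expands to a check guarded by cfg!(debug_assertions), so in a release build the condition is never evaluated and the optimizer can drop the call entirely.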
Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88 CHANGELOG.md:33-41
Summary
The SIMD R Drive entry structure uses a carefully designed binary layout that balances efficiency, integrity, and flexibility:
- Fixed 64-byte alignment ensures cache-friendly, SIMD-optimized access
- 20-byte fixed metadata provides fast integrity checks and chain traversal
- Variable pre-padding maintains alignment without complex calculations
- Minimal tombstones mark deletions efficiently
- Backward-linked chain enables robust recovery and validation
This design enables zero-copy reads, high write throughput, and automatic crash recovery while maintaining a simple, append-only storage model.