Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

DeepWiki GitHub

Entry Structure and Metadata

Relevant source files

Purpose and Scope

This document details the on-disk binary layout of entries in the SIMD R Drive storage engine. It covers the structure of aligned entries, tombstones, metadata fields, and the alignment strategy that enables zero-copy access.

For information about how entries are read and accessed in memory, see Memory Management and Zero-Copy Access. For details on the validation chain and recovery mechanisms, see Storage Architecture.


On-Disk Entry Layout Overview

Every entry written to the storage file consists of three components:

  1. Pre-Pad Bytes (optional, 0-63 bytes) - Zero bytes inserted to ensure the payload starts at a 64-byte boundary
  2. Payload - Variable-length binary data
  3. Metadata - Fixed 20-byte structure containing key hash, previous offset, and checksum

The exception is tombstones (deletion markers), which use a minimal 1-byte payload with no pre-padding.

Sources: README.md:104-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37


Aligned Entry Structure

Entry Layout Table

Offset RangeFieldSize (Bytes)Description
P .. P+padPre-Pad (optional)padZero bytes to align payload start
P+pad .. NPayloadN-(P+pad)Variable-length data
N .. N+8Key Hash864-bit XXH3 key hash
N+8 .. N+16Prev Offset8Absolute offset of previous tail
N+16 .. N+20Checksum4CRC32C of payload

Where:

  • pad = (A - (prev_tail % A)) & (A - 1), with A = PAYLOAD_ALIGNMENT (64 bytes)
  • The next entry starts at offset N + 20

Aligned Entry Structure Diagram

Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:11-23


Tombstone Structure

Tombstones are special deletion markers that do not require payload alignment. They consist of a single zero byte followed by the standard 20-byte metadata structure.

Tombstone Layout Table

Offset RangeFieldSize (Bytes)Description
T .. T+1Payload1Single byte 0x00
T+1 .. T+21Metadata20Key hash, prev, crc32c

Tombstone Structure Diagram

Sources: README.md:126-131 simd-r-drive-entry-handle/src/entry_metadata.rs:25-30


EntryMetadata Structure

The EntryMetadata struct represents the fixed 20-byte metadata block that follows every payload. It is defined in #[repr(C)] layout to ensure consistent binary representation.

graph TB
    subgraph EntryMetadataStruct["EntryMetadata struct"]
field1["key_hash: u64\n8 bytes\nXXH3_64 hash"]
field2["prev_offset: u64\n8 bytes\nbackward chain link"]
field3["checksum: [u8; 4]\n4 bytes\nCRC32C payload checksum"]
end
    
 
   field1 --> field2
 
   field2 --> field3
    
    note4["Serialized at offset N\nfollowing payload"]
note5["Total: METADATA_SIZE = 20"]
field1 -.-> note4
 
   field3 -.-> note5

Metadata Fields

Field Descriptions

key_hash: u64 (8 bytes, offset N .. N+8)

  • 64-bit XXH3 hash of the key
  • Used by KeyIndexer for O(1) lookups
  • Combined with a tag for collision detection
  • Hardware-accelerated via SSE2/AVX2/NEON

prev_offset: u64 (8 bytes, offset N+8 .. N+16)

  • Absolute file offset of the previous entry for this key
  • Forms a backward-linked chain for version history
  • Set to 0 for the first entry of a key
  • Used during chain validation and recovery

checksum: [u8; 4] (4 bytes, offset N+16 .. N+20)

  • CRC32C checksum of the payload
  • Provides fast integrity verification
  • Not cryptographically secure
  • Used during recovery to detect corruption

Serialization and Deserialization

The EntryMetadata struct provides methods for converting to/from bytes:

  • serialize() -> [u8; 20] - Converts metadata to byte array using little-endian encoding
  • deserialize(data:&[u8]) -> Self - Reconstructs metadata from byte slice

Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:44-113 README.md:114-120


Pre-Padding and Alignment Strategy

Alignment Purpose

All non-tombstone payloads start at a 64-byte aligned address. This alignment ensures:

  • Cache-line efficiency - Matches typical CPU cache line size
  • SIMD optimization - Enables full-speed AVX2/AVX-512/NEON operations
  • Zero-copy typed views - Allows safe reinterpretation as typed slices (&[u16], &[u32], etc.)
graph TD
    Start["Calculate padding needed"]
GetPrevTail["prev_tail = last written offset"]
CalcPad["pad = (PAYLOAD_ALIGNMENT - (prev_tail % PAYLOAD_ALIGNMENT))\n& (PAYLOAD_ALIGNMENT - 1)"]
CheckPad{"pad > 0?"}
WritePad["Write pad zero bytes"]
WritePayload["Write payload at aligned offset"]
Start --> GetPrevTail
 
   GetPrevTail --> CalcPad
 
   CalcPad --> CheckPad
 
   CheckPad -->|Yes| WritePad
 
   CheckPad -->|No| WritePayload
 
   WritePad --> WritePayload

The alignment is configured via PAYLOAD_ALIGNMENT constant (64 bytes as of version 0.15.0).

Pre-Padding Calculation

The formula pad = (A - (prev_tail % A)) & (A - 1) where A = PAYLOAD_ALIGNMENT ensures:

  • If prev_tail is already aligned, pad = 0
  • Otherwise, pad equals the bytes needed to reach the next aligned boundary
  • Maximum padding is A - 1 bytes (63 bytes for 64-byte alignment)

Constants

The alignment is defined in simd-r-drive-entry-handle/src/constants.rs:1-20:

ConstantValueDescription
PAYLOAD_ALIGN_LOG26Log₂ of alignment (2⁶ = 64)
PAYLOAD_ALIGNMENT64Actual alignment boundary in bytes
METADATA_SIZE20Fixed size of metadata block

Sources: README.md:51-59 simd-r-drive-entry-handle/src/entry_metadata.rs:22-23 CHANGELOG.md:25-51


Backward Chain Formation

Chain Structure

Each entry's prev_offset field creates a backward-linked chain that tracks the version history for a given key. This chain is essential for:

  • Recovery and validation on file open
  • Detecting incomplete writes
  • Rebuilding the index

Chain Properties

  • Most recent entry is at the end of the file (highest offset)
  • Chain traversal moves backward from tail toward offset 0
  • First entry for a key has prev_offset = 0
  • Valid chain can be walked all the way back to byte 0 without gaps
  • Broken chain indicates corruption or incomplete write

Usage in Recovery

During file open, the system:

  1. Scans backward from EOF reading metadata
  2. Follows prev_offset links to validate chain continuity
  3. Verifies checksums at each step
  4. Truncates file if corruption is detected
  5. Scans forward to rebuild the index

Sources: README.md:139-147 simd-r-drive-entry-handle/src/entry_metadata.rs:41-43


Entry Type Comparison

Aligned Entry vs. Tombstone

AspectAligned Entry (Non-Tombstone)Tombstone (Deletion Marker)
Pre-padding0-63 bytes (alignment dependent)None
Payload sizeVariable (user-defined)Fixed 1 byte (0x00)
Payload alignment64-byte boundaryNo alignment requirement
Metadata size20 bytes20 bytes
Total minimum size21 bytes (1-byte payload + metadata)21 bytes (1-byte + metadata)
Total maximum overhead83 bytes (63-byte pad + 20 metadata)21 bytes
Zero-copy capableYes (aligned payload)No (tombstone flag only)

When Tombstones Are Used

Tombstones mark key deletions while maintaining chain integrity. They:

  • Preserve the backward chain via prev_offset
  • Use minimal space (no alignment overhead)
  • Are detected during reads and filtered out
  • Enable recovery to skip deleted entries

Sources: README.md:112-137 simd-r-drive-entry-handle/src/entry_metadata.rs:9-37


Metadata Serialization Format

Binary Layout in File

Constants for Range Indexing

The simd-r-drive-entry-handle/src/constants.rs:1-20 file defines range constants for metadata field access:

  • KEY_HASH_RANGE = 0..8
  • PREV_OFFSET_RANGE = 8..16
  • CHECKSUM_RANGE = 16..20
  • METADATA_SIZE = 20

These ranges are used in EntryMetadata::serialize() and deserialize() methods.

Sources: simd-r-drive-entry-handle/src/entry_metadata.rs:62-112


Alignment Evolution and Migration

Version History

v0.14.0-alpha and earlier: Used 16-byte alignment (PAYLOAD_ALIGNMENT = 16)

v0.15.0-alpha onwards: Changed to 64-byte alignment (PAYLOAD_ALIGNMENT = 64)

This change was made to:

  • Ensure full cache-line alignment
  • Support AVX-512 and future SIMD extensions
  • Improve zero-copy performance across modern hardware

Migration Considerations

Storage files created with different alignment values are not compatible :

  • v0.14.x readers cannot correctly parse v0.15.x stores
  • v0.15.x readers may misinterpret v0.14.x padding

To migrate between versions:

  1. Read all entries using the old version binary
  2. Write entries to a new store using the new version binary
  3. Replace the old file after verification

In multi-service environments, deploy reader upgrades before writer upgrades to avoid mixed-version issues.

Sources: CHANGELOG.md:25-82 README.md:51-59


Debug Assertions for Alignment

Runtime Validation

The codebase includes debug-only alignment assertions that validate both pointer and offset alignment:

debug_assert_aligned(ptr: *const u8, align: usize) - Validates pointer alignment

  • Active in debug and test builds
  • Zero cost in release/bench builds
  • Ensures buffer base address is properly aligned

debug_assert_aligned_offset(off: u64) - Validates file offset alignment

These assertions help catch alignment issues during development without imposing runtime overhead in production.

Sources: simd-r-drive-entry-handle/src/debug_assert_aligned.rs:1-88 CHANGELOG.md:33-41


Summary

The SIMD R Drive entry structure uses a carefully designed binary layout that balances efficiency, integrity, and flexibility:

  • Fixed 64-byte alignment ensures cache-friendly, SIMD-optimized access
  • 20-byte fixed metadata provides fast integrity checks and chain traversal
  • Variable pre-padding maintains alignment without complex calculations
  • Minimal tombstones mark deletions efficiently
  • Backward-linked chain enables robust recovery and validation

This design enables zero-copy reads, high write throughput, and automatic crash recovery while maintaining a simple, append-only storage model.