Extensions and Utilities

This document covers the utility functions, helper modules, and constants provided by the SIMD R Drive ecosystem. These components include the simd-r-drive-extensions crate for higher-level storage operations, core utility functions in the main simd-r-drive crate, and shared constants from simd-r-drive-entry-handle.

For details on the core storage engine API, see DataStore API. For performance optimization features like SIMD acceleration, see SIMD Acceleration. For alignment-related architecture decisions, see Payload Alignment and Cache Efficiency.

Extensions Crate Overview

The simd-r-drive-extensions crate provides storage extensions and higher-level utilities built on top of the core simd-r-drive storage engine. It builds on DataStore, using serde and bincode to store structured (typed) data and walkdir for file system operations such as directory traversal.

graph TB
    subgraph "simd-r-drive-extensions"
        ExtCrate["simd-r-drive-extensions"]
        ExtDeps["Dependencies:\n- bincode\n- serde\n- simd-r-drive\n- walkdir"]
    end

    subgraph "Core Dependencies"
        Core["simd-r-drive"]
        Bincode["bincode\nBinary Serialization"]
        Serde["serde\nSerialization Traits"]
        Walkdir["walkdir\nDirectory Traversal"]
    end

    ExtCrate --> ExtDeps
    ExtDeps --> Core
    ExtDeps --> Bincode
    ExtDeps --> Serde
    ExtDeps --> Walkdir

    Core -.->|provides| DataStore["DataStore"]
    Bincode -.->|enables| SerializationSupport["Structured Data Storage"]
    Walkdir -.->|enables| FileSystemOps["File System Operations"]

Crate Structure

Sources: extensions/Cargo.toml:1-22

| Dependency | Purpose |
| --- | --- |
| bincode | Binary serialization/deserialization for structured data storage |
| serde | Serialization traits with derive macros |
| simd-r-drive | Core storage engine access |
| walkdir | Directory tree traversal |

Sources: extensions/Cargo.toml:13-17

Core Utilities Module

The main simd-r-drive crate exposes several utility functions, plus the NamespaceHasher type, through its utils module. They cover zero-copy alignment handling, file-extension handling, human-readable size formatting, size-string parsing, namespaced key hashing, and file validation.

graph TB
    subgraph "utils Module"
        UtilsRoot["src/utils.rs"]
        AlignOrCopy["align_or_copy\nZero-Copy Optimization"]
        AppendExt["append_extension\nString Path Handling"]
        FormatBytes["format_bytes\nHuman-Readable Sizes"]
        NamespaceHasher["NamespaceHasher\nHierarchical Keys"]
        ParseBuffer["parse_buffer_size\nSize String Parsing"]
        VerifyFile["verify_file_existence\nFile Validation"]
    end

    UtilsRoot --> AlignOrCopy
    UtilsRoot --> AppendExt
    UtilsRoot --> FormatBytes
    UtilsRoot --> NamespaceHasher
    UtilsRoot --> ParseBuffer
    UtilsRoot --> VerifyFile

    AlignOrCopy -.->|used by| ReadOps["Read Operations"]
    NamespaceHasher -.->|used by| KeyManagement["Key Management"]
    FormatBytes -.->|used by| Logging["Logging & Reporting"]
    ParseBuffer -.->|used by| Config["Configuration Parsing"]

Utility Functions Overview

Sources: src/utils.rs:1-17

align_or_copy Function

The align_or_copy utility function provides zero-copy deserialization with automatic fallback for misaligned data. It attempts to reinterpret a byte slice as a typed slice without copying, and falls back to manual decoding when alignment requirements are not met.

Function Signature

Sources: src/utils/align_or_copy.rs:44-50

Operation Flow

Sources: src/utils/align_or_copy.rs:44-73

Usage Patterns

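| Scenario | Outcome | Performance |
| --- | --- | --- |
| Address aligned to align_of::<T>() (e.g. on a 64-byte payload boundary), length an exact multiple of the element size | Cow::Borrowed | Zero-copy, optimal |
| Misaligned address | Cow::Owned | Allocation + per-element decode |
| Length not a multiple of the element size | Panic | Invalid input |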

Example Usage:

Sources: src/utils/align_or_copy.rs:38-43 tests/align_or_copy_tests.rs:7-12
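The signature, flow, and usage patterns described above can be sketched as follows. This is a hypothetical re-creation, not the crate's source: the generic parameters and the fn([u8; N]) -> T decoder are inferred from the call align_or_copy<f32, 4>(bytes, f32::from_le_bytes) shown in the sequence diagram on this page, and the upfront length assertion is an assumption based on the Panic row of the table.

```rust
use std::borrow::Cow;
use std::convert::TryInto;

/// Hypothetical sketch of `align_or_copy`. `N` must equal `size_of::<T>()`;
/// `from_le_bytes` decodes one element in the fallback path.
pub fn align_or_copy<T, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]>
where
    T: Copy,
{
    assert_eq!(N, std::mem::size_of::<T>(), "N must equal size_of::<T>()");
    // Assumption: a length that is not a whole number of elements panics.
    assert!(
        bytes.len() % N == 0,
        "input length {} is not a multiple of {}",
        bytes.len(),
        N
    );

    // SAFETY (assumed for this sketch): T is a plain numeric type for which
    // every bit pattern is valid, so reinterpreting the middle part is sound.
    let (prefix, middle, suffix) = unsafe { bytes.align_to::<T>() };

    if prefix.is_empty() && suffix.is_empty() {
        // The whole input is suitably aligned: borrow it without copying.
        // Note: this path reads native byte order, so it agrees with the
        // little-endian fallback only on little-endian targets.
        Cow::Borrowed(middle)
    } else {
        // Misaligned: decode each N-byte chunk into an owned Vec<T>.
        Cow::Owned(
            bytes
                .chunks_exact(N)
                .map(|chunk| from_le_bytes(chunk.try_into().unwrap()))
                .collect(),
        )
    }
}
```

Both outcomes deref to &[T], so a caller can write let floats = align_or_copy::<f32, 4>(payload, f32::from_le_bytes); and use &*floats without caring which branch ran.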

Safety Considerations

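The function calls the unsafe <[u8]>::align_to::<T>(). Its safety contract concerns the reinterpretation itself: every byte pattern must be a valid T, which holds for plain numeric types such as f32. align_to handles alignment on its own by splitting the input into a prefix, an aligned middle slice, and a suffix. The whole input comes back as the middle slice only when:

  1. The starting address is aligned to align_of::<T>()
  2. The total size is a multiple of size_of::<T>()

The function checks both conditions by requiring the prefix and suffix slices to be empty before returning the borrowed slice. If the address is misaligned, it falls back to safe element-by-element decoding; a length that is not a multiple of the element size is rejected with a panic, as listed in the usage table.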

Sources: src/utils/align_or_copy.rs:28-35 src/utils/align_or_copy.rs:53-60

Other Utility Functions

| Item | Module path | Purpose |
| --- | --- | --- |
| append_extension | src/utils/append_extension.rs | Safely appends file extensions to paths |
| format_bytes | src/utils/format_bytes.rs | Formats byte counts as human-readable strings (KB, MB, GB) |
| NamespaceHasher | src/utils/namespace_hasher.rs | Generates hierarchical, namespaced hash keys |
| parse_buffer_size | src/utils/parse_buffer_size.rs | Parses size strings like "64KB", "1MB" into byte counts |
| verify_file_existence | src/utils/verify_file_existence.rs | Validates file paths before operations |

Sources: src/utils.rs:1-17
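The table gives only one-line purposes for these helpers. To illustrate what appending (rather than replacing) an extension means, here is a hypothetical sketch; the signature append_extension(&Path, &str) -> PathBuf and the leading-dot handling are assumptions, not the crate's documented API.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical sketch: keeps any existing extension and adds a new one,
/// unlike `Path::with_extension`, which replaces the last extension.
pub fn append_extension(path: &Path, ext: &str) -> PathBuf {
    // Accept both "tmp" and ".tmp" from callers.
    let ext = ext.trim_start_matches('.');
    let mut os: OsString = path.as_os_str().to_os_string();
    if !ext.is_empty() {
        os.push(".");
        os.push(ext);
    }
    PathBuf::from(os)
}
```

For example, data.bin with "tmp" becomes data.bin.tmp, whereas Path::with_extension would produce data.tmp.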

Entry Handle Constants

The simd-r-drive-entry-handle crate defines shared constants used throughout the storage system. These constants establish the binary layout of entries and alignment requirements.

graph TB
    subgraph "simd-r-drive-entry-handle"
        LibRoot["lib.rs"]
        ConstMod["constants.rs"]
        EntryHandle["entry_handle.rs"]
        EntryMetadata["entry_metadata.rs"]
        DebugAssert["debug_assert_aligned.rs"]
    end

    subgraph "Exported Constants"
        MetadataSize["METADATA_SIZE = 20"]
        KeyHashRange["KEY_HASH_RANGE = 0..8"]
        PrevOffsetRange["PREV_OFFSET_RANGE = 8..16"]
        ChecksumRange["CHECKSUM_RANGE = 16..20"]
        ChecksumLen["CHECKSUM_LEN = 4"]
        PayloadLog["PAYLOAD_ALIGN_LOG2 = 6"]
        PayloadAlign["PAYLOAD_ALIGNMENT = 64"]
    end

    LibRoot --> ConstMod
    LibRoot --> EntryHandle
    LibRoot --> EntryMetadata
    LibRoot --> DebugAssert

    ConstMod --> MetadataSize
    ConstMod --> KeyHashRange
    ConstMod --> PrevOffsetRange
    ConstMod --> ChecksumRange
    ConstMod --> ChecksumLen
    ConstMod --> PayloadLog
    ConstMod --> PayloadAlign

    PayloadAlign -.->|ensures| CacheLineOpt["Cache-Line Optimization"]
    PayloadAlign -.->|enables| SIMDOps["SIMD Operations"]

Constants Module Structure

Sources: simd-r-drive-entry-handle/src/lib.rs:1-10 simd-r-drive-entry-handle/src/constants.rs:1-19

Metadata Layout Constants

The following constants define the fixed 20-byte metadata structure at the end of each entry:

| Constant | Value | Description |
| --- | --- | --- |
| METADATA_SIZE | 20 | Total size of entry metadata in bytes |
| KEY_HASH_RANGE | 0..8 | Byte range of the 64-bit XXH3 key hash |
| PREV_OFFSET_RANGE | 8..16 | Byte range of the 64-bit previous-entry offset |
| CHECKSUM_RANGE | 16..20 | Byte range of the 32-bit CRC32C checksum |
| CHECKSUM_LEN | 4 | Explicit length of the checksum field in bytes |

Sources: simd-r-drive-entry-handle/src/constants.rs:3-11
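To make the layout concrete, the sketch below mirrors the constants from the table and decodes the 20-byte trailer at the end of an entry. The Metadata struct and parse_metadata function are hypothetical, and little-endian encoding of the two u64 fields is an assumption of this sketch rather than something stated above.

```rust
use std::convert::TryInto;
use std::ops::Range;

// Values copied from the table above.
pub const METADATA_SIZE: usize = 20;
pub const KEY_HASH_RANGE: Range<usize> = 0..8;
pub const PREV_OFFSET_RANGE: Range<usize> = 8..16;
pub const CHECKSUM_RANGE: Range<usize> = 16..20;
pub const CHECKSUM_LEN: usize = 4;

/// Hypothetical decoded view of the metadata trailer.
#[derive(Debug, PartialEq)]
pub struct Metadata {
    pub key_hash: u64,
    pub prev_offset: u64,
    pub checksum: [u8; CHECKSUM_LEN],
}

/// Decodes the last METADATA_SIZE bytes of `entry` (little-endian assumed).
pub fn parse_metadata(entry: &[u8]) -> Option<Metadata> {
    if entry.len() < METADATA_SIZE {
        return None;
    }
    let meta = &entry[entry.len() - METADATA_SIZE..];
    Some(Metadata {
        key_hash: u64::from_le_bytes(meta[KEY_HASH_RANGE].try_into().ok()?),
        prev_offset: u64::from_le_bytes(meta[PREV_OFFSET_RANGE].try_into().ok()?),
        checksum: meta[CHECKSUM_RANGE].try_into().ok()?,
    })
}
```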

Alignment Constants

These constants define the 64-byte alignment applied to every payload:

  • PAYLOAD_ALIGN_LOG2: base-2 logarithm of the alignment (6, i.e. 2^6 = 64 bytes)
  • PAYLOAD_ALIGNMENT: alignment in bytes, derived from PAYLOAD_ALIGN_LOG2 (64)

64 bytes is the cache-line size of most x86-64 and many ARM CPUs and the width of an AVX-512 register, so each payload starts at the beginning of a cache line and satisfies the alignment of every SIMD load width up to 512 bits. The maximum pre-padding per entry is PAYLOAD_ALIGNMENT - 1 (63 bytes).

Sources: simd-r-drive-entry-handle/src/constants.rs:13-18
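The relationship between the two alignment constants and the 63-byte padding bound can be sketched as follows. The bit-mask formula in pre_pad_len is a standard power-of-two rounding technique chosen for this illustration; the function name is hypothetical and the crate may compute padding differently.

```rust
pub const PAYLOAD_ALIGN_LOG2: u8 = 6;
pub const PAYLOAD_ALIGNMENT: u64 = 1u64 << PAYLOAD_ALIGN_LOG2; // 64

/// Hypothetical helper: bytes of pre-padding needed so a payload that would
/// start at `offset` begins on a PAYLOAD_ALIGNMENT boundary.
/// Result is always in 0..=PAYLOAD_ALIGNMENT - 1.
pub fn pre_pad_len(offset: u64) -> u64 {
    let mask = PAYLOAD_ALIGNMENT - 1; // valid because the alignment is a power of two
    (PAYLOAD_ALIGNMENT - (offset & mask)) & mask
}
```

For example, a payload that would start at offset 100 receives 28 bytes of padding and begins at 128, while an offset just past a boundary (e.g. 65) needs the maximum 63 bytes.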

Constant Relationships

Sources: simd-r-drive-entry-handle/src/constants.rs:1-19

sequenceDiagram
    participant Client
    participant EntryHandle
    participant align_or_copy
    participant Memory

    Client->>EntryHandle: get_payload_bytes()
    EntryHandle->>Memory: read &[u8] from mmap
    EntryHandle->>align_or_copy: align_or_copy<f32, 4>(bytes, f32::from_le_bytes)

    alt Aligned on 64-byte boundary
        align_or_copy->>Memory: validate alignment
        align_or_copy-->>Client: Cow::Borrowed(&[f32])
        Note over Client,Memory: Zero-copy: direct memory access
    else Misaligned
        align_or_copy->>align_or_copy: chunks_exact(4)
        align_or_copy->>align_or_copy: map(f32::from_le_bytes)
        align_or_copy->>align_or_copy: collect into Vec<f32>
        align_or_copy-->>Client: Cow::Owned(Vec<f32>)
        Note over Client,align_or_copy: Fallback: allocated copy
    end

Common Patterns

Zero-Copy Data Access

Utilities like align_or_copy enable zero-copy access patterns when memory alignment allows:

Sources: src/utils/align_or_copy.rs:44-73 simd-r-drive-entry-handle/src/constants.rs:13-18
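The zero-copy path depends only on the payload address being a multiple of align_of::<T>(). Because payloads start on 64-byte boundaries, that holds for every primitive numeric type, assuming the mapped base address is itself at least 64-byte aligned (as page-aligned mappings are). The sketch below checks this with a hypothetical AlignedRegion standing in for a mapped region; both names are illustrative only.

```rust
use std::mem::align_of;

/// Hypothetical stand-in for a 64-byte-aligned mapped region holding two
/// 64-byte payload slots.
#[repr(C, align(64))]
pub struct AlignedRegion(pub [u8; 128]);

/// True if `bytes` starts at an address that allows an in-place `&[T]` view.
pub fn is_aligned_for<T>(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<T>() == 0
}
```

A slice taken at offset 64 of such a region is aligned for f32, f64, and u128 alike, whereas a slice at offset 1 is not even aligned for f32 and would take the copying fallback.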

Namespace-Based Key Management

The NamespaceHasher utility enables hierarchical key organization:

Sources: src/utils.rs:11-12
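This page does not show NamespaceHasher's API, so the following is only one plausible hierarchical-key scheme under stated assumptions: the type and method names are hypothetical, and std's DefaultHasher (SipHash) is swapped in for XXH3 so the example needs no external crates. DefaultHasher output is not guaranteed stable across Rust releases, so keys built this way must not be persisted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// 64-bit hash of raw bytes (DefaultHasher stands in for XXH3 here).
fn hash64(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

/// Hypothetical namespaced-key builder.
pub struct SketchNamespaceHasher {
    prefix: [u8; 8],
}

impl SketchNamespaceHasher {
    /// Hashes the namespace once, up front.
    pub fn new(namespace: &[u8]) -> Self {
        Self { prefix: hash64(namespace).to_le_bytes() }
    }

    /// 16-byte key: namespace hash followed by key hash, so every key in
    /// one namespace shares the same 8-byte prefix.
    pub fn namespaced_key(&self, key: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(16);
        out.extend_from_slice(&self.prefix);
        out.extend_from_slice(&hash64(key).to_le_bytes());
        out
    }
}
```

Because the namespace hash is computed once in new, building each key costs a single hash of the key bytes, and the same raw key ("42", say) stored under "users" and "orders" yields two distinct storage keys.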

Size Formatting for Logging

The format_bytes utility provides human-readable output:

| Input (bytes) | Formatted output |
| --- | --- |
| 1023 | "1023 B" |
| 1024 | "1.00 KB" |
| 1048576 | "1.00 MB" |
| 1073741824 | "1.00 GB" |

Sources: src/utils.rs:7-8
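A minimal sketch that reproduces the table above, assuming 1024-based units and two decimal places from 1 KB upward; it is written from the table's examples, not copied from the crate's source.

```rust
/// Sketch of a human-readable byte formatter matching the table above.
pub fn format_bytes(bytes: u64) -> String {
    const KB: u64 = 1024;
    const MB: u64 = KB * 1024;
    const GB: u64 = MB * 1024;
    match bytes {
        b if b >= GB => format!("{:.2} GB", b as f64 / GB as f64),
        b if b >= MB => format!("{:.2} MB", b as f64 / MB as f64),
        b if b >= KB => format!("{:.2} KB", b as f64 / KB as f64),
        b => format!("{} B", b),
    }
}
```

Values between units are rounded to two decimals, e.g. 1536 bytes formats as "1.50 KB".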

Configuration Parsing

The parse_buffer_size utility handles size string inputs:

| Input string | Parsed bytes |
| --- | --- |
| "64" | 64 |
| "64KB" | 65,536 |
| "1MB" | 1,048,576 |
| "2GB" | 2,147,483,648 |

Sources: src/utils.rs:13-14
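A minimal sketch consistent with the table: an optional KB/MB/GB suffix with 1024-based multipliers. Case-insensitive suffixes, surrounding whitespace tolerance, and the String error type are assumptions of this sketch, not documented behavior of parse_buffer_size.

```rust
/// Sketch of a size-string parser matching the table above.
pub fn parse_buffer_size(input: &str) -> Result<u64, String> {
    let s = input.trim().to_ascii_uppercase();
    let (digits, multiplier) = if let Some(n) = s.strip_suffix("GB") {
        (n, 1024u64 * 1024 * 1024)
    } else if let Some(n) = s.strip_suffix("MB") {
        (n, 1024 * 1024)
    } else if let Some(n) = s.strip_suffix("KB") {
        (n, 1024)
    } else {
        (s.as_str(), 1) // plain byte count
    };
    let value: u64 = digits
        .trim()
        .parse()
        .map_err(|e| format!("invalid size {:?}: {}", input, e))?;
    // Reject sizes that would overflow instead of silently wrapping.
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size {:?} overflows u64", input))
}
```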

Integration with Core Systems

Relationship to Storage Engine

Sources: extensions/Cargo.toml:1-22 src/utils.rs:1-17 simd-r-drive-entry-handle/src/lib.rs:1-10

Performance Considerations

| Utility | Performance impact | Use case |
| --- | --- | --- |
| align_or_copy | Zero-copy when aligned | Deserializing typed arrays from storage |
| NamespaceHasher | Single XXH3 hash | Generating hierarchical keys |
| format_bytes | String allocation | Logging and user display only |
| PAYLOAD_ALIGNMENT | Enables SIMD ops | Core storage layout requirement |

Sources: src/utils/align_or_copy.rs:1-74 simd-r-drive-entry-handle/src/constants.rs:13-18