Extensions and Utilities
Relevant source files
- extensions/Cargo.toml
- simd-r-drive-entry-handle/src/constants.rs
- simd-r-drive-entry-handle/src/lib.rs
- src/utils.rs
- src/utils/align_or_copy.rs
- tests/align_or_copy_tests.rs
This document covers the utility functions, helper modules, and constants provided by the SIMD R Drive ecosystem. These components include the simd-r-drive-extensions crate for higher-level storage operations, core utility functions in the main simd-r-drive crate, and shared constants from simd-r-drive-entry-handle.
For details on the core storage engine API, see DataStore API. For performance optimization features like SIMD acceleration, see SIMD Acceleration. For alignment-related architecture decisions, see Payload Alignment and Cache Efficiency.
Extensions Crate Overview
The simd-r-drive-extensions crate provides storage extensions and higher-level utilities built on top of the core simd-r-drive storage engine. It adds functionality for common storage patterns and data manipulation tasks.
graph TB
subgraph "simd-r-drive-extensions"
ExtCrate["simd-r-drive-extensions"]
ExtDeps["Dependencies:\n- bincode\n- serde\n- simd-r-drive\n- walkdir"]
end
subgraph "Core Dependencies"
Core["simd-r-drive"]
Bincode["bincode\nBinary Serialization"]
Serde["serde\nSerialization Traits"]
Walkdir["walkdir\nDirectory Traversal"]
end
ExtCrate --> ExtDeps
ExtDeps --> Core
ExtDeps --> Bincode
ExtDeps --> Serde
ExtDeps --> Walkdir
Core -.->|provides| DataStore["DataStore"]
Bincode -.->|enables| SerializationSupport["Structured Data Storage"]
Walkdir -.->|enables| FileSystemOps["File System Operations"]
Crate Structure
Sources: extensions/Cargo.toml:1-22
| Dependency | Purpose |
|---|---|
| bincode | Binary serialization/deserialization for structured data storage |
| serde | Serialization trait support with derive macros |
| simd-r-drive | Core storage engine access |
| walkdir | Directory tree traversal utilities |
Sources: extensions/Cargo.toml:13-17
Core Utilities Module
The main simd-r-drive crate exposes several utility functions through its utils module. These functions handle common tasks like alignment optimization, string formatting, and data validation.
graph TB
subgraph "utils Module"
UtilsRoot["src/utils.rs"]
AlignOrCopy["align_or_copy\nZero-Copy Optimization"]
AppendExt["append_extension\nString Path Handling"]
FormatBytes["format_bytes\nHuman-Readable Sizes"]
NamespaceHasher["NamespaceHasher\nHierarchical Keys"]
ParseBuffer["parse_buffer_size\nSize String Parsing"]
VerifyFile["verify_file_existence\nFile Validation"]
end
UtilsRoot --> AlignOrCopy
UtilsRoot --> AppendExt
UtilsRoot --> FormatBytes
UtilsRoot --> NamespaceHasher
UtilsRoot --> ParseBuffer
UtilsRoot --> VerifyFile
AlignOrCopy -.->|used by| ReadOps["Read Operations"]
NamespaceHasher -.->|used by| KeyManagement["Key Management"]
FormatBytes -.->|used by| Logging["Logging & Reporting"]
ParseBuffer -.->|used by| Config["Configuration Parsing"]
Utility Functions Overview
Sources: src/utils.rs:1-17
align_or_copy Function
The align_or_copy utility function provides zero-copy deserialization with automatic fallback for misaligned data. It attempts to reinterpret a byte slice as a typed slice without copying, and falls back to manual decoding when alignment requirements are not met.
Function Signature
Sources: src/utils/align_or_copy.rs:44-50
Operation Flow
Sources: src/utils/align_or_copy.rs:44-73
Usage Patterns
| Scenario | Outcome | Performance |
|---|---|---|
| Aligned 64-byte boundary, exact multiple | Cow::Borrowed | Zero-copy, optimal |
| Misaligned address | Cow::Owned | Allocation + decode |
| Non-multiple of element size | Panic | Invalid input |
Example Usage:
Sources: src/utils/align_or_copy.rs:38-43 tests/align_or_copy_tests.rs:7-12
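The behavior described above can be sketched as follows. This is a minimal illustration reconstructed from the prose and the usage table, not the crate's actual source: the name align_or_copy_sketch, the `T: Copy` bound, and the panic message are assumptions, and the real definition in src/utils/align_or_copy.rs may differ.

```rust
use std::borrow::Cow;

/// Illustrative stand-in for `align_or_copy` (hypothetical name and bounds).
/// `N` is expected to equal `size_of::<T>()`.
pub fn align_or_copy_sketch<T, const N: usize>(
    bytes: &[u8],
    from_le_bytes: fn([u8; N]) -> T,
) -> Cow<'_, [T]>
where
    T: Copy,
{
    // SAFETY: assumes `T` is a plain numeric type (f32, u32, ...) for which
    // every bit pattern is a valid value.
    let (prefix, body, suffix) = unsafe { bytes.align_to::<T>() };

    // Zero-copy path: the whole input maps cleanly onto `[T]`.
    if prefix.is_empty() && suffix.is_empty() {
        return Cow::Borrowed(body);
    }

    // Fallback path: decode each N-byte chunk into an owned Vec<T>.
    assert!(
        bytes.len() % N == 0,
        "input length must be a multiple of the element size"
    );
    Cow::Owned(
        bytes
            .chunks_exact(N)
            .map(|chunk| from_le_bytes(chunk.try_into().expect("chunk has length N")))
            .collect(),
    )
}
```

A call such as `align_or_copy_sketch::<f32, 4>(bytes, f32::from_le_bytes)` yields the same values on either path on a little-endian host; only the allocation behavior differs.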
Safety Considerations
The function uses unsafe for the align_to::<T>() call. The zero-copy path is taken only when both of the following hold:
- Starting address must be aligned to align_of::<T>()
- Total size must be a multiple of size_of::<T>()
These requirements are validated by checking that prefix and suffix slices are empty before returning the borrowed slice. If validation fails, the function falls back to safe manual decoding.
Sources: src/utils/align_or_copy.rs:28-35 src/utils/align_or_copy.rs:53-60
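The prefix/suffix check can be observed in isolation. The sketch below uses two hypothetical helpers (is_zero_copy_view and f32s_as_bytes are not part of the crate) to show when the split returned by the standard library's align_to leaves nothing outside the typed middle slice. The standard library documents that align_to is allowed to be conservative, which is one more reason the fallback path has to exist.

```rust
/// Hypothetical helper: true when `bytes` can be viewed as `[T]` without
/// copying, i.e. `align_to` leaves no unaligned prefix and no remainder.
pub fn is_zero_copy_view<T>(bytes: &[u8]) -> bool {
    // SAFETY: the reinterpreted middle slice is never read here; only the
    // lengths of the prefix and suffix are inspected.
    let (prefix, _body, suffix) = unsafe { bytes.align_to::<T>() };
    prefix.is_empty() && suffix.is_empty()
}

/// Hypothetical helper: a byte view over f32 storage, so the returned slice
/// starts at an address aligned for f32.
pub fn f32s_as_bytes(values: &[f32]) -> &[u8] {
    // SAFETY: the pointer and length cover exactly the memory of `values`,
    // u8 has alignment 1, and every byte of an f32 is initialized.
    unsafe {
        std::slice::from_raw_parts(
            values.as_ptr().cast::<u8>(),
            std::mem::size_of_val(values),
        )
    }
}
```

With a buffer obtained from `f32s_as_bytes`, the full slice passes the check, while a window shifted by one byte or a window whose length is not a multiple of four does not.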
Other Utility Functions
| Function | Module Path | Purpose |
|---|---|---|
| append_extension | src/utils/append_extension.rs | Safely appends file extensions to paths |
| format_bytes | src/utils/format_bytes.rs | Formats byte counts as human-readable strings (KB, MB, GB) |
| NamespaceHasher | src/utils/namespace_hasher.rs | Generates hierarchical, namespaced hash keys |
| parse_buffer_size | src/utils/parse_buffer_size.rs | Parses size strings like "64KB", "1MB" into byte counts |
| verify_file_existence | src/utils/verify_file_existence.rs | Validates file paths before operations |
Sources: src/utils.rs:1-17
Entry Handle Constants
The simd-r-drive-entry-handle crate defines shared constants used throughout the storage system. These constants establish the binary layout of entries and alignment requirements.
graph TB
subgraph "simd-r-drive-entry-handle"
LibRoot["lib.rs"]
ConstMod["constants.rs"]
EntryHandle["entry_handle.rs"]
EntryMetadata["entry_metadata.rs"]
DebugAssert["debug_assert_aligned.rs"]
end
subgraph "Exported Constants"
MetadataSize["METADATA_SIZE = 20"]
KeyHashRange["KEY_HASH_RANGE = 0..8"]
PrevOffsetRange["PREV_OFFSET_RANGE = 8..16"]
ChecksumRange["CHECKSUM_RANGE = 16..20"]
ChecksumLen["CHECKSUM_LEN = 4"]
PayloadLog["PAYLOAD_ALIGN_LOG2 = 6"]
PayloadAlign["PAYLOAD_ALIGNMENT = 64"]
end
LibRoot --> ConstMod
LibRoot --> EntryHandle
LibRoot --> EntryMetadata
LibRoot --> DebugAssert
ConstMod --> MetadataSize
ConstMod --> KeyHashRange
ConstMod --> PrevOffsetRange
ConstMod --> ChecksumRange
ConstMod --> ChecksumLen
ConstMod --> PayloadLog
ConstMod --> PayloadAlign
PayloadAlign -.->|ensures| CacheLineOpt["Cache-Line Optimization"]
PayloadAlign -.->|enables| SIMDOps["SIMD Operations"]
Constants Module Structure
Sources: simd-r-drive-entry-handle/src/lib.rs:1-10 simd-r-drive-entry-handle/src/constants.rs:1-19
Metadata Layout Constants
The following constants define the fixed 20-byte metadata structure at the end of each entry:
| Constant | Value | Description |
|---|---|---|
| METADATA_SIZE | 20 | Total size of entry metadata in bytes |
| KEY_HASH_RANGE | 0..8 | Byte range for 64-bit XXH3 key hash |
| PREV_OFFSET_RANGE | 8..16 | Byte range for 64-bit previous entry offset |
| CHECKSUM_RANGE | 16..20 | Byte range for 32-bit CRC32C checksum |
| CHECKSUM_LEN | 4 | Explicit length of checksum field |
Sources: simd-r-drive-entry-handle/src/constants.rs:3-11
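To make the layout concrete, the sketch below slices a 20-byte metadata block using the ranges from the table. The constant values come from the table; the MetadataSketch struct, the parse_metadata function, and the little-endian integer encoding are assumptions made for illustration and are not confirmed by the source.

```rust
use std::ops::Range;

pub const METADATA_SIZE: usize = 20;
pub const KEY_HASH_RANGE: Range<usize> = 0..8;
pub const PREV_OFFSET_RANGE: Range<usize> = 8..16;
pub const CHECKSUM_RANGE: Range<usize> = 16..20;
pub const CHECKSUM_LEN: usize = 4;

/// Hypothetical decoded form of the metadata block.
#[derive(Debug, PartialEq, Eq)]
pub struct MetadataSketch {
    pub key_hash: u64,
    pub prev_offset: u64,
    pub checksum: [u8; CHECKSUM_LEN],
}

/// Splits a metadata block into its three fields.
/// Returns None when the buffer is not exactly METADATA_SIZE bytes long.
pub fn parse_metadata(buf: &[u8]) -> Option<MetadataSketch> {
    if buf.len() != METADATA_SIZE {
        return None;
    }

    let mut key_hash = [0u8; 8];
    key_hash.copy_from_slice(&buf[KEY_HASH_RANGE]);

    let mut prev_offset = [0u8; 8];
    prev_offset.copy_from_slice(&buf[PREV_OFFSET_RANGE]);

    let mut checksum = [0u8; CHECKSUM_LEN];
    checksum.copy_from_slice(&buf[CHECKSUM_RANGE]);

    Some(MetadataSketch {
        // Assumption: both integers are stored little-endian.
        key_hash: u64::from_le_bytes(key_hash),
        prev_offset: u64::from_le_bytes(prev_offset),
        // The checksum is kept as raw bytes to avoid assuming its byte order.
        checksum,
    })
}
```

The three ranges are contiguous and together cover exactly METADATA_SIZE bytes, which is why a single length check is enough before slicing.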
Alignment Constants
These constants enforce 64-byte alignment for all payload data:
- PAYLOAD_ALIGN_LOG2: Base-2 logarithm of alignment requirement (6 = 64 bytes)
- PAYLOAD_ALIGNMENT: Computed alignment value (64 bytes)
This alignment matches the 64-byte cache line size common on current CPUs and enables efficient SIMD operations. The maximum pre-padding per entry is PAYLOAD_ALIGNMENT - 1 (63 bytes).
Sources: simd-r-drive-entry-handle/src/constants.rs:13-18
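The relationship between the two constants and the 63-byte padding bound can be checked with a little arithmetic. In the sketch below the constant values are taken from the text, while the integer types and the pre_padding helper are assumptions made for illustration.

```rust
pub const PAYLOAD_ALIGN_LOG2: u8 = 6;
pub const PAYLOAD_ALIGNMENT: u64 = 1 << PAYLOAD_ALIGN_LOG2; // 2^6 = 64

/// Hypothetical helper: number of padding bytes needed so that a payload
/// starting at `offset` begins on the next PAYLOAD_ALIGNMENT boundary.
pub fn pre_padding(offset: u64) -> u64 {
    let mask = PAYLOAD_ALIGNMENT - 1;
    // Distance to the next multiple of 64; zero when already aligned.
    (PAYLOAD_ALIGNMENT - (offset & mask)) & mask
}
```

An offset that is already a multiple of 64 needs no padding, and the worst case is an offset one byte past a boundary, which needs 63 bytes.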
Constant Relationships
Sources: simd-r-drive-entry-handle/src/constants.rs:1-19
sequenceDiagram
participant Client
participant EntryHandle
participant align_or_copy
participant Memory
Client->>EntryHandle: get_payload_bytes()
EntryHandle->>Memory: read &[u8] from mmap
EntryHandle->>align_or_copy: align_or_copy<f32, 4>(bytes, f32::from_le_bytes)
alt Aligned on 64-byte boundary
align_or_copy->>Memory: validate alignment
align_or_copy-->>Client: Cow::Borrowed(&[f32])
Note over Client,Memory: Zero-copy: direct memory access
else Misaligned
align_or_copy->>align_or_copy: chunks_exact(4)
align_or_copy->>align_or_copy: map(f32::from_le_bytes)
align_or_copy->>align_or_copy: collect into Vec<f32>
align_or_copy-->>Client: Cow::Owned(Vec<f32>)
Note over Client,align_or_copy: Fallback: allocated copy
end
Common Patterns
Zero-Copy Data Access
Utilities like align_or_copy enable zero-copy access patterns when memory alignment allows:
Sources: src/utils/align_or_copy.rs:44-73 simd-r-drive-entry-handle/src/constants.rs:13-18
Namespace-Based Key Management
The NamespaceHasher utility enables hierarchical key organization:
Sources: src/utils.rs:11-12
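The idea of namespaced keys can be illustrated with a small stand-in. This is a hypothetical sketch and not the crate's NamespaceHasher API: the type name, method names, and 16-byte output layout are assumptions, and the standard library's DefaultHasher is swapped in for XXH3 so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hashes a byte string to 64 bits. Stand-in for XXH3 (assumption).
fn hash64(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Hypothetical namespaced key builder.
pub struct NamespaceHasherSketch {
    prefix: u64,
}

impl NamespaceHasherSketch {
    /// Hashes the namespace once so it can be reused for many keys.
    pub fn new(namespace: &[u8]) -> Self {
        Self { prefix: hash64(namespace) }
    }

    /// Combines the namespace hash and the key hash into one 16-byte key.
    pub fn namespaced_key(&self, key: &[u8]) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[..8].copy_from_slice(&self.prefix.to_le_bytes());
        out[8..].copy_from_slice(&hash64(key).to_le_bytes());
        out
    }
}
```

Under this scheme the same key under two different namespaces maps to two different storage keys, so logically separate collections can share one store without colliding.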
Size Formatting for Logging
The format_bytes utility provides human-readable output:
| Input Bytes | Formatted Output |
|---|---|
| 1023 | "1023 B" |
| 1024 | "1.00 KB" |
| 1048576 | "1.00 MB" |
| 1073741824 | "1.00 GB" |
Sources: src/utils.rs:7-8
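A formatter that reproduces the four rows of the table might look like the sketch below. The function name format_bytes_sketch and the choice of 1024-based units with two decimal places are inferred from the table; behavior of the real format_bytes for other inputs is not specified here.

```rust
/// Hypothetical formatter matching the table above: plain bytes below
/// 1024, otherwise two decimals in the largest unit up to GB.
pub fn format_bytes_sketch(bytes: u64) -> String {
    const KB: f64 = 1024.0;
    const MB: f64 = KB * 1024.0;
    const GB: f64 = MB * 1024.0;

    let b = bytes as f64;
    if b < KB {
        format!("{} B", bytes)
    } else if b < MB {
        format!("{:.2} KB", b / KB)
    } else if b < GB {
        format!("{:.2} MB", b / MB)
    } else {
        format!("{:.2} GB", b / GB)
    }
}
```

Because each call allocates a String, a helper like this belongs in logging and display code rather than on a hot path.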
Configuration Parsing
The parse_buffer_size utility handles size string inputs:
| Input String | Parsed Bytes |
|---|---|
| "64" | 64 |
| "64KB" | 65,536 |
| "1MB" | 1,048,576 |
| "2GB" | 2,147,483,648 |
Sources: src/utils.rs:13-14
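A parser consistent with the table could be sketched as follows. The name parse_buffer_size_sketch, the `Result<u64, String>` return type, the case-insensitive suffix handling, and the error messages are all assumptions; only the four input/output pairs come from the table.

```rust
/// Hypothetical parser: accepts a plain byte count or a number followed by
/// a KB, MB, or GB suffix (1024-based), as in the table above.
pub fn parse_buffer_size_sketch(input: &str) -> Result<u64, String> {
    let s = input.trim().to_ascii_uppercase();

    // Split off a recognized unit suffix, defaulting to plain bytes.
    let (digits, multiplier) = if let Some(n) = s.strip_suffix("KB") {
        (n, 1024u64)
    } else if let Some(n) = s.strip_suffix("MB") {
        (n, 1024 * 1024)
    } else if let Some(n) = s.strip_suffix("GB") {
        (n, 1024 * 1024 * 1024)
    } else {
        (s.as_str(), 1)
    };

    let value: u64 = digits
        .trim()
        .parse()
        .map_err(|e| format!("invalid size {input:?}: {e}"))?;

    // Reject values that do not fit in u64 instead of wrapping.
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size {input:?} overflows u64"))
}
```

Returning an error for malformed or overflowing input keeps configuration mistakes visible at startup instead of silently producing a wrong buffer size.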
Integration with Core Systems
Relationship to Storage Engine
Sources: extensions/Cargo.toml:1-22 src/utils.rs:1-17 simd-r-drive-entry-handle/src/lib.rs:1-10
Performance Considerations
| Utility | Performance Impact | Use Case |
|---|---|---|
| align_or_copy | Zero-copy when aligned | Deserializing typed arrays from storage |
| NamespaceHasher | Single XXH3 hash | Generating hierarchical keys |
| format_bytes | String allocation | Logging and user display only |
| PAYLOAD_ALIGNMENT | Enables SIMD ops | Core storage layout requirement |
Sources: src/utils/align_or_copy.rs:1-74 simd-r-drive-entry-handle/src/constants.rs:13-18