
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Key Indexing and Hashing



Purpose and Scope

This page documents the key indexing system and hashing mechanisms used in SIMD R Drive’s storage engine. It covers the KeyIndexer data structure, the XXH3 hashing algorithm, tag-based collision detection, and hardware acceleration features.

For information about how the index is accessed in concurrent operations, see Concurrency and Thread Safety. For details on how metadata is stored alongside payloads, see Entry Structure and Metadata.


Overview

The SIMD R Drive storage engine maintains an in-memory index that maps key hashes to file offsets, enabling O(1) lookup performance for stored entries. This index is critical for avoiding full file scans when retrieving data.

The indexing system consists of three main components:

  1. KeyIndexer: A concurrent hash map that stores packed values containing both a collision-detection tag and a file offset
  2. XXH3_64 hashing: A fast, hardware-accelerated hashing algorithm that generates 64-bit hashes from arbitrary keys
  3. Tag-based verification: A secondary collision-detection mechanism that validates lookups to prevent hash collision errors

Sources: src/storage_engine/data_store.rs:1-33 README.md:158-168


KeyIndexer Structure

The KeyIndexer struct is defined in src/storage_engine/key_indexer.rs:56-59 and manages the in-memory hash index. It wraps a HashMap<u64, u64, Xxh3BuildHasher> where keys are XXH3 key hashes and values are packed 64-bit integers containing both a collision-detection tag and file offset.
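The structure described above can be sketched as a simplified model (a plain HashMap with the default hasher stands in for the crate's Xxh3BuildHasher, and all other DataStore fields are elided):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Simplified model of the documented structures; not the crate's exact code.
struct KeyIndexer {
    // key: XXH3_64 hash of the user key; value: packed (tag, offset)
    index: HashMap<u64, u64>,
}

struct DataStore {
    key_indexer: Arc<RwLock<KeyIndexer>>,
    // ... other fields (mmap handle, tail offset, etc.) elided
}

fn main() {
    let store = DataStore {
        key_indexer: Arc::new(RwLock::new(KeyIndexer { index: HashMap::new() })),
    };
    // Shared read access; writers would use .write() for exclusive access.
    assert!(store.key_indexer.read().unwrap().index.is_empty());
}
```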

graph TB
    subgraph DataStore["DataStore struct"]
KeyIndexerField["key_indexer: Arc&lt;RwLock&lt;KeyIndexer&gt;&gt;"]
end
    
    subgraph KeyIndexer["KeyIndexer struct"]
IndexField["index: HashMap&lt;u64, u64, Xxh3BuildHasher&gt;"]
MapKey["Key: u64\n(key_hash from compute_hash)"]
MapValue["Value: u64\n(packed tag / offset)"]
end
    
    subgraph Constants["Key Constants"]
TagBits["TAG_BITS = 16"]
OffsetMask["OFFSET_MASK = (1 << 48) - 1"]
end
    
    subgraph Methods["Public Methods"]
TagFromHash["tag_from_hash(key_hash) -> u16"]
TagFromKey["tag_from_key(key) -> u16"]
Pack["pack(tag, offset) -> u64"]
Unpack["unpack(packed) -> (u16, u64)"]
Build["build(mmap, tail_offset) -> Self"]
Insert["insert(key_hash, offset) -> Result"]
GetPacked["get_packed(&key_hash) -> Option<&u64>"]
GetOffset["get_offset(&key_hash) -> Option<u64>"]
Remove["remove(&key_hash) -> Option<u64>"]
end
    
 
   KeyIndexerField --> IndexField
    IndexField --> MapKey
    IndexField --> MapValue
    MapValue --> TagBits
    MapValue --> OffsetMask
    
    Pack -.uses.-> TagBits
    Pack -.uses.-> OffsetMask
    Unpack -.uses.-> TagBits
    Unpack -.uses.-> OffsetMask

Packed Value Format

The KeyIndexer stores a compact 64-bit packed value for each hash. This value is constructed by the pack function (src/storage_engine/key_indexer.rs:79-85) and decoded by unpack (src/storage_engine/key_indexer.rs:88-93):

| Bits | Field | Description | Constant Used |
|------|-------|-------------|---------------|
| 63-48 | Tag (16-bit) | Collision detection tag from upper hash bits | TAG_BITS = 16 |
| 47-0 | Offset (48-bit) | Absolute file offset to entry metadata | OFFSET_MASK = 0xFFFFFFFFFFFF |

The packing formula is: packed = (tag << (64 - TAG_BITS)) | offset

The unpacking extracts: tag = (packed >> (64 - TAG_BITS)) as u16 and offset = packed & OFFSET_MASK

Maximum Addressable File Size: The 48-bit offset field supports files up to 256 TiB (2^48 bytes). Attempting to use larger offsets panics in debug builds due to debug_assert! checks (src/storage_engine/key_indexer.rs:80-83).
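The packing and unpacking formulas above can be sketched in a few lines (constant names follow the documentation; this is an illustration, not the crate's code):

```rust
const TAG_BITS: u32 = 16;
const OFFSET_MASK: u64 = (1u64 << 48) - 1;

/// Pack a 16-bit tag and a 48-bit offset into one u64 (tag in bits 63-48).
fn pack(tag: u16, offset: u64) -> u64 {
    debug_assert!(offset <= OFFSET_MASK, "offset exceeds 48 bits");
    ((tag as u64) << (64 - TAG_BITS)) | offset
}

/// Recover (tag, offset) from a packed value.
fn unpack(packed: u64) -> (u16, u64) {
    ((packed >> (64 - TAG_BITS)) as u16, packed & OFFSET_MASK)
}

fn main() {
    let packed = pack(0xBEEF, 0x0123_4567);
    assert_eq!(packed, 0xBEEF_0000_0123_4567);
    assert_eq!(unpack(packed), (0xBEEF, 0x0123_4567));
}
```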

Sources: src/storage_engine/key_indexer.rs:9-15 src/storage_engine/key_indexer.rs:56-59 src/storage_engine/key_indexer.rs:79-93 src/storage_engine/data_store.rs:31


Hashing Algorithm: XXH3_64

SIMD R Drive uses the XXH3_64 hashing algorithm from the xxhash-rust crate (src/storage_engine/digest.rs). XXH3 is optimized for speed and provides automatic hardware acceleration through SIMD instructions.

graph TB
    subgraph DigestModule["storage_engine::digest module"]
ComputeHash["compute_hash(key: &[u8]) -> u64"]
ComputeHashBatch["compute_hash_batch(keys: &[&[u8]]) -> Vec&lt;u64&gt;"]
ComputeChecksum["compute_checksum(payload: &[u8]) -> [u8; 4]"]
Xxh3BuildHasher["Xxh3BuildHasher struct"]
end
    
    subgraph Callers["Usage in DataStore"]
WriteMethod["write(key, payload)"]
BatchWrite["batch_write(entries)"]
KeyIndexerHashMap["HashMap&lt;u64, u64, Xxh3BuildHasher&gt;"]
end
    
 
   WriteMethod -->|calls| ComputeHash
    BatchWrite -->|calls| ComputeHashBatch
    WriteMethod -->|calls| ComputeChecksum
    KeyIndexerHashMap -->|uses hasher| Xxh3BuildHasher

    ComputeHash -->|produces| KeyHash["key_hash: u64"]
    ComputeHashBatch -->|produces| KeyHashes["Vec&lt;u64&gt;"]

Hash Function API

The digest module exports the following functions used throughout the codebase:

| Function | Signature | Implementation | Use Case |
|----------|-----------|----------------|----------|
| compute_hash | fn(key: &[u8]) -> u64 | Wraps xxhash_rust::xxh3::xxh3_64 | Single key hashing |
| compute_hash_batch | fn(keys: &[&[u8]]) -> Vec<u64> | Parallel iterator over keys | Batch write operations |
| compute_checksum | fn(payload: &[u8]) -> [u8; 4] | CRC32C from crc32fast | Payload integrity checks |
| Xxh3BuildHasher | struct implementing BuildHasher | Custom hasher for HashMap | KeyIndexer HashMap hasher |

The compute_hash_batch function leverages Rayon for parallel hashing when processing multiple keys simultaneously (src/storage_engine/data_store.rs:839-842).

graph LR
    KeyInput["Input: &[u8]"]
subgraph XXH3Crate["xxhash-rust crate"]
CPUDetect["Runtime CPU\nFeature Detection"]
subgraph x86Implementation["x86_64 Implementation"]
SSE2Path["SSE2 Path\n(always available)"]
AVX2Path["AVX2 Path\n(if cpuid detects)"]
end
        
        subgraph ARMImplementation["aarch64 Implementation"]
NEONPath["NEON Path\n(default on ARM)"]
end
        
        subgraph FallbackImplementation["Fallback"]
ScalarPath["Scalar Operations"]
end
    end
    
    Output["Output: u64 hash"]
    KeyInput --> CPUDetect
    CPUDetect --> SSE2Path
    CPUDetect --> AVX2Path
    CPUDetect --> NEONPath
    CPUDetect --> ScalarPath

    SSE2Path --> Output
    AVX2Path --> Output
    NEONPath --> Output
    ScalarPath --> Output

Hardware Acceleration

The XXH3_64 implementation automatically detects and utilizes CPU-specific SIMD instructions at runtime:

| Platform | Default SIMD | Optional Features | Detection Method |
|----------|--------------|-------------------|------------------|
| x86_64 | SSE2 (baseline) | AVX2 | Runtime cpuid instruction |
| aarch64 | NEON (always) | None | Compile-time default |
| Other | Scalar fallback | None | Compile-time detection |

Hardware acceleration is transparent to application code: the compute_hash function signature remains the same regardless of which SIMD path is taken (README.md:160-165).

Sources: src/storage_engine/digest.rs src/storage_engine/data_store.rs:2-4 src/storage_engine/key_indexer.rs:2 README.md:158-168


Tag-Based Collision Detection

While XXH3_64 produces high-quality 64-bit hashes, the system implements an additional collision detection layer using 16-bit tags. The tag is derived from the upper 16 bits of the key hash (src/storage_engine/key_indexer.rs:64-66).

Tag Computation Methods

Two methods generate tags for collision detection:

| Method | Signature | Source | Usage Context |
|--------|-----------|--------|---------------|
| tag_from_hash | fn(key_hash: u64) -> u16 | src/storage_engine/key_indexer.rs:64-66 | When hash is known |
| tag_from_key | fn(key: &[u8]) -> u16 | src/storage_engine/key_indexer.rs:69-72 | Direct from key bytes |

The tag_from_hash function extracts the tag: (key_hash >> (64 - TAG_BITS)) as u16

The tag_from_key function computes: tag_from_hash(compute_hash(key))

graph TB
    subgraph WriteFlow["Write Operation Flow"]
Key["key: &[u8]"]
ComputeHash["compute_hash(key)"]
KeyHash["key_hash: u64"]
TagFromHash["KeyIndexer::tag_from_hash(key_hash)"]
Tag["tag: u16"]
WriteData["Write payload + metadata to file"]
Offset["metadata_offset: u64"]
Pack["KeyIndexer::pack(tag, offset)"]
PackedValue["packed: u64"]
Insert["key_indexer.insert(key_hash, offset)"]
end
    
 
   Key --> ComputeHash
    ComputeHash --> KeyHash
    KeyHash --> TagFromHash
    TagFromHash --> Tag
    WriteData --> Offset
    Tag --> Pack
    Offset --> Pack
    Pack --> PackedValue
    KeyHash --> Insert
    PackedValue --> Insert

Write-Time Tag Storage

During write operations, the tag is packed with the offset before insertion into the index.

The insert method in KeyIndexer (src/storage_engine/key_indexer.rs:135-160) performs collision detection at write time by verifying that the new tag matches any existing tag for the same key hash.

Read-Time Tag Verification

The read_entry_with_context method (src/storage_engine/data_store.rs:501-565) implements tag verification during reads; the verification logic lives at src/storage_engine/data_store.rs:513-521.
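A minimal sketch of the documented read-time check (hypothetical helper names; DefaultHasher stands in for XXH3_64, which the real code uses):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

const TAG_BITS: u32 = 16;
const OFFSET_MASK: u64 = (1u64 << 48) - 1;

// Stand-in for the XXH3_64-based compute_hash.
fn compute_hash(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

fn tag_from_hash(key_hash: u64) -> u16 {
    (key_hash >> (64 - TAG_BITS)) as u16
}

fn unpack(packed: u64) -> (u16, u64) {
    (tag_from_hash(packed), packed & OFFSET_MASK)
}

// Hypothetical lookup mirroring the documented flow: recompute the tag from
// the caller's key and compare it with the stored tag before trusting the offset.
fn verified_offset(index: &HashMap<u64, u64>, key: &[u8]) -> Option<u64> {
    let key_hash = compute_hash(key);
    let packed = *index.get(&key_hash)?;
    let (stored_tag, offset) = unpack(packed);
    if stored_tag != tag_from_hash(key_hash) {
        // Tag mismatch: treat as a suspected collision and report "not found".
        return None;
    }
    Some(offset)
}

fn main() {
    let mut index = HashMap::new();
    let h = compute_hash(b"alpha");
    let packed = ((tag_from_hash(h) as u64) << (64 - TAG_BITS)) | 42;
    index.insert(h, packed);
    assert_eq!(verified_offset(&index, b"alpha"), Some(42));
    assert_eq!(verified_offset(&index, b"missing"), None);
}
```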

Collision Probability Analysis

The dual-layer verification provides strong collision resistance:

| Layer | Bits | Collision Probability | Description |
|-------|------|-----------------------|-------------|
| XXH3_64 Hash | 64 | ~2^-64 | Primary hash collision |
| Tag Verification | 16 | ~2^-16 | Secondary tag collision given hash collision |
| Combined | 80 | ~2^-80 | Both hash and tag must collide |

With 2^16 = 65,536 possible tag values, the tag check provides sufficient discrimination for practical workloads. The KeyIndexer documentation (src/storage_engine/key_indexer.rs:20-56) notes that the 64-bit hash space can distinguish over 4 billion keys before reaching ~50% probability of any collision (the birthday bound, √2^64 = 2^32).

Sources: src/storage_engine/key_indexer.rs:9-72 src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:501-565

Write-Time Collision Rejection

The KeyIndexer::insert method (src/storage_engine/key_indexer.rs:135-160) enforces collision detection at write time.

If a collision is detected (same hash, different tag), the write operation fails with an error (src/storage_engine/data_store.rs:245-251). This prevents index corruption and ensures data integrity.

Sources: src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:238-252


Index Building and Maintenance

Index Construction on Open

When DataStore::open (src/storage_engine/data_store.rs:84-117) is called, the KeyIndexer is constructed by the static build method (src/storage_engine/key_indexer.rs:98-124), which scans backward through the validated storage file.

The backward scan ensures only the most recent version of each key is indexed. Keys seen earlier in the scan (which represent newer entries) are added to the seen HashSet so that older versions are skipped (src/storage_engine/key_indexer.rs:108-111).
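The newest-wins backward scan can be illustrated with a simplified model (hypothetical: entries are given as (key_hash, offset) pairs ordered newest-first, whereas the real build walks the mmap):

```rust
use std::collections::{HashMap, HashSet};

/// Index only the newest version of each key: the first occurrence seen
/// while scanning newest-to-oldest wins; older duplicates are skipped.
fn build(entries_newest_first: &[(u64, u64)]) -> HashMap<u64, u64> {
    let mut index = HashMap::new();
    let mut seen = HashSet::new();
    for &(key_hash, offset) in entries_newest_first {
        if seen.insert(key_hash) {
            // First sighting == newest version of this key.
            index.insert(key_hash, offset);
        }
    }
    index
}

fn main() {
    // Key 0xAA was written twice; offset 300 is the newer version.
    let entries = [(0xAA, 300), (0xBB, 200), (0xAA, 100)];
    let index = build(&entries);
    assert_eq!(index.get(&0xAA), Some(&300));
    assert_eq!(index.get(&0xBB), Some(&200));
}
```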

Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/key_indexer.rs:98-124

Index Updates During Writes

After each write operation, the reindex method (src/storage_engine/data_store.rs:224-259) updates the in-memory index with the new key mappings.

Critical: The file must be flushed before calling reindex (src/storage_engine/data_store.rs:814) to ensure newly written data is visible in the new memory-mapped view. The flush guarantees that the OS has persisted the data to disk before the mmap is recreated.

The key_indexer.insert call may return an error if a hash collision is detected at write time (src/storage_engine/data_store.rs:246-250). In this case, the entire batch operation is aborted to prevent an inconsistent index state.

Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:818-824


Concurrent Access and Locking

The KeyIndexer is protected by an Arc<RwLock<KeyIndexer>> wrapper (src/storage_engine/data_store.rs:31), enabling multiple concurrent readers while ensuring exclusive access for writers.

graph TB
    subgraph DataStoreField["DataStore struct field"]
KeyIndexerArc["key_indexer: Arc&lt;RwLock&lt;KeyIndexer&gt;&gt;"]
end
    
    subgraph ReadOperations["Read Operations (Shared Lock)"]
ReadOp["read(key)"]
BatchReadOp["batch_read(keys)"]
IterEntries["iter_entries()"]
ParIterEntries["par_iter_entries() (parallel feature)"]
end
    
    subgraph WriteOperations["Write Operations (Exclusive Lock)"]
WriteOp["write(key, payload)"]
BatchWriteOp["batch_write(entries)"]
DeleteOp["delete(key)"]
ReindexOp["reindex() (internal)"]
end
    
    subgraph LockAcquisition["Lock Acquisition"]
ReadLock["key_indexer.read().unwrap()"]
WriteLock["key_indexer.write().map_err(...)"]
end
    
 
   ReadOp --> ReadLock
    BatchReadOp --> ReadLock
    IterEntries --> ReadLock
    ParIterEntries --> ReadLock

    WriteOp --> WriteLock
    BatchWriteOp --> WriteLock
    DeleteOp --> WriteLock
    ReindexOp --> WriteLock

    ReadLock -.-> KeyIndexerArc
    WriteLock -.-> KeyIndexerArc

Lock Granularity

| Operation | Lock on key_indexer | Lock on file | Atomicity |
|-----------|---------------------|--------------|-----------|
| read(key) | Read (shared) | None | Lock-free read |
| batch_read(keys) | Read (shared) | None | Lock-free reads in batch |
| write(key, data) | Write (exclusive) | Write (exclusive) | Full write atomicity |
| batch_write(...) | Write (exclusive) | Write (exclusive) | Batch atomicity |
| delete(key) | Write (exclusive) | Write (exclusive) | Tombstone write atomicity |

The reindex method acquires both the mmap mutex (src/storage_engine/data_store.rs:232) and the key_indexer write lock (src/storage_engine/data_store.rs:233-236) to atomically update both structures after a write operation.

Parallel Iteration: When the parallel feature is enabled, par_iter_entries (src/storage_engine/data_store.rs:296-361) clones all packed values under a read lock, then releases the lock before parallel processing. This allows concurrent reads during parallel iteration.
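The clone-then-release pattern can be sketched like this (simplified: a plain HashMap<u64, u64> stands in for the KeyIndexer):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Copy all packed values under a short-lived read lock, then release the
/// lock so subsequent (possibly parallel) processing doesn't block other
/// readers or writers.
fn snapshot_packed(index: &Arc<RwLock<HashMap<u64, u64>>>) -> Vec<u64> {
    let guard = index.read().unwrap();
    guard.values().copied().collect()
    // guard dropped here; the returned Vec is processed lock-free
}

fn main() {
    let index = Arc::new(RwLock::new(HashMap::from([(1u64, 10u64), (2, 20)])));
    let mut packed = snapshot_packed(&index);
    packed.sort_unstable();
    assert_eq!(packed, vec![10, 20]);
}
```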

Sources: src/storage_engine/data_store.rs:31 src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:296-361


Performance Characteristics

Lookup Performance

The KeyIndexer provides O(1) average-case lookup performance using the HashMap (src/storage_engine/key_indexer.rs:58). The get_packed method (src/storage_engine/key_indexer.rs:163-166) performs a single hash table lookup.

Empirical performance figures from the README (README.md:166-167):

  • 1 million random seeks (8-byte entries): typically < 1 second
  • Hash computation overhead: negligible due to SIMD acceleration
  • Tag verification overhead: minimal (one bit shift + one comparison)

Memory Overhead

Each index entry in the HashMap<u64, u64, Xxh3BuildHasher> consumes:

| Component | Size (bytes) | Description |
|-----------|--------------|-------------|
| Key (u64) | 8 | XXH3 hash of the key |
| Value (u64) | 8 | Packed tag and offset |
| HashMap overhead | ~16-24 | Bucket pointers and metadata |
| Total per entry | 32-40 | Approximate overhead per unique key |

For a dataset with 1 million unique keys , the KeyIndexer occupies approximately 32-40 MB of RAM. This is a small fraction of typical system memory, enabling efficient indexing even for large datasets.
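The per-entry arithmetic above can be checked directly (the ~16-24 byte HashMap overhead is the documentation's estimate, not a measured value):

```rust
use std::mem::size_of;

fn main() {
    // 8-byte hash key + 8-byte packed value per entry
    let per_entry_data = size_of::<u64>() * 2;
    assert_eq!(per_entry_data, 16);

    let n: usize = 1_000_000;
    let low = n * (per_entry_data + 16);  // +16 bytes map overhead
    let high = n * (per_entry_data + 24); // +24 bytes map overhead
    assert_eq!(low, 32_000_000);  // ~32 MB
    assert_eq!(high, 40_000_000); // ~40 MB
}
```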

Batch Operation Performance

The compute_hash_batch function (src/storage_engine/data_store.rs:839-842) leverages Rayon for parallel hashing.

This parallel hashing provides near-linear speedup with CPU core count for large batches, as each key hash is computed independently.
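A stdlib-only sketch of the idea (std::thread::scope and DefaultHasher stand in for Rayon and XXH3_64, which the real code uses):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the XXH3_64-based compute_hash.
fn compute_hash(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

// Hash each key independently across a few threads; results keep input order
// because chunks are joined in sequence.
fn compute_hash_batch(keys: &[&[u8]]) -> Vec<u64> {
    let chunk = ((keys.len() + 3) / 4).max(1);
    std::thread::scope(|s| {
        let handles: Vec<_> = keys
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|&k| compute_hash(k)).collect::<Vec<u64>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let keys: Vec<&[u8]> = vec![&b"a"[..], &b"bb"[..], &b"ccc"[..], &b"dddd"[..]];
    let parallel = compute_hash_batch(&keys);
    let sequential: Vec<u64> = keys.iter().map(|&k| compute_hash(k)).collect();
    assert_eq!(parallel, sequential);
}
```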

Hardware Acceleration Impact

SIMD acceleration in XXH3_64 provides measurable performance improvements for hash-intensive workloads:

| Platform | SIMD Instructions | Relative Performance | Speedup vs Scalar |
|----------|-------------------|----------------------|-------------------|
| x86_64 | SSE2 | ~2-3x faster | Baseline |
| x86_64 | AVX2 | ~3-4x faster | 1.5x over SSE2 |
| aarch64 | NEON | ~2-3x faster | Baseline |
| Fallback | Scalar | 1x (baseline) | N/A |

Performance gains are most significant for:

  • batch_write operations with many keys
  • compute_hash_batch calls processing large key sets
  • Workloads with small payload sizes where hashing dominates

Sources: src/storage_engine/key_indexer.rs:56-59 src/storage_engine/data_store.rs:838-843 README.md:160-167


Error Handling and Collision Management

Collision Probability Analysis

The dual-layer verification system (64-bit hash + 16-bit tag) provides strong collision resistance as documented in src/storage_engine/key_indexer.rs:17-56:

| Verification Layer | Bits | Probability | Description |
|--------------------|------|-------------|-------------|
| XXH3_64 Hash Collision | 64 | 2^-64 | Two different keys produce same hash |
| Tag Collision (given hash collision) | 16 | 2^-16 | Tags match despite different keys |
| Combined Collision | 80 | 2^-80 | Both hash and tag must collide simultaneously |

The 16-bit tag provides 65,536 distinct values. The 4-billion-key figure comes from the birthday bound on the 64-bit hash: √2^64 = 2^32 ≈ 4.3 billion keys can be stored before reaching ~50% probability of any hash collision (src/storage_engine/key_indexer.rs:48-49).

Write-Time Collision Handling

The KeyIndexer::insert method (src/storage_engine/key_indexer.rs:135-160) enforces strict collision detection during writes: if a tag mismatch occurs, the insert returns Err.

The reindex method propagates this error and aborts the entire batch operation (src/storage_engine/data_store.rs:246-250).
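The documented fail-fast behavior can be sketched as follows (a simplified model with hypothetical error strings, not the crate's actual insert):

```rust
use std::collections::HashMap;

const TAG_BITS: u32 = 16;
const OFFSET_MASK: u64 = (1u64 << 48) - 1;

struct KeyIndexer {
    index: HashMap<u64, u64>,
}

impl KeyIndexer {
    /// Sketch of the documented write-time check: the incoming tag must match
    /// any tag already stored for the same key hash, otherwise the write fails.
    fn insert(&mut self, key_hash: u64, offset: u64) -> Result<(), String> {
        let tag = (key_hash >> (64 - TAG_BITS)) as u16;
        if let Some(&existing) = self.index.get(&key_hash) {
            let existing_tag = (existing >> (64 - TAG_BITS)) as u16;
            if existing_tag != tag {
                return Err(format!("tag mismatch for hash {key_hash:#x}: suspected collision"));
            }
        }
        let packed = ((tag as u64) << (64 - TAG_BITS)) | (offset & OFFSET_MASK);
        self.index.insert(key_hash, packed);
        Ok(())
    }
}

fn main() {
    let mut idx = KeyIndexer { index: HashMap::new() };
    assert!(idx.insert(0xABCD_0000_0000_1111, 64).is_ok());
    // Simulate a conflicting stored tag to exercise the error path.
    idx.index.insert(0xABCD_0000_0000_2222, 0x1234_0000_0000_0040);
    assert!(idx.insert(0xABCD_0000_0000_2222, 128).is_err());
}
```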

This fail-fast approach ensures:

  • No partial writes that could corrupt the index
  • Deterministic error handling (writes either fully succeed or fully fail)
  • Index consistency is maintained across all operations

Read-Time Collision Handling

The read_entry_with_context method (src/storage_engine/data_store.rs:501-565) detects collisions during reads when the original key is provided for verification (src/storage_engine/data_store.rs:513-521).

When a read-time collision is detected:

  1. A warning is logged to help diagnose the issue
  2. None is returned to the caller (key not found)
  3. The index remains unchanged (reads do not modify state)

Read operations without the original key (e.g., when using pre-hashed keys) cannot perform tag verification and may return incorrect results if a hash collision exists. This is a tradeoff for performance in batch operations.

Sources: src/storage_engine/key_indexer.rs:17-56 src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:238-252 src/storage_engine/data_store.rs:501-565


Summary

The key indexing and hashing system in SIMD R Drive provides:

  1. Fast lookups: O(1) hash-based access to entries
  2. Hardware acceleration: Automatic SIMD optimization on SSE2, AVX2, and NEON platforms
  3. Collision resistance: Dual-layer verification with 64-bit hashes and 16-bit tags
  4. Thread safety: Concurrent reads with exclusive writes via RwLock
  5. Low memory overhead: 16 bytes of index data per unique key (~32-40 bytes including HashMap overhead)

This design enables efficient storage operations even for datasets with millions of entries, while maintaining data integrity through robust collision detection.

Sources: src/storage_engine/data_store.rs:1-1183 README.md:158-168
