This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Key Indexing and Hashing
Relevant source files
- README.md
- experiments/bindings/python-ws-client/pyproject.toml
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/__init__.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.py
- experiments/bindings/python-ws-client/simd_r_drive_ws_client/data_store_ws_client.pyi
- src/storage_engine/data_store.rs
- src/storage_engine/key_indexer.rs
Purpose and Scope
This page documents the key indexing system and hashing mechanisms used in SIMD R Drive’s storage engine. It covers the KeyIndexer data structure, the XXH3 hashing algorithm, tag-based collision detection, and hardware acceleration features.
For information about how the index is accessed in concurrent operations, see Concurrency and Thread Safety. For details on how metadata is stored alongside payloads, see Entry Structure and Metadata.
Overview
The SIMD R Drive storage engine maintains an in-memory index that maps key hashes to file offsets, enabling O(1) lookup performance for stored entries. This index is critical for avoiding full file scans when retrieving data.
The indexing system consists of three main components:
- KeyIndexer: a concurrent hash map that stores packed values containing both a collision-detection tag and a file offset
- XXH3_64 hashing: a fast, hardware-accelerated hashing algorithm that generates 64-bit hashes from arbitrary keys
- Tag-based verification: a secondary collision-detection mechanism that validates lookups to guard against hash collision errors
Sources: src/storage_engine/data_store.rs:1-33 README.md:158-168
KeyIndexer Structure
The KeyIndexer struct is defined in src/storage_engine/key_indexer.rs:56-59 and manages the in-memory hash index. It wraps a HashMap<u64, u64, Xxh3BuildHasher> where keys are XXH3 key hashes and values are packed 64-bit integers containing both a collision-detection tag and file offset.
graph TB
subgraph DataStore["DataStore struct"]
KeyIndexerField["key_indexer: Arc<RwLock<KeyIndexer>>"]
end
subgraph KeyIndexer["KeyIndexer struct"]
IndexField["index: HashMap<u64, u64, Xxh3BuildHasher>"]
MapKey["Key: u64\n(key_hash from compute_hash)"]
MapValue["Value: u64\n(packed tag / offset)"]
end
subgraph Constants["Key Constants"]
TagBits["TAG_BITS = 16"]
OffsetMask["OFFSET_MASK = (1 << 48) - 1"]
end
subgraph Methods["Public Methods"]
TagFromHash["tag_from_hash(key_hash) -> u16"]
TagFromKey["tag_from_key(key) -> u16"]
Pack["pack(tag, offset) -> u64"]
Unpack["unpack(packed) -> (u16, u64)"]
Build["build(mmap, tail_offset) -> Self"]
Insert["insert(key_hash, offset) -> Result"]
GetPacked["get_packed(&key_hash) -> Option<&u64>"]
GetOffset["get_offset(&key_hash) -> Option<u64>"]
Remove["remove(&key_hash) -> Option<u64>"]
end
KeyIndexerField --> IndexField
IndexField --> MapKey
IndexField --> MapValue
MapValue --> TagBits
MapValue --> OffsetMask
Pack -.uses.-> TagBits
Pack -.uses.-> OffsetMask
Unpack -.uses.-> TagBits
Unpack -.uses.-> OffsetMask
Packed Value Format
The KeyIndexer stores a compact 64-bit packed value for each hash. This value is constructed by the pack function src/storage_engine/key_indexer.rs:79-85 and decoded by unpack src/storage_engine/key_indexer.rs:88-93:
| Bits | Field | Description | Constant Used |
|---|---|---|---|
| 63-48 | Tag (16-bit) | Collision detection tag from upper hash bits | TAG_BITS = 16 |
| 47-0 | Offset (48-bit) | Absolute file offset to entry metadata | OFFSET_MASK = 0xFFFFFFFFFFFF |
The packing formula is: `packed = (tag << (64 - TAG_BITS)) | offset`
The unpacking extracts: `tag = (packed >> (64 - TAG_BITS)) as u16` and `offset = packed & OFFSET_MASK`
Maximum Addressable File Size: The 48-bit offset field supports files up to 256 TiB (2^48 bytes). Attempting to use larger offsets will panic in debug builds due to debug_assert! checks src/storage_engine/key_indexer.rs:80-83.
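The packing arithmetic above can be sketched directly. The constants mirror those named in the text; the function bodies are illustrative rather than the crate's exact source.

```rust
/// Constants as described above: a 16-bit tag in the top bits,
/// a 48-bit offset in the low bits.
const TAG_BITS: u32 = 16;
const OFFSET_MASK: u64 = (1u64 << 48) - 1;

/// Pack a 16-bit collision tag and a 48-bit file offset into one u64.
fn pack(tag: u16, offset: u64) -> u64 {
    debug_assert!(offset <= OFFSET_MASK, "offset exceeds 48 bits");
    ((tag as u64) << (64 - TAG_BITS)) | (offset & OFFSET_MASK)
}

/// Recover the (tag, offset) pair from a packed value.
fn unpack(packed: u64) -> (u16, u64) {
    ((packed >> (64 - TAG_BITS)) as u16, packed & OFFSET_MASK)
}

fn main() {
    // Round-trip: packing then unpacking preserves both fields.
    let packed = pack(0xBEEF, 123_456);
    assert_eq!(unpack(packed), (0xBEEF, 123_456));
}
```

Because the tag occupies exactly the bits the offset cannot use, one `u64` per key suffices, which keeps the per-entry index footprint small.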
Sources: src/storage_engine/key_indexer.rs:9-15 src/storage_engine/key_indexer.rs:56-59 src/storage_engine/key_indexer.rs:79-93 src/storage_engine/data_store.rs:31
Hashing Algorithm: XXH3_64
SIMD R Drive uses the XXH3_64 hashing algorithm from the xxhash-rust crate src/storage_engine/digest.rs. XXH3 is optimized for speed and provides automatic hardware acceleration through SIMD instructions.
graph TB
subgraph DigestModule["storage_engine::digest module"]
ComputeHash["compute_hash(key: &[u8]) -> u64"]
ComputeHashBatch["compute_hash_batch(keys: &[&[u8]]) -> Vec<u64>"]
ComputeChecksum["compute_checksum(payload: &[u8]) -> [u8; 4]"]
Xxh3BuildHasher["Xxh3BuildHasher struct"]
end
subgraph Callers["Usage in DataStore"]
WriteMethod["write(key, payload)"]
BatchWrite["batch_write(entries)"]
KeyIndexerHashMap["HashMap<u64, u64, Xxh3BuildHasher>"]
end
WriteMethod -->|calls| ComputeHash
BatchWrite -->|calls| ComputeHashBatch
WriteMethod -->|calls| ComputeChecksum
KeyIndexerHashMap -->|uses hasher| Xxh3BuildHasher
ComputeHash -->|produces| KeyHash["key_hash: u64"]
ComputeHashBatch -->|produces| KeyHashes["Vec<u64>"]
Hash Function API
The digest module exports the following functions used throughout the codebase:
| Function | Signature | Implementation | Use Case |
|---|---|---|---|
| compute_hash | fn(key: &[u8]) -> u64 | Wraps xxhash_rust::xxh3::xxh3_64 | Single key hashing |
| compute_hash_batch | fn(keys: &[&[u8]]) -> Vec<u64> | Parallel iterator over keys | Batch write operations |
| compute_checksum | fn(payload: &[u8]) -> [u8; 4] | CRC32C from crc32fast | Payload integrity checks |
| Xxh3BuildHasher | struct implementing BuildHasher | Custom hasher for HashMap | KeyIndexer HashMap hasher |
The compute_hash_batch function leverages Rayon for parallel hashing when processing multiple keys simultaneously src/storage_engine/data_store.rs:839-842.
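The idea behind batch hashing can be sketched without external crates. Note the substitutions: std's SipHash (via DefaultHasher) stands in for XXH3_64, and chunked scoped threads stand in for Rayon's parallel iterators; the point is only that each key hash is independent, so the work splits cleanly.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

/// Stand-in for compute_hash: the crate wraps XXH3_64; std's SipHash
/// is used here only so the sketch compiles without external crates.
fn compute_hash(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(key);
    h.finish()
}

/// Batch hashing across threads. The real compute_hash_batch uses
/// Rayon; chunked scoped threads illustrate the same independence.
fn compute_hash_batch(keys: &[&[u8]]) -> Vec<u64> {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk = keys.len().div_ceil(workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = keys
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|k| compute_hash(k)).collect::<Vec<u64>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let keys: [&[u8]; 3] = [b"alpha", b"beta", b"gamma"];
    let parallel = compute_hash_batch(&keys);
    let serial: Vec<u64> = keys.iter().map(|k| compute_hash(k)).collect();
    // Parallel and serial hashing agree; only the scheduling differs.
    assert_eq!(parallel, serial);
}
```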
graph LR
KeyInput["Input: &[u8]"]
subgraph XXH3Crate["xxhash-rust crate"]
CPUDetect["Runtime CPU\nFeature Detection"]
subgraph x86Implementation["x86_64 Implementation"]
SSE2Path["SSE2 Path\n(always available)"]
AVX2Path["AVX2 Path\n(if cpuid detects)"]
end
subgraph ARMImplementation["aarch64 Implementation"]
NEONPath["NEON Path\n(default on ARM)"]
end
subgraph FallbackImplementation["Fallback"]
ScalarPath["Scalar Operations"]
end
end
Output["Output: u64 hash"]
KeyInput --> CPUDetect
CPUDetect --> SSE2Path
CPUDetect --> AVX2Path
CPUDetect --> NEONPath
CPUDetect --> ScalarPath
SSE2Path --> Output
AVX2Path --> Output
NEONPath --> Output
ScalarPath --> Output
Hardware Acceleration
The XXH3_64 implementation automatically detects and utilizes CPU-specific SIMD instructions at runtime:
| Platform | Default SIMD | Optional Features | Detection Method |
|---|---|---|---|
| x86_64 | SSE2 (baseline) | AVX2 | Runtime cpuid instruction |
| aarch64 | NEON (always) | None | Compile-time default |
| Other | Scalar fallback | None | Compile-time detection |
The hardware acceleration is transparent to the application code: the compute_hash function signature remains the same regardless of which SIMD path is taken README.md:160-165.
Sources: src/storage_engine/digest.rs src/storage_engine/data_store.rs:2-4 src/storage_engine/key_indexer.rs:2 README.md:158-168
Tag-Based Collision Detection
While XXH3_64 produces high-quality 64-bit hashes, the system implements an additional collision detection layer using 16-bit tags. The tag is derived from the upper 16 bits of the key hash src/storage_engine/key_indexer.rs:64-66.
Tag Computation Methods
Two methods generate tags for collision detection:
| Method | Signature | Source | Usage Context |
|---|---|---|---|
| tag_from_hash | fn(key_hash: u64) -> u16 | src/storage_engine/key_indexer.rs:64-66 | When the hash is already known |
| tag_from_key | fn(key: &[u8]) -> u16 | src/storage_engine/key_indexer.rs:69-72 | Directly from key bytes |
The tag_from_hash function extracts the tag: `(key_hash >> (64 - TAG_BITS)) as u16`
The tag_from_key function computes: `tag_from_hash(compute_hash(key))`
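As a quick check of the extraction formula above (a minimal sketch; the constant name matches the text, and tag_from_key is the same computation applied to compute_hash(key)):

```rust
const TAG_BITS: u32 = 16;

/// Extract the 16-bit collision tag from the top bits of a key hash,
/// per the formula above.
fn tag_from_hash(key_hash: u64) -> u16 {
    (key_hash >> (64 - TAG_BITS)) as u16
}

fn main() {
    // The tag is simply the most significant 16 bits of the hash.
    assert_eq!(tag_from_hash(0xABCD_1234_5678_9ABC_u64), 0xABCD);
    assert_eq!(tag_from_hash(u64::MAX), 0xFFFF);
}
```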
graph TB
subgraph WriteFlow["Write Operation Flow"]
Key["key: &[u8]"]
ComputeHash["compute_hash(key)"]
KeyHash["key_hash: u64"]
TagFromHash["KeyIndexer::tag_from_hash(key_hash)"]
Tag["tag: u16"]
WriteData["Write payload + metadata to file"]
Offset["metadata_offset: u64"]
Pack["KeyIndexer::pack(tag, offset)"]
PackedValue["packed: u64"]
Insert["key_indexer.insert(key_hash, offset)"]
end
Key --> ComputeHash
ComputeHash --> KeyHash
KeyHash --> TagFromHash
TagFromHash --> Tag
WriteData --> Offset
Tag --> Pack
Offset --> Pack
Pack --> PackedValue
KeyHash --> Insert
PackedValue --> Insert
Write-Time Tag Storage
During write operations, the tag is packed with the offset before insertion into the index.
The insert method in KeyIndexer src/storage_engine/key_indexer.rs:135-160 performs collision detection at write time by verifying that the new tag matches any existing tag for the same key hash.
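The shape of that write-time check can be sketched as follows. This is a simplified stand-in: the real insert signature and map hasher differ (see src/storage_engine/key_indexer.rs), and the tag is passed in explicitly here for illustration.

```rust
use std::collections::HashMap;

const TAG_BITS: u32 = 16;
const OFFSET_MASK: u64 = (1u64 << 48) - 1;

/// Minimal stand-in for KeyIndexer; the real struct uses an
/// Xxh3BuildHasher-backed map and derives the tag internally.
struct KeyIndexer {
    index: HashMap<u64, u64>,
}

impl KeyIndexer {
    /// Insert a (tag, offset) pair for key_hash, rejecting the write
    /// if an existing entry for the same hash carries a different tag
    /// (a detected collision), as the text describes.
    fn insert(&mut self, key_hash: u64, tag: u16, offset: u64) -> Result<(), String> {
        if let Some(&existing) = self.index.get(&key_hash) {
            let existing_tag = (existing >> (64 - TAG_BITS)) as u16;
            if existing_tag != tag {
                return Err(format!("tag mismatch for hash {key_hash:#x}"));
            }
        }
        self.index
            .insert(key_hash, ((tag as u64) << (64 - TAG_BITS)) | (offset & OFFSET_MASK));
        Ok(())
    }
}

fn main() {
    let mut idx = KeyIndexer { index: HashMap::new() };
    assert!(idx.insert(1, 0xAAAA, 100).is_ok()); // first write succeeds
    assert!(idx.insert(1, 0xAAAA, 200).is_ok()); // same-tag update succeeds
    assert!(idx.insert(1, 0xBBBB, 300).is_err()); // tag mismatch is rejected
}
```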
Read-Time Tag Verification
The read_entry_with_context method src/storage_engine/data_store.rs:501-565 implements tag verification during reads; the verification logic lives at src/storage_engine/data_store.rs:513-521.
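The read-side check has the mirrored shape. A sketch under the same simplifications as above (plain HashMap, tag supplied by the caller); the actual method additionally reads and validates the entry itself.

```rust
use std::collections::HashMap;

const TAG_BITS: u32 = 16;
const OFFSET_MASK: u64 = (1u64 << 48) - 1;

/// Look up key_hash, unpack the stored tag, and compare it with the
/// tag recomputed from the caller's key. On mismatch the lookup is
/// treated as "not found", per the text.
fn verified_offset(index: &HashMap<u64, u64>, key_hash: u64, expected_tag: u16) -> Option<u64> {
    let packed = *index.get(&key_hash)?;
    let stored_tag = (packed >> (64 - TAG_BITS)) as u16;
    if stored_tag != expected_tag {
        return None; // possible collision: reject rather than misread
    }
    Some(packed & OFFSET_MASK)
}

fn main() {
    let mut index = HashMap::new();
    index.insert(7u64, (0xAAAA_u64 << (64 - TAG_BITS)) | 42);
    assert_eq!(verified_offset(&index, 7, 0xAAAA), Some(42));
    assert_eq!(verified_offset(&index, 7, 0xBBBB), None); // tag mismatch
    assert_eq!(verified_offset(&index, 8, 0xAAAA), None); // absent hash
}
```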
Collision Probability Analysis
The dual-layer verification provides strong collision resistance:
| Layer | Bits | Collision Probability | Description |
|---|---|---|---|
| XXH3_64 Hash | 64 | ~2^-64 | Primary hash collision |
| Tag Verification | 16 | ~2^-16 | Secondary tag collision given hash collision |
| Combined | 80 | ~2^-80 | Both hash and tag must collide |
With 2^16 = 65,536 possible tag values, the tag check provides sufficient discrimination for practical workloads. The KeyIndexer documentation src/storage_engine/key_indexer.rs:20-56 notes that the design can distinguish over 4 billion keys before the 64-bit hash layer reaches ~50% collision probability (the birthday bound, roughly 2^32 keys).
Sources: src/storage_engine/key_indexer.rs:9-72 src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:501-565
Write-Time Collision Rejection
The KeyIndexer::insert method src/storage_engine/key_indexer.rs:135-160 enforces collision detection at write time:
If a collision is detected (same hash, different tag), the write operation fails with an error src/storage_engine/data_store.rs:245-251. This prevents index corruption and ensures data integrity.
Sources: src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:238-252
Index Building and Maintenance
Index Construction on Open
When DataStore::open src/storage_engine/data_store.rs:84-117 is called, the KeyIndexer is constructed by the static build method src/storage_engine/key_indexer.rs:98-124, which scans backward through the validated storage file. The backward scan ensures only the most recent version of each key is indexed: hashes seen earlier in the scan (which represent newer entries) are added to the seen HashSet so that older versions are skipped src/storage_engine/key_indexer.rs:108-111.
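The newest-wins scan can be sketched with an in-memory stand-in, where the input slice stands for the (key_hash, offset) pairs recovered while walking the file from the tail:

```rust
use std::collections::{HashMap, HashSet};

/// Build an index from entries visited newest-first: the first
/// occurrence of each hash wins, and older versions of the same key
/// are skipped via the `seen` set, mirroring the behavior described
/// above.
fn build_index(entries_newest_first: &[(u64, u64)]) -> HashMap<u64, u64> {
    let mut seen = HashSet::new();
    let mut index = HashMap::new();
    for &(key_hash, offset) in entries_newest_first {
        if seen.insert(key_hash) {
            index.insert(key_hash, offset);
        }
    }
    index
}

fn main() {
    // Hash 1 appears twice; the first (newest) offset is kept.
    let idx = build_index(&[(1, 900), (2, 400), (1, 100)]);
    assert_eq!(idx.get(&1), Some(&900));
    assert_eq!(idx.get(&2), Some(&400));
    assert_eq!(idx.len(), 2);
}
```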
Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/key_indexer.rs:98-124
Index Updates During Writes
After each write operation, the reindex method src/storage_engine/data_store.rs:224-259 updates the in-memory index with new key mappings.
Critical: The file must be flushed before calling reindex src/storage_engine/data_store.rs:814 to ensure newly written data is visible in the new memory-mapped view. The flush guarantees that the OS has persisted the data to disk before the mmap is recreated.
The key_indexer.insert call may return an error if a hash collision is detected at write time src/storage_engine/data_store.rs:246-250. In this case, the entire batch operation is aborted to prevent an inconsistent index state.
Sources: src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:818-824
Concurrent Access and Locking
The KeyIndexer is protected by an Arc<RwLock<KeyIndexer>> wrapper src/storage_engine/data_store.rs:31, enabling multiple concurrent readers while ensuring exclusive access for writers.
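The locking pattern can be illustrated with std primitives, using a plain HashMap in place of KeyIndexer:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    // Shared index behind Arc<RwLock<..>>, as the DataStore field does.
    let index = Arc::new(RwLock::new(HashMap::<u64, u64>::new()));

    // Writer path: exclusive lock while mutating.
    index.write().unwrap().insert(42, 0x1000);

    // Reader path: many threads can hold the shared lock at once.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let index = Arc::clone(&index);
            thread::spawn(move || *index.read().unwrap().get(&42).unwrap())
        })
        .collect();
    for r in readers {
        assert_eq!(r.join().unwrap(), 0x1000);
    }
}
```

Readers never block each other; a writer waits until all read guards are dropped, which is why write-side critical sections are kept short.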
graph TB
subgraph DataStoreField["DataStore struct field"]
KeyIndexerArc["key_indexer: Arc<RwLock<KeyIndexer>>"]
end
subgraph ReadOperations["Read Operations (Shared Lock)"]
ReadOp["read(key)"]
BatchReadOp["batch_read(keys)"]
IterEntries["iter_entries()"]
ParIterEntries["par_iter_entries() (parallel feature)"]
end
subgraph WriteOperations["Write Operations (Exclusive Lock)"]
WriteOp["write(key, payload)"]
BatchWriteOp["batch_write(entries)"]
DeleteOp["delete(key)"]
ReindexOp["reindex() (internal)"]
end
subgraph LockAcquisition["Lock Acquisition"]
ReadLock["key_indexer.read().unwrap()"]
WriteLock["key_indexer.write().map_err(...)"]
end
ReadOp --> ReadLock
BatchReadOp --> ReadLock
IterEntries --> ReadLock
ParIterEntries --> ReadLock
WriteOp --> WriteLock
BatchWriteOp --> WriteLock
DeleteOp --> WriteLock
ReindexOp --> WriteLock
ReadLock -.-> KeyIndexerArc
WriteLock -.-> KeyIndexerArc
Lock Granularity
| Operation | Lock on key_indexer | Lock on file | Atomicity |
|---|---|---|---|
| read(key) | Read (shared) | None | Lock-free read |
| batch_read(keys) | Read (shared) | None | Lock-free reads in batch |
| write(key, data) | Write (exclusive) | Write (exclusive) | Full write atomicity |
| batch_write(...) | Write (exclusive) | Write (exclusive) | Batch atomicity |
| delete(key) | Write (exclusive) | Write (exclusive) | Tombstone write atomicity |
The reindex method acquires both the mmap mutex src/storage_engine/data_store.rs:232 and the key_indexer write lock src/storage_engine/data_store.rs:233-236 to atomically update both structures after a write operation.
Parallel Iteration: When the parallel feature is enabled, par_iter_entries src/storage_engine/data_store.rs:296-361 clones all packed values under a read lock, then releases the lock before parallel processing. This allows concurrent reads during parallel iteration.
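The clone-then-release pattern looks like this in miniature (a sketch; the real method snapshots packed values and then fans out via Rayon):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

fn main() {
    let index = RwLock::new(HashMap::from([(1u64, 10u64), (2, 20)]));

    // Snapshot the packed values under a short-lived read lock...
    let snapshot: Vec<(u64, u64)> = {
        let guard = index.read().unwrap();
        guard.iter().map(|(&k, &v)| (k, v)).collect()
    }; // ...the lock is released here.

    // ...then iterate without holding the lock, so concurrent readers
    // (and eventually writers) are not blocked by the iteration.
    let total: u64 = snapshot.iter().map(|&(_, v)| v).sum();
    assert_eq!(total, 30);
}
```

The cost is one copy of the (hash, packed) pairs; the benefit is that lock hold time is independent of how long the iteration body runs.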
Sources: src/storage_engine/data_store.rs:31 src/storage_engine/data_store.rs:224-259 src/storage_engine/data_store.rs:296-361
Performance Characteristics
Lookup Performance
The KeyIndexer provides O(1) average-case lookup performance using the HashMap src/storage_engine/key_indexer.rs:58. The get_packed method src/storage_engine/key_indexer.rs:163-166 performs a single hash-table lookup.
Empirical performance from README README.md:166-167:
- 1 million random seeks (8-byte entries): typically < 1 second
- Hash computation overhead: negligible due to SIMD acceleration
- Tag verification overhead: minimal (one bit shift + one comparison)
Memory Overhead
Each index entry in the HashMap<u64, u64, Xxh3BuildHasher> consumes:
| Component | Size (bytes) | Description |
|---|---|---|
| Key (u64) | 8 | XXH3 hash of the key |
| Value (u64) | 8 | Packed (tag, offset) value |
| HashMap overhead | ~16-24 | Bucket pointers and metadata |
| Total per entry | ~32-40 | Approximate overhead per unique key |
For a dataset with 1 million unique keys , the KeyIndexer occupies approximately 32-40 MB of RAM. This is a small fraction of typical system memory, enabling efficient indexing even for large datasets.
Batch Operation Performance
The compute_hash_batch function src/storage_engine/data_store.rs:839-842 leverages Rayon for parallel hashing. Because each key hash is computed independently, this provides near-linear speedup with CPU core count for large batches.
Hardware Acceleration Impact
SIMD acceleration in XXH3_64 provides measurable performance improvements for hash-intensive workloads:
| Platform | SIMD Instructions | Speedup vs Scalar |
|---|---|---|
| x86_64 | SSE2 | ~2-3x |
| x86_64 | AVX2 | ~3-4x (~1.5x over SSE2) |
| aarch64 | NEON | ~2-3x |
| Fallback | Scalar | 1x (baseline) |
Performance gains are most significant for:
- batch_write operations with many keys
- compute_hash_batch calls processing large key sets
- Workloads with small payload sizes where hashing dominates
Sources: src/storage_engine/key_indexer.rs:56-59 src/storage_engine/data_store.rs:838-843 README.md:160-167
Error Handling and Collision Management
Collision Probability Analysis
The dual-layer verification system (64-bit hash + 16-bit tag) provides strong collision resistance as documented in src/storage_engine/key_indexer.rs:17-56:
| Verification Layer | Bits | Probability | Description |
|---|---|---|---|
| XXH3_64 Hash Collision | 64 | 2^-64 | Two different keys produce same hash |
| Tag Collision (given hash collision) | 16 | 2^-16 | Tags match despite different keys |
| Combined Collision | 80 | 2^-80 | Both hash and tag must collide simultaneously |
The 16-bit tag provides 65,536 distinct values. The 4-billion-key figure comes from the birthday bound on the 64-bit hash: ~50% collision probability is reached only after roughly 2^32 keys src/storage_engine/key_indexer.rs:48-49.
Write-Time Collision Handling
The KeyIndexer::insert method src/storage_engine/key_indexer.rs:135-160 enforces strict collision detection during writes: if a tag mismatch occurs, the insert returns Err. The reindex method propagates this error and aborts the entire batch operation src/storage_engine/data_store.rs:246-250.
This fail-fast approach ensures:
- No partial writes that could corrupt the index
- Deterministic error handling (writes either fully succeed or fully fail)
- Index consistency is maintained across all operations
Read-Time Collision Handling
The read_entry_with_context method src/storage_engine/data_store.rs:501-565 detects collisions during reads when the original key is provided for verification src/storage_engine/data_store.rs:513-521.
When a read-time collision is detected:
- A warning is logged to help diagnose the issue
- None is returned to the caller (key not found)
- The index remains unchanged (reads do not modify state)
Read operations without the original key (e.g., when using pre-hashed keys) cannot perform tag verification and may return incorrect results if a hash collision exists. This is a tradeoff for performance in batch operations.
Sources: src/storage_engine/key_indexer.rs:17-56 src/storage_engine/key_indexer.rs:135-160 src/storage_engine/data_store.rs:238-252 src/storage_engine/data_store.rs:501-565
Summary
The key indexing and hashing system in SIMD R Drive provides:
- Fast lookups: O(1) hash-based access to entries
- Hardware acceleration: automatic SIMD optimization on SSE2, AVX2, and NEON platforms
- Collision resistance: dual-layer verification with 64-bit hashes and 16-bit tags
- Thread safety: concurrent reads with exclusive writes via RwLock
- Low memory overhead: 16 bytes of packed index data per unique key (~32-40 bytes including HashMap overhead)
This design enables efficient storage operations even for datasets with millions of entries, while maintaining data integrity through robust collision detection.
Sources: src/storage_engine/data_store.rs:1-1183 README.md:158-168