
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

DataStore API

This page documents the public API of the DataStore struct and its associated traits DataStoreReader and DataStoreWriter. These interfaces provide the primary methods for interacting with the storage engine, including write, read, delete, batch, and streaming operations.

Scope : This page covers the application-level API methods available to users of the storage engine. For details on the underlying storage format, see Entry Structure and Metadata. For implementation details of concurrency mechanisms, see Concurrency and Thread Safety. For key hashing internals, see Key Indexing and Hashing.


API Architecture

The DataStore API is organized around a core DataStore struct with two trait-based interfaces:

Sources : src/storage_engine/data_store.rs:26-33, src/storage_engine/traits.rs, src/storage_engine.rs:21

```mermaid
graph TB
    subgraph "Public API"
        DS["DataStore"]
        DSR["DataStoreReader trait"]
        DSW["DataStoreWriter trait"]
    end

    subgraph "Core Operations"
        WRITE["Write Operations\nwrite()\nbatch_write()\nwrite_stream()"]
        READ["Read Operations\nread()\nbatch_read()\nread_last_entry()"]
        DELETE["Delete Operations\ndelete()\nbatch_delete()"]
        MANAGE["Management Operations\nrename()\ncopy()\ntransfer()"]
        ITER["Iteration\niter_entries()\npar_iter_entries()"]
    end

    subgraph "Internal Components"
        FILE["Arc<RwLock<BufWriter<File>>>"]
        MMAP["Arc<Mutex<Arc<Mmap>>>"]
        INDEXER["Arc<RwLock<KeyIndexer>>"]
        TAIL["AtomicU64 tail_offset"]
    end

    DS --> DSR
    DS --> DSW
    DSR --> READ
    DSR --> ITER
    DSW --> WRITE
    DSW --> DELETE
    DSW --> MANAGE
    DS --> FILE
    DS --> MMAP
    DS --> INDEXER
    DS --> TAIL
    WRITE --> FILE
    WRITE --> INDEXER
    WRITE --> TAIL
    READ --> MMAP
    READ --> INDEXER
    DELETE --> FILE
    DELETE --> INDEXER
```

DataStore Struct

The DataStore struct is the primary interface for interacting with the storage engine. It encapsulates file I/O, memory mapping, key indexing, and concurrency control.

Core Fields

| Field | Type | Purpose |
| --- | --- | --- |
| file | Arc<RwLock<BufWriter<File>>> | Buffered file writer protected by a read-write lock for synchronized writes |
| mmap | Arc<Mutex<Arc<Mmap>>> | Memory-mapped file reference wrapped in a mutex to prevent unsafe remapping |
| tail_offset | AtomicU64 | Current end-of-file offset, atomically updated for lock-free reads |
| key_indexer | Arc<RwLock<KeyIndexer>> | Hash-based index mapping key hashes to file offsets |
| path | PathBuf | File system path to the storage file |

Sources : src/storage_engine/data_store.rs:26-33

Creation Methods

Sources : src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144

Opening Storage

| Method | Signature | Behavior |
| --- | --- | --- |
| open() | pub fn open(path: &Path) -> Result<Self> | Opens existing storage or creates a new file if not present |
| open_existing() | pub fn open_existing(path: &Path) -> Result<Self> | Opens only existing files; returns an error if the file does not exist |
| from() | impl From<PathBuf> for DataStore | Convenience constructor; panics on failure |

Sources : src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144 src/storage_engine/data_store.rs:53-64


Write Operations

Write operations are defined by the DataStoreWriter trait and implemented for DataStore. All write methods return Result<u64> where the u64 is the new tail offset after writing.

Sources : src/storage_engine/data_store.rs:752-939

```mermaid
graph TB
    subgraph "Write API Methods"
        W1["write(key, payload)"]
        W2["batch_write(entries)"]
        W3["write_stream(key, reader)"]
        W4["write_with_key_hash(hash, payload)"]
        W5["batch_write_with_key_hashes(entries)"]
        W6["write_stream_with_key_hash(hash, reader)"]
    end

    subgraph "Internal Write Path"
        LOCK["Acquire RwLock<File>"]
        HASH["compute_hash() or\ncompute_hash_batch()"]
        ALIGN["Calculate prepad_len()"]
        BUFFER["Buffer construction"]
        SIMD["simd_copy() for payload"]
        META["EntryMetadata serialization"]
        FLUSH["file.flush()"]
        REINDEX["reindex()"]
    end

    W1 --> HASH
    W2 --> HASH
    W3 --> HASH
    HASH --> W4
    HASH --> W5
    HASH --> W6
    W4 --> LOCK
    W5 --> LOCK
    W6 --> LOCK
    LOCK --> ALIGN
    ALIGN --> BUFFER
    BUFFER --> SIMD
    SIMD --> META
    META --> FLUSH
    FLUSH --> REINDEX
    REINDEX --> MMAP_UPDATE["Update mmap Arc"]
    REINDEX --> INDEX_UPDATE["Update KeyIndexer"]
    REINDEX --> TAIL_UPDATE["Update AtomicU64"]
```

Single Entry Write

Method : write(key: &[u8], payload: &[u8]) -> Result<u64>

Writes a single key-value pair atomically. The write is immediately flushed to disk.

Implementation details :

  • Computes XXH3 hash of key using compute_hash()
  • Delegates to write_with_key_hash()
  • Internally uses batch_write_with_key_hashes() with a single entry
  • Calculates 64-byte alignment padding via prepad_len()
  • Uses simd_copy() for payload transfer
  • Appends EntryMetadata (20 bytes)
  • Calls reindex() to update mmap and index

Sources : src/storage_engine/data_store.rs:827-834

Batch Write

Method : batch_write(entries: &[(&[u8], &[u8])]) -> Result<u64>

Writes multiple key-value pairs in a single locked operation. Reduces disk I/O overhead by buffering all entries and flushing once at the end.

Process :

  1. Computes hashes for all keys via compute_hash_batch()
  2. Acquires write lock once for entire batch
  3. Builds buffer with aligned entries
  4. Writes buffer to file with single write_all()
  5. Updates index with all new mappings atomically

Sources : src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939
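The buffering step above can be sketched with std only. This is a minimal sketch, not the engine's code: prepad_len() transcribes the alignment formula documented later on this page, while key hashing, the 20-byte EntryMetadata records, and the locked write_all() are elided or noted in comments.

```rust
const PAYLOAD_ALIGNMENT: usize = 64;

fn prepad_len(offset: usize) -> usize {
    (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

// Builds one contiguous buffer for a whole batch, so a single locked
// write_all() (plus one flush) covers every entry.
fn build_batch_buffer(entries: &[(&[u8], &[u8])]) -> Vec<u8> {
    let mut tail = 0usize; // stands in for the current tail_offset
    let mut buffer = Vec::new();
    for (_key, payload) in entries {
        let pad = prepad_len(tail);
        buffer.extend(std::iter::repeat(0u8).take(pad)); // alignment pre-padding
        buffer.extend_from_slice(payload);               // real impl uses simd_copy()
        tail += pad + payload.len();
        // real impl appends a 20-byte EntryMetadata record here as well
    }
    buffer
}

fn main() {
    let buf = build_batch_buffer(&[
        (b"key-a".as_slice(), b"payload-a".as_slice()),
        (b"key-b".as_slice(), b"payload-b".as_slice()),
    ]);
    // 9 payload bytes + 55 pad bytes + 9 payload bytes:
    // the second payload starts exactly on a 64-byte boundary.
    assert_eq!(buf.len(), 73);
    println!("{}", buf.len());
}
```

A single buffer means the write lock is held once per batch rather than once per entry, which is where the reduced overhead comes from.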

Streaming Write

Method : write_stream<R: Read>(key: &[u8], reader: &mut R) -> Result<u64>

Writes data from a Read source without requiring full in-memory allocation. Suitable for large payloads that exceed available memory.

Characteristics :

  • Uses fixed 8KB buffer (WRITE_STREAM_BUFFER_SIZE)
  • Reads chunks incrementally from source
  • Computes CRC32C checksum while streaming
  • Validates that payload is non-empty and not null-only
  • Immediately flushes after completion

Sources : src/storage_engine/data_store.rs:753-825
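The chunked read loop can be sketched as follows. This is an illustrative skeleton under stated assumptions: checksum_update() is a stand-in (the engine uses CRC32C, which is not in std), and the per-chunk file write is reduced to a comment.

```rust
use std::io::Read;

const WRITE_STREAM_BUFFER_SIZE: usize = 8 * 1024;

// Stand-in rolling checksum; the real engine computes CRC32C here.
fn checksum_update(state: u32, chunk: &[u8]) -> u32 {
    chunk
        .iter()
        .fold(state, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

// Reads fixed-size chunks and checksums them as they stream past, so the
// payload never has to fit in memory at once.
fn write_stream_sketch<R: Read>(reader: &mut R) -> std::io::Result<(u64, u32)> {
    let mut buf = [0u8; WRITE_STREAM_BUFFER_SIZE];
    let mut total = 0u64;
    let mut crc = 0u32;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        crc = checksum_update(crc, &buf[..n]); // checksum while streaming
        total += n as u64;                     // real impl writes the chunk here
    }
    Ok((total, crc))
}

fn main() {
    let data = vec![7u8; 20_000]; // spans more than two 8 KB buffers
    let (total, _crc) = write_stream_sketch(&mut &data[..]).unwrap();
    assert_eq!(total, 20_000);
    println!("{total}");
}
```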

Pre-hashed Write Methods

For performance optimization when keys are reused, the API provides methods accepting pre-computed hashes:

| Method | Description |
| --- | --- |
| write_with_key_hash() | Single write with a pre-computed hash |
| batch_write_with_key_hashes() | Batch write with pre-computed hashes |
| write_stream_with_key_hash() | Streaming write with a pre-computed hash |

These skip the hashing step and proceed directly to storage operations.

Sources : src/storage_engine/data_store.rs:832-834 src/storage_engine/data_store.rs:847-939 src/storage_engine/data_store.rs:758-825


Read Operations

Read operations are defined by the DataStoreReader trait. All reads are zero-copy when possible, returning EntryHandle references into memory-mapped regions.

Sources : src/storage_engine/data_store.rs:1027-1182

```mermaid
graph TB
    subgraph "Read API Methods"
        R1["read(key)"]
        R2["batch_read(keys)"]
        R3["read_last_entry()"]
        R4["read_with_key_hash(hash)"]
        R5["batch_read_hashed_keys(hashes)"]
        R6["read_metadata(key)"]
        R7["exists(key)"]
    end

    subgraph "Internal Read Path"
        HASH_KEY["compute_hash() or\ncompute_hash_batch()"]
        INDEXER_READ["key_indexer.read()"]
        MMAP_CLONE["get_mmap_arc()"]
        UNPACK["KeyIndexer::unpack(packed)"]
        TAG_CHECK["Verify 16-bit tag"]
        BOUNDS["Bounds checking"]
        PREPAD["Derive entry_start from\nprev_offset + prepad_len()"]
        HANDLE["Construct EntryHandle"]
    end

    R1 --> HASH_KEY
    R2 --> HASH_KEY
    HASH_KEY --> R4
    HASH_KEY --> R5
    R4 --> INDEXER_READ
    R5 --> INDEXER_READ
    R7 --> R1
    INDEXER_READ --> MMAP_CLONE
    MMAP_CLONE --> UNPACK
    UNPACK --> TAG_CHECK
    TAG_CHECK --> BOUNDS
    BOUNDS --> PREPAD
    PREPAD --> HANDLE
    R3 --> MMAP_CLONE
    R6 --> R1
```

Single Entry Read

Method : read(key: &[u8]) -> Result<Option<EntryHandle>>

Retrieves a single entry by key. Returns None if key does not exist or is deleted (tombstone).

Implementation :

  • Computes key hash via compute_hash()
  • Acquires read lock on key_indexer
  • Looks up packed (tag, offset) value
  • Verifies 16-bit tag to detect hash collisions
  • Derives entry boundaries from prev_offset and prepad_len()
  • Returns EntryHandle with zero-copy access to payload

Sources : src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:501-565
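The packed (tag, offset) lookup can be illustrated with a small sketch. The bit layout below is an assumption for illustration only (16-bit tag in the high bits, 48-bit offset in the low bits); the actual KeyIndexer packing may differ.

```rust
// Illustrative packing only: the exact bit layout inside KeyIndexer is an
// assumption here, not taken from the source.
fn pack(tag: u16, offset: u64) -> u64 {
    ((tag as u64) << 48) | (offset & 0x0000_FFFF_FFFF_FFFF)
}

fn unpack(packed: u64) -> (u16, u64) {
    ((packed >> 48) as u16, packed & 0x0000_FFFF_FFFF_FFFF)
}

fn main() {
    let packed = pack(0xBEEF, 4096);
    let (tag, offset) = unpack(packed);
    // read() re-derives the tag from the full key and compares it before
    // trusting the offset, which is how hash collisions are detected.
    assert_eq!((tag, offset), (0xBEEF, 4096));
    println!("tag={tag:#x} offset={offset}");
}
```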

Batch Read

Method : batch_read(keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>>

Reads multiple entries in a single index lock acquisition. More efficient than individual reads when processing multiple keys.

Process :

  1. Computes all key hashes via compute_hash_batch()
  2. Acquires single read lock on indexer
  3. Performs lookup for each hash
  4. Verifies tags for collision detection
  5. Returns vector preserving input order

Sources : src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158

Read Last Entry

Method : read_last_entry() -> Result<Option<EntryHandle>>

Retrieves the most recently written entry without requiring a key lookup. Uses tail_offset to locate the last metadata block.

Use case : Useful for sequential processing or determining the latest state.

Sources : src/storage_engine/data_store.rs:1061-1103

Metadata Read

Method : read_metadata(key: &[u8]) -> Result<Option<EntryMetadata>>

Retrieves only the metadata (key hash, previous offset, checksum) without accessing the payload. More efficient when only metadata is needed.

Sources : src/storage_engine/data_store.rs:1160-1162

Existence Check

Method : exists(key: &[u8]) -> Result<bool>

Checks if a key exists without retrieving the full entry. Lightweight operation that only performs index lookup and tag verification.

Sources : src/storage_engine/data_store.rs:1030-1032

Read Operations Summary

| Method | Returns | Lock Duration | Use Case |
| --- | --- | --- | --- |
| read() | Option<EntryHandle> | Single index read lock | Standard single-key retrieval |
| batch_read() | Vec<Option<EntryHandle>> | Single index read lock | Multiple keys, order preserved |
| read_last_entry() | Option<EntryHandle> | No index lock required | Sequential or state check |
| read_metadata() | Option<EntryMetadata> | Single index read lock | Metadata only, no payload |
| exists() | bool | Single index read lock | Fast existence check |

Sources : src/storage_engine/data_store.rs:1027-1182


Delete Operations

Delete operations write tombstone entries (a single null byte plus metadata) to mark keys as deleted. Under the append-only model, deletions do not reclaim space until compaction.

Sources : src/storage_engine/data_store.rs:986-1024

```mermaid
graph LR
    DELETE_API["delete(key)\nbatch_delete(keys)"] --> HASH["compute_hash_batch()"]
    HASH --> CHECK_EXISTS["Filter existing keys\nvia key_indexer.read()"]
    CHECK_EXISTS --> TOMBSTONE["Create (hash, NULL_BYTE)\npairs"]
    TOMBSTONE --> BATCH_WRITE["batch_write_with_key_hashes()\nwith allow_null_bytes=true"]
    BATCH_WRITE --> UPDATE_INDEX["reindex() removes\nkeys from index"]
```

Single Delete

Method : delete(key: &[u8]) -> Result<u64>

Deletes a single key by writing a tombstone entry. Internally delegates to batch_delete() with a single key.

Sources : src/storage_engine/data_store.rs:986-988

Batch Delete

Method : batch_delete(keys: &[&[u8]]) -> Result<u64>

Deletes multiple keys in a single operation. Optimized to skip keys that don’t exist, avoiding unnecessary tombstone writes.

Process :

  1. Hashes all keys via compute_hash_batch()
  2. Filters to only keys present in index
  3. Constructs tombstone entries (NULL_BYTE + metadata)
  4. Calls batch_write_with_key_hashes() with allow_null_bytes=true
  5. Index updated to remove deleted keys

Sources : src/storage_engine/data_store.rs:990-1024
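The filtering step above can be sketched with a stand-in index. This is a minimal sketch, not the engine's code: the HashMap plays the role of KeyIndexer, and the hash values are made up for illustration.

```rust
use std::collections::HashMap;

// Keeps only the hashes that are actually present in the index, mirroring how
// batch_delete() skips tombstone writes for keys that were never stored.
fn select_tombstones(index: &HashMap<u64, u64>, requested: &[u64]) -> Vec<u64> {
    requested
        .iter()
        .copied()
        .filter(|h| index.contains_key(h))
        .collect()
}

fn main() {
    // Stand-in index: key hash -> file offset (values are made up here).
    let mut index: HashMap<u64, u64> = HashMap::from([(11, 0), (22, 64)]);
    let requested = [11u64, 22, 33]; // 33 was never written

    let tombstones = select_tombstones(&index, &requested);
    assert_eq!(tombstones, vec![11, 22]); // 33 is skipped

    // Each surviving hash is written as (hash, NULL_BYTE) via
    // batch_write_with_key_hashes(..., allow_null_bytes=true); reindex()
    // then drops the deleted keys from the index.
    for h in &tombstones {
        index.remove(h);
    }
    assert!(index.is_empty());
    println!("{}", tombstones.len());
}
```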

Pre-hashed Delete

Method : batch_delete_key_hashes(prehashed_keys: &[u64]) -> Result<u64>

Deletes keys using pre-computed hashes. Useful when hashes are already available from previous operations.

Sources : src/storage_engine/data_store.rs:995-1024


Entry Management Operations

These operations combine read, write, and delete to provide higher-level functionality for managing entries across storage instances.

Rename

Method : rename(old_key: &[u8], new_key: &[u8]) -> Result<u64>

Renames a key by:

  1. Reading the entry at old_key
  2. Creating an EntryStream from it
  3. Writing to new_key via write_stream()
  4. Deleting old_key

Constraint : old_key must exist and must differ from new_key.

Sources : src/storage_engine/data_store.rs:941-958

Copy

Method : copy(key: &[u8], target: &DataStore) -> Result<u64>

Copies an entry from the current storage to a different DataStore instance. The source entry remains unchanged.

Process :

  1. Reads entry from source
  2. Extracts payload and metadata
  3. Writes to target using write_stream_with_key_hash()
  4. Preserves original key hash

Constraint : Source and target must be different storage files.

Sources : src/storage_engine/data_store.rs:960-979 src/storage_engine/data_store.rs:587-590

Transfer

Method : transfer(key: &[u8], target: &DataStore) -> Result<u64>

Moves an entry from the current storage to a different instance by copying then deleting from source.

Equivalent to : copy() followed by delete()

Sources : src/storage_engine/data_store.rs:981-984

Entry Management Summary

| Operation | Source Modified | Target Modified | Use Case |
| --- | --- | --- | --- |
| rename() | Yes (old deleted, new added) | N/A | Same storage, different key |
| copy() | No | Yes (entry added) | Cross-storage duplication |
| transfer() | Yes (entry deleted) | Yes (entry added) | Cross-storage migration |

Sources : src/storage_engine/data_store.rs:941-984


Iteration and Traversal

The DataStore provides multiple methods for iterating over all valid entries in storage.

Sources : src/storage_engine/data_store.rs:269-361, src/storage_engine/entry_iterator.rs:8-127

```mermaid
graph TB
    subgraph "Iteration Methods"
        ITER_OWNED["into_iter()\n(consumes DataStore)"]
        ITER_REF["iter_entries()\n(borrows DataStore)"]
        PAR_ITER["par_iter_entries()\n(parallel, requires 'parallel' feature)"]
    end

    subgraph "EntryIterator Implementation"
        CURSOR["cursor: u64 = tail_offset"]
        SEEN_KEYS["seen_keys: HashSet<u64>"]
        NEXT["next() method"]
        METADATA["Read EntryMetadata"]
        PREPAD_CALC["Derive entry_start from\nprev_offset + prepad_len()"]
        SKIP_DUPE["Skip if key_hash in seen_keys"]
        SKIP_TOMB["Skip if entry is NULL_BYTE"]
        EMIT["Emit EntryHandle"]
    end

    subgraph "Parallel Iterator"
        COLLECT["Collect key_indexer offsets"]
        PAR_MAP["Rayon par_iter()"]
        FILTER_MAP["filter_map constructs\nEntryHandle per thread"]
    end

    ITER_OWNED --> ITER_REF
    ITER_REF --> CURSOR
    ITER_REF --> SEEN_KEYS
    CURSOR --> NEXT
    NEXT --> METADATA
    METADATA --> PREPAD_CALC
    PREPAD_CALC --> SKIP_DUPE
    SKIP_DUPE --> SKIP_TOMB
    SKIP_TOMB --> EMIT
    PAR_ITER --> COLLECT
    COLLECT --> PAR_MAP
    PAR_MAP --> FILTER_MAP
```

Sequential Iteration

Method : iter_entries() -> EntryIterator

Returns an iterator that traverses all valid entries sequentially. The iterator:

  • Starts at tail_offset and walks backward via prev_offset chain
  • Tracks seen key hashes to ensure only latest versions are returned
  • Filters out tombstone entries automatically
  • Returns EntryHandle objects with zero-copy access

Sources : src/storage_engine/data_store.rs:276-280 src/storage_engine/entry_iterator.rs:41-47
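The backward walk with deduplication can be sketched as follows. This is a simplified model, not the iterator's code: a Vec of (key_hash, is_tombstone) records traversed in reverse stands in for walking the prev_offset chain back from tail_offset.

```rust
use std::collections::HashSet;

// Returns the key hashes whose latest record is a live entry, newest first.
fn latest_live(log: &[(u64, bool)]) -> Vec<u64> {
    let mut seen_keys = HashSet::new();
    let mut live = Vec::new();
    // Walk newest-to-oldest, as the iterator walks back from tail_offset.
    for &(key_hash, tombstone) in log.iter().rev() {
        if !seen_keys.insert(key_hash) {
            continue; // older version of an already-seen key
        }
        if tombstone {
            continue; // latest record for this key is a delete
        }
        live.push(key_hash);
    }
    live
}

fn main() {
    // Append order: key 1 written twice, key 2 written then deleted.
    let log = [(1u64, false), (2, false), (1, false), (2, true)];
    assert_eq!(latest_live(&log), vec![1]); // key 2's latest record is a tombstone
    println!("{:?}", latest_live(&log));
}
```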

Consuming Iteration

Trait : impl IntoIterator for DataStore

Allows consuming a DataStore instance to produce an iterator; internally delegates to iter_entries().

Sources : src/storage_engine/data_store.rs:44-50

Parallel Iteration

Method : par_iter_entries() -> impl ParallelIterator<Item = EntryHandle>

Feature gate : Requires parallel feature flag.

Provides Rayon-powered parallel iteration for high-throughput processing on multi-core systems.

Implementation strategy :

  1. Acquires read lock on key_indexer briefly
  2. Collects all packed offset values into a Vec<u64>
  3. Releases lock immediately
  4. Creates parallel iterator over collected offsets
  5. Constructs EntryHandle objects in parallel threads

Performance : Ideal for bulk operations like analytics, caching, or transformation pipelines.

Sources : src/storage_engine/data_store.rs:296-361
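The snapshot-then-fan-out pattern behind the five steps above can be sketched with std alone. This is an illustrative sketch: scoped threads stand in for Rayon's par_iter(), a Vec<u64> of fake packed offsets stands in for the KeyIndexer contents, and summing stands in for constructing EntryHandles.

```rust
use std::sync::RwLock;
use std::thread;

// Processes a snapshot of offsets across worker threads; rayon's par_iter()
// plays this role in the real par_iter_entries().
fn parallel_sum(offsets: &[u64]) -> u64 {
    thread::scope(|s| {
        offsets
            .chunks(2)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect::<Vec<_>>() // spawn every worker before joining any
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum()
    })
}

fn main() {
    let key_indexer = RwLock::new(vec![64u64, 128, 192, 256]); // fake packed offsets

    // Steps 1-3: hold the read lock only long enough to snapshot the offsets.
    let offsets: Vec<u64> = key_indexer.read().unwrap().clone();

    // Steps 4-5: fan the snapshot out to worker threads, with no lock held.
    assert_eq!(parallel_sum(&offsets), 640);
    println!("{}", parallel_sum(&offsets));
}
```

Releasing the index lock before any per-entry work starts is what keeps writers from stalling behind a long parallel scan.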

Iteration Comparison

| Method | Ownership | Concurrency | Lock Hold Time | Use Case |
| --- | --- | --- | --- | --- |
| iter_entries() | Borrows | Sequential | Lock per next() call | General-purpose scanning |
| into_iter() | Consumes | Sequential | Lock per next() call | One-time full traversal |
| par_iter_entries() | Borrows | Parallel | Brief upfront lock | High-throughput processing |

Sources : src/storage_engine/data_store.rs:44-50 src/storage_engine/data_store.rs:276-280 src/storage_engine/data_store.rs:296-361


Utility and Maintenance Methods

File Information

| Method | Returns | Description |
| --- | --- | --- |
| len() | Result<usize> | Number of unique keys in storage (excludes deleted) |
| is_empty() | Result<bool> | Returns true if no keys exist |
| file_size() | Result<u64> | Total size of the storage file in bytes |
| get_path() | PathBuf | Path to the storage file |

Sources : src/storage_engine/data_store.rs:1164-1181 src/storage_engine/data_store.rs:265-267

Compaction

Method : compact(&mut self) -> Result<()>

Reclaims disk space by creating a new storage file containing only the latest version of each key. Tombstone entries are excluded.

Process :

  1. Creates temporary backup file with .bk extension
  2. Iterates through iter_entries() (which returns only latest versions)
  3. Copies each entry via copy_handle()
  4. Swaps temporary file with original via std::fs::rename()

Thread safety warning : Should only be called when no other threads are accessing the storage. The &mut self requirement prevents concurrent mutations but does not prevent reads if the instance is wrapped in Arc<DataStore>.

Sources : src/storage_engine/data_store.rs:706-749

Compaction Estimation

Method : estimate_compaction_savings() -> u64

Calculates potential space savings from compaction without performing the operation. Returns the difference between total file size and the size needed for unique entries only.

Sources : src/storage_engine/data_store.rs:605-616


Internal Support Methods

These methods support the public API but are not directly exposed to users.

Reindexing

Method : reindex()

Called after every write operation to:

  1. Re-map the file via init_mmap() to include new data
  2. Update key_indexer with new key-to-offset mappings
  3. Update tail_offset atomically

Acquires locks on both mmap and key_indexer to ensure consistency.

Sources : src/storage_engine/data_store.rs:224-259

Entry Context Reading

Method : read_entry_with_context()

Internal helper centralizing read logic for both read() and batch_read(). Parameters include the key hash, mmap reference, and indexer guard. Performs:

  • Index lookup
  • Tag verification (if original key provided)
  • Bounds checking
  • Tombstone detection
  • EntryHandle construction

Sources : src/storage_engine/data_store.rs:501-565

Recovery Chain Validation

Method : recover_valid_chain()

Called during open() to validate storage file integrity. Walks backward through the file following prev_offset chains until reaching offset 0. Truncates file if incomplete write detected.

Sources : src/storage_engine/data_store.rs:383-482

Alignment Calculation

Method : prepad_len(offset: u64) -> usize

Computes padding bytes required to align offset to PAYLOAD_ALIGNMENT (64 bytes). Uses bitwise operations for efficiency:

pad = (A - (offset % A)) & (A - 1)

Sources : src/storage_engine/data_store.rs:669-673
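The formula above transcribes directly into code. A minimal sketch with A = PAYLOAD_ALIGNMENT = 64; the bitwise AND works because A is a power of two, mapping the "already aligned" case (A - 0 = A) back to zero padding.

```rust
const PAYLOAD_ALIGNMENT: u64 = 64;

// pad = (A - (offset % A)) & (A - 1), with A = 64.
fn prepad_len(offset: u64) -> u64 {
    (PAYLOAD_ALIGNMENT - (offset % PAYLOAD_ALIGNMENT)) & (PAYLOAD_ALIGNMENT - 1)
}

fn main() {
    assert_eq!(prepad_len(0), 0);    // already aligned: no padding
    assert_eq!(prepad_len(1), 63);   // pad up to the next 64-byte boundary
    assert_eq!(prepad_len(100), 28); // 100 + 28 = 128
    assert_eq!(prepad_len(128), 0);
    println!("ok");
}
```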


API Patterns and Conventions

Return Value Pattern

Most write operations return Result<u64> where the u64 is the new tail_offset after the operation. This allows chaining operations or validating expected file growth.

Error Handling

The API uses std::io::Result<T> consistently. Common error cases:

  • InvalidInput: Empty payloads, null-byte-only payloads, invalid rename
  • NotFound: Key does not exist (for operations requiring existing keys)
  • Lock poisoning errors (converted to std::io::Error)

Pre-hashed Key Methods

Many operations offer both standard and pre-hashed variants:

  • Standard: write(key, payload) - computes hash internally
  • Pre-hashed: write_with_key_hash(hash, payload) - uses provided hash

Pre-hashed methods enable optimization when keys are reused across multiple operations.
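The hash-once-reuse-many pattern looks like this in practice. A hedged sketch: compute_hash() below is a stand-in built on std's DefaultHasher, since the engine's XXH3 implementation is not in the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the engine's compute_hash(), which uses XXH3.
fn compute_hash(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

fn main() {
    let key = b"hot-key";
    // Hash once up front...
    let key_hash = compute_hash(key);
    // ...then reuse the hash across repeated operations, as
    // write_with_key_hash() and batch_read_hashed_keys() allow,
    // instead of rehashing the key on every call.
    for _ in 0..3 {
        assert_eq!(compute_hash(key), key_hash);
    }
    println!("{key_hash:#x}");
}
```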

Batch Operations Benefit

Batch methods acquire locks once for the entire batch, significantly reducing overhead:

  • batch_write() vs multiple write() calls
  • batch_read() vs multiple read() calls
  • batch_delete() vs multiple delete() calls

Sources : src/storage_engine/data_store.rs:752-1182


Trait Implementations

DataStoreReader Trait

Defines read-only operations. Associated type EntryHandleType allows flexibility in handle implementation.

Implementors : DataStore

Key methods : read(), batch_read(), read_last_entry(), read_metadata(), exists(), len(), is_empty(), file_size()

Sources : src/storage_engine/traits.rs

DataStoreWriter Trait

Defines mutating operations. All methods take &self (not &mut self) because internal synchronization via RwLock enables safe concurrent access.

Implementors : DataStore

Key methods : write(), batch_write(), write_stream(), delete(), batch_delete(), rename(), copy(), transfer()

Sources : src/storage_engine/traits.rs

From Trait

Convenience constructor that panics on failure.

Sources : src/storage_engine/data_store.rs:53-64

IntoIterator Trait

Allows consuming iteration over storage entries. Returns EntryIterator as the iterator type.

Sources : src/storage_engine/data_store.rs:44-50

