This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
DataStore API
Relevant source files
- README.md
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
This page documents the public API of the DataStore struct and its associated traits DataStoreReader and DataStoreWriter. These interfaces provide the primary methods for interacting with the storage engine, including write, read, delete, batch operations, and streaming methods.
Scope : This page covers the application-level API methods available to users of the storage engine. For details on the underlying storage format, see Entry Structure and Metadata. For implementation details of concurrency mechanisms, see Concurrency and Thread Safety. For key hashing internals, see Key Indexing and Hashing.
API Architecture
The DataStore API is organized around a core DataStore struct with two trait-based interfaces:
Sources : src/storage_engine/data_store.rs:26-33 src/storage_engine/traits.rs src/storage_engine.rs:21
```mermaid
graph TB
    subgraph "Public API"
        DS["DataStore"]
        DSR["DataStoreReader trait"]
        DSW["DataStoreWriter trait"]
    end
    subgraph "Core Operations"
        WRITE["Write Operations\nwrite()\nbatch_write()\nwrite_stream()"]
        READ["Read Operations\nread()\nbatch_read()\nread_last_entry()"]
        DELETE["Delete Operations\ndelete()\nbatch_delete()"]
        MANAGE["Management Operations\nrename()\ncopy()\ntransfer()"]
        ITER["Iteration\niter_entries()\npar_iter_entries()"]
    end
    subgraph "Internal Components"
        FILE["Arc<RwLock<BufWriter<File>>>"]
        MMAP["Arc<Mutex<Arc<Mmap>>>"]
        INDEXER["Arc<RwLock<KeyIndexer>>"]
        TAIL["AtomicU64 tail_offset"]
    end
    DS --> DSR
    DS --> DSW
    DSR --> READ
    DSR --> ITER
    DSW --> WRITE
    DSW --> DELETE
    DSW --> MANAGE
    DS --> FILE
    DS --> MMAP
    DS --> INDEXER
    DS --> TAIL
    WRITE --> FILE
    WRITE --> INDEXER
    WRITE --> TAIL
    READ --> MMAP
    READ --> INDEXER
    DELETE --> FILE
    DELETE --> INDEXER
```
DataStore Struct
The DataStore struct is the primary interface for interacting with the storage engine. It encapsulates file I/O, memory mapping, key indexing, and concurrency control.
Core Fields
| Field | Type | Purpose |
|---|---|---|
| `file` | `Arc<RwLock<BufWriter<File>>>` | Buffered file writer protected by a read-write lock for synchronized writes |
| `mmap` | `Arc<Mutex<Arc<Mmap>>>` | Memory-mapped file reference wrapped in a mutex to prevent unsafe remapping |
| `tail_offset` | `AtomicU64` | Current end-of-file offset, atomically updated for lock-free reads |
| `key_indexer` | `Arc<RwLock<KeyIndexer>>` | Hash-based index mapping key hashes to file offsets |
| `path` | `PathBuf` | File system path to the storage file |
Sources : src/storage_engine/data_store.rs:26-33
Creation Methods
Sources : src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144
Opening Storage
| Method | Signature | Behavior |
|---|---|---|
| `open()` | `pub fn open(path: &Path) -> Result<Self>` | Opens existing storage or creates a new file if not present |
| `open_existing()` | `pub fn open_existing(path: &Path) -> Result<Self>` | Opens only existing files; returns an error if the file does not exist |
| `from()` | `impl From<PathBuf> for DataStore` | Convenience constructor; panics on failure |
Sources : src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:141-144 src/storage_engine/data_store.rs:53-64
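A minimal usage sketch of the creation methods, assuming the crate is added as a dependency (the crate name is not stated on this page, so the `use` path below is illustrative):

```rust
use std::path::Path;

// Illustrative path; substitute the actual crate name.
use storage_engine::DataStore;

fn open_example() -> std::io::Result<()> {
    // Creates the file if it does not already exist.
    let store = DataStore::open(Path::new("app_data.bin"))?;

    // Fails with an error instead of creating a new file.
    match DataStore::open_existing(Path::new("missing.bin")) {
        Ok(_) => println!("opened existing store"),
        Err(e) => eprintln!("open_existing failed: {e}"),
    }

    drop(store);
    Ok(())
}
```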
Write Operations
Write operations are defined by the DataStoreWriter trait and implemented for DataStore. All write methods return Result<u64> where the u64 is the new tail offset after writing.
Sources : src/storage_engine/data_store.rs:752-939
```mermaid
graph TB
    subgraph "Write API Methods"
        W1["write(key, payload)"]
        W2["batch_write(entries)"]
        W3["write_stream(key, reader)"]
        W4["write_with_key_hash(hash, payload)"]
        W5["batch_write_with_key_hashes(entries)"]
        W6["write_stream_with_key_hash(hash, reader)"]
    end
    subgraph "Internal Write Path"
        LOCK["Acquire RwLock<File>"]
        HASH["compute_hash() or\ncompute_hash_batch()"]
        ALIGN["Calculate prepad_len()"]
        BUFFER["Buffer construction"]
        SIMD["simd_copy() for payload"]
        META["EntryMetadata serialization"]
        FLUSH["file.flush()"]
        REINDEX["reindex()"]
    end
    W1 --> HASH
    W2 --> HASH
    W3 --> HASH
    HASH --> W4
    HASH --> W5
    HASH --> W6
    W4 --> LOCK
    W5 --> LOCK
    W6 --> LOCK
    LOCK --> ALIGN
    ALIGN --> BUFFER
    BUFFER --> SIMD
    SIMD --> META
    META --> FLUSH
    FLUSH --> REINDEX
    REINDEX --> MMAP_UPDATE["Update mmap Arc"]
    REINDEX --> INDEX_UPDATE["Update KeyIndexer"]
    REINDEX --> TAIL_UPDATE["Update AtomicU64"]
```
Single Entry Write
Method : write(key: &[u8], payload: &[u8]) -> Result<u64>
Writes a single key-value pair atomically. The write is immediately flushed to disk.
Implementation details :
- Computes the XXH3 hash of the key using `compute_hash()`
- Delegates to `write_with_key_hash()`
- Internally uses `batch_write_with_key_hashes()` with a single entry
- Calculates 64-byte alignment padding via `prepad_len()`
- Uses `simd_copy()` for payload transfer
- Appends `EntryMetadata` (20 bytes)
- Calls `reindex()` to update the mmap and index
Sources : src/storage_engine/data_store.rs:827-834
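The single-entry path can be sketched as follows; a hedged example, with the `use` path being illustrative since the crate name is not given on this page:

```rust
// Illustrative path; substitute the actual crate name.
use storage_engine::{DataStore, DataStoreWriter};

fn single_write(store: &DataStore) -> std::io::Result<()> {
    // write() hashes the key (XXH3), pads to 64-byte alignment,
    // appends payload + 20-byte metadata, flushes, then reindexes.
    let tail = store.write(b"user:1", br#"{"name":"alice"}"#)?;
    println!("new tail offset: {tail}");
    Ok(())
}
```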
Batch Write
Method : batch_write(entries: &[(&[u8], &[u8])]) -> Result<u64>
Writes multiple key-value pairs in a single locked operation. Reduces disk I/O overhead by buffering all entries and flushing once at the end.
Process :
- Computes hashes for all keys via `compute_hash_batch()`
- Acquires the write lock once for the entire batch
- Builds a buffer with aligned entries
- Writes the buffer to the file with a single `write_all()`
- Updates the index with all new mappings atomically
Sources : src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939
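The steps above can be sketched as a single call (illustrative `use` path; the crate name is not stated on this page):

```rust
// Illustrative path; substitute the actual crate name.
use storage_engine::{DataStore, DataStoreWriter};

fn bulk_insert(store: &DataStore) -> std::io::Result<u64> {
    // All three entries share one lock acquisition and one flush.
    let entries: Vec<(&[u8], &[u8])> = vec![
        (b"k1" as &[u8], b"v1" as &[u8]),
        (b"k2", b"v2"),
        (b"k3", b"v3"),
    ];
    store.batch_write(&entries)
}
```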
Streaming Write
Method : write_stream<R: Read>(key: &[u8], reader: &mut R) -> Result<u64>
Writes data from a Read source without requiring full in-memory allocation. Suitable for large payloads that exceed available memory.
Characteristics :
- Uses a fixed 8 KB buffer (`WRITE_STREAM_BUFFER_SIZE`)
- Reads chunks incrementally from the source
- Computes a CRC32C checksum while streaming
- Validates that the payload is non-empty and not null-only
- Flushes immediately after completion
Sources : src/storage_engine/data_store.rs:753-825
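A hedged sketch of streaming from any `std::io::Read` source (here an in-memory `Cursor` stands in for a large file or socket; the `use` path is illustrative):

```rust
use std::io::Cursor;

// Illustrative path; substitute the actual crate name.
use storage_engine::{DataStore, DataStoreWriter};

fn stream_blob(store: &DataStore) -> std::io::Result<u64> {
    // Data is consumed in 8 KB chunks, so the payload never
    // needs to fit in memory all at once.
    let mut reader = Cursor::new(vec![0xAB_u8; 1 << 20]); // 1 MiB
    store.write_stream(b"blob:1", &mut reader)
}
```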
Pre-hashed Write Methods
For performance optimization when keys are reused, the API provides methods accepting pre-computed hashes:
| Method | Description |
|---|---|
| `write_with_key_hash()` | Single write with a pre-computed hash |
| `batch_write_with_key_hashes()` | Batch write with pre-computed hashes |
| `write_stream_with_key_hash()` | Streaming write with a pre-computed hash |
These skip the hashing step and proceed directly to storage operations.
Sources : src/storage_engine/data_store.rs:832-834 src/storage_engine/data_store.rs:847-939 src/storage_engine/data_store.rs:758-825
```mermaid
graph TB
    subgraph "Read API Methods"
        R1["read(key)"]
        R2["batch_read(keys)"]
        R3["read_last_entry()"]
        R4["read_with_key_hash(hash)"]
        R5["batch_read_hashed_keys(hashes)"]
        R6["read_metadata(key)"]
        R7["exists(key)"]
    end
    subgraph "Internal Read Path"
        HASH_KEY["compute_hash() or\ncompute_hash_batch()"]
        INDEXER_READ["key_indexer.read()"]
        MMAP_CLONE["get_mmap_arc()"]
        UNPACK["KeyIndexer::unpack(packed)"]
        TAG_CHECK["Verify 16-bit tag"]
        BOUNDS["Bounds checking"]
        PREPAD["Derive entry_start from\nprev_offset + prepad_len()"]
        HANDLE["Construct EntryHandle"]
    end
    R1 --> HASH_KEY
    R2 --> HASH_KEY
    HASH_KEY --> R4
    HASH_KEY --> R5
    R4 --> INDEXER_READ
    R5 --> INDEXER_READ
    R7 --> R1
    INDEXER_READ --> MMAP_CLONE
    MMAP_CLONE --> UNPACK
    UNPACK --> TAG_CHECK
    TAG_CHECK --> BOUNDS
    BOUNDS --> PREPAD
    PREPAD --> HANDLE
    R3 --> MMAP_CLONE
    R6 --> R1
```
Read Operations
Read operations are defined by the DataStoreReader trait. All reads are zero-copy when possible, returning EntryHandle references to memory-mapped regions.
Sources : src/storage_engine/data_store.rs:1027-1182
Single Entry Read
Method : read(key: &[u8]) -> Result<Option<EntryHandle>>
Retrieves a single entry by key. Returns None if key does not exist or is deleted (tombstone).
Implementation :
- Computes the key hash via `compute_hash()`
- Acquires a read lock on `key_indexer`
- Looks up the packed (tag, offset) value
- Verifies the 16-bit tag to detect hash collisions
- Derives entry boundaries from `prev_offset` and `prepad_len()`
- Returns an `EntryHandle` with zero-copy access to the payload
Sources : src/storage_engine/data_store.rs:1040-1049 src/storage_engine/data_store.rs:501-565
Batch Read
Method : batch_read(keys: &[&[u8]]) -> Result<Vec<Option<EntryHandle>>>
Reads multiple entries in a single index lock acquisition. More efficient than individual reads when processing multiple keys.
Process :
- Computes all key hashes via `compute_hash_batch()`
- Acquires a single read lock on the indexer
- Performs a lookup for each hash
- Verifies tags for collision detection
- Returns a vector preserving input order
Sources : src/storage_engine/data_store.rs:1105-1109 src/storage_engine/data_store.rs:1111-1158
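The batch read path can be sketched as follows (illustrative `use` path; the crate name is not given on this page):

```rust
// Illustrative path; substitute the actual crate name.
use storage_engine::{DataStore, DataStoreReader};

fn lookup_many(store: &DataStore) -> std::io::Result<()> {
    let keys: Vec<&[u8]> = vec![b"k1" as &[u8], b"k2", b"missing"];
    // One index read lock covers every lookup; the result
    // vector preserves the order of `keys`.
    for (key, entry) in keys.iter().zip(store.batch_read(&keys)?) {
        match entry {
            Some(_) => println!("{:?} found", key),
            None => println!("{:?} absent or deleted", key),
        }
    }
    Ok(())
}
```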
Read Last Entry
Method : read_last_entry() -> Result<Option<EntryHandle>>
Retrieves the most recently written entry without requiring a key lookup. Uses tail_offset to locate the last metadata block.
Use case : Useful for sequential processing or determining the latest state.
Sources : src/storage_engine/data_store.rs:1061-1103
Metadata Read
Method : read_metadata(key: &[u8]) -> Result<Option<EntryMetadata>>
Retrieves only the metadata (key hash, previous offset, checksum) without accessing the payload. More efficient when only metadata is needed.
Sources : src/storage_engine/data_store.rs:1160-1162
Existence Check
Method : exists(key: &[u8]) -> Result<bool>
Checks if a key exists without retrieving the full entry. Lightweight operation that only performs index lookup and tag verification.
Sources : src/storage_engine/data_store.rs:1030-1032
Read Operations Summary
| Method | Returns | Lock Duration | Use Case |
|---|---|---|---|
| `read()` | `Option<EntryHandle>` | Single index read lock | Standard single-key retrieval |
| `batch_read()` | `Vec<Option<EntryHandle>>` | Single index read lock | Multiple keys, order preserved |
| `read_last_entry()` | `Option<EntryHandle>` | No index lock required | Sequential or state check |
| `read_metadata()` | `Option<EntryMetadata>` | Single index read lock | Metadata only, no payload |
| `exists()` | `bool` | Single index read lock | Fast existence check |
Sources : src/storage_engine/data_store.rs:1027-1182
```mermaid
graph LR
    DELETE_API["delete(key)\nbatch_delete(keys)"] --> HASH["compute_hash_batch()"]
    HASH --> CHECK_EXISTS["Filter existing keys\nvia key_indexer.read()"]
    CHECK_EXISTS --> TOMBSTONE["Create (hash, NULL_BYTE)\npairs"]
    TOMBSTONE --> BATCH_WRITE["batch_write_with_key_hashes()\nwith allow_null_bytes=true"]
    BATCH_WRITE --> UPDATE_INDEX["reindex() removes\nkeys from index"]
```
Delete Operations
Delete operations write tombstone entries (single null byte + metadata) to mark keys as deleted. The append-only model means deletions do not reclaim space until compaction.
Sources : src/storage_engine/data_store.rs:986-1024
Single Delete
Method : delete(key: &[u8]) -> Result<u64>
Deletes a single key by writing a tombstone entry. Internally delegates to batch_delete() with a single key.
Sources : src/storage_engine/data_store.rs:986-988
Batch Delete
Method : batch_delete(keys: &[&[u8]]) -> Result<u64>
Deletes multiple keys in a single operation. Optimized to skip keys that don’t exist, avoiding unnecessary tombstone writes.
Process :
- Hashes all keys via `compute_hash_batch()`
- Filters to only keys present in the index
- Constructs tombstone entries (NULL_BYTE + metadata)
- Calls `batch_write_with_key_hashes()` with `allow_null_bytes=true`
- Updates the index to remove the deleted keys
Sources : src/storage_engine/data_store.rs:990-1024
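The tombstone model above can be sketched as follows (illustrative `use` path; the crate name is not given on this page):

```rust
// Illustrative paths; substitute the actual crate name.
use storage_engine::{DataStore, DataStoreReader, DataStoreWriter};

fn remove_keys(store: &DataStore) -> std::io::Result<()> {
    store.write(b"temp:1", b"scratch")?;

    // Appends a tombstone; space is reclaimed later by compact().
    store.delete(b"temp:1")?;
    assert!(!store.exists(b"temp:1")?);

    // Keys absent from the index are skipped, so repeating the
    // delete writes no extra tombstones.
    store.batch_delete(&[b"temp:1" as &[u8], b"never-written"])?;
    Ok(())
}
```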
Pre-hashed Delete
Method : batch_delete_key_hashes(prehashed_keys: &[u64]) -> Result<u64>
Deletes keys using pre-computed hashes. Useful when hashes are already available from previous operations.
Sources : src/storage_engine/data_store.rs:995-1024
Entry Management Operations
These operations combine read, write, and delete to provide higher-level functionality for managing entries across storage instances.
Rename
Method : rename(old_key: &[u8], new_key: &[u8]) -> Result<u64>
Renames a key by:
- Reading the entry at `old_key`
- Creating an `EntryStream` from it
- Writing to `new_key` via `write_stream()`
- Deleting `old_key`
Constraint : old_key must exist and must differ from new_key.
Sources : src/storage_engine/data_store.rs:941-958
Copy
Method : copy(key: &[u8], target: &DataStore) -> Result<u64>
Copies an entry from the current storage to a different DataStore instance. The source entry remains unchanged.
Process :
- Reads the entry from the source
- Extracts the payload and metadata
- Writes to the target using `write_stream_with_key_hash()`
- Preserves the original key hash
Constraint : Source and target must be different storage files.
Sources : src/storage_engine/data_store.rs:960-979 src/storage_engine/data_store.rs:587-590
Transfer
Method : transfer(key: &[u8], target: &DataStore) -> Result<u64>
Moves an entry from the current storage to a different instance by copying then deleting from source.
Equivalent to : copy() followed by delete()
Sources : src/storage_engine/data_store.rs:981-984
Entry Management Summary
| Operation | Source Modified | Target Modified | Use Case |
|---|---|---|---|
| `rename()` | Yes (old deleted, new added) | N/A | Same storage, different key |
| `copy()` | No | Yes (entry added) | Cross-storage duplication |
| `transfer()` | Yes (entry deleted) | Yes (entry added) | Cross-storage migration |
Sources : src/storage_engine/data_store.rs:941-984
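The cross-storage operations summarized above can be sketched as follows (illustrative `use` path; the crate name is not given on this page):

```rust
// Illustrative paths; substitute the actual crate name.
use storage_engine::{DataStore, DataStoreReader, DataStoreWriter};

fn migrate(src: &DataStore, dst: &DataStore) -> std::io::Result<()> {
    src.write(b"session:42", b"payload")?;

    // copy(): the source keeps its entry, the target gains one.
    src.copy(b"session:42", dst)?;

    // transfer(): copy followed by delete from the source.
    src.transfer(b"session:42", dst)?;
    assert!(!src.exists(b"session:42")? && dst.exists(b"session:42")?);
    Ok(())
}
```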
```mermaid
graph TB
    subgraph "Iteration Methods"
        ITER_OWNED["into_iter()\n(consumes DataStore)"]
        ITER_REF["iter_entries()\n(borrows DataStore)"]
        PAR_ITER["par_iter_entries()\n(parallel, requires 'parallel' feature)"]
    end
    subgraph "EntryIterator Implementation"
        CURSOR["cursor: u64 = tail_offset"]
        SEEN_KEYS["seen_keys: HashSet<u64>"]
        NEXT["next() method"]
        METADATA["Read EntryMetadata"]
        PREPAD_CALC["Derive entry_start from\nprev_offset + prepad_len()"]
        SKIP_DUPE["Skip if key_hash in seen_keys"]
        SKIP_TOMB["Skip if entry is NULL_BYTE"]
        EMIT["Emit EntryHandle"]
    end
    subgraph "Parallel Iterator"
        COLLECT["Collect key_indexer offsets"]
        PAR_MAP["Rayon par_iter()"]
        FILTER_MAP["filter_map constructs\nEntryHandle per thread"]
    end
    ITER_OWNED --> ITER_REF
    ITER_REF --> CURSOR
    ITER_REF --> SEEN_KEYS
    CURSOR --> NEXT
    NEXT --> METADATA
    METADATA --> PREPAD_CALC
    PREPAD_CALC --> SKIP_DUPE
    SKIP_DUPE --> SKIP_TOMB
    SKIP_TOMB --> EMIT
    PAR_ITER --> COLLECT
    COLLECT --> PAR_MAP
    PAR_MAP --> FILTER_MAP
```
Iteration and Traversal
The DataStore provides multiple methods for iterating over all valid entries in the storage.
Sources : src/storage_engine/data_store.rs:269-361 src/storage_engine/entry_iterator.rs:8-127
Sequential Iteration
Method : iter_entries() -> EntryIterator
Returns an iterator that traverses all valid entries sequentially. The iterator:
- Starts at `tail_offset` and walks backward via the `prev_offset` chain
- Tracks seen key hashes so only the latest version of each key is returned
- Filters out tombstone entries automatically
- Returns `EntryHandle` objects with zero-copy access
Sources : src/storage_engine/data_store.rs:276-280 src/storage_engine/entry_iterator.rs:41-47
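A minimal sketch of sequential iteration (illustrative `use` path; the crate name is not given on this page):

```rust
// Illustrative path; substitute the actual crate name.
use storage_engine::{DataStore, DataStoreReader};

fn count_live_entries(store: &DataStore) -> usize {
    // Walks backward from tail_offset; duplicates and tombstones
    // are filtered out, so each live key is counted exactly once.
    store.iter_entries().count()
}
```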
Consuming Iteration
Trait : impl IntoIterator for DataStore
Allows consuming a DataStore instance to produce an iterator; internally this delegates to iter_entries().
Sources : src/storage_engine/data_store.rs:44-50
Parallel Iteration
Method : par_iter_entries() -> impl ParallelIterator<Item = EntryHandle>
Feature gate : Requires parallel feature flag.
Provides Rayon-powered parallel iteration for high-throughput processing on multi-core systems.
Implementation strategy :
- Acquires a read lock on `key_indexer` briefly
- Collects all packed offset values into a `Vec<u64>`
- Releases the lock immediately
- Creates a parallel iterator over the collected offsets
- Constructs `EntryHandle` objects in parallel threads
Performance : Ideal for bulk operations like analytics, caching, or transformation pipelines.
Sources : src/storage_engine/data_store.rs:296-361
Iteration Comparison
| Method | Ownership | Concurrency | Lock Hold Time | Use Case |
|---|---|---|---|---|
| `iter_entries()` | Borrows | Sequential | Lock per `next()` call | General-purpose scanning |
| `into_iter()` | Consumes | Sequential | Lock per `next()` call | One-time full traversal |
| `par_iter_entries()` | Borrows | Parallel | Brief upfront lock | High-throughput processing |
Sources : src/storage_engine/data_store.rs:44-50 src/storage_engine/data_store.rs:276-280 src/storage_engine/data_store.rs:296-361
Utility and Maintenance Methods
File Information
| Method | Returns | Description |
|---|---|---|
| `len()` | `Result<usize>` | Number of unique keys in storage (excludes deleted) |
| `is_empty()` | `Result<bool>` | Returns `true` if no keys exist |
| `file_size()` | `Result<u64>` | Total size of the storage file in bytes |
| `get_path()` | `PathBuf` | Returns the path to the storage file |
Sources : src/storage_engine/data_store.rs:1164-1181 src/storage_engine/data_store.rs:265-267
Compaction
Method : compact(&mut self) -> Result<()>
Reclaims disk space by creating a new storage file containing only the latest version of each key. Tombstone entries are excluded.
Process :
- Creates a temporary backup file with a `.bk` extension
- Iterates through `iter_entries()` (which yields only the latest versions)
- Copies each entry via `copy_handle()`
- Swaps the temporary file with the original via `std::fs::rename()`
Thread safety warning : Should only be called when no other threads are accessing the storage. The &mut self requirement prevents concurrent mutations but does not prevent reads if the instance is wrapped in Arc<DataStore>.
Sources : src/storage_engine/data_store.rs:706-749
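A hedged sketch of a compaction routine that estimates savings first (illustrative `use` path; the crate name is not given on this page):

```rust
// Illustrative path; substitute the actual crate name.
use storage_engine::DataStore;

fn maybe_compact(store: &mut DataStore) -> std::io::Result<()> {
    // Estimate first: compaction rewrites the entire file.
    let savings = store.estimate_compaction_savings();
    if savings > 0 {
        // Caller must ensure no other thread touches the store here.
        store.compact()?;
        println!("reclaimed roughly {savings} bytes");
    }
    Ok(())
}
```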
Compaction Estimation
Method : estimate_compaction_savings() -> u64
Calculates potential space savings from compaction without performing the operation. Returns the difference between total file size and the size needed for unique entries only.
Sources : src/storage_engine/data_store.rs:605-616
Internal Support Methods
These methods support the public API but are not directly exposed to users.
Reindexing
Method : reindex()
Called after every write operation to:
- Re-map the file via `init_mmap()` to include the new data
- Update `key_indexer` with new key-to-offset mappings
- Update `tail_offset` atomically
Acquires locks on both mmap and key_indexer to ensure consistency.
Sources : src/storage_engine/data_store.rs:224-259
Entry Context Reading
Method : read_entry_with_context()
Internal helper centralizing read logic for both read() and batch_read(). Parameters include the key hash, mmap reference, and indexer guard. Performs:
- Index lookup
- Tag verification (if the original key is provided)
- Bounds checking
- Tombstone detection
- `EntryHandle` construction
Sources : src/storage_engine/data_store.rs:501-565
Recovery Chain Validation
Method : recover_valid_chain()
Called during open() to validate storage file integrity. Walks backward through the file following prev_offset chains until reaching offset 0. Truncates file if incomplete write detected.
Sources : src/storage_engine/data_store.rs:383-482
Alignment Calculation
Method : prepad_len(offset: u64) -> usize
Computes padding bytes required to align offset to PAYLOAD_ALIGNMENT (64 bytes). Uses bitwise operations for efficiency:
`pad = (A - (offset % A)) & (A - 1)`, where `A = PAYLOAD_ALIGNMENT` (a power of two)
Sources : src/storage_engine/data_store.rs:669-673
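The formula can be exercised with a small stand-alone function; this is a sketch of the arithmetic only, not the crate's actual `prepad_len()`:

```rust
/// Padding required to align `offset` up to the next multiple of
/// `align`, which must be a power of two (64 for PAYLOAD_ALIGNMENT).
fn prepad_len(offset: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (align - (offset % align)) & (align - 1)
}

fn main() {
    assert_eq!(prepad_len(0, 64), 0);    // already aligned
    assert_eq!(prepad_len(1, 64), 63);   // pad up to the next 64
    assert_eq!(prepad_len(64, 64), 0);   // exact multiples need none
    assert_eq!(prepad_len(100, 64), 28); // 100 + 28 = 128
    println!("alignment checks passed");
}
```

The `& (align - 1)` step maps the value `align` (produced when `offset` is already aligned) back to zero, avoiding a branch.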
API Patterns and Conventions
Return Value Pattern
Most write operations return Result<u64> where the u64 is the new tail_offset after the operation. This allows chaining operations or validating expected file growth.
Error Handling
The API uses std::io::Result<T> consistently. Common error cases:
- `InvalidInput`: empty payloads, null-byte-only payloads, invalid rename
- `NotFound`: key does not exist (for operations requiring existing keys)
- Lock poisoning errors (converted to `std::io::Error`)
Pre-hashed Key Methods
Many operations offer both standard and pre-hashed variants:
- Standard: `write(key, payload)` computes the hash internally
- Pre-hashed: `write_with_key_hash(hash, payload)` uses the provided hash
Pre-hashed methods enable optimization when keys are reused across multiple operations.
Batch Operations Benefit
Batch methods acquire locks once for the entire batch, significantly reducing overhead:
- `batch_write()` vs. multiple `write()` calls
- `batch_read()` vs. multiple `read()` calls
- `batch_delete()` vs. multiple `delete()` calls
Sources : src/storage_engine/data_store.rs:752-1182
Trait Implementations
DataStoreReader Trait
Defines read-only operations. Associated type EntryHandleType allows flexibility in handle implementation.
Implementors : DataStore
Key methods : read(), batch_read(), read_last_entry(), read_metadata(), exists(), len(), is_empty(), file_size()
Sources : src/storage_engine/traits.rs
DataStoreWriter Trait
Defines mutating operations. All methods take &self (not &mut self) because internal synchronization via RwLock enables safe concurrent access.
Implementors : DataStore
Key methods : write(), batch_write(), write_stream(), delete(), batch_delete(), rename(), copy(), transfer()
Sources : src/storage_engine/traits.rs
From Trait
Convenience constructor that panics on failure; prefer open() when errors need to be handled.
Sources : src/storage_engine/data_store.rs:53-64
IntoIterator Trait
Allows consuming iteration over storage entries. Returns EntryIterator as the iterator type.
Sources : src/storage_engine/data_store.rs:44-50