This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
DataStore API
Relevant source files
- src/lib.rs
- src/storage_engine.rs
- src/storage_engine/data_store.rs
- src/storage_engine/entry_iterator.rs
Purpose and Scope
This document describes the public API of the DataStore type, which provides the primary interface for interacting with the append-only storage engine. It covers all public methods for reading, writing, and managing data entries, including single and batch operations, streaming interfaces, and maintenance functions.
For information about the underlying storage architecture and file format, see Storage Architecture. For details on memory management and zero-copy access patterns, see Memory Management and Zero-Copy Access. For key indexing internals, see Key Indexing and Hashing.
API Overview
Trait-Based Design
The DataStore API is organized around two core traits that separate read and write concerns:
Sources: src/storage_engine/data_store.rs:752-1182 src/storage_engine/traits.rs
```mermaid
graph TB
    subgraph "Core Traits"
        DSR["DataStoreReader\ntrait"]
        DSW["DataStoreWriter\ntrait"]
    end
    subgraph "Implementation"
        DS["DataStore\nstruct"]
    end
    subgraph "Associated Types"
        EH["EntryHandle\n(DataStoreReader::EntryHandleType)"]
    end
    DS -->|implements| DSR
    DS -->|implements| DSW
    DSR -.associated type.-> EH
    DSR -->|methods| READ["read()\nbatch_read()\nexists()\nlen()\nfile_size()"]
    DSW -->|methods| WRITE["write()\nbatch_write()\nwrite_stream()\ndelete()\nrename()\ncopy()"]
```
Creating and Opening Storage
Opening Storage Files
| Method | Purpose | Creates New File |
|---|---|---|
| `DataStore::open(&Path)` | Opens existing or creates new storage | Yes |
| `DataStore::open_existing(&Path)` | Opens only existing storage | No |
| `From<PathBuf>` conversion | Creates from path (panics on error) | Yes |
Each of the three approaches is sketched below.
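A minimal sketch of all three approaches (the import path and exact signatures are assumptions based on the table above, not verified source):

```rust
use std::path::{Path, PathBuf};
use simd_r_drive::DataStore; // import path assumed

fn open_examples() -> std::io::Result<()> {
    // Opens the file, creating it if it does not exist.
    let _store = DataStore::open(Path::new("data.store"))?;

    // Opens only if the file already exists; errors otherwise.
    let _existing = DataStore::open_existing(Path::new("data.store"))?;

    // Infallible conversion from a path; panics on I/O failure.
    let _from_path = DataStore::from(PathBuf::from("data.store"));
    Ok(())
}
```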
Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:119-144 src/storage_engine/data_store.rs:53-64
Write Operations
Single Entry Write
The write method appends a key-value pair to storage:
Returns: the new tail offset (the absolute file position after the write).
Pre-Hashed Variant: applications that pre-compute key hashes (e.g., with `NamespaceHasher`) can call `write_with_key_hash` to skip per-call hashing. Both forms are sketched below.
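A hedged sketch of both forms; the byte-slice arguments and `std::io::Result<u64>` return shape are inferred from the summary table at the end of this page:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

// Assumed shape: fn write(&self, key: &[u8], payload: &[u8]) -> std::io::Result<u64>
fn write_example(store: &DataStore) -> std::io::Result<()> {
    let tail_offset = store.write(b"user:42", br#"{"name":"Ada"}"#)?;
    println!("file tail is now at byte {tail_offset}");

    // Pre-hashed variant: supply a u64 hash computed up front
    // (0xDEAD_BEEF is a stand-in for a real NamespaceHasher output).
    store.write_with_key_hash(0xDEAD_BEEF, b"payload bytes")?;
    Ok(())
}
```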
Sources: src/storage_engine/data_store.rs:827-834 src/lib.rs:32-36
Streaming Write
For large payloads, or when data arrives from an I/O source, use `write_stream`. A usage sketch follows the constraints below.
Constraints:
- Payload cannot be empty
- NULL-byte-only streams are rejected
- Uses an 8 KB internal buffer (`WRITE_STREAM_BUFFER_SIZE`)
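A sketch using an in-memory `Cursor` as the reader; the exact reader bound (`&mut impl Read`) is an assumption:

```rust
use std::io::Cursor;
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

fn stream_example(store: &DataStore) -> std::io::Result<()> {
    // Any std::io::Read source works; a Cursor stands in for a file or socket.
    let mut reader = Cursor::new(vec![1u8; 1_000_000]);
    let tail_offset = store.write_stream(b"large-blob", &mut reader)?;
    println!("streamed entry ends at byte {tail_offset}");
    Ok(())
}
```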
Sources: src/storage_engine/data_store.rs:753-825 src/lib.rs:84-102
Batch Write
For efficient bulk insertion:
A usage sketch follows the performance notes below.
Performance Benefits:
- Single write lock acquisition
- Batched I/O reduces system calls
- Single `reindex` call for all entries
Pre-Hashed Variant: `batch_write_with_key_hashes(hashes, allow_null)` accepts pre-computed key hashes directly.
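A sketch of a bulk insert; the `&[(&[u8], &[u8])]` entry shape is assumed from the method table:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

fn batch_write_example(store: &DataStore) -> std::io::Result<()> {
    let entries: Vec<(&[u8], &[u8])> = vec![
        (b"alpha".as_slice(), b"1".as_slice()),
        (b"beta".as_slice(), b"2".as_slice()),
        (b"gamma".as_slice(), b"3".as_slice()),
    ];
    // One lock acquisition, one batched write, one reindex for all three.
    store.batch_write(&entries)?;
    Ok(())
}
```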
Sources: src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939
Read Operations
Single Entry Read
Returns:
- `Ok(Some(EntryHandle))` if the key exists and is valid
- `Ok(None)` if the key is not found, was deleted, or a tag mismatch is detected
- `Err(_)` on lock poisoning or I/O errors

A pre-hashed variant, `read_with_key_hash`, is also available; both forms are sketched below.
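A sketch of both forms; `as_slice()` as the zero-copy payload accessor on `EntryHandle` is an assumption:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

fn read_example(store: &DataStore) -> std::io::Result<()> {
    match store.read(b"user:42")? {
        Some(entry) => println!("found {} bytes", entry.as_slice().len()),
        None => println!("key missing, deleted, or tag mismatch"),
    }

    // Pre-hashed variant: must use the same hash supplied at write time.
    let _maybe_entry = store.read_with_key_hash(0xDEAD_BEEF)?;
    Ok(())
}
```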
Sources: src/storage_engine/data_store.rs:1040-1059 src/storage_engine/data_store.rs:502-565
Batch Read
A usage sketch follows the performance notes below.
Performance Benefits:
- Single lock acquisition for all reads
- Single mmap Arc clone shared across all lookups
- Maintains order: `results[i]` corresponds to `keys[i]`

A pre-hashed variant, `batch_read_hashed_keys`, is also provided.
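A sketch preserving the index correspondence described above (signatures assumed):

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

fn batch_read_example(store: &DataStore) -> std::io::Result<()> {
    let keys: Vec<&[u8]> = vec![b"alpha", b"beta", b"missing"];
    let results = store.batch_read(&keys)?;
    // results[i] corresponds to keys[i].
    for (key, result) in keys.iter().zip(&results) {
        match result {
            Some(entry) => println!("{:?}: {} bytes", key, entry.as_slice().len()),
            None => println!("{:?}: not found", key),
        }
    }
    Ok(())
}
```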
Sources: src/storage_engine/data_store.rs:1105-1158
Existence Checks
Lightweight checks without reading payload data:
| Method | Description |
|---|---|
| `exists(&[u8]) -> Result<bool>` | Check if key exists |
| `exists_with_key_hash(u64) -> Result<bool>` | Pre-hashed variant |
| `len() -> Result<usize>` | Count of unique keys |
| `is_empty() -> Result<bool>` | Check whether storage has no entries |
Example:
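A minimal sketch, using the signatures given in the table above (import paths assumed):

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

fn existence_example(store: &DataStore) -> std::io::Result<()> {
    if store.exists(b"user:42")? {
        println!("key is present");
    }
    println!("{} unique keys stored", store.len()?);
    println!("empty: {}", store.is_empty()?);
    Ok(())
}
```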
Sources: src/storage_engine/data_store.rs:1030-1038 src/storage_engine/data_store.rs:1164-1177
Reading Last Entry
Retrieve the most recently written entry without key lookup:
Example:
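A sketch, assuming the method returns `Result<Option<EntryHandle>>` like `read`:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

fn last_entry_example(store: &DataStore) -> std::io::Result<()> {
    if let Some(entry) = store.read_last_entry()? {
        println!("most recent entry holds {} bytes", entry.as_slice().len());
    }
    Ok(())
}
```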
Sources: src/storage_engine/data_store.rs:1061-1103
Reading Metadata Only
Retrieve only metadata without payload access:
Example:
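A sketch; the `EntryMetadata` return type and its `Debug` output are assumptions:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

fn metadata_example(store: &DataStore) -> std::io::Result<()> {
    // Assumed shape: fn read_metadata(&self, key: &[u8]) -> Result<Option<EntryMetadata>>
    if let Some(meta) = store.read_metadata(b"user:42")? {
        println!("metadata: {meta:?}"); // offsets, sizes, hashes, etc.
    }
    Ok(())
}
```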
Sources: src/storage_engine/data_store.rs:1160-1162
Delete Operations
Single Delete
Deletion is implemented by writing a tombstone (single NULL byte):
A batch variant, `batch_delete`, removes multiple keys in one pass; both forms are sketched below.
Optimization: Only writes tombstones for keys that actually exist, avoiding unnecessary I/O.
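A sketch of both forms (signatures assumed from the summary table):

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

fn delete_example(store: &DataStore) -> std::io::Result<()> {
    // Appends a one-byte tombstone; the old payload stays until compaction.
    store.delete(b"user:42")?;

    // Batch variant: tombstones are written only for keys that exist.
    let keys: Vec<&[u8]> = vec![b"alpha", b"beta"];
    store.batch_delete(&keys)?;
    Ok(())
}
```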
Sources: src/storage_engine/data_store.rs:986-1024 src/lib.rs:59-62
Entry Manipulation Operations
Rename
Changes a key while preserving its value:
A usage sketch follows the error conditions below.
Error Conditions:
- Returns an error if `old_key == new_key`
- Returns an error if the old key is not found
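A sketch (signatures assumed):

```rust
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter}; // import paths assumed

fn rename_example(store: &DataStore) -> std::io::Result<()> {
    store.write(b"old-name", b"payload")?;
    store.rename(b"old-name", b"new-name")?;
    assert!(store.exists(b"new-name")?);
    assert!(!store.exists(b"old-name")?);
    Ok(())
}
```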
Sources: src/storage_engine/data_store.rs:941-958
Copy
Copies an entry from one storage to another:
A usage sketch follows the constraints below.
Constraints:
- Source and target must be different files
- Key must exist in source
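A sketch; passing the target as a second `DataStore` is assumed from the summary table:

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

fn copy_example(source: &DataStore, target: &DataStore) -> std::io::Result<()> {
    // Fails if the key is missing or both stores share the same file.
    source.copy(b"shared-config", target)?;
    Ok(())
}
```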
Sources: src/storage_engine/data_store.rs:960-979
Transfer
Copies an entry to another storage and then deletes it from the source:
Conceptually, a transfer is a `copy` into the target followed by a `delete` from the source, as sketched below.
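A sketch showing the transfer and its rough equivalent (API shape assumed):

```rust
use simd_r_drive::{DataStore, DataStoreWriter}; // import paths assumed

fn transfer_example(source: &DataStore, target: &DataStore) -> std::io::Result<()> {
    source.transfer(b"session:7", target)?;

    // Roughly equivalent to:
    // source.copy(b"session:7", target)?;
    // source.delete(b"session:7")?;
    Ok(())
}
```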
Sources: src/storage_engine/data_store.rs:981-984
Iteration and Streaming
Entry Iterator
Provides sequential access to all valid entries:
A usage sketch follows the guarantees below.
Iteration Guarantees:
- Returns only the latest version of each key
- Skips deleted entries (tombstones)
- Traverses backward from tail to head
- Zero-copy: shares `Arc<Mmap>` across all handles
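A sketch; `iter_entries()` is assumed to yield `EntryHandle` items directly:

```rust
use simd_r_drive::DataStore; // import path assumed

fn iterate_example(store: &DataStore) {
    // Tail-to-head traversal; deleted and superseded entries are skipped.
    for entry in store.iter_entries() {
        println!("entry of {} bytes", entry.as_slice().len());
    }
}
```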
Sources: src/storage_engine/data_store.rs:269-280 src/storage_engine/data_store.rs:44-51 src/storage_engine/entry_iterator.rs:56-127
Parallel Iterator (Feature: parallel)
When the parallel feature is enabled, process entries across multiple threads:
A usage sketch follows the implementation notes below.
Implementation Strategy:
- Collect all entry offsets with short read lock
- Release lock immediately
- Construct `EntryHandle` objects in parallel across threads
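A sketch assuming the iterator integrates with Rayon's `ParallelIterator` when the `parallel` feature is enabled:

```rust
use rayon::prelude::*;
use simd_r_drive::DataStore; // import path assumed

fn parallel_example(store: &DataStore) {
    // Offsets are snapshotted under a brief read lock, then handles are
    // materialized and processed across the Rayon thread pool.
    let total_bytes: usize = store
        .par_iter_entries()
        .map(|entry| entry.as_slice().len())
        .sum();
    println!("total payload bytes: {total_bytes}");
}
```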
Sources: src/storage_engine/data_store.rs:296-361
Entry Stream
Convert EntryHandle to a Read trait implementer for streaming:
A usage sketch follows the use cases below.
Use Cases:
- Streaming large entries without loading entire payload
- Piping entry data to network sockets
- Processing entries chunk-by-chunk
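A sketch; `EntryStream::from(entry)` as the conversion is an assumption based on the sources listed below:

```rust
use std::io::Read;
use simd_r_drive::{DataStore, DataStoreReader, EntryStream}; // import paths assumed

fn stream_read_example(store: &DataStore) -> std::io::Result<()> {
    if let Some(entry) = store.read(b"large-blob")? {
        let mut stream = EntryStream::from(entry);
        let mut chunk = [0u8; 4096];
        loop {
            let n = stream.read(&mut chunk)?;
            if n == 0 { break; }
            // process chunk[..n], e.g. forward it to a socket
        }
    }
    Ok(())
}
```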
Sources: src/lib.rs:86-92 src/storage_engine/entry_stream.rs
Maintenance Operations
Compaction
Removes old versions of keys to reclaim space:
A usage sketch follows the notes below.
Behavior:
- Creates a `.bk` backup file during the process
- Keeps only the latest version of each key
- Does NOT remove tombstones (deleted entries)
- Swaps files atomically on success
⚠️ Concurrency Warning:
- Requires `&mut self` but does not prevent concurrent reads via `Arc<DataStore>`
- Should only be called when no other threads are accessing the storage
- Consider external synchronization for `Arc`-wrapped instances
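A sketch; note the exclusive `&mut` borrow, which matches the warning above:

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

fn compact_example(store: &mut DataStore) -> std::io::Result<()> {
    let before = store.file_size()?;
    store.compact()?; // writes a .bk backup, then swaps files atomically
    let after = store.file_size()?;
    println!("reclaimed {} bytes", before.saturating_sub(after));
    Ok(())
}
```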
Sources: src/storage_engine/data_store.rs:706-749
Estimating Compaction Savings
Preview how much space compaction would reclaim:
Calculation: `savings = file_size - sum(unique_entry_sizes)`
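A sketch, assuming the method returns the estimated reclaimable byte count:

```rust
use simd_r_drive::DataStore; // import path assumed

fn savings_example(store: &DataStore) {
    let savings = store.estimate_compaction_savings(); // assumed to return u64
    println!("compaction would reclaim about {savings} bytes");
}
```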
Sources: src/storage_engine/data_store.rs:605-616
File Size
Get the total size of the storage file:
Example:
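A sketch (return type assumed to be `Result<u64>`):

```rust
use simd_r_drive::{DataStore, DataStoreReader}; // import paths assumed

fn size_example(store: &DataStore) -> std::io::Result<()> {
    let bytes = store.file_size()?;
    println!("storage file occupies {bytes} bytes");
    Ok(())
}
```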
Sources: src/storage_engine/data_store.rs:1179-1181
API Method Summary
Complete Method Reference Table
| Category | Method | Key Type | Pre-Hashed Variant |
|---|---|---|---|
| Write | `write(key, payload)` | `&[u8]` | `write_with_key_hash(hash, payload)` |
| | `write_stream(key, reader)` | `&[u8]` | `write_stream_with_key_hash(hash, reader)` |
| | `batch_write(entries)` | `&[&[u8]]` | `batch_write_with_key_hashes(hashes, allow_null)` |
| Read | `read(key)` | `&[u8]` | `read_with_key_hash(hash)` |
| | `batch_read(keys)` | `&[&[u8]]` | `batch_read_hashed_keys(hashes, keys?)` |
| | `read_last_entry()` | N/A | N/A |
| | `read_metadata(key)` | `&[u8]` | - |
| Existence | `exists(key)` | `&[u8]` | `exists_with_key_hash(hash)` |
| | `len()` | N/A | N/A |
| | `is_empty()` | N/A | N/A |
| Delete | `delete(key)` | `&[u8]` | - |
| | `batch_delete(keys)` | `&[&[u8]]` | `batch_delete_key_hashes(hashes)` |
| Manipulation | `rename(old, new)` | `&[u8]` | - |
| | `copy(key, target)` | `&[u8]` | - |
| | `transfer(key, target)` | `&[u8]` | - |
| Iteration | `iter_entries()` | N/A | N/A |
| | `par_iter_entries()` | N/A | N/A |
| Maintenance | `compact()` | N/A | N/A |
| | `estimate_compaction_savings()` | N/A | N/A |
| | `file_size()` | N/A | N/A |
| | `get_path()` | N/A | N/A |
Sources: src/storage_engine/data_store.rs:752-1182 src/storage_engine/traits.rs
Return Types and Error Handling
Return Value Pattern
Most methods return a `Result`; the success type `T` is typically:
- `u64`: tail offset after write/delete operations
- `Option<EntryHandle>`: read operations (`None` if not found)
- `Vec<Option<EntryHandle>>`: batch read operations
- `bool`: existence checks
- `usize`: count operations
- `()`: maintenance operations
Error Conditions
Common error scenarios:
| Error | Condition |
|---|---|
| `std::io::Error` | File I/O failures, disk full |
| "Failed to acquire lock" | Lock poisoning (panic in another thread) |
| "Key not found" | Required key missing (`copy`, `rename`, `transfer`) |
| "Payload cannot be empty" | Empty payload passed to `write` |
| "NULL-byte payloads..." | Attempt to write NULL bytes directly |
| "Cannot rename key to itself" | Rename with identical keys |
| "Cannot copy to same storage" | Copy within the same file |
| "Hash collision" | Tag verification fails during batch write |
Sources: src/storage_engine/data_store.rs
Testing and Debugging Methods
Methods Available in Test/Debug Builds
The following methods are only available when compiled with `test` or `debug_assertions`:
Purpose:
- `get_mmap_arc_for_testing()`: access the memory map for validation
- `arc_ptr()`: get a raw pointer for zero-copy verification
Sources: src/storage_engine/data_store.rs:630-655
Usage Patterns
The sketch below strings together the common patterns: write and read, batch operations, streaming large data, iterating over entries, and pre-hashed keys with a namespace.
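A combined sketch of the patterns above (import paths, signatures, and the commented `NamespaceHasher` call are assumptions, not verified source):

```rust
use std::io::Cursor;
use std::path::Path;
use simd_r_drive::{DataStore, DataStoreReader, DataStoreWriter}; // import paths assumed

fn main() -> std::io::Result<()> {
    let store = DataStore::open(Path::new("demo.store"))?;

    // Pattern: write and read.
    store.write(b"greeting", b"hello")?;
    assert_eq!(store.read(b"greeting")?.unwrap().as_slice(), b"hello");

    // Pattern: batch operations.
    let entries: Vec<(&[u8], &[u8])> = vec![
        (b"a".as_slice(), b"1".as_slice()),
        (b"b".as_slice(), b"2".as_slice()),
    ];
    store.batch_write(&entries)?;

    // Pattern: streaming large data.
    let mut reader = Cursor::new(vec![0xABu8; 1 << 20]);
    store.write_stream(b"blob", &mut reader)?;

    // Pattern: iterate and process.
    for entry in store.iter_entries() {
        let _ = entry.as_slice().len();
    }

    // Pattern: pre-hashed keys with a namespace (API assumed; see
    // Key Indexing and Hashing):
    // let hasher = NamespaceHasher::new(b"users");
    // store.write_with_key_hash(hasher.hash(b"42"), b"payload")?;

    Ok(())
}
```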
Sources: src/lib.rs:20-63 src/lib.rs:66-115