
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

DataStore API


Purpose and Scope

This document describes the public API of the DataStore type, which provides the primary interface for interacting with the append-only storage engine. It covers all public methods for reading, writing, and managing data entries, including single and batch operations, streaming interfaces, and maintenance functions.

For information about the underlying storage architecture and file format, see Storage Architecture. For details on memory management and zero-copy access patterns, see Memory Management and Zero-Copy Access. For key indexing internals, see Key Indexing and Hashing.


API Overview

Trait-Based Design

The DataStore API is organized around two core traits that separate read and write concerns:

Sources: src/storage_engine/data_store.rs:752-1182 src/storage_engine/traits.rs

```mermaid
graph TB
    subgraph "Core Traits"
        DSR["DataStoreReader\ntrait"]
        DSW["DataStoreWriter\ntrait"]
    end

    subgraph "Implementation"
        DS["DataStore\nstruct"]
    end

    subgraph "Associated Types"
        EH["EntryHandle\n(DataStoreReader::EntryHandleType)"]
    end

    DS -->|implements| DSR
    DS -->|implements| DSW
    DSR -.associated type.-> EH

    DSR -->|methods| READ["read()\nbatch_read()\nexists()\nlen()\nfile_size()"]
    DSW -->|methods| WRITE["write()\nbatch_write()\nwrite_stream()\ndelete()\nrename()\ncopy()"]
```
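The split in the diagram can be sketched in plain Rust. This is an illustrative mock, not the crate's actual definitions: the real `EntryHandleType` is a zero-copy handle into an `Arc<Mmap>`, whereas the mock below simply clones from a `HashMap`.

```rust
use std::collections::HashMap;

// Read-side trait with an associated handle type, as in the diagram above.
trait DataStoreReader {
    type EntryHandleType;
    fn read(&self, key: &[u8]) -> Option<Self::EntryHandleType>;
    fn exists(&self, key: &[u8]) -> bool {
        self.read(key).is_some()
    }
}

// Write-side trait; `write` returns the new tail offset after the append.
trait DataStoreWriter {
    fn write(&mut self, key: &[u8], payload: &[u8]) -> u64;
}

/// Minimal in-memory stand-in for the mmap-backed DataStore struct.
struct MockStore {
    index: HashMap<Vec<u8>, Vec<u8>>,
    tail: u64,
}

impl DataStoreReader for MockStore {
    type EntryHandleType = Vec<u8>;
    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.index.get(key).cloned()
    }
}

impl DataStoreWriter for MockStore {
    fn write(&mut self, key: &[u8], payload: &[u8]) -> u64 {
        self.index.insert(key.to_vec(), payload.to_vec());
        self.tail += payload.len() as u64;
        self.tail
    }
}
```

Separating the traits lets read-only consumers accept `impl DataStoreReader` without gaining mutation rights.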

Creating and Opening Storage

Opening Storage Files

| Method | Purpose | Creates New File |
|---|---|---|
| `DataStore::open(&Path)` | Opens existing or creates new storage | Yes |
| `DataStore::open_existing(&Path)` | Opens only existing storage | No |
| `From<PathBuf>` conversion | Creates from path (panics on error) | Yes |

Opening or Creating:

Opening Existing Only:

From Conversion:
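The semantics in the table can be modeled with a small mock (the struct body and error values here are illustrative, not the crate's):

```rust
use std::path::{Path, PathBuf};

struct DataStore; // stand-in; the real struct wraps the mmap and index

impl DataStore {
    /// Opens existing storage or creates a new file (mock: always succeeds).
    fn open(_path: &Path) -> std::io::Result<Self> {
        Ok(DataStore)
    }

    /// Opens only existing storage; errors when the path is missing.
    fn open_existing(path: &Path) -> std::io::Result<Self> {
        if path.exists() {
            Ok(DataStore)
        } else {
            Err(std::io::Error::new(std::io::ErrorKind::NotFound, "no such store"))
        }
    }
}

impl From<PathBuf> for DataStore {
    fn from(path: PathBuf) -> Self {
        // The documented From conversion panics on error.
        DataStore::open(&path).expect("failed to open store")
    }
}
```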

Sources: src/storage_engine/data_store.rs:84-117 src/storage_engine/data_store.rs:119-144 src/storage_engine/data_store.rs:53-64


Write Operations

Single Entry Write

The write method appends a key-value pair to storage:

Method Signature:

Returns: The new tail offset (absolute file position after write)

Example:

Pre-Hashed Variant:

For applications that pre-compute hashes (e.g., using NamespaceHasher):
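A sketch of the write path, with the "file" modeled as a `Vec<u8>`. The entry layout and hash function below are simplifications (FNV-1a as a placeholder), not the engine's real format or hashing scheme:

```rust
use std::collections::HashMap;

struct Store {
    file: Vec<u8>,                       // stands in for the mmap-backed file
    index: HashMap<u64, (usize, usize)>, // key hash -> (offset, len)
}

/// Placeholder hash (FNV-1a); the engine uses its own key-hashing scheme.
fn hash_key(key: &[u8]) -> u64 {
    key.iter().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ *b as u64).wrapping_mul(0x100000001b3)
    })
}

impl Store {
    /// Mirrors `write(key, payload)`: appends and returns the new tail offset.
    fn write(&mut self, key: &[u8], payload: &[u8]) -> u64 {
        self.write_with_key_hash(hash_key(key), payload)
    }

    /// Pre-hashed variant, for callers that already computed the key hash.
    fn write_with_key_hash(&mut self, key_hash: u64, payload: &[u8]) -> u64 {
        let offset = self.file.len();
        self.file.extend_from_slice(payload);
        self.index.insert(key_hash, (offset, payload.len()));
        self.file.len() as u64 // the new tail offset
    }
}
```

Note that a rewrite of an existing key appends a new record and repoints the index; the old bytes remain in the file until compaction.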

Sources: src/storage_engine/data_store.rs:827-834 src/lib.rs:32-36


Streaming Write

For large payloads or when data comes from an I/O source:

Method Signature:

Example:

Constraints:

  • Payload cannot be empty
  • NULL-byte-only streams are rejected
  • Uses 8KB buffer internally (WRITE_STREAM_BUFFER_SIZE)
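The buffered copy loop can be sketched as follows. The buffer size and the "Payload cannot be empty" constraint come from the points above; the function body is otherwise illustrative:

```rust
use std::io::Read;

const WRITE_STREAM_BUFFER_SIZE: usize = 8 * 1024;

/// Appends everything from `reader` to the log, returning the new tail
/// offset. Rejects empty payloads, mirroring the documented constraint.
fn write_stream<R: Read>(file: &mut Vec<u8>, mut reader: R) -> std::io::Result<u64> {
    let mut buf = [0u8; WRITE_STREAM_BUFFER_SIZE];
    let start = file.len();
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // source exhausted
        }
        file.extend_from_slice(&buf[..n]);
    }
    if file.len() == start {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidInput,
            "Payload cannot be empty",
        ));
    }
    Ok(file.len() as u64)
}
```

Because the source is consumed through a fixed 8 KB buffer, the payload never needs to fit in memory all at once.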

Sources: src/storage_engine/data_store.rs:753-825 src/lib.rs:84-102


Batch Write

For efficient bulk insertion:

Method Signature:

Example:

Performance Benefits:

  • Single write lock acquisition
  • Batched I/O reduces system calls
  • Single reindex call for all entries

Pre-Hashed Variant:
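The batching benefits listed above can be seen in a single-pass sketch (in the real engine the loop runs under one write lock and the index update is one reindex call; both are modeled trivially here):

```rust
use std::collections::HashMap;

/// Appends all entries in one pass, modeling `batch_write(entries)`
/// returning the final tail offset.
fn batch_write(
    file: &mut Vec<u8>,
    index: &mut HashMap<Vec<u8>, (usize, usize)>,
    entries: &[(&[u8], &[u8])],
) -> u64 {
    for (key, payload) in entries {
        let offset = file.len();
        file.extend_from_slice(payload); // batched I/O in the real engine
        index.insert(key.to_vec(), (offset, payload.len()));
    }
    file.len() as u64 // single tail update for the whole batch
}
```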

Sources: src/storage_engine/data_store.rs:838-843 src/storage_engine/data_store.rs:847-939


Read Operations

Single Entry Read

Method Signature:

Returns:

  • Ok(Some(EntryHandle)) if key exists and is valid
  • Ok(None) if key not found, deleted, or tag mismatch detected
  • Err(_) on lock poisoning or I/O errors

Example:

Pre-Hashed Variant:
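A sketch of the return contract: the handle is a zero-copy view (offset plus length) into a shared, reference-counted buffer. `Arc<Vec<u8>>` stands in for the real `Arc<Mmap>`, and the tombstone check is simplified:

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct EntryHandle {
    mmap: Arc<Vec<u8>>, // the real type wraps Arc<Mmap>
    offset: usize,
    len: usize,
}

impl EntryHandle {
    fn as_slice(&self) -> &[u8] {
        &self.mmap[self.offset..self.offset + self.len]
    }
}

/// Models `read(key) -> Result<Option<EntryHandle>>`: `None` when the key
/// is absent or its latest record is a tombstone (a single NULL byte).
fn read(
    mmap: &Arc<Vec<u8>>,
    index: &HashMap<Vec<u8>, (usize, usize)>,
    key: &[u8],
) -> Option<EntryHandle> {
    let &(offset, len) = index.get(key)?;
    if len == 1 && mmap[offset] == 0 {
        return None; // tombstone: key was deleted
    }
    Some(EntryHandle { mmap: Arc::clone(mmap), offset, len })
}
```

Cloning the `Arc` is cheap; the payload bytes themselves are never copied.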

Sources: src/storage_engine/data_store.rs:1040-1059 src/storage_engine/data_store.rs:502-565


Batch Read

Method Signature:

Example:

Performance Benefits:

  • Single lock acquisition for all reads
  • Single mmap Arc clone shared across all lookups
  • Maintains order: results[i] corresponds to keys[i]

Pre-Hashed Variant:
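The order-preservation guarantee can be sketched as one lookup pass (the single lock acquisition and shared mmap `Arc` of the real engine are omitted here):

```rust
use std::collections::HashMap;

/// Models `batch_read`: results[i] corresponds to keys[i], with `None`
/// for keys that are missing.
fn batch_read<'a>(
    index: &'a HashMap<Vec<u8>, Vec<u8>>,
    keys: &[&[u8]],
) -> Vec<Option<&'a Vec<u8>>> {
    keys.iter().map(|k| index.get(*k)).collect()
}
```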

Sources: src/storage_engine/data_store.rs:1105-1158


Existence Checks

Lightweight checks without reading payload data:

| Method | Description |
|---|---|
| `exists(&[u8]) -> Result<bool>` | Check if a key exists |
| `exists_with_key_hash(u64) -> Result<bool>` | Pre-hashed variant |
| `len() -> Result<usize>` | Count of unique keys |
| `is_empty() -> Result<bool>` | Check whether the storage contains no entries |

Example:

Sources: src/storage_engine/data_store.rs:1030-1038 src/storage_engine/data_store.rs:1164-1177


Reading Last Entry

Retrieve the most recently written entry without key lookup:

Example:

Sources: src/storage_engine/data_store.rs:1061-1103


Reading Metadata Only

Retrieve only metadata without payload access:

Example:

Sources: src/storage_engine/data_store.rs:1160-1162


Delete Operations

Single Delete

Deletion is implemented by writing a tombstone (single NULL byte):

Method Signature:

Example:

Batch Variant:

Optimization: Only writes tombstones for keys that actually exist, avoiding unnecessary I/O.
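The tombstone mechanics and the skip-missing-keys optimization can be sketched directly. The layout is simplified (a tombstone here is literally one appended NULL byte, as the section describes):

```rust
use std::collections::HashMap;

const TOMBSTONE: u8 = 0;

/// Appends a tombstone for each key that exists; missing keys cost no I/O.
/// Returns the new tail offset, like `batch_delete(keys)`.
fn batch_delete(
    file: &mut Vec<u8>,
    index: &mut HashMap<Vec<u8>, (usize, usize)>,
    keys: &[&[u8]],
) -> u64 {
    for key in keys {
        if index.contains_key(*key) {
            let offset = file.len();
            file.push(TOMBSTONE); // the single-NULL-byte tombstone record
            index.insert(key.to_vec(), (offset, 1));
        }
    }
    file.len() as u64
}
```

Because deletion is itself an append, the old payload bytes stay in the file until compaction.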

Sources: src/storage_engine/data_store.rs:986-1024 src/lib.rs:59-62


Entry Manipulation Operations

Rename

Changes a key while preserving its value:

Method Signature:

Example:

Error Conditions:

  • Returns error if old_key == new_key
  • Returns error if old key not found

Sources: src/storage_engine/data_store.rs:941-958


Copy

Copies an entry from one storage to another:

Example:

Constraints:

  • Source and target must be different files
  • Key must exist in source

Sources: src/storage_engine/data_store.rs:960-979


Transfer

Copies an entry to another storage and then deletes it from the source:

Example:

Equivalent to:
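The copy-then-delete equivalence can be sketched with two stores modeled as maps (the real delete writes a tombstone rather than removing the record):

```rust
use std::collections::HashMap;

type Store = HashMap<Vec<u8>, Vec<u8>>;

/// Models `transfer(key, target)`: copy into the target, then delete
/// from the source; errors if the key is missing.
fn transfer(source: &mut Store, target: &mut Store, key: &[u8]) -> Result<(), String> {
    let value = source
        .get(key)
        .cloned()
        .ok_or_else(|| "Key not found".to_string())?;
    target.insert(key.to_vec(), value); // copy(key, target)
    source.remove(key);                 // delete(key)
    Ok(())
}
```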

Sources: src/storage_engine/data_store.rs:981-984


Iteration and Streaming

Entry Iterator

Provides sequential access to all valid entries:

Method Signature:

Example:

Iteration Guarantees:

  • Returns only the latest version of each key
  • Skips deleted entries (tombstones)
  • Traverses backward from tail to head
  • Zero-copy: shares Arc<Mmap> across all handles
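The guarantees above can be sketched over a log modeled as a `Vec` of records, with an empty payload standing in for the NULL-byte tombstone (the real iterator yields zero-copy handles instead of cloned values):

```rust
use std::collections::HashSet;

/// Walks the log backward from the tail, yielding only the latest version
/// of each key and skipping tombstones.
fn iter_entries(log: &[(Vec<u8>, Vec<u8>)]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for (key, payload) in log.iter().rev() { // tail -> head
        if !seen.insert(key.clone()) {
            continue; // older version of an already-seen key
        }
        if payload.is_empty() {
            continue; // tombstone: key is deleted
        }
        out.push((key.clone(), payload.clone()));
    }
    out
}
```

Backward traversal is what makes "latest version wins" a one-pass operation: the first occurrence of a key seen from the tail is by construction its newest record.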

Sources: src/storage_engine/data_store.rs:269-280 src/storage_engine/data_store.rs:44-51 src/storage_engine/entry_iterator.rs:56-127


Parallel Iterator (Feature: parallel)

When the parallel feature is enabled, process entries across multiple threads:

Example:

Implementation Strategy:

  1. Collect all entry offsets with short read lock
  2. Release lock immediately
  3. Construct EntryHandle objects in parallel across threads
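The three steps can be sketched with `std::thread` (the crate's `parallel` feature uses a work-stealing thread pool rather than one thread per entry; this is only the lock-then-fan-out shape):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// 1) Snapshot offsets under a short lock, 2) release the lock,
/// 3) materialize "handles" (here: owned slices) across threads.
fn par_collect(
    data: Arc<Vec<u8>>,
    index: Arc<Mutex<Vec<(usize, usize)>>>,
) -> Vec<Vec<u8>> {
    let offsets: Vec<(usize, usize)> = index.lock().unwrap().clone(); // lock dropped here
    let workers: Vec<_> = offsets
        .into_iter()
        .map(|(off, len)| {
            let data = Arc::clone(&data);
            thread::spawn(move || data[off..off + len].to_vec())
        })
        .collect();
    workers.into_iter().map(|w| w.join().unwrap()).collect()
}
```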

Sources: src/storage_engine/data_store.rs:296-361


Entry Stream

Convert EntryHandle to a Read trait implementer for streaming:

Example:

Use Cases:

  • Streaming large entries without loading entire payload
  • Piping entry data to network sockets
  • Processing entries chunk-by-chunk
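A sketch of the wrapper: implementing `std::io::Read` over a shared buffer lets an entry be consumed chunk-by-chunk. `Arc<Vec<u8>>` again stands in for `Arc<Mmap>`; the real `EntryStream` is built from an `EntryHandle`:

```rust
use std::io::Read;
use std::sync::Arc;

struct EntryStream {
    mmap: Arc<Vec<u8>>, // shared buffer behind the handle
    pos: usize,
    end: usize,
}

impl Read for EntryStream {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let n = buf.len().min(self.end - self.pos);
        buf[..n].copy_from_slice(&self.mmap[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n) // 0 once the entry is exhausted, per the Read contract
    }
}
```

Anything that accepts `impl Read` (compressors, socket writers, `std::io::copy`) can then consume the entry without an intermediate allocation of the full payload.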

Sources: src/lib.rs:86-92 src/storage_engine/entry_stream.rs


Maintenance Operations

Compaction

Removes old versions of keys to reclaim space:

Method Signature:

Example:

Behavior:

  • Creates .bk backup file during process
  • Keeps only latest version of each key
  • Does NOT remove tombstones (deleted entries)
  • Swaps files atomically on success

⚠️ Concurrency Warning:

  • Requires &mut self but doesn’t prevent concurrent reads via Arc<DataStore>
  • Should only be called when no other threads access the storage
  • Consider external synchronization for Arc-wrapped instances
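The keep-only-latest behavior (including tombstone retention) can be sketched over the same log model; the `.bk` backup file and atomic swap are omitted:

```rust
use std::collections::HashSet;

/// Keeps only the latest record per key, tombstones included,
/// preserving the original log order of the survivors.
fn compact(log: &[(Vec<u8>, Vec<u8>)]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut seen = HashSet::new();
    let mut kept: Vec<(Vec<u8>, Vec<u8>)> = log
        .iter()
        .rev() // newest first, so the first hit per key is the latest
        .filter(|(key, _)| seen.insert(key.clone()))
        .cloned()
        .collect();
    kept.reverse(); // restore log order
    kept
}
```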

Sources: src/storage_engine/data_store.rs:706-749


Estimating Compaction Savings

Preview how much space compaction would reclaim:

Example:

Calculation:

savings = file_size - sum(unique_entry_sizes)
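The formula above, directly in code (record headers are ignored; only payload bytes are counted in this sketch):

```rust
use std::collections::HashMap;

/// savings = file_size - sum(unique_entry_sizes):
/// every byte that is not part of a key's latest record is reclaimable.
fn estimate_compaction_savings(log: &[(Vec<u8>, Vec<u8>)]) -> usize {
    let file_size: usize = log.iter().map(|(_, p)| p.len()).sum();
    // Latest entry size per unique key (later records overwrite earlier ones).
    let mut latest: HashMap<&[u8], usize> = HashMap::new();
    for (key, payload) in log {
        latest.insert(key.as_slice(), payload.len());
    }
    file_size - latest.values().sum::<usize>()
}
```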

Sources: src/storage_engine/data_store.rs:605-616


File Size

Get the total size of the storage file:

Example:

Sources: src/storage_engine/data_store.rs:1179-1181


API Method Summary

Complete Method Reference Table

| Category | Method | Key Type | Pre-Hashed Variant |
|---|---|---|---|
| Write | `write(key, payload)` | `&[u8]` | `write_with_key_hash(hash, payload)` |
| | `write_stream(key, reader)` | `&[u8]` | `write_stream_with_key_hash(hash, reader)` |
| | `batch_write(entries)` | `&[&[u8]]` | `batch_write_with_key_hashes(hashes, allow_null)` |
| Read | `read(key)` | `&[u8]` | `read_with_key_hash(hash)` |
| | `batch_read(keys)` | `&[&[u8]]` | `batch_read_hashed_keys(hashes, keys?)` |
| | `read_last_entry()` | N/A | N/A |
| | `read_metadata(key)` | `&[u8]` | - |
| Existence | `exists(key)` | `&[u8]` | `exists_with_key_hash(hash)` |
| | `len()` | N/A | N/A |
| | `is_empty()` | N/A | N/A |
| Delete | `delete(key)` | `&[u8]` | - |
| | `batch_delete(keys)` | `&[&[u8]]` | `batch_delete_key_hashes(hashes)` |
| Manipulation | `rename(old, new)` | `&[u8]` | - |
| | `copy(key, target)` | `&[u8]` | - |
| | `transfer(key, target)` | `&[u8]` | - |
| Iteration | `iter_entries()` | N/A | N/A |
| | `par_iter_entries()` | N/A | N/A |
| Maintenance | `compact()` | N/A | N/A |
| | `estimate_compaction_savings()` | N/A | N/A |
| | `file_size()` | N/A | N/A |
| | `get_path()` | N/A | N/A |

Sources: src/storage_engine/data_store.rs:752-1182 src/storage_engine/traits.rs


Return Types and Error Handling

Return Value Pattern

Most methods return a `Result<T>`, where `T` is typically:

  • u64: Tail offset after write/delete operations
  • Option<EntryHandle>: Read operations (None if not found)
  • Vec<Option<EntryHandle>>: Batch read operations
  • bool: Existence checks
  • usize: Count operations
  • (): Maintenance operations

Error Conditions

Common error scenarios:

| Error | Condition |
|---|---|
| `std::io::Error` | File I/O failures, disk full |
| "Failed to acquire lock" | Lock poisoning (panic in another thread) |
| "Key not found" | Required key missing (`copy`, `rename`, `transfer`) |
| "Payload cannot be empty" | Empty payload in `write` |
| "NULL-byte payloads..." | Attempt to write NULL bytes directly |
| "Cannot rename key to itself" | `rename` with identical keys |
| "Cannot copy to same storage" | `copy` within the same file |
| "Hash collision" | Tag verification fails during batch write |

Sources: src/storage_engine/data_store.rs


Testing and Debugging Methods

Methods Available in Test/Debug Builds

The following methods are only available when compiled with test or debug_assertions:

Purpose:

  • get_mmap_arc_for_testing(): Access memory map for validation
  • arc_ptr(): Get raw pointer for zero-copy verification

Sources: src/storage_engine/data_store.rs:630-655


Usage Patterns

Pattern: Write and Read

Pattern: Batch Operations

Pattern: Streaming Large Data

Pattern: Iterate and Process

Pattern: Pre-Hashed Keys with Namespace

Sources: src/lib.rs:20-63 src/lib.rs:66-115
