Contents

Versioned Index

A non-blocking ordered versioned key-value index built on a copy-on-write B+ tree. Atomic multi-operation transactions and snapshot-isolated reads.

data-structuresconcurrencyindexing

Overview

OrderedVersionedIndex<K, V> is the core abstraction. It is built for systems where writes are serialized by design and reads are concurrent and frequent. The writer publishes writes atomically. Concurrent readers observe a complete, stable state with no locks or coordination.

It combines ordered reads, atomic multi-operation transactions, and snapshot isolation under a non-blocking reader-writer contract.

Architecture

The provided implementation is PersistentBPlusTree<K, V>, built on a copy-on-write B+ tree. Every write rebuilds only the affected root-to-leaf path, reusing untouched subtrees via structural sharing. Commit is a single atomic pointer publish.

copy-on-write B+ tree, insert key 85 R0 50 | 82 82 | 93 82 87 91 R1 82 | 93 shared 82 85 87 91 before (R0) after (R1) only the affected root-to-leaf path is copied · untouched subtrees are shared
Copy-on-write B+ tree. Inserting key 85 copies only the affected path. The left subtree is shared. Readers on R0 are unaffected.

Readers that captured a prior root continue on a stable, immutable view for the lifetime of their operation. No locks, no coordination. New reads after the pointer publish observe the updated state.

Multi-operation transactions track which nodes have already been copied into an exclusive set. Subsequent mutations to the same node mutate in-place rather than copying again, making a multi-write transaction cheaper than equivalent sequential single-operation writes.

Pluggable key storage

Tree logic is decoupled from key representation through a KeyStorage interface. Two implementations are provided:

  • ArrayKeyStorage. Keys in an Object[], ordered by a Comparator. General-purpose baseline.
  • PackedByteKeyStorage. All keys in a node packed into a single byte[] with an offset table. Reduces heap pointer indirection for byte-key workloads.

Testing

The implementation is validated through four tiers:

  • Stateful property-based tests. jqwik generates randomized action chains across 1000 runs with maxKeys sampled between 2 and 10, verified against a TreeMap oracle.
  • Seeded deterministic stress tests. 20,000 randomized operations with fixed seed for full reproducibility.
  • Structural invariant tests. After puts and removes, the tree is walked to assert key ordering, separator alignment, uniform leaf depth, and fill bounds.
  • Unit tests. Behavioral assertions across maxKeys 2–8 for both storage implementations.

Background

versioned-index was built as the indexing layer for Axis. Axis maintains an inverted index of user keys to internal revisions. Raft serializes writes via a single leader, making the system inherently SWMR. Rather than reaching for read-write locks, this index was built to match the workload directly: a copy-on-write structure where readers hold lock-free snapshots and the writer publishes atomically.