Project Nayuki


B-tree set

B-trees are a generalization of binary search trees that aggregates data in blocks. When the data structure is stored on disk and is accessed, B-trees perform much faster than plain BSTs. This is why the B-tree is a crucial piece of technology that underpins databases and file systems alike.

The logic behind B-trees is not simple; in terms of code length it comes at roughly 3× the amount of code compared to AVL trees (which is one of the easier types of balanced BSTs). I wrote this code to prove to myself that I understand the algorithms and can produce a bug-free implementation. The Python version of the code is the clearest and most concise, but the Java version was developed and debugged first and has static typing. The C++ version and Rust version are coded in an idiomatic way with respect to the languages.

This B-tree implementation is pretty standard and has no noteworthy unique features (it merely shows that I have my own working implementation). In the code, the degree is defined as the minimum number of children that a non-root internal node must have. Every non-root node must have between degree−1 to 2×degree−1 keys (inclusive). For every internal node, its number of children must be exactly its number of keys plus one. One thing to note is that I chose the one-pass variant of the insertion/deletion algorithms: When performing insertion, visiting a full node will split it immediately (instead of traversing down the tree, then promoting new keys up the tree) even if no value gets inserted. When performing deletion, visiting a minimum-size node will increase its size immediately (instead of traversing down the tree, then merging deficient nodes going upward) even if no value gets deleted.

Source code

Java (SE 7+)

The class implements the java.util.Set interface, so it can be used as a drop-in replacement of HashSet or TreeSet. The iterator is not fail-fast on concurrent modifications and doesn’t support element removal. The maximum set size is Integer.MAX_VALUE (i.e. 231 − 1) to respect the java.util.Collection interface, though the size limit of this data structure can be easily raised to Long.MAX_VALUE.

Python

This implementation has an interface that mimics the built-in set type, including core features like len(), in, add(), remove(), iter().

C++ (C++14 and above)

The class BTreeSet has an interface similar to std::set. Currently no iterator is implemented.

Rust

The struct BTreeSet is modeled after std::collections::HashSet. An iterator implementation is included.

License: MIT (open source)

More info