Project Nayuki

B-tree set

B-trees are a generalization of binary search trees that aggregates data in blocks. When the data structure is stored on disk and is accessed, B-trees perform much faster than plain BSTs. This is why the B-tree is a crucial piece of technology that underpins databases and file systems alike.

The logic behind B-trees is not simple; in terms of code length it comes at roughly 3× the amount of code compared to AVL trees (which is one of the easier types of balanced BSTs). I wrote this code to prove to myself that I understand the algorithms and can produce a bug-free implementation. The Python version of the code is the clearest and most concise, the Java version was developed and debugged first and has static typing, and the C++ version explicitly shows when nodes get deallocated.

This B-tree implementation is pretty standard and has no noteworthy unique features (it merely shows that I have my own working implementation.) One thing to note is that I chose the one-pass variant of the insertion/deletion algorithms: When performing insertion, visiting a full node will split it immediately (instead of traversing down the tree, then promoting new keys up the tree) even if no insertion happens. When performing deletion, visiting a minimum-size node will increase its size immediately (instead of traversing down the tree, then merging deficient nodes going upward) even if no deletion happens.

Source code

Java (SE 5+)

The class implements the java.util.Set interface, so it can be used as a drop-in replacement of HashSet or TreeSet. The iterator is not fail-fast on concurrent modifications and doesn’t support element removal. The maximum set size is Integer.MAX_VALUE (i.e. 231 − 1) to respect the java.util.Collection interface, though the size limit of this data structure can be easily raised to Long.MAX_VALUE.

Python (compatible with 2 and 3)

This implementation has an interface that mimics the built-in set type, including key features like len(), in, add(), remove(), iter().

C++ (C++11 and above)

The class BTreeSet has an interface similar to std::set. Currently no iterator is implemented.

License: MIT (open source)