Project Nayuki


B-tree set

B-trees are a generalization of binary search trees that aggregates data in blocks. When the data structure is stored on disk and is accessed, B-trees perform much faster than plain BSTs. This is why the B-tree is a crucial piece of technology that underpins databases and file systems alike.

The logic behind B-trees is not simple; in terms of code length it comes at roughly 3× the amount of code compared to AVL trees (which is one of the easier types of balanced BSTs). I wrote this code to prove to myself that I understand the algorithms and can produce a bug-free implementation. The Python version of the code is the clearest and most concise, the Java version was developed and debugged first and has static typing, and the C++ version explicitly shows when nodes get deallocated.

This B-tree implementation is pretty standard and has no noteworthy unique features (it merely shows that I have my own working implementation.) One thing to note is that I chose the one-pass variant of the insertion/deletion algorithms: When performing insertion, visiting a full node will split it immediately (instead of traversing down the tree, then promoting new keys up the tree) even if no insertion happens. When performing deletion, visiting a minimum-size node will increase its size immediately (instead of traversing down the tree, then merging deficient nodes going upward) even if no deletion happens.

Source code

Java

The class implements the java.util.Set interface, so it can be used as a drop-in replacement of HashSet or TreeSet. The iterator is not fail-fast on concurrent modifications and doesn’t support element removal. The maximum set size is Integer.MAX_VALUE (i.e. 231 − 1) to respect the java.util.Collection interface, though the size limit of this data structure can be easily raised to Long.MAX_VALUE.

Python

This implementation has an interface that mimics the built-in set type, including key features like len(), in, add(), remove(), iter(). Compatible with Python 2 and 3.

C++

The class BTreeSet has an interface similar to std::set. Currently no iterator is implemented. Requires C++11.

License: MIT (open source)