B-tree set
B-trees are a generalization of binary search trees that aggregates data in blocks. When the data structure is stored on disk and is accessed, B-trees perform much faster than plain BSTs. This is why the B-tree is a crucial piece of technology that underpins databases and file systems alike.
The logic behind B-trees is not simple; in terms of code length it comes at roughly 3× the amount of code compared to AVL trees (which is one of the easier types of balanced BSTs). I wrote this code to prove to myself that I understand the algorithms and can produce a bug-free implementation. The Python version of the code is the clearest and most concise, but the Java version was developed and debugged first and has static typing. The C++ version and Rust version are coded in an idiomatic way with respect to the languages.
This B-tree implementation is pretty standard and has no noteworthy unique features (it merely shows that I have my own working implementation). In the code, the degree is defined as the minimum number of children that a non-root internal node must have. Every non-root node must have between degree−1 to 2×degree−1 keys (inclusive). For every internal node, its number of children must be exactly its number of keys plus one. One thing to note is that I chose the one-pass variant of the insertion/
Source code
- Java (SE 7+)
-
- BTreeSet.java
- BTreeSetTest.java (JUnit)
The class implements the
java.util.Set
interface, so it can be used as a drop-in replacement ofHashSet
orTreeSet
. The iterator is not fail-fast on concurrent modifications and doesn’t support element removal. The maximum set size isInteger.MAX_VALUE
(i.e. 231 − 1) to respect thejava.util.Collection
interface, though the size limit of this data structure can be easily raised toLong.MAX_VALUE
. - Python
-
This implementation has an interface that mimics the built-in
set
type, including core features likelen()
,in
,add()
,remove()
,iter()
. - C++ (C++14 and above)
-
The class
BTreeSet
has an interface similar tostd::set
. Currently no iterator is implemented. - Rust
-
The struct
BTreeSet
is modeled afterstd::
. An iterator implementation is included.collections:: HashSet
License: MIT (open source)