Binary array set
Introduction
The binary array set is a data structure that allows adding elements in amortized \(O(1)\) time and testing set membership in worstcase \(O((\log n)^2)\) time. The set is orderless and the data structure does not support removing elements.
It’s useful for applications that need to add elements to a set and test for duplicates quickly but don’t need to remove items (though the entire set can be discarded). One application of this structure is to remember the set of nodes already visited in a breadthfirst search of a graph, which only grows as time goes on.
Also this data structure is spaceefficient, having only \(O(\log n)\) overhead for pointers, unlike balanced tree structures that use \(O(n)\) extra space for pointers. So this is a fast set data structure that occupies practically the same amount of space as a flat array. This is especially helpful in languages like Java, where the virtual machine gives each object (such as a tree node) a significant amount of storage overhead.
Internal details
How this data structure works is that it has a sequence of \(\lfloor \log_2 n \rfloor + 1\) slots, where the slot at index \(k\) (0based counting) is either null or an ascendingsorted array of length \(2^k\). For example, this is one way of how the abstract set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13} is represented as a binary array set:
Slot 0: [5] (length 1)
Slot 1: Null
Slot 2: [2, 3, 9, 13] (length 4)
Slot 3: [1, 4, 6, 7, 8, 10, 11, 12] (length 8)
Note that the size of the set is 13, and its binary representation is 1101_{2}. The binary representation corresponds with which slots are filled. Also note that the union of all the arrays in the binary array set is exactly equal to the set of elements being represented, no more, no less, no duplicates.
To test if a given value is an member of the set, we simply binary search in each array (in the nonnull slots) and return whether a result was found.
To add a new element, we wrap the element in a new array of length 1. If slot 0 is empty, we simply put down the new array. Otherwise we merge the new array with the array at slot 1, in ascending order, to form a new sorted array of length 2. We check if slot 1 is empty; we repeat this merge/check process on increasing slot indexes until we find an empty slot.
Source code
 Java

The class implements the
java.util.Set
interface, so it can be used as a dropin replacement ofHashSet
orTreeSet
as long asremove()
is not called. The iterator is not failfast on concurrent modifications. The maximum set size isInteger.MAX_VALUE
(i.e. \(2^{31}  1\)).  Python

This implementation only supports the methods {
add()
,clear()
}, builtin functions {len()
,iter()
}, and operatorin
– so it only supports a fraction of the methods that the builtinset
type has. Compatible with Python 2 and 3.  C++

The class is modeled after the Java version, supporting the methods
size()
,add()
,contains()
, andclear()
. It has far fewer features thanstd::set
. Requires C++11.
License: MIT (open source)
Notes
This data structure is described in Introduction to Algorithms 2nd and 3rd editions, chapter 17 “Amortized Analysis”, problem 172 “Making binary search dynamic” (page 473 in the 3rd edition).
I asked a question on Stack Overflow if there is a name for this data structure, but the result was negative. So “binary array set” is a name I made up. (The name can’t be used to find existing literature about the data structure.)
The behavior of this data structure is like a simplified version of binomial heaps with flat arrays instead of binary tree nodes.
The binary array set can be readily adapted to a map/dictionary if needed.