Project Nayuki

Compact hash map (Java)


When storing a large number of objects in memory, Java has poor space efficiency compared to lower level languages like C and C++. The reasons are many: Primitives like int must be boxed as objects when used in the collections framework, strings are stored in UTF-16, every object has a header and needs to be reached via a pointer, linked list and tree nodes cannot store the payload data inline but must hold a pointer to the object (whereas C++ templates can store data inline), etc.

Because of these internal details of Java and the libraries, using something like HashMap<String,Integer> can consume 4× the theoretical minimum storage requirement. This becomes a problem when millions of objects need to be stored in RAM and this consumption approaches the JVM or computer’s memory limit.

Here I provide a much more memory-efficient implementation of a hash table based map/dictionary abstract data structure. It has certain caveats and it takes more coding work to utilize this data structure, but it can provide huge wins in memory-constrained situations.

Source code

Compact hash map code:

There are a couple of basic ideas behind how the compact hash map works:

All of these design choices make the data structure more memory-efficient at the expense of consuming more CPU time and having more complex handling logic (more lines of code to verify and maintain).

As an extra, the compact hash map code can be reduced and specialized to implement a compact hash set:

License: MIT (open source)


The demo program was run on “regular” mode versus “compact” mode using the same system environment, waiting until program termination with an OutOfMemoryError. This was tested on Windows XP SP3 32-bit, Oracle Java 1.7.0_25, memory limit -mx300M. The data being stored is pairs of (random 10-character ASCII string, random signed 32-bit integer), which has a theoretical minimum size of 14 bytes.

We see that the compact implementation can store 2.4× the amount of data as the regular implementation while using the same amount of memory. Note that after the first few minutes, the benchmark runs slower and slower as the JVM garbage collector executes more and more frequently.

Quantity Regular hash map Compact hash map

Number of key-value pairs added to the map, an internal measurement reported on standard output.

2 990 755 7 399 340

Amount of memory used, as measured internally by the program, reported on standard output.

290.00 MiB 290.00 MiB
Mem usage

An external measurement reported by Windows Task Manager.

354.42 MiB 312.85 MiB
VM size

An external measurement reported by Windows Task Manager.

367.81 MiB 326.21 MiB
CPU time

As reported by Windows Task Manager. It gives you an idea of how long the benchmark takes. Over 80% of the time is spent on garbage collection anyway, so these numbers do not reflect the speed of the map implementations.

44 s 260 s


Exactly how does CompactHashMap use memory differently compared to the standard java.util.HashMap? Let’s look at the internal data structures and count the bytes in the data fields.

To make the example concrete, assume that we are storing 10-character ASCII strings as keys and 32-bit integers as values. Also assume that pointers are 4 bytes and each object has 4 bytes of overhead for a class pointer.

Compact hash map

Map entry

(String s, int k): Serialized into byte[] as {s in UTF-8, k in big endian} = 10 + 4 bytes = 14 bytes.
byte[]: 14-byte payload, int length, object header = 14 + 4 + 4 bytes = 22 bytes.


At the recommended load factor of 0.5, the hash table array length is about double the number of entries stored. Each array slot is a 4-byte pointer to the byte[] data. Thus an average of 8 bytes are used per entry due to the free space provisioning.


Storage per entry: 22 (byte[] object) + 8 (pointers in hash table) = 30 bytes.

Standard hash map

Map entry data

char[] (inside String): 10-char payload, int length, object header = 20 + 4 + 4 bytes = 28 bytes.
String (Java 1.7.0_06 or above): Pointer to char[], int hash, int hash32, header = 4 + 4 + 4 + 4 bytes = 16 bytes.
Integer: int value, header = 4 + 4 bytes = 8 bytes.


Pointer to key, pointer to value, int hash, pointer to next entry, header = 4 + 4 + 4 + 4 + 4 bytes = 20 bytes.


The hash table array length is approximately a multiple of the number of entries stored (for the default load factor of 0.75). Each array slot is a 4-byte pointer to a HashMap.Entry object. On average 5.33 bytes are used per entry due to the load factor.


Storage per entry: 28 (char[]) + 16 (String) + 8 (Integer) + 20 (HashMap.Entry) + 5 (pointers in hash table) = 77 bytes.