hamt_go

A Go implementation of the HAMT data structure.

A Hash Array Mapped Trie (HAMT) provides fast and memory-efficient access to large amounts of data held in memory. Values are stored by key. All HAMT keys are mapped into fixed length unsigned integers. In this implementation these are 64 bits long, Go uint64s. The value may be of any type but typically is a pointer to a data structure of interest.

The HAMT trie is essentially a prefix trie where the nodes in the tree are either tables or leaf entries.

In a simpler implementation at each level there is either a leaf node or a table with up to 2^w slots, where w is a small integer. The next w bits of the hash derived from the key are construed as an index into that table. The table grows dynamically to its maximum number of slots as entries are added. Whenever there is a collision at a given level, a pointer to a new table replaces the current leaf entry, and both new and old entries move to the child table, possibly recursing in the process.

The current implementation has been tested with values of 3, 4, 5, and 6 for w. Preliminary performance tests indicate that w=6 is optimal. That is, a table with 64 slots gives the best performance.

An enhancement introduces a fixed size table at the root. This is characterized by another small integer t: the root has 2^t entries and the table at the root is indexed by the first t bits of the key’s hashcode. Performance tests show that for optimal performance the root table should approach the total number of entries in size.

A further enhancement would allow dynamic resizing of the root table. This has not yet been implemented.

Whereas a normal hash table would be quite large and might periodically require expensive resizing, the HAMT data structure is roughly as fast as a hash table, but starts small and consumes more memory only as needed.

Limitations

Project Status

The code works and is reasonably well-tested. Insertand Find operations, while not yet thoroughly optimized, take approximately 0.5 to 0.6 microseconds each on a lightly-loaded server (about 1.1us each to insert a million values and verify that the value can be found using the key, where the root table has 2^18 slots). As the root table approaches the number of entries in size, this falls to about 1.0 us, or 500ns/op.

These figures were obtained from single-threaded tests. With some degree of concurrency, the code runs at about 370 ns/op.

References

Bagwell, “Ideal Hash Trees” (2001 PDF)

Wikipeida, “Hash array mapped trie”

Wikipedia, “SWAR”

Licensing

Creative Commons License
The material on this github.io website is licensed under a Creative Commons Attribution 4.0 International License.

Project software is licensed under an MIT license. Follow the SOFTWARE LICENSE link below for more information on project software licensing.


github link to project project