aden

/

hashes

Hashing

An exploration of different hashing types.

Overview

This project was created in my ADEN class to learn about different hashing algorithms.

All hashing algorithms use some kind of hashing function to convert an object into a key for a map. We explored means of implementing hash functions, with a focus on what to do when hash collisions occur (when two objects create the same key).

Project Planning

We created four different hashes:

  1. Linear Probing
  2. Quadratic Probing
  3. Linked List
  4. Cuckoo Hash

Linear Probing

When this type of hash has a collision, the hash searches sequentially until it finds a key that it can place an object into in the map.

Once the object has been run through the hash function and converted to a value, like 6, we check the map.

If the value at key 6 already is not empty, then we check the value at key 7, and so on, until we find an open spot.

Quadratic Probing

Quadratic probing is exactly like linear probing, except instead of increasing the index sequentially if there is a collision, you increase by a squared value. This attempts to reduce clustering, wherein keys with values are often clustered together in a map, because of the nature of linear probing.

Linked List

This type of hash does not have collisions. Instead, every value that maps to a key contains a linked list. When an object that is run through the hash already exists in the map, you add that object's value to the map's value's linked list. This makes managing the hash much simpler.

Cuckoo Hash

This final hash type works like a cuckoo's nest.

You maintain two separate hash tables at any given time. When there is a collision in the first table, you run the object through a different hash function to put the object in the second table. If there is another collision in the second table, then you evict the value at the second table, put your most recent value into the second table, and try to place the evicted value in the first table using the first hash function.

This often leads to the exchange of many items between the two tables, simply when you attempt to put a single item in the table.

Results

A deep-dive analysis on hashes.

I gained a much better understanding of how hashes are implemented – I had never considered that collisions would be a problem, because I had assumed the hash function always produced a unique hash for every input. Coding the cuckoo hash was challenging, because I had a few recursion problems that made debugging difficult. I wonder what the standard hashing algorithm is, and if there is one that can deal with any type of input, not just strings or integers.