Hash Tables — animations that will make you understand how they work

Junmin Lee
8 min readNov 10, 2021

--

Hello, today we’re going to talk about things like how hash tables work, and about hash functions, collisions etc. After knowing how hash tables work, you will be able to understand why hash tables are so fast for searching keys.

The “not so smart” way to search

There is an array, and we’ve got a bunch of names stored in it. Let’s say you want to search for the key “Randy”.

You decide to go through the whole array one by one until you find your match. The array above is pretty simple, but in reality, arrays are are larger than this, and you would be like…

Using brute force to find Randy will make you cry like this little emoji

Hash Tables

Using brute force to find Randy, is going to be pretty inefficient. But, let’s say that there’s a way to find out which index Randy is in, and surprisingly we can find that out just with the name “Randy” itself. That would be amazing right?

This is the main idea of a hash table. The thing that can turn the word “Randy” into the index number of where Randy is stored is something we call a hash function.

We input Randy into the hash function, it spits out 82. Then we go to index 82, and we can find Randy right away. This is possible because when we were storing Randy in the array, we used the same hash function.

So when storing data, we put Randy in a hash function, which will give us 82, and we store Randy in location 82. Later on, when we try to search Randy, we put Randy in the hash function, get the hash code which would be 82 and then go to that index 82 to find Randy.

A data structure that has data stored in this way is called a hash table. You can see that hash functions play a pretty important role in these hash tables. Let’s look into hash functions a bit more.

Hash Functions

I will give you an example of how a hash function works. In the real world, people would use more complicated and sophisticated algorithms, but here, I’m will show a very simple and easy way to turn Randy into 82.

So.. here it is.

Each letter of the word RANDY will be converted to its ASCII code. That just means we will turn each letter into a number. Then sum them all up, divide it with 100 and get the remainder. Why 100? Because that is the size of the array we want to store the name into. In that case, the resulting hash code will always be something between 0 and 99.

That goes for other names too.

And when we try to find a word, Let’s say we want to find STAN, we would put STAN into the hash function, get the hash code, and easily search it in the hash table. Sounds perfect! This seems like the best data structure ever, right? But there is something we need to talk about, when we’re dealing with hash tables. And that would be something called collisions.

Collisions

In the hash function example I just gave, there might be a chance that two names can have the same hash code. Let’s say we want to store the two names RANDY and ERIC in an array of size 7. If we put them in the hash function, they will be giving out the same hash code.

So if we had stored RANDY in the array, and then try to store ERIC, the location of index number is already taken by RANDY, so we can’t store ERIC. Handling this kind of situation so that we can store both RANDY and ERIC in the array, is called collision handling. We will look into two ways of collision handling: open addressing and separate chaining. First let’s see what open addressing is.

Open Addressing

If the spot for RANDY is already taken, we just store ERIC in the next index. When we later on try to search ERIC, we first go to it’s original position [4] where the hash functions tells us to go, and if it’s not there, then we’ll look at the next index.

Although we need to add an extra step when searching, this would still be faster than searching all the indices one by one from the beginning. However, with open addressing, as the array gets filled more and more, the names are getting further and further away from where it should be. We will start to lose the benefits of hash tables. So, this would lead me to explaining the second collision handling method, which is called separate chaining, and it is very effective.

Separate Chaining

A simple and easy way to explain separate chaining would be : storing multiple names in one array.

How? By using linked lists.

Instead of storing the names in the indices of the array, each index will hold a pointer that points to the head of a linked list that has a list of names. Linked lists will be able to grow and shrink dynamically, so we don’t have much limitation towards the size of the data we want to store. Depending of the situation it doesn’t always have to be a linked list, it can be any other form of a list, but in this posting I will be explaining with a linked list.

Using the combination of an array and a linked list makes separate chaining a very fast and effective method for hash tables. So far, we talked about two ways of handling collisions : open addressing and separate chaining. In this tutorial we will look into the details of inserting, deleting and searching keys in a hash table using separate chaining.

In the continuing section of this post, we will call the names ‘keys’ because hash tables are usually used for storing key and value pairs. For example, the name Randy can be associated to a value that holds data. So if we can search the key, we will be able to access the data that we’re looking for.

Inserting, Searching and Deleting Keys

Let’s look at how we can insert a key into the hash table. We will first get the hash code by putting the key in a hash function, then go to that certain index in the hash table array, and insert it right there. But because we are doing separate chaining, we will have to store it in the linked list which we will be calling ‘buckets’.

Let’s first store RANDY in the hash table.

Next we will add ERIC who shares the same hash code with ERIC. Like we do in a linked list insertion, we will create a new node for ERIC, then make the ERIC node the head, and then let it point to RANDY.

When we are searching for the key, we will first find the bucket that holds the key by putting the key in the hash function. This will narrow down the search. Then we will search the nodes in the bucket one by one until we find the key.

If we want to delete a key, we go to the index, look for the key in the bucket, and then we disconnect it from the bucket.

We do that by letting the previous node skip the node that we’re going to delete and make it point to the next node. Whenever we are inserting, searching and deleting keys in a hash table, we are taking advantage of the benefits of an array, where we can immediately access an index, and that let’s us insert, search and delete keys faster compared to other data structures.

Time Complexity

Let’s talk about how fast it can be by looking into the time complexity of hash tables. The time complexity of hash tables for inserting, deleting and searching is expected to be constant, in an average case. Comparing that to some other data structures, that’s pretty good. You can think of hash tables with separate chaining as a combination of the benefits of arrays and linked lists. The fact that array let us get instant random access to the data, enables us to search, insert and delete data very fast.

But like always, there is a worst case scenario.

If all the key’s collide with each other, the hash table would become more like a linked list, and it would take more time to insert search and delete, the time complexity will become O(n).

Summary

  • We can search stuff really quick because a hash function will tell us where the key is.
  • Several keys can have the same hash code and cause a collision but we can handle that with open addressing or separate chaining.
  • Separate chaining lets us store multiple keys in one index by using lists.
  • Inserting, searching and deleting keys can be done in constant time (in average) however, in the worst case it can be O(n)

--

--

Junmin Lee
Junmin Lee

Written by Junmin Lee

I am a tech enthusiast, workaholic, grad student, academic tutor, cat & dog lover, constant learner, coffee addict and an idealist. Yes, both cats && dogs.

No responses yet