The Yellow Brick Roads of Python & Data Structures: A Hash Map Story
You ever wonder how all the data we have computers can so easily pull up information from seemingly thin air? Do we just take it for granted that computers are so smart, and can memories a list of our friends and their addresses as if they understand the relationship between those two data points.
Well… depending on what side of the argument you’re on -you’re either happy or not that they don’t, because we have to trick them.
Hash-maps are a relative easy, and beautiful concept to comprehend. Hash-maps use arrays to fill in information. Typically our array is a list of other arrays, and inside each array is a node of information.
[[key, value], [key, value], [key, value]…]
Here we just need to know what index we need to find Fulano’s address.
We could even used linked-lists in our array to make it hold more information going even deeper.
[ [[key, value], [key, value], [key, value]], [[key, value], [key, value], [key, value]] …[,,]…]
Here we’d need to know what index to search for, and then iterate through the list to find Fulano.
Personally I prefer the link-list option, because sometimes an index-number will be assigned to more than one key.
In the traditional array we’ll have to check if that index spot is empty. If it isn’t then we’ll move down to the next index, and see if that index is empty. This could go on for a while, and mess up other keys down the line. This is known as collisions. Using a linked-list will simply add to it, and allows for keys to share index numbers. This is known as separate chaining. Learn more here.
But how do we know which index to look at? Sometimes our array’s can be vast, and iterating through them every time we want to find Fulano is extremely inefficient… as much as Fulano would like to stay obscure.
We’ll hash it! Hashing is just performing a mathematical operation on the key. Hashing performs the encode() function on our key information -Fulano! The encoding will be the same every time so it’s a reliable constant. We’ll then take the sum of our encoding, and that’s our hash code!
Now this can produce a large number, and easily surpass the number of indices in our array.
When creating a hash-map our array size is set from the start, and cannot be changed. So to make sure we get a number that’ll work we’ll need to compress this hash code by using the modulo operator by the size of the array.
Here’s how it would look:
These two functions working together will reliably assign the same index number to our key.
Do note that from the onset when creating our HashMap class all we need is set the size of our array, and from there we’ll create a linked-list array.
From here all we need is the ability to assign our information to an index, and the ability to retrieve it.
To assign we’ll make an assign() function that takes in the key, and value to be put away. We’ll use our handy-dandy hash() and compress() functions to figure out an index to assign it. Than load our [key, vaule] into a Node class.
I will also want to check if our key is already in our array. If that’s the case I’ll just update the value since our key is already in there. If not than I’ll use the insert() function from in my LinkedList class to insert the new Node.
Retrieving it would be working backwards from this, and just iterating through the assigned index. If the key isn’t found we’d return None.
This is how we trick programs into “memorizing” huge amounts of data, and have the ability to recall better than any human mind could.
Let’s give it a test run. I’ll instantiate a HashMap named ‘locations’, than loop through an address book to assign them to locations, and see if I can’t find Fulano.
Better move quick. Who knows how long he’ll be there.