Ian's DBC SF Firecrackers Blog

Arrays vs Hashes

One question that frequently arises among new programmers is what's the real difference between arrays and hashes. Why are hashes used when their functionality could easily be mimicked using standard arrays?

First of all, let's review what an array is. When you declare an array, the application executing the code assigns a location in memory where that array begins. Then the index of each element of that array serves as an offset (i.e. the first element has offset 0 since it is in the slot at the start of the array, then the second element has to go in the next slot in memory and has offset 1, the third element in the next with offset 2, and so on).

So what's a hash? From a code standpoint, it's a lot like a two arrays in parallel where one array has the payload data while the other has the lookup keys. So if you want to store, say a value of 10 as a height of something, you could add (or modify) a pair where the key is "height" and the value is 10. From this sense, you might say that it's like a pseudo object about halfway between an array and an object.

What isn't immediately obvious is why the extra structure is necessary. Obviously arrays have their advantages, particularly if you know what payload data belongs where (ex. height is in index 0, width is in index 1, etc). Therefore you save all that extra storage space that would be taken up by keys. However this is only true in situations where the list of data is relatively consistent among instances of arrays. Say you wanted to store the critical details of polygons and polyhedrons. Most do have a meaningful height, but with varying numbers of sides and vertices. It is conceivable that you could create a uniform layout of an array that would store all the relevant data for a finite number of these shapes, but the more you want to capture, the larger such a map of an array gets, so each instance of the data structure would contain numerous slots that store nothing but nil, making it less efficient storagewise.

Speed is another factor to keep mindful of. For very small payload data, arrays may be faster to lookup, but the longer an array gets, the longer it takes to check if it contains an element that, say, is equal to 10 or "height" and this grows linearly, which compounds the problem if lots of different arrays are used. Hashes offer a solution that significantly reduces the growth of runtime with data table size. It will take some time to explain what exactly this does behind the scenes, but it uses a hash function (hence the name) to segment the data into smaller sections or "buckets" of memory that make it easier to search through and perform other functions like add/delete dynamically.

Therefore storing data in arrays or hashes will depend on the application of those data structures. If you need something small and simple, an array may be the way to go. If you need something larger with a complex or dynamic library of information to store, hashes might be the best bet.