[Solved] Distinction between a data structure’s members being stored by hash value and by index [closed]


Arrays are allocated as single, large blocks of memory and entries are accessed by their indexes. The order of entries is fixed and they need have no particular identity apart from their position in the array.

Other, more complex data structures (hash tables, sets, dictionaries, …) allow one to store objects identified and accessed using some sort of key. Let’s call these “keyed collections”. Some objects have a natural key, e.g. “SocialSecurityNumber”, but what should one do if a key is needed and there are no obvious candidate field(s) in the data object?

Hashing is a technique which sets out to derive a “fairly unique identity” to associate with an object. Think of it as mapping (arbitrary) data to numbers.

  • Although there are some “standard hashing techniques”, this is still a field that is evolving – involving some interesting mathematics.
  • Hashes have purposes including secure hashing (to detect and prevent deliberate tampering with data), error detection and – in this case – keyed (or hashed) data access.
  • A non-secure hash algorithm should be as fast as possible, BUT optimising for speed can involve a trade-off against the “fairly unique” part of the mapping requirement (secure hashing, by contrast, is unavoidably, and sometimes deliberately, slower and more expensive).
  • Hashing cannot (ever) guarantee that a given hash value is unique to an object, so attention has to be given to minimising the occurrence of “collisions” and optimising how to deal with them when they occur. This is a difficult subject on its own when you consider that data has to be treated as “arbitrary”: it may appear random, contain sequences/patterns, and/or include duplicates.
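To make the speed versus “fairly unique” trade-off a bit more concrete, here is a minimal sketch comparing two non-secure hashes. The function names (additive_hash, fnv1a_32) and the bucket count are my own choices for illustration; FNV-1a is just one well-known example of a fast non-secure hash, not something the question prescribes.

```python
def additive_hash(data: bytes) -> int:
    """Very cheap hash: the sum of the byte values, truncated to 32 bits.
    It ignores byte order, so anagrams always collide."""
    return sum(data) & 0xFFFFFFFF


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the state, then multiply by a prime."""
    h = 0x811C9DC5                      # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, kept to 32 bits
    return h


# The cheap hash cannot tell these two apart.
print(additive_hash(b"listen") == additive_hash(b"silent"))  # True

print(hex(fnv1a_32(b"a")))  # 0xe40c292c

# Even a good hash must collide once there are more keys than slots
# (pigeonhole principle): 9 keys cannot occupy 9 distinct slots out of 8.
BUCKETS = 8
keys = [f"key{i}".encode() for i in range(BUCKETS + 1)]
indexes = [fnv1a_32(k) % BUCKETS for k in keys]
print(len(set(indexes)) < len(keys))  # True
```

The last part is the point of the fourth bullet: however good the function is, squeezing its output into a limited number of slots guarantees that collisions will eventually occur.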

With that said, assuming we have a “good” hash function, we can – in principle at least – store arbitrary objects in keyed collections.
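As a rough illustration of what such a keyed collection does internally, here is a toy hash table that resolves collisions by “separate chaining” (each slot holds a small list of entries). The class name ChainedHashTable and its methods are hypothetical, and real implementations (Python’s own dict, for instance) use different and far more refined strategies, including resizing, which this sketch leaves out.

```python
class ChainedHashTable:
    """Toy keyed collection: hash the key, pick a bucket, search that bucket."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash value is reduced to a bucket index; distinct keys may share one.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)    # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        # With many collisions this loop gets long and lookups slow down.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return
        raise KeyError(key)


table = ChainedHashTable(bucket_count=8)
table.put(1, "one")
table.put(9, "nine")    # in CPython hash(9) % 8 == 1, so this collides with key 1
print(table.get(1), table.get(9))  # one nine
```

Both entries end up in the same bucket, and the equality check on the key is what tells them apart. That is why a usable key needs both a hash and a notion of equality.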

Important considerations

  1. Arrays offer extremely fast sequential and random access (by index), while insert, delete and growth operations are slow.
  2. Keyed collections have the advantage you quote of offering extremely fast inserts and deletes, but they are granular in nature and introduce complexities such as memory fragmentation (memory management is an overhead, added complexity means added cost).
  3. Performance of a hashed collection degrades rapidly when collisions start occurring, because each access then has to search through the colliding entries.
  4. There is no such thing as a free lunch and calculating hashes is relatively expensive (compared to simply using an index value or stored key).
  5. There is a specific downside to hashes that “natural keys” and indexes do not have, which is that they do not offer a natural ordering/sequence. (Processing objects sequentially according to their hash values is tantamount to processing them randomly.)
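Points 1 and 5 can be sketched in a few lines. The helper knuth_hash (a Knuth-style multiplicative hash for 32-bit integers) and the sample data are assumptions of mine, chosen only because the arithmetic is easy to follow.

```python
def knuth_hash(key: int) -> int:
    """Multiplicative hash: multiply by a large odd constant, keep the low 32 bits."""
    return (key * 2654435761) & 0xFFFFFFFF


# Point 1: access by index is direct, but inserting at the front of an
# array-like list shifts every existing entry along by one position.
records = ["alice", "bob", "carol"]
records.insert(0, "zed")
print(records.index("alice"))  # 1 (it was 0 before the insert)

# Point 5: visiting keys in hash order has nothing to do with their natural order.
ids = [1, 2, 3, 4, 5]
by_hash = sorted(ids, key=knuth_hash)
print(by_hash)  # [5, 2, 4, 1, 3]
```

If the objects need to be processed in sequence, you either sort by the natural key on demand or keep a separate ordered structure alongside the hashed one.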

It is always important to choose data structures appropriate to their intended use (but that’s what the link you quote is all about;-)
