Content
A hash table, or hash map, is a data structure h
containing <key, value> pairs. In absence of key conflicts, given a key k
, h[k] can be retrieved in time O(1). On average, hash tables are more efficient than search trees for locating values.
Hash Table Operations
- search: O(1) when there is no key conflict
- insert: O(1) amortized for some hash functions
- delete: O(1) amortized for some hash functions
Hash Function
Hash function maps a key directly to its address in the hash table.
Requirements of hash functions:
- Uniform distribution, to reduce collision possibility
- Avoid clustering to improve lookup efficiency
Frequently used hashing functions:
- modular hashing: h(k) k % m, for some m. m is usually the number of buckets
- multiplicative hashing: multiply k by a very large number: h(k) = floor(m * frac(k*a)), where k is the integer key, a is a real number, frac() gets the fractional part of a real number, and floor returns the next smallest integer. This is more prone to clustering.
- Cyclic redundancy checks (CRC): Suitable for longer stream of data, h(k) = CRC(stream-of-data-so-far)
- Cryptographic hash functions: make it computationally infeasible to get key from hash.
- Pre-compiled hash, if keys are available beforehand.