A hash table is a generalization of the simpler notion of an ordinary array.
Direct addressing is a simple technique that works well when the universe U of the keys (all possible values of k) is reasonably small.
Operations on direct-address tables
we use a direct-address table, denoted by T[0..m−1], in which each position, or slot, corresponds to a key in the universe U .
- DIRECT ADDRESS SEARCH(T,k) return T [k]
- DIRECT ADDRESS INSERT(T,x) T [key[x]] ← x
- DIRECT ADDRESS DELETE(T,x) T[key[x]] ← NIL
Hash tables
With hashing, the element is stored in slot h(k),
i.e., we use a hash function h to compute the slot for the element using key k, where h maps the universe U of keys into the slots of a hash table T [0. . m − 1].
- We say that an element with key k hashes to slot h(k).
- We also say that h(k) is the hash value of key k.
Notice that with direct addressing, an element with key k is stored in slot k, which is a very special hash table.
The drawback of any hash tables
The drawback of any hash tables is the collision when two different keys are mapped to the same slot.
One effective way to resolve collisions is called chaining, which works as follows: Put all elements that hash to the same slot in a linked list.
Operations on hash tables
The directory operations on a hash table T are easy to implement when collisions are resolved by chaining.
- CHAINED HASH SEARCH(T,k)
search for an element with key k in the linked list T [h(k)] - CHAINED HASH INSERT(T,x)
insert x at the head of the linked list T[h(key[x])] - CHAINED HASH DELETE(T,x)
delete x from the linked list T [h(key[x])]
Analysis of simple uniform hashing with chaining
A simple uniform hashing assumes that any given element is equally likely to hash into any of the m slots, independently of where any other element has hashed to.
The average behavior of hashing under the simple uniform hashing assumption is much better, which takes Θ(1 + α) time.
Let the hash table contain m slots. For j = 0,...,m−1, denote by nj the length of the linked list T[j], so that n=n0+n1+...+n m−1, and the average value of nj is E[nj] = α = n/m.
the relationship between the load factor α and the time of searching/deletion of an element:Θ(1 + α)
Case 1: Unsuccessful search for key k:
The linked list T ( j) for hash value h(k) (= j) has to traversed. The expected length of
T(j) is E[nj] = α = n/m.
Case 2: Successful search for key k: Let ki = key[xi]. For keys ki and kj, denote by
Xij = I{h(ki) = h(kj)} a random variable. Pr{h(ki) = h(kj)} = 1/m. Thus, E[Xij] = 1/m.
The amount of time spent on this linked list (to identify k(i)) is proportional to the number of keys before it in the linked list, while the probability of a key k(j) in front of key k(i) in the linked list with j > i is 1/m.
The average time complexity of successful search for key ki thus is