arrayIndex = hugeNumber % arraySize

An array into which data is inserted using a hash function is called a hash table. A collision occurs when two keys map to the same index. Solutions to collision:

- Open Addressing
- Separate Chaining

** Open Addressing ** - when a data item cannot be placed at the index
calculated by the hash function, another location in the array is sought.

- Linear Probing
- Quadratic Probing
- Double Hashing

In ** Linear Probing ** we search sequentially for vacant cells. As
more items are inserted in the array, clusters of filled cells grow larger.
Clustering is not a problem when the array is half full, and still not bad
when it is two-thirds full. Beyond this, however, the performance degrades
seriously as the clusters grow larger and larger. The performance is
determined by the ** Load Factor. ** The Load Factor is the ratio of the
number of items in a table to the table's size.

loadFactor = nItems / arraySize
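As a minimal sketch of linear probing, assuming integer keys, `key % arraySize` as the hash function, and `None` as the marker for an empty cell (the names `linear_probe_insert` and `load_factor` are illustrative, not from the text):

```python
def linear_probe_insert(table, key):
    """Insert key into table and return the index where it landed."""
    size = len(table)
    index = key % size                 # initial index from the hash function
    for _ in range(size):              # at most one probe per cell
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % size     # move to the next cell, wrapping around
    raise RuntimeError("hash table is full")


def load_factor(table):
    """Ratio of stored items to table size."""
    n_items = sum(cell is not None for cell in table)
    return n_items / len(table)


table = [None] * 7
placed = [linear_probe_insert(table, k) for k in (7, 14, 21, 1)]
print(placed)               # [0, 1, 2, 3]
print(load_factor(table))   # 4 items in 7 cells, about 0.57
```

Keys 7, 14 and 21 all hash to index 0 and form a cluster in cells 0 to 2. Key 1 hashes to index 1, finds the cluster in its way, and ends up in cell 3, which makes the cluster longer still.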

If ** x ** is the position in the array where the collision occurs, in
** Quadratic Probing ** the cells probed are x + 1, x + 4, x + 9, x + 16,
and so on, wrapping around the end of the array. The problem with
Quadratic Probing is that it gives rise to secondary clustering: every
key that collides at x follows the same probe sequence.
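The probe positions can be listed with a few lines of Python. This is only a sketch: the function name `quadratic_probe_sequence` and the array size 23 are assumptions chosen for illustration.

```python
def quadratic_probe_sequence(x, array_size, n_probes):
    """Cells examined after a collision at index x: x + 1, x + 4, x + 9, ..."""
    return [(x + i * i) % array_size for i in range(1, n_probes + 1)]


print(quadratic_probe_sequence(3, 23, 5))   # [4, 7, 12, 19, 5]
```

The sequence depends only on x, so any two keys that collide at the same cell probe exactly the same cells afterwards.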

** Double Hashing ** or rehashing: Hash the key a second time, using
a different hash function, and use the result as the step size. For a given
key the step size remains constant throughout a probe, but it is different
for different keys. The secondary hash function must not be the same as
the primary hash function and it must not output 0 (zero).

stepSize = constant - ( key % constant )

The constant is a prime number and smaller than the array size. Double hashing requires that the size of the hash table is a prime number. A prime array size has no factor in common with any step size smaller than itself, so the probe sequence will eventually check every cell. To see what goes wrong otherwise, suppose the array size is 15 ( indices from 0 to 14 ) and that a particular key hashes to an initial index of 0 and a step size of 5. For example, consider hashing the following sequence of numbers: 15, 30, 45, 60, 75, 90, 105. Then the probe sequence will be 0, 5, 10, 0, 5, 10, and so on, repeating endlessly.

If the array size was 13 and the numbers were [13, 26, 39, 42, 65, 78, 91] then, with a constant of 5, the step sizes would be [2, 4, 1, 3, 5, 2, 4]. Even if several keys shared the same step size, say 5, the probe sequence starting from index 0 would be [0, 5, 10, 2, 7, 12, 4, 9, 1, 6, 11, 3, 8], which visits every cell before repeating. If there is even one empty cell, the probe will find it.
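The two cases can be checked with a short sketch. It assumes a constant of 5 and `key % arraySize` as the primary hash; the function names `step_size` and `probe_sequence` are made up for this example.

```python
def step_size(key, constant=5):
    """Secondary hash: always between 1 and constant, never 0."""
    return constant - (key % constant)


def probe_sequence(key, array_size, constant=5):
    """Cells visited before the probe comes back to its starting cell."""
    start = key % array_size
    step = step_size(key, constant)
    cells = [start]
    index = (start + step) % array_size
    while index != start:
        cells.append(index)
        index = (index + step) % array_size
    return cells


print(probe_sequence(15, 15))   # [0, 5, 10]
print(probe_sequence(65, 13))   # [0, 5, 10, 2, 7, 12, 4, 9, 1, 6, 11, 3, 8]
print([step_size(k) for k in (13, 26, 39, 42, 65, 78, 91)])   # [2, 4, 1, 3, 5, 2, 4]
```

With an array size of 15 the probe only ever sees three cells, while with the prime size 13 the same step size reaches all thirteen.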

In ** Separate Chaining ** a data item's key is hashed to the index in
the usual way, and the item is inserted into the linked list at that index.
Other items that hash to the same index are simply added to the linked
list. In separate chaining it is normal to put N or more items into an
N-cell array. Finding the initial cell takes fast O(1) time, but searching
through a list takes time proportional to the number of items m on the
list, O(m). In separate chaining the load factor can rise above 1 without
hurting performance very much. It is not important to make the table size
a prime number.
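A small sketch of separate chaining follows. It assumes integer keys and uses a Python list for each chain as a stand-in for a linked list; the class name `ChainedHashTable` and its methods are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, array_size):
        # One (initially empty) chain per cell of the array.
        self.chains = [[] for _ in range(array_size)]
        self.n_items = 0

    def insert(self, key):
        # Hash to a cell, then add the key to that cell's chain.
        self.chains[key % len(self.chains)].append(key)
        self.n_items += 1

    def find(self, key):
        # O(1) to reach the cell, then a scan of that one chain.
        return key in self.chains[key % len(self.chains)]

    def load_factor(self):
        return self.n_items / len(self.chains)


table = ChainedHashTable(4)
for k in (3, 7, 11, 8, 12, 5):
    table.insert(k)

print(table.chains)          # [[8, 12], [5], [], [3, 7, 11]]
print(table.load_factor())   # 1.5
print(table.find(11), table.find(4))   # True False
```

Six items fit in a four-cell array here, so the load factor is above 1, and a lookup still only scans the one chain the key hashes to.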

** Buckets: ** Another approach similar to separate chaining is to use
an array at each location in the hash table instead of a linked list.
Such arrays are called buckets. This approach is not as efficient as the
linked list approach, however, because of the problem of choosing the
size of the buckets. If they are too small they may overflow, and if
they are too large they waste memory.
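The overflow problem can be illustrated with a sketch in which each cell holds a fixed-size array. The class name `BucketHashTable` and the choice to report overflow by returning `False` are assumptions made for this example.

```python
class BucketHashTable:
    def __init__(self, array_size, bucket_size):
        self.bucket_size = bucket_size
        # Every cell gets its own fixed-size bucket; None marks a free slot.
        self.buckets = [[None] * bucket_size for _ in range(array_size)]

    def insert(self, key):
        """Return True if the key was stored, False if its bucket is full."""
        bucket = self.buckets[key % len(self.buckets)]
        for slot in range(self.bucket_size):
            if bucket[slot] is None:
                bucket[slot] = key
                return True
        return False   # bucket overflow


table = BucketHashTable(array_size=5, bucket_size=2)
results = [table.insert(k) for k in (5, 10, 15)]
print(results)   # [True, True, False]
```

Keys 5, 10 and 15 all hash to cell 0, so the third one overflows a bucket of size 2, while the other four buckets sit empty and their slots are wasted.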

** Hash Functions: ** A good hash function is simple so that it can
be computed quickly. A perfect hash function maps every key into a
different table location. Use a prime number as the array size.

** Hashing Strings: ** We can convert short strings to key numbers
by multiplying digit codes by powers of a constant. The three-letter
word * ace * (with a = 1, c = 3, e = 5) could turn into a number by calculating

key = 1 * 26^{2} + 3 * 26^{1} + 5 * 26^{0}

This approach has the desirable attribute of involving all the characters in the input string. The calculated key value can then be hashed into an array index in the usual way:

index = key % arraySize

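    def hashFunc1(key, arraySize):
        # key is a string of lowercase letters a-z
        hashVal = 0
        pow26 = 1                               # 26^0, then 26^1, 26^2, ...
        for j in range(len(key) - 1, -1, -1):   # from the last letter to the first
            letter = ord(key[j]) - 96           # ord('a') is 97, so a -> 1, b -> 2, ...
            hashVal += pow26 * letter
            pow26 *= 26
        return hashVal % arraySize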

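The ** hashFunc1() ** cannot handle long strings in a language with
fixed-size integers because the hashVal exceeds the size of ** int. **
(Python integers do not overflow, but hashVal still grows very large.)
Notice that the index always ends up being less than the array size,
because of the final modulo operation. Horner's method rewrites the sum
for * ace * as ( ( 1 * 26 ) + 3 ) * 26 + 5, so we can apply the
modulo (%) operator at each step in the calculation. This gives the
same result as applying the modulo operator once at the end, but avoids
the overflow.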

    def hashFunc2(key, arraySize):
        hashVal = 0
        for j in range(len(key)):               # from the first letter to the last
            letter = ord(key[j]) - 96           # a -> 1, b -> 2, ...
            hashVal = (hashVal * 26 + letter) % arraySize   # reduce at every step
        return hashVal
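The claim that both approaches give the same index can be checked directly. The sketch below restates the two calculations under the hypothetical names `hash_full` and `hash_horner`, assumes lowercase a-z keys, and uses an arbitrary array size of 101.

```python
def hash_full(key, array_size):
    """Build the whole base-26 number first, then take one modulo."""
    hash_val = 0
    pow26 = 1
    for ch in reversed(key):
        hash_val += pow26 * (ord(ch) - 96)
        pow26 *= 26
    return hash_val % array_size


def hash_horner(key, array_size):
    """Horner's method with a modulo at every step."""
    hash_val = 0
    for ch in key:
        hash_val = (hash_val * 26 + (ord(ch) - 96)) % array_size
    return hash_val


print(hash_full("ace", 101), hash_horner("ace", 101))   # 52 52

# The two functions agree on longer words as well.
for word in ("hash", "collision", "probing"):
    assert hash_full(word, 101) == hash_horner(word, 101)
```

For * ace * the full key is 759, and 759 % 101 is 52. In the Horner version the running value never exceeds 26 * arraySize, which is what keeps it inside a fixed-size integer in languages that have one.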