Exercises B

Note

You must complete these exercises by Wednesday of W8.

Exercise 4.2.1 (Hash of Long and Double)

Here is the formula used by Java to calculate a hash function on doubles (bits is a 64 bit array represented as a long):

return (int) bits ^ (bits >>> 32);
  • Why not just use (int) bits (casting from long to int)? Hint: The reference book suggests that a good hash function should use all bits for its calculation. Why?

  • A double in Java is represented in 64 bits as \((-1)^s \times m \times 2^{(e - 1023)}\). The first bit \(s\) is the sign, the next 11 bits represent the exponent in binary form and the last 52 bits represent the mantissa (decimal part) in binary form. Do a positive decimal number and its opposite get different hash functions?

Exercise 4.2.2 (Hash and casting of integers)

  • Is the hash function of a 32-bit integer the same as the hash function of the same integer that would be double-cast?

  • Is the hash function of a 32-bit integer the same as the hash function of the same integer that would be cast as a long? Hint: Long.toBinaryString( Double.doubleToRawLongBits(a)) displays the array of bits used to represent a double.

Exercise 4.2.3 (Hash of String: the choice of M and R constants)

The hash function for a given string as presented in the book p460 is as follows:

int hash = 0;
for (int i = 0; i < s.length(); i++)
    hash = (R * hash + s.charAt(i)) % M;

In the book’s implementation, \(M\) (the size of the hash table) is a power of two. The suggested value for \(R\) is a small prime such that 31 so that the bits of all characters play a role.

  • Suppose that \(R\) is a multiple of \(M\). What would happen in the calculation?

  • Suppose that \(R\) is an even number. What would happen?

In both cases, how many entries in the string will actually determine the hash code? What are the risks in terms of collision? Can load factor control solve the problem? Explain why using 31 is a good choice for array sizes that are powers of two? Would it also be a good choice for an array size that starts at 31 and is multiplied by two each time it needs to resize?

  • In the book implementation, \(M\) (the size of the hash table) is a power of two, initialized to 16. Suppose that at some point \(M\) is \(2^8=256\). Then two integer keys are added to a hash table implemented with separate chaining: respectively \(2560\) and \(3072\) (it is assumed that these additions do not cause any resizing of the table). As you know, the hash code of an integer key (int) is the number itself. Will adding these two values cause a collision between them in the table? If so why?

    If so can you suggest a third value that will also collide?

    If there is a collision, can it disappear the next time the table is resized as in the book implementation?

  • What do you suggest to avoid this problem? What is the \(M\) initialization and resizing policy used in java.util.HashMap? Does this solve the problem on our example?

Exercise 4.2.4 (Design of Hash function for Vehicules)

  • What would you suggest as a hash function for identifying vehicles that are strings of numbers and letters of the form: « 9X9XX99X9XX999999 » where a 9 represents a number and an « X » represents a letter from A to Z.

  • Does your hash function have the property that for a hypothetical array size N of \(10^{11} \cdot 26^6\) there is never a collision?

Exercise 4.2.5 (Design of Hash function: Citizens)

Let’s imagine that we want to build a directory of Belgian citizens and we want to be able to access each citizen by his identity card number (12 digits). We can then consider this number as the unique key identifying each citizen and use this key as an index in an array in Java. To each index would correspond a reference to an instance of the class class whose fields constitute the information that we want to store for each citizen. What is the time complexity of the following operations?

  • search for the information relative to a citizen from his identity card number. identity card number.

  • add a new citizen.

Isn’t this implementation of a dictionary even better than a hash table? Can we have a collision problem in this case? Justify.

Exercise 4.2.6 Rabin-Karp, the return of revenge

Check that you have obtained a solution in \(\mathcal{O}(n)\), and not \(\mathcal{O}(kn)\) (not counting the initial hashing of the keywords to be searched, which is in \(\mathcal{O}(km)\)), and not counting the initial hashing of the keywords to be searched which is in \(\mathcal{O}(km)\)), for last week’s Exercise 4.1.11.

Exercise 4.2.7 (Inginious)

Implement a Linear Probing Hashtable

Exercise 4.2.8 (Inginious)

This is the question 21 of the advent of code 2022. It can be efficently solved using a hashtables in combination with a linked tree data-structure.