CISC4080, Computer Algorithms,
Lab 4

     


 

Goal:

In this lab, we will:

  1. Practice using the C++ STL hash table containers.
  2. Design a basic hash function, and measure its collisions and load factor.
  3. Practice using a universal hash function (i.e., a randomized hash function) and measure its performance.

(In-class) Preparation

  1. Copy the code over to your directory, then compile and study it until you fully understand everything in it:
    	cp ~zhang/public_html/cs4080/Demo/hashex1.cpp . 
    	cp ~zhang/public_html/cs4080/Demo/hashex2.cpp . 
    	cp ~zhang/public_html/cs4080/Demo/hashex3.cpp . 
    	cp ~zhang/public_html/cs4080/Demo/hashex4.cpp . 
    	
  2. (Demo in hashex1.cpp) C++ STL unordered_set is a class template that implements a set container (i.e., it stores a set of unique elements) using a hash table data structure (in order to achieve near-constant average search, insert, and delete time). It's called "unordered" because the elements are not kept in any particular order, so there is not necessarily an ordering relation defined on the element type.
    //from hashex1.cpp...
    int main(){
       //To declare a set of elements of a type T:
       //unordered_set<T> setOfTypeT; 
    
    
        //a set of strings 
        unordered_set<string> words; 
        string word;
    
        //enter 5 words 
        //What if you enter duplicate words?
        for (int i=0;i<5;i++)
        {
    	cout <<"Enter a word:";
    	cin >> word;
    	words.insert (word);
        }
    
        //iterating through all words 
        for (string s: words)
        	cout <<s<<endl;
    
    
        ...
    }
    
  3. (Demo in hashex2.cpp) C++ STL unordered_map is an associative container, i.e., a container class that stores (key, value) pairs (all pairs have unique keys), using a hash table data structure (in order to achieve near-constant average search and insert time). It's called "unordered" because the pairs are not kept in any particular key order, so there is not necessarily an ordering relation defined on the key type.
  4. (Demo in hashex3.cpp) Define your own hash function and comparison function. A short example of using a customized hash function for the string class.
  5. (Demo in hashex4.cpp) Define your own class and hash function.

    This demo shows how to specify a user-defined hash function and comparator.

(Homework lab4) Driving Application

You are asked to write an application with the following functionalities:

  1. Prompt the user to input the number of coordinate points to generate. Each coordinate point consists of an x- and a y-coordinate value (each ranging from 0 to 100).
  2. Randomly generate the given number of points, and store the set of points in an unordered_set.
  3. Display the set of points by displaying a star character at each point. For example, if the set contains five coordinate points, (0,0),(1,1),(2,2),(1,3),(0,4), then the following will be shown:
    *   *
     * *
      *
    
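    Note that here we assume the top-left corner is the origin, (0,0), and that each point is read as (row, column). For each point in our set, we display a star in its corresponding position on screen (column numbers run along the top, row numbers down the right):
    0123456789...
    *   *         0
     * *          1
      *           2
                  3
                  ...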
    
    Here is the pseudocode for displaying a set of points:
    for row=0 to MAX_ROW
    
       for col=0 to MAX_COL
          if (row,col) is in the set of points //A quick lookup helps!  
             cout <<"*"; //display a star
          else 
              cout <<" "; //display a space 
    
       cout << endl; //end the line, move to next row
    
    

Detailed Requirement

  1. Implement a Point class, which should have at least two data members, x and y; you can also consider adding another data member to describe what color or character to display the point with. You should implement the following functions for the class:
    • Constructor
    • Output operator (<<)
    • Comparison operator (==) (overload this operator as a friend function of the class. Don't overload it as member function!).
    • A few getter and setter methods
    You can get a sample implementation by running the following command:
    	cp ~zhang/public_html/cs4080/Demo/Point.cpp . 
    		
  2. Implement a hash function for the Point class as a function object: compute the sum of x and y, then call the default hash function on int to map that sum to a hash value. Test the function before continuing on to the next step. The sample code gives two different hash functions as examples:
    struct StudentHasherByName
    {
      // overload operator() to implement a hash function on Student objects
      //  can be called as follows: StudentHasherByName() (Student (11,"Mary"))
      //  the first () constructs a temporary hasher object; the second (...) calls the operator() defined here
      // demonstrated in main() 
      size_t operator()(const Student & obj) const
      {
        return hash<string>()(obj.getName());
               //Note: hash is a class template defined in the C++ STL in which operator() is overloaded 
               //Here we call operator() of the hash class, passing obj.getName() as parameter.
               // basically, this uses the system-provided hash function to hash a string into a size_t
      }
    };
    
    struct StudentHasherByID
    {
      static int a;  //configurable parameter (a static member must also be defined once outside the struct)
      // overload operator(), so that a StudentHasherByID object can be called as a function
      size_t operator()(const Student & obj) const
      {
        return hash<int>()(a*obj.getId());
               //Note: hash is a class template defined in the C++ STL in which operator() is overloaded 
               //Here we call operator() of the hash class, passing a*obj.getId() as parameter.
               // basically, this uses the system-provided hash function to hash an int into a size_t
      }
    };
    
    
  3. Declare an unordered_set of Point objects (using the above hash function). To store objects of a user-defined class, such as our Point class, in an unordered_set, we need to provide our own hash function (and sometimes a comparison function too). You can follow the example code:
    	//as in the demo code 
    	unordered_set<Student, StudentHasher> set1;
    	unordered_set<Student, StudentHasher, StudentComparator > set2;
    	//The only difference in the above two sets is how the hashtable decides
    	// whether any two given Student objects are same or not:
    	// 1. set1 uses the == operator in Student class
    	// 2. set2 uses the function object StudentComparator 
    	

    Note: Formally, unordered_set is declared as follows, meaning that it takes up to four types as its template parameters!

    template < class Key,                        // unordered_set::key_type/value_type
               class Hash = hash<Key>,           // unordered_set::hasher
               class Pred = equal_to<Key>,       // unordered_set::key_equal
               class Alloc = allocator<Key>      // unordered_set::allocator_type
               > class unordered_set;
    		
    Here is a quick summary of the first three template parameters of unordered_set:
    • The first parameter specifies the type of elements to be stored in the hash table.
    • The second parameter is a unary function object type that specifies how to hash the element type to type size_t (an unsigned integer type).
    • The third parameter is a binary (taking two parameters) predicate (i.e., it returns a bool value): it returns true if the two elements are the same, and false if they are different. (Note that unordered_set, as a set, does not store duplicate elements.)
  4. Randomly generate 30 Points (where the x and y values are randomly generated, and are between 0 and 40), and insert these Point objects into the set.
  5. Display the set of points using star characters in the terminal window.

Extra Credits

In the above setting, given that the set of Points is randomly generated, ALMOST ANY HASH FUNCTION WILL WORK WELL, in the sense that the elements are likely to be spread evenly over the slots in the hash table.

What if the set of Points is not randomly generated, but instead pre-defined and fixed? In this case, a given hash function could potentially hash many elements to the same slot (leading to long lookup times). In such a case, we can use randomization to get good average performance. More specifically, we can

  1. Declare a function object class with two static int members, a and b.
  2. Overload operator() in the class to implement a hash function on Point objects. The hash function should first compute a linear combination of x and y (using a and b as coefficients), and then call the default hash function for int to map it to a hash value, i.e.,
    		return hash<int>()(a*x+b*y); //calling default hash function on int to map ax+by
    		
Please refer to StudentHasherByID in the sample code to see how we can define a hash function that can be configured with parameters. To test the above hash function,
	Declare a set of Points, specifying the above hash function as the second template parameter 
	Call rehash() on the set to set the bucket count to at least the number of points in your file
	  // this guarantees that no rehash occurs during insertion (given that the default max load factor is 1)
	
	for (int i=0;i<20;i++)
	      Randomly generate values for a, b in the hash function object 
	      Read the set of points from the file, and insert these points into a set of Points (which uses the above hash function).
	      Display the bucket count and the size of each bucket, and calculate the longest bucket length 

	 Display the average, maximum, and minimum of the longest bucket length over the above 20 runs 
	

Submission

Please submit each source file: the class specification file (header file), the implementation file (.cpp file), and the driver program, using the script given. For example,

submit4080 LAB4 point.cpp
submit4080 LAB4 point.h
submit4080 LAB4 main.cpp