Goal:
In this lab, we will:
- Practice using the C++ STL hash table containers.
- Design a basic hash function, and measure its collisions and load factor.
- Practice using a universal hash function (i.e., a randomized hash function) and measure its performance.
(In-class) Preparation
- Copy the code over to your directory, then compile and study the code until you fully understand everything in it:
cp ~zhang/public_html/cs4080/Demo/hashex1.cpp .
cp ~zhang/public_html/cs4080/Demo/hashex2.cpp .
cp ~zhang/public_html/cs4080/Demo/hashex3.cpp .
cp ~zhang/public_html/cs4080/Demo/hashex4.cpp .
- (Demo in hashex1.cpp) C++ STL unordered_set is a class template that implements a set container (i.e., it stores a set of
unique elements), using a hash table data structure (in order to achieve near-constant average
search, insert, and delete time). It's called "unordered" because the elements are not kept in any particular order, and the element type need not be ordered, i.e., there is not necessarily an ordering relation defined on the element type.
//from hashex1.cpp...
int main(){
    //To declare a set of elements of a type T:
    //unordered_set<T> setOfTypeT;
    //a set of strings
    unordered_set<string> words;
    string word;
    //enter 5 words
    //What if you enter duplicate words?
    for (int i=0;i<5;i++)
    {
        cout <<"Enter a word:";
        cin >> word;
        words.insert (word);
    }
    //iterating through all words
    for (string s: words)
        cout <<s<<
    ...
}
- (Demo in hashex2.cpp) C++ STL unordered_map is an associative container,
i.e., a container class storing (key,value) pairs (all pairs have unique keys), using a hash table data structure (in order to achieve near-constant average
search and insert time). It's called "unordered" because the pairs are not kept in any particular order, and the key type need not be ordered, i.e.,
there is not necessarily an ordering relation defined on the key type.
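As a rough illustration of the (key,value) idea, here is a minimal sketch that counts word occurrences with an unordered_map. It is not taken from hashex2.cpp; the function name countWords is a hypothetical helper chosen for this example.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from hashex2.cpp): count how often each word occurs.
// Keys are the words, values are their counts.
std::unordered_map<std::string, int> countWords(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words)
        ++counts[w];   // operator[] inserts (w, 0) first if w is not yet a key
    return counts;
}
```

Because keys are unique, inserting the same word twice does not create a second pair; it only updates the value stored under that key.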
- (Demo in hashex3.cpp) Define your own hash function and comparison function.
A short example of using a customized hash function for the string class.
- (Demo in hashex4.cpp) Define your own class and hash function.
A demo of specifying a user-defined hash function and comparator.
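To give a feel for what a customized hash function and comparator for strings can look like, here is a small sketch of a case-insensitive set of words. It is an assumption-based example, not the content of hashex3.cpp; the names toLower, CaseInsensitiveHash, CaseInsensitiveEqual, and WordSet are all hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical helper: return a lower-case copy of s.
std::string toLower(const std::string& s)
{
    std::string r = s;
    for (char& c : r)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return r;
}

// Hash function object: hash the lower-cased string with the default string hash.
struct CaseInsensitiveHash {
    std::size_t operator()(const std::string& s) const
    { return std::hash<std::string>()(toLower(s)); }
};

// Comparator function object: two strings are "the same" if they match ignoring case.
struct CaseInsensitiveEqual {
    bool operator()(const std::string& a, const std::string& b) const
    { return toLower(a) == toLower(b); }
};

using WordSet = std::unordered_set<std::string, CaseInsensitiveHash, CaseInsensitiveEqual>;
```

The hash function and the comparator have to agree: any two strings that the comparator treats as equal must also produce the same hash value, which is why both go through toLower here.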
(Homework lab4) Driving Application
You are asked to write an application with the following functionalities:
- Prompt the user to input the number of coordinate points to generate. Each coordinate point consists of an x- and a y-coordinate value (each ranging from 0 to 100).
- Randomly generate the given number of points, and store the set of points in an unordered_set.
- Display the set of points by displaying a star character at each point.
For example, if the set contains five coordinate points, (0,0),(1,1),(2,2),(1,3),(0,4), then the following will be shown:
*   *
 * *
  *
Note that here we assume the top-left corner is the origin, (0,0), and that each point is read as (row, column), as in the pseudocode below. For each point in our set, we display a
star in its corresponding position on screen:
0123456789...
*   *        0
 * *         1
  *          2
             3
...
Here is the pseudocode for displaying a set of points:
for row=0 to MAX_ROW
    for col=0 to MAX_COL
        if (row,col) is in the set of points //A quick lookup helps!
            cout <<"*"; //display a star
        else
            cout <<" "; //display a space
    cout << endl; //end the line, move to next row
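The pseudocode above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: Cell is a hypothetical stand-in for the Point class, CellHash uses an arbitrary combination of row and column, and render builds a string instead of printing so that the result is easy to check.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Cell { int row; int col; };   // hypothetical stand-in for Point

struct CellHash {                    // arbitrary choice: combine row and col into one int
    std::size_t operator()(const Cell& c) const
    { return std::hash<int>()(c.row * 1000 + c.col); }
};

struct CellEqual {
    bool operator()(const Cell& a, const Cell& b) const
    { return a.row == b.row && a.col == b.col; }
};

using CellSet = std::unordered_set<Cell, CellHash, CellEqual>;

// Build the picture row by row: '*' where (row,col) is in the set, ' ' elsewhere.
std::string render(const CellSet& cells, int maxRow, int maxCol)
{
    std::string out;
    for (int row = 0; row <= maxRow; ++row) {
        for (int col = 0; col <= maxCol; ++col)
            out += cells.count(Cell{row, col}) ? '*' : ' ';   // expected O(1) lookup
        out += '\n';                                           // end the line
    }
    return out;
}
```

Printing the picture is then a matter of writing the returned string to cout. The lookup inside the inner loop is where the hash table pays off: each membership test takes expected constant time, instead of a scan over all points.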
Detailed Requirement
- Implement a Point class, which should have at least two data members, x and y. You can also consider adding another data member to describe what color or character to display the point with. You should implement the following functions for the class:
- Constructor
- Output operator (<<)
- Comparison operator (==) (overload this operator as a friend function of the class. Don't overload it as a member function!).
- A few getter and setter methods
You can get a sample implementation by running the following command:
cp ~zhang/public_html/cs4080/Demo/Point.cpp .
- Implement a hash function for the Point class as a function object, in which the sum of x and y is calculated and then the default hash function on int is called to map it to a hash value. Test the function before continuing on to the next step.
In the sample code given, two different hash functions are provided as examples:
struct StudentHasherByName
{
    // overload operator () to implement a hash function on Student objects
    // can be called as follows: StudentHasherByName() (Student (11,"Mary"))
    // the second pair of parentheses refers to the operator () that is defined here in StudentHasherByName
    // demonstrated in main()
    size_t operator()(const Student & obj) const
    {
        return hash<string>()(obj.getName());
        //Note: hash is a class template defined in the C++ STL in which operator() is overloaded
        //Here we call operator() of the hash class, passing obj.getName() as parameter.
        // basically, this uses the system-provided hash function to hash a string into a size_t value
    }
};
struct StudentHasherByID
{
    static int a; //a configurable parameter
    // overload operator (), so that StudentHasherByID can be called as a function
    size_t operator()(const Student & obj) const
    {
        return hash<int>()(a*obj.getId());
        //Note: hash is a class template defined in the C++ STL in which operator() is overloaded
        //Here we call operator() of the hash class, passing a*obj.getId() as parameter.
        // basically, this uses the system-provided hash function to hash an int into a size_t value
    }
};
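Following the same pattern, a hash function object for Point based on the sum of x and y might look like the sketch below. The minimal Point class and the accessor names getX and getY are assumptions made for this example; the sample Point.cpp may use different names.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical minimal Point; the accessor names are assumptions.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    int getX() const { return x_; }
    int getY() const { return y_; }
    // operator== overloaded as a friend function, as the lab requires
    friend bool operator==(const Point& a, const Point& b)
    { return a.x_ == b.x_ && a.y_ == b.y_; }
private:
    int x_;
    int y_;
};

// Function object: hash a Point by applying the default int hash to x + y.
struct PointHasherBySum {
    std::size_t operator()(const Point& p) const
    { return std::hash<int>()(p.getX() + p.getY()); }
};
```

One property worth testing: all points with the same sum, such as (1,3), (3,1), and (2,2), get the same hash value with this function, so they always land in the same bucket.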
- Declare an unordered_set of Point objects (using the above hash function).
To use unordered_set to store objects of a user-defined class, such as our Point class, we need to
provide our own hash function (and sometimes a comparison function too). You can follow the example code:
//as in the demo code
unordered_set<Student, StudentHasher> set1;
unordered_set<Student, StudentHasher, StudentComparator > set2;
//The only difference in the above two sets is how the hashtable decides
// whether any two given Student objects are same or not:
// 1. set1 uses the == operator in Student class
// 2. set2 uses the function object StudentComparator
Note: Formally, unordered_set is defined as follows, meaning that it takes up to four types as its template parameters!
template < class Key,                  // unordered_set::key_type/value_type
           class Hash = hash<Key>,     // unordered_set::hasher
           class Pred = equal_to<Key>, // unordered_set::key_equal
           class Alloc = allocator<Key> // unordered_set::allocator_type
         > class unordered_set;
Here is a quick summary of the first three template parameters of unordered_set:
- The first parameter specifies the type of the elements to be stored in the hash table.
- The second parameter is a unary function object type that
specifies how to hash the element type to type size_t (an unsigned integer type).
- The third parameter is a binary (taking two parameters) predicate (i.e., it returns a bool
value): it returns true if the two elements are the same, and false if they are different. (Note that unordered_set (as a set) does not store duplicate elements.)
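To see the second and third template parameters working together, consider the sketch below. Pixel is a hypothetical type that deliberately has no operator==, so the third parameter must be supplied; PixelHash, PixelEqual, and addPixel are likewise names invented for this example.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

struct Pixel { int x; int y; };   // hypothetical type without operator==

struct PixelHash {                // second template parameter: unary, returns size_t
    std::size_t operator()(const Pixel& p) const
    { return std::hash<int>()(p.x + p.y); }
};

struct PixelEqual {               // third template parameter: binary predicate
    bool operator()(const Pixel& a, const Pixel& b) const
    { return a.x == b.x && a.y == b.y; }
};

using PixelSet = std::unordered_set<Pixel, PixelHash, PixelEqual>;

// Returns true if p was newly inserted, false if an equal element was already stored.
bool addPixel(PixelSet& s, const Pixel& p)
{
    return s.insert(p).second;    // insert returns a pair (iterator, inserted?)
}
```

Note that (1,2) and (2,1) hash to the same value here but are still stored as two elements: the hash only selects the bucket, while the predicate decides whether two elements are the same.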
- Randomly generate 30 Points (where the x and y values are randomly generated and
are between 0 and 40), and insert these Point objects into the set.
- Display the set of points using star characters in the terminal window.
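One possible way to generate the random points is sketched below using the <random> header. This is an illustration under assumptions: std::pair<int,int> stands in for the Point class, the names XY, XYHash, XYSet, and randomPoints are hypothetical, and the function keeps drawing until the set holds n distinct points, which is a design choice rather than something the lab prescribes.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <unordered_set>
#include <utility>

using XY = std::pair<int, int>;   // hypothetical stand-in for Point: (x, y)

struct XYHash {
    std::size_t operator()(const XY& p) const
    { return std::hash<int>()(p.first + p.second); }
};

using XYSet = std::unordered_set<XY, XYHash>;   // std::pair already provides operator==

// Generate n distinct points with coordinates in [0, maxCoord].
// Assumes n <= (maxCoord + 1) * (maxCoord + 1); otherwise the loop never ends.
XYSet randomPoints(std::size_t n, int maxCoord, unsigned seed)
{
    std::mt19937 gen(seed);                               // seeded pseudo-random generator
    std::uniform_int_distribution<int> dist(0, maxCoord); // inclusive range
    XYSet pts;
    while (pts.size() < n) {
        int x = dist(gen);
        int y = dist(gen);
        pts.insert(XY(x, y));                             // duplicates are silently ignored
    }
    return pts;
}
```

With a fixed seed the same points come out on every run, which helps while debugging; a seed from std::random_device could be used for different points each time.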
Extra Credits
In the above setting, given that the set of Points is randomly generated, ANY REASONABLE HASH FUNCTION WILL WORK WELL, in the sense that the
elements are about equally likely to be hashed to each slot in the hash table.
What if the set of Points is not randomly generated, but instead predefined and fixed? In this case, a given hash function could
potentially hash many elements to the same slot (leading to long lookup times). In such a case, we can use randomization to get good
average performance. More specifically, we can
- Declare a function object type with two static int members, a and b.
- Overload operator() in it to implement a hash function on Point objects.
The hash function should first compute a linear combination of x and y (using a, b as
coefficients), and then call the default hash function for int to map it to a hash value, i.e.,
return hash<int>()(a*x+b*y); //calling default hash function on int to map ax+by
Please refer to StudentHasherByID in the sample code to see how to define a hash function that can be configured with parameters.
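A minimal sketch of such a configurable hash function object is shown below. Pt is a hypothetical stand-in for the Point class, and the names LinearPointHasher and randomizeCoefficients, as well as the coefficient range 1 to 1000, are assumptions made for this example.

```cpp
#include <cstddef>
#include <functional>
#include <random>

struct Pt { int x; int y; };      // hypothetical stand-in for Point

struct LinearPointHasher {
    static int a;                 // configurable coefficients, shared by all instances
    static int b;
    std::size_t operator()(const Pt& p) const
    { return std::hash<int>()(a * p.x + b * p.y); }
};

// Static data members need a definition outside the struct (before C++17 inline variables).
int LinearPointHasher::a = 1;
int LinearPointHasher::b = 1;

// Pick new random coefficients; the range [1, 1000] is an arbitrary choice.
void randomizeCoefficients(std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(1, 1000);
    LinearPointHasher::a = dist(gen);
    LinearPointHasher::b = dist(gen);
}
```

The coefficients should only be changed while the set using this hasher is empty. Elements already stored were placed according to the old a and b, so changing the coefficients afterwards would make lookups search the wrong buckets.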
To test the above hash function:
Declare a set of Points, specifying the above hash function as the second template parameter
Call rehash() on the set to set the bucket count to at least the number of points in your file
    // this guarantees that there is no automatic rehash during insertion (given that the default max load factor is 1)
for (int i=0;i<20;i++)
    Randomly generate values for a, b in the hash function object
        // the set should be empty at this point: elements inserted under the old a, b would be in the wrong buckets
    Read the set of points from the file, and insert these points into the set of Points (which uses the above hash function).
    Display the bucket count and the size of each bucket, and calculate the longest bucket length
Display the average, maximum, and minimum of the longest bucket length over the above 20 runs
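The bucket measurements in the steps above rely on the bucket interface of unordered_set: bucket_count() and bucket_size(i). A small sketch of computing the longest bucket length follows; the names longestBucket and ConstantHash are hypothetical, and ConstantHash is a deliberately bad hash function included only to show the worst case.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>

// Length of the longest bucket (collision chain) in any unordered container.
template <class Set>
typename Set::size_type longestBucket(const Set& s)
{
    typename Set::size_type longest = 0;
    for (typename Set::size_type i = 0; i < s.bucket_count(); ++i)
        longest = std::max(longest, s.bucket_size(i));   // number of elements in bucket i
    return longest;
}

// Deliberately bad hash function: every key gets the same hash value.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};
```

With ConstantHash every element ends up in one bucket, so the longest bucket length equals the number of elements. The load factor, available through load_factor(), is size() divided by bucket_count(), so it stays the same no matter how unevenly the elements are spread; the longest bucket length is what exposes a poor hash function.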
Submission
Please submit each source file, i.e., the class specification file (header file), the implementation file (.cpp file), and the driver program, using the script given. For example,
submit4080 LAB4 point.cpp
submit4080 LAB4 point.h
submit4080 LAB4 main.cpp