CS2852 Exercise: HashSet-based Dictionary

In this in-class exercise, you will modify your program that you developed in Lab 4 to use ONLY the HashSet class in place of the ArrayList, LinkedList, and SortedArrayList collections you originally used in that lab to store both the dictionary and the document.

HashSet is very similar to HashMap, in that it uses a hashtable internally to store key/value pairs. However, unlike HashMap, which requires separate Key/Value pairs, HashSet requires only a single Value which also doubles as the Key.

With HashMap, a Value does not necessarily have to be related to its Key. This way, duplicate Values can be stored provided they have different Keys (although in practice this is not done). Remember, a Map can never contain duplicate keys.

With a HashSet, the Value's Key is the Value itself - this means that duplicate Values cannot be stored in a HashSet, since duplicate Keys are prohibited.

HashSet is somewhat easier to use than HashMap for this reason. In JCF, HashSet is a Collection (whereas HashMap is a Map). Thus, HashSet has all the familiar methods of the other Collection classes (e.g. ArrayList) with which you are familiar. It is easy to substitute a HashSet where another Collection was previously used - provided you don't need to store duplicate values in the collection. A notable difference with HashMap is that the add() method, which always returns true for ArrayList and LinkedList, returns false if you try to add a duplicate value.

In this exercise, you need to modify your Dictionary class to count the number of words actually added to the dictionary as well as the number of duplicate words NOT added.

Internally, HashSet uses a hashtable whose initial table size and load factor can be specified. In this exercise, you will modify your previous Lab 4 application to use HashSets whose initial size is 16. You must modify your program to run with 3 different load factors for the HashSet: 0.05, 0.75, and 0.999.

As in Lab 4, run your program for each test and have it report the time taken to load the dictionary and perform the spell check of the document against the dictionary.

In the end, your program's output should look something like this:

|  Dictionary |       Words |  Duplicates |  Document |      Collection |                 Time to add |         Time to spell-check |
|   words.txt |      nnnnnn |      xxxxxx | kjv10.txt |     HashSet .75 |   n mins  n.nnnnnnnnnn secs |   n mins  n.nnnnnnnnnn secs |
|   kjv10.txt |      mmmmmm |      yyyyyy | words.txt |     HashSet .75 |   m mins  m.mmmmmmmmmm secs |   m mins  m.mmmmmmmmmm secs |
|   words.txt |      nnnnnn |      xxxxxx | kjv10.txt |     HashSet .999|   n mins  n.nnnnnnnnnn secs |   n mins  n.nnnnnnnnnn secs |
|   kjv10.txt |      mmmmmm |      yyyyyy | words.txt |     HashSet .999|   m mins  m.mmmmmmmmmm secs |   m mins  m.mmmmmmmmmm secs |
|   words.txt |      nnnnnn |      xxxxxx | kjv10.txt |     HashSet .05 |   n mins  n.nnnnnnnnnn secs |   n mins  n.nnnnnnnnnn secs |
|   kjv10.txt |      mmmmmm |      yyyyyy | words.txt |     HashSet .05 |   m mins  m.mmmmmmmmmm secs |   m mins  m.mmmmmmmmmm secs |

When you are finished, submit your files to Blackboard (HashSet Dictionary Exercise).