CS313E Assignment 9: Jumble Step 3 (10 points)

Due: Wednesday, November 28, 2012, by 11:59pm. Note that I will be out of town from 11/27 to 11/29, so I suggest doing this one early if you think you might need help.

Your program listing should have the following information.

#  Files: Wordlist.py, HashSolver.py, README
#
#  Description:
#
#  Student's Name:
#
#  Student's UT EID:
#
#  Course Name: CS 313E 
#
#  Date Created:
#
#  Date Last Modified:

The Assignment

You can do this assignment with one other student. Submit only one version, but be sure to indicate on your submission which students participated.

This is the third in a series of assignments aimed at solving the Jumbles that you see in the newspaper or online (see the Jumble website). The idea is this: given a series of scrambled words, unscramble them.

In the earlier versions, you needed to try all permutations of the input string to see if any were in the Wordlist. You can avoid permutations entirely by realizing the following: a solution is any word in the Wordlist that has exactly the same letters as the input (including duplicates).

That means that you could solve the problem by examining each word in the Wordlist to see if it contains exactly the same letters. That would be a relatively slow linear search, but you'd only have to do it once for each jumble.
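As an illustration of that linear search, here is a minimal sketch. It assumes the Wordlist is available as a plain Python list of lowercase words; the function name find_perm_linear and the sample words are hypothetical. Every word may be examined, so the cost grows with the size of the Wordlist.

```python
def find_perm_linear(words, jumble):
    """Linear search (hypothetical helper): return (word, comparisons) for
    the first word with exactly the same letters as jumble, or
    (False, comparisons) if no word matches."""
    target = sorted(jumble.lower())
    comparisons = 0
    for word in words:
        comparisons += 1
        # Same letters, including duplicates, means equal sorted lists.
        if sorted(word) == target:
            return word, comparisons
    return False, comparisons


words = ["apple", "doozy", "plume", "bitter"]  # hypothetical sample data
print(find_perm_linear(words, "ulpem"))  # ('plume', 3)
```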

However, you can do even better with a bit of pre-processing of the Wordlist. Assuming you have a hash function that is invariant under permutation (i.e., all permutations of a string hash to the same value), you can store your Wordlist in a hash table. Then your input string should hash to the same bucket as the solution word. You just need to find it in there. To do that, you search the list of words in the bucket for one that has the same letters. If you find one, you're done. If not, there is no solution to the Jumble.
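To show what "invariant under permutation" means, the sketch below uses a hypothetical hash, perm_hash (not the course's computeHash3, whose definition is not given here), which sums the squared character codes. Because addition is commutative, every reordering of a string produces the same bucket. An order-sensitive polynomial hash is included for contrast: it separates "ab" from "ba", so anagrams would land in different buckets.

```python
def perm_hash(s, table_size):
    """Hypothetical permutation-invariant hash (NOT computeHash3).

    The sum of squared character codes depends only on which characters
    occur and how often, never on their order.
    """
    return sum(ord(ch) ** 2 for ch in s) % table_size


def poly_hash(s):
    """Order-sensitive polynomial hash, shown only for contrast."""
    h = 0
    for ch in s:
        h = h * 31 + ord(ch)
    return h


print(perm_hash("yodoz", 1009) == perm_hash("doozy", 1009))  # True
print(poly_hash("ab"), poly_hash("ba"))                      # 3105 3135
```

A plain sum of character codes would also be permutation-invariant, but it maps "ad" and "bc" to the same value; squaring the codes separates that pair. Collisions between non-anagrams are still possible with any such hash, which is why the bucket must be searched afterwards.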

Extend Your Wordlist ADT

To your Wordlist class file from Assignment 8, add another class, HashedWordList, which inherits from your current Wordlist class. HashedWordList will be implemented as a hash table using chaining. I suggest using computeHash3 from slideset 11 as your hash function.

The HashedWordList class should override the following methods: __init__ and addWord. If you didn't implement addWordsFromFile by repeatedly calling addWord, you may also have to override addWordsFromFile. You'll also add two new methods: findPerm, which replaces findWord, and loadFactor, which returns a pair containing the number of empty buckets and the average length of a non-empty bucket in your hash table. Unlike findWord, findPerm looks for a string in the hash bucket with the same letters, not an exact match.

The interface for HashedWordList should be exactly the same as the interface for WordList, except that you're adding findPerm and not using findWord. Otherwise, the user of the class should not see any difference.
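The class structure described above might look roughly like the following Python 3 sketch. The Wordlist class here is a stripped-down, hypothetical stand-in for your Assignment 8 class (its list-based addWord is omitted), and the names wordCount, _hash, and size are assumptions; the hash is a placeholder where computeHash3 would go. Because the stand-in's addWordsFromFile calls addWord, the subclass only needs to override __init__ and addWord.

```python
class Wordlist:
    """Stripped-down stand-in for the Assignment 8 class (hypothetical)."""

    def __init__(self):
        self.count = 0

    def addWordsFromFile(self, filename):
        # Reads one word per line and delegates to addWord, so a subclass
        # that overrides addWord does not need to override this method.
        with open(filename) as f:
            for line in f:
                word = line.strip().lower()
                if word:
                    self.addWord(word)

    def wordCount(self):
        return self.count


class HashedWordList(Wordlist):
    """Wordlist stored as a hash table that resolves collisions by chaining."""

    def __init__(self, size=10007):
        super().__init__()
        self.size = size
        # One Python list (a "chain") per bucket.
        self.table = [[] for _ in range(size)]

    def _hash(self, word):
        # Placeholder permutation-invariant hash; substitute computeHash3.
        return sum(ord(ch) ** 2 for ch in word) % self.size

    def addWord(self, word):
        # Append the word to the chain for its bucket.
        self.table[self._hash(word)].append(word)
        self.count += 1

    def loadFactor(self):
        """Return (number of empty buckets, average non-empty bucket length)."""
        lengths = [len(bucket) for bucket in self.table if bucket]
        empty = self.size - len(lengths)
        average = sum(lengths) / len(lengths) if lengths else 0.0
        return empty, average


wl = HashedWordList(size=7)
for w in ["doozy", "plume", "bitter", "visual"]:
    wl.addWord(w)
print(wl.wordCount(), wl.loadFactor())
```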

Write a Top-Level Driver

You will have to replace your top-level driver program, because you are no longer checking permutations. Since you're using a different main program, call your driver file HashSolver.py. The user interface to the program should be the same. That is, there is no reason for the user to know that you've replaced the implementation, except that, along with the other statistics printed after creating your Wordlist, you will print the load factor (the number of empty buckets and the average length of the non-empty "buckets" in your table).

After creating a hash table version of your Wordlist, accept strings from the user as before. When the user inputs a string S, check whether a permutation of S is in the Wordlist using the method findPerm. findPerm should perform the following steps:

  1. Hash the jumble input string S to get h(S).
  2. Search the bucket at location HT( h(S) ) for any word with exactly the same letters as S.
  3. If such a word is found, return it and the number of comparisons; if not, return False and the number of comparisons.
Unlike findWord, findPerm has to return the word itself and not just a boolean value; otherwise, you won't be able to access the word to print it out.
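The three findPerm steps can be sketched as follows, assuming Python 3. To keep the example self-contained, it repeats a minimal chained table; the placeholder hash (sum of squared character codes) stands in for computeHash3, and _hash and size are hypothetical names. Only the one bucket the input hashes to is scanned, so the comparison count is the number of words checked in that bucket.

```python
class HashedWordList:
    """Minimal chained hash table, just enough to demonstrate findPerm."""

    def __init__(self, size=10007):
        self.size = size
        self.table = [[] for _ in range(size)]

    def _hash(self, word):
        # Placeholder permutation-invariant hash; substitute computeHash3.
        return sum(ord(ch) ** 2 for ch in word) % self.size

    def addWord(self, word):
        self.table[self._hash(word)].append(word)

    def findPerm(self, s):
        """Return (word, comparisons), or (False, comparisons) if no word
        in the bucket has exactly the same letters as s."""
        s = s.lower()
        target = sorted(s)
        bucket = self.table[self._hash(s)]   # step 1: hash S to h(S)
        comparisons = 0
        for word in bucket:                  # step 2: scan only that bucket
            comparisons += 1
            if sorted(word) == target:
                return word, comparisons     # step 3: found
        return False, comparisons            # step 3: not found


wl = HashedWordList()
for w in ["doozy", "plume", "bitter", "visual"]:
    wl.addWord(w)
print(wl.findPerm("tribte"))  # ('bitter', 1)
```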

To check that two strings have exactly the same letters (including duplicates), you can test the following:

   sorted( str1 ) == sorted( str2 )
When given a string, sorted returns a list of its characters in sorted order. There may be quicker ways to do this, but you shouldn't have many comparisons anyway.
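For reference, the sketch below shows the sorted check alongside one alternative that compares letter counts with collections.Counter. The helper names are hypothetical. Whether the Counter version is actually faster for words this short is not guaranteed, so time both before switching.

```python
from collections import Counter


def same_letters(str1, str2):
    """The check from the assignment: compare sorted character lists."""
    return sorted(str1) == sorted(str2)


def same_letters_counter(str1, str2):
    """Alternative: compare letter counts; the length test is a cheap early exit."""
    return len(str1) == len(str2) and Counter(str1) == Counter(str2)


print(sorted("tribte"))                          # ['b', 'e', 'i', 'r', 't', 't']
print(same_letters("tribte", "bitter"))          # True
print(same_letters_counter("tribe", "bitter"))   # False
```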

As before, compute and print the statistics of the search (how many comparisons you made and how long it took). Here, comparisons means the number of words you check in the bucket that the input hashes to. Loop until the user enters "exit." User input should not be case sensitive. The output should be very nearly identical to that from Assignment 6 and Assignment 8.
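A possible shape for the input loop in HashSolver.py is sketched below, assuming Python 3. The function name solve_loop and its read/write parameters are hypothetical; they default to input and print but can be swapped out for testing. find_perm is any callable with findPerm's return convention. The output lines mirror the sample output, except the "no solution" message, which is an assumption since the sample does not show that case.

```python
import time


def solve_loop(find_perm, read=input, write=print):
    """Prompt for jumbles until the user types exit (in any case).

    find_perm returns (word, comparisons), with word == False when
    there is no solution, e.g. a HashedWordList's findPerm method.
    """
    while True:
        s = read("Enter a scrambled word (or EXIT):  ").strip().lower()
        if s == "exit":
            write("Thanks for playing!  Goodbye.")
            return
        start = time.perf_counter()
        word, comparisons = find_perm(s)
        elapsed = time.perf_counter() - start
        if word:
            write("Found word: %s" % word)
        else:
            # Hypothetical message; the sample output does not show this case.
            write("Sorry, no solution for %s" % s)
        write("Solving this jumble took %.5f seconds" % elapsed)
        write("Made  %d  comparisons." % comparisons)
        write("")
```

Passing the solver in as a callable keeps the loop independent of how the Wordlist is stored, which matches the requirement that the user interface not change.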

Compare the Three Implementations

Take the README file you produced for Assignment 8 and extend it to add the comparison of this latest method. That is, run your new program with the same data as the previous two runs, and indicate in your new README file how the new method compares to the previous results. Explain the differences you see.

Sample Output

felix:~/cs313e/python/newjumble> python HashSolver.py
Using hash table wordlist.
Creating wordlist

The Wordlist contains 22633 words.
There are 1668 empty buckets
Non-empty buckets have an average length of 2.71411440221
Building the Wordlist took 0.656 seconds

Enter a scrambled word (or EXIT):  yodoz
Found word: doozy
Solving this jumble took 0.00010 seconds
Made  1  comparisons.

Enter a scrambled word (or EXIT):  ulpem
Found word: plume
Solving this jumble took 0.00012 seconds
Made  4  comparisons.

Enter a scrambled word (or EXIT):  tribte
Found word: bitter
Solving this jumble took 0.00010 seconds
Made  2  comparisons.

Enter a scrambled word (or EXIT):  sluvia
Found word: visual
Solving this jumble took 0.00011 seconds
Made  1  comparisons.

Enter a scrambled word (or EXIT):  exit
Thanks for playing!  Goodbye.
felix:~/cs313e/python/newjumble>