Section Problems Number 8 - Data Structures and Array-Based Lists

Be prepared to discuss these problems in discussion section.
Thanks to Owen Astrachan of Duke University for ideas and materials.

Part 1.

This part refers to these files.

File Name                   Description
theManWhoWouldBeKing.txt    An e-text of The Man Who Would Be King by Rudyard Kipling, from Project Gutenberg. File size: 91 kilobytes.
jungleBook.txt              An e-text of The Jungle Book by Rudyard Kipling, from Project Gutenberg. File size: 291 kilobytes.
UniqueCounter.java          Java interface for counting the number of unique Strings in an array of Strings.
SlowUniqueCounter.java      Implementation of UniqueCounter. Uses a nested loop to examine the Strings.
SortingUniqueCounter.java   Implementation of UniqueCounter. Uses the Arrays.sort method to sort the array of Strings and then loops through once to count the number of unique Strings.
SetUniqueCounter.java       Implementation of UniqueCounter. Uses parts of the Java collections framework to create a list with all the elements of the array of Strings, then adds the elements of that list to a Set.
UniqueTester                Shows results of the three implementations of UniqueCounter.

0. It is not clear what is meant by a word in this series of exercises. Examine one of the original files and the elements in the array of Strings produced from the file. What do you think is meant by a word?
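One way to explore this question is to split a short piece of text yourself and look at the tokens. The sketch below assumes that a "word" is simply any run of non-whitespace characters; that is an assumption made for illustration, not necessarily how the course code builds its array. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not one of the course files.
final class WordSplitSketch {

    // Assumption: a "word" is a maximal run of non-whitespace characters.
    static String[] splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = splitOnWhitespace("The King, the king.");
        // Punctuation and capitalization are kept, so all four tokens differ.
        System.out.println(Arrays.toString(tokens));
    }
}
```

Under this definition "King," and "king." count as different words, which is worth keeping in mind when reading the unique-word counts.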

 

1. The code for determining unique words in SlowUniqueCounter.java has a bug. The value returned is not, in general, correct (see output). Describe what the bug is and how to fix it. Create a data file for which the current (incorrect) version will return a correct result. Create a very simple array of Strings to illustrate the problem with this solution.

 

2. The code in SortingUniqueCounter has a bug. Describe what the problem is. How would you fix it? Create a data file for which the current (incorrect) version will return a correct result. Create a very simple array of Strings to illustrate the problem with this solution.

 

3. If the following line from SortingUniqueCounter

            if (! list[k].equals(last)){

is replaced by this line:

            if (list[k] != last){

the program compiles and runs, but reports that The Man Who Would Be King has 16,587 different words rather than 4828 different words. Explain this output (note that there are 16,588 words in The Man Who Would Be King file).
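As background for this question, the following minimal sketch contrasts the two comparisons on Strings. It is a standalone illustration with hypothetical names, not code from SortingUniqueCounter. It uses new String(...) to guarantee two distinct objects with the same characters, which is similar to what you would typically get for words read separately from a file.

```java
// Hypothetical demo class, not one of the course files.
final class EqualsVsIdentity {

    // True only when both references point at the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // True when the two Strings contain the same sequence of characters.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String first = new String("king");
        String second = new String("king");
        System.out.println(sameObject(first, second)); // false: two objects
        System.out.println(sameText(first, second));   // true: same characters
    }
}
```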

 

4. Explain why the SortingUniqueCounter and SetUniqueCounter are faster than the SlowUniqueCounter when evaluating an array of Strings.
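To make the comparison concrete, here is a minimal sketch of the three general strategies. These are not the course implementations (which contain the bugs discussed above); the class and method names are hypothetical, and the set-based version assumes a HashSet, which may differ from what SetUniqueCounter uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not one of the course files.
final class UniqueSketch {

    // Nested loops: each word is compared against all earlier words,
    // so the number of comparisons grows roughly with n * n.
    static int countNested(String[] words) {
        int count = 0;
        for (int i = 0; i < words.length; i++) {
            boolean seenBefore = false;
            for (int j = 0; j < i && !seenBefore; j++) {
                seenBefore = words[i].equals(words[j]);
            }
            if (!seenBefore) {
                count++;
            }
        }
        return count;
    }

    // Sort a copy (about n log n comparisons), then one linear pass
    // counting positions where the word differs from its predecessor.
    static int countSorted(String[] words) {
        if (words.length == 0) {
            return 0;
        }
        String[] copy = words.clone();
        Arrays.sort(copy);
        int count = 1; // the first word is always new
        for (int k = 1; k < copy.length; k++) {
            if (!copy[k].equals(copy[k - 1])) {
                count++;
            }
        }
        return count;
    }

    // A hash-based set discards duplicates; expected work is about n.
    static int countWithSet(String[] words) {
        Set<String> unique = new HashSet<>(Arrays.asList(words));
        return unique.size();
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "c", "b"};
        System.out.println(countNested(words));  // 3
        System.out.println(countSorted(words));  // 3
        System.out.println(countWithSet(words)); // 3
    }
}
```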

 

5. As the sample output below shows, the run on Kipling's The Jungle Book was stopped before it finished. Based on the statistics shown, and the fact that the text has 53,860 total words but only 9613 different words, estimate how long the program will take for each of the three classes. Give reasons for each of your estimates.

 

Sample output from UniqueTester

read # words = 16588
reading theManWhoWouldBeKing.txt # words = 16588
4658 unique words in 0.348161288 seconds
4628 unique words in 0.016153729 seconds
4629 unique words in 0.008291556 seconds

read # words = 27463
reading romeoAndJuliet.txt # words = 27463
7322 unique words in 0.874056899 seconds
7281 unique words in 0.025881223 seconds
7282 unique words in 0.01091256 seconds

read # words = 53860
reading junglebook.txt # words = 53860
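One possible way to approach the estimates in problem 5 is to pick a growth model for each class and scale a measured time by it. The sketch below does that arithmetic using the timings for theManWhoWouldBeKing.txt shown above. The choice of model for each class (quadratic, n log n, linear) is an assumption you should justify yourself, and the class and method names are hypothetical.

```java
// Hypothetical estimator, not one of the course files.
final class RuntimeEstimate {

    // Assumes time grows like n^exponent: t2 = t1 * (n2 / n1)^exponent.
    static double scalePower(double t1, int n1, int n2, double exponent) {
        return t1 * Math.pow((double) n2 / n1, exponent);
    }

    // Assumes time grows like n log n.
    static double scaleNLogN(double t1, int n1, int n2) {
        return t1 * (n2 * Math.log(n2)) / (n1 * Math.log(n1));
    }

    public static void main(String[] args) {
        int small = 16588;  // words in theManWhoWouldBeKing.txt
        int large = 53860;  // words in the Jungle Book file

        // Measured times from the sample output, in order of appearance.
        System.out.println("quadratic model: " + scalePower(0.348161288, small, large, 2.0));
        System.out.println("n log n model:   " + scaleNLogN(0.016153729, small, large));
        System.out.println("linear model:    " + scalePower(0.008291556, small, large, 1.0));
    }
}
```

These models ignore the number of distinct words, which may also affect the running time; compare the predictions for the 27,463-word file against its measured times before trusting them.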

Part 2.

This part refers to these files.

File Name                      Description
IList.java                     A very simple list interface with limited functionality.
SimpleList.java                An array-based implementation of IList.
SimpleListFixedIncrease.java   A subclass of SimpleList with a new resize method.
ListTester.java                A test of the two implementations of IList.

1. Here is a run of ListTester. Perform your own test. Are your results similar?

SimpleList:
Time to add 1000 elements 0.002551721 seconds.
Time to add 2000 elements 0.002726324 seconds.
Time to add 4000 elements 0.001234793 seconds.
Time to add 8000 elements 0.004084038 seconds.
Time to add 16000 elements 6.52038E-4 seconds.
Time to add 32000 elements 0.003054299 seconds.
Time to add 64000 elements 0.021564472 seconds.
Time to add 128000 elements 0.035441376 seconds.
Time to add 256000 elements 0.129189324 seconds.
Time to add 512000 elements 0.220933666 seconds.
Time to add 1024000 elements 0.413992002 seconds.
Time to add 2048000 elements 0.71907966 seconds.

SimpleListFixedIncrease:
Time to add 1000 elements 1.82984E-4 seconds.
Time to add 2000 elements 5.14031E-4 seconds.
Time to add 4000 elements 1.58959E-4 seconds.
Time to add 8000 elements 3.31048E-4 seconds.
Time to add 16000 elements 0.001957511 seconds.
Time to add 32000 elements 0.008220039 seconds.
Time to add 64000 elements 0.106695404 seconds.
Time to add 128000 elements 0.459423474 seconds.
Time to add 256000 elements 3.5617971 seconds.
Time to add 512000 elements 14.376469077 seconds.
Time to add 1024000 elements 54.646370419 seconds.
Time to add 2048000 elements 140.489877065 seconds.

 

2. Explain the results of the experiment. Why is SimpleList so much faster?
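A useful way to reason about this is to count how many element copies each resizing policy performs while n elements are added. The sketch below simulates two policies without storing any data. It assumes a starting capacity of 10, a doubling policy for one list, and a fixed increase of 10 for the other; those numbers and the names are assumptions for illustration, so check the actual resize methods in the course files.

```java
// Hypothetical simulation, not one of the course files.
final class ResizeCost {

    private static final long INITIAL_CAPACITY = 10; // assumed

    // Copies made when the capacity doubles each time the array is full.
    static long copiesWithDoubling(int n) {
        long capacity = INITIAL_CAPACITY;
        long copies = 0;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {   // array full: copy everything over
                copies += size;
                capacity *= 2;
            }
        }
        return copies;
    }

    // Copies made when the capacity grows by a fixed amount each time.
    static long copiesWithFixedIncrease(int n, int increase) {
        long capacity = INITIAL_CAPACITY;
        long copies = 0;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity += increase;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 2048000; n *= 2) {
            System.out.println(n + " adds: doubling copies "
                    + copiesWithDoubling(n) + ", fixed-increase copies "
                    + copiesWithFixedIncrease(n, 10));
        }
    }
}
```

With doubling, the total number of copies stays below 2n, while with a fixed increase it grows roughly with the square of n.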

 

3. Is the remove method in SimpleList correct? If not, fix it.
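For comparison while you inspect the course code, here is a minimal sketch of what removal from an array-based list generally has to do: validate the position, shift later elements left by one, shrink the size, and clear the vacated slot. TinyList is a hypothetical class written for this illustration; it is not SimpleList and does not implement IList.

```java
import java.util.Arrays;

// Hypothetical array-based list, not one of the course files.
final class TinyList {
    private Object[] items = new Object[10];
    private int size = 0;

    void add(Object x) {
        if (size == items.length) {
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = x;
    }

    Object get(int pos) {
        checkPosition(pos);
        return items[pos];
    }

    int size() {
        return size;
    }

    // Removes and returns the element at pos, closing the gap it leaves.
    Object remove(int pos) {
        checkPosition(pos);
        Object removed = items[pos];
        for (int i = pos; i < size - 1; i++) {
            items[i] = items[i + 1]; // shift each later element left by one
        }
        size--;
        items[size] = null; // drop the stale reference in the last slot
        return removed;
    }

    private void checkPosition(int pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("pos: " + pos + ", size: " + size);
        }
    }
}
```

Typical things to check in any remove method are the loop bounds of the shift, whether the size is updated, and what happens for the first and last positions.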