Section Problems Number 8 - Data Structures and Array Based Lists

Be prepared to discuss these problems in discussion section.
Thanks to Owen Astrachan of Duke University for ideas and materials.

Part 1.

This part refers to these files.

| File Name | Description |
| --- | --- |
| theManWhoWouldBeKing.txt | An e-text of The Man Who Would Be King by Rudyard Kipling from Project Gutenberg. File size 91 kilobytes. |
| jungleBook.txt | An e-text of The Jungle Book by Rudyard Kipling from Project Gutenberg. File size 291 kilobytes. |
| UniqueCounter.java | Java interface for counting the number of unique Strings in an array of Strings. |
| SlowUniqueCounter.java | Implementation of UniqueCounter. Uses a nested loop to examine Strings. |
| SortingUniqueCounter.java | Implementation of UniqueCounter. Uses the Arrays.sort method to sort the array of Strings and then loops through once to count the number of unique Strings. |
| SetUniqueCounter.java | Implementation of UniqueCounter. Uses parts of the Java collection framework to create a list with all the elements of the array of Strings, then adds the elements of that list to a Set. |
| UniqueTester | Shows results of the three implementations of UniqueCounter. |

0. It is not clear what is meant by a word in this series of exercises. Examine one of the original files and the elements in the array of Strings produced from the file. What do you think is meant by a word?
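One way to explore this question is to split a short sample on whitespace and look at the tokens that come out. The sketch below assumes, without having checked the provided file reader, that a word is any maximal run of non-whitespace characters. The names `WordTokens` and `tokens` are hypothetical and are not part of the course files.

```java
import java.util.Arrays;

class WordTokens {
    // Assumption: a "word" is a maximal run of non-whitespace characters.
    // The course code may tokenize differently.
    static String[] tokens(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] words = tokens("The King. the king, \"King\"");
        // Case and attached punctuation are kept, so each token here
        // is a different String under this definition of a word.
        System.out.println(words.length + " tokens: " + Arrays.toString(words));
    }
}
```

If the real array shows tokens with trailing punctuation or mixed case, that suggests a similar whitespace-based definition is in use.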

1. The code for determining unique words in `SlowUniqueCounter.java` has a bug. The value returned is not, in general, correct (see output). Describe what the bug is and how to fix it. Create a data file for which the current (incorrect) version will return a correct result. Create a very simple array of Strings to illustrate the problem with this solution.

2. The code in `SortingUniqueCounter` has a bug. Describe what the problem is. How would you fix it? Create a data file for which the current (incorrect) version will return a correct result. Create a very simple array of Strings to illustrate the problem with this solution.

3. If the following line from `SortingUniqueCounter`

```
if (! list[k].equals(last)){
```

is replaced by this line:

```
if (list[k] != last){
```

the program compiles and runs, but indicates that the text has 16,587 different words rather than 4828 different words. Explain this output (note that there are 16,588 words in The Man Who Would Be King file).
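As background for this problem, the small sketch below contrasts `equals` with `!=` on Strings. It does not reproduce `SortingUniqueCounter`. The class and method names are hypothetical, and `new String(...)` is used only to guarantee that equal text lives in separate objects, which is an assumption about how the words read from a file behave.

```java
class StringComparison {
    // Counts adjacent pairs whose contents differ.
    static int adjacentDifferencesByEquals(String[] list) {
        int count = 0;
        for (int k = 1; k < list.length; k++) {
            if (!list[k].equals(list[k - 1])) {
                count++;
            }
        }
        return count;
    }

    // Counts adjacent pairs that are different objects,
    // regardless of whether their contents match.
    static int adjacentDifferencesByReference(String[] list) {
        int count = 0;
        for (int k = 1; k < list.length; k++) {
            if (list[k] != list[k - 1]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two separate objects that both hold "cat", then one "dog".
        String[] list = { new String("cat"), new String("cat"), new String("dog") };
        System.out.println("by equals:    " + adjacentDifferencesByEquals(list));
        System.out.println("by reference: " + adjacentDifferencesByReference(list));
    }
}
```

Comparing the two counts on a tiny sorted array is a quick way to see what each operator actually tests.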

4. Explain why `SortingUniqueCounter` and `SetUniqueCounter` are faster than `SlowUniqueCounter` when evaluating an array of Strings.

5. As shown in the sample output below, the run on Kipling's The Jungle Book was stopped before it finished. Based on the statistics shown, and the fact that the text has 53,860 total words but only 9613 different words, develop estimates for how long the program will take for each of the three classes. Provide reasons for each of your estimates.

Sample output from UniqueTester

```
read # words = 16588
reading theManWhoWouldBeKing.txt # words = 16588
4658 unique words in 0.348161288 seconds
4628 unique words in 0.016153729 seconds
4629 unique words in 0.008291556 seconds
```

```
read # words = 27463
reading romeoAndJuliet.txt # words = 27463
7322 unique words in 0.874056899 seconds
7281 unique words in 0.025881223 seconds
7282 unique words in 0.01091256 seconds
```

```
read # words = 53860
reading junglebook.txt # words = 53860
```
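For problem 5, one rough way to extrapolate is to assume the running time grows like n raised to some power, fit that power from two measured runs, and scale to the larger input. The sketch below is only a starting point: `RuntimeEstimate`, `exponent`, and `scale` are hypothetical names, the power-law model is an assumption, and two data points make a crude fit. It also ignores the number of unique words, which may matter for some of the implementations.

```java
class RuntimeEstimate {
    // Fits p in time ~ n^p from two (n, seconds) measurements.
    static double exponent(double t1, int n1, double t2, int n2) {
        return Math.log(t2 / t1) / Math.log((double) n2 / n1);
    }

    // Scales a measured time to a new input size, assuming time ~ n^p.
    static double scale(double measuredSeconds, int measuredN, int targetN, double p) {
        return measuredSeconds * Math.pow((double) targetN / measuredN, p);
    }

    public static void main(String[] args) {
        // First timing line of each completed run in the sample output.
        // Which counter produced which line is not stated there.
        double p = exponent(0.348161288, 16588, 0.874056899, 27463);
        double estimate = scale(0.874056899, 27463, 53860, p);
        System.out.println("fitted exponent: " + p);
        System.out.println("estimated seconds for 53860 words: " + estimate);
    }
}
```

Repeating the calculation with the second and third timing lines gives an estimate for each class, which can then be compared against what you expect from reading the code.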

Part 2.

This part refers to these files.

| File Name | Description |
| --- | --- |
| IList.java | A very simple list interface. Limited functionality. |
| SimpleList.java | An array-based implementation of IList. |
| SimpleListFixedIncrease.java | A subclass of SimpleList, but with a new resize method. |
| ListTester.java | A test of the two implementations of IList. |

1. Here is a run of ListTester. Perform your own test. Are your results similar?

SimpleList:
```
Time to add 1000 elements 0.002551721 seconds.
Time to add 2000 elements 0.002726324 seconds.
Time to add 4000 elements 0.001234793 seconds.
Time to add 8000 elements 0.004084038 seconds.
Time to add 16000 elements 6.52038E-4 seconds.
Time to add 32000 elements 0.003054299 seconds.
Time to add 64000 elements 0.021564472 seconds.
Time to add 128000 elements 0.035441376 seconds.
Time to add 256000 elements 0.129189324 seconds.
Time to add 512000 elements 0.220933666 seconds.
Time to add 1024000 elements 0.413992002 seconds.
Time to add 2048000 elements 0.71907966 seconds.
```

SimpleListFixedIncrease:
```
Time to add 1000 elements 1.82984E-4 seconds.
Time to add 2000 elements 5.14031E-4 seconds.
Time to add 4000 elements 1.58959E-4 seconds.
Time to add 8000 elements 3.31048E-4 seconds.
Time to add 16000 elements 0.001957511 seconds.
Time to add 32000 elements 0.008220039 seconds.
Time to add 64000 elements 0.106695404 seconds.
Time to add 128000 elements 0.459423474 seconds.
Time to add 256000 elements 3.5617971 seconds.
Time to add 512000 elements 14.376469077 seconds.
Time to add 1024000 elements 54.646370419 seconds.
Time to add 2048000 elements 140.489877065 seconds.
```

2. Explain the results of the experiment. Why is SimpleList so much faster?
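To reason about problem 2, it can help to count how many element copies each growth rule causes, independent of the clock. The sketch below assumes that one list doubles its capacity when full and the other grows by a fixed amount. Neither rule has been confirmed against the course files, and `ResizeCost`, the initial capacity of 10, and the increase of 10 are all hypothetical choices.

```java
class ResizeCost {
    // Copies made while appending n items when capacity doubles on overflow.
    // Assumes initialCapacity > 0.
    static long copiesWithDoubling(int n, int initialCapacity) {
        long copies = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;   // every existing element moves to the new array
                capacity *= 2;
            }
        }
        return copies;
    }

    // Copies made while appending n items when capacity grows by a fixed amount.
    // Assumes initialCapacity > 0 and increase > 0.
    static long copiesWithFixedIncrease(int n, int initialCapacity, int increase) {
        long copies = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;   // every existing element moves to the new array
                capacity += increase;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 2048000; n *= 2) {
            System.out.println(n + " adds: doubling copies = " + copiesWithDoubling(n, 10)
                    + ", fixed-increase copies = " + copiesWithFixedIncrease(n, 10, 10));
        }
    }
}
```

Watching how each copy count changes as n doubles gives a hardware-independent view to set beside the measured times.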

3. Is the remove method in SimpleList correct? If not, fix it.
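For comparison while reviewing problem 3, here is a minimal sketch of how removal is commonly written for an array-backed list. It is not the course's `SimpleList` and may not match the `IList` interface. `TinyList` and all of its members are hypothetical, and the doubling in `add` is an assumption made only to keep the example self-contained.

```java
import java.util.Arrays;

class TinyList {
    private Object[] items = new Object[10];
    private int size = 0;

    void add(Object x) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // assumed growth rule
        }
        items[size++] = x;
    }

    Object get(int pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("pos: " + pos);
        }
        return items[pos];
    }

    int size() {
        return size;
    }

    // Removes and returns the element at pos, shifting later elements left by one.
    Object remove(int pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("pos: " + pos);
        }
        Object removed = items[pos];
        for (int i = pos; i < size - 1; i++) {
            items[i] = items[i + 1];
        }
        items[size - 1] = null; // clear the vacated slot so the object can be collected
        size--;
        return removed;
    }

    public static void main(String[] args) {
        TinyList list = new TinyList();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println("removed " + list.remove(1) + ", size is now " + list.size());
    }
}
```

Points worth checking in any remove method include the bounds test, the limits of the shifting loop, whether the size is updated, and what happens to the last occupied slot.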