CS 305J Assignment 8, File Processing and Arrays

Programming Assignment 8 Individual Assignment.

Placed online: October 20
20 points, ~2% of total grade
Due: no later than 11 pm, Thursday, October 30
General Assignment Requirements

Description
The purposes of this assignment are:
  1. To practice creating a structured program.
  2. To practice processing data from files.
  3. To practice using arrays.
  4. To practice finding and working with real data.

For this assignment you are limited to the language features in chapters 1 through 7 of the textbook.

In this program you will use many of the tools we have learned to process real data and look for interesting results. More specifically you will write a program that determines the distribution of initial digits in a set of data. The data is stored in a file. The final output will be a small table that lists how many of the elements of the data set were equal to 0, how many started with a 1, how many started with a 2, how many started with a 3, and so on up to 9. You are responsible for finding a real set of data and creating a file with that data in the correct format. The format of the data file is explained later in this handout.

The program should be broken up into the following methods. You might not end up using all of these methods in the final version, but writing them is excellent programming practice.

Complete a program named CountDigits with the following methods and features. (You do not have to do these in order. You may want to work on step 8 before the other steps are done.)

1. Write a method public static int countDigits(int num) that calculates the number of digits in an integer. This can be accomplished in many ways, including repeated division by 10. Another approach is to convert the int to a String and use the length method. A simple way of converting an int to a String is by concatenating the int variable with an empty String, num + "" . Do not count the negative sign as a digit. Here are some examples of expected results for the countDigits method. Note "->" means "should return".

  • countDigits(12) -> 2
  • countDigits(0) -> 1
  • countDigits(571) -> 3
  • countDigits(-120) -> 3
  • countDigits(2000000000) -> 10
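
The repeated-division approach mentioned in step 1 could be sketched as follows. This is one possible version, not the required solution; the class name DigitCountSketch is a placeholder chosen for this illustration.

```java
class DigitCountSketch {
    // Counts decimal digits by repeated division by 10; the sign is never counted.
    public static int countDigits(int num) {
        int count = 0;
        // A do-while loop runs at least once, so 0 is counted as one digit.
        do {
            count++;
            num /= 10; // integer division truncates toward zero, so negatives also reach 0
        } while (num != 0);
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDigits(12));          // prints 2
        System.out.println(countDigits(-120));        // prints 3
        System.out.println(countDigits(2000000000));  // prints 10
    }
}
```

Because the loop never negates num, this version does not need a special case for negative values.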

2. Write a method public static int getDigit(int num, int pos) that finds the digit at position pos in num, that is, the digit pos spots from the right. The rightmost digit, the ones place, is at position 0, the digit in the tens place is at position 1, and so forth. This method must handle positions that are beyond the start of the number and return 0 in those cases. Again, the method should handle negative numbers and ignore the negative sign. As usual there are many ways of solving this problem. You can either use repeated division and then the modulus operator, %, to "pick off" the digit that you want, or convert the int to a String and use the charAt or substring methods. How do you convert from a char to an int? There are many ways, but one simple way is via casting. For example, if c is a char variable equal to a digit between '0' and '9', we can convert it to an int via the following expression: (int)(c - '0'). Here are some examples of expected results for the getDigit method.

  • getDigit(12, 0) -> 2
  • getDigit(12, 1) -> 1
  • getDigit(12, 2) -> 0
  • getDigit(12, 3) -> 0
  • getDigit(0, 0) -> 0
  • getDigit(43726, 1) -> 2
  • getDigit(43726, countDigits(43726) - 1 ) -> 4
  • getDigit(43726, 5) -> 0
  • getDigit(-12, 0) -> 2
  • getDigit(-12, 1) -> 1
  • getDigit(-12, 2) -> 0
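
As an illustration of the division-and-modulus approach from step 2, here is a minimal sketch. The class name GetDigitSketch and the helper charToDigit are placeholders invented for this example; charToDigit only demonstrates the char-to-int conversion described above and is not one of the required methods.

```java
class GetDigitSketch {
    // Returns the digit pos places from the right (position 0 is the ones place).
    public static int getDigit(int num, int pos) {
        // Drop the pos rightmost digits. If pos is beyond the leading digit,
        // num becomes 0 and the method returns 0.
        for (int i = 0; i < pos; i++) {
            num /= 10;
        }
        // For a negative num the % operator gives a negative result,
        // so take the absolute value to ignore the sign.
        return Math.abs(num % 10);
    }

    // Converts a digit character such as '7' to the int 7.
    public static int charToDigit(char c) {
        return (int) (c - '0');
    }

    public static void main(String[] args) {
        System.out.println(getDigit(43726, 1)); // prints 2
        System.out.println(getDigit(-12, 0));   // prints 2
        System.out.println(charToDigit('7'));   // prints 7
    }
}
```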

3. Write a method public static int getLeadingDigit(int num). This method returns the leading digit of num, that is, the leftmost (most significant) digit. Try to use the countDigits and getDigit methods when writing this method instead of repeating work already done. Here are some examples of expected results for the getLeadingDigit method.

  • getLeadingDigit( 12345 ) -> 1
  • getLeadingDigit( 1999999999 ) -> 1
  • getLeadingDigit( -12345 ) -> 1
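
One way to reuse the two earlier methods, as step 3 suggests, is to ask getDigit for the digit at position countDigits(num) - 1. The sketch below repeats simple versions of both helpers so that it compiles on its own; the class name LeadingDigitSketch is a placeholder.

```java
class LeadingDigitSketch {
    // Helper from step 1: number of digits, ignoring the sign.
    public static int countDigits(int num) {
        int count = 0;
        do {
            count++;
            num /= 10;
        } while (num != 0);
        return count;
    }

    // Helper from step 2: digit at position pos, counting from the right.
    public static int getDigit(int num, int pos) {
        for (int i = 0; i < pos; i++) {
            num /= 10;
        }
        return Math.abs(num % 10);
    }

    // The leading digit sits at the highest occupied position.
    public static int getLeadingDigit(int num) {
        return getDigit(num, countDigits(num) - 1);
    }

    public static void main(String[] args) {
        System.out.println(getLeadingDigit(12345));  // prints 1
        System.out.println(getLeadingDigit(-12345)); // prints 1
    }
}
```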

4. Write a method public static int[] tallyLeadingDigits(int[] data). This method takes in an array of ints and returns an array of ints that is a tally of how many of the elements of data were equal to 0, how many started with a 1, how many started with a 2, and so forth up to how many started with a 9. The returned array will have a length of 10. Here is an example of expected results for the tallyLeadingDigits method.

  • tallyLeadingDigits( new int[]{12, 1, 0, 131, 25, 2681, 99} ) -> {1, 3, 2, 0, 0, 0, 0, 0, 0, 1}
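
The tally in step 4 can use each leading digit directly as an index into the result array. The sketch below assumes a compact stand-in named leadingDigit in place of getLeadingDigit so that the example is self-contained; the class name TallySketch is also a placeholder.

```java
class TallySketch {
    // Stand-in for getLeadingDigit: strip digits from the right until one remains.
    public static int leadingDigit(int num) {
        while (num >= 10 || num <= -10) {
            num /= 10;
        }
        return Math.abs(num);
    }

    // tally[d] counts the elements whose leading digit is d;
    // tally[0] counts the elements equal to 0.
    public static int[] tallyLeadingDigits(int[] data) {
        int[] tally = new int[10]; // a new int array starts with every element set to 0
        for (int i = 0; i < data.length; i++) {
            tally[leadingDigit(data[i])]++;
        }
        return tally;
    }
}
```

With the array from the example above, {12, 1, 0, 131, 25, 2681, 99}, three elements start with 1, two start with 2, one starts with 9, and one is equal to 0.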

5. Write a method public static int[] createDataArray( File f ) throws FileNotFoundException. This method takes in a File that contains the raw data and returns an array of ints that contains the measurements. The format of the data file will be as follows:

Number of data points, N
label 1
data 1
label 2
data 2
...
...
label N
data N

The first entry in the file indicates the number of data points in the file. This allows you to create an array of the proper length. Each measurement in the data set consists of two lines: a human-readable label that your program will ignore, and the measurement for that label. Here is a small sample data set, the populations of the first nine cities in Texas in alphabetical order:

Place Name               Population
Abbott                   300
Abernathy                2,839
Abilene                  115,930
Abram-Perezville         5,444
Ackerly                  245
Addison                  14,166
Adrian                   159
Agua Dulce               737
Airport Road Addition    132

The data file for this data set would be:

9
Abbott
300  
Abernathy
2839
Abilene
115930 
Abram-Perezville
5444 
Ackerly
245
Addison
14166 
Adrian
159
Agua Dulce
737
Airport Road Addition
132

Given the above file the createDataArray method would return an array of length 9 with the following elements: [300, 2839, 115930, 5444, 245, 14166, 159, 737, 132]
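
Reading this file format could look roughly like the sketch below. It reads whole lines, because labels such as "Agua Dulce" contain spaces, and it trims each data line, because some lines in the sample file end with extra spaces. The class name DataFileSketch is a placeholder, and the use of Integer.parseInt is one option among several.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class DataFileSketch {
    public static int[] createDataArray(File f) throws FileNotFoundException {
        Scanner input = new Scanner(f);
        // The first line holds N, the number of data points.
        int n = Integer.parseInt(input.nextLine().trim());
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            input.nextLine(); // the label line is read and ignored
            data[i] = Integer.parseInt(input.nextLine().trim());
        }
        input.close();
        return data;
    }
}
```

This sketch assumes the file is in the correct format, as the assignment allows; a malformed file would cause a runtime exception.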

6. Write a method public static void displayTable(int[] results). This method prints out the results of how many elements in a data set are equal to 0, how many start with 1, how many start with 2, and so forth up to 9. Using the data set for Texas towns shown above the tallyLeadingDigits method would return the following array: {0, 4, 2, 1, 0, 1, 0, 1, 0, 0}. If this array were sent to the displayTable method the output would be:

0s: 0
1s: 4
2s: 2
3s: 1
4s: 0
5s: 1
6s: 0
7s: 1
8s: 0
9s: 0
Total data points: 9
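
A minimal sketch of the table output follows. It builds the text in a separate helper, formatTable, which is not one of the required methods and is included here only so the result is easy to check; the class name TableSketch is likewise a placeholder.

```java
class TableSketch {
    // Builds one line per digit plus a final line with the total number of data points.
    public static String formatTable(int[] results) {
        String table = "";
        int total = 0;
        for (int digit = 0; digit < results.length; digit++) {
            table += digit + "s: " + results[digit] + "\n";
            total += results[digit];
        }
        return table + "Total data points: " + total;
    }

    public static void displayTable(int[] results) {
        System.out.println(formatTable(results));
    }

    public static void main(String[] args) {
        // Tally for the Texas sample data shown above.
        displayTable(new int[]{0, 4, 2, 1, 0, 1, 0, 1, 0, 0});
    }
}
```

The total is the sum of the ten tallies, since every data point is counted exactly once.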

7. Complete your main method. This method will call the provided method getFile(), which allows a program user to select the data file from a window as demonstrated in class. Process the chosen data file and print out the results using the methods above. Write your program assuming the data file has the correct format as described above. I am providing two data files: a small one with the data on Texas city populations as shown above, and a much larger one that has the gross box office receipts for various movies as reported by the Internet Movie Database.

8. Find a data source on the web that no one else has used (see the next step) and transform it into a format suitable for input to the CountDigits program. The data must all be separate measurements of a single type of phenomenon. For example: university and college enrollments, populations of cities in various states or countries, the number of articles submitted or updated each day on Wikipedia, the lengths of rivers in Canada, or the number of lifetime hits of 200 major league baseball players. Do not use lists of the top 100 measurements of a given phenomenon. Use a sampling of data, not just the largest measurements. Additionally, you must collect at least 150 measurements for your chosen phenomenon. All data points must be within the limits of a Java int. That is, no data point can be greater than 2,147,483,647 (2^31 - 1).
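
When preparing your data file, values copied from the web often contain commas (as in the Texas table above), and some may be too large for an int. The sketch below shows one possible way to check a value before putting it in the file. The class IntRangeSketch and its methods stripCommas and fitsInInt are hypothetical helpers, not part of the required program, and the approach assumes each value at least fits in a long.

```java
class IntRangeSketch {
    // Removes thousands separators, for example "115,930" becomes "115930".
    public static String stripCommas(String text) {
        return text.replace(",", "");
    }

    // Parses the value as a long first so that values above the int limit
    // can be detected instead of causing a parsing error.
    public static boolean fitsInInt(String text) {
        long value = Long.parseLong(stripCommas(text.trim()));
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(stripCommas("115,930"));     // prints 115930
        System.out.println(fitsInInt("2,147,483,647")); // prints true
        System.out.println(fitsInInt("2,147,483,648")); // prints false
    }
}
```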

9. Post all of the following items to the class listserv with the title "Assignment 8 CountDigits data": the URL for your data source, a description of the data source, and an attached text file with the labeled data as described above and as shown in the example data I have provided.

10. In your assignment complete the comment at the top with the required information. What do you think the distribution of leading digits in measured phenomena will be? Was there a difference in your actual results?

This assignment is based on an idea and assignment proposed by Steve Wolfram.

Turn in your program named CountDigits.java and the data file you create named countDigitData.txt using the turnin program.

Files
File | Responsibility
texasCityPop.txt (A small sample data file.) | Provided by me.
movieGross.txt (A large sample data file.) | Provided by me.
countDigitData.txt (Your data file.) | Provided by you.
CountDigits.java (A shell file with a main method, the header information, some tests, and the method for obtaining a File based on a user choice.) | Me and you, mostly you.

Checklist
Did you remember to:
  • review the general assignment requirements?
  • work on the assignment individually?
  • fill in the header in your file CountDigits.java?
  • complete all of the specified methods of the CountDigits program?
  • complete the test() method of CountDigits to show the tests on your methods?
  • ensure you wrote the program using good programming style?
  • ensure your program does not suffer a compile error or runtime error?
  • collect your own data, format it correctly, and post the required information to the class listserv?
  • turn in your Java source code in a file named CountDigits.java and your data file named countDigitData.txt to the proper account in the Microlab via the turnin program before 11 pm, Thursday, October 30?

Expected Output
Here is the expected output for the two sample files. Note: do not write all your code and then try to test the program against these results! You need to test each method as you complete it.

Expected output for texasCityPop.txt

0s: 0
1s: 4
2s: 2
3s: 1
4s: 0
5s: 1
6s: 0
7s: 1
8s: 0
9s: 0
Total data points: 9

Expected output for movieGross.txt

0s: 0
1s: 393
2s: 222
3s: 142
4s: 114
5s: 88
6s: 76
7s: 76
8s: 72
9s: 67
Total data points: 1250
