CS324e - Assignment 0

Placed Online: Thursday, August 30
Due: Thursday, September 13, no later than 11 pm
Points: 25, 2.5% of final grade.
Starter File: A0.java
Python Solution: Benford.py
Data File: Texas county populations from 2010 census

Individual Assignment. This is an individual assignment. You must complete the assignment on your own. You may use the examples shown in class as a guide, but you may not get help from anyone besides the instructor and TA. Copying code or getting code from anyone besides the instructor or TA is cheating and will result in an F in the course. You can discuss approaches to the problem with others as long as you don't write code or look at code. You may get help on syntax errors from others. You can search the web for information on how to perform a particular task in Java, such as finding the index of a character in a String or creating a substring.

The purpose of this assignment is to learn / review programming in Java. You can write the program from scratch or translate a Python program into Java.

Help on various common Java data types and operations:


Background: Benford's law, also known as the first-digit law, states that, given a list of numbers from a real-world data set, the distribution of leading digits is often not uniform and is instead skewed towards 1. Consider, for example, the populations of the counties in Texas from the 2010 census. Most people would guess there are roughly equal numbers of populations that start with 1, 2, 3, 4, 5, 6, 7, 8, and 9. (We don't consider 0 a leading digit for this assignment.) Given there are 254 Texas counties, you might expect about 28 counties (254 / 9 is about 28.2) to have a leading digit of 1, about 28 to have a leading digit of 2, and so forth. Travis County (home to UT) has a population of 1,024,266, which has a leading digit of 1.

Our intuition is often wrong. The breakdown of leading digits for the populations of Texas counties according to the 2010 census is:

Leading Digit   Number of Counties   Percentage
1               80                   31.5
2               38                   15.0
3               41                   16.1
4               26                   10.2
5               15                   5.9
6               15                   5.9
7               17                   6.7
8               13                   5.1
9               9                    3.5

Not what you expected! Populations with a leading digit of 1 occur almost one third of the time, not one ninth of the time as most people would guess. Benford's law does not hold for all data sets (for example, the heights of humans in inches), but it does hold for a surprisingly large number of real-world measurements.
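For comparison, Benford's law is usually stated quantitatively: the expected fraction of values with leading digit d is log10(1 + 1/d), which gives about 30.1% for d = 1 and about 4.6% for d = 9. The assignment does not ask for this; the small sketch below (the class name BenfordExpected is hypothetical and not part of A0.java) just prints those expected percentages so you can hold them up against the Texas county numbers above.

```java
import java.util.Locale;

// Hypothetical helper class, not one of the methods the assignment requires.
class BenfordExpected {

    // Expected fraction of values whose leading digit is d (1-9) under Benford's law.
    static double expectedFraction(int d) {
        return Math.log10(1.0 + 1.0 / d);
    }

    public static void main(String[] args) {
        System.out.println("digit expected percentage");
        for (int d = 1; d <= 9; d++) {
            // Locale.US forces '.' as the decimal separator; %.1f rounds to one place.
            System.out.printf(Locale.US, "%-6d%.1f%n", d, 100.0 * expectedFraction(d));
        }
    }
}
```

The nine fractions sum to log10(10) = 1, which is a quick sanity check on the formula.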


Data Files:

Write a program that tests Benford's law for two files. Use the file that contains the Texas county populations from the 2010 census.

The file format is one entry per line. The format of each line is:

[LABEL]\t[NUMBER]\n

[LABEL] is 1 or more characters. A label may contain any characters other than a tab or newline, including spaces. The label is followed by a single tab. [NUMBER] is an integer greater than 0. Numbers consist only of the digits 0 through 9, and they may not start with 0. There are no commas or any other characters in a number other than the digits 0 through 9. Immediately after [NUMBER] is a newline character.
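Because a label can contain spaces (and even digits) but never a tab, the tab is the reliable place to split a line. A minimal sketch of pulling out the number and its leading digit, assuming each line follows the format above exactly; the class LineParsing and method leadingDigit are hypothetical helpers, not required methods.

```java
// Hypothetical helper class illustrating the [LABEL]\t[NUMBER]\n format.
class LineParsing {

    // Returns the leading digit (1-9) of the number in one line of a data file.
    static int leadingDigit(String line) {
        int tab = line.indexOf('\t');                    // labels never contain a tab
        String number = line.substring(tab + 1).trim();  // trim() drops a trailing newline
        // Numbers never start with 0, so the first character is the leading digit.
        return number.charAt(0) - '0';
    }
}
```

For example, leadingDigit("Travis County\t1024266\n") returns 1.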


Assignment Description: Create a Java program with the following methods:

public static ArrayList<String> getData(String fileName): Creates and returns a list of strings with the entries from the data file with name fileName. The elements of the list are in the same order as they appear in the data file.
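One way to read the file is with java.util.Scanner, which returns each line without its newline. A sketch under that assumption; the class name GetDataSketch is hypothetical, and the choice to print a message and return an empty list when the file cannot be opened is an assumption too (the starter file A0.java may handle this differently).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

class GetDataSketch {

    // Reads every line of fileName into a list, preserving the order in the file.
    public static ArrayList<String> getData(String fileName) {
        ArrayList<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(new File(fileName))) {
            while (sc.hasNextLine()) {
                result.add(sc.nextLine()); // nextLine() strips the line separator
            }
        } catch (FileNotFoundException e) {
            // Assumption: report the problem and return whatever was read (nothing).
            System.out.println("Could not open " + fileName);
        }
        return result;
    }
}
```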

public static int[] getLeadDigitCounts(ArrayList<String> data): data is a list with the entries from a file. Each element in the list is a string with the label and the number separated by a single tab. There may be a newline character at the end of the string. This method returns an array of integers of length 9. The first element of the array (index 0) stores the number of elements in data whose number has a leading digit of 1, the second element of the array (index 1) stores the number of elements in data whose number has a leading digit of 2, and so forth.
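Since digit characters '1' through '9' are consecutive in Java's char encoding, subtracting '1' from the leading character maps it straight to the array index. A sketch of that idea, assuming every entry is well formed; the class name LeadDigitCountsSketch is hypothetical.

```java
import java.util.ArrayList;

class LeadDigitCountsSketch {

    // counts[0] holds how many entries start with 1, ..., counts[8] how many start with 9.
    public static int[] getLeadDigitCounts(ArrayList<String> data) {
        int[] counts = new int[9];
        for (String entry : data) {
            int tab = entry.indexOf('\t');
            char lead = entry.charAt(tab + 1); // the number begins right after the tab
            counts[lead - '1']++;              // '1' maps to index 0, '9' to index 8
        }
        return counts;
    }
}
```

A trailing newline does not matter here because only the character right after the tab is examined.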

public static void showResults(int[] counts): counts is an array of ints with length 9 that holds the count of each leading digit. This method displays the total number of data points and, for each leading digit, the number of data points and the percentage of total data points with that leading digit, rounded to one decimal place. Your output must match the output shown below.
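printf's %.1f handles the rounding to one decimal place. A sketch assuming the column layout can be reproduced with left-justified fields of width 6 and 7; those widths are read off the sample run as it appears on this page and may not match the exact whitespace the graders expect, so compare your output against the sample. The class name ShowResultsSketch is hypothetical.

```java
import java.util.Locale;

class ShowResultsSketch {

    public static void showResults(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        System.out.println("number of data points: " + total);
        System.out.println();
        System.out.println("digit number percentage");
        for (int i = 0; i < counts.length; i++) {
            double percent = 100.0 * counts[i] / total;
            // Locale.US forces '.' as the decimal separator; %.1f rounds to one place.
            // Widths 6 and 7 are an assumption taken from the sample run.
            System.out.printf(Locale.US, "%-6d%-7d%.1f%n", i + 1, counts[i], percent);
        }
        System.out.println();
    }
}
```

Using 100.0 rather than 100 keeps the division in floating point; integer division would truncate every percentage.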

public static void showLeadingDigits(char digit, ArrayList<String> data): digit is a char that contains a digit '1' to '9'. data is a list of strings with the entries from a file. The method prints all of the entries in data that have a leading digit equal to the parameter named digit. They are printed in the order they appear in data. Your output must match the output shown below.
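Because digit is already a char, it can be compared directly with the first character of each number; no conversion to int is needed. A sketch, with the hypothetical class name ShowLeadingDigitsSketch; printing each entry unchanged keeps its tab between label and number, and the leading blank line is an assumption based on the sample run.

```java
import java.util.ArrayList;

class ShowLeadingDigitsSketch {

    public static void showLeadingDigits(char digit, ArrayList<String> data) {
        System.out.println();
        System.out.println("Showing data with a leading " + digit);
        for (String entry : data) {
            int tab = entry.indexOf('\t');
            // Compare the first character of the number with the requested digit.
            if (entry.charAt(tab + 1) == digit) {
                System.out.println(entry); // label, tab, number, in the original order
            }
        }
    }
}
```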

public static void processFile(String name): name is a string that is the name of a file that matches the format of our data files. This method calls getData, getLeadDigitCounts, and showResults for the given file. It then prompts the user for a digit and calls showLeadingDigits with the input. The program does not error-check the input.
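The prompting step of processFile can be isolated so it is easy to try on its own. A sketch assuming keyboard input is read with java.util.Scanner and that the first character of the next token is taken as the digit (no error checking, as the description says); the class PromptSketch and method readDigit are hypothetical.

```java
import java.util.Scanner;

class PromptSketch {

    // Prompts for a leading digit and returns the first character typed.
    // Assumption: the user types a single digit, so nothing is validated.
    static char readDigit(Scanner in) {
        System.out.println("Enter leading digit:");
        return in.next().charAt(0);
    }
}
```

In processFile this would be called with new Scanner(System.in) after showResults, and its result passed to showLeadingDigits. Avoid closing a Scanner wrapped around System.in, since that also closes System.in.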

Here is a sample run of the program using the TexasCountyPop2010.txt file.

number of data points: 254

digit number percentage
1     80     31.5
2     38     15.0
3     41     16.1
4     26     10.2
5     15     5.9
6     15     5.9
7     17     6.7
8     13     5.1
9     9      3.5

Enter leading digit:
9

Showing data with a leading 9
Archer County 9054
Bowie County 92565
Brewster County 9232
Dimmit County 9996
Jack County 9044
Mitchell County 9403
Roberts County 929
Stephens County 9630
Terrell County 984


When you complete the assignment, turn in your A0.java file, which will include all of your source code, using the turnin program. This page contains instructions for using turnin.

Be sure of the following: