Placed Online: Thursday, August 30
Due: Thursday, September 13, no later than 11 pm
Points: 25, 2.5% of final grade.
Starter File: A0.java
Python Solution: Benford.py
Data File: Texas county populations from 2010 census
Individual Assignment. This is an individual assignment. You must complete the assignment on your own. You may use the examples shown in class as a guide, but you may not get help from anyone besides the instructor and TA. Copying code or getting code from anyone besides the instructor or TA is cheating and will result in an F in the course. You can discuss approaches to the problem with others as long as you don't write code or look at code. You may get help on syntax errors from others. You can search the web for info on how to perform a particular task in Java, such as finding the index of a character in a String or creating a substring.
The purpose of this assignment is to learn / review programming in Java. You can write the program from scratch or translate a Python program into Java.
Help on various common Java data types and operations:
You will be manipulating Java Strings a lot in this program. This page has info on using Strings.
You will also use Java ArrayLists a lot. Here is a page with common operations on ArrayLists.
And finally info on using Java arrays is at this page.
Of course, you can also type the operation you are trying to complete into your favorite search engine to see various code examples and tutorials.
Background: Benford's law, also known as the first-digit law, states that in a list of numbers from a real-world data set, the distribution of leading digits is often not uniform and is instead skewed towards 1. Consider, for example, the populations of the counties in Texas from the 2010 census. Most people would guess there are roughly equal numbers of populations that start with 1, 2, 3, 4, 5, 6, 7, 8, and 9. (We don't consider 0 a leading digit for this assignment.) Given that there are 254 Texas counties, you might expect about 28 counties (254 / 9 ≈ 28) to have a leading digit of 1, about 28 to have a leading digit of 2, and so forth. For instance, Travis County (home to UT) has a population of 1,024,266, which has a leading digit of 1.
Our intuition is often wrong. The breakdown of leading digits for populations of Texas counties according to the 2010 census is:
Leading Digit | Number of Counties | Percentage |
1 | 80 | 31.5 |
2 | 38 | 15.0 |
3 | 41 | 16.1 |
4 | 26 | 10.2 |
5 | 15 | 5.9 |
6 | 15 | 5.9 |
7 | 17 | 6.7 |
8 | 13 | 5.1 |
9 | 9 | 3.5 |
Not what you expected! Populations with a leading digit of 1 occur almost 1/3rd of the time! Not 1/9th as most people would guess. Benford's law does not hold for all data sets (for example height of humans in inches), but does hold for a surprisingly large number of real world measurements.
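The handout does not give a formula, but Benford's law is usually stated as: the probability of leading digit d is log10(1 + 1/d). The sketch below computes those expected percentages so they can be compared with the table above. The class and method names are made up for illustration and are not part of the starter file.

```java
// Sketch only: expected leading-digit percentages under Benford's law.
// BenfordExpected and expectedPercent are hypothetical names, not part of A0.java.
class BenfordExpected {
    // Percentage of values expected to start with the given digit (1 through 9),
    // using the standard formula P(d) = log10(1 + 1/d).
    static double expectedPercent(int digit) {
        return 100.0 * Math.log10(1.0 + 1.0 / digit);
    }

    public static void main(String[] args) {
        for (int d = 1; d <= 9; d++) {
            System.out.printf("%d %.1f%n", d, expectedPercent(d));
        }
    }
}
```

For a leading digit of 1 the formula gives roughly 30.1 percent, which is close to the 31.5 percent observed for the Texas county populations.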
Data Files:
Write a program that tests Benford's law for two files. Use the file that contains the Texas counties populations from the 2010 census.
The file format is one entry per line. The format of each line is:
[LABEL]\t[NUMBER]\n
[LABEL] is 1 or more characters. A label may contain any characters other than a tab or newline, including spaces. The label is followed by a single tab. [NUMBER] is an integer greater than 0. Numbers consist only of the digits 0 through 9 and may not start with 0. There are no commas or any other characters in a number other than the digits 0 through 9. Immediately after [NUMBER] is a newline character.
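As a rough illustration of the line format described above, the following sketch splits one entry into its label and number using the position of the tab. The class and helper names are hypothetical. Because a label cannot contain a tab, the first tab in the line is always the separator.

```java
// Sketch only: pulling the label and number out of one "[LABEL]\t[NUMBER]\n" entry.
// LineFormatSketch, labelPart, and numberPart are hypothetical names.
class LineFormatSketch {
    // Everything before the tab is the label (it may contain spaces).
    static String labelPart(String entry) {
        return entry.substring(0, entry.indexOf('\t'));
    }

    // Everything after the tab is the number; trim() drops a trailing newline if present.
    static String numberPart(String entry) {
        return entry.substring(entry.indexOf('\t') + 1).trim();
    }

    public static void main(String[] args) {
        String entry = "Travis County\t1024266\n";
        System.out.println(labelPart(entry));
        System.out.println(numberPart(entry));
    }
}
```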
Assignment Description: Create a Java program with the following methods:
public static ArrayList<String> getData(String fileName): Creates and returns a list of strings with the entries from the data file with name fileName. The elements of the list are in the same order as they appear in the data file.
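One possible way to read the file is with a Scanner, one line per list element. This is a minimal sketch, not the required approach. It assumes the method may declare throws FileNotFoundException, which the signature above does not show, so the starter file may expect the exception to be handled differently. Note that Scanner's nextLine() drops the newline at the end of each line.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

// Sketch only: GetDataSketch is a hypothetical class name.
class GetDataSketch {
    // Assumption: the throws clause is added here for brevity.
    static ArrayList<String> getData(String fileName) throws FileNotFoundException {
        ArrayList<String> result = new ArrayList<>();
        try (Scanner in = new Scanner(new File(fileName))) {
            while (in.hasNextLine()) {
                result.add(in.nextLine()); // entries stay in file order
            }
        }
        return result;
    }
}
```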
public static int[] getLeadDigitCounts(ArrayList<String> data): data is a list with the entries from a file. Each element in the list is a string with the label and the number separated by a single tab. There may be a newline character at the end of the string. This method returns an array of integers of length 9. The first element of the array (index 0) stores the number of elements in the list data that have a number with a leading digit of 1, the second element of the array (index 1) stores the number of elements in data that have a number with a leading digit of 2, and so forth.
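The counting step can be sketched as follows. The leading digit is the character right after the tab, and subtracting '1' from it maps '1' through '9' to indexes 0 through 8. This is one possible approach under the stated file format, and the class name is hypothetical.

```java
import java.util.ArrayList;

// Sketch only: LeadDigitCountsSketch is a hypothetical class name.
class LeadDigitCountsSketch {
    static int[] getLeadDigitCounts(ArrayList<String> data) {
        int[] counts = new int[9];
        for (String entry : data) {
            // The number starts immediately after the single tab.
            char lead = entry.charAt(entry.indexOf('\t') + 1);
            counts[lead - '1']++; // '1' -> index 0, ..., '9' -> index 8
        }
        return counts;
    }
}
```

Digits inside the label (for example a label such as "Area 1") do not affect the count, because only the character after the tab is examined.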
public static void showResults(int[] counts): counts is an array of ints with length 9. It represents the count of leading digits. This method displays the total number of data points and for each leading digit the number of data points and the percentage of total data points with that leading digit rounded to one decimal place. Your output must match the output shown below.
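A sketch of the percentage calculation and rounding is shown here. The helper formatRow is hypothetical, and the single spaces between columns are an assumption, since the exact column spacing of the required output is not recoverable from this page. Check the spacing against the sample run before relying on it.

```java
import java.util.Locale;

// Sketch only: ShowResultsSketch and formatRow are hypothetical names.
class ShowResultsSketch {
    // Builds one row: digit, count, and percentage rounded to one decimal place.
    // Assumption: columns are separated by a single space.
    static String formatRow(int digit, int count, int total) {
        double percent = 100.0 * count / total; // 100.0 forces floating-point division
        return String.format(Locale.US, "%d %d %.1f", digit, count, percent);
    }

    static void showResults(int[] counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        System.out.println("number of data points: " + total);
        System.out.println("digit number percentage");
        for (int i = 0; i < counts.length; i++) {
            System.out.println(formatRow(i + 1, counts[i], total));
        }
    }
}
```

Multiplying by 100.0 before dividing matters: with int operands only, 80 / 254 would evaluate to 0.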
public static void showLeadingDigits(char digit, ArrayList<String> data): digit is a char that contains a digit '1' to '9'. data is a list of strings with the entries from a file. The method prints all of the entries in data that have a leading digit equal to the parameter named digit. They are printed in the order they appear in data. Your output must match the output shown below.
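The filtering step might look like the sketch below. The helper entriesWithLeadingDigit is hypothetical and exists only to keep the selection logic separate from the printing. Printing each entry as stored (label, tab, number) is an assumption about the required output format.

```java
import java.util.ArrayList;

// Sketch only: ShowLeadingDigitsSketch and entriesWithLeadingDigit are hypothetical names.
class ShowLeadingDigitsSketch {
    // Returns the entries whose number starts with the given digit, in original order.
    static ArrayList<String> entriesWithLeadingDigit(char digit, ArrayList<String> data) {
        ArrayList<String> matches = new ArrayList<>();
        for (String entry : data) {
            if (entry.charAt(entry.indexOf('\t') + 1) == digit) {
                matches.add(entry);
            }
        }
        return matches;
    }

    static void showLeadingDigits(char digit, ArrayList<String> data) {
        System.out.println("Showing data with a leading " + digit);
        for (String entry : entriesWithLeadingDigit(digit, data)) {
            // Drop a trailing newline, if any, so println does not print a blank line.
            if (entry.endsWith("\n")) {
                entry = entry.substring(0, entry.length() - 1);
            }
            System.out.println(entry);
        }
    }
}
```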
public static void processFile(String name): name is a string that is the name of a file that matches the format of our data files. This method calls getData, getLeadDigitCounts, and showResults for the given file. This method then prompts the user for a digit and calls showLeadingDigits for the given input.
The program does not error check the input.
Here is a sample run of the program using the TexasCountyPop2010.txt file.
number of data points: 254
digit number percentage
1 80 31.5
2 38 15.0
3 41 16.1
4 26 10.2
5 15 5.9
6 15 5.9
7 17 6.7
8 13 5.1
9 9 3.5
Enter leading digit: 9
Showing data with a leading 9
Archer County 9054
Bowie County 92565
Brewster County 9232
Dimmit County 9996
Jack County 9044
Mitchell County 9403
Roberts County 929
Stephens County 9630
Terrell County 984
When you complete the assignment turn in your A0.java file which will include all your source code, using the turnin program. This page contains instructions for using turnin.
Be sure of the following:
Name your file A0.java
Fill in the header information at the top of A0.java
Your program's output matches the sample output shown above
You upload the file to your CS324e folder.