CS303E Project 3

Instructor: Dr. Bill Young
Due Date: Wednesday, December 1, 2021 at 11:59pm

Building a Query Processing Utility

A common type of application allows the user to input queries and retrieve information from a database. In this assignment, you will be building such a system to allow requesting information about the populations of Texas counties and statewide. The database will be in the form of a dictionary with Texas county names as keys and pairs of integers as values.

The raw information is in a file which you can download here: Input File. You must name your input file populationdata.csv, so that the TAs can run your code on their version of this file. The first few lines in the file are:

# County,cqr_census_2010_count,jan1_2020_pop_est
Anderson,58458,59120
Andrews,14786,19588
I extracted the data from a file with several additional fields. If you want to read about the data, there's a report available here: Description of the Data. You are certainly not required to view this, but it shows the raw data, percentages different, etc.

Note that some lines begin with a '#'; you must ignore those. (Don't count on that being only the first line in the file.) Other lines have the following three fields separated by commas: a Texas county name, the population of the county as of the 2010 census and an estimate of the county population as of January 1, 2020. (Yes, I know that the 2020 census data is now available, but this data is fine for this project.) Open and read the file. For each line that doesn't start with '#', create a record in a dictionary of the form: [ countyName: (census2010, estimate2020) ]. That is, the key is the county name (as a string), and the value is a tuple containing two integers (remember to convert them to ints). I strongly suggest inserting the county name keys as lowercase. While you're reading in this data, also sum up the census2010 and estimate2020 fields for all counties so that you can insert one final item for the entire state of Texas. That is, suppose the only counties in the file were Anderson and Andrews as shown above. Then your dictionary would have the form:

{ anderson: (58458, 59120), andrews: (14786, 19588), texas: (73244, 78708) }
While you're doing the above, also create a list of all the county names in the file. In this case, it would be the list:
[ anderson, andrews ]
Notice that I've lowercased the county names, and suggest that you do the same. It's a bit odd, but it will make lots of things easier. Be aware of the fact that some counties have names with two words (San Saba, Van Zandt, others).

The database contains all Texas counties and each appears only once. Also, the counties are in alphabetical order. Recent versions of Python (after 3.7) guarantee that dictionary keys are maintained in insertion order. So, if you insert them in order, they'll stay in order.

Notice that creating the list of county names is not strictly necessary. It would be the same as the list of keys in the dictionary (except for 'texas'). But it would be inefficient to grab the keys each time you want to list all of the counties. So create the list as suggested.

Now, that once you have the database (dictionary) and list of counties, you will build a query processing system that allows a user to ask questions about the population data in specific counties or statewide. The available commands are these: help, quit, counties, census (by county or statewide), and estimated (by county or statewide). The census data is from the 2010 census and the estimated from January, 2020. See examples of the use of these below.

Your program will initially print a Welcome message, which should look like this:

Welcome to the Texas Population Dashboard.
This provides census data from the 2010 census and 
estimated population data in Texas as of 1/1/2020.

Creating dictionary from file: populationdata.csv

Enter any of the following commands:
Help - list available commands;
Quit - exit this dashboard;
Counties - list all Texas counties;
Census <countyName>/Texas - population in 2010 census by specified county or statewide;
Estimated <countyName>/Texas - estimated population in 2020 by specified county or statewide.
Growth <countyName>/Texas - percent change from 2010 to 2020, by county or statewide.

Note that the commands print in bold (as do the prompts for user input below). To print a string in bold in Python, put \033[1m right before the string and \033[0m immediately after it. (Note this doesn't work on all systems; if it doesn't work for you, don't worry. You won't be penalized for not doing this.)

After printing the welcome message, enter a loop to accept commands from the user. You will prompt the user to enter any of the legal commands and respond appropriately. User input should not be sensitive to case. Again, I suggest converting all county names to lowercase when you read them in.

Here is some code that can help you parse a command that may contain a two word county name. You are welcome to use this code, or write your own. Assume that you've read the command submitted by the user into a string variable named commandInput

# Parse the command into a list of words, assuming there's no punctuation.
commWords = commandInput.split()

# Extract the first word in the command.  It will be the instruction to perform.
comm = commWords[0].strip()

# Extract the rest of the words and re-assemble them into a single string, 
# separated by spaces.  These are the instuction's argument, if any.
args = commWords[1:]
arg = " ".join(args).strip()
Following this code, comm will contain the initial command word and arg will contain the argument, if any. This preserves the case, but you can use arg.lower() to lowercase it.

For the Help command, print the command list exactly as in the Welcome message. I bolded the commands in the help message. (I also allowed "?" to work instead of Help, but didn't advertise that option. Why not make your code as user friendly as possible? Haven't you been in a system like this and forgotten what command to type to get help? You don't have to do this, but can if you like.)

For Quit, print a Goodbye message and exit the program. (In my version, I also allowed Exit to work the same way, but didn't advertise that in the help message. You don't have to do this, but can if you like.)

For Counties, print the counties 10 per line. I didn't bother to remove the comma after Zavala, but you can if you like. Notice that I printed each county using the string method .title() to capitalize each word. This doesn't quite work for counties like DeWitt and McCulloch, but it's close enough.

For Census the user can specify a county name or 'Texas'. Print the census 2010 value for the county or state. Notice that the output format is slightly different for an individual county or a statewide result.

Estimated is handled similarly to Census, but returning the 2020 estimated population.

For Growth the user specifies a county name or 'Texas'. Print the percentage difference between the 2010 census count and the 2020 estimated population. This is computed as ( (estimate2020 - census2010) / census2010 ) * 100.0. This should be reported to precision of two digits following the decimal point. Note that the percent change can be negative.

The Algorithm:

Don't attempt to do this entire assignment at once; break it into pieces and test each thoroughly before going on to the next one. Perhaps create a version of the input file with only 3-4 lines for testing.

Here is a suggestion of some steps to carry out this assignment.

  1. Write a separate function that reads lines from the input file and creates two data structures: a dictionary and a list of county names.
    1. Assume for this function that the input file exists. You'll check that in the main function.
    2. Read a line from the file; if it starts with '#' discard it. Otherwise, parse it into three fields: countyName, census2010, estimated2020. The fields are separated by commas. The Python .split(",") function should be useful here.
    3. Convert the census2010 and estimated2020 values into integers (remember you read them as strings); you may have to strip a newline from estimated2020, since it's at the end of the line.
    4. Keep separate running totals of census2010 and estimated2020; you'll need those later.
    5. Also keep a running list of county names.
    6. Associate the pair of integers (census2010, estimated2020) with key countyName in the dictionary.
    7. Repeat the steps above for all lines in the file.
    8. After you've processed all lines in the file, add one more record to the dictionary associating the total census2010 and estimated2020 with key 'texas' (note lowercase).
    9. Finally, return from the function the pair (dictionary, list of county names).

  2. Now you'll build the query processing functionality. This will be your main function.
    1. Check that the input file exists; if not, exit after printing an error message: File does not exist.
    2. From the input file, build your database (dictionary) and list of county names using the function described above.
    3. Print the welcome message.
    4. Enter a loop to accept commands from the user, parse them and process them. (Remember that case doesn't matter.)
    5. Individual commands should be handled as follow:
      1. Given a Help command, print the Help message.
      2. Given a Quit command, say goodbye and exit.
      3. Given a Counties command, print the list of counties, 10 per line, from the list you've accumulated. You may want to apply .title() to each before you print it.
      4. Given a Census command, if the input (lowercased) is 'texas', get the 2010 census info from the database. If it's other then 'texas', see if it's a Texas county (has a key in the dictionary) and access the info from database. If not, print an error message.
      5. The Estimated command is handled similarly to the Census command, except accessing the 2020 estimated population data.
      6. The Growth command accesses the 2010 census data and 2020 estimated population data for the county or state, and computes the percentage difference. If the argument is neither a county or 'texas', print that the county is not recognized.
      7. Any other commands should be rejected.

Sample Behavior:

> python Project3.py  # If the input file wasn't there. 
File does not exist

> python Project3.py

Welcome to the Texas Population Dashboard.
This provides census data from the 2010 census and 
estimated population data in Texas as of 1/1/2020.

Creating dictionary from file: populationdata.csv

Enter any of the following commands:
Help - list available commands;
Quit - exit this dashboard;
Counties - list all Texas counties;
Census <countyName>/Texas - population in 2010 census by specified county or statewide;
Estimated <countyName>/Texas - estimated population in 2020 by specified county or statewide;
Growth <countyName>/Texas - percent change from 2010 to 2020, by county or statewide.

Please enter a command: hELp
Enter any of the following commands:
Help - list available commands;
Quit - exit this dashboard;
Counties - list all Texas counties;
Census <countyName>/Texas - population in 2010 census by specified county or statewide;
Estimated <countyName>/Texas - estimated population in 2020 by specified county or statewide;
Growth <countyName>/Texas - percent change from 2010 to 2020, by county or statewide.

Please enter a command: coUNtiES
Anderson, Andrews, Angelina, Aransas, Archer, Armstrong, Atascosa, Austin, Bailey, Bandera, 
Bastrop, Baylor, Bee, Bell, Bexar, Blanco, Borden, Bosque, Bowie, Brazoria, 
Brazos, Brewster, Briscoe, Brooks, Brown, Burleson, Burnet, Caldwell, Calhoun, Callahan, 
Cameron, Camp, Carson, Cass, Castro, Chambers, Cherokee, Childress, Clay, Cochran, 
Coke, Coleman, Collin, Collingsworth, Colorado, Comal, Comanche, Concho, Cooke, Coryell, 
Cottle, Crane, Crockett, Crosby, Culberson, Dallam, Dallas, Dawson, Deaf Smith, Delta, 
Denton, De Witt, Dickens, Dimmit, Donley, Duval, Eastland, Ector, Edwards, Ellis, 
El Paso, Erath, Falls, Fannin, Fayette, Fisher, Floyd, Foard, Fort Bend, Franklin, 
Freestone, Frio, Gaines, Galveston, Garza, Gillespie, Glasscock, Goliad, Gonzales, Gray, 
Grayson, Gregg, Grimes, Guadalupe, Hale, Hall, Hamilton, Hansford, Hardeman, Hardin, 
Harris, Harrison, Hartley, Haskell, Hays, Hemphill, Henderson, Hidalgo, Hill, Hockley, 
Hood, Hopkins, Houston, Howard, Hudspeth, Hunt, Hutchinson, Irion, Jack, Jackson, 
Jasper, Jeff Davis, Jefferson, Jim Hogg, Jim Wells, Johnson, Jones, Karnes, Kaufman, Kendall, 
Kenedy, Kent, Kerr, Kimble, King, Kinney, Kleberg, Knox, Lamar, Lamb, 
Lampasas, La Salle, Lavaca, Lee, Leon, Liberty, Limestone, Lipscomb, Live Oak, Llano, 
Loving, Lubbock, Lynn, Mcculloch, Mclennan, Mcmullen, Madison, Marion, Martin, Mason, 
Matagorda, Maverick, Medina, Menard, Midland, Milam, Mills, Mitchell, Montague, Montgomery, 
Moore, Morris, Motley, Nacogdoches, Navarro, Newton, Nolan, Nueces, Ochiltree, Oldham, 
Orange, Palo Pinto, Panola, Parker, Parmer, Pecos, Polk, Potter, Presidio, Rains, 
Randall, Reagan, Real, Red River, Reeves, Refugio, Roberts, Robertson, Rockwall, Runnels, 
Rusk, Sabine, San Augustine, San Jacinto, San Patricio, San Saba, Schleicher, Scurry, Shackelford, Shelby, 
Sherman, Smith, Somervell, Starr, Stephens, Sterling, Stonewall, Sutton, Swisher, Tarrant, 
Taylor, Terrell, Terry, Throckmorton, Titus, Tom Green, Travis, Trinity, Tyler, Upshur, 
Upton, Uvalde, Val Verde, Van Zandt, Victoria, Walker, Waller, Ward, Washington, Webb, 
Wharton, Wheeler, Wichita, Wilbarger, Willacy, Williamson, Wilson, Winkler, Wise, Wood, 
Yoakum, Young, Zapata, Zavala, 

Please enter a command: countries
Command is not recognized.  Try again!

Please enter a command: cenSUS TRAvis
Travis county had 1024266 citizens in the 2010 Census.

Please enter a command: CENsus WalNUT
County Walnut is not recognized.

Please enter a command: census WeBB
Webb county had 250304 citizens in the 2010 Census.

Please enter a command: estIMATed weBb
Webb county had estimated population (January, 2020): 280547

Please enter a command: grOWth WEBB
Webb county had percent population change (2010 to 2020): 12.08%

Please enter a command: estimated TEXAS
Texas estimated population (January, 2020): 29149480

Please enter a command: cenSUS Texas
Texas total population in the 2010 Census: 25145565

Please enter a command: GRowTH teXas
Texas had percent population change (2010 to 2020): 15.92%

Please enter a command: estimated MAINE
County Maine is not recognized.

Please enter a command: quite
Command is not recognized.  Try again!

Please enter a command: quit
Thank you for using the Texas Population Database Dashboard.  Goodbye!

Turning in the Assignment:

The program should be in a file named Project3.py. Submit the file via Canvas before the deadline shown at the top of this page. Submit it to the assignment project3 under the assignments sections by uploading your python file.

Your file must compile and run before submission. It must also contain a header with the following format:

# File: Project3.py
# Student: 
# UT EID:
# Course Name: CS303E
# 
# Date Created:
# Date Last Modified: 
# Description of Program: