CS324e - Assignment 1

Placed Online: Monday, September 10
Due: Thursday, September 20, no later than 11 pm
Points: 75, 7.5% of final grade.
Starter File: A1.java
Sample output: A page that shows the expected output of the program. (Log based on encryptedShort.txt)
Encrypted Files: encryptedShort.txt and encryptedLong.txt

Pair Assignment. You may work with one other person on this assignment using the pair programming technique. Review this paper on pair programming. You are not required to work in a pair on the assignment. (You may complete it by yourself if you wish.) If you begin working with one partner and do not wish to finish the assignment with that partner, you must complete the assignment individually. If you work with a partner, the intent is that you work together, at the same computer. One person "drives" (does the typing and explains what they are doing) and the other person "navigates" (watches and asks questions when something is unclear). You should not split up the work, complete the parts on your own, and then put them together.

You and your partner may not acquire from any source (e.g., another student or an internet site) a partial or complete solution to a problem or project that has been assigned. You and your partner may not show other students your solution to an assignment. You may not have another person (current student other than your partner, former student, tutor, friend, anyone) “walk you through” how to solve an assignment. You may get help from the instructional staff and use code from class.

If you work with a partner you must fill in the header with both students' information. Turn in a single copy of your solution to only one of your turnin accounts.

The purpose of this assignment is to learn / review Java programming by implementing a program that decrypts a file that is assumed to have been encrypted with a substitution cipher.


Background Information: A substitution cipher is a simple way of encrypting or encoding text to keep unintended readers from learning the contents of a message. The cipher consists of a key such as the following:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z  decoded or plaintext letter
J I X O W H V Z M Q U D T B C P Y N S A E L G F K R  encoded letter

To encrypt a message, find each letter you want to encrypt in the top row and write down the letter directly below it in the bottom row. For example, to encrypt the sentence "CS is fun" we would see that C encrypts to X, S encrypts to S, and so forth. We would get the message "XS MS HEB". For the substitution cipher to work, the sender and the receiver need to agree on the key beforehand. Since you know the key, what does "TMUW MS J VWWU!" decipher to?
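
Encrypting with the key table above is just a lookup from the top row to the bottom row. The sketch below is one way to express that in Java; the class name SubstitutionEncrypt and the method encrypt are hypothetical, not part of A1.java. It assumes only the letters A through Z are encrypted and every other character passes through unchanged. Since N encrypts to B in the table, the word "FUN" becomes "HEB".

```java
public class SubstitutionEncrypt {
    // Bottom row of the key table: plaintext letter ('A' + i) encrypts to ENCODED_ROW.charAt(i).
    private static final String ENCODED_ROW = "JIXOWHVZMQUDTBCPYNSAELGFKR";

    // Encrypts A..Z (after upper-casing the input) and leaves all other characters unchanged.
    static String encrypt(String plaintext) {
        StringBuilder result = new StringBuilder(plaintext.length());
        for (char ch : plaintext.toUpperCase().toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') {
                result.append(ENCODED_ROW.charAt(ch - 'A'));
            } else {
                result.append(ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(encrypt("CS is fun")); // prints XS MS HEB
    }
}
```

Decryption is the reverse lookup: find the encrypted letter in the bottom row and read the letter above it.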

At first the substitution cipher appears hard to crack. Counting only the 26 letters of the alphabet, there are 26! (403,291,461,126,605,635,584,000,000) possible keys, so trying every possible key would seem an intractable task. However, code breakers have the English language (or whatever language the message is in) on their side. In English, as in other languages, some letters appear much more frequently than others. A code breaker can use this fact to perform frequency analysis on a message: take the encrypted message and count the occurrences of each letter or character. The longer the message, the better the analysis works, so this task is a good candidate for automation with a computer program.
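
The counting step of frequency analysis can be sketched with an array of 26 counters, one per letter. This is a minimal illustration, not the required solution; the names LetterFrequency and countLetters are hypothetical. It assumes the text uses only upper case A through Z for letters, as the assignment's encrypted files do, so the range check below counts the same characters as Character.isLetter and Character.isUpperCase would for that input.

```java
public class LetterFrequency {
    // Returns counts where counts[i] is how many times the letter ('A' + i) appears in text.
    // Only upper case letters A..Z are counted; everything else is ignored.
    static int[] countLetters(String text) {
        int[] counts = new int[26];
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'A' && ch <= 'Z') {
                counts[ch - 'A']++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countLetters("HELLO WORLD");
        // Print only the letters that actually occur, in alphabetical order.
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println((char) ('A' + i) + ": " + counts[i]);
            }
        }
    }
}
```

On a short string like this the counts say little about English in general; the technique only becomes reliable on long texts.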
 


Assignment Description: For this assignment you will write a program that does a frequency analysis on a file. A key will be created based on that frequency analysis. The file will then be decrypted with that key and displayed. The user of the program will then be asked if they want to make a change to the key. They will be asked what character in the decrypted version of the text they want to change and what that character should decrypt to instead.

In this assignment only English letters A through Z are encrypted. All letters in the encrypted file will be upper case letters.

Some of the work is already done for you. The A1.java starter file includes a method that lets the user pick a file via a GUI and converts the file's contents to a String. The method is named convertFileToString().

Approach to the Assignment - Things to Do:

  1. After obtaining the String version of the file from the convertFileToString method, display it.
     
  2. Create a way to tally up how often each upper case letter occurs. This could be an array or a map. Only count the uppercase letters in the file. Use the Character.isLetter(char ch) and the Character.isUpperCase(char ch) methods to determine if a character is an upper case letter or not.
     
  3. Display the letters and their frequencies.
     
  4. Now you will need to create a key to decrypt the text. Whatever character occurs most frequently in the encrypted text is initially set to decrypt to whatever letter occurs most frequently in English. This happens to be 'E'. The second most frequent encrypted character decrypts to the second most frequent English letter, and so on. Your approach could be similar to how we sorted words based on frequency in the Zipf example in class. The STANDARD_FREQS_STRING constant in the A1.java class lists the English letters in order of expected frequency, from most frequent to least frequent. (If there is a tie in frequency for a set of letters, your key may be different from the sample solution.)
     
  5. After obtaining the key, display the initial key (as in the sample log) and then display the decrypted version of the text. Create the decrypted text as a new String, character by character, by looking at each character of the encrypted String and the key. (Sounds like a good candidate for a method.) Don't try to alter or overwrite the original String that represents the encrypted file. You will need it again.
     
  6. The trouble with the frequency analysis approach is that with short messages there will be differences between the expected frequencies of letters and the actual frequencies. The decrypted text won't be perfect unless it is several thousand characters long, and even then there could be problems.

    So now the user of our program becomes a detective. Ask them if they want to make a change to the key. If they say yes, ask what decrypted character they want to change. Then ask what that decrypted character should decrypt to instead. For example, if we display the decrypted text and we see AHE over and over again, we would have a pretty good hunch that whatever character is decrypting to A should probably decrypt to T instead. You would expect most texts to have many instances of the word THE. You will have to change the decryption key so that the element (key in a map or element in an array) that currently decrypts to 'A' decrypts to 'T' instead. If we left it at that, there would be a problem: two encrypted characters would now decrypt to 'T', which makes the process harder. So we will also change whatever encrypted character was decrypting to 'T' to now decrypt to 'A'. This swap isn't always a perfect change, but it makes it easier to eventually decrypt the message. (You always want a one-to-one relationship between encrypted and decrypted letters.)
     
  7. After each change to the key, display the new version of the decrypted file. Keep asking the user if they want to make a change and what they want to change, and display the new version, until they want to stop. (While loop anyone?)
     
  8. Once the user doesn't want any more changes, display the final version of the key and the decrypted text.
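
Steps 4 through 6 can be sketched roughly as follows. This is one possible approach under stated assumptions, not the sample solution: the key is assumed to be a char[26] where key[enc - 'A'] is the letter that encrypted letter enc currently decrypts to, and the names DecryptSketch, buildInitialKey, decrypt, and swapDecryptions are hypothetical. The ordering string "ETAOINSHRDLCUMWFGYPBVKJXQZ" in main is an assumed stand-in for STANDARD_FREQS_STRING; use the constant from A1.java, whose value may differ. Ties are broken alphabetically here, which may not match the sample solution.

```java
import java.util.Arrays;

public class DecryptSketch {
    // Assumed key representation: key[enc - 'A'] is the plaintext letter for encrypted letter enc.

    // Step 4: the most frequent encrypted letter decrypts to englishOrder.charAt(0),
    // the second most frequent to englishOrder.charAt(1), and so on.
    static char[] buildInitialKey(int[] counts, String englishOrder) {
        Integer[] letters = new Integer[26];
        for (int i = 0; i < 26; i++) {
            letters[i] = i;
        }
        // Sort letter indices by descending count. Arrays.sort on objects is stable,
        // so letters with equal counts stay in alphabetical order.
        Arrays.sort(letters, (a, b) -> Integer.compare(counts[b], counts[a]));
        char[] key = new char[26];
        for (int rank = 0; rank < 26; rank++) {
            key[letters[rank]] = englishOrder.charAt(rank);
        }
        return key;
    }

    // Step 5: builds a new String; the encrypted String is never modified.
    static String decrypt(String encrypted, char[] key) {
        StringBuilder plain = new StringBuilder(encrypted.length());
        for (int i = 0; i < encrypted.length(); i++) {
            char ch = encrypted.charAt(i);
            plain.append(ch >= 'A' && ch <= 'Z' ? key[ch - 'A'] : ch);
        }
        return plain.toString();
    }

    // Step 6: whatever decrypted to 'from' now decrypts to 'to', and whatever decrypted
    // to 'to' now decrypts to 'from', so the key stays one-to-one.
    static void swapDecryptions(char[] key, char from, char to) {
        for (int i = 0; i < key.length; i++) {
            if (key[i] == from) {
                key[i] = to;
            } else if (key[i] == to) {
                key[i] = from;
            }
        }
    }

    public static void main(String[] args) {
        String encrypted = "XS MS HEB";
        int[] counts = new int[26];
        for (char ch : encrypted.toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') {
                counts[ch - 'A']++;
            }
        }
        // ASSUMED ordering for illustration only; substitute STANDARD_FREQS_STRING.
        char[] key = buildInitialKey(counts, "ETAOINSHRDLCUMWFGYPBVKJXQZ");
        System.out.println(decrypt(encrypted, key)); // prints NE IE OAT
        swapDecryptions(key, 'N', 'C');              // the user asks to change N to C
        System.out.println(decrypt(encrypted, key)); // prints CE IE OAT
    }
}
```

The garbled first result shows why a nine-character message is far too short for frequency analysis alone, and why the program lets the user keep adjusting the key in a loop.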

Hints:

There are some other methods already complete in the A1.java starter file to help you deal with user input. I have also included a Pair class (frequency and letter) that implements the Comparable interface to make sorting by frequency easier after you have determined the frequency of each letter in the encrypted version of the text.

You may find the TreeMap and ArrayList classes useful. There are also many static methods in the Arrays and Collections classes that can be useful.
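
As an illustration of the TreeMap, ArrayList, and Collections hint, the sketch below tallies letters in a TreeMap and then sorts (letter, count) objects with a Comparable ordering. LetterCount is a hypothetical stand-in written for this example; it is not the Pair class from A1.java, whose fields and compareTo ordering may differ. Ties are broken alphabetically, which is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencySort {
    // Hypothetical stand-in for the starter file's Pair class: one letter and its count.
    static class LetterCount implements Comparable<LetterCount> {
        final char letter;
        final int count;

        LetterCount(char letter, int count) {
            this.letter = letter;
            this.count = count;
        }

        // Higher counts sort first; equal counts fall back to alphabetical order.
        @Override
        public int compareTo(LetterCount other) {
            if (count != other.count) {
                return Integer.compare(other.count, count);
            }
            return Character.compare(letter, other.letter);
        }

        @Override
        public String toString() {
            return letter + "=" + count;
        }
    }

    // Tallies upper case letters in a TreeMap, then returns them sorted from most to least frequent.
    static List<LetterCount> sortedByFrequency(String text) {
        Map<Character, Integer> tally = new TreeMap<>();
        for (char ch : text.toCharArray()) {
            if (Character.isLetter(ch) && Character.isUpperCase(ch)) {
                tally.merge(ch, 1, Integer::sum); // add 1, starting from 1 for a new letter
            }
        }
        List<LetterCount> result = new ArrayList<>();
        for (Map.Entry<Character, Integer> entry : tally.entrySet()) {
            result.add(new LetterCount(entry.getKey(), entry.getValue()));
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedByFrequency("HELLO WORLD")); // prints [L=3, O=2, D=1, E=1, H=1, R=1, W=1]
    }
}
```

Once the list is sorted, position i in the list pairs naturally with position i in the English frequency ordering when building the initial key.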

Look at the sample log of execution. User input is shown in bold for illustrative purposes. Your program's output should look like the sample log, other than the bolding of the user input.

You will be graded on completing the solution correctly and on the style of your code. (Did you break the problem up into smaller problems / methods? Did you comment your methods and complex algorithms to explain what you are doing in plain English? Do your variable names make sense and help make the code more understandable? Do you have a consistent brace style and indenting?) More good advice on style can be found at the bottom of this page in the section Design, Style, and Documentation. Remember, your code should be easy for someone else (me and especially the TA) to read and understand.


When you complete the assignment, use the turnin program to turn in your A1.java file, which will include all of your source code. This page contains instructions for using turnin.

Be sure of the following: