CS 312 Assignment 9, Decrypting Substitution Ciphers

Programming Assignment 9: Individual Assignment. You must complete this assignment by yourself. You cannot work with anyone else in the class or with someone outside of the class. You may not copy solutions from the world wide web. You are encouraged to get help from the instructional staff.

Placed online: Wednesday, October 30
20 points, ~2% of total grade
Due: no later than 11 pm, Thursday, November 7
General Assignment Requirements

The purpose of this assignment is to implement a program that decrypts a file that is assumed to have been encrypted with a substitution cipher.

For this assignment you are limited to the language features in chapters 1 through 7 of the textbook.

Provided Files:


Background Information: A substitution cipher is a simple way of encrypting or encoding text to try to keep unwanted people from knowing the contents of a message. The key (or cipher) is a table like the following:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z  decoded or plaintext letter
J I X O W H V Z M Q U D T B C P Y N S A E L G F K R  encoded letter

To encrypt a message, simply match the letter you are trying to encrypt with the top row and write down the letter directly below it from the bottom row. For example, if we wanted to encrypt the sentence "CS is fun" we would see that C encrypts to X, S encrypts to S, and so forth. We would get the message "XS MS HEB". For the substitution cipher to work, the sender and the receiver need to agree on the key beforehand. Since you know the key used to encrypt the message, what does "TMUV MS J VWWU!" decipher to?
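The encryption rule above can be sketched in Java. This is a minimal illustration, not part of the required program: the class name EncryptDemo is hypothetical, and converting lowercase letters to uppercase before the lookup is an assumption made so the example reproduces "CS is fun" becoming "XS MS HEB". Characters not in the top row, such as spaces, are copied unchanged.

```java
public class EncryptDemo {
    // Plaintext letters (top row of the key table).
    static final String PLAIN  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // Encoded letters (bottom row of the key table).
    static final String CIPHER = "JIXOWHVZMQUDTBCPYNSAELGFKR";

    // Replaces each letter with the letter directly below it in the key.
    // Lowercase letters are uppercased first (an assumption for this demo);
    // anything not found in the top row passes through unchanged.
    public static String encrypt(String message) {
        String result = "";
        for (int i = 0; i < message.length(); i++) {
            char ch = Character.toUpperCase(message.charAt(i));
            int pos = PLAIN.indexOf(ch);
            if (pos >= 0) {
                result += CIPHER.charAt(pos);
            } else {
                result += ch;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encrypt("CS is fun")); // prints XS MS HEB
    }
}
```

Decryption simply runs the lookup in the other direction, from the bottom row to the top row.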

At first the substitution cipher appears pretty hard to crack. Even if only the 26 letters are encrypted, there are 26! (403,291,461,126,605,635,584,000,000) possible keys. It would seem an intractable task to try every possible key. However, code breakers have the English language (or whatever language the message is in) on their side. In English, as in other languages, certain characters are used much more frequently than others. A code breaker can use this fact and perform frequency analysis on a message: take the encrypted message and count the occurrences of each letter or character. The longer the message the better, so this task is a good candidate for automation with a computer program.
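As a check on the 26! figure: the value is far larger than the maximum long (roughly 9.2 quintillion), so this sketch uses java.math.BigInteger. BigInteger is outside chapters 1 through 7 and is used here only for this background arithmetic; the class name KeyCount is hypothetical.

```java
import java.math.BigInteger;

public class KeyCount {
    // Computes n! exactly using arbitrary-precision integers.
    public static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // One possible key for each ordering of the 26 letters.
        System.out.println(factorial(26)); // prints 403291461126605635584000000
    }
}
```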

Assignment Description: For this assignment you will write a program that does a frequency analysis on a file. A key is created based on that frequency analysis. The file is then displayed using the initial key. The user of the program is then asked if they want to make a change to the key. They will be asked what character in the decrypted version of the text they want to change and what that character should decrypt to instead.

In this assignment only the printable ASCII characters (codes 32 through 126, including the space) are encrypted. None of the other ASCII characters (codes 0 through 31 and 127, referred to as ASCII control characters) are encrypted.

Some of the work is already done for you in the DecryptUtilities class, which has two static methods you will use. The first is DecryptUtilities.convertFileToString(). This method is the first thing called in the main method. When the method is called, a window opens for you to select the encrypted file. The example output uses encryptedShortText.txt. The method creates and returns a String that contains the complete contents of the file.

Approach to the Assignment: You should read this handout in its entirety and then think about the various steps involved in the process. Plan out your program (what methods do you think you will need) on paper BEFORE you start coding. Write down what you think the parameters to those methods will be and what each method will do.

  1. After obtaining the String version of the file, display it.
     
  2. Next, create a frequency table based on the String. The frequency table is an array of ints in which the index of each element is an ASCII code. You must create an array for all 128 ASCII characters (codes 0 to 127), even though only the printable characters are encrypted. Thus index 32 is the count of how many spaces are in the encrypted text, index 65 is the count of how many A's there are, and so forth. As in the sample log of a program run, you must display the frequency of the space and all the printable ASCII characters (ignore ASCII codes 0 - 31 and 127).
     
  3. Next, use the other method from DecryptUtilities. The DecryptUtilities.getDecryptionKey(int[]) method takes a parameter that is an array of ints. The method expects the array to have a length of 128 and to be the frequency table for the encrypted text. The method creates and returns an array of chars that is the initial decryption key; this array has a length of 128. Recall that the key is used to decrypt the message. The returned array of chars is a mapping: the index of the array is the ASCII code of a character in the encrypted text, and the element at that index is the character that the encrypted character decrypts to.

    Here is an example. Assume we obtain the array of chars from the DecryptUtilities.getDecryptionKey(int[]) method. Assume this is a portion of the array. (I don't show all of it.)
     
    index    65   66   67   68   69   70   71   72   73   74
    element  'c'  ' '  '.'  'a'  '!'  's'  't'  'r'  'e'  'B'

    Again, that is only a portion of the array. Index 65 maps to ASCII character 65. ASCII code 65 is 'A'. Thus an 'A' in the encrypted message will decrypt to a 'c' based on the current key. ASCII code 66 is 'B'. Thus a 'B' in the encrypted message will decrypt to a ' ' (a space) based on the current key. And so forth.

    So if we had this encrypted message:

    JIBDBFGDHEE

    and used the key shown above to decrypt it we would get

    Be a star!!

    J is the first character in the encrypted message. J has an ASCII code of 74. Going to index 74 in the array that represents the key, we find a 'B'. So J becomes B. I has an ASCII code of 73, so I becomes e. And so forth.

    The array returned from the method DecryptUtilities.getDecryptionKey(int[]) is used to transform the encrypted message to a decrypted message.
     

  4. After obtaining the initial key, display it (as in the sample log) and then display the decrypted version of the text. Create a new String that is the decrypted version of the text. (Sounds like a good candidate for a method.) Don't overwrite the String that is the encrypted version; you will need it again.
     
  5. The trouble with this approach is that in short messages the actual letter frequencies will differ from the expected frequencies. The decrypted text probably won't be perfect unless the message is several thousand characters long, and even then there could be mistakes.

    So now the user of our program becomes a detective. Ask them if they want to make a change to the key. If they say yes, ask which decrypted character they want to change. Then ask what that decrypted character should decrypt to instead.

    For example, if we display the decrypted text and see tde over and over again, we would be fairly certain that whatever character is decrypting to d should decrypt to h instead, since you would expect most texts to have a lot of instances of the word the.

    Search the key for the element that contains 'd' and change it to 'h'. If we left it at that, there would be a problem: two encrypted characters would now decrypt to 'h', which makes the process harder. So we also change whatever was decrypting to 'h' so that it now decrypts to 'd'. (We swap the characters in the array after we find them both.) This isn't always a perfect change, but it makes it easier to eventually get the correct key.
     
  6. After a change in the key display the new version of the decrypted file. Keep asking the user if they want to make a change, what they want to change, and display the new version until they want to stop. (While loop anyone?)
     
  7. Once the user doesn't want any more changes, display the final version of the key and the decrypted text.
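Steps 2 through 4 can be sketched with a few static methods that use only arrays, loops, and Strings. This is a hedged sketch, not a reference solution: the class and method names (DecryptSketch, getFrequencies, showFrequencies, decrypt, exampleKey) are hypothetical, the output format of showFrequencies is an assumption rather than the format of the sample log, and decrypt assumes that only printable codes 32 through 126 are looked up in the key while every other character is copied unchanged. The exampleKey helper stands in for DecryptUtilities.getDecryptionKey(int[]) and fills in only the portion of the key shown in the example table.

```java
public class DecryptSketch {

    // Step 2: frequency table for ASCII codes 0-127.
    // counts[c] is the number of times the character with code c appears.
    public static int[] getFrequencies(String text) {
        int[] counts = new int[128];
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < counts.length) { // skip anything outside ASCII
                counts[ch]++;
            }
        }
        return counts;
    }

    // Step 2 (display): the space and the printable characters, codes 32-126.
    // Codes 0-31 and 127 are ignored, as the handout requires.
    public static void showFrequencies(int[] counts) {
        for (int code = 32; code <= 126; code++) {
            System.out.println((char) code + ": " + counts[code]);
        }
    }

    // Step 4: builds a NEW decrypted String; the encrypted String is untouched.
    // Only printable characters (codes 32-126) are looked up in the key.
    public static String decrypt(String encrypted, char[] key) {
        String result = "";
        for (int i = 0; i < encrypted.length(); i++) {
            char ch = encrypted.charAt(i);
            if (ch >= 32 && ch <= 126) {
                result += key[ch];
            } else {
                result += ch; // control characters such as '\n' pass through
            }
        }
        return result;
    }

    // Hypothetical stand-in for DecryptUtilities.getDecryptionKey(int[]):
    // every code maps to itself except 65-74, which use the example table.
    // (Not a valid one-to-one key; it exists only for this demo.)
    public static char[] exampleKey() {
        char[] key = new char[128];
        for (int i = 0; i < key.length; i++) {
            key[i] = (char) i;
        }
        String portion = "c .a!streB"; // elements for indices 65 ('A') to 74 ('J')
        for (int i = 0; i < portion.length(); i++) {
            key[65 + i] = portion.charAt(i);
        }
        return key;
    }

    public static void main(String[] args) {
        String encrypted = "JIBDBFGDHEE";
        System.out.println(encrypted); // step 1: show the encrypted text
        int[] counts = getFrequencies(encrypted);
        System.out.println("D appears " + counts['D'] + " times"); // prints D appears 2 times
        System.out.println(decrypt(encrypted, exampleKey())); // prints Be a star!!
    }
}
```

In the full program, the key would come from DecryptUtilities.getDecryptionKey(counts) rather than from exampleKey().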

Things to remember:

Turn in your program named Decrypt.java using the turnin program. If you are working with a partner turn in the file to only one person's account.


Checklist: Did you remember to:


More Background: This information isn't necessary; it is simply background you may find interesting.

Printable ASCII Distribution: It is easy to find the frequency distribution of letters in standard English documents. It is not so easy to find a frequency distribution of all the printable ASCII characters. I generated a frequency distribution of printable ASCII characters by looking at about 17,000 texts from Project Gutenberg with more than 360,000,000 characters.

 etaonisrhdlucmfwgy,pb.vkIT-AE"SNORH'CLMBP_DGWF1x;jUYq:*V2J0z!?K83X94567)([]/|Q=Z#&+}{$~^`%@><\

That is in order from most to least frequent. The first character is the space, e is second, and so forth.
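The ordering above suggests one way an initial key could be built from a frequency table: pair the most frequent encrypted character with the space, the next with 'e', and so on. The handout does not say how DecryptUtilities.getDecryptionKey(int[]) works, so treat this as a hypothetical sketch of the idea, not that method's implementation. The class name RankKeySketch, the tie-breaking rule (lower ASCII code first), and mapping control characters to themselves are all assumptions.

```java
public class RankKeySketch {
    // Printable ASCII from most to least frequent, copied from the list above.
    // The first character is the space.
    static final String ENGLISH_ORDER =
        " etaonisrhdlucmfwgy,pb.vkIT-AE\"SNORH'CLMBP_DGWF1x;jUYq:*V2J0z!?K83X94567)([]/|Q=Z#&+}{$~^`%@><\\";

    // Hypothetical: the i-th most frequent printable character in the encrypted
    // text decrypts to the i-th character of ENGLISH_ORDER. Ties go to the
    // lower ASCII code. Codes 0-31 and 127 map to themselves.
    public static char[] rankKey(int[] counts) {
        char[] key = new char[128];
        for (int i = 0; i < key.length; i++) {
            key[i] = (char) i;
        }
        boolean[] used = new boolean[128];
        for (int rank = 0; rank < ENGLISH_ORDER.length(); rank++) {
            // Find the most frequent printable code not yet assigned.
            int best = -1;
            for (int code = 32; code <= 126; code++) {
                if (!used[code] && (best == -1 || counts[code] > counts[best])) {
                    best = code;
                }
            }
            used[best] = true;
            key[best] = ENGLISH_ORDER.charAt(rank);
        }
        return key;
    }

    public static void main(String[] args) {
        String encrypted = "WWWQQ";
        int[] counts = new int[128];
        for (int i = 0; i < encrypted.length(); i++) {
            counts[encrypted.charAt(i)]++;
        }
        char[] key = rankKey(counts);
        // W (3 times) decrypts to a space, Q (2 times) decrypts to 'e'.
        System.out.println("[" + key['W'] + "][" + key['Q'] + "]"); // prints [ ][e]
    }
}
```

Because all 95 printable codes are assigned exactly once, the resulting key is one-to-one over the printable characters, which is what makes the swap in step 5 of the assignment well defined.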

Being a good code cracker. Knowing what changes to make to the key is more of an art than a science. (Although you could write programs to do it.) Here is a description of why I made the changes I did in the log of a sample execution.

  1. It looks like space might be right, but clearly there are problems. I notice a lot of tre. So perhaps r should decrypt to h, giving us the instead.
  2. I notice the words ta and thot. So I guess a should decrypt to o, giving us to and that.
  3. I notice the word fiedl. So I guess d should decrypt to l, giving us field.
  4. I notice the word tywical. So I guess w should decrypt to p, giving us typical.
  5. I notice the words feg and ghich. So I guess g should decrypt to w, giving us few and which.
  6. I think m should be u and k should be I.
  7. After that it goes really quickly, and we have a passage from Sherlock Holmes.
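Each change in the walk-through above (r to h, a to o, and so on) is the swap described in step 5 of the assignment. The following is a minimal sketch, assuming the key is a char[128] and that only printable codes 32 through 126 are searched; the class and method names (KeySwapSketch, identityKey, find, swap, decrypt) are hypothetical. The demo starts from an identity key so that the encrypted text tre initially decrypts to tre.

```java
public class KeySwapSketch {
    // Hypothetical helper: a key in which every code decrypts to itself.
    public static char[] identityKey() {
        char[] key = new char[128];
        for (int i = 0; i < key.length; i++) {
            key[i] = (char) i;
        }
        return key;
    }

    // Returns the printable code whose element is target, or -1 if none.
    private static int find(char[] key, char target) {
        for (int code = 32; code <= 126; code++) {
            if (key[code] == target) {
                return code;
            }
        }
        return -1;
    }

    // Whatever decrypted to 'from' now decrypts to 'to', and whatever
    // decrypted to 'to' now decrypts to 'from', so the key stays one-to-one.
    public static void swap(char[] key, char from, char to) {
        int fromCode = find(key, from);
        int toCode = find(key, to);
        if (fromCode != -1 && toCode != -1) {
            key[fromCode] = to;
            key[toCode] = from;
        }
    }

    // Decrypts printable characters with the key; others pass through.
    public static String decrypt(String encrypted, char[] key) {
        String result = "";
        for (int i = 0; i < encrypted.length(); i++) {
            char ch = encrypted.charAt(i);
            if (ch >= 32 && ch <= 126) {
                result += key[ch];
            } else {
                result += ch;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        char[] key = identityKey();
        String encrypted = "tre tre";
        System.out.println(decrypt(encrypted, key)); // prints tre tre
        swap(key, 'r', 'h');                         // r should decrypt to h
        System.out.println(decrypt(encrypted, key)); // prints the the
    }
}
```

In the full program, a call like swap(key, 'r', 'h') would sit inside the while loop from step 6, after reading the two characters from the user.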
