CS 312 Assignment 9, Decryption Substitution Ciphers

Programming Assignment 9: Individual Assignment. You must complete this assignment by yourself. You cannot work with anyone else in the class or with someone outside of the class. You may not copy solutions from the world wide web. You are encouraged to get help from the instructional staff.

Placed online: Wednesday, October 28
20 points, ~2% of total grade
Due: no later than 11 pm, Thursday, November 12
General Assignment Requirements

The purpose of this assignment is to implement a program that decrypts a file that has been encrypted with a substitution cipher.

For this assignment you are limited to the language features in chapters 1 through 7 of the textbook.

Provided Files:

Given the same input, your program must produce exactly the expected output. Use a diff tool such as the one at www.quickdiff.com to ensure your program produces the correct output. Even minor differences in output will cause you to fail grading tests and lose points.


Background Information: A substitution cipher is a simple way of encrypting or encoding text to try to keep unwanted people from knowing the contents of a message.  The key pairs each plaintext letter with the encoded letter that replaces it, as follows:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z  decoded or plaintext letter
J I X O W H V Z M Q U D T B C P Y N S A E L G F K R  encoded letter

To encrypt a message, simply find the letter you are trying to encrypt in the top row and write down the letter directly below it from the bottom row.  For example, if we wanted to encrypt the sentence "CS IS FUN" we would see that C encrypts to X, S encrypts to S, and so forth.  We would get the message "XS MS HEB".  In order for the substitution cipher to work, the sender and the receiver need to agree on the key beforehand.  Since you know the key used to encrypt the message, what does "TMUW MS J VWWU!" decipher to?
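
Here is a minimal sketch of encrypting and decrypting with the key in the table above. The class and method names (SubstitutionDemo, substitute, encrypt, decrypt) are made up for this illustration and are not part of the provided files. It only handles upper case letters; every other character is copied unchanged.

```java
public class SubstitutionDemo {
    // The key from the table above: PLAIN.charAt(i) encrypts to ENCODED.charAt(i).
    static final String PLAIN   = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final String ENCODED = "JIXOWHVZMQUDTBCPYNSAELGFKR";

    // Replace each character found in 'from' with the character at the same
    // position in 'to'. Characters not in 'from' (spaces, punctuation) are kept.
    static String substitute(String message, String from, String to) {
        String result = "";
        for (int i = 0; i < message.length(); i++) {
            char ch = message.charAt(i);
            int pos = from.indexOf(ch);
            if (pos >= 0) {
                result += to.charAt(pos);
            } else {
                result += ch;
            }
        }
        return result;
    }

    static String encrypt(String plainText) {
        return substitute(plainText, PLAIN, ENCODED);
    }

    static String decrypt(String cipherText) {
        return substitute(cipherText, ENCODED, PLAIN);
    }

    public static void main(String[] args) {
        String secret = encrypt("CS IS FUN");
        System.out.println(secret);
        // Decrypting with the same key gives back the original message.
        System.out.println(decrypt(secret));
    }
}
```

Decrypting is the same lookup run in the other direction, which is why one helper method can serve both.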

At first the substitution cipher appears pretty hard to crack.  Even if we only encrypt upper case letters there are 26! (403,291,461,126,605,635,584,000,000) possible keys. It seems like an intractable task to try every possible key.  However, code breakers have the English language (or whatever language the message is in) on their side.  In English, as with other languages, certain characters are used much more frequently than others. A code breaker can use this fact and do frequency analysis on a message.  Frequency analysis is performed by taking an encrypted message and counting up the occurrence of each letter or character.  The longer the message the better, so this task is a good candidate for automation with a computer program.
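
The counting step can be sketched as follows. This is only one possible approach, and the names (FrequencyDemo, countFrequencies, printFrequencies) are made up for this illustration. The array index is the ASCII code of the character and the element is the number of times that character occurs. The print format here is an assumption; your output must match the sample log.

```java
public class FrequencyDemo {
    static final int NUM_ASCII = 128;

    // Count the occurrences of each ASCII character in text.
    // freqs[code] is the number of times the character with that code appears.
    static int[] countFrequencies(String text) {
        int[] freqs = new int[NUM_ASCII];
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < NUM_ASCII) { // skip anything outside the ASCII range
                freqs[ch]++;
            }
        }
        return freqs;
    }

    // Print the counts for the printable ASCII characters only (codes 32 to 126).
    static void printFrequencies(int[] freqs) {
        for (int code = 32; code <= 126; code++) {
            System.out.println((char) code + " " + freqs[code]);
        }
    }

    public static void main(String[] args) {
        printFrequencies(countFrequencies("XS MS HEB"));
    }
}
```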

Assignment Description: For this assignment you will write a program that does a frequency analysis on a file. An initial key is created based on that frequency analysis. The decrypted text is then displayed using the initial key. The user of the program is then asked if they want to make a change to the key. If so, they will be asked what character in the decrypted version of the text they want to change and what that character should decrypt to instead.

  1. As shown in the sample log of a program run, you must display the frequency of all the printable ASCII characters. (Ignore ASCII chars 0 - 31 and 127.) See this section of the Wikipedia article for more on the printable ASCII characters.
     
  2. Use the other method from DecryptUtilities. The DecryptUtilities.getDecryptionKey(int[]) method takes a parameter that is an array of ints. The method expects that the array of ints has a length of 128 and that it represents the frequency table for the encrypted text. The method then creates and returns an array of chars that is the initial decryption key. The returned array has a length of 128. Recall, the key is used to decrypt the message. The array of chars that is returned is a mapping: the index of the array is the ASCII code of a character in the encrypted text, and the element at that index is the character that the encrypted character is changed into.

    Here is an example. Assume we obtain the array of chars from the DecryptUtilities.getDecryptionKey(int[]) method. Assume this is a portion of the array. (I don't show all of it.)
     
    index    65   66   67   68   69   70   71   72   73   74
    element  'c'  ' '  '.'  'a'  '!'  's'  't'  'r'  'e'  'B'

    Again, that is only a portion of the array. Index 65 maps to ASCII character 65. ASCII code 65 is 'A'. Thus an 'A' in the encrypted message will decrypt to a 'c' based on the current key. ASCII code 66 is 'B'. Thus a 'B' in the encrypted message will decrypt to a ' ' (a space) based on the current key. And so forth.

    So if we had this encrypted message:

    JIBDBFGDHEE

    and used the key shown above to decrypt it we would get

    Be a star!!

    J is the first character in the encrypted message. J has an ASCII code of 74. Going to index 74 in the array that represents the key, we have a B. So J becomes B. I has an ASCII code of 73, so I becomes e. And so forth.

    The array returned from the method DecryptUtilities.getDecryptionKey(int[]) is used to transform the encrypted message to a decrypted message.
     

  3. After obtaining the initial key display it (as in the sample log) and then display the decrypted version of the text. Create a new String that is the decrypted version of the text. Don't overwrite the String that is the encrypted version. You'll need it again later.
     
  4. The trouble with this approach is that with short messages there will be differences between the expected frequency of letters and the actual frequencies. The decrypted text probably won't be perfect unless it is several thousand characters long. Even then there could be mistakes due to differences between the standard frequencies of characters and the actual frequency of characters in the original text.

    So now the user becomes a detective. Ask them if they want to make a change to the key. If they answer 'Y' or 'y', ask what decrypted character they want to change. Then ask what that decrypted character should decrypt to instead.

    For example, if we displayed the decrypted text and saw tde over and over again, we would be fairly certain that whatever character is decrypting to d should decrypt to h instead, since you would expect most texts to have a lot of instances of the word the.

    Search the key for the element that contains 'd' and change it to 'h'. If we left it at that, there would be a problem. There would now be two encrypted characters that decrypt to 'h'. That can make the process harder. So we will change whatever was decrypting to 'h' to decrypt to 'd' instead. (We swap the characters in the array after we find them both.) This isn't always a perfect change, but it makes it easier to eventually get the correct key.
     
  5. After a change in the key display the new version of the decrypted file. Keep asking the user if they want to make a change, what they want to change, and display the new version until they want to stop.
     
  6. After the user doesn't want any more changes display the final version of the key and the decrypted text.
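
Steps 2 through 5 come down to two array operations: looking up each encrypted character in the key, and swapping two elements of the key. Here is a rough sketch of both, using the portion of the key from the example in step 2. The names (KeyDemo, exampleKey, decrypt, swapInKey) are made up for this illustration. In the example key, every entry not shown in step 2 is filled with a '?' placeholder, which is an assumption made only so the sketch is self-contained. In your program the key comes from DecryptUtilities.getDecryptionKey(int[]).

```java
public class KeyDemo {
    // Build a demonstration key. Only indices 65 through 74 come from the
    // example in step 2; all other entries are a '?' placeholder (assumption).
    static char[] exampleKey() {
        char[] key = new char[128];
        for (int i = 0; i < key.length; i++) {
            key[i] = '?';
        }
        char[] shown = {'c', ' ', '.', 'a', '!', 's', 't', 'r', 'e', 'B'};
        for (int i = 0; i < shown.length; i++) {
            key[65 + i] = shown[i];
        }
        return key;
    }

    // Build a new String by replacing each encrypted character with key[ASCII code].
    // The encrypted String itself is not changed.
    static String decrypt(String encrypted, char[] key) {
        String result = "";
        for (int i = 0; i < encrypted.length(); i++) {
            char ch = encrypted.charAt(i);
            if (ch < key.length) {
                result += key[ch];
            } else {
                result += ch; // not an ASCII character, leave it alone
            }
        }
        return result;
    }

    // Find the elements holding 'current' and 'replacement' and swap them.
    // Returns false and leaves the key unchanged if either one is not found.
    static boolean swapInKey(char[] key, char current, char replacement) {
        int currentIndex = -1;
        int replacementIndex = -1;
        for (int i = 0; i < key.length; i++) {
            if (key[i] == current && currentIndex == -1) {
                currentIndex = i;
            }
            if (key[i] == replacement && replacementIndex == -1) {
                replacementIndex = i;
            }
        }
        if (currentIndex == -1 || replacementIndex == -1) {
            return false;
        }
        char temp = key[currentIndex];
        key[currentIndex] = key[replacementIndex];
        key[replacementIndex] = temp;
        return true;
    }

    public static void main(String[] args) {
        char[] key = exampleKey();
        String encrypted = "JIBDBFGDHEE";
        System.out.println(decrypt(encrypted, key)); // prints "Be a star!!"
        swapInKey(key, 'a', 'r');                    // 'a' and 'r' trade places in the key
        System.out.println(decrypt(encrypted, key)); // prints "Be r stra!!"
    }
}
```

Notice that decrypt always starts again from the encrypted text. That is why step 3 tells you not to overwrite the encrypted String.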

Things to remember:


Checklist: Did you remember to:


More Background: This information isn't necessary to complete the assignment. It is simply some background info you may find interesting.

Printable ASCII Distribution: It is easy to find the frequency distribution of letters in standard English documents. It is not so easy to find a frequency distribution of all the printable ASCII characters. I generated a frequency distribution of printable ASCII characters by looking at about 17,000 texts from Project Gutenberg with more than 360,000,000 characters.

 etaonisrhdlucmfwgy,pb.vkIT-AE"SNORH'CLMBP_DGWF1x;jUYq:*V2J0z!?K83X94567)([]/|Q=Z#&+}{$~^`%@><\

That is in order from most to least frequent. The first character is the space, e is second, and so forth.
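
To give a feel for how a ranking like this can be turned into an initial key, here is a rough sketch. This is NOT the actual code of DecryptUtilities.getDecryptionKey(int[]), which I don't show. The names (InitialKeyDemo, buildKey) and the choices made here (ties go to the lower ASCII code, non-printable characters map to themselves) are assumptions for this illustration only. The idea is that the most frequent printable character in the encrypted text is assumed to decrypt to the most frequent character in the standard ordering, the second most frequent to the second, and so forth.

```java
public class InitialKeyDemo {
    // The ordering shown above, most frequent first (the first character is a space).
    static final String STANDARD_ORDER =
        " etaonisrhdlucmfwgy,pb.vkIT-AE\"SNORH'CLMBP_DGWF1x;jUYq:*V2J0z!?K83X94567)([]/|Q=Z#&+}{$~^`%@><\\";

    // freqs[code] is how often the character with that ASCII code occurs in
    // the encrypted text. Returns a key where key[code] is the guessed
    // decryption of the encrypted character with that code.
    static char[] buildKey(int[] freqs, String standardOrder) {
        char[] key = new char[128];
        for (int i = 0; i < key.length; i++) {
            key[i] = (char) i; // default: a character decrypts to itself
        }
        boolean[] used = new boolean[128];
        for (int rank = 0; rank < standardOrder.length(); rank++) {
            // Find the most frequent printable character not yet assigned.
            int best = -1;
            for (int code = 32; code <= 126; code++) {
                if (!used[code] && (best == -1 || freqs[code] > freqs[best])) {
                    best = code;
                }
            }
            if (best == -1) {
                break; // every printable character has been assigned
            }
            used[best] = true;
            key[best] = standardOrder.charAt(rank);
        }
        return key;
    }

    public static void main(String[] args) {
        int[] freqs = new int[128];
        freqs['X'] = 5;
        freqs['Q'] = 3;
        freqs['Z'] = 1;
        char[] key = buildKey(freqs, STANDARD_ORDER);
        // X is the most frequent, so it is guessed to be a space; Q is guessed to be e.
        System.out.println("[" + key['X'] + "] [" + key['Q'] + "] [" + key['Z'] + "]");
    }
}
```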

Being a good code cracker. Knowing what changes to make to the key is more of an art than a science. (Although you could write programs to do it.) Here is a description of why I made the changes I did in the log of a sample execution.

  1. It looks like space might be right, but clearly there are problems. I notice a lot of tre. So perhaps r should decrypt to h giving us the instead.
  2. I notice the words ta and thot. So I guess a should decrypt to o, giving us to and that.
  3. I notice the word fiedl. So I guess d should decrypt to l giving us field.
  4. I notice the word tywical. So I guess w should decrypt to p giving us typical.
  5. I notice the words feg and ghich. So I guess g should decrypt to w, giving us few and which.
  6. I think m should be u and k should be I.
  7. After that it goes really quickly and we have a passage from a Sherlock Holmes novel.
