CS 312 Assignment 9, Decryption Substitution Ciphers
Programming Assignment 9: Individual Assignment. You must complete this assignment by yourself. You cannot work with anyone else in the class or with someone outside of the class. You may not copy solutions from the world wide web. You are encouraged to get help from the instructional staff.
Placed online: Tuesday, March 29
20 points, ~2% of total grade
Due: no later than 11 pm, Thursday, April 14 (the week after exam 2)
General Assignment Requirements
The purpose of this assignment is to implement a program that decrypts a file that has been encrypted with a substitution cipher.
For this assignment you are limited to the language features in chapters 1 through 7 of the textbook, except that you may not use 2D arrays. Use only arrays of a single dimension.
Provided Files:
Given the same input, your program must produce the exact same output. Use a diff tool such as the one at this website (https://www.diffchecker.com/) to ensure your program produces the correct output. Even minor differences in output will cause you to fail grading tests and lose points.
Background Information: A substitution cipher is a simple way of encrypting or encoding text to try to keep unwanted people from knowing the contents of a message. The key, or cipher, pairs each plaintext letter with an encoded letter as follows:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z decoded or plaintext letter
J I X O W H V Z M Q U D T B C P Y N S A E L G F K R encoded letter
To encrypt a message, simply match the letter you are trying to encrypt with the top row and write down the letter directly below it from the bottom row. For example, if we wanted to encrypt the sentence "CS IS FUN" we would see that C encrypts to X, S encrypts to S, and so forth. We would get the message "XS MS HEB". In order for the substitution cipher to work, the sender and the receiver need to agree on the key beforehand. Since you know the key used to encrypt the message, what does "TMUW MS J LDZY!" decipher to?
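The encryption step described above can be sketched in a few lines of Java. This is only an illustration, not part of the provided files: the class name SubstitutionEncryptSketch, the constant KEY, and the method encrypt are names made up for this example, and the sketch assumes only upper case letters are encrypted while everything else passes through unchanged.

```java
class SubstitutionEncryptSketch {
    // Bottom row of the key table above: position i holds the
    // encoded form of the letter ('A' + i). (Hypothetical constant.)
    static final String KEY = "JIXOWHVZMQUDTBCPYNSAELGFKR";

    // Encrypts upper case letters with KEY. Assumption for this sketch:
    // any other character (spaces, punctuation) is copied unchanged.
    static String encrypt(String plain) {
        String result = "";
        for (int i = 0; i < plain.length(); i++) {
            char ch = plain.charAt(i);
            if (ch >= 'A' && ch <= 'Z') {
                // ch - 'A' is the letter's position in the top row
                result += KEY.charAt(ch - 'A');
            } else {
                result += ch;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encrypt("CS IS FUN"));
    }
}
```

Storing the bottom row as a String keeps the lookup to a single charAt call, which is why no second array is needed here.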
At first the substitution cipher appears pretty hard to crack. Even if we only encrypt upper case letters there are 26! (403,291,461,126,605,635,584,000,000) possible keys. It seems like an intractable task to try every possible key. However, code breakers have the English language (or whatever language the message is in) on their side. In English, as with other languages, certain characters are used much more frequently than others. A code breaker can use this fact and do frequency analysis on a message. Frequency analysis is performed by taking an encrypted message and counting up the occurrence of each letter or character. The longer the message the better, so this task is a good candidate for automation with a computer program.
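Counting the occurrences of each character is a short loop once the char-to-int conversion is used as an array index. The following is a minimal sketch under stated assumptions: it counts characters in a String rather than reading a file, it ignores any character whose code is 128 or higher, and the names FrequencySketch, NUM_ASCII, and countFrequencies are hypothetical.

```java
class FrequencySketch {
    // Size of the frequency table: one slot per ASCII code 0..127.
    static final int NUM_ASCII = 128;

    // Returns a table where element i is the number of times the
    // character with ASCII code i appears in text.
    static int[] countFrequencies(String text) {
        int[] freqs = new int[NUM_ASCII];
        for (int i = 0; i < text.length(); i++) {
            int code = text.charAt(i); // char widens to its ASCII code
            if (code < NUM_ASCII) {    // assumption: skip non-ASCII characters
                freqs[code]++;
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        int[] freqs = countFrequencies("XS MS HEB");
        // Print only the characters that actually occur.
        for (int code = 0; code < freqs.length; code++) {
            if (freqs[code] > 0) {
                System.out.println("'" + (char) code + "' : " + freqs[code]);
            }
        }
    }
}
```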
Assignment Description: For this assignment you will write a program that does a frequency analysis on a file. A key is created based on that frequency analysis. The file is then displayed using the initial key. The user of the program is then asked if they want to make a change to the key. They will be asked what character in the decrypted version of the text they want to change and what that character should decrypt to instead.
The DecryptUtilities.getDecryptionKey(int[]) method takes a parameter that is an array of ints. The method expects that the array of ints has a length of 128 and that it represents the frequency table for the encrypted text. The method then creates and returns an array of chars that is the initial decryption key. The array shall have a length of 128. Recall, the key is used to decrypt the message. The array of chars that is returned relies on mapping. Thus the index of the array indicates the ASCII code of the character in the encrypted text, and the actual element is the character that the encrypted character is changed into.
Consider an example of the array returned by the DecryptUtilities.getDecryptionKey(int[]) method. Assume this is a portion of the array. (I don't show all of it.)

index   | 65  | 66  | 67  | 68  | 69  | 70  | 71  | 72  | 73  | 74
element | 'c' | ' ' | '.' | 'a' | '!' | 's' | 't' | 'r' | 'e' | 'B'
Again, that is only a portion of the array. Index 65 maps to ASCII character 65. ASCII code 65 is 'A'. Thus an 'A' in the encrypted message will decrypt to a 'c' based on the current key. ASCII code 66 is 'B'. Thus a 'B' in the encrypted message will decrypt to a ' ' (a space) based on the current key. And so forth.
So if we had this encrypted message:
JIBDBFGDHEE
and used the key shown above to decrypt it we would get
Be a star!!
J is the first character in the encrypted message. J has an ASCII code of 74. Going to index 74 in the array that represents the key, we have a B. So J becomes B. The next encrypted character in the message is I. I has an ASCII code of 73, so I becomes e in the decrypted message. And so forth.
The array returned from the method DecryptUtilities.getDecryptionKey(int[])
is used to transform the encrypted message to a decrypted message.
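Here is a minimal sketch of that transformation, using the partial key from the example above. The names ApplyKeySketch, decrypt, and exampleKey are hypothetical, and the sketch makes two assumptions: every character in the message has an ASCII code below 128, and the key entries not shown in the example simply map each character to itself.

```java
class ApplyKeySketch {
    // Decrypts a message: each encrypted character's ASCII code is
    // used as an index into key, and the element found there is the
    // decrypted character.
    static String decrypt(String encrypted, char[] key) {
        String result = "";
        for (int i = 0; i < encrypted.length(); i++) {
            char ch = encrypted.charAt(i);
            result += key[ch]; // ch is used as an int index here
        }
        return result;
    }

    // Builds the portion of the key shown in the example table.
    // Assumption: entries not shown map each character to itself.
    static char[] exampleKey() {
        char[] key = new char[128];
        for (int i = 0; i < key.length; i++) {
            key[i] = (char) i;
        }
        key[65] = 'c';
        key[66] = ' ';
        key[67] = '.';
        key[68] = 'a';
        key[69] = '!';
        key[70] = 's';
        key[71] = 't';
        key[72] = 'r';
        key[73] = 'e';
        key[74] = 'B';
        return key;
    }

    public static void main(String[] args) {
        System.out.println(decrypt("JIBDBFGDHEE", exampleKey()));
    }
}
```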
For example, if the decrypted text showed tde over and over again, we would be fairly certain that whatever character is decrypting to d should decrypt to h instead, since you would expect most texts to have a lot of instances of the word the.
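One way to apply such a change is to swap two entries in the key, so that whatever was decrypting to d now decrypts to h and vice versa. Treat this as a sketch of one possible approach rather than the required design: swapping (instead of overwriting a single entry) is an assumption made here so that no two encrypted characters end up decrypting to the same character, and the names KeyChangeSketch and swapDecryptedChars are hypothetical.

```java
class KeyChangeSketch {
    // Exchanges the roles of two decrypted characters in the key:
    // every entry that held from now holds to, and every entry
    // that held to now holds from.
    static void swapDecryptedChars(char[] key, char from, char to) {
        for (int i = 0; i < key.length; i++) {
            if (key[i] == from) {
                key[i] = to;
            } else if (key[i] == to) {
                key[i] = from;
            }
        }
    }

    public static void main(String[] args) {
        // A mostly empty key, just for demonstration: encrypted 'X'
        // currently decrypts to 'd' and encrypted 'Q' to 'h'.
        char[] key = new char[128];
        key['X'] = 'd';
        key['Q'] = 'h';

        swapDecryptedChars(key, 'd', 'h');
        System.out.println("X now decrypts to " + key['X']);
        System.out.println("Q now decrypts to " + key['Q']);
    }
}
```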
Things to remember:
int x = ch; // if ch is a char, x now holds the ASCII code for that char
char ch = (char) x; // if x is an int, ch now holds the ASCII character associated with the code x
By way of comparison, my solution is 214 lines long (including many blank lines and lines with a single }) and has 13 methods besides main.
Checklist: Did you remember to:
More Background: This information isn't necessary to complete the assignment. It is simply some background info you may find interesting.
Printable ASCII Distribution: It is easy to find the frequency distribution of letters in standard English documents. It is not so easy to find a frequency distribution of all the printable ASCII characters. I generated a frequency distribution of printable ASCII characters by determining the frequency of characters from approximately 17,000 texts from Project Gutenberg with more than 360,000,000 characters.
etaonisrhdlucmfwgy,pb.vkIT-AE"SNORH'CLMBP_DGWF1x;jUYq:*V2J0z!?K83X94567)([]/|Q=Z#&+}{$~^`%@><\
Those are in order from most to least frequent. The first character is the space, e is second, and so forth.
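To see how a distribution like this could be turned into an initial key, consider the sketch below. It is not the code inside DecryptUtilities.getDecryptionKey(int[]), which may work differently. It only illustrates the idea of pairing the most frequent encrypted character with the most frequent expected character, the second with the second, and so on. The names InitialKeySketch and buildKey are hypothetical, ties are broken by lowest ASCII code, and characters that never appear in the text are assumed to map to themselves.

```java
class InitialKeySketch {
    // freqs: frequency table indexed by ASCII code.
    // byFrequency: expected characters, most frequent first.
    // Returns a key where key[code] is the guessed decryption of
    // the encrypted character with that ASCII code.
    static char[] buildKey(int[] freqs, String byFrequency) {
        char[] key = new char[freqs.length];
        boolean[] used = new boolean[freqs.length];
        // Assumption: anything not ranked maps to itself.
        for (int i = 0; i < key.length; i++) {
            key[i] = (char) i;
        }
        for (int rank = 0; rank < byFrequency.length(); rank++) {
            // Find the most frequent encrypted character not yet assigned.
            int best = -1;
            for (int code = 0; code < freqs.length; code++) {
                if (!used[code] && freqs[code] > 0
                        && (best == -1 || freqs[code] > freqs[best])) {
                    best = code;
                }
            }
            if (best == -1) {
                break; // every character that occurs has been assigned
            }
            used[best] = true;
            key[best] = byFrequency.charAt(rank);
        }
        return key;
    }

    public static void main(String[] args) {
        int[] freqs = new int[128];
        freqs['Q'] = 5; // most frequent, so it is guessed to be the space
        freqs['X'] = 3; // second, guessed to be e
        freqs['Z'] = 1; // third, guessed to be t
        // Only the first three characters of the distribution are used here.
        char[] key = buildKey(freqs, " et");
        System.out.println("Q -> '" + key['Q'] + "'");
        System.out.println("X -> '" + key['X'] + "'");
        System.out.println("Z -> '" + key['Z'] + "'");
    }
}
```

The repeated scan for the largest unused count avoids sorting and needs only one-dimensional arrays, at the cost of a nested loop.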
Being a good code cracker: Knowing what changes to make to the key is more of an art than a science. (Although you could write programs to do it.) Here is a description of why I made the changes I did in the log of a sample execution.