Program 8 - due Nov 18, 2004 at 11:59 pm
Cryptanalysis, or the art of decoding encoded messages, sometimes depends
on analyzing the frequency of characters in text. Though letter frequencies
are well-known for general English text, it is sometimes useful to study
letter frequencies in text from a certain domain. For example, manuals on aircraft
maintenance may well produce different letter frequencies than more general
English text. For more information, see this Wikipedia page:
http://en.wikipedia.org/wiki/Frequency_analysis
For this project, you will use a Map to measure letter frequencies in one
or more text files. The text files will be entered on the command line. Your
program will read each file and keep track of the frequency counts for each
letter (do not distinguish between uppercase and lowercase). The keys in
your Map should be instances of the wrapper class Character, and should represent
the letters in the English alphabet. The associated values will be objects
that store the frequency of the letters - you choose the type for these objects.
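
The counting step described above can be sketched as follows. This is only a minimal illustration, not the required design: the class name LetterCountSketch, the method name countLetters, and the choice of Integer as the value type are assumptions made for the example, and it counts letters in a String rather than reading the files named on the command line.

```java
import java.util.Map;
import java.util.TreeMap;

public class LetterCountSketch {
    // Counts the letters a-z in the given text, ignoring case.
    // (Hypothetical helper; Integer is one possible value type.)
    static Map<Character, Integer> countLetters(String text) {
        // A TreeMap keeps its keys in alphabetical order.
        Map<Character, Integer> counts = new TreeMap<>();
        // Start every letter at zero so all 26 letters are present.
        for (char c = 'a'; c <= 'z'; c++) {
            counts.put(c, 0);
        }
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toLowerCase(text.charAt(i));
            // Skip digits, punctuation, whitespace, and non-English letters.
            if (c >= 'a' && c <= 'z') {
                counts.put(c, counts.get(c) + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = countLetters("Hello, World");
        System.out.println(counts.get('l')); // prints 3
    }
}
```

Seeding the map with zeros is one way to make sure letters that never occur in the input still show up in the output.
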
After your program has processed the input file(s), print the characters
and their frequencies (first in alphabetical order on the keys, and then
in decreasing order of frequency) to the screen. So your output should look
something like this:

For the files TomSawyer.txt, SoundandFury.txt:

Letter    Frequency
----------------------------------------
a         0.06
b         0.01
c         0.02
...

Letter    Frequency
----------------------------------------
e         0.13
t         0.10
h         0.09
...

All letters in the alphabet should appear in these lists with their frequency.
To produce the first table, use the entrySet() method for Maps. For the second
table, I suggest that you change how you are storing the data, and use Arrays.sort
or Collections.sort to arrange the data in order.
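
One possible way to produce the two tables is sketched below. It assumes the counts are held in a Map<Character, Integer>, and it assumes that the printed frequency is each letter's count divided by the total number of letters, which is how the sample output above appears to be computed. The class name FrequencySortSketch, the method name byDecreasingCount, and the sample counts in main are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencySortSketch {
    // Copies the map's entries into a list and sorts it by count, largest first.
    // (Hypothetical helper; other storage choices would work as well.)
    static List<Map.Entry<Character, Integer>> byDecreasingCount(Map<Character, Integer> counts) {
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<Character, Integer>>() {
            @Override
            public int compare(Map.Entry<Character, Integer> a, Map.Entry<Character, Integer> b) {
                return Integer.compare(b.getValue(), a.getValue());
            }
        });
        return entries;
    }

    // Prints one table; frequency is taken to be count / total (an assumption).
    static void printTable(Iterable<Map.Entry<Character, Integer>> entries, int total) {
        System.out.printf("%-10s%s%n", "Letter", "Frequency");
        System.out.println("----------------------------------------");
        for (Map.Entry<Character, Integer> e : entries) {
            System.out.printf("%-10c%.2f%n", e.getKey(), e.getValue() / (double) total);
        }
    }

    public static void main(String[] args) {
        // Made-up counts for three letters, just to exercise the code.
        Map<Character, Integer> counts = new TreeMap<>();
        counts.put('a', 6);
        counts.put('e', 13);
        counts.put('t', 10);
        int total = 6 + 13 + 10;

        printTable(counts.entrySet(), total);        // alphabetical (TreeMap key order)
        System.out.println();
        printTable(byDecreasingCount(counts), total); // decreasing count
    }
}
```

Because Collections.sort is stable, letters with equal counts keep the alphabetical order they had in the TreeMap.
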
You will be graded on your program design as well as the accuracy of your
solution. Submit the file WordAnalysis.java by 11:59 pm on the due date.