First Bytes

Vigenere Cipher Lab

Introduction: In this lab activity you will use MatLab and write functions to decrypt a message that is encoded with a Vigenere cipher.

This lab is available at www.cs.utexas.edu/users/scottm/FirstBytes/vigenereLab.htm

A Vigenere Cipher  is a more complicated and thus stronger cipher than either a Caesar cipher or a substitution cipher. The Vigenere cipher uses a matrix of Caesar ciphers and a code word. The matrix is always the same. Note Each row is simply a Caesar cipher with a shift one greater than the row before.

   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
 1 B C D E F G H I J K L M N O P Q R S T U V W X Y Z A 
 2 C D E F G H I J K L M N O P Q R S T U V W X Y Z A B 
 3 D E F G H I J K L M N O P Q R S T U V W X Y Z A B C 
 4 E F G H I J K L M N O P Q R S T U V W X Y Z A B C D 
 5 F G H I J K L M N O P Q R S T U V W X Y Z A B C D E 
 6 G H I J K L M N O P Q R S T U V W X Y Z A B C D E F 
 7 H I J K L M N O P Q R S T U V W X Y Z A B C D E F G 
 8 I J K L M N O P Q R S T U V W X Y Z A B C D E F G H 
 9 J K L M N O P Q R S T U V W X Y Z A B C D E F G H I 
10 K L M N O P Q R S T U V W X Y Z A B C D E F G H I J 
11 L M N O P Q R S T U V W X Y Z A B C D E F G H I J K 
12 M N O P Q R S T U V W X Y Z A B C D E F G H I J K L 
13 N O P Q R S T U V W X Y Z A B C D E F G H I J K L M 
14 O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
15 P Q R S T U V W X Y Z A B C D E F G H I J K L M N O 
16 Q R S T U V W X Y Z A B C D E F G H I J K L M N O P 
17 R S T U V W X Y Z A B C D E F G H I J K L M N O P Q 
18 S T U V W X Y Z A B C D E F G H I J K L M N O P Q R 
19 T U V W X Y Z A B C D E F G H I J K L M N O P Q R S 
20 U V W X Y Z A B C D E F G H I J K L M N O P Q R S T 
21 V W X Y Z A B C D E F G H I J K L M N O P Q R S T U 
22 W X Y Z A B C D E F G H I J K L M N O P Q R S T U V 
23 X Y Z A B C D E F G H I J K L M N O P Q R S T U V W 
24 Y Z A B C D E F G H I J K L M N O P Q R S T U V W X 
25 Z A B C D E F G H I J K L M N O P Q R S T U V W X Y 
26 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 

TEX ASTEXASTEX AS TEXAS TEXASTEX ASTEXA ST 
THE UNIVERSITY OF TEXAS FORMALLY OPENED IN 1883.

To encode a piece of text the sender and receiver must agree on a code word. Let's say the code word is TEXAS. Take the code work and repeat it over the top of the plain text. assume our message is "The University of Texas formally opened in 1883."

TEX ASTEXASTEX AS TEXAS TEXASTEX ASTEXA TE 
THE UNIVERSITY OF TEXAS FORMALLY OPENED IN 1883.

To encode a letter go to the column that has that letter, the plain text letter at the top. Find the row that starts with  letter from the code word that corresponds to the plain text letter. The cell where this row and column intersects is the encode letter. For example the first letter is T, the code word letter is also T. Column T and row T intersect at the M. The next letter is H. The code word letter is E. Column H and row E intersect at the L. The entire message is encrypted as:

TEX ASTEXASTEX AS TEXAS TEXASTEX ASTEXA TE 
THE UNIVERSITY OF TEXAS FORMALLY OPENED IN 1883.
MLB UFBZBRKBXV OX MIUAK YSOMSEPV OHXRBD AG 1883.


To decode the message one would take the encrypted text, repeat the code word above it. Take the columns starting with the letter from the code word, go down until you find the encoded letter, and the first letter in the row is the plain text letter. So if the code word is still TEXAS and the encrypted message is

ZS EOJGW! What is the message?

TE XASTE  (Code word)
ZS EOJGW! (Encrypted message)

Messages encoded with a Vigenere square cipher are much harder than substitution ciphers because frequency analysis fails. Consider the example above, With a five letter code word and all the letters different there are 5 different letters 'E' might encode to. This spreads out the Es. Also, sometimes a letter that means E in one spot means something else at another time. There is an excellent explanation at Simon Singh's, author of The Code Book, website about how to crack a Vigenere square cipher. (Read this page and then try the link on the top left, that says "Next Page ->" if you are interested in finding out more about how to determine the length of the code word.) 

The first and most difficult part is determining the length of the codeword. Once you have a good idea of the length of the code word cracking the cipher is relatively easy. 

For this lab you have received information that all code words are 5 letters in length.

Once you have an idea of how long the code word perform a number frequency analysis equal to the length of the code word. In other words if you believe the code word is 5 letters long then you would count up how often each character occurs using the 1st, 6th, 11th, 16th, 21st, 26th characters and continue with every 5th character. This will give you one set of data. Repeat this for the other 4 sets of data, the 2nd, 7th, 12th, 17th, 22nd, and so forth. Then the 3rd, 8th, 13th, 18th, 23rd, 28th and so forth. When you are done you will have 5 sets of data. This could be stored in a matrix in MatLab.

Here is a portion of a message encoded with a Vigenere square cipher and a code word of length 5:

SOUVWZ OEQ YLBKYWTSE: V ST UCNV FCL TSCS RA SDOIQ LV HYR HYSJF LVBZTZA, PVPSBGV GZHH XNNL HYRE AVV BHWCIGMUWKL LV HVYD BG AHKA KYNL AVVL UVICQ VV. BFJ OL VRIW JCDR ZLFV GGUWXUL ISTNMZS FS UPJZY DPPVELPSJ. V ATOXVFL O XEWHH DNFF CW LGB QFHDK UZIW TM KNDR TRE TLHKRJ AVRA A JCLYV, ISTNMZS PBM OOMR ZHR WVJZH-YNFK YEBOSSUTW PB KUW AVZAYZ MFH ZHJV USK HF QG PB TUAJOXB GCSI GZL MVNJZ HF CJLGVENL QZIAS ZZOWYHZRK. WSIUSWG, YBOLJVE, A HA DBJL QFAKJWFHK VT KUW PAGBJAOEPW VT TVNPZ CVTLFKVWZ WE GZPG GNJAWTHDHF DBELBK BX VII UAZHFEQ AVRA SUMFAW LZJR, TLQRHKL OJ V LYOMRD AVIBMNV KUW JCLALYM RAV TSVG HLCGYW HBU FWL HYVFNG KUSA VRIW OOGCWUSU GG SWKGDL DVBHSS, Z EWHZZMW DVRG AA AVNFZ HF QWTCTESJM KB HYSJRJCS FHJ JWMVD SWSRJAWVF. SSZ KUJVIXU LOS PRSYG NR ZHJV USK HF SANVK SGY QZIAS ZZOWYHP, NFK KV XFVK KUSA HYRJL OIR LPAVF OOSE GZL ZZTZA UIBOZ FRGZLF UVE, HBU RNLFP GATS KUSA VRCHLBJ QWTCTESJM ZF AU RRAYLF. EBO, SOITWSM SRUHIJR GM HYR LYCLODLR JGSAS FS LOS NBJSR RF S DVFYW, JWMVD SWSRJAWVF ZHJV QAZOGCWHFVQ AU ARAQ VHYRJ JCLALYWVF. AA WJ VEWCJFAIZV, BX JCLEKL, HF OW HH NNJ HBU GG RSVC XYSVQGT CW GZL DIRKZ OEQ XYSVQGT CW FHLSTU SUR WEWLRFZ GM OJFWTPCL. LOSP QAZOGCWHF RHLVARGAJOCYQ. HBU FG PB DNFF QFHFAFZRK DVVEW VFUVFHFZYQ AVVL OLFV FSMS, KBVHM KUWF VRIW NCER. AU CKUWY QFHFAFZRK, LJVA TLTFEW DOI PSTS, EBL VBCL XYSVQGT CW GZL DIRKZ OEQ XYSVQGT CW NKZSDODF, OEQ XYSVQGT CW FHLSTU VPGRCHLOIRV, IIK SJLSUBE VT IRDPUZBF KWJNHWSREWK. OEQ KV KV XFVK KUSA VVEW PB KUAZ QFHFAFP, JW OOMR S NFRIW YSJCGUGZOASWKL. 

After counting how often each letter appears for the 5 letter sets you get

 A  B  C  D  E  F  G  H  I  J  K   L  M  N  O  P  Q  R   S  T  U   V   W  X  Y  Z
78 19 11 15  0 20  0 52 12 35 28 106  8 11 39 40  0  5  30 33 38  60  13  0 42 45
29 43 58 13  0 50 29 61 15 15 13   1 24  1 55 14 19 23 104 16 14  61  54  0  6 22
 2  3 24 14 48 65 16  0 43 42 72  19 13 19  0 18  0 59  13 20 21 107  15 11 45 51
41 69 18  0 50 40 59 28 16 22  2  14  2 52 13 15 35 85  14 16 54  49   0  5 26 15
52  1  2 25 21 39 55 17  0 43 35  76 23 10 27  0 17  0  68 7  14  20 111 16 12 49

At this point we want to figure out the letter of the code word for each row, i.e. each frequency analysis. Since each row in the Vigenere square is simply a Caesar shift cipher we know the letters are in order, just not the shift. This time instead of trying every shift with brute force (Why wouldn't that work as well as it did in the Caesar lab) we plot the frequencies of the encoded letters and compare it to the normal frequencies. MatLab has a simple way of plotting data. To compare to plots we will use the subplot and bar commands. Assume the frequencies of the encoded message are stored in  freqs and that the normal frequencies are stored in an array called normal. If we wanted to plot the first row to figure out the first letter of the code word we would use the commands

subplot(2,1,1)
bar(normal)
subplot(2,1,2)
bar(freqs(1,1:26))

This produces a plot similar to the following

The top is the plot of letters as the normally occur in English. The bottom is the plot of frequencies for every fifth letter in the encoded message starting at 1. To determine the letter of the code word we need to determine which letter A is encoding to. Imagine sliding the bottom plot to the left or right to match up witht the top. One of the easiest ways to do this is look for the peak, which should be 'E' and then check to see if another fairly tall peak is 4 columns to the right. If so then that peak 4 columns to the right is most likely 'A' . Look at the bottom plot. the highest peak is on L. Sure enough 4 columns to the right is a peak that is the third or possibly forth highest. Most likely this is the 'A'. That peak is at letter 'H' so the first letter in the code word is 'H'. Take a look at the plot of the second row of freqs. Can you determine this letter of the code word?

Once you have the code word figured out you can write a function that takes the code word and the message and decodes it with the Vigenere square.

Files for this lab.

Description File
A pregenerated Vigenere square. To load simply say

load vinsquare.txt

You do not have to use the loadtext function.

vinsquare.txt
The first message encoded with the Vigenere square. You are certain the code length is 5. vinEncoded1.txt
The second message encoded with the Vigenere square. You are certain the code length is 5. vinEncoded2.txt
Function for loading text loadtext.m