Breaking the Vigenère Cipher

The Vigenère cipher is a very known cipher for centuries, you can read more about it from here. It uses a series of Caesar ciphers to encrypt the text. Sometime ago I came across a challenge in breaking the Vigenère cipher. But this was a variant of a Vigenère cipher which uses XOR gate instead of normal polyalphabetic substitution. Here is the encryption program. In a challenge you won’t get the encryption program but however you can code it.


In Python:

In the above code we will open the plain text file and start XORing each byte with the key. We won’t XOR new lines since the complexity would be high.
I would be very grateful to Dimitrios Kalemis for teaching me this method of breaking the Vigenere cipher 🙂
I will be using Excel and VBA for statistical analysis. Let’s try to break the following cipher text:

FB50F7C621B40EC24CB2C63BA80EC75EFCD526AC49CE1FFDD473B946CE1FFBDF32AA47C55EE6DB3CA30ECA51F69227A54B8B4FF3C120A441C54CBC921AB90ED95AFED327A85D8B4BFD9224A54FDF5AE4D721ED49C249F7C173A443C65AF6DB32B94B8B4FFED732BE5BD95AB2DD21ED5ECA56FC9227A20EDF57F7923BB843CA51B2DF3AA34A851FDBC673AE41C65AE1923BA243CE1FE6DD73B946CE1FF0DD20A243D81FF3DC37ED4CDE4CFBDC36BE5DCE4CB2DD35ED43CE51A99235A25C8B51FDC63BA440CC1FF0C727ED59C35EE69230A243CE4CB2DA3CA04B8B4BFD9227A54BC61FFBDC73B946CE1FFFDD20B90ECC5AFCD721AC428B5EFCD673A440DF5AFEDE3AAA47C953F79220A54FDB5AB2D132A30EC95AB2D373BE5BC955F7D127ED48C44DB2C23CA85AD946BC9203A24BDF4DEB923ABE0EDF57F79226A347DD5AE0C132A10EC75EFCD526AC49CE1FE5DA3AAE468B4BFAD773A54BCA4DE6923BA242CF4CB2C53AB9468B51F3C626BF4B8B5EFCD673A45AD85AFED47DED66CE1FE5DA3CED46CA4CB2D373AE41C54BF7DF23B90ECD50E09223A24BDF4DEB9230AC40C550E6923BAC58CE1FFFC730A50ED95AE1C236AE5A8B59FDC073A547C64CF7DE35E10EC44DB2D43CBF0ECA51EBC63BA440CC1FF7DE20A8008B68FAD721A858CE4DB2C63BA85CCE1FFBC173AC0ED85AFCC136ED41CD1FF0D732B85AD213B2DD21ED5EC448F7C07FED41D91FFAD321A041C546BE9232BE0EC251B2C63BA80EC650E6DB3CA30EC459B2D373BA4FDD5AB2DD35ED5AC35AB2C136AC028B56FC9227A54B8B58E0DD24B9468B50F49232ED48C750E5D721E10EDF57F7C036ED47D81FE2DD36B95CD21FFBDC73A45AD81FF0DB21B94685

We perform statistical analysis in each column. For place the bytes in columns corresponding to length of the key. Since the key length is assumed to be something in between 1 and 14 we have to write 14 macros to place the bytes in order. After that we sort the bytes in the highest occurring order. If we find columns with all bytes occurring few time then it’s not a valid key length. If we find columns where only few bytes occur many times that means it may correspond to the frequently occurring letters such as a space or the letter “e” in the English alphabet.
We can write a macro for excel to place the cipher text on columns ranging from 1 to 14.

Change the value in here “If (col = 8)” to change the column count. Sort the cipher text in each column and you will see the most occurring few bytes. They should be equal to spaces in the plain text. In my cipher text in we can guess the key length should be equal to 7 since we find the most occurring bytes when the column count is 7. So we XOR the most occurring bytes with 0x20 in each column and find the key. Through trial and error we have to test this. Sometimes in some columns different bytes can be frequently occur but still you can guess it ? In my case the most occurring bytes in each column are ‘8b’, ‘1f’, ‘b2′, ’92’, ’73’, ‘ed’, ‘0e’. Here is a small program that would XOR the given bytes with 0x20 which is space and provide us the key bytes.

Column Cipher character corresponding to space Key value = cipher character XOR Hex 20 (space)
1 8B 0x8B XOR 0x20 = 0xAB
2 1F 0x1F XOR 0x20 = 0x3F
3 B2 0xB2 XOR 0x20 = 0xB2
4 92 0x92 XOR 0x20 = 0xB2
5 73 0x73 XOR 0x20 = 0x53
6 ED 0xED XOR 0x20 = 0xCD
7 0E 0x0E XOR 0x20 = 0x2E

We can decrypt the cipher text using Excel macros as follows or perhaps write a program to do it 🙂

In C

In python

So this is the plain text 🙂

Poetry is the language of the imagination and the passions. It relates to whatever gives immediate pleasure or pain to the human mind. It comes home to the bosoms and businesses of men; for nothing but what comes home to them in the most general and intelligible shape can be a subject for poetry. Poetry is the universal language which the heart holds with nature and itself. He who has a contempt for poetry cannot have much respect for himself, or for anything else. Wherever there is a sense of beauty, or power, or harmony, as in the motion of a wave of the sea, in the growth of a flower, there is poetry in its birth.

This method of breaking is easy, but there are more other ways you could break. In cryptanalysis of the Vigenère cipher there are two main steps. First we try to find the length of the key by calculating the index of coincidence

$latex I.C = \frac{\sum_{i = A}^{i = Z}f_i{(f_i-1)} } {N(N-1)}&bg=232225&fg=bbbbbb&s=4$
To identify the specific key we use the Chi-Squared statistic.

$latex X^{2}(C,E) =\sum_{i = A}^{i = Z}\frac{(C_{i}- E_{i}) ^{2} }{E_{i} }&bg=232225&fg=bbbbbb&s=4$
Here is a full implementation using python which uses the above two statistics and break the cipher.

Special thanks to hasherezade for helping me with coding.

Thanks for reading!

4 thoughts on “Breaking the Vigenère Cipher

Leave a Reply