# Cryptanalysis of the Bifid cipher

For a recap of how the Bifid cipher works, see here.

## Determining the Period §

The first thing we have to do is determine the period. It turns out that estimating the period is much easier when the period is even, when it is odd we need more ciphertext to get good results. This depends on calculating the bigram distribution for our ciphertext with different 'steps'. An example will make it clearer, assume our ciphertext is as follows:

ABCDEFGHIJK

- Bigrams with a step of zero will be the following:
`AB, BC, CD, DE, EF, FG`etc. - Bigrams with a step of one will be the following:
`AC, BD, CE, DF, EG, FH`etc. - Bigrams with a step of two will be the following:
`AD, BE, CF, DG, EH, FI`etc.

and so forth. We calculate the bigrams with steps up to 20 or so. Then we calculate the variance of the bigram counts. If we got a bigram count as follows:

AB: 3, SE: 1, ER: 1, DF: 2, CD: 1

Then the variance would be 0.64 which is just the variance of the counts on their own. If we calculate the variance of the bigram counts for a slightly longer ciphertext (about 400 characters) with a period of 12, we get the following plot:

There is a clear spike at 6, which is half of 12. If we analyse a ciphertext with a length of 11, we get the following:

Notice the peak is now spread between 5 and 6. If the ciphertext is short, it may not be possible to use this method to determine the period, in which case you just have to assume the period is something, then try to determine the key-square under that assumption. If you can't crack it, then try a new period until you get something that works.

## Determining the Key-square §

Once the period is known, we have to determine the 5 by 5 polybius square used to encipher the message. We will use a Simulated Annealing algorithm to reveal it. This approach is more fully described on the page cryptanalysis of the playfair cipher, which is broken using the same technique. In short, Simulated Annealing involves making small changes to an initial key and measuring the fitness of the deciphered text. If the fitness improves, we keep the modified key, if not we keep the key with a certain small probability. We measure the fitness using quadgram probablities.

## C Code for breaking Bifid §

You can create your own Bifid ciphers to crack here.

I have gone with a c implementation for this example, because python gets a little bit slow. Compile it using:

gcc -O3 -lm bifidcrack.c scoreText.c -o bifid

then run using the command:

./bifid

The program will continue to run until it is killed by the user. Note that you must specify the period in the code as well,
if you don't know the period then you can use a method such as the one described earlier on this page to guess it, or you can just try all
possible periods until the correct one is found. To put in a different ciphertext, change the
`char cipher[]` variable, you can encipher your own text here. Remember that very short ciphertexts (less than 80 characters)
may not be solvable, 200 or so characters is good). The quadgram statistics are stored in the `qgr.h` file, which is used by `scoreText.c`. For longer ciphertexts you may need to increase the TEMP variable.

- bifidcrack.c
- scoreText.c
- scoreText.h
- qgr.h.zip Used by scoreText.c (extract to use)