8.2. Classical Cryptosystems
Classical cryptosystems (also called single-key or symmetric cryptosystems) are cryptosystems that use the same key for encipherment and decipherment. In these systems, for all Ek C and k K, there is a Dk D such that Dk = Ek1.
|
EXAMPLE:
The Caesar cipher discussed earlier had a key of 3, so the enciphering function was E3. To decipher "KHOOR," we used the same key in the decipherment function D3. Hence, the Caesar cipher is a classical cipher. |
There are two basic types of classical ciphers: transposition ciphers and substitution ciphers.
8.2.1. Transposition Ciphers
A transposition cipher rearranges the characters in the plaintext to form the ciphertext. The letters are not changed.
|
EXAMPLE:
The rail fence cipher is composed by writing the plaintext in two rows, proceeding down, then across, and reading the ciphertext across, then down. For example, the plaintext "HELLO, WORLD" would be written as:
HLOOL ELWRD
resulting in the ciphertext "HLOOLELWRD." |
Mathematically, the key to a transposition cipher is a permutation function. Because the permutation does not alter the frequency of plaintext characters, a transposition cipher can be detected by comparing character frequencies with a model of the language. If, for example, character frequencies for 1-grams match those of a model of English, but 2-gram frequencies do not match the model, then the text is probably a transposition cipher.
Attacking a transposition cipher requires rearrangement of the letters of the ciphertext. This process, called anagramming, uses tables of n-gram frequencies to identify common n-grams. The cryptanalyst arranges the letters in such a way that the characters in the ciphertext form some n-grams with highest frequency. This process is repeated, using different n-grams, until the transposition pattern is found.
|
EXAMPLE:
Consider the ciphertext "HLOOLELWRD." According to a Konheim's digram table [527], the digram "HE" occurs with frequency 0.0305 in English. Of the other possible digrams beginning with "H," the frequency of "HO" is the next highest, at 0.0043, and the digrams "HL," "HW," "HR," and "HD" have frequencies of less than 0.0010. Furthermore, the frequency of "WH" is 0.0026, and the digrams "EH," "LH," "OH," "RH," and "DH" occur with frequencies of 0.0002 or less. This suggests that "E" follows "H." We arrange the letters so that each letter in the first block of five letters (from "H" up to but not including the "E") is adjacent to the corresponding letter in the second block of five letters, as follows.
HE LL OW OR LD
Reading the letters across and down produces "HELLOWORLD." Note that the shape of the arrangement is different from that in the previous example. However, the two arrangements are equivalent, leading to the correct solution. |
8.2.2. Substitution Ciphers
A substitution cipher changes characters in the plaintext to produce the ciphertext.
|
EXAMPLE:
The Caesar cipher discussed earlier had a key of 3, altering each letter in the plaintext by mapping it into the letter three characters later in the alphabet (and circling back to the beginning of the alphabet if needed). This is a substitution cipher. |
A Caesar cipher is susceptible to a statistical ciphertext-only attack.
|
EXAMPLE:
Consider the ciphertext "KHOOR ZRUOG." We first compute the frequency of each letter in the ciphertext:
G | 0.1 | H | 0.1 | K | 0.1 | O | 0.3 | R | 0.2 | U | 0.1 | Z | 0.1 |
We now apply the character-based model. Let f(i) be the correlation of the frequency of each letter in the ciphertext with the character frequencies in English (see Figure 8-1). Let f(c) be the frequency of character c (expressed as a fraction). The formula for this correlation for this ciphertext (with all arithmetic being mod 26) is
f (i) = S0 c 25 f(c)p(c i) = 0.1p(6 i) + 0.1p(7 i) + 0.1p(10 i) + 0.3p(14 i) + 0.2p(17 i) + 0.1p(20 i) + 0.1p(25 i)
This correlation should be a maximum when the key k translates the ciphertext into English. Figure 8-2 shows the values of this function for the values of i. Trying the most likely key first, we obtain as plaintext "EBIIL TLOIA" when i = 6, "AXEEH PHKEW" when i = 10, "HELLO WORLD" when i = 3, and "WTAAD LDGAS" when i = 14.

|
The example above emphasizes the statistical nature of this attack. The statistics indicated that the key was most likely 6, when in fact the correct key was 3. So the attacker must test the results. The statistics simply reduce the number of trials in most cases. Only three trials were needed, as opposed to 13 (the expected number of trials if the keys were simply tried in order).
|
EXAMPLE:
Using Konheim's model of single-character frequencies [527], the most likely keys (in order) are i = 6, i = 10, i = 14, and i = 3. Konheim's frequencies are different than Denning's, and this accounts for the change in the third most probable key. |
8.2.2.1 Vigenère Cipher
A longer key might obscure the statistics. The Vigenère cipher chooses a sequence of keys, represented by a string. The key letters are applied to successive plaintext characters, and when the end of the key is reached, the key starts over. The length of the key is called the period of the cipher. Figure 8-3 shows a tableau, or table, to implement this cipher efficiently. Because this requires several different key letters, this type of cipher is called polyalphabetic.

|
EXAMPLE:
The first line of a limerick is enciphered using the key "BENCH," as follows.
Key | B ENCHBENC HBENC HBENCH BENCHBENCH | Plaintext | A LIMERICK PACKS LAUGHS ANATOMICAL | Ciphertext | B PVOLSMPM WBGXU SBYTJZ BRNVVNMPCS |
|
The index of coincidence measures the differences in the frequencies of the letters in the ciphertext. It is defined as the probability that two randomly chosen letters from the ciphertext will be the same. Let Fc be the frequency of cipher character c, and let N be the length of the ciphertext. It can be shown (see Exercise 7) that the index of coincidence . Figure 8-4 shows the expected values of IC for several periods. The lower the index of coincidence, the less variation in the characters of the ciphertext and (from our model of English) the longer the period of the cipher.

For many years, the Vigenère cipher was considered unbreakable. Then a Prussian cavalry officer named Kasiski noticed that repetitions occur when characters of the key appear over the same characters in the ciphertext. The number of characters between the repetitions is a multiple of the period.
|
EXAMPLE:
Let the message be THE BOY HAS THE BAG and let the key be VIG. Then:
Key | VIGVIGVIGVIGVIG | Plaintext | THEBOYHASTHEBAG | Ciphertext | OPKWWECIYOPKWIM |
In the ciphertext, the string OPK appears twice. Both are caused by the key sequence VIG enciphering the same ciphertext, THE. The ciphertext repetitions are nine characters apart. Hence, 9 is a multiple of the period (which is 3 here). |
We examine the ciphertext for multiple repetitions and tabulate their length and the number of characters between successive repetitions. The period is likely to be a factor of the number of characters between these repetitions. From the repetitions, we establish the probable period, using the index of coincidence to check our deduction. We then tabulate the characters for each key letter separately and solve each as a Caesar cipher.
|
EXAMPLE:
Consider the Vigenère cipher
ADQYS | MIUSB | OXKKT | MIBHK | IZOOO | EQOOG | IFBAG | KAUMF | VVTAA | CIDTW | MOCIO | EQOOG | BMBFV | ZGGWP | CIEKQ | HSNEW | VECNE | DLAAV | RWKXS | VNSVP | HCEUT | QOIOF | MEGJS | WTPCH | AJMOC | HIUIX | | | | | | |
Could this be a Caesar cipher (which is a Vigenère cipher with a key length of 1)? We find that the index of coincidence is 0.043, which indicates a key of length 5 or more. So we assume that the key is of length greater than 1, and apply the Kasiski method. Repetitions of two letters or more are as follows
Letters | Start | End | Gap length | Factors of gap length |
|---|
MI | 5 | 15 | 10 | 2, 5 | OO | 22 | 27 | 5 | 5 | OEQOOG | 24 | 54 | 30 | 2, 3, 5 | FV | 39 | 63 | 24 | 2, 2, 2, 3 | AA | 43 | 87 | 44 | 2, 2, 11 | MOC | 50 | 122 | 72 | 2, 2, 2, 3, 3 | QO | 56 | 105 | 49 | 7, 7 | PC | 69 | 117 | 48 | 2, 2, 2, 2, 3 | NE | 77 | 83 | 6 | 2, 3 | SV | 94 | 97 | 3 | 3 | CH | 118 | 124 | 6 | 2, 3 |
The longest repetition is six characters long; this is unlikely to be a coincidence. The gap between the repetitions is 30. The next longest repetition, MOC, is three characters long and has a gap of 72. The greatest common divisor of 30 and 72 is 6. Of the 11 repetitions, six have gaps with a factor of 6. The only factors that occur more in the gaps are 2 (in eight gaps) and 3 (in seven gaps). As a first guess, let us try 6.
To verify that this is reasonable, we compute the index of coincidence for each alphabet. We first arrange the message into six columns.
A | D | Q | Y | S | M | I | U | S | B | O | X | K | K | T | M | I | B | H | K | I | Z | O | O | O | E | Q | O | O | G | I | F | B | A | G | K | A | U | M | F | V | V | T | A | A | C | I | D | T | W | M | O | C | I | O | E | Q | O | O | G | B | M | B | F | V | Z | G | G | W | P | C | I | E | K | Q | H | S | N | E | W | V | E | C | N | E | D | L | A | A | V | R | W | K | X | S | V | N | S | V | P | H | C | E | U | T | Q | O | I | O | F | M | E | G | J | S | W | T | P | C | H | A | J | M | O | C | H | I | U | I | X | | |
Each column represents one alphabet. The indices of coincidence are as follows.
Alphabet | #1: | IC | = | 0.069 | Alphabet | #4: | IC | = | 0.056 | Alphabet | #2: | IC | = | 0.078 | Alphabet | #5: | IC | = | 0.124 | Alphabet | #3: | IC | = | 0.078 | Alphabet | #6: | IC | = | 0.043 |
All indices of coincidence indicate a single alphabet except for the ICs associated with alphabets #4 (period between 1 and 2) and #6 (period between 5 and 10). Given the statistical nature of the measure, we will assume that these are skewed by the distribution of characters and proceed on the assumption that there are six alphabets, and hence a key of length 6.
Counting characters in each column (alphabet) yields:
Column | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | #1 | 3 | 1 | 0 | 0 | 4 | 0 | 1 | 1 | 3 | 0 | 1 | 0 | 0 | 1 | 3 | 0 | 0 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | #2 | 1 | 0 | 0 | 2 | 2 | 2 | 1 | 0 | 0 | 1 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 4 | 0 | 4 | 0 | 0 | 0 | #3 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 4 | 0 | 0 | 0 | 4 | 0 | 1 | 3 | 0 | 2 | 1 | 0 | 0 | 0 | #4 | 2 | 1 | 1 | 0 | 2 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 4 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | #5 | 1 | 0 | 5 | 0 | 0 | 0 | 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 3 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | #6 | 0 | 1 | 1 | 1 | 0 | 0 | 2 | 2 | 3 | 1 | 1 | 0 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 1 |
An unshifted alphabet has the following characteristics (L meaning low frequency, M meaning moderate frequency, and H meaning high frequency).
|
|