8.4. Cryptographic Checksums
Alice wants to send Bob a message of n bits. She wants Bob to be able to verify that the message he receives is the same one that was sent. So she applies a mathematical function, called a checksum function, to generate a smaller set of k bits from the original n bits. This smaller set is called the checksum or message digest. Alice then sends Bob both the message and the checksum. When Bob gets the message, he recomputes the checksum and compares it with the one Alice sent. If they match, he assumes that the message has not been changed.
|
EXAMPLE:
The parity bit in the ASCII representation is often used as a single-bit checksum. If odd parity is used, the sum of the 1-bits in the ASCII representation of the character, and the parity bit, is odd. Assume that Alice sends Bob the letter "A." In ASCII, the representation of "A" using odd parity is p0111101 in binary, where p represents the parity bit. Because five bits are set, the parity bit is 0 for odd parity.
When Bob gets the message 00111101, he counts the 1-bits in the message. Because this number is odd, Bob knows that the message has arrived unchanged. |
Definition 82.
A cryptographic checksum function (also called a strong hash function or a strong one-way function) h: A B is a function that has the following properties. For any x A, h(x) is easy to compute. For any y B, it is computationally infeasible to find x A such that h(x) = y. It is computationally infeasible to find x, x` A, such that x | x` and h(x) = h(x`). (Such a pair is called a collision.)
The third requirement is often stated as:
Given any x A, it is computationally infeasible to find another x` A such that x | x` and h(x`) = h(x).
However, properties 3 and 4 are subtlely different. It is considerably harder to find an x` meeting the conditions in property 4 than it is to find a pair x and x` meeting the conditions in property 3. To explain why, we need to examine some basics of cryptographic checksum functions.
Given that the checksum contains fewer bits than the message, several messages must produce the same checksum. The best checksum functions have the same number of messages produce each checksum. Furthermore, given any message, the checksum it produces can be determined only by computing the checksum. Such a checksum function acts as a random function.
The size of the output of the cryptographic checksum is an important consideration owing to a mathematical principle called the pigeonhole principle.
Definition 83.
The pigeonhole principle states that if there are n containers for n + 1 objects, at least one container will hold two objects. To understand its application here, consider a cryptographic checksum function that computes hashes of three bits and a set of files each of which contains five bits. This yields 23 = 8 possible hashes for 25 = 32 files. Hence, at least four different files correspond to the same hash.
Now assume that a cryptographic checksum function computes hashes of 128 bits. The probability of finding a message corresponding to a given hash is 2128, but the probability of finding two messages with the same hash (that is, with the value of neither message being constrained) is 264 (see Exercise 20).
Definition 84.
A keyed cryptographic checksum function requires a cryptographic key as part of the computation. A keyless cryptographic checksum does not.
|
EXAMPLE:
The DES in CBC mode can be used as a message authentication code if 64 bits or fewer are required. The message is enciphered, and the last n bits of the last output are the cryptographic hash. Because the DES requires a cryptographic key, this checksum function (called DES-MAC) is a keyed cryptographic checksum function. Because the DES is vulnerable to attack, so is this checksum technique. Furthermore, because the hash is at most 64 bits, finding two inputs that produce the same output would require 232 messages. |
Examples of keyless hash functions include MD2 [489]; MD4 [753]; MD5 [754]; the Secure Hash Algorithm (SHA-1) which produces 160-bit checksums [664, 663]; Snefru (either 128-bit or 256-bit checksums) [622]; and HAVAL, which produces checksums of 128, 160, 192, 224, and 256 bits [963]. Of these, Snefru is vulnerable to differential cryptanalysis if four rounds or fewer are used [92], so Merkle recommends using at least eight passes. Dobbertin devised a method of generating collisions in MD4 [274]; a similar method also works against MD5 but is slower [273].
8.4.1. HMAC
HMAC is a generic term for an algorithm that uses a keyless hash function and a cryptographic key to produce a keyed hash function [531]. This mechanism enables Alice to validate that data Bob sent to her is unchanged in transit. Without the key, anyone could change the data and recompute the message authentication code, and Alice would be none the wiser.
The need for HMAC arose because keyed hash functions are derived from cryptographic algorithms. Many countries restrict the import and export of software that implements such algorithms. They do not restrict software implementing keyless hash functions, because such functions cannot be used to conceal information. Hence, HMAC builds on a keyless hash function using a cryptographic key to create a keyed hash function.
Let h be a keyless hash function that hashes data in blocks of b bytes to produce a hash l bytes long. Let k be a cryptographic key. We assume that the length of k is no greater than b; if it is, use h to hash it to produce a new key of length b. Let k` be the key k padded with bytes containing 0 to make b bytes. Let ipad be a sequence of bytes containing the bits 00110110 and repeated b times; let opad be a similar sequence with the bits 01011100. The HMAC-h function with key k for message m is
HMAC-h(k, m) = h(k´ opad || h(k´ ipad || m))
where is exclusive or and || is concatenation.
Bellare, Canetti, and Krawczyk [65] analyze the security of HMAC and conclude that the strength of HMAC depends on the strength of the hash function h. Various HMAC functions are used in Internet security protocols (see Chapter 10).
|