3.8.2 What is a hash function? What is a message digest?

Next: 3.8.3 What are MD2, Up: 3.8 Misceallenous Previous: 3.8.1 What is the

3.8.2 What is a hash function? What is a message digest?

A hash function is a computation that takes a variable-size input and returns a fixed-size string, which is called the hash value. If the hash function is one-way, i.e., hard to invert, it is also called a message-digest function, and the result is called a message digest. The idea is that a digest represents concisely the longer message or document from which it was computed; one can think of a message digest as a ``digital fingerprint'' of the larger document. Examples of well-known hash functions are MD4, MD5, and SHS (see Questions 3.8.3 and 3.8.4).

Although hash functions in general have many uses in computer programs, in cryptography they are used to generate a small string (the message digest) that can represent securely a much larger string, such as a file or message. Since the hash functions are faster than the signing functions, it is much more efficient to compute a digital signature using a document's message digest, which is small, than using the arbitrarily large document itself. Additionally, a digest can be made public without revealing the contents of the document from which it derives. This is important in digital time-stamping, where, using hash functions, one can get a document time-stamped without revealing its contents to the time-stamping service (see Question 3.3.18).

A hash function used for digital authentication must have certain properties that make it secure enough for cryptographic use. Specifically, it must be infeasible to find a message that hashes to a given value and it must be infeasible to find two distinct messages that hash to the same value. The ability to find a message hashing to a given value would enable an attacker to substitute a fake message for a real message that was signed. It would also enable someone to falsely disown a message by claiming that he or she actually signed a different message hashing to the same value, thus violating the non-repudiation property of digital signatures. The ability to find two distinct messages hashing to the same value could enable an attack whereby someone is tricked into signing a message which hashes to the same value as another message with a quite different meaning. The digest must therefore be long enough to prevent an attacker from doing an exhaustive search for a collision. For example, if a hash function produces 100-bit strings, exhaustive search would take 2¹⁰⁰ attempts on average to match a given value, and approximately 2⁵⁰ attempts on average to find two inputs producing the same digest.

A digital signature system can be broken by attacking either the difficult mathematical problem on which the signature method is based or the hash function used to create the message digests. When choosing an authentication system, it is generally a good idea to choose a signature method and a hash function that require comparable efforts to break; any extra security in one of the two components is wasted, since attacks will be directed at the weaker component. Actually, attacking the hash function is harder in practice, since it requires a large amount of memory and the ability to trick the victim into signing a special message. With 2⁶⁴ operations, an attacker can find two messages that hash to the same digest under any of the MD hash functions; this effort is comparable to that necessary to break 512-bit RSA; thus MD5 is a good choice when using RSA with a 512-bit modulus. However, those with greater security needs, such as certifying authorities, should use a longer modulus and a hash function that produces a longer message digest; either SHS (160-bit digest) or a modified version of MD4 that produces a 256-bit digest would suffice.

Next: 3.8.3 What are MD2, Up: 3.8 Misceallenous Previous: 3.8.1 What is the

Denis Arnaud
12/19/1997