Computer Security lecture notes Copyright © 2004 Mark Dermot Ryan
The University of Birmingham
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,

Public key cryptography

Recap on computer security

Some concerns of computer security are:
Symmetric key encryption helps provide the first one, but leaves us with the problem of how to communicate the key securely. Hash functions give us a way to verify data integrity: I recalculate the hash and check it.  But how can I be sure that the one I am checking against is the right one? Public-key cryptography addresses both these problems. It also gives us a solution to the fourth issue, user authenticity.

The problem with symmetric-key encryption

Alica and Bob want to communicate securely. As usual, we assume all the messages are potentially overheard, and may be intercepted and replaced with fakes. A & B can communicate securely using symmetric key crypto. They agree on a key in advance, and all messages between them are encrypted using that key. Lack of knowledge of the key prevents intruders from making use of overheard messages, or introducing convincing fake messages. The intruder can prevent communication by intercepting messages and/or inserting garbage, but cannot change it in a way which would fool A & B.

The problem is: how can A&B agree the key in advance? They might use a different channel for the key, such as a telephone call, a meeting, etc. They can even split the key into 5 chunks and send each chunk by a separate channel. However, these options are ridiculously insecure (if the potential intruder has millions of dollars of resources) and ridiculously inconvenient if Alice is just buying a book from Bob (say, Amazon).

Public-key encryption

In public-key encryption, there are two keys: one is used to encrypt, called the public key (PK), and the other one is used to decrypt, called the private key or secret key (PK-1).
ciphertext = encrypt( plaintext, PK )
plaintext = decrypt( ciphertext, PK-1 )

An alternative notation is to write
ciphertext = { plaintext }PK
plaintext = { ciphertext }PK-1
provided that (as is typically the case), encryption and decryption are actually the same operation (only using different keys).

Alice lets her public key be known to everyone, but keeps the private key secret. Bob may send a confidential message to Alice like this:
  1. B gets A's public key (he can get it from her web page, or she can just send it to him).
  2. B encrypts the message with A's public key, and sends it.
  3. A decrypts the message with her private key.
For this to work, the system must guarantee that it is (effectively) impossible to decrypt the message without knowledge of the private key. In particular, it must be impossible to decrypt using the public key, or to derive the private key from the public key.

There are many algorithms for public key encryption. They all rely on it being computationally infeasible for the intruder to reverse the encryption by the public key, even though he has enough information to do it. Because he has all the information he needs, public key crypto appears to be less secure than symmetric key crypto. In symmetric key crypto, the cryptanalyst cannot be sure he has found the right plain text. (If it is really a "plain text" in ASCII, he can recognise it; but what if it is just some binary?) In public key crypto, he can be absolutely sure it is the right plaintext, if he can find it (to verify it, he simply encrypts it with the public key and compares). However, PK crypto is secure provided it is computationally unfeasible to reverse the encryption.

The concept of PK crypto was invented by Whitfield Diffie and Martin Hellman, and independently by Ralph Merkle. Ralph Merkle and Martin Hellman invented the first known public key crypto system in 1974 (see [1]). It was based on puzzles that were easier to solve if you knew the private key, than if you didn't. But the security of their system was eroded; circumstances were found in which parts of the encryption could be reversed, in a sequence of results spanning 10 years. Finally, Adi Shamir found a general way to break it in 1982, but by then there were many other PK crypto systems. In recent years, GCHQ has revealed that its James Ellis had the concept of PK in 1965 and its Clifford Cocks had an algorithm in 1973, but were unable to publish them because of GCHQ classification.

Encrypting a session key

Typically, public-key crypto algorithms are much slower than symmetric key algorithms. Therefore, we usually use PK crypto just to communicate a symmetric key for the current session, and then use that session key for the encryption of the messages we want to transmit.

Signing with public keys

PK crypto can be used for digitally signing messages, too.
signedMessage = { message }PK-1

This can only be done by the owner of the keypair (PK, PK-1). But anyone can verify the signature, by computing
message = { signedMessage }PK

Digital signatures are better than hand-written ones, because they guarantee integrity of the message.

It is not necessary to sign the complete message. Sending the plaintext message together with a signed hash of it is enough. This means that the signature part is a fixed length independent of the message size.

Key security and key distribution

When you generate a key pair, you need to keep the private key part secret. You can and should distribute the public key part widely. If Alice wants to send an encrypted message to Bob, she will encrypt it using his public key. But how can she reliably find out his public key? If an intruder I could persuade Alice that his (I's) key was Bob's key, then he could launch a man-in-the-middle attack. When Alice sends a message, he decrypts it, reads it, encrypts it with Bob's real key, and sends it on. Neither A nor B will be aware of his attack. This attack will also enable I apparently to sign messages on B's behalf. Since everyone thinks I's public key is B's public key, they will accept as genuine messages from Bob messages which are signed by I's private key.

To persuade A that I's key is B's key, I can hack into B's website or wherever else B stores his key, and put his own key there. So it would be better if B gave his key in person to A, so that A can really be sure it is his key.

RSA

RSA was invented in 1978 by Ron Rivest, Adi Shamir, and Leonard Adleman. It gets its security from the difficulty of factoring large numbers.  RSA is just one algorithm for public-key crypto. There are many others. But some of them have been found insecure, and others of them are impractical, because the keys are too long or the ciphertext is much longer than the plain text. Still others are good only for encryption, or only for signing. According to Schneier, only three algorithms are thought secure, practical, and good for both signing and encrypting. They are RSA, ElGamal, and Rabin's algorithm. We study RSA.

Some mathematical background

A number is prime if it can be divided only by itself and 1. For example, 7, 11, 101 are prime, but 15 is not prime because it can be divided by 3, and 1111 is not because it can be divided by 11. There are infinitely many prime numbers, or, to put it another way, you can find prime numbers as large as you like. Two numbers are relatively prime if they have no common factors except 1. For example, 15 and 49 are relatively prime, even though neither of them is prime. Obviously, any prime number is relatively prime to all numbers except its multiples.

How many relatively prime numbers to n are there, which are less than n? This number is called "Euler's totient" and is written phi(n). If n=57, all 56 of the numbers less than n are relatively prime to n, because n is prime. So phi(57)=56. Fact: if n is the composite pq, where p and q are prime, then phi(n) = (p-1)(q-1). For example, suppose n=35, which is 5*7. Then phi(n)=4*6=24, so there are relative primes to n below n, namely {1,2,3,4,6,8,9,11,12,13,16,17,18,19,22,23,24,26,27,29,31,32,33,34}.

Modulo arithmetic. In "mod n" arithmetic, all numbers are reduced to their remainder on division by n. For example, we are used to working in "mod 256", where (for example) 250 + 10 = 4 (mod 256), and 100 * 4 = 144 (mod 256), and 1002 = 16, 1003=64, 1004=0 (mod 256).  Fact: aphi(n) = 1 (mod n)

Running RSA

Generating the public and private keys. Pick two large prime numbers, p and q. Let n=pq. Typically, n is a 1024 bit number. Pick e relatively prime to (p-1)(q-1). Now find d such that ed=1 mod (p-1)(q-1). There is an algorithm which will find this d for you. The pair of numbers (e, n) is the public key. The pair of numbers (d, n) is the private key. The two primes p,q are no longer needed, and can be discarded, but should never be revealed.

Exercise. Which of the following key pairs are valid?
  1. K=(3,99), K-1=(27,99)
  2. K=(7,187), K-1=(23,187)
  3. K=(23,187), K-1=(7,187)
  4. K=(7,143), K-1=(23,143)
Can you invert the key (7,147)?

Message format. Divide the message into blocks, each block corresponding to a number less than n. For example, for binary data, the blocks will be (log2 n) bits.
Encryption. The encryption of message m is  c = me mod n.
Decryption. To decrypt c,  put m' = cd mod n.

Why it works.

m'
=
cd
(mod n)
 
by definition of decryption

=
med
(mod n)
by definition of encryption

=
mk(p-1)(q-1) + 1
(mod n)
since ed = k(p-1)(q-1) + 1, some k

=
mk phi(n) + 1
(mod n)
Fact 1

=
(mphi(n) )k m  (mod n)
elementary equivalences

=
m
(mod n)
Fact 2

Exercise. Encode the message 88 with the key (7,187). (Details [2],p.271, or similar calculation [1],p.468.)

How to break RSA.
Since the intruder has e and n, all she has to do is: factor n into its primes p,q (there is only one way of doing that). Next, find d such that ed = 1 mod (p-1)(q-1), using the same algorithm as the encrypter used. Then, decrypt the message using d. Yes, this works, and if you can do it, you can break RSA. The only difficulty is in the first step: finding p,q such that pq=n. If n is a 1024 bit number (i.e. 308 decimal digits) then it is likely to take you 10 million mips-years [1, p.161]. A 1GHz Pentium has about 500 mips power, so it will take 20,000 years. Nevertheless, Schneier recommends moving to 2048 bit keys in 2005. Such a number requires 1014 mips-years to factor. If you harnessed the power of 100 million computers on the internet (each providing 500 mips) on the internet, you could factor it in about 2000 years.

That is the brute-force way of attacking RSA. But perhaps there are other more subtle ways, such as finding new theorems about number theory which would help you factor large numbers. RSA has withstood decades of extensive cryptanalysis which suggests that we can be confident in its security, but there is no proof. (Presumably, someone who discovered a crack for RSA would not reveal it.) Advances in number theory, in computational complexity, and in computer technology, will all continue to erode confidence in RSA by improving our ability to factor large numbers, although every time we increase the key length by 30 bits we make this a billion times harder [2^30=1billion]. There is some possibility that a quantum computer could be built to factorise numbers (IBM research article).

Other ways to break RSA.  Broadly speaking, there are three ways:
  1. Break the crypto, either by computer technology or mathematical insight; see above.
  2. Break the way the crypto is used; see next lecture;
  3. Extract the private key by bribery, blackmail, extortion, etc. (This way is the most popular because it is the most reliable.)

References

[1] Bruce Schneier, Applied Cryptography. Second Edition, J. Wiley and Sons, 1996.
[2] William Stallings, Cryptography and Network Security, Principles and Practice,  Prentice Hall, 1999.