Public key cryptography
Recap on computer security
Some concerns of computer security are:
- data privacy/confidentiality
- data integrity
- service availability
- user authenticity
Symmetric key encryption helps provide the first one, but leaves us
with the problem of how to communicate the key securely. Hash functions
give us a way to verify data integrity: I recalculate the hash and
check
it. But how can I be sure that the one I am checking against is
the right one? Public-key cryptography addresses both these problems.
It
also gives us a solution to the fourth issue, user authenticity.
The problem with symmetric-key encryption
Alica and Bob want to communicate securely. As usual, we assume all the
messages are potentially overheard, and may be intercepted and replaced
with fakes. A & B can communicate securely using symmetric key
crypto. They agree on a key in advance, and all messages between them
are encrypted using that key. Lack of knowledge of the key prevents
intruders from making use of overheard messages, or introducing
convincing fake messages. The intruder can prevent communication by
intercepting messages and/or inserting garbage, but cannot change it in
a way which would fool A & B.
The problem is: how can A&B agree the key in advance? They might
use a different channel for the key, such as a telephone call, a
meeting, etc. They can even split the key into 5 chunks and send each
chunk by a separate channel. However, these options are ridiculously
insecure (if the potential intruder has millions of dollars of
resources) and ridiculously inconvenient if Alice is just buying a book
from Bob (say, Amazon).
Public-key encryption
In public-key encryption, there are two keys: one is used to encrypt,
called the public key (PK), and the other one is used to decrypt,
called the
private key or secret key (PK-1).
ciphertext = encrypt( plaintext, PK )
plaintext = decrypt( ciphertext, PK-1
)
An alternative notation is to write
ciphertext = { plaintext }PK
plaintext = { ciphertext }PK-1
provided that (as is typically the case), encryption and decryption are
actually the same operation (only using different keys).
Alice lets her
public key be known to everyone, but keeps
the private key secret. Bob may send a confidential message to
Alice
like this:
- B gets A's public key (he can get it from her web page, or she
can just send it to him).
- B encrypts the message with A's public key, and sends it.
- A decrypts the message with her private key.
For this to work, the system must guarantee that it is (effectively)
impossible to decrypt the message without knowledge of the private key.
In particular, it must be impossible to decrypt using the public key,
or
to derive the private key from the public key.
There are many algorithms for public key encryption. They all rely on
it being computationally infeasible for the intruder to reverse the
encryption by the public key, even though he has enough information to
do it. Because he has all the information he needs, public key crypto
appears to be less secure than symmetric key crypto. In symmetric key
crypto, the cryptanalyst cannot be sure he has found the right plain
text. (If it is really a "plain text" in ASCII, he can recognise it;
but
what if it is just some binary?) In public key crypto, he can be
absolutely sure it is the right plaintext, if he can find it (to verify
it, he simply encrypts it with the public key and compares). However,
PK crypto is secure provided it is computationally unfeasible to
reverse the encryption.
The concept of PK crypto was invented by Whitfield Diffie and Martin
Hellman, and independently by Ralph Merkle. Ralph Merkle and Martin
Hellman invented the first known public key crypto system in 1974 (see
[1]). It was based on puzzles that were easier to solve if you knew the
private key, than if you didn't. But the security of their system was
eroded; circumstances were found in which parts of the encryption could
be reversed, in a sequence of results spanning 10 years. Finally, Adi
Shamir found a general way to break it in 1982, but by then there were
many other PK crypto systems. In recent years, GCHQ has revealed that
its James Ellis had the concept of PK in 1965 and its Clifford Cocks
had
an algorithm in 1973, but were unable to publish them because of GCHQ
classification.
Encrypting a session key
Typically, public-key crypto algorithms are much slower than symmetric
key algorithms. Therefore, we usually use PK crypto just to communicate
a symmetric key for the current session, and then use that session key
for the encryption of the messages we want to transmit.
Signing with public keys
PK crypto can be used for digitally signing messages, too.
signedMessage = { message }PK-1
This can only be done by the owner of the keypair (PK, PK-1).
But anyone can verify the signature, by computing
message = { signedMessage }PK
Digital signatures are better than hand-written ones, because they
guarantee integrity of the message.
It is not necessary to sign the complete message. Sending the plaintext
message together with a signed hash of it
is enough. This means that the signature part is a fixed length
independent of the message size.
Key security and key distribution
When you generate a key pair, you need to keep the private key part
secret. You can and should distribute the public key part widely. If
Alice wants to send an encrypted message to Bob, she will encrypt it
using his public key. But how can she reliably find out his public key?
If an intruder I could persuade Alice that his (I's) key was Bob's key,
then he could launch a man-in-the-middle attack. When Alice sends a
message, he decrypts it, reads it, encrypts it with Bob's real key, and
sends it on. Neither A nor B will be aware of his attack. This attack
will also enable I apparently to sign messages on B's behalf. Since
everyone thinks I's public key is B's public key, they will accept as
genuine messages from Bob messages which are signed by I's private key.
To persuade A that I's key is B's key, I can hack into B's website or
wherever else B stores his key, and put his own key there. So it would
be better if B gave his key in person to A, so that A can really be
sure it is his key.
RSA
RSA was invented in 1978 by Ron Rivest, Adi Shamir, and Leonard
Adleman. It gets its security from the difficulty of factoring large
numbers. RSA is just one algorithm for public-key crypto. There
are many others. But some of them have been found insecure, and others
of them are impractical, because the keys are too long or the
ciphertext
is much longer than the plain text. Still others are good only for
encryption, or only for signing. According to Schneier, only three
algorithms are thought secure, practical, and good for both signing and
encrypting. They are RSA, ElGamal, and Rabin's algorithm. We study RSA.
Some mathematical background
A number is prime if it can be divided only by itself and 1. For
example, 7, 11, 101 are prime, but 15 is not prime because it can be
divided by 3, and 1111 is not because it can be divided by 11. There
are
infinitely many prime numbers, or, to put it another way, you can find
prime numbers as large as you like. Two numbers are relatively prime if
they have no common factors except 1. For example, 15 and 49 are
relatively prime, even though neither of them is prime. Obviously, any
prime number is relatively prime to all numbers except its multiples.
How many relatively prime numbers to n are there, which are less than
n? This number is called "Euler's totient" and is written phi(n). If
n=57, all 56 of the numbers less than n are relatively prime to n,
because n is prime. So phi(57)=56. Fact:
if n is the composite pq, where p and q are prime, then phi(n) =
(p-1)(q-1). For example, suppose n=35, which is 5*7. Then
phi(n)=4*6=24, so there are relative primes to n below n, namely
{1,2,3,4,6,8,9,11,12,13,16,17,18,19,22,23,24,26,27,29,31,32,33,34}.
Modulo arithmetic. In "mod n"
arithmetic, all numbers are reduced to their remainder on division by
n.
For example, we are used to working in "mod 256", where (for example)
250 + 10 = 4 (mod 256), and 100 * 4 = 144 (mod 256), and 1002
= 16, 1003=64, 1004=0 (mod 256). Fact: aphi(n) = 1 (mod n)
Running RSA
Generating the public and private keys.
Pick two large prime numbers, p and q. Let n=pq. Typically, n is a 1024
bit number. Pick e relatively
prime
to (p-1)(q-1). Now find d such that ed=1 mod (p-1)(q-1). There is an
algorithm which will find this d for you. The pair of numbers (e, n) is
the public key. The pair of numbers (d, n) is the private key. The
two primes p,q are no longer needed, and can be discarded, but should
never be revealed.
Exercise. Which of the
following key pairs are valid?
- K=(3,99), K-1=(27,99)
- K=(7,187), K-1=(23,187)
- K=(23,187), K-1=(7,187)
- K=(7,143), K-1=(23,143)
Can you invert the key (7,147)?
Message format. Divide the
message into blocks, each block corresponding to a number less than n.
For example, for binary data, the blocks will be (log2 n)
bits.
Encryption. The encryption of
message m is c = me mod n.
Decryption. To decrypt c,
put m' = cd mod n.
Why it works.
m'
|
=
|
cd
|
(mod n)
|
|
by definition of decryption
|
|
=
|
med
|
(mod n) |
|
by definition of encryption
|
|
=
|
mk(p-1)(q-1) + 1
|
(mod n) |
|
since ed = k(p-1)(q-1) + 1, some
k |
|
=
|
mk phi(n) + 1
|
(mod n) |
|
Fact 1
|
|
=
|
(mphi(n) )k m |
(mod n) |
|
elementary equivalences
|
|
=
|
m
|
(mod n) |
|
Fact 2
|
Exercise. Encode the message
88 with the key (7,187). (Details [2],p.271, or similar calculation
[1],p.468.)
How to break RSA. Since the
intruder has e and n, all she has to do is: factor n into its primes
p,q
(there is only one way of doing that). Next, find d such that ed = 1
mod
(p-1)(q-1), using the same algorithm as the encrypter used. Then,
decrypt
the message using d. Yes, this works, and if you can do it, you can
break RSA. The only
difficulty
is in the first step: finding p,q such that pq=n. If n is a 1024 bit
number (i.e. 308 decimal digits) then it is likely to take you 10
million mips-years [1, p.161]. A 1GHz Pentium has about 500 mips power,
so it will take 20,000 years. Nevertheless, Schneier recommends moving
to 2048 bit keys in 2005. Such a number requires 1014
mips-years to factor. If you harnessed the power of 100 million
computers on the internet (each providing 500 mips) on the internet,
you
could factor it in about 2000 years.
That is the brute-force way of attacking RSA. But perhaps there are
other more subtle ways, such as finding new theorems about number
theory
which would help you factor large numbers. RSA has withstood decades of
extensive cryptanalysis which suggests that we can be confident in its
security, but there is no proof. (Presumably, someone who discovered a
crack for RSA would not reveal it.) Advances in number theory, in
computational complexity, and in computer technology, will all continue
to erode confidence in RSA by improving our ability to factor large
numbers, although every time we increase the key length by 30 bits we
make this a billion times harder [2^30=1billion]. There is some
possibility that a quantum computer could be built to factorise numbers
(IBM
research article).
Other ways to break RSA. Broadly
speaking, there are three ways:
- Break the crypto, either by computer technology or mathematical
insight; see above.
- Break the way the crypto is used; see next lecture;
- Extract the private key by bribery, blackmail, extortion, etc.
(This way is the most popular because it is the most reliable.)
References
[1] Bruce Schneier, Applied
Cryptography. Second Edition, J. Wiley and Sons, 1996.
[2] William Stallings, Cryptography
and Network Security, Principles and Practice, Prentice Hall, 1999.