Also as we will see, encryption schemes are randomized: if you encrypt the same message twice, you will get different results. Hence, encryption requires good sources of randomness.
The goal of Part 3 of this course is to study encryption and its role in:
Protecting data at rest on your laptop or phone
Protecting data in transit over the Internet or messaging app
The power of encryption
Historically, encryption schemes have been designed and used by governments. It played a significant role in both of the world wars in the 20th century.
Today, encryption allows for confidential data exchange over the Internet. Alice can send a chat message, email, file, or database across the internet so that only Bob can read it.
Alice
Mallory
Bob
\(\longrightarrow\)
\(\longrightarrow\)
Remember that “Mallory” is just a stand-in for whoever Alice and Bob might consider to be their adversary.
Question. In the real world, who might be trying to snoop in on their conversation?
Since the Internet is just “the world passing notes in a classroom,” this might be a more realistic picture of the Internet.
Internet communication
Even though the entities in the middle of the picture have significantly more computing power, encryption allows Alice and Bob to communicate securely because they have one thing that the others do not: a cryptographic key.
Internet communication, with encryption
As a result, encryption is a powerful tool with immense social consequences.
“Cryptography rearranges power: it configures who can do what, from what.”
– Prof. Phillip Rogaway (UC Davis)
In particular, having the secret key gives someone the power to decrypt and read a message.
Encryption on the Internet is often performed in one of two ways.
Client-server encryption
End-to-end encryption
Alice encrypts her message with the tech company’s key (and hence, they can read it). Separately, the tech company encrypts a message to Bob.
Alice encrypts her message directly to Bob. While the message passes through the server, the tech company cannot read it.
14.2 Symmetric key cryptography
It is possible to construct public key encryption based on hard math problems like factoring and discrete logarithms.
But we will not study public key encryption in this course. In fact, I’m going to make a bold claim: even though it exists, in nearly all data science applications you should not use public key encryption!
I say this for two reasons:
Public key encryption allows anyone on the Internet to send you messages. In many applications though, you know exactly who is going to send you a message, so we can build cryptosystems that leverage this fact.
Losing control of the secret key for an encryption scheme has immediate, irrecoverable consequences.
Security over time
Let’s think about what happens in practice if Alice wants to connect to a website like kaggle.com.
Alice
kaggle.com
\(\quad \Longleftrightarrow \quad\)
Question. What happens if the website loses control of its secret key?
Forward and backward secrecy
With a digital signature scheme, Mallory can forge messages as the website in the future. In response, the website can revoke its key – that is, tell the world not to trust it anymore.
With a public key encryption scheme, Mallory can read messages sent to the website in the past. There is no way to recover from this data breach.
One key, rather than two
Fortunately, we do not need something as sophisticated as public key encryption, which allows anyone to send a message to the recipient.
In this case:
Alice and the website know that they are talking to each other!
When they first establish a connection, they can use Diffie-Hellman key exchange to agree on a secret that both of them know.
So we can use a simpler primitive: a symmetric key cryptosystem, which has only one key that can turn the lock to the left and to the right.
A message authentication code allows two parties to send messages to each other with integrity. It is the symmetric-key version of a signature scheme, where each party can both sign and verify.
A symmetric key encryption allows two parties to send messages to each other with confidentiality. Each party can both encrypt and decrypt.
Looking ahead: these primitives are nearly always used together. So for convenience, we will construct a combined primitive called authenticated encryption that performs both of these tasks at once.
“Confidentiality xor authenticity is not possible. If you don’t have both, often you don’t have either.”
– Prof. Matthew Green (Johns Hopkins)
For today though, let’s focus on encryption by itself.
Goals and non-goals
Let’s discuss the objectives of symmetric key encryption.
Symmetric key encryption must:
Hide the contents of Alice and Bob’s messages
Even if the adversary Mallory has full control over the network and can add, drop, alter, or re-order packets
Optionally: even if Mallory takes over Alice or Bob’s computer for some time, we might still want to provide confidentiality at other times
However, encryption on its own is not designed to hide “metadata.”
Encryption doesn’t provide anonymity: Mallory knows that Alice and Bob are communicating
Encryption doesn’t hide message length: Mallory sees how much data is flowing across the wire
Companies that use encryption sometimes forget about this fact!
Example 1. Here is a real interaction between Sen. Ron Wyden and Match, the parent company of Tinder.
Example 2. A company called Voatz tried to build an internet-based voting system for political elections… which is actually a terrible idea for many reasons that we will discuss toward the end of the semester. But they failed at the very first step:
“The length of the encrypted packet clearly leaks which candidate was selected.”
– Michael A. Specter, James Koppel, and Daniel Weitzner (link to source)
14.3 Defining symmetric encryption
Like signatures, an encryption scheme has three methods: KeyGen, Enc, and Dec.
KeyGen: Alice and Bob jointly sample a symmetric key \(k\) from a distribution \(K\) (say, a uniformly random bytestring of fixed length)
Enc\((k, P; r) \to C\): Alice transforms plaintext message \(P\) into a ciphertext \(C\), with one-time randomness \(R\)
Dec\((k, C) \to P\): Bob transforms the ciphertext back into the original plaintext message \(P\)
Encryption satisfies two properties.
Correctness: Dec undoes the action of Enc. That is, for any key \(k\), message \(P\), and randomness \(R\) it holds that \[\text{Dec}(k, \text{Enc}(k, P; R)) = P.\]
Secrecy (informally): If an attacker Mallory sees the ciphertext \(C\) but not the symmetric key \(k\), then she has no chance of learning any part of the plaintext message \(P\).
Question. How can we make the security requirement precise? In particular:
What does it mean to learn “part” of the message \(P\)?
How can we take advantage of the randomness \(R\) when defining and constructing an encryption scheme?
Perfect secrecy
Today, we will study one way to formalize the secrecy guarantee.
Definition. An encryption scheme satisfies perfect secrecy if for any two messages \(P_0\) and \(P_1\), and for any ciphertext \(C\),
where this probability is taken over the random choice of the key \(k \gets K\).
Here is another way to think about perfect secrecy.
If KeyGen samples keys uniformly at random, then perfect secrecy says that for any two messages \(P_0\) and \(P_1\), there are the same number of keys that will map \(P_0\) to \(C\) and that will map \(P_1\) to \(C\).
As a consequence, no amount of computing power is ever enough to recover the real message! Even if Mallory had the energy of the sun, it would not help.
Even a brute force attack will fail! Or more precisely, it willl find the correct message but also all possible incorrect messages, and will have no way to detect the difference.
14.4 One-time pads
Let’s build an encryption scheme that satisfies perfect secrecy.
Recall that Mallory can add, drop, alter, or re-order Alice’s messages. This is admittedly a powerful adversary to thwart.
So for today, we will consider a simpler adversary that only acts as a passive eavesdropper but does not change the contents of Alice’s messages. We call this adversary Eve.
Question. How can Alice encode messages so Mallory cannot read them?
To improve security, give each character its own key
To make the cipher computer-friendly, let’s think about bits instead of characters.
The xor logical operation measures whether 2 inputs are identical. Given two bits \(a\) and \(b\), we can calculate their logical xor\(a \oplus b\) with this truth table.
\(\bigoplus\)
0
1
0
0
1
1
1
0
The one-time pad cipher has a simple construction: the ciphertext is the XOR of the key and the message.
OTP “masks” private message by applying Caesar cipher independently to each bit
Since XOR is a lossless function, this encryption scheme is invertible – in fact, XOR is its own inverse.
Theorem. The one-time pad satisfies perfect secrecy!
There is a proof of this theorem in Section 2.1 of Boneh-Shoup, but I encourage you to try to prove this claim on your own before looking there. Remember that perfect secrecy says that for any two plaintext messages \(P_0\) and \(P_1\) and any ciphertext \(C\),
Even though it satisfies perfect secrecy, the one-time pad is rarely used on its own in practice today for a few reasons:
The symmetric key \(k\) must be as long as the plaintext message \(P\).
Reusing the same key for two or more messages breaks perfect security, and brings back the cryptograms issue.
The cipher is malleable! While OTP is safe against a passive eavesdropper Eve, an active attacker Mallory can manipulate the ciphertext in transit and cause the same change in the decrypted plaintext.