Lecture 25: Leader Election (the political kind)

Announcements

Remaining tasks
- Read MIT Media Lab blog post)
- Challenge 5-6 resubmissions
- Test 3 and Test 4 retries
Final Exam next Tuesday 5/6 at 12-2pm in lecture room
Please complete the course survey form

Recap of the semester

We have covered a lot of material this semester! Throughout this course, we have studied four fundamental topics.

Cryptocurrencies and the blockchains within them, which enable financial transactions in a decentralized way, without the need to give your money to intermediaries (like banks).
Broadcast protocols and agreement protocols in the (a)synchronous setting, which enable consensus over the Internet without needing to rely on intermediaries (like social media websites).
Encryption and its use to enable data sharing only to a specific person or server on the Internet, without the need to give your data to intermediaries (like Internet service providers).
Tools to perform data science without data sharing, like MPC and ZK, which enable data analysis and auditing without the need for trusted intermediaries (like cloud providers).

Each of the cryptographic tools we have built in this course is powerful on their own. And they can become even more powerful when we find clever ways to use them together!

Last time, we saw how to combine Topics 1 and 4 to create private cryptocurrencies.
Today, we will combine Topics 2 and 4: using tools for protecting data in use in a task that requires achieving agreement among people.

Learning outcomes

By the end of today’s lecture, you will learn:

How the tools of cryptography can be used to design election tallying systems.
Why we don’t use (all of) them, and what we do instead.

25.1 Secure Voting

Let’s talk today about political elections… or more specifically, the task of voting and tabulating the votes to determine the winner of the election.

There are many, many different kinds of election systems in regions across the world. For today, let’s think of the simplest type of election. (Many of the ideas I present today will generalize to other forms of elections as well.)

There is a single job that needs to be filled.
There are 2 candidates running for this job.
All voters have equal voting power.
Each voter chooses one of the two candidates.
Whoever receives the most votes wins the election. (This is often called plurality voting.)

Since this is a security course, we are going to explore voting as a security problem.

As a result, I’m going to stick to this super-simple election mechanism and sidestep many other interesting questions about voting that can also be addressed using data science methods, such as:

Which kinds of voting systems produce fairer, more representative outcomes.
Which kinds of election procedures and policies increase voter turnout.
The process of redistricting, or how to choose the specific region that the election winner is going to represent.
…and many more!

Question. Let’s apply the knowledge and intuition that you have gained in this course to the question of political elections.

What are the kinds of security issues that could occur in elections?
Which ones seem the most (or least) concerning to you?
What properties would you expect a “good” election system to provide?

This is a complicated question! One way to approach it is using the CIA triad that we learned in this course. Here is at least one kind of security property that we might want for each part of the triad.

Confidentiality: The election system should provide voters with ballot secrecy. That is, it should not be possible for other people to know who you have voted for, in order to avoid issues with peer pressure and voter intimidation. (This is one reason why, for example, the government sets up voting booths so that you can vote in confidence.)
Integrity: The tallying mechanism should have good data integrity, so that people have confidence that the final count has not been tampered with.
Availability: The voting process should be (1) broadly available so that it is easy for people to participate in the election and (2) censorship-resistant so that nobody can stop citizens from voting.

25.2 Cryptographic protections for ballot secrecy and integrity

“Cryptography is not just about secrets, it enables collaboration without blind trust.”

– Ben Adida

[This section is based on slides from Ben Adida (link1, link2).]

As a math problem, elections boil down to addition. Each voter chooses one of two candidates, and we want to compute the sum of all votes for each candidate.

In other words: we could think of elections as just the problem of secure addition. And we know how to solve that!

This is what cryptographers did when they first studied elections in the early 2000s.

Public-key encryption and homomorphic encryption

Here’s an idea: what if every voter encrypted their ballot and sent it to the election authorities? That would help to protect ballot secrecy.

So far, we have only talked about symmetric encryption between two people who have a shared symmetric key. In elections, we might instead want it to be the case that everyone can encrypt a message to the election authorities, so that only they can open the votes.

This is the idea behind public-key encryption.

Informal description of encryption and digital signatures (source: idea-instructions.com)

We can use the math surrounding the hardness of factoring or (what I’ll show here) discrete logarithms to build public key encryption schemes. Here is a simple but effective scheme called El Gamal encryption.

KeyGen: choose a secret exponent \(x\) as the secret key, and post the public key \(h = g^{x}\).
\(\operatorname{Enc}_{pk}(m)\): choose a random exponent \(r\) and construct the ciphertext \(c = (g^r, m \cdot h^r)\). Send these two group elements to the receiver.
\(\operatorname{Dec}_{sk}(c)\): Parse the ciphertext as \(c = (d, e)\). Compute \(m = \frac{e}{d^x}\).

To see that this scheme is correct, note that a valid encryption will be decrypted as:

\[\frac{e}{d^x} = \frac{m \cdot h^r}{(g^r)^x} = \frac{m \cdot g^{rx}}{g^{rx}} = m.\]

While I won’t prove it today, I claim that this scheme also maintains the secrecy of a message \(m\) against any attacker Mallory who sees ciphertext but does not know the recipient’s secret key.

One clever mathematical property of this encryption scheme is that it is homomorphic. That is: if you take the product of two ciphertexts, the result is the encryption of the product of the two underlying messages!

In math:

\[\operatorname{Enc}_{pk}(m_1) \times \operatorname{Enc}_{pk}(m_2) = (d_1 d_2, e_1 e_2)\]

decrypts to:

\[\frac{e_1 e_2}{(d_1d_2)^x} = \frac{e_1}{(d_1)^x} \cdot \frac{e_2}{(d_2)^x} = \frac{m_1 \cdot h^{r_1}}{(g^{r_1})^x} \cdot \frac{m_2 \cdot h^{r_2}}{(g^{r_2})^x} = m_1 m_2.\]

Note that this multiplication of ciphertexts can be performed even by people who do not have the secret key! This is another form of data protection in use – we can compute over data without being able to read it.

This is a nifty property that symmetric encryption schemes like AES do not have!

But also, it is not immediately clear how we can use this property to our benefit. Why would we want to multiply messages together?

If we make a small tweak to the encryption scheme (basically: put the message \(m\) in the exponent), then I claim we can create a new encryption scheme that is additively homomorphic. By this I mean that anyone in the world can manipulate ciphertexts to produce the sum of the underlying plaintexts:

\[\operatorname{Enc}_{pk}(m_1) \times \operatorname{Enc}_{pk}(m_2) = \operatorname{Enc}_{pk}(m_1 + m_2).\]

Hence, we can add numbers “under the cover” of encryption!

Question. How can we use this property in elections?

One possible answer:

Each voter encrypts their vote as a 0 or 1, depending on the candidate they prefer – using a key that only the election authorities have.
Everyone posts their encrypted ballots publicly, say on a public ledger.
The election authorities can now calculate the encryption of the final vote tally, and decrypt the final result.

Public ledger with encrypted ballots (source: Ben Adida’s slides)

One issue with this method is that the election authorities hold a secret key that allows them to decrypt any individual voter’s ballot too.

The good news is that we know how to solve this problem: just like we can build threshold signatures, we can also build threshold decryption schemes. They distribute trust among many different election authorities, so that a large quorum of them are required to decrypt any ciphertext.

Then, we would only have to trust that there exist a sufficient number of honest election authorities somewhere. Our hope is that honest authorities would only decrypt the final tally, and they would refuse to decrypt an individual ballot.

Threshold decryption of a ballot (source: Ben Adida’s slides)

Zero knowledge proofs of shuffling

Homomorphic encryption has addressed the ballot secrecy issue. But it has not addressed the question of data integrity.

As written, the election authorities could collude to make up a final answer.
Even worse: the public would not know that they have done so.

The good news is that we can address this problem as well, using zero knowledge proofs! In fact, the solution here uses the same technique as with Sudoku puzzles. We want to build a digital equivalent of “shuffling” for the ballots.

Since we’ve already asserted that there are several (say \(N\)) election authorities that we are distributing our trust between, let’s say that we ask each of them to perform a shuffle of all ballots, in turn. This system is called a mixnet.

Here is an alternative strategy. Rather than using homomorphic encryption at all, let’s just use ordinary public key encryption with \(N = 3\) election authorities whose public keys are \(pk_1\), \(pk_2\), and \(pk_3\).

Each voter produces an onion encryption (i.e., a layered encryption) \(c = \operatorname{Enc}_{pk_1}(\operatorname{Enc}_{pk_2}(\operatorname{Enc}_{pk_3}(m)))\), where \(m\) is their vote. They post this ballot to the public ledger.
The first election authority removes the outermost layer of the encryption, and also shuffles the ballots.
The second authority removes the middle layer of the encryption, and also shuffles the ballots.
The third authority removes the final layer of the encryption, shuffles the ballots, and publishes the result.

If the election authorities perform the shuffling honestly and also keep their permutations secret, then:

Everyone sees a pile of votes for each candidate.
But nobody can link the votes to the individual voters!

A set of ballots, after shuffling (source: Ben Adida’s slides)

However, a cheating election authority could perform the shuffle incorrectly, say by

Dropping valid ballots, and
Inserting fake ballots in the middle of the mixnet.

We can solve this problem with a ZK proof, so that the authority proves to the rest of the world that she performed the shuffle correctly. (This works similarly to the Sudoku proof; I’ll skip the math details in the interest of time.)

One benefit of this new approach is that now each voter can check the election authorities’ work to ensure that her vote is included in the final tally. So the resulting system looks more like this:

Public ledger with encrypted ballots, plus verification (source: Ben Adida’s slides)

25.3 Secure voting, revisited

What I’ve just presented to you is the state of cryptographic research into election systems by around 2005 (give or take). However, to my knowledge, these ideas are not present in any mainstream voting system.

Question. Why not?

Answer. Because election systems have many security considerations that go beyond ballot secrecy and data integrity.

I want to return back to the question that I posed at the start of today’s class, about thinking through the security questions involved with political elections.

What are the kinds of security issues that could occur in elections?
Which ones seem the most (or least) concerning to you?
What properties would you expect a “good” election system to provide?

Or maybe we should think of an even more abstract question.

Question. What is the point of an election system (beyond simply computing the correct result)? What would a “good” election system provide?

One possible answer: provide enough evidence to convince the losing candidate (and their voters) that they have lost the election.

“The People have spoken… the bastards!”

– Dick Tuck, 1966 concession speech

Our prior solution is insufficient in at least two ways:

It is not sufficient to detect problems, because there is no obvious recourse to that. Instead, we want to proactively prevent problems from happening.
It is not sufficient for an election to be secure. It must also be the case that people believe that the election is secure.

End-to-end voter verifiability

Voting is a complicated process with many moving pieces.

There is the voter registration process and the government’s work to verify eligibility.
On the day(s) of the election, voters and election authorities interact with many computational and paper systems.
A good election system provides a solid chain of custody throughout this entire process.

For an individual voter, a good election system protects against all the places where their vote could be dropped or miscounted. This is called end-to-end voter verifiability.

A good election system should convince the voters that each step in this process was done correctly. (By contrast, our ZK proofs from before only checked the “counted as collected” part of the voting process.)

“Each property should not only be true, but be verifiably true.”

– Ron Rivest

Many of these systems involve computers. The ballot itself could be recorded with a computer, or it could be recorded on paper and then counted with computers.

Computer	Paper

We as computational and data scientists know easily these systems can become either wonky (e.g., hitting an edge case for which there was no prior testing) or compromised (e.g., with malware). Given the importance of elections, another important security property emerges called software independence:

A voting system is software independent if an (undetected) change or error in its software cannot cause an undetectable change or error in an election outcome.
A voting system is strongly software independent if an (undetected) change or error in its software cannot cause an undetectable change or error in an election outcome, and moreover, a detected change or error in an election outcome (due to change or error in the software) can be corrected without re-running the election.

Paper auditing

One way that we achieve the goal of “no do-overs” is to ensure that close elections can be recounted, potentially by hand if need be.

For this process to be possible, it is important that there be a paper trail. This is why modern voting systems, even those involving a computer, provide a voter-verified paper audit trail.
Also, recall that elections must satisfy two equally important goals: that they be correct, and that people believe that they’re correct.

The cryptographic tools that we have studied in this course, like ZK proofs, do help to build evidence in the correctness of the process. But that evidence can be hard for people to understand and interpret, especially in a political process.

“With cryptography [like zero knowledge proofs], you’re just moving the black box. Few people really understand it or trust it.”

– California Secretary of State, 7/30/2008 (paraphrased)

For this reason:

We need to think about tools for auditing elections that are easy for people to understand – even if they aren’t paying much attention to the process.
This is a new requirement that we didn’t need in our previous examples of protecting data in use, such as the Boston wage gap study or the Sudoku ZK proof example.

Risk-limiting audits

One popular approach nowadays is the idea of risk-limiting audits. The high level idea is to check the vote tallying system to make sure it is returning the correct answer, by picking a small number of ballots at random and comparing the outcome of the machine tally to a manual count.

This can often be quite cheap to do! Consider for example a first-past-the-post election in which candidate A wins 60% of the vote and candidate B wins 40% of the vote.

It isn’t actually important to test that the machine is perfectly correct. That would require a full manual recount.
Instead, you would merely have to check that the machine doesn’t have more than 10% systematic error in one direction. This is a statistics problem, and you can calculate the random sample required to have less than this amount of error; it can usually be as low as a few dozen or hundred ballots.

To conduct the risk-limiting audits, you only need three things. (I’ll show an example of a real risk-limiting audit done in Rhode Island.)

First, you need the stack of paper ballots.

Image source: ABC station in Providence, RI

Second, you need some dice to choose some of the ballots from the pile at random.

And finally, you need cameras to show the process to the electorate. That’s it!

Perhaps one takeaway from this particular lecture is that we’ve gone backwards today: moving away from electronic systems toward their physical counterparts.
But I think a better way to think about election systems is that we are applying all of the concepts and lessons learned in this class, and adding the additional requirement that it needs to be easy to perform and to understand for a physical process like voting.

Summary

Let me end this topic with a quote from Prof. Matt Blaze, a long-time expert on election security.

I often say that election security is by far the hardest technical problem I’ve ever encountered. Why? Four reasons:

Contradictory critical requirements, particularly vote secrecy vs. transparency.

No truly neutral trusted third parties.

Election do-overs are generally impossible, so the ability to merely detect problems is insufficient. You have to reliably prevent them.

Much of the technology than can manage the complexity of elections is inherently untrustworthy.

If you want a broader list of election security properties, this list from Cornell is a good starting point. Here are some highlights that I haven’t had time to cover in today’s class.

Non-coercibility: Voters should not be able to prove how they voted, even if they wanted to! In particular, voters should not receive a receipt of their votes. Otherwise, they could be coerced or bribed.
Collusion resistance: Voter privacy must be assured even if everything fails, everyone colludes, and there is a court order to reveal all election data. In particular, vote secrecy must not depend only on communication protocol and cryptographic assumptions.
Reliability: Even in the event of numerous or major failures (such as loss of Internet communication), the system should be robust, and no loss of vote should happen.

Thanks!

I hope you have enjoyed this course as much as Tej and I have enjoyed teaching it. We look forward to seeing all of the things you build with your newfound knowledge!

Just a few reminders as the semester comes to a close:

Please complete the BU course survey form.
Remember that the final exam is next Tuesday 5/6.
If you are not graduating and would like to be a TA for DS 653 next fall, send me an email.