Lecture 23: Private Cryptocurrencies

Learning outcomes

By the end of today’s lecture, you will learn:

The difference between pseudonymous and anonymous financial payments.
How Zcash uses zero knowledge proofs to build “shielded transactions” that hide the sender, receiver, and amount of a transaction.
How Mina uses recursive zero knowledge proofs to design a succinct blockchain.

Announcements

This week’s reading: NBFMG Chapter 6
Due this week
- Challenge 5-6 submissions (due Friday)
- Homework 10 (due Saturday)
- Lab 10-11 worksheet (due Sunday)
In-person meetings
- Remaining labs are canceled (do the worksheet instead to prepare for Test 4)
- Thursday 4/25: in-person-only guest lecture by Prof. Andy Sellars

Recap of the semester

Throughout this course, we have studied four fundamental topics.

Cryptocurrencies and the blockchains within them, which enable financial transactions in a decentralized way, without the need to give your money to intermediaries (like banks).
Broadcast protocols and agreement protocols in the (a)synchronous setting, which enable consensus over the Internet without needing to rely on intermediaries (like social media websites).
Encryption and its use to enable data sharing only to a specific person or server on the Internet, without the need to give your data to intermediaries (like Internet service providers).
Tools to perform data science without data sharing.
- Secure multi-party computation protocols enable data analysis without the need to give your data to trusted intermediaries (like cloud providers).
- Zero knowledge proofs enable the world to hold an entity (such as a bank, leader, cloud providers, or any one of us) accountable, without the need for trusted intermediaries (like auditors).

Before spring break we focused on integrity and availability. For the last few weeks, we have also considered the importance of data confidentiality. Still though, hopefully you see a theme across all four topics:

The tools in this course allow for data-driven decision making, without the need to place blind trust in another company or person.

To be clear: it’s perfectly fine to trust other people!

At a personal level: forming relationships is a fundamental part of being human.
As a society: there is immense value to be gained by building civic, economic, legal, and political structures and then working to demonstrate their trustworthiness.

But it’s not so good if you are forced to trust a random intermediary just to get things done (either in the physical or digital world), and this problem is exacerbated if you cannot verify their behavior to know whether they are worthy of your trust.

By using these and other tools for data security and privacy, the goal is to:

Reduce the level of trust required in other entities.
Understand exactly what you are trusting other entities to do (i.e., what could go wrong, and how many of them need to collude in what ways for a bad outcome to occur).
Verify that trusted entities are acting correctly.

“Cryptography is how people get things done when they need one another, don’t fully trust one another, and have adversaries actively trying to screw things up.”

– Ben Adida

These four topics of cryptocurrencies/blockchains, broadcast & consensus algorithms, encrypted communication, and protecting data in use are all of the topics I wanted to cover in this course. So in some sense, we are done!

…Well, not quite. So far we have studied each of these topics in isolation. For the last few classes of this semester, I want to look at:

what happens when we combine them together to build really powerful data science tools, and
the impact that cryptography can have on society (in both good and bad ways).

To the latter point: we have a guest lecture on Thursday, April 24 by Prof. Andy Sellars of the BU Law school to talk about the connection between cryptography and the law. This lecture will not be recorded, so please attend in person.

22.4 zk-SNARKs

Recall that a ZK proof allows a prover Peggy to convince verifier Victor that a statement is true, without revealing the evidence or reason why. It satisfies three properties: completeness, soundness, and zero-knowledgeness.

Prover Peggy\((x, w)\)		Verifier Victor\((x)\)
	\(\begin{array}{c} \underset{\longleftarrow}{\pi} \\ \longrightarrow \\ \vdots \\ \longrightarrow \end{array}\)
		output: Accept or Reject

Let me give you a brief glimpse into the cutting-edge of ZK. There has been a substantial amount of work over the past decade to construct zk-SNARKs. These are:

zero knowledge
succinct
non-interactive
arguments of knowledge.

The word “snark” comes from a poem by Lewis Carroll, in which several people are hunting a fictional animal that he calls a snark.

Also: for the purposes of this class, you should think of the words “proof” and “argument” as synonyms.

These proofs are very small, say just a few hundred bytes in size. This is true even if:

the analysis involves a large volume of data (say, gigabytes or terabytes), and/or
the analysis itself is computationally intensive (say, a high-performance machine learning algorithm).

Here are some potential applications of zk-SNARKs:

A bank can prove that from all of their transactions over the past year, they did not transact with some entity \(Y\)… but without revealing their full transaction log.
Outsourcing a large computation to the cloud, and then validating that the result is correct without redoing the work yourself.
Building private cryptocurrencies (our task for later today!)

Why this is hard

Let me give you a glimpse into how zk-SNARKs are built. The main challenge is to provide succinctness and soundness together.

As a thought experiment: let’s think about what you might do to verify a computation done by someone else, if ZK and succinctness were not needed. Here is one approach:

When the untrusted cloud performs the computation, you can also ask them to write down their work at each step.
After they’re done, you can review all of their work and check that they’ve done it correctly.

Example. Suppose that you outsource to the cloud the act of calculating \(2^{20}\). For now, let’s pretend that the only way to calculate this number is to multiply a register by 2 a total of \(20\) times.

Suppose the cloud messes up in one step of the calculation, and here is the work that they show you.
Rather than calculating \(8192 \times 2 = 16,384\) the cloud instead produces the cell \(15,625\).
As a consequence, the cloud tells you that \(2^{20} = 1\) million.
If you see the entire working state of the cloud computer, then you can catch this bug!

Transcript of an erroneous computation (source: Vitalik Buterin’s blog)

Question. But what if the cloud can only send you a small amount of data, of your choice. What should you request? (Asking for just part of the working state will not work, as the cloud could lie elsewhere.)

What you need is some way to bind together all of the cloud’s working state \(1, 2, 4, 8, \ldots\) into some object that is:

Succinct, but also
Brittle, in the sense that if you tamper with even a single number then the object will be wildly different most of the time.

One mathematical object with this property is polynomials. They have the property that if you change a polynomial at even one point, then you end up changing it at nearly every point.

Or in other words:

Two different polynomials \(P\) and \(Q\), each of degree \(d\), intersect in at most \(d\) points.

Two polynomials (source: Why and How zk-SNARK Works)

Does this solve the problem?

The bad news: polynomials are not succinct. to describe a polynomial of a large degree \(d\), you need to write \(d+1\) coefficients.
The good news: polynomial commitments are succinct. These are special commitments that allow Peggy to put a polynomial in a (digital) envelope, and then open the polynomial at points of the Victor’s choosing.

The punchline here is:

If we can transform the action of “verifying an arbitrary program outsourced to the cloud” into “verifying some polynomial arithmetic,” we would be done!
Amazingly, this is possible to do!
Fortunately, there are ZK computer programs that handle the strange “moon math” of polynomials and R1CS behind the scenes.

23.1 The challenge of pseudonymity

Do not invest in cryptocurrencies! It is a great way to lose money.

Recall that Bitcoin transactions are “pseudonymous.”

Alice and Bob never have to write their names on the blockchain.
But their transaction history is public to the world, and that might be used to identify them.

Even if the blockchain itself doesn’t reveal the sender and receiver’s identity, it’s entirely possible that people can find out about your transaction in other ways. For instance:

Perhaps Alice reuses the same public key with many people. Now Mallory can track Alice’s other transactions, just as you all did in Challenge Problem 1.
Perhaps Mallory watches over Alice’ shoulder as she posts a cryptocurrency transaction to buy coffee.
Perhaps the sender just announces the transaction on the Internet.

Source: Zcash Foundation
Perhaps the amount of the transaction is so large that only a few people or organizations in the world could even produce it.

Source: Bitcoin transaction 80f4c71605a220a7e7c22ba46eaf6687f1748034918409297235f197176922c1
Perhaps Mallory is smart enough to connect many transactions on the blockchain together. For instance, consider this transaction posted not long after the previous one. Which of the two destination addresses do you think is the change address that is returning money back to the original sender?

Source: Bitcoin transaction 88ebf4c7dcfe35114bc0908691a5c990623bd77dc5eed8927636f2fbb321d9f5
Perhaps Mallory is clever enough to connect many transactions on the blockchain.

Source: Wired

23.2 The Zcash cryptocurrency

Bitcoin achieved its goal of a secure, decentralized ledger through replication – that is, making every financial transaction public to the world.

In other words, Bitcoin achieves strong integrity and availability.
But it does so at the expense of confidentiality.

The problems of establishing consensus and preventing double spending are indeed hard, and we spent several lectures discussing them. But why do they require the whole world to know exactly when you buy coffee in the morning? One problem doesn’t seem related to the other.

Today, let’s explore how to use zero knowledge proofs in order to design new cryptocurrencies that provide anonymous payments.

We’ll start by talking about Zcash, which is based on the Zerocoin and Zerocash papers.

Two kinds of addresses

Recall that in Bitcoin, you must maintain a secret key, from which you can derive a public key. As the name suggests, a public key can be posted to the world – it is safe to do so, in the sense that nobody will be able to calculate your secret key from it, so your coins are safe.

But just because you can publish a public key, doesn’t mean that you have to do so.

In Bitcoin, the hash of each public key is immediately posted to the ledger, and the real public key is posted whenever you want to spend a transaction. That’s why we see them on https://mempool.space.
But publishing your public key can lead to de-anonymization attacks, where others can figure out which real-world people correspond to which alphanumeric sequence of keys.

Bitcoin transaction

By contrast, the Zcash currency has two kinds of addresses:

Transparent addresses, which work exactly like Bitcoin does.
Shielded addresses, which are new and different.

If you and your friends only ever used transparent addresses in Zcash, everything would look basically the same as it does in Bitcoin.

You can send money to your friends by digitally signing a transaction to pay their addresses.
You can download the entire blockchain of all transactions that everyone has ever sent using transparent addresses.

There’s only one thing that might look a bit different to you: there is one account that stores an incredibly large amount of money in it.

Upon closer inspection, actually there are a large number of transactions that take money into and out of that wallet.
Even stranger, there are some weird transactions posted to the blockchain that you cannot decipher.

That’s because the big pool of money is actually the sum total of all shielded accounts. Anyone with a shielded account can move some (but not all!) of the money in the pool.

Four types of ZCash addresses (source: Anatomy of A Zcash Transaction)

Inspecting Zcash transactions

Let’s look at a transaction between two shielded accounts. It has a transaction id, as normal. But the identity of the accounts is concealed. From the perspective of the public blockchain, no money has moved! All of it stayed in the giant money bag.

Private transaction between two shielded accounts (source: Blockchair)

And here is a transaction that moves between the giant shielded money bag and a “normal” public account.

Deshielding transaction from a shielded account to a transparent one (source: Blockchair)

Public addresses on Zcash always begin with the letter t, and shielded accounts begin with the letter z (although we can’t see the shielded address here via the public view of the blockchain; only the account owner would see that).

23.3 How Zcash private transactions work

[This portion of the lecture notes is derived from Electric Coin Company: How Transactions Between Shielded Addresses Work.]

In order to explain the transactions between shielded addresses, it is helpful to recall how transactions between transparent (Bitcoin-like) addresses work.

To simplify the story for now, let’s leave aside the scripting language, and suppose that a transaction only contains a digital signature.

UTXO table

Recall that Bitcoin is based on the unspent transaction (UTXO) model. For a transaction to be valid:

The sender of the transaction must specifiy where they got their coins from.
The sender cannot send more than what they previously received. What goes out must be less than or equal to what goes in (and the difference goes to the miner as fees).

Bitcoin transactions (image source: researchgate.net)

In other words, we can think of the blockchain as storing a table of hashed public keys and associated monetary amounts.

After the first two transactions in the above figure, the table would look something like this.

Hashed key	Amount
H(pk1)	1.5
H(pk2)	2.2
H(pk3)	0.5

Here is the first change that Zcash makes, which on its own is relatively minor.

Let’s say that each transaction output has an associated “serial number” \(r\); you can think of this like the serial number on a dollar bill.
The sender of each transaction creates the serial number and sends it privately to the recipient.

Hashed key	Amount
H(pk1, r1)	1.5
H(pk2, r2)	2.2
H(pk3, r3)	0.5

When the third transaction is posted to the blockchain, suppose that the sender posts both a digital signature and the serial numbers of the just-spent coins.

Afterward, all nodes:

Check the validity of the digital signature.
Check that the serial numbers have never been spent before.
Make sure that the outgoing amount is no more than the incoming amount.

If all checks pass, the nodes update their record of the unspent transactions as follows.

Hashed key	Amount
~~H(pk1, r1)~~	~~1.5~~
~~H(pk2, r2)~~	~~2.2~~
H(pk3, r3)	0.5
H(pk4, r4)	1.8

The nodes also keep a list of spent serial numbers: in this case r1 and r2. If serial numbers are unique, this is a quick way to check that nobody posts the same transaction twice.

Adding ZK

The main idea of Zcash is to:

do the work described above in zero knowledge.

Concretely, the parties maintain a hashed version of the above table, so that nobody can read the keys or the amounts.

To make the above transaction, the sender posts hashes of the serial numbers H(r1) and H(r2)—which are called nullifiers in Zcash-lingo—along with a zero knowledge proof that:

The sender knows values pk1 and pk2 such that when you hash them together with the serial numbers, the results are the hashed keys H(pk1, r1) and H(pk2, r2) that are being spent.
The sender knows the secret keys corresponding to the public keys pk1 and pk2. (This is essentially what the digital signature did before… in fact, a signature is a special case of a ZK proof.)
These nullifiers have never been used in a previous transaction. This is where the name comes from – posting H(r1) effectively nullifies the coins held by pk1 to ensure that they aren’t spent twice.
The amount of money being sent to the recipient is less than or equal to the amount in the current account. (That is: the sender is only giving away the part of the giant money bag that it has the right to use.)

The sender creates a non-interactive zk-SNARK and posts it to the blockchain, and everyone verifies it. If so, they add H(r1) and H(r2) to their nullifier set.

The takeaway here is that:

All of the money owned by all shielded accounts is lumped together into one giant account.
With the correct key material and serial numbers, you have the right to re-assign some of it to belong to someone else. You post a ZK proof to convince the rest of the world that you have this right.
But: the rest of the world never learns the sender’s keys, receiver’s keys, or the amount transferred!

The actual Zcash protocol includes many tricks to make this process more efficient. For instance: rather than storing a big table of all hashed keys and hashed amounts, instead all of that data is put into a Merkle tree.

Transactions that move money between shielded and transparent accounts work similarly. In this case, it is publicly visible that the size of the giant account is increasing or decreasing, but nobody knows who has control of the money that just entered/left.

But there is one unfortunate catch:

To spend money from a shielded address, you need to create a ZK proof. Part of the statement being proved is that “nobody has ever used this nullifier in any other transaction in the blockchain.”
So, in order to construct this proof, the sender/prover first needs to download the entire blockchain. (The Zcash blockchain size is currently around 93 GB.)
Hence, unlike in Bitcoin, it’s not possible for light clients to spend money just by posting a signature and letting the nodes and miners sort out the details. In some sense, everyone needs to be a full node.

22.4 Recursive proofs

Remember that zero knowledge proofs enable the prover to demonstrate to the world that they possess some information, without the need to broadcast it to the world.

Typically the prover wants to hide the underlying information for confidentiality reasons.
But the idea works just as well if the prover simply wants to avoid sending the data for efficiency reasons.

Here’s an intriguing question: what happens if the prover’s information itself includes a non-interactive zero knowledge proof that the prover received earlier from someone else?

Example. Suppose I ask everyone in the class to solve one Sudoku puzzle, and give me a non-interactive ZK proof that you know the solution. Then, I want to make a single ZK proof that all of the Sudoku puzzles are solveable.

I cannot prove the statement “I know the solution to all Sudoku puzzles,” because I don’t know the solutions.
But what I can do is prove “I have in my possession valid non-interactive ZK proofs of all Sudoku puzzles.” This is a perfectly valid thing to do! It is called a recursive zero knowledge proof, or proof-carrying data.

This is somewhat mind-blowing. You can attach a tiny proof to any piece of data that explains its full provenance – which raw data it came from, and what data analysis was performed in order to produce it.

Recursive proofs of photograph alterations (source: PhotoProof)

This technique has a wide range of applications in data science. But today we will look at one application involving cryptocurrencies.

Mina cryptocurrency

Every cryptocurrency we have seen so far – Bitcoin, Ethereum, and Zcash – fundamentally relies on a blockchain data structure.

The individual transactions might include digital signatures, smart contracts, or ZK proofs; no matter what, they are periodically grouped together into a block and appended to the existing chain.
As a result, the blockchain for each cryptocurrency grows linearly with the number of transactions that have ever been made. For instance, the Bitcoin blockchain is currently 653 GB in size.

Using recursive zero knowledge proofs, we can substantially reduce the size of the blockchain.

Let’s imagine that the following picture contains part of the Zcash blockchain. Each block contains a series of transactions, and suppose that each one is a zero knowledge proof corresponding to a private transaction.

Question. How can the miner reduce the size of a block?

Answer. The miner can produce a recursive proof of the following statement:

“I have seen a series of proofs by individual senders, and they are all valid, and they result in specific nullifiers being added to the set.”

Remember that a single zk-SNARK has constant size, independent of the complexity of the statement that produced it. So we can compress all signatures in a block effectively down to the size of a single transaction!

But why stop there? Recursion is such a powerful technique that we can use it between blocks too.

Recall that when a new client joins the cryptocurrency, the first thing it must do is to download the blockchain (or at least the headers) and verify all of them. That’s a lot of work, and we’re asking every new client to do the same thing.

So here is an idea: when a miner produces a block, it proves that all blocks since the start of time are valid.

That sounds like a lot of work, but actually it isn’t, thanks to the power of recursion. The miner of block \(N-1\) has already posted a proof that the first \(N-1\) blocks are valid. So the current miner can create a proof that:

“I have seen individual proofs that each transaction in block \(N\) are valid, and I have seen a proof that the first \(N-1\) blocks contain valid transactions as well.”

(It does take some computing work for the miners to build these proofs, but compared to the immense amount of work that miners already do to solve the proof-of-work puzzle, this is nothing.)

The amazing result of all this recursion is a succinct blockchain.

The Mina blockchain can be summarized in about 20kB of data, even though the number of transactions on the network is continually growing.

The zk-SNARK is about 1 kB in size.
The Merkle tree of the account table (which I mentioned earlier today) is about 20 kB in size.

This is so small that anyone can easily download it and verify the entire blockchain.

Note that some people need to act as:

Full nodes that store the entire blockchain and perform the consensus protocol on top of it.
Miners that produce new blocks to add to the chain.

Fortunately, most people can act as ultralight clients. Thanks to the (recursive) ZK proofs, they only need the succinct summary of the blockchain in order to verify their balance, unlike with Bitcoin where the clients needed to place trust in the nodes/miners.

Summary

Today we have seen the privacy benefits that accrue when we combine zero knowledge proofs (part 4) with cryptocurrencies (part 1 of this course).

It turns out that combining consensus protocols (part 2) with cryptocurrencies also produces a huge win: we can replace the energy-intensive proof of work puzzles with an environmentally-friendly leader election protocol.

More generally: each of the cryptographic tools we have built in this course is powerful on their own. And they can become even more powerful when we find clever ways to use them together!