Lecture 20: Verifiable MPC

Announcements

Learning outcomes

By the end of today’s lecture, you will learn:

  • How to perform secure multiplication against a passive Eve and an active Mallory.
  • How to construct verifiable secret sharing, and how it can be used as the basis for MPC.

Review from last lecture

In the first half of this course, we designed blockchains and broadcast protocols that provide strong integrity and availability. They are decentralized protocols that allow for outsourcing or federating data among a large number \(n\) of parties (e.g., servers on the Internet).

  • Fundamentally, blockchains and broadcasts achieve decentralization based on the principle of replication. The idea is that if you want information to live for a long time (either on the Internet or between a set of Byzantine generals), then you should store it in many computers or many people’s minds.

  • Replication is a powerful technique. But on its own, it seems incompatible with confidentiality. After all, if you give your data to \(n\) servers, how can you protect your data against them?

In this unit of the course, we began to study the concept of secure multi-party computation, or MPC.

  • MPC is a powerful tool that allows you to distribute your data among \(n\) parties, without giving your data to any single one of them! If a threshold number \(t\) of parties meet then they can jointly reconstruct the data, but any coalition of smaller size cannot read your data.

  • MPC is based on the idea of randomness, rather than replication. The idea is to give each party a share of your secret: a random number, but chosen so that all of the shares jointly can be used to reconstruct your data.

We have seen two examples of secure multi-party computation so far.

  • In Lecture 4, we saw how a group of \(n\) people can produce a threshold Schnorr signature indicating that \(t\) of the \(n\) people approve the message.

  • In the last lecture, we showed that it is possible for \(n\) parties to collectively perform secure addition.

|   | \(a\) | \(b\) | \(c\) | \(a + b + c\) |
|---|---|---|---|---|
| \(P_1\) | \(a_1\) | \(b_1\) | \(c_1\) | \(a_1 + b_1 + c_1\) |
| \(P_2\) | \(a_2\) | \(b_2\) | \(c_2\) | \(a_2 + b_2 + c_2\) |
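The secure-addition protocol above can be sketched in a few lines of Python. This is a toy model: the modulus \(2^{64}\) is an arbitrary stand-in, and the two "parties" are just local variables.

```python
import secrets

M = 2**64  # toy modulus for share arithmetic

def split(x, n=2):
    """Additively secret-share x into n random shares that sum to x mod M."""
    shares = [secrets.randbelow(M) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % M)
    return shares

def reconstruct(shares):
    return sum(shares) % M

# Three data holders split their secrets between two computing parties.
a, b, c = 11, 22, 33
a1, a2 = split(a)
b1, b2 = split(b)
c1, c2 = split(c)

# Each computing party adds its own shares locally; no communication needed.
s1 = (a1 + b1 + c1) % M  # party 1's share of the sum
s2 = (a2 + b2 + c2) % M  # party 2's share of the sum

result = reconstruct([s1, s2])
assert result == (a + b + c) % M  # 66
```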

Today, we will extend MPC in three ways.

  1. We will securely compute any mathematical function \(f\) over secret shares of the data, and then only reconstruct the final result while keeping confidential all inputs and intermediate variables.

  2. We will add integrity: that is, we will provide security against an actively malicious adversary Mallory, rather than a passive Eve.

  3. We will add availability: that is, we will make sure that all computing parties have shares of the data, and they know that everyone else has shares too.

20.1 Securely computing all functions

Secure multi-party computation protocols have three steps:

  1. The data holders split their secrets \(x_1, x_2, \ldots, x_m\), and send one share to each computing party.
  2. The computing parties perform some calculation over the secret shares.
  3. The computing parties give shares of the result to the data analyst, who can reconstruct \(y = f(\vec{x})\).

Steps 1 and 3 come directly from the split and reconstruct operations of the secret sharing construction.

The main challenge is in step 2. So far, we only know how to add secrets by adding their corresponding secret shares.

MPC of linear functions

The ideas that we have already shown for secure addition easily extend to cover all linear functions!

For instance, suppose that rather than calculating \(a + b + c\), we actually wanted to compute \(2a + 3b - c + 10\).

Since we are using additive secret sharing, and scalar multiplication distributes over addition, each party can just compute its share of the result locally.

|   | \(a\) | \(b\) | \(c\) | \(2a + 3b - c + 10\) |
|---|---|---|---|---|
| \(P_1\) | \(a_1\) | \(b_1\) | \(c_1\) | \(2a_1 + 3b_1 - c_1 + 10\) |
| \(P_2\) | \(a_2\) | \(b_2\) | \(c_2\) | \(2a_2 + 3b_2 - c_2 + 0\) |
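A quick sketch of this linear evaluation, with the same toy modulus as before. Note that only one party adds the public constant \(10\) (the other adds \(0\)), so the constant is counted exactly once in the reconstructed result.

```python
import secrets

M = 2**64  # toy modulus for share arithmetic

def split(x, n=2):
    shares = [secrets.randbelow(M) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % M)
    return shares

a, b, c = 5, 7, 9
a1, a2 = split(a)
b1, b2 = split(b)
c1, c2 = split(c)

# Each party applies the linear function to its own shares.  Exactly one
# party adds the public constant 10, so it appears once in the sum.
r1 = (2*a1 + 3*b1 - c1 + 10) % M
r2 = (2*a2 + 3*b2 - c2 + 0) % M

result = (r1 + r2) % M
assert result == (2*a + 3*b - c + 10) % M  # 32
```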

Secure multiplication

Next, let’s try to multiply two secrets. Suppose that two data holders input \(a\) and \(b\), respectively. Can we somehow calculate secret shares of the product \(c = a \cdot b\)?

|   | \(a\) | \(b\) | \(c = a \cdot b\) |
|---|---|---|---|
| \(P_1\) | \(a_1\) | \(b_1\) | \(\color{red}{??}\) |
| \(P_2\) | \(a_2\) | \(b_2\) | \(\color{red}{??}\) |

This is not as easy to compute. This time we cannot rely solely on commutativity of addition; we need to consider the distributive law as well.

Recall the relationships between the secret variables and their additive shares (all equations will be mod \(2^\ell\), but from now on I’m going to omit writing the modulus for brevity): \[\begin{align*} a &= a_1 + a_2 \\ b &= b_1 + b_2. \end{align*}\]

Hence, we want to compute: \[\begin{align*} c &= a \cdot b \\ &= (a_1 + a_2) (b_1 + b_2) \\ &= a_1 b_1 + a_1 b_2 + a_2 b_1 + a_2 b_2. \end{align*}\]

The first computing party knows how to compute the term \(a_1 b_1\) on its own, and the second knows how to compute \(a_2 b_2\) on its own. But it’s unclear how to compute the cross-terms \(a_1 b_2\) and \(a_2 b_1\) – that is, it’s unclear how to calculate these terms without the computing parties sending their shares to each other, which would break data confidentiality.

I’ll show you today one way to resolve this problem. It relies on two useful insights.

Insight #1: let’s add a third computing server.

|   | \(a\) | \(b\) | \(c = a \cdot b\) |
|---|---|---|---|
| \(P_1\) | \(a_1\) | \(b_1\) | \(\color{red}{??}\) |
| \(P_2\) | \(a_2\) | \(b_2\) | \(\color{red}{??}\) |
| \(P_3\) | \(a_3\) | \(b_3\) | \(\color{red}{??}\) |

At first glance, this might seem to make the calculation even harder to perform. Now the distributive law says that we must calculate:

\[\begin{align*} c &= (a_1 + a_2 + a_3) (b_1 + b_2 + b_3) \\ &= a_1 b_1 + a_1 b_2 + a_1 b_3 \\ &\quad + a_2 b_1 + a_2 b_2 + a_2 b_3 \\ &\quad + a_3 b_1 + a_3 b_2 + a_3 b_3. \\ \end{align*}\]

Insight #2: Let’s give each computing party two of the three shares. This is called replicated secret sharing.

  • This is still not enough information for each individual computing party to reconstruct any secrets.
  • But now a threshold of \(t = 2\) of the \(n = 3\) parties will collectively have enough information to reconstruct secrets.

|   | \(a\) | \(b\) | \(c = a \cdot b\) |
|---|---|---|---|
| \(P_1\) | \(a_1, a_2\) | \(b_1, b_2\) | \(c_1 = a_1b_1 + a_1b_2 + a_2b_1\) |
| \(P_2\) | \(a_2, a_3\) | \(b_2, b_3\) | \(c_2 = a_2b_2 + a_2b_3 + a_3b_2\) |
| \(P_3\) | \(a_3, a_1\) | \(b_3, b_1\) | \(c_3 = a_3b_3 + a_3b_1 + a_1b_3\) |

The punchline here is that: now there exists a computing server that can calculate each of the 9 terms in this product! So the three computing servers can calculate \(a \cdot b\) by working together in such a way that nobody has learned either \(a\) or \(b\).

\[\begin{align*} c &= (a_1 + a_2 + a_3) (b_1 + b_2 + b_3) \\ &= \color{red}{a_1 b_1} + \color{red}{a_1 b_2} + \color{gray}{a_1 b_3} \\ &\quad + \; \color{red}{a_2 b_1} + \color{blue}{a_2 b_2} + \color{blue}{a_2 b_3} \\ &\quad + \; \color{gray}{a_3 b_1} + \color{blue}{a_3 b_2} + \color{gray}{a_3 b_3}. \\ \end{align*}\]
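Here is a minimal sketch of this three-party multiplication, with the modulus again a toy stand-in. Each \(c_i\) is exactly the combination of cross-terms that party \(P_i\) can compute from the two shares it holds.

```python
import secrets

M = 2**64  # toy modulus for share arithmetic

def split3(x):
    s1, s2 = secrets.randbelow(M), secrets.randbelow(M)
    return s1, s2, (x - s1 - s2) % M

a, b = 6, 7
a1, a2, a3 = split3(a)
b1, b2, b3 = split3(b)

# Replicated sharing: party P_i holds shares i and i+1 (mod 3) of each
# secret, and computes the three cross-terms it can see on its own.
c1 = (a1*b1 + a1*b2 + a2*b1) % M  # P1 holds (a1, a2) and (b1, b2)
c2 = (a2*b2 + a2*b3 + a3*b2) % M  # P2 holds (a2, a3) and (b2, b3)
c3 = (a3*b3 + a3*b1 + a1*b3) % M  # P3 holds (a3, a1) and (b3, b1)

# Together the c_i cover all nine terms of (a1+a2+a3)(b1+b2+b3) exactly once.
product = (c1 + c2 + c3) % M
assert product == (a * b) % M  # 42
```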

There’s just one downside here: the computing parties have two shares of \(a\) and \(b\), but only one share of \(c\). So we cannot continue and multiply \(c\) times some other secret.

This is easy to fix with some communication: each computing party gives its share to one other party.

  • \(P_1\) sends \(c_1\) to \(P_3\).
  • \(P_2\) sends \(c_2\) to \(P_1\).
  • \(P_3\) sends \(c_3\) to \(P_2\).

Now our invariant is maintained: each computing party starts with 2 out of 3 shares of each input, and completes multiplication with 2 out of 3 shares of each output.
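The resharing round can be modeled directly (with placeholder numbers standing in for the shares of \(c\)):

```python
# Placeholder values standing in for the parties' product shares c_1, c_2, c_3.
c1, c2, c3 = 10, 20, 12

# One round of communication: each party forwards its share to one neighbor.
P1 = (c1, c2)  # P2 sends c2 to P1
P2 = (c2, c3)  # P3 sends c3 to P2
P3 = (c3, c1)  # P1 sends c1 to P3

# The replicated invariant is restored: any two parties jointly hold all
# three shares, so the reconstruction threshold remains t = 2.
assert set(P1) | set(P2) == {c1, c2, c3}
assert set(P2) | set(P3) == {c1, c2, c3}
assert set(P3) | set(P1) == {c1, c2, c3}
```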

To summarize, we have constructed a secure multiplication scheme with:

  • One round of network communication between the computing parties.
  • Correctness, since the \(n = 3\) parties work together to calculate all 9 terms given by the distributive property.
  • Perfect privacy, since each party has only 2 of the 3 additive shares of each secret.
  • A threshold of \(t = 2\) parties that can reconstruct the answer (because they collectively hold all 3 shares).

MPC of everything

So far, we have seen one way to perform secure multi-party computation of \(+\) and \(\times\).

Importantly: \(+\) and \(\times\) form a universal set of gates! That is, we can write any data analysis as an arithmetic circuit with \(+\) and \(\times\) gates. (This may not be the fastest way to compute \(f\), but it will always work in principle.)

For instance, given the circuit below along with secret shares of \(s\), \(t\), and \(x\) it is possible for the 3 computing parties to work together to calculate:

  • secret shares of the intermediate value \(w\)
  • secret shares of the final result \(y\)

and then reconstruct only the output \(y\).
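The circuit figure isn’t reproduced here, so as a hypothetical example take the wiring \(w = s \cdot t\) and \(y = w + x\). A sketch of gate-by-gate evaluation over replicated shares (toy modulus as before):

```python
import secrets

M = 2**64  # toy modulus for share arithmetic

def split3(x):
    s1, s2 = secrets.randbelow(M), secrets.randbelow(M)
    return [s1, s2, (x - s1 - s2) % M]

def add_gate(u, v):
    """Share-wise addition: each party adds its shares locally."""
    return [(ui + vi) % M for ui, vi in zip(u, v)]

def mul_gate(u, v):
    """Replicated 3-party multiplication (party i holds shares i and i+1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return [(u1*v1 + u1*v2 + u2*v1) % M,
            (u2*v2 + u2*v3 + u3*v2) % M,
            (u3*v3 + u3*v1 + u1*v3) % M]

s, t, x = 3, 4, 5
w = mul_gate(split3(s), split3(t))  # shares of the intermediate value w
y = add_gate(w, split3(x))          # shares of the final result y

# Only the output y is ever reconstructed; w stays secret-shared throughout.
assert sum(y) % M == (s * t + x) % M  # 17
```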

Hence, the protocol we have constructed today can do a secure computation of any function, as long as:

  • There is an honest majority: at least 2 of the 3 servers are honest.
  • The one adversarial server is run by a passive eavesdropper Eve (i.e., it still follows the protocol).

It turns out that you can also perform MPC in the dishonest majority setting, and against an actively-malicious adversary Mallory.

20.2 MPC against Mallory

Recall from last time that our current protocol is only secure against a passive eavesdropper Eve. That’s because a malicious computing party Mallory can tamper with the result by adding some number to her own share.

|   | \(a\) | \(b\) | \(c = a \cdot b \color{red}{+ 10}\) |
|---|---|---|---|
| \(P_1\) | \(a_1, a_2\) | \(b_1, b_2\) | \(c_1 = a_1b_1 + a_1b_2 + a_2b_1 \color{red}{+ 10}\) |
| \(P_2\) | \(a_2, a_3\) | \(b_2, b_3\) | \(c_2 = a_2b_2 + a_2b_3 + a_3b_2\) |
| \(P_3\) | \(a_3, a_1\) | \(b_3, b_1\) | \(c_3 = a_3b_3 + a_3b_1 + a_1b_3\) |

Here is one way to stop Mallory: we can add a fourth computing party for redundancy.

  • Using replicated secret sharing, we can build a secure computation protocol with \(n = 4\) parties and a threshold of \(t = 2\) parties required to reconstruct.
  • The data holders send 3 of 4 additive shares to each computing party.

|   | \(a\) | \(b\) | \(c = a \cdot b\) |
|---|---|---|---|
| \(P_1\) | \(a_1, a_2, a_3\) | \(b_1, b_2, b_3\) | \(c_1\) |
| \(P_2\) | \(a_2, a_3, a_4\) | \(b_2, b_3, b_4\) | \(c_2\) |
| \(P_3\) | \(a_3, a_4, a_1\) | \(b_3, b_4, b_1\) | \(c_3\) |
| \(P_4\) | \(a_4, a_1, a_2\) | \(b_4, b_1, b_2\) | \(c_4\) |

The math for multiplication has gotten a bit more complicated, so I won’t write out explicitly how the shares of \(c\) are calculated.

\[\begin{align*} c &= (a_1 + a_2 + a_3 + a_4) (b_1 + b_2 + b_3 + b_4) \\ &= a_1 b_1 + a_1 b_2 + a_1 b_3 + a_1 b_4 \\ &\quad + \; a_2 b_1 + a_2 b_2 + a_2 b_3 + a_2 b_4 \\ &\quad + \; a_3 b_1 + a_3 b_2 + a_3 b_3 + a_3 b_4 \\ &\quad + \; a_4 b_1 + a_4 b_2 + a_4 b_3 + a_4 b_4 \end{align*}\]

Fortunately, the punchline is simple:

  • For each of the 16 terms, there are two parties who know how to compute it!
  • So for any computation that Mallory performs, there is an honest party who can check her math and call her out if she cheats.
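We can verify the redundancy claim mechanically: modeling which shares each of the 4 parties holds, every one of the 16 cross-terms is computable by at least two parties, so a single cheater is always contradicted by an honest party.

```python
from itertools import product

n = 4
# Party p holds shares with indices {p, p+1, p+2} (mod 4) of each secret,
# i.e., 3 of the 4 additive shares, as in the table above.
holds = {p: {(p + k) % n for k in range(3)} for p in range(n)}

# The cross-term a_i * b_j can be computed by any party holding shares i, j.
computers = {(i, j): [p for p in range(n) if i in holds[p] and j in holds[p]]
             for i, j in product(range(n), repeat=2)}

# Every term has at least two independent computers, so a single Mallory who
# misreports a term is contradicted by an honest party.
min_redundancy = min(len(ps) for ps in computers.values())
assert min_redundancy == 2
```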

Let’s take a step back and admire what we have done so far.

  • We have been able to design MPC protocols to compute any data science computation.
  • We can protect data confidentiality and integrity against a malicious Mallory.

20.3 Verifiable Secret Sharing

For the rest of today’s lecture, I want to discuss a different – and more foundational – way to build MPC that is secure against Mallory.

  • Cryptographic secret sharing is a great tool to provide confidentiality against a passive eavesdropper Eve.
  • But as we saw in the last lecture, on its own it does not provide integrity or availability against an active adversary Mallory. Let’s discuss and address this issue.

Recall how additive secret sharing works. Say you have secret data \(x\) and convert it into a collection of \(n\) secret shares:

\[ x = x_1 + x_2 + \cdots + x_n \bmod{p}.\]

For now, suppose you don’t even want to compute over this data; you just want to outsource the data to several cloud providers and reconstruct it at some later date.

  • So, you can disperse the message among the cloud providers by giving share \(x_i\) to provider \(i\).

  • Then, many months or years later, you can retrieve the secret data by fetching all shares and adding them together.

This provides excellent confidentiality: the cloud providers can perform their job of holding onto your data for a long time, but no coalition of \(n-1\) or fewer of them can ever learn the data.

However, if one party provides an incorrect share \(x_i^* = x_i + \Delta\), then your entire secret is going to be incorrect by \(\Delta\):

\[x_1 + x_2 + \cdots + (x_i + \Delta) + \cdots + x_n = x + \Delta.\]

This is bad news for you: it means your data is corrupted, and even worse there’s no way for you to pinpoint which cloud provider is to blame.

Depending on how the economics of this hypothetical “private multi-cloud outsourcing” service works, it could also be bad news for the honest cloud providers: maybe you refuse to pay them as a result of the data corruption, even though they properly did their job and held onto your data for a long time.

Basically what we want is something like filecoin.io, which is a “decentralized storage network” based on “an open-source cloud storage marketplace, protocol, and incentive layer.” (We’re not going to explain the entirety of how filecoin works in today’s lecture, but we will describe some of the main ideas.)

The problem

In other words, this problem is very similar to the Byzantine generals question. You are a leader and you’re providing information to many other parties.

Image source: Wikipedia

The main differences here are that:

  • We are adding confidentiality through secret sharing, and
  • There are two stages to the protocol: a dispersal stage and a retrieval stage.

But the other trust questions from the Byzantine generals problem remain. You don’t necessarily trust the other parties entirely, and they don’t entirely trust you either.

Question. How can we provide each party with a verifiability guarantee? That is:

  • During dispersal, the cloud providers want to be convinced that they’re holding onto a share of a real secret that you will pay them for.
  • During retrieval, you want to be convinced that the shares are correct (even though you have forgotten the secret yourself). Otherwise, you want to be able to pinpoint which provider(s) are sending you bad shares.

A construction

One way to solve this problem is to use Merkle trees. Recall that Merkle trees allow for uploading many files to the cloud, so that the cloud can prove that each file is part of the overall collection.

Generating a Merkle proof (source: Alin Tomescu, Decentralized Thoughts blog)

Verifying a Merkle proof (source: Alin Tomescu, Decentralized Thoughts blog)

Concretely, here is a protocol that you can follow to disperse the shares and later retrieve the message.

In dispersal:

  • The leader creates all secret shares such that \(x = \sum x_i\), and builds a Merkle tree of all shares \(x_i\). Let’s call the Merkle root \(h\).
  • The leader sends each cloud provider its share \(x_i\), the Merkle root \(h\), and a proof \(\pi_i\) that connects its own share to the root \(h\). (The provider can check for itself that the proof \(\pi_i\) is correct, for the provided share \(x_i\) and Merkle root \(h\).)
  • The leader saves the (small) Merkle root \(h\) and deletes the rest of its state. (After all, the purpose of cloud outsourcing is so that you don’t have to keep the secret \(x\) yourself.)

In retrieval:

  • Each cloud provider sends their share \(x_i\) and proof \(\pi_i\) to the leader.
  • The leader verifies each proof, and pays the corresponding provider if it’s correct. Alternatively, if a provider has cheated, the leader detects the error and takes appropriate action (e.g., sue the cloud provider in court, or take money that the provider has placed in escrow in a smart contract, etc).
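Here is a minimal sketch of this dispersal/retrieval protocol, assuming an honest leader, four providers, and a SHA-256 Merkle tree over a power-of-two number of shares:

```python
import hashlib
import secrets

M = 2**64
H = lambda data: hashlib.sha256(data).digest()

def split(x, n):
    shares = [secrets.randbelow(M) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % M)
    return shares

def merkle(leaves):
    """Root and per-leaf proofs for a power-of-two number of leaves."""
    level = [H(str(v).encode()) for v in leaves]
    proofs = [[] for _ in leaves]
    idx = list(range(len(leaves)))
    while len(level) > 1:
        for i in range(len(leaves)):
            proofs[i].append((idx[i] % 2, level[idx[i] ^ 1]))
            idx[i] //= 2
        level = [H(level[k] + level[k + 1]) for k in range(0, len(level), 2)]
    return level[0], proofs

def verify(root, leaf_value, proof):
    h = H(str(leaf_value).encode())
    for is_right, sibling in proof:
        h = H(sibling + h) if is_right else H(h + sibling)
    return h == root

# Dispersal: the leader shares x, builds the tree, sends (x_i, h, pi_i).
x = 123456789
shares = split(x, 4)
root, proofs = merkle(shares)
assert all(verify(root, shares[i], proofs[i]) for i in range(4))

# Retrieval: a tampered share fails verification, pinpointing the cheater.
assert not verify(root, (shares[0] + 1) % M, proofs[0])
assert sum(shares) % M == x
```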

Question. There is actually a bug in this protocol. Can you find it?

One problem is that the leader might use different Merkle tree roots \(h\) with each cloud provider. So even if the provider is honest, if the leader is malicious then they could send one cloud provider the wrong root \(h^*\) and later accuse the provider of cheating during the retrieval phase.

We can fix this problem using techniques from earlier parts of this course! Here are two recourses:

  1. Have the leader post the Merkle root \(h\) to a blockchain so that all providers can see it and know that they have a valid share \(x_i\) and proof \(\pi_i\) that connects to the publicly posted \(h\).
  2. Have the leader run a Byzantine Broadcast protocol to disperse \(h\). Broadcast ensures that each provider receives \(h\), and that they are convinced that everyone else receives \(h\) too. (This approach is cheaper, but would not convince an outside party like a judge in a later court case.)

So here is one way to fix the protocol.

In dispersal:

  • The leader creates all secret shares such that \(x = \sum x_i\), and builds a Merkle tree of all shares \(x_i\). Let’s call the Merkle root \(h\).
  • The leader and providers conduct a Byzantine Broadcast to disperse the Merkle root \(h\) to all parties.
  • Then, the leader sends each cloud provider its share \(x_i\) and a proof \(\pi_i\) that connects its own share to the broadcasted Merkle root \(h\). The provider refuses to participate further if the proof is incorrect.
  • The leader deletes all state.

Retrieval can work the same as before. And in fact, the leader doesn’t even need to store the Merkle root \(h\) because it can just request it from all providers later: if there is an honest majority or a PKI, then the leader can distinguish the correct \(h\) from an incorrect one sent by a faulty cloud provider.

Computing over verifiable secret shares

So far, this technique works to disperse secret data and then retrieve it later.

Let’s bring back the idea of computing over secrets.

  • Suppose the leader dispersed two secrets \(x\) and \(y\) using this protocol… or maybe two different leaders dispersed two different secrets, such as their company’s payroll data.

  • Now, we only want to retrieve the overall sum \(x + y\) of the two secrets, rather than retrieving \(x\) and \(y\) individually.

  • And as we saw earlier today, if we can take the sum of two secrets, then it’s pretty easy to extend the idea in order to perform any linear algebra operation over several secrets.

Can we do this? Well… not quite.

  • The good news is that the secret sharing scheme is linear, so it works fine! The providers could just send shares \(x_i + y_i\), rather than separately sending each of the two summands.
  • The bad news is that the Merkle tree is not linear, so provider \(i\) cannot prove that it is sending the right value \(x_i + y_i\).

So let’s replace the Merkle tree with something else.

Using the discrete log problem, we can create a new way to commit and prove properties of the secrets that is amenable to computation.

20.4 Commitment schemes from discrete log

A Merkle tree is a special case of a kind of cryptographic primitive that we discussed at the beginning of the semester: a commitment scheme. Recall that commitments are a digital analog to putting a written message inside of an envelope and placing it on the table.

  • After Alice commits to a message, then everyone knows that the message is inside of the envelope, but
  • Nobody can read the message until the envelope is opened. (This is related to, but not the same thing as, encryption of a message.)

Commitments have two security guarantees.

  • Binding: even against a malicious committer, it is infeasible for the committer to show a different message \(m^*\) in the open step than the message \(m\) that it originally used during the commit routine.
Mallory \(\overset{c}{\longrightarrow}\) Bob
  • Hiding: even against a malicious receiver, from only the commitment \(c\) it is infeasible to learn anything about the message \(m\).
Alice \(\overset{c}{\longrightarrow}\) Mallory

Feldman commitments

Let’s construct a commitment scheme. As is the norm in this course by now, we will construct commitments based on the hardness of some math problem, so that they are easy to compute but hard to invert.

Specifically, we will use the hardness of the discrete log problem. Remember that this problem states that for a uniformly random choice of \(x\):

\[ (g, x) \mapsto h = g^x \text{ is easy to compute, but } (g, h) \mapsto x = \log_g(h) \text{ is hard to compute}\]

Question. How can we construct a commitment scheme from discrete logs?

For today, we will only consider a simplified version of the problem where the message \(m\) to be committed is itself chosen uniformly at random.

  • Admittedly, this isn’t usually the case. For instance, the messages that you send to your friends over the Internet probably have a lot of structure to them, such as being phrases or sentences of English words.
  • But this will be the case for cryptographic secret shares! That will suffice for us today, so let’s go with this assumption for now.

Here is a commitment scheme designed by Paul Feldman in 1987. As a starting point, assume that all parties agree upon some base element \(g\) (e.g., it is specified in some open standard).

  • commit(m): create the commitment \(c = g^m\).
  • open(c, m): reveal \(m\), so that the receiver can calculate \(g^m\) itself and check that it is equal to \(c\).

This commitment scheme achieves:

  • Binding because there is a one-to-one mapping between messages \(m\) and commitments \(c\), so it is impossible to open to any other message.
  • Hiding because inverting \(c\) to recover \(m\) is computationally infeasible according to the discrete log assumption.
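The scheme is two modular exponentiations. A sketch with deliberately tiny, insecure parameters (real deployments use groups of roughly 256-bit order):

```python
# Toy discrete-log group: g = 2 generates a subgroup of order 11 in Z_23^*.
# These parameters are purely illustrative and offer no security.
p, g = 23, 2

def commit(m):
    return pow(g, m, p)

def open_check(c, m):
    return pow(g, m, p) == c

m = 7            # in our application, m is a uniformly random secret share
c = commit(m)
assert open_check(c, m)          # an honest opening verifies
assert not open_check(c, m + 1)  # binding: opening to another message fails
```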

Additive homomorphism

We can use commitments to resolve the conundrum from before! As a reminder: we wanted to find a way to disperse several secrets, say \(x\) and \(y\), and then provide the option for the cloud providers to retrieve only the sum \(x + y\).

Specifically, we will use Feldman commitments to construct a new verifiable secret sharing scheme. The key insights here are that:

  • Each secret share individually is chosen uniformly at random, so we can use them in Feldman commitments.
  • Feldman commitments are linear… well, sorta. Specifically, given the commitments to two messages \(g^{m_1}\) and \(g^{m_2}\), we can compute the commitment to their sum \(g^{m_1} \times g^{m_2} = g^{m_1 + m_2}\).

Technically, this isn’t linearity in the sense of a linear algebra course like DS 121, because we need to multiply the commitments in order to add the underlying messages.

So let’s give this property a fancy new name: additive homomorphism. We say that Feldman and Pedersen commitments are additively homomorphic, because given several commitments it is possible (somehow) to compute the commitment to the sum of the underlying messages.
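The additive homomorphism is easy to check numerically (same toy-sized parameters as before, with \(g = 2\) of order \(q = 11\) modulo \(p = 23\); exponents live modulo the group order \(q\)):

```python
# Toy discrete-log group: g = 2 has order q = 11 modulo p = 23.
p, q, g = 23, 11, 2

m1, m2 = 4, 9
c1, c2 = pow(g, m1, p), pow(g, m2, p)

# Multiplying the commitments adds the underlying messages.
c_sum = (c1 * c2) % p
assert c_sum == pow(g, (m1 + m2) % q, p)
```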

Verifiable secret sharing with commitments

Here is a new verifiable secret sharing scheme that we can use to disperse a secret over \(n\) cloud providers. As a starting point, we assume that there are \(n\) public constants \(g_1, g_2, \ldots, g_n\) that everybody knows before the protocol starts; importantly, they have nothing to do with our secrets.

In dispersal:

  • The leader creates all secret shares such that \(x = \sum x_i\), and calculates a commitment \(c_i = {g_i}^{x_i}\) to each share. Let’s call the vector of commitments \(\vec{c} = [c_1, c_2, \ldots, c_n]\).
  • The leader and providers conduct a Byzantine Broadcast to disperse all commitments \(\vec{c}\). (That is, every provider agrees on the commitment for everyone, not just themselves.)
  • Then, the leader sends each cloud provider its share \(x_i\). The provider can check for itself that \({g_i}^{x_i}\) equals the public commitment \(c_i\), and refuses to participate further if the value is incorrect.
  • The leader saves the commitment vector \(\vec{c}\) and deletes everything else.

Retrieving \(x\) works similarly as before:

  • Each cloud provider sends their share \(x_i\).
  • The leader can check that \({g_i}^{x_i}\) equals the appropriate commitment inside the vector.

Now let’s do secure addition of secrets. Suppose that two secrets \(x\) and \(y\) have been dispersed to the same set of cloud providers (even by different leaders!), with corresponding commitment vectors \(\vec{c}_x = [g_i^{x_i} : i \in \{1, \ldots, n\}]\) and \(\vec{c}_y = [g_i^{y_i}: i \in \{1, \ldots, n\}]\).

The cloud providers can form a verifiable secret sharing of their sum \(z = x + y\) by:

  • Adding their individual shares \(z_i = x_i + y_i\).
  • Componentwise-multiplying \(\vec{c}_x\) and \(\vec{c}_y\) to form a new vector of commitments \(\vec{c}_z = [g_i^{x_i} g_i^{y_i} : i \in \{1, \ldots, n\}]\).

This works because the new commitment corresponding to \(z_i\) equals \(g_i^{x_i} \times g_i^{y_i} = g_i^{x_i + y_i}\), which is a valid Feldman commitment that will properly open and verify during the retrieval stage.

  • So: we can retrieve \(z\) without ever learning the summands \(x\) and \(y\) individually!
  • Secure multiplication of secrets is more complicated so I will not show it here, but it is possible. From this, we can do verifiably secure computation of everything!
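The commitment-based dispersal and homomorphic addition can be sketched end to end. Toy assumptions: \(n = 4\) providers, secrets and shares in \(\mathbb{Z}_q\) with \(q = 11\), and tiny public constants \(g_i\) (in practice the \(g_i\) must be independently generated elements of a large prime-order group).

```python
import secrets

# Toy parameters: each g_i has order q = 11 modulo p = 23 (illustrative only).
p, q = 23, 11
gs = [2, 3, 4, 6]
n = len(gs)

def split(x):
    shares = [secrets.randbelow(q) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % q)
    return shares

def disperse(x):
    """Leader-side: shares of x plus the commitment vector c_i = g_i^{x_i}."""
    shares = split(x)
    return shares, [pow(gs[i], shares[i], p) for i in range(n)]

x, y = 5, 8
x_sh, cx = disperse(x)
y_sh, cy = disperse(y)

# Each provider i checks its share against the broadcast commitment.
assert all(pow(gs[i], x_sh[i], p) == cx[i] for i in range(n))

# Providers add shares locally; anyone can derive the commitments to the sum.
z_sh = [(x_sh[i] + y_sh[i]) % q for i in range(n)]
cz = [(cx[i] * cy[i]) % p for i in range(n)]
assert all(pow(gs[i], z_sh[i], p) == cz[i] for i in range(n))

# Retrieval reconstructs only the sum z = x + y (mod q).
assert sum(z_sh) % q == (x + y) % q
```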

20.5 Final thoughts

Today, we have built protocols based on verifiable secret sharing that allow a collection of parties to perform data science with confidentiality, integrity, and availability.

In the remainder of today’s lecture, let me share a few assorted thoughts that (a) convey some extensions of this basic construction and (b) show how powerful the idea of verifiable secret sharing can be.

Vector commitments

One small downside of our construction is that we have to disperse a vector of commitments \(\vec{c}\).

  • The length of this vector equals the number of cloud providers \(n\), independent of the data size.
  • This is somewhat strange: if we increase the number of cloud providers to make the dispersal more secure, then we have to pay for this in extra storage size that is linear in \(n\).

By comparison, the Merkle tree looks nicer: the root \(h\) is independent of \(n\), and each individual proof \(\pi_i\) has size \(\log(n)\).

The good news is that it is possible to combine all of the commitments into a single element that has constant size, independent of both the message length and the number of providers!

  • The idea is simple and again is based on the hardness of discrete logarithms.

  • Rather than computing \(\vec{c} = [g_1^{x_1}, g_2^{x_2}, \ldots, g_n^{x_n}]\), we will instead compute the single commitment:

    \[c = g_1^{x_1} \times g_2^{x_2} \times \cdots \times g_n^{x_n}.\]

This is called a vector commitment. The math of how to open such a commitment is not too difficult but beyond the scope of this course; it uses something called “bilinear maps.”
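Even without the (out-of-scope) opening machinery, we can check that the single combined commitment keeps the additive homomorphism. Same toy parameters as elsewhere in these notes (tiny and insecure, for illustration only):

```python
# Toy parameters: each g_i has order q = 11 modulo p = 23 (illustrative only).
p, q = 23, 11
gs = [2, 3, 4, 6]

def vector_commit(xs):
    """One group element committing to the whole share vector."""
    c = 1
    for g, x in zip(gs, xs):
        c = (c * pow(g, x, p)) % p
    return c

xs = [7, 1, 9, 3]
ys = [2, 5, 4, 8]

# Multiplying two vector commitments commits to the componentwise sum.
zs = [(x + y) % q for x, y in zip(xs, ys)]
assert (vector_commit(xs) * vector_commit(ys)) % p == vector_commit(zs)
```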

Relatedly: it is also possible to build a polynomial commitment, which means that you commit to a polynomial with something like \(c = g^{p(x)}\), and then can open the polynomial at a single point of your choice. This is useful when combined with the next comment…

Thresholds

We have shown today how to compose:

  • Verifiable secret sharing, which provides confidentiality and integrity, and
  • Byzantine Broadcast, which provides availability.

Notice that these constructions have two different thresholds at play here, of how many cloud providers need to collude in order to break different guarantees:

  • The privacy threshold \(t\) of cloud providers that would need to be malicious and colluding in order to reconstruct your secret data. (The protocol I showed today is based on additive secret sharing so \(t = n\), but it is possible to extend the techniques using polynomials to support an arbitrary \(t\).)
  • The agreement threshold \(f\) of providers that need to be faulty in order to break Byzantine Broadcast.

Without a PKI, we know that \(f < n/3\). But the privacy threshold \(t\) can actually be larger than this.

  • That is, we can protect the confidentiality of the data even in circumstances where the attacker controls enough parties that agreement breaks and we can no longer guarantee correct retrieval of the data.

  • This is a subtle point, and the papers on verifiable secret sharing missed it for decades!

A dual-threshold verifiable secret sharing scheme supports \(t > f + 1\). The highest privacy threshold possible is \(t = n\). We constructed such a scheme today: the privacy threshold was \(t = n\) but the agreement threshold might be lower if we don’t have a PKI.

Synchrony vs asynchrony

In today’s lecture, I’ve discussed protocols that have clearly-defined rounds – that is, I’ve assumed that the network is synchronous. It is important that the cloud providers receive their shares after a short delay, so that they can complain instantly and publicly if they haven’t received shares that are consistent with the commitment.

One could ask the same question in the asynchronous setting. That is, we could ask for asynchronous verifiable secret sharing. Constructing this primitive is more subtle than what we did today, for several reasons:

  • Termination is no longer guaranteed, as with all asynchronous protocols.
  • The best possible privacy threshold is now \(t = n - f\), since some of the faulty parties might never show up to reconstruction and you won’t know whether they’re slow or evading you.
  • The dispersal phase cannot include a step where a provider complains if their shares are inconsistent with the commitment, because asynchronous protocols are slow and the provider might not get its information for a long time.

To my knowledge, the most efficient asynchronous verifiable secret sharing scheme is one that former BU PhD student Nicolas Alhaddad, former BU undergrad Ziling Yang, and I built last year. It’s also dual-threshold. If you’re interested, here is the paper containing all of the details: https://eprint.iacr.org/2024/326.pdf.