[This post is part of my Auditing the Takoma Park Municipal Election series.]
We’ll now consider the auditing portion of the Takoma Park election. This is a little bit involved, so we’ll take our time. Importantly, the typical voter does not need to burden themselves with this complexity. All that Valerie, our voter, needs to do, is follow the voter experience description, which is quite straight-forward. The complexity of the auditing process is reserved for those who wish to audit the election. Anyone can be an auditor if they so choose, but no one is required to do so if all they want to do is cast a vote.
Recall that Valerie walked out of the booth with a piece of paper containing her ballot serial number and her confirmation codes:
This receipt corresponds to the ballot that Valerie cast:
though, of course, this ballot is now hidden from the public eye.
When the election closes, the election administrators post a list of all cast ballot IDs and their confirmation codes:
Valerie can check that her confirmation codes are listed correctly next to her ballot ID.
A little while later, the results of the election are announced:
- how can Valerie be certain that the confirmation codes listed on her ballot indeed correspond to the candidates she selected?
- given all of the confirmation codes posted for the world to see, how does this give the public proof that the tally was computed properly?
In voting system parlance, the first question is ballot casting assurance, the second is universal verifiability.
We’ll start with a simplified explanation that resolves both issues. Then we’ll notice that our approach completely ignores voter secrecy. Whoops. So then we’ll add some cool and simple cryptography techniques to restore voter secrecy.
Opening up sealed envelopes
Imagine that, for each answer to each question on each ballot, a confirmation code has been sealed in an envelope labeled with the appropriate ballot ID, the question number, and the candidate number. Specifically, for ballot 0042, there are 5 envelopes:
Now, assume that all of these envelopes are prepared before election day, and placed in a secure room that is under constant webcam and physical watch. When Valerie is assigned her ballot ID, #42, selects “Mickey Mouse”, and obtains the confirmation code CX, she can ask the election authorities to open the envelope pertaining to that ballot, question, and candidate choice, and ensure that the code matches.
Similarly, when it’s time to tally Valerie’s vote, all of ballot 42′s envelopes can be opened to discover which confirmation code matches up with which candidate, and the appropriate candidate counts can be tallied.
Any observer watching on the webcam, assuming they’re watching all the way through, can verify that the confirmation codes correspond to the candidates they were bound to before the election started. Of course, there’s no voter privacy, since the act of checking that a confirmation code corresponds to a given candidate immediately makes it clear that Valerie’s CX code corresponds to “Mickey Mouse”. Also, it’s not exactly practical to prepare physical envelopes for every ballot’s question and answer.
How do you mimic envelopes digitally?
The envelope seal-and-reveal trick can be recreated digitally using a cryptographic commitment scheme. A commitment scheme lets you take a piece of data to which you want to commit, i.e. “CX”, and produce a commitment, i.e. 8cmsdn3fcxcvdf so that, when others look at just that commitment string, they cannot discern the underlying value “CX”. Then, at some later time, you can reveal “CX” and how it corresponds to your published commitment string 8cmsdn3fcxcvdf. The commitment scheme must be such that 8cmsdn3fcxcvdf can only be opened as CX, not as some alternate value. In a sense, 8cmsdn3fcxcvdf is the sealed envelope, and the reveal process of the commitment scheme must ensure some link between the revealed value and that commitment string / envelope.
The nice thing about using cryptographic commitments is that there’s no need to place a physical room of envelopes under surveillance: just distribute all of the commitment strings to multiple parties before the election. It’s actually easier to do digitally than physically.
How do you build one of these commitment schemes? It depends on the specific security properties you’re looking for, but, approximately, a commitment scheme is a hash function with some randomness sprinkled it. (This is not a great idea if you want good security, but it’s close enough to the real thing, and it’s quite intuitive.) For example:
commit(m) = SHA1(random || m)
In other words, to commit to a message m, generate some randomness, concatenate m, hash, and produce the hash result as the commitment string. The one-wayness of the hash function prevents someone from guessing m from the commitment string.
Then, revealing the message involves simply revealing random and m. Anyone can re-perform the SHA1 hash to verify that m was indeed hiding there. The one-wayness of the hash function means that it’s going to be practically impossible for me to find a different random and m that together hash to the same commitment string, so once I commit, I can’t take it back.
[warning: do not use the above as a real commitment scheme, it's poor for a number of reasons, even if it's a good pedagogical tool. see don't hash secrets if you want to learn more.]
Cut and Choose
OK great, now we know how to commit and later reveal data digitally, but that doesn’t address the fact that, because we’re publicly revealing the confirmation-code-to-candidate correspondence for Valerie’s actual real-honest-to-goodness ballot, we’re carelessly revealing that Valerie voted for Mickey Mouse! Maybe we can audit some of the ballots, and let Valerie vote on a fresh one….
This is where another fantastic little crypto nugget comes in: cut-and-choose. The idea is this:
- produce twice as many ballots as we actually need,
- commit to all of their confirmation codes and publish all of the commitment strings for all to see,
- randomly select half of them for auditing: uncover all of the confirmation codes on each physical ballot, reveal the corresponding commitments, and compare.
If we do this procedure before the election, then we know that, with pretty high probability, the remaining ballots have the correct confirmation codes attached to their candidate names. Amazingly, though, no voter secrecy has been violated, since the actual ballots that Valerie and other voters will be using have not been revealed.
That’s the beauty of the cut and choose: audit half, use the other, giving us both verifiability and secrecy. Of course, this only works if the selection of ballots-to-audit is done sufficiently randomly that the election officials don’t know ahead of time which will be audited and which will be used for actual ballot casting. Thus the familiar notion of cut-and-choose: the people doing the cutting, i.e. the election officials with their commitment strings, are not the same as the people doing the choosing, i.e. the auditors who select which ballots to audit.
And since we can’t have every possible auditor involved in the random selection of audited ballots, we simply set a rule ahead of time: the selection of ballots to audit will be determined by a pseudorandom number generator seeded by some publicly verifiable and unpredictable data, i.e. the least significant digits of trade volumes for 100 pre-determined stocks on the New York Stock Exchange on the day of the election.
OK, so now Valerie can be confident, given the pre-election audit, that the ballot she’s using corresponds to the publicly committed confirmation codes. So what about the tally? Once we have a bunch of confirmation codes, how can we ensure that election officials are tallying ballots properly? We can’t pull the exact same cut-and-choose trick, since now we have marked ballots and selected confirmation codes that need to be tallied: we can’t simply throw away half of the cast votes to verify the tallying of the other half!
Instead, we’ll add indirection between the confirmation code and the candidate to which it corresponds. With one layer of indirection, we can play the same cut-and-choose game to reveal part of the path between confirmation code and candidate, but never the whole thing.
Consider specifically the first question in our sample election. Two candidate choices are available: “Mickey Mouse” and “Donald Duck”. Let’s decide that Mickey Mouse is candidate #1, and Donald Duck is candidate #2, just so we can easily refer to each candidate by number. For each ballot, election officials will perform the following steps:
- (a) start with the canonical ordering
- (b) pick a right-hand permutation, in this case a flip, and compute the intermediate candidate ordering
- (c) pick a left-hand permutation, in this case no flip, and compute the final candidate ordering
- (d) generate random confirmation codes and commit to them in the order of the final permuted candidate ordering, not in the order of the canonical candidate ordering.
At the end of this process, the officials have produced, for each ballot, an ordered list of sealed (committed) confirmation codes. The ordering may or may not be the canonical ordering of candidates, but there is a chain of transformations, performed in private, that link these commitments back to the real candidates to which they correspond.
How do we know that they’ve produced these confirmation codes and permutations correctly all the way through? Election officials produce commitments to the right permutation, the left permutation, and the confirmation codes. They place these commitments into a ballot table that contains the confirmation codes, a shuffle table that shuffles the order of the ballots, and a results table where the results will be announced. So, in private, the election officials produce:
which they then commit to:
Then, during the pre-election ballot audit, where we randomly select half of the ballots for auditing, we ask the officials to reveal all of these commitments for the audited ballots, thus showing that the permutations they prepared are correctly mapped, after both permutations, to the appropriate candidates in the canonical ordering:
and the corresponding ballots, where the mapping of confirmation code to candidate can be used to confirm the revealed permutations in the table above:
Notice how ballot #40′s confirmation codes are in the canonical order, because the two flip transformations cancel each other out. Meanwhile, ballot #41′s confirmation codes are in the reverse order, because one of its transformations is a flip. To determine the order of codes, one must know both transformations. The remaining ballots are thus proven well-formed with high probability, yet their candidate-order permutations remain hidden.
If this sounds confusing, it’s normal: we’re adding indirection on purpose to enable verification. Intuitively, once we’re sure that the permutation tables have been prepared correctly, for any given ballot that is actually cast, i.e. Valerie’s, we’re going to verify that either the right-hand permutation or the left-hand permutation was applied properly, but never both. Probabilistically, we’ll be pretty certain that there was no hanky-panky, but for no single vote will we know the complete permutation. Thus we’ll be able to trust the tally, but we won’t know how Valerie voted.
Back to the Ballot Casting
So the election is about to start. We know that the committed permutations and corresponding printed ballots are correct, because we audited half of them randomly and are using the remaining half. We now have physical ballots and a published set of commitments in three tables.
Again, the rows of the three tables are in different orders, otherwise it would be trivial to link a ballot to its canonicalized result, and thus to determine who voted for what. The correspondence between the three tables is stored in the middle, shuffle table, with each row referencing its associated row in each of the other two tables.
When Valerie casts her ballot, the scanner picks up that the first option to the first question was selected (which is Mickey Mouse). This information remains secret. Together, the election officials apply the complete permutation (right-hand then left-hand) to determine which coded candidate number this corresponds to. In our example, Valerie’s coded candidate selection to question 1 is #2. So the election officials open up the confirmation code commitment for candidate #2, and out pops “CX” for the world (including Valerie) to see associated with ballot #42′s first question. Valerie verifies this against her receipt. If it matches, given the pre-election audit verified by herself or by an auditor of her choice, she’s confident that the in-use ballots are good, and thus that her selection was correctly recorded.
Next, the election officials apply the left-hand permutation to all of the coded votes (privately, only they know how to do this), shuffle their order according to the shuffle table, and publish the permuted choices in the shuffle table for all to see. They then apply the right-hand permutation to these shuffle rows (again privately), shuffle the rows according to the shuffle table, and record the results in the results table.
Finally, the election officials are challenged: they must open up either the entire left-hand or right-hand of the shuffle table. If the left-hand is opened up, the public can verify that the corresponding ballot-table’s coded vote was permuted properly into the shuffle-table’s intermediate result. If the right-hand is opened up, the public can verify that the corresponding result-table’s decoded vote was permuted properly from the shuffle-table’s intermediate result. We’ll open up the right side:
Looks like the right side was performed correctly:
- the first shuffle table row was mapped to R4 with a flip transformation, which indeed turns intermediate result candidate #2 to canonical result candidate #1, Mickey Mouse.
- the second shuffle table row was mapped to R2 with a non-transformation, which indeed turns intermediate result candidate #2 to canonical result candidate #2, Donald Duck.
Since no complete path exists between the ballot able and the results table, every vote remains secret. However, given the random left/right auditing, cheating election officials are caught with 50% probability. Not good enough, of course, so we ask the election officials to generate many shuffle tables, while the ballot table remains fixed and the results table remains fixed. Each shuffle table is either entirely opened on the left, or entirely opened on the right. If all 20 verify, then election officials can get away with cheating only if they survive a 1 in a million game of chance. With 40 shuffle tables, the probability dwindles to one in a trillion, at which point these evil election officials are much better off trying their hand at the lottery, which they’ll win a couple times in a row before they manage to get away with faking the election results.
Take a step back. You’ve just witnessed one way to resolve the long-standing central dilemma of elections: how to verify a tally while keeping individual votes secret. If you’re followed this far, I hope you get the same sense of giddy amazement that I get when I think about these voting protocols. It’s a little bit of magic with enormous consequences: verifiable democracy.
Next we’ll cover the exact steps that the Scantegrity Takoma 2009 team are taking to prepare for the election, we’ll talk about my verification code, and we’ll start to really make sure that these ballot, shuffle, and results tables are generated correctly….