Takoma Park 2009: the conclusion

Well, it’s been a few weeks of craziness at home and catching up on other work, but I’ve finally wrapped up the Takoma Park 2009 audit. The final step: letting you, dear reader, run the audit all on your own.

You’ll find the complete instructions here on the auditing site.

I haven’t tested this on Windows, just Mac OS X, and it should work on Linux/Unix, too. You need Python 2.5 or above, PyCrypto, git, and subversion. You need about 30 minutes of download time, and 1 hour of processing. And then you can check the results you’ve computed against the results I’ve computed, against the official election results (which have some small variations since the results were certified, I’m not entirely sure why), and against the list of verification codes.

Takoma Park: verifying the shuffle and the unopened ballots

So the votes have been cast, the uncertified tally has been released, and the confirmation codes have been published for all voters to check. Now, it’s time to make sure that the coded votes, which were shuffled via the Shuffle Tables into the decoded votes in the Results table, were indeed shuffled and decoded correctly.

Having trouble remembering which table is which? Here’s a reminder:

pdr-setup

Now of course we don’t actually see these tables in cleartext, rather what we have right now is:

pdf-vote-cast

Next, the Scantegrity team used random stock data to seed a random number generator and decide which side to open up, the left or right. Now we said before it would be row-per-row in the shuffle table… but because of a subtle privacy issue, it turns out instead that there are 40 Shuffle Tables, and each one will be open either entirely on the left, or entirely on the right.

That’s what was done, and that’s what I verified just now in the Meeting 4 Audit.

Any Issues?

The same issue that I mentioned for Meeting 2 applies: the stock-data program pulls data that is, unfortunately, still being adjusted for a few days as stock trades reconcile, and thus it’s not possible for me to find the exact same seed. I have to trust that the Scantegrity team did it right. Not ideal, as I mentioned before, but also not extremely worrisome because the other parts of the audit provide a bit of insurance against any error here: next we’re going to open up every piece of data for the unused ballots, and there are a lot of those, and no carefully crafted randomness can hide cheating here.

Contested Ballots?

Scantegrity supports a “contested ballot” audit where, if a voter wants to contest that their confirmation code does not appear, all confirmation codes for that ballot are opened up to prove there was no hanky-panky. Since that original nomenclature, the Scantegrity team has decided to effectively act like *all* ballots are contested by default, so that all confirmation codes for all cast ballots are revealed to prove that there are no duplicates or other naughtiness. Of course, the correspondence between confirmation code and real, decoded candidate is not revealed.

If you voted in the Takoma Park election, you can check your ballot’s complete set of confirmation codes against my regenerated list.

Auditing Unused Ballots

For all ballots that were unused, the Scantegrity team is now forced to reveal all of the shuffle table rows and all of the confirmation codes. Since it’s not possible for the Scantegrity team to predict ahead of time which ballots would be used and which would remain unused, this audit gives us added confidence that the used ballots were okay, too, because the unused ones are.

Check out the Unused Ballot Audit. Everything checked out fine.

So… are we done?

Yes, we are! It looks like the Takoma Park election went well, and I can say with confidence that, if you voted in the election, wrote down your confirmation code, and checked it against my copy of the confirmation codes, then your vote counted the way you intended it to. And I can say this even without knowing how you voted. Pretty darn cool.

Over the next few days, I’ll write up a couple of followups regarding:

  • some recommendations for improving Scantegrity,
  • an explanation of what happens if the election officials are all corrupt, and
  • a script that lets you re-perform the entire Scantegrity audit on your own.

Takoma Park: auditing the auditor

Rick Carback from the Scantegrity team just pointed out to me that my totals are not quite the same as theirs, and he surmises that I may have read the Instant Runoff rules incorrectly. Specifically, my code considers that ballots that skip a rank, i.e. that go directly to choice #2 and never indicate a choice #1, are “exhausted”, meaning they don’t count anymore. In fact, the rules for Takoma Park state that, in that case, the next candidate choice counts, but if two choices are skipped, then it’s exhausted. He’s absolutely right, and I’ve updated my tally code appropriately, and now my numbers match….

Except for one more detail: the Scantegrity team is continuing the Instant Runoff candidate elimination past the point of a candidate gaining absolute majority. I think that’s wrong. It doesn’t affect the outcome, but it does affect the final tally count, so we’ll wait and see what the official word is.

In any case… isn’t it cool that we can audit each other and work out these differences with public code, public results, and complete oversight from anyone who wants to watch? That, again, is the power of open-audit elections using systems like Scantegrity or Helios.

Takoma Park: and those provisional ballots?

Coverage of the Takoma Park election continues, with a good article in Wired. And so does the audit!

Some people who showed up on election day couldn’t be verified as registered voters. Thanks to one of the useful HAVA provisions, they got to vote provisionally, meaning their ballot was set aside in an envelope labeled with their name, and their eligibility was checked later. A number of folks did turn out to be eligible, so their ballots need to be tallied. The Scantegrity team has scanned those ballots, and has thus updated its D (shuffle) and R (results) tables which now include all of the old plus new ballots.

So it’s time to run my audit code again, which I had to update for this new development (I didn’t realize we were counting provisional ballots, too!). It looks like all of the confirmation code reveals check out.

Here is the new list of confirmation codes, and the updated tally.

Though the vote totals did change, the low number of provisional ballots means that no race came close to having a new winner because of provisional ballots.

once upon a time, you mentioned something about vote secrecy…

The secrecy of provisional ballots is much less than that of normal ballots, since you can obviously see which candidates gained a small number of votes, and thus you can often tell how some provisional voters voted. I hear that Josh Benaloh has proposed a slightly different approach, which I think is a very nice twist: count *all* of the ballots by default, and then exclude the provisional ballots that fail. This would mean that only the privacy of the unqualified voters would be at stake, not that of the qualified voters, and that’s much nicer. Next time, maybe!

and now?…

We wait for complaints from voters who might not have been able to verify their confirmation codes. Then, once the complaint period ends, the Scantegrity team will meet again to do two last audits: the left-vs-right opening of the shuffle tables, and then the full opening of the spoiled ballots. The tallies already produced won’t change, but those meetings will confirm our confidence in those tallies.

Almost there!

Takoma Park: so can I see my confirmation code already?

[This post is part of my Auditing the Takoma Park Municipal Election series.]

So the votes have been cast, and voters went home. Some of them wrote down their confirmation codes. They probably checked those codes against the official Scantegrity web site. But why would they trust that web site to do all of the math right in the backend?

That’s where the audit work comes in. I’ve now run the Meeting 3 verification, and it looks good: the confirmation codes were properly opened, and I’ve posted my own re-computed version of the confirmation codes. If you’re a Takoma Park voter and you want extra certainty that your vote counted, you should check those confirmation codes and let me know if your confirmation codes don’t appear properly.

But it’s not just the confirmation codes, since we now have the unofficial tally. I’ve posted the tally that I have re-computed from these ballots. Very close to the preliminary results from the unverified opscan software itself, as in, off by only a couple of votes here and there in no way that comes close to changing the results.

Ummmm, but you said this was verifiable, so where is the discrepancy coming from?

There is paper involved, and there is scanning of paper involved. Whenever that happens, errors will occur, either in the normal opscan process, or in the reading of verification codes. If I had to bet money, I’d say it’s probably the opscan scanning that is off, while the Scantegrity code is exactly correct. But, we probably won’t know for sure, and it doesn’t make a difference as long as very few (hopefully no) voters complain about missing confirmation codes.

Now remember, again, this is the unverified tally. We have to give voters the chance to complain about their confirmation codes first. Only then will we run the final audit steps, Meeting 4 + the spoiled ballot checks.

Can I run this myself already?

Yes, check out my audit code from github:

git clone git://github.com/benadida/scantegrity-audit.git

and do a subversion checkout of the Scantegrity data:

svn checkout https://scantegrity.org/svn/data/takoma-nov3-2009

Instructions on how to run the verifications are in the README file, in this case:

python meeting3.py {DATA_DIR} {CONFIRMATION_CODES_OUTPUT_FILE_PATH}

for each of the 6 wards’ data directory. The confirmation codes are written to the given output file.

Then, to run the tally:

python tally.py {QUESTION_ID} {DATA_PATH_1} {DATA_PATH_2} ...

For QUESTION_ID 0, the mayoral race run across all wards, you’ll need all 6 data paths.

For QUESTION_ID 1, you’ll need to run tally.py against each individual ward’s data path, since those are different races.

Takoma Park: Meeting 2

[This post is part of my Auditing the Takoma Park Municipal Election series.]

OK, so a couple of days ago we verified the initial P table and D tables for all 6 wards in tomorrow‘s Takoma Park election. Now comes Meeting 2, which was held a couple of weeks ago to open up a random half of those ballot commitments to ensure that the P and D tables were generated correctly.

The short version of the story is that it all checks out, and the ballots look well-formed. Check out the detailed audit data.

That said, there was one issue that might reduce one’s confidence in the validity of the cut-and-choose, and there was another issue that was annoyingly preventing me from verifying the data until just now. So, let’s get our hands dirty and see …

Where do we get random numbers?

Meeting One was held on October 12th. How do I know this? I downloaded the data on October 13th, including the list of all files produced and their SHA1 fingerprints, and I signed it on October 13th at 3pm Pacific. You can download and verify the signature for yourself (DSA key ID 0F25B7E6). How can you be sure that’s my key? Well, you might want to ask me next time you see me in person, and I can confirm that the Scantegrity team didn’t hijack my blog and keep me locked up in a dungeon somewhere to prevent me from speaking out.

On October 14th, one day later, the Scantegrity team downloaded stock data from that day, using a script they had also committed to on October 12th as part of their Meeting 1 release. They used this stock data, which anyone can publicly verify and which is very hard to predict ahead of time, to generate the set of “challenge ballots,” meaning the P and D table rows that they would be forced to open.

Problem #1: Is Stock Data ever Final?

I discovered an annoying little issue: as it turns out, Google’s stock volumes are not stable, even a few hours after market close. They eventually add after-market trades, and there are trades that reconcile later that could affect the volume numbers. Indeed, on October 14th, the Scantegrity team got the following data:

NYSE:MMM    14-Oct-09,75.35,76.93,75.07,76.57,4120300
NYSE:AA     14-Oct-09,14.36,14.38,14.21,14.32,28884785
NYSE:AXP    14-Oct-09,35.27,35.31,34.57,35.09,15329442
NYSE:T      14-Oct-09,26.22,26.25,25.78,25.83,32644760
...

and today, I got the following data:

NYSE:MMM    14-Oct-09,75.35,76.93,75.07,76.57,4121804
NYSE:AA     14-Oct-09,14.36,14.38,14.21,14.32,28920161
NYSE:AXP    14-Oct-09,35.27,35.31,34.57,35.09,15334664
NYSE:T      14-Oct-09,26.22,26.25,25.78,25.83,32660582
...

Notice how the stock prices are the same, but the volumes are slightly higher in my dataset.

Of course the Scantegrity team didn’t do anything naughty here. But let’s be paranoid for a second.

Technically, because I can’t find a way to truly verify the original Scantegrity random-data seed, it’s conceivable that each line in this seed-file could be tweaked to any one of a few thousand values without detection, and thus that the officials could have done an exhaustive search of the hash domain to audit only the ballots they generated correctly, but never the ballots they “purposely” generated incorrectly. Those hypothetical incorrectly generated ballots could be set up to flip the selections of individual ballots in a way that we cannot detect right now, and if those ballots could be handed to people who are known to vote the “wrong” way, then they could be effectively forced to vote the “right” way.

Except, of course, that since each row can be audited with probability 50%, it would be computationally very difficult for a malicious administrator to cheat on more than 50 or 60 ballots. Very, very hard. 100 ballots? For all intents and purposes, impossible with today’s computing power. And then those 50 or 60 ballots would have to be handed to people you *know* are going to vote for the candidate you’re opposing… so in the worst-case scenario, with a very powerful adversary and a significant amount of coordination, this might swing the results by a few votes…..

Except not even that, because there is still a safety net: at the end of the election: the unused ballots will be spoiled and fully revealed. Chances are, if there is any significant number of bad ballots, they will be detected then, and an investigation can begin.

So this is a bit of a weakness, but not one that can realistically enable corruption of the election without detection. And of course, that’s the point: you can never prevent corruption attempts, but with open-audit voting like Scantegrity, you can detect it.

Another little tidbit

Even after this first pass, things still weren’t checking out, so I consulted with the Scantegrity team, and together we realized that the set of challenge ballots had been generated on Windows, but when the same program and random-data seed file were run on Linux or Mac, they generated a *different* challenge set. Why? It has do with carriage-return and line-feeds, and we’re still figuring out exactly how to prevent this in the future… but the point is that, by re-adding in the carriage return characters, everything checked out.

Is this a security vulnerability? No, it’s not, since there are only two possible representations for newlines, the Windows and Mac/Linux ways, so there’s no room to squeeze in any cheating here. It’s just a bug. And here’s what’s always been interesting to me about open-audit voting: your first verification might not work, your second verification might not work, because of annoying little bugs. But you can always iron out those bugs and re-run the verification. Because after all, there might be bugs in the audit procedure, too. With open-audit voting, you can often redo the audit, and when you do get a thumbs-up, you know things are in good shape. That’s powerful.

Conclusion… so far

Meeting 2 is verified… with the caveat that, hypothetically, we can’t be 100% certain that some hanky panky didn’t happen on the randomness generation. That’s okay, though, because realistically, in the worst-case scenario we can imagine, only a small handful of ballots could be affected, and in any case we’ll regain full confidence once we run the spoiled-ballot checker at the end.

One lesson I’m drawing from this: the cut-and-choose proofs based on public randomness are very tricky to pull off, because they can’t be re-done: the ballots are printed, and the challenged ballots are discarded. We can’t go back and re-do the proof of validity of the existing ballots. Since open-audit voting systems are powerful specifically because it’s always possible to undo something bad (re-vote, re-verify the tally, etc…), I wonder if Scantegrity might benefit a bit from a different proof protocol. I don’t know what that would look like yet, though….

In any case, Meeting 2 is verified to my satisfaction.

UPDATE: want to audit the data yourself? Go check out my audit code from github:

git clone git://github.com/benadida/scantegrity-audit.git

and do a subversion checkout of the Scantegrity data:

svn checkout https://scantegrity.org/svn/data/takoma-nov3-2009

Instructions on how to run the verifications are in the README file, in particular

python meeting1.py {DATA_DIR}

and

python meeting2.py {DATA_DIR}

making sure, for that second one, that you’ve copied the djia-stock-prices-latest.txt to {DATA_DIR}/pre-election-random-data.txt, where {DATA_DIR} is one of the wards.

Each verification of a ward’s single meeting will take a couple of minutes on an average PC. This isn’t the fastest audit code ever, it’s written to be easily audited, even if that makes it a bit slower than necessary.

Takoma Park Election: the 7 steps of auditing

[This post is part of my Auditing the Takoma Park Municipal Election series.]

If you’ve been following, we know what the voter experience is going to be like on Tuesday, and we know what the auditing process is going to be like. So, can we audit this thing already?

Yes, we can. Here are the steps:

  1. Meeting 1: the election officials get together, agree on election parameters, and generate the commitments to the Ballot Table of 5000 ballots (called the P Table for historical reasons) and the 40 Shuffle Tables (called the D Tables). Why 40 shuffle tables? It’s a way of increasing the certainty of the election verification, but we’ll talk about the details in a later blog post. The only thing to verify here is that the Ballot and Shuffle tables contain exactly the number of rows they’re supposed to contain.
  2. Meeting 2: the election officials get together and use some public source of randomness (stock closing prices) to decide which rows in the P table (and corresponding rows in the D tables) to open up for auditing in the cut-and-choose process. They then reveal those rows accordingly. For the remaining rows, meaning the ballots that will get printed and used for ballot casting, the election officials generate commitments to the confirmation codes and publish them. The job of the auditor here is to ensure that this random selection was done properly, that the row reveals match up with the data seen in Meeting 1, and that the left-hand and right-hand permutations in the D tables match up with the P table. The confirmation code commitments are good to record, but they can’t be checked yet.
  3. After the election happens, Meeting 3: the election officials get together and fill in the P tables with the encoded votes that were cast and the D tables with the intermediate decryption of these encoded votes. They also reveal the confirmation codes that voters should be able to check only, and how these confirmation codes correspond to the commitments from Meeting 2. An auditor should ensure that the ballots used here were not those audited in Meeting 2, and that the codes are revealed correctly. An auditor should also make available the list of confirmation codes that he verified.
  4. the Tally: well yeah, that’s kind of important… based on the Results Table, called the R table, published in Meeting 3, it’s straight-forward to re-compute the tally from the individual ballots there. In the case of the Takoma Park election, this is a relatively standard single-seat single-transferable election.
  5. Meeting 4: using public randomness again, the election officials open up either the left or right hand of each D table. The auditor must ensure that the randomness was properly used to generate the left/right challenges, that the reveals match the earlier commitments from Meeting 1, and that the revealed permutation, left-hand or right-hand, was properly applied: in the case of the left-hand permutation that the P-table vote is properly transformed into the intermediate D-table vote, in the case of the right-hand permutation that the D-table intermediate vote was properly transformed into the result, fully decrypted vote.
  6. Contested Ballots: if voters complain about a confirmation-code mixup after Meeting 3, then the contested ballots are fully opened up by administrators so that all confirmation codes can be revealed. An auditor should check that these contested ballots are properly opened (we expect very few.)
  7. Spoiled Ballots: in-person auditing staff will be randomly selecting ballots to audit, and all unused ballots will be audited too in this “spoiled-ballot” final check, where the full P-table row and corresponding D-table rows are revealed by election officials. The auditor should check that all reveals are done correctly and that all permutations match between the D tables and P table.

Oh yeah, and one more thing: everything has to happen 6 times because there are 6 wards, each of them run independently.

So, the election is in a couple of short days… where are we now?

Meetings 1 and 2 have occurred. And I have just audited Meeting 1, go see the Meeting 1 audit data.

I haven’t yet done the audit for Meeting 2, but I have already signed the files generated by the Takoma Scantegrity team, so that I can be certain that that data is locked and loaded. I’ll be auditing it shortly, before the Tuesday election of course.

Takoma Park 2009: Verifying the Tally from the Confirmation Codes

[This post is part of my Auditing the Takoma Park Municipal Election series.]

We’ll now consider the auditing portion of the Takoma Park election. This is a little bit involved, so we’ll take our time. Importantly, the typical voter does not need to burden themselves with this complexity. All that Valerie, our voter, needs to do, is follow the voter experience description, which is quite straight-forward. The complexity of the auditing process is reserved for those who wish to audit the election. Anyone can be an auditor if they so choose, but no one is required to do so if all they want to do is cast a vote.

Recall that Valerie walked out of the booth with a piece of paper containing her ballot serial number and her confirmation codes:

Ballot Receipt

This receipt corresponds to the ballot that Valerie cast:

Cast Ballot

though, of course, this ballot is now hidden from the public eye.

When the election closes, the election administrators post a list of all cast ballot IDs and their confirmation codes:

Valerie can check that her confirmation codes are listed correctly next to her ballot ID.

A little while later, the results of the election are announced:

results

So…

  1. how can Valerie be certain that the confirmation codes listed on her ballot indeed correspond to the candidates she selected?
  2. given all of the confirmation codes posted for the world to see, how does this give the public proof that the tally was computed properly?

In voting system parlance, the first question is ballot casting assurance, the second is universal verifiability.

We’ll start with a simplified explanation that resolves both issues. Then we’ll notice that our approach completely ignores voter secrecy. Whoops. So then we’ll add some cool and simple cryptography techniques to restore voter secrecy.

Opening up sealed envelopes

Imagine that, for each answer to each question on each ballot, a confirmation code has been sealed in an envelope labeled with the appropriate ballot ID, the question number, and the candidate number. Specifically, for ballot 0042, there are 5 envelopes:

Sealed Envelopes

Now, assume that all of these envelopes are prepared before election day, and placed in a secure room that is under constant webcam and physical watch. When Valerie is assigned her ballot ID, #42, selects “Mickey Mouse”, and obtains the confirmation code CX, she can ask the election authorities to open the envelope pertaining to that ballot, question, and candidate choice, and ensure that the code matches.

envelope-open-q1-c1

Similarly, when it’s time to tally Valerie’s vote, all of ballot 42’s envelopes can be opened to discover which confirmation code matches up with which candidate, and the appropriate candidate counts can be tallied.

Any observer watching on the webcam, assuming they’re watching all the way through, can verify that the confirmation codes correspond to the candidates they were bound to before the election started. Of course, there’s no voter privacy, since the act of checking that a confirmation code corresponds to a given candidate immediately makes it clear that Valerie’s CX code corresponds to “Mickey Mouse”. Also, it’s not exactly practical to prepare physical envelopes for every ballot’s question and answer.

How do you mimic envelopes digitally?

The envelope seal-and-reveal trick can be recreated digitally using a cryptographic commitment scheme. A commitment scheme lets you take a piece of data to which you want to commit, i.e. “CX”, and produce a commitment, i.e. 8cmsdn3fcxcvdf so that, when others look at just that commitment string, they cannot discern the underlying value “CX”. Then, at some later time, you can reveal “CX” and how it corresponds to your published commitment string 8cmsdn3fcxcvdf. The commitment scheme must be such that 8cmsdn3fcxcvdf can only be opened as CX, not as some alternate value. In a sense, 8cmsdn3fcxcvdf is the sealed envelope, and the reveal process of the commitment scheme must ensure some link between the revealed value and that commitment string / envelope.

The nice thing about using cryptographic commitments is that there’s no need to place a physical room of envelopes under surveillance: just distribute all of the commitment strings to multiple parties before the election. It’s actually easier to do digitally than physically.

How do you build one of these commitment schemes? It depends on the specific security properties you’re looking for, but, approximately, a commitment scheme is a hash function with some randomness sprinkled it. (This is not a great idea if you want good security, but it’s close enough to the real thing, and it’s quite intuitive.) For example:

commit(m) = SHA1(random || m)

In other words, to commit to a message m, generate some randomness, concatenate m, hash, and produce the hash result as the commitment string. The one-wayness of the hash function prevents someone from guessing m from the commitment string.

Then, revealing the message involves simply revealing random and m. Anyone can re-perform the SHA1 hash to verify that m was indeed hiding there. The one-wayness of the hash function means that it’s going to be practically impossible for me to find a different random and m that together hash to the same commitment string, so once I commit, I can’t take it back.

[warning: do not use the above as a real commitment scheme, it’s poor for a number of reasons, even if it’s a good pedagogical tool. see don’t hash secrets if you want to learn more.]

Cut and Choose

OK great, now we know how to commit and later reveal data digitally, but that doesn’t address the fact that, because we’re publicly revealing the confirmation-code-to-candidate correspondence for Valerie’s actual real-honest-to-goodness ballot, we’re carelessly revealing that Valerie voted for Mickey Mouse! Maybe we can audit some of the ballots, and let Valerie vote on a fresh one….

This is where another fantastic little crypto nugget comes in: cut-and-choose. The idea is this:

  1. produce twice as many ballots as we actually need,
  2. commit to all of their confirmation codes and publish all of the commitment strings for all to see,
  3. randomly select half of them for auditing: uncover all of the confirmation codes on each physical ballot, reveal the corresponding commitments, and compare.

cut-and-choose-ballots

If we do this procedure before the election, then we know that, with pretty high probability, the remaining ballots have the correct confirmation codes attached to their candidate names. Amazingly, though, no voter secrecy has been violated, since the actual ballots that Valerie and other voters will be using have not been revealed.

That’s the beauty of the cut and choose: audit half, use the other, giving us both verifiability and secrecy. Of course, this only works if the selection of ballots-to-audit is done sufficiently randomly that the election officials don’t know ahead of time which will be audited and which will be used for actual ballot casting. Thus the familiar notion of cut-and-choose: the people doing the cutting, i.e. the election officials with their commitment strings, are not the same as the people doing the choosing, i.e. the auditors who select which ballots to audit.

And since we can’t have every possible auditor involved in the random selection of audited ballots, we simply set a rule ahead of time: the selection of ballots to audit will be determined by a pseudorandom number generator seeded by some publicly verifiable and unpredictable data, i.e. the least significant digits of trade volumes for 100 pre-determined stocks on the New York Stock Exchange on the day of the election.

Indirection

OK, so now Valerie can be confident, given the pre-election audit, that the ballot she’s using corresponds to the publicly committed confirmation codes. So what about the tally? Once we have a bunch of confirmation codes, how can we ensure that election officials are tallying ballots properly? We can’t pull the exact same cut-and-choose trick, since now we have marked ballots and selected confirmation codes that need to be tallied: we can’t simply throw away half of the cast votes to verify the tallying of the other half!

Instead, we’ll add indirection between the confirmation code and the candidate to which it corresponds. With one layer of indirection, we can play the same cut-and-choose game to reveal part of the path between confirmation code and candidate, but never the whole thing.

Consider specifically the first question in our sample election. Two candidate choices are available: “Mickey Mouse” and “Donald Duck”. Let’s decide that Mickey Mouse is candidate #1, and Donald Duck is candidate #2, just so we can easily refer to each candidate by number. For each ballot, election officials will perform the following steps:

mixnet-construction

  • (a) start with the canonical ordering
  • (b) pick a right-hand permutation, in this case a flip, and compute the intermediate candidate ordering
  • (c) pick a left-hand permutation, in this case no flip, and compute the final candidate ordering
  • (d) generate random confirmation codes and commit to them in the order of the final permuted candidate ordering, not in the order of the canonical candidate ordering.

At the end of this process, the officials have produced, for each ballot, an ordered list of sealed (committed) confirmation codes. The ordering may or may not be the canonical ordering of candidates, but there is a chain of transformations, performed in private, that link these commitments back to the real candidates to which they correspond.

How do we know that they’ve produced these confirmation codes and permutations correctly all the way through? Election officials produce commitments to the right permutation, the left permutation, and the confirmation codes. They place these commitments into a ballot table that contains the confirmation codes, a shuffle table that shuffles the order of the ballots, and a results table where the results will be announced. So, in private, the election officials produce:

pdr-setup

which they then commit to:

pdf-committed

Then, during the pre-election ballot audit, where we randomly select half of the ballots for auditing, we ask the officials to reveal all of these commitments for the audited ballots, thus showing that the permutations they prepared are correctly mapped, after both permutations, to the appropriate candidates in the canonical ordering:

pdf-fullrow-audit

and the corresponding ballots, where the mapping of confirmation code to candidate can be used to confirm the revealed permutations in the table above:

pdf-fullrow-audit-ballots

Notice how ballot #40’s confirmation codes are in the canonical order, because the two flip transformations cancel each other out. Meanwhile, ballot #41’s confirmation codes are in the reverse order, because one of its transformations is a flip. To determine the order of codes, one must know both transformations. The remaining ballots are thus proven well-formed with high probability, yet their candidate-order permutations remain hidden.

If this sounds confusing, it’s normal: we’re adding indirection on purpose to enable verification. Intuitively, once we’re sure that the permutation tables have been prepared correctly, for any given ballot that is actually cast, i.e. Valerie’s, we’re going to verify that either the right-hand permutation or the left-hand permutation was applied properly, but never both. Probabilistically, we’ll be pretty certain that there was no hanky-panky, but for no single vote will we know the complete permutation. Thus we’ll be able to trust the tally, but we won’t know how Valerie voted.

Back to the Ballot Casting

So the election is about to start. We know that the committed permutations and corresponding printed ballots are correct, because we audited half of them randomly and are using the remaining half. We now have physical ballots and a published set of commitments in three tables.

Again, the rows of the three tables are in different orders, otherwise it would be trivial to link a ballot to its canonicalized result, and thus to determine who voted for what. The correspondence between the three tables is stored in the middle, shuffle table, with each row referencing its associated row in each of the other two tables.

When Valerie casts her ballot, the scanner picks up that the first option to the first question was selected (which is Mickey Mouse). This information remains secret. Together, the election officials apply the complete permutation (right-hand then left-hand) to determine which coded candidate number this corresponds to. In our example, Valerie’s coded candidate selection to question 1 is #2. So the election officials open up the confirmation code commitment for candidate #2, and out pops “CX” for the world (including Valerie) to see associated with ballot #42’s first question. Valerie verifies this against her receipt. If it matches, given the pre-election audit verified by herself or by an auditor of her choice, she’s confident that the in-use ballots are good, and thus that her selection was correctly recorded.

scanner

Next, the election officials apply the left-hand permutation to all of the coded votes (privately, only they know how to do this), shuffle their order according to the shuffle table, and publish the permuted choices in the shuffle table for all to see. They then apply the right-hand permutation to these shuffle rows (again privately), shuffle the rows according to the shuffle table, and record the results in the results table.

pdf-vote-cast

Finally, the election officials are challenged: they must open up either the entire left-hand or right-hand of the shuffle table. If the left-hand is opened up, the public can verify that the corresponding ballot-table’s coded vote was permuted properly into the shuffle-table’s intermediate result. If the right-hand is opened up, the public can verify that the corresponding result-table’s decoded vote was permuted properly from the shuffle-table’s intermediate result. We’ll open up the right side:

pdf-postelection-audit

Looks like the right side was performed correctly:

  • the first shuffle table row was mapped to R4 with a flip transformation, which indeed turns intermediate result candidate #2 to canonical result candidate #1, Mickey Mouse.
  • the second shuffle table row was mapped to R2 with a non-transformation, which indeed turns intermediate result candidate #2 to canonical result candidate #2, Donald Duck.

Since no complete path exists between the ballot able and the results table, every vote remains secret. However, given the random left/right auditing, cheating election officials are caught with 50% probability. Not good enough, of course, so we ask the election officials to generate many shuffle tables, while the ballot table remains fixed and the results table remains fixed. Each shuffle table is either entirely opened on the left, or entirely opened on the right. If all 20 verify, then election officials can get away with cheating only if they survive a 1 in a million game of chance. With 40 shuffle tables, the probability dwindles to one in a trillion, at which point these evil election officials are much better off trying their hand at the lottery, which they’ll win a couple times in a row before they manage to get away with faking the election results.

That’s it

Take a step back. You’ve just witnessed one way to resolve the long-standing central dilemma of elections: how to verify a tally while keeping individual votes secret. If you’re followed this far, I hope you get the same sense of giddy amazement that I get when I think about these voting protocols. It’s a little bit of magic with enormous consequences: verifiable democracy.

Next?

Next we’ll cover the exact steps that the Scantegrity Takoma 2009 team are taking to prepare for the election, we’ll talk about my verification code, and we’ll start to really make sure that these ballot, shuffle, and results tables are generated correctly….

Takoma Park 2009: the voter experience

For background on this post, check out the Auditing Takoma Park 2009 Election.

I’m gathering all documentation on a Google Site. This blog will continue to serve as the narrative, while the datasets and documentation will live on the Google Site, and I’ll refer to them as needed from this blog.

Let’s begin with an explanation of the voting process that Takoma Park citizens will experience on November 3rd, 2009.
(If you’re a Takoma Park resident: make sure to register by October 5th if you want to participate in this historic election!)

Say hello to Valerie, our token voter. At a high level, Valerie’s voting experience is identical to her past experience with a typical optical-scan election. She fills in the bubbles for the candidates of her choice, casts her ballot, and walks away. With one twist: if Valerie wants to, she can write down some confirmation codes that will let her audit her ballot later on.

More specifically:

Scantegrity II Ballot
  1. Valerie arrives at her precinct and identifies herself as she normally does for any other election.
  2. Valerie receives a ballot that looks exactly like an optical scan ballot: a sheet of paper with a bubble next to each candidate name. The ballot has a serial number.
  3. Valerie also receives a special “decoder” marker pen.
  4. When Valerie uses the pen to fill in a bubble, a two-letter code is revealed within the bubble.
  5. Valerie can record her two-letter confirmation codes for each candidate she has selected. This is her receipt. The printed ballot includes a detachable receipt, with the ballot serial number pre-printed, for just this purpose.

Filling out the Ballot

Then Valerie goes home. She’s got her receipt, and she wants to check that her vote was counted correctly.

Once home, Valerie logs into the verification web site, and looks up her ballot number, #00042. She compares the confirmation codes on the screen with the ones she recorded in the booth. If still unsure, she asks her favorite political organization to perform the check on her behalf. Importantly, this political organization doesn’t get to see the choices the Valerie actually made, only the confirmation codes.

So, now some questions:

  • what ensures that the confirmation code correctly maps to the option Valerie selected?
  • from there, how is the tallying of selected options verified?

In other words, great, Valerie’s got confirmation codes…. but what do they mean?

We’ll explore that in the next episode.

Auditing the Takoma Park Election

In November of this year, citizens of Takoma Park, Maryland will use the Scantegrity voting system in their municipal election. This is a significant milestone for open-audit voting systems: the first time a government official is elected using a voting system that is verifiable from start to finish by any observer, even resistant to insider attacks.

As I’m not a member of the Scantegrity team, the credit for this goes to the whole Scantegrity team.

Understandably, the Takoma Park Election Board wants an independent audit of this election. They asked for my help, and I happily agreed. I’m volunteering my time on this project in exchange for the freedom to tell everyone exactly how the auditing process goes. So, over the next few weeks, I’ll be preparing my election auditing code. I will be writing this code in public, for everyone to see, using the Scantegrity specifications, which I’ll explain and validate against the Scantegrity technical papers.

All code will be open-source and published, in real-time, on Github.

All updates on the process, including issues I find along the way in understanding / interpreting Scantegrity, I’ll post on this blog under the Takoma Park 2009 category.

I won’t look at or use the Scantegrity source code. Because I don’t need to! The point of an open-audit election is that it provides a mathematical proof of its correctness, and all I need to do is verify this proof, not the source code that produced it.

If you have questions or thoughts about how to make this audit most effective, please don’t hesitate to leave a comment or drop me an email. And if you want to learn about open-audit voting, then I suggest you roll up your sleeves and hack along with me, producing your own verification code. It’s not that hard, I promise, and it will show you in the most explicit way possible that anyone can audit an open-audit election.