Takoma Park: Meeting 2

[This post is part of my Auditing the Takoma Park Municipal Election series.]

OK, so a couple of days ago we verified the initial P table and D tables for all 6 wards in tomorrow’s Takoma Park election. Now comes Meeting 2, which was held a couple of weeks ago to open up a random half of those ballot commitments to ensure that the P and D tables were generated correctly.

The short version of the story is that it all checks out, and the ballots look well-formed. Check out the detailed audit data.

That said, there was one issue that might reduce one’s confidence in the validity of the cut-and-choose, and there was another issue that was annoyingly preventing me from verifying the data until just now. So, let’s get our hands dirty and see …

Where do we get random numbers?

Meeting One was held on October 12th. How do I know this? I downloaded the data on October 13th, including the list of all files produced and their SHA1 fingerprints, and I signed it on October 13th at 3pm Pacific. You can download and verify the signature for yourself (DSA key ID 0F25B7E6). How can you be sure that’s my key? Well, you might want to ask me next time you see me in person, and I can confirm that the Scantegrity team didn’t hijack my blog and keep me locked up in a dungeon somewhere to prevent me from speaking out.
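For readers who want to re-check the fingerprints themselves, here is a minimal sketch of that kind of check. It is not the code from my audit repository, and the `expected` mapping (file name to hex SHA1 digest) is a hypothetical format chosen for illustration; the real fingerprint list may be laid out differently.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected, base_dir):
    """Return the sorted names whose file is missing or has a different SHA-1.

    `expected` maps a relative file name to its recorded hex digest.
    This mapping format is an assumption made for this sketch.
    """
    base = Path(base_dir)
    bad = []
    for name, want in expected.items():
        path = base / name
        if not path.is_file() or sha1_of_file(path) != want.lower():
            bad.append(name)
    return sorted(bad)
```

An empty list from `find_mismatches` means every listed file still matches the fingerprint that was recorded and signed.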

On October 14th, one day later, the Scantegrity team downloaded stock data from that day, using a script they had also committed to on October 12th as part of their Meeting 1 release. They used this stock data, which anyone can publicly verify and which is very hard to predict ahead of time, to generate the set of “challenge ballots,” meaning the P and D table rows that they would be forced to open.
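To make the idea concrete, here is a rough sketch of how a seed file can deterministically pick which half of the rows gets opened. This is an illustration only, not the Scantegrity algorithm: the function name `challenge_rows`, the use of SHA-256, and the rank-by-hash selection are all assumptions of mine.

```python
import hashlib


def challenge_rows(seed_bytes, num_rows):
    """Pick half of the row indices 0..num_rows-1 using only `seed_bytes`.

    Each row gets a tag SHA-256(seed || "|" || index); the rows with the
    smallest tags are challenged. This is a hypothetical scheme for
    illustration, not the procedure the Scantegrity code actually uses.
    """
    def tag(i):
        return hashlib.sha256(seed_bytes + b"|" + str(i).encode("ascii")).digest()

    ranked = sorted(range(num_rows), key=tag)
    return sorted(ranked[: num_rows // 2])
```

The property that matters is that anyone holding the same seed bytes gets exactly the same challenge set, and anyone who changes even one byte of the seed gets an unrelated one.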

Problem #1: Is Stock Data ever Final?

I discovered an annoying little issue: as it turns out, the stock volumes reported by Google are not stable, even a few hours after market close. After-market trades are eventually added, and there are trades that reconcile later that could affect the volume numbers. Indeed, on October 14th, the Scantegrity team got the following data:

NYSE:MMM    14-Oct-09,75.35,76.93,75.07,76.57,4120300
NYSE:AA     14-Oct-09,14.36,14.38,14.21,14.32,28884785
NYSE:AXP    14-Oct-09,35.27,35.31,34.57,35.09,15329442
NYSE:T      14-Oct-09,26.22,26.25,25.78,25.83,32644760
...

and today, I got the following data:

NYSE:MMM    14-Oct-09,75.35,76.93,75.07,76.57,4121804
NYSE:AA     14-Oct-09,14.36,14.38,14.21,14.32,28920161
NYSE:AXP    14-Oct-09,35.27,35.31,34.57,35.09,15334664
NYSE:T      14-Oct-09,26.22,26.25,25.78,25.83,32660582
...

Notice how the stock prices are the same, but the volumes are slightly higher in my dataset.
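A small sketch of that comparison, using the four lines quoted above. The parsing assumes the layout shown in this post (a ticker, whitespace, then comma-separated fields ending in the volume); the helper names are mine.

```python
def parse_line(line):
    """Split one seed-file line into (ticker, date_and_prices, volume)."""
    ticker, rest = line.split(None, 1)
    fields = rest.strip().split(",")
    return ticker, tuple(fields[:-1]), int(fields[-1])


def volume_drift(original_lines, later_lines):
    """Return {ticker: later_volume - original_volume}.

    Raises ValueError if the date or any price differs, since only the
    volume column is expected to move.
    """
    later = {t: (p, v) for t, p, v in map(parse_line, later_lines)}
    drift = {}
    for ticker, prices, volume in map(parse_line, original_lines):
        later_prices, later_volume = later[ticker]
        if prices != later_prices:
            raise ValueError("date or price changed for " + ticker)
        drift[ticker] = later_volume - volume
    return drift


original = [
    "NYSE:MMM    14-Oct-09,75.35,76.93,75.07,76.57,4120300",
    "NYSE:AA     14-Oct-09,14.36,14.38,14.21,14.32,28884785",
    "NYSE:AXP    14-Oct-09,35.27,35.31,34.57,35.09,15329442",
    "NYSE:T      14-Oct-09,26.22,26.25,25.78,25.83,32644760",
]
later = [
    "NYSE:MMM    14-Oct-09,75.35,76.93,75.07,76.57,4121804",
    "NYSE:AA     14-Oct-09,14.36,14.38,14.21,14.32,28920161",
    "NYSE:AXP    14-Oct-09,35.27,35.31,34.57,35.09,15334664",
    "NYSE:T      14-Oct-09,26.22,26.25,25.78,25.83,32660582",
]

print(volume_drift(original, later))
# {'NYSE:MMM': 1504, 'NYSE:AA': 35376, 'NYSE:AXP': 5222, 'NYSE:T': 15822}
```

Every one of these differences changes the bytes of the seed file, which is why the later download cannot reproduce the original challenge set.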

Of course the Scantegrity team didn’t do anything naughty here. But let’s be paranoid for a second.

Technically, because I can’t find a way to truly verify the original Scantegrity random-data seed, it’s conceivable that each line in this seed-file could be tweaked to any one of a few thousand values without detection, and thus that the officials could have done an exhaustive search of the hash domain to audit only the ballots they generated correctly, but never the ballots they “purposely” generated incorrectly. Those hypothetical incorrectly generated ballots could be set up to flip the selections of individual ballots in a way that we cannot detect right now, and if those ballots could be handed to people who are known to vote the “wrong” way, then they could be effectively forced to vote the “right” way.

Except, of course, that since each row can be audited with probability 50%, it would be computationally very difficult for a malicious administrator to cheat on more than 50 or 60 ballots. Very, very hard. 100 ballots? For all intents and purposes, impossible with today’s computing power. And then those 50 or 60 ballots would have to be handed to people you *know* are going to vote for the candidate you’re opposing… so in the worst-case scenario, with a very powerful adversary and a significant amount of coordination, this might swing the results by a few votes.
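The arithmetic behind that estimate can be sketched as follows. It assumes, as a simplification, that each bad ballot is audited independently with probability 1/2 (the real challenge opens a fixed half of the rows, so this is only an approximation), and the function names are hypothetical.

```python
def evade_probability(bad_ballots, audit_probability=0.5):
    """Chance that none of `bad_ballots` bad rows lands in the audited set,
    assuming each row is audited independently (a simplifying assumption)."""
    return (1.0 - audit_probability) ** bad_ballots


def expected_seed_trials(bad_ballots, audit_probability=0.5):
    """Average number of candidate seeds a cheater would have to try before
    finding one whose challenge set misses every bad ballot."""
    return 1.0 / evade_probability(bad_ballots, audit_probability)


for k in (10, 50, 60, 100):
    print(k, expected_seed_trials(k))
```

Under that model, every additional bad ballot doubles the search: 60 bad ballots need on the order of 2^60 seed candidates, and 100 need on the order of 2^100.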

Except not even that, because there is still a safety net: at the end of the election, the unused ballots will be spoiled and fully revealed. Chances are, if there is any significant number of bad ballots, they will be detected then, and an investigation can begin.

So this is a bit of a weakness, but not one that can realistically enable corruption of the election without detection. And of course, that’s the point: you can never prevent corruption attempts, but with open-audit voting like Scantegrity, you can detect them.

Another little tidbit

Even after this first pass, things still weren’t checking out, so I consulted with the Scantegrity team, and together we realized that the set of challenge ballots had been generated on Windows, but when the same program and random-data seed file were run on Linux or Mac, they generated a *different* challenge set. Why? It has to do with carriage returns and line feeds, and we’re still figuring out exactly how to prevent this in the future… but the point is that, by re-adding the carriage return characters, everything checked out.
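Here is a minimal illustration of why line endings matter so much. SHA-1 is used only as an example of a hash over the raw seed bytes (I am not claiming it is the function the challenge generator uses), and `seed_digests` is a hypothetical helper.

```python
import hashlib


def seed_digests(text):
    """Hash the same seed text with LF and with CRLF line endings.

    Returns (lf_digest, crlf_digest). The input may use either convention;
    it is normalized to LF first, then re-expanded to CRLF.
    """
    lf_bytes = text.replace("\r\n", "\n").encode("utf-8")
    crlf_bytes = lf_bytes.replace(b"\n", b"\r\n")
    return (
        hashlib.sha1(lf_bytes).hexdigest(),
        hashlib.sha1(crlf_bytes).hexdigest(),
    )


lf_digest, crlf_digest = seed_digests("NYSE:MMM 14-Oct-09,75.35\nNYSE:AA 14-Oct-09,14.36\n")
print(lf_digest == crlf_digest)  # False
```

The visible text is identical in both cases, but the bytes are not, so anything derived from a hash of those bytes comes out different on the two platforms.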

Is this a security vulnerability? No, it’s not, since there are only two possible representations for newlines, the Windows and Mac/Linux ways, so there’s no room to squeeze in any cheating here. It’s just a bug. And here’s what’s always been interesting to me about open-audit voting: your first verification might not work, your second verification might not work, because of annoying little bugs. But you can always iron out those bugs and re-run the verification. Because after all, there might be bugs in the audit procedure, too. With open-audit voting, you can often redo the audit, and when you do get a thumbs-up, you know things are in good shape. That’s powerful.

Conclusion… so far

Meeting 2 is verified… with the caveat that, hypothetically, we can’t be 100% certain that some hanky panky didn’t happen on the randomness generation. That’s okay, though, because realistically, in the worst-case scenario we can imagine, only a small handful of ballots could be affected, and in any case we’ll regain full confidence once we run the spoiled-ballot checker at the end.

One lesson I’m drawing from this: the cut-and-choose proofs based on public randomness are very tricky to pull off, because they can’t be redone: the ballots are printed, and the challenged ballots are discarded. We can’t go back and redo the proof of validity of the existing ballots. Since open-audit voting systems are powerful specifically because it’s always possible to undo something bad (re-vote, re-verify the tally, etc.), I wonder if Scantegrity might benefit a bit from a different proof protocol. I don’t know what that would look like yet, though…

In any case, Meeting 2 is verified to my satisfaction.

UPDATE: want to audit the data yourself? Go check out my audit code from GitHub:

git clone git://github.com/benadida/scantegrity-audit.git

and do a subversion checkout of the Scantegrity data:

svn checkout https://scantegrity.org/svn/data/takoma-nov3-2009

Instructions on how to run the verifications are in the README file, in particular

python meeting1.py {DATA_DIR}

and

python meeting2.py {DATA_DIR}

making sure, for that second one, that you’ve copied the djia-stock-prices-latest.txt to {DATA_DIR}/pre-election-random-data.txt, where {DATA_DIR} is one of the wards.

Each verification of a single meeting for one ward will take a couple of minutes on an average PC. This isn’t the fastest audit code ever: it’s written to be easily audited, even if that makes it a bit slower than necessary.
