Benlog

security, privacy, transparency.

What the Oscars teach us about voting

Filed under: voting — February 27, 2010 @ 11:42 am

This year, the voting process for the Oscars has changed. Rather than indicating a single choice, as they have done since 1946, members of the Academy will provide a first choice, a second choice, and so on, potentially ranking all 10 nominees for Best Picture if they so desire. Some are speculating that this will affect the results. Some are writing really confusing articles about this change, with very misleading lines like “Getting the most votes is no longer enough.” Here’s the short version of this post: (1) of course ranked voting is going to affect the Oscar results! and (2) this year, the result will actually reflect the will of the Academy far better than in previous years.

Debates about voting methodology usually get very heated. In fact, if I say anything negative about ranked voting, more formally called instant-runoff voting (IRV), a legion of IRV fans will descend upon this blog with tremendous fury. Thankfully, in this case, there’s little room for disagreement: it’s pretty obvious that IRV will represent the opinion of the Academy much more accurately. In fact, it’s surprising that the Academy has been using single-choice plurality voting, which can easily yield wildly unrepresentative results. It makes one question the validity of past Oscar winners, and not only because the election is completely unauditable by anyone other than the designated auditing firm.

Say, for example, that 30% like Avatar best, 25% Hurt Locker, 20% Inglourious Basterds, 15% Up in the Air, and 10% District 9. (Apologies to the other Oscar nominees, but I need a simple example.) Using last year’s voting method, Avatar wins. With 30% of the vote. But wait, what if the fans of District 9 hate Avatar and really prefer Hurt Locker as their second choice? Since their first choice was District 9, a less popular movie, they effectively have no impact on the result of the election… unless we take their second choice into account. OK, so we give those 10% to Hurt Locker, and now Hurt Locker leads, 35% to 30%. But wait, what if the fans of Up in the Air mostly prefer Avatar to Hurt Locker? Then we eliminate Up in the Air for not having received enough votes, give those votes to Avatar, and now Avatar is back in front, but wait… you get the picture. It’s not that complicated. Basically, it means that if the movie you really want to see win has no chance of winning, then we’ll look at your second choice instead. The really crazy thing is that, with last year’s method, even if all the fans of Inglourious Basterds, Up in the Air, and District 9 prefer Hurt Locker to Avatar, meaning that in a two-way election Hurt Locker would win 70-30, Avatar STILL wins under the system used for the last 64 years.
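To make the “crazy thing” concrete, here is a minimal Python sketch that counts the hypothetical example both ways. The ballots are the made-up percentages above, scaled to 100 voters, under the assumption that every voter who doesn’t rank Avatar first prefers Hurt Locker to Avatar. `irv_winner` is a textbook instant-runoff count with arbitrary tie-breaking, not the Academy’s actual procedure.

```python
from collections import Counter

# Hypothetical electorate from the example: 100 voters, grouped by ranking.
# Assumption: everyone who doesn't put Avatar first ranks Hurt Locker above it.
BALLOTS = (
    [("Avatar", "Hurt Locker")] * 30
    + [("Hurt Locker", "Avatar")] * 25
    + [("Inglourious Basterds", "Hurt Locker", "Avatar")] * 20
    + [("Up in the Air", "Hurt Locker", "Avatar")] * 15
    + [("District 9", "Hurt Locker", "Avatar")] * 10
)


def plurality_winner(ballots):
    """Only each ballot's first choice counts; most first choices wins."""
    return Counter(b[0] for b in ballots).most_common(1)[0][0]


def irv_winner(ballots):
    """Repeatedly drop the last-place candidate until someone holds a majority
    of the ballots that still rank a continuing candidate."""
    remaining = {c for b in ballots for c in b}
    while True:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = Counter()
        for b in ballots:
            for choice in b:
                if choice in remaining:
                    tally[choice] += 1
                    break
        if not tally:
            return None  # every ballot exhausted
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Eliminate the weakest candidate (ties broken arbitrarily here).
        remaining.remove(min(remaining, key=lambda c: tally[c]))


print("Plurality:", plurality_winner(BALLOTS))  # -> Avatar
print("IRV:", irv_winner(BALLOTS))              # -> Hurt Locker
```

District 9, Up in the Air, and Inglourious Basterds are eliminated in turn, and each group’s votes flow to Hurt Locker, which ends at 70-30.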

Because of this oddity, the fans of District 9 might realize that their favorite has no chance and be tempted to choose only between the two favorites, Avatar and Hurt Locker. In other words, the dark horses are inherently handicapped. With IRV, there’s no reason to resort to such silliness: vote for the dark horse first if that’s really your preference, and if not enough others agree, your second choice will be “activated,” and you won’t have lost your chance to influence the result. So, this year, a dark horse movie has a better chance of winning. But not because the voting system gives the dark horse an unfair advantage! Rather, it’s because IRV better represents the will of the Academy. Even if one of the favorites does win, it will be a much more legitimate win than in any previous year.

And here’s the funny thing. That crazy plurality single vote system I just described… that’s how we vote for President in the United States.

Wait a minute…

Did I just imply that IRV is awesome? I should be more careful. Everything I just explained assumes that voters are well informed and rational. I’m willing to believe that voters are mostly rational, but I don’t think they’re well informed. Specifically, a voter might easily believe that voting first for District 9, then for Avatar, yields a “weaker” vote for Avatar if District 9 is knocked out of the running. Or they might think that voting only for District 9 yields a stronger vote than adding a second or third choice because, in some sense, District 9 is then the only acceptable winner for those single-movie voters. In other words, I suspect voters will still vote strategically with IRV, only this time with an incorrect, ill-informed strategy. This is speculation: I don’t have hard numbers to back it up, only (significant) anecdotal experience with voters who find IRV deeply confusing.

What we really want is a voting system that assumes realistic behavior from voters who are typically not fully informed experts. In a way, we need to reduce flexibility for voters so that the average voter will be less likely to choose an ill-informed strategy. The best candidate is probably approval voting, where a voter marks every candidate they find acceptable. No ranking, just a checkmark next to each candidate. The instructions are then very straightforward: mark every candidate you would be happy to see win. It’s not perfect in terms of resistance to ill-informed strategy, but it’s a heck of a lot better than all the misconceptions that come with IRV.
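Counting approval ballots is about as simple as vote counting gets. Each checkmark is one vote, and the most-approved candidate wins. The sketch below is a minimal illustration and not any official procedure. It uses hypothetical ballots in which each dark-horse group also approves Hurt Locker, and ties are broken arbitrarily.

```python
from collections import Counter

# Hypothetical approval ballots: each is the set of movies that voter would
# be happy to see win. Assumption: dark-horse fans also approve Hurt Locker.
BALLOTS = (
    [{"Avatar"}] * 30
    + [{"Hurt Locker"}] * 25
    + [{"Inglourious Basterds", "Hurt Locker"}] * 20
    + [{"Up in the Air", "Hurt Locker"}] * 15
    + [{"District 9", "Hurt Locker"}] * 10
)


def approval_winner(ballots):
    """Every approved candidate on a ballot gets one vote; most votes wins.
    Returns the winner and the full tally."""
    tally = Counter()
    for approved in ballots:
        tally.update(set(approved))  # set(): a duplicate checkmark counts once
    return tally.most_common(1)[0][0], tally


winner, tally = approval_winner(BALLOTS)
print(winner, tally["Hurt Locker"], tally["Avatar"])  # Hurt Locker 70 30
```

No ballot has to guess at an order of elimination. The dark-horse fans back their favorite and the compromise at the same time.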

Oscar voting is actually even weirder

Of course, as if the insanity of the Oscars’ voting system over the last few years weren’t enough, there’s more weirdness.

To select the nominees, the Oscars effectively run a multi-seat Single Transferable Vote (STV), which is like IRV in that you rank the options, except that this time you’re filling multiple spots. This is the way Cambridge, Massachusetts elects its City Council, and it’s the way Australia elects its Senate, and it’s incredibly confusing because of how votes are redistributed when a candidate is knocked out of the running or, more importantly, how extra votes are redistributed from a candidate who has already passed the victory bar. How confusing? Well, in Cambridge, the result of the election may depend on the order in which you count the ballots. Yep, you read that right: in a close election, the order of the ballots matters.
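Here is how ballot order can matter. Suppose ballots are dealt out one at a time, and once a candidate reaches the quota, any later ballot naming them skips to its next choice. Which ballots form the “surplus” then depends on the order they were counted in. The Python sketch below is a deliberately simplified, hypothetical count built on that rule. It assumes a Droop quota and leaves out most details of Cambridge’s real procedure, such as tie-breaking. The nine ballots are made up.

```python
from collections import defaultdict


def droop_quota(num_ballots, seats):
    """Smallest whole number of votes that at most `seats` candidates can reach."""
    return num_ballots // (seats + 1) + 1


def order_dependent_stv(ballots, seats):
    """Simplified STV count (hypothetical helper, not Cambridge's full rules).

    Ballots are dealt out in the order given. Once a candidate reaches the
    quota, later ballots naming them skip to their next preference.
    """
    quota = droop_quota(len(ballots), seats)
    piles = defaultdict(list)  # candidate -> ballots currently credited to them
    continuing = {c for b in ballots for c in b}
    elected = []

    def place(ballot):
        # Credit the ballot to its highest choice still in the race that has
        # not already reached the quota; otherwise the ballot is exhausted.
        for choice in ballot:
            if choice in continuing and choice not in elected:
                piles[choice].append(ballot)
                if len(piles[choice]) >= quota:
                    elected.append(choice)
                return

    for ballot in ballots:
        place(ballot)

    # Eliminate the weakest hopeful and transfer their ballots until the
    # remaining hopefuls fit the seats left.
    while len(elected) < seats:
        hopefuls = sorted(c for c in continuing if c not in elected)
        if len(hopefuls) <= seats - len(elected):
            elected.extend(hopefuls)
            break
        loser = min(hopefuls, key=lambda c: len(piles[c]))
        continuing.remove(loser)
        for ballot in piles.pop(loser, []):
            place(ballot)
    return elected[:seats]


# Nine hypothetical ballots, two seats, so the quota is 9 // 3 + 1 = 4.
ORDER_1 = [("A", "B")] * 3 + [("A", "C")] * 2 + [("B",)] * 2 + [("C",)] * 2
# The same ballots, with the two groups of A-voters counted in swapped order.
ORDER_2 = [("A", "C")] * 2 + [("A", "B")] * 3 + [("B",)] * 2 + [("C",)] * 2

print(order_dependent_stv(ORDER_1, 2))  # ['A', 'C']
print(order_dependent_stv(ORDER_2, 2))  # ['A', 'B']
```

A reaches the quota in both counts. In ORDER_1, A’s fifth, surplus ballot happens to be an A-then-C ballot, so it flows to C, and C takes the second seat. In ORDER_2, the surplus ballot is an A-then-B ballot, and B wins instead. The ballots are identical and the winners are not.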

I’m not sure how it works exactly for the Oscar nomination process, but apparently the Oscars add a second complication: a nominee must be selected as a first choice by at least one person. A movie that is everyone’s second choice but nobody’s first choice cannot be a nominee.

So, what this means is that the Oscars use a weirdly modified version of multi-seat Single Transferable Vote to select the nominees, and then single-choice plurality voting to choose among those nominees, except this year, when they’re using IRV for Best Picture.

And to top it all off, you have to fully trust PricewaterhouseCoopers, the auditors, who don’t even publish tallies, only the names of the winners.

Who ever said elections were simple?

For deniability, faking data even the owner can’t prove is fake

Filed under: crypto, privacy, voting — February 26, 2010 @ 5:29 pm

I was speaking with a colleague yesterday about Loopt, the location-based social network, the rise of location-based services and the incredible privacy challenges they present. I heard the Loopt folks give a talk a few months ago, and I was generally impressed with the measures they’re taking to protect their users’ data.

I particularly enjoyed the problem Loopt faced with respect to abusive spouses: if your spouse is spying on you, it’s not enough to turn off your location services, because then your abusive spouse will know that you’re hiding something. You have to actually be able to lie about your location; in other words, Loopt has to let you fake your location data. And they do. And that’s awesome.

It’s just like voting: to be free to vote the way you want to vote, you have to be able to claim that you voted a certain way, even if you voted another way, and that claim has to be believable. In fact, when you think about it, because Loopt offers this “fake my data” feature, there’s no way for you to prove to someone else that you really are where you claim to be, at least not via Loopt. Because, if there were a way to say “okay, really, I’m here, no faking this time,” then there would be no deniability since abusive spouses could simply ask for the extra-no-faking version of the location.

In other words, to truly achieve deniability, you have to take away the user’s ability to certify their own data. That’s not obvious, and it’s interesting that location-based services and voting have this point in common.

Taxing Human Transactions – Part 1

Filed under: data, health, policy — February 18, 2010 @ 2:53 pm

The worst part of my job is dealing with the mess of document formats and coding systems in healthcare. The acronym soup is insane: HL7, CCD, CCR, CDA, Green CDA (which I just heard about from John Halamka’s blog but… no link!), and that’s just the document formats. Then there are coding systems like LOINC, SNOMED, SNOMED-CT, UMLS, ICD9, ICD10, RxNorm, … Interestingly enough, the issue is not how many there are. The issue is how they’re licensed. Here’s a screenshot from the HL7 website that should tickle your funny bone:

So, HL7 is unlocking the power of health information, and to do that they’re going to sell you a standard.

Meanwhile, the National Library of Medicine has toiled for years on the Unified Medical Language System (UMLS), which attempts to codify *everything* in medicine, from anatomy to viruses. It’s a pretty impressive piece of work. Conveniently, they provide a “meta-thesaurus” that maps other coding systems, like SNOMED, to UMLS. Brilliant! Awesome! Except… to use UMLS, you have to register. And you have to fill out a yearly survey. And you’re not allowed to redistribute the UMLS codes. Oh, and you have to sign a 10-page licensing agreement that explains how you can use UMLS, but you can only use SNOMED under these conditions, and this other coding system only under these other conditions, and if you don’t have three lawyers and a few weeks on your hands, good luck answering this simple question: “can I use this in my open-source library and release it freely to the world?”

Imagine, for a second, if we had a similar situation without computers. Doctors would have to pay a fee to speak official medical terms when discussing your health. You would have to pay a fee to have those terms translated into plain English. Canon would have to pay a licensing fee before making fax machines able to send medical documents from one doctor to another. In short, every time a health transaction occurs using standardized language, there would be a tax.

This is insane. Folks in the health IT world are focused on much harder problems while ignoring this blatant ball-and-chain on innovation.

I submit that the quickest path to health-IT reform is the complete and unconditional freeing of these medical vocabularies and data formats. And I mean complete. No access fees, no yearly surveys, no constraints on redistribution, country of origin, or commercial versus non-commercial use. Free. Like HTTP and HTML. Like English. Like a patient-doctor conversation.

Take a precise example: my group at Children’s Hospital Boston just released Indivo X, the latest version of our Personally Controlled Health Record. It’s great, but there’s one key feature we had to strip out before shipping this free, open-source tool built using federal grant money: SNOMED codes. Sure, we’re a hospital with a license, so we can use them internally. But we can’t redistribute them. So now, to install Indivo, instead of a 30-minute process, you need to go get a UMLS ID, wait 3 days for approval, then download the files, extract the codes we think are useful, and load them into the database. No exaggeration: you’ve now multiplied your time-to-working-install by 100.

This must change. Either the existing formats must be opened up, or new formats must emerge that do to the existing formats what HTTP and HTML did to Gopher: kill them with freedom. Taxing human interactions, simply because they’ve been digitized, is an unacceptable brake on innovation, and in a complex field like Health IT, it’s the last thing we need and the first thing we need to eliminate.

Buzz Kill

Filed under: policy, privacy — February 13, 2010 @ 9:20 pm

Everyone is talking about the privacy disaster that was the Google Buzz launch, and oh my goodness it was. I’ve never been so thankful that I don’t use Gmail. I’m frankly surprised that they didn’t do a smaller beta first, or that there isn’t a group at Google charged with thinking through the privacy implications of every product release, a group that would clearly have screamed “stop!”

If you want to think about the deep issues at play here, you really want to be reading Arvind Narayanan’s blog in general, and in particular his post on this issue:

When I enabled Buzz and realized what had happened, something changed for me in my head. I’d always regarded email and chat as a private medium. But that’s not true any more; Google forced me to discard my earlier expectations. Even if Google apologizes and retracts auto-follow (not that I think that’s likely), the way I view email has permanently changed, because I can’t be sure that it won’t happen again. I lost some of the privacy expectation that I had of not only Google’s services, but of email and chat in general, albeit to a lesser extent.

What I’ve tried to do in the preceding paragraphs is show in a step-by-step manner how Google’s move changed social norms. The larger players like Google and Microsoft have been very conservative when it comes to privacy, unlike upstarts like Facebook. So why did Google enable auto-follow? By all accounts, their hand was forced: they needed a social network to compete with Facebook and Twitter. Given the head-start that their competitors have, the only real way to compete was to drag their users into participating.

This is what deeply worries me about the current Cloud: for the convenience of universal access to our data, we are giving up control in the long run. We imagine these providers, Google, Facebook, etc., to be good custodians of our data, but their strategy, their needs, may significantly affect the way they do their jobs. Sometimes this is good: users will be protected by these custodians. But often, this will be bad in ways we can hardly imagine.

I mean, think about it: would you have believed it if, two weeks ago, someone had told you that Google was about to make public the list of the top 25 people you email? Heresy! That would be Gmail suicide! And yet it happened. The backlash is strong, and the feature will probably change, but in many ways the damage is done, and Google will probably suffer a lot less than one would have expected a priori.

As a computer scientist with a penchant for security, privacy, and autonomy, I hope I’m not the only one who feels I have a professional duty to help people avoid these kinds of situations. Computer scientists who handle other people’s data need a professional code of privacy ethics, and there need to be serious consequences, legal and financial, to this type of negligence.

I was wrong about the iPad

Filed under: policy, security — January 31, 2010 @ 4:00 pm

So I made a couple of predictions about the iPad, Apple’s tablet, and I realize in retrospect that, while I got some of the details right, I got the gist completely wrong. I thought it was going to be a special-purpose device. And most commentators are saying just that. But I was wrong and they are wrong. The iPad is very much meant to be a new approach to how we use computers in general. Still think it’s just a big iPhone? Watch these few minutes of video, a summary of how you interact with the iPad to create slides and edit documents in Apple’s productivity suite:

This is different. Much more natural to use, a different experience altogether. It’s going to sell like mad, and developers will be building apps for this in no time.

The real Apple fanboys (I’m only a poser Apple fanboy) are saying almost what I’m saying: this is a new model of computing, the critics are suffering from future shock. Yes, and yes.

That said, the Apple fanboys are taking one critical step too many by accepting the hand-waving argument that this revolutionary computing model justifies the Apple-controlled App Store. Apparently, it’s like driving an automatic vs. a stick shift, or better yet it’s like the Prius, where you need special skills to maintain it. Spare me the Kool-Aid, these analogies are incredibly bad. If you really want to use that analogy, at least realize that adding your own app to a computer is more like installing a GPS on the dashboard, not tuning the engine. Would you be okay with a Prius if somehow you didn’t have the right to install Honda-made seat covers, or tires made by Michelin? Well, if the Prius were good enough, you’d grind your teeth and deal, but in what world would you argue that it’s a feature that you can only install seat covers approved by Toyota?

Yes, the iPad looks amazing, and yes, it will sell lots, and yes, it will redefine the way we interact with computers. But would we lose any of those things if Apple allowed you to add your own applications? No. The Apple death-grip is entirely orthogonal to all of those wonderful things. There could be a scary-red toggle deep down in the preferences, or a magical swipe pattern, or a software download from the Apple site with a big fat warning that says “be careful, if you enable the ‘risky install’ feature, you may be forced to reset your iPad to factory settings.” Most people would use the iPad untouched, but the ability to open it up to other stores would bring more competition and would prevent the App Store overlords from making clearly anti-competitive decisions like rejecting the Google Voice app.

So I was right about one thing: the iPad is going to move us one step closer to Zittrain’s dystopic Future of the Internet. But because the iPad is much more of a general computing device than I expected, that step is going to be a much larger step, and Zittrain’s vision is coming true much faster than I thought. And that part is incredibly sad, no matter how badly I want to edit slides using finger-swipe gestures.

Sometimes it’s not counter-intuitive

Filed under: crypto, security — December 27, 2009 @ 5:36 pm

Bruce Schneier writes that it’s reasonable for unmanned drones to broadcast unencrypted video streams, because

  1. the video stream is not that useful to enemies, and
  2. given that many people need access to the video feed, the key distribution problem would be very difficult to manage, and some allies could be severely handicapped if they happened not to have the key.

So, Bruce is typically fantastic at finding those interesting areas of security where the answer is counter-intuitive. But huh? How can both of those points be true? If the video stream is valuable to allies, then I’m guessing it’s valuable to enemies.

But let’s say that, somehow, these contradictory points are, in fact, both true.

There isn’t a key management problem here. The command-and-control signal is already encrypted and authenticated, so the video feed could be encrypted over that same channel back to the home base (which needs to receive it anyway so the pilots can, you know, pilot), where it is decrypted and can be syndicated to allies, troops on the ground, commanders, etc… I just don’t see the argument for having local troops receive the signal directly, when the one person who needs the signal most is already sitting thousands of miles away.

Bruce is right that key management is often a very complicated problem. But I just don’t see how it’s relevant in this case.

a prediction regarding the Apple “Tablet”

Filed under: autonomy, policy — December 26, 2009 @ 8:31 pm

Why a prediction? Eh, cause it’s fun and cause I think the Apple Tablet will have a large impact on consumer computing.

I think Apple will launch a tablet computer in January that will be aimed at saving TV and print journalism. On-demand video and on-demand print magazines and newspapers will be at the forefront. And because those industries want Digital Rights Management, the Tablet will run the iPhone OS so that only approved apps can be installed. It will be great, and the “App Store” concept will continue to rock the house.

In the meantime, Zittrain’s Future of the Internet will be one gigantic step closer, with consumer computing devices tightly controlled by one benevolent dictator. For most people, this will be a very good thing. For innovation, this will be a very bad thing. But it may take a while before people miss it. After all, did people miss Skype before they ever knew it was possible?

Happy 2010, and here’s to hoping we can come up with safe and generative software platforms.

Takoma Park 2009: the conclusion

Filed under: Takoma Park 2009, crypto, voting — December 23, 2009 @ 9:20 pm

Well, it’s been a few weeks of craziness at home and catching up on other work, but I’ve finally wrapped up the Takoma Park 2009 audit. The final step: letting you, dear reader, run the audit all on your own.

You’ll find the complete instructions here on the auditing site.

I haven’t tested this on Windows, just Mac OS X, and it should work on Linux/Unix, too. You need Python 2.5 or above, PyCrypto, git, and Subversion. You need about 30 minutes of download time and 1 hour of processing. Then you can check the results you’ve computed against the results I’ve computed, against the official election results (which have had some small variations since the results were certified; I’m not entirely sure why), and against the list of verification codes.

It’s a WRAP followup: maybe the goal was client-side certs?

Filed under: security, web — December 23, 2009 @ 2:48 pm

I’m having some interesting offline followup discussions with folks about oAuth WRAP and my relatively negative reaction to it. One of the comments seems to be that SSL will recreate exactly the security that HMAC signatures were trying to achieve, and it was really hard for developers to do oAuth right in the first place.

I definitely sympathize with “it’s hard to get security right,” and I certainly agree that we should begin to standardize oAuth libraries ASAP. The reference implementations are okay, but they’re not solid enough for widespread standardization, which means people are cooking up their own, which is bad news. So yes, being able to delegate the security implementation to a well tested library is a good idea.

But I don’t think server-side SSL replaces the security we got from HMAC-signed requests. The key idea of signed requests is that if I intercept a request, I can’t steal the credentials. SSL only offers a comparable guarantee if server certificates are reliably checked or if client-side certificates are used for authentication. I don’t buy the argument that oAuth WRAP client-side libraries will do proper certificate checking. And I think that client-side SSL certs, while very cool, would potentially make life more complicated for developers (and would rule out JavaScript implementations for the foreseeable future).

I’m very open to the idea of simplifying oAuth, and maybe there’s something to oAuth WRAP that I’m not seeing…. but the point is, the current oAuth WRAP security claims are, I believe, misguided in practice, and I hope the oAuth WRAP team thinks this through a bit more before all the big name web sites standardize on it, and the next favored technique for hacking your Facebook account is DNS spoofing the oAuth WRAP transaction.

It’s a WRAP

Filed under: security, web — December 22, 2009 @ 1:58 pm

I’m just finding out about oAuth WRAP, a new, simplified version of oAuth which some are calling the “valet key” approach to web data sharing: don’t give your Facebook password to a random web app, instead use oAuth to mint them a valet key that lets the app access only some specific portions of your Facebook profile. I like and use oAuth, so I was a little bit confused as to what WRAP brings to the table. I read up a bit:

The main difference between OAuth and OAuth WRAP is that WRAP does not have elaborate token exchanges or signature schemes. Instead, all server-to-server WRAP calls happen via SSL. The “access token,” which grants your client the ability to make API calls on a user’s behalf, is protected by SSL rather than by a shared secret and signature scheme.

If I understand correctly, the experience is the same for the user, the connectivity requirements between the data host and the third-party consumer remain the same, but now we’ve gotten rid of those pesky signatures and instead we’re sending tokens in an HTTP header a bit like a password.

So, like a password, it can be replayed. And intercepted via Man-In-The-Middle. Oy.
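A toy Python sketch shows why. The resource server below accepts any request that carries a known token in an `Authorization: Bearer …` header, roughly the “token in an HTTP header” model described above. `issue_token`, `handle`, and the endpoints are hypothetical illustrations, not WRAP’s actual wire format.

```python
import secrets

# Hypothetical token store on the data host: token -> user it acts for.
TOKENS = {}


def issue_token(user):
    token = secrets.token_urlsafe(32)
    TOKENS[token] = user
    return token


def handle(method, path, headers):
    """Toy resource server: any request carrying a known token is accepted.
    Nothing ties the token to this method, this path, or this moment."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return 401, None
    user = TOKENS.get(auth[len("Bearer "):])
    if user is None:
        return 401, None
    return 200, f"{method} {path} as {user}"


token = issue_token("alice")
legit = {"Authorization": "Bearer " + token}
print(handle("GET", "/photos", legit))    # (200, 'GET /photos as alice')

# Someone who sniffed that header once can reuse it verbatim, for any request,
# until the token expires or is revoked.
stolen = dict(legit)
print(handle("DELETE", "/profile", stolen))  # (200, 'DELETE /profile as alice')
```

The server can’t tell the replayed request from the original, because the header is all it has to go on.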

But wait, you say, don’t worry, the token is sent over SSL, so it’s all good.

Right. What’s going to happen when someone “forgets to turn on SSL,” which is all too common when security is abstracted away “somewhere down in the stack”? Or when developers get tired of those pesky certificate errors and simply choose not to validate the cert? That leaves the protocol wide open to network attackers, who can literally play man-in-the-middle just by spoofing DNS on a wifi network, capturing the token, and replaying it to access all sorts of additional resources, effectively stealing the user’s credentials.
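Turning validation off takes very little code. In Python’s standard `ssl` module, for example, it is two attribute assignments. The sketch below only builds the two kinds of context and makes no network connection. The function names are illustrative.

```python
import ssl


def validating_context():
    # Default context: CERT_REQUIRED plus hostname checking against the
    # system's trusted CA store.
    return ssl.create_default_context()


def non_validating_context():
    # The "make the certificate errors go away" anti-pattern. check_hostname
    # must be switched off before verify_mode can be set to CERT_NONE.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


for name, ctx in [("validating", validating_context()),
                  ("non-validating", non_validating_context())]:
    print(name, ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
# validating True True
# non-validating False False
```

A backend could pass either context to `urllib.request.urlopen(url, context=...)`, and nothing the end user sees reveals which one it picked.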

This might actually be worse than passwords. At least you can work to educate users about SSL (and after their Facebook account gets hacked, they might actually care), but it’s very hard for users to gauge whether web applications are doing the right thing with SSL certs when all the SSL calls are made by a backend that has no good way to surface certificate errors to them.

I understand. Security is hard. Getting those timestamps and nonces right, making sure you’ve got the right HMAC algorithm… it’s non-trivial, and it slows down development. But those things are there for a reason. The timestamp and nonce prevent replay attacks. The signature prevents repurposing the request for something else entirely. That we would introduce a token-as-password web security protocol in 2010 is somewhat mind-boggling.
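For comparison, here is a rough Python sketch of the machinery being thrown away. It implements an HMAC-SHA1 request signature in the style of oAuth 1.0 (later published as RFC 5849), plus the timestamp window and nonce cache a server checks. It is simplified and assumes the URL is already normalized with no query string and that parameter names are unique. The skew window, the in-memory nonce set, and the endpoint URLs are illustrative choices, not values from the spec.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote

MAX_SKEW_SECONDS = 300  # assumed clock-drift tolerance
SEEN_NONCES = set()     # in-memory for the sketch; a real server expires entries


def _enc(value):
    """Percent-encode everything except RFC 3986 unreserved characters."""
    return quote(str(value), safe="~")


def signature_base_string(method, base_url, params):
    pairs = sorted((_enc(k), _enc(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), _enc(base_url), _enc(normalized)])


def sign(method, base_url, params, consumer_secret, token_secret):
    key = f"{_enc(consumer_secret)}&{_enc(token_secret)}".encode()
    base = signature_base_string(method, base_url, params).encode()
    return base64.b64encode(hmac.new(key, base, hashlib.sha1).digest()).decode()


def verify(method, base_url, params, signature, consumer_secret, token_secret,
           now=None):
    now = time.time() if now is None else now
    # Timestamp: a request captured and replayed much later is rejected.
    if abs(now - int(params["oauth_timestamp"])) > MAX_SKEW_SECONDS:
        return False
    # Nonce: a request replayed verbatim within the window is rejected.
    nonce_id = (params["oauth_consumer_key"], params["oauth_timestamp"],
                params["oauth_nonce"])
    if nonce_id in SEEN_NONCES:
        return False
    # Signature: changing the method, URL, or any parameter breaks it.
    expected = sign(method, base_url, params, consumer_secret, token_secret)
    if not hmac.compare_digest(expected, signature):
        return False
    SEEN_NONCES.add(nonce_id)
    return True


URL = "https://api.example.com/status/update"  # hypothetical endpoint
params = {
    "oauth_consumer_key": "demo-consumer",
    "oauth_token": "demo-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": str(int(time.time())),
    "oauth_nonce": secrets.token_hex(8),
    "oauth_version": "1.0",
    "status": "hello",
}
sig = sign("POST", URL, params, "consumer-secret", "token-secret")

repurposed = verify("POST", "https://api.example.com/account/delete", params,
                    sig, "consumer-secret", "token-secret")
original = verify("POST", URL, params, sig, "consumer-secret", "token-secret")
replayed = verify("POST", URL, params, sig, "consumer-secret", "token-secret")
print(repurposed, original, replayed)  # False True False
```

The secrets never travel with the request. An eavesdropper who captures it can’t aim it at a different endpoint, and sending it again verbatim trips the nonce check. Both of those attacks succeed against the bearer-token model.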

I see reasons to simplify oAuth. Maybe rethink the combination of consumer and access secrets, which is a bit messy. Maybe rethink the token renewal process and make it part of the core. But removing signatures? I think this is asking for long-term trouble in exchange for a modest amount of short-term simplicity.