Power & Accountability

So there’s this hot new app called Secret. The app is really clever: it prompts you to share secrets, and it sends those secrets to your social circle. It doesn’t identify you directly to your friends. Instead, it tells readers that this secret was written by one of their friends without identifying which one. The popularity of the app appears to be off the charts, with significant venture-capital investment in a short period of time. There are amazing stories of people seeking out emotional support on Secret, and awful stories of bullying that have caused significant uproar. Secret has recently released features aimed at curbing bullying.

My sense is that the commentary to date is missing the mark. There’s talk of the danger of anonymous speech. Even the founders of Secret talk about their app like it’s anonymous speech:

“Anonymity is a really powerful thing, and with that power comes great responsibility. Figuring out these issues is the key to our long-term success, but it’s a hard, hard problem and we are doing the best we can.”

And this is certainly true: we’ve known for a while that anonymous speech can reveal the worst in people. But that’s not what we’re dealing with here. Posts on Secret are not anonymous. Posts on Secret are guaranteed to be authored by one of your friends. That guarantee is enabled and relayed by the Secret platform. That’s a very different beast than anonymity.

In general, if you seek good behavior, Power and Accountability need to be connected: the more Power you give someone, the more you hold them Accountable. Anonymity can be dangerous because it removes Accountability. That said, anonymity also removes some Power: if you’re not signing your name to your statement, it carries less weight. With Secret, Accountability is absent, just like with anonymous speech, but the power of identified speech remains in full force. That leads to amazing positive experiences: people can share thoughts of suicide with friends who can help, all under the cloak of group-anonymity that is both protecting and empowering. And it leads to disastrous power granted to bullies attacking their victims with the full force of speaking with authority – the bully is one of their friends! – while carrying zero accountability. That kind of power is likely to produce more bullies, too.

This is so much more potent than anonymity. And if this fascinating experiment is to do more good than harm, it will need to seriously push the envelope on systems for Accountability that are on par with the power Secret grants.

Here’s a free idea, straight out of crypto land. In cryptographic protocols that combine a need for good behavior with privacy/anonymity protections, there is often a trigger where bad behavior removes the anonymity shield. What if Secret revealed the identity of those users found to be in repeated violation of a code of good behavior? Would the threat of potential shame keep people in line, leaving the good uses intact while disincentivizing the destructive ones?
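To make the escrow idea concrete (and to be clear, this is my sketch, not anything Secret has actually built): the author of each post could be encrypted under a key held by the platform, or better, split among independent trustees, and decrypted only after repeated violations.

```python
# A toy sketch of identity escrow -- NOT Secret's actual design.
# The idea: each post carries the author's identity encrypted under a key
# held by a moderation authority (ideally split among independent trustees),
# and that identity is decrypted only after repeated violations.
from cryptography.fernet import Fernet

escrow_key = Fernet.generate_key()        # held by the moderator/trustees
escrow = Fernet(escrow_key)

def publish(author_id: str, text: str) -> dict:
    """Readers see only the text; the author is sealed inside the escrow blob."""
    return {"text": text, "sealed_author": escrow.encrypt(author_id.encode())}

def unmask(post: dict) -> str:
    """Invoked only when a post is found to violate the code of conduct."""
    return escrow.decrypt(post["sealed_author"]).decode()

post = publish("user-42", "an anonymous-looking secret")
print(unmask(post))   # "user-42", revealed only on violation
```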

there are 3 kinds of crypto

When we use terminology that is too broad, too coarse-grained, we make discussion more difficult. That sounds obvious, but it’s easy to miss in practice. We’ve made this mistake in spades with crypto. Discussing the field as one broad topic is counter-productive and leads to needless bickering.

I see 3 major kinds of crypto: b2c crypto, b2b crypto, and p2p crypto. I suggest that we use this terminology consistently to help guide the discussion. We’ll spend less time talking about differences in our assumptions, and more time building better solutions.

b2c crypto

Business-to-Customer Crypto (b2c) is used to secure the relationship between an organization and a typical user. The user roughly trusts the organization, and the goal of b2c crypto is to enable that trust by keeping attackers out of that relationship. Both the organization and the user want to know that they’re talking to each other and not to an impostor. The organization is usually acting like an honest-but-curious party: they’ll mostly do what they promise. The b2c-crypto relationship is common between Internet service providers (in the broad sense, including Google, Amazon, etc.) and typical Internet users, as well as between employees and their employer’s IT department.

Web-browser SSL is a great example of b2c crypto. Users start with a computer that has at least one web browser with a set of root certs. Users can continue using that browser or download, over SSL secured by those initial root certs, another browser they trust more. Users then trust their preferred browser’s security indicators when they shop on Amazon or read their Gmail.
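For the curious, here is roughly what that trust decision looks like in code: a sketch using Python's standard ssl module, which, like a browser, relies on a set of trusted root certs and refuses connections that don't chain up to them.

```python
# A minimal sketch of the b2c trust chain: the client ships with root certs
# and refuses to talk to a server whose certificate doesn't chain up to them.
import socket, ssl

context = ssl.create_default_context()   # loads the platform's trusted root certs
with socket.create_connection(("amazon.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="amazon.com") as tls:
        # Raises ssl.SSLCertVerificationError if the chain or hostname is bad.
        print(tls.getpeercert()["subject"])
```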

A critical feature of b2c crypto is that users don’t ever manage crypto keys. At best they manage a password, and even then they’re generally able to reset it. Users make trust decisions based on brands and hopefully clear security indicators: I want a Mac, I want to use Firefox, and I want to shop on Amazon but only when I see the green lock icon.

b2b crypto

Business-to-Business (b2b) crypto is used to secure the relationship between organizations, two or more at a time. There are two defining characteristics of b2b crypto:

  • all participants are expected to manage crypto keys
  • end-users are generally not involved or burdened

DKIM is a good example of b2b crypto. Organizations sign their outgoing emails and verify signatures on incoming emails. Spam and phishing are reduced, and end-users see only the positive result without being involved in the process. Organizations must maintain secret cryptographic keys for signing those emails and know how to publish their public keys (usually in DNS) to inform other organizations.
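Roughly, and glossing over DKIM's canonicalization and header-selection details, the pattern looks like this (a sketch of the idea, not a working DKIM implementation):

```python
# A rough sketch of the DKIM pattern (glossing over header canonicalization):
# the sending org signs outgoing mail with a key only it holds, and publishes
# the matching public key in DNS so receiving orgs can verify.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes, serialization

signing_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

message = b"From: billing@example.com\r\nSubject: Your invoice\r\n\r\nHello!"
signature = signing_key.sign(message, padding.PKCS1v15(), hashes.SHA256())

# The public half goes into a DNS TXT record, conventionally at
# <selector>._domainkey.example.com, so any receiving org can look it up.
public_pem = signing_key.public_key().public_bytes(
    serialization.Encoding.PEM, serialization.PublicFormat.SubjectPublicKeyInfo)

# A receiving org verifies with the published public key; no end-user involved.
signing_key.public_key().verify(signature, message,
                                padding.PKCS1v15(), hashes.SHA256())
print("signature verifies")
```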

OAuth qualifies as b2b crypto. Consumers and Producers of Web APIs establish shared secret credentials and use them to secure API calls between organizations.
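Stripped of protocol detail, the shared-secret flavor of b2b crypto is just mutual request signing. Here's a minimal sketch in the spirit of OAuth 1.0's HMAC-signed requests (my simplification, not the actual OAuth signature base string):

```python
# Both organizations hold the same secret, provisioned out of band,
# and every API call is signed with it.
import hmac, hashlib

shared_secret = b"provisioned-out-of-band-between-the-two-orgs"

def sign_request(method: str, url: str, body: bytes) -> str:
    base = method.encode() + b"\n" + url.encode() + b"\n" + body
    return hmac.new(shared_secret, base, hashlib.sha256).hexdigest()

def verify_request(method: str, url: str, body: bytes, sig: str) -> bool:
    expected = sign_request(method, url, body)
    return hmac.compare_digest(expected, sig)   # constant-time comparison

sig = sign_request("POST", "https://api.example.com/v1/orders", b'{"sku": 7}')
assert verify_request("POST", "https://api.example.com/v1/orders", b'{"sku": 7}', sig)
```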

Another good example is SSL certificate issuance. Both the web site seeking a certificate and the Certificate Authority have to generate and secure secret keys. The complexity of the certification process is mostly hidden from end-users.

p2p crypto

Peer-to-Peer (p2p) crypto is used to secure communication between two or more crypto-savvy individuals. The defining characteristic of p2p crypto is that the crypto-savvy individuals trust no one by default. They tend to run code locally, manage crypto keys, and assume all intermediaries are active attackers.

PGP is a great example of p2p crypto. Everyone generates a keypair, and by default no one trusts anyone else. Emails are encrypted and signed, and if you lose your secret key, you’re out of luck.
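For flavor, here is the same trust model sketched with PyNaCl rather than actual OpenPGP; the point is who holds keys and who is trusted (no one), not the specific message format:

```python
# Not PGP itself, but the same p2p trust model, sketched with PyNaCl:
# every participant generates a keypair, nothing is escrowed, and losing
# your private key means losing access for good.
from nacl.public import PrivateKey, Box

alice_sk = PrivateKey.generate()
bob_sk = PrivateKey.generate()

# Alice encrypts to Bob's public key, authenticated with her own private key.
to_bob = Box(alice_sk, bob_sk.public_key)
ciphertext = to_bob.encrypt(b"meet at the usual place")

# Only Bob (holding bob_sk) can open it; no server ever sees a key.
from_alice = Box(bob_sk, alice_sk.public_key)
print(from_alice.decrypt(ciphertext))
```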

so how does this help?

This naming scheme provides a clear shorthand for delineating crypto solutions. Is your wonderful crypto solution targeted at the general public? Then it’s probably a combination of b2c crypto for users and b2b crypto for organizations that support them. Are you building a specialized communications platform for journalists in war zones? Then it’s probably p2p crypto.

The implementation techniques we use for various kinds of crypto differ. So when some folks write that Javascript Crypto is considered harmful, I can easily respond “yes, dynamically-loaded Javascript is a poor approach for p2p crypto, but it’s great for b2c crypto.” In fact, when you look closely at a similar criticism of Javascript crypto from Tony Arcieri, you see this same differentiation, only with much more verbiage because we don’t have clear terminology:

Before I keep talking about where in-browser cryptography is inappropriate, let me talk about where I think it might work: I think it has great potential uses for encrypting messages sent between a user and the web site they are accessing. For example, my former employer LivingSocial used in-browser crypto to encrypt credit card numbers in-browser with their payment processor’s public key before sending them over the wire (via an HTTPS connection which effectively double-encrypted them). This provided end-to-end encryption between a user’s browser and the LivingSocial’s upstream payment gateway, even after HTTPS has been terminated by LivingSocial (i.e. all cardholder data seen by LivingSocial was encrypted).

This is a very good thing. It’s the kind of defense that can prevent the likes of the attack against Target’s 40M customers last month. And that’s exactly the point of b2c crypto.

most users can’t manage crypto keys

I use the term p2p crypto because I like to think of it as “Pro-to-Pro.” Expecting typical Internet users to engage in p2p crypto is, in my opinion, a pipe dream: typical users can’t manage secret crypto keys, so they must rely on organizations to do that for them. That’s why successful general-public crypto is a combination of b2c crypto between individuals and the organizations they choose to trust, and b2b crypto across organizations. More expertise and care are expected of the organizations, little is expected of individual users, and some trust is assumed between a user and the organizations they choose.

You don’t have to agree with me on this point to agree with the nomenclature. If you’re interested in protocols where individuals manage their own secret keys and don’t trust intermediaries, you’re interested in p2p crypto. I happen to think that p2p crypto is applicable only to some users and some specific situations.

it’s the randomness, stupid

The New York Times is reporting that a flaw has been found in RSA. The original paper is here, and it looks like a second team was about to release similar information, so they’ve posted an explanatory blog post, which I recommend. A number of people are understandably concerned.

Since I couldn’t find a simple explanation of what happened, I figured I would write one up.

public-key encryption

Public-key encryption is fascinating. You generate a keypair composed of a public and a private key. You post the public key on your web site, and anyone can use it to encrypt data destined for you. Decryption requires the private key, and you keep it to yourself. Anyone can encrypt; only you can decrypt. This is how your web browser secures the communication channel with your bank: the bank publishes its public key, and your browser encrypts data against that public key before sending it to the bank. An eavesdropper knows the bank’s public key, but because they don’t have the private key, they can’t see the data you’re sending.

how do I get me one of them keypairs?

Cryptographic keys are just numbers with special properties. In a public-key encryption system, you typically pick one of these special big numbers randomly, you make that your private key, and from it you compute the public key. It’s easy to compute the public key from the private key. However, given the public key, it’s really, really hard to recover the private key. We don’t know how to do it without spending millions of years of computation for your average public key. That’s right, a single user’s public key, attacked with vast computational power, will not yield its corresponding private key. But if you have the private key, you can get the public key in a few milliseconds. That’s the magic, except it’s not magic, it’s math.

what do you mean by “randomly”?

If you barricade your front door, a thief will probably come in via the window. And so it is with public-key encryption. Attacking a public key directly in order to somehow extract its private counterpart is really, really hard. But maybe it’s not so hard to guess how that private key was selected in the first place. Remember, you have to pick the private key randomly. And, as it turns out, computers are really bad at picking numbers at random (humans are only marginally better). So, if you’re not careful about how you pick that private key, an attacker might simply reconstruct how you picked it.

lots of people picking lots of keys

Cryptography is everywhere now, so there are millions of public keys made available on the Web. Just go to https://amazon.com, and Amazon will tell you its public key. If a bunch of folks use a not-so-random way to pick their private key, you might expect funny coincidences to happen. Alice in San Francisco and Bob in New York might independently end up with the same private key simply because they both used a similar process for selecting this private key from all possible values. If that happens, they would also have the same public key, and you would be able to easily discover this: just compare their public keys! The researchers found that this happens every now and then: they found a couple dozen public keys that were identical to at least one other public key. In and of itself, that’s kind of fascinating. But it’s not really shocking, right? Clearly, if people have the exact same public key, then they picked their private key poorly.

the funny thing about RSA

The funny thing about RSA, the most common approach to public-key encryption, is that its private key is composed of two numbers, both prime (which means they are divisible only by themselves and 1, for example: 11, 17, 41,…). The public key is then the product of those two primes. As it turns out, it’s really easy for computers to multiply numbers, even really big ones. But if you’re only given the product of two primes, it’s incredibly hard to recover those two factors. For example, take two primes each 200 digits long, multiply them together to get a 400-digit long number, and give that to a friend. Given all the computing power in the world for many lifetimes, your friend will not be able to recover those two prime numbers you initially picked.

Now, there are lots and lots of prime numbers. So many, in fact, that if you and I randomly select a 200-digit prime number, there is no conceivable chance we’d pick the same one. But, what if we don’t do it randomly? What if we both start out with 1 followed by 199 0s, and work our way up until we find the first prime number? Then of course we’d end up with the same one. Now maybe we’re not so stupid, and we have a clever way of picking a much more complex starting point, and then working our way up to find the next prime. Well, let’s hope we don’t both use the same clever method, because no matter how clever it is, if we both use the same method, we’re going to end up with the same prime.

So, back to the funny thing about RSA: because the private key is made up of two prime numbers, if people don’t choose those prime numbers randomly, then two different people might end up with one prime number in common, but not the other. So their public keys won’t be exactly the same: one will be p1 x p2, and the other will be p1 x p3. So it won’t be immediately obvious that we used poor randomness.

And now the final piece of fun math. It’s really hard to factor numbers, and it’s really easy to multiply them. Another thing that’s really easy is to find common factors between two numbers. So if I have two RSA public keys that share a prime factor, it’s really easy to determine that common prime factor. And then, with that prime factor, it’s easy to discover the other prime factor in each of the two keys. So, one RSA public key is very hard to break, but two RSA public keys that share a prime factor are trivial to break together.
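Here's a toy version of that attack, with comically small primes so it runs instantly (real RSA primes are hundreds of digits long):

```python
# A toy demonstration of the shared-factor attack.
from math import gcd

p_shared, q1, q2 = 1009, 2003, 3001      # pretend these came from a bad RNG

key_a = p_shared * q1                    # Alice's public modulus
key_b = p_shared * q2                    # Bob's public modulus

common = gcd(key_a, key_b)               # cheap to compute, even for huge numbers
print(common)                            # 1009 -- the shared prime
print(key_a // common, key_b // common)  # 2003 and 3001: both keys fully factored
```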

So that’s what the researchers did. They looked at every pair of RSA public keys and found that 0.2% of them share a prime factor with another. Given that, they were able to fully factor those 0.2% of keys, and thus completely break their security.

This really shouldn’t be that much more shocking than the case where users have the exact same public key. It’s just that, with RSA, there is another way in which poor randomness could result in weak keys, without those keys being exactly identical. It’s fascinating, and it’s a great study, but the root cause is no different: it’s all about the randomness.

so other approaches are better?

No. This attack has nothing to do with RSA. It has everything to do with randomness. No matter the algorithm you pick for public-key encryption, you have to find a really good source of randomness to pick your private key. The cute thing here is that weak randomness was revealed in a new surprising way, because RSA public keys can share a prime factor without being immediately obviously identical. That’s cool, but it’s not a weakness of RSA.

how do I fix my code?

Make sure you’re using a secure random number generator to generate your keys. Make sure you’ve seeded it with good randomness, using operating-system calls if possible. And mostly, don’t panic. There’s no new attack here, only a very interesting revelation, using a very interesting trick, that a lot of people don’t pay sufficient attention to randomness when generating crypto keys.
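In Python terms, a minimal sketch of “let the OS and a vetted library handle the randomness” might look like this:

```python
import os
from cryptography.hazmat.primitives.asymmetric import rsa

# For raw random bytes, go straight to the operating system's CSPRNG:
good_randomness = os.urandom(32)     # fine for keys, tokens, salts
# (never random.random() or time-based seeds for anything secret)

# Better yet, let a well-reviewed library do the whole thing; the cryptography
# package's RSA key generation draws from the OS CSPRNG internally.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
print(private_key.public_key().public_numbers().n.bit_length())   # ~2048
```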

We knew that. Now we really know it.

encryption is (mostly) not magic

A few months ago, Sony’s Playstation Network got hacked. Millions of accounts were breached, leaking physical addresses and passwords. Sony admitted that their data was “not encrypted.”

Around the same time, researchers discovered that Dropbox stores user files “unencrypted.” Dozens (hundreds?) of users closed their accounts in protest. They’re my confidential files, they cried, why couldn’t you at least encrypt them?

Many, including some quite tech-savvy folks, were quick to indicate that it would have been so easy to encrypt the data. Not encrypting the data proved Sony and Dropbox’s incompetence, they said.

In my opinion, it’s not quite that simple.

Encryption is easy, it’s true. You can download code that implements military-grade encryption in any programming language in a matter of seconds. So why can’t companies just encrypt the data they host and protect us from hackers?

The core problem is that, to be consumable by human users, data has to be decrypted. So the decryption key has to live somewhere between the data-store and the user’s eyeballs. For security purposes, you’d like the decryption key to be very far from the data-store and very close to the user’s eyeballs. Heck, you’d like the decryption key to be *inside* the user’s brain. That’s not (yet) possible. And, in fact, in most cases, it isn’t even practical to have the key all that far from the data-store.

encryption relocates the problem

Sony needs to be able to charge your credit card, which requires your billing address. They probably need to do that whether or not you’re online, since you’re not likely to appreciate being involved in your monthly renewal, each and every month. So, even if they encrypt your credit card number and address, they also need to store the decryption key somewhere on their servers. And since they probably want to serve you an “update your account” page with address pre-filled, that decryption key has to be available to decrypt the data as soon as you click “update my account.” So, if Sony’s web servers need to be able to decrypt your data, and hackers break into Sony’s servers, there’s only so much protection encryption provides.

Meanwhile, Dropbox wants to give you access to your files everywhere. Maybe they could keep your files encrypted on their servers, with encryption keys stored only on your desktop machine? Yes… until you want to access your files over the Web using a friend’s computer. And what if you want to share a file with a friend while they’re not online? Somehow you have to send them the decryption key. Dropbox must now ask its users to manage the sharing of these decryption keys (good luck explaining that to them), or must hold on to the decryption key and manage who gets the key…. which means storing the decryption keys on their servers. If you walk down the usability path far enough – in fact not all that far – it becomes clear that Dropbox probably needs to store the decryption key not too far from the encrypted files themselves. Encryption can’t protect you once you actually mean to decrypt the data.

The features users need often dictate where the decryption key is stored. The more useful the product, the closer the decryption key has to be to the encrypted data. Don’t think of encryption as a magic shield that miraculously distinguishes between good and bad guys. Instead, think of encryption as a mechanism for shrinking the size of the secret (one small encryption key can secure gigabytes of data), thus allowing the easy relocation of the secret to another location. That’s still quite useful, but it’s not nearly as magical as many imply it to be.
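One way to see the “shrinking the secret” point is envelope encryption; here's a sketch using the cryptography package's Fernet recipe, purely for illustration:

```python
# A sketch of "shrinking the secret": gigabytes of data are protected by one
# small data key, and only that small key needs to be stored somewhere safer
# (or re-encrypted under a master key kept in a separate system).
from cryptography.fernet import Fernet

data_key = Fernet.generate_key()              # 32 bytes protects everything below
bulk_ciphertext = Fernet(data_key).encrypt(b"...gigabytes of user data...")

master_key = Fernet.generate_key()            # lives in a separate, better-guarded system
wrapped_data_key = Fernet(master_key).encrypt(data_key)

# To serve the user, the app must unwrap the data key -- which is exactly why
# the protection only goes as far as wherever the master key has to live.
recovered_key = Fernet(master_key).decrypt(wrapped_data_key)
print(Fernet(recovered_key).decrypt(bulk_ciphertext))
```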

what about Firefox Sync, Apple Time Machine, SpiderOak, Helios, etc.

But but but, you might be thinking, there are systems that store encrypted data and don’t store the decryption key. Firefox Sync. Apple’s Time Machine backup system. The SpiderOak online backup system. Heck, even my own Helios Voting System encrypts user votes in the browser with no decryption keys stored anywhere except the trustees’ own machines.

It’s true, in some very specific cases, you can build systems where the decryption key is stored only on a user’s desktop machine. Sometimes, you can even build a system where the key is stored nowhere durably; instead it is derived on the fly from the user’s password, used to encrypt/decrypt, then forgotten.
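Here's a sketch of that derive-on-the-fly pattern, using PBKDF2 from Python's standard library (illustrative parameters, not a recommendation):

```python
# A sketch of the "key derived on the fly from a password" pattern: nothing
# but the salt is stored, and forgetting the password means losing the data.
import os, hashlib, base64
from cryptography.fernet import Fernet

def key_from_password(password: str, salt: bytes) -> bytes:
    raw = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return base64.urlsafe_b64encode(raw)      # Fernet expects a base64 32-byte key

salt = os.urandom(16)                         # stored alongside the ciphertext
ciphertext = Fernet(key_from_password("correct horse battery", salt)).encrypt(b"backup blob")

# Later: re-derive the key from the password. Wrong password, no data. Ever.
print(Fernet(key_from_password("correct horse battery", salt)).decrypt(ciphertext))
```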

But all of these systems have significant usability downsides (yes, even my voting system). If you only have one machine connected to Firefox Sync, and you lose it, you cannot get your bookmarks and web history back. If you forget your Time Machine or SpiderOak password, and your main hard drive crashes, you cannot recover your data from backup. If you lose your Helios Voting decryption key, you cannot tally your election.

And when I say “you cannot get your data back,” I mean you would need a mathematical breakthrough of significant proportions to get your data back. It’s not happening. Your data is lost. Keep in mind: that’s the whole point of not storing the decryption key. It’s not a bug, it’s a feature.

and then there’s sharing

I alluded to this issue in the Dropbox description above: what happens when users want to share data with others? If the servers don’t have the decryption key, that means users have to pass the decryption key to one another. Maybe you’re thinking you can use public-key encryption, where each user has a keypair, publishes the public encryption key, and keeps secret the private decryption key? Now we’re back to “you can’t get your data back” if the user loses their private key.

And what about features like Facebook’s newsfeed, where servers process, massage, aggregate, and filter data for users before they even see it? If the server can’t decrypt the data, then how can it help you process the data before you see it?

To be clear: if your web site has social features, it’s very unlikely you can successfully push the decryption keys down to the user. You’re going to need to read the data on your servers. And if your servers need to read the data, then a hacker who breaks into the servers can read the data, too.

so the cryptographer is telling me that encryption is useless?

No, far from it. I’m only saying that encryption with end-user-controlled keys has far fewer applications than most people think. Those applications need to be well-scoped, and they have to be accompanied by big bad disclaimers about what happens when you lose your key.

That said, encryption as a means of partitioning power and access on the server-side remains a very powerful tool. If you have to store credit card numbers, it’s best if you build a subsystem whose entire role is to store credit-card numbers encrypted and to process transaction requests from other parts of your system. If your entire system is compromised, then you’re no better off than if you hadn’t taken those precautions. But, if only part of your system is compromised, encryption may well stop an attacker from gaining access to the most sensitive parts of the system.

You can take this encryption-as-access-control idea very far. An MIT team just published CryptDB, a modified relational database that uses interesting encryption techniques to strongly enforce access control. Note that, if you have the password to log into the database, this encryption isn’t going to hide the data from you: the decryption key is on the server. Still, it’s a very good defense-in-depth approach.

what about this fully homomorphic encryption thing?

OK, so I lied a little bit when I talked about pre-processing data. There is a kind of encryption, called homomorphic encryption, that lets you perform operations on data while it remains encrypted. The last few years have seen epic progress in this field, and it’s quite exciting…. for a cryptographer. These techniques remain extremely impractical for most use cases today, with an overhead factor in the trillions, both for storage and computation time. And, even when they do become more practical, the central decryption key problem remains: forcing users to manage decryption keys is, for the most part, a usability nightmare.

That said, I must admit: homomorphic encryption is actually almost like magic.

the special case of passwords

Passwords are special because, once stored, you never need to read them back out, you only need to check if a password typed by a user matches the one stored on the server. That’s very different than a credit-card number, which does need to be read after it’s stored so the card can be charged every month. So for passwords, we have special techniques. It’s not encryption, because encryption is reversible, and the whole point is that we’d like the system to strongly disallow extraction of user passwords from the data-store. The special tool is a one-way function, such as bcrypt. Take the password, process it using the one-way function, and store only the output. The one-way function is built to be difficult to reverse: you have to try a password to see if it matches. That’s pretty cool stuff, but really it only applies to passwords.
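For the concrete-minded, here's a minimal sketch using the bcrypt package:

```python
# Store only the one-way output, then check candidate passwords against it.
# The plaintext password is never stored anywhere.
import bcrypt

stored = bcrypt.hashpw(b"hunter2", bcrypt.gensalt())   # this goes in the database

print(bcrypt.checkpw(b"hunter2", stored))   # True
print(bcrypt.checkpw(b"hunter3", stored))   # False; no way to read the password back out
```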

So, if you’re storing passwords, you should absolutely be passing them through a one-way function. You could say you’re “hashing” them, that’s close enough. In fact you probably want to say you’re salting and hashing them. But whatever you do, you’re not “encrypting” your passwords. That’s just silly.

encryption is not a magic bullet

For the most part, encryption isn’t magic. Encryption lets you manage secrets more securely, but if users are involved in the key management, that almost certainly comes at the expense of usability and features. Web services should strongly consider encryption where possible to more strictly manage their internal access controls. But think carefully before embarking on a design that forces users to manage their keys. In many cases, users simply don’t understand that losing the key means losing the data. As my colleague Umesh Shankar says, if you design a car lock so secure that locking yourself out means crushing the car and buying a new one, you’re probably doing it wrong.

Wombat Voting: Open Audit Elections in Israel

My friend Alon Rosen is leading an effort with colleagues Amnon Ta-Shma, Ben Riva, and Yoni Ben-Nun in Israel to implement and deploy in-person open-audit voting. The project is called Wombat Voting. It combines a number of existing cryptographic techniques in a very nice package. Oh, and they’ve implemented it and used it to run a 2000+ voter election, with apparently a few more elections in the pipeline. There’s a ton of press about them.

Here’s how it works:

Voters use an intuitive, touch-screen interface, receive a paper ballot they can physically cast in a transparent ballot box, and take home a physical encrypted receipt they can use to make sure their vote actually counted. It’s awesome.

I’m extremely excited to see more truly verifiable voting systems implemented and deployed. Slowly but surely, we will get to a point where voting is truly auditable and democracy is actually verified. Israel, a high-tech democracy with engaged citizens, is a perfect place to get this kind of system going.

grab the pitchforks!… again

I’m fascinated with how quickly people have reached for the pitchforks recently when the slightest whiff of a privacy/security violation occurs.

Last week, a few interesting security tidbits came to light regarding Dropbox, the increasingly popular cloud-based file storage and synchronization service. There’s some interesting discussion of de-duplication techniques which might lead to oracle attacks, etc., but the most important issue is that, suddenly, everyone’s realizing that Dropbox could, if needed, access your files. Miguel de Icaza wonders if Dropbox is pitching snake oil.

Yes, Dropbox staff can, if needed, access your files. I don’t mean to harp on my fellow technologists but… this has been obvious since day 1, because Dropbox offers a web-based interface to download your files, and even with the latest HTML5 technology, you’d be very hard-pressed to do in-browser file decryption. Let’s say you still don’t buy that, and you still think that Dropbox might find a way to encrypt files and decrypt them in your browser. Dropbox also offers a password recovery mechanism, which means they can fully simulate you, the user, including, of course, getting at your files.

In other words, unless you’re ready to lose the convenience of password resets and web-based UI, Dropbox inherently has access to your files. Just like Facebook has access to your entire account, and Google to all of your docs, spreadsheets, etc. The only question is what kinds of internal safeguards these companies have to prevent abuse by employees. Unless you’ve worked there, it’s hard to know. You could ask Dropbox to do third-party auditing, like Miguel proposes, but in my experience that provides little real security, since you have little way to know what that third party actually did as part of their auditing (was it just “logic and accuracy” testing?)

The other thing we could ask is for the law to finally recognize that my files stored on Dropbox are no different than my files stored on a hard drive in my basement, from a legal perspective. They’re my property, and accessing them should require the same level of judicial oversight as a warrant to search my home. That’s what a group of young MIT techies (myself included) and Harvard lawyers proposed in 1998.

But back to Dropbox. Did they do something wrong? Yes, they did. They exaggerated their security and privacy claims. Just like almost every other cloud data host today. I wish, instead of picking on whichever startup suddenly succeeds, we picked on the industry as a whole. Stop talking about encryption in transit and encryption at rest in the same breath, as if they were the same thing. Stop using “encryption” as a synonym for “secure.” Stop saying “military-grade security.” Start being honest about who can access what.

And we, technologists, should stop with the drama, and not fall prey to the inflated expectations that marketing-heavy security policies have set. The Dropbox weaknesses should have been obvious to technologists from day one. The problem is that all privacy policies and security statements make exaggerated claims using reassuring keywords. Let’s harp on that.

intelligently designing trust

For the past week, every security expert’s been talking about Comodo-Gate. I find it fascinating: Comodo-Gate goes to the core of how we handle trust and how web architecture evolves. And in the end, this crisis provides a rare opportunity.

warning signs

Last year, Chris Soghoian and Sid Stamm published a paper, Certified Lies [PDF], which identified the very issue that is at the center of this week’s crisis. Matt Blaze provided, as usual, a fantastic explanation:

A decade ago, I observed that commercial certificate authorities protect you from anyone from whom they are unwilling to take money. That turns out to be wrong; they don’t even do that much.

A Certificate Authority is a company that your web browser trusts to tell it who is who on the Internet. When you go to https://facebook.com, a Certificate Authority is vouching that, yes, this is indeed Facebook you’re talking to directly over a secure channel.

What Chris and Sid highlighted is an interesting detail of how web browsers have chosen to handle trust: any Certificate Authority can certify any web site. That design decision was reasonable in 1994, when there were only two Certificate Authorities and the world was in a rush to secure web transactions. But it’s not so great now, where a Certificate Authority in Italy can delegate its authority to a small reseller, who can then, in turn, certify any web site, including Facebook and Gmail, using more or less the level of assurance the small reseller sees fit.

what happened

It looks like someone from Iran hacked into one of the small resellers, three degrees of delegation away from Comodo, and issued certificates for major web sites, including Google and Microsoft, to some unknown entity (the Iranian government?). This gave that entity the power to impersonate those web sites, even over secure connections indicated by your browser’s padlock icon. It’s important to understand that this is not Google’s or Microsoft’s fault: they couldn’t do anything about it, nor could they detect this kind of attack. When Comodo discovered the situation, they revoked those certificates… but that didn’t do much good, because the revocation protocol does not fail safely: if your web browser can’t contact the revocation server, it assumes the certificate is valid.
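In caricature, the soft-fail logic looks like this (the revocation-lookup helper below is hypothetical; the fail-open behavior is the point):

```python
# The revocation problem, in caricature: the check "fails open." A network
# error -- which an attacker in the middle can easily cause -- is treated as
# "the certificate is fine."
def query_revocation_server(cert) -> str:
    # Hypothetical stand-in for an OCSP/CRL lookup; imagine it raises
    # ConnectionError whenever the revocation server is unreachable.
    raise ConnectionError("revocation server unreachable")

def certificate_is_acceptable(cert) -> bool:
    try:
        status = query_revocation_server(cert)
    except ConnectionError:
        return True        # soft-fail: can't reach the server? assume it's valid
    return status != "revoked"

print(certificate_is_acceptable(cert="a just-revoked certificate"))   # True -- that's the problem
```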

a detour via Dawkins, Evolution, and the Giraffe

Richard Dawkins, the world-famous evolutionary biologist, illustrates the truly contrived effects of evolution with the giraffe. The recurrent laryngeal nerve, which runs from the brain to the larynx, takes a detour around the heart. In the giraffe, it’s a ludicrous detour: down the animal’s enormous neck, around the heart, and back up the neck again to the larynx, right near where the nerve started to begin with!

If you haven’t seen this before, you really need to spend the 4 minutes to watch it.

In Dawkins’s words:

Over millions of generations, this nerve gradually lengthened, each small step simpler than a major rewiring to a more direct route.

and we’re back

This evolution is, in my opinion, exactly what happened with certificate authorities. At first, with only two certificate authorities, it made sense to keep certificate issuance as simple as possible. With each added certificate authority, it still made no sense to revamp the whole certification process; it made more sense each time to just add a certificate authority to the list. And now we have a giraffe-scale oddity: hundreds of certificate authorities and all of their delegates can certify anyone, and it makes for a very weak system.

This isn’t, in my mind, a failure of software design. It’s just the natural course of evolution, be it biology or software systems. We can and should try to predict how certain designs will evolve, so that we can steer clear of obvious problems. But it’s very unlikely we can predict even a reasonable fraction of these odd evolutions.

the opportunity

So now that we’ve had a crisis, we have an opportunity to do something that Nature simply cannot do: we can explore radically redesigned mechanisms. We can intelligently design trust. But let’s not be surprised, in 15 years, when the wonderful design we outline today has evolved once again into something barely viable.

taking further example from nature?

Nature deals with this problem of evolutionary dead-ends in an interesting way: there isn’t just one type of animal. There are thousands. All different, all evolving under slightly different selection pressures, all interacting with one another. Some go extinct, others take over.

Should we apply this approach to software system design? I think so. Having a rich ecosystem of different components is better. We shouldn’t all use the same web browser. We shouldn’t all use the same trust model. We should allow for niches of feature evolution in this grand ecosystem we call the Web, because we simply don’t know how the ecosystem will evolve. How do we design software systems and standards that way? Now that’s an interesting question…

everything I know about voting I learned from American Idol

Tonight, American Idol began online voting. Yes, I’m a fan of American Idol, but don’t let that fool you: I’m still a bitchin’ cryptographer. I suspect that American Idol online voting will give rise to many questions such as “wow, awesome, now when can I vote in US Elections with my Facebook account?” and “Why is online voting so hard anyways?” Perhaps I can be of assistance.

the voting process

So the process is much like other Facebook-connected sites: using Facebook Connect, you log in and grant the American Idol Voting site some permissions, including reading your profile info (ok), getting your email address (ok I guess), and accessing your Facebook data even if you’re offline (ummm, why?). Then you select your favorite contestant, solve a CAPTCHA, and click “vote”. You’re prompted to post the vote to your Facebook feed, and told you can vote up to 50 times.

My first question was “what’s the CAPTCHA defending against?” I have some thoughts on that, which I’ll get back to…

“a secure solution”

The news that American Idol would use online voting was reported with enthusiasm:

“We have been wanting to do online voting for several years, and now Facebook has offered us a secure solution and we are ready to go,” said Simon Fuller, Creator and Executive Producer, American Idol.

So what does that mean, exactly? What guarantees do American Idol producers have that the system is “secure?” Hard to say. But let’s explore a few possibilities.

ballot secrecy and coercion

American Idol voting is not secret: your vote is posted to your Facebook newsfeed! Of course, unless you’re a contestant’s mother, chances are no one’s going to be upset at you if you don’t vote “the right way.” In political elections, and in fact in many elections where the outcome impacts voters in a material way, ballot secrecy is important, and undue influence of voters is a concern. That’s what makes things particularly difficult in “real” online voting: you should receive some believable proof that your vote was counted properly, but somehow that information can’t be leaked to others who might try to influence you, waiting to see how you voted to decide whether to pay you or break your kneecaps.

one user = 50 votes?

The voting itself is happening on the American Idol site, not on Facebook, so what American Idol is getting from Facebook is mostly the identity layer: to vote, you must have a Facebook account. Between that and the CAPTCHA, it’s probably fairly difficult for an individual user to have disproportionate influence. I have a feeling that’s why they allow individual voters to vote up to 50 times and require a CAPTCHA. After all, if any user can vote 50 times, but the process is fairly time-intensive, how worthwhile is it to register more accounts so you can vote more than 50 times? If voters could legitimately vote only once, then it would be very enticing to create a few fake Facebook accounts to easily quintuple your impact. But when each account already gets 50 legal votes, merely doubling your impact means manually filling out 50 more CAPTCHAs. Eh. Not worth it, right?

In other words, I think the 50-votes-per-person limit plus the CAPTCHA act as a great equalizer: almost no one is going to bother trying to find ways to cast more votes, because the payoff isn’t worth the pain. Clever!

verifying the tally

In typical secret ballot elections, it’s quite hard to check that the tally was properly computed. After all, once the vote is submitted, via web, SMS, or phone, the tallying process is visible only to the organizers, and the voters must trust that process blindly. Now, physical in-person elections have admittedly only a little bit more auditability: you can kind of watch the ballot box and, if you’re really motivated, stick around to see the ballots counted. But in the online voting space, unless you’ve got some fancy solution, the process is totally opaque.

Except… voting for American Idol isn’t secret! So, technically, the tally could be recomputed by pulling together all of the Facebook newsfeed posts…. And that’s actually a key insight into how the fancy truly auditable voting systems work: all of the votes are published for the world to see, in a special encrypted form that doesn’t reveal individual votes but can be intelligently combined and checked against the claimed tally. That’s what systems like Helios do.

was my vote captured correctly?

If you post your vote to your Facebook newsfeed, you can verify that it was recorded correctly. But what if something hijacks your browser, waits for you to log into Facebook, casts votes on your behalf (waiting for you to fill out the CAPTCHA or outsourcing it to some CAPTCHA solving farm), and opts not to post the results to Facebook? How can the American Idol producers ever detect this? They probably can’t.

The simplest way one might hijack your browser is via a technique called clickjacking: by wrapping the voting site in an HTML frame and layering a different user interface on top of it, a malicious site could trick you into voting for a different contestant than you intend. For example, the attacker might wait for you to cast your first vote freely, find out who you like by looking at your Facebook wall, and then switch the order of the candidates (by layering new photos on top of the underlying real site) to trick you into voting for a different candidate the other 49 times. Now, to American Idol’s credit, my quick-and-dirty attempt to frame their site and implement clickjacking failed: they’ve got some basic defense against clickjacking that I’m still investigating. Nice work! But of course, attacks that hijack the user’s browser can be much more intricate, including deploying and spreading a virus that takes full control of the browser and its display. There’s absolutely nothing a web site can do to defend against that.
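For the narrower framing attack, the standard defense is simply refusing to be framed; here is a minimal Flask sketch of the header-based approach (my illustration, not American Idol’s actual code):

```python
# A minimal sketch of a common anti-clickjacking defense: tell browsers this
# page may not be rendered inside someone else's frame.
from flask import Flask

app = Flask(__name__)

@app.route("/vote")
def vote():
    return "ballot page"

@app.after_request
def forbid_framing(response):
    response.headers["X-Frame-Options"] = "DENY"
    # The more modern equivalent, via Content Security Policy:
    response.headers["Content-Security-Policy"] = "frame-ancestors 'none'"
    return response
```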

And that browser-hijacking problem is, in fact, the key issue we don’t know how to address when voting online in elections that have a high material impact. We don’t know how to make sure that your browser is really working on your behalf and hasn’t been hijacked by malware. It probably wouldn’t happen for American Idol (or would it?), but it surely would happen when voting for US President.

Facebook, the Control Revolution, and the Failure of Applied Modern Cryptography

In the late 1990s and early 2000s, it was widely assumed by most tech writers and thinkers, myself included, that the Internet was a “Control Revolution” (to use the words of Andrew Shapiro, author of a book with that very title in 1999). The Internet was going to put people in control, to enable buyers to work directly with sellers, to cut out the middle man. Why? Because the Internet makes communication and commerce vastly more efficient, obviating the need for a middle man to connect us.

Fast forward to 2011, and the world is vastly more centralized than it ever was. Almost everyone’s most intimate conversations are held by four companies. And one company knows basically everything about everyone under 25.

How did we get so giddy about the Internet that we didn’t see this coming? We missed an important detail: communication and commerce became vastly more efficient for everyone, including the would-be middle-men, the would-be mediators. The Internet enabled economies of scale never before imagined. So while it is possible to host your own email server, it’s a lot easier to use gmail. While it’s possible to host your own web page, post your updates to your blog, subscribe to your friends’ RSS feeds hosted at different blogs, it’s a heck of a lot easier to use Facebook. The Internet put the 1990s middle-men out of business, then enabled a new breed of data mediators that provide incredibly valuable services no individual user can dream of performing on their own: apply massively parallel facial recognition to billions of photos to find that one picture of you and your best friend’s grandmother, do deep graph analysis to find your long-lost friends and suggest you connect with them, learn how to filter spam messages so efficiently (thanks to training by billions of messages received on behalf of millions of users) that the spam wars are effectively over.

The Internet has been vastly more empowering to mediators than to individuals. And so we have, in fact, a Control Revolution of a very different nature: one company, namely Facebook, is effectively shaping the future of social interactions, what’s acceptable and what’s frowned upon, what’s private and what’s not.

I say this without any value judgment, purely as an observation. Facebook is making the rules, and when the rules change in Palo Alto, 550 million people follow.

The Failure of Applied Modern Cryptography

Cryptography in the 1980s was about secrecy, military codes, etc. I’m not talking about that.

Modern Cryptography is about individuals achieving a common goal without fully trusting one another. Think of a secret-bid auction. Or an election. Or two people discovering which friends they have in common without revealing the friends they don’t have in common. In all of these cases, people come together to accomplish a common result, but they cannot fully trust one another since their incentives are not perfectly aligned: I want to win the auction by bidding only one dollar more than you, Alice wants her candidate to beat yours, and Bob would like to find out which movie stars you’re friends with even though he knows none.

Modern cryptography teaches us how to accomplish these tasks without ever trusting a third party. That’s hard to imagine if you’re not steeped in the field. But that’s what modern cryptography does: take an interaction that is easily imaginable with the help of a trusted third party that deals with each individual, and replace the trusted third-party with a beautiful mathematical dance that achieves the same end-goal. No centralization of data in one big database, no trusted dealer/counter/connector, just individuals exchanging coded messages in a particular order and obtaining a trustworthy result. Cryptographers call this secure multi-party computation.
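To give just a taste of the flavor, here is the simplest building block of secure multi-party computation, additive secret sharing: three parties learn the sum of their private inputs, and no one, trusted dealer included, learns anything else (a toy sketch, not a hardened protocol):

```python
# A tiny taste of secure multi-party computation: additive secret sharing.
# Three parties learn the sum of their private inputs, and no single party
# (and no trusted third party) ever sees anyone else's input.
import secrets

MODULUS = 2**61 - 1

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split a value into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

inputs = {"alice": 5, "bob": 12, "carol": 7}          # each party's private value

# Each party shares its input; party i keeps the i-th share of every input.
all_shares = {name: share(v) for name, v in inputs.items()}
per_party_sums = [sum(all_shares[name][i] for name in inputs) % MODULUS
                  for i in range(3)]

# Publishing only these per-party sums reveals the total -- and nothing more.
print(sum(per_party_sums) % MODULUS)                  # 24
```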

Modern Cryptography would, if properly implemented, give us all the functionality of Facebook without the aggregation of everyone’s data in a single data center. And we couldn’t be further from this world if we tried! We are headed for a world of increased data centralization and increased reliance on trusted third parties. Because they’re vastly more efficient, have economies of scale that allow them to provide features we didn’t dream of just a few years ago, and of course because the economic incentives of becoming that trusted third party are staggering.

As a privacy advocate, and again without value judgment, I can’t imagine a more surprising consequence of a technology that was meant to empower the little guy. It is, in a word, shocking.

Crisis in the Java Community… could they have used a secret-ballot election?

There is a bit of a crisis in the Java community: the Apache Foundation just resigned its seat on the Java Executive Committee, as did two individual members, Doug Lea and Tim Peierls. From what I understand, the central issue appears to be that Oracle, the new Java “owner” since they acquired Sun Microsystems, is paying lip service to the Java Community while taking the language and, more importantly, its licensing, in the direction they prefer, which doesn’t appear to be very open-source friendly.

That said, I’m not a Java Community expert, so I won’t comment much more on this conflict, other than to say, wait a minute, what’s this from Tim Peierls’s resignation note?

Several of the other EC members expressed their own disappointment while voting Yes. I’m reasonably certain that the bulk of the Yes votes were due to contractual obligations rather than strongly-held principles.

Wait a minute, the Executive Committee votes by public ballot? They’re influenced by contractual obligations? That’s fascinating, and that’s hardly democratic! It means that, even where standards bodies are concerned, the secret ballot might be a very interesting tool.

There are arguments against the secret ballot in this case, of course: maybe the Executive Committee members are representative of the Java Community, and as such they should serve their constituents? Much like legislators, their votes should be public so the community can decide whether or not to reelect them? In that case, contractual obligations to vote a certain way should be strictly disallowed or required to be published along with the vote… To whom are these Executive Committee members accountable? To themselves as well-intentioned guides of the Java community? To the people who elected them? It’s difficult to have it both ways, since one requires a secret ballot, and the other a public ballot.

Maybe the right solution is to publish all comments, but keep the ballots secret? There’s always a chance that a truly hypocritical member would consistently vote differently than their publicly stated opinions, but I’m not sure that risk is worse than the problems the Java Community just faced with what appears to be anything but a democratic vote. In a tough spot like this one, it seems to me that Executive Committee members should be able to vote their conscience without fear of retribution.

(Oh, and if the Java community is looking for a secure voting system, I might have a suggestion.)