A few months ago, Sony’s Playstation Network got hacked. Millions of accounts were breached, leaking physical addresses and passwords. Sony admitted that their data was “not encrypted.”
Around the same time, researchers discovered that Dropbox stores user files “unencrypted.” Dozens (hundreds?) closed their accounts in protest. They’re my confidential files, they cried, why couldn’t you at least encrypt them?
Many, including some quite tech-savvy folks, were quick to indicate that it would have been so easy to encrypt the data. Not encrypting the data proved Sony and Dropbox’s incompetence, they said.
In my opinion, it’s not quite that simple.
Encryption is easy, it’s true. You can download code that implements military-grade encryption in any programming language in a matter of seconds. So why can’t companies just encrypt the data they host and protect us from hackers?
The core problem is that, to be consumable by human users, data has to be decrypted. So the decryption key has to live somewhere between the data-store and the user’s eyeballs. For security purposes, you’d like the decryption key to be very far from the data-store and very close to the user’s eyeballs. Heck you’d like the decryption key to be *inside* the user’s brain. That’s not (yet) possible. And, in fact, in most cases, it isn’t even practical to have the key all that far from the data-store.
encryption relocates the problem
Sony needs to be able to charge your credit card, which requires your billing address. They probably need to do that whether or not you’re online, since you’re not likely to appreciate being involved in your monthly renewal, each and every month. So, even if they encrypt your credit card number and address, they also need to store the decryption key somewhere on their servers. And since they probably want to serve you an “update your account” page with address pre-filled, that decryption key has to be available to decrypt the data as soon as you click “update my account.” So, if Sony’s web servers need to be able to decrypt your data, and hackers break into Sony’s servers, there’s only so much protection encryption provides.
Meanwhile, Dropbox wants to give you access to your files everywhere. Maybe they could keep your files encrypted on their servers, with encryption keys stored only on your desktop machine? Yes… until you want to access your files over the Web using a friend’s computer. And what if you want to share a file with a friend while they’re not online? Somehow you have to send them the decryption key. Dropbox must now ask its users to manage the sharing of these decryption keys (good luck explaining that to them), or must hold on to the decryption key and manage who gets the key…. which means storing the decryption keys on their servers. If you walk down the usability path far enough – in fact not all that far – it becomes clear that Dropbox probably needs to store the decryption key not too far from the encrypted files themselves. Encryption can’t protect you once you actually mean to decrypt the data.
The features users need often dictate where the decryption key is stored. The more useful the product, the closer the decryption key has to be to the encrypted data. Don’t think of encryption as a magic shield that miraculously distinguishes between good and bad guys. Instead, think of encryption as a mechanism for shrinking the size of the secret (one small encryption key can secure gigabytes of data), thus allowing the easy relocation of the secret to another location. That’s still quite useful, but it’s not nearly as magical as many imply it to be.
what about Firefox Sync, Apple TimeMachine, SpiderOak, Helios, etc.
But but but, you might be thinking, there are systems that store encrypted data and don’t store the decryption key. Firefox Sync. Apple’s TimeMachine backup system. The SpiderOak online backup system. Heck, even my own Helios Voting System encrypts user votes in the browser with no decryption keys stored anywhere except the trustees’ own machines.
It’s true, in some very specific cases, you can build systems where the decryption key is stored only on a user’s desktop machine. Sometimes, you can even build a system where the key is stored nowhere durably; instead it is derived on the fly from the user’s password, used to encrypt/decrypt, then forgotten.
But all of these systems have significant usability downsides (yes, even my voting system). If you only have one machine connected to Firefox Sync, and you lose it, you cannot get your bookmarks and web history back. If you forget your Time Machine or SpiderOak password, and your main hard drive crashes, you cannot recover your data from backup. If you lose your Helios Voting decryption key, you cannot tally your election.
And when I say “you cannot get your data back,” I mean you would need a mathematical breakthrough of significant proportions to get your data back. It’s not happening. Your data is lost. Keep in mind: that’s the whole point of not storing the decryption key. It’s not a bug, it’s a feature.
and then there’s sharing
I alluded to this issue in the Dropbox description above: what happens when users want to share data with others? If the servers don’t have the decryption key, that means users have to pass the decryption key to one another. Maybe you’re thinking you can use public-key encryption, where each user has a keypair, publishes the public encryption key, and keeps secret the private decryption key? Now we’re back to “you can’t get your data back” if the user loses their private key.
And what about features like Facebook’s newsfeed, where servers process, massage, aggregate, and filter data for users before they even see it? If the server can’t decrypt the data, then how can it help you process the data before you see it?
To be clear: if your web site has social features, it’s very unlikely you can successfully push the decryption keys down to the user. You’re going to need to read the data on your servers. And if your servers need to read the data, then a hacker who breaks into the servers can read the data, too.
so the cryptographer is telling me that encryption is useless?
No, far from it. I’m only saying that encryption with end-user-controlled keys has far fewer applications than most people think. Those applications need to be well-scoped, and they have to accompanied by big bad disclaimers about what happens when you lose your key.
That said, encryption as a means of partitioning power and access on the server-side remains a very powerful tool. If you have to store credit card numbers, it’s best if you build a subsystem whose entire role is to store credit-card numbers encrypted, and process transactions from other parts of your system. If your entire system is compromised, then you’re no better off than if you hadn’t taken those precautions. But, if only part of your system is compromised, encryption may well stop an attacker from gaining access to the most sensitive parts of the system.
You can take this encryption-as-access-control idea very far. An MIT team just published CryptDB, a modified relational database that uses interesting encryption techniques to strongly enforce access control. Note that, if you have the password to log into the database, this encryption isn’t going to hide the data from you: the decryption key is on the server. Still, it’s a very good defense-in-depth approach.
what about this fully homomorphic encryption thing?
OK, so I lied a little bit when I talked about pre-processing data. There is a kind of encryption, called homomorphic encryption, that lets you perform operations on data while it remains encrypted. The last few years have seen epic progress in this field, and it’s quite exciting…. for a cryptographer. These techniques remain extremely impractical for most use cases today, with an overhead factor in the trillions, both for storage and computation time. And, even when they do become more practical, the central decryption key problem remains: forcing users to manage decryption keys is, for the most part, a usability nightmare.
That said, I must admit: homomorphic encryption is actually almost like magic.
the special case of passwords
Passwords are special because, once stored, you never need to read them back out, you only need to check if a password typed by a user matches the one stored on the server. That’s very different than a credit-card number, which does need to be read after it’s stored so the card can be charged every month. So for passwords, we have special techniques. It’s not encryption, because encryption is reversible, and the whole point is that we’d like the system to strongly disallow extraction of user passwords from the data-store. The special tool is a one-way function, such as bcrypt. Take the password, process it using the one-way function, and store only the output. The one-way function is built to be difficult to reverse: you have to try a password to see if it matches. That’s pretty cool stuff, but really it only applies to passwords.
So, if you’re storing passwords, you should absolutely be passing them through a one-way function. You could say you’re “hashing” them, that’s close enough. In fact you probably want to say you’re salting and hashing them. But whatever you do, you’re not “encrypting” your passwords. That’s just silly.
encryption is not a magic bullet
For the most part, encryption isn’t magic. Encryption lets you manage secrets more securely, but if users are involved in the key management, that almost certainly comes at the expense of usability and features. Web services should strongly consider encryption where possible to more strictly manage their internal access controls. But think carefully before embarking on a design that forces users to manage their keys. In many cases, users simply don’t understand that losing the key means losing the data. As my colleague Umesh Shankar says, if you design a car lock so secure that locking yourself out means crushing the car and buying a new one, you’re probably doing it wrong.