the responsibility we have as software engineers

I had the chance to chat this week with the very awesome Kate Heddleston who mentioned that she’s been thinking a lot about the ethics of being a software engineer, something she just spoke about at PyCon Sweden. It brought me back to a post I wrote a few years ago, where I said:

There’s this continued and surprisingly widespread delusion that technology is somehow neutral, that moral decisions are for other people to make. But that’s just not true. Lessig taught me (and a generation of other technologists) that Code is Law

[…]

In 2008, the world turned against bankers, because many profited by exploiting their expertise in a rapidly accelerating field (financial instruments) over others’ ignorance of even basic concepts (adjustable-rate mortgages). How long before we software engineers find our profession in a similar position? How long will we shield ourselves from the responsibility we have, as experts in the field much like experts in any other field, to guide others to make the best decision for them?

Well, I think that time has come.

Everyone uses software; very few people understand it. What seems obvious to a small elite group is completely opaque to the majority of the world. This gap is incredibly hard for us, the software engineering elite, to see. A few examples:

  • The Radiolab Podcast did a wonderful piece – Trust Engineers – where they explored the case of Facebook running experiments on its newsfeed. For non-engineers, there’s an incredible feeling of breached trust upon realizing that a set of flesh-and-blood humans have that much control over the algorithm that feeds them daily information. (And, for that matter, to most researchers used to interacting with an IRB, there’s complete shock at what Facebook did.) For most engineers, including a number of very good and ethical people at Facebook, it’s surprising that this is even an issue.
  • A couple of years ago, a friend of a friend – who happens to be a world-renowned physician and research scientist – asked me: “Ben, can the system administrators at work read my email? Even if they don’t have my password?” The answer is yes and yes. This is obvious to us engineers, so much so that we don’t even think twice about it. To a non-engineer, even an incredibly smart person, this is absolutely non-obvious.
  • A close friend, another very smart person, was discussing something with his young child recently, and I overheard “if you don’t know, ask the computer, the computer knows and it’s always right.” Where do I begin?

We, software engineers, have superpowers most people don’t remotely understand. The trust society places in us is growing so rapidly that the only thing that looks even remotely similar is the trust placed in doctors. Except, most people have a pretty good idea of the trust they’re placing in their doctor, while they have almost no idea that every time they install an app, enter some personal data, or share a private thought in a private electronic conversation, they’re trusting a set of software engineers who have very little in the form of ethical guidelines.

Where’s our Hippocratic Oath, our “First, Do No Harm?”

I try very hard to think about this in my own work, and I try to share this sense of duty with every engineer I mentor and interact with. Still, I don’t have a good answer to the core question. Yet it feels increasingly urgent and important for us to figure this out.

Power & Accountability

So there’s this hot new app called Secret. The app is really clever: it prompts you to share secrets, and it sends those secrets to your social circle. It doesn’t identify you directly to your friends. Instead, it tells readers that this secret was written by one of their friends without identifying which one. The popularity of the app appears to be off the charts, with significant venture-capital investment in a short period of time. There are amazing stories of people seeking out emotional support on Secret, and awful stories of bullying that have caused significant uproar. Secret has recently released features aimed at curbing bullying.

My sense is that the commentary to date is missing the mark. There’s talk of the danger of anonymous speech. Even the founders of Secret talk about their app like it’s anonymous speech:

“Anonymity is a really powerful thing, and with that power comes great responsibility. Figuring out these issues is the key to our long-term success, but it’s a hard, hard problem and we are doing the best we can.”

And this is certainly true: we’ve known for a while that anonymous speech can reveal the worst in people. But that’s not what we’re dealing with here. Posts on Secret are not anonymous. Posts on Secret are guaranteed to be authored by one of your friends. That guarantee is enabled and relayed by the Secret platform. That’s a very different beast than anonymity.

In general, if you seek good behavior, Power and Accountability need to be connected: the more Power you give someone, the more you hold them Accountable. Anonymity can be dangerous because it removes Accountability. That said, anonymity also removes some Power: if you’re not signing your name to your statement, it carries less weight. With Secret, Accountability is absent, just like with anonymous speech, but the power of identified speech remains in full force. That leads to amazing positive experiences: people can share thoughts of suicide with friends who can help, all under the cloak of group-anonymity that is both protecting and empowering. And it leads to disastrous power granted to bullies attacking their victims with the full force of speaking with authority – the bully is one of their friends! – while carrying zero accountability. That kind of power is likely to produce more bullies, too.

This is so much more potent than anonymity. And if this fascinating experiment is to do more good than harm, it will need to seriously push the envelope on systems for Accountability that are on par with the power Secret grants.

Here’s a free idea, straight out of crypto land. In cryptographic protocols that combine a need for good behavior with privacy/anonymity protections, there is often a trigger where bad behavior removes the anonymity shield. What if Secret revealed the identity of those users found to be in repeated violation of a code of good behavior? Would the threat of potential shame keep people in line, leaving the good uses intact while disincentivizing the destructive ones?
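To make the shape of the idea concrete, here's a deliberately simplified, non-cryptographic sketch of that escrow-until-abuse pattern. A real system would hold the authorship escrow cryptographically (threshold decryption by independent parties, group signatures, or similar) rather than in a plaintext table, and the names and threshold below are purely illustrative:

```python
# Simplified, non-cryptographic sketch of "anonymity until repeated abuse".
# A real deployment would escrow authorship cryptographically, not in a plain table.
from typing import Optional


class AccountablePseudonymity:
    def __init__(self, reveal_threshold: int = 3):
        self.reveal_threshold = reveal_threshold  # confirmed violations before unmasking
        self._authors = {}      # post_id -> author, held in escrow, never shown to readers
        self._violations = {}   # author -> count of confirmed violations

    def publish(self, post_id: str, author: str) -> None:
        """Record the post's author in escrow; readers only learn 'one of your friends'."""
        self._authors[post_id] = author

    def report_violation(self, post_id: str) -> Optional[str]:
        """Log a confirmed code-of-conduct violation against the post's author.
        Once the threshold is crossed, the anonymity shield comes off."""
        author = self._authors[post_id]
        self._violations[author] = self._violations.get(author, 0) + 1
        return author if self._violations[author] >= self.reveal_threshold else None
```

The point is the shape of the incentive, not this particular mechanism: authorship stays hidden exactly as long as the author behaves.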

Letter to President Obama on Surveillance and Freedom

Dear President Obama,

My name is Ben Adida. I am 36, married, two kids, working in Silicon Valley as a software engineer with a strong background in security. I’ve worked on the security of voting systems and health systems, on web browsers and payment systems. I enthusiastically voted for you three times: in the 2008 primary and in both presidential elections. When I wrote about my support for your campaign five years ago, I said:

In his campaign, Obama has proposed opening up to the public all bill debates and negotiations with lobbyists, via TV and the Internet. Why? Because he trusts that Americans, when given the tools to see and understand what their legislators are doing, will apply pressure to keep their government honest.

I gushed about how you supported transparency as broadly as possible, to enable better decision making, to empower individuals, and to build a better nation.

Now, I’m no stubborn idealist. I know that change is hard and slow. I know you cannot steer a ship as big as the United States as quickly as some would like. I know tough compromises are the inevitable path to progress.

I also imagine that, once you’re President, the enormity of the threat from those who would attack Americans must be overwhelming. The responsibility you feel, the level of detail you understand, must make prior principles sometimes feel quaint. I cannot imagine what it’s like to be in your shoes.

I also remember that you called on us, your supporters, to stay active, to call you and Congress to task. I want to believe that you asked for this because you knew that your perspective as Commander in Chief would inevitably become skewed. So this is what I’m doing here: I’m calling you to task.

You are failing hard on transparency and oversight when it comes to NSA surveillance. This failure is not the pragmatic compromise of Obamacare, which I strongly support. It is not the sheer difficulty of closing Guantanamo, which I understand. This failure is deep. If you fail to fix it, you will be the President principally responsible for the effective death of the Fourth Amendment and worse.

mass surveillance

The specific topic of concern, to be clear, is mass surveillance. I am not concerned with targeted data requests, based on probable cause and reviewed individually by publicly accountable judges. I can even live with secret data requests, provided they’re very limited, finely targeted, and protect the free-speech rights of service providers like Google and Facebook to release appropriately sanitized data about these requests as often as they’d like.

What I’m concerned about is the broad, dragnet NSA signals intelligence recently revealed by Edward Snowden. This kind of surveillance is a different beast, comparable to routine frisking of every individual simply for walking down the street. It is repulsive to me. It should be repulsive to you, too.

wrong in practice

If you’re a hypochondriac, you might be tempted to ask your doctor for a full body MRI or CT scan to catch health issues before symptoms appear. Unfortunately, because of two simple probabilistic principles, you’re much worse off if you get the test.

First, it is relatively unlikely that a random person with no symptoms has a serious medical problem, i.e. the prior probability is low. Second, it is quite possible — not likely, but possible — that a completely benign thing appears potentially dangerous on imaging, i.e. there is a noticeable chance of a false positive. Put those two things together, and you get this mind-bending outcome: even if the full-body MRI says you have something to worry about, you still, most likely, have nothing to worry about. But try convincing yourself of that if you get a scary MRI result.
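To make the arithmetic concrete, here's a back-of-the-envelope Bayes calculation with made-up but plausibly shaped numbers (a 1-in-10,000 prior, a 90%-sensitive test, a 5% false-positive rate):

```python
# Back-of-the-envelope Bayes' rule with illustrative, made-up numbers.
prior = 1 / 10_000          # P(disease): rare condition, no symptoms
sensitivity = 0.90          # P(positive result | disease)
false_positive_rate = 0.05  # P(positive result | no disease)

p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
p_disease_given_positive = sensitivity * prior / p_positive

print(f"P(disease | positive result) = {p_disease_given_positive:.4f}")
# ~0.0018: even after a "scary" positive result, the chance of actually
# being sick is well under 1%.
```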

Mass surveillance to seek out terrorism is basically the same thing: very low prior probability that any given person is a terrorist, quite possible that normal behavior appears suspicious. Mass surveillance means wasting tremendous resources on dead ends. And because we’re human and we make mistakes when given bad data, mass surveillance sometimes means badly hurting innocent people, like Jean-Charles de Menezes.

So what happens when a massively funded effort has frustratingly poor outcomes? You get scope creep: the surveillance apparatus gets redirected to other purposes. The TSA starts overseeing sporting events. The DEA and IRS dip into the NSA dataset. Anti-terrorism laws with far-reaching powers are used to intimidate journalists and their loved ones.

Where does it stop? If we forgo due process for a certain category of investigation which, by design, will see its scope broaden to just about any type of investigation, is there any due process left?

wrong on principle

I can imagine some people, maybe some of your trusted advisors, will say that what I’ve just described is simply a “poor implementation” of surveillance, that the NSA does a much better job. So it’s worth asking: assuming we can perfect a surveillance system with zero false positives, is it then okay to live in a society that implements such surveillance and detects any illegal act?

This has always felt wrong to me, but I couldn’t express a simple, principled, ethical reason for this feeling, until I spoke with a colleague recently who said it better than I ever could:

For society to progress, individuals must be able to experiment very close to the limit of the law and sometimes cross into illegality. A society which perfectly enforces its laws is one that cannot make progress.

What would have become of the civil rights movement if all of its initial transgressions had been perfectly detected and punished? What about gay rights? Women’s rights? Is there even room for civil disobedience?

Though we want our laws to reflect morality, they are, at best, a very rough and sometimes completely broken approximation of morality. Our ability as citizens to occasionally transgress the law is the force that brings our society’s laws closer to our moral ideals. We should reject mass surveillance, even the theoretically perfect kind, with all the strength and fury of a people striving to form a more perfect union.

patriots

Mr. President, you have said that you do not consider Edward Snowden a patriot, and you have not commented on whether he is a whistleblower. I ask you to consider this: if you were an ordinary citizen, living your life as a Law Professor at the University of Chicago, and you found out, through Edward Snowden’s revelations, the scope of the NSA mass surveillance program and the misuse of the accumulated data by the DEA and the IRS, what would you think? Wouldn’t you, like many of us, be thankful that Mr. Snowden risked his life to give we the people this information, so that we may judge for ourselves whether this is the society we want?

And if there is even a possibility that you would feel this way, given that many thousands do, if government insiders believe Snowden to be a traitor while outsiders believe him to be a whistleblower, is that not all the information you need to realize the critical positive role he has played, and the need for the government to change?

the time to do something is now

I still believe that you are, at your core, a unique President who values a government by and for the people. As a continuing supporter of your Presidency, I implore you to look deeply at this issue, to bring in outside experts who are not involved in national security. This issue is critical to our future as a free nation.

Please do what is right so that your daughters and my sons can grow up with the privacy and dignity they deserve, free from surveillance, its inevitable abuses, and its paralyzing force. Our kids, too, will have civil rights battles to fight. They, too, will need the ability to challenge unjust laws. They, too, will need the space to make our country better still.

Please do not rob them of that opportunity.

Sincerely,

Ben Adida

a hopeful note about PRISM

You know what? I’m feeling optimistic suddenly. Mere hours ago, all of us tech/policy geeks lost our marbles over PRISM. And in the last hour, we’ve got two of the most strongly worded surveillance rebuttals I’ve ever seen from major Internet Companies.

Here’s Google’s CEO Larry Page:

we provide user data to governments only in accordance with the law. Our legal team reviews each and every request, and frequently pushes back when requests are overly broad or don’t follow the correct process. Press reports that suggest that Google is providing open-ended access to our users’ data are false, period. Until this week’s reports, we had never heard of the broad type of order that Verizon received—an order that appears to have required them to hand over millions of users’ call records. We were very surprised to learn that such broad orders exist. Any suggestion that Google is disclosing information about our users’ Internet activity on such a scale is completely false.

And here’s Mark Zuckerberg of Facebook:

Facebook is not and has never been part of any program to give the US or any other government direct access to our servers. We have never received a blanket request or court order from any government agency asking for information or metadata in bulk, like the one Verizon reportedly received. And if we did, we would fight it aggressively. We hadn’t even heard of PRISM before yesterday.

Both companies emphasize transparency around government data requests as a critical component of moving forward. I couldn’t agree more. We need to know about every legal process in place that gives government access to private user data.

epiphany?

Could PRISM mark a tech world epiphany that users care about privacy? I hope so. It certainly seems that major PR departments think so. Unequivocally worded responses from major Internet CEOs within 24 hours mean they care. This is a good thing.

retreat is the wrong reaction

I’ve heard folks argue that PRISM means we need to bet it all on end-to-end encryption. I think that’s wrong, because that doesn’t fulfill users’ needs. But even putting that aside: if you believe the government is willing to penetrate professionally managed corporate servers without company permission or legal clarity, do you sincerely believe the government wouldn’t also penetrate your personal computer and steal the data before you encrypt it?

Services and data aggregation play a critical role in providing users the features they need to share, discover, and grow. They’re not going away. Don’t expect PRISM to herald the era of end-to-end encryption and dumb servers. Those will continue to play only a limited role for very specific use cases.

What we need is (1) companies that deeply respect users, and (2) legal processes that protect user data wherever it lives. I think we’re seeing the beginning of (1). Now, Obama, over to you for (2).

what happens when we forget who should own the data: PRISM

Heard about PRISM? Supposedly, the NSA has direct access to servers at major Internet companies. This has happened before, e.g. when Sprint provided law enforcement a simple data portal they could use at any time. They used it 8 million times in a year. That said, the scale of this new claim is a bit staggering. If the NSA has access to these 9 companies’ data, it has access to every American Citizen’s complete life.

what’s really happening?

I think we don’t know yet what’s happening.

I’m dubious that NSA has direct access to servers at Google, Facebook, Apple, etc. Those companies have strongly denied the claim, and I have trouble believing this happened on a large scale for years without someone at those companies leaking the information.

Might NSA be tapping all network traffic? Yeah, that’s probable. Might NSA have the facility to decrypt the encrypted traffic? For targeted searches, yeah, I believe that. For broad-scale searching across all traffic? I’m not so sure. It could be happening, but that would be tremendous, hard-to-fathom news.

I could be wrong here. Companies might be cooperating and lying about it. NSA might be eons ahead of what we expect in terms of computing capability and cryptographic breakthroughs. This is just my gut instinct.

is this okay?

So, let’s assume it is happening. Is it okay? Hell no it isn’t. There is no doubt in my mind that user data, whether stored in a lockbox in my home or on a server in Oregon, should first and foremost belong to me, and be covered by the same Constitutional protections as my home and private belongings. It is high time for the law to catch up, for a digital due process. Blanket surveillance and warrantless capture or seizure of private data are unacceptable, and should be revolting to anyone who cares about freedom and democracy.

lessons for technologists

I deeply believe that one should first look at one’s own actions before blaming others. And I think we, technologists, have some blame to shoulder.

We’ve let our guard down when it comes to user data ownership. We’ve made it increasingly acceptable to collect user data and make decisions about how best to use it without involving the user much. We’ve often allowed the definition of “using data for the user’s benefit” to loosen.

In other words, where user data ownership in the cloud was murky to begin with, we’ve made it murkier.

Unlike some of my colleagues, I don’t believe we can simply forgo the Cloud or use end-to-end encryption. Encryption cannot be layered on without consequences. You cannot provide the value that users want without some centralization of data and services.

But we can take a stronger stance against companies that abuse users’ trust and treat the data as their own rather than the user’s. We can set an example. We can state clearly that when we collect data, we do it with care, we do it for a clear purpose, and we allow the user to leave as easily as possible, removing traces of their data as best we can.

We can set the example that the user’s data, whatever server it’s on, belongs, by principle, to the user. And then we can and should ask our government to live up to the same standard.

Firefox is the unlocked browser

Anil Dash is a man after my own heart in his latest post, The Case for User Agent Extremism. Please go read this awesome post:

One of my favorite aspects of the infrastructure of the web is that the way we refer to web browsers in a technical context: User Agents. Divorced from its geeky context, the simple phrase seems to be laden with social, even political, implications.

The idea captured in the phrase “user agent” is a powerful one, that this software we run on our computers or our phones acts with agency on behalf of us as users, doing our bidding and following our wishes. But as the web evolves, we’re in fundamental tension with that history and legacy, because the powerful companies that today exert overwhelming control over the web are going to try to make web browsers less an agent of users and more a user-driven agent of those corporations. This is especially true for Google Chrome, Microsoft Internet Explorer and Apple Safari, though Mozilla’s Firefox may be headed down this path as well.

So so right… except for the misinformed inclusion of Firefox in that list. Anil: Firefox is the User Agent you’re looking for. Here’s why.

user agency

Two years ago, I joined Mozilla because Mozillians are constantly working to strengthen the User Agent:

In a few days, I’ll be joining Mozilla.

[..]

[I want] to work on making the browser a true user agent working on behalf of the user. Mozilla folks are not only strongly aligned with that point of view, they’ve already done quite a bit to make it happen.

browser extensions

Like Anil, I believe browser add-ons/extensions/user-scripts are critical for user freedom, as I wrote more than two years ago, before I even joined Mozilla:

Browser extensions, or add-ons, can help address this issue [of user freedom]. They can modify the behavior of specific web sites by making the browser defend user control and privacy more aggressively: they can block ads, block flash, block cookies for certain domains, add extra links for convenience (e.g. direct links to Flickr’s original resolution), etc. Browser extensions empower users to actively defend their freedom and privacy, to push back on the more egregious actions of certain web publishers.

mobile

Again, like Anil, I saw, in that same blog post, the threat of mobile:

Except in the mobile space. Think about the iPhone browser. Apple disallows web browsers other than Safari, and there is no way to create browser extensions for Safari mobile. When you use Safari on an iPhone, you are using a browser that behaves exactly like all other iPhone Safaris, without exception. And that means that, as web publishers discover improved ways to track you, you continue to lose privacy and control over your data as you surf the Web.

This situation is getting worse: the iPad has the same limitations as the iPhone. Technically, other browsers can be installed on Android, but for all intents and purposes, it seems the built-in browser is the dominant one. Simplified computing is the norm, with single isolated applications, never applications that can modify the behavior of other applications. Thus, no browser extensions, and only one way to surf the web.

so Firefox?

To Anil’s concerns:

  • Firefox Sync, which lets you share bookmarks, passwords, tabs, etc. across devices, is entirely open-source, including the server infrastructure, and if you don’t want Mozilla involved, you can change your Firefox settings to point to a Sync server of your choosing, including one you run on your own using our open-source code. PICL (Profile in the Cloud), the next-generation Sync that my team is working on, will make it even easier for you to choose your own PICL server. We offer a sane default so things work out of the box, but no required centralization, unlike other vendors.
  • Mozilla Persona, our Web Identity solution, works today on any major browser (not just Firefox), and is fully decentralized: you can choose any identity provider you want today. This stands in stark contrast to competing solutions that tie browsers to vendor-specific accounts. Persona is the identity solution that respects users.
  • Firefox for Android is the only major mobile browser that supports add-ons. Anil, if you want “cloud-to-butt”, you can have it on Firefox for Android. You can also have AdBlock Plus. Try that on any other mobile browser.

the unlocked browser

Anil argues that we should talk about unlocked browsers. I love it. Let’s do that. Here’s my bet, Anil: write down your criteria for the ideal unlocked browser. I bet you’ll find that Firefox, on desktop, on mobile, and in all of the services Mozilla is offering as attachments, is exactly what you’re looking for.

connect on your terms

I want to talk about what we, the Identity Team at Mozilla, are working on.

Mozilla makes Firefox, the 2nd most popular browser in the world, and the only major browser built by a non-profit. Mozilla’s mission is to build a better Web that answers to no one but you, the user. It’s hard to overstate how important this is in 2012, when the Web answers less and less to individual users, more and more to powerful data silos whose interests are not always aligned with those of users.

To fulfill the Mozilla mission, the browser remains critical, but is no longer enough. Think of the Web’s hardware and software stack. The browser sits in the middle [1], hardware and operating system below it, cloud services above it. And the browser is getting squeezed: mobile devices, which outnumber desktop computers and are poised to dominate within a couple of years, run operating systems that limit, through technical means or bundling deals, which browser you can use and how you can customize their behavior. Meanwhile, browsers are fast becoming passive funnels of user data into cloud services that offer too little user control and too much lock-in.

Mozilla is moving quickly to address the first issue with Boot2Gecko, a free, open, and Web-based mobile operating system due to launch next year. This is an incredibly important project that aims to establish true user choice in the mobile stack and to power-charge the Open Web by giving HTML5 Apps new capabilities, including camera access, dialing, etc.

The Mozilla Identity Team is working on the top of the stack: we want users to control their transactions, whether using money or data, with cloud services. We want you to connect to the Web on your terms. To do that, we’re building services and corresponding browser features.

We’re starting with Persona, our simple distributed login system, which you can integrate into your web site in a couple of hours — a good bit more easily than our competitors. Persona is unique because it deeply respects users: the only data exchanged is what users wish to provide. For example, when you use Persona to sign into web sites, there is no central authority that learns about all of your activity.

From Persona, we’ll move to services connected to your identity. We’ll help you manage your data, connect the services that matter to you, all under your full control. We want to take user agency, a role typically reserved for the browser sitting on your device, into the cloud. And because we are Mozilla, and all of our code and protocols are open, you know the services we build will always be on your side.

All that said, we know that users pick products based on quality features, not grand visions. Our vision is our compass, but we work on products that fulfill specific user and developer needs today. We will work towards our vision one compelling and pragmatic product at a time.

The lines between client, server, operating system, browser, and apps are blurring. The Web, far more than a set of technologies, is now a rapidly evolving ecosystem of connections between people and services. The Mozilla Identity Team wants to make sure you, the user, are truly in control of your connections. We want to help you connect on your terms. Follow us, join us.


[1] David Ascher spoke about this in his post about the new Mozilla a few months ago.

cookies don’t track people. people track people.

The news shows are in a tizzy: Google violated your privacy again [CBS, CNN] by circumventing Safari’s built-in tracking protection mechanism. It’s great to see a renewed public focus on privacy, but, in this case, I think this is the wrong problem to focus on and the wrong message to send.

what happened exactly

(Want a more detailed technical explanation? Read Jonathan Mayer’s post. He’s the guy who discovered the shenanigans in question.)

Cookies are bits of data with which web sites tag users, so that when users return, the site can recognize them and provide continuity of service. This is mostly good for users, who don’t want to re-identify themselves every time they visit their favorite social network or e-commerce site. Cookies work mostly with strong compartmentalization: if cnn.com tags you, your browser sends that tag back only to cnn.com. This is important because users would be surprised (not the good kind of surprise) if one site could tag them once and then cause them to uniquely identify themselves with the same identifier to all other sites across the Web.
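To make that compartmentalization concrete, here's a toy model of a browser's cookie jar. This is a simulation for illustration only, not how any real browser is implemented:

```python
# Toy model of a browser cookie jar: a tag set by one site is only
# sent back to that same site on later visits.
cookie_jar = {}  # site -> {cookie name -> value}

def receive_set_cookie(site: str, name: str, value: str) -> None:
    """The site tags the browser (think: the Set-Cookie response header)."""
    cookie_jar.setdefault(site, {})[name] = value

def cookies_to_send(site: str) -> dict:
    """What the browser sends back on the next request to that site (the Cookie header)."""
    return cookie_jar.get(site, {})

receive_set_cookie("cnn.com", "visitor_id", "abc123")
print(cookies_to_send("cnn.com"))      # {'visitor_id': 'abc123'}
print(cookies_to_send("example.com"))  # {} -- cnn.com's tag never goes anywhere else
```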

Things get complicated when web sites embed content served by third parties, for example ads within a news site. Should this third-party content also be able to tag your browser? Should the tag be sent back to that third party when its content is loaded?

Different browsers do different things. Firefox toyed with the idea of not sending the tag back to third parties, but in beta-testing realized that this would break some features that users have come to depend upon, for example Facebook sharing widgets. Safari chose a fairly unique approach: they mostly disallow third parties from tagging users, though they do allow existing tags to be read, so that things like Facebook widgets can still work.

For some reason (I won’t speculate why, Google claims it’s to enable the +1 button), Google used a known technique that tricks Safari into accepting a third-party tag from Google.

mechanism vs. intent

So the reason this whole controversy bugs me is that we’re discussing web privacy based on specific mechanisms, a bit like discussing home privacy by regulating infrared cameras. Sure, an infrared camera can be used to violate my home privacy, but it can be used for many good things, and there are many other ways to invade my home privacy. Cookies, like all technical mechanisms, have both good and evil uses. And browsers don’t all behave the same way with respect to cookies and other web features, so it’s typical for developers to find workarounds that effectively give them “standard behavior” from all browsers. Sometimes these workarounds are truly meant to help the user accomplish what they want. Sometimes these workarounds are used to evil ends, e.g. to track people without their consent.

Again, I don’t know what Google’s intentions were. All I know is that we’re prosecuting the wrong thing: a technical mechanism instead of an intent to track. Cookies don’t track people. People track people. We should be focusing on empowering users to express their preferences on tracking and ensuring web sites are required to comply.

the tracking arms race

If we focus on technical mechanisms to protect user privacy, then we’re dooming users to an un-winnable arms race. There are dozens of ways of tracking users other than classic cookies. Google used a work-around for Safari third-party cookies, but let’s say they hadn’t. Let’s say instead they’d used Flash cookies, or cache cookies, or device fingerprinting, or a slew of other mechanisms that browsers do not defend against, in large part because it’s really hard to defend against these tracking mechanisms without also breaking key Web features. Would Google then be in the clear?

I fear that that’s exactly what we’re implying when we focus the privacy discussion on mechanisms of tracking. The trackers will move on to the next mechanism, and the browsers will scramble to defend against these mechanisms without ever being able to catch up. Blocking tracking at the technical level is, in my opinion, impossible.

the solution: Do Not Track and More

The beginning of a solution lies in the judo move that is Do Not Track, an idea that came out of a collaboration between Christopher Soghoian, Dan Kaminsky, and Sid Stamm (see the full history of DNT). Do Not Track was first implemented in Firefox last year, and soon thereafter in IE, Opera, and Safari. It’s being standardized now at the W3C. It simply lets the user express a preference for not being tracked. Is it a strong technical measure? No. It does nothing to directly prevent tracking. Instead, it lets the user express a preference. And, as support for it grows, it will become incredibly difficult for sites to justify tracking behavior, regardless of the mechanism, when the user has clearly expressed and communicated this choice.
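Mechanically, DNT is just one request header, and honoring it is a choice the site makes. Here's a minimal sketch, using Flask purely for illustration, with a hypothetical record_analytics stand-in for whatever tracking a site normally does:

```python
# Minimal sketch of a site choosing to honor Do Not Track. The header itself
# prevents nothing; the site has to decide to respect it.
from flask import Flask, request

app = Flask(__name__)

def record_analytics(req) -> None:
    """Hypothetical stand-in for the site's usual tracking/analytics call."""
    pass

@app.route("/")
def index():
    # Browsers send "DNT: 1" when the user has asked not to be tracked.
    if request.headers.get("DNT") != "1":
        record_analytics(request)
    return "hello"
```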

We’ll need more than Do Not Track in the future. But it’s the right kind of battle. It doesn’t care about cookies or fingerprinting or who-knows-what.

If you want to get upset at Google, ask why they don’t provide Do Not Track support in Chrome. Ask why they don’t respect the Do Not Track flag on Google web properties when they see users waving it. These are fights worth having. But fighting over cookies? That’s so last decade.

UPDATE: corrected origin credit for DNT header.

encryption is (mostly) not magic

A few months ago, Sony’s Playstation Network got hacked. Millions of accounts were breached, leaking physical addresses and passwords. Sony admitted that their data was “not encrypted.”

Around the same time, researchers discovered that Dropbox stores user files “unencrypted.” Dozens (hundreds?) of users closed their accounts in protest. They’re my confidential files, they cried, why couldn’t you at least encrypt them?

Many, including some quite tech-savvy folks, were quick to indicate that it would have been so easy to encrypt the data. Not encrypting the data proved Sony and Dropbox’s incompetence, they said.

In my opinion, it’s not quite that simple.

Encryption is easy, it’s true. You can download code that implements military-grade encryption in any programming language in a matter of seconds. So why can’t companies just encrypt the data they host and protect us from hackers?

The core problem is that, to be consumable by human users, data has to be decrypted. So the decryption key has to live somewhere between the data-store and the user’s eyeballs. For security purposes, you’d like the decryption key to be very far from the data-store and very close to the user’s eyeballs. Heck you’d like the decryption key to be *inside* the user’s brain. That’s not (yet) possible. And, in fact, in most cases, it isn’t even practical to have the key all that far from the data-store.

encryption relocates the problem

Sony needs to be able to charge your credit card, which requires your billing address. They probably need to do that whether or not you’re online, since you’re not likely to appreciate being involved in your monthly renewal, each and every month. So, even if they encrypt your credit card number and address, they also need to store the decryption key somewhere on their servers. And since they probably want to serve you an “update your account” page with address pre-filled, that decryption key has to be available to decrypt the data as soon as you click “update my account.” So, if Sony’s web servers need to be able to decrypt your data, and hackers break into Sony’s servers, there’s only so much protection encryption provides.
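Here's a minimal sketch of that constraint, using the Python cryptography package's Fernet; the charge call is a hypothetical stand-in. The recurring billing job has to run while you're offline, so the decryption key has to be reachable from the same servers that hold the ciphertext:

```python
# Why server-side encryption only relocates the problem: the recurring charge
# runs without the user present, so the key must live near the ciphertext.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # perhaps held in a key-management service,
fernet = Fernet(key)          # but still reachable by the billing code below

stored_card = fernet.encrypt(b"4111 1111 1111 1111")  # what sits in the database

def charge(card_number: bytes) -> None:
    """Hypothetical stand-in for the payment-processor call."""
    ...

def monthly_renewal() -> None:
    # The user isn't around to supply a key or password; the server decrypts on its own.
    charge(fernet.decrypt(stored_card))

# An attacker who fully compromises these servers has the same access this code has.
```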

Meanwhile, Dropbox wants to give you access to your files everywhere. Maybe they could keep your files encrypted on their servers, with encryption keys stored only on your desktop machine? Yes… until you want to access your files over the Web using a friend’s computer. And what if you want to share a file with a friend while they’re not online? Somehow you have to send them the decryption key. Dropbox must now ask its users to manage the sharing of these decryption keys (good luck explaining that to them), or must hold on to the decryption key and manage who gets the key…. which means storing the decryption keys on their servers. If you walk down the usability path far enough – in fact not all that far – it becomes clear that Dropbox probably needs to store the decryption key not too far from the encrypted files themselves. Encryption can’t protect you once you actually mean to decrypt the data.

The features users need often dictate where the decryption key is stored. The more useful the product, the closer the decryption key has to be to the encrypted data. Don’t think of encryption as a magic shield that miraculously distinguishes between good and bad guys. Instead, think of encryption as a mechanism for shrinking the size of the secret (one small encryption key can secure gigabytes of data), thus allowing the easy relocation of the secret to another location. That’s still quite useful, but it’s not nearly as magical as many imply it to be.

what about Firefox Sync, Apple Time Machine, SpiderOak, Helios, etc.

But but but, you might be thinking, there are systems that store encrypted data and don’t store the decryption key. Firefox Sync. Apple’s Time Machine backup system. The SpiderOak online backup system. Heck, even my own Helios Voting System encrypts user votes in the browser with no decryption keys stored anywhere except the trustees’ own machines.

It’s true, in some very specific cases, you can build systems where the decryption key is stored only on a user’s desktop machine. Sometimes, you can even build a system where the key is stored nowhere durably; instead it is derived on the fly from the user’s password, used to encrypt/decrypt, then forgotten.
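Here's a sketch of that derive-on-the-fly variant using the standard library's PBKDF2. Only the salt is stored; forget the password and the data is unrecoverable, by design:

```python
# "Derive the key on the fly from the user's password, use it, forget it."
import hashlib
import os

def derive_key(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

salt = os.urandom(16)                                   # stored alongside the ciphertext
key = derive_key("correct horse battery staple", salt)  # derived when needed, never stored
```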

But all of these systems have significant usability downsides (yes, even my voting system). If you only have one machine connected to Firefox Sync, and you lose it, you cannot get your bookmarks and web history back. If you forget your Time Machine or SpiderOak password, and your main hard drive crashes, you cannot recover your data from backup. If you lose your Helios Voting decryption key, you cannot tally your election.

And when I say “you cannot get your data back,” I mean you would need a mathematical breakthrough of significant proportions to get your data back. It’s not happening. Your data is lost. Keep in mind: that’s the whole point of not storing the decryption key. It’s not a bug, it’s a feature.

and then there’s sharing

I alluded to this issue in the Dropbox description above: what happens when users want to share data with others? If the servers don’t have the decryption key, that means users have to pass the decryption key to one another. Maybe you’re thinking you can use public-key encryption, where each user has a keypair, publishes the public encryption key, and keeps secret the private decryption key? Now we’re back to “you can’t get your data back” if the user loses their private key.

And what about features like Facebook’s newsfeed, where servers process, massage, aggregate, and filter data for users before they even see it? If the server can’t decrypt the data, then how can it help you process the data before you see it?

To be clear: if your web site has social features, it’s very unlikely you can successfully push the decryption keys down to the user. You’re going to need to read the data on your servers. And if your servers need to read the data, then a hacker who breaks into the servers can read the data, too.

so the cryptographer is telling me that encryption is useless?

No, far from it. I’m only saying that encryption with end-user-controlled keys has far fewer applications than most people think. Those applications need to be well-scoped, and they have to be accompanied by big bad disclaimers about what happens when you lose your key.

That said, encryption as a means of partitioning power and access on the server-side remains a very powerful tool. If you have to store credit card numbers, it’s best if you build a subsystem whose entire role is to store credit-card numbers encrypted and to process transaction requests from other parts of your system. If your entire system is compromised, then you’re no better off than if you hadn’t taken those precautions. But, if only part of your system is compromised, encryption may well stop an attacker from gaining access to the most sensitive parts of the system.
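Here's a sketch of what that partitioning can look like; the class and method names are illustrative, not a real payment API. The vault owns the key, and the rest of the system only ever holds opaque tokens:

```python
# Encryption as access control: a small vault component owns the key,
# everything else handles opaque tokens.
import uuid
from cryptography.fernet import Fernet

class CardVault:
    """The only code path allowed to see plaintext card numbers."""

    def __init__(self):
        self._fernet = Fernet(Fernet.generate_key())  # key never leaves the vault
        self._store = {}                              # token -> ciphertext

    def save_card(self, card_number: bytes) -> str:
        token = str(uuid.uuid4())
        self._store[token] = self._fernet.encrypt(card_number)
        return token                                  # the web tier keeps only this

    def charge(self, token: str, amount_cents: int) -> None:
        card_number = self._fernet.decrypt(self._store[token])
        # ...hand card_number to the payment processor; it never leaves this method
```

An attacker who compromises only the web tier walks away with tokens, not card numbers.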

You can take this encryption-as-access-control idea very far. An MIT team just published CryptDB, a modified relational database that uses interesting encryption techniques to strongly enforce access control. Note that, if you have the password to log into the database, this encryption isn’t going to hide the data from you: the decryption key is on the server. Still, it’s a very good defense-in-depth approach.

what about this fully homomorphic encryption thing?

OK, so I lied a little bit when I talked about pre-processing data. There is a kind of encryption, called homomorphic encryption, that lets you perform operations on data while it remains encrypted. The last few years have seen epic progress in this field, and it’s quite exciting…. for a cryptographer. These techniques remain extremely impractical for most use cases today, with an overhead factor in the trillions, both for storage and computation time. And, even when they do become more practical, the central decryption key problem remains: forcing users to manage decryption keys is, for the most part, a usability nightmare.
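For a feel of what "computing on encrypted data" means, here's a toy demonstration of a merely multiplicative homomorphic property, using textbook RSA with tiny, utterly insecure numbers; fully homomorphic schemes support arbitrary computation, at the enormous cost described above:

```python
# Toy illustration: with textbook RSA, multiplying two ciphertexts
# multiplies the underlying plaintexts. Tiny numbers, zero security.
n, e, d = 3233, 17, 2753   # textbook key: p = 61, q = 53

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

c1, c2 = encrypt(4), encrypt(6)
assert decrypt((c1 * c2) % n) == 4 * 6   # the product was computed while encrypted
```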

That said, I must admit: homomorphic encryption is actually almost like magic.

the special case of passwords

Passwords are special because, once stored, you never need to read them back out, you only need to check if a password typed by a user matches the one stored on the server. That’s very different than a credit-card number, which does need to be read after it’s stored so the card can be charged every month. So for passwords, we have special techniques. It’s not encryption, because encryption is reversible, and the whole point is that we’d like the system to strongly disallow extraction of user passwords from the data-store. The special tool is a one-way function, such as bcrypt. Take the password, process it using the one-way function, and store only the output. The one-way function is built to be difficult to reverse: you have to try a password to see if it matches. That’s pretty cool stuff, but really it only applies to passwords.
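Here's a minimal sketch with the bcrypt package, which generates the salt and applies the one-way function for you:

```python
# Store only the output of the one-way function; check guesses against it later.
import bcrypt

# At signup: store only the salted hash, never the password itself.
stored = bcrypt.hashpw(b"hunter2", bcrypt.gensalt())

# At login: run the submitted password through the same one-way function and compare.
assert bcrypt.checkpw(b"hunter2", stored)
assert not bcrypt.checkpw(b"wrong guess", stored)
```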

So, if you’re storing passwords, you should absolutely be passing them through a one-way function. You could say you’re “hashing” them, that’s close enough. In fact you probably want to say you’re salting and hashing them. But whatever you do, you’re not “encrypting” your passwords. That’s just silly.

encryption is not a magic bullet

For the most part, encryption isn’t magic. Encryption lets you manage secrets more securely, but if users are involved in the key management, that almost certainly comes at the expense of usability and features. Web services should strongly consider encryption where possible to more strictly manage their internal access controls. But think carefully before embarking on a design that forces users to manage their keys. In many cases, users simply don’t understand that losing the key means losing the data. As my colleague Umesh Shankar says, if you design a car lock so secure that locking yourself out means crushing the car and buying a new one, you’re probably doing it wrong.

and the laws of physics changed

Google just introduced Google Plus, their take on social networking. Unsurprisingly, Arvind has one of the first great reviews of its most important feature, Circles. Google Circles effectively let you map all the complexities of real-world privacy into your online identity, and that’s simply awesome.

You can think of Circles as the actual circles of friends you have. The things that are easy to do in real life, like sharing a fun anecdote with the friends you generally go out with on Saturday nights, are easy to do in Circles. The things that are hard to do in real life, like planning your best friend’s surprise birthday party with all of his close friends but without him, are no easier in Circles: you have to make a new list of “everyone except Bob.” That’s great, because I don’t think our brains have evolved yet to really feel comfortable with a social model that supports all set operations, e.g. this circle minus this other circle. That’s usually how we get caught lying. (I mean the lies everyone tells as part of their normal social interactions.)

The most important point is that this feature shatters the previously universally accepted idea that privacy must change dramatically given social networking. For a few years, Facebook has defined the Laws of Physics of social networking. On Facebook, it’s not possible to show different people a different face. On Facebook, relationships are, for the most part, symmetrical. And so we all believed that this was the inevitable path forward with social networking. We conflated the fact that users wanted to connect online with the constraints that Facebook created, and we assumed users wanted those constraints. We forgot that software engineers define the Laws of Physics of the worlds they create. We weren’t living in the inherent world of social networking. We were living in Facebook’s definition of social networking.

We now know it doesn’t have to be this way. The Laws of Physics in the online world are mutable. Google just busted open a world of possibility. Users will question, now more than ever, why sharing must work the way it does on Facebook, given that Google has shown it can work differently.

It will make Facebook better. Which will make Google better. And so on. We may be witnessing the beginning of a new era of online privacy, a maturation of sorts. This is an incredibly exciting time.