cookies don’t track people. people track people.

The news shows are in a tizzy: Google violated your privacy again [CBS, CNN] by circumventing Safari’s built-in tracking protection mechanism. It’s great to see a renewed public focus on privacy, but, in this case, I think this is the wrong problem to focus on and the wrong message to send.

what happened exactly

(Want a more detailed technical explanation? Read Jonathan Mayer’s post. He’s the guy who discovered the shenanigans in question.)

Cookies are bits of data with which web sites tag users, so that when users return, the site can recognize them and provide continuity of service. This is mostly good for users, who don’t want to re-identify themselves every time they visit their favorite social network or e-commerce site. Cookies work mostly with strong compartmentalization: if cnn.com tags you, your browser sends that tag back only to cnn.com. This is important because users would be surprised (not the good kind of surprise) if one site could tag them once and then cause them to uniquely identify themselves with the same identifier to all other sites across the Web.
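To make the compartmentalization concrete, here is a toy model of a cookie store. It is only a sketch: the `CookieJar` class and its method names are made up for illustration, and real browsers also match on domain suffixes, paths, and expiry.

```python
from urllib.parse import urlsplit


class CookieJar:
    """Toy cookie store: each tag is filed under the host that set it."""

    def __init__(self):
        self._by_host = {}

    def set_cookie(self, url, name, value):
        host = urlsplit(url).hostname
        self._by_host.setdefault(host, {})[name] = value

    def cookies_for(self, url):
        # Only tags set by this exact host are sent back to it.
        host = urlsplit(url).hostname
        return dict(self._by_host.get(host, {}))


jar = CookieJar()
jar.set_cookie("https://cnn.com/", "visitor", "abc123")
print(jar.cookies_for("https://cnn.com/politics"))  # {'visitor': 'abc123'}
print(jar.cookies_for("https://example.org/"))      # {}
```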

Things get complicated when web sites embed content served by third parties, for example ads within a news site. Should this third-party content also be able to tag your browser? Should the tag be sent back to that third party when its content is loaded?

Different browsers do different things. Firefox toyed with the idea of not sending the tag back to third parties, but in beta-testing realized that this would break some features that users have come to depend upon, for example Facebook sharing widgets. Safari chose a fairly unique approach: they mostly disallow third parties from tagging users, though they do allow existing tags to be read, so that things like Facebook widgets can still work.

For some reason (I won’t speculate why, Google claims it’s to enable the +1 button), Google used a known technique that tricks Safari into accepting a third-party tag from Google.

mechanism vs. intent

So the reason this whole controversy bugs me is that we’re discussing web privacy based on specific mechanisms, a bit like discussing home privacy by regulating infrared cameras. Sure, an infrared camera can be used to violate my home privacy, but it can be used for many good things, and there are many other ways to invade my home privacy. Cookies, like all technical mechanisms, have both good and evil uses. And browsers don’t all behave the same way with respect to cookies and other web features, so it’s typical for developers to find workarounds that effectively give them “standard behavior” from all browsers. Sometimes these workarounds are truly meant to help the user accomplish what they want. Sometimes these workarounds are used to evil ends, e.g. to track people without their consent.

Again, I don’t know what Google’s intentions were. All I know is that we’re prosecuting the wrong thing: a technical mechanism instead of an intent to track. Cookies don’t track people. People track people. We should be focusing on empowering users to express their preferences on tracking and ensuring web sites are required to comply.

the tracking arms race

If we focus on technical mechanisms to protect user privacy, then we’re dooming users to an un-winnable arms race. There are dozens of ways of tracking users other than classic cookies. Google used a work-around for Safari third-party cookies, but let’s say they hadn’t. Let’s say instead they’d used Flash cookies, or cache cookies, or device fingerprinting, or a slew of other mechanisms that browsers do not defend against, in large part because it’s really hard to defend against these tracking mechanisms without also breaking key Web features. Would Google then be in the clear?

I fear that that’s exactly what we’re implying when we focus the privacy discussion on mechanisms of tracking. The trackers will move on to the next mechanism, and the browsers will scramble to defend against these mechanisms without ever being able to catch up. Blocking tracking at the technical level is, in my opinion, impossible.

the solution: Do Not Track and More

The beginning of a solution lies in the judo move that is Do Not Track, an idea that came out of a collaboration between Christopher Soghoian, Dan Kaminsky, and Sid Stamm (see the full history of DNT). Do Not Track was first implemented in Firefox last year, and soon thereafter in IE, Opera, and Safari. It’s being standardized now at the W3C. It simply lets the user express a preference for not being tracked. Is it a strong technical measure? No. It does nothing to directly prevent tracking. Instead, it lets the user express a preference. And, as support for it grows, it will become incredibly difficult for sites to justify tracking behavior, regardless of the mechanism, when the user has clearly expressed and communicated this choice.
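On the wire, the preference is just a request header, `DNT: 1`. A site that wants to honor it might start with something like the sketch below. The function names and the returned dictionary are hypothetical, and what “not tracking” actually means is a policy question the code cannot answer.

```python
def user_opted_out(headers):
    """Return True when the request carries the Do Not Track preference."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def handle_request(headers):
    # Hypothetical handler: skip every tracking mechanism, whatever it is,
    # when the user has expressed the preference.
    if user_opted_out(headers):
        return {"tracking": False}
    return {"tracking": True}


print(handle_request({"DNT": "1"}))  # {'tracking': False}
print(handle_request({}))            # {'tracking': True}
```

Note that the check says nothing about cookies or fingerprinting. That is the point.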

We’ll need more than Do Not Track in the future. But it’s the right kind of battle. It doesn’t care about cookies or fingerprinting or who-knows-what.

If you want to get upset at Google, ask why they don’t provide Do Not Track support in Chrome. Ask why they don’t respect the Do Not Track flag on Google web properties when they see users waving it. These are fights worth having. But fighting over cookies? That’s so last decade.

UPDATE: corrected origin credit for DNT header.

Posted in privacy, web | 4 Comments

it’s the randomness, stupid

The New York Times is reporting that a flaw has been found in RSA. The original paper is here, and it looks like a second team was about to release similar information, so they’ve posted an explanatory blog post, which I recommend. A number of people are understandably concerned.

Since I couldn’t find a simple explanation of what happened, I figured I would write one up.

public-key encryption

Public-key encryption is fascinating. You generate a keypair composed of a public and a private key. You post the public key on your web site, and anyone can use it to encrypt data destined for you. Decryption requires the private key, and you keep it to yourself. Anyone can encrypt; only you can decrypt. This is how your web browser secures the communication channel with your bank: the bank publishes its public key, and your browser encrypts data against that public key before sending it to the bank. An eavesdropper knows the bank’s public key, but because they don’t have the private key, they can’t see the data you’re sending.

how do I get me one of them keypairs?

Cryptographic keys are just numbers with special properties. In a public-key encryption system, you typically pick one of these special big numbers randomly, you make that your private key, and from it you compute the public key. It’s easy to compute the public key from the private key. However, given the public key, it’s really, really hard to recover the private key. We don’t know how to do it without spending millions of years of computation for your average public key. That’s right, a single user’s public key, attacked with vast computational power, will not yield its corresponding private key. But if you have the private key, you can get the public key in a few milliseconds. That’s the magic, except it’s not magic, it’s math.

what do you mean by “randomly”?

If you barricade your front door, a thief will probably come in via the window. And so it is with public-key encryption. Attacking a public key directly in order to somehow extract its private counterpart is really, really hard. But maybe it’s not so hard to guess how that private key was selected in the first place. Remember, you have to pick the private key randomly. And, as it turns out, computers are really bad at picking numbers at random (humans are only marginally better.) So, if you’re not careful about how you picked that private key, an attacker might simply reconstruct how you picked it.

lots of people picking lots of keys

Cryptography is everywhere now, so there are millions of public keys made available on the Web. Just go to https://amazon.com, and Amazon will tell you its public key. If a bunch of folks use a not-so-random way to pick their private key, you might expect funny coincidences to happen. Alice in San Francisco and Bob in New York might independently end up with the same private key simply because they both used a similar process for selecting this private key from all possible values. If that happens, they would also have the same public key, and you would be able to easily discover this: just compare their public keys! The researchers found that this happens every now and then: they found a couple dozen public keys that were identical to at least one other public key. In and of itself, that’s kind of fascinating. But it’s not really shocking, right? Clearly, if people have the exact same public key, then they picked their private key poorly.

the funny thing about RSA

The funny thing about RSA, the most common approach to public-key encryption, is that its private key is composed of two numbers, both prime (which means they are divisible only by themselves and 1, for example: 11, 17, 41,…). The public key is then the product of those two primes. As it turns out, it’s really easy for computers to multiply numbers, even really big ones. But if you’re only given the product of two primes, it’s incredibly hard to recover those two factors. For example, take two primes each 200 digits long, multiply them together to get a 400-digit long number, and give that to a friend. Given all the computing power in the world for many lifetimes, your friend will not be able to recover those two prime numbers you initially picked.

Now, there are lots and lots of prime numbers. So many, in fact, that if you and I randomly select a 200-digit prime number, there is no conceivable chance we’d pick the same one. But, what if we don’t do it randomly? What if we both start out with 1 followed by 199 0s, and work our way up until we find the first prime number? Then of course we’d end up with the same one. Now maybe we’re not so stupid, and we have a clever way of picking a much more complex starting point, and then working our way up to find the next prime. Well, let’s hope we don’t both use the same clever method, because no matter how clever it is, if we both use the same method, we’re going to end up with the same prime.
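Here is a small sketch of that failure, using tiny numbers so it runs instantly. The helper names are mine, and trial division is only workable at this toy scale.

```python
def is_prime(n):
    """Trial division; fine for the small numbers in this sketch only."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(start):
    """Walk upward from start until a prime turns up."""
    candidate = start
    while not is_prime(candidate):
        candidate += 1
    return candidate


# Two people, same "clever" starting point, same deterministic search.
alice_prime = next_prime(10**6)
bob_prime = next_prime(10**6)
print(alice_prime == bob_prime)  # True
```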

So, back to the funny thing about RSA: because the private key is made up of two prime numbers, if people don’t choose those prime numbers randomly, then two different people might end up with one prime number in common, but not the other. So their public keys won’t be exactly the same: one will be p1 x p2, and the other will be p1 x p3. So it won’t be immediately obvious that we used poor randomness.

And now the final piece of fun math. It’s really hard to factor numbers, and it’s really easy to multiply them. Another thing that’s really easy is to find common factors between two numbers. So if I have two RSA public keys that share a prime factor, it’s really easy to determine that common prime factor. And then, with that prime factor, it’s easy to discover the other prime factor in each of the two keys. So, one RSA public key is very hard to break, but two RSA public keys that share a prime factor are trivial to break together.
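With toy numbers, the whole trick fits in a few lines. The primes here are tiny stand-ins for the p1, p2, and p3 above; real ones are hundreds of digits long, but the greatest-common-divisor step stays just as fast.

```python
from math import gcd

p1, p2, p3 = 101, 103, 107  # toy primes for illustration

alice_public = p1 * p2  # 10403
bob_public = p1 * p3    # 10807

# Finding a common factor is easy, even though factoring either number
# on its own would be hard at real key sizes.
shared = gcd(alice_public, bob_public)

if shared > 1:
    alice_other = alice_public // shared
    bob_other = bob_public // shared
    print(shared, alice_other, bob_other)  # 101 103 107
```

If the two keys share nothing, `gcd` returns 1 and the attacker has learned nothing.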

So that’s what the researchers did. They looked at every pair of RSA public keys and found that 0.2% of them share a prime factor with another. Given that, they were able to fully factor those 0.2% of keys, and thus completely break their security.

This really shouldn’t be that much more shocking than the case where users have the exact same public key. It’s just that, with RSA, there is another way in which poor randomness could result in weak keys, without those keys being exactly identical. It’s fascinating, and it’s a great study, but the root cause is no different: it’s all about the randomness.

so other approaches are better?

No. This attack has nothing to do with RSA. It has everything to do with randomness. No matter the algorithm you pick for public-key encryption, you have to find a really good source of randomness to pick your private key. The cute thing here is that weak randomness was revealed in a new surprising way, because RSA public keys can share a prime factor without being immediately obviously identical. That’s cool, but it’s not a weakness of RSA.

how do I fix my code?

Make sure you’re using a secure random number generator to generate your keys. Make sure you’ve seeded it with good randomness, using operating-system calls if possible. And mostly, don’t panic. There’s no new attack here, only a very interesting revelation, using a very interesting trick, that a lot of people don’t pay sufficient attention to randomness when generating crypto keys.
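In Python, for example, the difference looks roughly like this. It is a sketch of the principle rather than a key-generation routine: a real RSA key should come from a vetted crypto library, which in turn should draw on this kind of source.

```python
import random
import secrets

# Anti-pattern: a general-purpose generator with a guessable seed.
# Anyone who guesses the seed reproduces the "random" value exactly.
alice_weak = random.Random(2012).getrandbits(256)
bob_weak = random.Random(2012).getrandbits(256)
print(alice_weak == bob_weak)  # True

# Better: ask the operating system's cryptographic generator.
alice_key = secrets.token_bytes(32)
bob_key = secrets.token_bytes(32)
```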

We knew that. Now we really know it.

Posted in crypto | 10 Comments

a simpler, webbier approach to Web Intents (or Activities)

A few months ago, Mike Hanson and I started meeting with James, Paul, Greg, and others on the Google Chrome team. We had a common goal: how might web developers build applications that talk to each other in a way that the user, not the site, decides which application to use? For example, how might a major news site provide a “share” button that connects to the user’s preferred sharing mechanism? Not everyone uses the same top-three social networks, yet users are constantly forced to search for their preferred service within a set of publisher-chosen buttons. That leads to undue centralization and significantly undercuts innovation and user choice. How incredibly inelegant!

We figured that, with a bit more browser smarts, we could do better.

to the design studio!

Mike and I proposed Web Activities, and put together a screencast.

The Google team proposed Web Intents, and put together a far more complete proposal.

Techcrunch covered our collaboration.

While all this was happening, the always amazing Tyler Close, of Web Introducer fame and also from Google, was whispering in our ears “Guys, I think you’re doing it wrong. It’s over-engineered. We can do simpler.” We all ignored him. I think that was a mistake. Tyler was right. Web Activities was over-engineered. And, I fear, Web Intents is too.

(Tantek also deserves credit for pointing out that we can do simpler.)

the glaring inconsistency

Web applications already have a mechanism for communicating with other web applications loaded within the same browser: postMessage. It isn’t perfect, but it works, and it is flexible enough that much innovation has been built on it. Google, Microsoft, Facebook all use it, oftentimes for embedding widgets within other pages, each in a very different way. At Mozilla, we use postMessage extensively for BrowserID, and we’ve built nice abstractions on top of it, like winchan to consistently build a message channel to a new popup window (including all IE workarounds).

postMessage is very simple, very Webby, and very generative: it’s easy to build new ideas on top of it. It doesn’t care about mime types, dialogs, callbacks, etc. It’s just a simple, authenticated message channel. The only reason postMessage isn’t enough to do what we need is that the sender and receiver are, for the most part, tightly coupled. The sender has to specify its receiver, which means the user can’t easily step in and substitute the endpoint of her choice. We’d like a loose coupling, where the user gets to mix and match senders and receivers.

So wait, if that’s the only gap, then why are we proposing a completely different approach to cross-application messaging? Why should tight and loose coupling of messaging channels be implemented in completely different ways? Given that the postMessage abstraction has been so successful and useful, the “right” way to move forward is to tweak it, minimally, not to redesign a different stack.

a minimalist way forward

A minimalist way forward is to use postMessage as is, and to provide only the bits necessary to enable loose coupling.

Here, Tyler comes to the rescue again. He proposed, in one of the last chats we had with Google, using custom protocol handlers as the target of postMessage channels. So, when a major news site wants to share an article, rather than postMessage’ing (or linking) to http://twitter.com/, it can use the one-indirection-away URL share://.... The browser can then jump in and substitute the user’s preferred implementation of a sharing provider at that custom protocol handler. Everything else, linking or communicating via postMessage, is then the same. The only difference is, there’s one level of indirection to give the user a chance to step in and say “that service, please.”
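The indirection itself is tiny. Here is a toy model of it, written in Python rather than browser JavaScript to keep it self-contained. The `Browser` class, the `social.example` URL, and the `share://` address are all hypothetical; the `%s` placeholder is borrowed from how protocol-handler URL templates work.

```python
from urllib.parse import quote, urlsplit


class Browser:
    """Toy stand-in for the browser's table of user-chosen handlers."""

    def __init__(self):
        self._handlers = {}

    def register_handler(self, scheme, url_template):
        # "%s" marks where the original URL gets substituted.
        self._handlers[scheme] = url_template

    def resolve(self, url):
        scheme = urlsplit(url).scheme
        template = self._handlers.get(scheme)
        if template is None:
            return url  # no indirection registered; use the URL as given
        return template.replace("%s", quote(url, safe=""))


browser = Browser()
# The user, not the publisher, picks the sharing service.
browser.register_handler("share", "https://social.example/share?u=%s")

# The news site only ever names the indirect URL.
target = browser.resolve("share://article")
```

Once `target` is resolved, the message channel to it is set up exactly as it would be for a hard-coded endpoint.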

What’s even more interesting is that we already have basic mechanisms for sites to register themselves as custom protocol handlers: registerProtocolHandler. The current mechanisms aren’t quite good enough yet, but the tweaks we would need are far simpler than building a whole new messaging stack. Mozilla’s own Austin King has prototyped what some of these tweaks might look like using a JavaScript shim, and the results are surprisingly useful with only minor tweaks.

another minimalist approach

There’s also Ian Hickson’s proposal, which is a little bit different than using protocol handlers and has some nice properties. It’s quite similar to Tyler’s proposal in one key way: do the smallest amount of work to set up a message channel, and get out of the way. Mark Hammond has prototyped Ian’s proposal, and it looks like it can be nicely shimmed in pure JavaScript (with just one tweak to the API that’s probably worth considering even for the native implementation.) I like this proposal, too, and I wonder if it could be made to work with custom protocol handlers, which have a nice URL-based architecture.

so now what?

I propose that we stop for a second on the Web Intents discussion and ask ourselves: maybe we’ve been over-engineering this. Maybe we don’t need mime types and new HTML elements and new DOM properties, etc. Maybe there’s a much easier, good-enough solution, based on proven technology, with only minor tweaks to well-understood code paths. It won’t be perfect, we’ll probably need some JS libraries to make things more convenient for developers, but that’s okay. That’s better for the Web. Keep the platform simple, leave the real innovation to the edges.

I believe Web Intents, as currently proposed, are over-engineered. So are Web Activities. But it’s not too late to correct course. Let’s figure out the simplest way to involve the user in choosing an application, set up a message channel, and get out of the way.

Posted in mozilla, web | 11 Comments

encryption is (mostly) not magic

A few months ago, Sony’s Playstation Network got hacked. Millions of accounts were breached, leaking physical addresses and passwords. Sony admitted that their data was “not encrypted.”

Around the same time, researchers discovered that Dropbox stores user files “unencrypted.” Dozens (hundreds?) closed their accounts in protest. They’re my confidential files, they cried, why couldn’t you at least encrypt them?

Many, including some quite tech-savvy folks, were quick to indicate that it would have been so easy to encrypt the data. Not encrypting the data proved Sony and Dropbox’s incompetence, they said.

In my opinion, it’s not quite that simple.

Encryption is easy, it’s true. You can download code that implements military-grade encryption in any programming language in a matter of seconds. So why can’t companies just encrypt the data they host and protect us from hackers?

The core problem is that, to be consumable by human users, data has to be decrypted. So the decryption key has to live somewhere between the data-store and the user’s eyeballs. For security purposes, you’d like the decryption key to be very far from the data-store and very close to the user’s eyeballs. Heck you’d like the decryption key to be *inside* the user’s brain. That’s not (yet) possible. And, in fact, in most cases, it isn’t even practical to have the key all that far from the data-store.

encryption relocates the problem

Sony needs to be able to charge your credit card, which requires your billing address. They probably need to do that whether or not you’re online, since you’re not likely to appreciate being involved in your monthly renewal, each and every month. So, even if they encrypt your credit card number and address, they also need to store the decryption key somewhere on their servers. And since they probably want to serve you an “update your account” page with address pre-filled, that decryption key has to be available to decrypt the data as soon as you click “update my account.” So, if Sony’s web servers need to be able to decrypt your data, and hackers break into Sony’s servers, there’s only so much protection encryption provides.

Meanwhile, Dropbox wants to give you access to your files everywhere. Maybe they could keep your files encrypted on their servers, with encryption keys stored only on your desktop machine? Yes… until you want to access your files over the Web using a friend’s computer. And what if you want to share a file with a friend while they’re not online? Somehow you have to send them the decryption key. Dropbox must now ask its users to manage the sharing of these decryption keys (good luck explaining that to them), or must hold on to the decryption key and manage who gets the key…. which means storing the decryption keys on their servers. If you walk down the usability path far enough – in fact not all that far – it becomes clear that Dropbox probably needs to store the decryption key not too far from the encrypted files themselves. Encryption can’t protect you once you actually mean to decrypt the data.

The features users need often dictate where the decryption key is stored. The more useful the product, the closer the decryption key has to be to the encrypted data. Don’t think of encryption as a magic shield that miraculously distinguishes between good and bad guys. Instead, think of encryption as a mechanism for shrinking the size of the secret (one small encryption key can secure gigabytes of data), thus allowing the easy relocation of the secret to another location. That’s still quite useful, but it’s not nearly as magical as many imply it to be.

what about Firefox Sync, Apple TimeMachine, SpiderOak, Helios, etc.

But but but, you might be thinking, there are systems that store encrypted data and don’t store the decryption key. Firefox Sync. Apple’s TimeMachine backup system. The SpiderOak online backup system. Heck, even my own Helios Voting System encrypts user votes in the browser with no decryption keys stored anywhere except the trustees’ own machines.

It’s true, in some very specific cases, you can build systems where the decryption key is stored only on a user’s desktop machine. Sometimes, you can even build a system where the key is stored nowhere durably; instead it is derived on the fly from the user’s password, used to encrypt/decrypt, then forgotten.
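Deriving the key on the fly might look like the sketch below. I am assuming PBKDF2 from Python’s standard library here; the systems mentioned above each make their own choices, and the iteration count is illustrative.

```python
import hashlib
import os


def derive_key(password, salt, iterations=200_000):
    """Stretch a password into a 32-byte key. Nothing secret is stored."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # not secret; stored alongside the encrypted data

key_at_encrypt_time = derive_key("correct horse battery staple", salt)
key_at_decrypt_time = derive_key("correct horse battery staple", salt)
print(key_at_encrypt_time == key_at_decrypt_time)  # True

wrong_key = derive_key("forgotten password guess", salt)
print(wrong_key == key_at_encrypt_time)  # False
```

Same password and salt, same key. Any other password gives a different, useless key.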

But all of these systems have significant usability downsides (yes, even my voting system). If you only have one machine connected to Firefox Sync, and you lose it, you cannot get your bookmarks and web history back. If you forget your Time Machine or SpiderOak password, and your main hard drive crashes, you cannot recover your data from backup. If you lose your Helios Voting decryption key, you cannot tally your election.

And when I say “you cannot get your data back,” I mean you would need a mathematical breakthrough of significant proportions to get your data back. It’s not happening. Your data is lost. Keep in mind: that’s the whole point of not storing the decryption key. It’s not a bug, it’s a feature.

and then there’s sharing

I alluded to this issue in the Dropbox description above: what happens when users want to share data with others? If the servers don’t have the decryption key, that means users have to pass the decryption key to one another. Maybe you’re thinking you can use public-key encryption, where each user has a keypair, publishes the public encryption key, and keeps secret the private decryption key? Now we’re back to “you can’t get your data back” if the user loses their private key.

And what about features like Facebook’s newsfeed, where servers process, massage, aggregate, and filter data for users before they even see it? If the server can’t decrypt the data, then how can it help you process the data before you see it?

To be clear: if your web site has social features, it’s very unlikely you can successfully push the decryption keys down to the user. You’re going to need to read the data on your servers. And if your servers need to read the data, then a hacker who breaks into the servers can read the data, too.

so the cryptographer is telling me that encryption is useless?

No, far from it. I’m only saying that encryption with end-user-controlled keys has far fewer applications than most people think. Those applications need to be well-scoped, and they have to be accompanied by big bad disclaimers about what happens when you lose your key.

That said, encryption as a means of partitioning power and access on the server-side remains a very powerful tool. If you have to store credit card numbers, it’s best if you build a subsystem whose entire role is to store credit-card numbers encrypted, and process transactions from other parts of your system. If your entire system is compromised, then you’re no better off than if you hadn’t taken those precautions. But, if only part of your system is compromised, encryption may well stop an attacker from gaining access to the most sensitive parts of the system.

You can take this encryption-as-access-control idea very far. An MIT team just published CryptDB, a modified relational database that uses interesting encryption techniques to strongly enforce access control. Note that, if you have the password to log into the database, this encryption isn’t going to hide the data from you: the decryption key is on the server. Still, it’s a very good defense-in-depth approach.

what about this fully homomorphic encryption thing?

OK, so I lied a little bit when I talked about pre-processing data. There is a kind of encryption, called homomorphic encryption, that lets you perform operations on data while it remains encrypted. The last few years have seen epic progress in this field, and it’s quite exciting…. for a cryptographer. These techniques remain extremely impractical for most use cases today, with an overhead factor in the trillions, both for storage and computation time. And, even when they do become more practical, the central decryption key problem remains: forcing users to manage decryption keys is, for the most part, a usability nightmare.

That said, I must admit: homomorphic encryption is actually almost like magic.

the special case of passwords

Passwords are special because, once stored, you never need to read them back out, you only need to check if a password typed by a user matches the one stored on the server. That’s very different than a credit-card number, which does need to be read after it’s stored so the card can be charged every month. So for passwords, we have special techniques. It’s not encryption, because encryption is reversible, and the whole point is that we’d like the system to strongly disallow extraction of user passwords from the data-store. The special tool is a one-way function, such as bcrypt. Take the password, process it using the one-way function, and store only the output. The one-way function is built to be difficult to reverse: you have to try a password to see if it matches. That’s pretty cool stuff, but really it only applies to passwords.

So, if you’re storing passwords, you should absolutely be passing them through a one-way function. You could say you’re “hashing” them, that’s close enough. In fact you probably want to say you’re salting and hashing them. But whatever you do, you’re not “encrypting” your passwords. That’s just silly.
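Here is roughly what salting and hashing looks like. bcrypt is not in Python’s standard library, so this sketch swaps in PBKDF2 as the one-way function; the function names and iteration count are illustrative.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    """Return (salt, digest). Only these two values are stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def check_password(attempt, salt, stored_digest, iterations=200_000):
    """Re-run the one-way function on the attempt and compare."""
    _, digest = hash_password(attempt, salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, stored_digest)


salt, stored = hash_password("hunter2")
print(check_password("hunter2", salt, stored))  # True
print(check_password("hunter3", salt, stored))  # False
```

There is no `decrypt_password` function, and that is the whole point.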

encryption is not a magic bullet

For the most part, encryption isn’t magic. Encryption lets you manage secrets more securely, but if users are involved in the key management, that almost certainly comes at the expense of usability and features. Web services should strongly consider encryption where possible to more strictly manage their internal access controls. But think carefully before embarking on a design that forces users to manage their keys. In many cases, users simply don’t understand that losing the key means losing the data. As my colleague Umesh Shankar says, if you design a car lock so secure that locking yourself out means crushing the car and buying a new one, you’re probably doing it wrong.

Posted in crypto, mozilla, privacy, security, web | 14 Comments

an ode to lessig’s optimism, taking on gigantic challenges… and a quibble

Last night, I went to see Lessig pitch his latest book, Republic, Lost. His latest spiel is fantastic, fine-tuned, gripping, thrilling, inspiring. I’ve been an avid fan of Lessigian story-telling for 13 years now. The way he sets up his argument, the way he goes far beyond the obvious, far beyond the quick fix, and the way he absolutely destroys any shred of doubt that may remain about his thesis. I saw him giving one of his first “Code” lectures at Harvard in 1998. In 2002, I waited in line at the Supreme Court and got to see the last five minutes of his argument. I saw him in the TV studio debating Jack Valenti. I was at the Creative Commons launch in 2003. I saw his first Corruption lecture at Stanford in 2008. It just doesn’t get old.

The central thing I deeply admire about Lessig is that he takes on gigantic battles with care and determination. He’s not deluded about his chances, but he fights anyways. He looks for, and finds, incredibly aggressive wins. Copyright reform against the Disneys of the world didn’t work, but Creative Commons is genuinely affecting how we share. The corruption of the political process is an impossible challenge, yet Lessig sees a path, and I believe his is the most likely path to success. I don’t yet know how Lessig will find the equivalent of the Creative-Commons-win in this much larger battle. But I know he’s thinking about it, and I believe that, in time, he will move the needle, significantly.

That kind of “crazy” optimism is deeply inspiring, because it is, indeed, the only way to change the world. Time is too precious not to focus on the big, gigantic, mind-blowing battles. Lessig reminds me of that every time I attend one of his talks.

So, a quibble. Lessig brought up one argument I’ve seen him make before: because vaccine policy is influenced by experts who may have received compensation from the pharmaceutical industry, people may lose trust in vaccine policy. Now let’s be clear: Lessig is not saying that vaccines are unsafe. He’s saying that, because some vaccine experts do not appear to be fully unbiased, it is understandable that people lose trust in vaccine policy.

I disagree, and I think it weakens Lessig’s argument to make this connection. I’d like to see Paul Offit and his peers deciding our vaccine policy (in a public forum of course), even though he’s getting rich from his amazing Rotavirus vaccine. Checks and balances in areas that require deep expertise cannot be achieved by banning from advisory boards all experts with a potential conflict of interest. In fact, that’s a recipe for disaster by way of mediocrity. We have other checks and balances for this. We can require peer-reviewed publications. We can fund counter-studies. We can let the truth rise to the top via competition. This country’s national vaccine policy is something to be proud of.

There is, however, a subtle but serious corruption in the medical world that should make it into Lessig’s slideshow: pharmaceutical reps routinely treat physicians to dinners, trips, etc. They leave free drug samples, they leave pens and paper pads with drug logos prominently featured, they suggest that new drugs are better than old tried-and-true drugs, and sometimes they very subtly suggest off-label uses. Drug companies receive prescription records for individual physicians: they know where they’re having an impact and can calculate very clear Return On Investment. The result: Vioxx. Physicians aren’t evil, but they are human. The grey areas in medicine are large and common, providing fertile ground for skilled influencing.

That needs to stop: where vaccine policy is a mostly public forum with competing ideas, there isn’t any oversight or counter-balance to drug-rep influence. We can change this. Doctors could be required to provide to all patients, alongside the insane HIPAA disclosure form, a funding disclosure form of all compensation received from drug reps. That disclosure form alone might make doctors think twice before prescribing a drug, and drug reps before paying for dinner. And institutions should follow the path blazed by Mass General, banning their physicians from accepting gifts and banning pharmaceutical reps from physician offices.

Posted in policy | 2 Comments

BrowserID and me

A few weeks ago, I became Tech Lead on Identity and User Data at Mozilla. This is an awesome and challenging responsibility, and I’ve been busy. When I took on this new responsibility, BrowserID was already well under way, so we were able to launch it in my second week on the project (early July). It’s been a very fun ride.

Here’s the BrowserID demo at the Mozilla All-Hands last week:

Given my prior work on email-based authentication (EmID, Lightweight Email Signatures, BeamAuth), you might think BrowserID was my brainchild. In fact, it really wasn’t. And, in a testament to the shrinking impact of academic publication venues, none of the BrowserID team had ever heard of my work on email-based authentication before I arrived at Mozilla, even though Mozilla folks are quite well versed in the state of the art. But who cares: when I found out about the ongoing work and how we agreed on just about every design principle, I was incredibly excited. And when I realized the fantastic work the team had already done on defining a scaffolding and adoption path for the technology, I was super impressed.

BrowserID started with the Verified Email Protocol, designed by Mike Hanson and Dan Mills, who came up with the approach after extensive exploration of web-based identity approaches over the last two years. It’s a simple idea: users can prove to web sites that they own a particular email address. That’s how they register, and that’s how they log back in the next time they visit the site. BrowserID, the code and site, was initially bootstrapped by Lloyd Hilaiel and Mike Hanson. Shane Tomlinson and I joined the team in June. We now also have an awesome UX design team (Bryan and Andy) and the team continues to grow (yay Austin!)

So, that’s what I’m working on these days: BrowserID and other Identity+UserData efforts at Mozilla. I’m excited to be leading this technical effort. The team is amazing, and we’ve got big aggressive plans to help you control your identity and data on the Web.

Posted in identity, personal, web | Leave a comment

my 9.11

Maybe it’s silly to add yet another story to the list of “where I was on 9/11.” I suffered no direct loss, while some people I know did. Many other world events were far, far more awful. But as I did experience 9/11 in person, I feel the need to write down some thoughts, some memories.

On the night of September 10th, 2001, I was having drinks with an old friend (I’m having trouble remembering which friend!) in Chelsea, about 3 miles north of the World Trade Center. We stayed up late. We talked about world politics, terrorism, the Middle East. So when the alarm clock radio came on a bit before 9am (hey, I’m a software guy), with talk of a plane hitting a building, I thought I was having some messed-up dream based on the night’s conversation. By the time I turned on the TV, the second plane had hit. My mom called. “Mom, that’s a second plane, it’s not an accident.” I got a call from my sister who lived a few miles uptown, she was fine, too. I noticed a voicemail from my friend and coworker Josh. His wife, who worked in the WTC, was fine, as she was away on a trip. Then the cell phones went dead.

I threw on comfortable clothes. I watched the first tower collapse on TV. I grabbed a backpack, put on jogging shoes, and ran out of my apartment on 21st street, between 7th and 8th avenues. From my street I couldn’t see the towers, so I headed to 7th avenue. By the time I got there, all I could see was smoke. I had to ask someone on the street: “where’s the second tower?”

I started walking downtown, towards my office in TriBeCa, about a mile north of the WTC. I panicked for a moment: what if this was just the beginning and more was underway? I stepped into a convenience store, bought 3 candybars and 2 bottles of water. As I walked downtown, I passed cars stopped in the street, doors open, people standing with their radios turned on, straight out of some apocalypse movie. Then I started seeing people covered in dust. I made it to my office just south of Canal St, walked in, and logged on. I think I spent an hour clicking “respond”, writing “yes, I’m fine, more later.”

I found an unused IP address on one of our servers and set up a page to list “I’m ok” messages. Looks like the Internet Archive has a copy. I remember thinking I might be aggregating thousands of messages from around the city, though in the end it was obviously limited to my circle of friends. I guess I wanted to help, in any way I could.

Somewhere in there I chatted with co-workers, emailed our clients and partners, told them we were alright, asked how they were doing. One of our clients was on Wall Street and lost a number of friends.

In the mid afternoon, my friend Greg and I went up on the roof of our office building and talked, looking down Greenwich St towards WTC 7 surrounded by smoke of the collapsed towers. We went downstairs, heard on the radio that WTC 7 had collapsed, ran back up, and sure enough, it was gone. That night, Amanda, Greg, my sister and I crowded into my little apartment, ate frozen pizza and watched the news until we couldn’t anymore.

One of my best friends was stuck in Pennsylvania on a business trip. He wanted to be home in NYC, but couldn’t.

I gave my team the week off.

The next few days were odd.

I wasn’t able to fly out to the West Coast that Friday to see my friend Rodrigo’s new baby.

I stayed up nights listening to fighter jets flying overhead. I participated in far too many email arguments about “how to respond.” I was crazy and emotional. I made stupid, childish arguments. Within a few weeks, I would be calmer and, I think, more reasonable. I wish our leaders had also taken a moment to breathe.

People started going about their business again. There was the smell of burn in the air, but people tried not to talk about it. People were nice. Very nice. I went out a lot. I followed my friend Arjun to see indie bands on the lower east side. I didn’t want to stay in, ever, I needed to go out and be with people. I remember a faux-french band we saw, the guitarist wore an “I [heart] NYC” t-shirt he’d modified to say “J’[aime] NYC”. They didn’t mention the burning buildings. No one talked about it. Instead people went out of their way to be friendly, to reach out, to help.

That was, in some way, the saddest part of that experience. Everyone wanted to help. Doctors waited for the wounded, but none came. People lined up to give blood, but none was needed. People lined up to help downtown but there wasn’t room for everyone. So we tried to help each other, in small ways, every day.

Posted in personal | 1 Comment

with freedom comes responsibility: open publishing

As of a few months ago, I’m no longer on a publish-or-perish academic track. Mozilla gives me the freedom to publish, but no pressure. Coincidentally, the publishing world is at a bit of a crossroads. Some organizations, like USENIX, are increasingly open: all papers are published for the world to see, many talks are videotaped and available openly. Others, like IEEE, are increasingly closed, with tighter and tighter constraints on authors, more paywalls and obstacles to the dissemination of knowledge.

I’ve got increased freedom, so I intend to use it. Starting today, I will neither publish nor review papers destined for closed venues. Academic publications should be available for the world to read, to learn from, to build upon. If you’d like me on your program committee, if you’d like me to review a journal publication, if you’d like me to help with a paper, please understand that I will refuse if the conference/journal isn’t truly open. In the short term, this probably means I’ll only work with USENIX, and maybe IACR which appears to be moving towards true open-access.

My move isn’t exactly courageous. I have the luxury to make this decision, while many of my colleagues do not. I hope a few tenured professors make this move, though, as they have both the luxury and a good bit more influence than I do. Matt Blaze is starting down this path. Dan Wallach is helping tweak the IEEE approach. All of these efforts are incredibly important.

This is about free dissemination of knowledge. This is the point of the Internet. Academics who stand for discovery and learning should be outraged by the direction most publishers are taking today, and should at the very least encourage those publishers who are doing it well. Hesitating between ACM and USENIX? Go with USENIX. IACR holding a vote on open-access publishing? Make your voice heard.

Posted in personal | 5 Comments

and the laws of physics changed

Google just introduced Google Plus, their take on social networking. Unsurprisingly, Arvind has one of the first great reviews of its most important feature, Circles. Google Circles effectively let you map all the complexities of real-world privacy into your online identity, and that’s simply awesome.

You can think of Circles as the actual circles of friends you have. The things that are easy to do in real life, like sharing a fun anecdote with the friends you generally go out with on Saturday nights, are easy to do in Circles. The things that are hard to do in real life, like planning your best friend’s surprise birthday party with all of his close friends but without him, are no easier in Circles: you have to make a new list of “everyone except Bob.” That’s great, because I don’t think our brains have evolved yet to really feel comfortable with a social model that supports all set operations, e.g. this circle minus this other circle. That’s usually how we get caught lying. (I mean the lies everyone tells as part of their normal social interactions.)
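To make “this circle minus this other circle” concrete, here is a minimal Python sketch of what set operations on circles would look like. This is not how Google implements Circles; the circle names, the members, and the `audience` helper are all made up for illustration.

```python
# Hypothetical circles: names and members are invented for this example.
circles = {
    "close_friends": {"alice", "bob", "carol", "dave"},
    "saturday_crowd": {"alice", "bob", "erin"},
}


def audience(include, exclude=frozenset()):
    """Union the named circles, then remove individually excluded people."""
    members = set()
    for name in include:
        members |= circles[name]
    return members - set(exclude)


# Easy in real life, easy here: share with the Saturday crowd.
saturday_audience = audience(["saturday_crowd"])

# Hard in real life: the surprise party, i.e. close friends minus Bob.
party_planners = audience(["close_friends"], exclude={"bob"})
```

The set difference in the last line is exactly the operation that has no one-click equivalent in Circles, where you would build the “everyone except Bob” list by hand.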

The most important point is that this feature shatters the previously universally accepted idea that privacy must change dramatically given social networking. For a few years, Facebook has defined the Laws of Physics of social networking. On Facebook, it’s not possible to show different people a different face. On Facebook, relationships are, for the most part, symmetrical. And so we all believed that this was the inevitable path forward with social networking. We conflated the fact that users wanted to connect online with the constraints that Facebook created, and we assumed users wanted those constraints. We forgot that software engineers define the Laws of Physics of the worlds they create. We weren’t living in the inherent world of social networking. We were living in Facebook’s definition of social networking.

We now know it doesn’t have to be this way. The Laws of Physics in the online world are mutable. Google just busted open a world of possibility. Users will question, now more than ever, why sharing must work the way it does on Facebook, given that Google has shown it can work differently.

It will make Facebook better. Which will make Google better. And so on. We may be witnessing the beginning of a new era of online privacy, a maturation of sorts. This is an incredibly exciting time.

Posted in identity, privacy, web | 6 Comments

with great power…

When Arvind writes something, I tend to wait until I have a quiet moment to read it, because it usually packs a particularly high signal-to-noise ratio. His latest post, In Silicon Valley, Great Power but No Responsibility, is awesome:

We’re at a unique time in history in terms of technologists having so much direct power. There’s just something about the picture of an engineer in Silicon Valley pushing a feature live at the end of a week, and then heading out for some beer, while people halfway around the world wake up and start using the feature and trusting their lives to it. It gives you pause.

So true. I’ve been thinking about this issue a lot recently, especially as good technologists in the Valley are in exceptionally good financial / career health, while the rest of the country, and sometimes even the other half of our cities, are suffering through a long and deep recession.

Here’s one story that blew my mind a few months ago. Facebook (and I don’t mean to pick on Facebook, they just happen to have a lot of data) introduced a feature that shows you photos from your past you haven’t seen in a while. Except, that turned out to include a lot of photos of ex-boyfriends and ex-girlfriends, and people complained. But here’s the thing: Facebook photos often contain tags of people present in the photo. And you’ve told Facebook about your relationships over time (though even if you didn’t, they can probably guess from your joint social network activity). So what did Facebook do? They computed the graph of ex-relationships, and they ensured that you are no longer proactively shown photos of your exes. They did this in a matter of days. Think about that one again: in a matter of days, they figured out all the romantic relationships that ever occurred between their 600M+ users. The power of that knowledge is staggering, and if what I hear about Facebook is correct, that power is in just about every Facebook engineer’s hands.
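To give a sense of how little code the idea takes once the data is there, here is a minimal Python sketch of an ex-relationship graph used to filter tagged photos. It is an illustration under stated assumptions, not Facebook’s actual system: the record format, the names, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical relationship history: (person_a, person_b, still_together).
relationship_history = [
    ("ann", "ben", False),
    ("ann", "cal", True),
    ("ben", "dee", False),
]

# Hypothetical photos, each with the set of people tagged in it.
photos = [
    {"id": 1, "tags": {"ann", "ben"}},
    {"id": 2, "tags": {"ann", "cal"}},
    {"id": 3, "tags": {"ann"}},
]


def build_ex_graph(history):
    """Map each person to the set of people they are no longer with."""
    exes = defaultdict(set)
    for a, b, still_together in history:
        if not still_together:
            exes[a].add(b)
            exes[b].add(a)
    return exes


def photos_to_resurface(viewer, photos, exes):
    """Keep only photos that do not tag any of the viewer's exes."""
    blocked = exes.get(viewer, set())
    return [p for p in photos if not (p["tags"] & blocked)]


ex_graph = build_ex_graph(relationship_history)
ann_photo_ids = [p["id"] for p in photos_to_resurface("ann", photos, ex_graph)]
```

The hard part is not the filter. It is having the relationship data for everyone in one place, which is the point of the story.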

Here’s another story. I used to lecture MIT Undergraduates about web security. My approach was basically: (a) hack a few of the student project web sites, then (b) hack a few public web sites to make the students understand how widespread the problems are. In late 2003, I showed students how to buy movie tickets for free (the price of the ticket was held in a hidden variable in a web form… duh). I ended my lecture with “but just because you can do this, doesn’t mean you should. Please don’t do this.” Over the years, I’ve received a few emails from former students to the tune of “hey Ben, you gave an awesome lecture, I still remember how a bunch of us went out to see Matrix 3 for free that weekend!”
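The hidden-variable bug is easy to show in a few lines. Below is a minimal Python sketch, assuming a made-up ticket form and price table (the field names, prices, and function names are all hypothetical, not the actual site’s code): one handler trusts the price the browser sends back, the other ignores it and looks the price up on the server.

```python
# Hypothetical server-side price table, in cents.
TICKET_PRICES_CENTS = {"matinee": 750, "evening": 1050}


def charge_trusting_client(form):
    """Vulnerable: the amount comes straight from a hidden form field."""
    return int(form["price_cents"]) * int(form["quantity"])


def charge_from_catalog(form):
    """Safer: ignore any client-supplied price and look it up server-side."""
    ticket_type = form["ticket_type"]
    if ticket_type not in TICKET_PRICES_CENTS:
        raise ValueError(f"unknown ticket type: {ticket_type!r}")
    quantity = int(form["quantity"])
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return TICKET_PRICES_CENTS[ticket_type] * quantity


# A form where the user edited the hidden price field down to zero.
tampered_form = {"ticket_type": "evening", "quantity": "2", "price_cents": "0"}

vulnerable_total = charge_trusting_client(tampered_form)
safe_total = charge_from_catalog(tampered_form)
```

With the tampered form, the first handler charges nothing, while the second still charges for two evening tickets. Anything the browser sends back is user input, hidden field or not.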

I shudder to think about what happens when you put those two stories together. While the earliest hackers may have had a particularly well developed ethical sense, I get the sense that our profession’s average ethical sense doesn’t nearly measure up to the incredible power we have gained precipitously over the last 15 years.

And then there’s the additional point Arvind makes, which I’ve observed directly too:

I often hear a willful disdain for moral issues. Anything that’s technically feasible is seen as fair game and those who raise objections are seen as incompetent outsiders trying to rain on the parade of techno-utopia.

Yes! There’s this continued and surprisingly widespread delusion that technology is somehow neutral, that moral decisions are for other people to make. But that’s just not true. Lessig taught me (and a generation of other technologists) that Code is Law, or as I prefer to think about it, that Code defines the Laws of Physics on the Internet. Laws of Physics are only free of moral value if they are truly natural. When they are artificial, they become deeply intertwined with morals, because the technologists choose which artificial worlds to create, which defaults to set, which way gravity pulls you. Too often, artificial gravity tends to pull users in the direction that makes the providing company the most money.

A parting thought. In 2008, the world turned against bankers, because many profited by exploiting their expertise in a rapidly accelerating field (financial instruments) over others’ ignorance of even basic concepts (adjustable-rate mortgages). How long before we software engineers find our profession in a similar position? How long will we shield ourselves from the responsibility we have, as experts in the field much like experts in any other field, to guide others to make the best decision for them?

Posted in policy, privacy | 7 Comments