Benlog

security, privacy, transparency.

(your) information wants to be free

Filed under: data,privacy,security — April 28, 2011 @ 12:46 am

A couple of weeks ago, Epsilon, an email marketing firm, was breached. If you are a customer of Tivo, Best Buy, Target, The College Board, Walgreens, etc., that means your name and email address were accessed by some attacker. You probably received a warning to watch out for phishing attacks (assuming it wasn’t caught in your spam filter).

Yesterday, the Sony PlayStation Network of 75 million gamers was compromised. Names, addresses, and possibly credit cards were accessed by attackers. This may well be the largest data breach in history.

And a few days ago, it was discovered that iPhones keep track of your location over extended periods of time and copy that data to backups, even if you explicitly tell your iPhone not to track your location. There are believable claims that law enforcement has already used this information without a court order. Apple now says this was a bug and they’re fixing it.

In 1984, Stewart Brand famously said that information wants to be free. John Perry Barlow reiterated it in the early 90s, and added “Information Replicates into the Cracks of Possibility.” When this idea was applied to online music sharing, it was cool in a “fight the man!” kind of way. Unfortunately, information replication doesn’t discriminate: your personal data, credit cards and medical problems alike, also want to be free. Keeping it secret is really, really hard.

I get the sense that many think Epsilon and Sony were stupidly incompetent, and Apple was evil. This fails to capture the nature of digital data. It’s just incredibly hard to secure data when one failure outweighs thousands of successes. In the normal course of development, data gets copied all over the place. It takes a concerted effort to enumerate the places where data end up, to design defensively against data leakage, and to audit the code after the fact to ensure no mistakes were made. One mistake negates all successes.

Here’s one way to get an intuitive feel for it: when building a skyscraper, workers are constantly fighting gravity. One moment of inattention, and a steel beam can fall from the 50th floor, turning a small oversight into a tragedy. The same goes for software systems and data breaches. The natural state of data is to be copied, logged, transmitted, stored, and stored again. It takes constant fighting and vigilance to prevent that breach. It takes privacy and security engineering.

The kicker is that, while you’re unlikely to get into the business of building skyscrapers by accident, it’s incredibly easy to find yourself storing user data long before you’ve laid out decent privacy and security practices: Sony built game consoles, and then one day they were suddenly storing user data. It’s also far too common for great software engineers to deceive themselves into thinking that securing user data is not so hard, because hey, they would never be as stupid as those Sony engineers.

So, am I excusing Epsilon, Sony, and Apple? Not at all. But if we keep thinking that they were just stupid/evil, then we are far from understanding and fixing the problem.

I’ve just finished reading Atul Gawande’s The Checklist Manifesto, which I strongly recommend. As industries mature (flying airplanes, practicing medicine, building complex software systems,…), they must build in processes to counteract inevitable human weaknesses. There’s bound to be resistance from experienced practitioners who see the introduction of process as insulting to their craft. Programmers are, in this sense, a lot like doctors. But it’s time to stop being heroes and start being professionals. Storing user data safely is easy until it’s not.

We are constantly fighting nature to meet our stated goals: we don’t want buildings to fall, disease to kill us, or private information to leak. For a little while, it’s okay to fail catastrophically and act surprised. But eventually, these failures are no longer surprising, they’re just negligent. That time of transition for software architects is now. Every company that dabbles in user data should assign a dedicated security and privacy team whose sole responsibility is to protect user data. We will not eliminate all failures, but we can do much, much better.

grab the pitchforks!… again

Filed under: crypto,data,privacy,web — April 19, 2011 @ 12:49 pm

I’m fascinated with how quickly people have reached for the pitchforks recently when the slightest whiff of a privacy/security violation occurs.

Last week, a few interesting security tidbits came to light regarding Dropbox, the increasingly popular cloud-based file storage and synchronization service. There’s some interesting discussion of de-duplication techniques which might lead to oracle attacks, etc., but the most important issue is that, suddenly, everyone’s realizing that Dropbox could, if needed, access your files. Miguel de Icaza wonders if Dropbox is pitching snake oil.

Yes, Dropbox staff can, if needed, access your files. I don’t mean to harp on my fellow technologists but… this has been obvious since day 1, because Dropbox offers a web-based interface to download your files, and even with the latest HTML5 technology, you’d be very hard-pressed to do in-browser file decryption. Let’s say you still don’t buy that, you still think that Dropbox might find a way to encrypt files and decrypt them in your browser. Dropbox also offers a password recovery mechanism, which means they can fully simulate you, the user, including, of course, getting at your files.

In other words, unless you’re ready to lose the convenience of password resets and web-based UI, Dropbox inherently has access to your files. Just like Facebook has access to your entire account, and Google to all of your docs, spreadsheets, etc. The only question is what kinds of internal safeguards these companies have to prevent abuse by employees. Unless you’ve worked there, it’s hard to know. You could ask Dropbox to do third-party auditing, like Miguel proposes, but in my experience that provides little real security, since you have little way to know what that third party actually did as part of their auditing (was it just “logic and accuracy” testing?).
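The password-reset argument can be made concrete with a minimal sketch. It assumes a hypothetical design in which the file key is derived purely from the user’s password on the client; the function name, salt, and iteration count are illustrative assumptions, not a description of what Dropbox does.

```python
import hashlib

def derive_file_key(password: str, salt: bytes) -> bytes:
    """Hypothetical client-side key derivation: only someone who knows the password gets the key."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

salt = b"per-user-salt"  # illustrative value, not a real parameter of any service
old_key = derive_file_key("correct horse", salt)
new_key = derive_file_key("password after reset", salt)

# After a password reset the derived key changes, so files encrypted under
# old_key would stay unreadable unless the host had kept its own copy of the key.
keys_match = old_key == new_key
```

In such a design, a host that can reset your password and still show you your old files must hold a key of its own, which is the point being made about convenience versus exclusive access.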

The other thing we could ask is for the law to finally recognize that my files stored on Dropbox are no different than my files stored on a hard drive in my basement, from a legal perspective. They’re my property. And accessing them should require the same level of judicial oversight as a warrant to my home. That’s what a group of young MIT techies (myself included) and Harvard lawyers proposed in 1998.

But back to Dropbox. Did they do something wrong? Yes, they did. They exaggerated their security and privacy claims. Just like almost every other cloud data host today. I wish, instead of picking on whichever startup suddenly succeeds, we picked on the industry as a whole. Stop talking about encryption in transit and encryption at rest in the same breath, as if they were the same thing. Stop using “encryption” as a synonym for “secure.” Stop saying “military-grade security.” Start being honest about who can access what.

And we, technologists, should stop with the drama, and not fall prey to the inflated expectations that marketing-heavy security policies have set. The Dropbox weaknesses should have been obvious to technologists from day one. The problem is that all privacy policies and security statements make exaggerated claims using reassuring keywords. Let’s harp on that.

intelligently designing trust

Filed under: crypto,policy,security,web — March 30, 2011 @ 12:44 am

For the past week, every security expert’s been talking about Comodo-Gate. I find it fascinating: Comodo-Gate goes to the core of how we handle trust and how web architecture evolves. And in the end, this crisis provides a rare opportunity.

warning signs

Last year, Chris Soghoian and Sid Stamm published a paper, Certified Lies [PDF], which identified the very issue that is at the center of this week’s crisis. Matt Blaze provided, as usual, a fantastic explanation:

A decade ago, I observed that commercial certificate authorities protect you from anyone from whom they are unwilling to take money. That turns out to be wrong; they don’t even do that much.

A Certificate Authority is a company that your web browser trusts to tell it who is who on the Internet. When you go to https://facebook.com, a Certificate Authority is vouching that, yes, this is indeed Facebook you’re talking to directly over a secure channel.

What Chris and Sid highlighted is an interesting detail of how web browsers have chosen to handle trust: any Certificate Authority can certify any web site. That design decision was reasonable in 1994, when there were only two Certificate Authorities and the world was in a rush to secure web transactions. But it’s not so great now, where a Certificate Authority in Italy can delegate its authority to a small reseller, who can then, in turn, certify any web site, including Facebook and Gmail, using more or less the level of assurance the small reseller sees fit.
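A toy model may help show why “any Certificate Authority can certify any web site” is so weak. This is a sketch under stated assumptions: the issuer names are made up, and real validation (signature chains, expiry, and so on) is left out entirely.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Certificate:
    subject: str  # the site being vouched for, e.g. "facebook.com"
    issuer: str   # name of the CA (or delegate) that signed it

# Hypothetical trust store: every name here is invented for illustration.
TRUSTED_ISSUERS = {"BigCA", "SmallReseller"}

def browser_accepts(cert: Certificate, hostname: str) -> bool:
    """Toy check: the hostname matches and *any* trusted issuer signed the certificate."""
    return cert.subject == hostname and cert.issuer in TRUSTED_ISSUERS

legit = Certificate(subject="facebook.com", issuer="BigCA")
rogue = Certificate(subject="facebook.com", issuer="SmallReseller")
```

Both `legit` and `rogue` pass `browser_accepts(..., "facebook.com")`: nothing in the check ties a site to the particular authority it actually chose.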

what happened

It looks like someone from Iran hacked into one of the small resellers three degrees of delegation away from Comodo to issue to some unknown entity (the Iranian government?) certificates for major web sites, including Google and Microsoft. This gave that entity the power to impersonate those web sites, even over secure connections indicated by your browser padlock icon. It’s important to understand that this is not Google or Microsoft’s fault. They couldn’t do anything about it, nor could they detect this kind of attack. When Comodo discovered the situation, they revoked those certificates… but that didn’t do much good because the revocation protocol does not fail safely: if your web browser can’t contact the revocation server, it assumes the certificate is valid.
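The “does not fail safely” behavior can be sketched as a single branch. This is an illustrative model, not any browser’s actual code; the function names and the `fail_closed` flag are assumptions introduced here.

```python
from typing import Callable, Optional

def is_certificate_usable(
    serial: str,
    query_revocation_server: Callable[[str], Optional[bool]],
    fail_closed: bool = False,
) -> bool:
    """The query returns True if revoked, False if not revoked, None if the server is unreachable."""
    revoked = query_revocation_server(serial)
    if revoked is None:
        # Server unreachable: the behavior described above corresponds to fail_closed=False.
        return not fail_closed
    return not revoked

def unreachable_server(serial: str) -> Optional[bool]:
    """Simulates an attacker who blocks the revocation check."""
    return None
```

With the default setting, `is_certificate_usable("abc", unreachable_server)` accepts the certificate; an attacker who can impersonate a site can usually also block the revocation lookup, so the revocation never takes effect.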

a detour via Dawkins, Evolution, and the Giraffe

Richard Dawkins, the world-famous evolutionary biologist, illustrates the truly contrived effects of evolution on a giraffe. The laryngeal nerve, which runs from the brain to the larynx, takes a detour around the heart. In the giraffe, it’s a ludicrous detour: down the animal’s enormous neck, around the heart, and back up the neck again to the larynx, right near where the nerve started to begin with!

If you haven’t seen this before, you really need to spend the 4 minutes to watch it:

In Dawkins’s words:

Over millions of generations, this nerve gradually lengthened, each small step simpler than a major rewiring to a more direct route.

and we’re back

This evolution is, in my opinion, exactly what happened with certificate authorities. At first, with only two certificate authorities, it made sense to keep certificate issuance as simple as possible. With each added certificate authority, it still made no sense to revamp the whole certification process; it made more sense each time to just add a certificate authority to the list. And now we have a giraffe-scale oddity: hundreds of certificate authorities and all of their delegates can certify anyone, and it makes for a very weak system.

This isn’t, in my mind, a failure of software design. It’s just the natural course of evolution, be it biology or software systems. We can and should try to predict how certain designs will evolve, so that we can steer clear of obvious problems. But it’s very unlikely we can predict even a reasonable fraction of these odd evolutions.

the opportunity

So now that we’ve had a crisis, we have an opportunity to do something that Nature simply cannot do: we can explore radically redesigned mechanisms. We can intelligently design trust. But let’s not be surprised, in 15 years, when the wonderful design we outline today has evolved once again into something barely viable.

taking further example from nature?

Nature deals with this problem of evolutionary dead-ends in an interesting way: there isn’t just one type of animal. There are thousands. All different, all evolving under slightly different selection pressures, all interacting with one another. Some go extinct, others take over.

Should we apply this approach to software system design? I think so. Having a rich ecosystem of different components is better. We shouldn’t all use the same web browser. We shouldn’t all use the same trust model. We should allow for niches of feature evolution in this grand ecosystem we call the Web, because we simply don’t know how the ecosystem will evolve. How do we design software systems and standards that way? Now that’s an interesting question…

i changed my mind on nuclear power

Filed under: policy — March 16, 2011 @ 8:14 pm

Until this recent catastrophe in Japan (it’s awful, please consider helping out), I was very pro nuclear-power. I’ve never been afraid of technology, and I was raised in France, where 80% of electricity comes from nuclear power and there has been no serious safety problem with it. Plus, nuclear power can be green. And with newer technology, it can be made passively safe, where even if everything fails, a meltdown cannot occur (unlike the Japanese reactors, unfortunately.)

So the recent crisis has changed my mind. I don’t think we can afford the risk of nuclear power. I’m not a nuclear power expert, and I would welcome counter-arguments. But I am fairly well versed in thinking about risk and risk mitigation. Three things now worry me greatly about nuclear power:

  • Dramatic outcomes: in case of dramatic failure, the outcome could be disastrous on a scale that’s difficult to comprehend. You think the oil spill in the Gulf of Mexico was bad (and it was)? Try decades or centuries of life-killing radioactivity. Imagine a meltdown that could contaminate large, heavily populated areas. The damage could be enormous. Yes, the probability is very, very low. But as we are seeing today in Japan, it’s far from zero, and if they had not reacted as well as they did, the result could be indeed as bad as I describe here. (To folks I work with on voting technology: isn’t this what we worry about regarding Internet voting for public office? That the outcome of an attack would be dramatically bad, no matter how low the likelihood?)
  • Storing nuclear waste: a friend on Facebook said “if Romans had used nuclear power, we would still be guarding their nuclear dump sites.” Think about that for a second. That’s just breathtaking. Are we ready to impose on our descendants 1000 years from now? We can barely figure out broad swaths of history from that long ago, let alone instructions on how to safeguard nuclear materials. Maybe it can be done. But it seems incredibly arrogant of us to assume that it’s okay to impose this burden on the next hundred generations.
  • Regulation (or lack thereof): this is my most pragmatic point, and it applies mostly to the US. We can’t even get our act together in this country to agree on requiring relief wells for deep-water oil drilling. Do we really think we can get our act together to regulate a nuclear industry to be truly safe? It looks like even Japan couldn’t quite do it, and they’re far more open to government safety regulation than we are.

So, I’m open to others’ arguments. But right now, I’m thinking nuclear power is not such a great idea.

degrees of trust: software vs. data hosts

Filed under: privacy,web — March 16, 2011 @ 4:14 pm

Overjoyed by all the SSL goodness around me (Twitter offers SSL-only as an option, so does Facebook, Google offers 2-factor auth), I started dutifully upgrading my web browsing experience on Firefox, specifically installing the EFF Add-On that turns on HTTPS everywhere it can, in particular when using Google (it uses encrypted.google.com by default). I googled myself to test it out, and I found this interesting blog post by CSS Squirrel from a few months ago, regarding the issue I have with Opera Mini.

CSS Squirrel says:

Ben Adida offered the following question as a counter: “Does privacy matter? Cause Opera Mini proxies all of your connections, even SSL, via its servers.” It’s a valid question, especially considering his expertise in the field of privacy and security.

Actually it’s a valid question regardless of my credentials :)

Not being an expert on how Opera does things, I poked at both Bruce Lawson and Molly Holzschlag, both Opera employees.

Both of them said “If you don’t trust us (Opera), then don’t use the service.”

[...]

So is Opera Mini fast? Yes. Is it secure? Yes.

But that’s not good enough. Trust is not a simple yes/no concept. I trust my dog walker to come into my home, walk my dog, and not go opening up drawers to find my medical records. But I’m not going to leave my medical records out in the open either, cause that’s just asking for trouble. I trust that the Opera browser, installed on my machine, is not phoning home my personal data, because that would be a huge breach of expectation. But if I use Opera Mini and all of my data is being shipped to Opera on every HTTP call, do I trust them never to look at it? Do I trust their security system to be so good that they won’t ever be hacked?

There are degrees of trust. I trust that most reputable installed software won’t phone home with my data. I trust that some data hosts won’t analyze my data too deeply, but certainly many will. And I’m pretty sure many data hosts will get hacked or will leak data unintentionally. So, it’s unreasonable to judge your software publishers and data hosts with the same degree of trust. There isn’t enough of a taboo against data hosts perusing your data. Facebook is mining our data, everyone knows it, and our general reaction is “oh well, what are you gonna do.” But if Microsoft Word scanned your hard drive and shipped your personal info back to Redmond, you’d be looking for a pitchfork right about now.

Opera Mini is misleading because it presents itself as an installable piece of software, when in fact it is almost a data host, and the degree of trust one should consider, when using Opera Mini, is a lot higher than that which is implied by their packaging.

benadida@mozilla

Filed under: personal — March 7, 2011 @ 6:44 pm

In a few days, I’ll be joining Mozilla.

What started as a fun lunch with Sid and Alex quickly turned into passionate brainstorming with Mike, Pascal, and Lloyd on the Mozilla Labs team. I told them I wanted to deeply explore a few ideas I’ve written about and prototyped (here and here, for example) and more importantly to work on making the browser a true user agent working on behalf of the user. Mozilla folks are not only strongly aligned with that point of view, they’ve already done quite a bit to make it happen. Check out Mike Hanson’s post just a few days ago on using Web Applications for Service Discovery. And check out what the entire team just released: Open Web Apps. Not to mention the less closely related but still totally awesome work Sid and Alex are doing with Do Not Track. This is the first effort I know of that is successfully using technology to declare a preference (“please don’t track me”), which can then be leveraged by policy makers to ensure that it is respected. As a long-time student of the interplay between tech and policy, I love this hack.

My job will be to join this fantastic team and see what I can contribute. I suspect that will involve some privacy, some crypto, some data portability, and a lot of web hacking. I’ll continue to blog here at benlog.com, though when I speak here I’m speaking for myself only, not on behalf of my soon-to-be employer. And I’ll continue to hack a bit on Helios, my voting system, especially as it continues to inform how one might do advanced crypto in a web browser.

I’m super excited.

Jumpstarting Health IT innovation

Filed under: health — March 3, 2011 @ 6:21 pm

Until last month, I was lead architect on the SMART Project at Harvard Medical School and Children’s Hospital Boston (now I’m an advisor). One key issue that all Health IT folks grapple with is how to make the Health IT ecosystem more dynamic and innovative, because technology in that space moves so slowly. The SMART Project is one attempt to jumpstart Health IT innovation.

If you’re interested in this stuff, you might want to read the blog post I wrote on the SMART site about how SMART addresses the Presidential Report on Health IT. Key ideas:

  1. PCAST very eloquently identifies lack of interoperability as the central problem of healthcare IT. To spark innovation, PCAST proposes a universal exchange language, with atomic data elements and a strong metadata philosophy for privacy and provenance. We wholeheartedly agree.
  2. One of PCAST’s suggestions is to eschew efforts to define universal semantics, leaving to the market the task of semantic harmonization. While we agree that the market will significantly contribute to semantic harmonization, our experience indicates that an organized initial foray into standardizing semantics for common, well-understood health data is critical to getting the ball rolling.
  3. With SMART, we are building such a universal health language — inspired by existing standards and built on existing coding systems — and empowering with it a modern, web-based application ecosystem. We specifically address PCAST recommendations of data atomicity, metadata tagging, and semantic extensibility, while simultaneously addressing what some have identified as weaknesses in the PCAST report, notably the risk of incomplete patient context when working with disaggregated atoms of data.

More details over at the SMART blog.

everything I know about voting I learned from American Idol

Filed under: crypto,voting — March 2, 2011 @ 12:54 am

Tonight, American Idol began online voting. Yes, I’m a fan of American Idol, but don’t let that fool you: I’m still a bitchin’ cryptographer. I suspect that American Idol online voting will give rise to many questions such as “wow, awesome, now when can I vote in US Elections with my Facebook account?” and “Why is online voting so hard anyways?” Perhaps I can be of assistance.

the voting process

So the process is much like other Facebook-connected sites: using Facebook Connect, you log in and grant the American Idol Voting site some permissions, including reading your profile info (ok), getting your email address (ok I guess), and accessing your Facebook data even if you’re offline (ummm, why?). Then you select your favorite contestant, solve a CAPTCHA, and click “vote”. You’re prompted to post the vote to your Facebook feed, and told you can vote up to 50 times.

My first question was “what’s the CAPTCHA defending against?” I have some thoughts on that, which I’ll get back to…

“a secure solution”

The news that American Idol would use online voting was reported with enthusiasm:

“We have been wanting to do online voting for several years, and now Facebook has offered us a secure solution and we are ready to go,” said Simon Fuller, Creator and Executive Producer, American Idol.

So what does that mean, exactly? What guarantees do American Idol producers have that the system is “secure?” Hard to say. But let’s explore a few possibilities.

ballot secrecy and coercion

American Idol voting is not secret: your vote is posted to your Facebook newsfeed! Of course, unless you’re a contestant’s mother, chances are no one’s going to be upset at you if you don’t vote “the right way.” In political elections, and in fact in many elections where the outcome impacts voters in a material way, ballot secrecy is important, and undue influence of voters is a concern. That’s what makes things particularly difficult in “real” online voting: you should receive some believable proof that your vote was counted properly, but somehow that information can’t be leaked to others who might try to influence you, waiting to see how you voted to decide whether to pay you or break your kneecaps.

one user = 50 votes?

The voting itself is happening on the American Idol site, not on Facebook, so what American Idol is getting from Facebook is mostly the identity layer: to vote, you must have a Facebook account. Between that and the CAPTCHA, it’s probably fairly difficult for an individual user to have disproportionate influence. I have a feeling that’s why they allow individual voters to vote up to 50 times and require a CAPTCHA. After all, if any user can vote 50 times, but the process is fairly time-intensive, how worthwhile is it to register more accounts so you can vote more than 50 times? If voters could legitimately vote only once, then it would be very enticing to create a few fake Facebook accounts to easily quintuple your impact. But to just double your impact with 50 legal votes each, you’re going to have to manually fill out 50 more CAPTCHAs. Eh. Not worth it, right?

In other words, I think the 50 votes per person + CAPTCHA produce the great equalizer: almost no one is going to bother trying to find ways to cast more votes, because the payoff isn’t worth the pain. Clever!
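The arithmetic behind that argument is small enough to write down. This sketch assumes one CAPTCHA per vote, as the process above suggests; the function name is made up for illustration.

```python
def extra_captchas_needed(votes_per_account: int, impact_multiplier: int) -> int:
    """CAPTCHAs to solve on extra accounts in order to multiply one honest voter's impact."""
    extra_accounts = impact_multiplier - 1
    return extra_accounts * votes_per_account

one_vote_quintuple = extra_captchas_needed(votes_per_account=1, impact_multiplier=5)
fifty_votes_double = extra_captchas_needed(votes_per_account=50, impact_multiplier=2)
```

With one vote per person, quintupling your impact costs 4 extra CAPTCHAs; with 50 votes per person, merely doubling it costs 50.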

verifying the tally

In typical secret ballot elections, it’s quite hard to check that the tally was properly computed. After all, once the vote is submitted, via web, SMS, or phone, the tallying process is visible only to the organizers, and the voters must trust that process blindly. Now, physical in-person elections have admittedly only a little bit more auditability: you can kind of watch the ballot box and, if you’re really motivated, stick around to see the ballots counted. But in the online voting space, unless you’ve got some fancy solution, the process is totally opaque.

Except… voting for American Idol isn’t secret! So, technically, the tally could be recomputed by pulling together all of the Facebook newsfeed posts… And that’s actually a key insight into how the fancy truly auditable voting systems work: all of the votes are published for the world to see, in a special encrypted form that doesn’t reveal individual votes but can be intelligently combined and checked against the claimed tally. That’s what systems like Helios do.
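To give a feel for “encrypted votes that can be combined,” here is a toy sketch of additively homomorphic tallying using exponential ElGamal. It illustrates the general idea only and is not Helios’s actual code: the parameters are tiny and insecure, all names are made up, and the proofs a real system needs (that each ciphertext holds a valid vote, that decryption was done honestly) are omitted.

```python
import secrets

# Toy parameters, far too small to be secure: P = 2*Q + 1 with P and Q prime,
# and G generates the subgroup of order Q.
P, Q, G = 1019, 509, 4

def keygen():
    secret = secrets.randbelow(Q - 1) + 1
    return secret, pow(G, secret, P)

def encrypt(public_key: int, vote: int):
    """Encrypt a 0/1 vote as (g^r, g^vote * pk^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (pow(G, vote, P) * pow(public_key, r, P)) % P

def combine(ciphertexts):
    """Multiply ciphertexts component-wise; the result encrypts the sum of the votes."""
    a, b = 1, 1
    for c1, c2 in ciphertexts:
        a, b = (a * c1) % P, (b * c2) % P
    return a, b

def decrypt_tally(secret: int, ciphertext, max_tally: int) -> int:
    """Recover g^tally, then search the small range of possible tallies."""
    c1, c2 = ciphertext
    g_tally = (c2 * pow(c1, -secret, P)) % P  # modular inverse via pow needs Python 3.8+
    for tally in range(max_tally + 1):
        if pow(G, tally, P) == g_tally:
            return tally
    raise ValueError("tally out of range")

secret, public_key = keygen()
votes = [1, 0, 1, 1, 0]
published = [encrypt(public_key, v) for v in votes]  # what the world would see
tally = decrypt_tally(secret, combine(published), max_tally=len(votes))
```

Anyone can recompute `combine(published)` from the public list, so only the final decryption step has to be trusted (or, in a real system, proven correct).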

was my vote captured correctly?

If you post your vote to your Facebook newsfeed, you can verify that it was recorded correctly. But what if something hijacks your browser, waits for you to log into Facebook, casts votes on your behalf (waiting for you to fill out the CAPTCHA or outsourcing it to some CAPTCHA solving farm), and opts not to post the results to Facebook? How can the American Idol producers ever detect this? They probably can’t.

The simplest way one might hijack your browser is via a technique called clickjacking: by wrapping the voting site in an HTML frame and layering a different user interface on top of it, a malicious site could trick you into voting for a different contestant than you intend. For example, the attacker might wait for you to cast your first vote freely, find out who you like by looking at your Facebook wall, and then switch the order of the candidates (by layering new photos on top of the underlying real site) to trick you into voting for a different candidate the other 49 times. Now, to American Idol’s credit, my quick-and-dirty attempt to frame their site and implement clickjacking failed: they’ve got some basic defense against clickjacking that I’m still investigating. Nice work! But of course, attacks that hijack the user’s browser can be much more intricate, including deploying and spreading a virus that takes full control of the browser and its display. There’s absolutely nothing a web site can do to defend against that.

And that, in fact, is the key issue we don’t know how to address when voting online in elections that have a high material impact. We don’t know how to make sure that your browser is really working on your behalf and hasn’t been hijacked by malware. It probably wouldn’t happen for American Idol (or would it?), but it surely would happen when voting for US President.
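As for the basic clickjacking defense mentioned above, one common approach (not necessarily the one American Idol uses) is to send a response header asking browsers not to render the page inside a frame. A minimal WSGI sketch, with a made-up app name and page body:

```python
def voting_app(environ, start_response):
    """Minimal WSGI app that asks browsers not to display this page inside a frame."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("X-Frame-Options", "DENY"),  # browsers that honor this refuse to frame the page
    ]
    start_response("200 OK", headers)
    return [b"<h1>Cast your vote</h1>"]
```

This only helps against framing by another site in a browser that honors the header; it does nothing against malware that controls the browser itself, which is the harder problem described above.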

a personal update

Filed under: personal — January 30, 2011 @ 11:02 pm

Tomorrow (Jan 31st) is my last day on the Research Faculty at Harvard Medical School and Children’s Hospital Boston. It’s been a fantastic ride thanks entirely to the folks with whom I had the pleasure of working, in particular Zak Kohane and Ken Mandl. Ultimately, I finally noticed what was staring me in the face: I love building software systems, and the right place for me to do that now is industry. I’m no stranger to it, and I’m excited to be back.

I’m taking two weeks off. I won’t be blogging or tweeting (much). I’ll be digging into a very thoughtful gift just received (what timing!): the Flour Bakery recipe book. If you live in Boston and you haven’t been to Flour, you’re simply missing out (or you don’t like baked goods, which is just too sad to think about.) This week’s goal, currant scones. Should be at least interesting, maybe delicious.

As to what I’m doing next… that’s for a blog post to be written 2 weeks from now. I’ve had some fantastic discussions with amazing people these last few weeks, and there are a few more conversations to be had before the picture is truly filled in. See you on the flip side of a few days spent with family and friends.

the difference between privacy and security

Filed under: privacy,security,web — January 26, 2011 @ 11:51 am

Facebook today rolled out two new security features, both of which are awesome: SSL everywhere, and social re-authentication. True, SSL everywhere should probably be a default, even though I continue to believe that the cost is significantly underestimated by many privacy advocates. Regardless, this announcement is great news.

The only nitpick I have, and I point it out because I think it’s significant in Facebook’s case, is that the announcement confuses privacy and security. The first paragraph mentions Data Privacy Day, then the general concept of controlling your data, then transitions to the new security features. But those are quite different.

Security is about stopping the bad guys from stealing your data. Privacy is about controlling the good guys’ handling of your data. (Ron Rivest is said to have phrased this most eloquently, but I can’t find his quotation.)

So, SSL and social re-authentication provide security because they prevent bad guys from seeing your network traffic at the coffee shop or stealing your login. That’s fantastic, but it has little to do with privacy. If Facebook wanted to celebrate Data Privacy Day specifically, they might consider giving users more control over their data on Facebook. Maybe letting users control who gets to tag them in photos (e.g. not my stalker). Or letting users indicate fields by which advertisers cannot target them (e.g. sexual orientation). Those would be privacy features.

I don’t mean to knock Facebook’s announcement: it’s great. But it’s about security, not privacy.