Web 2.0 Security & Privacy Workshop

Today, I was at the IEEE Web 2.0 Security & Privacy Workshop, where I presented a short position paper on extending the web browser to enable secure private-data mashups. I started the day not sure what to expect: maybe a day-long complaint about how web 2.0 concepts are insecure and we need to stop and think, or a slew of interesting new proposals. I had purposely ignored the posted papers: I wanted to get the authors’ pitch first.

My conclusion: academics have just crashed the web security space. The amount of interest is exploding, and the level of knowledge has vastly increased. There remains a gap between what academics want to see happen (e.g. “we need to stop using ‘eval’”) and what practitioners will accept, but this is the launch of a healthy new research area.

Session #1: Broad Issues and Opinions

Paul Karger from IBM Research worries that mashups legitimize man-in-the-middle attacks. I think Paul’s point is too broad and condemns too many exciting web applications. However, there is reason to worry about the proxying of login credentials, where sites ask for your Google password so they can access your address book. Sites that host your data need to think about building better APIs to enable secure mashups.

Michael Steiner, also from IBM Research, reviewed the various security issues faced by mashups. Overall, the complaints are correct, but I worry that the proposed recommendations are a bit too vague. I tried, in my paper, to make more precise recommendations. But Michael’s points are generally important, and anyone exploring web mashup security issues would do well to read this paper.

Zulfikar Ramzan from Symantec (who was in grad school with me at MIT a few years ago) explored the nitty-gritty details of various advanced attacks. He focused on drive-by pharming, where an attacker can corrupt the DNS settings of your home broadband router by exploiting a web security bug in its management interface. He quickly mentioned the disastrous effect of a certain cross-site scripting bug found on Google and in Google Desktop (which have since been fixed). I liked that he got into the details of the attacks, because details are always stronger change motivators than generalities. His claim that things are about to get worse is, in my mind, obviously true.

Session #2: Models

Michael Hart from Stony Brook University talked about access control and the potential for privacy violation with current access models, e.g. MySpace photos resulting in job loss. The proposal is Content-Based Access Control (CBAC), e.g. “don’t let parents access pictures from parties.” This is a great idea, though there is significant overlap with MIT’s Policy Aware Web (PAW) project. This proposal seems to be a bit more automated, using tagging and such to help determine rules. The overlap with PAW is clear, though, given their prototype under development, Policy-aware Blogging (PLOG). I’ll point this out to them so they can contrast their approach with PAW. (A secondary part of the talk addressed monitoring of Wikipedia, though it wasn’t entirely clear to me how that was linked to the CBAC proposal.)

Sebastian Gajek from the Horst Görtz Institute for IT-Security pitched the idea of formal cryptographic modeling of browser-based protocols. This is great stuff, and it’s a good idea to open up and pursue this area of research. That said, I’m pessimistic: even basic networking protocols eschew formal modeling, because the resulting protocols are often too inefficient. Sebastian claims they can model user behavior in their model… that’s interesting. I have a feeling that most of the audience isn’t listening, though. Cryptographic protocol modeling and web 2.0 crowds don’t mix well. But this is a valiant attempt.

Sachiko Yoshihama from IBM Research Tokyo reviewed a few types of attacks and proposed tweaks to the browser security model. The first suggestion is a policy-based “channel”, where different modules from different origins can communicate with one another along certain well-defined guidelines, with a pub/sub architecture. This sounds similar to HTML5’s Cross-Document Messaging proposal. The second suggestion is to use data tainting to monitor the information flow and prevent cross-site scripting attacks. I just don’t see how this can be done without breaking too many existing sites. A third suggestion appears to be JavaScript isolation/namespaces, which is similar to one of my proposals.
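
To make the policy-based channel idea concrete, here is a minimal sketch in Python. It is not code from the talk or from any browser: the `Rule` and `PolicyChannel` names and the example origins are hypothetical, and the only assumption is that a policy lists which origin may send which topic to which other origin.

```python
from collections import defaultdict
from typing import Callable, Iterable, NamedTuple


class Rule(NamedTuple):
    """One policy entry (hypothetical format): sender may reach receiver on topic."""
    sender: str
    topic: str
    receiver: str


class PolicyChannel:
    """Toy pub/sub hub that delivers a message only when a rule permits it."""

    def __init__(self, rules: Iterable[Rule]) -> None:
        self._rules = set(rules)
        # topic -> list of (subscriber origin, callback)
        self._subscribers = defaultdict(list)

    def subscribe(self, origin: str, topic: str,
                  callback: Callable[[str, str], None]) -> None:
        self._subscribers[topic].append((origin, callback))

    def publish(self, sender: str, topic: str, payload: str) -> int:
        """Deliver payload to permitted subscribers; return how many got it."""
        delivered = 0
        for receiver, callback in self._subscribers[topic]:
            if Rule(sender, topic, receiver) in self._rules:
                callback(sender, payload)
                delivered += 1
        return delivered


# Illustrative origins only: the map widget may tell the weather widget
# about a location, and nobody else.
channel = PolicyChannel([
    Rule("https://maps.example", "location", "https://weather.example"),
])
inbox = []
channel.subscribe("https://weather.example", "location",
                  lambda sender, payload: inbox.append((sender, payload)))
channel.subscribe("https://ads.example", "location",
                  lambda sender, payload: inbox.append(("leak", payload)))
channel.publish("https://maps.example", "location", "42.36,-71.09")
```

Only the weather widget receives the message; the ad widget subscribed to the same topic but no rule covers it. The hard part in a real browser is making sure modules cannot bypass the hub, which this sketch does not address.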

Lunch Featured Talk: Rob Franco from Microsoft

Rob Franco from Microsoft gave the featured keynote talk over lunch. He covered the security architecture differences between IE6 and IE7, specifically regarding isolation of attacks and reducing code to least-privilege execution. He pointed out the phishing detector in IE7, which pings a central security service if a site fits a few phishing heuristics. Rob claims 35 million prevented phishing attacks using this filter. An interesting question from the audience: “can an attacker use the service as an oracle to design a good phishing attack?” I think not, since the center is simply a blacklist, but an interesting question nevertheless.

Rob also discussed Windows CardSpace, Microsoft’s new identity solution. He live-demoed CardSpace, including the UI that shades the whole screen and focuses the user on the authentication process (I like this a lot). He discussed how CardSpace introduces public-key crypto “under the covers.” Users don’t realize that they are using public-key crypto; their experience is simply more secure. As Ben Laurie has pointed out before, the issue here is that users now sign their logins, which has significant privacy implications: logins are now non-repudiable.

Rob then discussed Helen Wang’s research, including BrowserShield, a JavaScript sanitizer, which fixes JavaScript attacks against browser flaws at the proxy level, by effectively rewriting the JavaScript. BrowserShield is nice in that it can be implemented as a separate local firewall that filters all incoming JavaScript, to eliminate vulnerabilities, while the browser is still being patched. Strider Monkeys are another interesting approach: monkey programs mimic humans surfing the web to catch attacks as early as possible. One audience member asked about giving each user a strider monkey that “looks ahead” to various links and sees what might be a problem. Interesting. And cute, too: “your monkey says this site is unsafe!”

Session #3: Architecture

Benjamin Livshits started out by saying that we need to help developers build more secure software. I fully agree. He begins by saying that “Default is Unsafe”, and uses cross-site scripting as an example. He adds that writing safe web code is not trivial. This is clearly true. He points to the XSS cheatsheet, which is a mind-blowing collection of ways to get around XSS filters. He advocates framework-supplied safe defaults, though he notes that client-side enforcement is not clear. Benjamin then proposed a few safe defaults:

Declare certain portions of the page as “JavaScript-free” (for XSS prevention when content is user-contributed.) This is interesting, though one then has to be careful to close all contained tags correctly, and it’s not clear that the containment mechanism will be reliably implementable in a JavaScript toolkit.
Isolation of widgets in the same HTML document. This seems quite hard, and it really assumes that the widget can’t somehow override the underlying toolkit’s JavaScript.
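
As a rough illustration of what a framework-supplied safe default could look like on the server side, here is a small Python sketch. It is my own example, not something Benjamin presented: `SafeHtml` and `render` are hypothetical names, and the only real library call is the standard library's `html.escape`.

```python
import html


class SafeHtml(str):
    """Marks a string as trusted markup that must not be escaped again."""


def render(template: str, **values: object) -> SafeHtml:
    """Fill {name} placeholders, escaping every value unless it is SafeHtml.

    The template itself is assumed to be written by the developer, never
    taken from user input.
    """
    escaped = {
        name: value if isinstance(value, SafeHtml)
        else html.escape(str(value), quote=True)
        for name, value in values.items()
    }
    return SafeHtml(template.format(**escaped))


# User-contributed content is neutralized without the developer asking for it.
comment = "<script>alert(1)</script>"
page = render("<p>{comment}</p>", comment=comment)
```

The point of the design is that forgetting to escape is no longer possible by accident: the developer has to opt out explicitly by wrapping a value in `SafeHtml`. Note that this only covers HTML text and attribute contexts; values placed inside scripts or URLs need different encoding.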

Collin Jackson points out that there are other ways for widgets to “break out”, including stylesheets, deceiving forms, etc… (e.g. MySpace widget that prompts you to log in but sends your credentials somewhere else.)

Michael Steiner came back to talk about mashup component isolation via server-side analysis and instrumentation. A server-side analyzer restricts tree-walking and maintains invariants and integrity. A code rewriter performs namespace isolation and other rewriting. I’m a skeptic of such analyzers and rewriters, but I’m no expert, so it may be interesting. Aha, indeed, this approach requires banning “eval”, Flash, and Java.

Then I presented. (More on that later.)

Stanislav Malyshev from Zend presented PHP server-side security approaches. Input filtering, watching for XSS, etc… He mentioned data tainting, noting that the implementation is complicated and has a performance effect. Static code analysis was considered, though the obvious shortcomings are noted given the dynamic and typeless nature of PHP. Dynamic analysis might do interesting things, but it has performance implications and requires significant PHP engine modifications.
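
For readers unfamiliar with data tainting, here is a toy sketch of the idea. It is written in Python rather than PHP and has nothing to do with how the PHP engine would implement it; `Tainted`, `sanitize`, and `emit` are hypothetical names chosen for the illustration.

```python
import html


class Tainted(str):
    """A string that came from an untrusted source (e.g. request input)."""

    def __add__(self, other):
        # Concatenation with anything stays tainted.
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # Also covers plain_str + tainted_str.
        return Tainted(str(other) + str(self))


def sanitize(value: str) -> str:
    """Escape for HTML and return a plain str, which drops the taint mark."""
    return html.escape(str(value))


def emit(fragment: str) -> str:
    """Output sink: refuse anything that still carries taint."""
    if isinstance(fragment, Tainted):
        raise ValueError("tainted data reached an output sink")
    return fragment


name = Tainted("<b>x</b>")               # pretend this came from a form field
unsafe_page = "<p>Hello " + name + "</p>"  # taint propagates through +
safe_page = emit("<p>Hello " + sanitize(name) + "</p>")
```

Calling `emit(unsafe_page)` raises, while the sanitized version goes through. The sketch also hints at why a real implementation is complicated: only `+` propagates taint here, so formatting, slicing, or joining would silently lose the mark unless every string operation is instrumented.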

Session #4: Trust & Deception

Richard Chow from PARC talked about inference using the web. “The web is a proxy for all of human knowledge.” For example, an FBI redacted document can be reconstituted by Googling the nearby un-redacted terms. If the words “Saudi magnate” appear un-redacted near the redacted term, then a Google search tells you the hidden content is “Bin Laden.” By comparing search results for “Saudi magnate” and “Bin Laden Saudi magnate”, one can test the merit of the potential “Bin Laden” answer. The team extracted and stemmed “HIV”-associated terms, trying to figure out what terms in a health record would be associated with HIV. They checked their results of related terms with a medical doctor, and found that they got a lot of the terms right.
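
The comparison of search results can be sketched as a simple ratio. This is my own guess at the shape of the test, not the PARC team's actual method: `association_score` and `rank_candidates` are hypothetical names, `hit_count` stands in for whatever search engine one would query, and the counts below are made-up placeholders, not real search results.

```python
from typing import Callable, Iterable, List


def association_score(hit_count: Callable[[str], int],
                      context: str, candidate: str) -> float:
    """Fraction of pages matching the context that also mention the candidate."""
    context_hits = hit_count(context)
    if context_hits == 0:
        return 0.0
    return hit_count(f"{candidate} {context}") / context_hits


def rank_candidates(hit_count: Callable[[str], int], context: str,
                    candidates: Iterable[str]) -> List[str]:
    """Order candidate fill-ins for a redacted term, most plausible first."""
    return sorted(candidates,
                  key=lambda c: association_score(hit_count, context, c),
                  reverse=True)


# Made-up hit counts purely for illustration; a real test would query a
# search engine instead of this dictionary.
FAKE_COUNTS = {
    "Saudi magnate": 1000,
    "Bin Laden Saudi magnate": 400,
    "Jane Doe Saudi magnate": 5,
}


def fake_hit_count(query: str) -> int:
    return FAKE_COUNTS.get(query, 0)


ranking = rank_candidates(fake_hit_count, "Saudi magnate",
                          ["Jane Doe", "Bin Laden"])
```

A candidate that co-occurs with the un-redacted context far more often than the alternatives is a likely fill-in for the redacted term, which is exactly why redacting a single name is weaker protection than it looks.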

(As a funny aside, the team discovered that “Montagnier” was associated with HIV, but the medical doctor they consulted didn’t understand why. Most French people, like myself, know that Prof. Luc Montagnier discovered the HIV virus, but there was a long conflict between him and Robert Gallo, a US professor, who had claimed the discovery and whom many Americans associate with the discovery. Gallo has since conceded that Montagnier discovered the virus, but one has to wonder if this fact might be less well known to US doctors. The web, however, knows.)

Johannes Helander from Microsoft Research talked about a framework for reasoning about trust. Annarita Giani from UC Berkeley talked about detecting deception in the context of web 2.0, specifically the concept of “cognitive attacks,” where the user’s attention and perception are exploited for nefarious purposes. I followed these talks a bit less than the previous ones, simply because of conference fatigue.

Discussion

We discussed the future of the conference. I pointed out that web 2.0 might not be a great term to keep; we should just go to “web security.” One person commented that web 2.0 has specific security issues in terms of user contributions. I don’t quite agree. I think the security principles distill down to “the user shouldn’t get screwed.”

Lots of tech topics brought up: authorization, trust/reputation of mashups, offline apps, new technologies (e.g. silverlight), mobile devices, enterprise, evolution of cookies and HTTP, identity and trust, accountability, privacy, search engines, security standards for w2.0 APIs, impact/interaction of privacy policies, security appliances, financial mashups, TPMs and the web, mashup resistance…. no time to discuss any further, the workshop wraps up!

UPDATE: I looked up the Wikipedia entry on Robert Gallo, and the page claims that Gallo and Montagnier agreed to share credit for the HIV discovery. That is not my memory of the events, but I may well be mistaken.

Comments

3 responses to “Web 2.0 Security & Privacy Workshop”

leon @ trusted-id» Blog Archive » Benlog » Web 2.0 Security & Privacy Workshop

May 28, 2007

[…] Benlog » Web 2.0 Security & Privacy Workshop […]
Shadi

December 16, 2007

Hello

Do you know if the current anti phishing based on web site heuristics are going to work on pages which are designed by HTML5 as some new tags are added and some format of older tags are changed? Or the anti phishing tools should be updated based on the new formats?

please email me the answer what ever you think