2007: Controlled End-User Web APIs for Private-Data Mashups

As far as technology goes, 2007 will be about web security. With everyone storing more and more personal data on various web sites, and with the continuing innovation of mash-ups, it’s inevitable. And it won’t be the web security issues of the last few years, either: it will all be about how to do private-data mash-ups securely.

Case in Point: Google just patched a serious security problem that allowed an Evil Web Site (EWS) to access your gmail contact list, as long as you were logged into gmail and you simply visited the EWS. The root cause: Google wanted one of its web applications to be able to integrate with your gmail contact list, and their chosen implementation was to use your browser as the transfer point. This makes sense, because your browser is likely already authenticated to gmail, so there’s a nice abstraction of authentication going on here: the application that wants to integrate with your gmail contacts simply instructs your browser to fetch the contact list, and if your browser is already authenticated, then the fetch succeeds.

Of course, the problem is that, if a web site can instruct your browser to fetch data, e.g. your contact list, from a remote service, e.g. gmail, then you’ve got a problem: an EWS can do the same thing behind your back. Specifically, here’s the data flow we need to carefully consider:

  1. Alice visits Web Site One, which runs some code, call it Application One, inside Alice’s browser.
  2. Application One then contacts Web Site Two for some information, possibly using Alice’s credentials transparently (e.g. a cookie).
  3. Web Site Two returns some data to Application One running inside Alice’s browser.
  4. Application One takes this data and makes use of it inside Alice’s browser for some purpose.
  5. Application One takes this data and submits it back to Web Site One.
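The five steps above can be sketched as a toy simulation. All the names and data here are hypothetical, and a real attack runs inside Alice’s actual browser (e.g. by script-including a credentialed data feed), but the essential point survives the simplification: Web Site Two only ever sees a valid credential, so it cannot tell Alice’s intent from an evil page’s.

```javascript
// Toy simulation of the five-step data flow. All site names and
// data are made up for illustration.

// Web Site Two: serves private data to any request that carries
// Alice's ambient credential (a cookie, in the gmail case).
const webSiteTwo = {
  contactsByUser: { alice: ["bob@example.com", "carol@example.com"] },
  fetchContacts(cookie) {
    // The server only sees a valid credential -- it has no way to
    // know which page inside Alice's browser triggered the request.
    return cookie === "alice-session" ? this.contactsByUser.alice : null;
  },
};

// Alice's browser: transparently attaches her cookie to any request
// aimed at Web Site Two, no matter which page initiated it (step 2).
const browser = {
  cookies: { "website-two": "alice-session" },
  request(site) {
    return webSiteTwo.fetchContacts(this.cookies[site]); // step 3
  },
};

// Application One, served by an evil Web Site One, runs in Alice's
// browser (step 1), pulls the data, then exfiltrates it (step 5).
const stolenByWebSiteOne = [];
function applicationOne() {
  const data = browser.request("website-two"); // steps 2-3
  stolenByWebSiteOne.push(...data);            // step 5: phoned home
}

applicationOne();
console.log(stolenByWebSiteOne); // Web Site One now holds Alice's contacts
```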

If we allow this complete loop to take place, then Web Site One can steal all sorts of private data that Alice has stored at Web Site Two. At the same time, there are some very good reasons for allowing some of this data flow to occur. One use case is already well understood: Web Site Two might be a public information source, like Google Maps, with which a mash-up is much desired. Another use case is the contact-list example, where Web Site One would like to integrate with Alice’s private contact list at Web Site Two.

The interesting question is: how do we allow the contact-list integration without enabling wholesale private data theft? Two possible solutions come to mind:

Quarantine the Program: in some cases, it might be possible to quarantine Application One so that, once it starts talking to Web Site Two, it can no longer contact Web Site One (or any other web site) again: this effectively blocks step 5 in the data flow. This approach has limited applicability: in the contact-list example, you obviously want to do something with the contact list once it’s been loaded, and that probably involves some feature at Web Site One. There may be some edge cases where quarantine is actually useful (and doable), but they will remain edge cases.

Control the Cross-Application Data Transfer: The data transfer from Web Site Two to Application One should be explicit. In other words, in step 3, Web Site Two should know that the request is coming from Application One, should have the ability to interact with Alice directly (without letting Application One mediate), e.g. to let her select the contact she wants to use, and to explicitly return the data Alice chose to share with Application One.
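A minimal sketch of what “explicit” could mean here. The function names and the consent dialog are hypothetical, not any shipping API; the point is that Web Site Two knows which application is asking, talks to Alice directly, and returns only what she chose to share.

```javascript
// Toy sketch of a controlled cross-application transfer.
// All names are hypothetical.

const webSiteTwo = {
  contacts: ["bob@example.com", "carol@example.com", "dave@example.com"],

  // The request names the requesting application explicitly, and the
  // user -- not the application -- chooses what to share.
  shareContact(requestingApp, askUser) {
    const chosen = askUser(
      `${requestingApp} wants one of your contacts. Pick one:`,
      this.contacts
    );
    // Only the explicitly selected item ever leaves Web Site Two.
    return chosen === null ? [] : [chosen];
  },
};

// Stand-in for a consent dialog rendered by Web Site Two itself,
// outside Application One's reach. Here we pretend Alice picks #1.
function fakeConsentDialog(prompt, options) {
  return options[1];
}

const shared = webSiteTwo.shareContact("Application One", fakeConsentDialog);
console.log(shared); // only the one contact Alice selected
```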

I think 2007 will see an enormous rise in technologies and tricks to enable this second option: Controlled Cross-Application (aka Cross-Domain) Data Transfer. There are some existing ideas in this realm: the WHAT WG’s cross-document messaging proposals, various cross-domain iframe hacks, and Flash’s cross-domain policy files.

My feeling is that the messaging approach of the WHAT WG is the right way to go, with some cross-domain iframe hacks while the details of the specs get fleshed out. Flash is interesting, but HTML/Javascript is still the easier (and more open) way to develop web applications.
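The heart of the WHAT WG messaging approach is that each side names who it is willing to talk to and checks who a message actually came from. The sketch below simulates that with plain objects standing in for browser windows (the real API is `otherWindow.postMessage(data, targetOrigin)` plus a `message` event whose handler checks `event.origin`); the origins are made up.

```javascript
// Toy simulation of cross-document messaging. Plain objects stand
// in for browser windows; origins are hypothetical.

function makeWindow(origin) {
  return {
    origin,
    handler: null,
    onMessage(fn) { this.handler = fn; },
    postMessage(data, fromWindow, targetOrigin) {
      // The browser delivers only if the target's origin matches
      // what the sender said it intended to reach.
      if (targetOrigin !== this.origin) return;
      this.handler({ data, origin: fromWindow.origin });
    },
  };
}

const appOne = makeWindow("https://website-one.example");
const siteTwo = makeWindow("https://website-two.example");

// Web Site Two decides, per origin, what it is willing to share.
siteTwo.onMessage((event) => {
  if (event.origin !== "https://website-one.example") return; // strangers rejected
  appOne.postMessage(["carol@example.com"], siteTwo, "https://website-one.example");
});

const received = [];
appOne.onMessage((event) => {
  if (event.origin !== "https://website-two.example") return;
  received.push(...event.data);
});

// Application One asks; Web Site Two answers knowing who asked.
siteTwo.postMessage("request-contact", appOne, "https://website-two.example");
console.log(received); // the data Web Site Two chose to hand over
```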

But 2007 will bring one major pitfall along with this: many folks continuing to forget why the cross-domain restriction is there in the first place. You can start to see this with people claiming that the dangerous data flow above can be broken by removing pre-existing authentication from cross-domain requests. This argument makes two large mistakes:

There are other inherent authentication mechanisms that are undetectable by the browser: many corporate intranets serve data to machines within their network without authentication, since the requests are coming from inside the network. There are also many publishers who offer site licenses to universities by white-listing their IP-address space. There’s no way for the browser to know when it’s accessing an intranet or internet service, so allowing cross-domain requests is like poking a hole in all corporate firewalls, in all home network firewalls, etc… It simply will not happen.

The browser as the focal point for authentication is a good idea: having authentication handled by the user’s browser is a really good thing. It means you can begin to abstract out authentication: Application One makes the call to Web Site Two, lets the user authenticate there and select the data to send back to Application One, and Application One never needs to understand Web Site Two’s authentication mechanism. It also means the user is in control: if Application One can contact Web Site Two without going through my browser, that’s a significant loss of control over my private data.

So that’s my prediction for 2007: a big explosion in controlled cross-domain technologies. Think of them as end-user Web APIs: assembling web application components via the browser with end-user control. It’s going to be exciting, and there are going to be enormous icebergs along the way.
