Protecting against web history sniffing attacks: an alternative

When a web site links to another web site, the link appears in a different color, usually purple instead of blue, if you’ve already visited the destination. Unfortunately, this means that a malicious web site can learn which sites you visit by putting up a few links and checking how your browser renders them. Arvind explained the shockingly bad outcome of this small flaw a few weeks ago.
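The probe itself is tiny. Here is a hedged sketch of the logic in plain JavaScript: in a real page, `renderLink` would create an `<a href=…>` element and `getColor` would call `getComputedStyle`, but both are parameters here so the logic stands alone, and all names are illustrative.

```javascript
// Sketch of a history-sniffing probe. Any URL whose rendered link takes
// the :visited color is in the victim's browsing history. In a real
// attack, renderLink would insert an <a> element into the page and
// getColor would read getComputedStyle(el).color.
function sniffHistory(urls, renderLink, getColor, visitedColor) {
  return urls.filter((url) => getColor(renderLink(url)) === visitedColor);
}

// Simulated browser in which the user has visited bank.example:
const fakeRender = (url) => ({
  color: url === "https://bank.example/" ? "purple" : "blue",
});
const leaked = sniffHistory(
  ["https://bank.example/", "https://shop.example/"],
  fakeRender,
  (el) => el.color,
  "purple"
);
// leaked contains only the URL the simulated user had visited
```

A real page can run this over thousands of URLs in a loop, which is exactly the "a thousand times a second" problem discussed below.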

Today, Mozilla is proposing an interesting way to “plug” this leak, by attacking the problem from both ends. First, the style changes allowed for visited links are now limited: you can’t change the font size of a visited link, for example. Second, the web page can no longer fully introspect its own rendering to discover the small style variations that are still allowed. It’s really fantastic to see Mozilla working on this particularly nefarious issue, which has been one of the elephants in the room of web security for the last few years.

But I’m not sure this approach is the right one. It is, exactly as Mozilla put it, a “plug” for a leak. It doesn’t really address the essence of the issue, and I suspect clever attackers will find other, smaller leaks to exploit. Meanwhile, setting the precedent that the browser will now fake some of its rendering information to the page’s own JavaScript is a little bit odd.

The core issue is that a web page’s own rendering acts as a black-box processor over private information the page should never have access to. That may have been fine when web pages were static content, but now that web pages are full-fledged programs that can probe these black boxes a thousand times a second without the user noticing, it’s a problem.

An Alternative: tweaking the meaning of ‘visited’

So here’s my proposal, one I thought I’d come up with myself, but which is really just a subconscious reappearance of work by Collin Jackson, Andrew Bortz, Dan Boneh, and John Mitchell: safehistory.

A browser should consider a link “visited” depending on where that link appears. If I’ve clicked on cnn.com/stories/123 from fooblog.com, then the next time that link is shown on fooblog.com, it should appear as visited. But if that same link appears on barblog.com, then it should simply be considered a new link.
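As a minimal sketch, the rule amounts to keying visited state on the (referring site, target URL) pair rather than on the URL alone. The class and method names below are mine, not a real browser API:

```javascript
// Toy model of "path-based" visited state: a link is visited only with
// respect to the site it was clicked from.
class SafeHistory {
  constructor() {
    this.visits = new Set();
  }
  // Composite key: referring origin plus target URL.
  key(referrerOrigin, targetUrl) {
    return `${referrerOrigin} -> ${targetUrl}`;
  }
  recordClick(referrerOrigin, targetUrl) {
    this.visits.add(this.key(referrerOrigin, targetUrl));
  }
  isVisited(referrerOrigin, targetUrl) {
    return this.visits.has(this.key(referrerOrigin, targetUrl));
  }
}

const hist = new SafeHistory();
hist.recordClick("https://fooblog.com", "https://cnn.com/stories/123");

hist.isVisited("https://fooblog.com", "https://cnn.com/stories/123"); // visited there
hist.isVisited("https://barblog.com", "https://cnn.com/stories/123"); // a new link there
```

With the right composite index, this lookup should be no more expensive than a plain URL lookup, a point that comes up in the comments below.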

That way, a web page only has access to information that it technically already could have collected itself, by tracking the outgoing clicks on its site. No more black-box access to private data that it can manipulate thousands of times a second. No more leaks to plug (in this context). Conceptually, this is cleaner and much more in line with the security model we need for the Web.

Also, I have a feeling that this might be a better mental model of how most people think of visited links. A visited link, in this new model, means “I have already followed this path from A to B,” rather than “I have already seen B via some other site C.” After all, how often do people find the same link from two different sources at different domains? It happens, of course, but how often is it useful to know that a link was visited via a different path? The anchor text at the two sites will almost certainly differ, making the information that you’ve already visited this destination fairly vague (which site was it?).

But what about link aggregators, like reddit, digg, etc.?

Some sites just aggregate tons of links to stories around the web, and in those cases you might want to know that you’ve already seen a story via some other link aggregator, or even on your own. That need may matter less than you initially think: links from aggregators are often customized with outgoing link tracking, referrer codes, etc., which actually prevents visited-link highlighting from activating anyway. But this may be a legitimate case where advanced users want to know that they’ve seen these stories before. For those cases, I can imagine a “super-referrer” whitelist in the browser, where certain sites are trusted not to abuse black-box access to your history. Advanced users would have to add the link aggregators they trust to this whitelist.

Because it’s about trust

At the end of the day, it’s about trust. We should not trust random web sites with black-box access to programs that depend on our private data. If I’ve never clicked on a link from site A to site B, then site A should know nothing about site B, and should not be able to run some program that very tightly depends on information about my visit to site B.

A visited link should mean “you’ve already walked this path before,” not “you’ve already seen this destination.”

6 thoughts on “Protecting against web history sniffing attacks: an alternative”

  1. Sid mentioned yesterday that the main problem with implementing safehistory in Firefox was efficiency. Looking up visited link color was already a bottleneck step in rendering, without the additional complexity of treating links as pairs.

    I also disagree that safehistory fits the user’s mental model better. But it is certainly much cleaner conceptually.

  2. With a proper Referer+Link index, I can’t imagine this would be any different performance-wise.

    As for mental model… I do think this is worth testing. I tend to think that users think of links as “oh I already clicked this link.” Not “I already visited that destination.” The cases where those two are different are, I think, only meaningful to advanced users, and those users can whitelist link aggregators or learn to deal with the slightly new model.

  3. Seems like with the right index, looking up referer+link would be no more expensive than just looking up the link. Is that much harder than the proposed change?

    As for mental models… we should probably test it. My intuition tells me that only advanced users would be surprised, and those could easily learn the new model or whitelist the link aggregators they use. “I clicked on that link” usually includes context of the current page… but maybe my intuition is influenced by my desire for a cleaner model🙂

  4. Ok, it’s a little bit more subtle than that. For performance reasons, link styling is currently being split into an “asynchronous” mode, which means that it is computed after the page has been rendered. In this model, color is the only property that :visited could possibly be allowed to influence, because changing the font (for example) would require re-laying out the rest of the page.

    Given this constraint, which exists largely for performance reasons, conceptual simplicity is already screwed, and they might as well do what they have done, since it comes at little additional cost.

    There’s a fairly common use case where safehistory conflicts with mental models. Search Google for something, click a few links, fail to find what you were looking for, repeat on Bing. Now most of the Bing results overlap with Google’s, but you can’t see at a glance which ones you’ve already explored. Same for social news, etc. Dan and I discussed ways to avoid this problem via whitelisting, but IMO the default safehistory policy is noticeable to average users.

  5. But if you’re going to break sites, including sites that do things totally legitimately, isn’t it worth getting the model right? Whitelisting is, I think, a *very* good way to solve the search engine problem. And I still disagree with you about the visual cues that average users pick up. I bet you $20 that if we ran a user study with and without safehistory, the average user wouldn’t know the difference🙂

  6. You might be right. In fact, more and more sites are doing away with :visited altogether in their styles, including Twitter and Facebook.
