When a web site links to another web site, the link appears in a different color, usually a lighter shade of blue, if you’ve already visited the site. Unfortunately, this means that a malicious web site can learn what sites you visit by putting up a few links and checking to see how your browser is rendering them. Arvind explained the shockingly bad outcome of this small flaw a few weeks ago.
Today, Mozilla is proposing an interesting way to “plug” this leak, by attacking the problem from both ends. First, the style changes allowed for visited links are now limited: you can’t, for example, change the font-size of a visited link. Second, the web page can no longer fully introspect and discover those small style variations that are still allowed. It’s really fantastic to see Mozilla working on this particularly nefarious issue, which has been one of the elephants-in-the-room of web security for the last few years.
But I’m not sure this approach is the right one. It is, exactly as Mozilla put it, a “plug” for a leak. It doesn’t really address the essence of the issue, and I suspect clever attackers will find other, smaller leaks to exploit. Meanwhile, setting the precedent that the browser will now fake some of its rendering information to the page’s own JavaScript is a little bit odd.
The core issue is that a web page is allowed to use private information it should never have access to as a kind of black-box processor on its own rendering. That may have been fine when web pages were static content, but now that web pages are full-fledged programs that can attack these black boxes a thousand times a second without the user noticing, it’s a problem.
An Alternative: tweaking the meaning of ‘visited’
So here’s a proposal I thought I’d come up with, but which is really just a subconscious reappearance of work by Collin Jackson, Andrew Bortz, Dan Boneh, and John Mitchell: safehistory.
A browser should consider a link “visited” depending on where that link appears. If I’ve clicked on cnn.com/stories/123 from fooblog.com, then the next time that link is shown on fooblog.com, it should appear as visited. But if that same link appears on barblog.com, then it should simply be considered a new link.
That way, a web page only has access to information that it technically already could have collected itself, by tracking the outgoing clicks on its site. No more black-box access to private data that it can manipulate thousands of times a second. No more leaks to plug (in this context). Conceptually, this is cleaner and much more in line with the security model we need for the Web.
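To make the rule concrete, here is a minimal sketch of a history store that remembers paths rather than destinations. It is only an illustration under my own assumptions, not how any browser or the safehistory work actually implements this: the class name `PathScopedHistory`, the helper `site_of`, and the choice to treat the URL's host as the “site” are all hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Return the lowercased host of a URL (assumed stand-in for 'site')."""
    return (urlsplit(url).hostname or "").lower()


class PathScopedHistory:
    """Hypothetical store: a link is 'visited' only on the site where it was clicked."""

    def __init__(self):
        # Set of (referring site, target URL) pairs the user has followed.
        self._followed = set()

    def record_click(self, page_url, link_url):
        """Remember that the user followed link_url from a page on page_url's site."""
        self._followed.add((site_of(page_url), link_url))

    def is_visited(self, page_url, link_url):
        """Render as visited only if this exact path (site -> link) was followed before."""
        return (site_of(page_url), link_url) in self._followed


history = PathScopedHistory()
history.record_click("http://fooblog.com/post/1", "http://cnn.com/stories/123")

# Same link, shown again on fooblog.com: visited.
print(history.is_visited("http://fooblog.com/archive", "http://cnn.com/stories/123"))
# Same link, shown on barblog.com: treated as a new link.
print(history.is_visited("http://barblog.com/", "http://cnn.com/stories/123"))
```

The point of keying on the pair is that the lookup can never reveal more than the referring site could have logged itself by tracking its own outgoing clicks.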
Also, I have a feeling that this might be a better mental model of how most people think of visited links. A visited link, in this new model, means “I have already followed this path from A to B,” rather than “I have already seen B via some other site C.” After all, how often do people find the same link from two different sources at different domains? It happens, of course, but how often is it useful to know that this link was visited via a different path? The anchor text at both sites will almost certainly be different, making the information that you’ve already visited this site fairly vague (which site was it?).
But what about link aggregators, like reddit, digg, etc.?
Some sites just aggregate tons of links to stories around the web, and in those cases you might want to know that you’ve already seen the story via some other link-aggregator or even on your own. Now, that need may be a lot less important than you think initially: often links from aggregators are customized with outgoing link-tracking, referrer codes, etc., that actually prevent visited-link-highlighting from activating anyways. But this may be a legitimate case where advanced users want to know that they’ve seen these stories before. In those cases, I can imagine a “super-referrer” whitelist in the browser, where certain sites are trusted not to abuse their ability to use your history black-box rendering processor. Advanced users would have to add the link aggregators they trust to this whitelist.
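Here is a rough sketch of how such a whitelist might sit on top of the per-site rule. Again, this is just an illustration under my own assumptions: `HistoryWithTrustedSites`, `site_of`, and the `trusted_sites` parameter are hypothetical names, and matching a site by its exact host is a simplification.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Return the lowercased host of a URL (assumed stand-in for 'site')."""
    return (urlsplit(url).hostname or "").lower()


class HistoryWithTrustedSites:
    """Hypothetical store: per-site visited state, plus a user-managed whitelist."""

    def __init__(self, trusted_sites=()):
        self._followed = set()          # (referring site, target URL) pairs
        self._visited_anywhere = set()  # every target URL followed from any site
        self._trusted = {s.lower() for s in trusted_sites}

    def record_click(self, page_url, link_url):
        """Remember the click both as a path and as a destination."""
        self._followed.add((site_of(page_url), link_url))
        self._visited_anywhere.add(link_url)

    def is_visited(self, page_url, link_url):
        site = site_of(page_url)
        if site in self._trusted:
            # Whitelisted "super-referrers" see the global visited state.
            return link_url in self._visited_anywhere
        # Everyone else only sees paths that started on their own site.
        return (site, link_url) in self._followed


history = HistoryWithTrustedSites(trusted_sites=["reddit.com"])
history.record_click("http://fooblog.com/post/1", "http://cnn.com/stories/123")

print(history.is_visited("http://reddit.com/", "http://cnn.com/stories/123"))   # trusted aggregator
print(history.is_visited("http://barblog.com/", "http://cnn.com/stories/123"))  # untrusted site
```

Only the sites the user has explicitly added get the old, global behavior; every other site stays limited to what it could have tracked on its own.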
Because it’s about trust
At the end of the day, it’s about trust. We should not trust random web sites with black-box access to programs that depend on our private data. If I’ve never clicked on a link from site A to site B, then site A should know nothing about site B, and should not be able to run some program that very tightly depends on information about my visit to site B.
A visited link should mean “you’ve already walked this path before,” not “you’ve already seen this destination.”