(your) information wants to be free – obamacare edition

My friends over at EFF just revealed that Healthcare.gov is sending personal data to dozens of tracking sites:

It’s especially troubling that the U.S. government is sending personal information to commercial companies on a website that’s touted as the place for people to obtain health care coverage. Even more troubling is the potential for companies like Doubleclick, Google, Twitter, Yahoo, and others to associate this data with a person’s actual identity.

The referenced AP story uses even more damning language:

The government’s health insurance website is quietly sending consumers’ personal data to private companies that specialize in advertising and analyzing Internet data for performance and marketing, The Associated Press has learned.

Sounds pretty bad, right? Except it’s almost certainly not what it sounds like. It’s almost certainly a simple mistake.

How could this be a mistake, you ask? Here’s what almost certainly happened:

  1. Someone at Healthcare.gov wanted to analyze patterns of usage of the site. This is often done to optimize sites for better usage. So they added a tracker to their page for MixPanel, for Optimizely, for Google Analytics, and a couple of other sites that help you understand how people use your site. In all likelihood, different departments added different trackers, each for their own purposes, almost certainly with good intentions of making the web site more usable.
  2. Meanwhile, someone else responsible for social media of HealthCare.gov added a “Tweet This” button, and someone else added a YouTube video. Once again, these come in the form of widgets, often snippets of JavaScript code, that load resources from their respective home base.
  3. Separately, someone built the web form that lets you enter basic information about yourself so you can find a health plan. That information is, in large part, fairly personal: your age, your zip code, whether or not you smoke, etc. And for some reason, almost certainly completely random, they used a web form whose method is GET.
  4. Here’s the first mildly technical point. When you submit a GET form, the data in the form is appended to the URL, like so:
    https://healthcare.gov/results?zip=12345&gender=male&parent=1&pregnant=1&...

    Not a big deal, since that data is going to Healthcare.gov anyways.

  5. And now for the second mildly technical point. For tracking purposes, trackers often blindly copy the current URL and send it to their home base, so that the trackers can tell you that users spent 5s on this page, then 10s on that page, etc. In addition, when your browser requests an embedded YouTube video, or an embedded tracker, it sends the current URL as part of the request in the so-called Referer header (yes, that is how HTTP spells it).
  6. Put those two technical points together, and boom: a web site that collects personal information with GET forms and uses third-party tracking widgets tends to send form data to those third parties.
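
To make those two points concrete, here is a minimal Python sketch of the mechanics. It sends nothing over the network. The field names are copied from the example URL in step 4; the three function names are made up for illustration, and the sketch assumes the browser attaches the full page URL as the Referer with no referrer policy trimming it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Illustrative form fields, modeled on the example URL in step 4.
form_data = {"zip": "12345", "gender": "male", "parent": "1", "pregnant": "1"}

def get_form_url(action: str, fields: dict) -> str:
    """Build the URL a browser navigates to when a GET form is submitted."""
    return f"{action}?{urlencode(fields)}"

def third_party_request_headers(page_url: str) -> dict:
    """Headers a browser might attach when the results page loads a widget.

    Assumption: no referrer policy trims the URL, so the whole thing is sent.
    """
    return {"Referer": page_url}

def what_the_tracker_learns(headers: dict) -> dict:
    """Recover the form fields from the Referer, as any log parser could."""
    query = urlsplit(headers.get("Referer", "")).query
    return {key: values[0] for key, values in parse_qs(query).items()}

page_url = get_form_url("https://healthcare.gov/results", form_data)
headers = third_party_request_headers(page_url)
leaked = what_the_tracker_learns(headers)

print(page_url)
print(leaked)
```

The third party never asked for the form data. It arrives as a side effect of the page URL being attached to an otherwise routine request.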

This is extremely common. Many web sites with sufficiently large engineering teams have no idea how many trackers they’ve embedded. It’s typical for a web site to move from one site analysis tool to another and to forget to remove the first tracking widget in the process. When the Wall Street Journal reported on these issues a couple of years ago with their fantastic What They Know series, they forgot to mention that their own page has a half-dozen trackers embedded.

I’ve said it before, and I’ll say it again: unfortunately, your information wants to be free. My favorite analogy remains:

when building a skyscraper, workers are constantly fighting gravity. One moment of inattention, and a steel beam can fall from the 50th floor, turning a small oversight into a tragedy. The same goes for software systems and data breaches. The natural state of data is to be copied, logged, transmitted, stored, and stored again. It takes constant fighting and vigilance to prevent that breach. It takes privacy and security engineering.

So, am I letting Healthcare.gov off the hook? Not at all. They should have done their due diligence and run a more thorough privacy audit. And using GET forms is particularly sloppy, since it leaves data sprayed all over the place in logs, referrers, etc.
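
One small piece of such an audit can be sketched in a few lines of Python: list every outside host a page pulls resources from. This is only an illustration, not a real audit tool. The class name, the sample HTML, and the `tracker.example` and `video.example` hosts are all invented, and a static scan like this misses anything injected later by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class ThirdPartyFinder(HTMLParser):
    """Collect hosts of embedded resources that differ from the site's own host."""

    def __init__(self, own_host: str):
        super().__init__()
        self.own_host = own_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        # Only look at tags that commonly load third-party resources.
        if tag not in ("script", "img", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlsplit(src).hostname  # None for relative URLs
        if host and host != self.own_host:
            self.hosts.add(host)

# Invented sample page: two outside resources, two first-party ones.
sample_html = """
<script src="https://tracker.example/a.js"></script>
<iframe src="https://video.example/embed/123"></iframe>
<img src="/static/logo.png">
<script src="https://www.healthcare.gov/app.js"></script>
"""

finder = ThirdPartyFinder("www.healthcare.gov")
finder.feed(sample_html)
third_party_hosts = sorted(finder.hosts)
print(third_party_hosts)
```

Every host on that list receives a request each time the page loads, so each one is a place where a URL full of form data could end up.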

But was this a deliberate attempt at sharing private data with private companies? Not a chance. The press should do a better job of reporting this stuff. And, to my wonderful friends at EFF, this is a gentle nudge to say: so should you. It’s important to differentiate between negligence and malice, and to not spread fear, uncertainty, and doubt, even on issues we care about.

The good news is that HealthCare.gov has already responded by (a) reducing their number of trackers significantly and (b) submitting form data using XMLHttpRequest or POST. The bad news is how many people now actually believe that this was intentional, conspiratorial data selling. If that had been HealthCare.gov’s intention, there are much sneakier ways of doing it without getting caught so easily.
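
To see why moving to POST helps, here is a small Python sketch that builds (but never sends) the same submission both ways using the standard library's `urllib.request.Request`. The field names and the `/results` path are borrowed from the example URL earlier in the post; this is not HealthCare.gov's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"zip": "12345", "gender": "male"}
encoded = urlencode(fields)

# GET: the form data becomes part of the URL itself.
get_req = Request(f"https://healthcare.gov/results?{encoded}")

# POST: the URL stays clean and the form data travels in the request body.
post_req = Request("https://healthcare.gov/results", data=encoded.encode("ascii"))

for req in (get_req, post_req):
    print(req.get_method(), req.full_url, req.data)
```

With POST, the page URL no longer carries the form fields, so anything that copies the URL (Referer headers, analytics snippets, access logs) has nothing personal to pick up. The data still reaches the server, and a script on the page could still read and forward it, so this fixes the accidental leak rather than every possible one.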

Oh, and if you want to understand more about trackers and block them as you surf the web, try the very excellent Ghostery extension for your browser.