As you’ve probably heard by now, a very serious CPU bug was disclosed a few days ago. Lots of folks have tried to explain it in non-technical terms. I’ve not been satisfied with any of these, and as someone who believes it is a solemn responsibility of experts to make important topics accessible to all, that bugs me. So I spent some time reading up on the issues and coming up with my own explanation by analogy.
(I want to credit Red Hat’s explanation for (a) being one of the best out there and (b) inspiring the food/restaurant analogy, which I embraced and extended.)
The Winter Olympics are coming up, and US athletes are getting ready. Now, athletes have a regimen they follow scrupulously. And they’re very competitive, so sometimes their regimen is secret. In particular, I’ll ask you to suspend disbelief on one thing: these athletes are sooo competitive that even what they eat for breakfast is secret. After all, breakfast is the most important meal of the day… right? Right.
To address this requirement, the Olympic village has set up the Secret-Breakfast Cafeteria. When an athlete, say Jamie, approaches the counter, she asks for her special secret breakfast made to order:
“Hi, I’m Jamie, please make me my secret breakfast.”
The chef then follows a rigorous checklist:
- check Jamie’s identity against the recipe instructions
- beat 3 eggs
- grate some gruyere
- chop up some ham
- cook the omelette
- serve Jamie her breakfast
The breakfast is delivered under secrecy of a plate cover, so Jamie can retreat to her table and eat it in secret. (I know, it’s hard to imagine exactly how that happens in secret, just stay with me!)
The approach described so far works well if there’s one athlete to feed every 15 minutes. But with lots of athletes to feed, the cooking operation needs to scale.
First, to go faster, the chef is going to perform a mise-en-place, where he fetches all the ingredients from the pantry prior to cooking and sets them up close to his work area. This makes every other step faster.
Second, one chef isn’t going to be enough. Instead, many chefs will work together, each with a specific role. One chef is responsible for fetching recipes and checking identities, a few chefs are responsible for taking care of individual ingredient prep, another for cooking it all together, and finally a last chef for serving the plate.
A lot of these tasks can be done in parallel. The ingredients can be prepared in parallel by different chefs, to start. Heck, if speed is the preeminent issue (and it is, because athletes are hungry), then it’s worth optimistically starting to cook Jamie’s breakfast before the identity-checking chef has finished his job. In other words, it’s okay to follow the recipe tasks out of order. If, perchance, the identity check fails, then the breakfast should be discarded instead of served. Yes, that means food gets discarded occasionally. Most of the time, though, no one’s trying to impersonate Jamie, so it’s reasonable to be optimistic, because that will make the cafeteria faster most of the time at the cost of wasting some food. And in any case, even with a little food waste, as long as a secret breakfast doesn’t make it out of the kitchen to an impostor, then the secret recipe is still secret, and the Secret-Breakfast Cafeteria lives up to its name. Right?
Now imagine Chloe, another athlete, who’s trying to steal Jamie’s breakfast recipe. She shows up and lies:
“Hi, I’m Jamie, please make me my secret breakfast.”
[except this is Chloe, not Jamie]
First, one chef begins the identity check. But that takes a while, so the other chefs optimistically start their work. Remember there’s that mise-en-place, right? One chef moves the ingredients for Jamie’s breakfast from the pantry to the work area: eggs, gruyere, and ham. A few other chefs then begin their parallel dance: one cuts the ham, another grates the gruyere, another beats the eggs.
Now, just as the omelette comes together, the ID-checking chef returns with some unexpected and concerning news: that’s not Jamie. The chefs immediately halt their work and throw away that beautiful, almost-ready omelette. Chloe, instead of getting her breakfast, receives a rejection: you’re not Jamie, you can’t have Jamie’s breakfast. Defeated, Chloe sighs and says “fine, then give me some waffles, a side of ham, and some eggs over-easy.” Miffed, she waits for her food.
Except… when the food comes out, she notices that the side of ham and the eggs are ready a little bit faster than the waffles. Now, why would that be?
Chloe’s eyes widen as she realizes that, like Sherlock Holmes, she’s pieced together the mystery: it’s the mise-en-place! Those ingredients must have already been at the chef’s work area, pre-fetched from the pantry! From that subtle but unambiguous timing information, Chloe concludes that Jamie’s breakfast must have included ham and eggs. That may not be Jamie’s complete recipe, but it’s a heck of a lot more information than Chloe should have uncovered, given that Jamie was expecting a totally secret breakfast. A couple more days of breakfast psy ops and Chloe will have all the elements of Jamie’s recipe, thereby breaking the central guarantee of the Secret-Breakfast Cafeteria.
Connecting Back to CPUs
Today’s computing architectures look a lot like the Secret-Breakfast Cafeteria. There are a number of processing units (our chefs) working in parallel, executing tasks out of order, and throwing away the data they’ve optimistically computed if a bounds check fails. Computers do more and more things in parallel because, over the last few years, it’s become harder to make each processing unit churn out results faster, so instead CPUs are filled with more processing units. Rather than each instruction happening faster, CPUs just do more instructions in parallel and pull off increasingly crazy tricks to parallelize code that isn’t explicitly written to be parallelized.
And of course, in modern processors, there’s caching. Accessing main memory is very slow, so copies of main memory that are in active use are brought closer to the CPU for faster access (like the mise-en-place). Once brought closer to the CPU, cached memory tends to stay there until other sections of main memory are needed.
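To make the mise-en-place idea concrete, here is a minimal sketch of a cache in Python. It is a toy model, not how real hardware works: the `ToyCache` class, the least-recently-used eviction rule, and the latency numbers are all assumptions made up for illustration. What it shows is the one property that matters for the rest of the story: reading something that is already cached costs far less than fetching it from main memory, and that difference is observable.

```python
from collections import OrderedDict

# Made-up latencies, in "cycles", chosen only to make the gap visible.
CACHE_HIT_COST = 1
MAIN_MEMORY_COST = 100


class ToyCache:
    """A tiny least-recently-used cache sitting in front of 'main memory'."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, oldest entry first

    def read(self, memory, address):
        """Return (value, cost) for reading one address."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as recently used
            return self.lines[address], CACHE_HIT_COST
        value = memory[address]              # slow trip to the pantry
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used
        return value, MAIN_MEMORY_COST


memory = {"eggs": 3, "ham": 1, "gruyere": 1, "waffles": 2}
cache = ToyCache(capacity=2)

first = cache.read(memory, "ham")[1]    # miss: fetched from the pantry
second = cache.read(memory, "ham")[1]   # hit: already at the work area
cache.read(memory, "eggs")
cache.read(memory, "waffles")           # capacity exceeded, "ham" is evicted
third = cache.read(memory, "ham")[1]    # miss again

print(first, second, third)  # prints: 100 1 100
```

The second read of "ham" is cheap only because of the first one, so anyone who can measure the cost of a read learns something about what was read before.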
Put those two things together, caching and out-of-order execution, and you understand the simplest variant of these newly discovered attacks. The other variants, though more complex, are roughly the same idea: some code is executed optimistically by the processor, that code accesses secret data, and, although the output is properly discarded when the processor detects it should never have executed that code in the first place, traces left in the cache can later be observed through timing differences.
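For readers who like code better than omelettes, the same sequence can be sketched as a small Python simulation. This is emphatically not a working exploit: real attacks measure actual hardware timing, whereas here the "cache" is just a set and the costs are invented numbers. Every name (`victim_read`, `probe_cost`, `attack`, `cached_probe_slots`) is hypothetical. The sketch only illustrates the order of events: the secret is touched before the check completes, the result is discarded, and the cache still remembers.

```python
# Simulated costs in "cycles"; the numbers are invented for illustration.
HIT, MISS = 1, 100

public_data = [10, 20, 30, 40]   # what the attacker may legitimately read
SECRET = 7                       # sits just past the end of public_data
memory = public_data + [SECRET]

cached_probe_slots = set()       # which probe slots are currently "in cache"


def victim_read(index):
    """Bounds-checked read where the body runs before the check resolves."""
    # Speculative part: executed optimistically, out of order.
    value = memory[index]
    cached_probe_slots.add(value)    # touching probe[value] pulls it into cache
    # The bounds check finally completes.
    if index >= len(public_data):
        return None                  # result discarded; the cache is not rolled back
    return value


def probe_cost(slot):
    """The attacker's timed read of one probe slot."""
    return HIT if slot in cached_probe_slots else MISS


def attack(index, slot_count=16):
    """Request an out-of-bounds read, then time every probe slot."""
    cached_probe_slots.clear()       # start with the probe array uncached
    victim_read(index)               # rejected: the attacker gets nothing back
    costs = [probe_cost(slot) for slot in range(slot_count)]
    return costs.index(min(costs))   # the one fast slot reveals the value


rejected = victim_read(len(public_data))
leaked = attack(len(public_data))
print(rejected, leaked)  # prints: None 7
```

The direct request is refused, exactly as it should be, yet the attacker recovers the secret by noticing which slot is fast. That is Chloe noticing that the ham and eggs came out quicker than the waffles.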
And that’s Meltdown and Spectre.
Why Does This Matter?
Where does this apply, you might wonder? Everywhere. Your phone runs multiple apps. Your web browser runs multiple pages. Your virtual cloud server may be co-hosted with another company’s virtual cloud server on the same physical machine. Each app, each web page, each virtual server is supposed to be hermetically sealed from others. To break that seal even a little bit is a catastrophic security issue.
[With apologies to Jamie Anderson and Chloe Kim, two awesome Team USA Olympic athletes who are, to the best of my knowledge, very fine people who would never steal each other’s breakfast recipes.]