Windows Vista has speech recognition, so it’s conceivable that a malicious web site could play a sound that orders your computer to delete a file, at which point Windows might respond as if you’d given the order. I don’t blame Microsoft for this one, because it’s an attack channel I doubt many people had considered.
What’s fascinating is that this attack provides a crystal-clear explanation of something that most non-computer-savvy folks might otherwise have trouble understanding. A web site might order your computer to do something, and the crux of most Internet security issues is that, somehow, the computer has to distinguish the commands coming from some rogue web site from the commands coming from you, the user.
Sometimes it’s trivial: only you can double-click an icon to start an application.
Sometimes it’s a little more complicated: who gets to open a new browser window? You, of course, but what about the web site’s programmer? You might want to let web sites you like do that when they need to (e.g. Gmail), but you also don’t want the annoying pop-ups and pop-unders.
And then sometimes it’s nearly impossible: how can we determine whether the sound is coming from you or from the computer? Turn off speech recognition while a sound is playing? Okay, but that means you can’t play any MP3s while you speak to your computer. Filter the speaker sound out of the microphone input? Possible, but fairly complicated and likely quite CPU-intensive.
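For the curious, the "filter out the speaker sound" option is usually called acoustic echo cancellation: the computer knows exactly what it sent to the speakers, so it can learn how that sound reaches the microphone and subtract its estimate, leaving (ideally) just your voice. Here is a minimal sketch using a normalized least-mean-squares (NLMS) adaptive filter in plain Python. The function name, tap count, step size and the simulated "room" echo path are all hypothetical choices for illustration; this is not how Vista or any particular product does it.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=16, mu=0.5, eps=1e-6):
    """Return the mic signal with the estimated speaker echo subtracted.

    far_end: samples the computer sent to its speakers (known exactly)
    mic:     samples captured by the microphone (speaker echo plus any voice)
    taps, mu, eps: hypothetical filter length, step size and regularizer
    """
    weights = [0.0] * taps  # running estimate of the speaker-to-mic echo path
    history = [0.0] * taps  # most recent far_end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # whatever the filter can't explain
        residual.append(e)
        # NLMS update: step size normalized by the recent far-end energy
        step = mu * e / (eps + sum(h * h for h in history))
        weights = [w + step * h for w, h in zip(weights, history)]
    return residual


# Simulated example: the mic hears only an echo of the speakers (no voice).
random.seed(0)
room = [0.6, 0.0, 0.3, -0.1]  # hypothetical echo path through the room
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
mic = [
    sum(room[k] * far[n - k] for k in range(len(room)) if n - k >= 0)
    for n in range(len(far))
]
out = nlms_echo_cancel(far, mic)
```

Once the filter converges, `out` is close to silence, so a speech recognizer listening to it would no longer hear the web site's commands. The catch is cost: every sample takes work proportional to the number of taps, and a real room echo can last hundreds of milliseconds, which at speech sampling rates means thousands of taps. Real systems also need extra logic to stop the filter from adapting while you talk over the speakers.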
Security is a tough business. This case of the speech recognition back-channel is truly interesting, and I don’t think there’s anything close to an easy fix, short of forcing a mouse-click confirmation of every voice-activated action.
(As a side note, I have to say that I remain unconvinced that speech is a good computer interface except in certain very limited situations. Very few things anger me as much as the robotic customer service programs trying to soothe me with “I’m sorry, I didn’t understand that, please say ‘agent’ to speak to a human.”)