He proposes using a video livestream instead of just transmitting the game's audio. You can do that on www.ustream.tv, for example. You just need some way to deliver the video input, and afaik a capture driver is needed for that. I guess it's based on WIA (Windows Image Acquisition). Common capture hardware comes with such a driver, as do webcams.
Effectively, you could watch someone gaming and talk to him at the same time.