MS has probably looked into this and found it wasn't robust enough. It likely requires near-perfect positioning to work.
Consumers usually don't remember the 90% of the time it works; they remember the 10% of the time it doesn't. That's why games for EyeToy and webcams are so limited: there are too many edge cases that are just unsolvable without the help of a depth camera. Also, this is running on a PC, which has a lot more resources to devote to analysis. I expect stuff like this will be in the next-generation Kinect, but it takes a lot of time to make it work for everyone in a variety of settings.
I do think the work they are doing now will pay large dividends in the future. It'll be hard for competitors to catch up when Kinect has a five-year head start on the R&D.







