I went to a talk by Microsoft Research Labs this weekend here in Cambridge, UK. They were talking about the science of Kinect. It was a very interesting talk and it helped me understand Kinect a lot better. For instance, listening to the talk I realized that Kinect and Move, despite their differences, are using some of the same basic scientific techniques. I thought I would share my impressions here in case other people are interested too.
First of all, the Kinect camera is not a '3D camera' in the same sense that a camera for making 3D movies is a 3D camera. Instead, Kinect relies on the same technique that Move does for sensing how far away you are: take an object of known size and, by determining how large it looks to the camera, work out whether you are close by or not. In the case of Move that object is obvious; it is the coloured light bulb on top of the Move controller. For Kinect they have done something I found quite clever: a little lamp/projector on the front of Kinect projects out a pattern in the infrared spectrum. Kinect of course knows what this pattern looks like and how big it is, so when the infrared camera on the Kinect sees it projected on you, it can judge how far away you are. So the basic technique is the same as the Move's. Kinect does it in a somewhat more elaborate fashion, though, since the pattern covers your whole body and can be used to create a 3D representation of all of you. The effect is similar in some ways to those pin-cushion toys that you can use to imprint your hand or face.
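To make the 'known size, apparent size' idea above a bit more concrete, here is a minimal sketch using the standard pinhole camera relation (distance = focal length × real size / apparent size). It only illustrates the principle as I understood it from the talk; it is not how Kinect or Move actually compute depth, and the function name, the 4.5 cm object size and the 600 pixel focal length are all made-up assumptions.

```python
def estimate_distance(real_size_m, apparent_size_px, focal_length_px):
    """Estimate distance to an object of known size with a pinhole camera model.

    real_size_m      -- true width of the object in metres (assumed known)
    apparent_size_px -- width of the object in the image, in pixels
    focal_length_px  -- camera focal length expressed in pixels (assumed known)
    """
    if apparent_size_px <= 0:
        raise ValueError("apparent size must be positive")
    return focal_length_px * real_size_m / apparent_size_px


# Hypothetical numbers: a 4.5 cm object seen by a camera with a 600 px focal length.
near = estimate_distance(0.045, 27.0, 600.0)  # looks large, so it is close (about 1 m)
far = estimate_distance(0.045, 13.5, 600.0)   # looks half as large, so twice as far away
print(near, far)
```

The point is simply that halving the apparent size doubles the estimated distance, which is why an object (or pattern) of known size is enough to judge depth with a single camera.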
The second camera on the Kinect is basically a normal 2D camera.
The other interesting aspect of the talk was how they do the movement detection. They are not doing motion tracking as such, i.e. they are not detecting that something is your hand and then trying to follow it around as you move it. Instead they have a matrix of about 1 million images which they pattern match against. So almost no matter how you stand, it is likely that your pose will be somewhat akin to one of those images. The technology for this part of Kinect actually came from their image research department, who had been researching how to detect objects in photos. Their example was how a computer could automatically realize that there is a cow in a photo. They then took that basic research and reapplied it to human poses.
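The talk did not go into how the matching itself is done, so the following is only a toy sketch of the general idea of 'find the stored pose most similar to what the camera sees' rather than 'track the hand from frame to frame'. It uses a plain nearest-neighbour search; the pose names, the four-number feature vectors and the function names are all hypothetical, and the real system matches against about a million images with far more sophisticated techniques.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_pose(observed, known_poses):
    """Return the name of the stored pose whose features are nearest to `observed`."""
    return min(known_poses, key=lambda name: squared_distance(observed, known_poses[name]))


# Hypothetical database: each pose is reduced to four made-up numbers
# (say, how raised the left arm, right arm, left leg and right leg are).
KNOWN_POSES = {
    "arms_down": (0.0, 0.0, 0.0, 0.0),
    "arms_raised": (1.0, 1.0, 0.0, 0.0),
    "left_arm_raised": (1.0, 0.0, 0.0, 0.0),
}

# A noisy observation that does not exactly equal any stored pose.
observed = (0.9, 0.1, 0.0, 0.1)
best = closest_pose(observed, KNOWN_POSES)
print(best)
```

Each frame is matched on its own, so nothing has to be followed over time; the more poses are stored, the more likely it is that one of them is close to however you happen to be standing.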
They didn't go into much detail on how they combine the two sources of information, the '3D' camera and the 'pose recognition', but I assume that between the two of them, and with the game often knowing what you are supposed to be doing, they are able to give the experience that Kinect offers.
So while I am still not sold on Kinect being able to offer an interesting experience for traditional AAA games, it is a very interesting bit of tech. It also reminded me of one of the core principles of applying science to solve a problem: if a problem is too hard to overcome, then redefine the problem :)










