Natal's firmware and the research behind it were drawn from many different sectors of MS. Natal is almost a patchwork of various R&D projects that have been running at MS for years. Attributing the full cost of that development to Natal alone would be erroneous.
I thought I already discussed how the EyeToy's motion detection via RGB color values was inferior to Natal's methods. This was backed up by a Sony engineer stating the same thing in a video. This is also why the wands are necessary. I really don't think there is even an argument to be had about this. A discussion could be had about PSEye+Gem and Natal, but PSEye alone is really no argument.
And yes, facial and vocal recognition are software solutions, and it appears that Natal's firmware will be handling them. This makes it easier for developers to add Natal capabilities to their games: a developer can code a game to work with either Natal or a controller without the Natal path limiting the rest of the game. If the camera data were instead processed by the console's CPU/GPU, the game's other areas (graphics, sound, AI, etc.) would have to be budgeted around that processing, even for users who don't own the camera. If a game were tuned to peak CPU/GPU usage with a controller, switching to the camera would cost frame rate unless the developers had also built in a mode that scales back those other areas. I don't think I explained this very well.
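To put rough numbers on the frame-budget point: at 60 fps each frame gets about 16.67 ms. The Java sketch below uses made-up timings (16 ms of game work, 4 ms of camera processing; neither is a measured Xbox 360 or Natal figure) to show how a game tuned near the limit for the controller overruns its frame once camera processing lands on the console instead of the peripheral.

```java
// All timings are hypothetical, picked only to illustrate the arithmetic.
class FrameBudget {
    // At 60 frames per second, each frame gets 1000 / 60 ≈ 16.67 ms.
    static final double FRAME_BUDGET_MS = 1000.0 / 60.0;

    /** True if the work the console itself has to do still fits inside one frame. */
    static boolean fitsInFrame(double gameWorkMs, double cameraWorkOnConsoleMs) {
        return gameWorkMs + cameraWorkOnConsoleMs <= FRAME_BUDGET_MS;
    }

    public static void main(String[] args) {
        double gameWorkMs = 16.0;  // game tuned close to peak for the controller
        double cameraWorkMs = 4.0; // skeletal tracking, if it had to run on the console

        // Tracking done by the peripheral's own hardware: the console pays nothing extra.
        System.out.println("Tracking on peripheral fits: " + fitsInFrame(gameWorkMs, 0.0));
        // Tracking done on the console CPU/GPU: the same game now overruns its frame.
        System.out.println("Tracking on console fits:    " + fitsInFrame(gameWorkMs, cameraWorkMs));
    }
}
```

The only way for the second case to fit is to shrink gameWorkMs, which is the "degrading of other areas" described above.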
Perhaps a detailed example will be more effective. I will use a Natal port of Geometry Wars as the example. This game could be played with either the controller or Natal (and no, I don't think Natal is a good fit for GW, but for the sake of example, I will use it).
//Publicized skeleton-point indexes of the hands and chest.
static final byte rightHandIndex = 12;
static final byte leftHandIndex = 24;
static final byte chestIndex = 6;

private void calculatePlayerActions(byte playerIndex, boolean usingNatal)
{
    Vector3 acceleration;
    Vector3 turretVector;
    if (usingNatal)
    {
        //For this particular frame, Natal has already sent the player's skeletal data
        //to the 360.
        Vector3 rightHand = playerSkeletons[playerIndex].getSkeletonPoint(rightHandIndex);
        Vector3 leftHand = playerSkeletons[playerIndex].getSkeletonPoint(leftHandIndex);
        Vector3 chestPosition = playerSkeletons[playerIndex].getSkeletonPoint(chestIndex);
        //Initialize turretVector using the unit vector perpendicular to the vector (rightHand - leftHand).
        //Initialize acceleration using the hand and chest positions.
    }
    else
    {
        //Initialize acceleration using the left thumbstick.
        //Initialize turretVector using the right thumbstick.
    }
}
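For anyone curious what the two commented initializations might look like, here is one possible mapping as a standalone Java sketch. The Vector3 record is a hypothetical stand-in for whatever vector type the SDK would provide, the coordinate convention (x right, y up, z depth) is an assumption, and steering with the line between the hands while accelerating by the hands' offset from the chest is just one plausible control scheme. Records need Java 16 or later.

```java
// Hypothetical stand-in for the SDK's vector type.
record Vector3(double x, double y, double z) {
    Vector3 minus(Vector3 o) { return new Vector3(x - o.x, y - o.y, z - o.z); }
    Vector3 plus(Vector3 o) { return new Vector3(x + o.x, y + o.y, z + o.z); }
    Vector3 scale(double s) { return new Vector3(x * s, y * s, z * s); }
    double length() { return Math.sqrt(x * x + y * y + z * z); }
}

class NatalControls {
    /** Unit vector in the screen (x-y) plane, perpendicular to the line from the left hand to the right hand. */
    static Vector3 turretVector(Vector3 leftHand, Vector3 rightHand) {
        Vector3 across = rightHand.minus(leftHand);
        // Rotate the x-y projection of the hand line 90 degrees counter-clockwise: (dx, dy) -> (-dy, dx).
        Vector3 perpendicular = new Vector3(-across.y(), across.x(), 0.0);
        double len = perpendicular.length();
        // Hands share the same x and y (one directly in front of the other): no usable direction.
        return len == 0.0 ? new Vector3(0.0, 0.0, 0.0) : perpendicular.scale(1.0 / len);
    }

    /** Acceleration proportional to how far the midpoint of the hands sits from the chest in the x-y plane. */
    static Vector3 acceleration(Vector3 leftHand, Vector3 rightHand, Vector3 chest, double gain) {
        Vector3 midpoint = leftHand.plus(rightHand).scale(0.5);
        Vector3 offset = midpoint.minus(chest);
        return new Vector3(offset.x() * gain, offset.y() * gain, 0.0);
    }

    public static void main(String[] args) {
        Vector3 left = new Vector3(-0.3, 1.2, 2.0);
        Vector3 right = new Vector3(0.3, 1.2, 2.0);
        Vector3 chest = new Vector3(0.0, 1.3, 2.1);
        // Level hands point the turret straight up the screen.
        System.out.println(turretVector(left, right));
        // Hands held slightly below the chest accelerate the ship downward.
        System.out.println(acceleration(left, right, chest, 2.0));
    }
}
```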
I know the two vectors being initialized are never used for anything in this example; it is just an illustration of a possible scenario.
With the processing done on Natal, there would be little, if any, difference in performance between using the controller and using Natal. If the calculations were instead done on the console's CPU, the first call to getSkeletonPoint would carry a rather large overhead, because it would kick off the skeletal-tracking calculations provided by the API (i.e., developers don't need to code this; MS did the work for them). Because the API does this work, some games could potentially be retrofitted for Natal. Sure, a game retrofitted for Natal will most likely not be as good as one designed from the ground up for it, but the option to tack on Natal functionality is there. Facial and vocal recognition apply directly to this example as well.
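The "large overhead on the first call" behaviour in the CPU-side case could be implemented as lazy, per-frame caching. Below is a minimal Java sketch under that assumption; LazySkeleton, beginFrame and the Supplier-based tracker are hypothetical names, not any real Xbox or Natal API. Only the first getSkeletonPoint call in a frame runs the expensive tracking pass; the later calls for the other hand and the chest just read the cached result.

```java
import java.util.function.Supplier;

// Hypothetical CPU-side skeleton API: joints are computed lazily, once per frame.
class LazySkeleton {
    private final Supplier<float[][]> tracker; // the expensive pass: camera frame -> joint positions
    private float[][] joints;                   // this frame's joints, null until first requested
    private int trackerRuns = 0;                // how many times the expensive pass has run

    LazySkeleton(Supplier<float[][]> tracker) {
        this.tracker = tracker;
    }

    /** Call at the start of each frame to throw away last frame's joints. */
    void beginFrame() {
        joints = null;
    }

    /** First call in a frame runs the tracker; later calls in the same frame reuse its result. */
    float[] getSkeletonPoint(int index) {
        if (joints == null) {
            joints = tracker.get();
            trackerRuns++;
        }
        return joints[index];
    }

    int trackerRuns() {
        return trackerRuns;
    }

    public static void main(String[] args) {
        // Fake tracker: 25 joints, each {x, y, z}, all zero.
        LazySkeleton skeleton = new LazySkeleton(() -> new float[25][3]);
        skeleton.beginFrame();
        skeleton.getSkeletonPoint(12); // right hand: pays for the tracking pass
        skeleton.getSkeletonPoint(24); // left hand: cached
        skeleton.getSkeletonPoint(6);  // chest: cached
        System.out.println("Tracker runs after one frame: " + skeleton.trackerRuns());
    }
}
```

The trackerRuns counter exists only to make the one-pass-per-frame behaviour observable; the byte indexes from the example above widen to int automatically.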
Using the PSEye for this same scheme would carry an even bigger performance cost, and it might fail to determine the depth differences between the hands and the chest, since RGB processing is weak in that department. Developers might even have to code the detection algorithms themselves if Sony does not provide them with the PSEye, which is a double hit: the coding and testing the detection algorithms require, plus the performance cost of running them.
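With a depth value per joint, telling whether the hands are in front of the chest comes down to a couple of subtractions; a plain RGB image has no z value to subtract, so depth would have to be inferred indirectly. A minimal Java sketch, assuming each joint carries z as distance from the camera in metres; the helper name and the 0.25 m threshold are hypothetical.

```java
// Hypothetical helper; assumes z is distance from the camera in metres
// (smaller z = closer to the camera).
class DepthCheck {
    /** True if both hands are at least thresholdMetres closer to the camera than the chest. */
    static boolean handsPushedForward(double leftHandZ, double rightHandZ, double chestZ,
                                      double thresholdMetres) {
        return chestZ - leftHandZ >= thresholdMetres && chestZ - rightHandZ >= thresholdMetres;
    }

    public static void main(String[] args) {
        // Arms extended: hands 0.4 m in front of a chest 2.1 m from the camera.
        System.out.println(handsPushedForward(1.7, 1.7, 2.1, 0.25));
        // Hands resting near the body.
        System.out.println(handsPushedForward(2.05, 2.0, 2.1, 0.25));
    }
}
```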