The true cost of memory operations is not the number of bytes transferred, but the number of transactions serialized through the memory controllers.
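To make that concrete, here is a rough sketch (my own illustration, not from the quoted source) of why transaction count, not byte count, dominates: a warp of 32 threads requesting the same total number of bytes can touch anywhere from 1 to 32 memory segments depending on the access pattern. The 128-byte segment size is an assumption typical of GPUs of that era.

```python
# Back-of-envelope: a warp of 32 threads, each loading one 4-byte word.
# Assumes 128-byte memory transaction segments (typical for that GPU generation).
SEGMENT = 128  # bytes served per memory transaction
WARP = 32      # threads issuing loads together
WORD = 4       # bytes requested per thread

def transactions(stride_bytes):
    """Count the distinct 128-byte segments touched by one warp's loads."""
    addresses = [i * stride_bytes for i in range(WARP)]
    return len({addr // SEGMENT for addr in addresses})

coalesced = transactions(WORD)     # consecutive words: all fit in 1 segment
strided = transactions(SEGMENT)    # one word per segment: 32 transactions
print(coalesced, strided)          # same 128 bytes requested either way
```

Both patterns request the same 128 bytes in total, but the strided one serializes 32 transactions through the memory controller instead of 1.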
"The flagship NVIDIA card, the GTX 580, has 16 SMs and runs with a shader clock speed of 1544 MHz. Each SM has 32 ALUs that can retire a fused multiply-and-add (that's two ops) per cycle. The product of these factors (1,544,000,000 x 16 x 32 x 2) is a staggering 1581.1 GFlops. ATI currently manufactures devices with arrays of VLIW vector processors per core, for chips with an even higher density of ALUs. The flagship ATI GPU, the Radeon 6970, has 24 SMs, each with 16 4-way VLIW4 ALUs. It comes clocked at 880MHz. Executing fused multiply-and-adds (2 ops) gives a staggering theoretical arithmetic throughput of 2703.3 GFlops (880,000,000 x 24 x 16 x 4 x 2)." Note that the NVIDIA card's clock is nearly double the Radeon's, yet it is the Radeon that delivers the higher theoretical throughput.
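The quoted figures are just the product of clock rate, compute units, ALUs per unit, and ops per ALU per cycle. A small sketch reproducing that arithmetic (the function name and structure are mine, not from the quoted text):

```python
# Peak theoretical throughput = clock x compute units x ALUs per unit x ops/cycle.
def peak_gflops(clock_hz, compute_units, alus_per_unit, ops_per_alu=2):
    """Ops per ALU defaults to 2 for a fused multiply-add (multiply + add)."""
    return clock_hz * compute_units * alus_per_unit * ops_per_alu / 1e9

# GTX 580: 1544 MHz shader clock, 16 SMs, 32 ALUs each
gtx_580 = peak_gflops(1_544_000_000, 16, 32)
# Radeon 6970: 880 MHz, 24 SIMD cores, 16 x 4-way VLIW4 ALUs each
radeon_6970 = peak_gflops(880_000_000, 24, 16 * 4)
print(gtx_580, radeon_6970)  # ~1581.1 vs ~2703.4
```

Despite a clock speed roughly 1.75x lower, the Radeon's far higher ALU density gives it the larger theoretical peak.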
There you have it, not from me, but from GPGPU programmers. The intuitions we carry over from the current consoles no longer apply to the next-gen ones (see the paragraph above about GPU clock speeds). That puts Nintendo in the same spot the PS3 was in this generation: no game will look superior on the Wii U if the code is not written to exploit its architecture properly.
Now, if Sony adopts GPGPU and Microsoft doesn't, what happens? Will developers support the Xbox 720 more than the Wii U and PS4, when those two together move more consoles and software than Microsoft alone? Remember that Microsoft relies on third parties; it earned that position because, at the time, its platform was the easiest and most efficient for developers to program for.