KreshnikHalili said:

I liked your GPGPU explanation much more than this one.
First of all, ESRAM is 102 GB/s for reads and 102 GB/s for writes. That means that if you could theoretically read and write simultaneously 100% of the time, you would get 204 GB/s from ESRAM alone. There is already an interview somewhere with someone from Microsoft who said that this is not possible in a real scenario and that about 160 GB/s (or something like that) is realistic. In addition, you get the 68 GB/s from DDR3, which can feed the GPU in parallel with ESRAM (you stated 63 GB/s, why?). The data move engines, which operate at 30 GB/s, are used to move data from DDR3 to ESRAM when necessary. Those engines have their own processing logic and don't use any CPU processing power to do their work. Also, don't forget that ESRAM is embedded, which means it has next to zero latency compared to normal memory.
This architecture is designed to make full use of tiled resources. If done right, you have most of the important ones cached in ESRAM.

Anyone who has done cache optimization for CPU-heavy tasks knows that complicated algorithms can get several times faster if optimized right.
The X1 ESRAM architecture is like an imitation of that, but for rendering. You already mentioned the limitations though. Since graphics data can get really big, 32 MB can be a pain in the ass. Tiled resources would ease that limitation and allow much more data to be processed through ESRAM in parallel with DDR3. The problem, though, is how far you can push tiled resources and how fast the GPU can process all the data. In the end you only have so much development time, and everything has to mesh together and work. It gets really complicated from here on out.
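
To get a feel for why 32 MB is tight, here is a rough back-of-the-envelope sketch. The 1080p resolution and the bytes-per-pixel values are my own assumptions for illustration, not figures from any spec, and the function names are made up:

```python
# Rough sketch: how many full-screen render targets fit in 32 MB of ESRAM?
# Assumption: "32 MB" is taken as 32 MiB.
ESRAM_BYTES = 32 * 1024 * 1024


def render_target_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed render target in bytes."""
    return width * height * bytes_per_pixel


def targets_that_fit(width: int, height: int, bytes_per_pixel: int,
                     budget: int = ESRAM_BYTES) -> int:
    """Number of whole render targets of this size that fit in the budget."""
    return budget // render_target_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    # Assumed example formats: 4 bytes/pixel (e.g. 8-bit RGBA)
    # and 8 bytes/pixel (e.g. 16-bit-per-channel RGBA).
    for bpp in (4, 8):
        size_mib = render_target_bytes(1920, 1080, bpp) / (1024 * 1024)
        count = targets_that_fit(1920, 1080, bpp)
        print(f"{bpp} B/px: {size_mib:.1f} MiB per target, {count} fit in ESRAM")
```

Under those assumptions only a handful of full-resolution targets fit at once, which is why keeping just the currently needed tiles resident matters so much.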

All in all, probably not the best design choice (because of the room the big ESRAM takes up), but let's see what future games can get out of it.


Not even that. The absolute bandwidth record is ~145 GB/s in one specific task (alpha blending) where reads and writes are possible at the same time.

But most GPU tasks don't allow simultaneous reads and writes, so in practice the bandwidth is much lower: maybe 80-90 GB/s is realistically achievable for the majority of tasks, which only read or only write.
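
To put those numbers side by side, here is a quick sketch of the arithmetic. The constants are just the figures quoted in this thread (nothing is measured here), the 80-90 GB/s range is my own estimate, and the function names are made up for illustration:

```python
# Figures quoted in this thread, in GB/s. None of these are measured here.
ESRAM_ONE_WAY = 102.0             # ESRAM peak for reads only or writes only
ALPHA_BLEND_BEST = 145.0          # best reported case (simultaneous read + write)
ONE_WAY_REALISTIC = (80.0, 90.0)  # estimated range for read-only or write-only tasks


def esram_theoretical_peak(one_way: float) -> float:
    """Peak if every cycle performed a read and a write at the same time."""
    return 2 * one_way


def fraction_of_peak(achieved: float, peak: float) -> float:
    """Achieved bandwidth as a fraction of the theoretical peak."""
    return achieved / peak


if __name__ == "__main__":
    peak = esram_theoretical_peak(ESRAM_ONE_WAY)
    print(f"theoretical ESRAM peak: {peak:.0f} GB/s")
    print(f"alpha blending best case: "
          f"{fraction_of_peak(ALPHA_BLEND_BEST, peak):.0%} of peak")
    for bw in ONE_WAY_REALISTIC:
        print(f"{bw:.0f} GB/s one-way: {fraction_of_peak(bw, peak):.0%} of peak")
```

So even the best reported case sits well below the 204 GB/s headline figure, and typical read-only or write-only work would sit below half of it if the estimate above holds.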

 

Really, just like last gen, Microsoft has bamboozled people with its bandwidth PR.