
Tech Talk Part 2: The mighty ESRAM: Resolutiongate

Part Two of the Tech Talk series. And this time we are talking about ESRAM!!!

First... what is eSRAM?
It stands for embedded Static RAM. "SRAM" is static memory, and the "e" means it's embedded on the processor die. Pretty straightforward. It has a peak bandwidth of 102GB/s. You see, there are three bandwidth pools in the XB1 that work independently of each other: system RAM at 68GB/s, eSRAM at 102GB/s, and lastly the Data Move Engines (DMEs) at 30GB/s, which are designed to move data between the GPU (more specifically the eSRAM) and system memory without any computational hit. Pretty ingenious actually. (Edit 1)
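To get a feel for what those numbers mean in practice, here's a back-of-the-envelope sketch in Python. It uses the peak figures quoted above; real sustained throughput is lower, and the "move 32MB" framing is my own illustration, not anything from MS:

```python
# How long does it take to move a full 32 MB (one eSRAM's worth of data)
# over each pool, at the peak rates quoted above?
POOLS_GB_S = {
    "DDR3 system RAM": 68,
    "eSRAM": 102,              # peak, per direction
    "Data Move Engines": 30,
}
PAYLOAD_MB = 32

for name, gb_s in POOLS_GB_S.items():
    # microseconds to move 32 MB at the pool's peak rate
    us = PAYLOAD_MB / 1024 / gb_s * 1e6
    print(f"{name}: {us:.0f} us")
```

At 60fps a frame lasts about 16,667 microseconds, so even the slowest pool (the DMEs at ~1,042us per 32MB) can shuttle a full eSRAM's worth of data several times per frame, at peak rates anyway.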

The nitty gritty
The XB1's eSRAM is 32MB in size. This is very, very important to remember. So while it has a ridiculous amount of bandwidth, there isn't a lot of it. That doesn't just limit what it can be used for, but also how it can be used. Used effectively, it works primarily as a cache, and used that way it boosts the XB1's memory performance to near-PS4 levels, albeit with some concessions. Let me explain.

As a cache, GPU assets that are read/written frequently would be stored in the eSRAM. If the GPU instead had to read from the slower DDR3 memory, it would slow the render pipeline down and ultimately stall the XB1. So caching the data the GPU accesses most often keeps it from waiting on the data it works on, or on where to keep said data. Done right, this can approach the PS4's memory bandwidth performance. The problem, however, is that you can only store so much in the eSRAM: 32MB. Let's get a little technical...

All 3D rendering hardware needs at least three bitmaps stored in memory. These are called the

  • Back buffer [B-buffer] - where the GPU draws the current frame
  • Front buffer [F-buffer] - where the last completed B-buffer is stored and then sent to your screen for you to see
  • Z-buffer - a map of the depth of every pixel, which makes it possible for the GPU to correctly draw triangles as needed; render targets go here too.
Now you MUST have at least the B and Z buffers in really fast memory, because DDR3 memory is not fast enough to move data around at the speeds those buffers need to operate at. The F-buffer can chill in system memory.

Think of a buffer as a picture. A 20MP picture has more data than a 5MP picture. A 1080p HDR back buffer takes up ~16MB of memory. The Z-buffer takes up another 8MB. All you have left is 8MB, and you haven't even added the more complex post-processing effects (AA, bokeh, etc.) to the B-buffer, which would make it even larger. The scary thing about some of these post-processing effects is that something like 2x MSAA (hence why FXAA is so popular) will bump the B-buffer size by around 70%!!! Make no mistake, the eSRAM exists first and foremost to host the back buffer and Z-buffer. But this is an easy problem to fix. If you are Microsoft, that is, and you shit rainbows.
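To make the arithmetic concrete, here's where those ~16MB and ~8MB figures come from. I'm assuming an FP16 RGBA back buffer (8 bytes per pixel) and a 32-bit depth buffer (4 bytes per pixel); the exact formats are my assumption for illustration, since every game picks its own:

```python
# Rough buffer-size math for one 1080p frame.
WIDTH, HEIGHT = 1920, 1080
MB = 1024 * 1024

back_buffer = WIDTH * HEIGHT * 8 / MB   # FP16 RGBA: ~15.8 MB
z_buffer    = WIDTH * HEIGHT * 4 / MB   # 32-bit depth: ~7.9 MB
esram_left  = 32 - back_buffer - z_buffer

print(f"back buffer:         {back_buffer:.1f} MB")
print(f"z-buffer:            {z_buffer:.1f} MB")
print(f"left in 32 MB eSRAM: {esram_left:.1f} MB")
```

So just the two mandatory buffers eat roughly three-quarters of the eSRAM before a single post-processing effect is applied.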

It just so happens that they do..... kinda.....

[Just get it out there / lazy-dev go-to solution] Just drop the resolution. Lower resolution, smaller back buffer; smaller back buffer, less eSRAM being used. Win win. Some devs take it a step further and even cull some stuff from the Z-buffer (that's why you may see some PS4 games having geometry assets that seem to be missing on the XB1). This gives you more room to do all the post-processing stuff that gets added to every frame. Stuff like that just can't live anywhere else but in the eSRAM.

Poor eSRAM, it gets blamed for everything. Pixels are bad too!!
A common misconception is that resolutiongate (720p vs 900p vs 1080p) is irrelevant and minuscule. Truth is, people couldn't be more wrong, especially when considering the hardware impact of resolution and its memory footprint. Remember, a buffer is an image. That image is made up of pixels, and the GPU "draws" those pixels. A picture's resolution is practically the building block of that picture. So let's put it to rest. When someone says 900p vs 1080p, it's easy to look at that and just go "herp derp y'all bitching over 180p!!!! hahahaha". If you have ever done something like that, slap yourself now and call yourself stupid. Numbers time!!

 

  • 900p = 1600 x 900 = 1,440,000 pixels
  • 1080p = 1920 x 1080 = 2,073,600 pixels
  • that is a difference of 633,600 pixels!!! NOT just a 180-pixel difference lol
  • 900p has ~31% fewer pixels than 1080p (put the other way, 1080p has ~44% more pixels than 900p)
If you look at that, it's easier to understand how dropping the resolution from 1080p on the XB1 could make a world of difference, if it means they don't have to store over 630k extra pixels' worth of data in eSRAM.
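The resolution arithmetic above, spelled out:

```python
# Pixel counts at each resolution, and the relative difference.
pixels_900p  = 1600 * 900    # 1,440,000
pixels_1080p = 1920 * 1080   # 2,073,600

diff  = pixels_1080p - pixels_900p       # absolute difference in pixels
fewer = 1 - pixels_900p / pixels_1080p   # how much smaller 900p is
more  = diff / pixels_900p               # how much bigger 1080p is

print(diff)              # 633600
print(f"{fewer:.1%}")    # 30.6%
print(f"{more:.1%}")     # 44.0%
```

Note the asymmetry: 900p is about 31% fewer pixels than 1080p, but going up from 900p to 1080p means rendering 44% more pixels.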

 

It's not all doom and gloom (aka. we know how to optimize, the "un-lazy dev approach")
Remember when I said MS kinda shits rainbows? Well, I really meant it, cause that is the only way they could pull this off. Which they did, by the way.

Microsoft is a software company. And not just any software company; they are the guys that created the DX shader language. These guys know their shit. So, first off, they were smart enough to put in the eSRAM to begin with. They would have gone for more, but then the APU die would have been too big and harder to manufacture. But hey, at least they put it in. If only they could have squeezed in 128MB... or even 64MB, then all the resolution crap would be nonexistent. Ah well...

What MS did, or at least they are hoping developers will do, is something called Tiling.

Contrary to how it sounds, tiling doesn't mean you stack images on images and can thus cram more into eSRAM. Nope, it just means you can break up images, or components of your render pipeline, so you can do more with the little eSRAM you have. Let me explain. Say you have a racing game, and the sky, the top half of your screen, is always going to be blue, or better yet, a particular image/skybox. What tiling lets you do is:

  • render just that portion of the sky to the B-buffer once
  • cut it off and send it via the DMEs to DDR3 memory
  • then access it from there, since it doesn't need to be refreshed or re-rendered every single frame. Brilliant.
That means that instead of using 16MB to hold a 1080p back buffer, you might only need, say, 10MB, leaving 22MB of eSRAM for your Z-buffer and everything else. Which in turn means you can do more, or that you won't have to drop the resolution of your game to 900p or 720p.
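A quick sketch of that tiling budget. The sky band covering 35% of the screen is a made-up figure chosen to roughly match the 16MB-to-~10MB drop described above, not a number from any real game:

```python
# Hypothetical eSRAM budget with a static sky tile parked in DDR3.
ESRAM_MB = 32
back_buffer_mb = 16     # full 1080p HDR back buffer
z_buffer_mb = 8
sky_fraction = 0.35     # assumed share of the screen that is static sky

# The sky tile lives in DDR3, so only the rest of the back buffer
# needs to sit in eSRAM.
tiled_back_buffer = back_buffer_mb * (1 - sky_fraction)
free = ESRAM_MB - tiled_back_buffer - z_buffer_mb

print(f"tiled back buffer: {tiled_back_buffer:.1f} MB")  # 10.4 MB
print(f"free eSRAM:        {free:.1f} MB")               # 13.6 MB
```

Compare that to the ~8MB left over without tiling: the freed space is what buys you post-processing headroom, or the full 1080p resolution.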

The problem is that this tiling tech wasn't ready at the start of the current gen. Even now devs are really just beginning to wrap their heads around it, because the SDK has to be on point to let devs access parts of an image, in sync, from two different pools of memory. Trust me... it's no easy feat, and it's a small miracle that MS could pull it off.

And contrary to what some may say, eSRAM has nothing to do with a game's framerate. Well, kinda. What mostly lives in eSRAM are caches, and they are flushed once used, so there is nothing there to stall the pipeline. If anything, eSRAM is there to save the framerate; without it, the GPU would spend too much time waiting on simple render operations.

Conclusion
So it's not all that bad. And truth be told, if anyone could pull off the kind of software magic it takes to get eSRAM to shuffle and manage as much data as it needs to keep up with a modern HD game with all effects turned on... it had to be Microsoft. Was eSRAM a bad idea? Yes, but a necessary evil, because MS had to go with a minimum of 8GB of memory, and at the time 8GB of GDDR5 did not exist. It's still nowhere near as bad as the Cell processor debacle though. Will we one day see every XB1 game running at the same resolution as its PS4 counterpart? Yes we will. There will still be an IQ advantage for the PS4, but at some point they would at least both be running at the same resolution. Well... at least with a lot of games.

 

edit 1 (adjusted some numbers and removed some talk about microsoft math) (Thanks kreshnik for clearing up some info)
edit 2 (proof read and corrected errors/typos and rearranged content to make it easier to read) (thanks Binary for the suggestion)




I liked your GPGPU explanation much more than this one.
First of all, ESRAM is 102 GB/s read AND write. That means that if you could theoretically do 100% of read/write calls simultaneously, you could get 204 GB/s out of ESRAM alone. There is an interview with some Microsoft guy somewhere which stated that this is not possible in a real scenario and that about 160 GB/s (or something like that) is realistic. In addition to that, you get the 68 GB/s from DDR3, which can feed the GPU in parallel with ESRAM (you stated 63 GB/s - why?). The data move engines, which operate at 30 GB/s, are used to move data from DDR3 to ESRAM (if necessary). Those engines have their own processing logic and don't use any CPU processing power to do their work. Also don't forget that ESRAM is embedded, which means it has next to zero latency compared to normal memory.
This architecture is constructed to make full use of tiled resources. If done right, you have most of the important ones cached in the ESRAM.

Anyone who has done some cache optimization for CPU-heavy tasks knows that complicated algorithms can get multiple times faster if optimized right.
The X1's ESRAM architecture is like an imitation of that, but for rendering. You already mentioned the limitations though. Since graphics data can get really big, 32MB can be a pain in the ass. Tiled resources would open up that limitation and allow much more stuff to be processed through the ESRAM in parallel with DDR3. The problem though is how far you can push tiled resources and how fast the GPU can process all the data. In the end you only have so much development time, and everything has to gear into each other and work. It gets really complicated from here on out.

All in all, probably not the best design choice (because of the die area the big ESRAM takes up), but let's see what future games can get out of it.



KreshnikHalili said:

I liked your GPGPU explanation much more than this one.
First of all, ESRAM is 102 GB/s read AND write. That means that if you could theoretically do 100% of read/write calls simultaneously, you could get 204 GB/s out of ESRAM alone. There is an interview with some Microsoft guy somewhere which stated that this is not possible in a real scenario and that about 160 GB/s (or something like that) is realistic. In addition to that, you get the 68 GB/s from DDR3, which can feed the GPU in parallel with ESRAM (you stated 63 GB/s - why?). The data move engines, which operate at 30 GB/s, are used to move data from DDR3 to ESRAM (if necessary). Those engines have their own processing logic and don't use any CPU processing power to do their work. Also don't forget that ESRAM is embedded, which means it has next to zero latency compared to normal memory.
This architecture is constructed to make full use of tiled resources. If done right, you have most of the important ones cached in the ESRAM.

Anyone who has done some cache optimization for CPU-heavy tasks knows that complicated algorithms can get multiple times faster if optimized right.
The X1's ESRAM architecture is like an imitation of that, but for rendering. You already mentioned the limitations though. Since graphics data can get really big, 32MB can be a pain in the ass. Tiled resources would open up that limitation and allow much more stuff to be processed through the ESRAM in parallel with DDR3. The problem though is how far you can push tiled resources and how fast the GPU can process all the data. In the end you only have so much development time, and everything has to gear into each other and work. It gets really complicated from here on out.

All in all, probably not the best design choice (because of the die area the big ESRAM takes up), but let's see what future games can get out of it.

I liked the GPGPU write-up more too. This one was really hard to simplify. There are a lot of conflicting stories about the 102GB/s bandwidth thing. It gets more confusing when you consider exactly how the eSRAM is built. The reason GDDR5 has such high bandwidth is its clamshell design: chips on either side of the board, dual-lane data transfer. eSRAM is not like that at all. But I can't be certain, and this isn't really about how much bandwidth it has either way. I also pointed out how fast it is. This is more about how it's used.

You are right on the money with the tiling stuff though. That's what it means to make the most of the eSRAM. And I didn't proofread this post; didn't know I put 63GB/s instead of 68GB/s lol. Thanks for clearing that up. Will make some edits.



Great write-up. Enjoyed reading both. Would be nice to see a piece on FPS: how frames are generated, what components benefit the FPS, and some shortfalls with current architectures.



Tiling is not "Microsoft shitting rainbows". Tiling in fast RAM has been used on AMD GPUs, including the Xbox 360, and in all the Adreno mobile GPUs. This tiling does add some overhead though.

The memory bandwidth is only half of the story though. A bigger problem is that with the eSRAM taking up area on the die, the XB1 had to cut compute units. The XB1 has 12 CUs compared to the PS4's 18 CUs. Other units were reduced too, like texture units and frame buffer units. Those are the units that do the actual shading, texturing, and writing of pixels.

When going from 900p to 1080p there are ~44% more pixels to shade, texture, and write, and there is no way to "shit rainbows" and make up for those missing CUs and other units.





I really love these tech talks



 

KreshnikHalili said:

I liked your GPGPU explanation much more than this one.
First of all, ESRAM is 102 GB/s read AND write. That means that if you could theoretically do 100% of read/write calls simultaneously, you could get 204 GB/s out of ESRAM alone. There is an interview with some Microsoft guy somewhere which stated that this is not possible in a real scenario and that about 160 GB/s (or something like that) is realistic. In addition to that, you get the 68 GB/s from DDR3, which can feed the GPU in parallel with ESRAM (you stated 63 GB/s - why?). The data move engines, which operate at 30 GB/s, are used to move data from DDR3 to ESRAM (if necessary). Those engines have their own processing logic and don't use any CPU processing power to do their work. Also don't forget that ESRAM is embedded, which means it has next to zero latency compared to normal memory.
This architecture is constructed to make full use of tiled resources. If done right, you have most of the important ones cached in the ESRAM.

Anyone who has done some cache optimization for CPU-heavy tasks knows that complicated algorithms can get multiple times faster if optimized right.
The X1's ESRAM architecture is like an imitation of that, but for rendering. You already mentioned the limitations though. Since graphics data can get really big, 32MB can be a pain in the ass. Tiled resources would open up that limitation and allow much more stuff to be processed through the ESRAM in parallel with DDR3. The problem though is how far you can push tiled resources and how fast the GPU can process all the data. In the end you only have so much development time, and everything has to gear into each other and work. It gets really complicated from here on out.

All in all, probably not the best design choice (because of the die area the big ESRAM takes up), but let's see what future games can get out of it.


Not even that. The absolute bandwidth record is ~145GB/s, in one specific task (alpha blending) where read and write can happen at the same time.

But most GPU tasks don't allow simultaneous reads and writes, so in practice the bandwidth is much lower; maybe 80-90 GB/s is realistically feasible with read-only or write-only access, which covers the majority of tasks.

 

Really, just like last gen, Microsoft have bamboozled people with their bandwidth PR.



ICStats said:
Tiling is not "Microsoft shitting rainbows". Tiling in fast RAM has been used on AMD GPUs, including the Xbox 360, and in all the Adreno mobile GPUs. This tiling does add some overhead though.

The memory bandwidth is only half of the story though. A bigger problem is that with the eSRAM taking up area on the die, the XB1 had to cut compute units. The XB1 has 12 CUs compared to the PS4's 18 CUs. Other units were reduced too, like texture units and frame buffer units. Those are the units that do the actual shading, texturing, and writing of pixels.

When going from 900p to 1080p there are ~44% more pixels to shade, texture, and write, and there is no way to "shit rainbows" and make up for those missing CUs and other units.

Yes, tiling adds some overhead, but it's ultimately the best way to make the XB1 perform better than it would without it.

It's all about the memory bandwidth. Every decision MS made in how the system was designed is tied to memory bandwidth. We shouldn't forget that Sony were at one point shooting for 4GB of GDDR5; even Sony didn't know they would end up with 8GB of the stuff. They practically lucked out. Why Sony ended up with an all-round more powerful APU is probably that they figured, since they would have less RAM, they should make sure their system spends less time crunching data. MS were playing it safe: 8GB of GDDR5 didn't exist at the time, and they knew lots of the system RAM was going to the OS. So it was a no-brainer for them. They had to put in 8GB of DDR3 RAM, and that automatically meant compensating with eSRAM.

What's funny is that the 360 used 512MB of unified graphics memory alongside its 10MB of embedded eDRAM, so MS was no stranger to the benefits of such an architecture. Sony just took a gamble that paid off.



Probably my favourite threads on this forum at the moment. Though please proofread and edit, because some of the grammatical errors made my eyes bleed.

It was an enjoyable read otherwise and educative. Also thanks to the initial responders for providing even more informative stuff.

What's next?



“The fundamental cause of the trouble is that in the modern world the stupid are cocksure while the intelligent are full of doubt.” - Bertrand Russell

"When the power of love overcomes the love of power, the world will know peace."

Jimi Hendrix

 

binary solo said:
Probably my favourite threads on this forum at the moment. Though please proofread and edit, because some of the grammatical errors made my eyes bleed.

It was an enjoyable read otherwise and educative. Also thanks to the initial responders for providing even more informative stuff.

What's next?

Yeah, my bad... I always have to remember to proofread. Guess that's why I am not a writer lol.

I have a ton of topics to cover; the next one should be up sometime tomorrow. Already working on it.

I also appreciate the input of the first responders. These threads are never really a "VS" or this-is-better-than-that thing. I sincerely hope they are just informative and that at least someone can learn something from them. So I above all else appreciate it when posters just correct errors I made instead of assuming those errors were made on purpose lol. Some people can be that way.

And whenever I am irrefutably corrected in the thread, I will make the necessary edits to the OP and give credit to whoever corrected it.