The PlayStation 4 uses a single pool of GDDR5 memory to achieve optimal performance. The Xbox One, on the other hand, has to use two
pools of memory to achieve optimal performance: DDR3 and ESRAM. This takes time, as the developer has to decide which parts of the
game's data will live in DDR3 and which will live in ESRAM. No wonder many developers are having trouble achieving optimal
performance on Xbox One.
Here is a Digital Foundry interview with the architects of the Xbox One. I have posted some of the main points below, but you can read the
full interview here: http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview
Digital Foundry: So you didn't want to go for a daughter die as you did with Xbox 360?
Nick Baker: No, we wanted a single processor, like I said. If there'd been a different time frame or technology options we could maybe have had a different technology there but for the product in the timeframe, ESRAM was the best choice.
Digital Foundry: If we look at the ESRAM, the Hot Chips presentation revealed for the first time that you've got four blocks of 8MB areas. How does that work?
Nick Baker: First of all, there's been some question about whether we can use ESRAM and main RAM at the same time for GPU and to point out that really you can think of the ESRAM and the DDR3 as making up eight total memory controllers, so there are four external memory controllers (which are 64-bit) which go to the DDR3 and then there are four internal memory controllers that are 256-bit that go to the ESRAM. These are all connected via a crossbar and so in fact it will be true that you can go directly, simultaneously to DRAM and ESRAM.
Digital Foundry: Simultaneously? Because there's been a lot of controversy that you're adding your bandwidth together and that you can't do this in a real-life scenario.
Nick Baker: Over that interface, each lane - to ESRAM is 256-bit making up a total of 1024 bits and that's in each direction. 1024 bits for write will give you a max of 109GB/s and then there's separate read paths again running at peak would give you 109GB/s. What is the equivalent bandwidth of the ESRAM if you were doing the same kind of accounting that you do for external memory... With DDR3 you pretty much take the number of bits on the interface, multiply by the speed and that's how you get 68GB/s. That equivalent on ESRAM would be 218GB/s. However, just like main memory, it's rare to be able to achieve that over long periods of time so typically an external memory interface you run at 70-80 per cent efficiency.
The same discussion with ESRAM as well - the 204GB/s number that was presented at Hot Chips is taking known limitations of the logic around the ESRAM into account. You can't sustain writes for absolutely every single cycle. The writes is known to insert a bubble [a dead cycle] occasionally... One out of every eight cycles is a bubble, so that's how you get the combined 204GB/s as the raw peak that we can really achieve over the ESRAM. And then if you say what can you achieve out of an application - we've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth. You can add that to the external memory and say that that probably achieves in similar conditions 50-55GB/s and add those two together you're getting in the order of 200GB/s across the main memory and internally.
One thing I should point out is that there are four 8MB lanes. But it's not a contiguous 8MB chunk of memory within each of those lanes. Each lane, that 8MB is broken down into eight modules. This should address whether you can really have read and write bandwidth in memory simultaneously. Yes you can there are actually a lot more individual blocks that comprise the whole ESRAM so you can talk to those in parallel and of course if you're hitting the same area over and over and over again, you don't get to spread out your bandwidth and so that's why one of the reasons why in real testing you get 140-150GB/s rather than the peak 204GB/s is that it's not just four chunks of 8MB memory. It's a lot more complicated than that and depending on how the pattern you get to use those simultaneously. That's what lets you do read and writes simultaneously. You do get to add the read and write bandwidth as well adding the read and write bandwidth on to the main memory. That's just one of the misconceptions we wanted to clean up.
Andrew Goossen: If you're only doing a read you're capped at 109GB/s, if you're only doing a write you're capped at 109GB/s. To get over that you need to have a mix of the reads and the writes but when you are going to look at the things that are typically in the ESRAM, such as your render targets and your depth buffers, intrinsically they have a lot of read-modified writes going on in the blends and the depth buffer updates. Those are the natural things to stick in the ESRAM and the natural things to take advantage of the concurrent read/writes.
Digital Foundry: So 140-150GB/s is a realistic target and you can integrate DDR3 bandwidth simultaneously?
Nick Baker: Yes. That's been measured.
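To make the bandwidth accounting above easier to follow, here is a rough sketch of the arithmetic in Python. The interview gives the bus widths, the one-in-eight write bubble and the 70-80 per cent efficiency range, but it does not state the clocks. The 853MHz ESRAM clock and the DDR3-2133 transfer rate below are my assumptions, chosen because they reproduce the quoted 109GB/s and 68GB/s figures. The function and variable names are made up for illustration.

```python
# Back-of-envelope check of the bandwidth figures quoted in the interview.
# ASSUMPTIONS (not stated in the interview): ESRAM runs at 853 MHz and the
# DDR3 is DDR3-2133 (2133 million transfers per second).
ESRAM_CLOCK_HZ = 853e6
DDR3_TRANSFERS_PER_S = 2133e6


def peak_gb_per_s(bus_bits, transfers_per_s):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * transfers_per_s / 1e9


# Four internal 256-bit controllers give 1024 bits in each direction.
esram_one_way = peak_gb_per_s(4 * 256, ESRAM_CLOCK_HZ)

# Naive accounting: full-rate reads plus full-rate writes.
esram_naive = 2 * esram_one_way

# Writes lose one cycle in eight to a bubble, reads do not.
esram_with_bubble = esram_one_way + esram_one_way * 7 / 8

# Four external 64-bit controllers go to the DDR3.
ddr3_peak = peak_gb_per_s(4 * 64, DDR3_TRANSFERS_PER_S)

# 70-80 per cent efficiency on the external interface.
ddr3_typical = (0.70 * ddr3_peak, 0.80 * ddr3_peak)

# Measured figures from the interview, simply added together.
combined_measured = (140 + 50, 150 + 55)

print(f"ESRAM one direction: {esram_one_way:.1f} GB/s")
print(f"ESRAM read + write:  {esram_naive:.1f} GB/s")
print(f"ESRAM with bubble:   {esram_with_bubble:.1f} GB/s")
print(f"DDR3 peak:           {ddr3_peak:.1f} GB/s")
print(f"DDR3 at 70-80%:      {ddr3_typical[0]:.1f} to {ddr3_typical[1]:.1f} GB/s")
print(f"Measured combined:   {combined_measured[0]} to {combined_measured[1]} GB/s")
```

Under these assumptions the numbers land close to the quoted ones: about 109GB/s each way, about 218GB/s naive, and a little under 205GB/s once the write bubble is counted. The measured 140-150GB/s sits well below that peak, which fits Baker's point that the access pattern decides how much of it you actually get.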
Andrew Goossen: Of course with Xbox One we're going with a design where ESRAM has the same natural extension that we had with eDRAM on Xbox 360, to have both going concurrently. It's a nice evolution of the Xbox 360 in that we could clean up a lot of the limitations that we had with the eDRAM. The Xbox 360 was the easiest console platform to develop for, it wasn't that hard for our developers to adapt to eDRAM, but there were a number of places where we said, "Gosh, it would sure be nice if an entire render target didn't have to live in eDRAM," and so we fixed that on Xbox One where we have the ability to overflow from ESRAM into DDR3 so the ESRAM is fully integrated into our page tables and so you can kind of mix and match the ESRAM and the DDR memory as you go.
Sometimes you want to get the GPU texture out of memory and on Xbox 360 that required what's called a "resolve pass" where you had to do a copy into DDR to get the texture out - that was another limitation we removed in ESRAM, as you can now texture out of ESRAM if you want to. From my perspective it's very much an evolution and improvement - a big improvement - over the design we had with the Xbox 360. I'm kind of surprised by all this, quite frankly.
Digital Foundry: Obviously though, you are limited to just 32MB of ESRAM. Potentially you could be looking at say, four 1080p render targets, 32 bits per pixel, 32 bits of depth - that's 48MB straight away. So are you saying that you can effectively separate render targets so that some live in DDR3 and the crucial high-bandwidth ones reside in ESRAM?
Andrew Goossen: Oh, absolutely. And you can even make it so that portions of your render target that have very little overdraw... For example, if you're doing a racing game and your sky has very little overdraw, you could stick those subsets of your resources into DDR to improve ESRAM utilisation. On the GPU we added some compressed render target formats like our 6e4 [six bit mantissa and four bits exponent per component] and 7e3 HDR float formats [where the 6e4 formats] that were very, very popular on Xbox 360, which instead of doing a 16-bit float per component 64bpp render target, you can do the equivalent with us using 32 bits - so we did a lot of focus on really maximizing efficiency and utilisation of that ESRAM.
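To put the 32MB limit in perspective, here is a small Python sketch of the render target arithmetic. It only counts width x height x bits per pixel at 1920x1080, with no padding, alignment or extra buffers, so treat it as a lower bound. The names are mine, not anything from the Xbox One SDK.

```python
# Rough render target footprint at 1080p, ignoring padding and alignment.
# ASSUMPTION: "32MB of ESRAM" is treated as 32 * 1024 * 1024 bytes.
MIB = 1024 * 1024
ESRAM_BYTES = 32 * MIB
WIDTH, HEIGHT = 1920, 1080


def target_bytes(width, height, bits_per_pixel):
    """Size in bytes of one uncompressed render target."""
    return width * height * bits_per_pixel // 8


colour = target_bytes(WIDTH, HEIGHT, 32)  # one 32bpp colour target
depth = target_bytes(WIDTH, HEIGHT, 32)   # one 32-bit depth buffer
four_colour = 4 * colour
total = four_colour + depth

# HDR target: 16-bit float per component (64bpp) versus a packed 32-bit format.
hdr_64bpp = target_bytes(WIDTH, HEIGHT, 64)
hdr_32bpp = target_bytes(WIDTH, HEIGHT, 32)

print(f"One 32bpp target:       {colour / MIB:.2f} MiB")
print(f"Four colour targets:    {four_colour / MIB:.2f} MiB")
print(f"Four colour plus depth: {total / MIB:.2f} MiB")
print(f"Fits in ESRAM:          {total <= ESRAM_BYTES}")
print(f"64bpp HDR target:       {hdr_64bpp / MIB:.2f} MiB")
print(f"32bpp HDR target:       {hdr_32bpp / MIB:.2f} MiB")
```

This simple count comes out lower than the 48MB mentioned in the question, which may include buffers or overhead the sketch does not model. Either way the set does not fit in 32MB once depth is included, which is why splitting targets between ESRAM and DDR3 matters. The last two lines show why the packed 32-bit HDR formats help: they halve the footprint of a 64bpp target.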










