
sebbbi (Xbox One programmer) on B3D lays out effective use of ESRAM (technical)

Madword said:
Getting the impression that this is the new secret sauce.





fallen said:
starworld said:
fallen said:

Thought this was fun though technical. Seems proof Xone can be a beast.

 

First MJP, who works at Ready at Dawn (The Order 1886 guys), had this to say, seemingly painting a difficult picture of working with the ESRAM, specifically fitting everything in 32 megabytes (though he admits he has never worked with Xbox One, only PS4):

 

Deferred renderers are *very* common for next-gen titles, especially those in development. With the major middleware providers all using deferred rendering, games using forward rendering are very likely to be the minority from this point on (even considering games using Forward+/Tiled Forward/whatever you want to call it).

Back when we were in the prototype stage we were using a deferred renderer, with a tiled compute-based approach similar to what Frostbite uses. At the time we had a G-Buffer setup like this:

Lighting target: RGBA16f
Normals: RG16
Diffuse albedo + BRDF ID: RGBA8
Specular albedo + roughness: RGBA8
Tangents: RG16
Depth: D32

So if you're looking to target 1920x1080 with that setup, then you're talking about (8 + 4 + 4 + 4 + 4 + 4) * 1920 * 1080 = 55.3MB. On top of that we supported 16 shadow-casting lights which required 16 1024x1024 shadow maps in an array, plus 4 2048x2048 cascades for a directional light. That gives you 64MB of shadow maps + another 64MB of cascade maps, which you'll want to be reading from at the same time you're reading from your G-Buffers. Obviously some of these numbers are pretty extreme (we were still prototyping) and you could certainly reduce that a lot, but I wanted to give an idea of the upper bound on what an engine might want to be putting in ESRAM for their main render pass.

However even without the shadows it doesn't really bode well for fitting all of your G-Buffers in 32MB at 1080p. Which means either decreasing resolution, or making some tough choices about which render targets (or which portions of render targets, if using tiled rendering) should live in ESRAM. Any kind of MSAA at 1080p also seems like a no-go for fitting in ESRAM, even for forward rendering. Just having a RGBA16f target + D32 depth buffer at 2xMSAA requires around 47.5MB at 1920x1080.
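For anyone who wants to sanity-check those numbers, here is a rough back-of-the-envelope sketch in C (my own illustration, not from MJP's post), assuming the usual bytes per pixel for those formats and MB meaning 1024*1024 bytes:

#include <stdio.h>

int main(void) {
    const double MB = 1024.0 * 1024.0;
    const double pixels = 1920.0 * 1080.0;

    /* Bytes per pixel for the prototype layout quoted above:
       RGBA16f (8) + RG16 (4) + RGBA8 (4) + RGBA8 (4) + RG16 (4) + D32 (4) */
    double gbuffer = (8 + 4 + 4 + 4 + 4 + 4) * pixels / MB;

    /* 16 x 1024x1024 shadow maps and 4 x 2048x2048 cascades, 32 bits per texel */
    double shadows  = 16.0 * 1024 * 1024 * 4 / MB;
    double cascades =  4.0 * 2048 * 2048 * 4 / MB;

    /* RGBA16f target + D32 depth at 2xMSAA, 1080p */
    double msaa2x = (8 + 4) * 2 * pixels / MB;

    printf("G-buffers: %.1f MB\n", gbuffer);  /* ~55.4 MB, MJP's "55.3MB" */
    printf("Shadows:   %.1f MB\n", shadows);  /* 64 MB */
    printf("Cascades:  %.1f MB\n", cascades); /* 64 MB */
    printf("2xMSAA:    %.1f MB\n", msaa2x);   /* ~47.5 MB */
    return 0;
}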

 

Then sebbbi, who works on the Trials series, responds:

MJP's g-buffer layout is actually only two RTs in the g-buffer rendering stage and one RT in the lighting stage. And a depth buffer, of course. Quite normal stuff.

On GCN you want to pack your data to 64 bpp (4 x 16 bit integer) render targets because that doubles your fill rate compared to using more traditional 32 bpp RTs (GCN can do 64 bit filling at the same ROP rate as 32 bit filling).

I assume that the packing is like this:
Gbuffer1 = normals + tangents (64 bit)
Gbuffer2 = diffuse + brdf + specular + roughness (64 bits)
Depth buffer (32 bits)

Without any modifications this takes 40 megabytes of memory (1080p).
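(A quick check of that figure, as my own illustration rather than anything from the post: two 64-bit render targets plus a 32-bit depth buffer is 20 bytes per pixel.)

#include <stdio.h>

/* Footprint of the packed layout above at 1080p:
   Gbuffer1 (64-bit) + Gbuffer2 (64-bit) + depth (32-bit) = 20 bytes/pixel */
int main(void) {
    double mb = (8 + 8 + 4) * 1920.0 * 1080.0 / (1024.0 * 1024.0);
    printf("packed layout: %.1f MB\n", mb); /* ~39.6 MB, i.e. the ~40 MB figure */
    return 0;
}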

The lighting step doesn't need an extra 8 MB for the 4x16f RT, because a compute shader can simultaneously read and write to the same resource, allowing you to do lighting "in-place", writing the output over the existing g-buffer. This is also very cache friendly since the read pulls the cache lines to L1 and the write thus never misses L1 (GCN has fully featured read & write caches).
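To make the "in-place" idea concrete, here is a conceptual CPU-side analogue in C (my own sketch, not actual GPU code; GBufferPixel and shade_pixel are made-up names): the lighting pass reads a pixel's packed G-buffer data and overwrites the same memory with the lit result, so no separate lighting target has to be allocated.

#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t gbuffer1;   /* packed normals + tangents */
    uint64_t gbuffer2;   /* packed diffuse + BRDF id + specular + roughness */
} GBufferPixel;

/* Placeholder lighting function: consumes the packed data and returns a lit
   colour packed into 64 bits (e.g. RGBA16f). Real shading code would go here. */
static uint64_t shade_pixel(uint64_t g1, uint64_t g2) {
    return g1 ^ g2; /* dummy result, just so the sketch compiles */
}

void light_in_place(GBufferPixel *pixels, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        uint64_t lit = shade_pixel(pixels[i].gbuffer1, pixels[i].gbuffer2);
        /* write the output over the existing g-buffer -- "in-place" lighting */
        pixels[i].gbuffer1 = lit;
    }
}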

It's also trivial to get this layout down to 32 MB from the 40 MB. Replace gbuffer1 with a 32 bit RT (32 MB target reached at 1080p). Store the normal as 11+11 bits using Lambert azimuthal equal-area projection. You can't see any quality difference. 5+5 bits for tangents is enough (4 bits for exponent = mip level + 1 bit mantissa). 11+11+5+5=32. Also if you only use the tangents for shadow mapping / other planar projections, you don't need them at all, since you can analytically calculate the derivatives from the stored normal vector.
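Here is a rough C sketch of what that 32-bit packing could look like. This is my own illustration, not sebbbi's actual code: the spheremap-style equal-area encode assumes view-space normals (so the n.z = -1 singularity is never hit), and the two 5-bit tangent terms are just quantized generically rather than using his exponent+mantissa scheme.

#include <stdint.h>
#include <math.h>

/* Lambert azimuthal equal-area (spheremap) encode of a unit normal into
   two values in [0,1]. Assumes view-space normals. */
static void encode_normal(float nx, float ny, float nz, float *u, float *v) {
    float f = sqrtf(8.0f * nz + 8.0f);
    *u = nx / f + 0.5f;
    *v = ny / f + 0.5f;
}

/* Quantize a [0,1] value to the given number of bits. */
static uint32_t quantize(float x, uint32_t bits) {
    float maxv = (float)((1u << bits) - 1u);
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;
    return (uint32_t)(x * maxv + 0.5f);
}

/* Pack the normal (11+11 bits) and two 5-bit tangent terms into one 32-bit word. */
uint32_t pack_gbuffer1(float nx, float ny, float nz, float t0, float t1) {
    float u, v;
    encode_normal(nx, ny, nz, &u, &v);
    uint32_t packed = 0;
    packed |= quantize(u,  11) << 21;   /* bits 31..21 */
    packed |= quantize(v,  11) << 10;   /* bits 20..10 */
    packed |= quantize(t0,  5) << 5;    /* bits  9..5  */
    packed |= quantize(t1,  5);         /* bits  4..0  */
    return packed;
}

Decoding would simply invert the quantization and the spheremap transform in the lighting pass.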

This layout is highly efficient for both g-buffer rendering and lighting. And of course also for post processing since all your heavy data fits in the fast memory. Shadow maps obviously need to be sampled from main memory during the lighting, but this is actually a great idea since the lighting pass wouldn't otherwise use any main memory BW at all (it would be completely unused = wasted).

A good exchange, and an indication to me that the ESRAM can do nice things if worked with. It seems like MJP thought there would be memory issues with 32MB of ESRAM at 1080p, but sebbbi shows that with clever optimization all of MJP's example can be nicely packed into 32MB.

LINK please.

http://forum.beyond3d.com/showthread.php?t=61416&page=8

the posts are on that page


Well, this is good news for X1 owners if 1080p games start becoming the norm, but it seems developers haven't mastered this technique yet, or maybe the GPU difference is so huge it doesn't matter?



Esram is a bottleneck and does make things harder, but it's not the only thing limiting XBONE games.

If it was just a problem fitting the frame buffer in RAM, the frame rate would be all but constant. Take Ryse. It was a launch title so they could only fit a 900p buffer, but with bigger esram the game would be 1080p? Would the frame rate magically go up too? Ryse had dips to 20 fps at 900p so even with bigger ESRAM the game would be like 15 fps at 1080p.
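(For what it's worth, the rough arithmetic behind that estimate, assuming frame time scales linearly with pixels shaded, purely as an illustration:)

#include <stdio.h>

/* Illustrative only: if frame time scaled linearly with pixels shaded,
   a 20 fps dip at 900p would become roughly this at 1080p. */
int main(void) {
    double px_900p  = 1600.0 * 900.0;     /* 1.44 Mpix */
    double px_1080p = 1920.0 * 1080.0;    /* 2.07 Mpix */
    printf("%.1f fps\n", 20.0 * px_900p / px_1080p); /* ~13.9 fps */
    return 0;
}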

People believing ESRAM is the only reason games are sub HD, can you counter my points?



I told you a better SDK is going to help reach better resolutions. Developers need better resource optimization and to use the hardware properly; if the hardware components of the Xbox One are used properly, it can easily output 1080p at 60 fps. It will get better with time and better DirectX. Microsoft also said a low-level access API will be coming, which makes for higher framerates, and proper resource optimization leads to 1080p.



NobleTeam360 said:
*Reads article not understanding anything they said*

Thank God I'm not the only one





Or just use GDDR5 normally and have it still be faster...



Kratos said:
Madword said:
Getting the impression that this is the new secret sauce.






LemonSlice said:
Kratos said:
Madword said:
Getting the impression that this is the new secret sauce.




Those are the not-so-secret sauces - the 'in your face sauce' and the 'giant sauce in a suit.'



I've said it before and I'll say it again. ESRAM can be worked around and some unique benefits can be drawn from it, but the Xbone GPU is a good deal weaker than the PS4's. That will always be the case.



JoeTheBro said:
Esram is a bottleneck and does make things harder, but it's not the only thing limiting XBONE games.

If it was just a problem fitting the frame buffer in RAM, the frame rate would be all but constant. Take Ryse. It was a launch title so they could only fit a 900p buffer, but with bigger esram the game would be 1080p? Would the frame rate magically go up too? Ryse had dips to 20 fps at 900p so even with bigger ESRAM the game would be like 15 fps at 1080p.

People believing ESRAM is the only reason games are sub HD, can you counter my points?


You obviously know a lot more than I do about coding and whatnot, but the Ryse thing is just that it's a launch title. Launch titles always stink (compared to 2nd-gen releases). Just look at a game like Thief: it runs horribly on PS4 and doesn't look that great, but that has nothing to do with the power of the system, just poor code. Not saying the X1 will suddenly start having 1080p titles, but frame dips and things like that are usually due to poor coding or lack of time (launch titles). More RAM on PS4 wouldn't have made Thief run better either, right?