
sebbbi (Xbox One programmer) on B3D lays out effective use of ESRAM (technical)

Thought this was fun though technical. Seems proof Xone can be a beast.

 

First, MJP, who works at Ready at Dawn (The Order: 1886 guys), had this to say, seemingly painting a difficult picture of working with the ESRAM, specifically fitting everything in 32 megabytes (though he admits he has never worked with Xbox One, only PS4):

 

Deferred renderers are *very* common for next-gen titles, especially those in development. With the major middleware providers all using deferred rendering, games using forward rendering are very likely to be the minority from this point on (even considering games using Forward+/Tiled Forward/whatever you want to call it).

Back when we were in the prototype stage we were using a deferred renderer, with a tiled compute-based approach similar to what Frostbite uses. At the time we had a G-Buffer setup like this:

Lighting target: RGBA16f
Normals: RG16
Diffuse albedo + BRDF ID: RGBA8
Specular albedo + roughness: RGBA8
Tangents: RG16
Depth: D32

So if you're looking to target 1920x1080 with that setup, then you're talking about (8 + 4 + 4 + 4 + 4 + 4) * 1920 * 1080 = 55.3MB. On top of that we supported 16 shadow-casting lights which required 16 1024x1024 shadow maps in an array, plus 4 2048x2048 cascades for a directional light. That gives you 64MB of shadow maps + another 64MB of cascade maps, which you'll want to be reading from at the same time you're reading from your G-Buffers. Obviously some of these numbers are pretty extreme (we were still prototyping) and you could certainly reduce that a lot, but I wanted to give an idea of the upper bound on what an engine might want to be putting in ESRAM for their main render pass. However even without the shadows it doesn't really bode well for fitting all of your G-Buffers in 32MB at 1080p. Which means either decreasing resolution, or making some tough choices about which render targets (or which portions of render targets, if using tiled rendering) should live in ESRAM. Any kind of MSAA at 1080p also seems like a no-go for fitting in ESRAM, even for forward rendering. Just having a RGBA16f target + D32 depth buffer at 2xMSAA requires around 47.5MB at 1920x1080.
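For anyone who wants to verify the arithmetic, here is a minimal Python sketch of the same back-of-the-envelope sums. It assumes tightly packed render targets with no tiling or alignment padding (which real hardware adds), so it only reproduces the rough figures MJP quotes.

```python
# Rough check of MJP's figures; assumes no padding/alignment overhead.
W, H = 1920, 1080
MB = 1024 * 1024

# Bytes per pixel for each target in the prototype G-buffer layout.
gbuffer_bpp = {
    "lighting  RGBA16f": 8,
    "normals   RG16":    4,
    "diffuse   RGBA8":   4,
    "specular  RGBA8":   4,
    "tangents  RG16":    4,
    "depth     D32":     4,
}
gbuffer_bytes = sum(gbuffer_bpp.values()) * W * H
print(f"G-buffer @1080p: {gbuffer_bytes / MB:.1f} MB")        # ~55 MB

shadow_bytes  = 16 * 1024 * 1024 * 4                          # 16 D32 1024x1024 shadow maps
cascade_bytes = 4 * 2048 * 2048 * 4                           # 4 D32 2048x2048 cascades
print(f"Shadow maps: {shadow_bytes / MB:.0f} MB, cascades: {cascade_bytes / MB:.0f} MB")  # 64 + 64

# Forward-rendering case: RGBA16f colour + D32 depth, both at 2x MSAA.
msaa_bytes = (8 + 4) * 2 * W * H
print(f"RGBA16f + D32 @2xMSAA: {msaa_bytes / MB:.1f} MB")     # ~47.5 MB
```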

 

Then sebbbi, who works on the Trials HD series, responds:

MJP's g-buffer layout is actually only two RTs in the g-buffer rendering stage and one RT in the lighting stage. And a depth buffer of course. Quite normal stuff.

On GCN you want to pack your data to 64 bpp (4 x 16 bit integer) render targets because that doubles your fill rate compared to using more traditional 32 bpp RTs (GCN can do 64 bit filling at same ROP rate as 32 bit filling).

I assume that the packing is like this:
Gbuffer1 = normals + tangents (64 bit)
Gbuffer2 = diffuse + brdf + specular + roughness (64 bits)
Depth buffer (32 bits)

Without any modifications this takes 40 megabytes of memory (1080p).
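Again, a quick Python sanity check of that figure, with the same no-padding assumption as above:

```python
# sebbbi's packed layout at 1080p: two 64-bit RTs plus a 32-bit depth buffer.
W, H = 1920, 1080
MB = 1024 * 1024
packed_bytes = (8 + 8 + 4) * W * H    # Gbuffer1 (64b) + Gbuffer2 (64b) + D32
print(f"Packed layout @1080p: {packed_bytes / MB:.1f} MB")   # ~39.6 MB, i.e. the ~40 MB sebbbi cites
```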

The lighting step doesn't need an extra 8 MB for the 4x16f RT, because a compute shader can simultaneously read and write to the same resource, allowing you to do lighting "in-place", writing the output over the existing g-buffer. This is also very cache friendly since the read pulls the cache lines into L1 and the write thus never misses L1 (GCN has fully featured read & write caches).
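To make the "in-place" idea concrete, here is a small CPU-side NumPy sketch. It is purely illustrative: the buffer names and the toy N·L lighting model are invented, and on the console this would be a GCN compute shader reading and writing the same resource rather than Python code.

```python
import numpy as np

W, H = 1920, 1080
# Hypothetical g-buffer contents: diffuse.rgb + roughness, plus a normal buffer.
gbuffer2 = np.zeros((H, W, 4), dtype=np.float16)
normals  = np.zeros((H, W, 3), dtype=np.float16)

def light_in_place(gbuffer2, normals, light_dir):
    # Read the g-buffer...
    albedo = gbuffer2[..., :3]
    n_dot_l = np.clip((normals * light_dir).sum(axis=-1, keepdims=True), 0.0, 1.0)
    # ...and write the lit colour straight back over it, so no separate
    # RGBA16f lighting target has to be allocated in the fast memory.
    gbuffer2[..., :3] = albedo * n_dot_l

light_in_place(gbuffer2, normals, np.array([0.0, 1.0, 0.0], dtype=np.float16))
```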

It's also trivial to get this layout down to 32 MB from the 40 MB. Replace gbuffer1 with a 32 bit RT (32 MB target reached at 1080p). Store the normal as 11+11 bits using a Lambert azimuthal equal-area projection. You can't see any quality difference. 5+5 bits for the tangents is enough (4 bits for exponent = mip level + 1 bit mantissa). 11+11+5+5=32. Also, if you only use the tangents for shadow mapping / other planar projections, you don't need them at all, since you can analytically calculate the derivatives from the stored normal vector.
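For the curious, here is a small Python sketch of that kind of normal packing. It uses the well-known Lambert azimuthal equal-area (spheremap-style) encoding from the "compact normal storage" family; sebbbi's post doesn't spell out the exact variant or the 5+5-bit tangent packing, so treat the details below as an assumption rather than his actual code.

```python
import numpy as np

def encode_normal_laea(n):
    """Lambert azimuthal equal-area encode of a unit normal to two values in [0, 1]."""
    f = np.sqrt(8.0 * n[2] + 8.0)          # only degenerate at n = (0, 0, -1)
    return n[:2] / f + 0.5

def decode_normal_laea(enc):
    fenc = enc * 4.0 - 2.0
    f = np.dot(fenc, fenc)
    g = np.sqrt(1.0 - f / 4.0)
    return np.array([fenc[0] * g, fenc[1] * g, 1.0 - f / 2.0])

def pack_11_11(enc):
    """Quantise the two encoded channels to 11 bits each (22 of the 32 bits,
    leaving 10 bits for e.g. the 5+5-bit tangent data sebbbi describes)."""
    x = int(round(enc[0] * 2047.0)) & 0x7FF
    y = int(round(enc[1] * 2047.0)) & 0x7FF
    return (y << 11) | x

# Round-trip test on an arbitrary unit normal.
n = np.array([0.3, -0.5, 0.81])
n /= np.linalg.norm(n)
packed = pack_11_11(encode_normal_laea(n))
enc = np.array([(packed & 0x7FF) / 2047.0, ((packed >> 11) & 0x7FF) / 2047.0])
print(n, decode_normal_laea(enc))          # matches to roughly 3 decimal places
```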

This layout is highly efficient for both g-buffer rendering and lighting. And of course also for post processing, since all your heavy data fits in the fast memory. Shadow maps obviously need to be sampled from main memory during the lighting, but this is actually a great idea since the lighting pass wouldn't otherwise use any main memory BW at all (it would be completely unused = wasted).

A good exchange and an indication to me that the ESRAM can do nice things if worked with. It seems like MJP thought there would be memory issues with 32MB of ESRAM at 1080p, but sebbbi shows that with clever optimization all of MJP's example data can be nicely packed into 32MB.




Getting the impression that this is the new secret sauce.




fallen said:

Thought this was fun though technical. Seems proof Xone can be a beast. […]

LINK please.



Madword said:
Getting the impression that this is the new secret sauce.


No, this is called resource optimisation.



Madword said:
Getting the impression that this is the new secret sauce.


Well, devs will have to master it if they want 1080p on Xbone. It may take some time. I for one am very interested to see how the big annual franchises (AC, COD, BF) compare in, say, 3 years' time. I suspect they'll all be 1080p on Xbox.

 

Yeah, you could say it's the closest thing to secret sauce Xbox has.



Madword said:
Getting the impression that this is the new secret sauce.


OT: Good news for the use of ESRAM then.




This must be what that 1080p SDK helps with. I will be interested in seeing if that isn't just talk.



*Reads article not understanding anything they said*



starworld said:
fallen said:

Thought this was fun though technical. Seems proof Xone can be a beast. […]

LINK please.

http://forum.beyond3d.com/showthread.php?t=61416&page=8

The posts are on that page.



EpicRandy said:
Madword said:
Getting the impression that this is the new secret sauce.


No, this is called resource optimisation.

I should have clarified: I mean that from some posts I have already read on internet forums, people are treating this like it's some holy grail that will make the console super powerful.

Resource optimisations will always happen, and games will improve over time... but it's not going to make a weak machine more powerful.


