
sebbbi (Xbox One programmer) on B3D lays out effective use of ESRAM (technical)

Thought this was fun though technical. Seems proof Xone can be a beast.

 

First, MJP, who works at Ready at Dawn (The Order: 1886 guys), had this to say, seemingly painting a difficult picture of working with the ESRAM, specifically fitting everything in 32 megabytes (though he admits he has never worked with Xbox One, only PS4):

 

Deferred renderers are *very* common for next-gen titles, especially those in development. With the major middleware providers all using deferred rendering, games using forward rendering are very likely to be the minority from this point on (even considering games using Forward+/Tiled Forward/whatever you want to call it).

Back when we were in the prototype stage we were using a deferred renderer, with a tiled compute-based approach similar to what Frostbite uses. At the time we had a G-Buffer setup like this:

Lighting target: RGBA16f
Normals: RG16
Diffuse albedo + BRDF ID: RGBA8
Specular albedo + roughness: RGBA8
Tangents: RG16
Depth: D32

So if you're looking to target 1920x1080 with that setup, then you're talking about (8 + 4 + 4 + 4 + 4 + 4) * 1920 * 1080 = 55.3MB. On top of that we supported 16 shadow-casting lights which required 16 1024x1024 shadow maps in an array, plus 4 2048x2048 cascades for a directional light. That gives you 64MB of shadow maps + another 64MB of cascade maps, which you'll want to be reading from at the same time you're reading from your G-Buffers. Obviously some of these numbers are pretty extreme (we were still prototyping) and you could certainly reduce that a lot, but I wanted to give an idea of the upper bound on what an engine might want to be putting in ESRAM for their main render pass. However even without the shadows it doesn't really bode well for fitting all of your G-Buffers in 32MB at 1080p. Which means either decreasing resolution, or making some tough choices about which render targets (or which portions of render targets, if using tiled rendering) should live in ESRAM. Any kind of MSAA at 1080p also seems like a no-go for fitting in ESRAM, even for forward rendering. Just having a RGBA16f target + D32 depth buffer at 2xMSAA requires around 47.5MB at 1920x1080.
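For anyone who wants to verify the arithmetic, here is a minimal Python sketch of the same back-of-the-envelope sums. It assumes tightly packed render targets with no tiling or alignment padding (which real hardware adds), so it only reproduces the rough figures MJP quotes.

```python
# Rough check of MJP's figures; assumes no padding/alignment overhead.
W, H = 1920, 1080
MB = 1024 * 1024

# Bytes per pixel for each target in the prototype G-buffer layout.
gbuffer_bpp = {
    "lighting  RGBA16f": 8,
    "normals   RG16":    4,
    "diffuse   RGBA8":   4,
    "specular  RGBA8":   4,
    "tangents  RG16":    4,
    "depth     D32":     4,
}
gbuffer_bytes = sum(gbuffer_bpp.values()) * W * H
print(f"G-buffer @1080p: {gbuffer_bytes / MB:.1f} MB")        # ~55 MB

shadow_bytes  = 16 * 1024 * 1024 * 4                          # 16 D32 1024x1024 shadow maps
cascade_bytes = 4 * 2048 * 2048 * 4                           # 4 D32 2048x2048 cascades
print(f"Shadow maps: {shadow_bytes / MB:.0f} MB, cascades: {cascade_bytes / MB:.0f} MB")  # 64 + 64

# Forward-rendering case: RGBA16f colour + D32 depth, both at 2x MSAA.
msaa_bytes = (8 + 4) * 2 * W * H
print(f"RGBA16f + D32 @2xMSAA: {msaa_bytes / MB:.1f} MB")     # ~47.5 MB
```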

 

Then sebbbi, who works on the Trials HD series, responds:

MJP's g-buffer layout is actually only two RTs in the g-buffer rendering stage and one RT in the lighting stage. And a depth buffer of course. Quite normal stuff.

On GCN you want to pack your data to 64 bpp (4 x 16 bit integer) render targets because that doubles your fill rate compared to using more traditional 32 bpp RTs (GCN can do 64 bit filling at same ROP rate as 32 bit filling).

I assume that the packing is like this:
Gbuffer1 = normals + tangents (64 bit)
Gbuffer2 = diffuse + brdf + specular + roughness (64 bits)
Depth buffer (32 bits)

Without any modifications this takes 40 megabytes of memory (1080p).
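Again, a quick Python sanity check of that figure, with the same no-padding assumption as above:

```python
# sebbbi's packed layout at 1080p: two 64-bit RTs plus a 32-bit depth buffer.
W, H = 1920, 1080
MB = 1024 * 1024
packed_bytes = (8 + 8 + 4) * W * H    # Gbuffer1 (64b) + Gbuffer2 (64b) + D32
print(f"Packed layout @1080p: {packed_bytes / MB:.1f} MB")   # ~39.6 MB, i.e. the ~40 MB sebbbi cites
```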

The lighting step doesn't need an extra 8 MB for the 4x16f RT, because a compute shader can simultaneously read and write to the same resource, allowing you to do lighting "in-place", writing the output over the existing g-buffer. This is also very cache friendly since the read pulls the cache lines into L1 and the write thus never misses L1 (GCN has fully featured read & write caches).
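To make the "in-place" idea concrete, here is a small CPU-side NumPy sketch. It is purely illustrative: the buffer names and the toy N·L lighting model are invented, and on the console this would be a GCN compute shader reading and writing the same resource rather than Python code.

```python
import numpy as np

W, H = 1920, 1080
# Hypothetical g-buffer contents: diffuse.rgb + roughness, plus a normal buffer.
gbuffer2 = np.zeros((H, W, 4), dtype=np.float16)
normals  = np.zeros((H, W, 3), dtype=np.float16)

def light_in_place(gbuffer2, normals, light_dir):
    # Read the g-buffer...
    albedo = gbuffer2[..., :3]
    n_dot_l = np.clip((normals * light_dir).sum(axis=-1, keepdims=True), 0.0, 1.0)
    # ...and write the lit colour straight back over it, so no separate
    # RGBA16f lighting target has to be allocated in the fast memory.
    gbuffer2[..., :3] = albedo * n_dot_l

light_in_place(gbuffer2, normals, np.array([0.0, 1.0, 0.0], dtype=np.float16))
```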

It's also trivial to get this layout down to 32 MB from the 40 MB. Replace gbuffer1 with a 32 bit RT (32 MB target reached at 1080p). Store the normal as 11+11 bits using a Lambert azimuthal equal-area projection. You can't see any quality difference. 5+5 bits for the tangents is enough (4 bits for exponent = mip level + 1 bit mantissa). 11+11+5+5=32. Also, if you only use the tangents for shadow mapping / other planar projections, you don't need them at all, since you can analytically calculate the derivatives from the stored normal vector.
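For the curious, here is a small Python sketch of that kind of normal packing. It uses the well-known Lambert azimuthal equal-area (spheremap-style) encoding from the "compact normal storage" family; sebbbi's post doesn't spell out the exact variant or the 5+5-bit tangent packing, so treat the details below as an assumption rather than his actual code.

```python
import numpy as np

def encode_normal_laea(n):
    """Lambert azimuthal equal-area encode of a unit normal to two values in [0, 1]."""
    f = np.sqrt(8.0 * n[2] + 8.0)          # only degenerate at n = (0, 0, -1)
    return n[:2] / f + 0.5

def decode_normal_laea(enc):
    fenc = enc * 4.0 - 2.0
    f = np.dot(fenc, fenc)
    g = np.sqrt(1.0 - f / 4.0)
    return np.array([fenc[0] * g, fenc[1] * g, 1.0 - f / 2.0])

def pack_11_11(enc):
    """Quantise the two encoded channels to 11 bits each (22 of the 32 bits,
    leaving 10 bits for e.g. the 5+5-bit tangent data sebbbi describes)."""
    x = int(round(enc[0] * 2047.0)) & 0x7FF
    y = int(round(enc[1] * 2047.0)) & 0x7FF
    return (y << 11) | x

# Round-trip test on an arbitrary unit normal.
n = np.array([0.3, -0.5, 0.81])
n /= np.linalg.norm(n)
packed = pack_11_11(encode_normal_laea(n))
enc = np.array([(packed & 0x7FF) / 2047.0, ((packed >> 11) & 0x7FF) / 2047.0])
print(n, decode_normal_laea(enc))          # matches to roughly 3 decimal places
```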

This layout is highly efficient for both g-buffer rendering and lighting. And of course also for post processing, since all your heavy data fits in the fast memory. Shadow maps obviously need to be sampled from main memory during the lighting, but this is actually a great idea since the lighting pass wouldn't otherwise use any main memory BW at all (it would be completely unused = wasted).

A good exchange and an indication to me that the ESRAM can do nice things if worked with. It seems like MJP thought there would be memory issues with 32MB of ESRAM at 1080p, but sebbbi shows that with clever optimization all of MJP's example data can be nicely packed into 32MB.




Getting the impression that this is the new secret sauce.




fallen said:

Thought this was fun though technical. Seems proof Xone can be a beast. […]

LINK please.



Madword said:
Getting the impression that this is the new secret sauce.


No, this is called resource optimisation.



Madword said:
Getting the impression that this is the new secret sauce.


Well, devs will have to master it if they want 1080p on Xbone. It may take some time. I for one am very interested to see how the big annual franchises (AC, COD, BF) compare in, say, 3 years' time. I suspect they'll all be 1080p on Xbox.

 

Yeah, you could say it's the closest thing to secret sauce Xbox has.



Madword said:
Getting the impression that this is the new secret sauce.


OT: Good news for the use of ESRAM then.




This must be what that 1080p SDK helps with. I will be interested in seeing if that isn't just talk.



*Reads article not understanding anything they said*



starworld said:
fallen said:

Thought this was fun though technical. Seems proof Xone can be a beast. […]

LINK please.

http://forum.beyond3d.com/showthread.php?t=61416&page=8

The posts are on that page.



EpicRandy said:
Madword said:
Getting the impression that this is the new secret sauce.


No, this is called resource optimisation.

I should have clarified: I mean that from some posts I have already read on internet forums, people are treating this like it's some holy grail that will make the console super powerful.

Resource optimisations will always happen, and games will improve over time... but it's not going to make a weak machine more powerful.


