
Shinen is using triple buffering for the gbuffer on Fast Racing Neo, bandwidth is not a problem

Nintendo should buy these guys. They understand how to push hardware and work around its problems, just like Rare did with DKC. And they work really cheap, and that in a high-wage country like Germany.
And they could expand easily, because they could grab the most talented German guys (there's no big competition in Germany in the gaming biz, no big studios (soon)).



generic-user-1 said:
Nintendo should buy these guys. They understand how to push hardware and work around its problems, just like Rare did with DKC. And they work really cheap, and that in a high-wage country like Germany.
And they could expand easily, because they could grab the most talented German guys (there's no big competition in Germany in the gaming biz, no big studios (soon)).


Shin'en and their games are far too small for Nintendo to be interested in buying them, even if they were willing to sell, which I doubt; they seem like a studio doing it for passion more than anything, and I doubt they would last long in a corporate structure. If Nintendo wanted to start building a new dev studio they could do it without buying anyone. But they don't seem interested in that either.

As for Germany, there is actually a pretty big development scene there, including Crytek's main studio with 400+ employees, Blue Byte (Ubisoft studio, develops the Anno series), King.com (makers of Candy Crush, they have 600 employees), Zynga Germany, Yager Development (Spec Ops: The Line, Dead Island 2), Deck 13 (Lords of the Fallen), and Limbic Entertainment (Might & Magic games). Plus a bunch of smaller studios like Black Forest Games, Egosoft, Giants Software, etc.



@TheVoxelman on twitter

Check out my hype threads: Cyberpunk, and The Witcher 3!

Pemalite said:
curl-6 said:
fatslob-:O said:
curl-6 said:

I am aware it is a technique. One that makes it more efficient to do some things. It's not so much that it's a feat, more that it's something Wii U seems better equipped for than last gen consoles.

"Better equipped" is questionable since a G-buffer which consists of multiple render targets doesn't exactly have a very small footprint to be able to fit all of the render targets on the eDRAM and the WII U doesn't exactly have a very high memory bandwidth either to help the situation ... 

Better than the situation on 360 where there was only 10MB of eDRAM.

Shin'en reckon the multiple render targets fit into 32MB just fine.


I was nagged at to see this thread via Steam.

Of course it would be.
You can do multiple render targets that fit just fine into 10MB of eDRAM; you could also fit a 30MB G-buffer into 10MB of eDRAM.

Sounds impossible right?

Not exactly. - You can do multiple passes and you can do tiling.

After the first tile of the G-buffer is created, you can keep the Z in place and then proceed to light it, then you can move on to the next tile, then the next and the next... This method allows you to minimise the overhead of reloads, provided you aren't doing anything with neighbouring pixels.

From what I can recall, Xenos has a bandwidth cap of roughly 16GB/s during this process, hence with a 30MB G-buffer that would take roughly 2 ms, which is more than doable on the Xbox 360, even with just 10MB of eDRAM.
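For reference, here is the back-of-the-envelope arithmetic behind that ~2 ms figure, as a minimal sketch. It only assumes the 16GB/s and 30MB numbers quoted above, nothing about how Xenos actually schedules the copies.

// Time to move a 30MB G-buffer at ~16GB/s (the figures quoted above).
#include <cstdio>

int main() {
    const double gbuffer_bytes = 30.0 * 1024 * 1024;          // 30 MB
    const double bandwidth     = 16.0 * 1024 * 1024 * 1024;   // ~16 GB/s
    const double seconds       = gbuffer_bytes / bandwidth;
    std::printf("transfer time: %.2f ms\n", seconds * 1000.0); // ~1.8 ms
    // Against a 16.6 ms frame budget (60 fps) that is a small slice.
    return 0;
}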

The WiiU just requires far less trickery to achieve the same thing, but don't think that functionally it's superior, because it's not.

Also Megafenix, after all this time you really should stop posting information that you clearly have zero idea about, you LITERALLY have no idea if it's even correct or not or what half the stuff does.

Also, cats.

It's true that multipass is an option, but multiple passes put too much work on the GPU compared to a single pass.

http://www.orpheuscomputing.com/downloads/ATI-smartshader.pdf

"

The key improvements offered by ATI’s SMARTSHADER™ technology over existing hardware vertex and pixel shader implementations are:
• Support for up to six textures in a single rendering pass, allowing more complex effects to be achieved without the heavy memory bandwidth requirements and severe performance impact of multi-pass rendering

Every time a pixel passes through the rendering pipeline, it consumes precious memory bandwidth as data is read from and written to texture memory, the depth buffer, and the frame buffer. By decreasing the number of times each pixel on the screen has to pass through the rendering pipeline, memory bandwidth consumption can be reduced and the performance impact of using pixel shaders can be minimized. DirectX® 8.1 pixel shaders allow up to six textures to be sampled and blended in a single rendering pass. This means effects that required multiple rendering passes in earlier versions of DirectX® can now be processed in fewer passes, and effects that were previously too slow to be useful can become more practical to implement.

"

http://books.google.com.mx/books?id=BV8MeSkHaD4C&pg=PA64&lpg=PA64&dq=multi+pass+rendering+vs+single+pass&source=bl&ots=oUGfAJqSQO&sig=2y6jekyjXj1FpAxE8wICmzw5B1E&hl=es&sa=X&ei=vyNNVOPsPKHmiQLFvIDICQ&ved=0CDwQ6AEwBA#v=onepage&q=multi%20pass%20rendering%20vs%20single%20pass&f=false
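A rough way to see why the extra passes hurt: every full-screen pass re-reads and re-writes the colour and depth buffers, so framebuffer traffic scales with the pass count. A minimal sketch of that cost model follows; the 16 bytes of per-pixel traffic per pass is an illustrative assumption, not a measured figure, and texture fetches come on top of it.

// Illustrative framebuffer traffic for N full-screen passes at 720p.
#include <cstdio>

int main() {
    const double pixels = 1280.0 * 720.0;
    // Assumed per-pixel traffic per pass: read+write colour (4+4 bytes)
    // plus read+write depth (4+4 bytes).
    const double bytes_per_pixel_per_pass = 16.0;
    const double fps = 60.0;

    for (int passes = 1; passes <= 4; ++passes) {
        const double gb_per_s = pixels * bytes_per_pixel_per_pass * passes * fps
                                / (1024.0 * 1024.0 * 1024.0);
        std::printf("%d pass(es): ~%.2f GB/s of framebuffer traffic\n",
                    passes, gb_per_s);
    }
    return 0;
}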

 

As you can read from the developer diaries, they implemented deferred rendering using 5 SPUs on the PS3 (the Cell has 8 SPEs, but one is disabled and one is reserved for the OS, leaving 6 for games), and on the Xbox 360 by parallelising the work across the GPU and CPU. The key advantage of G-buffers is that you use less shader power, but here the PS3 and 360 had to put a lot of pressure on the hardware to achieve it, which almost nullifies one of the advantages of the technique, because of the lack of memory bandwidth. Since the Wii U's eDRAM has plenty of bandwidth, it doesn't need multipass tricks or as much shader power to achieve the technique, and it gets much better performance. On 360 I'd bet that, besides the pressure on the GPU, the 720p image was done with a single buffer rather than double buffering, since that way you reduce the framebuffer's eDRAM consumption to about 5MB and, with some trickery, can use the rest for the deferred rendering (normally 12MB of eDRAM would do, but they aren't there).

You can read here about the problems developers went through to implement deferred rendering on the last-generation consoles. Obviously, requirements like needing 5 of the SPUs available for games are not cheap, and using parallelism on the 360 GPU plus help from the CPU isn't cheap either.

 

 

http://webstaff.itn.liu.se/~perla/Siggraph2011/content/talks/18-ferrier.pdf

 

 

On Wii U you won't have to do this, and the primary advantage of deferred rendering will be available (use less shader power by trading bandwidth). Triple 720p framebuffers are only about 10.8MB, whereas on 360 the 10MB was barely enough for 720p with double buffering. And as you can read in the first article, developers wanted to use deferred rendering on 360, but they needed a 12MB G-buffer of eDRAM that wasn't there (that's why, some time later, developers used the trick found in the second article). In the Wii U's case, doing some rough calculations, a G-buffer would likely take about 8.64MB of eDRAM; combining that with the triple framebuffer is only about 19.44MB, which leaves around 12.6MB of the 32MB eDRAM, plus the extra ~3MB of faster eDRAM and SRAM. That's why, besides the triple buffering and the G-buffer, Shin'en is still able to fit some intermediate buffers there.
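To make that arithmetic easier to check, here is a minimal sketch that recomputes the budget. The buffer formats are my assumptions for illustration (RGBA8 targets and a 32-bit depth buffer; the actual layout of Shin'en's G-buffer is unknown), and I use decimal megabytes, so the results come out slightly higher than the rounded figures above, but the conclusion that everything plausibly fits in 32MB is the same.

// Rough eDRAM budget for 720p render targets on a 32MB pool.
#include <cstdio>

int main() {
    const double pixels          = 1280.0 * 720.0;
    const double bytes_per_pixel = 4.0;                 // e.g. RGBA8 or D24S8
    const double mb              = 1000.0 * 1000.0;     // decimal MB

    const double one_target         = pixels * bytes_per_pixel / mb;  // ~3.7 MB
    const double triple_framebuffer = 3 * one_target;
    const int    gbuffer_targets    = 3;                // assumed MRT count
    const double gbuffer            = gbuffer_targets * one_target;
    const double depth              = one_target;
    const double edram              = 32.0;

    std::printf("one 720p target       : %.2f MB\n", one_target);
    std::printf("triple framebuffer    : %.2f MB\n", triple_framebuffer);
    std::printf("G-buffer (%d targets)  : %.2f MB\n", gbuffer_targets, gbuffer);
    std::printf("depth buffer          : %.2f MB\n", depth);
    std::printf("left of 32MB eDRAM    : %.2f MB\n",
                edram - (triple_framebuffer + gbuffer + depth));
    return 0;
}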

 

As for the tiling technology, yeah, it's something pretty cool, and something tells me Shin'en is using it in Fast Racing Neo, looking at the terrain here.

Looks like the terrain is composed of tiles. I bet they used it, since it's impossible to fit the 4K–8K textures in texture memory (even with BC1 compression each texture is still around 10MB), so dividing the textures into tiles makes it possible to work from the texture cache or texture memory.

http://books.google.com.mx/books?id=bmv2HRpG1bUC&pg=PA281&lpg=PA281&dq=tiles+textures+tessellation&source=bl&ots=6hOJ8zd7wA&sig=mtlU58XVFicKUMz5klAr4cDRX9w&hl=es&sa=X&ei=TPQ3VNKyCtKRNs7vgqAD&ved=0CFwQ6AEwCw#v=onepage&q=tiles%20textures%20tessellation&f=false
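For the texture-size side of this: BC1 stores each 4×4 pixel block in 8 bytes (0.5 bytes per pixel), so the large textures mentioned above come out roughly as follows. This is just a quick format calculation, not data about Fast Racing Neo's actual assets.

// BC1 (DXT1) footprint: 8 bytes per 4x4 block = 0.5 bytes per pixel.
#include <cstdio>

int main() {
    const int sizes[] = {2048, 4096, 8192};
    for (int s : sizes) {
        const double mib = (double)s * s * 0.5 / (1024.0 * 1024.0);  // top mip only
        std::printf("%dx%d BC1: %.1f MB (plus ~33%% for mipmaps)\n", s, s, mib);
    }
    return 0;
}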

"



You are confusing completely unrelated things again. Processing multiple texture layers in a single pass is a very old feature that all modern hardware has and all games utilize. Multi-pass rendering is completely different and unrelated. Deferred rendering is a multi-pass rendering technique; the whole point is to separate geometry processing and shading into separate passes.

Tiled textures are also a completely different thing from tiled rendering. Most games since the dawn of 3D graphics have used repeated textures, to save production resources as well as for performance reasons. But that is not what tiled rendering is. Tiled rendering splits the framebuffer up into smaller sections, and then renders each section separately.
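As a rough illustration of that last point, here is a minimal CPU-side sketch of tiled rendering: only one tile ever sits in the small scratch buffer that stands in for fast on-chip memory, and the shade() function is just a placeholder for real rasterisation, so everything here is illustrative rather than how any particular GPU works.

// Toy tiled renderer: shade one small tile at a time in a scratch buffer
// (standing in for on-chip tile memory), then copy it out to the framebuffer.
#include <cstdint>
#include <cstdio>
#include <vector>

const int W = 1280, H = 720, TILE = 64;

std::uint32_t shade(int x, int y) { return static_cast<std::uint32_t>(x ^ y); }  // placeholder shading

int main() {
    std::vector<std::uint32_t> framebuffer(W * H);   // lives in "main memory"
    std::vector<std::uint32_t> tile(TILE * TILE);    // lives in "fast memory"

    for (int ty = 0; ty < H; ty += TILE) {
        for (int tx = 0; tx < W; tx += TILE) {
            const int tw = (tx + TILE <= W) ? TILE : W - tx;
            const int th = (ty + TILE <= H) ? TILE : H - ty;
            // Render only this tile into the small scratch buffer.
            for (int y = 0; y < th; ++y)
                for (int x = 0; x < tw; ++x)
                    tile[y * TILE + x] = shade(tx + x, ty + y);
            // Resolve the finished tile out to the big framebuffer.
            for (int y = 0; y < th; ++y)
                for (int x = 0; x < tw; ++x)
                    framebuffer[(ty + y) * W + (tx + x)] = tile[y * TILE + x];
        }
    }
    std::printf("rendered %dx%d in %dx%d tiles\n", W, H, TILE, TILE);
    return 0;
}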



@TheVoxelman on twitter

Check out my hype threads: Cyberpunk, and The Witcher 3!

zarx said:
You are confusing completely unrelated things again. Processing multiple texture layers in a single pass is a very old feature that all modern hardware has and all games utilize. Multi-pass rendering is completely different and unrelated. Deferred rendering is a multi-pass rendering technique; the whole point is to separate geometry processing and shading into separate passes.

Tiled textures are also a completely different thing from tiled rendering. Most games since the dawn of 3D graphics have used repeated textures, to save production resources as well as for performance reasons. But that is not what tiled rendering is. Tiled rendering splits the framebuffer up into smaller sections, and then renders each section separately.


I wasn't talking about multipass with tiled textures; I was talking about tiled textures, deferred rendering, and multipass as three separate topics.

1. Multipass rendering, as the name suggests, requires using the pipeline multiple times for one piece of work, while with a single pass you use it once, so obviously that means less use of the pipeline to achieve a task; in the SDK information they tell us that 1080p or 720p can be done in a single pass. I wouldn't call single pass an old feature, since it was implemented in DirectX 8 and wasn't available before that.

http://www.orpheuscomputing.com/downloads/ATI-smartshader.pdf

"

The key improvements offered by ATI’s SMARTSHADER™ technology over existing hardware vertex and pixel shader implementations are:
• Support for up to six textures in a single rendering pass, allowing more complex effects to be achieved without the heavy memory bandwidth requirements and severe performance impact of multi-pass rendering

Every time a pixel passes through the rendering pipeline, it consumes precious memory bandwidth as data is read from and written to texture memory, the depth buffer, and the frame buffer. By decreasing the number of times each pixel on the screen has to pass through the rendering pipeline, memory bandwidth consumption can be reduced and the performance impact of using pixel shaders can be minimized. DirectX® 8.1 pixel shaders allow up to six textures to be sampled and blended in a single rendering pass. This means effects that required multiple rendering passes in earlier versions of DirectX® can now be processed in fewer passes, and effects that were previously too slow to be useful can become more practical to implement.

"

2. For deferred rendering I just provided the article about the limitations on 360: they needed a 12MB G-buffer and didn't have it, and the second article explains that they had to use GPU parallelism plus the CPU, while the PS3 had to use 5 of its SPUs (one SPU is reserved for the OS). Doing it that way they can achieve the technique, but it also consumes a large amount of rendering power; maybe still less than forward rendering, but performance still won't be as good as if they had had enough memory bandwidth to do it the normal way. The Wii U has enough bandwidth: triple framebuffers take about 10.8MB and the G-buffer would be about 8.64MB. With that in mind it's clear that deferred rendering on Wii U will not take as much shader power as it took on last-generation consoles, and it will also be better than forward rendering. I won't deny that almost 9MB for the G-buffer is a lot of bandwidth consumption, but seeing that triple 720p framebuffers are just 10.8MB, that still leaves eDRAM room for other things, and by trading bandwidth we can save a lot of shader power for other rendering purposes.
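To put that "almost 9MB" in bandwidth terms, here is a rough per-second traffic estimate. The write-once/read-twice model and the 8.64MB figure are assumptions carried over from this post, not measurements of any real engine.

// Rough per-second G-buffer traffic: written once in the geometry pass,
// read back by lighting and post-processing. Overdraw would add more.
#include <cstdio>

int main() {
    const double gbuffer_mb = 8.64;   // figure used above
    const double fps        = 60.0;
    const double reads      = 2.0;    // assumed: lighting pass + post-processing
    const double gb_per_s   = gbuffer_mb * (1.0 + reads) * fps / 1000.0;
    std::printf("~%.2f GB/s of G-buffer traffic\n", gb_per_s);
    // Small next to the tens of GB/s usually attributed to the Wii U's eDRAM.
    return 0;
}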

Yes, it's true that deferred rendering is multipass, but compared to forward rendering it requires fewer passes; in fact it only requires two passes, while forward rendering would require that many or more for complex material/light combinations.

https://hacks.mozilla.org/2014/01/webgl-deferred-shading/

"

Forward rendering

Today, most WebGL engines use forward shading, where lighting is computed in the same pass that geometry is transformed. This makes it difficult to support a large number of dynamic lights and different light types.

Forward shading can use a pass per light. Rendering a scene looks like:

This requires a different shader for each material/light-type combination, which adds up. From a performance perspective, each mesh needs to be rendered (vertex transform, rasterization, material part of the fragment shader, etc.) once per light instead of just once. In addition, fragments that ultimately fail the depth test are still shaded, but with early-z and z-cull hardware optimizations and a front-to-back sorting or a z-prepass, this is not as bad as the cost for adding lights.

To optimize performance, light sources that have a limited effect are often used. Unlike real-world lights, we allow the light from a point source to travel only a limited distance. However, even if a light’s volume of effect intersects a mesh, it may only affect a small part of the mesh, but the entire mesh is still rendered.

In practice, forward shaders usually try to do as much work as they can in a single pass leading to the need for a complex system of chaining lights together in a single shader. For example:

The biggest drawback is the number of shaders required since a different shader is required for each material/light (not light type) combination. This makes shaders harder to author, increases compile times, usually requires runtime compiling, and increases the number of shaders to sort by. Although meshes are only rendered once, this also has the same performance drawbacks for fragments that fail the depth test as the multi-pass approach.

 

Deferred Shading

Deferred shading takes a different approach than forward shading by dividing rendering into two passes: the g-buffer pass, which transforms geometry and writes positions, normals, and material properties to textures called the g-buffer, and the light accumulation pass, which performs lighting as a series of screen-space post-processing effects.

This decouples lighting from scene complexity (number of triangles) and only requires one shader per material and per light type. Since lighting takes place in screen-space, fragments failing the z-test are not shaded, essentially bringing the depth complexity down to one. There are also downsides such as its high memory bandwidth usage and making translucency and anti-aliasing difficult.

Until recently, WebGL had a roadblock for implementing deferred shading. In WebGL, a fragment shader could only write to a single texture/renderbuffer. With deferred shading, the g-buffer is usually composed of several textures, which meant that the scene needed to be rendered multiple times during the g-buffer pass.

"

 

But multi-pass forward rendering requires a pass per light for each object it affects, and that's a waste of shader power compared to deferred rendering.

http://www.cse.chalmers.se/edu/year/2011/course/TDA361/Advanced%20Computer%20Graphics/DeferredRenderingPresentation.pdf

"

Forward rendering

• Traditional method

• Single pass

– For each object

• Find all lights affecting object

• Render all lighting and material in a single shader

– Shader for each material vs. light setup combination

Wasted shader cycles

• Invisible surfaces / overdraw

• Triangles outside light influence

Most of the text in this slide is extracted from a presentation by GUERRILLA GAMES from DEVELOP CONFERENCE, JULY ’07, BRIGHTON

Forward rendering (cont.)

• Solution to material/light combination issue

• Multi-pass

– For each light

• For each object

– Add lighting from single light to frame buffer

– Shader for each material and light type

– Wasted shader cycles

• Invisible surfaces / overdraw

• Triangles outside light influence

• Lots of repeated work

– Full vertex shaders, texture filtering

 

Deferred Rendering

1. For each object
– Render surface properties into the G-Buffer
2. For each light and lit pixel
– Use G-Buffer to compute lighting
– Add result to frame buffer
3. Render transparent stuff (using forward rendering)

Deferred Rendering Pro
• Complexity
• Shades only visible pixels
• Few shaders
• Post-processing stuff ready
• Lots and lots of lights!

Deferred Rendering Con
• Lots of memory
• Bandwidth!
• Transparency
– G-buffers store one value per pixel
• Antialiasing

"

 

3. As for the tiled textures, well, I just mentioned that since Shin'en is using 4K–8K textures, and even with BC1 compression they are still too big to fit in texture memory or in the eDRAM, it's possible they divided the textures into tiles so they could fit them in texture memory. I'm not sure whether they are using it or not, but the first image of Fast Racing Neo seems to show they are indeed using texture tiles for the terrain, at least.



Pemalite said:


I was nagged at to see this thread via Steam.

Of course it would be.
You can do multiple render targets that fit just fine into 10MB of eDRAM; you could also fit a 30MB G-buffer into 10MB of eDRAM.

Sounds impossible right?

Not exactly. - You can do multiple passes and you can do tiling.

After the first tile of the G-buffer is created, you can keep the Z in place and then proceed to light it, then you can move on to the next tile, then the next and the next... This method allows you to minimise the overhead of reloads, provided you aren't doing anything with neighbouring pixels.

From what I can recall, Xenos has a bandwidth cap of roughly 16GB/s during this process, hence with a 30MB G-buffer that would take roughly 2 ms, which is more than doable on the Xbox 360, even with just 10MB of eDRAM.

The WiiU just requires far less trickery to achieve the same thing, but don't think that functionally it's superior, because it's not.

Also Megafenix, after all this time you really should stop posting information that you clearly have zero idea about, you LITERALLY have no idea if it's even correct or not or what half the stuff does.

Also, cats.

Do note that multipassing can consume tons of bandwidth ... 

What about cats ?



fatslob-:O said:
Pemalite said:


I was nagged at to see this thread via Steam.

Of course it would be.
You can do multiple render targets that fit just fine into 10MB of eDRAM; you could also fit a 30MB G-buffer into 10MB of eDRAM.

Sounds impossible right?

Not exactly. - You can do multiple passes and you can do tiling.

After the first tile of the G-buffer is created, you can keep the Z in place and then proceed to light it, then you can move on to the next tile, then the next and the next... This method allows you to minimise the overhead of reloads, provided you aren't doing anything with neighbouring pixels.

From what I can recall, Xenos has a bandwidth cap of roughly 16GB/s during this process, hence with a 30MB G-buffer that would take roughly 2 ms, which is more than doable on the Xbox 360, even with just 10MB of eDRAM.

The WiiU just requires far less trickery to achieve the same thing, but don't think that functionally it's superior, because it's not.

Also Megafenix, after all this time you really should stop posting information that you clearly have zero idea about, you LITERALLY have no idea if it's even correct or not or what half the stuff does.

Also, cats.

Do note that multipassing can consume tons of bandwidth ... 


That's true, and that's why trying to use a single pass should be a priority, although it's not always possible.

Talking about bandwidth, how much bandwidth would deferred rendering take on the PS3 if developers had to use 5 SPUs and, obviously, their internal memory bandwidth?

Each SPU has access to a 256KB local store, and they say each of them has about 300GB/s of bandwidth.

http://www.zdnet.com/blog/storage/build-an-8-ps3-supercomputer/220

 

One of the games that used deferred rendering on PS3 was Killzone 2, and they say it used about 60% of the PS3's power. Considering that it used deferred rendering and that the technique required 5 SPUs, it's credible that they indeed used 60% or more.

 

Developers really did a good job even with the limitations of the hardware, but they had to put a lot of pressure on it to achieve things that this new generation will take less of a performance hit on, thanks to the additional memory bandwidth and the new features of the GPU and CPU.



Pemalite said:


Then it all died out for a while, probably because the PC was advancing rapidly and the Xbox 360 and PlayStation 3 launched, so it was easier/cheaper for developers to ditch tiled rendering so they could push out games faster.

Of course, things eventually changed, mobile came storming into the market with ARM Mali, PowerVR,
Qualcomm's Adreno (Aka. Mobile Radeon) all pushing tiled approaches due to power and efficiency.
Then the Xbox 360 and Playstation 3 simply got old, people still demanded better graphics, so tiled rendering was a solution once again and it seems to have continued to carry onwards, which is a good sign.

That's half the story but ...

They died out because they couldn't solve the hardware accelerated transform and lighting issue fast enough. 

After having added shaders to their designs, they always made the most sense in the mobile space where bandwidth was scarce; plus, I don't think Adreno features tile-based rendering.

I'm not so sure that tile-based rendering was the solution for pushing better graphics. It's really good for the purpose of geometry binning, but that's about it. I almost forgot, but tile-based rendering also came back to desktops, with 2nd-gen Nvidia Maxwell and Intel Haswell featuring it! Both of those GPU architectures support conservative rasterization, which enables programmable binning, so it's possible to do some tile-based rendering, and maybe AMD can support conservative rasterization too with their GCN GPUs, just like how their hardware has always supported volume tiled resources as well as pixel shader ordering ...



What?



Kuksenkov said:
What?


Since mobile hardware is pretty limited, especially in available bandwidth, GPU makers like AMD and others have designed tile-based architectures to overcome the problem.

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/gdc2008_ribble_maurice_TileBasedGpus.pdf

"

Next-Gen Tile-Based GPUs

Why is TBR so Popular in Embedded Devices?

• Reduced bus bandwidth
• Saves power
• Allows for simpler system designs
• Desktop PC’s brute force approach doesn’t work as well in the mobile space
• Lower polygon counts in mobile games are an ideal match for TBR

"