
Technical breakdown on next-gen game console GPUs

Hello everyone,

This information is meant only as a reference for the complete technical details of the PS4, Wii U and Xbox One GPUs, so that everybody can better understand the GPUs inside next-gen consoles and how they compare with PC GPUs. I'll update this post as soon as I find new details. The details are based on my research on the internet and on the links I've provided in the Sources sections, and reflect what we know so far, so some of them may change in future; I'll update the post accordingly.

Comparison between PS4, Xbox One and Wii U GPUs

Technical detail | PlayStation 4 GPU | Xbox One GPU | Wii U GPU
Compute units (CU) | 18 CU (4 VU + 1 SU each): 72 vector and 18 scalar units | 12 CU (4 VU + 1 SU each): 48 vector and 12 scalar units | 4 SC (shader cores): 64 shader units (4 SC x 16)
Shader/stream processors (SP) | 1152 SP (18 CU x 64 [4 VU x 16 ALU]) | 768 SP (12 CU x 64 [4 VU x 16 ALU]) | 320 SP (4 SC x 80 [5 ALU x 16 SU])
Texture mapping units (TMU) | 72 TMU (18 CU x 4 VU) | 48 TMU (12 CU x 4 VU) | 16 TMU (4 SC x 4)
Raster back-ends (RB) | 8 RB (color/depth blocks) | 4 RB (color/depth blocks) | 2 RB (color/depth blocks)
Raster operators (ROP) | 32 ROP (8 RB x 4/clock) | 16 ROP (4 RB x 4/clock) | 8 ROP (2 RB x 4/clock)
Pixel fill rate (GPU clock x ROP) | 25600 MPixels/sec (800 MHz x 32 ROP) | 13648 MPixels/sec (853 MHz x 16 ROP) | 4400 MPixels/sec (550 MHz x 8 ROP)
Texture fill rate (GPU clock x TMU) | 57600 MTexels/sec (800 MHz x 72 TMU) | 40944 MTexels/sec (853 MHz x 48 TMU) | 8800 MTexels/sec (550 MHz x 16 TMU)
FLOPS (SP x GPU clock x 2 ops/cycle) | 1843.2 GFLOPS | 1310.2 GFLOPS | 352 GFLOPS
GPU clock | 800 MHz | 853 MHz | 550 MHz
Memory clock (effective) | 5500 MHz GDDR5 | 2133 MHz DDR3 | 1600 MHz DDR3
Memory bandwidth ((memory clock x bus width / 8) / 1000) | 176 GB/sec unified RAM ((5500 MHz x 256 bit / 8) / 1000) | 68 GB/sec RAM + ESRAM: 109 GB/sec one-way at 853 MHz, 204 GB/sec peak bidirectional | 12.8 GB/sec RAM + up to 1000 GB/sec EDRAM
Fabrication and architecture | 28 nm, GCN and GCN 2.0 | 28 nm, GCN | 40/45 nm, Redwood (RV830), TeraScale 2
RAM | 8GB GDDR5 | 8GB DDR3 + 32MB ESRAM | 2GB DDR3 + 32MB EDRAM
Closest desktop GPU match | Radeon HD 7870 (2 CU disabled) | Radeon HD 7790 (2 CU disabled) | ATI Radeon HD 5550
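The derived rows in the table (fill rates and FLOPS) all follow from the clock and the unit counts. A minimal Python sketch, assuming the post's own formulas (one pixel per ROP per clock, one texel per TMU per clock, two floating-point operations per SP per clock), reproduces them; the `GpuSpec` class and its method names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class GpuSpec:
    name: str
    clock_mhz: int   # GPU core clock in MHz
    shaders: int     # stream processors (SP)
    tmus: int        # texture mapping units
    rops: int        # raster operators

    def pixel_fill_mpix(self) -> int:
        # one pixel written per ROP per clock
        return self.clock_mhz * self.rops

    def texel_fill_mtex(self) -> int:
        # one texel fetched per TMU per clock
        return self.clock_mhz * self.tmus

    def gflops(self) -> float:
        # 2 floating-point ops (MUL + ADD) per SP per clock, MFLOPS -> GFLOPS
        return self.shaders * self.clock_mhz * 2 / 1000

CONSOLES = [
    GpuSpec("PS4", 800, 1152, 72, 32),
    GpuSpec("Xbox One", 853, 768, 48, 16),
    GpuSpec("Wii U", 550, 320, 16, 8),
]

for gpu in CONSOLES:
    print(f"{gpu.name:<9} {gpu.pixel_fill_mpix():>6} MPixels/sec  "
          f"{gpu.texel_fill_mtex():>6} MTexels/sec  {gpu.gflops():7.1f} GFLOPS")
```

These are theoretical peaks; they say nothing about how efficiently real games keep the units busy.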

 

PS4 GPU:

It is derived from AMD's GCN (Southern Islands) architecture, with the enhancements of the PS4's custom "Liverpool" chip (referred to here as GCN 2.0).

PS4 GPU specifications:

Compute Units : 18 CU

Shader Processors : 1152 SP (18CU x 64 [4VU x 16ALU])

Texture Units : 72 TMU (18CU x 4VU)

Raster Back-end : 8 RB (Color/depth blocks)

Raster Operators : 32 ROP (8RB x 4/clock)

Memory Bus Type : 256 bit (8GB GDDR5)

Pixel Fill Rate : 25600 MPixels/sec (800MHz x 32 ROP)

Texture Fill Rate : 57600 MTexels/sec (800MHz x 72 TMU)

FLOPS : 1843.2 GFLOPS (1152 x 800MHz x 2 ops/cycle [MUL+ADD])

Memory Bandwidth : 176 GB/sec ((5500MHz x 256bit/8)/1000)

SOURCES: 

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=2

http://www.vgleaks.com/orbis-gpu-compute-queues-and-pipelines/

http://www.gpureview.com

http://www.eurogamer.net/articles/digitalfoundry-face-to-face-with-mark-cerny

How Sony Modified the Hardware:

The three "major modifications" Sony did to the architecture to support this vision are as follows, in Cerny's words:

1) "First, [we] added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!"

         PCIe x16 v2.x bandwidth: 8 GB/s (per direction)

         PCIe x16 v3.0 bandwidth: 15.75 GB/s (per direction)

One barrier in a traditional PC hardware environment is communication between the CPU, GPU, and RAM. The PS4 architecture is designed to address that problem.

2) "Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. This innovation radically reduces the overhead of running compute and graphics together on the GPU."

3) Thirdly, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."
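To make modification 2 more concrete, here is a toy Python model of a cache whose lines carry a "volatile" flag. It assumes, as a simplification, that the bit lets compute jobs write back and invalidate only the lines they marked, instead of flushing the whole L2 and disturbing the graphics data cached alongside them. The class names are hypothetical, and this is not how the PS4 hardware is actually programmed.

```python
from dataclasses import dataclass, field

@dataclass
class CacheLine:
    data: bytes
    dirty: bool = False
    volatile: bool = False   # models the extra "volatile" tag bit (hypothetical toy)

@dataclass
class ToyL2:
    lines: dict = field(default_factory=dict)    # address -> CacheLine
    memory: dict = field(default_factory=dict)   # stand-in for system memory

    def write(self, addr: int, data: bytes, volatile: bool = False) -> None:
        self.lines[addr] = CacheLine(data, dirty=True, volatile=volatile)

    def sync_compute(self) -> int:
        """Write back and invalidate only volatile lines; graphics lines stay cached."""
        volatile_addrs = [a for a, line in self.lines.items() if line.volatile]
        for addr in volatile_addrs:
            line = self.lines.pop(addr)
            if line.dirty:
                self.memory[addr] = line.data
        return len(volatile_addrs)

cache = ToyL2()
cache.write(0x100, b"graphics data")                  # ordinary line
cache.write(0x200, b"compute result", volatile=True)  # marked by a compute job
print(cache.sync_compute())     # 1
print(0x100 in cache.lines)     # True: the graphics line is untouched
print(cache.memory)             # {512: b'compute result'}
```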

Cerny said, "Our overall approach was to put in a very large number of controls about how to mix compute and graphics, and let the development community figure out which ones they want to use when they get around to the point where they're doing a lot of asynchronous compute."

He expects developers to run middleware -- such as physics, for example -- on the GPU. Using the system he describes above, you can run at peak efficiency, he said.
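The PCIe figures quoted under modification 1 are per-direction numbers for an x16 slot, and they follow from the link's transfer rate and line encoding. A small Python sketch, assuming 5 GT/s with 8b/10b encoding for PCIe 2.x and 8 GT/s with 128b/130b encoding for PCIe 3.0 (the function name is illustrative), reproduces them and compares them with the roughly 20 GB/s Cerny quotes for the PS4's extra bus.

```python
def pcie_gbytes_per_sec(gigatransfers, payload_bits, coded_bits, lanes=16):
    """Per-direction PCIe bandwidth in GB/s: rate x lanes x encoding efficiency / 8 bits per byte."""
    return gigatransfers * lanes * payload_bits / coded_bits / 8

pcie2 = pcie_gbytes_per_sec(5.0, 8, 10)       # PCIe 2.x: 5 GT/s, 8b/10b encoding
pcie3 = pcie_gbytes_per_sec(8.0, 128, 130)    # PCIe 3.0: 8 GT/s, 128b/130b encoding
print(f"PCIe 2.x x16: {pcie2:.2f} GB/s")      # 8.00 GB/s
print(f"PCIe 3.0 x16: {pcie3:.2f} GB/s")      # 15.75 GB/s
print(f"~20 GB/s bus vs PCIe 3.0 x16: {20 / pcie3:.2f}x")
```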

The PS4's Dedicated Units:

Another thing the PlayStation 4 team did to increase the flexibility of the console is to put many of its basic functions on dedicated units on the board -- that way, you don't have to allocate resources to handling these things.

1)Hardware dedicated unit for audio:

 This unit supports audio chat without the games needing to dedicate any significant resources to them.  The audio unit also handles decompression of "a very large number" of MP3 streams for in-game audio.

2)Hardware dedicated unit for video:

This unit does compression and decompression of video.

3)Hardware dedicated unit for zlib compression/decompression:

To further help the Blu-ray drive along, this unit supports zlib decompression -- so developers can confidently compress all of their game data and know the system will decode it on the fly. "As a minimum, our vision is that our games are zlib compressed on media."

4)Secondary chip for background downloads/uploads/vita remote play:

This unit handles background downloads and updates and PS Vita Remote Play, and helps the Blu-ray drive by caching the most frequently needed game data on the HDD.
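Unit 3 decodes zlib streams in hardware. The same data format can be exercised in software with Python's standard `zlib` module. This sketch is a CPU stand-in for the dedicated unit, not any PS4 API: it compresses an asset once and then decodes it incrementally as simulated 4 KiB reads arrive, which is what "decode it on the fly" amounts to. The helper names and the chunk size are hypothetical.

```python
import zlib

def compress_asset(data: bytes, level: int = 9) -> bytes:
    """Compress an asset once, at build time, before it is written to the disc."""
    return zlib.compress(data, level)

def stream_decompress(chunks):
    """Decode a zlib stream chunk by chunk, as data arrives from the drive."""
    decoder = zlib.decompressobj()
    for chunk in chunks:
        out = decoder.decompress(chunk)
        if out:
            yield out
    tail = decoder.flush()
    if tail:
        yield tail

asset = b"level geometry " * 4096                                   # stand-in for real game data
packed = compress_asset(asset)
reads = [packed[i:i + 4096] for i in range(0, len(packed), 4096)]   # simulated 4 KiB reads
restored = b"".join(stream_decompress(reads))
assert restored == asset
print(f"{len(asset)} bytes -> {len(packed)} bytes on media")
```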

COMPARISON WITH PC GPU:

The PS4 GPU is a match for the desktop AMD 7870 GPU with two compute units (CUs) disabled for yields and the clock lowered by 200 MHz, but the PS4 GPU has more memory bandwidth and other customizations to its compute units.

Comparison between PS4 and AMD 7870 GPUs

Technical detail | PlayStation 4 GPU | AMD 7870 GPU
Compute units (CU) | 18 CU (4 VU + 1 SU each): 72 vector and 18 scalar units | 20 CU (4 VU + 1 SU each): 80 vector and 20 scalar units
Shader/stream processors (SP) | 1152 SP (18 CU x 64 [4 VU x 16 ALU]) | 1280 SP (20 CU x 64 [4 VU x 16 ALU])
Texture mapping units (TMU) | 72 TMU (18 CU x 4 VU) | 80 TMU (20 CU x 4 VU)
Raster back-ends (RB) | 8 RB (color/depth blocks) | 8 RB (color/depth blocks)
Raster operators (ROP) | 32 ROP (8 RB x 4/clock) | 32 ROP (8 RB x 4/clock)
Pixel fill rate (GPU clock x ROP) | 25600 MPixels/sec | 32000 MPixels/sec
Texture fill rate (GPU clock x TMU) | 57600 MTexels/sec | 80000 MTexels/sec
FLOPS (SP x GPU clock x 2 ops/cycle) | 1843.2 GFLOPS | 2560 GFLOPS
GPU clock | 800 MHz | 1000 MHz
Memory clock (effective) | 5500 MHz GDDR5 | 4800 MHz GDDR5
Memory bandwidth ((memory clock x bus width / 8) / 1000) | 176 GB/sec | 153.6 GB/sec
Fabrication | 28 nm | 28 nm
RAM | 8GB GDDR5 | 2GB GDDR5
Architecture | GCN and GCN 2.0 | GCN
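The "two CUs disabled, 200 MHz lower" description can be turned into a quick ratio check. A minimal Python sketch, assuming the per-CU layout and clocks from the table above (the dictionary keys are illustrative), shows that 90% of the CUs at 80% of the clock gives 72% of the 7870's peak FLOPS.

```python
# Figures taken from the PS4 vs AMD 7870 table.
ps4 = {"cu": 18, "clock_mhz": 800, "rop": 32, "bandwidth_gbs": 176.0}
hd7870 = {"cu": 20, "clock_mhz": 1000, "rop": 32, "bandwidth_gbs": 153.6}

def gflops(gpu):
    # GCN: 64 stream processors per CU, 2 FLOPs (MUL + ADD) per SP per clock
    return gpu["cu"] * 64 * gpu["clock_mhz"] * 2 / 1000

cu_ratio = ps4["cu"] / hd7870["cu"]                      # 0.9
clock_ratio = ps4["clock_mhz"] / hd7870["clock_mhz"]     # 0.8
flops_ratio = gflops(ps4) / gflops(hd7870)               # 0.9 * 0.8 = 0.72
fill_ratio = (ps4["rop"] * ps4["clock_mhz"]) / (hd7870["rop"] * hd7870["clock_mhz"])
bw_ratio = ps4["bandwidth_gbs"] / hd7870["bandwidth_gbs"]

print(f"FLOPS: {flops_ratio:.0%} of the 7870 ({cu_ratio:.0%} of the CUs x {clock_ratio:.0%} of the clock)")
print(f"Pixel fill rate: {fill_ratio:.0%}")
print(f"Memory bandwidth: {bw_ratio:.0%}")
```

Compute and fill rate land below the 7870, while memory bandwidth (about 115%) is higher, which matches the summary above the table.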

 

Xbox One GPU:

It is also derived from AMD's GCN architecture, like the PS4's GPU, and it matches the desktop AMD 7790 GPU.

Xbox One GPU specifications:

Compute Units : 12 CU

Shader Processors : 768 SP (12CU x 64 [4VU x 16ALU])

Texture Units : 48 TMU (12CU x 4VU)

Raster Back-end : 4 RB (Color/depth blocks)

Raster Operators : 16 ROP (4RB x 4/clock)

Memory Bus Type : 256 bit (32MB ESRAM + 8GB DDR3)

Pixel Fill Rate : 13648 MPixels/sec (853MHz x 16 ROP)

Texture Fill Rate : 40944 MTexels/sec (853MHz x 48 TMU)

FLOPS : 1310.2 GFLOPS (768 x 853MHz x 2 ops/cycle [MUL+ADD])

Memory Bandwidth : 68 GB/sec (DDR3 RAM), plus the ESRAM buffer at 109 GB/sec in one direction and 204 GB/sec peak bidirectional
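The one-way ESRAM figure scales with the GPU clock. Assuming the ESRAM moves a fixed 128 bytes per GPU clock in one direction (an assumption inferred from this thread's own numbers: 102 GB/sec at 800 MHz in the earlier version of the post and 109 GB/sec at 853 MHz now), a minimal sketch reproduces both one-way figures. It does not attempt to model the 204 GB/sec bidirectional peak.

```python
def esram_gbs(clock_mhz: float, bytes_per_clock: int = 128) -> float:
    """One-direction ESRAM bandwidth, assuming a fixed number of bytes moved per GPU clock."""
    return clock_mhz * 1e6 * bytes_per_clock / 1e9

print(f"{esram_gbs(800):.1f} GB/sec at 800 MHz")   # 102.4
print(f"{esram_gbs(853):.1f} GB/sec at 853 MHz")   # 109.2
```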

Additional hardware and optimizations in the Xbox One:

Digital Foundry: You talk about having 15 processors. Can you break that down?

Nick Baker: On the SoC, there are many parallel engines - some of those are more like CPU cores or DSP cores. How we count to 15: [we have] eight inside the audio block, four move engines, one video encode, one video decode and one video compositor/resizer.

The audio block was completely unique. That was designed by us in-house. It's based on four Tensilica DSP cores and several programmable processing engines. We break it up as one core running control, two cores running a lot of vector code for speech and one for general purpose DSP. We couple with that sample rate conversion, filtering, mixing, equalisation, dynamic range compensation then also the XMA audio block. The goal was to run 512 simultaneous voices for game audio as well as being able to do speech pre-processing for Kinect.

COMPARISON WITH PC GPU:

Overall, the Xbox One GPU matches the desktop AMD 7790 GPU with 2 compute units (CUs) disabled for yields, slower memory (DDR3) and a clock lowered by 147 MHz, but the Xbox One GPU adds 32MB of fast ESRAM, a wider 256-bit memory controller and some console-specific customizations.

Comparison between Xbox One and AMD 7790 GPUs

Technical detail | Xbox One GPU | AMD 7790 GPU
Compute units (CU) | 12 CU (4 VU + 1 SU each): 48 vector and 12 scalar units | 14 CU (4 VU + 1 SU each): 56 vector and 14 scalar units
Shader/stream processors (SP) | 768 SP (12 CU x 64 [4 VU x 16 ALU]) | 896 SP (14 CU x 64 [4 VU x 16 ALU])
Texture mapping units (TMU) | 48 TMU (12 CU x 4 VU) | 56 TMU (14 CU x 4 VU)
Raster back-ends (RB) | 4 RB (color/depth blocks) | 4 RB (color/depth blocks)
Raster operators (ROP) | 16 ROP (4 RB x 4/clock) | 16 ROP (4 RB x 4/clock)
Pixel fill rate (GPU clock x ROP) | 13648 MPixels/sec | 16000 MPixels/sec
Texture fill rate (GPU clock x TMU) | 40944 MTexels/sec | 56000 MTexels/sec
FLOPS (SP x GPU clock x 2 ops/cycle) | 1310.2 GFLOPS | 1792 GFLOPS
GPU clock | 853 MHz | 1000 MHz
Memory clock (effective) | 2133 MHz DDR3 | 6000 MHz GDDR5
Memory bandwidth | 68 GB/sec RAM + ESRAM: 109 GB/sec one-way at 853 MHz, 204 GB/sec peak bidirectional | 96 GB/sec RAM
Fabrication | 28 nm | 28 nm
RAM | 8GB DDR3 (256 bit) + 32MB ESRAM | 1GB GDDR5 (128 bit)
Architecture | GCN | GCN
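The DDR3 and ESRAM figures above belong to two separate pools, so they should not simply be added. A rough idealised model, assuming the GPU splits its memory traffic between the pools in a fixed ratio and both pools work in parallel with perfect overlap, shows that the sum is reached only at one exact split. The function name and the split values are hypothetical.

```python
def effective_bandwidth(esram_gbs, dram_gbs, esram_share):
    """Idealised GB/sec when a fixed share of memory traffic goes to ESRAM and the rest to DRAM.

    Both pools are assumed to transfer in parallel with perfect overlap, so whichever
    pool is more loaded relative to its speed sets the overall pace.
    """
    if not 0.0 <= esram_share <= 1.0:
        raise ValueError("esram_share must be between 0 and 1")
    limits = []
    if esram_share > 0:
        limits.append(esram_gbs / esram_share)
    if esram_share < 1:
        limits.append(dram_gbs / (1 - esram_share))
    return min(limits)

for share in (0.0, 0.5, 109 / (109 + 68), 0.9, 1.0):
    print(f"ESRAM share {share:.2f}: {effective_bandwidth(109, 68, share):6.1f} GB/sec")
```

Only at a share of about 0.62 does the result reach 109 + 68 = 177 GB/sec; any other mix of traffic comes out lower, and real access patterns add further losses this model ignores.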

Sources:

http://techreport.com/news/24844/microsoft-reveals-next-generation-xbox-one-console

http://www.vgleaks.com/durango-gpu-2/

http://www.eurogamer.net/articles/digitalfoundry-xbox-one-memory-better-in-production-hardware

http://www.extremetech.com/gaming/156467-xbox-one-hardware-and-software-specs-detailed-and-analyzed

http://kotaku.com/the-xbox-ones-insides-have-changed-a-little-bit-since-992960685

http://www.itworld.com/hardware/370538/xbox-one-will-have-high-performance-custom-chip?page=0,0

http://gamrconnect.vgchartz.com/thread.php?id=167055&page=1

http://www.cpu-world.com/news_2013/2013032301_AMD_Introduces_Radeon_HD_7790_GPU.html

http://www.hwcompare.com/14298/radeon-hd-7790-vs-radeon-hd-7870/

http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview

Wii U GPU:

The Wii U GPU is based on AMD's TeraScale 2 architecture. It is a cut-down design based on the Redwood (RV830) GPU used in the HD 5xxx series, unlike the GCN architecture of the PS4 and Xbox One GPUs, which is used in the HD 7xxx series. The Wii U GPU is highly customized, with high-speed L1/L2 caches and an EDRAM buffer at the hardware level, so to get the most performance out of this hardware, developers need to master using these high-speed caches and the buffer.

Wii U GPU specifications:

Shader Cores/SIMD Engines : 4 SC (each has 16 Shader Units [SU] and 4 Texture Units [TMU])

Shader Processors : 320 SP (4SC x 80 [16SU x 5ALU])

Texture Units : 16 TMU (4SC x 4)

Raster Back-end : 2 RB (Color/depth blocks)

Raster Operators : 8 ROP (2RB x 4/clock)

Memory Bus Type : 128 bit (32MB EDRAM + 2GB DDR3 [64bit])

Pixel Fill Rate : 4400 MPixels/sec (550MHz x 8 ROP)

Texture Fill Rate : 8800 MTexels/sec (550MHz x 16 TMU)

FLOPS : 352 GFLOPS (320 x 550MHz x 2 ops/cycle [MUL+ADD])

Memory Bandwidth : EDRAM up to 1000 GB/sec, and 12.8 GB/sec for DDR3 RAM ((1600MHz x 64bit/8)/1000)

The Wii U GPU is an exact match for the ATI Radeon HD 5550 desktop GPU.

Comparison between Wii U and ATI Radeon HD 5550 GPUs

Technical detail | Wii U GPU | ATI Radeon HD 5550 GPU
Shader cores (SIMD engines) | 4 SC: 64 shader units (4 SC x 16) | 4 SC: 64 shader units (4 SC x 16)
Shader/stream processors (SP) | 320 SP (4 SC x 80 [5 ALU x 16 SU]) | 320 SP (4 SC x 80 [5 ALU x 16 SU])
Texture mapping units (TMU) | 16 TMU (4 SC x 4) | 16 TMU (4 SC x 4)
Raster back-ends (RB) | 2 RB (color/depth blocks) | 2 RB (color/depth blocks)
Raster operators (ROP) | 8 ROP (2 RB x 4/clock) | 8 ROP (2 RB x 4/clock)
Pixel fill rate (GPU clock x ROP) | 4400 MPixels/sec (550 MHz x 8 ROP) | 4400 MPixels/sec (550 MHz x 8 ROP)
Texture fill rate (GPU clock x TMU) | 8800 MTexels/sec (550 MHz x 16 TMU) | 8800 MTexels/sec (550 MHz x 16 TMU)
FLOPS (SP x GPU clock x 2 ops/cycle) | 352 GFLOPS | 352 GFLOPS
GPU clock | 550 MHz | 550 MHz
Memory clock (effective) | 1600 MHz DDR3 | 1800 MHz GDDR3
Memory bandwidth ((memory clock x bus width / 8) / 1000) | 12.8 GB/sec RAM + up to 1000 GB/sec EDRAM | 28.8 GB/sec RAM
Fabrication | 40 nm | 40 nm
RAM | 2GB DDR3 + 32MB EDRAM | 1GB GDDR3
Architecture | Redwood (RV830), TeraScale 2 | Redwood (RV830), TeraScale 2

Sources:

http://www.anandtech.com/show/6465/nintendo-wii-u-teardown

http://www.nintendolife.com/news/2013/05/shinen_wii_u_has_enough_power_for_years_to_come_gpu_is_several_generations_ahead_of_current_consoles

http://ixbtlabs.com/articles3/video/cypress-p2.html

http://www.guru3d.com/articles_pages/radeon_hd_5670_review_(crossfire_tested),2.html

http://www.bit-tech.net/hardware/graphics/2009/09/30/ati-radeon-hd-5870-architecture-analysis/

http://perspectives.mvdirona.com/2009/03/18/HeterogeneousComputingUsingGPGPUsAMDATIRV770.aspx

http://en.wikipedia.org/wiki/Radeon_HD_5000_Series#Radeon_HD_5500

http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed

http://beyond3d.com/showthread.php?t=60501&page=211

 

Details about GPU concepts and why they are useful:

Compute Unit of GCN:

Each GCN compute unit has one scalar unit, four texture units and four vector units. Each vector unit is 16 lanes wide, and each lane is a scalar ALU, which OpenCL terminology calls a "processing element". A stream core (or shader processor) is one of these scalar ALUs within a vector unit, so there are 64 of them (4 x 16) in each compute unit.
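The 64-per-CU count can be written out directly. A minimal Python sketch, assuming the layout just described (4 vector units of 16 lanes each per CU; the constant names are illustrative), enumerates the processing elements of one CU and reproduces the SP counts used in the tables.

```python
VECTOR_UNITS_PER_CU = 4      # SIMD vector units per GCN compute unit
LANES_PER_VECTOR_UNIT = 16   # each lane is one scalar ALU ("processing element")

def gcn_stream_processors(compute_units):
    """Every lane of every vector unit counts as one stream processor."""
    return compute_units * VECTOR_UNITS_PER_CU * LANES_PER_VECTOR_UNIT

# The processing elements of a single CU, as (vector unit, lane) pairs.
one_cu = [(vu, lane) for vu in range(VECTOR_UNITS_PER_CU) for lane in range(LANES_PER_VECTOR_UNIT)]
print(len(one_cu))   # 64

for name, cus in [("PS4", 18), ("Xbox One", 12), ("HD 7870", 20), ("HD 7790", 14)]:
    print(name, gcn_stream_processors(cus))   # 1152, 768, 1280 and 896 respectively
```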

SIMD engines (or shader cores), used before the GCN architecture and found in the Wii U:

The Cypress (RV870) architecture is split into twenty SIMD cores, each with 16 shader units (SU) and four texture units (TMU). Each SU has four ALUs that can do ADD+MUL and one special-function ALU that can do more operations than the others. In total, there are 64 ALUs + 16 SFU ALUs = 80 stream processors per SIMD.

The Redwood (RV830) architecture used in the Wii U GPU is a cut-down member of the same TeraScale 2 family as Cypress, with 4 SIMD cores. In total, there are 320 ALUs (stream processors, 4 SIMD x 80) and 16 texture units, four per SIMD core.
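The TeraScale counting works the same way as GCN's, except that each shader unit contributes five ALUs. This sketch assumes the 16-unit, 4+1 ALU layout described above; the names are illustrative.

```python
ALUS_PER_UNIT = 4 + 1   # four ADD+MUL ALUs plus one special-function ALU per shader unit
UNITS_PER_SIMD = 16

def terascale_stream_processors(simd_engines: int) -> int:
    """Stream processors = SIMD engines x shader units per SIMD x ALUs per shader unit."""
    return simd_engines * UNITS_PER_SIMD * ALUS_PER_UNIT

per_simd = UNITS_PER_SIMD * 4 + UNITS_PER_SIMD * 1   # 64 regular ALUs + 16 SFU ALUs
print(per_simd)                            # 80
print(terascale_stream_processors(20))     # Cypress: 1600
print(terascale_stream_processors(4))      # Redwood, as described for the Wii U: 320
```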

Texture Units:

Texture units (aka TMUs or texture mapping units) map textures onto 3D geometry. 3D scenes are generally composed of two things: 3D geometry, and the textures that cover that geometry. Texture units in a video card take a texture and 'map' it to a piece of geometry. That is, they wrap the texture around the geometry and produce textured pixels which can then be written to the screen. Textures can be an actual image, a light map, or even a bump map.
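The "mapping" step can be illustrated with bilinear filtering, one common way a texture unit turns a texture coordinate into a color. This is a simplified software sketch, assuming a single-channel texture stored as nested Python lists and clamp-to-edge addressing; real TMUs also handle texture formats, mipmaps and anisotropic filtering, none of which is modelled here.

```python
def sample_bilinear(texture, u, v):
    """Bilinearly filter a 2D texture (list of rows of floats) at normalised coords u, v in [0, 1]."""
    height, width = len(texture), len(texture[0])
    # Map normalised coordinates to texel space (texel centres sit at +0.5), clamping at the edges.
    x = min(max(u * width - 0.5, 0.0), width - 1)
    y = min(max(v * height - 0.5, 0.0), height - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two rows, then vertically between them.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

checker = [[0.0, 1.0],
           [1.0, 0.0]]
print(sample_bilinear(checker, 0.25, 0.25))  # 0.0 (centre of the top-left texel)
print(sample_bilinear(checker, 0.5, 0.5))    # 0.5 (equal blend of all four texels)
```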

Texture Fill Rate:

The number of textured pixels the card can render to the screen every second. To render a 3D scene, textures are mapped over the top of polygon meshes. This is called texture mapping and is accomplished by texture mapping units (TMUs) on the video card. Texture fill rate is a measure of the speed with which a particular card can perform texture mapping.

Raster Operator:

The last stage of the graphics pipeline, which writes the textured/shaded pixels to the frame buffer. Raster operators (ROPs) handle several chores near the end of the pixel pipeline: anti-aliasing, Z and color compression, and the actual writing of the pixel to the output buffer.
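A simplified software model of the final ROP step: a depth (Z) test, followed by blending with the stored color and the write to the frame buffer. It assumes a "less than" depth test and plain alpha blending, and it leaves out anti-aliasing and compression; the function name and buffer layout are hypothetical.

```python
def rop_write(framebuffer, depthbuffer, x, y, color, depth, alpha=1.0):
    """Depth-test a shaded fragment, blend it with the stored color and write it out.

    Returns True if the fragment passed the depth test and was written.
    """
    if depth >= depthbuffer[y][x]:      # "less than" test: farther fragments are discarded
        return False
    dst = framebuffer[y][x]
    # Alpha blend per channel: alpha * source + (1 - alpha) * destination.
    framebuffer[y][x] = tuple(alpha * s + (1 - alpha) * d for s, d in zip(color, dst))
    depthbuffer[y][x] = depth           # this sketch writes depth for every passing fragment
    return True

fb = [[(0.0, 0.0, 0.0)] * 2 for _ in range(2)]   # 2x2 black RGB frame buffer
db = [[1.0] * 2 for _ in range(2)]               # depth cleared to the far plane
rop_write(fb, db, 0, 0, (1.0, 0.0, 0.0), 0.5)               # opaque red, passes
rop_write(fb, db, 0, 0, (0.0, 0.0, 1.0), 0.8)               # farther blue, rejected
rop_write(fb, db, 0, 0, (0.0, 0.0, 1.0), 0.25, alpha=0.5)   # nearer half-transparent blue
print(fb[0][0])   # (0.5, 0.0, 0.5)
```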

Pixel Fill Rate:

The number of pixels the card can render to the screen every second. Before pixel shader processing became the more limiting factor, this was the most accurate measure of performance (along with texel fill rate).
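Fill rates are easier to interpret as a per-pixel budget. Assuming 1920x1080 output for the PS4 and Xbox One and 1280x720 for the Wii U at 60 fps (hypothetical targets chosen for illustration, not figures from this post), this sketch divides each theoretical fill rate by the number of pixels drawn per second. Real games never reach these peaks, so treat the results as upper bounds.

```python
def per_pixel_budget(rate_millions_per_sec, width, height, fps):
    """Theoretical peak operations available per screen pixel per frame."""
    return rate_millions_per_sec * 1e6 / (width * height * fps)

# name: (pixel fill MPixels/sec, texture fill MTexels/sec, hypothetical output resolution)
TARGETS = {
    "PS4": (25600, 57600, (1920, 1080)),
    "Xbox One": (13648, 40944, (1920, 1080)),
    "Wii U": (4400, 8800, (1280, 720)),
}

for name, (pixels, texels, (w, h)) in TARGETS.items():
    print(f"{name}: ~{per_pixel_budget(pixels, w, h, 60):.0f} pixel writes and "
          f"~{per_pixel_budget(texels, w, h, 60):.0f} texel fetches per pixel "
          f"at {w}x{h}, 60 fps")
```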

Memory bandwidth:

The speed at which the card can access memory. Memory bandwidth is equal to the width of the memory bus (in bytes) multiplied by the effective speed at which the memory is clocked. The higher the memory bandwidth, the better the card will be able to handle large textures, anti-aliasing and anisotropic filtering.
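The memory bandwidth formula used throughout this post, (memory clock x bus width / 8) / 1000, can be checked against every configuration in the tables. The "MHz" memory clocks quoted here are effective data rates (transfers per second), not the chips' command clocks; the function and dictionary names are illustrative.

```python
def memory_bandwidth_gbs(effective_rate_mts: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/sec = effective transfers per second x bytes per transfer."""
    return effective_rate_mts * (bus_width_bits / 8) / 1000

CONFIGS = {
    "PS4 GDDR5": (5500, 256),
    "Xbox One DDR3": (2133, 256),
    "Wii U DDR3": (1600, 64),
    "HD 7870 GDDR5": (4800, 256),
    "HD 7790 GDDR5": (6000, 128),
    "HD 5550 GDDR3": (1800, 128),
}

for name, (rate, width) in CONFIGS.items():
    print(f"{name:14} {memory_bandwidth_gbs(rate, width):6.1f} GB/sec")
```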



GAMING is not about spending hours to pass/waste our time just for fun,

it's a Feeling/Experience about a VIRTUAL WORLD we can never be in for real, and realizing some of our dreams (also creating new ones).

So, Feel Emotions, Experience Adventure/Action, Challenge Game, Solve puzzles and Have fun.

PlayStation is about all-round "New experiences" using new IPs to provide great diversity for everyone.

Xbox is always about Online and Shooting.

Nintendo is always about Fun games and milking IPs.


I wouldn't add the DDR3 and ESRAM bandwidths together. You'll never get a practical speed like that, as at some point the DDR3 is going to limit the transfer rate.



Scoobes said:
I wouldn't add the DDR3 and ESRAM bandwidths together. You'll never get a practical speed like that, as at some point the DDR3 is going to limit the transfer rate.


Yes, I just wanted to give the peak bandwidth it can achieve.




You should include the Wii U, as it's still next gen. Regardless, useful information, so thank you!



 

Here lies the dearly departed Nintendomination Thread.

The only problem with adding the Wii U is that we don't really know its specs even 7 months after launch. People constantly pull up the DF hardware reveal, but it was complete BS. You can't skip over half the chip because you don't know what those areas are for, and base the entire chip's performance on only what you know. That's like me judging how fast a car will go by only looking at the bumper.



ListerOfSmeg said:
The only problem with adding the Wii U is that we don't really know its specs even 7 months after launch. People constantly pull up the DF hardware reveal, but it was complete BS. You can't skip over half the chip because you don't know what those areas are for, and base the entire chip's performance on only what you know. That's like me judging how fast a car will go by only looking at the bumper.


This is the reason why I haven't added the Wii U GPU yet, but I'm digging more and am going to add as many details as we know.




biglittlesps said:
Scoobes said:
I wouldn't add the DDR3 and ESRAM bandwidths together. You'll never get a practical speed like that, as at some point the DDR3 is going to limit the transfer rate.


Yes, I just wanted to give the peak bandwidth it can achieve.

It's not really a peak as such; at some point the APU will need to access DDR3 and that'll bottleneck it somewhat.

That's the only criticism I have of an otherwise informative post. Nice to see the Wii U GPU info is there now too.



niice thread. will read later.



 

 

biglittlesps said:

 

Hello everyone,

This is meant only as a reference for the complete technical details of the PS4, Wii U and Xbox One GPUs. I'll update this post as soon as I find new details, so everybody can better understand and get some insight into the GPUs inside next-gen consoles with respect to PC GPUs.

Comparison between PS4, Xbox One and Wii U GPUs

Technical detail | PlayStation 4 | Xbox One | Wii U
Compute units (CU) | 18 CU (4 VU + 1 SU each): 72 vector and 18 scalar units | 12 CU (4 VU + 1 SU each): 48 vector and 12 scalar units | 8 SC (shader cores): 64 shader units (8 SC x 8)
Shader/stream processors (SP) | 1152 SP (18 CU x 64 [4 VU x 16 ALU]) | 768 SP (12 CU x 64 [4 VU x 16 ALU]) | 320 SP (8 SC x 40 [5 ALU x 8 SU])
Texture mapping units (TMU) | 72 TMU (18 CU x 4 VU) | 48 TMU (12 CU x 4 VU) | 16 TMU (8 SC x 2)
Raster back-ends (RB) | 8 RB (color/depth blocks) | 4 RB (color/depth blocks) | 2 RB (color/depth blocks)
Raster operators (ROP) | 32 ROP (8 RB x 4/clock) | 16 ROP (4 RB x 4/clock) | 8 ROP (2 RB x 4/clock)
Pixel fill rate | 25600 MPixels/sec (800 MHz x 32 ROP) | 12800 MPixels/sec (800 MHz x 16 ROP) | 4400 MPixels/sec (550 MHz x 8 ROP)
Texture fill rate | 57600 MTexels/sec (800 MHz x 72 TMU) | 38400 MTexels/sec (800 MHz x 48 TMU) | 8800 MTexels/sec (550 MHz x 16 TMU)
FLOPS (SP x GPU clock x 2 [ADD+MUL]) | 1843.2 GFLOPS | 1228.8 GFLOPS | 352 GFLOPS
GPU clock | 800 MHz | 800 MHz | 550 MHz
Memory clock | 5500 MHz | 2133 MHz | 1600 MHz
Memory bandwidth | 176 GB/sec unified RAM ((5500 MHz x 256 bit / 8) / 1000) | 68 GB/sec RAM + 102 GB/sec ESRAM | 12.8 GB/sec RAM + up to 1000 GB/sec EDRAM
Fabrication | 28 nm | 40 nm | 40 nm

 

PS4 GPU:

It is derived from AMD's GCN architecture and based on the HD 7000 series (between the 7850 and 7870).


COMPARISON WITH PC GPU:

The PS4 GPU comes in between these two PC desktop GPUs: the AMD 7850 and the AMD 7870.

AMD 7850:

Compute Units : 16 CU

Shader Processors : 1024 (16CU x 64 [4VU x 16ALU])

Texture Units : 64 (16CU x 4VU)

Raster Back-end : 8 RB (Color/depth blocks)

Raster Operators : 32 (8RB x 4/clock)

Memory Bus Type : 256 bit (2GB GDDR5)

Pixel Fill Rate : 27520 MPixels/sec (860MHz x 32 ROP)

Texture Fill Rate : 55040 MTexels/sec (860MHz x 64 TMU)

FLOPS : 1761.28 GFLOPS (1.76 TFLOPS)

Memory Bandwidth  : 153.6 GB/sec (4800Mhz x 256bit/8)/1000

 

AMD 7870:

Compute Units : 20 CU

Shader Processors : 1280 (20CU x 64 [4VU x 16ALU])

Texture Units : 80 (20CU x 4VU)

Raster Back-end : 8 RB (Color/depth blocks)

Raster Operators : 32 (8RB x 4/clock)

Memory Bus Type : 256 bit (2GB GDDR5)

Pixel Fill Rate : 32000 MPixels/sec (1000MHz x 32 ROP)

Texture Fill Rate : 80000 MTexels/sec (1000Mhz x 80 TMU)

FLOPS : 2560 GFLOPS (2.56 TFLOPS)

Memory Bandwidth  : 153.6 GB/sec (4800Mhz x 256bit/8)/1000

 

Xbox One GPU:

It is also derived from AMD's GCN architecture, like the PS4's GPU, and is based on the HD 7000 series (closely matching the AMD 7770 GPU).

XBOX ONE GPU Specifications:

Compute Units : 12 CU

Shader Processors : 768 SP(12CU x 64 [4VU x 16ALU])

Texture Units : 48 TMU(12CU x 4VU)

Raster Back-end : 4 RB (Color/depth blocks)

Raster Operators : 16 ROP (4RB x 4/clock)

Memory Bus Type : 256 bit (32MB ESRAM + 8GB DDR3)

Pixel Fill Rate : 12800 MPixels/sec (800Mhz x 16 ROP)

Texture Fill Rate : 38400 MTexels/sec (800Mhz x 48 TMU)

FLOPS : 1228.8 GFLOPS (768 x 800MHz x 2 ops/cycle [MUL+ADD])

Memory Bandwidth : 68 GB/sec (RAM) + 102 GB/sec (ESRAM buffer) = 170 GB/sec

Xbox one GPU overall comes very close to the below Desktop GPU Model AMD 7770.

AMD 7770:

Compute Units : 10 CU

Shader Processors : 640 (10CU x 64 [4VU x 16ALU])

Texture Units : 40 (10CU x 4VU)

Raster Back-end : 4 RB (Color/depth blocks)

Raster Operators : 16 (4RB x 4/clock)

Memory Bus Type : 128 bit (1GB GDDR5)

Pixel Fill Rate : 16000 MPixels/sec (1000MHz x 16 ROP)

Texture Fill Rate : 40000 MTexels/sec (1000MHz x 40 TMU)

FLOPS : 1280GFLOPS (1.28 TFLOPS)

Memory Bandwidth  : 72 GB/sec (4500Mhz x 128bit/8)/1000

Sources:

http://techreport.com/news/24844/microsoft-reveals-next-generation-xbox-one-console

http://www.vgleaks.com/durango-gpu-2/

 

Wii U GPU:

The Wii U is based on AMD's RV730 architecture (a cut-down version of RV770, used for the HD 46xx series), unlike the GCN architecture (used for the HD 7000 series) of the PS4 and Xbox One. The Wii U GPU is highly customized, with high-speed L1/L2 caches and an EDRAM buffer at the hardware level, so to get the most performance out of this hardware, developers need to master using these high-speed caches and the buffer.

Wii U GPU Specifications:

Shader Cores/SIMD Engines : 8 SC (each has 8 Shader Units [SU])

Shader Processors : 320 SP (8SC x 40 [8SU x 5ALU])

Texture Units : 16 TMU (8SC x 2)

Raster Back-end : 2 RB (Color/depth blocks)

Raster Operators : 8 ROP (2RB x 4/clock)

Memory Bus Type : 128 bit (32MB EDRAM + 2GB DDR3)

Pixel Fill Rate : 4400 MPixels/sec (550MHz x 8 ROP)

Texture Fill Rate : 8800 MTexels/sec (550MHz x 16 TMU)

FLOPS : 352 GFLOPS (320 x 550MHz x 2 ops/cycle [MUL+ADD])

Memory Bandwidth : EDRAM up to 1000 GB/sec + 12.8 GB/sec (DDR3 RAM) ((1600MHz x 64bit/8)/1000)

 

The Wii U GPU closely matches the AMD 4670 desktop GPU.

AMD 4670:

Shader Cores/SIMD Engines : 8 SC (each has 8 Shader Units [SU])

Shader Processors : 320 SP (8SC x 40 [8SU x 5ALU])

Texture Units : 32 TMU (8SC x 4)

Raster Back-end : 2 RB (Color/depth blocks)

Raster Operators : 8 ROP (2RB x 4/clock)

Memory Bus Type : 128 bit (512MB GDDR3)

Pixel Fill Rate : 6000 MPixels/sec (750MHz x 8 ROP)

Texture Fill Rate : 24000 MTexels/sec (750MHz x 32 TMU)

FLOPS : 480 GFLOPS (320 x 750MHz x 2 ops/cycle [MUL+ADD])

Memory Bandwidth : 32 GB/sec ((2000MHz effective x 128bit/8)/1000)

 

Sources:

http://www.nintendolife.com/news/2013/05/shinen_wii_u_has_enough_power_for_years_to_come_gpu_is_several_generations_ahead_of_current_consoles

http://perspectives.mvdirona.com/2009/03/18/HeterogeneousComputingUsingGPGPUsAMDATIRV770.aspx

http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed

http://www.tomshardware.com/reviews/radeon-hd-7970-benchmark-tahiti-gcn,3104-2.html

http://www.bit-tech.net/hardware/graphics/2008/09/02/ati-radeon-4850-4870-architecture-review/8

http://www.bit-tech.net/hardware/graphics/2008/09/11/amd-ati-radeon-hd-4670-512mb/2

 

Details about GPU concepts and why they are useful:


SIMD engines (or shader cores), used before the GCN architecture and found in the Wii U:

The RV770 architecture is split into ten SIMD cores, each with 16 shader units (SU); each SU has four ALUs that can do ADD+MUL and one special ALU that can do more than the others. In total, there are 64 ALUs + 16 SFU ALUs = 80 stream processors per SIMD. The RV730 architecture used in the Wii U GPU has 8 SIMD cores, each with 8 shader units, so each SIMD has 40 ALUs (stream processors): 32 ALUs that can do ADD+MUL and 8 special ALUs that can do more than the others. Across the 8 SIMDs, that makes 320 stream processors.


the wiiu hardware is all wrong if you understand about tech, read these 2 pages. http://beyond3d.com/showthread.php?t=60501&page=211



ninjablade said:

the wiiu hardware is all wrong if you understand about tech, read these 2 pages. http://beyond3d.com/showthread.php?t=60501&page=211


I thought you might be talking BS but I checked the power consumption and the manufacturing process and indeed it seems inconsistent with hardware delivering above some 250 GFLOPS or so. That would be a major bummer but perhaps the games tested didn't push hardware to the max, so I'm going to give it the benefit of doubt, since the chip itself seems a lot like a lesser RV740.