biglittlesps said:

 

Hello everyone,

This is meant as a reference for the complete technical details of the PS4, Wii U and Xbox One GPUs. I'll update this post as soon as I find new details, so everybody can better understand the GPUs inside the next-gen consoles and see how they compare with PC GPUs.

Comparison between PS4, Xbox One and Wii U GPUs

Technical details | PlayStation 4 | Xbox One | Wii U
Compute Units (CU) | 18 CU (4 VU + 1 SU), 72 Vector and 18 Scalar Units | 12 CU (4 VU + 1 SU), 48 Vector and 12 Scalar Units | 8 SC (Shader Cores), 64 Shader Units (8SC x 8)
Shader/Stream Processors (SP) | 1152 SP (18CU x 64 [4VU x 16ALU]) | 768 SP (12CU x 64 [4VU x 16ALU]) | 320 SP (8SC x 40 [5ALU x 8SU])
Texture Mapping Units (TMU) | 72 TMU (18CU x 4VU) | 48 TMU (12CU x 4VU) | 16 TMU (8SC x 2)
Raster Back-ends (RB) | 8 RB (Color/depth blocks) | 4 RB (Color/depth blocks) | 2 RB (Color/depth blocks)
Raster Operators (ROP) | 32 ROP (8RB x 4/clock) | 16 ROP (4RB x 4/clock) | 8 ROP (2RB x 4/clock)
Pixel Fill Rate | 25600 MPixels/sec (800Mhz x 32 ROP) | 12800 MPixels/sec (800Mhz x 16 ROP) | 4400 MPixels/sec (550Mhz x 8 ROP)
Texture Fill Rate | 57600 MTexels/sec (800Mhz x 72 TMU) | 38400 MTexels/sec (800Mhz x 48 TMU) | 8800 MTexels/sec (550Mhz x 16 TMU)
FLOPS (SP x GPU Clk x 2 [ADD+MUL]) | 1843.2 GFLOPS | 1228.8 GFLOPS | 352 GFLOPS
GPU Clock | 800 Mhz | 800 Mhz | 550 Mhz
Memory Clock | 5500 Mhz | 2133 Mhz | 1600 Mhz
Memory Bandwidth | 176 GB/sec Unified RAM ((5500Mhz x 256bit/8)/1000) | 68 GB/sec RAM + 102 GB/sec ESRAM | 12.8 GB/sec RAM + up to 1000 GB/sec EDRAM
Fabrication | 28 nm | 40 nm | 40 nm
Reserved | Reserved | Reserved | Reserved
Reserved | Reserved | Reserved | Reserved

 

PS4 GPU:

It is derived from AMD's GCN architecture and based on the HD7000 series (between the 7850 and the 7870).

PS4: GPU Specifications:

Compute Units : 18 CU

Shader Processors : 1152 SP(18CU x 64 [4VU x 16ALU])

Texture Units : 72 TMU(18CU x 4VU)

Raster Back-end : 8 RB (Color/depth blocks)

Raster Operators : 32 ROP (8RB x 4/clock)

Memory Bus Type : 256 bit (8GB GDDR5)

Pixel Fill Rate : 25600 MPixels/sec (800Mhz x 32 ROP)

Texture Fill Rate : 57600 MTexels/sec (800Mhz x 72 TMU)

FLOPS : 1843.2 GFLOPS (1152 x 800Mhz x 2 OP/clock[MUL+ADD])

Memory Bandwidth  : 176 GB/sec (5500Mhz x 256bit/8)/1000
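All of the derived numbers in the list above follow from the unit counts and clocks. Here is a small Python sketch of that arithmetic. The function name and dictionary keys are made up for illustration, and these are theoretical peaks only, not measured performance.

```python
def gpu_peaks(sp, tmu, rop, gpu_mhz, mem_mhz, bus_bits):
    """Theoretical peak figures from unit counts and clocks (hypothetical helper)."""
    return {
        "pixel_fill_mpix": gpu_mhz * rop,             # one pixel per ROP per clock
        "texel_fill_mtex": gpu_mhz * tmu,             # one texel per TMU per clock
        "gflops": sp * gpu_mhz * 2 / 1000,            # 2 ops (MUL+ADD) per SP per clock
        "mem_bw_gbs": mem_mhz * bus_bits / 8 / 1000,  # effective clock x bus width in bytes
    }

# PS4 figures as listed in this post
ps4 = gpu_peaks(sp=1152, tmu=72, rop=32, gpu_mhz=800, mem_mhz=5500, bus_bits=256)
for name, value in ps4.items():
    print(f"{name}: {value}")
```

The same function can be fed the numbers of any other GPU in this post to check its fill rates, FLOPS and bandwidth.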

SOURCES: 

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=2

http://www.vgleaks.com/orbis-gpu-compute-queues-and-pipelines/

http://www.gpureview.com

How Sony Modified the Hardware:

The three "major modifications" Sony did to the architecture to support this vision are as follows, in Cerny's words:

1) "First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!"

         PCIe x16 v2.x bandwidth - 8 GB/s

         PCIe x16 v3.0 bandwidth - 15.75 GB/s

One barrier in a traditional PC hardware environment is communication between the CPU, GPU, and RAM. The PS4 architecture is designed to address that problem.

2) "Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. This innovation radically reduces the overhead of running compute and graphics together on the GPU."

3) Thirdly, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."

Cerny said, "Our overall approach was to put in a very large number of controls about how to mix compute and graphics, and let the development community figure out which ones they want to use when they get around to the point where they're doing a lot of asynchronous compute."

He expects developers to run middleware -- such as physics, for example -- on the GPU. Using the system he describes above, you can run at peak efficiency, he said.
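The PCIe x16 figures quoted under point 1 can be derived from the per-lane transfer rate and the line encoding. A minimal sketch, assuming the standard values of 5 GT/s with 8b/10b encoding for PCIe 2.x and 8 GT/s with 128b/130b encoding for PCIe 3.0 (the function name is just for illustration):

```python
def pcie_gb_per_s(gt_per_s, payload_bits, total_bits, lanes=16):
    """Usable one-direction bandwidth of a PCIe link in GB/s."""
    # Raw transfers per second, scaled by encoding efficiency and lane count
    bits_per_s = gt_per_s * 1e9 * payload_bits / total_bits * lanes
    return bits_per_s / 8 / 1e9

pcie2 = pcie_gb_per_s(5.0, 8, 10)      # PCIe 2.x: 5 GT/s per lane, 8b/10b
pcie3 = pcie_gb_per_s(8.0, 128, 130)   # PCIe 3.0: 8 GT/s per lane, 128b/130b

print(f"PCIe 2.x x16: {pcie2:.2f} GB/s")
print(f"PCIe 3.0 x16: {pcie3:.2f} GB/s")
print("PS4 bus (almost 20 GB/s) is larger than both:", 20 > pcie3 > pcie2)
```

This is where the 8 GB/s and 15.75 GB/s figures come from, and why Cerny can say the extra bus is larger than PCIe on most PCs.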

The PS4's Dedicated Units:

Another thing the PlayStation 4 team did to increase the flexibility of the console is to put many of its basic functions on dedicated units on the board -- that way, you don't have to allocate resources to handling these things.

1)Hardware dedicated unit for audio:

 This unit supports audio chat without the games needing to dedicate any significant resources to it. The audio unit also handles decompression of "a very large number" of MP3 streams for in-game audio.

2)Hardware dedicated unit for video:

This unit does compression and decompression of video.

3)Hardware dedicated unit for zlib compression/decompression:

To further help the Blu-ray along, this unit supports zlib decompression -- so developers can confidently compress all of their game data and know the system will decode it on the fly. "As a minimum, our vision is that our games are zlib compressed on media."

4)Secondary chip for background downloads/uploads/Vita remote play:

This unit supports downloads and updates in the background and Vita remote play, and it helps the Blu-ray drive with reading by caching the most needed game data to the HDD.

COMPARISON WITH PC GPU:

The PS4 GPU comes in between these two PC desktop GPUs: the AMD 7850 and the AMD 7870.

AMD 7850:

Compute Units : 16 CU

Shader Processors : 1024 (16CU x 64 [4VU x 16ALU])

Texture Units : 64 (16CU x 4VU)

Raster Back-end : 8 RB (Color/depth blocks)

Raster Operators : 32 (8RB x 4/clock)

Memory Bus Type : 256 bit (2GB GDDR5)

Pixel Fill Rate : 27520 MPixels/sec (860Mhz x 32 ROP)

Texture Fill Rate : 55040 MTexels/sec (860Mhz x 64 TMU)

FLOPS : 1761.28 GFLOPS (1.76 TFLOPS)

Memory Bandwidth  : 153.6 GB/sec (4800Mhz x 256bit/8)/1000

 

AMD 7870:

Compute Units : 20 CU

Shader Processors : 1280 (20CU x 64 [4VU x 16ALU])

Texture Units : 80 (20CU x 4VU)

Raster Back-end : 8 RB (Color/depth blocks)

Raster Operators : 32 (8RB x 4/clock)

Memory Bus Type : 256 bit (2GB GDDR5)

Pixel Fill Rate : 32000 MPixels/sec (1000Mhz x 32 ROP)

Texture Fill Rate : 80000 MTexels/sec (1000Mhz x 80 TMU)

FLOPS : 2560 GFLOPS (2.56 TFLOPS)

Memory Bandwidth  : 153.6 GB/sec (4800Mhz x 256bit/8)/1000
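To see where the PS4 sits relative to the two desktop cards, the three GPUs can be ranked on each theoretical peak. This is only a sketch using the unit counts and clocks listed in this post. The tuple layout and function names are my own choice.

```python
# (stream processors, TMUs, ROPs, core clock in MHz), as listed in this post
GPUS = {
    "AMD 7850": (1024, 64, 32, 860),
    "PS4":      (1152, 72, 32, 800),
    "AMD 7870": (1280, 80, 32, 1000),
}

def peaks(sp, tmu, rop, mhz):
    """Theoretical peaks: GFLOPS, MTexels/sec and MPixels/sec."""
    return {"gflops": sp * mhz * 2 / 1000, "mtexels": tmu * mhz, "mpixels": rop * mhz}

def ranking(metric):
    """GPU names sorted from lowest to highest on the given metric."""
    return sorted(GPUS, key=lambda name: peaks(*GPUS[name])[metric])

for metric in ("gflops", "mtexels", "mpixels"):
    print(metric, ranking(metric))
```

On shader and texture throughput the PS4 lands between the 7850 and the 7870. On pixel fill rate it comes out below both, because all three have 32 ROPs and the PS4 has the lowest clock.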

 

XBOX one GPU:

Like the PS4 GPU, it is derived from AMD's GCN architecture and based on the HD7000 series (closely matching the AMD 7770 GPU).

XBOX ONE GPU Specifications:

Compute Units : 12 CU

Shader Processors : 768 SP(12CU x 64 [4VU x 16ALU])

Texture Units : 48 TMU(12CU x 4VU)

Raster Back-end : 4 RB (Color/depth blocks)

Raster Operators : 16 ROP (4RB x 4/clock)

Memory Bus Type : 256 bit (32MB ESRAM + 8GB DDR3)

Pixel Fill Rate : 12800 MPixels/sec (800Mhz x 16 ROP)

Texture Fill Rate : 38400 MTexels/sec (800Mhz x 48 TMU)

FLOPS : 1228.8 GFLOPS (768 x 800Mhz x 2 OP/clock[MUL+ADD])

Memory Bandwidth  : 68(RAM) + 102(ESRAM buffer) = 170 GB/sec
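The two Xbox One bandwidth figures can be reproduced roughly as follows. The DDR3 number uses the same clock x bus width formula as elsewhere in this post. The ESRAM number assumes a transfer of 128 bytes per GPU clock, which is my assumption and not something stated above.

```python
# DDR3 main memory: effective clock (MHz) x bus width (bits) / 8 / 1000
ddr3_gbs = 2133 * 256 / 8 / 1000

# ESRAM: ASSUMPTION of 128 bytes moved per 800 MHz GPU clock
esram_gbs = 800 * 128 / 1000

print(f"DDR3 : {ddr3_gbs:.1f} GB/s")
print(f"ESRAM: {esram_gbs:.1f} GB/s")
print(f"Sum  : {ddr3_gbs + esram_gbs:.1f} GB/s (two separate pools, 8GB vs 32MB)")
```

Keep in mind the sum is a best case: the ESRAM is only 32MB, so the combined figure only applies while data is being read from both pools at the same time.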

Overall, the Xbox One GPU comes very close to the desktop GPU model below, the AMD 7770.

AMD 7770:

Compute Units : 10 CU

Shader Processors : 640 (10CU x 64 [4VU x 16ALU])

Texture Units : 40 (10CU x 4VU)

Raster Back-end : 4 RB (Color/depth blocks)

Raster Operators : 16 (4RB x 4/clock)

Memory Bus Type : 128 bit (1GB GDDR5)

Pixel Fill Rate : 16000 MPixels/sec (1000Mhz x 16 ROP)

Texture Fill Rate : 40000 MTexels/sec (1000Mhz x 40 TMU)

FLOPS : 1280GFLOPS (1.28 TFLOPS)

Memory Bandwidth  : 72 GB/sec (4500Mhz x 128bit/8)/1000

Sources:

http://techreport.com/news/24844/microsoft-reveals-next-generation-xbox-one-console

http://www.vgleaks.com/durango-gpu-2/

 

Wii U GPU:

The Wii U GPU is based on AMD's RV730 architecture (a cut-down version of RV770, used for the HD467x series), unlike the PS4 and Xbox One, which use the GCN architecture (used for the HD7000 series). The Wii U GPU is highly customized, with high-speed L1/L2 caches and an EDRAM buffer at the hardware level. So, to get the most performance out of this hardware, developers need to master using these high-speed caches and the buffer.

Wii U GPU Specifications:

Shader Cores/SIMD Engines : 8 SC (each has 8 Shader Units [SU])

Shader Processors : 320 SP(8SC x 40 [8SU x 5ALU])

Texture Units : 16 TMU(8SC x 2)

Raster Back-end : 2 RB (Color/depth blocks)

Raster Operators : 8 ROP (2RB x 4/clock)

Memory Bus Type : 64 bit (32MB EDRAM + 2GB DDR3)

Pixel Fill Rate : 4400 MPixels/sec (550Mhz x 8 ROP)

Texture Fill Rate : 8800 MTexels/sec (550Mhz x 16 TMU)

FLOPS : 352 GFLOPS (320 x 550Mhz x 2 OP/clock[MUL+ADD])

Memory Bandwidth  : EDRAM Up to 1000GB/sec + 12.8 GB/sec(DDR3 RAM) (1600Mhz x 64bit/8)/1000

 

The Wii U GPU closely matches the AMD 4670 desktop GPU.

AMD 4670:

Shader Cores/SIMD Engines : 8 SC (each has 8 Shader Units [SU])

Shader Processors : 320 SP(8SC x 40 [8SU x 5ALU])

Texture Units : 32 TMU(8SC x 4)

Raster Back-end : 2 RB (Color/depth blocks)

Raster Operators : 8 ROP (2RB x 4/clock)

Memory Bus Type : 128 bit (512MB GDDR3)

Pixel Fill Rate : 6000 MPixels/sec (750Mhz x 8 ROP)

Texture Fill Rate : 24000 MTexels/sec (750Mhz x 32 TMU)

FLOPS : 480 GFLOPS (320 x 750Mhz x 2 OP/clock[MUL+ADD])

Memory Bandwidth  : 32 GB/sec (2000Mhz effective x 128bit/8)/1000

 

Sources:

http://www.nintendolife.com/news/2013/05/shinen_wii_u_has_enough_power_for_years_to_come_gpu_is_several_generations_ahead_of_current_consoles

http://perspectives.mvdirona.com/2009/03/18/HeterogeneousComputingUsingGPGPUsAMDATIRV770.aspx

http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed

http://www.tomshardware.com/reviews/radeon-hd-7970-benchmark-tahiti-gcn,3104-2.html

http://www.bit-tech.net/hardware/graphics/2008/09/02/ati-radeon-4850-4870-architecture-review/8

http://www.bit-tech.net/hardware/graphics/2008/09/11/amd-ati-radeon-hd-4670-512mb/2

 

Details about GPU concepts and how they are useful:

Compute Unit of GCN:

Each compute unit has 1 scalar unit, texture units and four vector units. Each vector unit is 16 lanes wide. Each lane of the vector unit is a scalar ALU, described as a "processing element" in OpenCL terminology. A stream core (or shader processor) is one such scalar ALU within a vector unit, so there are 64 of them (4 x 16) in each compute unit.

SIMD Engine (or) Shader Cores:  (Used before GCN architecture, found in Wii U)

The RV770 architecture is split into ten SIMD cores, each with 16 shader units (SU). Each SU has four ALUs that can do ADD+MUL and one special ALU (SFU) that can do more than the others. In total there are 64 ALUs + 16 SFU ALUs = 80 stream processors per SIMD. The RV730 architecture used in the Wii U GPU has 8 SIMD cores, each with 8 shader units. In total there are 40 ALUs (stream processors) per SIMD core: 32 ALUs that can do ADD+MUL and 8 ALUs that can do more than the others.
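The unit counts described above can be turned into stream processor totals with two small formulas, one for GCN compute units and one for the older SIMD engine layout. A quick sketch (function names are just for illustration):

```python
def gcn_stream_processors(compute_units, vector_units=4, lanes=16):
    """GCN: each CU has 4 vector units, each 16 lanes (scalar ALUs) wide."""
    return compute_units * vector_units * lanes

def simd_stream_processors(simd_cores, shader_units, alus_per_unit=5):
    """Pre-GCN: each shader unit has 4 ADD+MUL ALUs plus 1 special ALU."""
    return simd_cores * shader_units * alus_per_unit

print("PS4      :", gcn_stream_processors(18))       # 18 CU
print("Xbox One :", gcn_stream_processors(12))       # 12 CU
print("RV770    :", simd_stream_processors(10, 16))  # 10 SIMD x 16 SU
print("Wii U    :", simd_stream_processors(8, 8))    # 8 SIMD x 8 SU (RV730)
```

Note that the totals are not directly comparable across the two architectures, since a GCN lane and a pre-GCN ALU are scheduled very differently.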

Texture Units:

Texture units (aka TMUs or texture mapping units) map textures onto 3D geometry. 3D scenes are generally composed of two things: 3D geometry, and the textures that cover that geometry. Texture units in a video card take a texture and 'map' it to a piece of geometry. That is, they wrap the texture around the geometry and produce textured pixels which can then be written to the screen. A texture can be an actual image, a light map, or even a bump map.

Texture Fill Rate:

The number of textured pixels the card can render to the screen every second. To render a 3D scene, textures are mapped over the top of polygon meshes. This is called texture mapping and is accomplished by the texture mapping units (TMUs) on the video card. Texture fill rate is a measure of the speed with which a particular card can perform texture mapping.

Raster Operator:

The last stage of the graphics pipeline, which writes the textured/shaded pixels to the frame buffer. Raster Operators (ROPs) handle several chores near the end of the pixel pipeline. ROPs handle anti-aliasing, Z and color compression, and the actual writing of the pixel to the output buffer.

Pixel Fill Rate:

The number of pixels the card can render to the screen every second. Before pixel shader processing became the more limiting factor, this was the most accurate measure of performance (along with texel fill rate).
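One way to get a feel for these numbers is to ask how many times each pixel on screen could be written per frame at the theoretical peak. The sketch below assumes a 1920x1080 target at 60 frames per second, which is my assumption for illustration and not a claim about what any console actually renders at.

```python
def writes_per_pixel(fill_mpixels, width=1920, height=1080, fps=60):
    """Peak pixel writes available per screen pixel per frame (upper bound)."""
    pixels_per_second_needed = width * height * fps
    return fill_mpixels * 1e6 / pixels_per_second_needed

# Pixel fill rates (MPixels/sec) as listed in this post
for name, fill in (("PS4", 25600), ("Xbox One", 12800), ("Wii U", 4400)):
    print(f"{name}: about {writes_per_pixel(fill):.0f} writes per pixel per frame")
```

Real games never reach this bound, since shading, blending and memory bandwidth usually become the limit first, but it shows how the raw fill rates relate to each other.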

Memory bandwidth:

The speed at which the card can access memory. Memory bandwidth is equal to the width of the memory bus multiplied by the speed at which the memory is clocked. The higher the memory bandwidth, the better the card will be able to handle large textures, anti-aliasing and anisotropic filtering.

The Wii U hardware details are all wrong. If you understand tech, read these 2 pages: http://beyond3d.com/showthread.php?t=60501&page=211