
Digital Foundry: The Complete Xbox One Architects Interview

nightsurge said:

How old are you...

Also, quite a lot has changed in 30 years, I would think. Quite a lot...

That is not a secret, I'm 55. A lot of things have changed in the past 30 years, but not mathematical logic and engineering laws. Granted, the memory management in the XBox One is probably one of the most complex things ever designed, but it still has to follow the same math and engineering laws as always. I could probably draw a diagram of the XBox One MM stuff that either comes pretty close to the truth or is totally off because of some fantastic design philosophy I'm unaware of. If the second case is true, I'd love to see it described by the MS engineers. So far, I have seen the Baker interview stuff and I am convinced that it is truth mixed with PR and downright technical nonsense.



drkohler said:
*snip*

So far, I have seen the Baker interview stuff and I am convinced that it is truth mixed with PR and downright technical nonsense.

The truth pretty much lies in that Hot Chips conference slide. 

It honestly doesn't get any easier than this to my eyes. 



fatslob-:O said:

The truth pretty much lies in that Hot Chips conference slide. 

It honestly doesn't get any easier than this to my eyes. 

There are more questions than answers on this slide. Just a few of them:

1. Does the cpu access the ddr3 at 30GB/s max? What about the rest of the 68GB/s?

2. The memory transfer between gpu and esram is a mystery. You CANNOT transfer 109GB/s in both directions at the same time with only 4 memory controllers. (See the back-of-envelope sketch at the end of this post.)

3. Where are the two dma controllers of the gpu? Are they in reality the two "swizzle copy" dme engines?

4. Where are the 4 dme engines located (if two of them are not the gpu dma controllers), and to what do they connect at what possible speeds?

5. What is that thin black line between the cpu and esram buses? (we now know the cpu can't access esram directly)

6. What is the speed of the bus between gpu mmu and cpu mmu? Who drives the bus?

 

There are other, finer details that are unclear in that slide.
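For what it's worth, here is a quick back-of-envelope check of where the headline numbers come from (my own sketch from the publicly quoted figures: DDR3-2133 on a 256-bit bus, four 256-bit eSRAM controllers at 853MHz; the per-controller width is an inference from the Hot Chips material, not an official statement):

```python
# Sanity check of the headline bandwidth figures (decimal GB, as quoted).

# Main memory: DDR3-2133 on a 256-bit bus.
ddr3_peak = 2133e6 * (256 // 8) / 1e9
print(f"DDR3 peak:      {ddr3_peak:6.1f} GB/s")      # ~68.3

# eSRAM: four 256-bit controllers at 853 MHz, one direction at a time.
esram_one_way = 4 * (256 // 8) * 853e6 / 1e9
print(f"eSRAM, one way: {esram_one_way:6.1f} GB/s")  # ~109.2

# Question 2's point: quoting 109GB/s of reads AND 109GB/s of writes at
# the same time would need every controller to serve both directions at
# once, which a single address/data path per controller cannot do.
```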



fatslob-:O said:
selnor1983 said:

The software DX 11.2 wasn't implemented fully until June this year for Xbox One. UE4 was not able to take advantage of this software upgrade for Xbox One. There's no reason why it can't be added in the future.

It was a big hit at Siggraph, and Unreal Engine 4 was initially based on it. Turns out they eventually had to strip it out (quietly) because they couldn't get it up to speed on next-generation consoles and mid-range PCs. However, there's also a plugin for Unity, and it runs quite well. The data was being stored in a Sparse Voxel Octree: a 3D, layered voxel grid. Traversing this grid is very slow.

So how did they get around this problem with the original UE4 method?

Instead of using voxels to store the data, they're using a 3D texture: like a cube of voxels, but stored as an array of 2D textures. Now it was fast, but this had some problems of its own.


The 3D textures were big and required a lot of memory. To fix this, partially resident textures are being used. Partially resident textures chop an enormous texture up into tiny tiles and stream in only what is needed, saving both RAM and bandwidth.

So powerful that you can represent textures as big as 3GB in 16MB of RAM (or eSRAM?). (A toy sketch of this idea follows below.)

DirectX 11.2 and the X1 chip architecture are built for doing partially resident resources in hardware. This removes the limitations that software-only implementations had, which held some engines back, such as John Carmack's Rage.

The X1's architecture and data move engines have tile and untile features natively, in hardware.

Both AMD GPUs in the PS4 and Xbox One support partially resident textures, but we know for a fact Microsoft added additional dedicated hardware in the X1 architecture to focus on this area beyond AMD's standard implementation, where Sony did not.

MS talked about partially resident resources at their DirectX Build conference. They explained the move to partially resident resources as a solution. It just might end up being even more important than originally believed. And more recently, an unnamed third-party developer is touting better ray-tracing capabilities on the X1:


Xbox One does, however, boast superior performance to PS4 in other ways. “Let’s say you are using procedural generation or raytracing via parametric surfaces – that is, using a lot of memory writes and not much texturing or ALU – Xbox One will likely be faster,” said one developer.
http://www.edge-online.com/news/pow...erences-between-ps4-and-xbox-one-performance/

DirectX 11.2 was only unveiled earlier this year; no launch games would have been designed for this. Partially resident textures are still a fairly new technique, and are just now getting hardware support. Voxel cone ray tracing is also a fairly new technique, and the alternative of using a 3D texture along with partially resident textures is even newer; not many have attempted it. Developers will certainly need time to start experimenting with both.
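To make the "3GB texture in 16MB" claim concrete, here is a toy model of the general partially-resident-texture idea (purely illustrative: the 64KB tile size and the LRU eviction policy are my assumptions, not the X1's or AMD's actual mechanism):

```python
# Toy partially resident texture: a huge "virtual" texture is split into
# fixed-size tiles, and only the tiles actually sampled stay resident in
# a small physical pool, with least-recently-used tiles evicted.
from collections import OrderedDict

TILE_BYTES   = 64 * 1024                 # 64 KiB tiles (a common PRT granularity)
VIRTUAL_SIZE = 3 * 1024**3               # the "3GB" virtual texture
POOL_SIZE    = 16 * 1024**2              # the "16MB" of physical memory
POOL_TILES   = POOL_SIZE // TILE_BYTES   # only 256 tiles resident at once

class TexturePool:
    def __init__(self):
        self.resident = OrderedDict()    # tile index -> (fake) tile data

    def sample(self, tile_index):
        """Touch a tile; on a miss, stream it in and evict the LRU tile."""
        if tile_index in self.resident:
            self.resident.move_to_end(tile_index)   # mark as recently used
            return "hit"
        if len(self.resident) >= POOL_TILES:
            self.resident.popitem(last=False)       # evict least recently used
        self.resident[tile_index] = b"..."          # "stream in" the tile
        return "miss (streamed in)"

pool = TexturePool()
print(VIRTUAL_SIZE // TILE_BYTES, "virtual tiles,", POOL_TILES, "resident at once")
for t in (0, 1, 2, 1, 49151):                       # a few texture fetches
    print("tile", t, "->", pool.sample(t))
```

The point is that residency is tracked per tile, so the virtual size of the texture barely matters; only the currently sampled working set has to fit in physical memory.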



No excuses, and I don't want to keep hearing more rubbish from you, especially considering that you likely don't understand any of this terminology, like ethomaz or adinnieken. 

BTW, the API wasn't stopping the guys at Epic Games from implementing SVOGI on the GTX 680, so why should they have trouble porting it to the Xbone, considering they were able to do it on the PS4? 


Wait a second, u telling me ethomaz knows anything? I'll bet u £10000 he knows nothing too. I don't care wat he does, he's just on Sony's side and always bashing the Xbox One.

 

User was banned for this post. - yo_john117



fatslob-:O said:
selnor1983 said:

*snip*

I do follow technology a lot.

Mid-range PCs also struggled. The PS4 struggled because it relied entirely on a software implementation; Microsoft has built it into its hardware. Those move engines (co-processors) are native hardware that tile and untile. The Xbox One has this built into its design. Nvidia and AMD are providing hardware for this in PCs also. This is great news for us as gamers, and for anyone who wants to see real graphical leaps. But it's not been used in game creation before. Hardware hasn't been built around this before.

Do you even know what the move engines and eSRAM are there for? (Note: I'm willing to bet that you most likely don't know what those things do.) 


I will bet that your lover ethomaz knows nothing too.

 

User was banned for this post. - yo_john117



drkohler said:
*snip: the six questions about the Hot Chips slide, quoted in full above*

1. Yes, the cpu does access the whole DDR3 at 30GB/s max, and I believe the 68GB/s figure was meant for the GPU to the DDR3. 

2. Ehh, I wouldn't worry too much about the bandwidth speeds, considering that the eSRAM is only there to save the GPU from eating into the precious 68GB/s.

3. You are right about the DMA missing from the diagram, but I'm willing to bet it would have to be there on the GPU, otherwise it wouldn't be able to access the system memory, and we all know how dumb it would be for the GPU to not have access to the DRAM. 

4. You definitely have a point about the move engines not being on this diagram. I don't see them being of much use, but I do think they're used for faster copying of data between the GPU and CPU. 

5. I don't think that black line is anything but a mistake. Plus, that interview with the architect of the Xbone basically disclosed the eSRAM as nothing more than an evolution of the eDRAM, so I think the GPU is the only one with access to it. 

6. I don't think there are any buses between the MMUs, considering the diagram doesn't show the MMUs being connected together. 



realgamer said:
*snip*
I will bet that your lover ethomaz knows nothing too.

LOL, somebody here is mad. I already stated that ethomaz likely knows little about hardware, so why don't you read my other posts in the other threads? No need to get defensive over the internet. 



fatslob-:O said:

*snip: the six point-by-point answers quoted in full above*

The key part of the diagram is the green rectangle labeled "Host Guest gpu MMU". Let's call it a crossbar switch with tristate buffers and with "stuff" controlling it.

1/2/3. The gpu accesses the ddr3 ram at a maximum of approx 54.6GB/s (unless the memory controllers are "free running" at either 853MHz or 1066MHz). The green rectangle can do a lot more (and I mean A LOT more) than you think. (Note that the ms engineer calls the gpu memory controllers "internal".)

4. I meant the four "dme engines". Obviously one of them pushes peripheral data (Kinect, audio, HDD). I'd assume at least some of the four dme engines are arbitrated by the "green rectangle".

5. The esram can be accessed by a lot of units via the "green rectangle". The cpu is not one of them (since it is an amd part with, apparently, a ddr3 controller only). Hence the cpu has to do some routing to get to the esram. (Note that the ms engineer calls the cpu memory controllers "external".)

6. There is a blue bus between the two mmus.

So what is the "green rectangle" labeled Host-Guest gpu mmu? Think of it as a train station where several tracks lead into the station and several tracks lead out of the station. A train incoming on any track can leave the station at any other track. If two trains are incoming, they can do it and leave at the same time if their paths do not cross. Same with data transfers on the XBox.

So what is the difference between "internal" and "external" memory controllers? The four 64bit "external" memory controllers of the cpu are actually (1+3) ddr3 controllers residing near the cpu. This means that controller 1 accesses bytes 0..7 beginning at some address, while the other three controllers 2-4 have no free will: they always access bytes 8..31 (basically there is one master and three slaves, just like in any gpu or cpu).

The four 256bit "internal" memory controllers are actually (4+0) sram- and ddr3-capable controllers residing near/in the "green rectangle", and, independently from one another, each of the four controllers can fully address one 8MByte block of esram (notice there is NO 32MByte block of esram in the XBox One; there are 4 discretely addressable 8MByte blocks). An internal controller can also access the ddr3 ram (assuming at 853MHz). These independent controllers allow the esram to be read and written at the same time, as long as each controller addresses its own 8MByte block.
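A toy decode of that layout, as I read the description above (the lane and block assignments are illustrative, not a confirmed die-level fact):

```python
# Toy address decode for the scheme described above:
# - DDR3: one 32-byte burst is striped across four 64-bit controllers
#   (one master, three slaves moving in lockstep).
# - eSRAM: four independent controllers, each owning its own 8 MB block.

ESRAM_BLOCK = 8 * 1024**2  # 8 MB per eSRAM block

def ddr3_lane(byte_addr):
    """Which 64-bit external controller carries this byte of a burst."""
    return (byte_addr % 32) // 8           # byte lanes 0..7, 8..15, ...

def esram_block(addr):
    """Which of the four independent internal controllers owns this address."""
    assert addr < 4 * ESRAM_BLOCK, "only 32 MB of eSRAM"
    return addr // ESRAM_BLOCK

# The four DDR3 controllers always work in lockstep on one burst:
print([ddr3_lane(b) for b in range(0, 32, 8)])    # [0, 1, 2, 3]
# The four eSRAM controllers are independent: a read in block 0 can
# overlap a write in block 3, with no single controller doing both.
print([esram_block(a) for a in (0x0, 0x800000, 0x1000000, 0x1FFFFFF)])  # [0, 1, 2, 3]
```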

What you can also do is the following freak simultaneous memory shuffling:

a) 3 of the internal mcs access the esram at a total of 82GB/s (853MHz). The fourth internal mc accesses the ddr3 at 52.6GB/s (853MHz). A dme copies 15.4GB/s from ddr3 to the fourth esram block. This gives you a maximum data transfer of around 150GB/s.

b) All 4 of the internal mcs access the esram, the cpu accesses the ddr3 at 30GB/s. This gives you 139GB/s data transfer speed.

These are the numbers that look logical to me. The article talks about 109GB/s of reads and 109GB/s of writes to the esram at the same time, and gets to 204GB/s with dubious "corrections" (even talking about read-modify-write cycles, which would be just about the absolute worst thing to do in a multiprocessor environment). Unfortunately there are only four internal memory controllers, and a memory controller can operate in one direction only at any given moment (as soon as the address bus is "reserved"). So I wonder how they get there...
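Spelling out the arithmetic behind those two scenarios and the 204GB/s skepticism (my own sketch; the per-controller figures follow from the 256-bit/853MHz assumption, which is not officially confirmed):

```python
# Working through the numbers in the post above (853 MHz clocks assumed).
per_mc = (256 // 8) * 853e6 / 1e9      # ~27.3 GB/s per internal controller

# An internal controller talking to DDR3, double-pumped at 853 MHz:
print(f"internal MC -> DDR3: {2 * per_mc:5.1f} GB/s")            # ~54.6

# a) 3 MCs on eSRAM + 1 MC on DDR3 (52.6 as quoted above; ~54.6 by
#    the same math) + a dme copying 15.4 GB/s from DDR3 to eSRAM:
print(f"scenario a: ~{3 * per_mc + 52.6 + 15.4:.0f} GB/s")       # ~150

# b) all 4 MCs on eSRAM + the CPU on DDR3 at its 30 GB/s cap:
print(f"scenario b: ~{4 * per_mc + 30:.0f} GB/s")                # ~139

# The official claim: 109.2 GB/s read + 109.2 GB/s write = 218.4 GB/s
# peak, quoted as ~204 GB/s "real world" -- which requires every
# controller to interleave reads and writes cycle by cycle, since no
# controller can drive both directions in the same cycle.
print(f"claimed peak: {8 * per_mc:.1f} GB/s")                    # 218.4
```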



drkohler said:
*snip: drkohler's post above on the "green rectangle" MMU and the internal vs. external memory controllers*

I believe that green rectangle represents a way for the GPU and co-processors to access the system memory. What do you mean by an "external memory controller"? As far as I know, memory controllers have almost always been integrated onto the silicon ever since the Core 2 Duo days. 



Adinnieken said:
ethomaz said:

Hello guys... I took a little vacation this weekend... so I will try to summarize what I read.

HIGHLIGHTS OR KEY POINTS

  • 15 co-processors listed: eight inside the audio block, four move engines, one video encode, one video decode and one video compositor/resizer... SO NOTHING TO HELP GRAPHICS, NO DGPU, NO SPECIAL SAUCE.

I disagree. By offloading video encode, decode, and resizing to a separate processor, you free up the GPU and CPU. Likewise with the move engines and the audio block (CPU).

They're what Microsoft calls "free" resources. Meaning, developers don't have to program for them specifically; the API is built so that when certain functions are called, they automatically take advantage of these features. Behind the scenes, the API is doing the work so that the developer doesn't have to write custom code to use them.

  • Co-processors are mostly reserved for Kinect and the SystemOS.

No, they aren't. There is a processor within the audio block used by Kinect. The system will take advantage of some of the processors for various functions, but games will also take advantage of those processors. The move engines, as an example, are part of the GPU and are thus utilized during graphical rendering.

  • Each game is shipped with an OS... so the virtual machine for games only runs the OS together with the game.

Um...no they didn't say that.

  • They confirm the CPU is weak and the biggest problem for fps drops.

No, they didn't say that. They said the CPU is a bottleneck for frames per second in any given system. Even for the PS4 that's true.


    Seeing as the PS4 and XB1 are running similar 8-core Jaguars and the PS4 has a "50% faster" GPU, doesn't this mean the PS4 will have a severe bottleneck?