Xbox Series Velocity Architecture detailed

Xbox Velocity Architecture: A Closer Look at the Next-Gen Tech Driving Gaming Innovation Forward on Xbox Series X

When we set out to design the Xbox Series X, we aspired to build our most powerful console ever: one powered by next-generation innovation and delivering consistent, sustained performance never before seen in a console, with no compromises. To achieve this goal, we knew we needed to analyze each component of the system and push beyond the limitations of traditional console performance and design. It was critical in the design of the Xbox Series X to strike a superior balance of power, speed and performance while ensuring no component would constrain the creative ambition of the world’s best creators, empowering them to deliver truly transformative next-gen gaming experiences not possible in prior console generations.

At the heart of the Xbox Series X is our custom processor, which leverages the latest RDNA 2 and Zen 2 architectures from our partners at AMD to deliver more than 12 TFLOPs of GPU power and more than 4 times the CPU processing power of the Xbox One X. Xbox Series X has the highest memory bandwidth of any next-generation console: 16GB of GDDR6 memory, including 10GB of GPU-optimized memory at 560 GB/s to keep the processor fed with no bottlenecks. As we analyzed the storage subsystem, it became clear that we had reached the upper limits of traditional hard drive technology and that, to deliver on our design aspirations, we would need to radically rethink and revolutionize our approach with the Xbox Series X.

Empowering Next Generation Game Design and Creative Vision

Modern games require a significant amount of data to create the realistic worlds and universes that gamers experience. For the processor to work at its optimum performance, all of this data must be loaded from storage into memory. The explosion of massive, dynamic open-world environments and living, persistent worlds with greater density and variety has only increased the amount of data required. Environmental mesh data, high-polygon character models, high-resolution textures, animation data, audio and video source files and more all combine to deliver the most immersive gameplay environment for the player.

Although modern game engines and middleware can stream game assets into memory from local storage, level designers are still often required to create narrow pathways, hallways or elevators to work around the limitations of a traditional hard drive and I/O pipeline. These in-game elements are often used to mask the need to unload the prior zone’s assets from memory while loading new assets for the next play space. As we discussed developers’ aspirations for their next-generation titles and the limitations of current-generation technology, it became clear that this challenge would continue to grow exponentially and further constrain the ambition for truly transformative games. This feedback influenced the design and development of the Xbox Velocity Architecture.

Introducing the Xbox Velocity Architecture

The Xbox Velocity Architecture was designed as the ultimate solution for game asset streaming in the next generation. This radical reinvention of the traditional I/O subsystem directly influenced all aspects of the Xbox Series X design. If our custom designed processor is at the heart of the Xbox Series X, the Xbox Velocity Architecture is the soul. Through a deep integration of hardware and software innovation, the Xbox Velocity Architecture will power next-gen gaming experiences unlike anything you have seen before.

The Xbox Velocity Architecture comprises four major components: our custom NVMe SSD, hardware accelerated decompression blocks, a brand new DirectStorage API layer and Sampler Feedback Streaming (SFS).

Let’s dive deep into each component:

  • Custom NVMe SSD: The foundation of the Xbox Velocity Architecture is our custom 1TB NVMe SSD, delivering 2.4 GB/s of raw I/O throughput, more than 40x the throughput of Xbox One. Traditional SSDs used in PCs often lose performance as thermals increase or while performing drive maintenance. The custom NVMe SSD in Xbox Series X is designed for consistent, sustained performance as opposed to peak performance. Developers have a guaranteed level of I/O performance at all times, so they can reliably design and optimize their games without the barriers and constraints they have to work around today. The same consistent, sustained performance also applies to the Seagate Expandable Storage Card, ensuring you have the exact same gameplay experience regardless of where the game resides.
  • Hardware Accelerated Decompression: Game packages and assets are compressed to minimize download times and the amount of storage required for each individual game. With hardware-accelerated support for both the industry-standard LZ decompressor and a brand new, proprietary algorithm designed specifically for texture data, named BCPack, Xbox Series X gives developers the best of both worlds, achieving massive savings with no loss in quality or performance. Because texture data makes up a significant portion of a game’s total size, having a purpose-built algorithm optimized for texture data alongside the general-purpose LZ decompressor, with both usable in parallel, reduces the overall size of a game package. Assuming a 2:1 compression ratio, Xbox Series X delivers an effective 4.8 GB/s in I/O performance to the title, approximately 100x the I/O performance in current generation consoles. Delivering similar levels of decompression performance in software would require more than 4 Zen 2 CPU cores.
  • New DirectStorage API: Standard file I/O APIs were developed more than 30 years ago and are virtually unchanged, while storage technology has advanced significantly since then. As we analyzed game data access patterns as well as the latest hardware advancements in SSD technology, we knew we needed to advance the state of the art to put more control in the hands of developers. We added a brand new DirectStorage API to the DirectX family, providing developers with fine-grained control of their I/O operations and empowering them to establish multiple I/O queues, set priorities and minimize I/O latency. These direct, low-level access APIs ensure developers will be able to take full advantage of the raw I/O performance afforded by the hardware, virtually eliminating load times and making fast travel systems that are just that . . . fast.
  • Sampler Feedback Streaming (SFS): Sampler Feedback Streaming is a brand-new innovation built on top of all the other advancements of the Xbox Velocity Architecture. Game textures are optimized at differing levels of detail and resolution, called mipmaps, and the appropriate level is used during rendering based on how close or far away an object is from the player. As an object moves closer to the player, the resolution of the texture must increase to provide the crisp detail and visuals that gamers expect. However, these larger mipmaps require significantly more memory than the lower-resolution mips that can be used when the object is farther away in the scene. Today, developers must load an entire mip level into memory even when they may only sample a very small portion of the overall texture. Using specialized hardware added to the Xbox One X, we analyzed texture memory usage by the GPU and discovered that the GPU often accesses less than 1/3 of the texture data required to be loaded in memory. A single scene often includes thousands of different textures, resulting in a significant loss in effective memory and I/O bandwidth utilization due to this inefficiency. With this insight, we created new capabilities in the Xbox Series X GPU that enable it to load only the sub-portions of a mip level into memory, on demand, just in time for when the GPU requires the data. On average, this innovation results in approximately 2.5x the effective I/O throughput and memory above and beyond the raw hardware capabilities. SFS provides an effective multiplier on available system memory and I/O bandwidth, resulting in significantly more memory and I/O throughput available to make your game richer and more immersive.
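To make the partial-residency idea in the Sampler Feedback Streaming bullet concrete, here is a minimal Python sketch. It assumes a mip level split into fixed-size tiles and treats the set of sampled tiles as given, standing in for what sampler feedback would report. The 64 KiB tile size and the names `MipLevel`, `plan_loads` and `partial_bytes` are hypothetical choices for illustration, not the D3D12 or Xbox API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative tile size for this sketch only; not a documented Xbox value.
TILE_BYTES = 64 * 1024

Tile = tuple[int, int]  # (x, y) coordinate of a tile within one mip level


@dataclass(frozen=True)
class MipLevel:
    """One mip level of a texture, divided into a grid of equal-sized tiles."""
    tiles_x: int
    tiles_y: int

    def contains(self, tile: Tile) -> bool:
        x, y = tile
        return 0 <= x < self.tiles_x and 0 <= y < self.tiles_y

    def full_bytes(self) -> int:
        """Memory used when the whole mip level must be resident."""
        return self.tiles_x * self.tiles_y * TILE_BYTES


def plan_loads(mip: MipLevel, sampled: set[Tile], resident: set[Tile]) -> list[Tile]:
    """Tiles that were sampled but are not in memory yet, in a stable order.

    `sampled` stands in for what sampler feedback would report; here it is
    simply passed in by the caller (hypothetical).
    """
    return sorted(t for t in sampled if mip.contains(t) and t not in resident)


def partial_bytes(mip: MipLevel, sampled: set[Tile]) -> int:
    """Memory used when only the sampled tiles are resident."""
    return sum(1 for t in sampled if mip.contains(t)) * TILE_BYTES


if __name__ == "__main__":
    mip0 = MipLevel(tiles_x=8, tiles_y=8)  # 64 tiles, 4 MiB in total
    # Suppose feedback shows only a 4x5 block of tiles (20 of 64) was sampled.
    sampled = {(x, y) for x in range(4) for y in range(5)}
    resident = {(0, 0), (1, 0)}  # two of those tiles are already in memory
    print(len(plan_loads(mip0, sampled, resident)))          # 18 tiles to stream
    print(mip0.full_bytes() / partial_bytes(mip0, sampled))  # 3.2x less memory
```

In this made-up example the GPU touches 20 of 64 tiles (just under 1/3, in line with the article's observation), so keeping only those tiles resident needs 3.2x less memory than loading the whole mip level.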

Through the massive increase in I/O throughput, hardware accelerated decompression, DirectStorage, and the significant increases in efficiency provided by Sampler Feedback Streaming, the Xbox Velocity Architecture enables the Xbox Series X to deliver effective performance well beyond the raw hardware specs, providing direct, instant, low level access to more than 100GB of game data stored on the SSD just in time for when the game requires it. These innovations will unlock new gameplay experiences and a level of depth and immersion unlike anything you have previously experienced in gaming.
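The "effective" figures in the list above are simple multiplications, which the following Python sketch spells out. It assumes the decompression hardware always keeps up with the SSD, and it treats the 2.5x SFS figure as Microsoft's claimed average for texture streaming, not a measured value; `effective_io` and `with_sfs` are hypothetical helper names.

```python
RAW_SSD_GBPS = 2.4     # sustained raw SSD throughput quoted in the article
SFS_MULTIPLIER = 2.5   # Microsoft's claimed *average* SFS gain, not a measurement


def effective_io(raw_gbps: float, compression_ratio: float) -> float:
    """Decompressed data rate delivered to the game.

    Every byte read from the SSD expands to `compression_ratio` bytes, so the
    effective rate is raw * ratio, assuming the decompression hardware keeps up.
    """
    if raw_gbps < 0 or compression_ratio < 1:
        raise ValueError("need raw_gbps >= 0 and compression_ratio >= 1")
    return raw_gbps * compression_ratio


def with_sfs(io_gbps: float, multiplier: float = SFS_MULTIPLIER) -> float:
    """'Effective' texture-streaming rate if only sampled mip tiles are loaded."""
    return io_gbps * multiplier


if __name__ == "__main__":
    for ratio in (1.5, 2.0):
        io = effective_io(RAW_SSD_GBPS, ratio)
        print(f"{ratio}:1 -> {io:.1f} GB/s, with SFS ~{with_sfs(io):.1f} GB/s")
```

The 2:1 line reproduces the article's 4.8 GB/s; the 1.5:1 line shows how sensitive the headline number is to the assumed compression ratio.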

Unlocking Next Generation Experiences

What does this all mean for you as a gamer? As the industry’s most creative developers and middleware companies begin to explore these new capabilities, we expect significant innovation throughout the next generation, as this revolutionary new architecture enables entirely new scenarios never before considered possible in gaming. The Xbox Velocity Architecture provides a new level of performance and capabilities well beyond the raw specifications of the hardware itself, and it fundamentally rethinks how a developer can take advantage of the hardware provided by the Xbox Series X. From entirely new rendering techniques to the virtual elimination of loading times to larger, more dynamic living worlds where, as a gamer, you can choose how you want to explore, we couldn’t be more excited by the early results we are already seeing. In addition, the Xbox Velocity Architecture has opened even more opportunities and enabled new innovations at the platform level, such as Quick Resume, which lets you instantly resume where you left off across multiple games, improving the overall gaming experience for all gamers on Xbox Series X.

We can’t wait for gamers around the world to get to experience these new, next generation gaming experiences on Xbox Series X this holiday and beyond. For more information on the Xbox Velocity Architecture, check out the video above.

https://news.xbox.com/en-us/2020/07/14/a-closer-look-at-xbox-velocity-architecture/

Last edited by shikamaru317 - on 14 July 2020


TLDR version

Velocity Architecture is composed of 4 key components:

  • Custom NVMe SSD - Able to output a sustained speed of 2.4 GB/s, rather than a peak speed. The Seagate expandable storage cards are the same speed as the internal SSD, so games will run the same regardless of which one they are stored on. Raw throughput is more than 40x the peak speed of Xbox One's 5400 RPM hard drive.
  • Hardware accelerated decompression - Xbox Series X has custom hardware designed specifically for decompression. This allows assets to be stored compressed on the SSD and then decompressed before being fed into memory and the GPU, effectively doubling the SSD speed to 4.8 GB/s (assuming a 2:1 compression ratio). Matching this in software would take more than 4 Zen 2 CPU cores, so the dedicated hardware leaves those cores free for other work.
  • DirectStorage API - File I/O APIs have remained largely the same for more than 30 years. DirectStorage gives devs fine-tuned control of I/O, allowing them to set up multiple I/O queues, prioritize requests and minimize I/O latency.
  • Sampler Feedback Streaming - MS discovered that GPUs often use less than 1/3 of the texture data loaded into RAM in any given scene, which is very inefficient. SFS allows only the portions of each mip level that are actually needed to be loaded into RAM, just in time for the GPU to use them, effectively increasing I/O throughput and usable RAM by about 2.5x on average when it comes to textures.
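To illustrate what "multiple I/O queues with prioritization" means in practice, here is a toy Python scheduler. It is a hypothetical sketch of the general idea only, not the DirectStorage API (whose real types and calls differ); the queue names, request names and the strict-priority, per-call byte budget policy are all assumptions chosen for illustration.

```python
from __future__ import annotations

from collections import deque


class IoScheduler:
    """Toy model of several prioritized I/O queues (hypothetical, not DirectStorage).

    Queues are given from most to least urgent. drain() issues reads in strict
    priority order until a per-call byte budget runs out.
    """

    def __init__(self, queue_names: list[str]) -> None:
        # One FIFO queue per priority class; dicts keep insertion order.
        self._queues: dict[str, deque[tuple[str, int]]] = {n: deque() for n in queue_names}

    def enqueue(self, queue: str, request: str, size_bytes: int) -> None:
        if size_bytes <= 0:
            raise ValueError("size_bytes must be positive")
        self._queues[queue].append((request, size_bytes))

    def drain(self, budget_bytes: int) -> list[str]:
        issued: list[str] = []
        for pending in self._queues.values():
            while pending:
                request, size = pending[0]
                if size > budget_bytes:
                    # Strict priority: lower queues may not jump ahead of this one.
                    return issued
                pending.popleft()
                budget_bytes -= size
                issued.append(request)
        return issued


if __name__ == "__main__":
    io = IoScheduler(["urgent", "textures", "background"])
    io.enqueue("background", "audio_bank", 4_000_000)
    io.enqueue("textures", "rock_mip0_tiles", 1_000_000)
    io.enqueue("urgent", "player_anim", 200_000)
    io.enqueue("textures", "tree_mip1", 500_000)
    print(io.drain(budget_bytes=2_000_000))  # ['player_anim', 'rock_mip0_tiles', 'tree_mip1']
    print(io.drain(budget_bytes=5_000_000))  # ['audio_bank']
```

Strict priority keeps urgent reads from being starved, at the cost that one oversized request at the head of a queue blocks everything behind it. A real scheduler would likely split or time-slice such requests.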

shikamaru317 said:

Let’s dive deep into each component:

  • The custom NVME SSD in Xbox Series X is designed for consistent, sustained performance as opposed to peak performance.
  • Assuming a 2:1 compression ratio, Xbox Series X delivers an effective 4.8 GB/s in I/O performance to the title
  • We added a brand new DirectStorage API to the DirectX family
  • Sampler Feedback Streaming is a brand-new innovation

So instead of just posting another PR fluff piece, you could have commented on the key information hidden in all the PR speak:

1. If the ssd is designed not for peak performance, what are the actual, measured average transfer rates? Still waiting for actual numbers..

2. Obviously the same wishful PR speak, complicated by multiplying two peak numbers together. Again, I want to see actual, measured average transfer rates.

3. And this runs on the Zen2 cores. So what is the actual, measured penalty for this (including all side effects like possibly bombing cpu caches)?

4. Yeah, really. Nobody ever figured out that one should only load data that is actually used. What software lab geniuses are we talking about here? This has been done on the Apple II (for obvious reasons) 50 years ago.



drkohler said:
shikamaru317 said:

Let’s dive deep into each component:

  • The custom NVME SSD in Xbox Series X is designed for consistent, sustained performance as opposed to peak performance.
  • Assuming a 2:1 compression ratio, Xbox Series X delivers an effective 4.8 GB/s in I/O performance to the title
  • We added a brand new DirectStorage API to the DirectX family
  • Sampler Feedback Streaming is a brand-new innovation

So instead of just posting another PR fluff piece, you could have commented on the key information hidden in all the PR speak:

1. If the ssd is designed not for peak performance, what are the actual, measured average transfer rates? Still waiting for actual numbers..

2. Obviously the same wishful PR speak, complicated by multiplying two peak numbers together. Again, I want to see actual, measured average transfer rates.

3. And this runs on the Zen2 cores. So what is the actual, measured penalty for this (including all side effects like possibly bombing cpu caches)?

4. Yeah, really. Nobody ever figured out that one should only load data that is actually used. What software lab geniuses are we talking about here? This has been done on the Apple II (for obvious reasons) 50 years ago.

1. 2.4 GB/s is the sustained transfer speed of the SSD, not the peak; it says so right in the article.

2. Sony did the exact same thing; they just multiplied their SSD number to get their compressed speed.

3. Point 3 is an API; not sure why you would think the APU being used for storage would hurt CPU performance.

4. Knowing and doing are two different things. I would assume it has been known for a long time that GPUs had issues with texture usage efficiency, but only now has somebody come up with a way to load only partial mipmaps into memory at any given time, instead of full mipmaps. The article implies predictive tech here, saying that it loads only the textures that are needed into RAM just in time for them to be used.



Good detail, very similar to what Sony has done, but they don't seem to mention direct access that doesn't go through the I/O for some functions.
Also, they lay it on a bit thick with PR adjectives, some of which are verifiably untrue.
@drkohler
1. The 2.4 GB/s is the designed performance, not a peak or cold-state figure; it is always 2.4 GB/s.
2. Yes, the multiplying of peaks can be ignored. 2.4 GB/s is the speed; everything else will vary based on the application.
3. No, it doesn't use the Zen 2 cores. He is explaining that the decompression block performs work that would otherwise take many Zen 2 cores (PS5 has the same with its I/O and decompression hardware).
4. Loading textures only when required, and only at the level of detail needed, isn't that old. Remember that because of HDD speeds, a lot of data had to be loaded into RAM ahead of time (even when it wasn't on screen); what was avoided was rendering what isn't seen. Now, with faster SSDs and new techniques, a texture isn't even in RAM until it's needed, and it will be loaded at the level of detail required.
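Picking "the level of detail needed" usually comes down to matching texel density to on-screen size. Here is a simplified Python sketch of that rule, assuming a square texture, a one-dimensional texels-per-pixel ratio and an assumed 1 byte per texel. Real GPUs compute the level per pixel from UV derivatives, and `mip_count`, `select_mip` and `mip_bytes` are hypothetical helper names.

```python
import math


def mip_count(width: int, height: int) -> int:
    """Levels in a full mip chain down to 1x1 (floor(log2(max side)) + 1)."""
    return max(width, height).bit_length()


def select_mip(texture_texels: int, screen_pixels: float, levels: int) -> int:
    """Pick a mip level so texel density roughly matches on-screen size.

    Simplified one-dimensional rule: level = floor(log2(texels per pixel)),
    clamped to the chain. Real GPUs derive this per pixel from UV derivatives.
    """
    if screen_pixels <= 0:
        return levels - 1  # off-screen or degenerate: smallest level
    lod = math.log2(texture_texels / screen_pixels)
    return min(max(math.floor(lod), 0), levels - 1)


def mip_bytes(width: int, height: int, level: int, bytes_per_texel: float) -> int:
    """Size of one mip level; each level halves both sides, never below 1."""
    w = max(width >> level, 1)
    h = max(height >> level, 1)
    return int(w * h * bytes_per_texel)


if __name__ == "__main__":
    size = 4096
    levels = mip_count(size, size)  # 13 levels: 4096, 2048, ..., 1
    for on_screen in (4096, 512, 64):  # object shrinks as it moves away
        level = select_mip(size, on_screen, levels)
        print(on_screen, level, mip_bytes(size, size, level, bytes_per_texel=1))
```

With these assumptions, a 4096x4096 texture drawn 512 pixels wide only needs level 3 (256 KiB) instead of the 16 MiB top level. That gap is why loading only the level, or part of a level, that is actually needed saves so much memory and I/O.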



duduspace11 "Well, since we are estimating costs, Pokemon Red/Blue did cost Nintendo about $50m to make back in 1996"

http://gamrconnect.vgchartz.com/post.php?id=8808363

Mr Puggsly: "Hehe, I said good profit. You said big profit. Frankly, not losing money is what I meant by good. Don't get hung up on semantics"

http://gamrconnect.vgchartz.com/post.php?id=9008994


Sampler feedback streaming sounds promising to me. It sounds like it's entirely software based, which means any improvements it brings could theoretically also be brought to PC, right?



I wonder what's wrong with the consoles' GPUs when all they're talking about is the storage.



If you demand respect or gratitude for your volunteer work, you're doing volunteering wrong.

Trunkin said:
Sampler feedback streaming sounds promising to me. It sounds like it's entirely software based, which means any improvements it brings could theoretically also be brought to PC, right?

It still needs fast I/O to really make it happen, I guess. But yes, it can probably be done on PC; it seems like an API feature.




2-3x bandwidth. So let's say a 2.5x multiplier on the 2.4 GB/s SSD. We're looking at around 6 GB/s of I/O. Pretty nice stuff.

Looking forward to seeing DirectStorage on PC.



hinch said:

2-3x bandwidth. So let's say a 2.5x multiplier on the 2.4 GB/s SSD. We're looking at around 6 GB/s of I/O. Pretty nice stuff.

Looking forward to seeing DirectStorage on PC.

I may be mistaken, but it seems to me like MS is claiming:

  • 2.4 GB/s (base SSD speed) x 2 (decompression) = 4.8 GB/s
  • 4.8 GB/s x 2.5 (effective I/O increase for textures from Sampler Feedback Streaming) = 12 GB/s

By comparison, Sony is claiming:

  • 5.5 GB/s (base SSD speed)
  • 8-9 GB/s decompression speed (PS5 has dedicated decompression hardware, same as XSX, so that CPU resources aren't wasted on decompression, though it seems like their decompression multiplier is less than the 2x that MS is claiming for XSX)

If the above is true, MS will have the advantage in texture streaming, assuming of course that Sampler Feedback Streaming works as advertised, and assuming that Sony doesn't have a similar technique in the works for loading only partial mipmaps into RAM at just the right time (which they might). However, even if SFS works as advertised, it will only help with texture streaming, not with load times. Sony will still have the load-time advantage since their SSD is faster and they also have dedicated decompression hardware; PS5 will be able to load compressed game data from the drive at 8-9 GB/s, compared to 4.8 GB/s for XSX.
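The load-time comparison in this post boils down to dividing an amount of data by the claimed decompressed throughput. This Python sketch makes that explicit under loudly stated assumptions: it uses only the figures claimed in this thread, takes 8.5 GB/s as the midpoint of Sony's "8-9 GB/s", picks a hypothetical 10 GB payload, and ignores latency, file-system overhead and engine work. `CLAIMED_GBPS` and `load_seconds` are illustrative names.

```python
# Throughput figures as claimed in this thread, in GB/s of decompressed data.
# 8.5 is simply the midpoint of Sony's quoted "8-9 GB/s" range.
CLAIMED_GBPS = {
    "Xbox Series X": 2.4 * 2.0,  # raw 2.4 GB/s x assumed 2:1 compression
    "PS5": 8.5,
}


def load_seconds(payload_gb: float, effective_gbps: float) -> float:
    """Ideal time to stream `payload_gb` of decompressed data at a given rate.

    Ignores request latency, file-system overhead and engine work, so real
    load times will be longer.
    """
    if effective_gbps <= 0:
        raise ValueError("effective_gbps must be positive")
    return payload_gb / effective_gbps


if __name__ == "__main__":
    payload = 10.0  # hypothetical amount of level data, in GB once decompressed
    for console, rate in CLAIMED_GBPS.items():
        print(f"{console}: {rate:.1f} GB/s -> {load_seconds(payload, rate):.2f} s")
```

The SFS multiplier is deliberately left out here, matching the point above that it applies to texture streaming rather than to general load times.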
