We’ve demonstrated across the months that AMD’s RDNA architecture offers substantially more ‘performance per teraflop’, owing to its radical new design in combination with much higher clocks (the Series X GPU runs with a 56 per cent frequency advantage up against Xbox One X), but there are multipliers that should come into effect through the use of new features baked into the design, such as variable rate shading, which basically attempts to increase and decrease shading precision based on visibility.
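To make the variable rate shading idea concrete, here’s a minimal Python sketch of our own – not AMD’s hardware or Microsoft’s API – in which each 16x16 tile of the frame is shaded at full, quarter or one-sixteenth rate based on an invented per-tile importance score:

```python
# Toy model of variable rate shading (our sketch, not AMD's hardware):
# each 16x16 screen tile is shaded at full, quarter or 1/16th rate
# depending on an invented per-tile "importance" score (0..1), standing
# in for metrics like contrast, motion or distance from the focal point.

def shading_rate(importance: float) -> str:
    """Map an importance score to a coarse VRS shading rate."""
    if importance >= 0.6:
        return "1x1"  # full rate: one shader invocation per pixel
    if importance >= 0.3:
        return "2x2"  # one invocation shades a 2x2 pixel block
    return "4x4"      # one invocation shades a 4x4 pixel block

def invocations(tiles, pixels_per_tile=16 * 16):
    """Total shader invocations at full rate vs. with VRS applied."""
    cost = {"1x1": 1.0, "2x2": 1.0 / 4, "4x4": 1.0 / 16}
    full = len(tiles) * pixels_per_tile
    used = sum(cost[shading_rate(t)] * pixels_per_tile for t in tiles)
    return full, used

# Four tiles: one important, one middling, two unimportant.
full, used = invocations([0.9, 0.5, 0.1, 0.05])  # 1024 vs 352 invocations
```

In a real renderer the importance metric would come from edge detection, motion or similar data, and the rates map to the coarse shading modes exposed by the VRS API – the point is simply that large, low-detail regions can be shaded far more cheaply without obvious quality loss.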
However, even basic ports which barely use any of the Series X’s new features are delivering impressive results. The Coalition’s Mike Rayner and Colin Penty showed us a Series X conversion of Gears 5, produced in just two weeks. The developers worked with Epic Games to get UE4 operating on Series X, then simply upped all of the internal quality presets to the equivalent of PC’s ultra, adding improved contact shadows and UE4’s brand-new (software-based) ray traced screen-space global illumination. On top of that, Gears 5’s cutscenes – running at 30fps on Xbox One X – were upped to a flawless 60fps. We’ll be covering more on this soon, but there was one startling takeaway – we were shown benchmark results that, on this two-week-old, unoptimised port,
already deliver very, very similar performance to an RTX 2080. “I think relative to where we’re at and just looking at our experience with the hardware with this particular game, I think we’re really positive to kind of see how this thing is performing, especially knowing how much untapped performance is still there in the box based on the work we’ve done so far,” enthuses Coalition tech director Mike Rayner. “Gears 5 will be optimized, so the work that you’ve seen today will be there, available at launch on Xbox Series X. The title will support Smart Delivery, so if you already have the title in whatever form, you’ll be able to get it on Series X for free.”
It was an impressive showing for a game that hasn’t even begun to access the next generation features of the new GPU. Right now, it’s difficult to accurately quantify the kind of improvement to visual quality and performance we’ll see over time, because while there are obvious parallels to current-gen machines, the mixture of new hardware and new APIs allows for very different workloads to run on the GPU. Machine learning is a feature we’ve discussed in the past, most notably with Nvidia’s Turing architecture and the firm’s DLSS AI upscaling. The RDNA 2 architecture used in Series X does not have tensor core equivalents, but Microsoft and AMD have come up with a novel, efficient solution based on the standard shader cores. With over 12 teraflops of FP32 compute, RDNA 2 also allows for double that with FP16 (yes, rapid-packed math is back). However, machine learning workloads often use much lower precision than that, so the RDNA 2 shaders were adapted still further.
Manna from heaven for silicon fans: a CG visualization of how the various components within the Series X SoC are positioned within the chip.
“We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights, and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms,” says Andrew Goossen. “So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning.”
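Those TOPS figures fall straight out of the published GPU configuration. As a sanity check, here’s our own arithmetic, assuming the documented 52 CUs at 1825MHz with 64 FP32 lanes per CU and two operations per fused multiply-add:

```python
# Sanity-checking the quoted throughput figures from the published
# Series X GPU configuration: 52 CUs at 1825MHz, 64 FP32 lanes per CU,
# and a fused multiply-add counting as two operations per lane per clock.

CUS, LANES, OPS_PER_FMA, CLOCK_GHZ = 52, 64, 2, 1.825

fp32_tflops = CUS * LANES * OPS_PER_FMA * CLOCK_GHZ / 1000  # ~12.15 TF
fp16_tflops = fp32_tflops * 2  # rapid packed math: two FP16 ops per lane
int8_tops = fp32_tflops * 4    # ~49 TOPS for 8-bit integer work
int4_tops = fp32_tflops * 8    # ~97 TOPS for 4-bit integer work
```

The INT8 and INT4 numbers round to 49 and 97 TOPS respectively, matching the figures Goossen quotes – each halving of precision doubles throughput through the same shader hardware.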
Other forward-looking features also make the cut. Again, similar to Nvidia’s existing Turing architecture, mesh shaders are incorporated into RDNA 2, allowing for a potentially explosive improvement in geometric detail.
“As GPUs have gotten wider and computing performance has increased, geometry processing has become more and more bound on the fixed function vertex issue, triangle setup and tessellation blocks of the GPU,” reveals Goossen. “Mesh shading allows developers to completely bypass those fixed function bottlenecks by providing an optional alternative to the existing parts of the GPU pipeline. In addition to performance, mesh shading offers developers flexibility and memory savings. Mesh shading will allow game developers to increase detail in the shapes and animations of objects and render more complex scenes with no sacrifice to frame-rate.”
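The ‘meshlet’ idea underpinning mesh shading can be sketched in a few lines of Python. This is a toy partitioner of our own devising – real tools use far smarter clustering – but it shows how a large index buffer is cut into small, independently processable chunks; the 64-vertex/126-triangle limits mirror common practice rather than any Series X requirement:

```python
# Hedged sketch of the idea behind mesh shading: rather than feeding the
# fixed-function vertex pipeline one big index buffer, geometry is
# pre-split into small "meshlets" that compute-like shaders can cull and
# process in parallel. Limits are illustrative, not a hardware mandate.

def build_meshlets(triangles, max_verts=64, max_tris=126):
    """Greedily pack triangles (vertex-index triples) into meshlets."""
    meshlets, verts, tris = [], set(), []
    for tri in triangles:
        new_verts = set(tri) - verts
        if len(tris) + 1 > max_tris or len(verts) + len(new_verts) > max_verts:
            meshlets.append((sorted(verts), tris))  # flush the full meshlet
            verts, tris = set(), []
            new_verts = set(tri)
        verts |= new_verts
        tris.append(tri)
    if tris:
        meshlets.append((sorted(verts), tris))
    return meshlets

# A strip of four triangles fits comfortably in a single meshlet:
m = build_meshlets([(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)])
```

Because each meshlet carries its own small vertex list, a mesh shader can test a whole cluster against the view frustum and discard it in one go – the per-triangle fixed-function bottleneck Goossen describes simply never sees it.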
There is more. Much more. For example, the Series X GPU allows for work to be shared between shaders without involvement from the CPU, saving a large amount of work for the Zen 2 cores, with data remaining on the GPU. However, the big innovation is clearly the addition of hardware accelerated ray tracing. This is hugely exciting: at Digital Foundry, we’ve been tracking the evolution of this new technology via the DXR and Vulkan-powered games we’ve seen running on Nvidia’s RTX cards, and the console implementation of RT is more ambitious than we believed possible.
The ray tracing difference
RDNA 2 fully supports the latest DXR Tier 1.1 standard, and similar to the Turing RT core, it accelerates the creation of the so-called BVH structures required to accurately map ray traversal and intersections, tested against geometry. In short, in the same way that light ‘bounces’ in the real world, the hardware acceleration for ray tracing maps traversal and intersection of light at a rate of up to 380 billion intersections per second.
“Without hardware acceleration, this work could have been done in the shaders, but would have consumed over 13 TFLOPs alone,” says Andrew Goossen. “For the Series X, this work is offloaded onto dedicated hardware and the shader can continue to run in parallel with full performance. In other words, Series X can effectively tap the equivalent of well over 25 TFLOPs of performance while ray tracing.”
It is important to put this into context, however. While workloads can operate at the same time, calculating the BVH structure is only one component of the ray tracing procedure. The standard shaders in the GPU also need to pull their weight, so elements like the lighting calculations are still run on the standard shaders, with the DXR API adding new stages to the GPU pipeline to carry out this task efficiently. So yes, RT is typically associated with a drop in performance and that carries across to the console implementation, but with the benefits of a fixed console design, we should expect to see developers optimize more aggressively and also to innovate. The good news is that Microsoft allows low-level access to the RT acceleration hardware.
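To give a flavour of what the dedicated hardware is actually doing, here’s the classic ‘slab’ ray-versus-bounding-box test in plain Python – the operation performed at every node of a BVH during traversal. On Series X this class of work runs in fixed-function silicon at hundreds of billions of tests per second; this pure software sketch is ours, purely for illustration:

```python
# Minimal illustration of the test at the heart of BVH traversal: the
# "slab" method for intersecting a ray with an axis-aligned bounding
# box. The hardware accelerates exactly this kind of work; this Python
# version just shows the logic being accelerated.

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Return True if the ray enters the box at some t >= 0."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            if o < lo or o > hi:  # ray parallel to, and outside, this slab
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:        # entry/exit intervals no longer overlap
            return False
    return True

hit = ray_hits_aabb((0, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1))
miss = ray_hits_aabb((0, 5, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1))
```

A BVH traversal simply repeats this test down a tree of nested boxes, only descending into children the ray actually touches, before final ray-triangle tests at the leaves – which is why dedicating silicon to it pays off so handsomely.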
“[Series X] goes even further than the PC standard in offering more power and flexibility to developers,” reveals Goossen. “In grand console tradition, we also support direct to the metal programming including support for offline BVH construction and optimization. With these building blocks, we expect ray tracing to be an area of incredible visuals and great innovation by developers over the course of the console’s lifetime. “
The proof of the pudding is in the tasting, of course. During our time at the Redmond campus, Microsoft demonstrated how fully featured the console’s RT support is by rolling out a very early Xbox Series X Minecraft DXR tech demo, based on the Minecraft RTX code we saw back at Gamescom last year – and it looks very similar, despite running on a very different GPU. This suggests an irony of sorts: base Nvidia code adapted and running on AMD-sourced ray tracing hardware within Series X. What’s impressive is that the demo is fully path-traced. Aside from the skybox and the moon, there are no rasterized elements whatsoever. The entire presentation is ray traced, demonstrating that despite the constraints of delivering RT within a console’s limited power and silicon budget, Xbox Series X is capable of delivering the most ambitious, most striking implementation of ray tracing – and it does so in real time.
Minecraft DXR is an ambitious statement – total ray tracing, if you like – but we should expect to see the technology used in very different ways. “We’re super excited for DXR and the hardware ray tracing support,” says Mike Rayner, technical director at The Coalition. “We have some compute-based ray tracing in Gears 5, we have ray traced shadows and the [new] screen-space global illumination is a form of ray traced screen-based GI, and so we’re interested in how the ray tracing hardware can be used to take techniques like this and then move them out to utilizing the DXR cores.
“I think, for us, the way that we’ve been thinking about it is as we look forward, we think hybrid rendering between traditional rendering techniques and then using DXR – whether for shadows or GI or adding reflections – are things that can really augment the scene and [we can] use all of that chip to get the best final visual quality.”
In this DF Direct shot on location in Redmond WA, Rich Leadbetter and John Linneman discuss their initial reactions to Xbox Series X directly after a day of deep-dive presentations.
Efficiency in design
One of the key takeaways for me about the Series X silicon isn’t just the power, but also the efficiency of the design. With all of the new graphics features and 12 teraflops of consistent compute performance, we envisaged a monstrously large, prohibitively expensive processor design – in short, a very expensive console. However, the size of the SoC at 360.4mm² means we have a slice of silicon that is, in reality, much smaller than any speculative measurement we could come up with from prior teaser reveals. Its 15.3 billion transistors mean that we are looking at just over twice the transistor density seen in the 16nmFF Xbox One X processor, and yet we are getting significantly more than twice the performance across the board.
However, achieving the performance, power and silicon area targets Microsoft set for itself did require some innovative thinking. Graphics power isn’t just about teraflops – compute power needs to be backed up with memory bandwidth, presenting a unique challenge for a console. Microsoft’s solution for the memory sub-system saw it deliver a curious 320-bit interface, with ten 14gbps GDDR6 modules on the mainboard – six 2GB and four 1GB chips. How this all splits out for the developer is fascinating.
“Memory performance is asymmetrical – it’s not something we could have done with the PC,” explains Andrew Goossen. “Ten gigabytes of physical memory [runs at] 560GB/s. We call this GPU optimal memory. Six gigabytes [runs at] 336GB/s. We call this standard memory. GPU optimal and standard offer identical performance for CPU, audio and file IO. The only hardware component that sees a difference is the GPU.”
In terms of how the memory is allocated, games get a total of 13.5GB, which encompasses all 10GB of GPU optimal memory and 3.5GB of standard memory. This leaves 2.5GB of GDDR6 memory from the slower pool for the operating system and the front-end shell. From Microsoft’s perspective, it is still a unified memory system, even if performance can vary. “In conversations with developers, it’s typically easy for games to more than fill up their standard memory quota with CPU audio data, stack data, executable data and script data, and developers like such a trade-off when it gives them more potential bandwidth,” says Goossen.
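The two bandwidth figures follow directly from the chip layout. Assuming the published configuration – ten 32-bit GDDR6 devices at 14gbps per pin, with the upper capacity present only on the six 2GB parts – the arithmetic works out like this (our own sketch):

```python
# Deriving the asymmetric bandwidth figures from the chip layout (our
# sketch, assuming the published configuration): ten 32-bit GDDR6
# devices at 14gbps per pin. All ten chips serve the interleaved 10GB
# "GPU optimal" region; only the six 2GB parts hold the upper 6GB
# "standard" region.

GBPS_PER_PIN, PINS_PER_CHIP = 14, 32

def bandwidth_gbs(active_chips: int) -> float:
    """Peak bandwidth in GB/s when `active_chips` devices interleave."""
    return active_chips * PINS_PER_CHIP * GBPS_PER_PIN / 8  # bits -> bytes

gpu_optimal = bandwidth_gbs(10)  # 560.0 GB/s across the full 320-bit bus
standard = bandwidth_gbs(6)      # 336.0 GB/s from the six 2GB devices
```

In other words, the first gigabyte of every chip interleaves across the full 320-bit bus, while accesses to the upper capacity can only touch six chips’ worth of pins – hence the slower pool.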
The power tenet is well taken care of, then, but it’s not just about the raw compute performance – the feature set is crucial too. Way back in 2016, a year before work completed on Xbox One X, the Xbox silicon team was already working on Series X, beginning the architectural work on the next generation features that we’ll finally see hitting the market at holiday 2020 – a keen reminder of how long it takes for new technology to be developed. Even back then, ray tracing was on the agenda – and a revolutionary approach to storage was also required, all of which brings us to the second tenet of the Series X hardware design: a fundamental shift away from mechanical hard drives, embracing solid-state storage instead.
Why fast storage changes everything
The specs on this page represent only the tiniest fraction of the potential of the storage solution Microsoft has engineered for the next generation. In last year’s Project Scarlett E3 teaser, Jason Ronald – partner director of project management at Xbox – described how the SSD could be used as ‘virtual memory’, a teaser of sorts that only begins to hint at the functionality Microsoft has built into its system.
On the hardware level, the custom NVMe drive is very, very different to any other kind of SSD you’ve seen before. It’s shorter, for starters, presenting more like a memory card of old. It’s also rather heavy, likely down to the solid metal construction that acts as a heat sink, built to handle silicon that consumes 3.8 watts of power. Many PC SSDs ‘fade’ in performance terms as they heat up – and similar to the CPU and GPU clocks, this simply wasn’t acceptable to Microsoft, which believes that consistent performance across the board is a must for the design of its consoles.
The form factor is cute, the 2.4GB/s of guaranteed throughput is impressive, but it’s the software APIs and custom hardware built into the SoC that deliver what Microsoft believes to be a revolution – a new way of using storage to augment memory (an area where no platform holder will be able to deliver a more traditional generational leap). The idea, in basic terms at least, is pretty straightforward – the game package that sits on storage essentially becomes extended memory, allowing 100GB of game assets stored on the SSD to be instantly accessible by the developer. It’s a system that Microsoft calls the Velocity Architecture, and the SSD itself is just one part of it.
“Our second component is a high-speed hardware decompression block that can deliver over 6GB/s,” reveals Andrew Goossen. “This is a dedicated silicon block that offloads decompression work from the CPU and is matched to the SSD so that decompression is never a bottleneck. The decompression hardware supports Zlib for general data and a new compression [system] called BCPack that is tailored to the GPU textures that typically comprise the vast majority of a game’s package size.”
PCI Express 4.0 connections hook up both internal and optional external SSDs directly to the processor. The final component in the triumvirate is an extension to DirectX – DirectStorage – a necessary upgrade bearing in mind that existing file I/O protocols are knocking on for 40 years old, and in their current form would require two Zen 2 CPU cores simply to cover the overhead, which DirectStorage reduces to just one tenth of a single core.
“Plus it has other benefits,” enthuses Andrew Goossen. “It’s less latent and it saves a ton of CPU. With the best competitive solution, we found doing decompression in software to match the SSD rate would have consumed three Zen 2 CPU cores. When you add in the IO CPU overhead, that’s another two cores. So the resulting workload would have completely consumed five Zen 2 CPU cores, when now it only takes a tenth of a CPU core. So in other words, to equal the performance of a Series X at its full IO rate, you would need to build a PC with 13 Zen 2 cores. That’s seven cores dedicated to the game, one for Windows and shell, and five for the IO and decompression overhead.”
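Goossen’s core-count comparison is simple arithmetic – seven cores for the game, one for Windows and the shell, plus the five that software IO and decompression would otherwise consume. A quick sketch of ours, using Microsoft’s claimed figures rather than our own measurements:

```python
# Laying out the core-count arithmetic behind Microsoft's claim (the
# per-task figures are Microsoft's, not our measurements).

software_decompression_cores = 3  # Zen 2 cores to match the SSD in software
software_io_cores = 2             # legacy file IO overhead in software
directstorage_cores = 0.1         # residual CPU cost on Series X itself

game_cores = 7                    # cores available to a Series X title
windows_shell_cores = 1

# A PC matching Series X's full IO rate entirely in software would need:
equivalent_pc_cores = (game_cores + windows_shell_cores
                       + software_decompression_cores + software_io_cores)
# -> 13 cores, versus Series X spending a tenth of one core on that work.
```

The hardware decompression block and DirectStorage between them turn roughly five cores of work into a tenth of one – which is where the ‘13 Zen 2 cores’ comparison comes from.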
Asset streaming is taken to the next level, but Microsoft was not finished there. Last-gen, we enjoyed a 16x increase in system memory, but this time it’s a mere 2x – or just 33 per cent extra if we consider Xbox One X as the baseline. In addition to drawing more heavily upon storage to make up the shortfall, Microsoft began a process of optimizing how memory is actually used, with some startling improvements.
“We observed that typically, only a small percentage of memory loaded by games was ever accessed,” reveals Goossen. “This wastage comes principally from the textures. Textures are universally the largest consumers of memory for games. However, only a fraction of the memory for each texture is typically accessed by the GPU during the scene. For example, the largest mip of a 4K texture is 8 megabytes and often more, but typically only a small portion of that mip is visible in the scene and so only that small portion really needs to be read by the GPU.”
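The ‘8 megabytes’ figure checks out for a block-compressed texture. Assuming BC1 compression at 0.5 bytes per texel (an assumption on our part – fatter formats scale up accordingly), a 4096x4096 top mip is exactly 8MiB, and the rest of the mip chain only adds about a third on top:

```python
# Checking the "8 megabytes" figure from the quote: assuming BC1 block
# compression at 0.5 bytes per texel (our assumption; richer formats are
# proportionally larger), the top mip of a 4096x4096 texture is exactly
# 8MiB. This simple model ignores real formats' minimum block sizes.

def mip_chain_bytes(size: int, bytes_per_texel: float) -> list:
    """Byte sizes of every mip level from `size` x `size` down to 1x1."""
    chain = []
    while size >= 1:
        chain.append(size * size * bytes_per_texel)
        size //= 2
    return chain

chain = mip_chain_bytes(4096, 0.5)
top_mib = chain[0] / 2**20      # 8.0 MiB for the largest mip alone
total_mib = sum(chain) / 2**20  # ~10.67 MiB for the whole 13-level chain
```

Since each mip is a quarter the size of the one above it, the top mip dominates the memory cost – which is exactly why only streaming the visible portions of it pays off so handsomely.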
Microsoft has partnered with Seagate for its proprietary external 1TB SSD expansion. It’s very short, quite weighty for its dimensions and actually presents rather like a memory card.
As textures have ballooned in size to match 4K displays, efficiency in memory utilization has got progressively worse – something Microsoft was able to confirm by building special monitoring hardware into Xbox One X’s Scorpio Engine SoC. “From this, we found a game typically accessed at best only one-half to one-third of their allocated pages over long windows of time,” says Goossen. “So if a game never had to load pages that are ultimately never actually used, that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance.”
A technique called Sampler Feedback Streaming – SFS – was built to more closely marry the memory demands of the GPU, intelligently loading in the texture mip data that’s actually required with the guarantee of a lower quality mip available if the higher quality version isn’t readily available, stopping GPU stalls and frame-time spikes. Bespoke hardware within the GPU is available to smooth the transition between mips, on the off-chance that the higher quality texture arrives a frame or two later. Microsoft considers these aspects of the Velocity Architecture to be a genuine game-changer, adding a multiplier to how physical memory is utilized.
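The fallback behaviour described above can be modelled in a few lines. This is our own toy residency model, not Microsoft’s API – mip 0 is the sharpest level, higher numbers are coarser – but it captures why a late-arriving texture never stalls the GPU:

```python
# Toy model of the SFS mip fallback (our invention, not the actual API):
# the sampler uses the sharpest mip that is actually resident, so a
# high-resolution mip arriving a frame late never stalls rendering.
# Mip 0 is the sharpest level; higher numbers are coarser.

def select_mip(desired: int, resident: set, coarsest: int) -> int:
    """Return the sharpest resident mip no sharper than `desired`."""
    for mip in range(desired, coarsest + 1):
        if mip in resident:
            return mip
    raise RuntimeError("the coarsest mip should always be resident")

resident = {3, 4, 5}  # only low-res mips have streamed in so far
mip_now = select_mip(0, resident, coarsest=5)   # falls back to mip 3
resident.add(0)       # the detailed top mip arrives a frame or two later
mip_next = select_mip(0, resident, coarsest=5)  # now samples mip 0
```

The bespoke filtering hardware Microsoft describes then blends between the two mip levels over a frame or two, so the moment of transition is invisible rather than a visible pop.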
The Velocity Architecture also facilitates another feature that sounds impressive on paper but is even more remarkable when you actually see it play out on the actual console. Quick Resume effectively allows users to cycle between saved game states, with just a few seconds’ loading – you can see it in action in the video above. When you leave a game, system RAM is cached off to SSD and when you access another title, its cache is then restored. From the perspective of the game itself, it has no real idea what is happening in the background – it simply thinks that the user has pressed the guide button and the game can resume as per normal.
We saw Xbox Series X hardware cycling between Forza Motorsport 7 running in 4K Xbox One X mode, State of Decay 2, Hellblade and The Cave (an Xbox 360 title). Switching between Xbox One X games running on Series X, there was around 6.5 seconds of delay from game to game – which is pretty impressive. Microsoft was not sharing the actual size of the SSD cache used for Quick Resume, but said that the feature supports a minimum of three Series X games. Bearing in mind the 13.5GB available to titles, that’s a notional maximum of around 40GB of SSD space, but assuming that the Velocity Architecture has hardware compression features as well as decompression, the actual footprint may be smaller. Regardless, titles that use less memory – like the games we saw demonstrated – should have a lower footprint, allowing more to be cached.
The war on input lag and screen tearing
Microsoft’s speed tenet for Series X also factors in a radical revamp of input processing, designed to shave off latency at every conceivable stage of the game’s pipeline – meaning that the time between button press and resulting reaction on-screen should reduce significantly. Microsoft has already mentioned Dynamic Latency Input, but only now reveals just how extensive its work here is. It starts with the controller, where the typical 8ms latency on analogue controller input is now significantly reduced by transmitting the most up-to-date inputs just before the game needs them. Digital inputs like button presses are time-stamped and sent to the game, reducing latency without the need to increase the polling rate, while USB-connected pads see digital inputs transmitted immediately to the console. To facilitate all of this, the entire input software stack was rewritten, which delivered further latency improvements.
Latency has been a crucial but invisible variable for developers to contend with, and as game engines grow more complex and more parallel, it’s not easy to keep track of additional lag – something else Microsoft attempts to resolve with DLI. “We made it easier for game developers to optimize in-game latency. Games on Xbox output an identifier for every frame as it flows through its engine,” explains Andrew Goossen. “When it queries controller input, it associates that frame identifier with the timing of the input, and when it completes rendering for that frame, it passes that identifier along with the completed front buffer information to the system. So with this mechanism, the system can now determine the complete in-game latency for every frame.”
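In outline, the frame-identifier scheme Goossen describes amounts to bookkeeping like this – a hypothetical sketch of ours, with invented timestamps and IDs; the real mechanism lives in the system software:

```python
# Hypothetical sketch of frame-ID latency tracking (our illustration,
# not Microsoft's implementation): the same identifier tags the moment
# input was sampled and the moment the finished frame is presented, so
# the system can compute true input-to-display latency per frame.

class LatencyTracker:
    def __init__(self):
        self.input_time = {}  # frame_id -> input sample time (ms)

    def on_input_sampled(self, frame_id: int, t_ms: float):
        self.input_time[frame_id] = t_ms

    def on_frame_presented(self, frame_id: int, t_ms: float) -> float:
        """Return input-to-present latency for this frame, in ms."""
        return t_ms - self.input_time.pop(frame_id)

tracker = LatencyTracker()
tracker.on_input_sampled(frame_id=101, t_ms=1000.0)
lag = tracker.on_frame_presented(frame_id=101, t_ms=1048.0)  # 48.0 ms
```

Because every frame carries its identifier through the whole engine, latency becomes a per-frame metric a developer can graph over time – exactly like frame-rate.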
Microsoft says it’s delivered a system that allows developers to accurately track input lag across the engine just as easily as game-makers can track frame-rate – the metric has been added to its in-house performance analysis tool, PIX. The final element of DLI is Xbox Series X’s support for the new wave of 120Hz HDMI 2.1 displays hitting the market now. The firm has already begun testing this feature at lower-than-4K output resolutions on supported HDMI 2.0 screens via Xbox One S and Xbox One X. Because the screens update twice as quickly as their 60Hz equivalents, users should see faster response – a state of affairs that should also apply to variable refresh rate (VRR) modes too. Microsoft has also pioneered ALLM modes in its existing machines, meaning that the console can command the display to shift automatically into game mode.
The Xbox One pad evolves – smaller, more accessible to people with smaller hands and now featuring a revamped d-pad and share button.
Microsoft has also made innovations that may see the end of screen-tearing. Typically, flipping to a new frame mid-scan-out – at the cost of a visible tear – is used to cut latency, while triple-buffering can even out drops in frame-rate but adds extra lag. Series X sees this situation evolve. “We redesigned the presentation API the games use to send their completed frames to the TV,” shares Andrew Goossen. “We completely decoupled the traditional link between double- or triple-buffering and latency. It used to be that triple-buffering was good to improve frame-rate when the game couldn’t maintain its target frame-rate, but triple-buffering was bad because it increased latency. But no longer. Now that frame buffering and latency are fully decoupled, games can enable triple-buffering while separately specifying their desired latency. So that latency between the CPU frame start time and the GPU frame start time can now be specified in microseconds, rather than v-syncs.
“So, game developers can precisely dial down the latency between the CPU and the GPU until just before bubbles start to form or the GPU might idle because the CPU isn’t feeding it fast enough – and the runtime provides extensive latency feedback statistics for the game to inform the dynamic adjustment. So using this mechanism, the games can very precisely reduce the in-game latency as much as possible – and quite easily as well. “
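The difference between specifying latency in v-syncs versus microseconds is easy to model. In this sketch of ours (invented numbers, 60Hz refresh assumed), the old model forces a desired 4ms CPU-to-GPU gap up to a whole 16.7ms refresh interval, while the new model lets it sit just above the point where the GPU would starve:

```python
import math

# Modelling the decoupling Goossen describes (our invented numbers,
# 60Hz refresh assumed): expressed in v-syncs, the CPU-to-GPU gap can
# only be a whole number of refresh intervals; in microseconds it can
# sit just above the point where the GPU would idle for lack of work.

REFRESH_US = 16_667  # one 60Hz v-sync interval, in microseconds

def vsync_quantised_gap_us(desired_us: int) -> int:
    """Old model: the gap rounds up to whole v-sync intervals."""
    return math.ceil(desired_us / REFRESH_US) * REFRESH_US

def microsecond_gap_us(desired_us: int, gpu_floor_us: int) -> int:
    """New model: any gap, clamped so the GPU is still kept fed."""
    return max(desired_us, gpu_floor_us)

old_gap = vsync_quantised_gap_us(4_000)     # forced up to 16667us
new_gap = microsecond_gap_us(4_000, 3_500)  # stays at 4000us
```

With the runtime’s latency feedback statistics, a game could dynamically tune that floor value each frame – which is exactly the ‘dial it down until just before bubbles form’ behaviour described above.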
While enhancements and optimizations – not to mention a new share button – are added to the Xbox Series X controller, the good news is that the DLI technology is compatible with existing pads, which should be upgraded with a simple firmware update.
How older games will play better on Series X
The last of Microsoft’s three tenets that form the foundation of its next-gen endeavors is compatibility, an area where the firm has delivered remarkable levels of fan service since Xbox backwards compatibility was first revealed to an incredulous audience at E3 2015. The firm has already announced that its existing library of back-compat Xbox 360 and OG Xbox games will run on Series X, while all existing peripherals will also work as they should (which, in part, explains why type-A USB is used on the system as opposed to the new USB-C standard). So yes, the steering wheel tax is over.
Beyond that, the Xbox back-compat team has been hard at work since drawing a line under its Xbox 360, OG Xbox and X-enhanced programs a while back. It likely comes as no surprise to discover that Series X can technically run the entire Xbox One catalog, but this time it’s done with no emulation layer – compatibility is baked in at the hardware level. Games also benefit from the full CPU and GPU clocks of Series X (Xbox One X did not deliver its full graphics power for back-compat), meaning that the more lackluster of those performance modes added to many Xbox One X games should hopefully lock to a silky smooth 60fps.
However, the compatibility team is renowned for pushing the envelope and some of the early work we saw with Series X is mouthwatering. Microsoft has already promised improved image fidelity, steadier frame-rate and faster loading times, but the early demos we saw look even more promising – and it is indeed the case that hints dropped in Phil Spencer’s
recent Series X blog post will result in selected Xbox One S titles running at higher resolutions on the new console. In fact, we saw Gears of War Ultimate Edition operating with a 2x resolution scale on both axes, taking a 1080p game all the way up to native 4K. It’s an evolution of the Heutchy method (used to bring Xbox 360 titles up to full 4K on Xbox One X), with often spectacular results. Crucially, the back-compat team does all the heavy lifting at the system level – game developers do not need to participate in the process at all.
“We are exploring ways of improving, maybe, a curated list of games,” says Peggy Lo, compatibility program lead for Xbox. “Things that we are looking at include improving resolution for games, improving frame-rates – maybe doubling them! And the way we’re doing it is really exploring multiple methods. So we knew what we were doing with the Heutchy method; maybe we’ll change it a bit, and there’s a few other methods that we’re exploring.
“What we’re probably not going to do is explain all those methods today, because we’re still in the process of figuring out which exact method will be best for Series X – but I want you to feel confident that we have a solution we can fall back on, and that we will always keep pushing forward.”
Microsoft set up two LG OLED displays, one running Gears Ultimate at its standard 1080p on Xbox One X (the game never received an X upgrade) and the other at native 4K on Series X. On-screen debug data revealed the number of render targets the console was running at a higher resolution, along with the resolution scaling factor and the new native resolution – in this case, a scale of 2.0 and a 3840x2160 pixel count. The notion of displaying such a precise scaling factor made me wonder if it could actually go higher – whether sub-1080p titles could also scale to native 4K. It’s a question that went unanswered, though Lo chuckled when I asked.
Further goodies were to come – and owners of HDR screens are going to love the second key feature I saw. We got to see the Xbox One X enhanced version of Halo 5 operating with a very convincing HDR implementation, even though 343 Industries never shipped the game with HDR support. Microsoft ATG principal software engineer Claude Marais showed us how a machine learning algorithm trained on Gears 5’s state-of-the-art HDR implementation is able to infer a full HDR implementation from SDR content on any back-compat title. It’s not fake HDR either: Marais rolled out a heatmap mode showing peak brightness for every on-screen element, clearly demonstrating that highlights were well beyond the SDR range.
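Microsoft’s approach uses a trained model, but the underlying idea – inferring a wider luminance range from SDR content – can be illustrated with a hand-written inverse tone mapper. Everything here (the 2.2 gamma, the nit levels, the 0.8 knee) is a conventional assumption of ours, not the ATG algorithm:

```python
# Illustrative inverse tone mapping (our assumptions, not Microsoft's
# ML approach): expand an SDR signal value (0..1) into HDR luminance so
# that mid-tones keep roughly their SDR brightness while highlights
# stretch toward a much higher peak. Gamma and nit levels are
# conventional choices, invented for this sketch.

SDR_WHITE_NITS, HDR_PEAK_NITS = 200.0, 1000.0

def sdr_to_hdr_nits(v: float, knee: float = 0.8) -> float:
    """Map an SDR signal value to output nits, boosting only highlights."""
    linear = v ** 2.2 * SDR_WHITE_NITS  # undo display gamma, scale to nits
    if v <= knee:
        return linear                   # mid-tones left untouched
    t = (v - knee) / (1.0 - knee)       # blend toward the HDR peak above the knee
    return linear + t * (HDR_PEAK_NITS - SDR_WHITE_NITS)

mid = sdr_to_hdr_nits(0.5)    # mid-tone: unchanged SDR brightness
peak = sdr_to_hdr_nits(1.0)   # full white: stretched to the HDR peak
```

A learned model goes much further than a fixed curve like this – it can infer which bright pixels are genuine light sources rather than merely white surfaces – but the heatmap Marais showed is essentially visualising where this kind of expansion lands each pixel.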
A heatmap of Halo 5 luminance, comparing the standard presentation with the machine learning-based auto HDR presentation. Note that many brighter elements map into HDR space on the right.
“It can be applied to all games theoretically, technically – I guess we’re still working through user experiences and things like that, but this is a technical demo,” revealed Marais. “So this [Halo 5] is four years old, right? So let’s go to the extreme and jump to a game that is almost 20 years old right now – and that is Fuzion Frenzy. Back then there was nothing known about HDR, no-one knew about HDR things. Games just used 8-bit back buffers.”
This was a show-stopping moment. It was indeed Fuzion Frenzy – an original Xbox title – running with its usual resolution multiplier via back-compat, but this time presented with highly convincing, perceptibly
HDR. The key point is that this is proposed as a system-level feature for Xbox Series X, which should apply to all compatible games that don’t have their own bespoke HDR modes – and as Marais demonstrated, it extends across the entire Xbox library.
“But you can think of other things that we could do,” Marais adds. “Let’s look at accessibility. If you have people that cannot read well or see well, you probably want to enhance contrast when there’s a lot of text on-screen. We can easily do that. We talked to someone that’s colorblind this morning and that’s a great example. We just switch on the LUT and we can change colors for them to more easily experience the game there.”
It’s clear that there’s a lot of love for the Xbox library and that the back-compat team are hugely excited about what they do. “Hopefully you realise that we are still quite passionate about this,” says Peggy Lo. “It’s a very personal project for a lot of us and we are committed to keep doing this and making all your games look best on Series X.”
Power, speed, compatibility. Microsoft made a convincing pitch for all of the foundational pillars of Series X – and remarkably, there’s still more to share. After the initial presentations, we headed over to another building on the Microsoft campus, where principal designer Chris Kujawski and his colleagues gave us a hands-on look at the Series X hardware, a detailed breakdown of its interior components and everything we could possibly want to know about its innovative form factor, along with the subtle but effective refinements made to the Xbox controller. The bottom line? There is still so much to share about Xbox Series X and we’re looking forward to revealing more.
Digital Foundry was invited to Microsoft in Redmond WA during early March to cover the Xbox Series X specs reveal. Microsoft paid for travel and accommodation.