RTX 5090 – AI is the only path forward for gaming

January 27, 2025 / Max Weinbach

I know it sounds off, but after spending some time getting back into gaming thanks to the RTX 5090, and then digging into the card’s technical details, I’ve been sold on Nvidia’s vision of neural rendering and AI-accelerated rendering. Frankly, I think it’s the only way forward.

Let’s face it, the brute force approach to performance gains is hitting a wall. We’re looking at a 750mm² die with the GB202 in the RTX 5090, and the reticle limit – the maximum size at which a single chip can be manufactured – is hovering around 858mm². We’re practically there. Squeezing more raw performance out of silicon through traditional means is becoming exponentially harder and significantly more expensive. Node shrinks are still happening, sure, but the gains are increasingly marginal – we’re seeing 15-20% improvements every couple of years, largely from power efficiency gains rather than raw increases in transistor density. “Moore’s Law” isn’t dead, but it’s certainly on life support, relying on clever marketing around “effective” nanometer scales rather than substantial leaps in transistor counts.

So, if we can’t just make the chips significantly bigger, and we can’t cram exponentially more transistors onto them, how do we keep pushing the boundaries of visual fidelity and performance? The answer, as Nvidia has clearly realized, is intelligence, not brute force. It’s about working smarter, not just harder.

That’s where AI, and specifically neural rendering, comes in. DLSS, from its inception, was a glimpse into this future. But DLSS 4, with its multi-frame generation and specifically its shift to a transformer-based model, is the paradigm shift. It’s not just about upscaling anymore; it’s about changing how we render graphics.

The move to transformer models is, in my opinion, the most significant aspect of this whole announcement. Convolutional Neural Networks (CNNs), which powered previous DLSS versions, were revolutionary for image classification. But they have limitations. They analyze images in a hierarchical, localized way. Transformers, on the other hand, use “self-attention” mechanisms. They can identify long-range dependencies and patterns across a much wider area of the image. They essentially “understand” the context of the scene better.
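To make that concrete, here’s a minimal sketch of the scaled dot-product self-attention at the core of a transformer, written in plain Python/NumPy. The shapes and weights are made up for illustration – this is the general mechanism, not anything out of Nvidia’s actual DLSS model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k=64, seed=0):
    """Single-head scaled dot-product self-attention.

    tokens: (sequence_length, d_model) array. Each position can attend to
    every other position, which is how transformers pick up long-range
    dependencies that a small convolution kernel cannot see."""
    rng = np.random.default_rng(seed)
    d_model = tokens.shape[-1]
    # Learned projection matrices in a real model; random here for illustration.
    W_q = rng.standard_normal((d_model, d_k))
    W_k = rng.standard_normal((d_model, d_k))
    W_v = rng.standard_normal((d_model, d_k))

    Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
    scores = Q @ K.T / np.sqrt(d_k)    # every position scores every other position
    weights = softmax(scores, axis=-1)
    return weights @ V                 # context-weighted mix of the whole sequence

# 16 "tokens" (image patches, in a vision transformer) of width 128.
out = self_attention(np.random.randn(16, 128))
print(out.shape)  # (16, 64)
```

Every position weighs every other position in a single step. That’s the “long-range dependency” part that a convolution kernel, which only ever sees its local neighborhood, structurally can’t do.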

Transformers also underpin the functionality of Large Language Models (LLMs) like ChatGPT. These models rely on attention mechanisms to weigh the importance of different words or tokens in a sentence relative to each other. The autoregressive nature of LLMs means they predict the next token in a sequence step by step, enabling coherent text generation. The self-attention layers help the model grasp relationships between words regardless of their distance within the text, capturing context in a way that traditional models couldn’t. This allows LLMs to generate detailed, contextually appropriate responses or perform tasks like summarization and translation with remarkable accuracy. This same architecture is what makes DLSS 4 so much better.
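The autoregressive loop itself is easy to picture. Here `model` stands in for any next-token predictor – this is a generic sketch, not a specific library’s API:

```python
def generate(model, prompt_tokens, max_new_tokens=32, eos_token=0):
    """Greedy autoregressive decoding: predict one token, append it, repeat.

    `model(tokens)` is assumed to return a probability for every vocabulary
    entry for the *next* token; the same loop applies to any LLM."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token_probs = model(tokens)       # attention runs over all prior tokens
        next_token = max(range(len(next_token_probs)),
                         key=next_token_probs.__getitem__)
        tokens.append(next_token)
        if next_token == eos_token:            # stop at end-of-sequence
            break
    return tokens
```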

This improved contextual understanding translates directly into tangible benefits: better temporal stability (less shimmering and ghosting), enhanced detail in motion, and significantly improved ray reconstruction. Think about it: DLSS Ray Reconstruction (RR) is now capable of producing results that rival, and in some cases surpass, traditional denoisers, all while using fewer rays. That’s a massive efficiency gain.

And it’s not just about image quality. The efficiency gains are crucial for performance. DLSS 4’s multi-frame generation, which can generate up to three additional frames for every traditionally rendered one, is a game-changer. And the fact that they’ve managed to reduce the computational cost and VRAM usage of frame generation by 40% and 30% respectively, while also improving the AI model by 40%, is a testament to the power of this approach. Flip Metering, which moves frame pacing logic into the display engine, further smooths out the experience.
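A quick bit of back-of-the-envelope arithmetic shows why pushing frame pacing into the display engine matters. These numbers are my own illustration, not Nvidia’s figures:

```python
# Illustrative numbers only, not measurements.
rendered_fps = 60                     # frames the GPU actually renders per second
generated_per_rendered = 3            # DLSS 4 multi-frame generation (up to 3)

presented_fps = rendered_fps * (1 + generated_per_rendered)   # 240 fps on screen
flip_interval_ms = 1000 / presented_fps                       # ~4.17 ms between flips
render_interval_ms = 1000 / rendered_fps                      # ~16.7 ms between real frames

print(f"{presented_fps} fps presented, one flip every {flip_interval_ms:.2f} ms,")
print(f"but a newly rendered frame only arrives every {render_interval_ms:.1f} ms")
# Flip Metering's job is releasing the three generated frames at even ~4.17 ms
# intervals inside that ~16.7 ms window, rather than leaving pacing to the CPU.
```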

The introduction of the AI Management Processor (AMP) is another key piece of the puzzle, and it deserves a closer look. This isn’t just some generic task scheduler; it’s a dedicated RISC-V processor specifically designed for AI workload scheduling on the GPU. This is crucial because the demands of running multiple AI models concurrently are unique and complex.

You might have a game running, utilizing DLSS, while also having a background application using an LLM for voice transcription, and perhaps even a generative AI tool creating textures or assets on the fly. Each of these AI models wants to utilize the GPU’s Tensor Cores and CUDA cores to their fullest potential. Ideally, you’d want to dedicate all available resources to a single model at a time to achieve the best possible result with the lowest latency. But that’s simply not practical in a multitasking environment.

This is where the AMP shines. It intelligently prioritizes and schedules these competing AI workloads, taking into account the needs of both the foreground application (the game, in most cases) and any background AI tasks. It understands that a slight delay in a background task is far less noticeable than a stutter or frame drop in a game. It’s about maximizing the perceived performance and responsiveness for the user, not just raw throughput. The AMP ensures that AI tasks get the resources they need, when they need them, without negatively impacting the primary graphics workload. It’s a delicate balancing act, and the AMP is the conductor of this complex AI orchestra.
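If you want a mental model for that balancing act, here’s a toy sketch – purely my illustration of the concept as a priority queue with a per-frame time budget, not how the AMP is actually implemented:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class AITask:
    priority: int                       # lower number = more latency-sensitive
    name: str = field(compare=False)
    cost_ms: float = field(compare=False)

class ToyAIScheduler:
    """Toy priority scheduler. This illustrates the idea behind the AMP, not
    Nvidia's implementation: latency-critical graphics work is drained first,
    and background AI tasks only get whatever budget is left."""

    def __init__(self):
        self.queue = []

    def submit(self, task):
        heapq.heappush(self.queue, task)

    def run_frame(self, budget_ms):
        spent = 0.0
        while self.queue and spent + self.queue[0].cost_ms <= budget_ms:
            task = heapq.heappop(self.queue)
            spent += task.cost_ms
            print(f"ran {task.name} ({task.cost_ms} ms)")
        # Whatever is left waits for spare time in the next frame.

sched = ToyAIScheduler()
sched.submit(AITask(0, "DLSS super resolution", 1.2))      # game-critical
sched.submit(AITask(0, "frame generation", 0.8))           # game-critical
sched.submit(AITask(5, "voice transcription chunk", 3.0))
sched.submit(AITask(9, "background texture generation", 6.0))
sched.run_frame(budget_ms=6.0)  # the game's work runs first; texture gen waits
```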

The benefits of this dedicated AI scheduling are significant. We’re talking about smoother frame rates in games, faster response times for AI-powered applications, and a more seamless overall user experience.


And the performance gains aren’t just theoretical. In real-world scenarios, the RTX 5090, leveraging the power of the Blackwell architecture and the AMP, shows impressive improvements in AI tasks. For example, consider the following benchmarks for running various Large Language Models (LLMs):

| Specification | R1 Distil Qwen 32B Q4 | Phi 4 14B | Hermes 3 Llama 3.1 8B | DeepSeek R1 Distill Llama 8B |
| --- | --- | --- | --- | --- |
| Backend | llama.cpp CUDA | llama.cpp CUDA | llama.cpp CUDA | llama.cpp CUDA |
| Context Length | 4096 | 4096 | 4096 | 4096 |
| GPU Offload | 64 / 64 | 40 / 40 | 32 / 32 | 32 / 32 |
| Token Speed | 36.57 tok/sec | 79.57 tok/sec | 101.03 tok/sec | 85.06 tok/sec |
| Total Tokens Processed | 1186 tokens | 593 tokens | 318 tokens | 1498 tokens |
| Time to First Token | 0.05s | 0.02s | 0.02s | 0.02s |

As you can see, the increased Tensor Core performance and the intelligent scheduling of the AMP lead to substantial performance gains in LLM inference. While I’m not sold on limiting the datatype to 4-bit precision for all tasks, it’s obviously working well enough here. I don’t think this should become a universal trend, at least not yet, but for what it’s being used for in the RTX 50 series, it’s understandable. It is also worth mentioning that for LLMs, I am using llama.cpp with the CUDA backend rather than TensorRT-LLM or ONNX. Frankly, I think this is the most realistic showing of GPU performance for an LLM in practical use right now, though not the theoretical best the GPU can do. As DeepSeek has shown us, it’s as much about software as it is about hardware, and public access to proper software optimization just isn’t there yet. That being said, this is still the fastest local inference throughput I have seen on any GPU or SoC.
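If you want to poke at this yourself, the setup is easy to approximate with llama.cpp’s Python bindings (llama-cpp-python built with CUDA support), using the same basic configuration as the table above: full GPU offload and a 4096-token context. The model path and prompt below are placeholders, and the throughput math is just elapsed-time arithmetic:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python (built with CUDA support)

# Placeholder path: any GGUF quant works, e.g. a Q4 quant of an 8B model.
llm = Llama(
    model_path="models/model-Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload every layer to the GPU (the "32 / 32" in the table)
    n_ctx=4096,        # same context length as the table above
    verbose=False,
)

prompt = "Explain neural rendering in one paragraph."
start = time.time()
out = llm(prompt, max_tokens=512)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.2f} tok/sec")
```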

When it comes to games, I think this Cyberpunk 2077 demo shows it best. You’re getting near-identical fidelity at 4x the fps. You can play at 29fps, or you can play at 200fps. It’s your choice, but I know what I’d choose.
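The rough math behind that jump looks something like the sketch below. Nvidia doesn’t break out the exact split between Super Resolution and frame generation in that demo, so the upscaling factor here is an assumption, purely for illustration:

```python
# Illustrative decomposition of the 29 fps -> ~200 fps jump.
native_fps = 29                       # full path tracing, no DLSS
sr_uplift = 1.7                       # assumed DLSS Super Resolution speedup (not published)
generated_per_rendered = 3            # DLSS 4 multi-frame generation

rendered_fps = native_fps * sr_uplift                   # ~49 "real" frames per second
presented_fps = rendered_fps * (1 + generated_per_rendered)
print(f"~{presented_fps:.0f} fps presented")            # ~197 fps
```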


The other neural rendering techniques – RTX Neural Materials, Neural Radiance Cache, RTX Skin, and RTX Neural Faces – are all incredibly promising. They demonstrate the potential of AI not just to enhance existing rendering techniques, but to fundamentally change how we approach them. Imagine a future where complex materials, lighting, and even character models are generated or approximated by AI in real time, freeing up artists and developers to focus on the creative aspects of game development rather than getting bogged down in technical minutiae. Some might not like this; some will. The fact is, building this tech gives creatives more options.

The bottom line is this: we’re reaching the physical limits of what’s possible with traditional silicon scaling. AI-powered neural rendering isn’t just a cool new feature; it’s the only viable path forward to continue delivering the exponential performance and visual fidelity improvements that gamers and creators demand. Nvidia’s Blackwell architecture, and the RTX 5090 in particular, is a bold step in that direction, and I, for one, am excited to see where it leads. It’s not just about better graphics; it’s about a smarter, more efficient, and ultimately more creative future for rendering.
