Nvidia, AIB Vendors: Update Drivers to Fix RTX 3080, 3090 Instability
One of the frustrating things about trying to sort out the RTX instability issues from last week’s launch has been the relative paucity of comments from vendors. Now, however, they’ve collectively broken their silence — and they’re all saying pretty much the same thing: Update your video drivers.
Zotac: “A new GeForce driver version 456.55 has been released and we urge all to re-install your graphics card drivers as we believe it should improve stability…Our graphics cards have undergone stringent testing and quality controls in design and manufacturing to ensure safety and great performance.”
Gigabyte: “It is false that POSCAP capacitors independently could cause a hardware crash. Whether a graphics card is stable or not requires a comprehensive evaluation of the overall circuit and power delivery design…GIGABYTE GeForce RTX 3080/3090 GAMING OC and EAGLE OC series graphics cards use high-quality, low-ESR 470uF SP-CAP capacitors, which meet the specifications set by NVIDIA and provide a total capacity of 2820u in terms of GPU core power, higher than the industry’s average. The cost of SP-CAP capacitors is not lower than that of MLCCs.” (Gigabyte also goes on to recommend updating to 456.55, but I wanted to quote other parts of their statement.)
MSI: “MSI stands behind its design decisions for its GeForce RTX 30 Series graphics cards catalog which consists of GAMING models and VENTUS models. MSI utilizes a mixed capacitor grouping in its designs to benefit from the strengths of both SP-Caps and MLCCs.” (MSI also notes that all GPUs shipped to customers used the PCB configurations shown in its updated photos, and that folks should update to 456.55).
Finally, Nvidia has released its own statement: “Nvidia posted a driver this morning that improves stability. Regarding partner board designs, our partners regularly customize their designs and we work closely with them in the process. The appropriate number of POSCAP vs. MLCC groupings can vary depending on the design and is not necessarily indicative of quality.”
Does Independent Investigation Back This Up?
Overclocker der8auer, otherwise known as “Person who does things I don’t have the guts to try,” decided to replace two of Gigabyte’s stock 470uF SP-CAP capacitors with twenty 47uF MLCC capacitors (this works out to the same total capacitance, 940uF, for both setups). His maximum stable overclock went up 2 percent as a result, or about 30MHz.
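The capacitance math behind that swap is easy to check. Here is a minimal sketch, assuming the capacitors in each group sit in parallel on the same power rail (where capacitance simply adds); the function name is made up for illustration.

```python
def parallel_capacitance_uf(count: int, value_uf: float) -> float:
    """Total capacitance of `count` identical capacitors wired in parallel.

    Assumption: the parts share one power rail in parallel, so their
    capacitances add.
    """
    return count * value_uf


# Values taken from the swap described above.
sp_cap_total = parallel_capacitance_uf(2, 470)   # two stock SP-CAPs
mlcc_total = parallel_capacitance_uf(20, 47)     # twenty replacement MLCCs

print(f"SP-CAP group: {sp_cap_total} uF")
print(f"MLCC group:   {mlcc_total} uF")
```

Equal bulk capacitance doesn't mean the two groups behave identically at high frequencies, which is why the swap could still nudge the overclock at all.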
Der8auer’s results do show that power rail hardware can make a small difference, but it’s not enough to really move the needle one way or the other. Instead, the problem really does appear to have been driver-related.
This may appear to be contradictory. How can a problem be driver-related when the problematic and less-problematic GPUs appeared to come from different vendors and have different power circuitry? Here’s a simple, hypothetical example: Imagine that Nvidia’s clock specification states that the GPU clock can change up to 5x per second. A GPU that uses straight POSCAPs, in our hypothetical example, can handle up to eight switches per second on average. A GPU that uses MLCCs can handle up to 10 switches per second, on average. Both of these designs are within Nvidia spec.
The first driver Nvidia ships, unfortunately, has a flaw. It allows the GPU clock to change up to 12x per second. Because this is an “up to” number, some GPUs only encounter it occasionally, depending on the games the owner plays. Other GPUs don’t encounter it at all. Furthermore, some GPUs, those with above-average MLCCs or POSCAPs, can actually handle the 12x per-second switching. Because the 12x rate is closer to the typical maximum for MLCC designs, more MLCC-equipped GPUs are capable of handling the shift. This makes MLCCs appear to be more stable than POSCAPs under these conditions, because they are.
But the problem, in this case, isn’t with MLCCs or POSCAPs. It’s with the fact that Nvidia’s driver is allowing the GPU clock to shift too often. The fact that the problem appears resolvable with different hardware doesn’t mean the hardware is the problem.
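To make the hypothetical concrete, here is a toy model of it. The 5, 8, 10, and 12 switches-per-second figures come from the example above. Everything else is my assumption for illustration: that per-card tolerance is normally distributed around those averages, the spread (`SIGMA`), and the names. It ignores the “up to” wrinkle and treats every card as hitting the peak rate. None of this reflects real Nvidia specs or driver behavior.

```python
from statistics import NormalDist

SPEC_RATE = 5     # hypothetical spec: clock changes per second
BUGGY_RATE = 12   # hypothetical flawed driver: clock changes per second
SIGMA = 1.0       # assumed card-to-card spread in tolerance (made up)

# Assumed distribution of the switching rate each design can tolerate.
TOLERANCE = {
    "POSCAP": NormalDist(mu=8, sigma=SIGMA),
    "MLCC": NormalDist(mu=10, sigma=SIGMA),
}


def fraction_stable(design: str, rate: float) -> float:
    """Share of cards whose tolerance is at or above the given rate."""
    return 1.0 - TOLERANCE[design].cdf(rate)


for design in TOLERANCE:
    in_spec = fraction_stable(design, SPEC_RATE)
    buggy = fraction_stable(design, BUGGY_RATE)
    print(f"{design}: {in_spec:.1%} stable at spec, {buggy:.1%} stable with buggy driver")
```

In this toy model, nearly every card of either design is fine at the spec rate, while at the out-of-spec rate only a small minority survive and most of the survivors are MLCC cards. The hardware looks like the culprit even though the only thing that changed was the driver.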
The example above is hypothetical; we don’t know what Nvidia adjusted in its driver to improve stability, and while there have been reports of clock drops, there have also been reports of clock improvements.
I’ve been following this story since it broke and I’ve written a number of updates to illustrate how quickly something can evolve, and how early reports, even when they accurately identify a problem, can incorrectly identify the cause. Until and unless new evidence emerges showing the problem is still somehow linked to the POSCAP / MLCC question, the Nvidia 456.55 driver appears to resolve the extant problems. If they stay resolved, they’ll be remembered as a hiccup on the way to a successful overall launch.