Nvidia held a late-night keynote on September 20 and officially launched the RTX 40 series graphics cards: the RTX 4090, the RTX 4080 16GB and the RTX 4080 12GB. They are based on the newly designed Ada Lovelace GPU architecture. In addition to the comprehensiv

What followed, however, was an across-the-board price increase, especially for the 80-class cards. Compared with the 30 series, the increase is very noticeable. So what new features does the 40 series bring? And are those features worth paying for?

Ada Lovelace chip: a brand new process, a surge in performance

First, let's talk about the architecture, which is named after Ada Lovelace, widely regarded as the first computer programmer. The flagship RTX 4090 is built around the AD102 core. The die measures 604.2 square millimeters, which is smaller than the flagship die of the RTX 30 series, yet the transistor count has risen sharply to 76.3 billion.

In terms of overall architecture, Ada Lovelace does not look very different from Ampere. Each computing unit (streaming multiprocessor) contains dedicated FP32 units, units shared between FP32 and INT32, fourth-generation Tensor Cores, and a third-generation RT Core, making deep learning and ray tracing more efficient.

In terms of overall scale, Ada Lovelace is a big step up from Ampere. The number of graphics processing clusters has grown from 7 to 12, taking the number of computing units from 84 to 144. A complete Ada core can therefore provide up to 18,432 stream processors (CUDA cores), far more than the 10,752 of GA102. The number of ray tracing units has likewise increased from 84 to 144, the number of deep learning units (Tensor Cores) from 336 to 576, and the clock frequency from 1.9GHz to 2.5GHz. The main reason Ada can achieve such a large jump in specifications is the manufacturing process. Ampere was built on Samsung's 8nm process, while Ada uses a customized version of TSMC's 4nm process. Transistor density has improved enormously, which is what allows the core counts to grow so sharply even as the die area shrinks.
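The per-unit ratios implied by these figures can be checked with a little arithmetic. The following is a minimal sketch that only uses the numbers quoted above; the dictionary layout and the helper name `per_sm` are illustrative choices, not anything defined by Nvidia.

```python
# Full-die figures quoted in the text for GA102 (Ampere) and AD102 (Ada).
chips = {
    "GA102": {"sm": 84, "cuda": 10_752, "tensor": 336, "rt": 84},
    "AD102": {"sm": 144, "cuda": 18_432, "tensor": 576, "rt": 144},
}


def per_sm(chip):
    """Return (CUDA cores, Tensor Cores, RT Cores) per computing unit."""
    return (
        chip["cuda"] // chip["sm"],
        chip["tensor"] // chip["sm"],
        chip["rt"] // chip["sm"],
    )


for name, chip in chips.items():
    print(name, per_sm(chip))

# Ratio of computing units between the two full dies.
scale = chips["AD102"]["sm"] / chips["GA102"]["sm"]
print(f"computing units scale-up: {scale:.2f}x")
```

Both chips come out at the same per-unit layout, which matches the observation that the building block looks much like Ampere's: the growth comes from having more units and higher clocks.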

Nvidia also states that the performance per watt of Ada GPUs is twice that of Ampere. Shader throughput reaches 83 TFLOPS, also twice that of the previous generation, and ray tracing throughput has soared to 191 TFLOPS, 2.8 times that of the previous generation. FP8 tensor throughput for deep learning reaches a staggering 1.32 PFLOPS, 5 times that of the previous-generation core. For gaming, Nvidia says Ada delivers twice the rasterization performance and four times the ray tracing performance of the previous generation.
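The shader figure can be roughly reproduced from core count and clock speed. The sketch below uses the common rule of thumb of 2 floating-point operations per CUDA core per clock (one fused multiply-add). The RTX 4090 values of 16,384 enabled cores and a 2.52GHz boost clock are assumptions taken from Nvidia's published specifications rather than from this article, and `peak_fp32_tflops` is a hypothetical helper name.

```python
def peak_fp32_tflops(cuda_cores, clock_ghz):
    """Theoretical peak FP32 throughput, counting one FMA as 2 FLOPs per clock."""
    return 2 * cuda_cores * clock_ghz / 1000


# Full AD102 die at the 2.5GHz quoted in the text.
full_die = peak_fp32_tflops(18_432, 2.5)

# Assumption: the shipping RTX 4090 enables 16,384 cores and boosts to 2.52GHz.
rtx_4090 = peak_fp32_tflops(16_384, 2.52)

print(f"full AD102: {full_die:.1f} TFLOPS")
print(f"RTX 4090:   {rtx_4090:.1f} TFLOPS")
```

Under these assumptions the RTX 4090 lands close to the 83 TFLOPS quoted above, while a fully enabled die would be somewhat higher. These are theoretical peaks, not measured game performance.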

For players, the RTX 40 series also behaves much better in terms of power draw. It runs more smoothly, without large transient power spikes, which matters especially to players who are sizing a high-wattage power supply. After all, with high-end power supplies, each additional watt of capacity can cost 1.2 or even 1.5 yuan. It is precisely this computing power that lets Ada GPUs take on more work, such as DLSS 3, an AI frame generation technology that Nvidia CEO Jensen Huang considers revolutionary.

DLSS 3: AI-generated frames, significantly higher frame rates

DLSS (Deep Learning Super Sampling) is Nvidia's deep-learning-based upscaling technology. It uses a neural network to reduce the GPU's rendering workload, thereby raising the frame rate while preserving image quality. DLSS has been gaining recognition among consumers since the Turing architecture, and this cutting-edge technology has now gone through three generations. The first generation of DLSS relied on the graphics card's own AI hardware and a neural network to reconstruct the image. However, limited by computing power, the actual results were not ideal. Although the frame rate improved, the image was noticeably blurry, especially in scenes with motion.

In the second generation, DLSS 2.0, which is currently the most widely used, Nvidia chose an approach that is roughly the reverse of DSR: the graphics card first renders at a lower resolution, and AI computing power then upscales the image to a high resolution for output. Compared with the first generation, DLSS 2 is a qualitative improvement both in image quality and in adoption by game developers, and consumers increasingly accept the technology. Competitors offer FSR and XeSS to achieve similar results. In the DLSS 3 era, Nvidia is no longer satisfied with traditional rendering alone: it uses AI to create entire frames and inserts each one between two rendered frames, further reducing the rendering pressure on the GPU.

At the core of DLSS 3 is a hardware block that Nvidia added to the Ada GPU, the Optical Flow Accelerator. With its help, the GPU analyzes the motion vectors of moving objects in the scene, then uses a convolutional neural network to generate a new frame and insert it between the normally rendered frames, which effectively raises the game's frame rate. This is the first time this approach has been applied to game rendering, and it naturally depends on the huge Tensor Core cluster of the 40 series.
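The basic idea of using motion vectors to build an in-between frame can be illustrated with a toy example. This is a deliberately simplified sketch and not Nvidia's algorithm: it works on tiny grayscale frames stored as nested lists, takes the motion vectors as a given input instead of estimating them, and replaces the neural network with a plain forward warp. The function name `interpolate_midframe` and the data layout are hypothetical.

```python
def interpolate_midframe(frame_a, frame_b, flow):
    """Build a frame halfway between frame_a and frame_b.

    frame_a, frame_b: 2D lists of pixel values with the same size.
    flow: 2D list of (dy, dx) motion vectors from frame_a to frame_b.
    """
    h, w = len(frame_a), len(frame_a[0])
    mid = [row[:] for row in frame_a]  # static pixels are copied from frame_a
    moving = [(y, x) for y in range(h) for x in range(w) if flow[y][x] != (0, 0)]

    # A moving pixel leaves its old position; fill the hole from the next frame.
    for y, x in moving:
        mid[y][x] = frame_b[y][x]

    # Move each moving pixel half of its motion vector, clamped to the frame.
    for y, x in moving:
        dy, dx = flow[y][x]
        ty = min(max(round(y + dy / 2), 0), h - 1)
        tx = min(max(round(x + dx / 2), 0), w - 1)
        mid[ty][tx] = frame_a[y][x]
    return mid


# A bright pixel moves 4 columns to the right between two rendered frames.
frame_a = [[9, 0, 0, 0, 0]]
frame_b = [[0, 0, 0, 0, 9]]
flow = [[(0, 4), (0, 0), (0, 0), (0, 0), (0, 0)]]

print(interpolate_midframe(frame_a, frame_b, flow))  # [[0, 0, 9, 0, 0]]
```

A plain blend of the two frames would show two faint copies of the pixel at both ends. Following the motion vector places it once in the middle, which is why motion analysis matters for generated frames.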

Nvidia says DLSS 3 can generate up to 7/8 of the displayed pixels with AI, raising frame rates by up to 4 times compared with running without DLSS. The gain is especially pronounced in games with ray tracing enabled.
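One way to arrive at the 7/8 figure is shown below. The two ratios are assumptions for illustration and are not stated in this article: that each rendered frame is upscaled from a quarter of the output pixels (for example 1080p to 4K), and that every other displayed frame is generated entirely by AI.

```python
from fractions import Fraction

# Assumption: each rendered frame contains 1/4 of the output pixels.
rendered_pixels_per_frame = Fraction(1, 4)
# Assumption: only every other displayed frame is rendered at all.
rendered_frames = Fraction(1, 2)

rendered_share = rendered_pixels_per_frame * rendered_frames
ai_share = 1 - rendered_share

print(f"traditionally rendered: {rendered_share}")  # 1/8
print(f"produced by AI:         {ai_share}")        # 7/8
```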

For example, "Cyberpunk 2077", as shown at the press conference, jumped from about 22 frames per second to more than 90. Moreover, since frame generation runs entirely on the GPU and does not pass through the CPU, the frame rate can improve significantly even without a high-performance CPU.

Some people may worry that inserting AI-generated frames between two normally rendered frames will increase latency. For players of AAA titles latency may not be a problem, but for FPS players it matters much more. In response, Nvidia says that game developers and gamers can use NVIDIA Reflex to reduce the game's latency, so that even players who turn on DLSS 3 can enjoy acceptable latency.

Of course, not every RTX graphics card can use DLSS 3. According to Nvidia, the RTX 20 and RTX 30 series miss out on frame generation because their optical flow hardware is not up to the task. Nvidia also provides a DLSS feature table: AI frame generation is exclusive to the RTX 40 series, while the RTX 40, 30 and 20 series all support the original upscaling (super resolution) function. NVIDIA Reflex is supported from the GTX 900 series onward. More than 35 games currently plan to support DLSS 3, with the first arriving in October.

New rendering engines: more efficient graphics rendering

As RTX graphics cards grow more powerful, especially with the arrival of the RTX 4090 and its huge 24GB of video memory, and with the availability of NVIDIA Studio drivers, more and more studios have begun buying GeForce gaming cards for rendering work. Nvidia, for its part, keeps adding new rendering engines to its gaming cards, giving these professionals more efficient graphics and image rendering.

The engines added this time are the Opacity Micromap Engine and the Displaced Micro-Mesh Engine. The former is used in ray tracing: with it, ray tracing of alpha-tested geometry is up to 2 times faster. The latter increases the geometric richness of the rendered scene without a matching increase in storage, while keeping the BVH simple. Build speed is also much faster than before, and the feature has already been adopted by professional software vendors such as Adobe.

Nvidia has also added Shader Execution Reordering to the RTX 40 series. Much like out-of-order execution in a CPU, it reorders the queue of shading work on the fly according to actual needs, which greatly improves rendering efficiency and GPU utilization. In games, it improves overall performance by about 25%, and ray tracing performance by up to 3 times.
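Why reordering helps can be shown with a toy model of divergence. In the sketch below, threads are processed in fixed-size groups, and a group has to run once for every distinct shader its threads need. This is a simplified illustration, not a description of how the hardware scheduler works: the group size of 4, the material names and the helper `shader_passes` are all hypothetical.

```python
WARP_SIZE = 4  # toy value chosen for readability; real GPU groups are wider


def shader_passes(hits, warp_size=WARP_SIZE):
    """Count serialized passes: each group runs once per distinct shader in it."""
    warps = [hits[i:i + warp_size] for i in range(0, len(hits), warp_size)]
    return sum(len(set(warp)) for warp in warps)


# Shader needed by each ray hit, in the order the rays happened to arrive.
hits = ["glass", "metal", "glass", "skin", "metal", "glass", "skin", "metal"]

# Reordering: put hits that need the same shader next to each other.
reordered = sorted(hits)

print(shader_passes(hits))       # 6
print(shader_passes(reordered))  # 4
```

The same amount of shading is done in both cases, but grouping similar work means fewer serialized passes, which is the kind of utilization gain described above.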

With tools such as NVIDIA Studio, the boundary between gaming cards and professional cards is becoming increasingly blurred. Thanks to the latest technology, professional users can also enjoy the productivity gains of the new generation of GPUs. After all, the RTX 4090 is not a graphics card for gamers alone.

Integrated eighth-generation NVIDIA encoder: a favorite for video and live streaming users

The rise of live streaming and video production has also placed greater demands on the GPU's encoding and decoding performance. This time Nvidia has added dual NVIDIA encoders (NVENC) to the RTX 40 series, which can cut video export time by up to 50%, and the cards also support AV1 encoding and decoding. Streaming and editing software such as OBS and Blackmagic Design DaVinci Resolve have added support for the NVENC AV1 encoder, giving RTX 40 series cards room to show what they can do.

The NVIDIA Broadcast software development kit adds three features: facial expression estimation, eye tracking, and improved virtual green screen quality. These make live streams more immersive for streamers, and they are of course also a great benefit to video conferencing users.

The flagship card is the most cost-effective

In the end, there is no avoiding the core controversy around the RTX 40 series: the price. Given higher wafer manufacturing costs and exchange rates, the suggested retail prices of the RTX 40 series were expected to rise compared with the RTX 30 series, but the size of the increase is more than many consumers can accept. The RTX 4080 12GB is priced at 7,199 yuan and the RTX 4080 16GB at 9,499 yuan, a steep jump from the RTX 3080's suggested retail price of 5,499 yuan. As the flagship, the RTX 4090 is actually the most cost-effective of the three cards, because its suggested retail price of 12,999 yuan is only 1,000 yuan higher than that of the previous generation. And the performance gain is clearly worthy of a flagship price.
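The size of each increase is easier to compare as a percentage. The short calculation below uses only the prices quoted above. The previous-generation flagship price of 11,999 yuan is derived from the statement that the RTX 4090 costs 1,000 yuan more, and `increase_pct` is a hypothetical helper name.

```python
def increase_pct(old_price, new_price):
    """Percentage increase from old_price to new_price."""
    return (new_price - old_price) / old_price * 100


# (previous-generation price, new price) in yuan, from the figures in the text.
prices = {
    "RTX 4080 12GB vs RTX 3080": (5_499, 7_199),
    "RTX 4080 16GB vs RTX 3080": (5_499, 9_499),
    "RTX 4090 vs previous flagship": (12_999 - 1_000, 12_999),
}

for label, (old, new) in prices.items():
    print(f"{label}: +{increase_pct(old, new):.1f}%")
```

This comes out to roughly 31% and 73% for the two RTX 4080 models against about 8% for the RTX 4090, which is the basis for calling the flagship the most reasonably priced of the three.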

The same cannot necessarily be said of the other two models. The RTX 4080 16GB has 9,728 CUDA cores and 16GB of GDDR6X video memory, with performance that Nvidia puts at twice that of the RTX 3080 Ti, while the RTX 4080 12GB has 7,680 CUDA cores and 12GB of GDDR6X video memory, with performance said to exceed the RTX 3090 Ti. In the official game benchmarks, however, the RTX 4080 12GB is only comparable to the RTX 3090 Ti in rasterized games and loses slightly in some titles, while the RTX 4080 16GB is about 20% faster than the RTX 3090 Ti.

For Nvidia, it is clear that the RTX 4080 series needs to deliver strong performance before consumers will accept these two cards. After two years of the cryptocurrency mining boom, consumer enthusiasm has hit bottom, and it will not be easy to win back.