How much does PCIe 4.0 help? NVIDIA RTX A4000 Professional Graphics Test (1)

Contents

- Turbo fan + double-sided air intake: improved cooling for a single-slot graphics card

- Specifications: how to use Ampere's doubled CUDA cores

- SPECviewperf test: why does the RTX A4000 do better at 4K resolution?


I ran PCIe 4.0 SSD tests earlier this year. For graphics cards, which previously worked at PCIe 3.0 x16 bandwidth, the effect of 4.0 is often not so obvious. For example, I have seen some Alienware enthusiast notebooks allocate only PCIe 4.0 x8 to the graphics card, presumably to leave more lanes for high-speed SSDs. Simply put, whether a graphics card gains anything from PCIe 4.0 is application-dependent. For professional graphics cards, you have to run graphics applications on a workstation to find out.
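As a rough sanity check on why a PCIe 4.0 x8 allocation can be acceptable, here is a back-of-envelope sketch of per-direction link bandwidth; I am assuming 128b/130b encoding, which applies to both Gen3 and Gen4:

```python
# Usable PCIe bandwidth per direction, in GB/s:
# lanes x per-lane transfer rate (GT/s) x encoding efficiency / 8 bits.
def pcie_gb_s(lanes: int, gt_s: float) -> float:
    return lanes * gt_s * (128 / 130) / 8

gen3_x16 = pcie_gb_s(16, 8.0)    # ~15.75 GB/s
gen4_x16 = pcie_gb_s(16, 16.0)   # ~31.51 GB/s, double Gen3 x16
gen4_x8  = pcie_gb_s(8, 16.0)    # same bandwidth as Gen3 x16
```

So a Gen4 x8 link gives a graphics card the same bandwidth it had on a Gen3 x16 link, which is why that notebook trade-off is plausible.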

Since the previous article, "Hidden Win7 Support? An Alternative Test of the NVIDIA RTX A4000 Professional Graphics Card", this performance evaluation has been delayed for quite a while.

In fact, every time NVIDIA releases a new workstation graphics card, the official materials claim some performance improvement, such as how much faster the A4000 is than the previous-generation Quadro RTX 4000, or even the P4000 before that. But these numbers are often just generalized ratios. Even when specific application software is mentioned, you don't know which operation was measured, which model/scene was used, how many FPS the frame rate improved by, or how many seconds of rendering time were saved. That's why I wanted to test it myself.

This is my first graphics card test on a PCIe 4.0 platform. I will compare and analyze the specific specifications of the RTX A4000, shown in the image above.

I initially planned 3-4 articles for this series. Just running the tests doesn't actually take that much time and energy, but organizing each round into a write-up to share with everyone is always a bit of a grind.

- SPECviewperf 2020 v2.0 graphics test, HD and 4K resolution (this article)

- More ray tracing and rendering tests (Blender, V-Ray, KeyShot, OctaneBench)...

Before each test, I make a prediction based on the known specifications and my experience. Of course, the results don't always confirm it (that's what makes testing interesting). I will also focus on verifying the new graphics card's characteristics in graphics, CUDA compute/RTX rendering, and what deserves attention in its design and cooling.

Turbo fan + double-sided air intake: improved cooling for a single-slot graphics card

The last time I tested an Ampere graphics card was last year's "RTX 3090 Preliminary Test: Double-Width Turbo Fan Design, Tricks and 'Troubles'". Remember the flow-through fan design of the Founders Edition GeForce RTX 3080/3090? This time, the RTX A4000's cooler combines two of those ideas: a turbo (blower) fan plus a perforated area for double-sided intake. Can you see the value of that?

Since the 4000-series professional graphics cards keep a single-slot PCIe width, the previous-generation RTX 4000 ran a bit hot once ray tracing was added: the fan was noisy at full load, and some users reported temperatures that were too high.

I don't doubt the stability of Quadro professional graphics cards at normal ambient temperatures, but the RTX 4000 board's TDP (thermal design power) was 125W, and the A4000 raises that to 140W while staying within a single slot, so improving the cooling design was imperative.

Previous turbo (blower) fans, including the dual-slot one on the Turbo version RTX 3090, could only take in air from the front of the card. This time the A4000 can also draw air from the back, a benefit that is most obvious in multi-card, high-density environments, because in the past the topmost card usually ran hotter.

Looking at the back of the card, you can see that, to keep the smaller board size of the Ampere GA10x generation, all 16GB of GDDR6 memory is placed on the front of the PCB.

The RTX A4000's exhaust vent has a higher open-area ratio than before (compare the RTX 4000 in the figure below). But it is, after all, a single-slot card, so it can't match the perforated area of the Turbo version RTX 3090; then again, the latter is a 350W card.

The picture above shows the Quadro RTX 4000. The small round holes on its PCIe I/O bracket don't look like they would exhaust heat as effectively as the new-generation A4000's.

The previous-generation RTX 4000 included a VirtualLink Type-C port that could power a VR headset and other peripherals, so the board's power draw could reach 160W and an 8-pin PCIe power connector had to be used. The A4000 goes back to a straightforward 4x DisplayPort output; staying under 150W, a 6-pin power connector is enough.

The RTX A4000 professional graphics card retains the Stereo (3D Vision) glasses connector (via a 3-pin mini-DIN accessory) and the G-Sync synchronization daughter-card interface. G-Sync here means keeping the refresh phase synchronized across multiple graphics cards and even multiple host outputs, which matters for large tiled displays, especially in scenarios like stereo projection and VR. (Further reading: "NVIDIA Quadro 20th Anniversary: The Past and Present of Professional Graphics Cards")

Thanks for the RTX A4000 teardown photos; I already borrowed one in the earlier Win7 driver test, and here is another :) I'm not as interested in tearing down graphics cards as I was when I was younger, because good looks matter less than actual application performance.

Specifications: how to use Ampere's doubled CUDA cores

The specifications above come from NVIDIA's official documents, plus the clocks visible in the driver control panel.

From the Quadro M4000 through the P4000 and RTX 4000, video memory stayed at 8GB. The A4000 has finally been upgraded to 16GB; otherwise it would be embarrassing next to the GeForce RTX 3060's 12GB :) As for ECC memory support, I tested that in the previous article.

It is obviously no accident that the RTX A4000 and the previous-generation Quadro RTX 5000 both have 448 GB/s of memory bandwidth. I have been fortunate to follow this industry since Quadro began 20 years ago, and "each new professional graphics generation moves performance up roughly one tier" has repeated almost constantly. Now that the A4000 matches the previous 5000-series' 16GB of memory, comparing it against the RTX 4000 would leave no suspense, so I'll just pit it directly against the RTX 5000.
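That matching 448 GB/s figure follows directly from the memory configuration. Assuming both cards use a 256-bit GDDR6 bus at 14 Gbps per pin (my understanding of their specs), the arithmetic is:

```python
# GDDR6 memory bandwidth = (bus width in bits / 8) x per-pin data rate (Gbps).
def mem_bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits / 8 * gbps_per_pin

# 256-bit bus at 14 Gbps -> 448.0 GB/s, for both the RTX A4000
# and the Quadro RTX 5000.
bandwidth = mem_bandwidth_gb_s(256, 14.0)
```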

Some friends may ask: on GeForce gaming cards, doesn't the new generation's 140W 30x0 sometimes fail to reach the performance of the previous generation's 230W 20x0? But professional graphics cards, beyond driver optimization, sometimes achieve higher hardware efficiency in certain models. For example, with the earlier 125W RTX 4000, my users found that in some applications (such as VR) it could approach or even slightly exceed the GeForce RTX 2070 (180W), and that was outside traditional OpenGL manufacturing/3D design.

The RTX A4000 has 6144 CUDA cores, which appears to be exactly twice the previous-generation RTX 5000, never mind the RTX 4000. However, NVIDIA specifically notes that these are Ampere-architecture CUDA cores. I introduced the secret behind this "numbers game" in "The Impact of NVIDIA Tensor Cores on RTX Graphics Card Ray-Tracing Performance".

Since the GA104 white paper has not been published yet, I'll quote the GA10x Streaming Multiprocessor (SM) diagram from the GA102 white paper. Because the Ampere architecture's INT32 integer units can also execute FP32 floating-point work, single-precision throughput is effectively doubled. As for the CUDA cores available during INT32 calculations, my understanding is that only half of NVIDIA's claimed number actually apply.

In some compute tests, the RTX A4000's GPU Boost frequency as seen in GPU-Z is higher than the nominal 1560MHz. But every graphics card has a TDP limit, and the Tensor Cores, RT Cores, and memory controller are also big power consumers. My understanding is that if several of these units are loaded at once, some GPU clock/traditional CUDA throughput has to be sacrificed in the trade-off.

I've previously seen people use AIDA64's GPGPU test to verify Ampere's single-precision floating-point performance. The 19508 GFLOPS measured on the A4000 here is indeed consistent with its CUDA core count. But not every application or piece of software can use the doubled CUDA cores, as shown in the figure below:
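For reference, a minimal sketch of how that GFLOPS figure relates to core count and clock, assuming each CUDA core retires one FMA (two floating-point operations) per cycle:

```python
# Theoretical peak FP32 throughput in GFLOPS:
# CUDA cores x 2 FLOPs per cycle (FMA) x clock in MHz / 1000.
def peak_fp32_gflops(cuda_cores: int, boost_clock_mhz: float) -> float:
    return cuda_cores * 2 * boost_clock_mhz / 1000.0

# RTX A4000 at its nominal 1560 MHz boost: ~19169 GFLOPS.
nominal = peak_fp32_gflops(6144, 1560)

# AIDA64's measured 19508 GFLOPS implies the card was actually
# boosting slightly above the nominal clock (~1588 MHz).
implied_clock_mhz = 19508 * 1000 / (6144 * 2)
```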

This is an example on an RTX A6000. On hardware built for the CUDA 11 / compute capability 8.6 architecture, an application built against CUDA 10 or earlier will only use 64 cores per SM, because SM 8.6 is not defined there; in other words, the INT32 units cannot be repurposed for FP32 use.

With an actual CUDA 11.2 runtime, all 10752 CUDA cores of the RTX A6000 are available: the screenshot above shows 128 CUDA Cores x 84 Multiprocessors.
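The cores-per-SM behavior above can be sketched as a lookup table, similar to the `_ConvertSMVer2Cores` helper in NVIDIA's deviceQuery CUDA sample; the entries here are illustrative, not exhaustive:

```python
# FP32-capable CUDA cores per SM, keyed by compute capability
# (major, minor), in the style of deviceQuery's _ConvertSMVer2Cores.
CORES_PER_SM = {
    (7, 0): 64,   # Volta
    (7, 5): 64,   # Turing (e.g. Quadro RTX 4000/5000/6000)
    (8, 6): 128,  # Ampere GA10x (e.g. RTX A4000/A6000)
}

def total_cuda_cores(major: int, minor: int, sm_count: int) -> int:
    # A toolkit that predates a compute capability has no entry for it,
    # so an old runtime effectively counts only 64 cores per SM here.
    return CORES_PER_SM.get((major, minor), 64) * sm_count

total_cuda_cores(8, 6, 84)  # RTX A6000 under CUDA 11: 128 x 84 = 10752
total_cuda_cores(8, 6, 48)  # RTX A4000: 128 x 48 = 6144
total_cuda_cores(7, 5, 48)  # Quadro RTX 5000: 64 x 48 = 3072
```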

To take advantage of this, simply put, applications need to be recompiled with the new CUDA version. For workstation users running off-the-shelf commercial or open-source software, it depends on when the software or its plug-ins add support. In the next two tests, I will try my best to use the newest versions of the 3D design software to verify this.

SPECviewperf test: why does the RTX A4000 do better at 4K resolution?

Finally, the tests. I introduced the SPECviewperf 2020 benchmark last year; this time I'm using the latest v2.0, which reportedly made some updates to the SolidWorks sub-test, so its results shouldn't be compared directly with the previous version.

The hardware platform this time is a Dell host that supports PCIe 4.0, equipped with an 11th-generation Intel Core i7-11700 CPU, with no special performance tuning. I ran both graphics cards through the tests at 1920x1080 and 3840x2160 resolutions.

Since the Quadro RTX 5000 still uses a PCIe 3.0 x16 host interface, my guess before testing was that the RTX A4000 might benefit from PCIe 4.0 at the higher resolution. Take a look at the test results below.

First, at 1920x1080 (HD) resolution, the RTX A4000 and RTX 5000 are mostly close, with wins and losses on both sides. In the snx-04 test from Siemens NX, the A4000 pulls further ahead.

The snx-04 viewset was created from graphics workload traces generated by the earlier Siemens PLM NX 8.0 application; its two models are 7.15 million and 8.45 million vertices in size. I introduced it four years ago in "Performance Plummeted by 62%? A Graphics Workstation Old-Driver Pitfall Test". Both cards tested this time are a bit overkill for it, so frame rates exceed 400 FPS.

At 3840x2160 (4K) resolution, the RTX A4000 leads in most of the test items, with the biggest gap in maya-06. What's also a bit unexpected is that the deficits it showed against the RTX 5000 at 1920x1080 are recovered here.

The above is an actual screenshot from my 4K test; I'm not sure whether it can be clicked to enlarge here. The maya-06 viewset was created from graphics workload traces generated by the Autodesk Maya 2017 application. It feels a bit dated, and the scene isn't complex enough, so it runs at around 200 FPS.

The SPECviewperf test set is fairly comprehensive: it covers CAD/AEC/DCC 3D design and reconstruction scenes from the engineering manufacturing, architecture, media and entertainment, energy, and biomedical industries. In this round, the RTX A4000 professional graphics card has initially reached the same performance level as the Quadro RTX 5000.

For a workstation veteran, SPECviewperf provides only a preliminary picture; I still need to test more actual application software. One characteristic of graphics workstations is that 3D software's editing mode is usually single-threaded, so in some cases single-core CPU performance, rather than the graphics card, becomes the bottleneck. Besides display acceleration, there is another important direction to evaluate: GPU compute/rendering performance, that is, CUDA and Tensor/RT ray tracing.

To be continued :)


Note: This article represents only the author's personal views and is unrelated to any organization. If there are errors or omissions, please point them out in the comments. If you'd like to share your own technical content on this account, you're also welcome to contact me :)

Respect knowledge: please keep the full text when reprinting. Thank you for reading and for your support!
