As the finale of this year's six-stop GTC (GPU Technology Conference) tour, GTC China was held in Suzhou this week. What drew more than 5,000 people to attend? Beyond the personal charisma of founder Jensen Huang (Huang Renxun), the bigger draw was the increasingly hot world of GPU-powered AI as it moves from blueprint to implementation.
Although the crash in cryptocurrency mining pulled the stock price down from its peak, it has not dampened enthusiasm for NVIDIA. NVIDIA's self-hosted GTC conference has made six stops in 2018, and GTC China, the finale, was held in Suzhou this week. More than 5,000 people listened to the two-hour keynote by the company's founder and CEO, Huang Renxun.
To be honest, this is not the first year that CHIP's on-site editors have attended GTC or listened to a keynote by "Lao Huang", but like the thousands of others in the audience, they were still impressed by NVIDIA's rapid development and its commanding presence in GPUs and AI.
In the field of traditional GPUs, the RTX ray tracing technology introduced with the GeForce RTX 20 series is undoubtedly the most consumer-friendly technology. Taking "Against the Cold" (逆水寒, released in English as "Justice"), the first Chinese game to support the technology, as an example, Huang Renxun explained in detail how NVIDIA RTX ray-traced reflections give armor, weapons, objects, puddles, rivers, canals and other game elements lifelike reflective properties, so that they accurately mirror the surrounding world. "Against the Cold" is also the industry's first game to use real-time ray-traced caustics. Caustics are formed when light is refocused or scattered after striking a reflective or refractive surface, in effect creating a new light source that illuminates the surrounding environment and casts shadows. Because the effect is computed in real time, the caustics respond to moving objects, to changes in the scene and lighting conditions, and even to the path of a boat.
[Comparison screenshots: RTX OFF vs. RTX ON]
In addition, performance in "Against the Cold" has improved by up to 40%. Compared with the previous generation of GeForce GPUs, GeForce RTX GPUs can deliver a performance gain of up to 90%. In terms of image quality, DLSS technology provides sharper detail.
In addition to consumer-grade products, Huang Renxun focused on HGX-2 and the Turing-based T4, which NVIDIA designed for HPC and hyperscale data centers. The former is no longer a "new" product. Its 16 Tesla V100 GPUs provide 2 petaFLOPS of computing performance, 0.5 TB of memory and 16 TB/s of total memory bandwidth in a single node. Compared with CPU-only servers, it runs AI machine learning workloads nearly 550 times faster, AI deep learning workloads nearly 300 times faster, and high-performance computing workloads nearly 160 times faster.
The more recent news is that NVIDIA has completed the relevant technology licensing to server manufacturers, currently including Foxconn, Inventec, QCT, Quanta Computer, Supermicro, Wistron and Wiwynn. In addition, Oracle announced last month that it plans to deploy the HGX-2 platform on Oracle Cloud Infrastructure, offering it as both bare-metal and virtual machine instances so that customers can easily access a unified HPC and AI computing architecture.
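As a rough sanity check on the single-node HGX-2 figures quoted above (16 GPUs, 0.5 TB of memory, 16 TB/s of total memory bandwidth), the short Python sketch below works out the share per GPU. It assumes the totals are split evenly across the GPUs and that 1 TB = 1024 GB; the constant and function names are illustrative and do not come from NVIDIA.

```python
# Per-GPU share of the HGX-2 node totals quoted in the article.
# Assumptions (not from the source): even split across GPUs, 1 TB = 1024 GB.

NUM_GPUS = 16                # V100 GPUs in one HGX-2 node
TOTAL_MEMORY_TB = 0.5        # total GPU memory in the node
TOTAL_BANDWIDTH_TB_S = 16    # total memory bandwidth in the node


def per_gpu_share(total: float, num_gpus: int) -> float:
    """Return the share of a node-level total that falls to each GPU."""
    return total / num_gpus


memory_per_gpu_gb = per_gpu_share(TOTAL_MEMORY_TB * 1024, NUM_GPUS)
bandwidth_per_gpu_tb_s = per_gpu_share(TOTAL_BANDWIDTH_TB_S, NUM_GPUS)

if __name__ == "__main__":
    print(f"Memory per GPU:    {memory_per_gpu_gb:.0f} GB")         # 32 GB
    print(f"Bandwidth per GPU: {bandwidth_per_gpu_tb_s:.0f} TB/s")  # 1 TB/s
```

Under those assumptions each GPU accounts for 32 GB of memory and about 1 TB/s of bandwidth; the bandwidth figure is a rounded average implied by the quoted total, not a per-GPU specification.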
The newer product is the T4, which is based on the new NVIDIA Turing architecture and uses multi-precision Turing Tensor Cores and the new RT Cores. Combined with an accelerated, containerized software stack, it delivers unprecedented performance at scale. The finished T4 is very small, about the size of a half-height add-in card, and draws only 70W, roughly a quarter of what competing products consume. It supports four precision levels to cover a range of AI workloads: 8.1 TFLOPS at FP32, 65 TFLOPS at FP16, 130 TOPS at INT8 and 260 TOPS at INT4. For AI inference workloads, a single server with 2 T4 GPUs can replace up to 54 CPU servers. For AI training, a single server equipped with 2 T4s can replace 9 dual-socket CPU servers. In Huang Renxun's live demonstration, which used Baidu's AI image search application, 4 T4s delivered more than 6,000 recognitions per second and 28 T4s reached 43,000 per second, while the CPU solution managed only single digits.
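To put the T4 numbers above in proportion, the following Python sketch derives three quantities from the quoted figures: the speedup of each precision over FP32, the throughput per watt at 70W, and how close the demo's jump from 4 to 28 GPUs is to linear scaling. It is a back-of-the-envelope calculation, not a benchmark. The function names are illustrative, and the demo rates are the rounded values from the keynote (6k/s is described as "more than", so the scaling ratio is an upper-bound estimate).

```python
# Back-of-the-envelope arithmetic on the T4 figures quoted in the article.
# All names here are illustrative; the numbers are the article's own.

T4_POWER_W = 70
T4_THROUGHPUT = {   # precision -> tera-operations per second (TFLOPS or TOPS)
    "FP32": 8.1,
    "FP16": 65.0,
    "INT8": 130.0,
    "INT4": 260.0,
}


def speedup_over_fp32(throughput: dict) -> dict:
    """Throughput of each precision relative to FP32."""
    base = throughput["FP32"]
    return {precision: value / base for precision, value in throughput.items()}


def ops_per_watt(throughput: dict, power_w: float) -> dict:
    """Tera-operations per second delivered per watt of board power."""
    return {precision: value / power_w for precision, value in throughput.items()}


def scaling_efficiency(gpus_a: int, rate_a: float, gpus_b: int, rate_b: float) -> float:
    """Per-GPU rate at the larger scale divided by the per-GPU rate at the smaller scale.

    A value of 1.0 means perfectly linear scaling.
    """
    return (rate_b / gpus_b) / (rate_a / gpus_a)


if __name__ == "__main__":
    speedups = speedup_over_fp32(T4_THROUGHPUT)
    efficiency = ops_per_watt(T4_THROUGHPUT, T4_POWER_W)
    for precision in T4_THROUGHPUT:
        print(f"{precision}: {speedups[precision]:5.1f}x FP32, "
              f"{efficiency[precision]:.2f} tera-ops/s per watt")

    # Demo figures: 4 T4s at roughly 6,000/s, 28 T4s at roughly 43,000/s.
    ratio = scaling_efficiency(4, 6_000, 28, 43_000)
    print(f"Scaling efficiency from 4 to 28 GPUs: {ratio:.2f}")
```

Each halving of precision from FP16 down to INT4 doubles the quoted throughput, and the demo figures imply close to linear scaling when the GPU count rises sevenfold.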
In addition to the above products, NVIDIA has also partnered with many Chinese startups, truck manufacturers and suppliers to pave the way for a new generation of autonomous vehicles built on the NVIDIA DRIVE AGX autonomous driving platform. Xpeng Motors and its main tier-1 supplier Desay SV will use DRIVE AGX Xavier, which delivers 30 trillion operations per second (30 TOPS) while drawing only 30W, to build an L3 autonomous driving system for mass-production models. DRIVE AGX Xavier will also power new models from Singulato Motors. In addition, SF Motors plans to launch its first electric crossover, the SF5, next year and said it will use DRIVE AGX Xavier to develop its own next-generation computing platform.
Jetson AGX Xavier delivers large-scale computing performance for delivery robots within a small footprint and power budget. Such robots rely on a variety of sensors, including multiple high-resolution cameras and lidars, to perceive their surroundings and to handle localization, path planning and driving in complex, dynamic urban environments. The module provides workstation-class processing power of up to 32 trillion operations per second (32 TOPS), is 10 times more energy efficient than its predecessor, and is only the size of a palm.
After years of GTC conferences, NVIDIA's reach has expanded from pure graphics technology into the wider field of artificial intelligence. Thanks to the high parallelism of GPUs, both tensor processing and increasingly capable deep learning applications have shown their "strong" side. In autonomous driving, stronger visual sensing and processing combined with AI computing will further demonstrate the power of GPU products. Compared with traditional CPU solutions, both Huang Renxun in his speech and the executives interviewed after the event showed confidence on cost. Claims such as "Moore's Law has failed" and "only 1/4 of the power consumption but 4 times the performance" made that confidence plain.