Zhidongxi (WeChat public account: zhidxcom) | By Xinyuan
Just last Wednesday, Stanford University released the latest DAWNBench results, one of the most authoritative benchmarks in the field of artificial intelligence. Huawei Cloud's ModelArts, a one-stop AI development platform, took first place on both the image-recognition total training time and inference performance lists.
This time, Huawei Cloud ModelArts cut the training time to 4 minutes and 8 seconds, twice as fast as the record it posted on the list three months ago. The previous best DAWNBench image-recognition training records were also set by Huawei Cloud ModelArts.
Just one day after the good news arrived from across the ocean, Fuzhou (the "Banyan City") brought more good news to AI developers: at the Huawei China Ecosystem Partner Conference, the Huawei Cloud AI Market was officially launched. Built on the Huawei Cloud ModelArts platform, this developer community provides secure, open, fair, and reliable content sharing and transactions for universities, enterprises, individual developers, and other groups.
So what gives Huawei Cloud ModelArts its edge? How did it break its own record on this expert benchmark in just three months? What optimizations were made to training and inference to achieve such performance? And what conveniences does the newly released AI Market bring to AI developers? This article examines the technical strengths of Huawei Cloud ModelArts one by one.
Double champion in training and inference, twice as fast as the previous best training record
The Stanford DAWNBench list is an internationally authoritative benchmark for measuring end-to-end deep learning training and inference performance, and its rankings reflect the current state of the art among the industry's deep learning platforms.
The latest image-recognition rankings show that, in terms of training performance, Huawei Cloud ModelArts used 128 V100 GPUs to train ResNet50 on ImageNet (to above 93% top-5 accuracy) in only 4 minutes and 8 seconds, twice as fast as the 9 minutes and 22 seconds record it set in December 2018, and 4 times faster than fast.ai's earlier training run on the AWS platform.
▲Stanford University DAWNBench Training Time List
In terms of inference performance, Huawei Cloud ModelArts classifies images 1.72 times faster than the second-place entry, 4 times faster than Amazon, and 9.1 times faster than Google.
▲Stanford University DAWNBench Inference Performance List
How did Huawei Cloud ModelArts perform so well on this authoritative deep learning benchmark and break its own record in just three months?
The answer lies in the ModelArts team's work on two fronts: high-performance distributed model training and fast inference.
Three dimensions of training optimization: network structure, framework, and algorithm
In terms of training, the Huawei Cloud ModelArts team mainly optimizes from three dimensions: deep neural network structure, distributed training framework, and deep learning training algorithm.
1. Deep neural network structure optimization
The network used this time is based on the classic ResNet50 architecture. Because low-resolution 128×128 input images are used during training to increase speed, accuracy suffers: the original model cannot reach the target 93% top-5 accuracy within the same number of training epochs.
For this reason, the ModelArts team optimized the convolution structure in ResNet50 so that the target accuracy can be reached stably in low-resolution training mode.
2. Distributed training framework optimization
The deep learning training process involves transferring large volumes of parameters across the network. TensorFlow's default approach uses a centralized parameter server to collect, average, and redistribute gradients; access to the server node becomes a bottleneck and bandwidth utilization is low. To address this, the ModelArts team used the AllReduce algorithm for gradient aggregation to optimize bandwidth usage.
The team also fused gradient transfers, merging multiple gradients smaller than a threshold into a single transmission to improve bandwidth utilization. In addition, NVIDIA technologies such as NVLink and GPU P2P were used at the communication layer to increase intra- and inter-node bandwidth and reduce communication latency.
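The two ideas above, decentralized gradient averaging and fusing small tensors into larger buckets, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the ModelArts team's actual implementation; the `threshold` parameter and greedy bucketing policy are hypothetical simplifications.

```python
import numpy as np

def fuse_gradients(grads, threshold):
    """Tensor-fusion sketch (hypothetical policy): greedily merge gradient
    tensors into buckets so each transfer carries at least `threshold`
    elements, reducing the number of small network transmissions."""
    buckets, current, size = [], [], 0
    for g in grads:
        current.append(g)
        size += g.size
        if size >= threshold:
            buckets.append(np.concatenate([x.ravel() for x in current]))
            current, size = [], 0
    if current:  # flush any remainder
        buckets.append(np.concatenate([x.ravel() for x in current]))
    return buckets

def allreduce_mean(worker_buckets):
    """What AllReduce computes: every worker ends up with the average of
    all workers' gradients, with no central parameter server involved."""
    return sum(worker_buckets) / len(worker_buckets)
```

In a real system the averaging would run as a ring or tree AllReduce over NVLink/NCCL rather than a Python sum, but the result each worker receives is the same mean.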
3. Deep learning training algorithm optimization
The distributed global batch size used in this training run was 32,768. Such a huge batch size increases training parallelism but also degrades convergence accuracy. To address this, the ModelArts team implemented the layer-wise adaptive rate scaling (LARS) algorithm proposed in "Large Batch Training of Convolutional Networks". For global learning-rate scheduling, a linear-cosine decay scheduler with warmup was used, and the training optimizer used momentum.
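The two ingredients named above can be sketched as follows. This is a simplified illustration based on the LARS paper and a generic warmup-plus-cosine schedule, not the team's exact scheduler (the article's "linear cosine decay" may be a variant); the constants `eta` and `weight_decay` are illustrative defaults.

```python
import math
import numpy as np

def lars_trust_ratio(weights, grads, weight_decay=1e-4, eta=0.001):
    """LARS layer-wise trust ratio: scale the global learning rate per
    layer by ||w|| / (||g|| + wd*||w||) so huge-batch updates stay
    proportionate to each layer's weight magnitude."""
    w_norm = np.linalg.norm(weights)
    g_norm = np.linalg.norm(grads)
    if w_norm == 0 or g_norm == 0:
        return 1.0
    return eta * w_norm / (g_norm + weight_decay * w_norm)

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr):
    """Global schedule sketch: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Each layer's effective step is then `warmup_cosine_lr(...) * lars_trust_ratio(...)` applied to a momentum update.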
In the submitted training results, ModelArts needed only 35 epochs to reach the target accuracy, maintaining the large 32K batch size for all but the final epoch. The run took 4 minutes and 8 seconds, once again twice as fast as the previous result.
Three dimensions of inference optimization: network structure, quantization, and pruning
In terms of inference, the ModelArts team optimized three aspects: network structure, int8 quantization, and convolution channel pruning.
1. Network structure optimization
Inference also uses the ResNet50 model, specifically the ResNet50-v1 variant, which is more efficient at inference. On top of this model, downsampling was moved earlier in the network and a downsampling method with less information loss was adopted, improving inference speed while also achieving higher model accuracy.
2. Int8 quantization
Low-bit quantization is a primary means of improving inference performance; among such methods, int8 quantization is highly versatile and costs little model accuracy. During quantization, the original model is first loaded and a corresponding int8 model is created from it. Representative training samples are then used to calibrate the quantized model, and the optimized int8 model is generated from the calibration results.
In this quantization, the model's inference accuracy dropped by only 0.15%, while its inference speed more than doubled.
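The calibrate-then-quantize workflow described above can be sketched in miniature for a single tensor. This is a generic max-calibration sketch, not Huawei's calibration method; real toolchains typically calibrate per layer and may use entropy-based range selection instead of the simple max shown here.

```python
import numpy as np

def calibrate_scale(samples):
    """Calibration sketch: pick a per-tensor scale from representative
    samples so the observed range maps onto signed int8 [-127, 127]."""
    max_abs = max(np.abs(s).max() for s in samples)
    return max_abs / 127.0

def quantize_int8(x, scale):
    """Quantize float values to int8 with the calibrated scale."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    """Map int8 values back to floats; the round-trip error is what the
    article reports as the 0.15% accuracy drop."""
    return q.astype(np.float32) * scale
```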
3. Convolution channel pruning
Research shows that many connections in a network are close to zero or redundant, and removing these parameters has relatively little impact on accuracy. Pruning methods fall into structured and unstructured pruning. Unstructured pruning sets a threshold: weights below it are set to zero and no longer updated.
This makes the model's connections sparse, but because the zeroed connections are scattered across the weight tensors and GPUs do not accelerate sparse convolution and matrix multiplication, it does not effectively improve inference speed.
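The unstructured thresholding just described fits in a few lines. This is a minimal sketch of the general technique, with an arbitrary threshold; note that the tensor keeps its dense shape, which is exactly why the GPU gains nothing.

```python
import numpy as np

def unstructured_prune(weights, threshold):
    """Zero out weights whose magnitude is below the threshold. The tensor
    keeps its shape, so the sparsity yields no GPU speedup by itself."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask
```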
Structured pruning, by contrast, mainly prunes convolution channels: each convolution kernel's influence is scored by some measure, and kernels with low influence are removed as a whole, shrinking the model and increasing inference speed.
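Channel pruning as described above can be sketched by scoring each output filter and dropping the weakest ones. The L1-norm scoring here is a common choice in the literature but an assumption on my part; the article does not say which influence measure the team used, and a real implementation must also slice the next layer's input channels accordingly.

```python
import numpy as np

def prune_channels(conv_weight, keep_ratio):
    """Structured pruning sketch: rank each output channel (filter) of a
    conv layer by L1 norm and keep only the most influential fraction.
    conv_weight shape: (out_channels, in_channels, kh, kw)."""
    scores = np.abs(conv_weight).sum(axis=(1, 2, 3))   # influence per filter
    n_keep = max(1, int(len(scores) * keep_ratio))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])  # top filters, in order
    return conv_weight[keep], keep
```

Because whole filters are removed, the pruned weight is a smaller dense tensor, which is why this form of pruning translates directly into faster GPU inference.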
China's first AI model market officially released
It is worth mentioning that at the Huawei China Ecosystem Partner Conference, Huawei evolved its "platform + ecosystem" strategy into "platform + AI + ecosystem", providing partners with "industry + AI" support.
Zheng Yelai, Huawei vice president and president of its Cloud BU, officially launched the Huawei Cloud AI Market and announced special incentives to help developers and partners accelerate enterprise AI adoption through the market.
As mentioned earlier, the Huawei Cloud AI Market comprises modules such as an AI model market, an API market, a wiki dataset hub, a competition hub, and a case hub; users can freely trade the resources that interest them.
From university research institutions, AI application developers, and solution integrators to enterprises and individual developers, all participants in the AI development ecosystem are effectively connected. The AI Market not only helps them accelerate the development and deployment of AI products but also keeps the sharing and transaction environment secure and open.
Let's focus on the AI model market within the Huawei Cloud AI Market, the first platform in China for publishing and subscribing to AI models. Its main function is model publishing and subscription, with the market's intermediary mechanism and the ModelArts platform safeguarding the security of both buyers' and sellers' models and data.
After account authentication, sellers can upload their models to the market and set publishing permissions and billing strategies for each model, such as pay-per-use, monthly, or annual billing. Buyers can find and subscribe to models of interest in the AI model market for their own AI inference.
Sellers can also tag their models with attributes so buyers can find what they need faster. In addition, the AI model market supports attaching inference/retraining code to published models, which buyers can use to retrain the models or deploy them as inference services.
Zhidongxi previously detailed the four highlights and workflow of the Huawei Cloud ModelArts platform ("Magic! Play with AI in minutes with zero background: a full hands-on with Huawei Cloud ModelArts"). ModelArts offers advantages such as open-source datasets, automated hyperparameter tuning, the MoXing distributed framework with large-scale GPU-cluster training acceleration, and one-click cloud-edge deployment. The barrier to entry is very low: from novices with no programming experience to senior algorithm engineers, anyone can use the platform to train and deploy AI models faster and better.
▲Huawei Cloud ModelArts Developer Ecological Exhibition Area
The Huawei Cloud ModelArts platform was officially launched on January 30 this year. It has been widely applied in AI scenarios such as smart healthcare, intelligent manufacturing, autonomous driving, smart cities, smart security, and water conservancy, helping enterprises and developers across industries put AI development and applications into practice and respond to market demand in a timely manner.
Conclusion: the cloud AI battle heats up, and real-world deployment is still what counts
At present, cloud computing's embrace of AI is still a blue-ocean market. Players of all kinds, including internet giants, traditional ICT companies, traditional enterprise service providers, and emerging startups, have flocked in; established players are consolidating their positions, newcomers are growing fast, and competition in the cloud computing market is intensifying.
The Huawei Cloud ModelArts platform is a direct example of Huawei's philosophy of "keeping complexity to ourselves and leaving simplicity to customers and partners": from basic building blocks such as datasets and AI models to an integrated workflow, it lets enterprises and developers complete high-quality AI development with a few clicks.
From the Huawei Cloud ModelArts platform, we can distill the keywords for cloud providers building competitive moats around AI services: more powerful, more comprehensive, easier to use, and more reliable. As AI technology lands in one industry after another and the hype gradually fades, only those who truly provide enterprises with the best services and help them monetize commercially will build stronger ecosystems and push AI applications to their peak.
We understand that tomorrow (March 29), ModelArts will bring new benefits to developers; interested readers can follow the ModelArts official website.