Quantum Bit Report | Official Account QbitAI
Do you still remember ModelArts?
This is Huawei's latest AI development platform released this year, which provides AI application development services including data annotation and preparation, model training, model tuning, and model deployment.
Among them, model training attracted much attention when it was released.
Huawei Cloud claims that ModelArts applies a range of optimization technologies during model training, in particular cascading hybrid parallel technology, so that with the same model, dataset and hardware resources, training time can be greatly shortened.
But what is the actual effect? Now there is an internationally authoritative result to refer to.
In the latest DAWNBench ranking, Huawei Cloud ModelArts took first place in total training time for image recognition (ResNet-50 on ImageNet, to above 93% accuracy) with a time of 10 minutes and 28 seconds, nearly 44% faster than the second-place entry, setting a new world record.
△ Latest DAWNBench results
Stanford DAWNBench
DAWNBench is an authoritative international benchmark platform initiated by Stanford University. It measures end-to-end deep learning training and inference performance, and its rankings are widely regarded as reflecting the latest level of deep learning platform technology in the global industry.
In building deep learning models, computing time and cost are among the most critical resources.
For this purpose, DAWNBench provides a common set of deep learning evaluation metrics covering training time, training cost, inference latency and inference cost across different optimization strategies, model architectures, software frameworks, clouds and hardware.
After the latest results were announced, Huawei Cloud said they further prove that ModelArts delivers a lower-cost, faster AI development experience.
In addition, Huawei Cloud shared the optimization thinking behind the result, describing how ModelArts used 128 GPUs to complete ImageNet training in 10 minutes.
The full text is reproduced below:
In recent years, deep learning has been widely used in computer vision, speech recognition, natural language processing, video analysis and other fields, serving scenarios such as video surveillance, autonomous driving, search and recommendation, and dialogue robots, with broad commercial value.
In order to achieve higher accuracy, deep learning usually requires very large amounts of data and very large models, and training is very time-consuming.
For example, in computer vision, training a ResNet-50 model on the ImageNet[1] dataset with a single P100 GPU takes nearly a week.
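A rough back-of-the-envelope estimate shows why this takes about a week; the throughput and epoch count below are assumed, illustrative values rather than measured figures:

```python
# Back-of-the-envelope estimate with assumed, illustrative numbers:
# ~230 images/s on one P100 for ResNet-50, and a typical 90-epoch schedule.
images_per_epoch = 1_280_000          # ImageNet-1k training set size
epochs = 90
throughput_img_per_s = 230            # assumed single-P100 ResNet-50 throughput

total_seconds = images_per_epoch * epochs / throughput_img_per_s
print(f"~{total_seconds / 86400:.1f} days")   # ~5.8 days, i.e. close to one week
```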
This seriously hinders the development of deep learning applications, so accelerating deep learning training has long been an important concern for both academia and industry, and it remains a major pain point for deep learning applications. Fast.ai, led by Jeremy Howard and other professors, currently focuses on deep learning acceleration; its shortest reported time to train a ResNet-50 model on the ImageNet dataset with 128 V100 GPUs is 18 minutes.
However, the recent emergence of models such as BigGAN, NASNet and BERT indicates that training models with better accuracy requires ever more powerful computing resources.
It can be foreseen that, as models and data volumes keep growing, accelerating deep learning training will become even more important, and only end-to-end full-stack optimization can push training performance to its limit.
Huawei Cloud ModelArts is a one-stop AI development platform that has served AI model development across Huawei's major product lines. Over the past few years it has accumulated optimization experience spanning multiple scenarios, software and hardware co-design, and device-cloud integration.
ModelArts provides multiple modular services such as automatic learning, data management, development management, training management, model management, inference service management, and market, so that users at different levels can quickly develop their own AI models.
△ Huawei Cloud ModelArts Function View
In the model training part, ModelArts achieves acceleration through coordinated optimization of hardware, software and algorithms. For deep learning model training in particular, Huawei has abstracted the distributed acceleration layer into a general framework named MoXing (the pinyin of the Chinese word for "model", signifying that all optimizations revolve around the model).
Using the same hardware, model and training data as fast.ai, ModelArts shortens the training time to 10 minutes, setting a new record and saving users 44% of the time.
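The roughly 44% figure follows directly from the two headline times quoted in this article (fast.ai's 18 minutes versus ModelArts' roughly 10 minutes); a one-line check:

```python
# Quick consistency check using the two headline numbers:
fastai_min = 18        # fast.ai's previous record (128 V100 GPUs)
modelarts_min = 10     # ModelArts' reported training time

saving = 1 - modelarts_min / fastai_min
print(f"time saved: {saving:.0%}")    # -> 44%
```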
△ Training speed improvement based on MoXing and ModelArts
MoXing is a distributed training acceleration framework developed by the Huawei Cloud ModelArts team. It is built on top of the open-source deep learning engines TensorFlow, MXNet, PyTorch and Keras, and makes distributed training on these engines both higher-performance and easier to use.
MoXing has built-in model parameter slicing and aggregation strategies, distributed SGD optimization algorithms, cascading hybrid parallel technology, and automatic hyperparameter tuning, and it is optimized in many aspects such as distributed training data slicing, data reading and preprocessing, and distributed communication. Combined with Huawei Cloud's Atlas high-performance servers, it achieves distributed deep learning acceleration through collaborative optimization of hardware, software and algorithms.
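MoXing's internals are not detailed in this post; as a generic illustration of the synchronous distributed SGD with gradient aggregation that such frameworks build on, a minimal PyTorch-style sketch might look like this:

```python
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all workers (synchronous data-parallel SGD).

    Illustrative only: real frameworks fuse and overlap these all-reduce calls
    and add the parameter slicing/aggregation strategies mentioned above.
    """
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)  # sum gradients over workers
            p.grad /= world_size                            # then average them

# Per-iteration skeleton (assumes dist.init_process_group() was called at startup):
# loss = criterion(model(images), labels)
# loss.backward()
# average_gradients(model)
# optimizer.step()
```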
△ Huawei Cloud MoXing Architecture Diagram
In terms of ease of use, upper-level developers only need to focus on the business model, not on the lower-level distributed APIs: they simply supply the data, model and corresponding optimizer for their actual business. The training script is independent of the runtime environment (standalone or distributed), so the upper-level business code is completely decoupled from the distributed training engine.
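MoXing's own API is not reproduced here; as one concrete illustration of the same decoupling idea, TensorFlow's tf.distribute strategy lets identical model-building and training code run standalone or across many GPUs:

```python
import tensorflow as tf

# Illustration of the decoupling idea using TensorFlow's tf.distribute API
# (this is not MoXing's API): the code inside the scope is identical whether
# `strategy` wraps a single device or many GPUs.
strategy = tf.distribute.MirroredStrategy()   # uses all visible GPUs, or falls back to CPU

with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

# `train_ds` would be an ImageNet tf.data.Dataset of (image, label) batches:
# model.fit(train_ds, epochs=90)
```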
Key technologies of MoXing distributed acceleration
When measuring the acceleration performance of distributed deep learning, there are two important indicators as follows:
- Throughput: the amount of data processed per unit time;
- Convergence time: the time required to reach a given convergence accuracy (a simple relationship between the two is sketched below).
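As a back-of-the-envelope illustration of how the two indicators relate (the throughput figure below is an assumption for illustration, not a measured Huawei number):

```python
# Back-of-the-envelope relationship between the two indicators.
images_per_epoch = 1_280_000
epochs_to_converge = 90                  # assumed schedule to reach >=93% Top-5
cluster_throughput = 180_000             # assumed aggregate images/s across the cluster

convergence_time_s = epochs_to_converge * images_per_epoch / cluster_throughput
print(f"~{convergence_time_s / 60:.1f} minutes")   # ~10.7 minutes
```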
Throughput generally depends on optimizations in server hardware (for example, AI acceleration chips with higher FLOPS and larger communication bandwidth), data reading and caching, data preprocessing, model computation (for example, convolution algorithm selection), and the communication topology. Apart from low-bit computation and gradient (or parameter) compression, most of these techniques improve throughput without affecting model accuracy.
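As a concrete example of gradient compression, one common scheme is top-k sparsification, which communicates only the largest-magnitude gradient entries; this is an illustration of the general idea, not necessarily what MoXing implements:

```python
import torch

def topk_sparsify(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude fraction `ratio` of gradient entries.

    Illustrative gradient-compression scheme: only (indices, values) need to be
    sent over the network instead of the full dense gradient.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, idx = torch.topk(flat.abs(), k)     # positions of the k largest |g_i|
    return idx, flat[idx]

# The receiver scatters the values back into a zero tensor of the original shape
# before applying the parameter update.
```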
To achieve the shortest convergence time, hyperparameters must be tuned alongside throughput optimization. With poor hyperparameter choices, throughput itself can be hard to improve: for example, if the batch size hyperparameter is not large enough, the parallelism of model training is poor, and throughput cannot be raised simply by adding more compute nodes.
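A common large-batch tuning recipe, documented by Goyal et al. (2017) and not necessarily identical to MoXing's schedule, is to scale the learning rate linearly with the batch size and warm it up over the first steps:

```python
def scaled_learning_rate(base_lr: float, global_batch: int, step: int,
                         base_batch: int = 256, warmup_steps: int = 500) -> float:
    """Linear learning-rate scaling with warmup (Goyal et al., 2017).

    Target LR = base_lr * (global_batch / base_batch); during the first
    `warmup_steps` steps the LR ramps linearly from base_lr up to the target.
    """
    target_lr = base_lr * global_batch / base_batch
    if step < warmup_steps:
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    return target_lr

# Example: 128 GPUs x 256 images each -> global batch 32768,
# so the post-warmup learning rate would be 0.1 * 32768 / 256 = 12.8.
print(scaled_learning_rate(0.1, global_batch=32768, step=1000))
```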
The metric users ultimately care about is convergence time, so MoXing and ModelArts implement full-stack optimization to greatly shorten it. In data reading and preprocessing, MoXing uses a multi-stage concurrent input pipeline so that data IO never becomes the bottleneck; model computation and distributed communication are optimized along the lines described above.
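A multi-stage concurrent input pipeline of this kind can be pictured with TensorFlow's tf.data API; the sketch below illustrates the general technique rather than MoXing's actual implementation, and the TFRecord feature names are assumptions:

```python
import tensorflow as tf

def parse_and_augment(record):
    # Assumed TFRecord layout (illustrative): a JPEG image plus an integer label.
    features = tf.io.parse_single_example(record, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.io.decode_jpeg(features["image"], channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, features["label"]

def make_input_pipeline(file_pattern: str, batch_size: int) -> tf.data.Dataset:
    """Multi-stage, concurrent input pipeline: file reading, decoding/augmentation,
    batching and prefetching all overlap with model computation."""
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    ds = files.interleave(tf.data.TFRecordDataset,
                          cycle_length=16,                       # read many files at once
                          num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.map(parse_and_augment, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.shuffle(10_000).batch(batch_size, drop_remainder=True)
    return ds.prefetch(tf.data.AUTOTUNE)                         # keep the GPUs fed
```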
Comparison of test results
Generally, when training the ResNet-50 model on the ImageNet dataset, the model is considered converged once Top-5 accuracy reaches ≥93% or Top-1 accuracy reaches ≥75%.
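For reference, Top-k accuracy simply asks whether the true label appears among a model's k highest-scoring predictions; a small NumPy sketch:

```python
import numpy as np

def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(logits, axis=1)[:, -k:]        # indices of the k largest scores
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# For ImageNet, `logits` has shape (num_samples, 1000). The convergence criterion
# above corresponds to topk_accuracy(..., k=5) >= 0.93 or topk_accuracy(..., k=1) >= 0.75.
```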
The training convergence curves we measured are shown in the figure below; the Top-1 and Top-5 accuracies there are measured on the training set. To achieve the fastest possible training, separate processes were used to validate the model during training; the final validation accuracies are shown in Table 1 (including a comparison with fast.ai).
The model in Figure 4(a) reaches Top-1 accuracy ≥75% on the validation set with a training time of 10 minutes and 6 seconds; the model in Figure 4(b) reaches Top-5 accuracy ≥93% on the validation set with a training time of 10 minutes and 58 seconds.
△ResNet50 on ImageNet training convergence curve
△MoXing and fast.ai training results comparison
Future: Faster Inclusive AI Development Platform
Huawei Cloud ModelArts is committed to providing users with a faster, more inclusive AI development experience. In model training in particular, the built-in MoXing framework greatly improves the training speed of deep learning models.
As mentioned earlier, deep learning acceleration is the result of multi-faceted collaborative optimization, from the underlying hardware and the computing engine up to the higher-level distributed training framework and its optimization algorithms. Only full-stack optimization capability can minimize users' training costs.
Going forward, Huawei Cloud ModelArts will further leverage its software-hardware integration advantages to provide a deep learning training platform with full-stack optimization, from chips (Ascend) and servers (Atlas) to the compute-communication library (CANN), deep learning engine (MindSpore) and distributed optimization framework (MoXing).
In addition, ModelArts will gradually integrate more data annotation tools to expand its range of applications, continuing to serve smart city, intelligent manufacturing, autonomous driving and other emerging business scenarios, and to provide users with more inclusive AI services on the public cloud.
[1] The ImageNet dataset referred to in this article contains 1,000 categories and a total of 1.28 million images. It is the most commonly used, classic image classification dataset and a subset of the full ImageNet data.
Portal
DAWNBenchmark:
https://dawn.cs.stanford.edu/benchmark/