core things (ID: aichip001) | Article by Xinyuan

core things reported on April 21 that building an artificial intelligence (AI) model in a stand-alone environment is not difficult for data scientists; what gives many of them a headache is building the entire distributed architecture and applying the AI model to production data.

The process of moving an application from a laptop to a production environment is quite long: after building a prototype with sample data on a laptop, you need to run model experiments against historical data on a cluster, and then deploy the algorithm online in production.

Along the way, data scientists often have to rewrite code, convert models, and transmit and copy data, which adds up to a heavy workload.

So, can we build an end-to-end pipeline that seamlessly and automatically moves AI applications from a laptop to a distributed environment with almost no code changes?
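The "almost no code changes" idea can be illustrated with a minimal, hypothetical sketch (this is not Analytics Zoo's actual API): the pipeline is written once against an abstract executor, and only the choice of executor changes between a laptop-style serial run and a parallel run.

```python
from concurrent.futures import ProcessPoolExecutor

def extract_feature(record):
    # Toy feature extraction: square the raw value.
    return record * record

def run_pipeline(data, executor):
    # The pipeline body is identical for local and parallel runs;
    # only the executor passed in differs.
    features = list(executor.map(extract_feature, data))
    return sum(features) / len(features)

class SerialExecutor:
    """Laptop-style stand-in: a plain in-process map."""
    def map(self, fn, data):
        return map(fn, data)

if __name__ == "__main__":
    data = list(range(10))
    local_result = run_pipeline(data, SerialExecutor())
    with ProcessPoolExecutor(max_workers=2) as pool:
        parallel_result = run_pipeline(data, pool)
    # Same pipeline code, same result, different execution backends.
    print(local_result, parallel_result)  # 28.5 28.5
```

This is the property the pipeline described below aims for at real scale: the user's code stays fixed while the runtime swaps a laptop backend for a cluster backend.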

This is the vision that the Intel Big Data Analysis and Artificial Intelligence Innovation Institute is working to realize. Since its establishment in China in June last year, the Innovation Institute has been committed to genuinely improving the efficiency of data analytics and AI in real production environments through optimized libraries, software, and tools.

Recently, in a conversation with Dai Jinquan, global CTO of Intel big data technology and director of the Big Data Analysis and Artificial Intelligence Innovation Institute, we tried to see clearly what game Intel is playing in its artificial intelligence software layout.

▲Dai Jinquan, global CTO of Intel big data technology and director of the Big Data Analysis and Artificial Intelligence Innovation Institute

1. The "troika" of applied research at the Innovation Institute

Intel hopes to build an end-to-end pipeline that unifies big data analytics and AI and can directly access production data. When users need to migrate AI applications from a laptop to a large cluster for distributed training or inference, they barely need to modify any code.

To realize this vision, the Intel Big Data Analysis and Artificial Intelligence Innovation Institute's approach is applied research, specifically a "troika": cutting-edge technology research, open-source software platforms, and practical application deployment.

The cutting-edge technology research has gone through two stages. The early research looked at how to efficiently build deep learning applications on big data platforms; the next stage focuses on better automating and seamlessly scaling AI in big data environments.

According to Dai Jinquan, at the CVPR conference in June this year, the Intel Big Data Analysis and Artificial Intelligence Innovation Institute will present its latest work, which centers on automatically building machine learning workflows in distributed big data environments.

The open-source software platforms include BigDL, a distributed high-performance deep learning framework based on Apache Spark, and Analytics Zoo, a unified big data analytics plus AI platform.

BigDL offers functionality similar to frameworks such as TensorFlow and Caffe, and can build various data analytics and deep learning applications on existing Hadoop and Spark clusters. Analytics Zoo is positioned as a software platform above the frameworks; its main feature is support for a variety of deep learning frameworks, big data frameworks, libraries, and tools. While these platforms exploit the hardware's computing capabilities, they also build an open-source ecosystem that makes AI more automated and seamless, better helping users solve problems.

Many Intel users, customers, and partners have adopted these open-source software platforms. Dai Jinquan cited some recent cases of practical deployment at home and abroad.

In China, Analytics Zoo has been integrated into the Alibaba Cloud E-MapReduce service, where deep learning applications can run directly. At last year's Alibaba Cloud Tianchi Competition, Intel also used Flink and Analytics Zoo to provide real-time garbage-classification detection.

Analytics Zoo has also been integrated into Tencent Cloud's intelligent Titanium machine learning platform TI-ONE, providing various kinds of big-data-based processing and analytics.

Neusoft has integrated AutoML-based time-series analysis into its application performance management product RealSight APM, providing its users with application performance management and analytics.

Jinfeng Huineng has built AI applications based on Analytics Zoo, raising power prediction accuracy in some areas from 60% to more than 80% and thereby saving energy.

Abroad, Analytics Zoo has been integrated into IBM Cloud Pak for Data in the United States. Mastercard has built deep learning recommendation services based on Analytics Zoo and BigDL. CERN, the European nuclear research organization, has built a real-time event filter for the Large Hadron Collider based on Analytics Zoo and BigDL. SK Telecom, South Korea's largest telecom company, has built intelligent communication network management based on Analytics Zoo.

2. Analytics Zoo's three layers of functionality

Analytics Zoo is built on the software layer of Intel's underlying oneAPI, and provides three layers of functionality on top of it.

The first layer is the unified data analytics and AI pipeline, which provides relatively high-level pipelines that help users scale AI and deep learning out to large distributed big data environments.

At this layer, Analytics Zoo organically integrates frameworks such as TensorFlow, Keras, PyTorch, BigDL, Spark, and Flink, so that users can choose the processing method appropriate to their application and build end-to-end workflows more flexibly.

For example, SK Telecom and Mastercard use Analytics Zoo to run large-scale distributed TensorFlow on Spark to process their data.
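Under the hood, distributed training of this kind is data parallelism: each worker computes a gradient on its own partition of the data, and the driver averages the partial results. The stdlib-only toy below (a stand-in for what Analytics Zoo orchestrates over Spark, not its real API) runs gradient descent for a one-dimensional linear regression over two "partitions" in separate processes:

```python
from multiprocessing import Pool

def partial_gradient(args):
    # Gradient of squared error for the model y = w * x on one partition.
    w, partition = args
    grad = sum(2 * (w * x - y) * x for x, y in partition)
    return grad, len(partition)

def distributed_step(pool, w, partitions, lr=0.01):
    # Each worker handles one partition; the driver averages the gradients.
    results = pool.map(partial_gradient, [(w, p) for p in partitions])
    total_grad = sum(g for g, _ in results)
    n = sum(count for _, count in results)
    return w - lr * total_grad / n

if __name__ == "__main__":
    # Synthetic data with true slope 3.0, split into two partitions.
    data = [(float(x), 3.0 * x) for x in range(8)]
    partitions = [data[:4], data[4:]]
    w = 0.0
    with Pool(processes=2) as pool:
        for _ in range(50):
            w = distributed_step(pool, w, partitions)
    print(round(w, 2))  # converges to the true slope: 3.0
```

Real systems add fault tolerance, efficient communication, and integration with the cluster's data formats, but the write-once, partition-everywhere pattern is the same.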

The second layer is the automated machine learning workflow, which helps users build the lower-layer pipelines through automation methods such as AutoML. Customers such as Neusoft and Tencent Cloud work with Intel to use these capabilities.
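At its simplest, the AutoML idea in this layer is automated hyperparameter search: try many configurations, score each one, and keep the best. A toy random-search sketch (illustrative only; the scoring function is made up and stands in for actually training and validating a model):

```python
import random

def train_and_score(lr, batch_size):
    # Stand-in for "train a model, return validation accuracy".
    # Peaks at lr=0.1, batch_size=64 by construction.
    return 1.0 - (lr - 0.1) ** 2 - ((batch_size - 64) / 256) ** 2

def random_search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_score, best_cfg = float("-inf"), None
    for _ in range(n_trials):
        cfg = {
            "lr": rng.uniform(0.001, 0.5),
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        score = train_and_score(**cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg

if __name__ == "__main__":
    score, cfg = random_search()
    print(cfg, round(score, 4))
```

Production AutoML systems replace random sampling with smarter strategies (Bayesian optimization, early stopping) and run the trials in parallel on a cluster, but the search-evaluate-select loop is the core of the workflow.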

The third, top layer provides models and algorithms for different application scenarios. Users can also use any standard deep learning framework on the Analytics Zoo platform, including TensorFlow and PyTorch.

3. Intel's AI confidence, seen from its overall software layout

In the popular imagination, Intel is a chip company with strong hardware technology, but its methodical layout in artificial intelligence software should not be underestimated.

First, from a research perspective, Intel's research institutes at home and abroad have done a great deal of relatively medium- and long-term research on AI algorithms and related topics. For example, the Intel China Research Institute has carried out much cutting-edge research in computer vision. Dai Jinquan noted that Intel has invested heavily in AI algorithm research.

Second, beyond research, Intel does a great deal of work on near- and mid-term software stacks, including the unified programming model oneAPI, which lets the AI software stack run well on different hardware architectures such as CPUs, GPUs, FPGAs, and ASICs; various computing libraries for deep learning; optimizations for open-source frameworks such as TensorFlow, PyTorch, and MXNet; and work on inference engines such as OpenVINO.

Dai Jinquan told Xintiao that, to let users seamlessly run models on platforms with different architectures through oneAPI, Intel has done a great deal of work on tools, compilers, and libraries. "We are still very confident in performance and can achieve optimal or better performance on different architectures."

Third, on this basis, Intel is trying to build a convenient, efficient end-to-end platform for users: one that scales out to big data and large clusters, extends transparently across different hardware architectures, and uses machine learning to automate many tasks that used to be done by hand, such as feature engineering, hyperparameter tuning, model selection, and distributed inference. This greatly improves productivity and model accuracy and better supports application-level services.

Going forward, Intel will continue to explore solutions closer to users' final applications, focusing on the more important application scenarios.

Conclusion: Software and hardware collaboration accelerates the efficient implementation of AI

"In our opinion, software and hardware collaboration can truly bring the computing power of hardware or chips to the extreme." Dai Jinquan said.

The core problem many users face is not which hardware or which deep learning framework to use, but rather problems at the application level.

Intel takes underlying hardware innovation as the cornerstone, passes through the intermediate layers of basic software and platform software, and ultimately helps enterprise users solve core problems at the application software layer.

Today, more and more companies are applying big data analytics and AI to production and operations. Solutions like Intel's, which innovate collaboratively across software and hardware, not only help lower the threshold of enterprises' digital transformation but also press the accelerator on making AI applications more efficient.