Apache IoTDB time series database is a top open source project for the field of industrial Internet of Things, independently developed by my country and open sourced to Apache. With the continuous introduction of triggers, UDF and other functions, the application scenarios of IoT

2024/05/02 00:06:34

Apache IoTDB is a time series database for the industrial Internet of Things. It was developed independently in China and open sourced to Apache, where it is now a top-level project. With the continued addition of triggers, UDFs, and other features, IoTDB's application scenarios keep expanding. This article shares the new features of IoTDB, along with technology selection considerations for industrial IoT solutions, in the hope of helping practitioners with technology selection and overall architecture design.

▲ Huang Xiangdong, assistant researcher at the School of Software, Tsinghua University

Guest introduction: Huang Xiangdong is an assistant researcher at the School of Software, Tsinghua University, a recipient of the China Association for Science and Technology Young Elite Scientists Sponsorship Program, a communications committee member of the CCF Database Special Committee, a member of the Open Source Technology Committee of the China Communications Society, a member of the Apache Software Foundation, and V.P. of the Apache IoTDB project. His main research field is big data management technology, with a focus on industrial big data management. He has obtained more than 30 national invention patents and authorizations. He participated in the development of China's new generation meteorological big data platform and was responsible for developing IoTDB, the time series database management system for the industrial Internet of Things. Its open source version became the first Apache top-level project launched by a Chinese university. He has led projects funded by the National Natural Science Foundation and the China Postdoctoral Science Fund. He won the first prize of the 2018 Ministry of Education Technology Invention Award and the first prize of the 2018 Chinese Meteorological Society Science and Technology Progress Award.

The following is the transcript of Mr. Huang Xiangdong’s speech at the DTCC2021 conference:

IoTDB project introduction

Industrial Internet of Things time series data is a digital record of the physical quantities of industrial equipment: time-stamped data that carries rich industrial semantics. IoTDB is a high-performance, lightweight time series data management system developed by Tsinghua University for the industrial Internet of Things. It provides data collection, storage, and analysis functions.

General Electric (GE), a leader in the international industrial Internet, pointed out in 2012: "Making full use of massive time series data to drive industrial innovation, competition and growth is a historic opportunity that big data technology brings to the new industrial revolution."

In the database field, the Seattle Report outlines the development prospects for databases over the next five years. The report specifically mentions that IoT brings new challenges to database writes and queries, including massive sequence storage, complex metadata management, rich query requirements, and edge-cloud collaboration. As a result, time series data has drawn growing attention in both industry and academia.

If you remove the vertical-axis label from the industrial IoT time series data in the figure and replace it with "value", the industrial IoT time series data becomes generic time series data. Time series data is in fact very common: the behavior of people, software, and machines constantly generates it.

Regarding TSDB, Wikipedia explains that a time series database is software used to store time series data and to build indexes based on time (points or intervals). Its main application scenarios fall into three types: APM monitoring, Internet of Things applications, and data analysis. There are currently many kinds of time series data management systems, including ClickHouse, Druid, HBase, OpenTSDB, Parquet on HDFS/S3, PG/MySQL, TimeScaleDB, MatrixDB, InfluxDB, M3DB, TDengine, Apache IoTDB, Skywalking, EMQx, etc. With so many systems available, different scenarios call for different choices.

Apache IoTDB new features

IoTDB follows the open source model of the Apache community and integrates with a range of open source software. The ultimate goal is an open source solution for full life cycle management of time series data that connects every link: collection, storage, processing, analysis, and application.

In the collection stage, there is plenty of open source software, such as Edgent and PLC4X. In the analysis stage, you can use Spark and Hive for big data analysis. In the application stage, you can choose open source programs such as Grafana, Calcite, and Karaf.

Looking at the release history, IoTDB originated at the School of Software, Tsinghua University. We open sourced it in 2017. In 2018, many experienced partners in the Apache community worked together to help improve IoTDB, and it gradually moved from a demo to a product. The first Apache IoTDB version was released in 2019. IoTDB does not release very quickly: roughly one or two versions come out each year, and each one improves writes, queries, and stability. In 2019, IoTDB optimized the sorting of out-of-order data. In 2020, query performance improved greatly and new memory control functions were added. In 2021, the community officially launched the IoTDB 0.12 series and has maintained five minor versions on it, 0.12.0 through 0.12.4.

Feature 1: Data model from background definition to edge device definition

In recent years, traditional industrial enterprises have been upgrading equipment parts more and more frequently. As systems are upgraded, the data collection method changes too, and measurement points are added or removed. The data ends up in a table structure, but when you actually read the data, you find that the columns of that table have to change frequently.

Feature 2: From regular load to complex load

Because a device has so many measurement points, a relational database forces such a table to be split vertically. If a database does not fix the number of columns in a table, it cannot allocate memory, which also affects the system.

In addition, complex industrial equipment is composed of many independent sub-devices or components. From the database's perspective, each measurement point is collected independently, with inconsistent collection frequencies and inconsistent timestamps. A table structure suits queries best, because that is where users have the most expressive power. For writing and management, however, users do not need to see such a table. Managing data in a hierarchical structure is therefore a natural idea for industrial enterprises. In this hierarchy, the measurement points on a device can have different collection frequencies.

In practice, IoTDB has always insisted on using the tree structure in the figure above, which better matches how assets are managed and scheduled.

IoTDB defines several concepts. In industrial IoT scenarios, data is generated by the equipment or devices that directly possess physical quantities. Every physical quantity belongs to such an owner, which is called an entity. For example, a wind turbine, a car, or a bridge is an entity. A collection of entities whose data is physically isolated on disk is called a storage group.

Physical quantities (also called working conditions, fields, measuring points, or variables) are the measurement information recorded by a device, and they can be univariate or multivariate. Univariate physical quantities are collected independently, such as power, voltage, current, support displacement, wind speed, vehicle speed, longitude, and latitude. Multivariate physical quantities are collected simultaneously, such as GPS (longitude, latitude).

From this, we define a time series as entity + physical quantity. It includes univariate sequences (entity + univariate physical quantity) and multivariate sequences (entity + multivariate physical quantity). In IoTDB, what is stored is therefore entity + physical quantity, such as the rotation speed of a particular wind turbine or the GPS of a particular vehicle.
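
As a rough illustration of these concepts, the sketch below models an entity that owns univariate and multivariate physical quantities and names each time series as entity path plus quantity name. The class, the dotted path strings such as root.windfarm.turbine1, and the sample values are hypothetical and are not IoTDB's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """A device that owns physical quantities, e.g. one wind turbine."""

    path: str  # hypothetical entity path
    # Univariate quantities: name -> list of (timestamp, value).
    univariate: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)
    # Multivariate quantities: name -> list of (timestamp, tuple of values).
    multivariate: Dict[str, List[Tuple[int, Tuple[float, ...]]]] = field(
        default_factory=dict
    )

    def series_paths(self) -> List[str]:
        """A time series is identified by entity + physical quantity."""
        names = list(self.univariate) + list(self.multivariate)
        return sorted(f"{self.path}.{name}" for name in names)


# Univariate sequence: the rotation speed of one wind turbine.
turbine = Entity("root.windfarm.turbine1")
turbine.univariate["speed"] = [(1, 12.5), (2, 12.7)]

# Multivariate sequence: the GPS (longitude, latitude) of one vehicle.
car = Entity("root.fleet.car7")
car.multivariate["gps"] = [(1, (116.3, 39.9))]
```

Keeping the two kinds of quantities in separate maps mirrors the distinction in the text: a univariate point carries one value per timestamp, while a multivariate point carries several values under a single timestamp.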

What impact do the different modeling methods have on the system? Each univariate sequence has its own timeline. If the timelines of several univariate sequences are exactly the same, the same timestamps are stored repeatedly, which wastes space. In that case, the data should be stored as a multivariate sequence.
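
The space argument can be made concrete with a small count of stored timestamps. This is a simplified model that ignores encoding and compression; the function names and the sample quantities are assumptions made for illustration.

```python
from typing import Dict, List


def can_share_timeline(timelines: Dict[str, List[int]]) -> bool:
    """True when every sequence was sampled at exactly the same timestamps."""
    all_timelines = list(timelines.values())
    return all(t == all_timelines[0] for t in all_timelines[1:])


def stored_timestamps(timelines: Dict[str, List[int]], multivariate: bool) -> int:
    """Count the timestamps that must be stored under each modeling choice."""
    if multivariate:
        if not can_share_timeline(timelines):
            raise ValueError("timelines differ; cannot store as one multivariate sequence")
        # One shared timeline for all quantities.
        return len(next(iter(timelines.values()), []))
    # Each univariate sequence keeps its own copy of the timeline.
    return sum(len(t) for t in timelines.values())


# Three quantities sampled at the same 1000 timestamps.
shared = list(range(1000))
timelines = {"longitude": shared, "latitude": shared, "altitude": shared}

as_univariate = stored_timestamps(timelines, multivariate=False)
as_multivariate = stored_timestamps(timelines, multivariate=True)
```

With three quantities on an identical timeline, the univariate model stores three times as many timestamps as the multivariate one. When the timelines differ, the multivariate option is rejected, which matches the case where each quantity is collected independently.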

In the future, IoTDB will try to identify whether a scenario suits univariate or multivariate sequences. For now, users need to define this themselves.

Real applications also involve large numbers of entities of the same type and model, each with the same set of physical quantities, such as a batch of cars of the same model or a batch of wind turbines of the same model. We call such a shared set a physical quantity template. Using physical quantity templates can significantly reduce the cost of metadata management.
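
To see why a template helps, consider a deliberately simplified cost model: without a template, every entity registers every quantity; with a template, the quantities are defined once and each entity only references the template. The counting rule and the function names are assumptions for illustration, not a description of how IoTDB stores metadata internally.

```python
def entries_without_template(num_entities: int, num_quantities: int) -> int:
    """Each entity registers its own copy of every physical quantity."""
    return num_entities * num_quantities


def entries_with_template(num_entities: int, num_quantities: int) -> int:
    """Quantities are defined once; each entity holds one template reference."""
    return num_quantities + num_entities


# A batch of 1000 cars of the same model, 50 physical quantities each.
without_template = entries_without_template(1000, 50)
with_template = entries_with_template(1000, 50)
```

Under this model the metadata grows with the product of entities and quantities without a template, but only with their sum when a template is used.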

Different sequence types call for different encoding and compression methods to achieve better compression ratios. PLAIN suits data that changes greatly, irregularly, and unpredictably. RLE works well for data with mostly identical values. TS_2DIFF suits data that changes steadily. GORILLA suits floating point numbers with small changes. DICTIONARY suits TEXT data with low cardinality.
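
Two of these ideas are easy to sketch. The code below shows run-length encoding, and a plain difference encoding in the spirit of difference-based schemes such as TS_2DIFF. It illustrates why repeated or steadily changing data compresses well. It is not IoTDB's actual on-disk format, and the function names are hypothetical.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encoding: collapse repeats into (value, run length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode by accumulating the differences."""
    out: List[int] = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out


# A status flag that rarely changes suits RLE.
status_runs = rle_encode([5, 5, 5, 7, 7])

# Steadily increasing timestamps become small, repetitive differences.
timestamp_deltas = delta_encode([1000, 1010, 1020, 1030])
```

After differencing, the steady sequence turns into a run of identical small numbers, which a later stage can store in very few bits.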

Feature 3: From time series data query to time series data processing

Storing data is a cost for users, while analyzing the value in the data creates benefits for them. IoTDB therefore needs analytical capabilities, especially for analysis in the industrial field. So can processing and analysis tasks be handed over to IoTDB?

For a database, there are several opportunities for data analysis. The first is when the data first arrives and can be processed once. At that point, IoTDB provides a computing model based on sliding windows and single points to implement stateful and stateless computing. To limit computational complexity, IoTDB only allows an analysis task to work on a single sequence or on multiple sequences under one device.
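
The difference between the two computing styles can be sketched with generators over a stream of (timestamp, value) points. The per-point transform keeps no state, while the sliding-window mean has to remember the most recent values. The function names and the window rule (emit once the window is full) are assumptions for illustration, not IoTDB's interface.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Tuple

Point = Tuple[int, float]


def per_point(points: Iterable[Point], fn: Callable[[float], float]) -> Iterator[Point]:
    """Stateless: each output depends only on the current point."""
    for t, v in points:
        yield t, fn(v)


def sliding_mean(points: Iterable[Point], size: int) -> Iterator[Point]:
    """Stateful: each output depends on the last `size` values."""
    window: Deque[float] = deque(maxlen=size)
    for t, v in points:
        window.append(v)
        if len(window) == size:
            yield t, sum(window) / size


stream = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)]
absolute = list(per_point([(1, -2.0), (2, 3.0)], abs))
smoothed = list(sliding_mean(stream, 2))
```

Both functions consume points as they arrive, which fits processing at write time: neither needs the full sequence to be stored first.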

The second opportunity is when the data has been stored in the database but has not yet been queried. During this time, we can analyze and compute on the data in advance. Indexes are representative of this kind of work: their purpose is to let users access the data faster.

The third opportunity is computing at query time, such as taking the average or the absolute value of the original data. Mapped onto a database, query-time computation means user-defined functions (UDFs). For time series data there are two kinds of UDF. One is the UDTF (user-defined time series function), which takes n sequences as input and outputs 1 sequence. The other is the UDAF (user-defined aggregation function), which takes n sequences as input and outputs 1 point.
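
The two input and output shapes can be shown with plain functions. This sketch only mirrors the shapes described here (n sequences to 1 sequence, n sequences to 1 point). It is not IoTDB's UDF programming interface, and the function names are hypothetical.

```python
from typing import List, Tuple

Series = List[Tuple[int, float]]


def udtf_sum(*series: Series) -> Series:
    """UDTF shape: n sequences in, 1 sequence out.

    Sums the values at every timestamp present in all inputs.
    """
    maps = [dict(s) for s in series]
    common = set(maps[0]).intersection(*maps[1:]) if maps else set()
    return [(t, sum(m[t] for m in maps)) for t in sorted(common)]


def udaf_mean(*series: Series) -> float:
    """UDAF shape: n sequences in, 1 point out (here, the overall mean)."""
    values = [v for s in series for _, v in s]
    if not values:
        raise ValueError("no data to aggregate")
    return sum(values) / len(values)


a = [(1, 1.0), (2, 2.0), (3, 3.0)]
b = [(2, 10.0), (3, 20.0), (4, 30.0)]

combined = udtf_sum(a, b)  # a sequence
average = udaf_mean(a)     # a single value
```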

Over the past few years, we have been doing research on the data quality of time series data. With the help of UDFs, these algorithms have been integrated with IoTDB to form IoTDB-Quality, a data quality algorithm library for time series data. That said, this set of algorithms is not tightly bound to IoTDB.

IoTDB supports user-defined downsampling. Suppose the downsampling capability or the functions provided by the system do not meet your needs: for example, you want the average value when the data is dense, and the maximum and minimum values when the data is sparse. The database's built-in downsampling cannot support such rich customization, so it can be implemented with a custom function.
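
A minimal sketch of such a density-aware rule follows. It groups points into fixed time windows, emits the average for dense windows, and emits the minimum and maximum for sparse ones. The window size, the density threshold, and the output tuple layout are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]
Sample = Tuple[int, str, float]  # (window start, kind, value)


def downsample(points: List[Point], window: int, dense_threshold: int) -> List[Sample]:
    """Average dense windows; keep min and max for sparse windows."""
    buckets: Dict[int, List[float]] = {}
    for t, v in points:
        # Align each timestamp to the start of its window.
        buckets.setdefault(t // window * window, []).append(v)

    result: List[Sample] = []
    for start in sorted(buckets):
        values = buckets[start]
        if len(values) >= dense_threshold:
            result.append((start, "avg", sum(values) / len(values)))
        else:
            result.append((start, "min", min(values)))
            result.append((start, "max", max(values)))
    return result


points = [(0, 1.0), (1, 2.0), (2, 3.0), (10, 5.0), (15, 9.0)]
sampled = downsample(points, window=10, dense_threshold=3)
```

The first window holds three points and is averaged. The second holds only two, so its extremes are preserved instead.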

As another example, an important industrial application of time series data is real-time alarming. Once an alarm occurs, a rapid control response is required. In real applications, however, observed values below the specified threshold trigger a large number of alarms. To use alarming in production, you often need to filter out false alarms, that is, anomalies caused by fluctuations in the data.

We use UDFs to solve the problem of false alarms. We divide the handling of such an alarm into several steps: jump clearing, threshold filtering, matching true alarms, anomaly calculation, anomaly alarming, and manual confirmation.
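
The first two steps can be sketched as follows. The talk does not spell out the algorithms, so the rules here are assumptions: a jump is an isolated point that differs sharply from both neighbours while the neighbours agree with each other, and an alarm candidate is any value below the threshold. Function names and parameters are hypothetical.

```python
from typing import List


def clear_jumps(values: List[float], max_jump: float) -> List[float]:
    """Replace isolated spikes with the mean of their two neighbours."""
    cleaned = list(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        is_spike = (
            abs(cur - prev) > max_jump
            and abs(cur - nxt) > max_jump
            and abs(prev - nxt) <= max_jump
        )
        if is_spike:
            cleaned[i] = (prev + nxt) / 2
    return cleaned


def below_threshold(values: List[float], threshold: float) -> List[int]:
    """Return the indexes of values that fall below the alarm threshold."""
    return [i for i, v in enumerate(values) if v < threshold]


readings = [50.0, 51.0, 5.0, 52.0, 20.0, 19.0, 18.0]

raw_alarms = below_threshold(readings, threshold=30.0)
cleaned = clear_jumps(readings, max_jump=10.0)
filtered_alarms = below_threshold(cleaned, threshold=30.0)
```

The single dip to 5.0 triggers an alarm on the raw data but disappears after jump clearing, while the sustained drop at the end still raises alarms. The later steps, such as matching true alarms and manual confirmation, are outside this sketch.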

Closing remarks

Across IoTDB 0.12 and 0.13, we have kept asking ourselves: as data gets used more and more, what new features should IoTDB offer? That is why we have been building UDFs and triggers. Meanwhile, in developing the distributed version, IoTDB will continue to strengthen time partitioning and virtual storage groups. In the future, IoTDB will provide increasingly rich capabilities and better performance.

Source: "ITPUB Blog", link: http://blog.itpub.net/31545813/viewspace-2903547/. If you reprint this article, please indicate the source; otherwise legal responsibility will be pursued.