As big data technology continues to evolve, data management tools have developed rapidly and related concepts have sprung up like mushrooms after rain. From the early Decision Support System (DSS) to Business Intelligence (BI), the data warehouse, the data lake, and the data middle platform, these concepts are easily confused. This article works through the terms systematically to give readers a comprehensive understanding of the concepts surrounding data platforms.
1. BI
Business Intelligence (BI) is an information system built to supply data for decision-making and operational analysis. It combines technologies such as data warehousing, online analytical processing (OLAP), and data mining with application systems such as customer relationship management (CRM), and applies them to the actual course of enterprise activities, ultimately serving management decisions.
BI uses information technology to integrate the data scattered inside and outside the enterprise and convert it into knowledge, then performs decision-oriented analysis and calculation around specific subject areas. Users find the answers they need for business problems through reports, charts, and multi-dimensional analysis. The results are presented to decision makers to support strategic decisions and define organizational performance, or are integrated into an intelligent knowledge base and pushed to customers automatically.
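The multi-dimensional analysis mentioned above can be illustrated with a minimal sketch. The sales records, the dimension names, and the `roll_up` helper are all hypothetical and exist only for illustration; a real BI tool would run this kind of aggregation inside an OLAP engine or a database rather than in application code.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, revenue).
SALES = [
    ("north", "widget", "Q1", 100.0),
    ("north", "gadget", "Q1", 50.0),
    ("south", "widget", "Q1", 70.0),
    ("north", "widget", "Q2", 120.0),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, by):
    """Sum revenue grouped by the chosen dimensions (an OLAP-style roll-up)."""
    indexes = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[i] for i in indexes)
        totals[key] += fact[-1]  # revenue is the last field of each fact
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SALES, ["region"]))
    print(roll_up(SALES, ["product", "quarter"]))
```

The same facts answer different questions depending on which dimensions are chosen, which is what lets a report be sliced by region, by product, or by period without changing the underlying data.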
2. Data Warehouse
A data warehouse (Data Warehouse), also known as an enterprise data warehouse, is a subject-oriented, integrated, relatively stable data storage system that reflects historical changes. It aggregates structured data from different sources for comparison and analysis in the business intelligence field. The data warehouse is a repository that holds many kinds of data and is highly modeled.
The role of the data warehouse system is to integrate data across business lines and systems, providing unified data support for management analysis and business decision-making. A data warehouse helps a company turn its operating data into high-value, accessible information or knowledge, and deliver the right information to the right people, in the right way, at the right time.
3. Data Lake
The data lake (Data Lake) is a data storage concept proposed by James Dixon, CTO of Pentaho: a method of storing data in its natural format in a system or repository. The data lake serves as a centralized repository in which structured and unstructured data of any size can be stored. Data can be stored in a data lake without first being structured, so that different types of analysis can be run on it later.
A data lake helps enterprises achieve capabilities such as centralized data management. Combined with data science, machine learning, and artificial intelligence technologies, it helps enterprises build better data operation models and provides capabilities such as predictive analysis and recommendation models, which in turn support the continued growth of the enterprise.
4. Data Middle Platform
The concept of the "data middle platform" was first brought to China from Finland by Alibaba in 2014. The term is still in a period of definitional confusion, and different people understand it differently. Some data experts interpret the data middle platform as a sustainable mechanism for putting the enterprise's data to use, and as both a strategic choice and an organizational form. On this view, it is a mechanism that continuously turns data into assets and serves the business, built on the company's own business model and organizational structure and supported by tangible products and an implementation methodology.
In its strategic interpretation of the data middle platform, Alibaba Cloud proposed that "the middle platform contains advanced technology (technical competitiveness), but it is not just technology; more importantly, the organization can rely on advanced technology and use its core resources (resource competitiveness) to build its competitiveness, voice, and ecological centripetal force (ecological competitiveness). The middle platform is a kind of capability (technology, enablement, empowerment, innovation, and ecology)."
The data middle platform collects, manages, models, analyzes, and applies multi-source heterogeneous data from inside and outside the enterprise. Internally, it improves business value through better data management; externally, it releases business value through data cooperation. This makes it the center of enterprise data asset management. Once the data middle platform is established, it exposes data API services that provide the enterprise and its customers with efficient data services.
5. Data Warehouse VS Data Lake
In terms of storage, the data lake can hold all types of data: structured, semi-structured, and unstructured. The data type depends on the original format in the source system. Data warehouses mainly hold historical, structured data, usually extracted from transaction systems.
The data lake is suited to in-depth analysis. It has enough computing power to process and analyze all types of data, and supports both data mining and data analysis. Data warehouses mainly process structured data, converting it into multidimensional data or reports to meet subsequent advanced reporting and data analysis needs.
Compared with data warehouses, data lakes are less structured, more flexible, and more agile. Data warehouses are characterized by high performance and repeatability.
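The structural difference described above is often summarized as schema-on-write (warehouse) versus schema-on-read (lake). Those two terms, and every name in the sketch below, are assumptions added for illustration and do not come from the original article. The sketch uses plain Python lists as stand-ins for the two stores.

```python
import json

# Warehouse-style table: a fixed schema is enforced when data is written.
WAREHOUSE_SCHEMA = {"order_id": str, "amount": float}


def write_to_warehouse(table, record):
    """Reject any record that does not match the schema exactly."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    table.append(record)


def write_to_lake(lake, payload):
    """Lake-style store: keep the payload exactly as it arrived."""
    lake.append(payload)


def read_amounts_from_lake(lake):
    """Apply structure only at read time, skipping items that are not JSON orders."""
    amounts = []
    for payload in lake:
        try:
            doc = json.loads(payload)
        except json.JSONDecodeError:
            continue  # free text or a log line; it still stays in the lake
        if isinstance(doc, dict) and "amount" in doc:
            amounts.append(float(doc["amount"]))
    return amounts
```

The warehouse refuses anything that does not fit its model, which is where its repeatability comes from. The lake accepts everything and defers interpretation to the reader, which is where its flexibility comes from.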
6. Data Warehouse VS Data Middle Platform
The data warehouse starts from a supporting technical system that emphasizes data quality and metadata management. The data middle platform starts not from data but from the business, and pays more attention to what kind of data services a given business problem requires.
The two also differ clearly in how data is processed. Data preprocessing is shifting from the traditional ETL structure to an ELT structure. The integrated processing architecture of the traditional data warehouse is ETL (extract, transform, load), an important part of building a data warehouse: users extract the required data from the data source, clean and transform it, and then load it into the warehouse. The architecture used in the big data context is ELT (extract, load, transform): the raw data is loaded first, and the required data is then pulled from it at any time and transformed for modeling and analysis according to the needs of the upper-layer applications.
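The difference in ordering between ETL and ELT can be sketched with Python's built-in `sqlite3` module. The order records, table names, and cleaning rules below are hypothetical, and an in-memory SQLite database stands in for both the warehouse and the big data store; this is a minimal illustration, not how a production pipeline would be built.

```python
import sqlite3

# Hypothetical raw order records, as they might arrive from a source system.
RAW_ORDERS = [
    ("A-1", " alice ", "19.90"),
    ("A-2", "BOB", "5.00"),
    ("A-3", "alice", "not-a-number"),  # a dirty row
]


def clean(row):
    """Transform step: normalise the customer name and parse the amount."""
    order_id, customer, amount = row
    try:
        value = float(amount)
    except ValueError:
        return None  # reject rows whose amount cannot be parsed
    return (order_id, customer.strip().lower(), value)


def run_etl(conn):
    """ETL: transform in application code, then load only the clean rows."""
    conn.execute("CREATE TABLE dw_orders (order_id TEXT, customer TEXT, amount REAL)")
    cleaned = [c for c in map(clean, RAW_ORDERS) if c is not None]
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows untouched, transform later inside the store."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    # The transformation is a view that is evaluated only when it is queried.
    conn.execute(
        """
        CREATE VIEW orders_clean AS
        SELECT order_id,
               lower(trim(customer)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'
        """
    )


def totals(conn, table):
    """Revenue per customer from either the ETL table or the ELT view."""
    query = (
        f"SELECT customer, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()
```

Both paths yield the same totals here, but only the ELT path still holds the rejected row in `raw_orders`, so a later application with different cleaning rules can reprocess the original data.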
7. Data Warehouse VS BI
Business Intelligence (BI) is a broader concept than the data warehouse. BI can be said to be built on the data warehouse: it obtains commercial value from the warehouse's data through data mining. In this analogy, the data warehouse is the gold mine, data mining is the alchemy, and the business report is the gold. The data warehouse (DW) is also like the foundation of the BI house: only once the DW foundation is in place can the data be analyzed, used, and finally turned into value.
Text source: World of Eyes (WeChat Official Account)