There are many articles on the Internet about "How to build a data middle platform", and everyone has different opinions.

2025/06/05 08:57:37 technology 1739

There are many articles on the Internet about "How to build a data middle platform", and everyone has different opinions.

  • Some say the data middle platform is a methodology for data construction: follow its design methods and specifications, and you have built one;
  • Others believe that what lies behind the data middle platform is a change in the organizational structure of the data department: previously scattered teams are consolidated into a unified middle platform department, and the platform follows from that;
  • You may also have heard big data vendors say that they sell the product technology that supports building a data middle platform.

So how do you actually build a data middle platform?

The solution to the six core problems lies in the promotion of two major concepts

In fact, as early as 2016 Alibaba proposed two core concepts for building a data middle platform: One Data and One Service. They are now widely recognized as a way to address the problems of digital transformation. In one sentence: all data is processed only once, and data is a service.

One Data

One Data means that all data is processed only once.

For example, in an e-commerce scenario, the data middle platform forms a public data layer across the entire e-commerce business and coordinates with the small data warehouses of individual departments so that data is reused and the same processing is not repeated for each new application scenario.

Alibaba Data Middle Platform Panorama

So how do you ensure that data is processed only once? There are five points:

  • Management by theme domain
  • A defined naming specification
  • Consistent indicators
  • Data model reuse
  • Data completeness

Imagine you are starting to build a data middle platform. The first thing you face is an enterprise with tens of thousands of tables, maintained by dozens of data developers. How do you keep those tables manageable?

  • First, divide the tables into theme domains. The tens of thousands of tables can be assigned to different theme domains; in an e-commerce business, for example, products, transactions, traffic, users, after-sales, distribution, and supply chain can each be a theme domain. A well-chosen division is relatively stable and covers as many tables as possible. (You can think of it as a data directory.)

Data directory: convenient for table management and directory-based data retrieval

  • In addition, table naming must be standardized and unified. A table name should ideally carry the table's theme domain, business process, layer, and partition information.
  • Next, to make models reusable, the data storage of the data middle platform should use a layered design. Common layers include: ODS (raw data layer), DWD (detailed data layer), DWS (light summary data layer), and ADS/DM (application data layer / data mart layer).
  • Finally, the data in the data middle platform should cover as many business processes as possible, and each layer should be as complete as possible, so that data users can work from aggregated data wherever they can.

Unified Data Specification
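
A naming specification like the one described above can be checked mechanically. The following is a minimal sketch, not a standard: the layer names come from the article, while the pattern `<layer>_<theme domain>_<business process>_<partition suffix>`, the theme domain abbreviations, and the `di`/`df` suffixes are assumptions chosen for illustration.

```python
import re
from typing import NamedTuple

# Layer names are taken from the article; everything else below is a
# hypothetical convention used only to illustrate the idea.
LAYERS = ("ods", "dwd", "dws", "ads", "dm")
THEME_DOMAINS = ("product", "trade", "traffic", "user", "aftersale", "delivery", "supply")
PARTITION_SUFFIXES = {"di": "daily incremental", "df": "daily full"}

# Assumed pattern: <layer>_<theme domain>_<business process>_<partition suffix>
TABLE_NAME = re.compile(
    r"^(?P<layer>{})_(?P<domain>{})_(?P<process>[a-z0-9_]+)_(?P<partition>{})$".format(
        "|".join(LAYERS), "|".join(THEME_DOMAINS), "|".join(PARTITION_SUFFIXES)
    )
)


class TableName(NamedTuple):
    layer: str
    domain: str
    process: str
    partition: str


def parse_table_name(name: str) -> TableName:
    """Split a table name into its parts, or raise if it breaks the convention."""
    match = TABLE_NAME.match(name)
    if match is None:
        raise ValueError(f"table name violates the naming convention: {name!r}")
    return TableName(**match.groupdict())


if __name__ == "__main__":
    print(parse_table_name("dwd_trade_order_detail_di"))
    # TableName(layer='dwd', domain='trade', process='order_detail', partition='di')
```

Because every conforming name yields its layer and theme domain, the same parser could also be used to group tables into the data directory mentioned earlier.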

In summary, the goal of the One Data system is to build a unified data specification standard so that data becomes an asset, not a cost.

One Service

One Service means data as a service: the data in the data middle platform should be accessed through API interfaces.

So why must data be accessed through an API? What goes wrong if data tables are handed directly to users without one?

If you are a data application developer building a data product, you first have to export the data to different query engines: MySQL for small data volumes, HBase for large data volumes, Greenplum for multi-dimensional analysis, and Redis for high real-time requirements.

As a result, application developers must write a custom access interface for each query engine.

Use the data API to greatly reduce the workload of data developers

If you are a data developer and a task fails or cannot finish on time, you want to know which downstream applications or reports the affected table feeds. But table-to-table lineage alone does not reach the application layer, so you cannot tell which applications access the final table.

Likewise, when you want to take a table offline, you cannot do so because you do not know who accesses it. The result is the familiar dilemma of “easy to go online, hard to go offline”.

On the one hand, the API hides the underlying data storage from application developers, who query data through one unified, standard interface, which speeds up data access.

On the other hand, for data developers, the API improves the management of data applications by establishing a link between tables and applications. When a problem arises and data needs to be traced, the link leads directly to the specific table and field.

Use the data API to extend data lineage to the application layer, which makes problems easier to fix
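
To make the idea concrete, here is a minimal sketch of data lineage (the "blood relationship" between tables) extended through APIs to applications. The class, the `table:`/`api:`/`app:` node prefixes, and the table and application names are all hypothetical; a real platform would derive these edges from task metadata and API access records.

```python
from collections import defaultdict


class LineageGraph:
    """Directed graph whose edges point from an upstream node to its consumers."""

    def __init__(self):
        self._downstream = defaultdict(set)

    def add_edge(self, upstream, downstream):
        self._downstream[upstream].add(downstream)

    def impacted(self, node):
        """Return every node reachable downstream of `node`."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            # .get() avoids creating empty entries for leaf nodes
            for child in self._downstream.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def impacted_apps(self, node):
        """Applications affected when `node` is late or broken."""
        return sorted(n for n in self.impacted(node) if n.startswith("app:"))

    def can_retire(self, table):
        """A table is safe to take offline only if no application depends on it."""
        return not self.impacted_apps(table)


graph = LineageGraph()
graph.add_edge("table:dwd_trade_order_di", "table:dws_trade_user_1d")
graph.add_edge("table:dws_trade_user_1d", "api:user_gmv")
graph.add_edge("api:user_gmv", "app:sales_dashboard")

print(graph.impacted_apps("table:dwd_trade_order_di"))  # ['app:sales_dashboard']
print(graph.can_retire("table:ods_unused"))             # True
```

The table-to-API and API-to-application edges are exactly what table-level lineage lacks; once they exist, both impact analysis and the decision to take a table offline become simple graph queries.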

So how do you implement a data service? There are four points:

  • Hide heterogeneous data sources
  • Control access through a data gateway
  • Provide a user-oriented logical model
  • Guarantee performance and stability

Hide heterogeneous data sources: Data services must support a wide variety of query engines to meet the query needs of different scenarios. Common ones include MySQL, HBase, Greenplum, Redis, and Elasticsearch.
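
One way to picture this is a thin routing layer that exposes a single query call and dispatches to whichever engine stores a given model. The sketch below is an assumption about how such a layer might look, not a description of any particular product; `QueryEngine`, `InMemoryEngine`, and `DataService` are hypothetical names, and the in-memory class stands in for real MySQL, HBase, or Redis clients.

```python
from typing import Dict, Optional, Protocol


class QueryEngine(Protocol):
    """Interface every engine adapter is assumed to implement."""

    def fetch(self, model: str, key: str) -> Optional[dict]: ...


class InMemoryEngine:
    """Stand-in for a real engine client (MySQL, HBase, Redis, ...)."""

    def __init__(self, name: str, data: Dict[str, Dict[str, dict]]):
        self.name = name
        self._data = data  # {model: {key: row}}

    def fetch(self, model: str, key: str) -> Optional[dict]:
        return self._data.get(model, {}).get(key)


class DataService:
    """Single entry point: callers name a model, never an engine."""

    def __init__(self):
        self._routes: Dict[str, QueryEngine] = {}

    def register(self, model: str, engine: QueryEngine) -> None:
        self._routes[model] = engine

    def query(self, model: str, key: str) -> Optional[dict]:
        engine = self._routes.get(model)
        if engine is None:
            raise KeyError(f"no engine registered for model {model!r}")
        return engine.fetch(model, key)


# Two pretend engines holding different models
mysql_like = InMemoryEngine("mysql", {"user_profile": {"u1": {"city": "Hangzhou"}}})
redis_like = InMemoryEngine("redis", {"realtime_pv": {"home": {"pv": 1024}}})

service = DataService()
service.register("user_profile", mysql_like)
service.register("realtime_pv", redis_like)

print(service.query("user_profile", "u1"))  # {'city': 'Hangzhou'}
print(service.query("realtime_pv", "home"))  # {'pv': 1024}
```

Application code calls `service.query` in both cases; moving a model from one engine to another only changes the `register` call.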

Data gateway: The gateway implements a series of control capabilities, including permissions, monitoring, flow control, and logging. It must track in real time which application page accesses which model. If a model has not been accessed for a long time, it should be taken offline.

Every application that uses data should be authenticated and granted interface permissions through an accesskey and a secretkey. In addition, access logs make troubleshooting faster when access problems occur.
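
The article does not say how the accesskey and secretkey are used. A common scheme, assumed here purely for illustration, is for the caller to sign each request with an HMAC of the request details using its secretkey, and for the gateway to recompute and compare the signature before checking permissions. The credential store, permission table, and message format below are hypothetical.

```python
import hashlib
import hmac
import logging

logger = logging.getLogger("data_gateway")

# Hypothetical stores; a real gateway would load these from a secure backend.
SECRETS = {"ak-report-app": "sk-demo-secret"}   # accesskey -> secretkey
PERMISSIONS = {"ak-report-app": {"user_gmv"}}   # accesskey -> allowed APIs


def sign(secret_key: str, api_name: str, timestamp: str) -> str:
    """Signature the caller attaches to a request (assumed message format)."""
    message = f"{api_name}\n{timestamp}".encode()
    return hmac.new(secret_key.encode(), message, hashlib.sha256).hexdigest()


def authorize(access_key: str, api_name: str, timestamp: str, signature: str) -> bool:
    """Authenticate the caller, check its permission, and log the decision."""
    secret = SECRETS.get(access_key)
    if secret is None:
        logger.warning("unknown accesskey %s", access_key)
        return False
    expected = sign(secret, api_name, timestamp)
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(expected, signature):
        logger.warning("bad signature from %s for %s", access_key, api_name)
        return False
    allowed = api_name in PERMISSIONS.get(access_key, set())
    logger.info("access_key=%s api=%s allowed=%s", access_key, api_name, allowed)
    return allowed
```

A production gateway would also reject stale timestamps to prevent replayed requests; that check is omitted here to keep the sketch short. The log lines double as the access record from which unused models can later be identified.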

Logical model: From the user's perspective, hide the design of the underlying models and expose logical models instead. A logical model shields application developers from the physical implementation of the data and combines data of the same granularity into a single model, which simplifies data access.
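
As a rough illustration, a logical model can be represented as a mapping from user-facing fields to the physical tables and columns that hold them, provided those tables share the same granularity (here, one row per user). The class and the table and field names are hypothetical, and plain dictionaries stand in for real storage.

```python
from typing import Dict, List, Tuple

# Two pretend physical tables, both keyed by user_id (same granularity)
PHYSICAL_TABLES = {
    "dws_user_trade_1d": {"u1": {"user_id": "u1", "order_cnt": 3, "gmv": 120.0}},
    "dws_user_traffic_1d": {"u1": {"user_id": "u1", "pv": 42}},
}


class LogicalModel:
    """Exposes user-facing fields and resolves each to its physical location."""

    def __init__(self, name: str, field_map: Dict[str, Tuple[str, str]], tables: dict):
        self.name = name
        self.field_map = field_map  # logical field -> (physical table, column)
        self.tables = tables

    def get(self, key: str, fields: List[str]) -> dict:
        """Fetch the requested logical fields for one key, merging across tables."""
        row = {}
        for field in fields:
            table, column = self.field_map[field]
            physical_row = self.tables.get(table, {}).get(key)
            row[field] = None if physical_row is None else physical_row.get(column)
        return row


user_summary = LogicalModel(
    "user_summary",
    {
        "orders": ("dws_user_trade_1d", "order_cnt"),
        "gmv": ("dws_user_trade_1d", "gmv"),
        "page_views": ("dws_user_traffic_1d", "pv"),
    },
    PHYSICAL_TABLES,
)

print(user_summary.get("u1", ["gmv", "page_views"]))  # {'gmv': 120.0, 'page_views': 42}
```

The caller asks for `gmv` and `page_views` without knowing they live in two different tables, so the physical layout can change without touching application code.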

Performance and stability: Because the data service sits on the user's access path, it must meet high requirements for availability and performance. The data service must be stateless so that it can be scaled horizontally.

The goal of the One Service system is to string scattered pearls (data) into a necklace (a chain), while improving data sharing so that data can be used well and smoothly!

Space is limited, so the "underlying technology of the data middle platform" and the current industry trend of the "lightweight data middle platform" will be covered in later articles. If you want a complete set of learning materials about the data middle platform in advance, you can like and follow, then send a private message to Maicong.

Maicong Software is a world-leading DaaS vendor and a leader in lightweight data middle platforms. It has been selected by more than 30 Fortune 500 groups and has helped nearly 400 companies advance their digital transformation within two years. Its core product, the Maicong DaaS platform, includes two major modules, unified data management and unified data service, with functions for data integration, data development, data quality, and data service. Everyone is welcome to discuss enterprise digitalization issues with us.