The advent of the big data era has allowed governments and enterprises to see the value of data assets, quickly begin to explore application scenarios and business models, and build technical platforms. This is understandable. However, if data governance is forgotten in the big d

2025/04/01 03:37:45 technology 1960

The advent of the big data era has allowed governments and enterprises to see the value of data assets, quickly begin to explore application scenarios and business models, and build technical platforms. This is understandable. However, if data governance is forgotten in the big d - DayDayNews

The arrival of the big data era has shown governments and enterprises the value of data assets, and they have quickly begun exploring application scenarios and business models and building technical platforms. This is understandable. However, if data governance is left out of the big data puzzle, no amount of business and technology investment will pay off, because, as the classic saying goes, "garbage in, garbage out": data quality is not guaranteed. Data governance is a necessary means of ensuring data quality.

Data governance sounds like a lofty topic, but it is actually very down-to-earth. Put another way, it has to reach the top of the organization and also stand firmly on the ground to achieve practical results. "Reaching the top" means that, like informatization, data governance is a top-level project: without promotion from senior management and coordination between business departments, and between business and technology, it cannot be implemented. "Standing on the ground" means that IT staff generally have a deep understanding of data problems and are the first to realize the importance of data governance, and that data governance is ultimately implemented at the IT level.

1. Related concepts of data governance

1.1 Data classification

Back to the topic, starting with the basic concepts. Since we are talking about data, we must first look at how data is classified. In fact, the author is a little wary of the word "classification", because every person and every role has a different perspective, and each one makes sense.

The classification described here is the one commonly used for data governance in enterprise informatization. There are other schemes, and discussion is welcome. We usually divide data into master data, transaction data, reference data, metadata, and statistical analysis data (indicators), as the figure below shows:

[Figure: classification of data]


Why discuss data classification? Because the focus, methods, and results of governance differ for each type of data, and each type must be treated differently. My personal understanding is as follows:

Master data is about "people" and "objects". Master data management (MDM) is a specialized topic within data governance. Its main purpose is to establish a unified view of key business entities (such as employees, customers, products, and suppliers), so that the same person or object in the real world is uniquely identified in the data world rather than appearing as different people or objects in different systems and business lines. Master data management has already been practiced extensively in enterprises across industries. Due to time constraints it is not covered separately today; its core management ideas are consistent with the data governance methods discussed later.
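As a rough illustration of the unified view described above, the following Python sketch groups records from different systems that refer to the same real-world entity. The system names, record fields, and the use of a tax ID as the matching key are all assumptions made for the example; real MDM matching is usually far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    system: str    # hypothetical source system name, e.g. "CRM"
    local_id: str  # identifier used inside that system
    name: str
    tax_id: str    # assumed natural key shared across systems


def normalize_key(record: SourceRecord) -> str:
    """Build a match key: the tax ID without dashes or spaces, upper-cased."""
    return record.tax_id.replace("-", "").replace(" ", "").upper()


def build_master_index(records):
    """Group source records that share the same normalized key."""
    master = {}
    for rec in records:
        master.setdefault(normalize_key(rec), []).append((rec.system, rec.local_id))
    return master


records = [
    SourceRecord("CRM", "C-001", "Acme Ltd.", "91-1100 00AB"),
    SourceRecord("ERP", "10045", "ACME LIMITED", "911100 00ab"),
    SourceRecord("CRM", "C-002", "Beta Co.", "92220000CD"),
]
index = build_master_index(records)
for key, members in index.items():
    print(key, members)
```

The two "Acme" records carry different names and local IDs, yet they end up under one master key, which is the sense in which the same customer is identified only once in the data world.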

Transaction data is about "events". Transaction data does not form a separate area of data governance. Because it is the basis of BI analysis, it is often the focus of data quality management.

Reference data is finer-grained: it is a standardized description of certain attributes of "people", "objects", and "events". Reference data is generally managed together with master data, or together with BI data quality management, because indicator dimensions and dimension values directly affect the quality of BI data.
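To make the idea of a standardized attribute description concrete, here is a small Python sketch that translates system-specific values into one standard code set. The attribute, the codes, and the system names are hypothetical and chosen only for illustration.

```python
# Hypothetical standard code set for a "gender" attribute.
STANDARD_GENDER = {"1": "male", "2": "female", "9": "unspecified"}

# Hypothetical mappings from each system's local values to the standard codes.
LOCAL_TO_STANDARD = {
    "HR": {"M": "1", "F": "2"},
    "CRM": {"male": "1", "female": "2", "unknown": "9"},
}


def to_standard(system: str, local_value: str) -> str:
    """Translate a local value to the standard code; raise if it is unmapped."""
    try:
        return LOCAL_TO_STANDARD[system][local_value]
    except KeyError:
        raise ValueError(
            f"unmapped reference value {local_value!r} in system {system!r}"
        ) from None


print(to_standard("HR", "M"), to_standard("CRM", "male"))
```

Raising an error on unmapped values, instead of passing them through, is one possible design choice: it surfaces new local codes before they reach BI dimensions.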

Metadata is an all-encompassing concept. Its essence is to describe data, so any data has metadata. In data governance, metadata mostly refers to metadata within the scope of BI and data warehouses (there is an international specification, the Common Warehouse Metamodel, CWM), but there is also metadata for information resource management (such as Dublin Core), geographic information metadata, meteorological metadata, and so on. Because the concept is so broad, practitioners have extremely high expectations of it and are greatly disappointed after putting it into practice.

On metadata: I worked on product design and solution planning for metadata management for about four years, but now I rarely say "metadata" and instead say "data definition". Whenever I talk about data I talk about its definition, but I do not manage definitions as a special type of data. Metadata management carried out as a separate activity within data governance has little effect.

There are two main reasons:

  • Data production is disconnected from data management: metadata management is mostly about collecting and displaying metadata after the data has been produced, and plays a very small role in controlling data production.
  • Problems with the tools themselves: although many tools claim to support the CWM specification, automatic metadata acquisition has always been technically difficult. Stored procedures and custom scripts in particular are hard to parse automatically, so the detailed data processing flow cannot be displayed accurately and completely.

Statistical analysis data (indicators) needs little explanation. The main function of BI systems today is to calculate and display indicators and reports. Indicators are often the focus of data governance: data flow analysis of indicators, and volatility and balance monitoring of indicator values, are almost essential data governance applications in every enterprise.
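Volatility and balance monitoring can be sketched in a few lines of Python. The interpretation used here is an assumption: volatility is taken as the period-over-period relative change of an indicator, and balance as the requirement that a total equals the sum of its parts. The threshold and tolerance values are illustrative only.

```python
def volatility_alerts(series, threshold=0.3):
    """Return (index, change) for periods whose relative change
    from the previous period exceeds the threshold."""
    alerts = []
    for i in range(1, len(series)):
        prev, cur = series[i - 1], series[i]
        if prev == 0:
            continue  # relative change is undefined when the previous value is 0
        change = (cur - prev) / abs(prev)
        if abs(change) > threshold:
            alerts.append((i, round(change, 4)))
    return alerts


def is_balanced(total, parts, tolerance=0.01):
    """Check that a reported total matches the sum of its component values."""
    return abs(total - sum(parts)) <= tolerance


monthly_revenue = [100, 105, 160, 158]       # made-up indicator values
print(volatility_alerts(monthly_revenue))    # the jump from 105 to 160 is flagged
print(is_balanced(300.0, [100.0, 120.0, 80.0]))
```

In practice the threshold would depend on the indicator's normal seasonality, so a fixed 30% is only a placeholder.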

1.2 Data governance

Having covered data classification, let us turn to "what is data governance". The English term is Data Governance. Definitions given by different software vendors and consulting firms differ, but in essence they are similar.

Here we quote the definition given in the DAMA Guide to the Data Management Body of Knowledge: data governance is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets. The data governance function guides how the other data management functions are performed. This may be a bit abstract, so a picture helps. The following figure illustrates the relationship between data governance and the other data management functions:

[Figure: relationship between data governance and the other data management functions]

As can be seen, data governance runs through the entire data management process and focuses on high-level topics such as data strategy, organization, and policies. By formulating and implementing strategy, organization, and policies, it integrates and coordinates the other data management functions, so that an enterprise's data work becomes an organic whole rather than a set of separate efforts.

As for the Chinese translation of Data Governance, two are most common in China: "data governance" and "data control". Domestic customers seem to prefer "data control" because the term sounds forceful and conveys authority. From practical experience, the author believes both governance and control are indispensable, with governance first and control second. Governance targets existing data: it is the process of moving from chaos to order and establishing rules and regulations. Control targets incremental data: it enforces the constraint that the rules must be strictly followed and never overstepped.

Why do data governance? The following survey result from the International Data Quality Association is provided for reference.

[Figure: survey result from the International Data Quality Association]

In theory, data governance has three main purposes: ensuring data availability, data quality, and data security. In practice, both in China and abroad, the main purpose of data governance is data quality. Data security usually has dedicated teams and management measures and is less often handled within data governance. The discussion below follows this convention and focuses on the goal of data quality.

The discussion of concepts ends here for now; discussing methods and practice will in turn deepen our understanding of the concepts.

2. Methods of data governance

The methods section covers three questions: Who is responsible for data governance? What are the objects of governance or control? What are the technical tools?

2.1 Organizational Structure

First, who is responsible for data governance, that is, the organizational structure. Here is a figure first.

[Figure: organizational structure for data governance]

In theory and in foreign practice, large enterprises establish an enterprise-level data governance committee in which leaders of the business departments and the IT department participate, so that business departments can discuss and communicate more fully with one another and with technology, and reach consensus on the macro-level data strategy and policies. Below the enterprise level there can also be department-level and project-level committees responsible for data governance in a particular area. At the working level, each business domain should have corresponding data management specialists (data stewards).

Steward literally means butler, but translating it that way in Chinese does not sound serious enough, so "specialist" is used instead. Steward is the counterpart of Owner: the steward does not own the assets but manages them on the owner's behalf. From this comes the term stewardship, which implies a conscientious, self-disciplined spirit of custodianship, which is rare! The data governance committee and the data stewards formulate a series of data-related standards and policies, which are implemented by the Data Management Service Organization (DMSO). As the figure shows, the DMSO is in effect an IT construction team responsible for building technical platforms such as data warehouses and data integration.

That is the theory and the situation abroad. The situation in China is just the opposite: the DMSO is the main force, because people generally "value functions over data, and technology over management". Most companies lack the management roles of the committees shown on the left of the figure. In the author's experience, large domestic banks are relatively advanced in this respect, with enterprise-level data governance committees or full-time departments driving data governance. The energy industry also has relatively high exposure to and acceptance of data governance and has carried out many data governance projects, especially in master data management.

Telecom operators pay more attention to technical means, and their data governance policies and mechanisms still need to be built and improved. Overall, few enterprises in China have established an enterprise-level data governance committee. More often, data governance is promoted by the "enterprise informatization leading group", with the information department responsible for implementation. In addition, although some enterprises have a high level of informatization, IT construction is not centrally managed by the information department, which poses great challenges to implementing data governance and makes cross-department and cross-system collaboration extremely difficult.

2.2 Governance/control object

This part mainly summarizes the author's personal practical experience and may differ from some foreign theories. The author's summary is "content control" and "process control". The word "control" is used here to convey a degree of management authority.

2.2.1 Content control

First, content control. Data takes different forms in information systems, and each form must be managed well in order to manage the final data quality. Here is a figure first:

[Figure: forms of data covered by content control]

From macro to micro, data takes the form of data architecture, data standards, and data quality standards.

Data architecture includes data models (conceptual and logical models) and data flow relationships. It is generally discussed at the enterprise level and the system level, and mainly involves planning and designing the classification, distribution, and flow of enterprise data, to ensure that new systems and applications are consistent and integrated with existing systems and to avoid creating information silos or causing duplicated and unnecessary data integration and conversion.

Data standards include standards in different forms, such as data items, reference data, and indicators. For example, "customer type" is a data item. It should have a unified business meaning (what are the rules for classifying customers as large or general customers?), a defined length for its value, a defined set of valid values (such as 01, 02, 03), and so on. International standards such as ISO/IEC 11179 can be consulted here. Many domestic industries have also formulated industry data standards, such as e-government data elements and financial industry statistical data elements. The common question is what happens after a standard is defined: has it really been implemented in the IT systems?
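The "customer type" example can be written down as a small machine-checkable definition. The Python sketch below is only one possible representation: the field names of the standard, the definition wording, and the two-character length are assumptions, while the valid values 01, 02, 03 come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItemStandard:
    name: str
    definition: str          # agreed business meaning
    length: int              # required length of the value
    valid_values: frozenset  # allowed codes


CUSTOMER_TYPE = DataItemStandard(
    name="customer_type",
    definition="Classification of a customer, e.g. large or general (hypothetical wording)",
    length=2,
    valid_values=frozenset({"01", "02", "03"}),
)


def check_value(standard: DataItemStandard, value: str) -> list:
    """Return a list of violations; an empty list means the value conforms."""
    problems = []
    if len(value) != standard.length:
        problems.append(f"length {len(value)} != {standard.length}")
    if value not in standard.valid_values:
        problems.append(f"{value!r} is not a valid value")
    return problems


print(check_value(CUSTOMER_TYPE, "02"))  # conforms
print(check_value(CUSTOMER_TYPE, "4"))   # wrong length and not a valid code
```

Running such a check against the values actually stored in a system is one simple way to answer the question of whether a standard has really been implemented.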

Data quality standards include data quality rules and audit models (that is, combinations of rules). Data quality rules generally cover timeliness, accuracy, completeness, consistency, uniqueness, and so on. There is much more that could be said; some experts have compiled 12 data quality dimensions, both qualitative and quantitative.
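A minimal Python sketch of rules and an audit model follows. It implements only two of the dimensions named above (completeness and uniqueness) and treats an audit model as a list of rules with minimum scores. The scoring formulas, field names, thresholds, and sample rows are assumptions for illustration.

```python
def completeness(rows, field):
    """Share of rows whose field is present and non-empty."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among the non-empty values of a field."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def run_audit(rows, audit_model):
    """Apply every (name, rule, field, minimum) entry; return {name: (score, passed)}."""
    report = {}
    for name, rule, field, minimum in audit_model:
        score = rule(rows, field)
        report[name] = (score, score >= minimum)
    return report


rows = [
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": ""},
    {"id": "2", "email": "c@example.com"},
    {"id": "4", "email": "d@example.com"},
]
AUDIT_MODEL = [
    ("id completeness", completeness, "id", 1.0),
    ("email completeness", completeness, "email", 0.9),
    ("id uniqueness", uniqueness, "id", 1.0),
]
report = run_audit(rows, AUDIT_MODEL)
for name, (score, passed) in report.items():
    print(f"{name}: {score:.2f} {'PASS' if passed else 'FAIL'}")
```

Keeping rules as plain functions and the audit model as data means the same rule can be reused across tables with different thresholds.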

The IT department should take the lead in formulating and regularly updating the enterprise-level data architecture, data standards, and data quality standards as guiding constraints for new systems and applications. Note that when setting standards, the IT department must avoid working behind closed doors and must ensure that business departments are fully involved.

To give an example: as a technical person, the author took part in data architecture planning and had to design the data flow relationships. From a purely technical perspective, any choice of where the data flows from and to seemed reasonable and could be supported by some tool, so there appeared to be no basis for making a decision. In fact, this is where business participation is needed, because the division of business functions, the business processes, and the functional boundaries between business departments directly determine the source and destination of the data. The IT department's role is mainly to work out the concrete implementation plan at the technical level.

2.2.2 Process Control

The process discussed here is the information system construction process. After much practice we have found that one of the main causes of poor data quality is that control over data is neglected during system construction, which leads to data design that is inconsistent with requirements, development that is inconsistent with design, data quality requirements that are not considered, definitions and technical implementations that differ across systems, and many other problems. Fixing these problems only after the system goes live is a belated remedy that wastes resources.

In fact, data management, and the IT industry as a whole, should humbly learn management concepts from traditional industries. For example, quality management in manufacturing controls quality at every step of the production line, and some of its concepts are instructive: "Quality by design" (quality is designed in, not inspected in) and "Quality check is a cost, not a benefit".

The author's company recently completed an exploration and initial practice of a factory-style model for data production and management, with significant improvements in operational efficiency, development and maintenance efficiency, and data quality. The author will look for an opportunity to share the details; for now, here is a rendering to give an intuitive impression.

[Figure: rendering of the factory-style data production and management model]

Below is a schematic diagram of process control:

[Figure: schematic diagram of process control]

This figure is rich in content. Its core idea is to inject the standards and specifications formed under "content control" into the life cycle of information system construction, and to ensure they are followed by controlling the deliverables at each stage of construction, thereby guaranteeing that data is standardized and consistent.

Process control relies, on the one hand, on a review mechanism embedded in development management and, on the other hand, on tools that codify some of the standards and specifications and perform automated checks. Once the system is live and in normal operation, attention should be paid to collecting and handling data requirements and data problems, and to optimizing the standards and specifications accordingly.
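An automated check of this kind might look like the following Python sketch, which compares the columns in a design deliverable with the enterprise data standards before development starts. The standards dictionary, the column tuple layout, and the finding messages are hypothetical; a real tool would read them from a standards repository and a modeling tool.

```python
# Hypothetical enterprise data standards: item name -> (data type, length).
STANDARDS = {
    "customer_id": ("CHAR", 10),
    "customer_type": ("CHAR", 2),
}


def review_design(columns, standards=STANDARDS):
    """Compare a design deliverable with the standards.

    columns: list of (name, data_type, length) tuples from the design document.
    Returns a list of findings; an empty list means the design conforms.
    """
    findings = []
    for name, col_type, length in columns:
        if name not in standards:
            findings.append(f"{name}: not defined in the data standards")
            continue
        std_type, std_length = standards[name]
        if (col_type, length) != (std_type, std_length):
            findings.append(
                f"{name}: {col_type}({length}) differs from standard "
                f"{std_type}({std_length})"
            )
    return findings


design = [
    ("customer_id", "CHAR", 10),
    ("customer_type", "VARCHAR", 4),
    ("cust_level", "CHAR", 1),
]
for finding in review_design(design):
    print(finding)
```

A check like this could serve as a gate in the design review stage, so that deviations are caught at the deliverable instead of after the system goes live.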

In the early stage of informatization, operational systems such as ERP and CRM were built around functions and processes, whereas data analysis platforms such as BI, data warehouses, and big data platforms are built around data. This means some traditional methods need to change: more attention should be paid to controlling data architecture, data standards, and data quality, and to the data life cycle. Otherwise, the chance of successfully building a data analysis platform is low.

2.3 Technical Tools

A brief word on technical tools. Here is a figure first, the conclusion of a foreign study on key technologies in data governance.

[Figure: key technologies in data governance, from a foreign study]

As the figure shows, metadata, master data, and data quality are the main technical means. Specific product features are not today's topic; the author mainly wants to discuss the role of technical tools in data governance. The situation closely resembles what happened with ERP: domestic customers often hope that deploying a set of technical tools will cure every problem and improve data quality.

The reality is that if the management mechanisms and technical standards described above, such as organizational structure, content control, and process control, are not in place, simply deploying a set of software tools will have no effect. What, then, do software tools do? Their core role is to codify knowledge and improve the efficiency of data governance staff.

The advent of the big data era has allowed governments and enterprises to see the value of data assets, quickly begin to explore application scenarios and business models, and build technical platforms. This is understandable. However, if data governance is forgotten in the big d - DayDayNews

The arrival of the big data era has allowed governments and enterprises to see the value of data assets, quickly begin to explore application scenarios and business models, and build technical platforms. This is understandable. However, if data governance is forgotten in the big data puzzle, then no matter how much business and technology investment is invested, it will be futile, because a very classic saying: Garbage in, Garbage out, data quality is not guaranteed. To ensure data quality, data governance is a necessary means.

data governance topic seems to be high and high, but in fact it is very down-to-earth, or you must stand tall to achieve practical results. Dingtian means that, similar to informatization, data governance is also a top-level project. Without high-level promotion and coordination between business and business, and between business and technology, data governance cannot be implemented; standing means: generally IT personnel have a deep understanding of data issues, and IT personnel are the first to realize the importance of data governance, and data governance is ultimately implemented at the IT level.

1. Related concepts of data governance

1.1 Data classification

Go back to the topic, first of all, the basic concept part. Since we talk about data, we must first look at the classification of data. In fact, the author is a little worried about mentioning the word "categorization", because everyone and each character have different perspectives and make sense.

The data classification mentioned here refers to the usual classification method of data governance in the field of enterprise informationization. There are other ways, and you are welcome to discuss them together. We usually divide data into: main data, transaction data, reference data, metadata and statistical analysis data (indicators). The previous picture shows:

The advent of the big data era has allowed governments and enterprises to see the value of data assets, quickly begin to explore application scenarios and business models, and build technical platforms. This is understandable. However, if data governance is forgotten in the big d - DayDayNews


Why do we talk about data classification? Because when governing each type of data, the focus, methods and effects are different, and we need to treat it differently. Let me talk about my personal understanding below:

main data focuses on "people" and "things". Master data management (MDM) is a special topic in the field of data governance. Its main purpose is to establish a unified view of key business entities (such as employees, customers, products, suppliers, etc.), so that the objective world is the same person or thing, and can be uniquely identified in the data world, rather than becoming different people or things in different systems and businesses. Master data management has already undergone a lot of practice in enterprises in various industries. Due to time constraints, it will not be developed separately today. Its core management idea is in line with the data governance method to be discussed later.

transaction data focuses on "things". Trading data does not form a separate field of data governance. Since transaction data is the basis of BI analysis, it is often focused on data quality management ;

reference data is a more fine-grained data, which is a standardized description of certain attributes of "people", "things" and "things". The management of reference data is generally carried out simultaneously with the main data management, or at the same time with BI data quality management, because the indicator dimensions and dimension values ​​directly affect the quality of BI data;

metadata is an all-inclusive concept, its essence is to provide descriptions for data, so any data has metadata. Metadata in the field of data governance refers more to metadata within the scope of BI and data warehouses (there are common Warehouse Meta-model specifications internationally), and there are metadata for information resource management (such as Dublin core protocol), geographic information metadata, meteorological metadata, etc. Because of this widespreadness, practitioners have extremely high expectations for it and great loss after practice.

talks about metadata: I have been engaged in product design and solution planning for metadata management for about 4 years, but now I rarely talk about "metadata", but I talk about " data definition ". When talking about data, I must talk about definition, but I do not manage it as a special type of data. Metadata management is done separately in the field of data governance, and there is little effect.

has two main reasons:

  • Data production is out of touch with data management, metadata management is more about metadata collection and application display after data production, and plays a very small role in controlling data production.
  • tool's own problem: Although many tools claim to support the CWM specification, automatic metadata acquisition has always been a technical problem. Moreover, for the stored procedure and custom scripts, it is difficult to automatically parse and obtain, so it is impossible to accurately and completely display the detailed data processing process.

statistical analysis data (indicators), without much need to be said. The main function of BI system construction at present is to calculate and display various indicators and reports. Indicators are often the focus of data governance. Data flow analysis of indicators, volatility and balance monitoring of indicator values ​​are almost essential applications for data governance in various enterprises.

1.2 Data governance

After talking about data classification, let’s talk about “what is data governance”. Data Governance is DataGovernance in English. The definitions given by different software manufacturers and consulting companies will also be different, but the essence is similar.

Here we quote the definition given by the book "DAMA Guide to Knowledge of Data Management": Data governance is a collection of activities (planning, monitoring and execution) that exercise power and control over data asset management. Data governance functions guide how other data management functions are implemented. It may be a bit abstract, with pictures and truth. The following figure illustrates the relationship between data governance and several other data management functions:

The advent of the big data era has allowed governments and enterprises to see the value of data assets, quickly begin to explore application scenarios and business models, and build technical platforms. This is understandable. However, if data governance is forgotten in the big d - DayDayNews

It can be seen that data governance runs throughout the entire process of data management, focusing on high-level topics such as data strategies, organizations, and systems. By formulating and implementing strategies, organizations, and systems, several other data management functions are integrated and coordinated, so that the data work of enterprises can become an organic whole rather than doing their own things.

The Chinese translation of DataGovernance, there are two most common translations in China: data governance and data control. Domestic customers seem to prefer data control because this word is powerful and embodies authority. The author’s experience from a practical level: governance and control are indispensable. Governance comes first and control comes later. Governance is aimed at existing data, a process from chaos to governance and establishing rules and regulations, while control is aimed at incremental data, which realizes the constraints that law enforcement must be strictly implemented and that the rules are not exceeded.

Why do data governance? The following is a survey result of the International Data Quality Association for reference.

The advent of the big data era has allowed governments and enterprises to see the value of data assets, quickly begin to explore application scenarios and business models, and build technical platforms. This is understandable. However, if data governance is forgotten in the big d - DayDayNews

Theoretically speaking, data governance mainly has three purposes: to ensure data availability, data quality and data security. At the practical level, when it comes to data governance at home and abroad, its main purpose is data quality. For data security, there are often special teams and management measures, and fewer involvement in the field of data governance. Our discussion below also inherits this habit and mainly discusses the goal of data quality.

concept discussion comes to an end first, and when discussing methods and practices, we will in turn have a better understanding of the concept.

2. Methods of data governance

In the method section, it mainly talks about three contents: Who is responsible for data governance? What are the governance or control targets? What are the technical tools?

2.1 Organizational Structure

First, let’s talk about who is responsible for data governance, that is, organizational structure, first take a picture.

The advent of the big data era has allowed governments and enterprises to see the value of data assets, quickly begin to explore application scenarios and business models, and build technical platforms. This is understandable. However, if data governance is forgotten in the big d - DayDayNews

From the perspective of theory and foreign practice, large enterprises will establish enterprise-level data governance committees, with business department leaders and IT department leaders participating, so that there can be more full discussion and communication between business and business and technology, so as to reach a consensus on the macro data strategy and system. At the enterprise level, there can also be department-level and project-level committees responsible for certain local data governance. At the grassroots level, there should be corresponding data management specialists (DataStewards) for a certain business field.

Steward actually means the butler, but it seems not serious enough to translate it into the butler, so the "specialist" is adopted. The word Steward corresponds to Owner, which means that although the assets are not owned by Steward, they manage Owner, and thus the term Stewardship is derived, indicating that the custody and custody system contains a kind of conscientious and self-disciplined steward spirit, which is rare! The Data Governance Committee and the Data Management Specialist will formulate a series of data-related standards and systems that will be implemented by the Data Management Service Organization (DMSO). As can be seen from the figure, DMSO is actually an information construction team, and they are responsible for the construction of technical platforms such as data warehouses and data integration.

So much for theory and overseas practice. The situation in China is just the opposite: the DMSO is the main force, because organizations generally "emphasize functions over data, and technology over management". Most companies lack the management roles, such as the committees shown on the left of the figure. In the author's experience, large domestic banks are relatively advanced in this respect, with enterprise-level data governance committees or dedicated departments driving data governance. The energy industry has relatively high exposure to and acceptance of data governance and has carried out many data governance projects, especially in master data management.

Telecom operators pay more attention to technical means, and their data governance systems and mechanisms still need to be built and improved. Overall, few Chinese enterprises have established an enterprise-level data governance committee. More often, data governance is promoted by an "enterprise informatization leading group", with the information department responsible for implementation. In some enterprises the level of informatization is high, but IT construction is not centrally managed by the information department. This makes data governance very hard to implement and makes cross-department and cross-system collaboration extremely difficult.

2.2 Governance/control object

This part is mainly a summary of the author's own practical experience and may differ from some overseas theories. The author summarizes it as "content control" and "process control". The word "control" is used deliberately to reflect a degree of management authority.

2.2.1 Content control

First, content control. Data takes different forms in an information system, and each form must be managed well for the resulting data quality to be managed well. Here is a figure:

[Figure: forms of data under content control]

From macro to micro, the forms that data takes are data architecture, data standards and data quality standards.

Data architecture includes data models (conceptual and logical) and data flow relationships. It is usually discussed at the enterprise level and the system level. Its main purpose is to plan and design the classification, distribution and flow of enterprise data, so that new systems and applications stay consistent and integrated with existing ones. This avoids information silos as well as duplicated or unnecessary data integration and data conversion.

Data standards cover different forms such as data items, reference data and indicators. For example, "customer type" is a data item. It should have a unified business meaning (what rule classifies a customer as a large customer or a general customer?), a defined length for its value, and a defined set of valid values (such as 01, 02, 03). International standards such as ISO/IEC 11179 can serve as references here. Many domestic industries have also formulated industry data standards, such as e-government data elements and financial industry statistical data elements. The common problem is what happens after the standard is defined: has it actually been implemented in the IT systems?
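As a rough illustration of what a data item standard might look like once it is written down in machine-readable form, here is a minimal Python sketch. The class `DataItemStandard`, its fields and the `CUSTOMER_TYPE` definition are hypothetical names introduced only for this example. The length of 2 and the values 01, 02, 03 follow the example in the text, but what each code means is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItemStandard:
    """A hypothetical record describing one standardized data item."""
    name: str
    business_meaning: str
    length: int              # required number of characters in the code
    valid_values: frozenset  # the enumerated codes that are allowed

    def check(self, value: str) -> list:
        """Return a list of violations; an empty list means the value conforms."""
        problems = []
        if len(value) != self.length:
            problems.append(f"length {len(value)} != {self.length}")
        if value not in self.valid_values:
            problems.append(f"{value!r} is not a valid value")
        return problems


# Illustrative definition only; the meaning of each code is an assumption.
CUSTOMER_TYPE = DataItemStandard(
    name="customer_type",
    business_meaning="Classification of a customer, e.g. large vs. general",
    length=2,
    valid_values=frozenset({"01", "02", "03"}),
)

for raw in ["01", "04", "1"]:
    print(raw, CUSTOMER_TYPE.check(raw))
```

Keeping the standard as data rather than as prose is one way to make the question "has it been implemented?" checkable, because the same definition can be run against values in any system.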

Data quality standards include data quality rules and audit models (that is, combinations of rules). Data quality rules generally focus on timeliness, accuracy, completeness, consistency, uniqueness and so on. There is much more to say on this topic; some experts have compiled 12 data quality dimensions, both qualitative and quantitative.
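The idea of rules combined into an audit model can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each rule returns the share of records that pass, only two of the dimensions named above (completeness and uniqueness) are shown, and the function names and sample records are invented for the example.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[List[Record]], float]  # returns the share of records passing


def completeness(field: str) -> Rule:
    """Share of records where `field` is present and non-empty."""
    def rule(rows):
        ok = sum(1 for r in rows if r.get(field) not in (None, ""))
        return ok / len(rows) if rows else 1.0
    return rule


def uniqueness(field: str) -> Rule:
    """Share of distinct values among all values of `field`."""
    def rule(rows):
        values = [r.get(field) for r in rows]
        return len(set(values)) / len(values) if values else 1.0
    return rule


def run_audit(rows: List[Record], audit_model: Dict[str, Rule]) -> Dict[str, float]:
    """An audit model is treated here as a named combination of rules."""
    return {name: rule(rows) for name, rule in audit_model.items()}


# Invented sample data: one duplicate id and two missing customer types.
rows = [
    {"id": 1, "customer_type": "01"},
    {"id": 2, "customer_type": ""},
    {"id": 2, "customer_type": "03"},
    {"id": 4, "customer_type": None},
]
audit_model = {
    "customer_type_complete": completeness("customer_type"),
    "id_unique": uniqueness("id"),
}
print(run_audit(rows, audit_model))
```

A real audit model would also attach thresholds and owners to each rule; those are left out to keep the sketch short.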

The IT department should take the lead in formulating and regularly updating the enterprise-level data architecture, data standards and data quality standards, which act as guiding constraints for new systems and applications. Note that when setting standards, the IT department must avoid working behind closed doors and must ensure that the business departments are fully involved.

To give an example: as a technician, the author once took part in data architecture planning and had to design the data flow relationships. From a technical perspective, almost any choice of where data flows from and to seemed reasonable and could be supported by some tool, so there appeared to be no basis for deciding. In fact, the business should be involved at this point, because the division of business functions, the business processes and the functional boundaries between business departments directly determine the source and destination of data. The IT department considers the specific implementation plan mainly from the technical side.

2.2.2 Process Control

The process discussed here is the information system construction process. Extensive practice has shown that one of the main causes of poor data quality is that data control is ignored during system construction. This leads to data designs that are inconsistent with requirements, development that is inconsistent with design, data quality requirements that are never considered, definitions and technical implementations that differ between systems, and many other problems. Fixing these problems only after the system goes live is a belated remedy that consumes resources.

In fact, data management, and the IT industry as a whole, should humbly learn management concepts from traditional industries. Quality management in manufacturing, for example, controls quality at every step of the production line. Some of its principles are also instructive: "Quality by Design" (quality is designed in, not inspected in) and "Quality check is a cost, not a benefit".

The author's company recently completed an exploration and initial practice of a factory-style model for data production and management, with significant improvements in operational efficiency, development and maintenance efficiency, and data quality. The author will look for an opportunity to share the details; for now, here is an illustration to give an intuitive impression.

[Figure: illustration of the factory-style data production and management model]

Below is a schematic diagram of process control:

[Figure: schematic diagram of process control]

This figure is rich in content. Its core idea is to inject the standards and specifications formed under "content control" into the life cycle of information system construction, and to verify that they are followed by controlling the deliverables at each construction stage, thereby ensuring that the data is standardized and compliant.

Process control relies on two things: the review mechanism in development management, and tools that codify standards and specifications so that checks can be automated. Once the system is live and in normal operation, data requirements and data problems should be collected and handled, and the standards and specifications refined accordingly.
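One way to picture a tool that codifies a standard and automates a check is a design review gate that compares a proposed table design with the enterprise standard before development starts. The sketch below is hypothetical: the `STANDARD` dictionary, the column names and the type and length values are invented for illustration and do not come from any real standard.

```python
# Hypothetical enterprise standard: data item name -> (type, length).
STANDARD = {
    "customer_id": ("CHAR", 8),
    "customer_type": ("CHAR", 2),
}


def review_design(columns):
    """Compare a proposed table design with the standard.

    `columns` maps column name -> (type, length). Returns a list of findings.
    Columns absent from the standard are flagged for manual review rather
    than rejected outright.
    """
    findings = []
    for name, spec in columns.items():
        if name not in STANDARD:
            findings.append(f"{name}: not in standard, needs review")
        elif spec != STANDARD[name]:
            findings.append(f"{name}: {spec} differs from standard {STANDARD[name]}")
    return findings


proposed = {
    "customer_id": ("CHAR", 4),      # deviates from the standard length
    "customer_type": ("CHAR", 2),    # conforms
    "nickname": ("VARCHAR", 30),     # not covered by the standard
}
for finding in review_design(proposed):
    print(finding)
```

Running such a check on the design deliverable catches the deviation at the design stage, which is cheaper than discovering it after the system is live.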

In the early stage of informatization, operational systems such as ERP and CRM were built around functions and processes, whereas data analysis platforms such as BI, data warehouses and big data platforms are built around data. This means that some traditional methods need to change: more attention must be paid to controlling data architecture, data standards and data quality, and to the data life cycle. Otherwise, the chance of building a data analysis platform successfully is low.

2.3 Technical Tools

Let's briefly talk about technical tools. First a figure, which shows the conclusions of an overseas study on key technologies in data governance.

[Figure: key technologies in data governance]

As you can see, metadata, master data and data quality are the main technical means. Specific product features are not today's topic; the author mainly wants to discuss where technical tools fit in data governance. The situation is very similar to what ERP encountered: domestic customers often hope that deploying a set of technical tools will cure all problems and improve data quality.

In reality, if the management mechanisms and technical standards described above (organizational structure, content control, process control) are not in place, simply deploying a set of software tools will have no effect. What, then, do software tools do? Their core role is to codify knowledge and improve the efficiency of data governance staff.

For example, where metadata used to be collected by hand-written programs, the tools harvest it automatically; where data quality had to be checked manually or by writing code, the tools identify problems automatically; where the data dictionary was managed in documents, the tools manage it online; and where processes ran over email and offline, the tools automate them online.
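To make "automatically harvesting metadata" concrete, the following sketch reads table and column definitions straight from a database catalog instead of from hand-maintained documents. It uses Python's standard `sqlite3` module and an in-memory database purely as a stand-in; commercial metadata tools connect to many kinds of sources, and the `customer` table here is invented for the example.

```python
import sqlite3


def harvest_metadata(conn):
    """Collect table and column metadata from a SQLite database catalog."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {"column": c[1], "type": c[2], "not_null": bool(c[3]), "pk": bool(c[5])}
            for c in cols
        ]
    return catalog


# Stand-in source system with one invented table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id TEXT PRIMARY KEY, customer_type TEXT NOT NULL)"
)
for table, columns in harvest_metadata(conn).items():
    print(table, columns)
```

The harvested catalog can then feed an online data dictionary, which is the document-to-tool shift described above.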

Moreover, data governance software is no different from other software: there is nothing magical about it. Without the participation of data governance staff and the advancement of data governance work, the software merely looks good. This is why data governance consulting services have always had a market, and why most domestic projects that consisted of data governance software alone have failed to meet their goals.

3. Practical cases of data governance

The first case is system-level data governance for a telecom operator customer. The main lesson is the importance of organizational structure in advancing data governance.

The operator's data warehouse had been in place for many years, and the team had always attached great importance to metadata management and data quality management. Data quality problems were often found in the data warehouse, and a large share of them arose because the upstream BOSS system had been upgraded, or because its data errors propagated into the data warehouse.

For example, a new product was launched but not yet registered in the data warehouse, or the number of digits in SIM card numbers was increased but the data warehouse was not notified. This reveals two problems: insufficient coordination between business staff and the technicians of the analysis system, and insufficient coordination between the business systems and the analysis system.

The supervisor of the data warehouse therefore tried to promote, from the group level, coordinated data quality management across BOSS and the data warehouse. Through pilot projects in several provinces, a series of technical means were established to address the problem, such as cross-system metadata lineage maps and linked data quality monitoring.
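A cross-system lineage map is, at its simplest, a directed graph from upstream datasets to the datasets they feed. The sketch below shows how such a graph could answer "if this BOSS table changes, what in the data warehouse is affected?". All dataset names in `LINEAGE` are hypothetical, and this is not a description of the operator's actual tooling.

```python
from collections import deque

# Hypothetical cross-system lineage: each key feeds the datasets listed for it.
LINEAGE = {
    "boss.product": ["dw.dim_product"],
    "boss.sim_card": ["dw.dim_subscriber"],
    "dw.dim_product": ["dw.fact_revenue"],
    "dw.dim_subscriber": ["dw.fact_revenue", "dw.rpt_churn"],
}


def downstream(source, lineage=LINEAGE):
    """Breadth-first walk returning every dataset affected by `source`."""
    seen = {source}
    queue = deque([source])
    affected = []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


# Which warehouse objects should be notified when the SIM card table changes?
print(downstream("boss.sim_card"))
```

Linked quality monitoring can build on the same traversal: when a check fails on an upstream table, every dataset returned by `downstream` is a candidate for an alert.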

In the end, however, coordinated data quality management stopped at the pilot stage and was not rolled out nationwide. There are three main reasons:

  • Organizationally, the BOSS system and the data warehouse were not under centralized IT management; they were managed by two departments at the same level.
  • The BOSS system was more business-critical than the data warehouse.
  • The work was initiated as a technical effort and did not seek the support, participation or even leadership of the business departments.

This shows that an organizational structure and management mechanism that do not work smoothly will hinder the resolution of data problems and may even create new ones.

The second case is enterprise-level data governance for an energy industry customer. The main lesson is that data governance must keep the big picture in view while starting small, and must be good at finding the right opportunity to take hold.

This customer designed an enterprise-level data architecture through informatization planning. Through a master data management project, it established enterprise-level master data standards within one year, and the business departments accepted responsibility for data in their respective domains (that is, they took on the role of data management specialists). Through a data management project, the division of responsibilities between the business departments and the information department in data management work was straightened out. A data management and control group was set up within the project management office (PMO), and policies, processes and technical standards were formulated. The organization, policies and standards were all in place, yet implementing the technical standards did not go smoothly.

For example, the packaged-software implementation team led by ERP strongly resisted the standard for organizational master data and refused to use the unified 8-digit codes, using local 4-digit codes instead. While ERP was the only system, the impact was not obvious, and the data control group was unable to push through the 8-digit codes. As non-packaged software was built in the later stage of the project, the integration requirements between systems grew. Without a unified coding standard, the systems could not be integrated.

By then, the non-ERP systems all used the unified 8-digit codes as the standard required. The ERP project team had to give in and map 4-digit codes to 8-digit codes through a mapping table so that integration could proceed. This shows that once the organizational structure, management mechanism and technical standards are established, implementation still requires the right opportunity, as well as the patience and wisdom of data management staff. Otherwise the standards remain on paper.
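The mapping-table compromise can be illustrated with a small lookup that translates codes at the integration boundary. The code values below are made up, and the assumption that the mapping is one-to-one is ours; the article does not say how the actual table was structured.

```python
# Hypothetical mapping table: local 4-digit ERP code -> unified 8-digit code.
ERP_TO_UNIFIED = {
    "1001": "10010000",
    "1002": "10020000",
}
# Reverse lookup; only valid while the mapping stays one-to-one (an assumption).
UNIFIED_TO_ERP = {unified: erp for erp, unified in ERP_TO_UNIFIED.items()}
assert len(UNIFIED_TO_ERP) == len(ERP_TO_UNIFIED), "mapping must be one-to-one"


def to_unified(erp_code):
    """Translate an ERP code for outbound integration; fail loudly if unmapped."""
    if erp_code not in ERP_TO_UNIFIED:
        raise KeyError(f"no unified code mapped for ERP code {erp_code!r}")
    return ERP_TO_UNIFIED[erp_code]


def to_erp(unified_code):
    """Translate a unified code for inbound integration into ERP."""
    if unified_code not in UNIFIED_TO_ERP:
        raise KeyError(f"no ERP code mapped for unified code {unified_code!r}")
    return UNIFIED_TO_ERP[unified_code]


print(to_unified("1001"))
print(to_erp("10020000"))
```

Raising an error on an unmapped code, instead of passing it through, keeps gaps in the mapping table visible to the data control group.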

The third case comes from the United States. The main lesson is to start small, even very, very small, which is a useful corrective to the "big and comprehensive" mindset of domestic customers.

This company was also troubled by data quality problems and hoped to solve them through data governance. At first, however, it did not know how to put data governance into practice, so it launched an "enterprise data definition" project: it spent 6 months inventorying the data items of existing systems and identifying the cross-system, cross-business data items as the focus of data governance. After the inventory, it selected 7 data items for focused governance.

Note that there were only 7 data items! Domestic customers would surely think 7 is too few to be worth a project. But this American company surveyed the relevant business users around these 7 data items, uncovered their data usage needs and problems, and analyzed the business processes and data flows related to these items. It went on to identify more than 40 possible improvements and accumulated experience for rolling out data governance fully. On this basis, it formulated the overall plan and implementation roadmap.
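The inventory step in this case, finding the data items that appear in more than one system, can be sketched as a simple cross-reference. The systems and data items in `INVENTORY` are invented for illustration; the article does not name the 7 items the company actually chose.

```python
from collections import defaultdict

# Hypothetical inventory: system -> data items it stores.
INVENTORY = {
    "crm": {"customer_id", "customer_type", "contact_phone"},
    "billing": {"customer_id", "invoice_no", "customer_type"},
    "warehouse": {"sku", "customer_id"},
}


def cross_system_items(inventory, min_systems=2):
    """Return {item: sorted systems} for items used by at least `min_systems` systems."""
    usage = defaultdict(set)
    for system, items in inventory.items():
        for item in items:
            usage[item].add(system)
    shared = {
        item: sorted(systems)
        for item, systems in usage.items()
        if len(systems) >= min_systems
    }
    # Most widely shared first, then alphabetical for a stable order.
    return dict(sorted(shared.items(), key=lambda kv: (-len(kv[1]), kv[0])))


for item, systems in cross_system_items(INVENTORY).items():
    print(item, systems)
```

Ranking by how many systems share an item gives a defensible shortlist of candidates for focused governance, in the spirit of choosing only a handful to start with.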

4. Big data and data governance

Finally, big data. From the discussion so far, the logic of data governance is not complicated: identify the data assets, clarify management rights and responsibilities, establish supporting standards and specifications, and make sure they are implemented, thereby ensuring data quality. Although big data is large in volume, diverse in type and fast in velocity, the principles of data governance apply to it as well.

So what new requirements does the arrival of big data place on data governance?

First, consider one of the views of the author of "The Big Data Era". He holds that data quality is no longer important in the big data era, because what people need is analysis of overall trends rather than precise results. I personally disagree and believe that data quality matters even more for big data.

The overall trend analysis that he mentions is only one application of big data. In application scenarios such as precision marketing and risk identification, data is tied more closely to operations and at a finer granularity, so any error may lead directly to business losses, whereas traditional indicator applications do not affect operations so directly. The demand for data quality in a big data environment therefore rises rather than falls.

Second, the adoption of big data technologies such as Hadoop and Spark places new requirements on the technical means of data governance. In the traditional model, data is managed in an RDBMS with a common data access method. In a big data environment, Hadoop, MPP, RDBMS and Spark coexist. How to achieve visual, unified control of data assets in such a mixed, heterogeneous environment, so that the big data system does not become an unmanageable black box, is one of the key issues traditional industries face when adopting big data technology.

In particular, big data technical talent currently flows mostly to Internet companies, and very little of it enters traditional industries. While this talent shortage cannot be resolved in the short term, technical means are needed so that the IT staff of traditional enterprises can visualize and control their data assets.

Third, data security, or data privacy, is significantly more important than before, which requires data governance to pay closer attention to data security. In traditional application scenarios, data is collected by an enterprise and used inside that enterprise, so the question of data ownership is not prominent.

In the era of big data, data increasingly needs to be integrated across boundaries and applied externally in new business models, which raises more questions of data ownership and data privacy. Does user information belong to the company or to the user? Under what conditions may the company use it commercially? The answers to these questions are still under discussion. There is no doubt that enterprises need to pay more attention to policies and systems for data security and data privacy in the course of data governance.

Note: Article author: Liu Chen, if there is any infringement, please contact the owner of this official account to delete

Source | Big Data Ball, Data School, Enterprise Digital Consultation

Editor | Yunyin

Copyright statement | Copyright belongs to the original author and the original source. This official account is reproduced for readers to learn and communicate only. If there is any infringement, please contact the background to delete
