

Leifeng.com AI Technology Review note: The EU's General Data Protection Regulation (GDPR) takes effect on May 25, 2018, and has sparked wide discussion in the data science community, because strict data regulation will have a major impact on data science projects, especially machine learning.

At present, machine learning is advancing rapidly, and global investment in the field grows by the day; machine learning is fast becoming a mainstay of enterprise data science. With the arrival of the strict GDPR, what impact will the regulation have on machine learning? And how can data science and its R&D projects continue under GDPR's constraints?

The newly promulgated GDPR has not yet fully taken effect. Understanding of how the regulation will be implemented remains vague and is still evolving, but the key problems and challenges it raises have gradually come into view. Andrew Burt, chief privacy officer and legal engineer at the data management platform Immuta, wrote an article on three major questions his company has wrestled with concerning GDPR's impact on machine learning. Leifeng.com AI Technology Review compiles the relevant content as follows.

Question 1: Will GDPR ban machine learning?

Of course not. Machine learning will not be banned in the EU after GDPR takes effect. Inevitably, however, applying machine learning will carry a heavy compliance burden.

Under the regulation, GDPR prohibits outright any automated decision that involves no human intervention and that significantly affects the data subject. Note that GDPR applies to every use of data that could identify an EU data subject; for data science programs that consume large amounts of data, this means GDPR will apply to essentially all of their activities.

GDPR defines "automated decision-making" as decisions made by a model without direct human participation. This includes automated profiling of data subjects, such as classifying users as "prospective customers" or "men aged 40-50", or determining whether a loan applicant is eligible for a loan.

Therefore, to determine whether a machine learning model performs "automated decision-making", the first thing to check is whether the model is deployed to make decisions without human intervention. If so, the model is prohibited by default, and this is in fact the case for a large number of machine learning models. Although many lawyers and data scientists have objected to this reading, it is the interpretation given by the Article 29 Working Party, the EU body involved in drafting and interpreting GDPR.

So does GDPR ban machine learning? The word "ban" is misleading. The prohibition on automated decision-making comes with exceptions, and "ban" is too strong a word for it. Once GDPR takes effect, data scientists should expect that most applications of machine learning will remain possible, but with a compliance burden they cannot ignore.

Below are the exceptions to this "prohibition".

GDPR specifies three situations in which standalone automated decision-making remains lawful:

  • the processing is necessary for entering into or performing a contract;

  • it is separately authorized by other law;

  • the data subject has given explicit consent.

In practice, the last basis is the most realistic: the common way around the prohibition is for data subjects to explicitly permit their data to be used by the model. But obtaining that consent is not easy. Data subjects can consent to many different types of processing, and they can revoke their consent at any time. This means consent must be managed in a fine-grained way: subjects must be able to choose among different types of consent, the system must be dynamic (allowing subjects to revoke consent), and it must be user-friendly enough that subjects understand how their data is used and retain control over that use.
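The consent requirements above (per-purpose grants, revocable at any time, checkable before each use) can be sketched as a small registry. This is a minimal illustrative sketch; the class and purpose names are invented here, not taken from any specific GDPR compliance tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    subject_id: str
    purpose: str                       # e.g. "model_training", "profiling"
    granted_at: datetime
    revoked_at: Optional[datetime] = None

class ConsentRegistry:
    """Tracks consent per (subject, purpose), so every use of a subject's
    data can be checked, and a revocation takes effect immediately."""

    def __init__(self) -> None:
        self._records: dict = {}

    def grant(self, subject_id: str, purpose: str) -> None:
        # Record a fresh, purpose-specific grant of consent.
        self._records[(subject_id, purpose)] = ConsentRecord(
            subject_id, purpose, datetime.now(timezone.utc))

    def revoke(self, subject_id: str, purpose: str) -> None:
        # Mark the grant as revoked; past processing stays lawful,
        # but is_allowed() returns False from now on.
        rec = self._records.get((subject_id, purpose))
        if rec is not None:
            rec.revoked_at = datetime.now(timezone.utc)

    def is_allowed(self, subject_id: str, purpose: str) -> bool:
        rec = self._records.get((subject_id, purpose))
        return rec is not None and rec.revoked_at is None
```

A pipeline would call `is_allowed(subject, "model_training")` before including a record in a training set, which is what makes consent both fine-grained and dynamic.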

GDPR does not completely prohibit the use of machine learning models, but it will make it increasingly difficult to deploy and manage many machine learning models and their input data.

Question 2: Does machine learning require "interpretability"?

One of the most common questions I hear about GDPR's impact on machine learning is whether models must be "interpretable". I wrote a dedicated article discussing this question last year.

This question stems from ambiguity in the GDPR text itself.

The stakes around "interpretability" are high, and the issue could have a huge impact on enterprise data science: the complex structure that gives machine learning models their remarkable predictive power also makes their inner workings hard to explain.

Let's start with the text of the GDPR regulations.

In Articles 13-15, GDPR repeatedly states that the data subject has the right to "meaningful information" about how their data is used and the "significant and foreseeable consequences" of automated decision-making. Article 22 then gives data subjects the right to object to decisions that have effects of this kind. Finally, Recital 71, a non-binding part of the regulation, states that data subjects may demand an explanation of automated decisions and may challenge those decisions. Together, these three provisions make the use of data considerably more complex.

Because the text is vague, EU regulators could interpret these provisions in the most stringent way, for example by requiring machine learning models to fully explain their internal structure, but that reading seems unreasonable.

A more plausible reading is that when machine learning makes decisions without human intervention, and those decisions significantly affect the data subject, the subject has the right to a basic understanding of what is happening. GDPR's "meaningful information" and "foreseeable consequences" may well be interpreted this way: regulators are likely to focus on data subjects' rights over how their data is used, with the required degree of transparency depending on the model and the circumstances.
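What a "basic understanding" might look like in practice can be sketched with a linear scoring model, where each feature's contribution (weight times value) is directly readable. The feature names, weights, and threshold below are invented for illustration; real models and real "meaningful information" requirements will differ.

```python
# Illustrative linear loan-scoring model. For a linear model, per-feature
# contributions give a simple, human-readable account of a decision:
# which inputs pushed the score up, and which pushed it down.

WEIGHTS = {"income": 0.5, "debt_ratio": -2.0, "years_employed": 0.3}
BIAS = -1.0
THRESHOLD = 0.0

def decide_with_explanation(applicant: dict):
    """Return (approved, contributions): the automated decision plus a
    per-feature breakdown of how each input affected the score."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    return score >= THRESHOLD, contributions

approved, why = decide_with_explanation(
    {"income": 4.0, "debt_ratio": 0.3, "years_employed": 2.0})
# `why` maps each feature to its signed contribution to the score,
# a minimal form of "meaningful information" about the decision.
```

Complex models (deep networks, large ensembles) do not decompose this cleanly, which is exactly why the scope of any interpretability requirement matters so much for enterprise data science.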

Question 3: Does the data subject have the right to request that their information be deleted and then retrain the model?

This is perhaps one of the hardest questions to answer under GDPR. In other words, if a data scientist trains a model on a data subject's data, and new data is later fed into the model, does the original data subject retain any rights over the models trained with their data?

As far as I know, the answer is no, at least in practice, with very few exceptions. To explain more clearly, I will start with those exceptions.

Under GDPR, all data use must rest on a lawful basis, and Article 6 sets out six such bases. The two most important are "legitimate interests" and the data subject's explicit consent. When processing is based on consent, the data subject retains important control over the data: they can withdraw consent at any time, at which point the lawfulness of further processing ceases.

So if an organization collects data from a data subject who consents to its use for training a specific model, and the subject later withdraws that consent, when can the subject force the model to be retrained on new data?

The answer: only if the model continues to use that data subject's data.

As the Article 29 Working Party has noted, all processing that occurred before the withdrawal remains lawful even after the data subject revokes consent. So if data was lawfully used to build a model or produce predictions, the resulting output can be retained, whatever it is. Indeed, once a model has been created from a set of training data, deleting or modifying that training data does not affect the existing model.
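The point that an existing model is unaffected by later deletion of its training data can be shown with a deliberately trivial "model". Here the model's only parameter is the mean of the observed values, standing in for any learner whose parameters are fixed once training completes; the numbers are invented for illustration.

```python
# Sketch: once a model's parameters are computed, erasing a training
# record does not change them. Only retraining on the reduced data does.

def train(data):
    """A trivial learner: the fitted 'parameter' is just the mean."""
    return sum(data) / len(data)

data = [1.0, 2.0, 3.0, 6.0]
model = train(data)            # fitted parameter: 3.0

data.remove(6.0)               # the data subject's record is erased
assert model == 3.0            # the existing model is unaffected

retrained = train(data)        # only retraining reflects the deletion
assert retrained == 2.0
```

The same logic holds for a neural network's weights or a tree ensemble's splits: the parameters are an output of past, lawful processing, not an ongoing use of the deleted record.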

However, some research shows that models can retain information about their training data, and that the original data may still be recoverable from a model even after the training data has been deleted, as researcher Nicolas Papernot and others have written in work on the privacy risks of models. This means that in some cases, keeping the trained model while deleting the raw data cannot guarantee that the raw data will never be reproduced, or that it is not, in some sense, still in use.

But how likely is it that original training data can actually be recovered from a model? In practice, almost not at all.

As far as we know, such attacks have only been demonstrated in academic settings, and enterprise data science is a long way from the academic environment. For precisely this reason, I do not believe models will need to be retrained merely because a data subject demands it. It is theoretically possible, but it is a marginal exception, and regulators and data scientists will need to deal with it only when it actually arises in a particular case.

Nevertheless, all of these questions contain many nuances, and those nuances will surely surface in the future. With 99 articles and 173 recitals, GDPR is destined to be a long and complex regulation, and it will only grow more complex over time.

However, at least one thing is clear: thanks to GDPR, lawyers and engineers who specialize in privacy will become core members of future large-scale data science programs.

Via www.oreilly.com. Compiled by Leifeng.com AI Technology Review.
