GDPR, the strictest data protection regulation in history, is a useful exploration of what the regulatory standards behind this "tightening curse" should be
(A TV reporter interviews a robot at a technology exhibition in Hannover in 2017. Photo/AFP)
Caijing reporter Zhou Yuan, special contributor He Qianyue | Text · Xie Lirong | Editor
"Because of GDPR (the EU's General Data Protection Regulation), our new product release in Europe has been postponed," Li Zhifei, founder and CEO of Mobvoi, told Caijing. Mobvoi is a Chinese artificial intelligence (AI) startup that builds smart voice devices such as smart watches and smart speakers on AI technology, and its products have been sold in the European and American markets since 2016.
Mobvoi is not alone. A Caijing reporter learned that many AI companies are busy working with legal experts to comprehensively review and adjust their products, services, and even business models against GDPR's provisions.
GDPR's full name is the General Data Protection Regulation, known as the strictest data protection regulation in history. It took effect in the EU on May 25, 2018. As binding law, it protects the "personal data" of natural persons, including names, addresses, birthdays, credit card and bank details, medical information, location data, IP addresses, and more.
This means that any company established in the EU, or offering products and services to the EU, is subject to GDPR when processing the data of individuals in the EU, unless it abandons the EU's developed market of 500 million people.
Companies that violate GDPR face fines of up to 4% of global annual revenue or 20 million euros (about 150 million yuan), whichever is higher.
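The "whichever is higher" rule is a common point of confusion, so it may help to see it as a calculation. A minimal sketch (the revenue figures are illustrative, not drawn from any real company):

```python
# GDPR's cap for the most serious violations: 4% of global annual
# revenue or EUR 20 million, whichever is HIGHER.

def max_gdpr_fine(global_annual_revenue_eur: float) -> float:
    """Return the upper bound of a GDPR fine for a serious violation."""
    return max(0.04 * global_annual_revenue_eur, 20_000_000)

# A small firm with EUR 10M revenue still faces a EUR 20M cap,
# because the fixed floor exceeds 4% of its revenue...
print(max_gdpr_fine(10_000_000))        # 20000000
# ...while a giant with EUR 100B revenue faces a EUR 4B cap.
print(max_gdpr_fine(100_000_000_000))   # 4000000000.0
```

In other words, the 20-million-euro floor means small companies cannot escape with a trivially small 4% figure.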
AI is an industry that relies heavily on data. Across the entire AI pipeline, from the initial training of an algorithm model to the finished AI product or service, data is an indispensable means of production, which means GDPR's constraints on AI run through the whole AI life cycle. Moreover, it is not just AI companies: any company that uses AI algorithms to transform its own business needs to assess whether it violates GDPR.
Caijing reporters found that misreadings such as "GDPR makes deep learning illegal" are circulating widely, and that AI practitioners are broadly confused about compliance. Correctly understanding GDPR's legal provisions, assessing the possible risks, and anticipating GDPR's impact on the AI industry inside and outside the EU has become a top priority.
A "two-step" test defines personal data
GDPR consists of 99 articles in 11 chapters. Its legislative purpose is to realize the value of data on the premise of protecting personal data. The first question companies must therefore understand is: what is personal data? This is not as simple as it seems, and a flawed understanding may expose a company to legal risk down the road.
Under GDPR, names, addresses, birthdays, credit card numbers, IP addresses, and similar information all count as personal data. Data revealing a person's race, political opinions, religious or philosophical beliefs, genetic and biometric data, and even data on personal health or sex life are also explicitly covered. But some data types remain confusing in practice: do voice recordings or license plate numbers count as personal data?
Wang Rong, an expert at Tencent Research Institute who specializes in data protection laws and regulations, said it depends on the specific situation. GDPR defines personal data broadly: data that can indirectly identify a specific natural person also counts as personal data and falls within the scope of protection.
"If a voice, combined with other information, can locate a specific individual, then that voice data can be regarded as personal data. The same goes for license plate numbers: because they are unique, they can identify a person in many scenarios, which is why Google Street View has to blur all license plates. But the plate of a public vehicle is not personal information, so each case must be analyzed in its specific scenario," Wang Rong explained to Caijing.
AI companies use many types of data. How do you determine whether a given type is personal data? Wang Rong described a "two-step" method: first, determine whether the data is generated by a specific individual; second, check identifiability. Data that directly identifies an individual is uncontroversial, but take care with data that can "indirectly identify" an individual.
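The two steps above can be sketched as a simple decision procedure. This is a hypothetical illustration, not legal advice; the boolean flags are assumptions about how a data team might annotate its datasets:

```python
# Hypothetical sketch of the "two-step" test described above.
# The flags are illustrative annotations, not GDPR terms of art.

def is_personal_data(generated_by_individual: bool,
                     directly_identifying: bool,
                     indirectly_identifying: bool) -> bool:
    """Rough two-step check for whether data falls under GDPR."""
    # Step 1: was the data generated by a specific individual?
    # Machine telemetry with no human link would fail here.
    if not generated_by_individual:
        return False
    # Step 2: identifiability. Direct identifiers (name, ID number)
    # are uncontroversial; indirect ones (a voice, a private car's
    # plate) also qualify when they can single out a person in context.
    return directly_identifying or indirectly_identifying

# A private car's plate can single out its owner:
print(is_personal_data(True, False, True))    # True
# A public bus's plate identifies no individual:
print(is_personal_data(True, False, False))   # False
```

The point of the sketch is the ordering: identifiability only matters once the data traces back to an individual at all.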
Once data in hand is classified as personal data, enterprises and institutions must seek consent from individual users one by one, and must grant each user the right to withdraw that consent at any time and to require the organization to delete their personal data.
However, to balance against other legitimate interests, GDPR lists scenarios in which individual consent is not required: performing a task in the public interest or under official authority, complying with a legal obligation, protecting the vital interests of the data subject or of another natural person, and so on.
This means public institutions can still install cameras in public places and use facial recognition to screen for potential terrorists, and medical institutions that discover a major epidemic can still process personal data without the consent of those involved.
On this issue, China's industry standard, the Personal Information Security Specification (effective May 1, 2018), has likewise been revised to list 11 exceptions in detail, ensuring the reasonable use of data.
Is deep learning illegal? A misreading
If a flawed understanding of personal data can expose AI companies to legal risk, the reading of another provision is making some AI practitioners overly worried.
That provision is called the "right to an explanation of automated decisions." Many industry insiders interviewed by Caijing believe it was written into GDPR specifically with AI in mind.
Regarding this provision, Pedro Domingos, professor at the University of Washington and author of The Master Algorithm, made a startling remark earlier this year: from May 25, the EU would require all algorithms to explain their output, which would make deep learning illegal.
Artificial intelligence has existed as an independent discipline since the 1950s, but for long stretches it languished because it was hard to put into practice. A main reason for its recent resurgence is the emergence of deep learning theory and technology.
But deep learning remains, to some extent, a "black box" whose internal logic is difficult to explain.
He Baohong, deputy director of the Institute of Cloud Computing and Big Data at the China Academy of Information and Communications Technology, has observed that although people build neural networks using deep learning, they cannot reasonably explain some of the "intelligence" those networks exhibit, nor predict the results of training in advance.
"To improve the effectiveness of neural network training, there are basically no tricks other than continually increasing the network's depth and node count, feeding in more data, adding computing power, and then repeatedly tuning parameters. And parameter tuning is like metaphysics: no systematic experience has been distilled for guidance. It relies entirely on personal experience, and sometimes even on luck," He Baohong said.
This is why Professor Domingos believes the "interpretability" GDPR demands would render deep learning "illegal."
But Wang Rong told Caijing that this interpretation is not rigorous.
In fact, no "right to an explanation of automated decisions" appears in GDPR's operative articles. It is mentioned only in the regulation's recitals (Recital 71): when a data subject is dissatisfied with an automated decision, they may demand human intervention, express their views, and obtain an explanation of the decision.
"Under European legislative practice, recitals serve only to aid interpretation of the articles; they carry no legal force of their own," Wang Rong said.
The EU's Article 29 Working Party in fact clarified this issue in October 2017: for automated decision-making, data controllers need not explain complex algorithms; they need only inform users, in the simplest possible terms, of the basic logic or criteria behind the decision.
Wang Rong said EU lawmakers are in fact wary of the "algorithmic discrimination" that purely automated algorithms can produce. Some companies build "data portraits" of people for business purposes, and the resulting profiles may well be biased against certain users. Companies often defend themselves on the grounds that "it is purely an algorithm, no human intervened"; GDPR gives users the right to demand an explanation from the company.
From this perspective, although GDPR does not force AI companies to explain their algorithms, they will still need to work on AI's black-box problem, since it cannot be ruled out that in specific scenarios they will have to explain their algorithmic logic to regulators.
"As for how detailed such an explanation must be, there are no past cases to refer to; we will only know from precedents as enforcement unfolds," Yang Zhirong, a professor of computer science at the Norwegian University of Science and Technology, told Caijing.
Possible Impact on AI
Because of GDPR, AI companies’ top priority is to actively eliminate non-compliant data and reevaluate existing algorithm models.
From a compliance perspective, an AI model trained on non-compliant data should be retrained. However, the technical director of a well-known domestic AI company told Caijing that once an AI model is trained, it is almost impossible to prove it was trained on illegal data, so it is hard to require a model provider to delete the original model and train a new one.
Even so, AI companies need to reevaluate their existing algorithm models. Whether data is purged proactively for compliance or deleted at users' request, the amount of data in an AI company's hands shrinks, and less data affects an algorithm's accuracy, so models need retraining.
By industry segment, voice-interaction AI companies are less affected by GDPR, far less than companies working on facial recognition. Long Mengzhu, marketing director of Sibichi, a voice-interaction solutions company, explained to Caijing why: speech research is tied to specific scenarios, and generic voice data downloaded from the Internet is useless, so voice AI companies hire people to record speech in targeted scenarios or buy data from professional data vendors. That means the data was obtained with the subjects' consent and its sourcing is lawful.
In the long run, GDPR's strict protection of personal data will raise AI companies' costs of acquiring and processing data. Previously, the cost of obtaining data was close to zero; GDPR has ended those "good times."
For companies, GDPR compliance is a systematic, dynamic, and long-term undertaking that requires human and financial investment, and not every company can bear the cost. Many industry insiders told Caijing that GDPR may slow the development of Europe's own AI industry, and that some Chinese AI companies may likewise slow their entry into the European market because they cannot afford the compliance costs.
But some believe this spending will gradually decline over time. Zoom.Ai CEO Roy Pereira has predicted that within two years, AI companies will no longer regard data spending as a burden or an obstacle to innovation.
Since AI is a technology applicable across industries, it is not only dedicated AI companies that are affected: companies that use AI to make their businesses smarter must adjust as well.
An R&D staffer at a European technology company told Caijing that AI is now widely used in Internet products. For example, to calculate a product's user churn rate, the company combines users' personal data with product usage logs (every click, interaction, and page view), uses a machine learning model to judge whether a user is likely to churn, and then deploys algorithm-driven precision marketing (for example, re-targeting the user through ad channels) to retain them.
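The scoring step of such a churn workflow can be sketched in a few lines. The features, weights, and threshold below are illustrative assumptions, not the company's actual model; in practice the weights would be learned from usage logs:

```python
# Hypothetical sketch of churn scoring as described above.
import math

def churn_probability(clicks_per_day: float,
                      days_since_last_visit: float) -> float:
    """Toy logistic model: low activity and long absence raise churn risk."""
    # Fixed illustrative weights; a real system would learn these
    # from historical usage logs with a machine learning model.
    score = -0.5 * clicks_per_day + 0.2 * days_since_last_visit - 1.0
    return 1 / (1 + math.exp(-score))

def should_retarget(prob: float, threshold: float = 0.5) -> bool:
    """Users above the threshold get retention ads via ad channels."""
    return prob > threshold

active = churn_probability(clicks_per_day=10, days_since_last_visit=1)
dormant = churn_probability(clicks_per_day=0, days_since_last_visit=30)
print(should_retarget(active), should_retarget(dormant))  # False True
```

The GDPR connection is direct: if features such as gender, age, or address must be dropped, the model loses inputs and its accuracy degrades, which is exactly the retraining pressure described here.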
"The amount of data collected before GDPR was considerable. After GDPR, sensitive data touching on customers' privacy, such as gender, age, and address, had to be taken offline, and the algorithms had to be corrected. Where there was no time to take data offline, the algorithm had to be suspended and brought into compliance before being used again," the researcher said.
The impact of GDPR extends to all companies applying AI, but its severity depends on the industry. For a product's customer-retention model, the effect is merely a drop in accuracy, because the AI algorithm is only icing on the cake for that kind of product, so the overall impact is limited. For businesses that depend more heavily on AI algorithms, the impact is larger. Advertising companies, for example, use AI algorithms to target ads, and a decline in targeting accuracy hits them hard and costs them customers. E-commerce companies, which often lift sales through AI-driven recommendations, will likewise be significantly affected.
Although GDPR places a "tightening curse" on AI, practitioners generally agree that the development and application of artificial intelligence needs a sound environment of trust, responsibility, and legal oversight. As the strictest data protection regulation in history, GDPR is a useful exploration of what the standards of that "tightening curse" should be.
(This article appears in the June 25, 2018 issue of Caijing magazine)