Machine Heart Analyst Network
Author: Zhu Jiying
Editor: H4O
This article focuses on the research and development of interpretability tools. It walks through the three types of interpretability tools/methods (local explanations, rule-based explanations, and concept-based explanations) covered in the AAAI-2022 "Tutorial on Explanations in Interactive Machine Learning", with the aim of understanding the latest research progress on interpretability tools and methods.
1 Background
In recent years, academia, industry, and government have paid increasing attention to AI ethics. From AI ethics regulation policies to technical means for AI ethics, improving ethical compliance and building AI ethics tools, products, and services have become core points for maintaining a competitive advantage in the AI market. In terms of concrete industry practice, mainstream foreign technology companies such as IBM, Microsoft, and Google, as well as domestic companies such as Tencent, Weibo, and Meituan, have continued to increase their research and practice on AI ethics. Tencent Research Institute also released the industry's first "Interpretable AI Development Report 2022: The Concept and Practice of Opening the Algorithm Black Box".
Explainable AI is a very complex field. Beyond AI algorithms and models, it also involves ethics, laws, regulations, and other issues, and the pursuit of explainable AI has to be balanced against the efficiency and performance of AI. Explainable AI is therefore both a long-term problem that needs further exploration and a critical issue that urgently needs to be addressed. Research on explainable AI can be divided into two categories. The first focuses on promoting model transparency, for example by controlling or explaining the training data, the inputs and outputs of AI models/algorithms, the model architecture, and other influencing factors, so that regulators, model developers, and end users can understand AI models more easily. The second focuses on researching and developing interpretability tools, that is, tools that explain existing AI models, such as Microsoft's open-source package InterpretML for training interpretable models and explaining black-box systems, the TensorFlow 2.0 interpretability analysis tool tf-explain, and IBM's AI Explainability 360 toolkit.
As stated above, this article focuses on the research and development of interpretability tools and interprets the three types of interpretability methods (local explanations, rule-based explanations, and concept-based explanations) covered in the AAAI-2022 "Tutorial on Explanations in Interactive Machine Learning", with the aim of understanding the latest research progress on interpretability tools and methods.
At the recently concluded AAAI-2022, a dedicated tutorial introduced explanations in interactive machine learning. The tutorial was presented by four experts in four parts: motivation and challenges, interaction through local explanations, interaction through rule-based explanations, and interaction through concept-based explanations [2]. It focuses on interpretability tools, that is, on improving the interpretability of the AI model itself from a technical perspective, making it more "transparent" to users.
2 Interaction through local interpretation
Interaction through local interpretation is the most common explainable AI approach: given a predictor and a target decision, input attribution determines which input variables are "most relevant" to that decision. The familiar SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) explainers both belong to this category of methods.
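As a concrete illustration, the following is a minimal sketch of producing a local explanation with LIME for a toy text classifier; the tiny training data and the pipeline are illustrative assumptions, not part of the tutorial.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy sentiment classifier (1 = positive, 0 = negative); real use would need far more data.
texts = ["the food was delicious", "great friendly service", "terrible service", "awful food"]
labels = [1, 1, 0, 0]
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Local explanation: which words were most relevant to this single prediction?
explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance("the service was delicious",
                                 model.predict_proba, num_features=4)
print(exp.as_list())  # word-level relevance scores for this one decision
```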
Characteristics of local interpretation methods include:
- they enable users to build a mental model of individual predictions;
- it is difficult to obtain enough samples to get an overview of the model's decision process;
- they may introduce bias based on the samples that users happen to observe.
From the AAAI-2022 tutorial we select one such method, FIND, for interpretation. The paper was published at EMNLP 2020, and the code is available at https://github.com/plkumjorn/FIND.
FIND: Human-in-the-Loop Debugging Deep Text Classifiers
Since it is almost impossible to obtain a perfect training dataset (i.e., a reasonably large, unbiased dataset that represents unseen cases well), many real-world text classifiers are trained on existing, imperfect datasets. As a result, these classifiers may have undesirable properties: for example, they may be biased against certain subgroups, or may not work effectively in the real world because of overfitting. This paper proposes a framework that enables humans to debug deep-learning text classifiers by disabling irrelevant hidden features [3]. The authors name this framework FIND (Feature Investigation aNd Disabling). FIND uses an explanation method, layer-wise relevance propagation (LRP) [1], to understand the behavior of the classifier when predicting each training sample. It then aggregates all of this information using word clouds to create a global visual picture of the model, which allows humans to understand the features automatically learned by the deep classifier and then disable features that may hurt prediction accuracy at test time.
LRP is an explanation method based on deep Taylor decomposition that explains neural network predictions through importance scores on the input features. It uses deep Taylor decomposition to redistribute the relevance of the output backwards through a pre-trained network and determine the contribution of each node to the classification. Depending on the activations and network weights, the relevance at each layer is obtained by propagating the relevance of the next layer backwards. For images, the explainer produces a pixel-level heatmap with the same dimensions as the input, visualizing the regions of the input image that contribute to the selected category.
Modern NLP models are usually end-to-end and do not explicitly encode semantic features, so understanding and analyzing them is not intuitive, and people are curious about what the models have actually learned. As shown in Figure 1, it is difficult for an NLP black-box model to clearly characterize the association between a word (x) and the category probability (p):
Figure 1. The association between word (x) and category probability (p) is difficult to map (Image from Tutorial slides, https://sites.google.com/view/aaai22-ximl-tutorial)
Generally speaking, a deep text classifier can be divided into two parts. The first part performs feature extraction, converting the input text into a dense vector (the feature vector) that represents the input. The second part performs classification, passing the feature vector through a dense layer and a softmax activation to obtain the probabilities of the predicted classes. These deep classifiers are opaque, because humans cannot interpret the meaning of the intermediate vectors or of the model parameters used for feature extraction, and therefore cannot use their knowledge to modify or debug the classifiers. In contrast, if we understand which patterns or qualities of the input are captured by each feature, we can understand the overall reasoning mechanism of the model, because the dense layers of the classification part then become interpretable. By introducing LRP, this paper checks whether the input patterns captured by each feature are relevant to the classification task and whether each feature supports the correct classes in the dense layer. Figure 2 shows the architecture of FIND.
Figure 2. Overview of FIND Debugging Framework
Consider a text classification task with |C| categories, where C is the set of all categories and V is the set of unique words in the corpus (the vocabulary). Given a training dataset D = {(x_1, y_1), ..., (x_N, y_N)}, where x_i is the i-th document containing a sequence of L words [x_i1, x_i2, ..., x_iL], and y_i is the class label of x_i, a deep text classifier M trained on D classifies a new input document x into one of the categories, M(x). M can be divided into two parts, the feature-extraction part M_f and the classification part M_c: f = M_f(x) and p = M_c(f) = softmax(W f + b),
where f is the feature vector of x, and W and b are the parameters of the dense layer in M_c. The final output is the predicted probability vector p.
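To make the two-part structure concrete, here is a minimal PyTorch sketch of such a classifier, assuming a CNN feature extractor M_f and a single dense layer M_c; the layer sizes are illustrative, not those used in the paper.

```python
import torch
import torch.nn as nn

class TextCNNClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, num_features=30, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # M_f: convolution over word embeddings + max pooling -> feature vector f
        self.conv = nn.Conv1d(emb_dim, num_features, kernel_size=2)
        # M_c: dense layer (W, b) followed by softmax -> class probabilities p
        self.dense = nn.Linear(num_features, num_classes)

    def forward(self, token_ids):                        # token_ids: (batch, L)
        e = self.embed(token_ids).transpose(1, 2)        # (batch, emb_dim, L)
        f = torch.relu(self.conv(e)).max(dim=2).values   # f = M_f(x), shape (batch, d)
        p = torch.softmax(self.dense(f), dim=-1)         # p = softmax(W f + b)
        return f, p

model = TextCNNClassifier()
f, p = model(torch.randint(0, 10000, (4, 20)))           # 4 documents of 20 tokens
print(f.shape, p.shape)                                  # (4, 30) and (4, 2)
```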
To understand how model M works, the authors analyze the input patterns that activate each feature f_i. Specifically, using LRP, for each feature f_i and each sample x_j in the training dataset they compute a relevance vector r_ij recording the relevance score (contribution) of every word in x_j to the value of f_i. For a general neural network, LRP proceeds layer by layer:
- z_ij = x_i w_ij is the contribution of neuron i to neuron j, where w_ij is the weight between them;
- z_j = Σ_i z_ij + b_j collects the contributions of all neurons in the previous layer to neuron j, plus the bias term;
- x_j = g(z_j) is the activation of neuron j obtained through the activation function g.
Given the relevance R_j^(l+1) of a neuron j in layer l+1 with respect to the classification decision f(x), we want to decompose it into messages sent to the neurons of the previous layer; denote these messages R_(i←j). LRP defines
R_(i←j) = (z_ij / z_j) · R_j^(l+1),
and the relevance of a neuron i in layer l is the sum of the messages it receives from all neurons in layer l+1:
R_i^(l) = Σ_j R_(i←j).
Applying this rule backwards from feature f_i down to the words of x_j yields the relevance vector r_ij.
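As a hedged illustration, the following numpy sketch implements this relevance-redistribution rule for a single fully connected layer (with a small stabilizer, as is common in LRP implementations); it is a toy example, not the FIND code.

```python
import numpy as np

def lrp_linear(a, W, b, R_next, eps=1e-6):
    """Redistribute the relevance R_next of one dense layer back to its inputs a."""
    z = a @ W + b                      # z_j = sum_i a_i * w_ij + b_j
    z = z + eps * np.sign(z)           # stabilizer to avoid division by zero
    s = R_next / z                     # R_j / z_j
    return a * (s @ W.T)               # R_i = sum_j (z_ij / z_j) * R_j

rng = np.random.default_rng(0)
a, W, b = rng.random(4), rng.random((4, 3)), rng.random(3)   # 4 inputs, 3 output neurons
R_out = np.array([0.0, 1.0, 0.0])      # relevance initialized on the explained neuron/feature
print(lrp_linear(a, W, b, R_out))      # relevance scores for the 4 inputs
```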
After performing this computation for all d features over the training samples, word clouds can be generated to help users better understand the model M: even if it is not clear what a latent feature means, it becomes clear what activates it and how it maps to each category, which is what makes the model interpretable.
Word clouds: for each feature f_i, one or more word clouds are created to visualize the patterns in the input texts that highly activate f_i. This is done by analyzing the r_ij of all x_j in the training data and displaying the words or n-grams that obtain high relevance scores. The authors note that different model architectures may need different ways of generating word clouds to effectively reveal the behavior of their features.
The paper uses a CNN as the classifier, and each feature gets a word cloud containing the n-grams from the training samples selected by the CNN's max-pooling. As shown in Figure 3, for a feature with filter size 2 the cloud contains bi-grams (for example "love love", "love my", "loves his"), and the font size corresponds to the magnitude of the feature value produced by each bi-gram. This is similar to how previous work analyzes CNN features, and is equivalent to backpropagating the feature value to the input with LRP, cropping out the consecutive input words whose LRP scores are non-zero, and displaying them in the word cloud.
Figure 3. Word cloud (literally, an n-gram cloud) of one CNN feature
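A word cloud like the one in Figure 3 can be produced directly from per-n-gram relevance scores; below is a minimal sketch using the wordcloud package, where the scores are made-up placeholders rather than values from the paper.

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Placeholder relevance scores for the n-grams that highly activate one feature f_i.
ngram_scores = {"love love": 0.9, "love my": 0.7, "loves his": 0.6,
                "great food": 0.4, "was okay": 0.1}

wc = WordCloud(background_color="white").generate_from_frequencies(ngram_scores)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()   # font size reflects how strongly each bi-gram activates the feature
```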
As mentioned earlier, we want to know whether the learned features are valid, whether they are relevant to the classification task, and whether they receive appropriate weights from the next layer. This can be achieved by having humans inspect the word cloud of each feature and tell us which category the feature is related to.
However, directly applying this in real scenarios still runs into problems. If the answer a word cloud receives differs from the category implied by the weight matrix W, there is a problem with the model. For example, suppose the word cloud in Figure 3 represents feature f_i in a sentiment analysis task, but the i-th column of W implies that f_i supports the "negative sentiment" class; then we can judge that the model is flawed. If this word cloud appeared in a product classification task, that would also be problematic, because the phrases in the cloud do not discriminate between any product categories and cannot yield correct classifications. The authors therefore extend the method to give users a way to disable the features corresponding to any problematic word cloud, modifying M_c into M'_c: M'_c(f) = softmax((W ⊙ Q) f + b),
where Q is a mask matrix of the same shape as W and ⊙ denotes element-wise multiplication. Initially all elements of Q are ones, enabling every connection between features and outputs. To disable feature f_i, the i-th column of Q is set to the zero vector. After disabling features, the parameters of M_f are frozen and the parameters of M'_c (except the mask matrix Q) are fine-tuned on the original training dataset D in a final step.
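A minimal sketch of this feature-disabling step, assuming a trained dense layer (W, b) as above; the indices of the disabled features are arbitrary placeholders.

```python
import torch

num_classes, d = 2, 30
W = torch.randn(num_classes, d, requires_grad=True)   # trained weights of M_c
b = torch.randn(num_classes, requires_grad=True)
Q = torch.ones(num_classes, d)                        # mask: every feature enabled

disabled_features = [3, 17]                           # features with problematic word clouds
Q[:, disabled_features] = 0.0                         # zero out their columns

def m_c_prime(f):
    # p = softmax((W ⊙ Q) f + b); W and b stay trainable, Q stays fixed
    return torch.softmax(f @ (W * Q).t() + b, dim=-1)

f = torch.rand(4, d)                                  # feature vectors from the frozen M_f
print(m_c_prime(f))                                   # predictions with disabled features removed
```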
As an example, the authors ran an experiment on the Yelp dataset, which is used to predict the sentiment (positive or negative) of restaurant reviews; they drew 500 samples as the training data. Human responses collected on MTurk were used to assign ranks to the features. Since each classifier has 30 original features (d = 30), the authors divide them into three ranks (A, B, and C) with 10 features each. Rank-A features are expected to be the most relevant and useful for the prediction task, while rank-C features are the least relevant and may even undermine the model's performance.
Figure 4 shows the distribution of mean feature scores for one of the CNN models trained on the Yelp dataset, and Figure 5 shows an example word cloud for each rank. We can clearly see the different quality of these three features. Some participants responded that the rank-B feature in Figure 5 is related to the positive class (probably because of the word "delicious"), and the weights of this feature in W agree (positive : negative = 0.137 : -0.135). Interestingly, the rank-C feature in Figure 5 received a negative score because some participants thought its word cloud was related to the positive class, whereas the model actually uses this feature as evidence for the negative class (positive : negative = 0.209 : 0.385).
Figure 4. Distribution of mean feature scores of CNN models trained on the Yelp dataset
Figure 5. Example of word cloud for CNN features ranking A, B, C
3 Interaction through rule-based interpretation

Interaction through rule-based interpretation can be regarded as a global interpretation approach. Features of global interpretation methods include:
- they can provide an overview of the model;
- they can avoid narrative bias;
- such a simplified global overview comes at the cost of fidelity.
Rules can be learned directly from the data (white-box models) or from a surrogate of the model (black-box models). Existing rule-based interpretation methods differ mainly in three respects: complexity, accuracy, and non-overlap. They also differ in how the rules are presented (decision lists vs. decision sets). To accurately reflect decision boundaries, rules must cover increasingly narrow slices of the data, which in turn can hurt interpretability. We again pick one method from the tutorial for an in-depth interpretation.
Machine Guides, Human Supervises: Interactive Learning with Global Explanations
This paper, published at AAAI 2021, proposes explanatory guided learning (XGL), a new interactive learning strategy in which the machine guides a human supervisor toward selecting informative samples for the classifier. The guidance is provided by a global explanation that summarizes the behavior of the classifier in different regions of the sample space and exposes its flaws. Compared with other explanatory interactive learning strategies, which are machine-initiated and rely on local explanations, XGL is designed to cope with cases where the explanations provided by the machine oversell the quality of the classifier. Furthermore, XGL uses global explanations to open up the black box of human-initiated interaction, allowing supervisors to select informative samples that challenge the learned model. The biggest advantage of XGL is that the rules can be kept simple and are used to guide human feedback [4].
Let H denote a class of black-box classifiers h, for example neural networks or kernel machines. Our goal is to learn a classifier h from data. Initially we may only have a small training set S_0, and further samples can then be obtained from the supervisor. For the sake of understanding and control, the machine is also required to explain its beliefs in a way an expert supervisor can understand, which helps identify errors in the predictor's logic. Explanatory active learning (XAL) is a representative method of this kind. In XAL, the machine selects a query x from a pool of unlabeled samples and asks the supervisor to label it. In addition, XAL presents the prediction for the query together with a local explanation of that prediction. These explanations reveal the reasons behind the predictions, such as feature relevances, and build a narrative around them. The supervisor can also steer the predictor by giving feedback on the explanations, for example pointing out which features the predictor wrongly relies on.
However, since local explanations focus on the queries, the "narrative" output by XAL ignores unknown unknowns (UU), the cases in which, by definition, the machine performs poorly. UUs can lead the machine to oversell its performance to the user, especially when they are associated with high costs; this gives rise to narrative bias (NB). Intuitively, NB measures the gap between the performance that the queries x_1, ..., x_T convey to the user and the true risk R_T, where the performance perceived by the user is a function of the loss exposed by XAL's narrative over time. Figure 6 (left) illustrates the problem on synthetic data designed to induce UUs: the red samples are grouped into evenly spaced clusters, while the blue samples are distributed evenly elsewhere. The queries selected after 140 uncertainty-sampling iterations of an active RBF SVM are circled in yellow, with the decision surface in the background. The queries are clearly concentrated around the known red clusters, where the classifier predicts and explains well (e.g., via feature relevance or gradient information), and completely ignore the unknown red clusters on which the model performs poorly, which are therefore also ignored by XAL's narrative.
Active learning (AL) thus works very poorly in the presence of unknown unknowns, i.e., regions where the classifier makes high-confidence errors. This situation is common under class skew and concept drift, and is particularly challenging when the errors carry high costs, as Figure 6 (left) shows.
Figure 6. Left: uncertainty-based AL queries points around the known red clusters (circled in yellow) and ignores the unknown clusters; Middle: XGL finds most of the red clusters; Right: example rules extracted by HINTER from the hepatitis dataset (the classes are "live" and "die"); doctors can understand and verify such rules
This paper proposes human-initiated interactive learning, namely XGL, as a way to counter narrative bias. The starting point is that if a motivated, knowledgeable supervisor could see and understand the decision surface of h, she could identify known and unknown errors, determine whether the predictor is behaving inappropriately, and wisely choose examples that correct these errors. Of course, in practical applications h can be very complex, so this strategy is purely an idealization, and the real challenge is how to make it feasible.
The authors propose to address this by summarizing h with a global explanation in a compact and interpretable way. A global explanation is an interpretable surrogate g of h, usually a shallow decision tree or a rule set. These models can be decomposed into simple atomic elements, such as short decision paths or simple rules, which can be described and visualized independently and associated with individual samples; Figure 6 (right) shows an example. Usually g is obtained by model distillation, i.e., h is projected onto the class G by a global interpreter π:
π(h) := argmin_{g ∈ G} E_{x∼P} [ ℓ(g(x), h(x)) ] + λ · Ω(g), where P is the ground-truth distribution, ℓ is the loss function, Ω measures the complexity of the explanation, and λ > 0 controls the trade-off between fidelity to h and simplicity. The expectation is usually replaced by an empirical Monte Carlo estimate using fresh i.i.d. samples from P or any available unlabeled samples.
The pseudo-code of XGL is shown in Algorithm 1. In each iteration, a classifier h is fitted on the current training set S and summarized by the global explanation g = π(h). Then g is shown to the supervisor: each rule is translated into a visual artifact or a textual description and presented together with the samples it covers, labeled according to the rule. The supervisor is then asked to provide one or more samples that are wrongly predicted or explained, and these samples are labeled and added to the training set S. This is repeated until h is good enough or the query budget runs out.
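The following is a minimal sketch of this loop under simplifying assumptions: the black box is an RBF SVM, the global explanation is a shallow decision tree distilled from it, and the human supervisor is simulated by a script that simply reports a sample the rules get wrong. Names and model choices are illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])  # initial S_0

for t in range(20):                                     # query budget
    h = SVC(kernel="rbf").fit(X[labeled], y[labeled])   # fit the black box h on S
    g = DecisionTreeClassifier(max_depth=3)             # global explanation g = pi(h)
    g.fit(X, h.predict(X))                              # distill h into a few simple rules
    # In real XGL the rules (e.g. export_text(g)) are shown to a human, who reports
    # samples that the rules predict or explain wrongly. Here we simulate that choice:
    rule_preds = g.predict(X)
    errors = [i for i in range(len(X)) if i not in labeled and rule_preds[i] != y[i]]
    if not errors:
        break
    labeled.append(errors[0])                           # label the reported sample, add to S

print(export_text(g))                                   # the rules shown to the supervisor
print("final accuracy of h:", h.score(X, y))
```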
In practice, the supervisor can find errors in two ways:
- by scanning the samples, each displayed together with its prediction and rule, and pointing out one or more errors;
- by searching for wrong rules and then providing counterexamples for them.
The first strategy mimics guided learning (GL): in GL, given a textual description of some target concept and a list of samples obtained through a search engine, the user must identify instances of the concept in the list. The difference is that in XGL the samples are presented together with the corresponding predictions and explanations, which allows the user to identify actual errors and understand the model. From this perspective, XGL is to GL what XAL is to AL: a way of making an otherwise opaque interaction transparent. Samples can be grouped by rule to make scanning easier, and given that GL has been successfully deployed in industrial applications, the authors believe that XGL can be as well. The second strategy targets experts who are able to spot bad rules and identify or synthesize counterexamples. Since there are usually far fewer rules than samples (in these experiments, typically 5-30 rules vs. hundreds or thousands of samples), this strategy may be more efficient. The interpretability of the rules can be encouraged by regularizing them appropriately.
XGL is designed to resist narrative bias while enabling expert supervisors to identify errors. The authors emphasize that simply combining global explanations with machine-initiated interactive learning does not achieve the same effect, because the choice of queries would still be affected by UUs. Another benefit of XGL is that it natively supports selecting batches of instances in each iteration, reducing query costs; in this paper, the authors restrict the discussion and experiments to one example per query to simplify comparison with competitors.
Shifting the responsibility of selecting examples to the human supervisor also carries risks: a global explanation may be too coarse a summary, or may be misunderstood by the supervisor. These problems also affect AL and XAL, so the authors note that XGL should be applied in settings where they are unlikely to occur or their impact is negligible.
The main drawback of XGL is clearly the cognitive and computational cost of global explanations. The computational cost can be reduced by updating g incrementally as h is updated. The cognitive cost can be mitigated in several ways: the global explanation can be restricted to certain regions of the instance space, and its resolution can be adjusted as needed, for example by first giving the supervisor a coarse rule set g and then letting her refine g and "zoom in" on regions or subspaces that look suspicious. In any case, global explanations are necessarily more demanding than local explanations or no explanations. Like other interactive protocols, XGL involves a human-in-the-loop step in which the supervisor must participate and invest time and attention. The authors argue that this extra effort is justified in applications where overestimating a wrong model is expensive.
The authors name their rule-based implementation of XGL HINTER (Human-INiTiated Explanatory LeaRning) and compare it with several human- and machine-initiated alternatives using standard binary classifiers (an SVM and gradient-boosted trees) on several UCI datasets. Experiments were run on the synthetic dataset shown in Figure 6 and on several classification datasets from the UCI repository; the results are shown in Figure 7. On most datasets, HINTER's predictions are as good as or better than those of its competitors. The performance difference is especially clear on the particularly difficult synthetic data, where XGL's F1 score is nearly 20% higher than its competitors'. The authors attribute this to UUs: AL and random sampling rarely query samples from the red class, which is why they progress so slowly in Figure 7 (left), while GL oversamples the minority class. XGL performs similarly to or better than all competitors on all the original datasets and all of their "+uu" variants. The worst case is the german dataset, where XGL performs poorly in terms of F1 regardless of the underlying classifier, but still performs best in terms of narrative bias. In summary, the results show that in the presence of UUs, XGL tends to learn better classifiers, and when UUs are not a big problem, XGL still performs reasonably.
Figure 7. On three representative datasets, F1 score (top) and narrative bias (bottom, lower is better) of all competitors as the number of queries increases: synthetic task (left), banknote (middle), and german (right)
4 Interaction through concept-based interpretation
This section focuses on interaction methods based on concept-level explanations, including concept-based models (CBMs) and neuro-symbolic models. These models emphasize the advantages of explaining a model at a higher semantic level. The reason is that the local and rule-based methods above have difficulty reaching the internal, conceptual level of a model, especially for black-box models, whereas concept-based explanation methods attempt to analyze the working mechanism of an AI model from the perspective of concepts and semantics. Reference [5] is one of the papers mentioned in the tutorial, and we interpret it below.
Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations
This paper was recently published at CVPR 2022. Its main purpose is to learn visual concepts in a discrete, prototype-based latent space through weak supervision and human-machine interaction. The paper proposes interactive Concept Swapping Networks (iCSNs), a new framework for learning concept-grounded representations via weak supervision and implicit prototype representations [5]. This semantically grounded, discrete latent space is conducive to human understanding and human-machine interaction.
Because concept learning is complex, and inspired by research on concept prototypes in psychology and cognitive science, the authors examine the advantages of prototype representations for neural concept learners in learning human-understandable and revisable concept representations. To this end, the paper proposes iCSN, which learns to implicitly bind semantic concepts to prototype representations through weak supervision. The binding is achieved via discrete distance estimation and the swapping of shared concept representations between paired data samples. iCSN allows the learned concepts to be queried and revised (see Figure 8) and allows knowledge about unseen concepts to be integrated (see Figure 9).
Figure 8. A trained model (left) asks the human user (right) whether the concepts it extracted from the data match the user's knowledge; the model can then accept revisions from the user
Figure 9. Human-machine interaction for learning new concepts: the user queries an object and, when necessary, guides the machine via prototype suggestions

The overall architecture of iCSN is shown in Figure 10.
Figure 10. Interactive concept swapping network. iCSN is based on a deterministic autoencoder structure that provides an initial, entangled latent encoding (1). Read-out encoders (2) extract relevant information from the latent space and compare the extracted concept encodings with a set of prototype slots via a weighted, softmax-based dot product (3), producing a discrete code (4) that indicates, for each concept encoding, the most similar prototype slot. iCSNs are trained through a simple reconstruction loss, weak supervision via match pairing, and the swapping (5) of the latent concept representations of shared concepts, which forces semantic information to be bound to specific prototype representations
Prototype-based concept architecture. Given an input x_i, for simplicity the sample index i is dropped from the notation below and x denotes the entire image. In this framework, x can also be a latent representation of an image sub-region, extracted from the image implicitly or explicitly by a preprocessing step such as a segmentation algorithm or a generative scene model. Assume further that each x contains several attributes, such as color, shape, and size. The realizations of these attributes are called basic concepts, e.g., "blue" or "triangle", while "color" is called a category concept, often referred to as a superordinate concept in cognitive science and psychology. Each image x therefore has ground-truth basic concepts c, and J denotes the total number of superordinate concepts. The authors make the necessary assumption that, for each superordinate concept, x contains exactly one basic-concept realization. For simplicity it is further assumed that every superordinate concept contains the same number of basic concepts, K, although this may differ in practice.
Assume an encoder-decoder structure and define an input encoder h(·) that receives the image x and encodes it into a latent representation h(x) = z. Instead of reconstructing directly from z as in many autoencoder-based methods, iCSN first applies several read-out encoders m_j(·) to the latent representation, producing m_j(z) = φ_j. The encoding φ_j is called a concept encoding. The goal of each read-out encoder is to extract from the entangled latent space z the information corresponding to one superordinate concept (e.g., color); how concept-specific information is forced to be extracted is discussed below. A core component of iCSN is a set of codebooks, each containing several prototype slots. This set is defined as Θ := [p_1, ..., p_J], where each p_j denotes one codebook, an ordered set of trainable, randomly initialized prototype slots.
To assign each concept encoding φ_j to a prototype slot of p_j, a similarity score S_dot(·,·) is defined as a softmax over the dot products of its two inputs, giving the similarity between the concept encoding φ_j and the k-th prototype slot (p_j)^k of category j:

s_j^k = S_dot(φ_j, (p_j)^k) = exp(φ_j · (p_j)^k) / Σ_k' exp(φ_j · (p_j)^k').

The resulting similarity vector s_j contains the similarity score of φ_j with every prototype slot of category j. To further discretize the code and bind concepts to individual prototype slots, a second function S_τ(·) applies a weighted softmax with temperature τ to the similarity scores:

Π_j = S_τ(s_j), with (Π_j)^k = exp(s_j^k / τ) / Σ_k' exp(s_j^k' / τ).

In the experiments τ is gradually reduced to progressively strengthen the binding of information. In the extreme case of a very small τ, Π_j resembles a one-hot vector (and the concatenation over the J categories a multi-label one-hot vector) that indicates which prototype slot the concept encoding φ_j of category j is most similar to. Finally, the weighted similarity scores of all categories are concatenated into a single vector to obtain the final prototype distance code y, which is passed to the decoder g(·) to reconstruct the image.
Concept swapping and weak supervision. Before training, i.e., right after initialization, no semantic knowledge is bound to the prototype slots; the semantic knowledge found in a converged iCSN is learned indirectly through a weakly supervised training procedure and a simple swapping technique. The paper adopts match pairing, a practical weakly supervised training scheme for overcoming the difficulties of unsupervised disentanglement. In this scheme, a pair of images (x, x') is observed that shares the values of a known subset of the underlying factors of variation, such as color, while the total number of shared factors can vary between 1 and J - 1. In this way, the model can use the additional pairing information to constrain and guide the learning of its latent representations. Previous weakly supervised approaches (particularly for VAEs) mainly apply the product or average of the encoder distributions of x and x' at the shared factor IDs, whereas iCSN uses a simple swapping technique between the paired representations. Specifically, if v is the ID of a factor shared by the image pair (x, x'), the corresponding similarity scores (Π_v, Π'_v) are swapped between the final prototype codes of the two images:

y = [Π_1, ..., Π'_v, ..., Π_J] and y' = [Π'_1, ..., Π_v, ..., Π'_J].
This swapping procedure has an intuitive semantics: it forces the iCSN to extract the concept information from one image and use it to represent the category-v attribute of the other image.
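A minimal PyTorch sketch of the prototype-similarity and swapping steps described above, for J = 2 category concepts with K = 4 prototype slots each; the random projections stand in for the paper's encoders, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

J, K, dim = 2, 4, 16
prototypes = [torch.randn(K, dim, requires_grad=True) for _ in range(J)]   # codebooks p_j
readout = [torch.nn.Linear(32, dim) for _ in range(J)]                     # read-out encoders m_j

def prototype_code(z, tau=0.5):
    """Map a latent z to the list of per-category codes Pi_j (their concatenation is y)."""
    codes = []
    for j in range(J):
        phi_j = readout[j](z)                               # concept encoding phi_j
        s_j = F.softmax(phi_j @ prototypes[j].t(), dim=-1)  # softmax over dot products
        codes.append(F.softmax(s_j / tau, dim=-1))          # weighted softmax Pi_j
    return codes

z, z_pair = torch.randn(32), torch.randn(32)    # latents of a matched pair (x, x')
y, y_pair = prototype_code(z), prototype_code(z_pair)

v = 0                                           # ID of the shared factor (e.g. color)
y[v], y_pair[v] = y_pair[v], y[v]               # swap Pi_v between the two codes
y, y_pair = torch.cat(y), torch.cat(y_pair)     # prototype distance codes fed to the decoder
print(y.shape)                                  # torch.Size([8]) = J * K
```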
Training objective. iCSN is ultimately trained with a simple pixel-wise reconstruction loss applied to each image pair in a batch of size N.
This contrasts with several previous works on prototype learning, which strengthen semantic binding through additional consistency losses. By building semantic binding implicitly into the network architecture, iCSN avoids introducing additional hyperparameters and the more complex optimization of multiple objectives.
Interacting with iCSNs. The goal of iCSNs, especially compared with VAEs, is not necessarily to learn a generative latent-variable model of the underlying data distribution, but to learn prototypical concept representations that humans can understand and interact with. The autoencoder structure is therefore a means to an end, not a necessity. However, instead of discarding the decoder after convergence, iCSN can show, for each concept, the prototype reconstruction closest to an input sample. By querying these prototype reconstructions at test time, human users can confirm whether the predicted concepts make sense and possibly detect unwanted model behavior. By defining a threshold on the reconstruction error at test time, the iCSN can also give a heuristic indication of how certain it is about the concepts it identifies in new samples.
Thanks to the discrete, semantically constrained latent code y, human users can interact with iCSNs by treating y as a multi-label one-hot encoding. For example, the user can formulate logical constraints such as ∀ img. ¬ hasconcept(img, p_11) or ∀ img. isin(img, imgset) ⇒ hasconcept(img, p_12), read as "the concept represented by prototype p_11 should never be detected" and "for every image in this set, the concept represented by prototype p_12 should be detected". In this way, users can interactively revise the model on a set of images where it behaves incorrectly.
Finally, the modular structure of iCSNs also enables interactive online learning, for example when newly provided data samples contain a new concept, or when a factor present in the data was initially considered unimportant but later turns out to matter. In both cases, the way to interact depends on the level of the concept to be learned, i.e., whether it is a basic concept or a superordinate concept. Assuming the human user is satisfied with the concepts the iCSN has learned so far and that K (the number of prototype slots per codebook) was set to an overestimate, the user can simply give feedback to bind a new basic concept to an unused prototype slot of the relevant category. Learning a new superordinate concept can be handled by adding, during the initial training phase, an additional read-out encoder that, unlike the other read-out encoders, does not map onto the space of prototype slots; in this way the initial latent space z of the iCSN is still trained to represent the full data distribution. To include a concept that was originally considered irrelevant, J only needs to be extended, i.e., a new read-out encoder m_(J+1)(z) = φ_(J+1) and a codebook p_(J+1) are added to the iCSN. m_(J+1) then learns to bind the basic concepts of the "new" superordinate concept to p_(J+1), which only requires new data pairs that exemplify the previously unimportant concept.
The paper also proposes a new benchmark dataset, Elementary Concept Reasoning (ECR), shown in Figure 11. ECR consists of RGB images (64×64×3) of two-dimensional geometric objects on a constant-colored background. The objects vary in shape (circle, triangle, square, pentagon), size, and color (red, green, blue, yellow), with uniform jitter added to each color to produce different shades. Each image contains a single object fixed at the center. The images are paired so that the objects in a pair share at least one and at most two attributes. ECR contains 5000 image pairs for training and 2000 for validation.
Figure 11. Samples from the Elementary Concept Reasoning dataset. Each image (left) depicts a centered two-dimensional object with three attributes: color, shape, and size. The images are paired so that the objects share between one and two concepts (right)
In the experiments, the authors compare iCSN with several baselines, including an unsupervised β-VAE and an Ada-VAE that uses the arithmetic mean of the encoder distributions. Since Ada-VAE was originally introduced with an even weaker form of supervision, for a fair comparison with iCSN (which is trained via match pairing) the authors also train an Ada-VAE variant with known shared-factor IDs. This baseline essentially resembles a β-VAE that averages the encoder distributions of the two images at the known shared-factor IDs, and is denoted VAE in the experimental results. Finally, the authors compare iCSN with a discrete VAE that uses a categorical distribution via the Gumbel-softmax trick (Cat-VAE). Cat-VAE is trained in the same way as VAE, i.e., with match pairing and averaged encoder distributions.
The authors study the latent encodings of each model through linear probing. The results in Table 6 (Part 1) report the mean accuracy and standard deviation on the validation set over five random initializations of each model. The latent encodings of CSN yield nearly perfect prediction performance and exceed all variational methods. Importantly, CSNs perform even better than the VAE methods (VAE and Cat-VAE) that are trained with the same kind of weak supervision. The mean performance of β-VAE is worse than that of the weakly supervised models, yet Ada-VAE performs even worse than β-VAE. The discrete latent representations of Cat-VAE also perform worse than CSN, with large variance across runs, indicating that several Cat-VAE runs converge to suboptimal states. In summary, although the ECR dataset only contains variations of a single 2D geometric object, the baseline models do not match CSN even when trained with the same amount of information.
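Probing of this kind is straightforward to reproduce in principle; the sketch below fits a logistic-regression and a decision-tree probe on placeholder latent codes and random concept labels (so its accuracies are near chance and are not the paper's numbers).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

latents = np.random.rand(2000, 12)            # latent codes z (or y) from a trained model
labels = np.random.randint(0, 4, size=2000)   # ground-truth basic concept, e.g. color

for probe in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(max_depth=5)):
    probe.fit(latents[:1500], labels[:1500])
    acc = probe.score(latents[1500:], labels[1500:])
    print(type(probe).__name__, f"probe accuracy: {acc:.3f}")
```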
Table 6. Linear probing with a decision tree (DT) and logistic regression (LR). (Part 1) Probing the latent codes of the iCSN model and the various baselines. (Part 2) Ablation studies probing the latent codes of a Cat-VAE with swapped encoder distributions and of an iCSN with averaged concept encodings. All classification accuracies are computed on the test set.

An advantage of the semantically constrained, discrete latent space of iCSN is that human users can directly identify suboptimal concept representations, as in Figure 8 above. After identifying correctly or wrongly learned concepts, users can apply simple logical feedback rules on this discrete concept space. Specifically, after training with weak supervision, the machine and the human user discuss the learned concepts and determine whether they match the user's knowledge or need revision. For example, an iCSN may have learned to represent one color across several prototype slots, or to represent two shapes with a single slot, indicating that it mistakenly treats these shapes as the same concept. The iCSN can communicate the concepts it has learned in two ways. First, based on the inferred discrete prototype distance codes, it can group new images that share a concept and ask whether the grouped images really do share a common basic concept, as in Figure 8. Second, using the decoder, it can present a prototype reconstruction of each learned concept, for example an object with a prototypical shade of blue, as in Figure 9 above. After spotting potentially suboptimal concept representations, human users can interact with the discrete latent space of iCSNs through logical rules and further improve the representations.
For all of the previous vanilla CSN configurations, the concept encodings y were manually inspected for each of the 32 possible concept combinations, identifying the prototype slots that were "activated" in most examples of each individual concept (the main slots) and those that were never or only rarely activated for that concept. Next, an L2 loss is applied on y, and the previous runs are fine-tuned on the original training set using the original reconstruction loss plus this additional L2 loss. The semantics of this feedback is that each concept should be represented only by its main prototype slots. Furthermore, in two of the runs an observed suboptimal solution was revised, namely that pentagons and circles were bound to the same prototype slot: feedback was provided on all pentagon samples of the training set to bind them to another, empty prototype slot, again optimized via the additional L2 loss constraint.
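One plausible form of such feedback, shown purely as an illustrative assumption, is an L2 penalty that pushes the entries of y outside the user-approved slots toward zero:

```python
import torch

y = torch.rand(8, requires_grad=True)          # prototype distance code, e.g. J * K = 2 * 4
allowed = torch.tensor([1., 0., 0., 0.,        # user feedback: only slot 0 of category 1
                        0., 0., 1., 0.])       # and slot 2 of category 2 should fire

feedback_loss = ((y * (1 - allowed)) ** 2).sum()    # L2 penalty on the non-approved slots
total_loss = feedback_loss                          # in practice added to the reconstruction loss
total_loss.backward()
print(feedback_loss.item())
```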
5 Summary
In this article, we discussed explainable AI from the perspective of the research and development of interpretability tools, based on the latest research results presented in the AAAI-2022 tutorial. At present there are still relatively few applications of explainable AI in China and abroad, mostly concentrated in a handful of very large companies, and the academic community pays far less attention to this issue than to other AI fields. However, as the digital economy grows in importance, the compliance of platform enterprises has become key to the next stage of AI algorithm/model deployment. From a regulatory perspective, promoting the development of explainable AI is also an important tool for effectively governing the digital economy, and for ordinary users explainable AI provides reassurance when applying AI models. As more large domestic enterprises pay attention to explainable AI, we believe explainable AI will soon appear in a large number of application scenarios, and the research and development of interpretability tools will attract more attention from researchers.
References cited in this article
[1] Bach S, Binder A, Montavon G, et al. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation, PLOS ONE, 2015, 10
[2] Tutorial on Explanations in Interactive Machine Learning, AAAI 2022, https://sites.google.com/view/aaai22-ximl-tutorial
[3] Lertvittayakumjorn et al., 2020, FIND: Human-in-the-Loop Debugging Deep Text Classifiers, EMNLP 2020
[4] Teodora Popordanoska, Mohit Kumar, Stefano Teso, Machine Guides, Human Supervises: Interactive Learning with Global Explanations, AAAI 2021
[5] Stammer W, et al. Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations, CVPR 2022
Zhu Jiying, Ph.D. in engineering, graduated from Beijing Jiaotong University. She has served as an assistant researcher at the Chinese University of Hong Kong and a research assistant at the Hong Kong University of Science and Technology, and is currently engaged in research on new information technologies in the field of e-government. Her main research interests are pattern recognition and computer vision; she loves scientific research and hopes to keep learning and improving.