
2025-10-22

Current research on adversarial examples generally holds that simple methods based on iteratively optimizing a single image can hardly achieve transferable targeted adversarial attacks. Researchers have therefore resorted to complex methods that train multiple generative models on large amounts of additional data. However, our work published at NeurIPS 2021 found that this is not the case. Given enough iterations to ensure convergence, a simple iterative method that replaces the commonly used cross-entropy loss (Cross Entropy Loss) with a very simple objective, the Logit Loss, is enough to outperform the current strongest complex methods. In addition, we reflected on and improved the scenarios commonly used to evaluate transferability. Specifically, we found that current evaluation settings are too simple and unrealistic, which makes many reported results misleading. We therefore propose three more challenging and realistic scenarios as references for future research.

In this issue of the AI TIME PhD live broadcast, we invited Zhao Zhengyu, a postdoctoral researcher at the Helmholtz Center for Information Security (CISPA) in Germany, to share his talk "Reflections on the Transferability of Targeted Adversarial Images".

Zhao Zhengyu: Postdoctoral researcher at the Helmholtz Center for Information Security (CISPA), Germany. He received his PhD from Radboud University in the Netherlands. His research covers security and privacy issues in computer vision, mainly adversarial examples (Adversarial Examples), training data poisoning (Data Poisoning), and membership inference on training samples (Membership Inference).

Background of Computer Vision

Computer vision aims to imitate human vision by training computers to perceive the world.

Computer vision is now widely used in daily life, for example in autonomous driving, medical imaging, and face recognition.

In computer vision, a typical image recognition task mainly consists of the following three steps:

First, a camera captures the real scene and stores it as an RGB image, that is, a three-dimensional matrix;

Then, a large number of correctly labeled images are used to train the computer to recognize image content;

Finally, the trained computer can be used to recognize new images.

With the development of deep learning, the recognition ability of computer vision began to surpass that of humans on specific tasks around 2015.

However, on images taken in unconventional conditions, the computer's recognition ability drops sharply. As shown in the figure, the human eye can still identify the yellow bird despite various kinds of noise, but computer vision models struggle to do so.

To understand the cause of this phenomenon and the shortcomings of computer vision, researchers began to study adversarial images. So let's first look at what adversarial images are.


An adversarial image is an artificial image produced by tampering with a normal image such that the computer vision model is no longer able to correctly identify the true content of the image.

As shown in the figure above, conventional model training takes a normal cat image x0 as input. It optimizes the model parameters θ to minimize the loss J(x0, y0=cat), so that the model learns to correctly identify x0. This training process can be expressed as:

$$\min_{\theta} J(x_0, y_0; \theta), \qquad y_0 = \text{cat}$$

Generating an adversarial image can be regarded as the mirror image of this training process. To prevent the trained model from correctly identifying the cat in x0, we keep θ fixed and optimize the input image x, starting from x0, to maximize the loss J(x, y0=cat):

$$\max_{x} J(x, y_0; \theta)$$

Since this only prevents the model from outputting the correct label y0=cat, we call it an untargeted attack. Similarly, we can make the model output a specific wrong label, such as yt=dog, by minimizing the loss with respect to yt. We call this a targeted attack:

$$\min_{x} J(x, y_t; \theta)$$

In addition, an adversarial image must meet a basic condition: it should show no obvious traces of tampering. This is generally enforced by bounding the Lp distance between the adversarial image and the original image by a small budget ε:

$$\|x - x_0\|_p \le \epsilon$$
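The following minimal NumPy sketch makes the constraint concrete. It is our own illustration, not code from the paper. It measures the Lp size of a perturbation and checks it against a budget ε. The helper names `lp_distance` and `within_budget` are hypothetical. The budget of 16/255 under the L∞ norm is only an assumed example value for pixels scaled to [0, 1].

```python
import numpy as np

def lp_distance(x_adv: np.ndarray, x0: np.ndarray, p: float) -> float:
    """Lp norm of the perturbation x_adv - x0, taken over all pixels and channels."""
    delta = (x_adv - x0).ravel()
    if np.isinf(p):
        return float(np.max(np.abs(delta)))  # L-infinity: largest single-pixel change
    return float(np.sum(np.abs(delta) ** p) ** (1.0 / p))

def within_budget(x_adv, x0, eps, p=np.inf, tol=1e-9):
    """True if ||x_adv - x0||_p <= eps, up to a small floating-point tolerance."""
    return lp_distance(x_adv, x0, p) <= eps + tol

# Usage: a toy 2x2 RGB image with pixel values in [0, 1]
x0 = np.full((2, 2, 3), 0.5)
x_adv = x0.copy()
x_adv[0, 0, 0] += 16 / 255                      # change one channel of one pixel
print(lp_distance(x_adv, x0, np.inf))           # about 0.0627
print(within_budget(x_adv, x0, eps=16 / 255))   # True
```

The L∞ norm bounds every pixel change individually. The L2 norm instead bounds the total energy of the perturbation.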

In summary, generating a targeted adversarial image can be expressed as the following optimization problem:

$$\min_{x} J(x, y_t; \theta) \quad \text{s.t.} \quad \|x - x_0\|_p \le \epsilon$$

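Current research most often solves this problem with an iterative gradient-descent method, I-FGSM (Iterative Fast Gradient Sign Method). Under the L∞ norm, with step size α, the update is:

$$x_{i+1} = \mathrm{Clip}_{x_0,\epsilon}\left\{ x_i - \alpha \cdot \mathrm{sign}\left(\nabla_{x} J(x_i, y_t; \theta)\right) \right\}$$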
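Below is a minimal sketch of targeted I-FGSM with the cross-entropy loss under an L∞ budget. To stay self-contained and runnable with NumPy alone, it assumes a hypothetical toy linear classifier z = W x + b, whose gradient can be written by hand. A real attack would instead obtain gradients from a deep network through automatic differentiation. The step size α = 2/255, budget ε = 16/255, and 20 iterations are illustrative assumptions.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def ce_grad_wrt_x(x, W, b, y_t):
    """Gradient of J = -log p_t for a toy linear classifier z = W @ x + b."""
    p = softmax(W @ x + b)
    p[y_t] -= 1.0            # dJ/dz = p - e_t
    return W.T @ p           # chain rule through z = W @ x + b

def targeted_ifgsm(x0, W, b, y_t, eps, alpha, n_iter):
    """Targeted I-FGSM under an L-infinity budget eps; pixel values stay in [0, 1]."""
    x = x0.copy()
    for _ in range(n_iter):
        g = ce_grad_wrt_x(x, W, b, y_t)
        x = x - alpha * np.sign(g)           # step downhill on J, i.e. toward class y_t
        x = np.clip(x, x0 - eps, x0 + eps)   # project back into the eps-ball around x0
        x = np.clip(x, 0.0, 1.0)             # keep a valid image
    return x

# Usage on random toy data: 10 classes, a flattened 4x4x3 "image"
rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 48)), np.zeros(10)
x0 = rng.uniform(0.0, 1.0, size=48)
x_adv = targeted_ifgsm(x0, W, b, y_t=3, eps=16 / 255, alpha=2 / 255, n_iter=20)
print(softmax(W @ x0 + b)[3], softmax(W @ x_adv + b)[3])  # target probability before / after
```

The two clips implement Clip_{x0,ε}: the first enforces the perturbation budget, and the second keeps pixels in a valid range without moving them further from x0.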


Clearly, this optimization relies on a strong assumption: that we can obtain the model's gradient. This setting is called a white-box attack.

In a real (black-box) attack scenario, it is difficult to know the technical details of the model, let alone obtain its gradient. In practice, we therefore optimize the adversarial image on a local white-box model and expect it to also fool the unknown black-box model. We call this ability of adversarial images their transferability (Transferability).
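Transferability is usually reported as a success rate measured on the black-box model, which is queried only for its predicted labels. The sketch below is illustrative only. `black_box_predict` is a hypothetical stand-in built from a random linear model, and the labels are synthetic, chosen so that the result is easy to check.

```python
import numpy as np

def targeted_success_rate(predict, adv_images, targets):
    """Share of adversarial images that `predict` labels as their target class y_t."""
    return float(np.mean(np.asarray(predict(adv_images)) == np.asarray(targets)))

def untargeted_success_rate(predict, adv_images, true_labels):
    """Share of adversarial images that `predict` no longer labels as their true class y_0."""
    return float(np.mean(np.asarray(predict(adv_images)) != np.asarray(true_labels)))

# Hypothetical stand-in "black-box" model: called only for labels, never for gradients.
rng = np.random.default_rng(1)
W_black = rng.normal(size=(10, 48))

def black_box_predict(images):
    return np.argmax(images @ W_black.T, axis=1)

adv_images = rng.uniform(0.0, 1.0, size=(5, 48))
targets = black_box_predict(adv_images)   # synthetic: pretend every attack hit its target...
targets[0] = (targets[0] + 1) % 10        # ...except the first one
print(targeted_success_rate(black_box_predict, adv_images, targets))  # 0.8
```

The targeted criterion (prediction equals yt) is much stricter than the untargeted one (prediction differs from y0). This is one reason targeted transfer is harder.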

Many iterative methods build on I-FGSM to improve the transferability of adversarial images. They fall roughly into two categories.

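The first category works on the gradient itself. For example, accumulating the gradients of successive iterations as momentum makes the gradient direction more stable and less likely to get stuck in a poor local optimum. In the momentum variant of I-FGSM (MI-FGSM), with decay factor μ:

$$g_{i+1} = \mu \cdot g_i + \frac{\nabla_x J(x_i, y_t; \theta)}{\|\nabla_x J(x_i, y_t; \theta)\|_1}, \qquad x_{i+1} = \mathrm{Clip}_{x_0,\epsilon}\left\{ x_i - \alpha \cdot \mathrm{sign}(g_{i+1}) \right\}$$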
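Here is a sketch of one momentum step in the style of MI-FGSM, the best-known method of this kind. It is named here as an example; the talk does not say which method its figure shows. The gradient `grad` is assumed to be supplied by the caller, and the decay factor `mu` = 1.0 is an assumed default. The toy usage shows the stabilizing effect: the first gradient coordinate flips sign in the second step, but the accumulated direction does not.

```python
import numpy as np

def mi_fgsm_step(x, x0, g_prev, grad, eps, alpha, mu=1.0):
    """One targeted momentum step (MI-FGSM style) under an L-infinity budget eps.

    g_prev: momentum accumulated over earlier iterations
    grad:   gradient of the targeted loss J(x, y_t) at the current x
    mu:     decay factor of the momentum term
    """
    g = mu * g_prev + grad / (np.sum(np.abs(grad)) + 1e-12)      # L1-normalise, then accumulate
    x_new = np.clip(x - alpha * np.sign(g), x0 - eps, x0 + eps)  # descend on J, stay in the eps-ball
    return np.clip(x_new, 0.0, 1.0), g

# The first gradient coordinate flips sign in step 2, but the update direction does not.
x0 = np.array([0.5, 0.5])
x1, g1 = mi_fgsm_step(x0, x0, np.zeros(2), np.array([1.0, -1.0]), eps=0.1, alpha=0.01)
x2, g2 = mi_fgsm_step(x1, x0, g1, np.array([-0.2, -1.0]), eps=0.1, alpha=0.01)
print(np.sign(g2), x2)
```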

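The second category works through data augmentation. For example, in each iteration a randomly transformed copy of the image is used as input. This also makes the gradient more general, so the generated adversarial images transfer better:

$$x_{i+1} = \mathrm{Clip}_{x_0,\epsilon}\left\{ x_i - \alpha \cdot \mathrm{sign}\left(\nabla_x J(T(x_i), y_t; \theta)\right) \right\}$$

where T(·) is a random transformation (such as random resizing and padding) drawn anew at each iteration.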
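Below is a simplified sketch of an input-diversity transformation, in the spirit of the DI method: random resizing followed by random zero-padding, applied with probability `p`. Nearest-neighbour resizing, square images, `out_size`, and `p = 0.7` are all assumptions for illustration. In an actual attack, the transform would be applied to x_i inside the iteration loop and differentiated through with an autodiff framework. Plain NumPy is used here only to show the transform itself.

```python
import numpy as np

def input_diversity(x, out_size, p=0.7, rng=None):
    """Randomly resize a square HxWxC image and zero-pad it to out_size x out_size.

    With probability 1 - p the original image is returned unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = x.shape
    assert h == w and out_size > h, "sketch assumes square images and out_size > h"
    if rng.random() >= p:
        return x
    rnd = int(rng.integers(h, out_size))       # new side length in [h, out_size)
    idx = np.arange(rnd) * h // rnd            # nearest-neighbour source indices
    resized = x[idx][:, idx]                   # shape (rnd, rnd, C)
    pad = out_size - rnd
    top = int(rng.integers(0, pad + 1))        # random placement inside the padded canvas
    left = int(rng.integers(0, pad + 1))
    return np.pad(resized, ((top, pad - top), (left, pad - left), (0, 0)))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(8, 8, 3))      # toy 8x8 RGB image
x_t = input_diversity(x, out_size=10, p=1.0, rng=rng)
print(x_t.shape)  # (10, 10, 3)
```

Because each iteration sees a differently transformed input, the accumulated perturbation is less tailored to one exact view of the image.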

Besides these I-FGSM-based iterative methods, some researchers have recently proposed more complex methods based on generative models.

As shown in the figure below, to generate targeted adversarial images for a target class yt, we train a generative model. The goal is to make the distribution of the generated adversarial images as close as possible to the distribution of natural training images of class yt.

Finally, feeding any normal (test) image into the trained generative model yields a corresponding adversarial image for the target class yt.

Compared with iterative methods, generative-model methods inevitably consume more data and computing resources:

From the data perspective, iterative methods need only a single input image at test time, while generative-model methods need a large amount of additional training data;

From the model perspective, generating adversarial images for n different target classes yt requires only the same white-box model with an iterative method. A generative-model method must instead train n different generative models, one for each yt.

Researchers have found that, at the cost of this additional data and computation, generative-model methods achieve much better transferability than simple iterative methods.

Our New Insights into Targeted Transferability

Revisiting current research on targeted adversarial images, we find that traditional iterative methods need only minor changes to achieve transferability that even exceeds that of generative-model methods.

As shown in the table below, this transferability gap is even more pronounced in the more challenging setting with a smaller allowed distance between the adversarial and original images.

Specifically, we found that, unlike the untargeted case, generating transferable targeted adversarial images requires hundreds of iterations for the optimization to converge.

As shown in the figure below, the untargeted case (red line) reaches nearly 100% transfer success within 20 iterations, while the targeted case is still far from its best performance at that point. Yet existing research also stops the targeted case at fewer than 20 iterations, so it naturally cannot achieve good transferability. When we increased the number of iterations, targeted transferability also improved significantly.

Although more iterations improve transferability to some extent, we found that the commonly used cross-entropy loss (Cross Entropy Loss) is ill-suited to such long optimization because its gradient vanishes.

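As the following formula shows, as the number of iterations increases, the probability pt of the target class yt gradually approaches 1. Meanwhile, the gradient of the loss keeps shrinking until it approaches 0. For the targeted cross-entropy loss on the logits z, with p_t = e^{z_t} / \sum_j e^{z_j}:

$$J_{CE} = -\log p_t, \qquad \frac{\partial J_{CE}}{\partial z_t} = p_t - 1 \to 0 \quad \text{as } p_t \to 1$$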
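The following small NumPy sketch checks this gradient behaviour numerically. It uses hypothetical three-class logits, with class 0 as the target. As the target logit's margin grows, p_t approaches 1 and the cross-entropy gradient on z_t, which equals p_t - 1, shrinks toward 0.

```python
import numpy as np

def ce_grad_wrt_logits(z, t):
    """Gradient of the cross-entropy loss -log softmax(z)[t] w.r.t. the logits z: p - e_t."""
    p = np.exp(z - z.max())
    p /= p.sum()
    g = p.copy()
    g[t] -= 1.0
    return g

# Raise the target logit step by step and watch its gradient shrink.
for margin in [0.0, 2.0, 5.0, 10.0]:
    z = np.array([margin, 0.0, 0.0])   # class 0 is the target y_t
    g = ce_grad_wrt_logits(z, t=0)
    print(f"margin={margin:4.1f}  dJ/dz_t={g[0]: .5f}")
```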

This phenomenon can also be understood more intuitively through the following figure.

Because of this flaw, optimizing transferability with the cross-entropy loss quickly stalls even with many iterations. We therefore propose a simple and effective Logit loss that avoids this problem.

As the following formula shows, the gradient of the Logit loss with respect to the target logit is always fixed at 1 in magnitude. Optimization therefore does not stall, even with a large number of iterations:

$$J_{Logit} = -z_t, \qquad \frac{\partial J_{Logit}}{\partial z_t} = -1$$
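This minimal sketch shows how the Logit loss slots into the same iterative attack. Only the gradient with respect to the logits changes: p - e_t for cross-entropy becomes -e_t for the Logit loss. The toy linear model, step size, budget, and 300 iterations are illustrative assumptions, not the paper's experimental setup. In this linear toy, the Logit-loss gradient with respect to x is simply -W[t] and does not depend on x at all.

```python
import numpy as np

def loss_grad_wrt_logits(z, t, loss):
    """Gradient w.r.t. the logits z of the targeted loss for target class t."""
    if loss == "logit":        # Logit loss -z_t: gradient -e_t, magnitude 1 at every iterate
        g = np.zeros_like(z)
    elif loss == "ce":         # cross-entropy -log p_t: gradient p - e_t, vanishes as p_t -> 1
        g = np.exp(z - z.max())
        g /= g.sum()
    else:
        raise ValueError(f"unknown loss: {loss}")
    g[t] -= 1.0
    return g

def targeted_attack(x0, W, b, t, loss, eps=16 / 255, alpha=2 / 255, n_iter=300):
    """I-FGSM-style targeted attack on a toy linear model z = W @ x + b; only the loss differs."""
    x = x0.copy()
    for _ in range(n_iter):
        dz = loss_grad_wrt_logits(W @ x + b, t, loss)
        dx = W.T @ dz                                    # chain rule through the linear model
        x = np.clip(x - alpha * np.sign(dx), x0 - eps, x0 + eps)
        x = np.clip(x, 0.0, 1.0)
    return x

rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 48)), np.zeros(10)
x0 = rng.uniform(0.0, 1.0, size=48)
x_logit = targeted_attack(x0, W, b, t=3, loss="logit")
print((W @ x0 + b)[3], (W @ x_logit + b)[3])   # target logit before / after
```

Switching `loss="ce"` reuses the identical loop. This matches the point of the section: the improvement comes from the objective, not from a new optimizer.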

The transfer results in the figure below show that the Logit loss performs similarly to the cross-entropy loss when the number of iterations is small. However, its advantage gradually becomes apparent as the number of iterations increases.

So far, we have shown that increasing the number of iterations and adopting the Logit loss greatly improves the transferability of traditional iterative methods. Next, we reflect on and improve the evaluation scenarios commonly used in current research.

Our reflection on the evaluation scenarios covers the following two dimensions:

Diversity of models

We found that the current transfer evaluation scenario is too simple, because the white-box and black-box models involved have very similar architectures.

As shown in the table below, in this simple scenario different methods all reach a transfer success rate of about 90% given enough iterations. Such saturated performance makes it hard to tell the strengths and weaknesses of different methods apart.

We therefore propose a more challenging transfer scenario involving more diverse model architectures.

As shown in the figure below, in this new and more challenging scenario, the newly proposed Logit loss achieves better results than the other two existing methods.

We further tested a more stringent, real-world transfer scenario. We uploaded adversarial images optimized on the white-box model directly to the Google Cloud Vision API to test their effect, and again the Logit loss performed best.

As the screenshot below shows, the difference between the adversarial image and the original looks like nothing more than irregular noise. Yet it is enough to trick the black-box Google Cloud Vision API into predicting our preset target class (yt=boat).

Diversity of target classes

Most evaluation scenarios in current research test only a randomly selected target class yt. However, we believe that for the same image, transferability differs across target classes yt.

In particular, when we deliberately move the target from the 2nd-ranked class to the 1000th-ranked class, transfer becomes harder to achieve. The results in the table below confirm this.

Based on this observation, we can make evaluation more challenging by setting yt to a class at the bottom of the ranking (such as the 1000th), rather than limiting evaluation to a randomly selected yt.
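Choosing targets by rank instead of at random can be sketched as follows. The sketch assumes the ranking comes from the white-box model's logits on the clean image. `target_by_rank` is a hypothetical helper, and the logits are made-up numbers.

```python
import numpy as np

def target_by_rank(logits, rank):
    """Class index ranked `rank` (1 = most likely) by the model's logits on the clean image.

    rank=2 gives the runner-up class; rank=len(logits) gives the least-likely class.
    """
    order = np.argsort(-logits, kind="stable")  # class indices sorted by descending logit
    return int(order[rank - 1])

logits = np.array([0.3, 2.5, -1.0, 1.2])   # class 1 is the model's prediction
print(target_by_rank(logits, 2))           # 3 (second most likely)
print(target_by_rank(logits, 4))           # 2 (least likely)
```

With 1000 ImageNet classes, `rank=1000` selects the bottom-ranked class, which gives the hardest target in the setting described above.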

We also find that even in this most difficult case, our Logit loss still performs best.

Summary

1. We found that minor changes, namely increasing the number of iterations and replacing the traditional cross-entropy loss with the Logit loss, let existing simple iterative methods achieve transferability comparable to that of complex generative-model methods.

2. We propose new, more challenging, and more realistic scenarios for evaluating transferability. They mainly take into account the diversity of models and the diversity of target classes.

Outlook

1. As shown in the figure below, we found that transferability is particularly poor when certain model architectures (such as Inception) are involved. More exploration is needed to understand and solve this problem.

2. Generative-model methods need a large amount of data and computation to train the generator, but at test time they produce an adversarial image in a single forward pass.

In contrast, iterative methods are more lightweight by design, but they inevitably require many optimization iterations at test time.

Future work could therefore combine the advantages of both approaches into a method that is fast and also economical in data and computing resources.

Reminder

Paper link:

https://arxiv.org/abs/2012.11207

Paper title:

On Success and Simplicity: A Second Look at Transferable Targeted Attacks
