Heart of the Machine Report
Under double-blind review, even a Turing Award winner's paper can be rejected.
Last week, NeurIPS 2021, one of the world's top artificial intelligence conferences, released this year's paper acceptance decisions. As the saying goes, some people are happy and some are sad, but one person's reaction to rejection was neither joy nor sorrow: it was pride.
The researcher with this unusual reaction is Yann LeCun, Facebook's chief AI scientist and a winner of the 2018 Turing Award.
The rejected paper is titled "VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning". LeCun said that in this paper, they propose an extremely simple and efficient method for self-supervised training of joint-embedding architectures.
VICReg paper link: https://arxiv.org/pdf/2105.04906.pdf
The reason for rejection given by the area chair: the paper is similar to the "Barlow Twins" paper published by LeCun and colleagues at ICML 2021, and the improvement proposed in VICReg is not substantial enough by comparison.
Barlow Twins paper link: https://arxiv.org/pdf/2103.03230.pdf
But LeCun does not seem to agree. He said that VICReg introduces variance regularization, which makes it applicable to a wider range of architectures.
Therefore, in LeCun's view, their paper is sufficiently innovative, and being rejected is nothing to be ashamed of. "Some of the most influential papers have been rejected many times, such as David Lowe's famous SIFT," LeCun wrote on Twitter.
In response to encouragement such as "don't give up", LeCun replied: "My entire career has been built on 'not giving up', and that is not going to change now." For someone like LeCun, who has lived through successive "AI winters", the uncertain fate of a single piece of research is nothing.
For ordinary researchers, however, the rejection of LeCun's paper shows the impartial side of the review process at top conferences: double-blind review, it seems, is still fair.
Moreover, the review outcome does not appear to have been swayed by online publicity: the paper was posted on arXiv in May this year, and LeCun promoted it on Twitter. In LeCun's view, this is normal academic exchange that benefits technological progress. Still, it cannot be ignored that researchers of different standing command very different resources across these channels of "communication", which inevitably creates some unfairness and allows researchers with strong academic followings to benefit from it.
As for whether the VICReg paper should have been accepted, we still need to look at the content of the paper itself.
What kind of method is VICReg?
Self-supervised representation learning has made significant progress in the past few years, almost matching the performance of supervised methods on many downstream tasks. Although collapse can be explicitly prevented, many methods are costly, requiring large amounts of memory and large batch sizes.
Other methods are effective but rely on architectural tricks that are hard to interpret. Some studies have provided theoretical analyses of how asymmetric methods avoid collapse, but these analyses are far from complete, and such methods may not carry over to other self-supervised learning settings. Finally, redundancy-reduction methods avoid collapse by decorrelating the dimensions of the representation, so that the representation carries as much information as possible about its input. These methods perform well, learn meaningful representations, and preserve the variance of the representations while decorrelating them, but each relies on a single monolithic objective function. The VICReg work proposes instead to decompose the objective into three separate terms, each with a clear interpretation.
In this paper, the researchers propose a new self-supervised algorithm, VICReg (Variance-Invariance-Covariance Regularization), which learns image representations based on three simple principles (variance, invariance, and covariance), each with a clear objective and interpretation.
The variance principle independently constrains the variance of the embeddings along each dimension, a simple and effective way to prevent collapse. More precisely, the researchers use a hinge loss to constrain the standard deviation, computed over the batch dimension of the embeddings, to reach a fixed target. Unlike contrastive methods, no negative pairs are needed: the embeddings are implicitly encouraged to differ from one another without ever being compared directly.
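As a rough illustration, such a variance term might look like the following PyTorch-style sketch; the target value gamma and the small epsilon are assumptions made here for clarity, not values quoted in this article.

import torch

def variance_term(z: torch.Tensor, gamma: float = 1.0, eps: float = 1e-4) -> torch.Tensor:
    # z: a batch of embeddings with shape (batch_size, num_dims)
    # Standard deviation of each embedding dimension, computed over the batch.
    std = torch.sqrt(z.var(dim=0) + eps)
    # Hinge loss: only dimensions whose standard deviation falls below gamma are penalized.
    return torch.mean(torch.relu(gamma - std))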
The invariance principle uses the standard mean-squared Euclidean distance to enforce invariance across multiple views of the same image.
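Under the same assumptions, the invariance term can be read as an ordinary mean-squared error between the embeddings of two augmented views of the same images; this is a sketch rather than the authors' exact code.

import torch
import torch.nn.functional as F

def invariance_term(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    # z_a, z_b: embeddings of two augmented views of the same images, shape (batch_size, num_dims)
    # Mean-squared Euclidean distance between paired embeddings.
    return F.mse_loss(z_a, z_b)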
Finally, the covariance principle draws on the covariance criterion of Barlow Twins, which decorrelates the different dimensions of the learned representation, with the goal of spreading information across dimensions and avoiding dimensional collapse. The criterion mainly penalizes the off-diagonal coefficients of the covariance matrix of the embeddings.
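A covariance term in this spirit, and one way the three terms could be combined into a single loss, is sketched below; the weighting coefficients are illustrative assumptions, and variance_term and invariance_term refer to the sketches above.

import torch

def covariance_term(z: torch.Tensor) -> torch.Tensor:
    # z: a batch of embeddings with shape (batch_size, num_dims)
    n, d = z.shape
    z = z - z.mean(dim=0)              # center each dimension over the batch
    cov = (z.T @ z) / (n - 1)          # covariance matrix of the embeddings
    off_diag = cov - torch.diag(torch.diag(cov))
    # Penalize the squared off-diagonal coefficients, averaged over dimensions.
    return (off_diag ** 2).sum() / d

def vicreg_style_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0):
    # Weighted sum of the invariance, variance, and covariance terms;
    # the coefficients here are assumed defaults, not figures from this article.
    inv = invariance_term(z_a, z_b)
    var = variance_term(z_a) + variance_term(z_b)
    cov = covariance_term(z_a) + covariance_term(z_b)
    return lam * inv + mu * var + nu * cov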
In SimCLR, negative pairs are drawn from the batch, which makes the method heavily dependent on batch size; VICReg has no such dependency. Like Barlow Twins, VICReg does not require the weights of the two siamese branches to differ. Moreover, the VICReg architecture is symmetric: it needs neither the stop-gradient operation of SimSiam, nor the momentum encoder of BYOL, nor the predictor used by both. Unlike previous self-supervised representation learning methods, VICReg's loss function does not require any form of normalization of the embeddings, which keeps the method relatively simple.
Experimental results
The researchers evaluate the representations learned with VICReg on many downstream tasks to test its effectiveness, including linear and semi-supervised evaluation on ImageNet as well as other classification, detection, and instance segmentation tasks. They further show that adding the proposed variance regularization to more complex architectures and to other self-supervised representation learning methods improves training stability and downstream performance. In short, VICReg is a simple, effective, and interpretable method for preventing collapse in self-supervised joint-embedding learning.
Figure 1: Evaluation results on ImageNet.