In this type of method, it is first necessary to find N 2D points in the image coordinate system, and at the same time find N 3D points associated with them in the object coordinate system. Sometimes it is also necessary to obtain the associated weight of each pair of points.

2024/05/03 19:11:33

Released by Heart of the Machine

Author: Chen Hansheng (graduate student at Tongji University, research intern at Alibaba DAMO Academy)

Shortly after the CVPR 2022 awards were announced, Chen Hansheng, a graduate student at Tongji University and research intern at Alibaba DAMO Academy, walks us through the work that won the Best Student Paper Award.

This article explains our work "EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation" which won the CVPR 2022 Best Student Paper Award. The problem studied in this paper is to estimate the pose of an object in 3D space based on a single image.

Among existing methods, pose estimators based on PnP geometric optimization often use a deep network to extract 2D-3D correspondences. However, because the optimal pose solution is not differentiable during backpropagation, it is difficult to train the network stably end-to-end with a loss based on the pose error. The 2D-3D correspondences therefore rely on supervision from surrogate losses, which is not an optimal training objective for pose estimation. To solve this problem, we propose the theoretically grounded EPro-PnP module, which outputs a probability density over poses instead of a single optimal pose. Replacing the non-differentiable optimal pose with a differentiable probability density makes stable end-to-end training possible. EPro-PnP is highly versatile and suits a variety of tasks and data. It can be used to improve existing PnP-based pose estimation methods, and its flexibility can also be used to train new networks. In a more general sense, EPro-PnP essentially brings the softmax commonly used in classification into the continuous domain, and can in theory be extended to train general models with nested optimization layers.

  • Paper link: https://arxiv.org/abs/2203.13254
  • Code link: https://github.com/tjiiv-cprg/EPro-PnP

1. Preface

We study a classic problem in 3D vision: locating 3D objects from a single RGB image. Specifically, given an image containing the projection of a 3D object, our goal is to determine the rigid body transformation from the object coordinate system to the camera coordinate system. This rigid body transformation is called the pose of the object, denoted as y, and contains two parts: 1) a position component, which can be represented by a 3x1 translation vector t, and 2) an orientation component, which can be represented by a 3x3 rotation matrix R.

To address this problem, existing methods can be divided into two categories: explicit and implicit. The explicit method, also called direct pose prediction, uses a feedforward neural network (FFN) to directly output each component of the object's pose, usually by 1) predicting the depth of the object, 2) finding the 2D projection of the object's center point on the image, and 3) predicting the orientation of the object (the specific handling of orientation may be more complicated). Using image data annotated with the true pose of the object, a loss function can be designed to directly supervise the pose prediction, which makes end-to-end training of the network straightforward. However, such networks lack interpretability and are prone to overfitting on smaller datasets. In 3D object detection tasks, explicit methods dominate, especially on larger datasets (such as nuScenes).

The implicit method estimates the pose through geometric optimization; its most typical representative is PnP-based pose estimation. In this type of method, one first finds N 2D points in the image coordinate system (the 2D coordinates of the i-th point are denoted $x_i^{2D}$), together with the N associated 3D points in the object coordinate system (the 3D coordinates of the i-th point are denoted $x_i^{3D}$). Sometimes an association weight is also needed for each pair of points (the weight of the i-th pair is denoted $w_i^{2D}$). Under the perspective projection constraint, these N weighted 2D-3D correspondences implicitly define the optimal pose of the object. Specifically, we can find the object pose $y^*$ that minimizes the reprojection error:

$$y^* = \arg\min_y \frac{1}{2} \sum_{i=1}^{N} \left\| w_i^{2D} \circ \left( \pi(R x_i^{3D} + t) - x_i^{2D} \right) \right\|^2$$

where $\pi(\cdot)$ represents the camera projection function containing the intrinsic parameters, and $\circ$ represents the element-wise product. The PnP method is commonly used in 6-degree-of-freedom pose estimation tasks where the object geometry is known.
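As a rough illustration of the weighted reprojection error that PnP minimizes, the NumPy sketch below evaluates the cost for a given pose. The pinhole intrinsics, the toy correspondences and the function names `project` and `reprojection_cost` are assumptions made for this example, not part of the EPro-PnP code base.

```python
import numpy as np

def project(points_cam, K):
    """Pinhole projection of (N, 3) camera-frame points with a 3x3 intrinsic matrix K."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_cost(R, t, x3d, x2d, w2d, K):
    """0.5 * sum_i || w_i * (pi(R x_i + t) - x_i_2d) ||^2 with element-wise weights."""
    residual = w2d * (project(x3d @ R.T + t, K) - x2d)
    return 0.5 * np.sum(residual ** 2)

# Toy data (all values are made up): intrinsics, a ground-truth pose,
# and noiseless correspondences generated from that pose.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R_gt = np.eye(3)
t_gt = np.array([0.1, -0.2, 4.0])
rng = np.random.default_rng(0)
x3d = rng.uniform(-0.5, 0.5, size=(8, 3))
x2d = project(x3d @ R_gt.T + t_gt, K)
w2d = np.ones((8, 2))

# The cost vanishes at the pose that generated the 2D points and grows away from it.
cost_at_gt = reprojection_cost(R_gt, t_gt, x3d, x2d, w2d, K)
cost_shifted = reprojection_cost(R_gt, t_gt + np.array([0.05, 0.0, 0.0]), x3d, x2d, w2d, K)
```

A PnP solver would search over R and t for the minimum of this cost; here the cost is only evaluated at two candidate poses.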

The PnP-based method also requires a feedforward network to predict the set of 2D-3D correspondences $X = \{x_i^{3D}, x_i^{2D}, w_i^{2D} \mid i = 1, \dots, N\}$. Compared with direct pose prediction, this combination of deep learning and traditional geometric vision algorithms is highly interpretable, and its generalization performance is relatively stable. However, the training methods used in previous work are flawed. Many methods construct a surrogate loss function to supervise the intermediate result X, which is not an optimal objective for the pose. For example, if the shape of the object is known, 3D key points of the object can be selected in advance, and the network is then trained to find the corresponding 2D projections. This also means that the surrogate loss can only learn some of the variables in X and is therefore not flexible enough. What if we do not know the shapes of the objects in the training set and need to learn everything in X from scratch?

The advantages of explicit and implicit methods are complementary. If the network can be trained end-to-end to learn the correspondence set X by supervising the pose output by PnP, the advantages of the two can be combined. To this end, some recent studies backpropagate through the PnP layer using implicit function differentiation. However, the argmin function in PnP is discontinuous and non-differentiable at certain points, making backpropagation unstable and direct training difficult to converge.

2. EPro-PnP method introduction

EPro-PnP module

To achieve stable end-to-end training, we propose end-to-end probabilistic PnP, namely EPro-PnP. The basic idea is to regard the implicit pose as a probability distribution, whose probability density $p(y \mid X)$ is differentiable with respect to X. First, the likelihood function of the pose is defined based on the reprojection error:

$$p(X \mid y) = \exp\left( -\frac{1}{2} \sum_{i=1}^{N} \left\| f_i(y) \right\|^2 \right)$$

where $f_i(y) = w_i^{2D} \circ \left( \pi(R x_i^{3D} + t) - x_i^{2D} \right)$ is the weighted reprojection error of the i-th pair of points at pose y.

If an uninformative prior is used, the posterior probability density of the pose is the normalized likelihood function:

$$p(y \mid X) = \frac{\exp\left( -\frac{1}{2} \sum_{i=1}^{N} \left\| f_i(y) \right\|^2 \right)}{\int \exp\left( -\frac{1}{2} \sum_{i=1}^{N} \left\| f_i(y) \right\|^2 \right) \mathrm{d}y}$$

Note that this formula closely resembles the softmax commonly used in classification, $\mathrm{Softmax}(s)_j = \exp(s_j) / \sum_k \exp(s_k)$. In fact, the essence of EPro-PnP is to move the softmax from the discrete domain to the continuous domain, replacing the sum $\sum$ with the integral $\int$.
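To make the softmax analogy concrete, the sketch below discretizes a hypothetical one-dimensional "pose" y on a grid. The residuals $f_i(y) = w_i (y - x_i)$ and all numbers are assumptions chosen for illustration; a real pose has six degrees of freedom. A discrete softmax over the negative cost gives a probability mass per grid cell, and dividing by the cell width approximates the continuous density.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D array."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Hypothetical 1D "pose" y with residuals f_i(y) = w_i * (y - x_i).
x = np.array([0.9, 1.0, 1.2])
w = np.array([2.0, 3.0, 1.0])

ys = np.linspace(-2.0, 4.0, 6001)   # grid over the pose domain
dy = ys[1] - ys[0]
cost = 0.5 * np.sum((w[None, :] * (ys[:, None] - x[None, :])) ** 2, axis=1)

# Softmax of the negative cost is the probability mass of each grid cell;
# dividing by the cell width turns it into an approximate density p(y | X).
mass = softmax(-cost)
density = mass / dy

total = np.sum(density) * dy        # Riemann sum of the density over the grid
y_mode = ys[np.argmax(density)]     # the mode coincides with the PnP argmin
```

As the grid is refined, the sum in the softmax denominator (times the cell width) approaches the integral in the posterior density.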

KL divergence loss

During training, if the true pose $y_{gt}$ of the object is known, a target pose distribution $t(y)$ can be defined. The KL divergence $D_{KL}(t(y) \,\|\, p(y \mid X))$ can then be used as the loss function for training the network (because $t(y)$ is fixed, it can also be understood as a cross-entropy loss). When the target $t(y)$ approaches the Dirac delta function, the loss based on KL divergence simplifies to the following form:

$$L_{KL} = \frac{1}{2} \sum_{i=1}^{N} \left\| f_i(y_{gt}) \right\|^2 + \log \int \exp\left( -\frac{1}{2} \sum_{i=1}^{N} \left\| f_i(y) \right\|^2 \right) \mathrm{d}y$$

with $f_i(y)$ the weighted reprojection error of the i-th pair of points at pose y. Its derivative is:

$$\frac{\partial L_{KL}}{\partial (\cdot)} = \frac{\partial}{\partial (\cdot)} \frac{1}{2} \sum_{i=1}^{N} \left\| f_i(y_{gt}) \right\|^2 - \mathbb{E}_{y \sim p(y \mid X)} \frac{\partial}{\partial (\cdot)} \frac{1}{2} \sum_{i=1}^{N} \left\| f_i(y) \right\|^2$$

The loss function consists of two terms. The first term (denoted $L_{tgt}$) tries to reduce the reprojection error at the true pose $y_{gt}$. The second term (denoted $L_{pred}$) tries to increase the reprojection error everywhere under the predicted pose distribution $p(y \mid X)$. The two act in opposite directions, and the effect is shown in the figure below (left). As an analogy, the right side shows the categorical cross-entropy loss commonly used when training classification networks.

[Figure: left, the effect of the two loss terms on the pose distribution; right, the analogous categorical cross-entropy loss for classification]
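The two-term structure of the loss can be checked numerically in a toy setting. The sketch below again assumes a hypothetical one-dimensional pose with residuals $f_i(y) = w_i (y - x_i)$ and replaces the integral with a Riemann sum on a grid; the names `cost` and `kl_loss` are illustrative only.

```python
import numpy as np

def cost(y, x, w):
    """0.5 * sum_i (w_i * (y - x_i))^2 for a scalar or an array of 1D poses y."""
    y = np.atleast_1d(y)
    return 0.5 * np.sum((w[None, :] * (y[:, None] - x[None, :])) ** 2, axis=1)

def kl_loss(y_gt, x, w, ys):
    """Cost at the true pose plus the log-integral of exp(-cost) over the grid ys."""
    dy = ys[1] - ys[0]
    l_tgt = cost(y_gt, x, w)[0]                      # first term: error at y_gt
    neg = -cost(ys, x, w)
    # second term: log of a Riemann sum, computed in a numerically stable way
    l_pred = neg.max() + np.log(np.sum(np.exp(neg - neg.max())) * dy)
    return l_tgt + l_pred

ys = np.linspace(-5.0, 7.0, 12001)
w = np.array([2.0, 3.0, 1.0])
y_gt = 1.0

# Correspondences whose distribution peaks near y = 2 (far from the true pose) ...
loss_far = kl_loss(y_gt, np.array([1.9, 2.0, 2.2]), w, ys)
# ... versus correspondences whose distribution peaks near the true pose y = 1.
loss_near = kl_loss(y_gt, np.array([0.9, 1.0, 1.2]), w, ys)
```

In this toy case the loss equals the negative log posterior density at the true pose, so it is lower when the predicted distribution concentrates near $y_{gt}$.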

Monte Carlo pose loss

Note that the second term of the KL loss (denoted $L_{pred}$) contains an integral. This integral has no analytical solution, so it must be approximated numerically. Balancing generality, accuracy and computational efficiency, we use the Monte Carlo method, approximating the pose distribution through sampling.

Specifically, we use an importance sampling algorithm, Adaptive Multiple Importance Sampling (AMIS), to draw K pose samples $y_j$ with importance weights $v_j$. We call this process Monte Carlo PnP.

Accordingly, the second term $L_{pred}$ can be approximated as a function of the weights $v_j$, through which gradients can be backpropagated:

$$L_{pred} \approx \log \frac{1}{K} \sum_{j=1}^{K} v_j, \qquad v_j = \frac{\exp\left( -\frac{1}{2} \sum_{i=1}^{N} \left\| f_i(y_j) \right\|^2 \right)}{q(y_j)}$$

where $f_i(y)$ is the weighted reprojection error of the i-th pair of points and $q(y_j)$ is the probability density of the proposal distribution at the sample $y_j$.
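The Monte Carlo estimate can be sketched with plain importance sampling. Note that this example uses a single fixed Gaussian proposal rather than AMIS, and a hypothetical one-dimensional pose with residuals $f_i(y) = w_i (y - x_i)$; the proposal parameters and sample count are arbitrary choices for illustration.

```python
import numpy as np

def cost(y, x, w):
    """0.5 * sum_i (w_i * (y - x_i))^2 for an array of 1D poses y."""
    return 0.5 * np.sum((w[None, :] * (y[:, None] - x[None, :])) ** 2, axis=1)

def gaussian_pdf(y, mean, std):
    """Density of a 1D normal distribution, used here as the proposal q(y)."""
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

x = np.array([0.9, 1.0, 1.2])
w = np.array([2.0, 3.0, 1.0])

rng = np.random.default_rng(0)
num_samples = 20000
q_mean, q_std = 1.0, 0.5                       # assumed fixed proposal (not AMIS)
y_samples = rng.normal(q_mean, q_std, size=num_samples)

# Importance weights v_j = exp(-cost(y_j)) / q(y_j), then log of their mean.
v = np.exp(-cost(y_samples, x, w)) / gaussian_pdf(y_samples, q_mean, q_std)
l_pred_mc = np.log(np.mean(v))

# Reference value from a dense Riemann sum, feasible only because y is 1D here.
ys = np.linspace(-5.0, 7.0, 12001)
l_pred_grid = np.log(np.sum(np.exp(-cost(ys, x, w))) * (ys[1] - ys[0]))
```

AMIS differs in that it adapts the proposal over several iterations and reweights all samples against the mixture of proposals, which matters when the pose distribution is far from a single Gaussian.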

The figure below visualizes the pose sampling:

[Figure: visualization of the sampled poses]

Derivative regularization for PnP solver

Although the Monte Carlo PnP loss can train the network to produce a high-quality pose distribution, a PnP optimization solver is still needed at inference time to obtain the optimal pose $y^*$. The commonly used Gauss-Newton algorithm and its variants solve for $y^*$ through iterative optimization, where each iteration increment is determined by the first- and second-order derivatives of the cost function. To bring the PnP solution $y^*$ closer to the true value $y_{gt}$, the derivatives of the cost function can be regularized. The regularization loss is designed as follows:

$$L_{reg} = l(y^* + \Delta y, \; y_{gt})$$

Here, $\Delta y$ is the Gauss-Newton iteration increment, which depends on the first- and second-order derivatives of the cost function and can be backpropagated through. $l(\cdot, \cdot)$ represents a distance metric: smooth L1 for the position and cosine similarity for the orientation. When $y^*$ and $y_{gt}$ are inconsistent, the loss function encourages the iteration increment $\Delta y$ to point toward the true value.
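The sketch below illustrates the Gauss-Newton increment and the position part of the regularization loss on a toy problem. It assumes a hypothetical translation-only pose y in R^2 with linear residuals $f_i(y) = w_i \circ (p_i + y - x_i)$, covers only the smooth L1 term (not the cosine similarity used for orientation), and does not show gradient flow, since plain NumPy has no automatic differentiation.

```python
import numpy as np

def smooth_l1(a, b, beta=1.0):
    """Smooth L1 distance summed over components."""
    d = np.abs(a - b)
    return np.sum(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta))

def gauss_newton_step(residual, jacobian):
    """Increment -(J^T J)^{-1} J^T f, built from the gradient J^T f
    and the Gauss-Newton approximation J^T J of the Hessian."""
    return -np.linalg.solve(jacobian.T @ jacobian, jacobian.T @ residual)

# Hypothetical translation-only pose y in R^2: f_i(y) = w_i * (p_i + y - x_i).
p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = np.array([[1.0, 1.0], [2.0, 0.5], [0.5, 2.0]])
y_gt = np.array([0.5, -0.25])

def residual_and_jacobian(y, x):
    f = (w * (p + y - x)).ravel()
    J = np.vstack([np.diag(w_i) for w_i in w])   # d f_i / d y = diag(w_i)
    return f, J

y_cur = np.array([0.0, 0.0])                      # current solver iterate

# Perfect correspondences: the increment lands exactly on the true pose.
f, J = residual_and_jacobian(y_cur, p + y_gt)
delta_y = gauss_newton_step(f, J)
reg_loss = smooth_l1(y_cur + delta_y, y_gt)

# Biased correspondences: the increment misses, and the loss becomes positive.
f_bad, J_bad = residual_and_jacobian(y_cur, p + y_gt + np.array([0.3, 0.0]))
delta_y_bad = gauss_newton_step(f_bad, J_bad)
reg_loss_bad = smooth_l1(y_cur + delta_y_bad, y_gt)
```

In training, minimizing such a loss with respect to the correspondences would push the increment computed from them toward the true pose.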

3. EPro-PnP-based pose estimation network

We use different networks for the two subtasks of 6-degree-of-freedom pose estimation and 3D object detection. For 6-degree-of-freedom pose estimation, we slightly modify the CDPN network (ICCV 2019) and train it with EPro-PnP to conduct ablation studies. For 3D object detection, we design a brand-new deformable correspondence detection head on top of FCOS3D (ICCVW 2021) to show that EPro-PnP can train the network to directly learn all 2D-3D points and association weights without knowledge of the object shape, demonstrating the flexibility of EPro-PnP in applications.

Dense correspondence network for 6-degree-of-freedom pose estimation

[Figure: structure of the dense correspondence network]

The network structure is shown in the figure above; only the output layer is modified relative to the original CDPN. The original CDPN uses the detected 2D box of the object to crop out a region of the image, which is fed into a ResNet34 backbone. It decouples position and orientation into two branches: the position branch uses the explicit method of direct prediction, while the orientation branch uses the implicit method of dense correspondences and PnP. To study EPro-PnP, the modified network retains only the dense correspondence branch, whose output is a 3-channel 3D coordinate map and a 2-channel correspondence weight map, where the weights pass through a spatial softmax and a global weight scaling. The spatial softmax normalizes the weights $w_i^{2D}$ so that they behave like an attention map and can focus on relatively important regions. Experiments show that weight normalization is also key to stable convergence. The global weight scaling reflects the concentration of the pose distribution $p(y \mid X)$. The network can be trained with only the Monte Carlo pose loss of EPro-PnP; derivative regularization and, when the object shape is known, an additional 3D coordinate regression loss can be added on top.
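One way the weight normalization described above could look is sketched below. The exact formulation in the released code may differ; the function name `normalize_weights`, the log-domain parameterization of the global scale and the tensor shapes are assumptions for illustration.

```python
import numpy as np

def normalize_weights(weight_logits, log_scale):
    """Spatial softmax per channel, followed by a per-channel global scale.

    weight_logits: (2, H, W) raw weight map, one channel per image axis.
    log_scale:     (2,) global scale in the log domain (assumed parameterization).
    """
    channels, height, width = weight_logits.shape
    flat = weight_logits.reshape(channels, -1)
    flat = flat - flat.max(axis=1, keepdims=True)      # numerical stability
    attention = np.exp(flat)
    attention /= attention.sum(axis=1, keepdims=True)  # each channel sums to 1
    scaled = attention * np.exp(log_scale)[:, None]    # global weight scaling
    return scaled.reshape(channels, height, width)

rng = np.random.default_rng(0)
weight_logits = rng.normal(size=(2, 4, 4))             # made-up network output
log_scale = np.array([0.0, np.log(3.0)])

weights = normalize_weights(weight_logits, log_scale)
channel_sums = weights.sum(axis=(1, 2))
```

After normalization, the spatial pattern of the weights only expresses relative importance, while the global scale alone controls how sharply the pose distribution is concentrated.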

Deformable correspondence network for 3D object detection

[Figure: structure of the deformable correspondence network]

The network structure is shown in the figure above. Overall, it builds on the FCOS3D detector and borrows from the network design of Deformable DETR. From FCOS3D, the centerness and classification layers are retained, while the original pose prediction layer is replaced with object embedding and reference point layers that generate object queries. Following Deformable DETR, we obtain the 2D sampling positions by predicting offsets relative to the reference point (and thus obtain $x_i^{2D}$). The sampled features are aggregated into object features through an attention operation and used to predict object-level results (3D score, weight scale, 3D box size, etc.). In addition, the sampled feature of each point is added to the object embedding and processed by self-attention to output the 3D coordinates $x_i^{3D}$ and association weight $w_i^{2D}$ of each point. The predicted $x_i^{2D}$, $x_i^{3D}$ and $w_i^{2D}$ can all be learned through EPro-PnP's Monte Carlo pose loss alone, which converges and achieves high accuracy without additional regularization. On this basis, the derivative regularization loss and auxiliary losses can be added to further improve accuracy.

4. Experimental results

In this type of method, it is first necessary to find N 2D points in the image coordinate system, and at the same time find N 3D points associated with them in the object coordinate system. Sometimes it is also necessary to obtain the associated weight of each pair of points. - DayDayNews. The 6-degree-of-freedom pose estimation task


We experiment on the LineMOD dataset and compare strictly against the CDPN baseline; the main results are shown above. Adding the EPro-PnP loss for end-to-end training improves accuracy significantly (+12.70). Adding the derivative regularization loss improves accuracy further. On top of that, initializing from the trained original CDPN and training for more epochs (keeping the total number of epochs consistent with the complete three-stage training of the original CDPN) improves accuracy again. Part of the benefit of pre-training with CDPN comes from the additional mask supervision used when CDPN is trained.


The figure above compares EPro-PnP with various leading methods. Although it is built on the older CDPN, EPro-PnP comes close to the state of the art in accuracy. Its architecture is also simple: it relies entirely on PnP for pose estimation and requires no additional explicit depth estimation or pose refinement, so it has an efficiency advantage as well.

3D object detection task


We experiment on the nuScenes dataset; the comparison with other methods is shown in the figure above. EPro-PnP not only improves significantly over FCOS3D, but also surpasses PGD, which was the state of the art at the time and is another improved version of FCOS3D. More importantly, EPro-PnP is currently the only method on the nuScenes dataset that estimates pose by geometric optimization. Because nuScenes is large, end-to-end trained direct pose estimation networks already perform well on it, and our results show that a model based on geometric optimization, when trained end to end, can perform even better on large datasets.

Visualization analysis


The figure above shows the predictions of the dense correspondence network trained with EPro-PnP. The correspondence weight map highlights important regions of the image, much like an attention mechanism. Analysis of the loss function shows that the highlighted regions are those with low reprojection uncertainty and high sensitivity to pose changes.

The results of 3D object detection are shown in the figure above. The upper-left view shows the 2D point positions sampled by the deformable correspondence network. Red indicates points with a larger horizontal (X) component, and green indicates points with a larger vertical (Y) component. The green points are generally located at the upper and lower ends of the object; their main role is to infer the object's distance from its height. This behavior was not hand-specified and emerged entirely from training. The picture on the right shows the detection results in a top view, where the blue cloud represents the density of the object's center position and reflects the localization uncertainty. In general, distant objects have larger localization uncertainty than nearby ones.

Another important advantage of EPro-PnP is its ability to represent orientation ambiguity by predicting complex multimodal distributions. As shown in the figure above, Barrier often shows two peaks 180° apart because of the object's rotational symmetry; Cone has no specific orientation, so the predictions are spread over all directions; Pedestrian is not fully rotationally symmetric, but when the image is unclear it is hard to tell front from back, so two peaks sometimes appear. Because of this probabilistic formulation, EPro-PnP does not require any special handling of symmetric objects in the loss function.
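Two observations in this subsection, that points at the top and bottom of an object help estimate its distance and that distant objects are localized less precisely, are both consistent with the basic pinhole camera relation Z = f * H / h. The short sketch below illustrates that relation only. It is not the network's actual computation, and the focal length, object height, and function names are hypothetical.

```python
def depth_from_height(focal_px, object_height_m, image_height_px):
    """Pinhole relation Z = f * H / h for an upright object."""
    if image_height_px <= 0:
        raise ValueError("image_height_px must be positive")
    return focal_px * object_height_m / image_height_px

def depth_error_per_pixel(focal_px, object_height_m, image_height_px):
    """Depth change caused by underestimating the image height by one pixel."""
    return (depth_from_height(focal_px, object_height_m, image_height_px - 1.0)
            - depth_from_height(focal_px, object_height_m, image_height_px))

focal, height = 1000.0, 1.5                       # hypothetical camera and object
near = depth_from_height(focal, height, 150.0)    # object spans 150 px in the image
far = depth_from_height(focal, height, 50.0)      # object spans 50 px in the image
near_err = depth_error_per_pixel(focal, height, 150.0)
far_err = depth_error_per_pixel(focal, height, 50.0)
print(near, far, near_err, far_err)
```

With these numbers the object is 10 m away when it spans 150 pixels and 30 m away when it spans 50 pixels, and the same one-pixel error in the measured height produces a much larger depth error for the distant object.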

5. Summary

EPro-PnP replaces the original non-differentiable optimal pose with a differentiable probability density over poses, so that a pose estimation network based on PnP geometric optimization can be trained end to end in a stable and flexible way. EPro-PnP can be applied to general 3D object pose estimation problems. Even when the 3D object geometry is unknown, the 2D-3D associated points of the object can be learned through end-to-end training. EPro-PnP therefore broadens the possibilities of network design, for example our proposed deformable correspondence network, which was previously impossible to train. In addition, EPro-PnP can be used directly to improve existing PnP-based pose estimation methods, unlocking the potential of existing networks through end-to-end training and improving pose estimation accuracy. In a more general sense, EPro-PnP essentially brings the common classification softmax into the continuous domain. It can be used for other 3D vision problems based on geometric optimization, and can in theory be extended to train general models with nested optimization layers.
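The softmax analogy can be written out side by side. The notation below is an assumption made for illustration and does not come from this article: z_k are classification logits, y is a pose, X denotes the predicted 2D-3D points and weights, and C(y) is the sum of squared weighted reprojection errors at pose y.

```latex
\begin{align*}
% Classification: cross-entropy with a softmax over K discrete classes
L_{\mathrm{cls}} &= -z_{\mathrm{gt}} + \log \sum_{k=1}^{K} \exp(z_k) \\
% Pose: the logit becomes -C(y)/2 and the sum becomes an integral over poses
p(y \mid X) &= \frac{\exp\left(-\tfrac{1}{2} C(y)\right)}
                    {\int \exp\left(-\tfrac{1}{2} C(y')\right) \mathrm{d}y'} \\
L_{\mathrm{pose}} &= -\log p(y_{\mathrm{gt}} \mid X)
  = \tfrac{1}{2} C(y_{\mathrm{gt}})
  + \log \int \exp\left(-\tfrac{1}{2} C(y)\right) \mathrm{d}y
\end{align*}
```

Read this way, the negative half cost plays the role of a logit, and the integral over poses plays the role of the sum over classes. Because the integral has no closed form in general, it has to be approximated, which is where the Monte Carlo estimate comes in.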
