End-to-End 6DoF Pose Estimation From Monocular RGB Images - CentraleSupélec
Journal Article, IEEE Transactions on Consumer Electronics, Year: 2021

End-to-End 6DoF Pose Estimation From Monocular RGB Images

Wenbin Zou
  • Role: Author
  • PersonId : 1084337
Di Wu
  • Role: Author
  • PersonId : 1095114
Shishun Tian
  • Role: Corresponding author
  • PersonId : 1095115
Canqun Xiang
  • Role: Author
  • PersonId : 1095116
Xia Li
  • Role: Author
  • PersonId : 948385

Abstract

We present a conceptually simple framework for 6DoF object pose estimation, aimed especially at autonomous driving scenarios. Our approach efficiently detects traffic participants in a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The proposed method, 6D-VNet, extends Mask R-CNN by adding customised heads for predicting each vehicle's finer class, rotation and translation. Unlike previous methods, it is trained end-to-end. Furthermore, we show that including translational regression in the joint losses is crucial for the 6DoF pose estimation task when object translation distance along the longitudinal axis varies significantly, e.g., in autonomous driving scenarios. Additionally, we incorporate the mutual information between traffic participants via a modified non-local block to capture the spatial dependencies among the detected objects. As opposed to the original non-local block implementation, the proposed weighting modification takes spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values. We evaluate our method on the challenging real-world Pascal3D+ dataset, and 6D-VNet reaches 1st place in the ApolloScape challenge 3D Car Instance task (Apolloscape, 2018), (Huang et al., 2018).
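
As a rough illustration of the joint-loss idea in the abstract, the sketch below combines a rotation term and a translation term into one weighted sum. It is not the authors' implementation: the quaternion representation, the Huber penalty, the function names and the weights `w_rot` and `w_trans` are all assumptions made for illustration.

```python
import math

def normalize(q):
    """Scale a 4-element quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def rotation_loss(q_pred, q_true):
    """Hypothetical rotation term: 1 - |<q_pred, q_true>| on unit quaternions.
    It is zero when both describe the same rotation (q and -q are equivalent)."""
    dot = sum(a * b for a, b in zip(normalize(q_pred), normalize(q_true)))
    return 1.0 - abs(dot)

def huber(residual, delta=1.0):
    """Huber penalty: quadratic near zero, linear for large residuals."""
    r = abs(residual)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)

def translation_loss(t_pred, t_true, delta=1.0):
    """Hypothetical translation term: mean Huber penalty over x, y, z."""
    return sum(huber(p - t, delta) for p, t in zip(t_pred, t_true)) / len(t_true)

def joint_pose_loss(q_pred, q_true, t_pred, t_true, w_rot=1.0, w_trans=1.0):
    """Weighted sum of both terms; the weights are illustrative assumptions."""
    return (w_rot * rotation_loss(q_pred, q_true)
            + w_trans * translation_loss(t_pred, t_true, delta=1.0))
```

In this sketch, setting `w_trans` to zero drops the translation error from the training signal entirely, which is the situation the abstract argues against when object depth varies widely along the longitudinal axis.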
Main file
Zou et al-2021-End-to-end 6DoF Pose Estimation from Monocular RGB Images.pdf (9.16 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03189018, version 1 (16-04-2021)

Cite

Wenbin Zou, Di Wu, Shishun Tian, Canqun Xiang, Xia Li, et al. End-to-End 6DoF Pose Estimation From Monocular RGB Images. IEEE Transactions on Consumer Electronics, 2021, 67 (1), pp.87-96. ⟨10.1109/TCE.2021.3057137⟩. ⟨hal-03189018⟩