Undergraduate Thesis
Published:
Author: Jianing Lin
Date: 2018.9 - 2019.5
Hardware: NVIDIA Jetson TX2 + PC (i5 8600K + GTX1080)
Language: C/C++/Python
Abstract
With the development of deep learning and the introduction of Mask R-CNN, high-accuracy instance segmentation has become possible, and this high-level information can aid visual navigation.
Visual odometry rests on the assumption of a static environment, but real environments almost always contain dynamic objects. We therefore propose a dynamic odometry based on Mask R-CNN. We use the masks generated by the network and the images from the stereo cameras to estimate the motion of objects in the scene, and we reject all dynamic key-points when estimating the camera pose.
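The key-point rejection step can be illustrated with a minimal sketch. It assumes the instance mask is an integer image (0 for background, k > 0 for instance k) and that a set of instance ids has already been judged dynamic by some motion test. The function name `reject_dynamic_keypoints` and its arguments are hypothetical and are not taken from the thesis code.

```python
import numpy as np

def reject_dynamic_keypoints(keypoints, instance_mask, dynamic_ids):
    """Keep only key-points that do not lie on an instance flagged as dynamic.

    keypoints:     (N, 2) array of (u, v) pixel coordinates (assumed layout).
    instance_mask: (H, W) integer array, 0 = background, k > 0 = instance k.
    dynamic_ids:   iterable of instance ids judged to be moving (assumed given).
    Returns the surviving key-points and the boolean keep mask.
    """
    keypoints = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    h, w = instance_mask.shape
    # Round to the nearest pixel and clamp to the image bounds.
    u = np.clip(np.rint(keypoints[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.rint(keypoints[:, 1]).astype(int), 0, h - 1)
    # Look up the instance id under each key-point (row index is v, column is u).
    ids = instance_mask[v, u]
    keep = ~np.isin(ids, list(dynamic_ids))
    return keypoints[keep], keep
```

Only the key-points that survive this filter would then be passed to the camera pose estimator, so that moving objects do not bias the result.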
Then, we reconstruct the 3D environment with semantic labels based on the refined odometry. We propose an instance-level semantic map, in which different instances of the same class can be distinguished.
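One way to picture an instance-level map is as a point cloud in which every 3D point carries the id of the instance it was observed on. The sketch below assumes a rectified stereo pair, a pinhole camera model, and a dense disparity image. The function name and the parameters (`fx`, `fy`, `cx`, `cy`, `baseline`) are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def backproject_labeled_points(disparity, instance_mask, fx, fy, cx, cy, baseline):
    """Back-project valid disparity pixels to 3D and attach instance ids.

    disparity:     (H, W) float array in pixels, <= 0 marks invalid pixels.
    instance_mask: (H, W) integer array of per-pixel instance ids.
    Returns (points, labels): (M, 3) camera-frame points and (M,) instance ids.
    """
    valid = disparity > 0
    v, u = np.nonzero(valid)               # pixel rows and columns, row-major order
    z = fx * baseline / disparity[valid]   # stereo depth: z = f * b / d
    x = (u - cx) * z / fx                  # pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    labels = instance_mask[valid]          # same row-major order as (v, u)
    return points, labels
```

Transforming these points with the estimated camera pose and accumulating them over frames would give a map in which two cars, for example, keep separate labels even though they share a class.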
Finally, we use Stereo R-CNN, a network developed from Mask R-CNN, to estimate the 3D bounding boxes of vehicles in the environment, and we use the information across frames to reconstruct an object pose map.
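Placing a detected vehicle in a global map amounts to composing the camera pose from the odometry with the object pose measured in the camera frame. The following is a minimal sketch under the assumption that both poses are available as 4x4 homogeneous transforms. The helper names `make_pose` and `object_pose_in_world` are hypothetical.

```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = np.asarray(rotation, dtype=float)
    pose[:3, 3] = np.asarray(translation, dtype=float)
    return pose

def object_pose_in_world(T_world_cam, T_cam_obj):
    """Compose the camera pose with the object pose seen from the camera.

    T_world_cam: camera pose in the world frame (assumed to come from odometry).
    T_cam_obj:   object pose in the camera frame (assumed to come from the
                 estimated 3D bounding box).
    """
    return T_world_cam @ T_cam_obj
```

Applying this to every detection in every frame yields a set of world-frame object poses. How detections are associated across frames is not covered by this sketch.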
We evaluate our algorithms on both the KITTI dataset and a real-world dataset. We also use data from both to reconstruct 3D semantic maps, including the instance-level semantic map, the static map, and the object pose map. The result is a visualization of the driving environment with detailed information, which is essential for self-driving applications.