Monocular-Depth-Estimation-vis-Transfer-learning-Winter-school-

This project is learning material for the 2021 Winter School on SLAM in Deformable Environments. The cited reference is: Alhashim I, Wonka P. High quality monocular depth estimation via transfer learning. arXiv preprint arXiv:1812.11941, 2018. https://arxiv.org/abs/1812.11941

The network part of this code has been deleted so that it can serve as homework for the winter school. The preliminary operations, including the training data and data loading, and the subsequent operations are provided. Readers can complete the following steps:

Adding the network part based on the following structure

Encoder and decoder: the input RGB image is encoded into a feature vector using the DenseNet-169 network [1] pretrained on ImageNet [2].

Network Architecture
[Image]

DenseNet-169 network
[Image]

Decoder sub-block
[Image]
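As a starting point for the homework, the decoder sub-block can be sketched in PyTorch as below. This is a minimal sketch and not the removed original code: it assumes the block upsamples bilinearly, concatenates the encoder skip feature, and applies two 3x3 convolutions with leaky ReLU (slope 0.2), loosely following the description in the cited paper. The class name `UpBlock` and its parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpBlock(nn.Module):
    """Hypothetical decoder sub-block: upsample, concatenate skip, two 3x3 convs."""

    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        # The first convolution sees the upsampled input plus the encoder skip feature.
        self.conv_a = nn.Conv2d(in_channels + skip_channels, out_channels,
                                kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(out_channels, out_channels,
                                kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, skip):
        # Bilinear upsampling to the spatial size of the skip feature map.
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear",
                          align_corners=True)
        # Channel-wise concatenation with the encoder feature of the same size.
        x = torch.cat([x, skip], dim=1)
        x = self.act(self.conv_a(x))
        return self.act(self.conv_b(x))
```

A full decoder would chain several such blocks, each fed with the previous block's output and an intermediate feature map of the DenseNet-169 encoder. One possible encoder is the `features` module of torchvision's `densenet169`; which layers to tap for the skip connections is left as part of the exercise.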

Improving the loss function

The provided code uses only the point-wise L1 loss defined on the depth values:
[Image]
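Written out in the notation of the cited paper, and assuming the provided code follows that definition, the point-wise L1 loss between the ground-truth depth $y$ and the predicted depth $\hat{y}$ over $n$ pixels is:

```latex
L_{depth}(y, \hat{y}) = \frac{1}{n} \sum_{p=1}^{n} \left| y_p - \hat{y}_p \right|
```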

Readers are encouraged to test the other loss terms from the cited reference, namely the difference in image gradients and the structural similarity (SSIM). Experimenting with other loss functions is also encouraged.
[Image]
[Image]
[Image]
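The L1 and gradient terms can be prototyped framework-independently before porting them to the training code. The NumPy sketch below is an illustration under stated assumptions, not the provided code: the function names are hypothetical, gradients are approximated by forward differences, the SSIM term is left out, and the weight of 0.1 on the depth term is the value recalled from the cited paper and should be treated as a tunable assumption.

```python
import numpy as np


def l1_depth_loss(y_true, y_pred):
    """Mean absolute difference between two depth maps of shape (H, W)."""
    return float(np.mean(np.abs(y_true - y_pred)))


def gradient_loss(y_true, y_pred):
    """Mean absolute difference of horizontal and vertical depth gradients."""
    err = y_true - y_pred
    # Forward differences are linear, so the gradient of the error
    # equals the difference of the two gradients.
    gx = np.abs(np.diff(err, axis=1))  # horizontal direction
    gy = np.abs(np.diff(err, axis=0))  # vertical direction
    return float(gx.mean() + gy.mean())


def combined_loss(y_true, y_pred, lam=0.1):
    """Weighted depth term plus gradient term (SSIM term omitted here)."""
    return lam * l1_depth_loss(y_true, y_pred) + gradient_loss(y_true, y_pred)
```

A constant offset between prediction and ground truth is penalized only by the L1 term, since its gradients are zero, whereas blurred or misplaced depth edges are penalized mainly by the gradient term. This is why the two terms complement each other.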

Adding data augmentation

Data augmentation is not provided in this code. Readers can test classical augmentation approaches for image datasets, including flip, rotation, scale, crop, translation, Gaussian noise, and salt-and-pepper noise, to make full use of the provided dataset.
[Image] [Image] [Image]
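Three of the augmentations listed above can be sketched with NumPy as follows. This is a minimal sketch, assuming images are float arrays of shape (H, W, 3) with values in [0, 1] and depth maps have shape (H, W); the function names and default parameters are hypothetical. Geometric transforms such as the flip are applied to the image and its depth map together so that they stay aligned, while noise is applied to the image only.

```python
import numpy as np


def random_flip(image, depth, rng):
    """Horizontally flip image and depth together with probability 0.5."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
        depth = depth[:, ::-1]
    return image, depth


def add_gaussian_noise(image, rng, sigma=0.02):
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def add_salt_and_pepper(image, rng, amount=0.01):
    """Set roughly a fraction `amount` of pixels to 0 (pepper) or 1 (salt)."""
    out = image.copy()
    # One random value per pixel, so all channels of a pixel change together.
    mask = rng.random(image.shape[:2])
    out[mask < amount / 2] = 0.0
    out[mask > 1 - amount / 2] = 1.0
    return out
```

A random generator such as `rng = np.random.default_rng(0)` can be passed in to make the augmentation reproducible. Rotation, scaling, cropping, and translation follow the same pattern: the identical geometric transform has to be applied to both the image and the depth map.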

Designing a new encoder and decoder network (optional)

Designing a new network with this encoder-decoder structure is also encouraged. Readers can design their own network to achieve better performance.

References

[1] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.

[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.