This project serves as learning material for the 2021 Winter School on SLAM in Deformable Environments. The cited reference is: Alhashim I., Wonka P. High Quality Monocular Depth Estimation via Transfer Learning. arXiv preprint arXiv:1812.11941, 2018. https://arxiv.org/abs/1812.11941
The network parts of this code have been removed as a potential homework for the winter school. The earlier stages, including the training data and the data loading, and the later stages are provided. The readers can complete the following steps:
Encoder and decoder: the input RGB image is encoded into a feature vector using the DenseNet-169 network [1] pretrained on ImageNet [2]; the decoder then reconstructs the depth map from this representation through successive upsampling layers, as sketched below.
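Since the network itself is the exercise, the following is only a minimal PyTorch sketch of the encoder/decoder split, assuming the torchvision implementation of DenseNet-169 (an older `pretrained=True` API); the decoder's class name, channel widths, and plain bilinear upsampling without the paper's skip connections are our own simplifications, not the removed solution:

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class Encoder(nn.Module):
    """DenseNet-169 backbone pretrained on ImageNet; returns the final feature map."""
    def __init__(self):
        super().__init__()
        self.features = models.densenet169(pretrained=True).features

    def forward(self, x):
        return self.features(x)  # (B, 1664, H/32, W/32) for a (B, 3, H, W) input

class Decoder(nn.Module):
    """Simplified upsampling decoder; the cited reference also uses skip connections."""
    def __init__(self, in_ch=1664):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, 832, 3, padding=1)
        self.conv2 = nn.Conv2d(832, 416, 3, padding=1)
        self.conv3 = nn.Conv2d(416, 208, 3, padding=1)
        self.out = nn.Conv2d(208, 1, 3, padding=1)

    def forward(self, x):
        for conv in (self.conv1, self.conv2, self.conv3):
            # double the spatial resolution, then refine with a convolution
            x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
            x = F.leaky_relu(conv(x))
        return self.out(x)  # depth at 1/4 of the input resolution in this sketch
```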
The provided code is based only on the point-wise L1 loss defined on the depth values:
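$$L_{\text{depth}}(y, \hat{y}) = \frac{1}{n} \sum_{p=1}^{n} \left| y_p - \hat{y}_p \right|$$

where $y_p$ and $\hat{y}_p$ denote the ground-truth and predicted depth at pixel $p$, and $n$ is the number of pixels.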
The readers are encouraged to test the other loss terms in the cited reference, including differences in image gradients and structural similarity (SSIM); see the sketch below. Other loss functions are also encouraged.
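A minimal PyTorch sketch of such a combined loss, assuming depth maps are (B, 1, H, W) tensors scaled to [0, 1]. The box-filter SSIM is a simplification of the usual Gaussian-window formulation, and the weighting λ = 0.1 with the clipped (1 − SSIM)/2 term follows the form in the cited reference; the helper names are our own:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, window=3):
    """Simplified SSIM map using a box filter instead of a Gaussian window."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, 1, padding=pad)
    mu_y = F.avg_pool2d(y, window, 1, padding=pad)
    sigma_x = F.avg_pool2d(x * x, window, 1, padding=pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, 1, padding=pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, 1, padding=pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return num / den

def combined_loss(pred, gt, lam=0.1):
    """lam * L1 + gradient-difference term + SSIM term."""
    l_depth = (pred - gt).abs().mean()
    # L1 difference of image gradients along x and y
    dx_p, dx_g = pred[..., :, 1:] - pred[..., :, :-1], gt[..., :, 1:] - gt[..., :, :-1]
    dy_p, dy_g = pred[..., 1:, :] - pred[..., :-1, :], gt[..., 1:, :] - gt[..., :-1, :]
    l_grad = (dx_p - dx_g).abs().mean() + (dy_p - dy_g).abs().mean()
    # SSIM lies in [-1, 1]; map it to a loss in [0, 1]
    l_ssim = torch.clamp((1.0 - ssim(pred, gt)) / 2.0, 0.0, 1.0).mean()
    return lam * l_depth + l_grad + l_ssim
```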
Data augmentation is not provided in this code. The readers can test classical augmentation approaches for image datasets, including flips, rotations, scaling, cropping, translations, Gaussian noise, and salt-and-pepper noise, to make full use of the offered dataset; a joint-augmentation sketch follows.
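A minimal numpy sketch, assuming the RGB image and depth map arrive as float arrays in [0, 1]; the probabilities and noise levels are illustrative. Note that geometric transforms must be applied to the image and depth identically, while photometric noise goes on the image only:

```python
import numpy as np

def augment(rgb, depth, rng=np.random.default_rng()):
    """Jointly augment an (H, W, 3) RGB image and its (H, W) depth map."""
    rgb, depth = rgb.astype(np.float32), depth.astype(np.float32)
    # horizontal flip with probability 0.5 (applied to both arrays)
    if rng.random() < 0.5:
        rgb, depth = rgb[:, ::-1].copy(), depth[:, ::-1].copy()
    # random crop to 90% of the original size (applied to both arrays)
    h, w = depth.shape[:2]
    ch, cw = int(0.9 * h), int(0.9 * w)
    top, left = rng.integers(0, h - ch), rng.integers(0, w - cw)
    rgb = rgb[top:top + ch, left:left + cw]
    depth = depth[top:top + ch, left:left + cw]
    # Gaussian noise on the image only
    if rng.random() < 0.5:
        rgb = np.clip(rgb + rng.normal(0.0, 0.02, rgb.shape), 0.0, 1.0)
    # salt-and-pepper noise on the image only
    if rng.random() < 0.5:
        mask = rng.random(rgb.shape[:2])
        rgb = rgb.copy()
        rgb[mask < 0.005] = 0.0
        rgb[mask > 0.995] = 1.0
    return rgb, depth
```

Rotation, scaling, and translation follow the same joint-transform rule, e.g. scipy.ndimage.rotate applied to both arrays with the same angle.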
A new network with this encoder and decoder structure is also encouraged. The readers can design their own network to achieve better performance.
[1] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.
[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.