Stereo Matching Algorithm With Deep Learning Method Using Nvidia Platform

  • Abstract

    Autonomous vehicles have become a hot research topic in recent years. One of the important sensors used in these vehicles is the stereo camera. Stereo vision systems estimate depth from two cameras installed on a robot or vehicle. This method can deliver the 3D position of all objects captured in the scene at a lower cost and a higher density than LIDAR. Recently, neural networks have been widely investigated and applied to image processing problems, and deep learning networks have surpassed traditional computer vision methods, especially in object recognition. In this paper, we propose using a GPU with a new Siamese deep learning method to speed up the stereo matching algorithm. We use a high-end Nvidia DGX workstation to train and test our algorithm and compare the results with those of normal GPUs and CPUs. Based on numerical evaluation, the Nvidia DGX can train a neural network with a higher input image resolution approximately 8 times faster than a normal GPU and 40 times faster than an 8-core Core i7 CPU. Because the DGX can train at a higher resolution, the network can be trained for more iterations, which results in higher accuracy.
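
    As a rough illustration of the two ideas named in the abstract, the sketch below shows Siamese-style matching along one scanline followed by the standard conversion from disparity to depth, Z = f * B / d, for a rectified camera pair. It is not the network described in the paper: the single shared linear layer, the identity weights, the function names, and the focal length and baseline values are all assumptions chosen to keep the example small and runnable.

```python
import math

def embed(patch, weights):
    """Shared (Siamese) branch: one linear layer + ReLU, then L2-normalise.
    The same `weights` are used for the left and the right patch."""
    feats = [max(0.0, sum(w * p for w, p in zip(row, patch))) for row in weights]
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]

def best_disparity(left_row, right_row, x, half, max_disp, weights):
    """Winner-take-all match for pixel x of the left scanline.
    A point at column x in the left image is searched at column x - d
    in the right image (rectified pair assumed)."""
    left_feat = embed(left_row[x - half : x + half + 1], weights)
    best_d, best_score = 0, -1.0
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:  # candidate patch would leave the image
            break
        right_feat = embed(right_row[xr - half : xr + half + 1], weights)
        # Cosine similarity of the two embeddings is the matching score.
        score = sum(a * b for a, b in zip(left_feat, right_feat))
        if score > best_score:
            best_d, best_score = d, score
    return best_d

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity in pixels: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px

# Hypothetical toy data: the right scanline is the left one shifted by 2 pixels.
left_row = [0, 0, 0, 0, 1, 5, 2, 0, 0, 0]
right_row = [0, 0, 1, 5, 2, 0, 0, 0, 0, 0]
# Untrained identity weights stand in for a learned feature extractor.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

d = best_disparity(left_row, right_row, x=5, half=1, max_disp=4, weights=identity)
z = depth_from_disparity(d, focal_px=700.0, baseline_m=0.5)
print(d, z)
```

    In a trained system the shared branch would be a convolutional network and the search would run over every pixel, which is the part that benefits from GPU acceleration.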


  • Keywords

    LIDAR, Stereo Vision, Siamese Deep Neural Network, Nvidia Platform





Article ID: 25646
DOI: 10.14419/ijet.v8i1.6.25646

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.