How to Generate Image Dataset based on 3D Model and Deep Learning Method

  • Authors

    • Sooyoung Cho
    • Sang Geun Choi
    • Daeyeol Kim
    • Gyunghak Lee
    • Chae BongSohn
    2018-09-01
    https://doi.org/10.14419/ijet.v7i3.34.18969
  • CNN, VGG, YOLO, Virtual space, Deep learning, Dataset generation
  • Performance on computer vision tasks has improved drastically since deep learning was applied. Tasks such as object recognition, object segmentation, and object tracking have approached super-human level. Most of these algorithms are trained with supervised learning, and in general their performance improves as the amount of training data grows. In this paper, we propose a data set generation method using Unity, a 3D engine. The proposed method makes it easy to obtain the data necessary for learning. The collected data was labeled and used as a data set for the YOLO algorithm. We classify 2D polymorphic objects and test them against various data using deep learning models. Classification using a CNN and VGG-16 achieved 90% accuracy. For object recognition, we used Tiny-YOLO, a variant of the YOLO algorithm, and achieved 78% accuracy. Finally, we compared virtual and real environments, and the results showed 97 to 99 percent accuracy in each.
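
The abstract mentions labeling the generated images for use with YOLO but does not spell out the label format. As a minimal sketch, and not the authors' implementation, the following Python shows how a pixel-space bounding box could be turned into one line of the Darknet-style YOLO label format (class index, then box center and size normalized to the image dimensions). The `PixelBox` type and the `to_yolo_label` function are hypothetical names introduced here for illustration. The box is assumed to already be in image coordinates with a top-left origin; if a rendering engine reports screen coordinates with a bottom-left origin, the y values would need to be flipped first.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    """Axis-aligned bounding box in pixels, top-left origin (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_yolo_label(class_id: int, box: PixelBox, img_w: int, img_h: int) -> str:
    """Return one Darknet-style label line: 'class x_center y_center width height'.

    All four box values are normalized to the range [0, 1] by the image size.
    """
    # Clamp the box to the image so partially off-screen objects stay valid.
    x_min = min(max(box.x_min, 0.0), float(img_w))
    y_min = min(max(box.y_min, 0.0), float(img_h))
    x_max = min(max(box.x_max, 0.0), float(img_w))
    y_max = min(max(box.y_max, 0.0), float(img_h))

    # Normalized center and size of the clamped box.
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A 200x200 px box inside a 400x400 px rendered image, class index 0.
    print(to_yolo_label(0, PixelBox(100, 50, 300, 250), 400, 400))
```

In a pipeline of this kind, one such line would typically be written per object into a text file that shares its base name with the rendered image, which is the layout Darknet expects when training.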

  • How to Cite

    Cho, S., Geun Choi, S., Kim, D., Lee, G., & BongSohn, C. (2018). How to Generate Image Dataset based on 3D Model and Deep Learning Method. International Journal of Engineering & Technology, 7(3.34), 221-225. https://doi.org/10.14419/ijet.v7i3.34.18969