A Study on the Performance of Feature Extraction Methods According to the Size of N-Gram

  • Authors

    • Young Man Kwon
    • Min Gu Son
    • Dong Keun Chung
    • Myung Jae Lim
    https://doi.org/10.14419/ijet.v7i3.33.18516

    Received date: August 28, 2018

    Accepted date: August 28, 2018

    Published date: August 29, 2018

  • Keywords: malware detection, machine learning, classifier, N-gram, Opcode, API
  • Abstract

    In this paper, we study how the performance of feature extraction methods for malware detection varies with the size of the N-gram. Features are extracted from PE files by three methods: Opcode Only, Opcode and API combined, and API Only. We measure the performance of each method indirectly, through the AUC score and accuracy of the classifiers trained on its features. We ran experiments with different values of N using several classifiers: decision tree (DT), support vector machine (SVM), k-nearest neighbors (KNN), and Bernoulli naive Bayes (BNB). Our conclusions are as follows. When the N-gram technique is used, we recommend the Opcode Only method. In addition, the instance-based classifier KNN and, among the model-based classifiers, DT perform better than SVM and BNB.
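
    As an illustration of the N-gram technique the abstract refers to, the following minimal sketch slides a window of size N over an opcode sequence and turns the result into a feature vector. The function names, the sample opcode sequence, and the binary presence/absence encoding are assumptions made for this example; the abstract does not state how the authors encoded their features.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return every contiguous n-gram (as a tuple) in an opcode sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def ngram_counts(opcodes, n):
    """Count how often each n-gram occurs in the sequence."""
    return Counter(opcode_ngrams(opcodes, n))


def binary_vector(opcodes, n, vocabulary):
    """Encode a sequence as 0/1 flags over a fixed n-gram vocabulary.

    The binary encoding is an assumption of this sketch, not a detail
    taken from the paper.
    """
    present = set(opcode_ngrams(opcodes, n))
    return [1 if gram in present else 0 for gram in vocabulary]


# Hypothetical opcode sequence, as might be obtained by disassembling a PE file.
sample = ["push", "mov", "call", "push", "mov", "ret"]

bigrams = ngram_counts(sample, 2)
vocabulary = sorted(bigrams)  # fixed feature order for all samples
features = binary_vector(["push", "mov", "ret"], 2, vocabulary)
```

    Increasing N makes each feature more specific but also enlarges the vocabulary, which is the trade-off the experiments with different values of N explore.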

  • How to Cite

    Kwon, Y. M., Son, M. G., Chung, D. K., & Lim, M. J. (2018). A Study on the Performance of Feature Extraction Methods According to the Size of N-Gram. International Journal of Engineering and Technology, 7(3.33), 23-27. https://doi.org/10.14419/ijet.v7i3.33.18516