Identifying the Most Effective Feature Category in Machine Learning-based Phishing Website Detection

Authors

  • Choon Lin Tan
  • Kang Leng Chiew
  • Nadianatra Musa
  • Dayang Hanani Abang Ibrahim

DOI:

https://doi.org/10.14419/ijet.v7i4.31.23331

Published:

2018-12-09

Keywords:

Classification, Feature Categorisation, Machine Learning, Phishing Detection, Web Security

Abstract

This paper proposes an improved approach for categorising phishing features into precise categories. Features are surveyed from existing phishing detection work and grouped according to this categorisation. The performance of each feature set is evaluated using the C4.5 classifier; the content URL obfuscation category performs best, achieving an accuracy of 95.97%. Further benchmarking compares the winning feature set against the feature sets used in existing phishing detection techniques. The results suggest that the winning feature set is an effective feature category that has contributed significantly to the performance of existing machine learning-based phishing detection systems.
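
As an illustration of the classifier named in the abstract: C4.5 selects decision-tree splits by gain ratio, that is, information gain divided by split information. The sketch below computes that criterion for nominal features in plain Python. It is not the authors' evaluation pipeline, which reports classification accuracy per feature category; the feature names and the eight toy samples are invented for illustration only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """C4.5 split criterion for a nominal feature: information gain / split information."""
    total = len(labels)
    # Group the class labels by the feature value they occur with.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises features that fragment the data into many values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (assumption): 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical binary features, grouped under hypothetical category names.
features = {
    "content_url_obfuscation/links_to_foreign_domain": [1, 1, 1, 0, 0, 0, 0, 0],
    "address_bar/url_has_ip_address": [1, 0, 1, 0, 1, 0, 1, 0],
}

scores = {name: gain_ratio(values, labels) for name, values in features.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: gain ratio = {score:.3f}")
```

On this toy data the first feature separates the classes fairly well and the second carries no information about the label, so the first scores higher. Comparing whole feature categories, as the paper does, would additionally require training a classifier on each category's features and measuring its accuracy on held-out data.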

How to Cite

Tan, C. L., Chiew, K. L., Musa, N., & Ibrahim, D. H. A. (2018). Identifying the Most Effective Feature Category in Machine Learning-based Phishing Website Detection. International Journal of Engineering & Technology, 7(4.31), 1–6. https://doi.org/10.14419/ijet.v7i4.31.23331
Received 2018-12-07
Accepted 2018-12-07
Published 2018-12-09