Identifying the Most Effective Feature Category in Machine Learning-based Phishing Website Detection

Choon Lin Tan; Kang Leng Chiew; Nadianatra Musa; Dayang Hanani Abang Ibrahim

doi:10.14419/ijet.v7i4.31.23331

Received date: December 7, 2018

Accepted date: December 7, 2018

Published date: December 9, 2018

DOI:

https://doi.org/10.14419/ijet.v7i4.31.23331

Keywords:

Classification, Feature Categorisation, Machine Learning, Phishing Detection, Web Security

Abstract

This paper proposes an improved approach to categorising phishing features into more precise categories. Existing features are surveyed from current phishing detection works and grouped according to the improved categorisation approach. The performance of each feature set is evaluated using the C4.5 classifier, and the content URL obfuscation category performs best, achieving an accuracy of 95.97%. Additional benchmarking compares the winning feature set against the feature sets used in existing phishing detection techniques. The results suggest that the winning feature set is an effective feature category that has contributed significantly to the performance of existing machine learning-based phishing detection systems.
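The abstract names C4.5 as the classifier used to compare feature sets. C4.5 grows a decision tree by choosing splits according to the gain ratio, which is information gain normalised by split information. The following is a minimal sketch of that criterion for a single categorical feature, not the authors' implementation (which is not described here). The feature name `content_url_obfuscated` and the toy labels are hypothetical, used only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio of splitting `labels` on one categorical feature."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Information gain: parent entropy minus the weighted entropy of the children.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises features that fragment the data into many values.
    split_info = -sum(
        len(p) / total * math.log2(len(p) / total) for p in partitions.values()
    )
    # A constant feature has zero split information and is useless for splitting.
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: 1 = page content contains an obfuscated URL.
    content_url_obfuscated = [1, 1, 0, 0]
    labels = ["phish", "phish", "legit", "legit"]
    print(gain_ratio(content_url_obfuscated, labels))
```

A feature that separates the classes perfectly, as in the toy data, scores the maximum gain ratio, while one whose values are independent of the class scores zero. Quinlan's C4.5 additionally restricts the choice to attributes with at least average information gain; in practice, a full C4.5 implementation such as WEKA's J48 would be used rather than this sketch.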
