Comparison of Twitter spam detection using various machine learning algorithms

 
 
 
  • Abstract


    Online social networks (OSNs) share common themes such as information sharing, person-to-person interaction, and the creation of shared and collaborative content. Many microblogging websites are available, such as Twitter, Instagram, and Tumblr. Twitter is one of the most prominent of these platforms. It has 313 million monthly active users, who post 500 million tweets per day. Twitter allows users to send short text-based messages of up to 140 characters, called "tweets". Registered users can read and post tweets, whereas unregistered users can only read them. Because of its popularity, Twitter attracts spammers with malicious aims, such as phishing legitimate users, spreading malicious software and advertisements through URLs shared within tweets, aggressively following and unfollowing legitimate users, hijacking trending topics to attract attention, and spreading obscene content. Twitter spam has therefore become a critical problem. This paper compares the performance of a wide range of standard machine learning algorithms, mainly aiming to identify which of them achieve acceptable detection performance on a large amount of data using account-based and tweet content-based features.
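    The two feature families named above can be illustrated with a minimal sketch. The abstract does not list the actual features used, so every feature name below (`follower_ratio`, `n_urls`, and so on), the `Account` record, and the regular expressions are assumptions chosen for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical account record; the field names are illustrative assumptions,
# not the feature set used in the paper.
@dataclass
class Account:
    followers: int
    following: int
    tweets: int
    age_days: int

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")

def account_features(acct: Account) -> dict:
    """Account-based features: who is posting."""
    return {
        "followers": acct.followers,
        "following": acct.following,
        # +1 avoids division by zero for accounts that follow nobody
        "follower_ratio": acct.followers / (acct.following + 1),
        "tweets_per_day": acct.tweets / max(acct.age_days, 1),
        "age_days": acct.age_days,
    }

def content_features(text: str) -> dict:
    """Tweet content-based features: what is being posted."""
    return {
        "n_chars": len(text),
        "n_words": len(text.split()),
        "n_urls": len(URL_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
        "n_mentions": len(MENTION_RE.findall(text)),
        "n_digits": sum(ch.isdigit() for ch in text),
    }

def feature_vector(text: str, acct: Account) -> dict:
    """Merge both feature families into one record for a classifier."""
    return {**account_features(acct), **content_features(text)}

# Example input (made up for illustration)
example = feature_vector(
    "Win a FREE phone now! http://example.com/x #free #win @someone",
    Account(followers=10, following=1999, tweets=5000, age_days=50),
)
```

    Records of this form could then be passed to any standard classifier for the kind of comparison the abstract describes. Which features and algorithms are actually used is not stated here.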


  • Keywords


    Twitter; spammer; tweet; machine learning algorithm; account-based; tweet content-based.



 


Article ID: 9268
 
DOI: 10.14419/ijet.v7i1.3.9268




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.