Clustering web users to reduce Internet traffic load and user access cost based on the K-means algorithm


  • Maged Nasser
  • Naomie Salim
  • Hentabli Hamza
  • Faisal Saeed



The continuous growth in the size and use of the Internet makes searching for information increasingly difficult. Reducing Internet traffic load and user access cost is therefore particularly important. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. Web clustering, an important web usage mining (WUM) task, groups web users based on their browsing patterns to provide useful knowledge for personalized web services. Based on the web structure, each Uniform Resource Locator (URL) in the web log data is parsed into tokens, which are uniquely identified for URL classification. The collective sequence of URLs a user navigated over a period of 30 minutes is considered a session, and the session represents the user's navigation pattern. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm clusters web users based on their browsing activities or patterns on the web. For example, a user may visit a website often and spend much time on each visit. Users with similar browsing activities are grouped into clusters. The paper also describes the design of an experiment, including data collection and the clustering process.
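The preprocessing described in the abstract (parsing each URL into tokens and grouping a user's requests into 30-minute sessions) can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout `(user, timestamp, url)`, the function names `tokenize_url` and `build_sessions`, the host-plus-path token scheme, and the sample log are all assumptions made for illustration. The sketch also assumes that a session covers every request made within 30 minutes of the session's first request; an inactivity-gap rule would be an equally plausible reading.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from urllib.parse import urlparse

SESSION_WINDOW = timedelta(minutes=30)  # window length stated in the abstract


def tokenize_url(url):
    """Split a URL into host and path tokens (illustrative scheme, an assumption)."""
    parsed = urlparse(url)
    tokens = [parsed.netloc] if parsed.netloc else []
    tokens.extend(part for part in parsed.path.split("/") if part)
    return tokens


def build_sessions(log_entries):
    """Group (user, timestamp, url) records into per-user sessions.

    Assumption: a session holds every request made within 30 minutes of
    the session's first request; a later request opens a new session.
    """
    by_user = defaultdict(list)
    for user, timestamp, url in log_entries:
        by_user[user].append((timestamp, url))

    sessions = []
    for user, requests in by_user.items():
        requests.sort(key=lambda r: r[0])  # order each user's requests by time
        current, start = [], None
        for timestamp, url in requests:
            if start is None or timestamp - start > SESSION_WINDOW:
                if current:
                    sessions.append((user, current))
                current, start = [], timestamp
            current.append(tokenize_url(url))
        if current:
            sessions.append((user, current))
    return sessions


# Hypothetical log records, invented for this example only.
log = [
    ("u1", datetime(2018, 1, 1, 9, 0), "http://example.com/news/sports"),
    ("u1", datetime(2018, 1, 1, 9, 10), "http://example.com/news/world"),
    ("u1", datetime(2018, 1, 1, 10, 0), "http://example.com/shop"),
    ("u2", datetime(2018, 1, 1, 9, 5), "http://example.com/shop/cart"),
]
sessions = build_sessions(log)
```

Each resulting session is a list of token lists, which could then be turned into feature vectors (for example, token counts per session) before any clustering step is applied.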


