Clustering Algorithms for Queries: A Comparative Analysis of Farmer Call Center Data

  • Authors

    • C. Kiruthiga, Research Scholar, Vels Institute of Science Technology & Advanced Studies (VISTAS); Assistant Professor, PG Department of Information Technology and BCA, Dwaraka Doss Goverdhan Doss Vaishnav College, affiliated to the University of Madras, Chennai, India
    • Dr. K. Dharmarajan, Professor, Department of Information Technology, Vels Institute of Science Technology & Advanced Studies (VISTAS), Chennai, India
    https://doi.org/10.14419/7cvtfq26

    Received date: July 15, 2025

    Accepted date: July 24, 2025

    Published date: November 1, 2025

  • Agglomerative Clustering; Calinski-Harabasz Index; DBSCAN; HDBSCAN; GloVe; K-means; SBERT; Silhouette Score; TF-IDF; Word2Vec
  • Abstract

    Extracting insights from queries and feedback helps identify trends, improve products and services, personalize customer interactions, and craft effective marketing strategies. Data clustering organizes unstructured data and refines queries by suggesting similar or related inputs, which improves the search experience. This study compares the performance of several clustering algorithms: Agglomerative Clustering, K-Means (KM), Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It also compares several embeddings: Term Frequency-Inverse Document Frequency (TF-IDF), Sentence-Bidirectional Encoder Representations from Transformers (SBERT), Word2Vec, and GloVe. The Calinski-Harabasz Index, Davies-Bouldin Index, and Silhouette Score are used to measure the effectiveness of these algorithms. The results indicate that HDBSCAN outperformed the other clustering algorithms on the farmer helpline dataset. The conclusions were derived from the medium-level performance of the clustering algorithms. HDBSCAN, combined with different embeddings, achieved a Silhouette Score of 0.85, a Davies-Bouldin Index of 0.66, and a Calinski-Harabasz Index of 4,239.9.
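
The three validity measures named in the abstract can be illustrated with a minimal sketch. It uses the textbook definitions with Euclidean distance and only the Python standard library; it is not the authors' implementation. The point set `X`, the label lists, and all function names are assumptions made up for illustration and have no connection to the farmer helpline dataset. In practice one would more likely call `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` from `sklearn.metrics` on the embedded queries.

```python
import math
from collections import defaultdict


def _centroid(points):
    """Coordinate-wise mean of a list of points, returned as a tuple."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def _group(X, labels):
    """Map each cluster label to the list of points carrying that label."""
    clusters = defaultdict(list)
    for x, lab in zip(X, labels):
        clusters[lab].append(x)
    return clusters


def silhouette_score(X, labels):
    """Mean of (b - a) / max(a, b) over all points; higher is better."""
    clusters = _group(X, labels)
    total = 0.0
    for x, lab in zip(X, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(x, y) for y in pts) / len(pts)
            for other, pts in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(X)


def davies_bouldin_index(X, labels):
    """Mean over clusters of the worst scatter-to-separation ratio; lower is better."""
    clusters = _group(X, labels)
    cents = {k: _centroid(p) for k, p in clusters.items()}
    scatter = {
        k: sum(math.dist(x, cents[k]) for x in p) / len(p)
        for k, p in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


def calinski_harabasz_index(X, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its
    degrees of freedom; higher is better."""
    clusters = _group(X, labels)
    n, k = len(X), len(clusters)
    overall = _centroid(X)
    between = sum(
        len(p) * math.dist(_centroid(p), overall) ** 2 for p in clusters.values()
    )
    within = sum(
        math.dist(x, _centroid(p)) ** 2 for p in clusters.values() for x in p
    )
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two well-separated squares in the plane.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # matches the visible structure
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # cuts across both squares

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(
        name,
        round(silhouette_score(X, labels), 3),
        round(davies_bouldin_index(X, labels), 3),
        round(calinski_harabasz_index(X, labels), 1),
    )
```

On this toy data the labeling that follows the two squares scores a Silhouette of about 0.92, a Davies-Bouldin Index of about 0.1, and a Calinski-Harabasz Index of about 600, while the labeling that cuts across the squares scores worse on all three. The same reading applies to the figures reported in the abstract: a higher Silhouette Score and Calinski-Harabasz Index and a lower Davies-Bouldin Index indicate tighter, better-separated clusters.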

  • How to Cite

    Kiruthiga, C., & Dharmarajan, K. (2025). Clustering Algorithms for Queries: A Comparative Analysis of Farmer Call Center Data. International Journal of Basic and Applied Sciences, 14(SI-1), 529-537. https://doi.org/10.14419/7cvtfq26