Quasi-optimality under pseudo f statistic in clustering data

  • Authors

    • Teruhisa Hochin
    • Yoshihiro Hayashi
    • Hiroki Nomiya
    • Morshed U. Chowdhury
    2018-05-16
    https://doi.org/10.14419/ijet.v7i2.28.13205
  • Clustering, Difference, Pseudo F Statistic, Quasi-Optimum, Relative Difference.
  • The pseudo F statistic is often used to decide the number of clusters: the set of clusters with the largest pseudo F value is selected as the optimum set of clusters. This paper proposes the quasi-optimum set of clusters, whose pseudo F value is larger than those of the sets of clusters whose numbers of clusters are close to its own. The before and behind (BB) difference of pseudo F values is proposed for finding the number of clusters in the quasi-optimum set. The relative BB difference of pseudo F values, which is the ratio of the BB difference to the pseudo F value itself, is also proposed for finding that number when the pseudo F value varies severely. Examples demonstrate that the BB differences of pseudo F values and the relative BB differences work well in finding quasi-optimum sets of clusters.
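
    The quantities named in the abstract can be illustrated with a minimal Python sketch. The pseudo F function follows the standard Calinski-Harabasz definition, and the relative BB difference follows the abstract (BB difference divided by the pseudo F value). The abstract does not state the formula for the BB difference itself, so the form used here, BB(k) = (F(k) - F(k-1)) + (F(k) - F(k+1)), is only an assumption and may differ from the paper's definition. The function names and the values in `example_f` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def pseudo_f(points, labels):
    """Pseudo F (Calinski-Harabasz) value of one clustering.

    points: list of equal-length numeric tuples; labels: one cluster id per point.
    Assumes at least one cluster has spread (within-cluster sum > 0).
    """
    n, dim = len(points), len(points[0])
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("pseudo F needs 2 <= k <= n - 1 clusters")

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand = centroid(points)
    between = within = 0.0
    for rows in clusters.values():
        c = centroid(rows)
        between += len(rows) * sq_dist(c, grand)    # between-cluster sum of squares
        within += sum(sq_dist(r, c) for r in rows)  # within-cluster sum of squares
    return (between / (k - 1)) / (within / (n - k))


def bb_difference(f_by_k):
    """ASSUMED form: BB(k) = (F(k) - F(k-1)) + (F(k) - F(k+1)).

    f_by_k maps a number of clusters k to its pseudo F value. Only values of k
    that have both neighbours k-1 and k+1 in the mapping get a BB difference.
    """
    return {
        k: (f - f_by_k[k - 1]) + (f - f_by_k[k + 1])
        for k, f in f_by_k.items()
        if k - 1 in f_by_k and k + 1 in f_by_k
    }


def relative_bb_difference(f_by_k):
    """BB difference divided by the pseudo F value itself."""
    return {k: bb / f_by_k[k] for k, bb in bb_difference(f_by_k).items()}


# Made-up pseudo F values for k = 2..8, for illustration only.
example_f = {2: 100.0, 3: 80.0, 4: 90.0, 5: 60.0, 6: 20.0, 7: 30.0, 8: 15.0}
bb = bb_difference(example_f)
rel = relative_bb_difference(example_f)
best_by_bb = max(bb, key=bb.get)     # k with the largest BB difference
best_by_rel = max(rel, key=rel.get)  # k with the largest relative BB difference
```

    With these made-up values the pseudo F value drops sharply between k = 5 and k = 6. Under the assumed formula, the plain BB difference favours the local peak at k = 4, while the relative version favours the smaller local peak at k = 7, which is the kind of situation the abstract describes as the pseudo F value varying severely.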

     

  • How to Cite

    Hochin, T., Hayashi, Y., Nomiya, H., & Chowdhury, M. U. (2018). Quasi-optimality under pseudo f statistic in clustering data. International Journal of Engineering & Technology, 7(2.28), 320-324. https://doi.org/10.14419/ijet.v7i2.28.13205