A Study on the Clustering of Bidding Company Trends Using Machine Learning Based G2B Data


  • Min Sun Kim
  • Eun Soo Choi
  • Min Soo Kang






Bidding, Clustering, G2B Data, K-Means Clustering, Unsupervised Learning


KONEPS is the National Comprehensive Electronic Procurement System of the Public Procurement Service. If bidding possibilities and trends could be known through KONEPS before bidding, companies could bid more efficiently. The data used in this experiment were the "Progress Bidding Classification" data from the Procurement Information Open Portal. Preprocessing was performed to facilitate learning of the prediction model. Prior to learning, the 1,158 preprocessed data records were normalized to match the ranges of the data or to make their distributions similar. After normalization, we selected the number of clusters. K-Means clustering produced the following groups: bid dropping of 77 to 80% with an allocated budget of about 2 billion won (₩), bid dropping of 83 to 87% with an allocated budget of about 1 billion won, and bid dropping of 87 to 90% with allocated budgets distributed around 500 million won. It can also be confirmed that the clusters are divided by the number of participating enterprises, at a boundary of 58. These results make it possible to study tendering trends across the bidding community by learning prediction models of the bidding companies, the number of bidders, and the tendencies of the bidding business, and they will help KONEPS develop its next-generation ISP.
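The normalize, choose-k, and cluster steps summarized in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not say which normalization or which cluster-number criterion was used, so min-max scaling and a simple inspection of the within-cluster sum of squares (an elbow-style check) are assumed here. The function names (`min_max_normalize`, `k_means`) and the synthetic rows are hypothetical; the rows merely echo the ranges quoted in the abstract and are not the portal data.

```python
import random


def min_max_normalize(rows):
    """Rescale every column to [0, 1] so that no feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, within-cluster SS)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    wss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss


# Synthetic stand-in data (NOT the paper's data): columns are
# bid rate in percent and allocated budget in won.
data_rng = random.Random(42)
profiles = [(78.5, 2.0e9), (85.0, 1.0e9), (88.5, 0.5e9)]
rows = [[data_rng.gauss(rate, 0.5), data_rng.gauss(budget, 0.1 * budget)]
        for rate, budget in profiles for _ in range(30)]

normalized = min_max_normalize(rows)

# Elbow-style check: print the within-cluster sum of squares for each k.
for k in range(1, 7):
    _, _, wss = k_means(normalized, k)
    print(k, round(wss, 3))
```

Without normalization, the budget column (on the order of 10^9) would swamp the percentage column in the Euclidean distance, which is why scaling precedes clustering in this sketch. In practice a library implementation with multiple restarts would likely be preferable to this single-start version.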



