Comparison Approaches of the Fuzzy C-Means and Gaussian Mixture Model in Clustering the Welfare of the Indonesian People

Abstract

We compared a previous study, which clustered the welfare of the Indonesian people using the Fuzzy C-Means (FCM) approach, with a more recent approach, the Gaussian mixture model (GMM). Both are soft clustering methods. The case study clustered data for Indonesia's 34 provinces on eight people's welfare indicator variables for 2017, as issued by the Central Statistics Agency. To determine which approach clusters the data more accurately, we evaluated FCM and GMM with three validity measures: the Silhouette index, the Davies-Bouldin index, and the Calinski-Harabasz index. The FCM and GMM methods found the optimal numbers of clusters to be 2 and 6. Judged by the consistency of the three validity results, the GMM method was preferable to the FCM method.


Keywords: fuzzy, Gaussian mixture model, clustering
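
The three validity indices named in the abstract can be illustrated with a minimal sketch. It assumes each province has already been given a hard label, for example by taking its highest FCM membership or GMM posterior probability; that conversion step, the function names, and the toy points are all assumptions for illustration and are not the study's code or data. A higher Silhouette or Calinski-Harabasz value, or a lower Davies-Bouldin value, indicates a better partition. At least two clusters are required.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def groups(X, labels):
    """Map each cluster label to the list of points assigned to it."""
    out = {}
    for x, k in zip(X, labels):
        out.setdefault(k, []).append(x)
    return out

def silhouette(X, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    g = groups(X, labels)
    total = 0.0
    for x, k in zip(X, labels):
        own = g[k]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(x, y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(x, y) for y in pts) / len(pts)
                for j, pts in g.items() if j != k)
        total += (b - a) / max(a, b)
    return total / len(X)

def davies_bouldin(X, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    g = groups(X, labels)
    cents = {k: centroid(p) for k, p in g.items()}
    scatter = {k: sum(dist(x, cents[k]) for x in p) / len(p)
               for k, p in g.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in g if j != i) for i in g]
    return sum(worst) / len(worst)

def calinski_harabasz(X, labels):
    """Ratio of between-cluster to within-cluster dispersion, each per degree of freedom."""
    g = groups(X, labels)
    n, k = len(X), len(g)
    overall = centroid(X)
    between = sum(len(p) * dist(centroid(p), overall) ** 2 for p in g.values())
    within = sum(dist(x, centroid(p)) ** 2 for p in g.values() for x in p)
    return (between / (k - 1)) / (within / (n - k))

# Synthetic toy data (NOT the provincial welfare indicators): two tight groups.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]   # labels that match the two groups
bad = [0, 1, 0, 1, 0, 1, 0, 1]    # labels that cut across the groups

for name, labels in (("good", good), ("bad", bad)):
    print(name, silhouette(X, labels), davies_bouldin(X, labels),
          calinski_harabasz(X, labels))
```

On the toy data, all three indices rank the partition that matches the two groups above the one that cuts across them, which is the kind of agreement between indices that the comparison of FCM and GMM relies on.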
