Analysis of the K-Means Algorithm for Clustering School Participation Rates in Central Java

Abstract

One indicator of the development of educational services in Indonesia is the School Enrollment Rate (SER). The higher the enrollment rate, the better the access to education a location offers. The dataset was collected from the Central Java Statistical Agency website. The object of analysis is the SER percentage for ages 7-12, 13-15, and 16-18 years in the Central Java region during 2017-2019. The aim was to analyze and map the level of school participation in Central Java, the third largest province after West Java and East Java. The research product is a mapping of the District and City areas into clusters, obtained with the k-means clustering algorithm. The locations were separated into two clusters: high (C1) and low (C2). For ages 7-12, the mapping placed 24 districts/cities in the high cluster (cluster 0) and 11 in the low cluster (cluster 1); for ages 13-15, it placed 23 in cluster 0 and 12 in cluster 1; and for ages 16-18, it placed 15 in the high cluster. The final centroid values are the basis for determining the clusters. For ages 7-12, the final centroid was {99.81, 99.87, 99.75} for the high cluster (cluster 0) and {99.73, 99.43, 99.25} for the low cluster (cluster 1); a final centroid was likewise obtained for the high cluster (cluster 0) of the 13-15 age group. The mapping shows a good proportion, that is, more than 50% of locations in the high cluster, for the 7-12 and 13-15 age groups. In the 16-18 age group, by contrast, 24 locations (57%) were in the low cluster. These results can provide a macro-level picture of SER development in recent years.


Keywords: K-Means, algorithm, clustering
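
The two-cluster k-means procedure summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation, which the abstract does not describe. The five locations and their SER values are hypothetical (they are not the BPS figures), and the function name `kmeans` and the seeding rule (starting from the locations with the highest and lowest three-year totals) are assumptions made so that the example runs deterministically.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means. Returns (labels, final_centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return labels, centroids


# Hypothetical SER percentages (2017, 2018, 2019) for five made-up locations.
ser = {
    "A": (99.9, 99.8, 99.9),
    "B": (99.7, 99.9, 99.8),
    "C": (99.8, 99.7, 99.9),
    "D": (98.9, 99.0, 98.8),
    "E": (99.1, 98.7, 99.0),
}
names = list(ser)
points = [ser[n] for n in names]

# Assumed seeding: the locations with the highest and lowest 3-year totals.
seeds = [max(points, key=sum), min(points, key=sum)]
labels, centroids = kmeans(points, seeds)

# Name the clusters after the fact: the centroid with the larger total is "high".
high_idx = max(range(len(centroids)), key=lambda j: sum(centroids[j]))
clusters = {
    n: ("high" if lab == high_idx else "low") for n, lab in zip(names, labels)
}
print(clusters)
print(centroids)
```

Each final centroid has three components, one per year, which matches the form of the centroid values reported above. Naming the clusters from the final centroids, rather than from the cluster index, avoids depending on which seed happens to be listed first.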
