Analysis of the K-Means Algorithm for Clustering School Participation Rates in Central Java


One indicator of the development of educational services in Indonesia is the School Enrollment Rate (SER): the higher the enrollment rate, the better a region's access to education. This study aims to map the level of school participation in Central Java, the third largest province after West Java and East Java. The dataset was collected from the Central Java Statistical Agency website. The object of analysis is the SER percentage for ages 7-12, 13-15, and 16-18 in Central Java during 2017-2019. The research product is a mapping of the districts and cities into clusters, obtained with the k-means clustering algorithm. Two clusters were used: high (C1) and low (C2). For the 7-12 age group, 24 districts/cities fell in the high cluster (cluster 0) and 11 in the low cluster (cluster 1); for the 13-15 age group, 23 fell in the high cluster (cluster 0) and 12 in the low cluster (cluster 1); and the mapping for the 16-18 age group lists 15 districts/cities. Cluster membership is determined by the final centroid values: for the 7-12 age group these were {99.81, 99.87, 99.75} for the high cluster (cluster 0) and {99.73, 99.43, 99.25} for the low cluster (cluster 1), and for the 13-15 age group the high cluster is again cluster 0. Across the age categories, the mapping shows a generally good proportion, with more than 50% of districts/cities in the high cluster; the 16-18 age group stands out, with 24 districts/cities (57%) in the low cluster. These results provide a macro-level picture of SER development in recent years.

Keywords: K-Means, algorithm, clustering
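
The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes each district or city is represented by three SER values (one per year, 2017-2019) and that k = 2, as stated above. The region names, the SER figures, and the choice of initial centroids are hypothetical and serve only as an illustration; they are not the study's data, and this is not necessarily the implementation the study used.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return labels, centroids

# Hypothetical SER percentages (2017, 2018, 2019) for six made-up regions.
ser = {
    "Region A": (99.9, 99.8, 99.9),
    "Region B": (99.7, 99.9, 99.8),
    "Region C": (99.8, 99.9, 99.7),
    "Region D": (99.3, 99.2, 99.1),
    "Region E": (99.4, 99.3, 99.2),
    "Region F": (99.2, 99.4, 99.3),
}
names = list(ser)
points = [ser[n] for n in names]

# Assumed deterministic start: seed with the highest- and lowest-sum regions.
by_sum = sorted(points, key=sum)
labels, centroids = kmeans(points, [by_sum[-1], by_sum[0]])

# The cluster whose final centroid has the larger sum is labelled "high".
high = max(range(len(centroids)), key=lambda j: sum(centroids[j]))
clusters = {n: ("high" if lab == high else "low") for n, lab in zip(names, labels)}

for name, level in clusters.items():
    print(name, level)
```

As in the abstract, the final centroids decide which cluster counts as high and which as low; with real data, the result can depend on the initial centroids, so several starts are commonly compared.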

