This study uses classification trees to explore the socio-economic variables that most strongly determine the voting decisions of provinces in Municipality Elections. We collected data on many potential variables that may affect voting decisions in favor of a political party. Each province’s economic, geographic and demographic data are taken into consideration as independent variables. The dependent variable is the winning party in the 2014 Municipality Elections. The data set consists of 69 variables for each of 81 provinces. The aim of the study is to find which variables affect the voting decision the most and to identify patterns that may guide political campaigns. Among the many classification algorithms available, we used the C5.0 algorithm in R. Tree-based methods help explore the structure of a data set while producing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome. At each decision node, the C5.0 algorithm selects the splitting criterion with the greatest information gain and splits the data on that criterion.
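To illustrate the information-gain criterion described above, the following is a minimal Python sketch rather than the C5.0 implementation used in the study. The party labels, the candidate variable names and the way provinces are grouped into branches are hypothetical toy values chosen only to make the arithmetic visible.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(labels, branches):
    """Parent entropy minus the size-weighted entropy of the child branches."""
    total = len(labels)
    weighted = sum(len(branch) / total * entropy(branch) for branch in branches)
    return entropy(labels) - weighted


# Hypothetical toy node: winning party in four provinces.
winners = ["A", "A", "B", "B"]

# Hypothetical candidate splits: each maps a variable name to the
# winner labels that would fall into each branch after splitting on it.
candidates = {
    "previous_winner": [["A", "A"], ["B", "B"]],  # separates the parties perfectly
    "uninformative": [["A", "B"], ["A", "B"]],    # leaves both branches mixed
}

gains = {name: information_gain(winners, branches)
         for name, branches in candidates.items()}

# The node is split on the candidate with the greatest information gain.
best = max(gains, key=gains.get)
```

In this toy node the split that separates the two parties completely removes all uncertainty, while the uninformative split removes none, so the first candidate is selected.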
Because the data set is small, we ran k = 1000 trials (estimations) and summarized them to obtain more robust results. With the C5.0 algorithm’s sub-trial size set to 5, a total of 5000 trees were formed, and the importance scores were averaged over all trees and interpreted. The most important independent variables discriminating the voting decision were found to be the result of the previous elections, mean household population, proportion of the population between ages 15 and 19, electricity consumption per person, and proportion of the population between ages 55 and 64.
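The aggregation step can be sketched as follows. This is a minimal Python illustration, not the R code used in the study: the per-tree scores are hypothetical toy values, the variable names are placeholders, and treating a variable that a tree does not use as having importance 0 is an assumption. How the 1000 trials differ from one another is not stated here, so the sketch covers only the averaging.

```python
from collections import defaultdict


def mean_importance(per_tree_scores):
    """Average each variable's importance over all trees.

    Assumption: a variable missing from a tree contributes 0 for that tree.
    """
    totals = defaultdict(float)
    for scores in per_tree_scores:
        for variable, score in scores.items():
            totals[variable] += score
    n = len(per_tree_scores)
    return {variable: total / n for variable, total in totals.items()}


# Number of trees implied by the setup: 1000 trials, 5 sub-trials each.
n_trials = 1000
sub_trials = 5
n_trees = n_trials * sub_trials

# Hypothetical importance scores from four trees (toy stand-in for all trees).
toy_scores = [
    {"previous_result": 100.0, "household_population": 40.0},
    {"previous_result": 100.0},
    {"previous_result": 80.0, "household_population": 20.0},
    {"previous_result": 100.0, "household_population": 60.0},
]

means = mean_importance(toy_scores)

# Variables ranked from most to least important on average.
ranking = sorted(means, key=means.get, reverse=True)
```

Averaging over many trees dampens the influence of any single tree, which is the motivation given above for summarizing a large number of trials on a small data set.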
Keywords: classification trees, voting decision, C5.0 algorithm, decision trees