Predicting preeclampsia and related risk factors using data mining approaches: A cross-sectional study


Background: Preeclampsia is a hypertensive disorder of pregnancy that has adverse effects on both the mother and the fetus. Despite recent advances in understanding the etiology of preeclampsia, no adequate clinical screening test has been identified to diagnose the disorder.

Objective: We aimed to provide a model based on data mining approaches that can be used as a screening tool to identify patients with this syndrome and also to identify the risk factors associated with it.

Materials and Methods: The data used in this cross-sectional study were extracted from the clinical records of 726 mothers with preeclampsia and 726 mothers without preeclampsia who were referred to Fatemieh Hospital in Hamadan City between April 2005 and March 2015. Six data mining methods were applied: logistic regression, k-nearest neighbors, C5.0 decision tree, discriminant analysis, random forest, and support vector machine. Their performance was compared using accuracy, sensitivity, and specificity.
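The three comparison criteria named above can be illustrated with a minimal sketch. It assumes binary labels coded as 1 (preeclampsia) and 0 (no preeclampsia); the function name `screening_metrics` and the toy labels are hypothetical and are not taken from the study's data or software.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumed coding (not from the study): 1 = preeclampsia, 0 = no preeclampsia.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Share of true cases that the model detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of non-cases that the model correctly rules out.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

For a screening tool, sensitivity is usually the criterion of most interest, because a missed case (false negative) is more costly than a false alarm; accuracy alone does not distinguish between the two kinds of error.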

Results: Underlying condition, age, pregnancy season, and the number of pregnancies were the most important risk factors for preeclampsia. The accuracies of the models were as follows: logistic regression (0.713), k-nearest neighbors (0.742), C5.0 decision tree (0.788), discriminant analysis (0.687), random forest (0.758), and support vector machine (0.791).

Conclusion: Among the data mining methods employed in this study, the support vector machine was the most accurate in predicting preeclampsia. Therefore, this model can be considered as a screening tool for this disorder.

Key words: Preeclampsia, Random forest, C5.0 decision tree, Support vector machine, Logistic regression.
