A hybrid feature selection algorithm to determine effective factors in predictive model of success rate for in vitro fertilization/intracytoplasmic sperm injection treatment: A cross-sectional study

Abstract
Background: Previous research has identified key factors affecting in vitro fertilization or intracytoplasmic sperm injection success, yet the lack of a standardized approach for various treatments remains a challenge.
Objective: This study uses a machine learning approach to identify the principal predictors of success in in vitro fertilization and intracytoplasmic sperm injection treatments.
Materials and Methods: We collected data from 734 individuals at 2 infertility centers in Mashhad, Iran between November 2016 and March 2017. We employed feature selection methods to reduce dimensionality in a random forest model, guided by hesitant fuzzy sets (HFSs). A hybrid approach that combines feature selection methods was used to improve predictor identification and accuracy (ACC). It was assessed using the following machine learning metrics: Matthews correlation coefficient, runtime, ACC, area under the receiver operating characteristic curve, precision or positive predictive value, recall, and F-score.
Results: Our hybrid feature selection method achieved the highest ACC (0.795), area under the receiver operating characteristic curve (0.72), and F-score (0.8), while selecting only 7 features: follicle-stimulating hormone (FSH), 16Cells, FAge, oocytes, quality of transferred embryos (GIII), compact, and unsuccessful.
Conclusion: We introduced HFSs in our novel method to select influential features for predicting infertility success rates. Using a multi-center dataset, HFSs improved feature selection by reducing the number of features based on the standard deviation among criteria. Results showed significant differences between pregnant and non-pregnant groups for selected features, including FSH, FAge, 16Cells, oocytes, GIII, and compact. We also found a significant correlation between FAge and both fetal heart rate (FHR) and clinical pregnancy rate (CPR), with the highest FSH level (31.87%) observed for doses ranging from 10-13 mIU/ml.


Introduction
Infertility prompts couples worldwide to seek medical help for successful conception.
Diagnosing its causes and predicting treatment success are essential for guiding interventions and identifying key factors. Several predictive models have been developed using machine learning tools and various classification methods to forecast the success rate of infertility treatment (1-8). Identifying crucial factors for predicting infertility treatment success is essential in clinical practice, and predictive models are used to address this challenge. Feature selection is a dimensionality reduction technique that improves model performance and determines the essential factors. Some studies used statistical methods, such as the Chi-square test and Student's t test, to select features (9), while others used filter-based methods, such as principal component analysis (10). There are also studies based on wrapper-based methods, such as forward float selection (11). Another study applied embedded methods, such as the linear support vector classifier and tree-based methods, in the feature selection process (12).
Furthermore, meta-heuristic algorithms, such as the hill-climbing algorithm (13), were used to select practical features in infertility treatment methods.
Many other studies compared the performance of different prediction models without a feature selection method; in these studies, the collected features were chosen based on domain expertise (14, 15). A combination of wrapper, filter, and embedded feature selection methods has also been used in machine learning. Hesitant fuzzy sets (HFSs) are used to rank the methods that determine the similarity between features and output, to improve the performance of the combination (16).
Challenges remain in predictive models for infertility treatment, particularly in feature selection, which is critical for success. Different

Outcome
Clinical pregnancy was defined as a customarily

Methodology
The random forest model (RFM) is a robust ensemble method used in infertility treatment prediction; it combines decision tree classifiers and predicts by majority vote (22). To make the proposed method easier to follow, a detailed pseudocode and a semi-flowchart are provided in the supplementary file titled "Pseudo.docx".

Filter method
Filter methods are split into univariate and multivariate approaches (23). Herein, we focused on the univariate approach. The variance threshold (VT) is a simple statistical baseline among univariate filter methods: it removes features whose variance does not exceed a threshold. In this study, we set the threshold to 0.35, which gave the highest accuracy (ACC) for VT. Another univariate approach is k-Best, a filter-based method that scores each feature by the ANOVA F-value between the feature and the target vector and keeps the k highest-scoring features.
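The two univariate filters described above can be illustrated with a minimal, standard-library-only sketch. The function names (`variance_threshold`, `anova_f`, `k_best`) and the toy data are hypothetical and are not taken from the study; scikit-learn provides comparable tools (`VarianceThreshold`, and `SelectKBest` with `f_classif`), which would normally be preferred in practice.

```python
from statistics import mean, pvariance


def variance_threshold(X, threshold):
    """Return indices of columns whose population variance exceeds threshold."""
    n_features = len(X[0])
    return [j for j in range(n_features)
            if pvariance([row[j] for row in X]) > threshold]


def anova_f(values, labels):
    """One-way ANOVA F-value of a single feature across the classes in labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # classes are perfectly separated by this feature
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def k_best(X, y, k):
    """Return the (sorted) indices of the k features with the largest F-value."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): 4 samples, 3 features, binary outcome.
X = [
    [1.0, 0.0, 5.0],
    [2.0, 0.0, 5.1],
    [8.0, 1.0, 4.9],
    [9.0, 0.0, 5.0],
]
y = [0, 0, 1, 1]

print(variance_threshold(X, 0.35))  # only the high-variance column survives
print(k_best(X, y, 2))              # the two columns most associated with y
```

Note that the two filters answer different questions: VT ignores the outcome entirely, whereas k-Best ranks features by their association with it.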

Embedded method
In embedded methods, the search for an optimal subset of features is performed during the modeling phase (24). L1-based feature selection (L1-based) is an embedded method that selects features as part of the model construction process. The tree-based method is another popular embedded method; it uses a forest of trees to decide which features to remove. A decision tree is a classifier built using different splitting criteria (25).

Wrapper method
Wrapper-based feature selection methods have

Proposed hybrid method by HFS
Since filter/embedded methods run quickly, runtime is not a meaningful criterion for comparing them. The purpose of using a filter/embedded method in the preprocessing phase is to remove low-significance features and obtain a reduced dimension; therefore, the number of features is not a criterion at this phase either. We therefore chose the best filter-based method according to the 6 evaluation criteria in table I.
For this purpose, we applied an HFS. HFSs were presented as a generalization of simple fuzzy sets (27). HFSs are useful in medical decision-making when the expert hesitates between several values, and this theory has been shown to help enhance discernment in decision-making (28). To decide on the best performance for each method, we used the scoring system of HFS provided by Liao and Xu (29). In this system, the deviation degree of a hesitant fuzzy element h(x) is computed from the number of elements in h(x) and their values γ; it reflects the standard deviation among all pairs of elements in the HFS. The scoring function (SF) of h(x) is then defined in terms of this deviation degree and gives the score of each method x.
In accordance with the Declaration of Helsinki, we complied with the ethical requirements for research involving human subjects. Informed and free written consent was obtained from the participants for the use of their data in all infertility projects.
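The idea of scoring each method from a hesitant fuzzy element can be illustrated with a small sketch. The exact formulas of Liao and Xu (29) are not reproduced in this text, so the forms below are assumptions: the mean of the element values is used as a score, the deviation degree is taken as the square root of the summed squared differences over all pairs divided by the number of elements, and methods are ranked by higher mean first and lower deviation second. The function names and the example values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def hfe_mean(h):
    """Mean of the values in a hesitant fuzzy element (assumed score)."""
    return sum(h) / len(h)


def hfe_deviation(h):
    """Spread over all pairs of values in the element (assumed deviation degree)."""
    if len(h) < 2:
        return 0.0
    return sqrt(sum((a - b) ** 2 for a, b in combinations(h, 2))) / len(h)


def rank_methods(hfs):
    """Order method names: higher mean first, ties broken by lower deviation."""
    return sorted(hfs, key=lambda m: (-hfe_mean(hfs[m]), hfe_deviation(hfs[m])))


# Hypothetical criteria values in [0, 1] for three candidate methods.
hfs = {
    "method_a": [0.75, 0.5, 0.25],
    "method_b": [0.5, 0.5, 0.5],
    "method_c": [0.25, 0.25, 0.5],
}

print(rank_methods(hfs))
```

In this toy example, `method_a` and `method_b` have the same mean, so the method whose criteria agree more closely (lower deviation) is ranked first.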

Statistical analysis
These results were obtained using Python

Results
First, we implemented VT, tree-based, L1-based, and k-Best for the IVF/ICSI dataset.
The threshold in the variance method was 0.35, which provided the best ACC. The results of the filter-based methods using a random forest algorithm are given in table I. This table also presents the influential factors selected for predicting success after treatment.
We selected the RFM because it gave the best results among well-known machine learning models for this dataset (30).
However, regarding AUC, k-Best has a higher value (0.736) than the tree-based method, which has an AUC of 0.7 (Table I).
We also applied 5 standard wrapper-based methods for feature selection using an RFM on the IVF/ICSI dataset (Table II). No preprocessing method was used in this step. The random selection method obtains the best subset of features over different iterations based on ACC.
Since the feature selection in this method is based on the model, we included it among the wrapper methods.
We validated the models by k-fold cross-validation with k = 10.
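As an illustration of 10-fold cross-validation, the sketch below splits sample indices into 10 disjoint test folds, each paired with the remaining samples as the training set. It is a standard-library-only sketch; the function name `k_fold_indices` and the seed are hypothetical, and the sample count of 734 is taken from the dataset size reported in the abstract. Whether the original analysis shuffled or stratified the folds is not stated, so the shuffling here is an assumption.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffled, non-stratified folds
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(734, k=10))
for train_idx, test_idx in folds:
    # A model would be fitted on train_idx and scored on test_idx here.
    print(len(train_idx), len(test_idx))
```

Each sample is used for testing exactly once, and the reported score is typically the average over the 10 folds.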
According to the results of table II
The hybrid k-Best and SFS method increased the model accuracy (from 0.754 to 0.795) and simultaneously reduced the runtime (from 737 to 89 s). This improvement can also be seen in other criteria, such as MCC, F-score, PPV, and recall. The number of selected features was n = 7 (Table IV). Furthermore, figure 3 shows the ROC curves of the hybrid wrapper-based methods.
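The hybrid idea (a fast filter step that shrinks the candidate set, followed by a greedy wrapper search) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names, the feature names `f1` to `f4`, and the toy scoring function are hypothetical. In practice, `score_fn` would be the cross-validated performance of the classifier on the candidate subset.

```python
def sequential_forward_selection(candidates, score_fn, k):
    """Greedily add the feature that most improves score_fn until k are chosen."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def hybrid_select(filter_scores, n_filter, score_fn, k):
    """Keep the n_filter best features by filter score, then run forward selection."""
    prefiltered = sorted(filter_scores, key=filter_scores.get, reverse=True)[:n_filter]
    return sequential_forward_selection(prefiltered, score_fn, k)


# Hypothetical filter scores (e.g. F-values) and a toy additive subset score.
filter_scores = {"f1": 9.0, "f2": 7.0, "f3": 0.5, "f4": 6.0}
toy_weights = {"f1": 0.3, "f2": 0.2, "f3": 0.9, "f4": 0.25}


def toy_score(subset):
    # Stand-in for a cross-validated model score on this subset.
    return sum(toy_weights[f] for f in subset)


print(hybrid_select(filter_scores, n_filter=3, score_fn=toy_score, k=2))
```

Because the wrapper only searches the pre-filtered candidates, far fewer subsets are evaluated, which is consistent with the runtime reduction reported above. The toy example also shows the trade-off: a feature discarded by the filter (`f3`) can no longer be picked by the wrapper, even if it would have scored well.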

Statistical analysis
This hybrid method aims to reduce the number of features and increase the ACC, PPV, recall, AUC, MCC, and F-score values. We used spider plots to compare the hybrid methods on the principal criteria. Because a shorter runtime indicates a better method, we plotted the inverse of the runtime so that larger values are better on every axis. Accordingly, in figure 4, a larger area of the polygon formed by connecting the values of the criteria indicates better model performance.
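The polygon-area comparison can be made concrete with a short sketch. For n equally spaced axes with values r_1, ..., r_n, the polygon area is 0.5 · sin(2π/n) · Σ r_i · r_(i+1), with indices wrapping around. The function names and the example values below are hypothetical and are not results of the study.

```python
from math import pi, sin


def radar_area(values):
    """Area of the polygon drawn on a spider plot with equally spaced axes."""
    n = len(values)
    if n < 3:
        raise ValueError("a spider plot polygon needs at least 3 axes")
    return 0.5 * sin(2 * pi / n) * sum(
        values[i] * values[(i + 1) % n] for i in range(n)
    )


def with_inverse_runtime(criteria, runtime_s):
    """Append 1/runtime so that larger is better on every axis."""
    return list(criteria) + [1.0 / runtime_s]


# Hypothetical criteria values for two methods and their runtimes in seconds.
method_fast = with_inverse_runtime([0.80, 0.72, 0.80, 0.55, 0.78, 0.82], 90.0)
method_slow = with_inverse_runtime([0.75, 0.70, 0.77, 0.50, 0.75, 0.79], 700.0)

print(radar_area(method_fast) > radar_area(method_slow))
```

Two caveats apply to this kind of comparison: the area depends on the order of the axes, and an unscaled inverse runtime is much smaller than the other criteria, so it contributes little unless it is rescaled to a comparable range.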
In addition, the results in figure 5 show that the hybrid SFS and k-Best method achieved the best performance compared to the wrapper-based methods. Figure 5 compares the criteria for the SFS method before and after applying the k-Best and tree-based methods. Among the 3 methods shown, the better method covers a larger area of the spider plot. As can be seen, the proposed hybrid method (SFS and k-Best) performs better than the other 2 methods.
Table V shows the features selected by each method to predict success after IVF/ICSI treatment. We used the get_metric_dict method of the sequential feature selector object in Python and displayed its output, the results of SFS, as a pandas data frame.
The columns avg_score and ci_bound represent the average of the cross-validation scores and the bound of the 95% confidence interval around it. The columns std_dev and std_err represent the standard deviation and standard error of the cross-validation scores, respectively.
In the complete set of features obtained by k-Best, 19 features were selected. The SFS and k-Best method obtained the best ACC with 7 features. Figure 6 shows the model's performance based on the ACC and the number of selected features. The ACC of the RFM without feature selection was 0.76. With the proposed hybrid method, its ACC improved to 0.795, as did the other criteria. Moreover, the mean number of oocytes collected in pregnant women was lower than in the other group.

Clinical analysis
This difference was not significant for FHR, but it was significant for CPR (p = 0.009) (Figure 7C). From the changes in CPR and ongoing pregnancy, it can be concluded that the highest pregnancy success rate is obtained with < 6 retrieved oocytes. The success rate decreases as the number of oocytes collected increases, until the number of oocytes collected exceeds 29.
Also, figure 7
Similarly, the negative effect of the features 16Cells, GIII, and compact on the success of IVF/ICSI in the 2 groups can be seen in figure 7 (E, F, G), respectively. The difference in ongoing pregnancy between the 2 groups is significant for the compact factor. Also, a significant difference exists between successful and unsuccessful clinical pregnancy for 16Cells (p = 0.01) and GIII (p = 0.039).
Finally, figure 8 presents the Shapley additive explanations (SHAP) value plot for the principal clinical factors and their effects on pregnancy prediction. As can be seen, for most patients, the FSH factor, FAge, and oocytes have a negative relationship with the predicted pregnancy, with an impact factor of < 2. In some patients, the FSH factor was positively associated with the predicted pregnancy rate (with an impact factor greater than 3). Also, there were patients who

Limitations and strengths
This study has some limitations that call for caution. The data were collected from
This demonstrates a thorough evaluation of the proposed method's effectiveness and ensures its suitability for practical applications.
Future studies could explore the application of heuristic algorithms for dimensional reduction and feature ranking. This could offer alternative approaches to selecting influential features and improve the overall methodology. In addition, external datasets can be utilized to evaluate and generalize the proposed model.
This external validation would provide further confidence in the method's performance and reliability.

Conclusion
Our study introduces an innovative approach that leverages HFSs in feature selection, utilizing essential factors in predictive models. Existing studies have employed various predictive models using different feature selection techniques.
methods of feature selection in different studies lead to inconsistent results and difficulty in identifying important factors. Using different scoring criteria may not fully represent clinical significance. Previous research has explored single feature selection techniques, but a comprehensive hybrid approach can improve the accuracy and reliability of predictive models. Our primary objective is to address these challenges by proposing a novel hybrid method using HFSs for feature selection in the prediction of in vitro fertilization/intracytoplasmic sperm injection (IVF/ICSI) infertility treatment success. This research aims to improve the accuracy and reliability of the prediction model by integrating filter, wrapper, and embedded techniques. The outcome of this research will provide clinicians and medical practitioners with more precise insights into the essential factors contributing to treatment success, guiding personalized interventions, and ultimately improving the overall outcomes of IVF/ICSI infertility treatment.
International Journal of Reproductive BioMedicine Machine learning in ART

2 )
selected a subset of features and trained the model with each iteration. This process continues until the best subset is achieved based on the model evaluation function. Sequential feature selector methods are greedy search algorithms that try to eliminate redundant and irrelevant features, reducing the number of features while increasing the model's performance. Following expert opinion on a suitable number of features out of 38, we set k = 7 as the number of features in the wrapper methods.

Evaluation metrics
Various measures can be used to evaluate the performance of different methods. The most applicable measures for evaluating the model include ACC, area under the receiver operating characteristic (ROC) curve (AUC), precision or positive predictive value (PPV), recall, and F-score. Because the dataset is imbalanced, criteria such as ACC and recall may not be decisive, because they are reported with respect to the couples with successful outcomes (true positive rate), whereas most infertile couples in our dataset had cycles with unsuccessful outcomes. Therefore, we used the Matthews correlation coefficient (MCC), a robust criterion for evaluating model performance in both groups. The MCC, introduced by the biochemist Brian W. Matthews, is used in machine learning to measure the quality of binary classifications (26). It takes both positive and negative cycles into account and is regarded as a balanced criterion that can be used even if the classes are of very different sizes. The MCC is a correlation coefficient between -1 and 1, where 1 means a perfect prediction and -1 means an inverse prediction. It is calculated by the following equation:

\[ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{1} \]

where TP and TN are the numbers of true positives and true negatives determined by the model, and FN and FP are the numbers of negative and positive values that the model specifies incorrectly. Several other measures were applied for comparison, which are briefly specified below:

\[ \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2} \]

\[ \mathrm{PPV} = \frac{TP}{TP + FP} \tag{3} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{4} \]

\[ \text{F-score} = \frac{2 \times (\mathrm{PPV} \times \mathrm{Recall})}{\mathrm{PPV} + \mathrm{Recall}} \tag{5} \]
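The measures named in this section can be computed directly from the four confusion-matrix counts using their standard definitions. The sketch below is a minimal illustration; the function name and the example counts are hypothetical and are not results of the study. When a denominator is zero, the sketch returns 0.0 by convention.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Return ACC, PPV, recall, F-score, and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "PPV": ppv, "Recall": recall, "F-score": f_score, "MCC": mcc}


# Hypothetical balanced-ish example.
balanced = classification_metrics(tp=50, tn=30, fp=10, fn=10)

# Hypothetical imbalanced example: the model predicts "unsuccessful" for everyone.
all_negative = classification_metrics(tp=0, tn=90, fp=0, fn=10)

print(balanced)
print(all_negative)
```

The second example shows why MCC is informative on imbalanced data: a model that never predicts a successful cycle still reaches a high ACC, while its MCC is 0.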

software version 3.8, which was run on a system equipped with 2 GB of RAM and a Core i3 CPU. Furthermore, for statistical analysis, IBM® SPSS® (Statistical Package for the Social Sciences) version 25 was employed.
The goal of feature selection is to choose the minimum number of features with the highest model performance. The tree-based method has the best ACC (0.79) among the filter-based methods, although it keeps more features (k = 20) than the other methods. For this method, MCC = 0.5 indicates that the model performs well with the selected features. The k-Best method obtained a slightly lower but still relatively good ACC (0.786), with MCC = 0.487 and 19 features selected.
(D) shows the relationship between the number of previous IVF/ICSI treatments and outcome. Although no significant difference was observed between the successful and unsuccessful ongoing pregnancy groups for women who had a previous unsuccessful IVF/ICSI treatment, women who had 3 previous unsuccessful treatments were less likely to become pregnant. Although the chance of pregnancy decreases with an increased number of unsuccessful treatments (3 or more), pregnancy remains possible.

were
FAge and oocytes had a positive impact (< 2) on the prediction. Almost all patients fall into 2 categories based on unsuccessful, 16Cells, compact, and GIII. The first group has a high effect on the model prediction and is negatively related to pregnancy, with an impact factor of < 2. The rest have a positive relationship with pregnancy despite a low effect on the model prediction. This indicates that these clinical factors are almost inversely related to the pregnancy outcome and behave independently of other features. Although factors such as FAge were negatively associated with pregnancy, many patients became pregnant despite advanced age. It can be concluded that other clinical factors affect these features, which may change treatment outcomes. We also used a heat map to show the relationship between the selected features and the model's output (pregnancy and non-pregnancy). This map shows the correlation
https://doi.org/10.18502/ijrm.v21i12.15038

Figure 5. Spider plot for comparison among the SFS, hybrid k-Best and SFS, and hybrid tree-based and SFS methods.
Figure 6.

Figure 7. A) Relationship of female age and outcome of treatment, B) Relationship of day-3 FSH dose and outcome of treatment, C) Relationship of number of retrieved oocytes and outcome of treatment, D) Relationship of number of previous unsuccessful IVF/ICSI treatments and outcome, E) Relationship of number of cells day (16Cells) and outcome of treatment, F) Relationship of quality of transferred embryos (GIII) and outcome of treatment, G) Relationship of number of cells day (compact) and outcome of treatment. The outcome of treatment is FHR.

Figure 8. Shapley additive explanations (SHAP) values for important clinical factors; the output is FHR in the RF model.

Figure 9. Heat map indicating the correlation among features and outcome; the outcome is FHR.

Figure 10. Association of selected features based on the hybrid model.

only 2
infertility centers in one city. This limited scope may affect the generalizability of the findings. Future research should aim to collect data from multiple centers across different geographical locations to enhance the representativeness and external validity of the results. In addition, the dataset was not large, particularly in the number of successful samples. This imbalance may introduce bias and affect the statistical power of the analysis. It is important to consider this limitation when interpreting the results and to strive for larger and more balanced datasets in future studies. Although the proposed hybrid method incorporates a novel approach using the scoring system of HFS, the study only employed a subset of standard feature selection methods. Exploring additional feature selection techniques and comparing their performance could provide further insights and enhance the robustness of the methodology.
The proposed hybrid method, which selects the best model based on the innovative scoring system of HFS, is a strength of the study. This approach enhances the accuracy and reliability of feature selection and contributes to the advancement of the field. Furthermore, the research incorporates newer tools, such as the MCC measure, to assess model performance in unbalanced datasets.

a
multicenter dataset to predict infertility success rates. By considering the standard deviation among various criteria, HFSs improve feature selection quality and reduce the number of features. Notably, our findings reveal significant differences in mean values between the pregnant and non-pregnant groups for key features, including FSH, FAge, 16Cells, oocytes, GIII, and compact. Additionally, we found a noteworthy correlation between FAge and both FHR and CPR, with the highest FSH level (31.87%) observed within the FSH dose range of 10-13 mIU/ml.
International Journal of Reproductive BioMedicine Mehrjerd et al.
method has different criteria for evaluation, we used the corresponding HFS. The criteria values varied in the range of 0-1, so MCC was transferred to [0, 1] for each method to make the comparison valid. So, for each method x, we have

\[ A = \{ \langle x, \{ \mathrm{ACC}, \mathrm{AUC}, \text{F-score}, \mathrm{PPV}, \mathrm{Recall}, \mathrm{MCC} \} \rangle \mid x \in X \} \tag{7} \]
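Putting all criteria on a common [0, 1] scale can be sketched as follows. The text does not state which transformation was applied to MCC, so the linear map (MCC + 1) / 2 used here is an assumption, as are the function names, the ordering of the criteria, and the example values.

```python
CRITERIA = ["ACC", "AUC", "F-score", "PPV", "Recall", "MCC"]


def rescale_mcc(mcc):
    """Map MCC from [-1, 1] onto [0, 1] (assumed linear transformation)."""
    return (mcc + 1.0) / 2.0


def hesitant_element(metrics):
    """Build the list of criteria values in [0, 1] for one method."""
    values = dict(metrics)
    values["MCC"] = rescale_mcc(values["MCC"])
    return [values[name] for name in CRITERIA]


# Hypothetical criteria values for one feature selection method.
example_metrics = {
    "ACC": 0.79, "AUC": 0.70, "F-score": 0.80,
    "PPV": 0.78, "Recall": 0.82, "MCC": 0.5,
}

print(hesitant_element(example_metrics))
```

After this step, every entry lies in [0, 1], so the values of one method can be treated as a single hesitant fuzzy element and compared with those of the other methods.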

Table I. Results of filter/embedded methods on the random forest classifier using the IVF/ICSI dataset

Table V. Principal factors in detail with features selected by the SFS and k-Best method for prediction in IVF/ICSI treatment
SFS: Sequential forward selection, IVF: In vitro fertilization, ICSI: Intracytoplasmic sperm injection, Avg_score: Average score, Ci_bound: Confidence interval bound, Feature_idx: Feature index, Std_dev: Standard deviation, Std_err: Standard error, FSH: Follicle-stimulating hormone