The Effect of Data Splitting Methods on Classification Performance in Wrapper-Based Cuttlefish Gene-Selection Model

Authors

  • Wahab Kh. Arabo, Department of Computer Science, University of Zakho, Duhok, Iraq
  • Omar M. Malallah, Department of Computer Science, University of Zakho, Duhok, Iraq

DOI:

https://doi.org/10.25007/ajnu.v11n4a1424

Abstract

Given the high dimensionality of gene expression datasets, selecting informative genes is key to improving classification performance. At the same time, classification outcomes are affected by the strategy used to split the data for training and testing. In light of these facts, this paper investigates the impact of three data splitting methods on the performance of eight well-known classifiers when paired with the Cuttlefish Algorithm (CFA) as a wrapper-based gene-selection method. The classification algorithms included in this study are K-Nearest Neighbors (KNN), Logistic Regression (LR), Gaussian Naive Bayes (GNB), Linear Support Vector Machine (SVM-L), Sigmoid Support Vector Machine (SVM-S), Random Forest (RF), Decision Tree (DT), and Linear Discriminant Analysis (LDA), and the tested data splitting methods are cross-validation (CV), train-test (TT), and train-validation-test (TVT). The efficacy of the investigated classifiers was evaluated on nine cancer gene expression datasets using evaluation metrics such as accuracy and F1-score, together with the Friedman test. Experimental results revealed that, in general, LDA and SVM-L outperformed the other algorithms, whereas RF and DT produced the worst results. On most of the datasets used, the results of all algorithms showed that the train-test method of data splitting yields more accurate results than the train-validation-test method, while cross-validation was superior to both. Furthermore, RF and GNB were affected by the data splitting technique less than the other classifiers, whereas LDA was the most affected.
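The three splitting protocols compared above can be made concrete with a short sketch. The Python below is a minimal, standard-library-only illustration under stated assumptions: the function names (`train_test_indices`, `train_val_test_indices`, `k_fold_indices`), the split fractions (30% test for TT; 60/20/20 for TVT), k = 5 folds, and the plain shuffled (non-stratified) splitting are all hypothetical choices, not the settings reported in the paper. In a wrapper such as CFA, a common convention (assumed here, not stated in the abstract) is to score candidate gene subsets on the validation part of a TVT split and report final performance on the untouched test part.

```python
import random
from typing import List, Tuple

Indices = List[int]


def _shuffled(n: int, seed: int) -> Indices:
    """Return the sample indices 0..n-1 in a reproducible random order."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx


def train_test_indices(n: int, test_frac: float = 0.3,
                       seed: int = 0) -> Tuple[Indices, Indices]:
    """TT (hypothetical helper): one random split into train and held-out test."""
    idx = _shuffled(n, seed)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def train_val_test_indices(n: int, val_frac: float = 0.2, test_frac: float = 0.2,
                           seed: int = 0) -> Tuple[Indices, Indices, Indices]:
    """TVT (hypothetical helper): train part, validation part
    (e.g. for scoring candidate gene subsets), and a final test part."""
    idx = _shuffled(n, seed)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def k_fold_indices(n: int, k: int = 5,
                   seed: int = 0) -> List[Tuple[Indices, Indices]]:
    """CV (hypothetical helper): k (train, test) pairs in which
    every sample appears in exactly one test fold."""
    idx = _shuffled(n, seed)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    splits = []
    for i, test in enumerate(folds):
        # Training part = all samples outside the current test fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    n_samples = 72  # hypothetical dataset size, for illustration only
    tr, te = train_test_indices(n_samples)
    print("TT :", len(tr), len(te))
    tr, va, te = train_val_test_indices(n_samples)
    print("TVT:", len(tr), len(va), len(te))
    for fold, (tr, te) in enumerate(k_fold_indices(n_samples)):
        print(f"CV fold {fold}:", len(tr), len(te))
```

A design note: CV reuses every sample for testing exactly once and averages over k folds, whereas a single TT or TVT split on a few dozen samples leaves only a handful of test cases, and TVT shrinks the training part further by setting aside a validation part.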

Published

2022-11-25

How to Cite

Kh. Arabo, W., & M. Malallah, O. (2022). The Effect of Data Splitting Methods on Classification Performance in Wrapper-Based Cuttlefish Gene-Selection Model. Academic Journal of Nawroz University, 11(4), 284–293. https://doi.org/10.25007/ajnu.v11n4a1424

Issue

Vol. 11 No. 4 (2022)

Section

Articles