The Effect of Data Splitting Methods on Classification Performance in Wrapper-Based Cuttlefish Gene-Selection Model
DOI: https://doi.org/10.25007/ajnu.v11n4a1424
Abstract
Considering the high dimensionality of gene expression datasets, selecting informative genes is key to improving classification performance. However, classification outcomes are also affected by the strategy used to split the data into training and testing sets. In light of these facts, this paper investigates the impact of three data splitting methods on the performance of eight well-known classifiers when paired with the Cuttlefish Algorithm (CFA) as a gene-selection method. The classification algorithms included in this study are K-Nearest Neighbors (KNN), Logistic Regression (LR), Gaussian Naive Bayes (GNB), Linear Support Vector Machine (SVM-L), Sigmoid Support Vector Machine (SVM-S), Random Forest (RF), Decision Tree (DT), and Linear Discriminant Analysis (LDA). The tested data splitting methods are cross-validation (CV), train-test (TT), and train-validation-test (TVT). The efficacy of the investigated classifiers was evaluated on nine cancer gene expression datasets using evaluation metrics such as accuracy and F1-score, together with the Friedman test. Experimental results revealed that LDA and SVM-L generally outperformed the other algorithms, whereas RF and DT provided the worst results. On most of the datasets used, the results of all algorithms showed that the train-test method of data splitting is more accurate than the train-validation-test method, while the cross-validation method was superior to both. Furthermore, RF and GNB were affected by the data splitting technique less than the other classifiers, whereas LDA was the most affected.
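To make the three data splitting methods concrete, here is a minimal Python sketch that only partitions sample indices; it does not reproduce the authors' CFA wrapper or their classifiers. The function names, the split ratios (70/30 for TT, 60/20/20 for TVT), k = 10 folds, and the seed are assumptions for illustration, not settings reported in the paper. In a wrapper model, the gene-subset search would ideally be scored only on the training (or validation) indices, so the test indices stay unseen until the final evaluation.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]


def _shuffled(n: int, seed: int) -> List[int]:
    """Return the indices 0..n-1 in a reproducible random order."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx


def train_test_split_idx(n: int, test_frac: float = 0.3, seed: int = 0) -> Split:
    """TT: a single split into training and held-out test indices (ratio assumed)."""
    idx = _shuffled(n, seed)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def train_val_test_split_idx(
    n: int, val_frac: float = 0.2, test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """TVT: training, validation (e.g. to guide the gene search), and test indices."""
    idx = _shuffled(n, seed)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def k_fold_idx(n: int, k: int = 10, seed: int = 0) -> List[Split]:
    """CV: k (train, test) pairs; each index lands in exactly one test fold."""
    idx = _shuffled(n, seed)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to k folds
    splits = []
    for i, test in enumerate(folds):
        # Training indices are the union of all other folds.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    n = 72  # hypothetical sample count, not one of the paper's datasets
    tr, te = train_test_split_idx(n)
    print("TT :", len(tr), len(te))
    tr, va, te = train_val_test_split_idx(n)
    print("TVT:", len(tr), len(va), len(te))
    print("CV :", [len(t) for _, t in k_fold_idx(n)])
```

Run as a script, it prints 50/22 for TT, 44/14/14 for TVT, and ten CV fold sizes of 8 or 7. Under these assumed ratios each CV model trains on roughly 90% of the samples, versus about 70% for TT and 60% for TVT, and the CV score averages over k partitions instead of relying on one. These are plausible, unconfirmed contributors to the CV > TT > TVT ordering reported above. The sketch does not stratify by class; with few samples per cancer class, stratified splitting is often preferred.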
License
Copyright (c) 2022 Academic Journal of Nawroz University

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors retain copyright
The use of a Creative Commons License enables authors/editors to retain copyright to their work. Publications can be reused and redistributed as long as the original author is correctly attributed.