Exploring the Power of eXtreme Gradient Boosting Algorithm in Machine Learning: A Review
Keywords: Data Science, Machine Learning, XGBoost Algorithm, Semi-supervised Learning, Python
The primary task of machine learning is to extract valuable information from the data generated every day, learn from it, and take useful actions. Natural language processing, pattern detection, search engines, medical diagnostics, bioinformatics, and cheminformatics are all application areas for machine learning. XGBoost is a relatively recent machine learning algorithm that has shown exceptional capability for modeling complex systems and is reported to be among the strongest machine learning algorithms in terms of prediction accuracy, interpretability, and classification versatility. XGBoost is an optimized distributed gradient boosting library built to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting that addresses a variety of data science problems quickly and accurately. Python remains the language of choice for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This paper reviews XGBoost, one of the most prominent algorithms for supervised and semi-supervised learning (SSL), in a Python environment.
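To make the idea of tree boosting concrete, the following is a minimal sketch of plain gradient boosting for squared-error regression, written in pure Python with depth-1 trees (stumps). It is not the XGBoost implementation: XGBoost additionally uses second-order gradient information, regularization, and parallel split finding, none of which appear here. The function names, the toy data, and the values of `n_rounds` and `learning_rate` are assumptions chosen for illustration only.

```python
# Minimal gradient boosting with depth-1 regression trees (stumps).
# Illustrative sketch only; all names and settings here are hypothetical
# and do not mirror the XGBoost API.

def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit an additive model: each stump is trained on the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict_boosted(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Toy one-dimensional data (assumed for illustration): y = x^2.
xs = list(range(10))
ys = [x * x for x in xs]

baseline = fit_boosted(xs, ys, n_rounds=0)   # predicts the mean only
boosted = fit_boosted(xs, ys, n_rounds=50)   # mean plus 50 shrunken stumps

baseline_mse = mean_squared_error(baseline, xs, ys)
boosted_mse = mean_squared_error(boosted, xs, ys)
```

Each round fits a weak learner to what the current ensemble still gets wrong and adds a shrunken copy of it, so the training error of the boosted model falls below that of the constant baseline. In practice one would likely use the `xgboost` Python package rather than a hand-written loop.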
Copyright (c) 2023 Academic Journal of Nawroz University
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.