Exploring the Power of eXtreme Gradient Boosting Algorithm in Machine Learning: a Review

Authors

  • Zeravan Arif Ali Department of Information Technology Management, Duhok Polytechnic, Kurdistan Region, Iraq
  • Ziyad H. Abduljabbar Department of Information Technology Management, Duhok Polytechnic, Kurdistan Region, Iraq
  • Hanan A. Taher Department of Information Technology Management, Duhok Polytechnic, Kurdistan Region, Iraq
  • Amira Bibo Sallow Department of Information Technology Management, Duhok Polytechnic, Kurdistan Region, Iraq
  • Saman M. Almufti Department Computer Science, Nawroz University, Duhok, KRG-Iraq

DOI:

https://doi.org/10.25007/ajnu.v12n2a1612

Keywords:

Data Science, Machine Learning, XGBoost Algorithm, Semi-Supervised Learning, Python

Abstract

The primary task of machine learning is to extract valuable information from the data generated every day, learn from it, and take useful actions. Natural language processing, pattern detection, search engines, medical diagnostics, bioinformatics, and cheminformatics are all application areas of machine learning. XGBoost is a recently released machine learning algorithm that has shown exceptional capability for modeling complex systems and stands out among machine learning algorithms for its prediction accuracy, interpretability, and classification versatility. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting that solves many data science problems quickly and accurately. Python remains the language of choice for scientific computing, data science, and machine learning, boosting performance and productivity by enabling the use of performant low-level libraries and clean high-level APIs. This paper reviews one of the most prominent supervised and semi-supervised learning (SSL) machine learning algorithms, XGBoost, in a Python environment.


References

M. Khalaf et al., “A Data Science Methodology Based on Machine Learning Algorithms for Flood Severity Prediction,” Sep. 2018, doi: 10.1109/CEC.2018.8477904.

“Classification Based on Semi-Supervised Learning: A Review,” Iraqi Journal for Computers and Informatics. http://ijci.uoitc.edu.iq/index.php/ijci/article/view/277 (accessed May 20, 2021).

A. Dey, “Machine Learning Algorithms: A Review,” Int. J. Comput. Sci. Inf. Technol., vol. 7, no. 3, pp. 1174–1179, 2016, [Online]. Available: www.ijcsit.com.

N. M. Abdulkareem and A. M. Abdulazeez, “Machine Learning Classification Based on Radom Forest Algorithm: A Review,” Int. J. Sci. Bus., vol. 5, no. 2, pp. 128–142, 2021, Accessed: May 20, 2021. [Online]. Available: https://ideas.repec.org/a/aif/journl/v5y2021i2p128-142.html.

J. Pesantez-Narvaez, M. Guillen, and M. Alcañiz, “Predicting motor insurance claims using telematics data—XGboost versus logistic regression,” Risks, vol. 7, no. 2, 2019, doi: 10.3390/risks7020070.

S. Raschka, J. Patterson, and C. Nolet, “Machine learning in python: Main developments and technology trends in data science, machine learning, and artificial intelligence,” Inf., vol. 11, no. 4, 2020, doi: 10.3390/info11040193.

“Machine Learning Supervised Algorithms of Gene Selection: A Review | Request PDF.” https://www.researchgate.net/publication/341119469_Machine_Learning_Supervised_Algorithms_of_Gene_Selection_A_Review (accessed May 20, 2021).

O. Ahmed and A. Brifcani, “Gene Expression Classification Based on Deep Learning,” in 4th Scientific International Conference Najaf, SICN 2019, Apr. 2019, pp. 145–149, doi: 10.1109/SICN47020.2019.9019357.

R. N. Behera and K. Das, “A Survey on Machine Learning: Concept, Algorithms and Applications,” Int. J. Innov. Res. Comput. Commun. Eng., 2017, doi: 10.15680/IJIRCCE.2017.

S. Athmaja, M. Hanumanthappa, and V. Kavitha, “A survey of machine learning algorithms for big data analytics,” in Proceedings of 2017 International Conference on Innovations in Information, Embedded and Communication Systems, ICIIECS 2017, Jan. 2018, vol. 2018-Janua, pp. 1–4, doi: 10.1109/ICIIECS.2017.8276028.

M. Iqbal and Z. Yan, “Supervised Machine Learning Approaches: A Survey,” Int. J. Soft Comput., 2015, doi: 10.21917/ijsc.2015.0133.

H. Yahia et al., “Medical Text Classification Based on Convolutional Neural Network: A Review,” 2021, Accessed: May 06, 2021. [Online]. Available: https://ideas.repec.org/a/aif/journl/v5y2021i3p27-41.html.

P. C. Sen, M. Hajra, and M. Ghosh, “Supervised Classification Algorithms in Machine Learning: A Survey and Review,” in Advances in Intelligent Systems and Computing, 2020, vol. 937, pp. 99–111, doi: 10.1007/978-981-13-7403-6_11.

A. Mohsin Abdulazeez, D. Zeebaree, D. M. Abdulqader, and D. Q. Zeebaree, “Machine Learning Supervised Algorithms of Gene Selection: A Review,” 2020. Accessed: May 06, 2021. [Online]. Available: https://www.researchgate.net/publication/341119469.

S. C. Dharmadhikari, M. Ingle, and P. Kulkarni, “Empirical Studies on Machine Learning Based Text Classification Algorithms,” Adv. Comput. An Int. J. ( ACIJ ), vol. 2, no. 6, 2011, doi: 10.5121/acij.2011.2615.

D. Mustafa Abdullah and A. Mohsin Abdulazeez, “Machine Learning Applications based on SVM Classification A Review,” Qubahan Acad. J., vol. 1, no. 2, pp. 81–90, Apr. 2021, doi: 10.48161/qaj.v1n2a50.

S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas, “Machine learning: A review of classification and combining techniques,” Artif. Intell. Rev., vol. 26, no. 3, pp. 159–190, Nov. 2006, doi: 10.1007/s10462-007-9052-3.

P. Strecht, L. Cruz, C. Soares, J. Mendes-Moreira, and R. Abreu, “A Comparative Study of Classification and Regression Algorithms for Modelling Students’ Academic Performance,” International Educational Data Mining Society. e-mail: [email protected]; Web site: http://www.educationaldatamining.org, Jun. 2015.

H. Q. Tran and C. Ha, “Improved visible light-based indoor positioning system using machine learning classification and regression,” Appl. Sci., vol. 9, no. 6, p. 1048, Mar. 2019, doi: 10.3390/app9061048.

J. Alzubi, A. Nayyar, and A. Kumar, “Machine Learning from Theory to Algorithms: An Overview,” in Journal of Physics: Conference Series, Nov. 2018, vol. 1142, no. 1, p. 12012, doi: 10.1088/1742-6596/1142/1/012012.

B. Choubin, E. Moradi, M. Golshan, J. Adamowski, F. Sajedi-Hosseini, and A. Mosavi, “An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines,” Sci. Total Environ., vol. 651, pp. 2087–2096, Feb. 2019, doi: 10.1016/j.scitotenv.2018.10.064.

D. P. P. Mesquita, J. P. P. Gomes, and A. H. Souza Junior, “Ensemble of Efficient Minimal Learning Machines for Classification and Regression,” Neural Process. Lett., vol. 46, no. 3, pp. 751–766, Dec. 2017, doi: 10.1007/s11063-017-9587-5.

R. Costache et al., “Novel hybrid models between bivariate statistics, artificial neural networks and boosting algorithms for flood susceptibility assessment,” J. Environ. Manage., vol. 265, p. 110485, Jul. 2020, doi: 10.1016/j.jenvman.2020.110485.

G. Biau, B. Cadre, and L. Rouvière, “Accelerated gradient boosting,” Mach. Learn., vol. 108, no. 6, pp. 971–992, Jun. 2019, doi: 10.1007/s10994-019-05787-1.

F. Sigrist, “Gradient and Newton boosting for classification and regression,” Expert Syst. Appl., vol. 167, p. 114080, Apr. 2021, doi: 10.1016/j.eswa.2020.114080.

R. Mitchell, A. Adinets, T. Rao, and E. Frank, “XGBoost: Scalable GPU accelerated learning,” arXiv, pp. 1–5, 2018.

W. Niu, T. Li, X. Zhang, T. Hu, T. Jiang, and H. Wu, “Using XGBoost to Discover Infected Hosts Based on HTTP Traffic,” Secur. Commun. Networks, vol. 2019, 2019, doi: 10.1155/2019/2182615.

D. Uenoyama, H. Yoshiura, and M. Ichino, “Personal authentication of iris and periocular recognition using XGBoost,” 2019 IEEE 8th Glob. Conf. Consum. Electron. GCCE 2019, pp. 186–187, 2019, doi: 10.1109/GCCE46687.2019.9015469.

S. Zhao et al., “Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals,” Meas. J. Int. Meas. Confed., vol. 159, p. 107777, 2020, doi: 10.1016/j.measurement.2020.107777.

C. Zopluoglu, “Detecting Examinees With Item Preknowledge in Large-Scale Testing Using Extreme Gradient Boosting (XGBoost),” Educ. Psychol. Meas., vol. 79, no. 5, pp. 931–961, 2019, doi: 10.1177/0013164419839439.

D. Bhulakshmi and G. Gandhi, “The Prediction of Diabetes in Pima Indian Women Mellitus Based on XGBoost Ensemble Modeling Using Data Science,” 2020.

M. A. Fauzan and H. Murfi, “The accuracy of XGBoost for insurance claim prediction,” Int. J. Adv. Soft Comput. its Appl., vol. 10, no. 2, pp. 159–171, 2018.

A. Pathy, S. Meher, and B. P, “Predicting algal biochar yield using eXtreme Gradient Boosting (XGB) algorithm of machine learning methods,” Algal Res., vol. 50, no. April, p. 102006, 2020, doi: 10.1016/j.algal.2020.102006.

R. Zhong, R. Johnson, and Z. Chen, “Generating pseudo density log from drilling and logging-while-drilling data using extreme gradient boosting (XGBoost),” Int. J. Coal Geol., vol. 220, no. July 2019, p. 103416, 2020, doi: 10.1016/j.coal.2020.103416.

R. Santhanam, N. Uzir, S. Raman, and S. Banerjee, “Experimenting XGBoost Algorithm for Prediction and Classification of Different Datasets,” Int. J. Control Theory Appl., vol. 9, pp. 651–662, 2017.

R. Sundaram, “An End-to-End Guide to Understand the Math behind XGBoost,” Analytics Vidhya, 2018, [Online]. Available: https://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/.

R. Zhang, B. Li, and B. Jiao, “Application of XGboost Algorithm in Bearing Fault Diagnosis,” IOP Conf. Ser. Mater. Sci. Eng., vol. 490, no. 7, 2019, doi: 10.1088/1757-899X/490/7/072062.

“XGBFEMF: An XGBoost-Based Framework for Essential Protein Prediction | IEEE Journals & Magazine | IEEE Xplore.” https://ieeexplore.ieee.org/abstract/document/8370098 (accessed May 20, 2021).

Y. Qiu, J. Zhou, M. Khandelwal, H. Yang, P. Yang, and C. Li, “Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration,” Eng. Comput., pp. 1–18, Apr. 2021, doi: 10.1007/s00366-021-01393-9.

X. Ji, W. Tong, Z. Liu, and T. Shi, “Five-feature model for developing the classifier for synergistic vs. Antagonistic drug combinations built by XGboost,” Front. Genet., vol. 10, no. JUL, p. 600, Jul. 2019, doi: 10.3389/fgene.2019.00600.

X. Liao, N. Cao, M. Li, and X. Kang, “Research on Short-Term Load Forecasting Using XGBoost Based on Similar Days,” in Proceedings - 2019 International Conference on Intelligent Transportation, Big Data and Smart City, ICITBS 2019, Mar. 2019, pp. 675–678, doi: 10.1109/ICITBS.2019.00167.

B. Yu et al., “SubMito-XGBoost: Predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting,” Bioinformatics, vol. 36, no. 4, pp. 1074–1081, 2020, doi: 10.1093/bioinformatics/btz734.

C. Midroni, P. J. Leimbigler, G. Baruah, M. Kolla, A. J. Whitehead, and Y. Fossat, “Predicting glycemia in type 1 diabetes patients: Experiments with XGBoost,” CEUR Workshop Proc., vol. 2148, pp. 79–84, 2018.

X. Ma, J. Sha, D. Wang, Y. Yu, Q. Yang, and X. Niu, “Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning,” Electron. Commer. Res. Appl., vol. 31, pp. 24–39, 2018, doi: 10.1016/j.elerap.2018.08.002.

Y. Liang et al., “Product marketing prediction based on XGboost and LightGBM algorithm,” ACM Int. Conf. Proceeding Ser., no. 1, pp. 150–153, 2019, doi: 10.1145/3357254.3357290.

Y. Song et al., “Prediction of double-high biochemical indicators based on lightGBM and XGBoost,” ACM Int. Conf. Proceeding Ser., pp. 189–193, 2019, doi: 10.1145/3349341.3349400.

Z. Chen, F. Jiang, Y. Cheng, X. Gu, W. Liu, and J. Peng, “XGBoost Classifier for DDoS Attack Detection and Analysis in SDN-Based Cloud,” Proc. - 2018 IEEE Int. Conf. Big Data Smart Comput. BigComp 2018, pp. 251–256, 2018, doi: 10.1109/BigComp.2018.00044.

L. Chao, Z. Wen-hui, and L. Ji-ming, “Study of Star/Galaxy Classification Based on the XGBoost Algorithm,” Chinese Astron. Astrophys., vol. 43, no. 4, pp. 539–548, 2019, doi: 10.1016/j.chinastron.2019.11.005.

C. Wang, C. Deng, and S. Wang, “Imbalance-XGBoost: leveraging weighted and focal losses for binary label-imbalanced classification with XGBoost,” Pattern Recognit. Lett., vol. 136, pp. 190–197, 2020, doi: 10.1016/j.patrec.2020.05.035.

N. Manju, B. S. Harish, and V. Prajwal, “Ensemble Feature Selection and Classification of Internet Traffic using XGBoost Classifier,” Int. J. Comput. Netw. Inf. Secur., vol. 11, no. 7, pp. 37–44, 2019, doi: 10.5815/ijcnis.2019.07.06.

S. Thongsuwan, S. Jaiyen, A. Padcharoen, and P. Agarwal, “ConvXGB: A new deep learning model for classification problems based on CNN and XGBoost,” Nucl. Eng. Technol., vol. 53, no. 2, pp. 522–531, 2021, doi: 10.1016/j.net.2020.04.008.

K. Song, F. Yan, T. Ding, L. Gao, and S. Lu, “A steel property optimization model based on the XGBoost algorithm and improved PSO,” Comput. Mater. Sci., vol. 174, no. December 2019, p. 109472, 2020, doi: 10.1016/j.commatsci.2019.109472.

Y. Wang, “A XGBoost Risk Model via Feature Selection and Bayesian Hyper-Parameter Optimization,” vol. 11, no. 1, pp. 1–17, 2019.

K. Budholiya, S. K. Shrivastava, and V. Sharma, “An optimized XGBoost based diagnostic system for effective prediction of heart disease,” J. King Saud Univ. - Comput. Inf. Sci., 2020, doi: 10.1016/j.jksuci.2020.10.013.

J. Guo et al., “An XGBoost-based physical fitness evaluation model using advanced feature selection and Bayesian hyper-parameter optimization for wearable running monitoring,” Comput. Networks, vol. 151, pp. 166–180, 2019, doi: 10.1016/j.comnet.2019.01.026.

J. Zhou, Y. Qiu, S. Zhu, D. J. Armaghani, M. Khandelwal, and E. T. Mohamad, “Estimation of the TBM advance rate under hard rock conditions using XGBoost and Bayesian optimization,” Undergr. Sp., 2020, doi: 10.1016/j.undsp.2020.05.008.

D. Zhang, L. Qian, B. Mao, C. Huang, B. Huang, and Y. Si, “A Data-Driven Design for Fault Detection of Wind Turbines Using Random Forests and XGboost,” IEEE Access, vol. 6, no. c, pp. 21020–21031, 2018, doi: 10.1109/ACCESS.2018.2818678.

J. Montiel, R. Mitchell, E. Frank, B. Pfahringer, T. Abdessalem, and A. Bifet, “Adaptive XGBoost for evolving data streams,” arXiv, no. 1, 2020.

S. Ji, X. Wang, W. Zhao, and D. Guo, “An application of a three-stage XGboost-based model to sales forecasting of a cross-border e-commerce enterprise,” Math. Probl. Eng., vol. 2019, 2019, doi: 10.1155/2019/8503252.

S. S. Dhaliwal, A. Al Nahid, and R. Abbas, “Effective intrusion detection system using XGBoost,” Inf., vol. 9, no. 7, 2018, doi: 10.3390/info9070149.

Y. Qu, Z. Lin, H. Li, and X. Zhang, “Feature Recognition of Urban Road Traffic Accidents Based on GA-XGBoost in the Context of Big Data,” IEEE Access, vol. 7, pp. 170106–170115, 2019, doi: 10.1109/ACCESS.2019.2952655.

Y. Xu, Y. Jiang, C. Li, Y. Chen, and Y. Yang, “Integration of an XGBoost model and EIS detection to determine the effect of low inhibitor concentrations on E. coli,” J. Electroanal. Chem., vol. 877, p. 114534, 2020, doi: 10.1016/j.jelechem.2020.114534.

A. B. Parsa, A. Movahedi, H. Taghipour, S. Derrible, and A. (Kouros) Mohammadian, “Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis,” Accid. Anal. Prev., vol. 136, no. October 2019, p. 105405, 2020, doi: 10.1016/j.aap.2019.105405.

H. Dong, D. He, and F. Wang, “SMOTE-XGBoost using Tree Parzen Estimator optimization for copper flotation method classification,” Powder Technol., vol. 375, pp. 174–181, 2020, doi: 10.1016/j.powtec.2020.07.065.

Z. Yi et al., “An Efficient Spectral Selection of M Giants Using XGBoost,” Astrophys. J., vol. 887, no. 2, p. 241, 2019, doi: 10.3847/1538-4357/ab54d0.

W. Li, Y. Yin, X. Quan, and H. Zhang, “Gene Expression Value Prediction Based on XGBoost Algorithm,” Front. Genet., vol. 10, no. November, pp. 1–7, 2019, doi: 10.3389/fgene.2019.01077.

J. Nobre and R. F. Neves, “Combining Principal Component Analysis, Discrete Wavelet Transform and XGBoost to trade in the financial markets,” Expert Syst. Appl., vol. 125, pp. 181–194, 2019, doi: 10.1016/j.eswa.2019.01.083.

H. Nguyen, X. N. Bui, H. B. Bui, and D. T. Cuong, “Developing an XGBoost model to predict blast-induced peak particle velocity in an open-pit mine: a case study,” Acta Geophys., vol. 67, no. 2, pp. 477–490, 2019, doi: 10.1007/s11600-019-00268-4.

D. K. Choi, “Data-Driven Materials Modeling with XGBoost Algorithm and Statistical Inference Analysis for Prediction of Fatigue Strength of Steels,” Int. J. Precis. Eng. Manuf., vol. 20, no. 1, pp. 129–138, 2019, doi: 10.1007/s12541-019-00048-6.

E. Al Daoud, “Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset,” Int. J. Comput. Inf. Eng., vol. 13, no. 1, pp. 6–10, 2019.

G. N. Dimitrakopoulos, A. G. Vrahatis, K. Sgarbas, and V. Plagianakos, “Pathway analysis using xgboost classification in biomedical data,” ACM Int. Conf. Proceeding Ser., pp. 1–6, 2018, doi: 10.1145/3200947.3201029.

M. Z. Joharestani, C. Cao, X. Ni, B. Bashir, and S. Talebiesfandarani, “PM2.5 prediction based on random forest, XGBoost, and deep learning using multisource remote sensing data,” Atmosphere (Basel)., vol. 10, no. 7, pp. 12–18, 2019, doi: 10.3390/atmos10070373.

Y. Zhang, J. Tong, Z. Wang, and F. Gao, “Customer Transaction Fraud Detection Using Xgboost Model,” Proc. - 2020 Int. Conf. Comput. Eng. Appl. ICCEA 2020, pp. 554–558, 2020, doi: 10.1109/ICCEA50009.2020.00122.

H. Mo, H. Sun, J. Liu, and S. Wei, “Developing window behavior models for residential buildings using XGBoost algorithm,” Energy Build., vol. 205, p. 109564, 2019, doi: 10.1016/j.enbuild.2019.109564.

D. Chakraborty and H. Elzarka, “Early detection of faults in HVAC systems using an XGBoost model with a dynamic threshold,” Energy Build., vol. 185, pp. 326–344, 2019, doi: 10.1016/j.enbuild.2018.12.032.

P. Su, Y. Liu, and X. Song, “Research on intrusion detection method based on improved SMOTE and XGBoost,” ACM Int. Conf. Proceeding Ser., pp. 42–49, 2018, doi: 10.1145/3290480.3290505.

Published

2023-05-31

How to Cite

Arif Ali, Z., H. Abduljabbar, Z., A. Taher, H., Bibo Sallow, A., & Almufti, S. M. (2023). Exploring the Power of eXtreme Gradient Boosting Algorithm in Machine Learning: a Review. Academic Journal of Nawroz University, 12(2), 320–334. https://doi.org/10.25007/ajnu.v12n2a1612

Section

Articles