Customer churn prediction in banks using machine learning algorithms

Document Type: Original Article

Author

Member of Bank Sepah Scholars Club.

Abstract
Purpose: In the banking industry, retaining loyal customers is considerably more cost-effective and profitable than acquiring new ones. Customer churn remains a major challenge for banks, directly reducing profitability, increasing marketing expenditures, and lowering market share. This study evaluates the performance of machine learning algorithms for predicting customer churn across branches of a state-owned bank in Iran between 2021 and 2024. By focusing on customer retention and minimizing the costs of attrition, the study aims to develop an efficient, interpretable model to identify customers at risk of churn.
Methodology: This descriptive-analytical, retrospective study analyzed data from 2,025 active customers over four years. For each customer, 12 features covering transactional, behavioral, and demographic characteristics were collected. After data cleaning, the features were standardized with z-score normalization. Six machine learning algorithms (Decision Tree, Random Forest, Support Vector Machine, Multilayer Perceptron, Bayesian Network, and XGBoost) were implemented in R. Their performance was assessed with 10-fold cross-validation using accuracy, sensitivity, and specificity.
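The preprocessing and evaluation steps named in the methodology can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and it is not the authors' code. The function names (`zscore`, `stratified_folds`, `evaluate`) are hypothetical, and so are three choices the abstract does not specify: stratifying the folds by class, using the population standard deviation, and coding churn as the positive class (1).

```python
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature column to mean 0 and standard deviation 1.

    Assumption: population standard deviation (pstdev); R's scale() uses
    the sample standard deviation instead.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - mu) / sigma for v in values]


def stratified_folds(labels, k=10, seed=0):
    """Split row indices into k folds with a similar class mix in each.

    Assumption: stratification is our choice; the abstract only says
    "10-fold cross-validation".
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin
    return folds


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity, specificity with churn (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Same class counts as the study: 325 churners, 1,700 non-churners.
labels = [1] * 325 + [0] * 1700
folds = stratified_folds(labels, k=10)
churners_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

In a full pipeline, each fold would serve once as the test set while a model is trained on the other nine, and `evaluate` would be applied to each test fold before averaging. The mean and standard deviation for `zscore` would usually be estimated on the training folds only; the abstract does not say how this was handled.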
Findings: Among the 2,025 customers examined, 325 (16%) were identified as churners. Statistical tests revealed no significant differences between churners and non-churners in age, relationship duration with the bank, or average deposits over the past six months. Among the models tested, XGBoost demonstrated superior performance with an accuracy of 96.89%, sensitivity of 87.11%, and specificity of 98.71%. The area under the ROC curve (AUC) for this model was 0.9907, indicating excellent discriminatory power.
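The reported figures can be put in context with a short arithmetic check. The sketch below uses only numbers stated in the findings. The variable names are illustrative, and the identity accuracy = prevalence × sensitivity + (1 − prevalence) × specificity holds exactly for a single confusion matrix, so small deviations are expected if the published metrics were averaged across folds (an assumption, since the abstract does not say).

```python
# Figures taken from the findings paragraph.
n_customers = 2025
n_churners = 325
sensitivity = 0.8711   # share of churners correctly flagged
specificity = 0.9871   # share of non-churners correctly cleared

# Churn rate (prevalence of the positive class).
prevalence = n_churners / n_customers

# Accuracy of a trivial model that labels every customer "non-churner".
majority_baseline = 1 - prevalence

# Accuracy implied by sensitivity and specificity at this prevalence.
implied_accuracy = prevalence * sensitivity + (1 - prevalence) * specificity

print(f"churn rate:        {prevalence:.2%}")
print(f"majority baseline: {majority_baseline:.2%}")
print(f"implied accuracy:  {implied_accuracy:.2%}")
```

Because only about 16% of customers churned, a model that never predicts churn would already be right for roughly 84% of customers. Accuracy alone therefore says little here, which is why sensitivity and specificity are reported alongside it. The implied accuracy comes out within a few hundredths of a percentage point of the reported 96.89%.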
Originality/Value: The contribution of this study lies in integrating advanced machine learning techniques with rigorous statistical analysis using real-world banking data. To the best of our knowledge, this is among the few studies to systematically compare multiple ML algorithms within the Iranian banking context, emphasizing both interpretability and robust validation. The findings provide practical insights for banking policymakers to design proactive strategies to improve customer retention.
