Comparing machine learning and binary regression approach for motor insurance prediction

Ridha Sefina Samosir, Jorge Luis Bazán Guzmán, Giselle Halim

Abstract


This study compares the performance of binary regression with the power cauchit (PC) link function and random forest in predicting motor insurance policyholder behavior on an imbalanced dataset. The dataset comprises 4,000 policyholders, and the response variable indicates whether a client purchased a full coverage plan (1) or not (0). Predictors include policyholder characteristics such as men, urban, private, age, and seniority. Binary regression was implemented in PyStan, while the random forest was built with scikit-learn without additional hyperparameter tuning. The results show that the random forest outperformed binary regression across a range of standard performance metrics as well as metrics designed for imbalanced data. These findings indicate that machine learning (ML) algorithms, exemplified here by random forest, offer more robust performance than traditional statistical models on complex, imbalanced datasets. This highlights the potential of random forest to improve predictive accuracy in applications such as the analysis of motor insurance policyholder behavior.
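To make two ideas from the abstract concrete, here is a minimal, dependency-free sketch. It assumes the power-link form used in the power-link literature, P(Y = 1 | η) = F(η)^λ with λ > 0 and F the standard Cauchy CDF, so that λ = 1 recovers the ordinary cauchit link. This is not the authors' PyStan model, and their exact parameterization may differ. The abstract also does not name the imbalanced-data metrics it reports, so balanced accuracy, G-mean, and F1 appear below only as common examples. The toy labels are hypothetical.

```python
import math


def cauchit_cdf(eta: float) -> float:
    """Standard Cauchy CDF, i.e. the inverse of the cauchit link."""
    return 0.5 + math.atan(eta) / math.pi


def power_cauchit_prob(eta: float, lam: float) -> float:
    """P(Y = 1 | eta) under an assumed power cauchit link: F(eta) ** lam.

    Assumption: power-link form with F the Cauchy CDF; lam = 1 gives
    the symmetric cauchit link, lam != 1 introduces skewness.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return cauchit_cdf(eta) ** lam


def imbalance_metrics(y_true, y_pred):
    """Accuracy plus a few metrics commonly used for imbalanced binary data."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on class 1
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on class 0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f1": f1,
    }


if __name__ == "__main__":
    print(power_cauchit_prob(0.0, 1.0))  # 0.5
    print(power_cauchit_prob(0.0, 2.0))  # 0.25
    # Hypothetical toy labels: 1 positive among 10, and a classifier
    # that always predicts the majority class 0.
    y_true = [1] + [0] * 9
    y_pred = [0] * 10
    print(imbalance_metrics(y_true, y_pred))
    # {'accuracy': 0.9, 'balanced_accuracy': 0.5, 'g_mean': 0.0, 'f1': 0.0}
```

The toy case shows why specialized metrics matter for imbalanced data. Plain accuracy rewards the trivial majority-class predictor with 0.9, while balanced accuracy, G-mean, and F1 reveal that it never identifies a positive case.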

Keywords


Binary regression; Imbalanced data; Insurance; Machine learning; Random forest



DOI: http://doi.org/10.11591/ijeecs.v40.i3.pp1576-1585



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).
