Comparing machine learning and binary regression approach for motor insurance prediction
Abstract
This study compares the performance of binary regression with the power cauchit (PC) link function and random forest in predicting motor insurance policyholder behavior using an imbalanced dataset. The dataset comprises 4,000 policyholders, with the response variable indicating whether a client purchased a full coverage plan (1) or not (0). Predictors include characteristics such as men, urban, private, age, and seniority. Binary regression was implemented using PyStan, while random forest was fitted with scikit-learn without additional hyperparameter tuning. Results demonstrate that random forest outperformed binary regression across a range of general performance metrics as well as specialized metrics suited to imbalanced data. These findings indicate that machine learning (ML) algorithms, exemplified by random forest, can offer more robust performance on complex, imbalanced datasets than traditional statistical models. This highlights the potential of random forest to improve predictive accuracy in applications such as motor insurance policyholder behavior analysis.
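As a rough illustration of the link function named above, the following sketch assumes the usual power-link construction, in which the success probability is the standard Cauchy CDF of the linear predictor raised to a power `lam`. The abstract does not state this formula, so the definition, the function names, and the coefficient values below are all assumptions for illustration, not the authors' PyStan model.

```python
import math


def cauchit_cdf(eta: float) -> float:
    """Standard Cauchy CDF, i.e. the inverse of the ordinary cauchit link."""
    return 0.5 + math.atan(eta) / math.pi


def power_cauchit(eta: float, lam: float) -> float:
    """Assumed power cauchit response: p = F(eta) ** lam, with lam > 0.

    lam = 1 recovers the symmetric cauchit link; other values skew the
    curve, which is the usual motivation for power links on imbalanced data.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return cauchit_cdf(eta) ** lam


def power_cauchit_inverse(p: float, lam: float) -> float:
    """Map a probability in (0, 1) back to the linear predictor."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.tan(math.pi * (p ** (1.0 / lam) - 0.5))


# Hypothetical coefficients for an intercept, age, and seniority;
# these values are made up and are not estimates from the study.
beta = [-1.0, 0.02, 0.05]
x = [1.0, 40.0, 5.0]  # intercept term, age, seniority
eta = sum(b * v for b, v in zip(beta, x))
prob_full_coverage = power_cauchit(eta, lam=2.0)
```

With `lam` greater than 1 the curve is pulled toward 0, so the same linear predictor yields a smaller probability of the positive class than under the plain cauchit link.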
Keywords
Binary regression; Imbalanced data; Insurance; Machine learning; Random forest
DOI: http://doi.org/10.11591/ijeecs.v40.i3.pp1576-1585

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).