Comparing machine learning and binary regression approach for motor insurance prediction

Ridha Sefina Samosir; Jorge Luis Bazán Guzmán; Giselle Halim

doi:10.11591/ijeecs.v40.i3.pp1576-1585

Abstract

This study compares binary regression with the power cauchit (PC) link function against random forest for predicting motor insurance policyholder behavior on an imbalanced dataset. The dataset comprises 4,000 policyholders, with the response variable indicating whether a client purchased a full coverage plan (1) or not (0). Predictors include the policyholder characteristics men, urban, private, age, and seniority. Binary regression was implemented using PyStan, while random forest was built with scikit-learn without additional hyperparameter tuning. Results show that random forest outperformed binary regression across a range of general performance metrics as well as specialized metrics suited to imbalanced data. The findings suggest that machine learning (ML) algorithms, exemplified by random forest, offer more robust performance on complex, imbalanced datasets than traditional statistical models. This highlights the potential of random forest to improve predictive accuracy in applications such as motor insurance policyholder behavior analysis.
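As a rough illustration of the binary regression side, the sketch below shows how a power cauchit link could map a linear predictor to a purchase probability. It assumes the usual power-link construction, in which the Cauchy CDF is raised to a positive power lam; the abstract does not state the formula, so this form, the function name, and the coefficient values are all assumptions made for illustration and are not estimates from the study.

```python
import math


def power_cauchit_inverse_link(eta: float, lam: float) -> float:
    """Map a linear predictor eta to a probability.

    Assumed form (not given in the abstract): the standard Cauchy CDF
    raised to a positive power lam. With lam = 1 this reduces to the
    ordinary cauchit link.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    cauchy_cdf = 0.5 + math.atan(eta) / math.pi
    return cauchy_cdf ** lam


# Hypothetical coefficients, chosen only to make the example run.
# They are NOT the fitted values reported in the paper.
intercept = -1.0
coefs = {"men": -0.3, "urban": 0.4, "private": 0.2, "age": -0.01, "seniority": 0.05}

# One made-up policyholder described by the five predictors in the abstract.
policyholder = {"men": 1, "urban": 1, "private": 1, "age": 45, "seniority": 10}

eta = intercept + sum(coefs[name] * policyholder[name] for name in coefs)
prob_full_coverage = power_cauchit_inverse_link(eta, lam=2.0)
print(f"P(full coverage) = {prob_full_coverage:.3f}")
```

The extra parameter lam skews the response curve, which is the usual motivation for power links when one class is much rarer than the other.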

Keywords

Binary regression; Imbalanced data; Insurance; Machine learning; Random forest


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).
