Diabetes diagnosis based on hard and soft voting classifiers combining statistical learning models

Gustavo Peixoto de Oliveira
Anderson Fonseca
Paulo Canas Rodrigues


Diabetes mellitus is one of the deadliest incurable diseases globally, and the number of cases continues to rise. Early identification of the disease helps fight it; however, blood tests can be considered invasive, which discourages people from taking them. In this vein, this work aims to build a model that can serve as an alternative to traditional exams for identifying the disease. Statistical learning algorithms, namely logistic regression, K-nearest neighbors, decision trees, random forest, and support vector machines, were used for diabetes classification. These models were considered both separately and combined via hard and soft voting classifiers. The methods were applied to a widely known dataset of 768 individuals and nine variables, compared using several accuracy metrics based on the confusion matrix, and used to estimate the probability of diabetes for a given profile.
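As an illustration of the two combination rules named above, the following is a minimal sketch in plain Python and not the authors' implementation. Hard voting takes the majority of the predicted class labels, while soft voting averages the predicted class probabilities and picks the class with the largest average. The function names `hard_vote` and `soft_vote`, the tie-breaking rule, and the three probability vectors are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over predicted class labels.

    Ties are broken in favour of the smallest label (an assumption of
    this sketch, not something stated in the paper).
    """
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def soft_vote(probas, weights=None):
    """Weighted average of per-class probabilities across models.

    Returns the winning class index and the averaged probabilities.
    """
    if weights is None:
        weights = [1.0] * len(probas)  # equal weights by default
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda k: avg[k])
    return label, avg


# Hypothetical outputs of three fitted models for one individual,
# each given as [P(no diabetes), P(diabetes)].
probas = [[0.45, 0.55], [0.48, 0.52], [0.90, 0.10]]

# Each model's own label is its most probable class.
labels = [max(range(2), key=lambda k: p[k]) for p in probas]

hard_label = hard_vote(labels)
soft_label, soft_avg = soft_vote(probas)
print(hard_label, soft_label, soft_avg)
```

In this made-up case the two rules disagree: two of the three models narrowly predict diabetes, so hard voting returns class 1, whereas the third model's confident prediction pulls the averaged probability of diabetes down to about 0.39, so soft voting returns class 0. The averaged probability is also the kind of quantity that can be read as an estimated probability of diabetes for a given profile.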

Oliveira, G. P. de, Fonseca, A., & Rodrigues, P. C. (2022). Diabetes diagnosis based on hard and soft voting classifiers combining statistical learning models. Revista Brasileira de Biometria, 40(4), 415–427. https://doi.org/10.28951/bjb.v40i4.605


