An Efficient Categorization of Diabetes Imbalanced Data Using SMOTE-ENN With Fine-Tuned LS-SVM Algorithm

Authors

  • Nwayyin Najat Mohammed, University of Sulaimani
  • Mariwan Hama Saeed, University of Halabja

DOI:

https://doi.org/10.25195/ijci.v51i1.579

Keywords:

Diabetes Mellitus; Imbalanced Datasets; Preprocessing; Resampling; SMOTE-ENN; Least Square Support Vector Machine; Hyperparameter; Optimization.

Abstract

Diabetes is a chronic disease that has been recognized as a major cause of death. In recent years, its impact has increased dramatically, and it has become a global threat. Type 2 diabetes is indicated by abnormally high blood glucose levels attributable to insulin resistance and reduced pancreatic insulin production. In this study, two diabetes datasets are used: the Pima Indians Diabetes dataset and the Iraqi Society Diabetes (ISD) dataset. Both are characterized by an imbalanced class distribution and the presence of outliers, and both are preprocessed before classification. Many methods, including data resampling, have been proposed to address dataset imbalance. We used the SMOTE-ENN resampling technique, along with imputation, to address the imbalance in the diabetes datasets. Machine learning (ML) is a family of computational algorithms designed to imitate human intelligence by learning from the surrounding environment, and the classification of imbalanced datasets is a crucial field within it. The ML approach used in this study to categorize diabetes patients is the Least Square Support Vector Machine (LS-SVM). The behavior of ML algorithms is governed by a set of hyperparameters, so their values should be chosen carefully. We used a grid search algorithm to optimize the LS-SVM hyperparameters, which improved the classification results. In addition, applying SMOTE-ENN resampling to the diabetes datasets further enhanced the performance of the fine-tuned LS-SVM. The proposed combination of SMOTE-ENN and fine-tuned LS-SVM is evaluated using accuracy, recall, and precision. All three metrics were considerably higher when the proposed algorithm was used to categorize diabetes patients.
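The classification stage described in the abstract can be illustrated with a minimal sketch of an LS-SVM classifier tuned by grid search. Everything specific here is an assumption made for illustration and is not taken from the paper: the RBF kernel, the grid values for the regularization parameter `gamma` and kernel width `sigma`, the single hold-out split, the synthetic imbalanced data, and the helper names `rbf_kernel`, `lssvm_fit`, `lssvm_predict`, and `evaluate`. The SMOTE-ENN resampling step is not shown; in practice it would be applied to the training split only, for example with the `SMOTEENN` class from the imbalanced-learn package.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LS-SVM linear system for the bias b and multipliers alpha.

    Labels y must be +1 / -1. The system is
        [ 0   y^T           ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i * y_j * K(x_i, x_j).
    """
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, y_train, b, alpha, sigma, X_new):
    """Classify new rows by the sign of the LS-SVM decision function."""
    scores = rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b
    return np.where(scores >= 0.0, 1.0, -1.0)

def evaluate(y_true, y_pred):
    """Accuracy, recall, and precision for the positive (+1) class."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Synthetic imbalanced data (assumption): 160 negatives, 40 positives.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (160, 2)), rng.normal(3.0, 1.0, (40, 2))])
y = np.concatenate([-np.ones(160), np.ones(40)])
perm = rng.permutation(len(y))
train, val = perm[:140], perm[140:]

# Grid search: keep the (gamma, sigma) pair with the best validation accuracy.
best = None
for gamma in (0.1, 1.0, 10.0, 100.0):
    for sigma in (0.5, 1.0, 2.0):
        b, alpha = lssvm_fit(X[train], y[train], gamma, sigma)
        pred = lssvm_predict(X[train], y[train], b, alpha, sigma, X[val])
        metrics = evaluate(y[val], pred)
        if best is None or metrics["accuracy"] > best["accuracy"]:
            best = dict(metrics, gamma=gamma, sigma=sigma)

print(best)
```

Selecting on accuracy alone is a simplification; on imbalanced data, recall or precision for the minority class may be a more informative selection criterion, and cross-validation would give a more stable estimate than a single hold-out split.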

Author Biographies

Nwayyin Najat Mohammed, University of Sulaimani

Computer Science Department

Mariwan Hama Saeed, University of Halabja

Computer Education Department

Published

2025-06-28