Shakirov Kirill Faridovich (Senior Lecturer, Federal State Budgetary Educational Institution of Higher Education «Plekhanov Russian University of Economics»)
The article addresses the problem of class imbalance in machine learning and compares several resampling methods for handling it. The study uses synthetically generated datasets in which the share of the minority class varies from 10% to 90%. A random forest model was trained on each dataset. Five treatments of the training sample were analyzed: no resampling, random oversampling (Random Over), SMOTE, random undersampling (Random Under), and SMOTETomek. The effectiveness of the methods was evaluated using the following metrics: Accuracy, area under the ROC curve (ROC-AUC), Precision, Recall, and F1-measure. The results showed that SMOTETomek performed best among the approaches considered.
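The abstract does not say how the resampling was implemented, so the following is only a minimal illustrative sketch of three of the named strategies in plain Python, not the author's code. The function names (`random_oversample`, `random_undersample`, `smote_like`), the toy two-dimensional dataset, the 90/10 class split, and the neighbour count `k=3` are all assumptions made for illustration. The SMOTE-style function covers only the interpolation step for a binary problem; the Tomek-link cleaning that SMOTETomek adds afterwards is omitted. In practice one would more likely use a library such as imbalanced-learn, which provides `RandomOverSampler`, `SMOTE`, `RandomUnderSampler`, and `SMOTETomek`.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Random Over: duplicate random rows of smaller classes up to the largest class size."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, rng):
    """Random Under: keep a random subset of every class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def smote_like(X, y, minority, rng, k=3):
    """SMOTE-style step (binary case): create synthetic minority points by
    interpolating between a minority point and one of its k nearest minority neighbours."""
    rows = [x for x, lab in zip(X, y) if lab == minority]
    n_needed = (len(y) - len(rows)) - len(rows)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(rows)
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sq_dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
        y_out.append(minority)
    return X_out, y_out


# Toy imbalanced data (assumed for illustration): 90 majority points, 10 minority points.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]
y = [0] * 90 + [1] * 10

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
X_smote, y_smote = smote_like(X, y, minority=1, rng=rng)

print("original:   ", sorted(Counter(y).items()))
print("random over:", sorted(Counter(y_over).items()))
print("random under:", sorted(Counter(y_under).items()))
print("SMOTE-style:", sorted(Counter(y_smote).items()))
```

On this toy data each resampler returns a balanced set: 90 rows per class after oversampling and after the SMOTE-style step, and 10 rows per class after undersampling. As in the study, resampling of this kind should be applied to the training sample only, so that the evaluation metrics are computed on data with the original class distribution.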
Keywords: data imbalance, imbalance processing methods, synthetic data, Random Over, SMOTE, Random Under, SMOTETomek, quality metrics, machine learning.
Citation link: Shakirov K. F. INVESTIGATION OF THE EFFECTIVENESS OF DATA IMBALANCE PROCESSING METHODS ON SYNTHETIC DATASETS // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2026. No. 01. P. 171-174. DOI 10.37882/2223-2966.2026.01.38