Журнал «Современная Наука»


INVESTIGATION OF THE EFFECTIVENESS OF DATA IMBALANCE PROCESSING METHODS ON SYNTHETIC DATASETS

Shakirov Kirill Faridovich (Senior Lecturer, Federal State Budgetary Educational Institution of Higher Education «Plekhanov Russian University of Economics»)

The article addresses the problem of class imbalance in machine learning and compares several resampling methods for handling it. The study uses synthetically generated data in which the share of the minority class varies from 10% to 90%. A random forest model was trained on the data. The following treatments of the training sample were analyzed: no processing, random oversampling (Random Over), SMOTE, random undersampling (Random Under), and SMOTETomek. The effectiveness of the methods was evaluated with the following metrics: Accuracy, area under the ROC curve (ROC-AUC), Precision, Recall, and F1-measure. The results showed that SMOTETomek performed best among the approaches considered.

Keywords: data imbalance, imbalance processing methods, synthetic data, Random Over, SMOTE, Random Under, SMOTETomek, quality metrics, machine learning.
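
To make the compared resampling ideas concrete, the following is a minimal sketch in plain Python. It is not the author's code: the function names (`random_over`, `random_under`, `smote_like`), the toy two-dimensional data, and the neighbour count `k=3` are all assumptions chosen for illustration. The SMOTE variant is deliberately simplified (interpolation between a minority row and one of its nearest same-class neighbours), and the Tomek-link cleaning step of SMOTETomek is not shown. In practice, a library such as imbalanced-learn offers `RandomOverSampler`, `SMOTE`, `RandomUnderSampler`, and `SMOTETomek`; the abstract does not state which implementation was used.

```python
import random
from collections import Counter

def split_by_class(X, y):
    """Group feature rows by their class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups

def random_over(X, y, rng):
    """Random oversampling: duplicate random rows until every class matches the largest."""
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out += rows + extra
        y_out += [label] * target
    return X_out, y_out

def random_under(X, y, rng):
    """Random undersampling: keep a random subset so every class matches the smallest."""
    groups = split_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out += rng.sample(rows, target)
        y_out += [label] * target
    return X_out, y_out

def smote_like(X, y, rng, k=3):
    """Simplified SMOTE (illustrative): interpolate towards a nearby same-class row.

    Assumes every class has at least two rows.
    """
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        synthetic = []
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position on the segment between base and other
            synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
        X_out += rows + synthetic
        y_out += [label] * target
    return X_out, y_out

# Hypothetical toy data: 90 majority rows (label 0) and 10 minority rows (label 1)
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)] + \
    [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

for name, fn in [("Random Over", random_over),
                 ("Random Under", random_under),
                 ("SMOTE-like", smote_like)]:
    _, y_res = fn(X, y, rng)
    print(name, dict(Counter(y_res)))
```

Each function returns a class-balanced copy of the data. As in the study, resampling of this kind would be applied to the training sample only, so that the evaluation metrics are computed on data with the original class distribution.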

 




Citation link:
Shakirov K. F. INVESTIGATION OF THE EFFECTIVENESS OF DATA IMBALANCE PROCESSING METHODS ON SYNTHETIC DATASETS // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. -2026. -№01. -С. 171-174 DOI 10.37882/2223-2966.2026.01.38