Journal «Современная Наука»

INVESTIGATION OF THE EFFECTIVENESS OF DATA IMBALANCE PROCESSING METHODS ON SYNTHETIC DATASETS

Shakirov Kirill Faridovich (Senior Lecturer, Federal State Budgetary Educational Institution of Higher Education «Plekhanov Russian University of Economics»)

The article addresses the problem of class imbalance in machine learning and compares several resampling methods for dealing with it. The study uses synthetically generated data in which the share of the minority class varies from 10% to 90%. A random forest model was trained on the data. Five treatments of the training sample were analyzed: no processing, random oversampling (Random Over), SMOTE, random undersampling (Random Under), and SMOTETomek. The effectiveness of the methods was evaluated using the following metrics: Accuracy, area under the ROC curve (ROC-AUC), Precision, Recall, and F1-measure. The results showed that SMOTETomek performs best among the approaches considered.
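The two simplest treatments named in the abstract, random oversampling and random undersampling, can be illustrated with a minimal standard-library sketch. This is not the author's code: the helper names (`make_labels`, `random_over`, `random_under`, `precision_recall_f1`), the sample size of 1000, the single Gaussian feature, and the fixed seed are all assumptions made for illustration. SMOTE and SMOTETomek are not reproduced here, since they synthesize or remove points based on nearest neighbours.

```python
import random
from collections import Counter


def make_labels(n, minority_share, rng):
    """Return n shuffled binary labels; class 1 makes up minority_share of them."""
    n_pos = round(n * minority_share)
    labels = [1] * n_pos + [0] * (n - n_pos)
    rng.shuffle(labels)
    return labels


def random_over(rows, labels, rng):
    """Random oversampling: duplicate rows of smaller classes (with
    replacement) until every class is as large as the largest one."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = rng.choices(pool, k=target - cnt)
        out_rows += extra
        out_labels += [cls] * len(extra)
    return out_rows, out_labels


def random_under(rows, labels, rng):
    """Random undersampling: keep a random subset of each class (without
    replacement) so every class is as small as the smallest one."""
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows += rng.sample(pool, target)
        out_labels += [cls] * target
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, Recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical setup: 1000 samples, 10% minority class, one noisy feature.
rng = random.Random(0)
labels = make_labels(1000, 0.10, rng)
rows = [[rng.gauss(y, 1.0)] for y in labels]

over_rows, over_labels = random_over(rows, labels, rng)
under_rows, under_labels = random_under(rows, labels, rng)

print("original:", Counter(labels))
print("random over:", Counter(over_labels))
print("random under:", Counter(under_labels))
```

In a setup like the one the abstract describes, resampling would be applied to the training sample only, so that the metrics are computed on data with the original class distribution.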

Keywords: data imbalance, imbalance processing methods, synthetic data, Random Over, SMOTE, Random Under, SMOTETomek, quality metrics, machine learning.


Citation link:
Shakirov K. F. INVESTIGATION OF THE EFFECTIVENESS OF DATA IMBALANCE PROCESSING METHODS ON SYNTHETIC DATASETS // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. -2026. -№01. -С. 171-174 DOI 10.37882/2223-2966.2026.01.38

LEGAL INFORMATION:
Reproduction of materials is permitted only for non-commercial purposes with reference to the original publication. Protected by the laws of the Russian Federation. Any violations of the law are prosecuted.
© ООО "Научные технологии"
