Modern data quality management systems focus mainly on anomaly detection, leaving interpretable, well-grounded correction to the user. Given growing data volumes and limited human resources, this creates a risk of improper anomaly handling and reduces trust in analytical results. This article proposes an algorithm for generating recommendations for correcting anomalies in tabular data that combines statistical methods (mean, standard deviation, Z-score, and the quantile approach) with machine learning algorithms (SVM, Random Forest, Isolation Forest). The algorithm not only identifies likely anomalies but also suggests correction strategies, together with explanations of the reasoning behind each decision. Based on the detection method, the distribution properties, and the proportion of outliers, it recommends actions such as substitution, deletion, or manual verification. Pseudocode illustrating the decision logic is provided, along with a table mapping detection methods to correction strategies. Evaluation on synthetic data confirms the interpretability, flexibility, and practical relevance of the proposed approach. The results can be useful for building intelligent data preprocessing systems and for integration into decision support systems.
Keywords: anomaly detection, data correction, data analytics, data quality, anomalies, tabular data.
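To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the two statistical detectors named in the abstract, the Z-score and the quantile (IQR) approach, combined with a rule that maps the proportion of flagged outliers to a correction strategy. The function name `recommend_correction` and the thresholds 0.05 and 0.15 are illustrative assumptions, not values from the article.

```python
import numpy as np

def recommend_correction(values, z_threshold=3.0, iqr_factor=1.5):
    """Illustrative sketch: flag anomalies with two statistical detectors,
    then map the share of flagged points to a correction strategy.
    Thresholds below are hypothetical, chosen only for demonstration."""
    values = np.asarray(values, dtype=float)

    # Detector 1: Z-score against mean and standard deviation.
    mean, std = values.mean(), values.std()
    if std > 0:
        z_flags = np.abs((values - mean) / std) > z_threshold
    else:
        z_flags = np.zeros(len(values), dtype=bool)

    # Detector 2: quantile (IQR) fences.
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    iqr_flags = (values < q1 - iqr_factor * iqr) | (values > q3 + iqr_factor * iqr)

    # A point is an anomaly if either detector flags it.
    flags = z_flags | iqr_flags
    share = flags.mean()

    # Rule-based recommendation driven by the proportion of outliers.
    if share == 0:
        strategy = "no_action"
    elif share < 0.05:
        # Few isolated outliers: substitution with a robust statistic.
        strategy = "replace_with_median"
    elif share < 0.15:
        strategy = "delete_rows"
    else:
        # Too many points flagged: automatic correction would distort the data.
        strategy = "manual_review"
    return flags, strategy
```

A full implementation in the spirit of the article would add the machine learning detectors (e.g. Isolation Forest) alongside these statistical ones and attach a textual explanation to each recommendation.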