Captains Andrey Ivanovich (Assistant, National Research University "MIET")
Troyanovsky Vladimir Mikhailovich (Doctor of Technical Sciences, Professor, National Research University "MIET")
When clustering contexts, we face the problem of automatically determining the number of clusters. Clustering contexts makes it possible to resolve homonymy effectively, which in turn improves the quality of solutions to several problems in computational linguistics. Using text document classification as an example, we attempt to calculate the number of clusters needed to increase the percentage of correctly recognized documents. In the course of the work, we calculated the number of clusters with the density-based DBSCAN algorithm, then used agglomerative hierarchical clustering to split homonymous contexts into clusters and remove the homonymy. We then evaluated classification quality with a naive Bayes classifier and confirmed that the percentage of correctly recognized documents increased.
Keywords: hierarchical clustering, cluster analysis, classification, polysemy, DBSCAN.
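The two clustering stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D points stand in for context vectors, the helper names `dbscan_labels` and `agglomerative` and the parameters `eps` and `min_pts` are hypothetical, average linkage is an assumed choice, and dropping DBSCAN noise points before the second stage is an assumption of this sketch. The naive Bayes classification step is omitted.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dbscan_labels(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    # Neighbourhood of each point (the point itself is included).
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n              # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1           # provisional noise; may become a border point
            continue
        cluster += 1                 # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
    return labels


def agglomerative(points, k):
    """Average-linkage agglomerative clustering, merging until k (>= 1) clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters.pop(b)   # b > a, so index a stays valid after the pop
    return clusters


# Toy data (assumed): two dense groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (2.5, 9.0)]

# Stage 1: DBSCAN is used only to estimate how many clusters there are.
labels = dbscan_labels(points, eps=0.3, min_pts=3)
k = len(set(labels) - {-1})

# Stage 2: agglomerative clustering with that number of clusters,
# applied to the points DBSCAN did not mark as noise.
kept = [p for p, label in zip(points, labels) if label != -1]
groups = agglomerative(kept, k)
print(k, groups)
```

In this sketch the density step only supplies the cluster count; the final partition comes from the hierarchical step, which mirrors the order of operations described in the abstract.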
Citation link: Captains A. I., Troyanovsky V. M. The problem of automatic determination the number of clusters in the clustering contexts task // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. 2020. No. 06. P. 100-104. DOI 10.37882/2223-2966.2020.06.19