The paper describes the development and application of an automatic word clustering (AWC) tool for processing Russian texts of various types. The tool is designed to be flexible and compatible with other linguistic resources. Building it requires a computer implementation of latent semantic analysis (LSA) combined with clustering algorithms, and Python-based software has been developed for this purpose. The main procedures performed by the AWC tool are segmentation of input texts, context analysis, co-occurrence matrix construction, and agglomerative and K-means clustering. Special attention is given to experimental results on clustering words in raw texts under varying parameter settings.
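The procedures listed above can be illustrated with a minimal sketch. It is not the authors' software: it assumes a fixed symmetric context window, raw co-occurrence counts, cosine similarity, and average-link agglomerative clustering, and it leaves out both the dimensionality reduction step of LSA and K-means. All function names (`tokenize`, `cooccurrence_vectors`, `cosine`, `agglomerative`) and the toy sentence are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and extract runs of letters (Latin or Cyrillic)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of words seen within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(words, vectors, n_clusters):
    """Average-link agglomerative clustering: repeatedly merge the most similar pair."""
    clusters = [[w] for w in words]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[x], vectors[y])
                        for x in clusters[a] for y in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Toy input, invented for this sketch only.
    text = "the cat eats fish. the dog eats meat. the car needs fuel. the bus needs fuel."
    vecs = cooccurrence_vectors(tokenize(text), window=2)
    print(agglomerative(["cat", "dog", "car", "bus"], vecs, n_clusters=2))
```

In this sketch the window size and the number of clusters stand in for the kind of parameters one might vary in the experiments; which parameters the authors actually varied is not stated in the abstract. A fuller version would apply a truncated singular value decomposition to the co-occurrence matrix before measuring similarity, which is the step that makes the analysis "latent".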