Understanding search engines: mathematical modeling and text retrieval
Understanding search engines: mathematical modeling and text retrieval
Foundations of statistical natural language processing
Foundations of statistical natural language processing
Documents, concepts and neural networks
CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
Learning Image Components for Object Recognition
The Journal of Machine Learning Research
An approach to clustering abstracts
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Boolean Factor Analysis by Attractor Neural Network
IEEE Transactions on Neural Networks
Neural network Boolean factor analysis and application
CIMMACS'07 Proceedings of the 6th WSEAS international conference on Computational intelligence, man-machine systems and cybernetics
New measure of boolean factor analysis quality
ICANNGA'11 Proceedings of the 10th international conference on Adaptive and natural computing algorithms - Volume Part I
Attractor neural network combined with likelihood maximization algorithm for boolean factor analysis
ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part I
Hi-index | 0.00 |
The objective of this paper is to introduce a neural-network-based algorithm for word clustering as an extension of the neural-network-based Boolean factor analysis algorithm (Frolov et al., 2007). It is shown that this extended algorithm supports even the more complex model of signals that are supposed to be related to textual documents. It is hypothesized that every topic in textual data is characterized by a set of words which coherently appear in documents dedicated to a given topic. The appearance of each word in a document is coded by the activity of a particular neuron. In accordance with the Hebbian learning rule implemented in the network, sets of coherently appearing words (treated as factors) create tightly connected groups of neurons, hence, revealing them as attractors of the network dynamics. The found factors are eliminated from the network memory by the Hebbian unlearning rule facilitating the search of other factors. Topics related to the found sets of words can be identified based on the words' semantics. To make the method complete, a special technique based on a Bayesian procedure has been developed for the following purposes: first, to provide a complete description of factors in terms of component probability, and second, to enhance the accuracy of classification of signals to determine whether it contains the factor. Since it is assumed that every word may possibly contribute to several topics, the proposed method might be related to the method of fuzzy clustering. In this paper, we show that the results of Boolean factor analysis and fuzzy clustering are not contradictory, but complementary. To demonstrate the capabilities of this attempt, the method is applied to two types of textual data on neural networks in two different languages. The obtained topics and corresponding words are at a good level of agreement despite the fact that identical topics in Russian and English conferences contain different sets of keywords.