Recurrent-neural-network-based Boolean factor analysis and its application to word clustering

Authors:
Alexander A. Frolov;Dusan Husek;Pavel Yu. Polyakov
Affiliations:
Institute of Higher Nervous Activity and Neurophysiology, Russian Academy of Sciences, Moscow, Russia;Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, Czech Republic;Scientific-Research Institute for System Studies, Russian Academy of Sciences, Moscow, Russia
Venue:
IEEE Transactions on Neural Networks
Year:
2009

References (6):
Understanding search engines: mathematical modeling and text retrieval
Foundations of statistical natural language processing
Documents, concepts and neural networks. CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
Learning Image Components for Object Recognition. The Journal of Machine Learning Research
An approach to clustering abstracts. NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Boolean Factor Analysis by Attractor Neural Network. IEEE Transactions on Neural Networks

Cited by (6):
Neural network Boolean factor analysis and application. CIMMACS'07 Proceedings of the 6th WSEAS international conference on Computational intelligence, man-machine systems and cybernetics
New measure of boolean factor analysis quality. ICANNGA'11 Proceedings of the 10th international conference on Adaptive and natural computing algorithms - Volume Part I
Attractor neural network combined with likelihood maximization algorithm for boolean factor analysis. ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part I
Probability based document clustering and image clustering using content-based image retrieval. Applied Soft Computing
Two Expectation-Maximization algorithms for Boolean Factor Analysis. Neurocomputing
New BFA method based on attractor neural network and likelihood maximization. Neurocomputing

Abstract

This paper introduces a neural-network-based algorithm for word clustering that extends the neural-network-based Boolean factor analysis algorithm (Frolov et al., 2007). The extended algorithm is shown to handle even the more complex signal model assumed for textual documents. It is hypothesized that every topic in textual data is characterized by a set of words that appear coherently in documents dedicated to that topic. The appearance of each word in a document is coded by the activity of a particular neuron. In accordance with the Hebbian learning rule implemented in the network, sets of coherently appearing words (treated as factors) create tightly connected groups of neurons, which therefore appear as attractors of the network dynamics. Factors that have been found are removed from the network memory by the Hebbian unlearning rule, which facilitates the search for other factors. Topics related to the found sets of words can be identified from the words' semantics. To make the method complete, a special technique based on a Bayesian procedure has been developed for two purposes: first, to provide a complete description of factors in terms of component probability, and second, to classify more accurately whether a given signal contains a given factor. Since it is assumed that every word may contribute to several topics, the proposed method can be related to fuzzy clustering. In this paper, we show that the results of Boolean factor analysis and fuzzy clustering are not contradictory, but complementary. To demonstrate the capabilities of this approach, the method is applied to two types of textual data on neural networks in two different languages. The topics obtained and their corresponding words agree well, even though identical topics in the Russian and English conferences are described by different sets of keywords.
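
The learn, search, and unlearn cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the covariance form of the Hebbian rule, the fixed number `k` of active neurons, the exhaustive set of starting states, the sum of weights inside the active set as a ranking score, and the unlearning strength. The function names (`hebbian_weights`, `settle`, `strongest_attractor`, `unlearn`) and the toy corpus are hypothetical. The Bayesian procedure is not covered.

```python
from itertools import combinations

import numpy as np


def hebbian_weights(X):
    """Covariance-style Hebbian matrix from binary documents (rows of X).

    Assumption: each word's mean activity is subtracted before correlating.
    """
    centered = X - X.mean(axis=0)
    J = centered.T @ centered
    np.fill_diagonal(J, 0.0)  # no self-connections
    return J


def settle(J, state, k, max_steps=20):
    """Synchronous dynamics: keep the k neurons with the largest input active.

    Stops at a fixed point, or after max_steps if the state keeps cycling.
    """
    for _ in range(max_steps):
        winners = np.argsort(-(J @ state), kind="stable")[:k]
        new_state = np.zeros_like(state)
        new_state[winners] = 1.0
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state


def strongest_attractor(J, k):
    """Run the dynamics from every k-word start and keep the best final state.

    Assumption: final states are ranked by the total weight inside the active
    set. Exhaustive starts are feasible only for a toy vocabulary.
    """
    n = J.shape[0]
    best, best_score = None, -np.inf
    for active in combinations(range(n), k):
        start = np.zeros(n)
        start[list(active)] = 1.0
        final = settle(J, start, k)
        score = final @ J @ final
        if score > best_score:
            best, best_score = final, score
    return best


def unlearn(J, factor):
    """Hebbian unlearning: subtract the found factor's outer product.

    Assumption: the strength equals the mean weight inside the factor, so the
    factor's internal connections are cancelled.
    """
    k = factor.sum()
    strength = (factor @ J @ factor) / (k * (k - 1))
    J = J - strength * np.outer(factor, factor)
    np.fill_diagonal(J, 0.0)
    return J


# Toy corpus: 16 documents over 9 words (1 = word appears in the document).
X = np.zeros((16, 9))
X[:8, 0:4] = 1    # 8 documents on topic A use words 0-3
X[8:13, 4:8] = 1  # 5 documents on topic B use words 4-7
X[13:, 8] = 1     # 3 unrelated documents use only word 8

J = hebbian_weights(X)
first = strongest_attractor(J, k=4)
second = strongest_attractor(unlearn(J, first), k=4)
first_words = sorted(np.flatnonzero(first).tolist())
second_words = sorted(np.flatnonzero(second).tolist())
print(first_words, second_words)  # [0, 1, 2, 3] [4, 5, 6, 7]
```

In this toy corpus the more frequent topic forms the more strongly connected group and is found first. Once its internal connections are cancelled by unlearning, the same search returns the second word group.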