Efficient system for clustering of dynamic document database

Authors:
Pawel Foszner;Aleksandra Gruca;Andrzej Polanski
Affiliations:
Silesian University of Technology, Institute of Informatics, Gliwice, Poland;Silesian University of Technology, Institute of Informatics, Gliwice, Poland;Silesian University of Technology, Institute of Informatics, Gliwice, Poland
Venue:
CDVE'11 Proceedings of the 8th international conference on Cooperative design, visualization, and engineering
Year:
2011

Citing 2
Cited 0

Unsupervised learning by probabilistic latent semantic analysis. Machine Learning.

Probabilistic latent semantic analysis. UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence.

Abstract

In this paper we describe a system that groups, classifies, and finds latent semantic features in a database composed of a large number of documents. The database will grow constantly as the users who co-create it add new documents. Users need the system to provide information both about a specific document and about the entire set of documents. This information includes statistical data about words in documents, information about the aspects in which these words appear, classification, clustering, etc. To meet these expectations, we propose using methods that search for hidden patterns in multivariate data. We apply machine learning algorithms for data analysis that are useful in identifying local patterns in multivariate data. We consider two algorithms described in the literature: (1) the Probabilistic Latent Semantic Analysis method [2] and (2) the Nonnegative Matrix Factorization algorithm described in [4] and used in the text analysis system [1].
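
To make the second of these approaches concrete, the following is a minimal sketch of nonnegative matrix factorization applied to a small term-document count matrix. It is an illustration only and not the system described in the paper: it assumes the widely used multiplicative-update rule for the squared-error objective (whether [4] uses this exact variant is an assumption), and the names `nmf`, `cluster_documents`, and `term_doc`, as well as the toy data, are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor V (terms x documents) into nonnegative W (terms x k) and H (k x documents)."""
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    # Strictly positive random start so that no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_docs)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n_terms)]
    return w, h


def cluster_documents(h):
    """Assign each document to the latent factor with the largest weight in H."""
    return [max(range(len(h)), key=lambda a: h[a][j]) for j in range(len(h[0]))]


# Hypothetical toy data: rows are terms, columns are documents.
# Documents 0-2 use only the first three terms, documents 3-5 only the last three.
term_doc = [
    [3, 2, 4, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 0, 0, 4, 3, 2],
    [0, 0, 0, 2, 3, 3],
    [0, 0, 0, 1, 2, 4],
]

w, h = nmf(term_doc, k=2)
labels = cluster_documents(h)
print("document clusters:", labels)
print("reconstruction error:", round(frobenius_error(term_doc, w, h), 3))
```

In this sketch the columns of `w` play the role of latent semantic features (weighted groups of terms), and the rows of `h` say how strongly each document expresses each feature, which is what allows the same factorization to serve both for describing word usage and for clustering. For a growing database, the factorization would have to be recomputed or updated as documents arrive; the sketch does not address that.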