Intelligent information agents: review and challenges for distributed information sources. Journal of the American Society for Information Science.
Probabilistic latent semantic indexing. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.
An Efficient k-Means Clustering Algorithm: Analysis and Implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Ontology Learning for the Semantic Web. IEEE Intelligent Systems.
Document similarity based on concept tree distance. Proceedings of the nineteenth ACM conference on Hypertext and hypermedia.
In this paper, we propose a new generalized Multivariate Probabilistic Modeling (MPM) approach that automatically extracts topics from a text collection and attaches them to an existing ontology. Specifically, we first use KeyConcept, a classification system that assigns documents to a set of predefined concepts. Then, by modeling each resulting document cluster with MPM, we extract latent concepts and their corresponding sub-clusters from the document collection. We compare MPM with Probabilistic Latent Semantic Indexing (PLSI) and other clustering algorithms on CiteSeerX data sets. Experimental results show that MPM outperforms PLSI in time efficiency and yields better topic representations. A clustering analysis further shows that MPM achieves higher precision than the other clustering techniques.
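The abstract does not give the MPM equations, so MPM itself cannot be reconstructed from it. For orientation only, here is a minimal sketch of the PLSI baseline it is compared against: the aspect model P(d, w) = P(d) Σ_z P(z|d) P(w|z), fitted by expectation-maximization (EM) on a term-document count matrix. The function name `plsi`, the toy counts, and the hypothetical four-word vocabulary are illustrative assumptions, and this is not the authors' MPM.

```python
import math
import random


def plsi(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSI (Hofmann's aspect model) by EM. Illustrative sketch, not MPM.

    counts: list of documents, each a list of term counts n(d, w) over a
            shared vocabulary.
    Returns (p_z_d, p_w_z, log_likelihoods) with
      p_z_d[d][z] = P(z | d) and p_w_z[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random strictly positive initialisation, normalised into distributions.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]

    log_likelihoods = []
    for _ in range(n_iter):
        # Expected counts accumulated for the M-step.
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                # Log-likelihood of the current parameters (P(d) term omitted).
                ll += n * math.log(total)
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    new_w_z[z][w] += resp
                    new_z_d[d][z] += resp
        log_likelihoods.append(ll)
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_z_d, p_w_z, log_likelihoods


if __name__ == "__main__":
    # Hypothetical vocabulary: ["cluster", "kmeans", "ontology", "semantic"].
    docs = [
        [4, 3, 0, 0],
        [3, 5, 0, 1],
        [0, 0, 4, 3],
        [1, 0, 3, 5],
    ]
    p_z_d, p_w_z, ll = plsi(docs, n_topics=2, n_iter=30)
    for d, row in enumerate(p_z_d):
        print(d, [round(p, 3) for p in row])
```

Because each EM iteration cannot decrease the likelihood, checking that the recorded log-likelihood never drops is a quick sanity test for an implementation like this; the rows of P(z|d) can then be read as soft topic assignments for each document. Note that PLSI stores a separate P(z|d) for every document, which is one reason its cost grows with collection size.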