Elements of information theory
Elements of information theory
Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Partitioning-based clustering for Web document categorization
Decision Support Systems - Special issue on WITS '97
Document clustering using word clusters via the information bottleneck method
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Data mining: concepts and techniques
Data mining: concepts and techniques
Unsupervised learning by probabilistic latent semantic analysis
Machine Learning
Topic-based document segmentation with probabilistic latent semantic analysis
Proceedings of the eleventh international conference on Information and knowledge management
Clustering based on conditional distributions in an auxiliary space
Neural Computation
A Hierarchical Model for Clustering and Categorising Documents
Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
A Unified Framework for Clustering Heterogeneous Web Objects
WISE '02 Proceedings of the 3rd International Conference on Web Information Systems Engineering
From User Access Patterns to Dynamic Hypertext Linking
From User Access Patterns to Dynamic Hypertext Linking
Latent semantic models for collaborative filtering
ACM Transactions on Information Systems (TOIS)
Web usage mining based on probabilistic latent semantic analysis
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
Most existing algorithms cluster highly correlated data objects (e.g. web pages and web queries) separately. Some other algorithms, however, do take into account the relationship between data objects, but they either integrate content and link features into a unified feature space or apply a hard clustering algorithm, making it difficult to fully utilize the correlated information over the heterogeneous Web objects. In this paper, we propose a novel unified probabilistic framework for iteratively clustering correlated heterogeneous data objects until it converges. Our approach introduces two latent clustering layers, which serve as two mixture probabilistic models of the features. In each clustering iteration we use EM (Expectation-Maximization) algorithm to estimate the parameters of the mixture model in one latent layer and propagate them to the other one. The experimental results show that our approach effectively combines the content and link features and improves the performance of the clustering.