A unified probabilistic framework for clustering correlated heterogeneous web objects

Authors:
Guowei Liu; Weibin Zhu; Yong Yu
Affiliations:
Computer Science & Engineering Department, Shanghai Jiaotong University, Shanghai, China (all three authors)
Venue:
APWeb '05: Proceedings of the 7th Asia-Pacific Web Conference on Web Technologies Research and Development
Year:
2005


Abstract

Most existing algorithms cluster highly correlated data objects (e.g., web pages and web queries) separately. Other algorithms do take the relationships between data objects into account, but they either merge content and link features into a single unified feature space or apply a hard clustering algorithm, which makes it difficult to fully exploit the correlations among heterogeneous Web objects. In this paper, we propose a novel unified probabilistic framework that clusters correlated heterogeneous data objects iteratively until convergence. Our approach introduces two latent clustering layers, which serve as two probabilistic mixture models of the features. In each clustering iteration, we use the EM (Expectation-Maximization) algorithm to estimate the parameters of the mixture model in one latent layer and propagate them to the other layer. The experimental results show that our approach effectively combines content and link features and improves clustering performance.
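The abstract describes the method only at a high level, so the following is a minimal sketch of one way a two-layer, EM-based scheme of this kind could look. It is not the authors' algorithm. It assumes that each latent layer is a mixture of multinomials over an object's content counts concatenated with "soft" link counts to the other layer's clusters. It runs EM on one layer, propagates the resulting posteriors to the other layer through the links, and alternates until the hard assignments stop changing. The function names (`em_multinomial_mixture`, `link_features`, `cluster_two_layers`), the add-alpha smoothing (a MAP variant of the M-step), the random initialisation, and the toy page/query/click data are all hypothetical.

```python
import math
import random


def em_multinomial_mixture(X, k, resp=None, iters=50, alpha=1.0, seed=0):
    """EM for a mixture of multinomials over (possibly fractional) count vectors X.

    Returns an n-by-k list of posteriors P(cluster | object). `resp` warm-starts EM;
    `alpha` is hypothetical add-alpha smoothing so no log(0) ever occurs.
    """
    n, v = len(X), len(X[0])
    if resp is None:  # hypothetical choice: random soft initialisation
        rng = random.Random(seed)
        resp = []
        for _ in range(n):
            row = [rng.random() + 1e-3 for _ in range(k)]
            resp.append([r / sum(row) for r in row])
    for _ in range(iters):
        # M-step: mixing weights and smoothed per-cluster feature distributions.
        pi = [sum(resp[i][c] for i in range(n)) / n for c in range(k)]
        theta = []
        for c in range(k):
            counts = [alpha + sum(resp[i][c] * X[i][j] for i in range(n)) for j in range(v)]
            total = sum(counts)
            theta.append([x / total for x in counts])
        # E-step: posteriors computed in log space for numerical stability.
        new_resp = []
        for i in range(n):
            logs = [math.log(pi[c] + 1e-12)
                    + sum(X[i][j] * math.log(theta[c][j]) for j in range(v))
                    for c in range(k)]
            m = max(logs)
            w = [math.exp(l - m) for l in logs]
            s = sum(w)
            new_resp.append([x / s for x in w])
        resp = new_resp
    return resp


def link_features(L, other_resp, k):
    """Feature c of object i = sum_j L[i][j] * P(cluster c | other-layer object j)."""
    return [[sum(L[i][j] * other_resp[j][c] for j in range(len(other_resp)))
             for c in range(k)] for i in range(len(L))]


def hard(resp):
    """Most probable cluster index for each object."""
    return tuple(max(range(len(r)), key=r.__getitem__) for r in resp)


def cluster_two_layers(content_a, content_b, links, k, max_rounds=20, seed=0):
    """Alternate EM between layer A and layer B, propagating posteriors via links."""
    links_t = [list(col) for col in zip(*links)]  # B-by-A link matrix
    resp_b = em_multinomial_mixture(content_b, k, seed=seed + 1)  # content-only start
    resp_a, labels = None, None
    for _ in range(max_rounds):
        feats_a = [ca + lf for ca, lf in zip(content_a, link_features(links, resp_b, k))]
        resp_a = em_multinomial_mixture(feats_a, k, resp=resp_a, seed=seed)
        feats_b = [cb + lf for cb, lf in zip(content_b, link_features(links_t, resp_a, k))]
        resp_b = em_multinomial_mixture(feats_b, k, resp=resp_b, seed=seed + 1)
        new_labels = (hard(resp_a), hard(resp_b))
        if new_labels == labels:  # stop once hard assignments no longer change
            break
        labels = new_labels
    return resp_a, resp_b, labels


if __name__ == "__main__":
    pages = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]      # hypothetical word counts
    queries = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
    clicks = [[4, 1, 0, 0], [1, 4, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]     # page-by-query links
    _, _, (page_labels, query_labels) = cluster_two_layers(pages, queries, clicks, k=2)
    print("pages:", page_labels, "queries:", query_labels)
```

Passing the previous round's posteriors back into EM as a warm start keeps each layer's clusters stable across rounds. Like any EM procedure, the result depends on initialisation and may settle in a local optimum, so restarting from several seeds is a reasonable precaution.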