The exponential growth of information available on the World Wide Web and retrievable by search engines has made it necessary to develop efficient and effective methods for organizing relevant content. Document clustering plays an important role here and remains an interesting and challenging problem in web computing. In this paper we present a document clustering method that takes into account both the content and the hyperlink structure of a web page collection, where each document is viewed as a set of semantic units. We exploit this representation to determine the strength of the relation between two linked pages and to define a relational clustering algorithm based on a probabilistic graph representation. The experimental results show that the proposed approach, called RED-clustering, outperforms two of the best-known clustering algorithms, k-Means and Expectation Maximization.
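The abstract does not give the formulas behind the link strength or the probabilistic graph, so the following Python sketch is only one plausible reading, not the RED-clustering definition. It assumes that a page is a list of text blocks (the semantic units), that the strength of a link is the cosine similarity between the unit containing the hyperlink and the most similar unit of the target page, and that edge probabilities are obtained by normalizing the outgoing strengths of each page. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_strength(source_units, link_unit_index, target_units):
    """Assumed link strength: similarity between the semantic unit that holds
    the hyperlink and the most similar semantic unit of the target page."""
    link_vec = tf_vector(source_units[link_unit_index])
    return max((cosine(link_vec, tf_vector(u)) for u in target_units), default=0.0)


def edge_probabilities(pages, links):
    """Normalize each page's outgoing link strengths so they sum to 1.

    pages: dict mapping page id -> list of semantic-unit texts
    links: list of (source id, index of unit holding the link, target id)
    """
    raw = {}
    for src, unit_idx, dst in links:
        strength = link_strength(pages[src], unit_idx, pages[dst])
        raw[(src, dst)] = raw.get((src, dst), 0.0) + strength
    totals = {}
    for (src, _), strength in raw.items():
        totals[src] = totals.get(src, 0.0) + strength
    return {
        edge: (strength / totals[edge[0]] if totals[edge[0]] > 0 else 0.0)
        for edge, strength in raw.items()
    }


# Toy collection: page "a" links to "b" and "c" from its first unit.
pages = {
    "a": ["clustering web documents", "contact us"],
    "b": ["document clustering methods"],
    "c": ["cooking recipes"],
}
links = [("a", 0, "b"), ("a", 0, "c")]
probs = edge_probabilities(pages, links)
```

In this toy collection the link from "a" to "b" shares a term with the linking unit while the link to "c" shares none, so all of the outgoing probability of "a" goes to the edge toward "b". A relational clustering step could then use such weighted edges alongside content similarity; how RED-clustering does this is not specified in the abstract.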