Correlation-based Document Clustering using Web Logs

Authors:
Z. Su;Q. Yang;H. Zhang;X. Xu;Y. Hu
Venue:
HICSS '01 Proceedings of the 34th Annual Hawaii International Conference on System Sciences (HICSS-34) - Volume 5
Year:
2001

Citing 0
Cited 14

Categorizing information objects from user access patterns

Proceedings of the eleventh international conference on Information and knowledge management
ReCoM: reinforcement clustering of multi-type interrelated data objects

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Mining web navigations for intelligence

Decision Support Systems - Special issue: Intelligence and security informatics
Clustering heterogeneous data using clustering by compression

ICCOMP'09 Proceedings of the WSEAS 13th international conference on Computers
Web Co-clustering of Usage Network Using Tensor Decomposition

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
A new method for clustering heterogeneous data: clustering by compression

WSEAS Transactions on Computers
A proposal for news recommendation based on clustering techniques

IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part III
A similarity reinforcement algorithm for heterogeneous web pages

APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
A similarity-aware multiagent-based web content management scheme

ICMLC'05 Proceedings of the 4th international conference on Advances in Machine Learning and Cybernetics
Discovering conceptual page hierarchy of a web site from user traversal history

ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications
An overview of web data clustering practices

EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Clustering of search engine keywords using access logs

DEXA'06 Proceedings of the 17th international conference on Database and Expert Systems Applications
Performance improvement of web caching in Web 2.0 via knowledge discovery

Journal of Systems and Software

Abstract

A problem facing information retrieval on the web is how to cluster large numbers of web documents effectively. One approach is to cluster the documents using only the information in users' usage logs, not the content of the documents. In this paper, we present a recursive density-based clustering algorithm that adaptively changes its parameters. Our clustering algorithm, RDBC, is based on DBSCAN, a density-based algorithm that has proven able to process very large datasets. DBSCAN does not require the number of clusters to be determined in advance and is linear in time complexity, which makes it particularly attractive for web page clustering. It can be shown that RDBC has the same time complexity as the DBSCAN algorithm. In addition, we show both analytically and experimentally that our method yields clustering results that are superior to those of DBSCAN.
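To make the idea of usage-based density clustering concrete, here is a minimal sketch of plain DBSCAN applied to pages described only by the sessions that visited them. It is not the paper's RDBC: the abstract does not describe how RDBC adapts its parameters, so no recursive step is attempted. The distance (Jaccard distance over visiting sessions), the function names `co_access_distance` and `dbscan`, and the toy sessions are all assumptions made for illustration. The neighbor search here is a naive scan over all pages, so this version is quadratic in the number of pages.

```python
from collections import deque

NOISE = -1


def co_access_distance(sessions):
    """Assumed distance: Jaccard distance between the session sets of two pages."""
    visitors = {}
    for session_id, visited in enumerate(sessions):
        for page in visited:
            visitors.setdefault(page, set()).add(session_id)
    pages = sorted(visitors)
    dist = {}
    for a in pages:
        for b in pages:
            shared = visitors[a] & visitors[b]
            either = visitors[a] | visitors[b]
            dist[a, b] = 1.0 - len(shared) / len(either)
    return pages, dist


def dbscan(points, dist, eps, min_pts):
    """Plain DBSCAN over a precomputed distance table (naive neighbor search)."""
    labels = {}
    cluster = 0

    def neighbors(p):
        # A point counts as its own neighbor because dist[p, p] == 0.
        return [q for q in points if dist[p, q] <= eps]

    for p in points:
        if p in labels:
            continue
        seeds = neighbors(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may later be claimed as a border point
            continue
        labels[p] = cluster
        queue = deque(seeds)
        while queue:
            q = queue.popleft()
            if labels.get(q) == NOISE:
                labels[q] = cluster  # border point: joins but does not expand
            if q in labels:
                continue
            labels[q] = cluster
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core point, keep expanding
        cluster += 1
    return labels


# Toy usage log: each inner list is the set of pages seen in one session.
sessions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "b", "c"],
    ["x", "y"],
    ["x", "y"],
    ["z"],
]
pages, dist = co_access_distance(sessions)
labels = dbscan(pages, dist, eps=0.5, min_pts=2)
print(labels)
```

With these toy sessions, pages `a`, `b`, and `c` fall into one cluster, `x` and `y` into another, and `z` is labeled noise because no other page shares a session with it. Note that the number of clusters is never supplied; it follows from `eps` and `min_pts`, which is the property the abstract highlights.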