As an increasing number of users access information on the Web, there is a great opportunity to learn from Web server logs how to cluster large collections of Web documents. One approach is to cluster the documents using only the information in users' usage logs, not the content of the documents. A major advantage of this approach is that relevance is reflected objectively by the usage logs: frequent visits to two seemingly unrelated documents within the same sessions indicate that they are in fact closely related. Our clustering algorithm, PDBSCAN (Partitioning Based DBSCAN algorithm), is based on DBSCAN, a density-based algorithm that has proven able to process very large datasets. In addition, we show both analytically and experimentally that our method yields clustering results superior to those of DBSCAN.
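To make the usage-based idea concrete, the following is a minimal sketch, not the PDBSCAN algorithm itself, whose partitioning step is not described here. It assumes that the logs have already been split into sessions, that the distance between two documents is one minus the Jaccard overlap of the sessions that visited them, and that plain DBSCAN is run on that distance. The names `covisit_distance` and `dbscan`, the toy `sessions`, and the values of `eps` and `min_pts` are all illustrative assumptions.

```python
from collections import defaultdict


def covisit_distance(sessions):
    """Return (docs, dist); dist(a, b) = 1 - Jaccard overlap of visiting sessions.

    The Jaccard measure is an assumption made for this sketch.
    """
    seen_in = defaultdict(set)  # document -> ids of the sessions that visited it
    for sid, session in enumerate(sessions):
        for doc in session:
            seen_in[doc].add(sid)
    docs = sorted(seen_in)

    def dist(a, b):
        shared = seen_in[a] & seen_in[b]
        union = seen_in[a] | seen_in[b]
        return 1.0 - len(shared) / len(union)

    return docs, dist


def dbscan(points, dist, eps, min_pts):
    """Plain DBSCAN; returns {point: cluster_id}, with -1 marking noise."""
    labels = {}
    cluster = -1

    def neighbours(p):
        # Brute-force region query; includes p itself because dist(p, p) == 0.
        return [q for q in points if dist(p, q) <= eps]

    for p in points:
        if p in labels:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q, -1) != -1:
                continue  # already assigned to a cluster
            was_noise = q in labels
            labels[q] = cluster
            if was_noise:
                continue  # border point: joins the cluster but is not expanded
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # q is a core point, keep growing
    return labels


# Hypothetical toy sessions: each inner list is the set of pages one visitor viewed.
sessions = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b"],
    ["x", "y"],
    ["x", "y"],
    ["z"],
]
docs, dist = covisit_distance(sessions)
labels = dbscan(docs, dist, eps=0.5, min_pts=2)
print(labels)
```

In this toy log, documents that are repeatedly visited together fall into the same cluster regardless of their content, while a page visited only on its own is left as noise. The brute-force neighbour search is quadratic in the number of documents, so it is only suitable for illustration.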