Incremental Document Clustering Using Cluster Similarity Histograms

Authors:
Khaled M. Hammouda; Mohamed S. Kamel
Venue:
WI '03 Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence
Year:
2003

Citing 0
Cited 9

A case study of failure mode analysis with text mining methods
AIDM '07 Proceedings of the 2nd international workshop on Integrating artificial intelligence and data mining - Volume 84

Distributed collaborative Web document clustering using cluster keyphrase summaries
Information Fusion

Matching task profiles and user needs in personalized web search
Proceedings of the 17th ACM conference on Information and knowledge management

Aggregated cross-media news visualization and personalization
MIR '08 Proceedings of the 1st ACM international conference on Multimedia information retrieval

Dynamically constructing user profiles with similarity-based online incremental clustering
International Journal of Advanced Intelligence Paradigms

Incremental Document Clustering Based on Graph Model
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications

Enhancing an Incremental Clustering Algorithm for Web Page Collections
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03

Efficient approach for incremental Vietnamese document clustering
Proceedings of the eleventh international workshop on Web information and data management

On-line single-pass clustering based on diffusion maps
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems

Abstract

Clustering large collections of text documents is a key step in revealing the underlying classification of the documents. Web documents, in particular, are of great interest, since managing, accessing, searching, and browsing large repositories of web content requires efficient organization. Incremental clustering algorithms are generally preferred to traditional clustering techniques in this setting, since they can be applied in a dynamic environment such as the Web. This paper introduces an incremental document clustering algorithm that relies only on pair-wise document similarity information. Clusters are represented using a Cluster Similarity Histogram, a concise statistical representation of the distribution of similarities within each cluster, which provides a measure of cohesiveness. This measure guides the incremental clustering process. Complexity analysis and experimental results show that the algorithm requires less computational time than standard methods while achieving comparable or better clustering quality.
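
The idea of guiding incremental clustering with a per-cluster similarity distribution can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the cohesiveness measure or the acceptance rule, so the "histogram ratio" (the fraction of within-cluster pairwise similarities at or above a threshold), the parameters `threshold`, `min_ratio`, and `max_drop`, the first-fit assignment, and the Jaccard word-overlap similarity are all assumptions made for the example.

```python
def jaccard(a, b):
    """Stand-in pairwise similarity: Jaccard overlap of word sets (an assumption)."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def histogram_ratio(sims, threshold):
    """Assumed cohesiveness measure: share of pairwise similarities >= threshold.

    An empty list (a single-document cluster) is treated as fully cohesive.
    """
    if not sims:
        return 1.0
    return sum(s >= threshold for s in sims) / len(sims)


class Cluster:
    def __init__(self, first_doc):
        self.members = [first_doc]
        self.sims = []  # every pairwise similarity observed inside this cluster


def incremental_cluster(docs, similarity, threshold=0.3, min_ratio=0.5, max_drop=0.05):
    """Process documents one at a time, using only pairwise similarities.

    A document joins the first cluster whose cohesiveness would not fall,
    or would fall by at most `max_drop` while staying at or above `min_ratio`
    (hypothetical rule). Otherwise it starts a new cluster.
    """
    clusters = []
    for doc in docs:
        placed = False
        for cluster in clusters:
            new_sims = [similarity(doc, member) for member in cluster.members]
            old = histogram_ratio(cluster.sims, threshold)
            new = histogram_ratio(cluster.sims + new_sims, threshold)
            if new >= old or (new >= min_ratio and old - new <= max_drop):
                cluster.members.append(doc)
                cluster.sims.extend(new_sims)
                placed = True
                break  # first-fit assignment, a simplification
        if not placed:
            clusters.append(Cluster(doc))
    return clusters


docs = [
    "web document clustering",
    "incremental web document clustering",
    "protein folding simulation",
    "protein folding dynamics simulation",
]
clusters = incremental_cluster(docs, jaccard)
for i, cluster in enumerate(clusters):
    print(i, cluster.members)
```

With these toy inputs, the two web-clustering documents end up in one cluster and the two protein-folding documents in another, because adding a cross-topic document would pull the cluster's ratio well below `min_ratio`. Only the list of similarities is stored per cluster, which is what lets documents be processed as they arrive.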