Clustering large collections of text documents is a key step in providing higher-level knowledge about the inherent classification of the documents. Web documents are of particular interest, since managing, accessing, searching, and browsing large repositories of web content requires efficient organization. Incremental clustering algorithms are often preferred over traditional clustering techniques because they can be applied in a dynamic environment such as the Web. This paper introduces an incremental document clustering algorithm that relies only on pairwise document similarity information. Each cluster is represented by a Cluster Similarity Histogram, a concise statistical representation of the distribution of pairwise similarities within the cluster, which provides a measure of the cluster's cohesiveness. This cohesiveness measure guides the incremental clustering process. A complexity analysis and experimental results show that the algorithm requires less computation time than standard methods while achieving comparable or better clustering quality.
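The abstract does not specify how the histogram is binned, how cohesiveness is scored, or when a document is accepted into a cluster. The following is a minimal sketch that fills those gaps with stated assumptions, not the paper's exact method. It assumes cosine similarity over sparse term-weight dictionaries, ten equal-width similarity bins, a cohesiveness score defined as the fraction of within-cluster similarities falling at or above a threshold bin (called `histogram_ratio` here), and an acceptance rule in which a document joins the most cohesive cluster whose score drops by at most a tolerance `epsilon`. All names and parameter values (`cosine`, `similarity_histogram`, `histogram_ratio`, `Cluster`, `incremental_cluster`, `threshold_bin`, `epsilon`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Doc = Dict[str, float]  # hypothetical document representation: term -> weight


def cosine(a: Doc, b: Doc) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_histogram(sims: Sequence[float], bins: int = 10) -> List[int]:
    """Count similarities in [0, 1] into `bins` equal-width bins."""
    hist = [0] * bins
    for s in sims:
        hist[min(int(s * bins), bins - 1)] += 1
    return hist


def histogram_ratio(hist: Sequence[int], threshold_bin: int) -> float:
    """Assumed cohesiveness score: fraction of pairs in bins >= threshold_bin.
    An empty histogram (a singleton cluster) is treated as fully cohesive."""
    total = sum(hist)
    return sum(hist[threshold_bin:]) / total if total else 1.0


class Cluster:
    def __init__(self, doc: Doc, bins: int):
        self.docs = [doc]
        # Histogram of all pairwise similarities among the cluster's members.
        self.hist = [0] * bins


def incremental_cluster(docs: Sequence[Doc],
                        sim: Callable[[Doc, Doc], float] = cosine,
                        bins: int = 10,
                        threshold_bin: int = 5,
                        epsilon: float = 0.05) -> List[Cluster]:
    """Single pass over `docs`: add each document to the cluster that ends up
    most cohesive among those whose score drops by at most `epsilon`;
    otherwise start a new cluster."""
    clusters: List[Cluster] = []
    for d in docs:
        best, best_hr, best_hist = None, -1.0, None
        for c in clusters:
            # Only the similarities between d and the current members are new,
            # so the cluster's histogram is updated rather than recomputed.
            added = similarity_histogram([sim(d, x) for x in c.docs], bins)
            new_hist = [h + a for h, a in zip(c.hist, added)]
            old_hr = histogram_ratio(c.hist, threshold_bin)
            new_hr = histogram_ratio(new_hist, threshold_bin)
            if new_hr >= old_hr - epsilon and new_hr > best_hr:
                best, best_hr, best_hist = c, new_hr, new_hist
        if best is None:
            clusters.append(Cluster(d, bins))
        else:
            best.docs.append(d)
            best.hist = best_hist
    return clusters


if __name__ == "__main__":
    docs = [{"web": 1.0}, {"web": 1.0, "page": 0.1},
            {"gene": 1.0}, {"gene": 1.0, "dna": 0.1}]
    print([len(c.docs) for c in incremental_cluster(docs)])  # → [2, 2]
```

Because each cluster keeps its histogram as counts, adding a document costs one similarity computation per document already in a candidate cluster, and the pairs already inside the cluster are never revisited.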