Fast clustering algorithm for information organization

Authors:
Kwangcheol Shin;Sangyong Han
Affiliations:
Dept. of Computer Science and Engineering, Chung-Ang Univ., Seoul, Korea;Dept. of Computer Science and Engineering, Chung-Ang Univ., Seoul, Korea
Venue:
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Year:
2003

Citing 0
Cited 7

Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing

Improving the Clustering of Blogosphere with a Self-term Enriching Technique
TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue

UPV-SI: word sense induction using self term expansion
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations

Evaluation of internal validity measures in short-text corpora
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing

BUAP: performance of K-Star at the INEX'09 clustering task
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval

PKU at INEX 2010 XML mining track
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval

Clustering abstracts of scientific texts using the transition point technique
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing

Abstract

This study addresses information organization to make Internet document search and browsing more efficient. For this purpose, it proposes a heuristic algorithm that behaves similarly to the star clustering algorithm but achieves a time complexity of O(kn), with k ≪ n, instead of the O(n²) of the star clustering algorithm. The proposed algorithm uses cosine similarity and selects the vectors with the most non-zero elements as its initial reference (seed) vectors. It then clusters the documents based on the concept vector, producing clusters for information organization in O(kn) time. To measure how quickly the proposed algorithm produces clusters, it is tested on the TIME and CLASSIC3 collections and compared with the star clustering algorithm.
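
The abstract gives no pseudocode, so the following Python sketch is only one possible reading of the idea, not the authors' implementation. It assumes dense term-weight vectors, a hypothetical similarity cutoff named threshold, and a concept vector defined as the normalized centroid of a cluster. Each cluster is seeded with the unassigned document that has the most non-zero elements, and each cluster needs a constant number of passes over the n documents, which is where an O(kn) count of similarity computations for k clusters would come from.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(a * b for a, b in zip(u, v))


def concept_vector(members):
    """Assumed definition: the normalized centroid of the member vectors."""
    dim = len(members[0])
    centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return normalize(centroid)


def heuristic_cluster(docs, threshold=0.5):
    """Greedy seed-based clustering; returns lists of document indices.

    `threshold` is a hypothetical parameter, not taken from the abstract.
    """
    unit = [normalize(d) for d in docs]
    n = len(docs)
    # Candidate seeds in order of decreasing non-zero count.
    # (Sorting is a convenience here and costs O(n log n) on its own.)
    order = sorted(range(n), key=lambda i: -sum(1 for x in docs[i] if x != 0))
    assigned = [False] * n
    clusters = []
    for seed in order:
        if assigned[seed]:
            continue
        # Pass 1: unassigned documents close to the seed.
        near_seed = [seed] + [
            i for i in range(n)
            if i != seed and not assigned[i]
            and cosine(unit[seed], unit[i]) >= threshold
        ]
        # Pass 2: re-select members by similarity to the concept vector.
        concept = concept_vector([unit[i] for i in near_seed])
        members = {seed} | {
            i for i in range(n)
            if not assigned[i] and cosine(concept, unit[i]) >= threshold
        }
        for i in members:
            assigned[i] = True
        clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    toy_docs = [[1, 1, 0, 0], [1, 0.9, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0.8], [0, 0, 0, 1]]
    print(heuristic_cluster(toy_docs, threshold=0.8))
```

Unlike star clustering, this sketch never builds the full pairwise similarity graph, which is the O(n²) step the abstract contrasts against. Whether the published algorithm refines clusters in exactly this way is not stated in the abstract.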