Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Improving the Clustering of Blogosphere with a Self-term Enriching Technique
TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue
UPV-SI: word sense induction using self term expansion
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
Evaluation of internal validity measures in short-text corpora
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
BUAP: performance of K-Star at the INEX'09 clustering task
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
PKU at INEX 2010 XML mining track
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
Clustering abstracts of scientific texts using the transition point technique
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Hi-index | 0.01 |
This study deals with information organization for more efficient Internet document search and browsing results. As the appropriate algorithm for this purpose, this study proposes the heuristic algorithm, which functions similarly with the star clustering algorithm but performs a more efficient time complexity of O(kn), (k≪n) instead of O(n2) found in the star clustering algorithm. The proposed heuristic algorithm applies the cosine similarity and sets vectors composed of the most non-zero elements as the initial standard value. The algorithm is purported to execute the clustering procedure based on the concept vector and produce clusters for information organization in O(kn) period of time. In order to see how fast the proposed algorithm is in producing clusters for organizing information, the algorithm is tested on TIME and CLASSIC3 in comparison with the star clustering algorithm.