Algorithms for clustering data
Algorithms for clustering data
Automatic text processing: the transformation, analysis, and retrieval of information by computer
Automatic text processing: the transformation, analysis, and retrieval of information by computer
Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Fast and effective text mining using linear-time document clustering
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Concept decompositions for large sparse text data using clustering
Machine Learning
Constrained K-means Clustering with Background Knowledge
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A probabilistic framework for semi-supervised clustering
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Integrating constraints and metric learning in semi-supervised clustering
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Hierarchical Clustering Algorithms for Document Datasets
Data Mining and Knowledge Discovery
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Agglomerative hierarchical clustering with constraints: theoretical and empirical results
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Hi-index | 0.00 |
In this paper, we propose the topic-constrained hierarchical clustering, which organizes document datasets into hierarchical trees consistant with a given set of topics. The proposed algorithm is based on a constrained agglomerative clustering framework and a semi-supervised criterion function that emphasizes the relationship between documents and topics and the relationship among documents themselves simultaneously. The experimental evaluation show that our algorithm outperformed the traditional agglomerative algorithm by 7.8% to 11.4%.