Given a document collection, a hierarchical clustering algorithm groups the documents into several clusters. Recent work has identified the set of overlap phrases as a useful feature in hierarchical document clustering. However, that work considered neither the relationship between overlap phrases that co-occur in a document nor the degree of opposite relationships between overlap phrases. In this paper, we propose new algorithms that compute an effective similarity measure before a hierarchical clustering algorithm is applied. The proposed methods have two important features: a ranked list of the top-k phrases for each overlap phrase, and the opposite significance between each pair of overlap phrases. Experimental results show that the proposed method improves clustering results.
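The abstract does not say how the top-k list for an overlap phrase is scored. As a minimal sketch of one possible reading, the Python below ranks, for each phrase, the other phrases that appear in the same documents most often. The function name `top_k_cooccurring`, the set-of-phrases input format, and the use of raw shared-document counts as the score are all assumptions made for illustration. They are not the paper's definitions, and the opposite-significance feature is not modelled here.

```python
from collections import Counter, defaultdict
from itertools import permutations


def top_k_cooccurring(doc_phrases, k):
    """Rank, for each phrase, the phrases that share documents with it.

    doc_phrases: list of sets, one set of phrases per document
                 (hypothetical input format, assumed for this sketch).
    Returns {phrase: [(other_phrase, shared_doc_count), ...]} with at
    most k entries per phrase.
    """
    counts = defaultdict(Counter)
    for phrases in doc_phrases:
        # Count each ordered pair once per document containing both phrases.
        for a, b in permutations(sorted(phrases), 2):
            counts[a][b] += 1

    ranked = {}
    for phrase, counter in counts.items():
        # Assumed score: raw co-occurrence count, highest first.
        # Ties are broken alphabetically so the output is deterministic.
        ordered = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        ranked[phrase] = ordered[:k]
    return ranked


docs = [
    {"suffix tree", "document clustering", "web search"},
    {"suffix tree", "document clustering"},
    {"document clustering", "web search"},
]
ranking = top_k_cooccurring(docs, k=1)
```

In this toy collection, `ranking["suffix tree"]` holds the single entry `("document clustering", 2)`, because the two phrases share two documents. A real similarity measure would likely weight these counts, for example by phrase length or document frequency, but the abstract gives no detail on that.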