PhraseRank for document clustering: reweighting the weight of phrase

Authors:
Yoon-Ho Cho; Sang-Hyun Park; SangKeun Lee
Affiliations:
Korea University, Seoul, Republic of Korea; Korea University, Seoul, Republic of Korea; Korea University, Seoul, Republic of Korea
Venue:
Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human
Year:
2009

Abstract

Given a document collection, a hierarchical clustering algorithm groups the documents into clusters. Recent work has identified the set of overlap phrases as useful features in hierarchical document clustering. However, that work did not consider the relationship between overlap phrases that co-occur in a document, or the degree of the opposite relationships between overlap phrases. In this paper, we propose new algorithms for an effective similarity measure that is computed before the hierarchical clustering algorithm is run. The proposed methods have two important features: a ranked list of the top-k phrases for each overlap phrase, and the opposite significance of two overlap phrases with respect to each other. Experimental results show that the proposed method improves clustering results.
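
The first feature named in the abstract, a ranked top-k list for each overlap phrase, can be illustrated with a minimal sketch. The abstract does not give the PhraseRank scoring formula, so the score used here (the number of documents in which two phrases co-occur) is only an assumption, and the function name `top_k_cooccurring` and the sample phrases are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_k_cooccurring(doc_phrases, k):
    """For each phrase, rank the other phrases by shared-document count.

    Assumption: plain co-occurrence counts stand in for the paper's
    actual ranking score, which the abstract does not specify.
    """
    cooc = defaultdict(Counter)
    for phrases in doc_phrases:
        # Count each unordered pair of distinct phrases once per document.
        for a, b in combinations(sorted(set(phrases)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    ranking = {}
    for phrase, counts in cooc.items():
        # Sort by descending count, then alphabetically to break ties.
        ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        ranking[phrase] = [other for other, _ in ordered[:k]]
    return ranking


# Hypothetical toy collection: each document is a list of its phrases.
docs = [
    ["suffix tree", "document clustering", "similarity measure"],
    ["suffix tree", "document clustering"],
    ["document clustering", "semantic smoothing"],
]
ranks = top_k_cooccurring(docs, k=2)
print(ranks["suffix tree"])
```

In this sketch each phrase gets its own list, so the ranking is not symmetric: a phrase can appear in another phrase's top-k list without the reverse being true.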