Influence sets based on reverse nearest neighbor queries
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Document Categorization and Query Generation on the World Wide Web Using WebACE
Artificial Intelligence Review - Special issue on data mining on the Internet
Evaluation of hierarchical clustering algorithms for document datasets
Proceedings of the eleventh international conference on Information and knowledge management
Hypergraph Models and Algorithms for Data-Pattern-Based Clustering
Data Mining and Knowledge Discovery
Document Clustering with Cluster Refinement and Non-negative Matrix Factorization
ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part II
Document clustering using NMF and fuzzy relation
Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication
Hypergraph partitioning has been considered a promising method for addressing the challenges of high dimensionality in document clustering. Documents are modeled as vertices and the relationships among documents are captured by hyperedges. The goal of hypergraph partitioning is then to minimize the hyperedge cut, that is, the total weight of hyperedges that span more than one partition. The definition of hyperedges is therefore vital to clustering performance. While several definitions of hyperedges have been proposed, a systematic understanding of the desired characteristics of hyperedges is still missing. To that end, in this paper, we first provide a unified clique perspective on the definition of hyperedges, which serves as a guide for defining them. With this perspective, and based on the concepts of hypercliques and shared (reverse) nearest neighbors, we propose three new types of clique hyperedges and analyze their properties with respect to purity and size. Finally, we present an extensive evaluation using real-world document datasets. The experimental results show that, with shared (reverse) nearest neighbor based hyperedges, clustering performance can be improved significantly in terms of various external validation measures, without the need for fine-tuning of parameters.
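To make the abstract's terms concrete, here is a minimal sketch of how nearest-neighbor-based hyperedges and the hyperedge cut might be computed. The abstract does not give the paper's exact definitions, so the construction below is an assumption for illustration only. It uses cosine similarity over sparse term-weight dictionaries and forms one hyperedge per document from that document plus either its k nearest neighbors or its reverse k nearest neighbors (the documents that list it among their own k nearest neighbors). The function names `knn`, `reverse_knn`, `hyperedges` and `hyperedge_cut`, as well as the parameter `k`, are hypothetical and not taken from the paper. The paper's shared (reverse) nearest neighbor hyperedges and hyperclique hyperedges may be defined differently.

```python
import math
from typing import Dict, List, Set

Doc = Dict[str, float]  # hypothetical sparse term -> weight vector


def cosine(u: Doc, v: Doc) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn(docs: List[Doc], k: int) -> List[Set[int]]:
    """k nearest neighbors of each document (ties broken by lower index)."""
    result = []
    for i in range(len(docs)):
        sims = [(cosine(docs[i], docs[j]), j) for j in range(len(docs)) if j != i]
        sims.sort(key=lambda p: (-p[0], p[1]))
        result.append({j for _, j in sims[:k]})
    return result


def reverse_knn(neighbors: List[Set[int]]) -> List[Set[int]]:
    """Reverse kNN: i is in rev[j] whenever j is among i's k nearest neighbors."""
    rev: List[Set[int]] = [set() for _ in neighbors]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            rev[j].add(i)
    return rev


def hyperedges(neighbor_sets: List[Set[int]]) -> List[frozenset]:
    """Assumed construction: one hyperedge = a document plus its neighbor set.

    Duplicate hyperedges are merged and singletons are skipped.
    """
    edges = {frozenset({i} | nbrs) for i, nbrs in enumerate(neighbor_sets) if nbrs}
    return sorted(edges, key=sorted)


def hyperedge_cut(edges: List[frozenset], labels: List[int]) -> int:
    """Number of (unit-weight) hyperedges spanning more than one cluster."""
    return sum(1 for e in edges if len({labels[v] for v in e}) > 1)


if __name__ == "__main__":
    docs = [
        {"graph": 1.0, "cut": 1.0},
        {"graph": 1.0, "cut": 0.8},
        {"gene": 1.0, "protein": 1.0},
        {"gene": 0.9, "protein": 1.0},
    ]
    nn = knn(docs, k=1)
    edges = hyperedges(nn)
    print(edges)
    print(hyperedge_cut(edges, [0, 0, 1, 1]))  # topic-aligned partition
    print(hyperedge_cut(edges, [0, 1, 0, 1]))  # mixed partition
```

On this toy input, the topic-aligned partition cuts no hyperedges, while the mixed partition cuts both. This is the quantity a partitioner would try to minimize. Swapping `hyperedges(nn)` for `hyperedges(reverse_knn(nn))` yields the reverse-neighbor variant under the same assumptions.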