This paper proposes an algorithm for hierarchically clustering documents. Each cluster is a cluster of documents together with an associated cluster of words, that is, a document-word co-cluster. Under the vector model, a document collection is represented as a document-word matrix, and every co-cluster is a submatrix of that matrix. Intuitively, a submatrix made up of high values should be a good document cluster, with the corresponding word cluster containing its most distinctive features; our algorithm exploits this. We define a notion of matrix density, and the algorithm is driven by density considerations throughout. The algorithm is partitional-agglomerative. The partitioning step identifies dense submatrices whose row sets partition the row set of the complete matrix. The hierarchical agglomerative step repeatedly merges the most "similar" submatrices until the required number of clusters remains (for a flat clustering) or until only the single complete matrix is left (for a hierarchical arrangement of documents). The algorithm also generates descriptive labels for each cluster or hierarchy node. The similarity measure used for merging exploits the fact that the clusters are co-clusters, and this is a key point of difference from existing agglomerative algorithms. We refer to the proposed algorithm as RPSA (Rowset Partitioning and Submatrix Agglomeration). We compare it as a clustering algorithm with Spherical K-Means and Spectral Graph Partitioning, and we also evaluate some of the hierarchies it generates.
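The abstract does not state the paper's definition of matrix density or its co-cluster similarity measure, so what follows is only a minimal sketch of the agglomerative step under loudly stated assumptions. Density is assumed to be the mean entry of a submatrix. Two co-clusters are scored by the density of the submatrix spanned by the union of their rows and the union of their columns. A node is labeled by its highest-weight words. The functions `density`, `agglomerate`, and `label` and the toy matrix are hypothetical illustrations, not RPSA itself, and the partitioning step is assumed to have already produced the starting co-clusters.

```python
import numpy as np


def density(matrix, rows, cols):
    """Mean entry value of the submatrix matrix[rows, cols].

    ASSUMPTION: density is taken to be the average entry value;
    the paper's own definition may differ.
    """
    if not rows or not cols:
        return 0.0
    sub = matrix[np.ix_(sorted(rows), sorted(cols))]
    return float(sub.mean())


def agglomerate(matrix, coclusters, k):
    """Greedily merge co-clusters until k of them remain.

    coclusters: list of (row_set, col_set) pairs whose row sets partition
    the rows of `matrix`. ASSUMPTION (hypothetical similarity): a pair is
    scored by the density of the submatrix spanned by the union of their
    rows and the union of their columns; the densest pair is merged.
    """
    k = max(k, 1)  # k = 1 yields the root of the hierarchy
    clusters = [(set(r), set(c)) for r, c in coclusters]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                rows = clusters[i][0] | clusters[j][0]
                cols = clusters[i][1] | clusters[j][1]
                score = density(matrix, rows, cols)
                if best is None or score > best[0]:
                    best = (score, i, j, rows, cols)
        _, i, j, rows, cols = best
        # Delete the higher index first (j > i) so index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append((rows, cols))
    return clusters


def label(matrix, rows, cols, words, top=2):
    """Label a co-cluster by its heaviest words (column sums over its rows)."""
    col_list = sorted(cols)
    weights = matrix[np.ix_(sorted(rows), col_list)].sum(axis=0)
    order = np.argsort(-weights, kind="stable")[:top]
    return [words[col_list[t]] for t in order]


# Hypothetical toy data: 4 documents x 4 words.
M = np.array([
    [3, 1, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 5],
], dtype=float)
words = ["ball", "goal", "vote", "poll"]
# Hypothetical output of a partitioning step: one document per row set.
initial = [({0}, {0, 1}), ({1}, {0, 1}), ({2}, {2, 3}), ({3}, {2, 3})]

result = agglomerate(M, initial, 2)
for rows, cols in result:
    print(sorted(rows), label(M, rows, cols, words))
```

On this toy matrix the two sports documents and the two election documents end up in separate co-clusters, labeled by their heaviest words. Scoring a merge by the density of the combined submatrix penalizes joining co-clusters whose word sets do not overlap, since the union then contains large zero blocks; this is one plausible reading of "exploiting the fact that the clusters are co-clusters", not the paper's stated measure.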