Possibilistic fuzzy co-clustering of large document collections

Authors:
William-Chandra Tjhi; Lihui Chen
Affiliations:
Nanyang Technological University, Republic of Singapore; Nanyang Technological University, Republic of Singapore
Venue:
Pattern Recognition
Year:
2007

Abstract

In this paper we propose a new co-clustering algorithm, possibilistic fuzzy co-clustering (PFCC), for the automatic categorization of large document collections. PFCC integrates a possibilistic document clustering technique with a combined formulation of fuzzy word ranking and partitioning in a fast iterative co-clustering procedure. This framework offers several benefits at once: robustness to document and word outliers, rich representations of co-clusters, highly descriptive document clusters, good performance in high-dimensional spaces, and reduced sensitivity to initialization in the possibilistic clustering. We present the detailed formulation of PFCC together with the motivations behind it, discuss its advantages over existing approaches, and prove the algorithm's convergence. Experiments on several large document data sets demonstrate the effectiveness of PFCC.
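
The robustness to outliers mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the PFCC update equations, so this is not the authors' formulation. It only contrasts the standard fuzzy c-means membership (constrained to sum to one over clusters) with the standard possibilistic c-means typicality (no such constraint) on toy squared distances. The function names, the fuzzifier `m`, the per-cluster scale `eta`, and the numbers are all assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_memberships(dist2, m=2.0):
    """Standard fuzzy c-means memberships (illustrative, not PFCC).

    dist2: array of shape (clusters, docs) with strictly positive
    squared distances. Each column of the result sums to 1.
    """
    inv = dist2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def possibilistic_typicalities(dist2, eta, m=2.0):
    """Standard possibilistic c-means typicalities (illustrative, not PFCC).

    eta: array of shape (clusters,) with a positive scale per cluster.
    Columns are not forced to sum to 1, so a point far from every
    prototype can receive a low value in all clusters.
    """
    return 1.0 / (1.0 + (dist2 / eta[:, None]) ** (1.0 / (m - 1.0)))

# Hypothetical squared distances from 2 prototypes to 3 documents.
# The third document is far from both prototypes, i.e. an outlier.
dist2 = np.array([[0.1, 4.0, 100.0],
                  [4.0, 0.1, 100.0]])
eta = np.array([1.0, 1.0])  # assumed scale parameters

u = fuzzy_memberships(dist2)
t = possibilistic_typicalities(dist2, eta)

print("fuzzy memberships:\n", u)
print("possibilistic typicalities:\n", t)
```

Under the sum-to-one constraint the outlier is still split evenly between the two clusters, whereas its possibilistic typicality is small in both, which is the intuition behind the outlier robustness the abstract attributes to the possibilistic component.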