Fuzzy semi-supervised co-clustering for text documents

Authors:
Yang Yan;Lihui Chen;William-Chandra Tjhi
Affiliations:
Nanyang Technological University, School of Electrical and Electronic Engineering, Republic of Singapore;Nanyang Technological University, School of Electrical and Electronic Engineering, Republic of Singapore;Nanyang Technological University, School of Electrical and Electronic Engineering, Republic of Singapore
Venue:
Fuzzy Sets and Systems
Year:
2013

Abstract

In this paper we propose a new heuristic semi-supervised fuzzy co-clustering algorithm (SS-HFCR) for categorizing large web documents. The approach incorporates prior knowledge into the fuzzy co-clustering framework in the form of user-provided pairwise constraints, each of which specifies whether a pair of documents "must" or "cannot" be clustered together. We also formulate a competitive agglomeration cost function that makes use of this prior knowledge during clustering. Experimental studies on a number of large benchmark datasets demonstrate the strength and potential of SS-HFCR in terms of accuracy, stability and efficiency, compared with several recent, popular semi-supervised clustering approaches.
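
As a rough illustration of how "must" and "cannot" pairwise constraints can interact with fuzzy memberships, the sketch below computes a soft violation score for a given membership matrix. It is not the SS-HFCR cost function, which the abstract does not spell out; the function name `constraint_violation`, the penalty form, and the toy data are assumptions made purely for illustration.

```python
def constraint_violation(memberships, must_link, cannot_link):
    """Soft count of violated pairwise constraints (illustrative only).

    memberships[i][c] is the degree to which document i belongs to
    cluster c; each row is assumed to sum to 1.
    must_link / cannot_link are lists of (i, j) document index pairs.
    """
    def same_cluster_degree(i, j):
        # Degree to which documents i and j share a cluster:
        # sum over clusters c of u_ic * u_jc.
        return sum(a * b for a, b in zip(memberships[i], memberships[j]))

    # A "must" pair is penalized by how far apart the two documents are.
    must_penalty = sum(1.0 - same_cluster_degree(i, j) for i, j in must_link)
    # A "cannot" pair is penalized by how much the two documents overlap.
    cannot_penalty = sum(same_cluster_degree(i, j) for i, j in cannot_link)
    return must_penalty + cannot_penalty


# Toy example (hypothetical data): three documents, two clusters.
memberships = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
]
must_link = [(0, 1)]    # documents 0 and 1 must be clustered together
cannot_link = [(0, 2)]  # documents 0 and 2 cannot be clustered together

score = constraint_violation(memberships, must_link, cannot_link)
print(f"soft violation score: {score:.2f}")
```

A semi-supervised objective would typically add a weighted term of this kind to its unsupervised cost, so that memberships satisfying the user's constraints are preferred; how SS-HFCR weights and combines such terms is described in the paper itself, not here.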