Probability, random processes, and estimation theory for engineers
OHSUMED: an interactive retrieval evaluation and new large test collection for research
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
WebACE: a Web agent for document categorization and exploration
AGENTS '98 Proceedings of the second international conference on Autonomous agents
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
An experimental comparison of model-based clustering methods
Machine Learning
Concept decompositions for large sparse text data using clustering
Machine Learning
Constrained K-means Clustering with Background Knowledge
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Semi-supervised Clustering by Seeding
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Improving Short-Text Classification using Unlabeled Data for Classification Problems
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Multivariate Information Bottleneck
UAI '01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence
Learning from Labeled and Unlabeled Data using Graph Mincuts
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Self-Supervised Learning for Visual Tracking and Recognition of Human Hand
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Enhanced word clustering for hierarchical text classification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Using unlabeled data to improve text classification
Cluster ensembles - a knowledge reuse framework for combining multiple partitions
The Journal of Machine Learning Research
Information Theoretic Clustering of Sparse Co-Occurrence Data
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
CBC: Clustering Based Text Classification Requiring Minimal Labeled Data
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Generative model-based clustering of directional data
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A unified framework for model-based clustering
The Journal of Machine Learning Research
A probabilistic framework for semi-supervised clustering
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Locally linear metric adaptation for semi-supervised clustering
ICML '04 Proceedings of the twenty-first international conference on Machine learning
An information theoretic analysis of maximum likelihood mixture estimation for exponential families
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Generative model-based document clustering: a comparative study
Knowledge and Information Systems
Criterion functions for document clustering
Advances in Neural Information Processing Systems 18: Proceedings of the 2005 Conference (Neural Information Processing)
IEEE Transactions on Information Theory - Part 2
An active learning framework for semi-supervised document clustering with language modeling
Data & Knowledge Engineering
Harmony K-means algorithm for document clustering
Data Mining and Knowledge Discovery
A Semi-supervised Topic-Driven Approach for Clustering Textual Answers to Survey Questions
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Finding the optimal feature representations for Bayesian network learning
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Document clustering via dirichlet process mixture model with feature selection
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Semi-supervised Bayesian ARTMAP
Applied Intelligence
A novel initialization method for semi-supervised clustering
KSEM'10 Proceedings of the 4th international conference on Knowledge science, engineering and management
Semi-supervised k-means clustering by optimizing initial cluster centers
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Research of immune intrusion detection algorithm based on semi-supervised clustering
AICI'11 Proceedings of the Third international conference on Artificial intelligence and computational intelligence - Volume Part II
Tri-training and data editing based semi-supervised clustering algorithm
MICAI'06 Proceedings of the 5th Mexican international conference on Artificial Intelligence
Fuzzy semi-supervised co-clustering for text documents
Fuzzy Sets and Systems
Clustering documents with labeled and unlabeled documents using fuzzy semi-Kmeans
Fuzzy Sets and Systems
Absolute and relative clustering
Proceedings of the 4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering
Robust predictive model for evaluating breast cancer survivability
Engineering Applications of Artificial Intelligence
Semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semi-supervised clustering. Viewing semi-supervised learning from a clustering angle is useful in practical situations where the set of labels available in the labeled data is incomplete, i.e., the unlabeled data contain new classes that are not present in the labeled data. This paper analyzes several multinomial model-based semi-supervised document clustering methods under a principled model-based clustering framework. The framework naturally leads to a deterministic annealing extension of existing semi-supervised clustering approaches. We compare three (slightly) different semi-supervised approaches for clustering documents: Seeded damnl, Constrained damnl, and Feedback-based damnl, where damnl stands for the multinomial model-based deterministic annealing algorithm. The first two are extensions of the seeded k-means and constrained k-means algorithms studied by Basu et al. (2002); the last one is motivated by Cohn et al. (2003). Through empirical experiments on text datasets, we show that (a) deterministic annealing can often significantly improve the performance of semi-supervised clustering, and (b) the constrained approach performs best when the available labels are complete, whereas the feedback-based approach excels when the available labels are incomplete.
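The seeded and constrained variants named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the temperature schedule (`t_start`, `t_end`, `cooling`), the Laplace smoothing, and the uniform cluster priors are all assumptions made for illustration, and the sketch assumes every cluster has at least one labeled seed document. Documents are word-count vectors; posteriors are tempered as exp(log p(d | cluster) / T) and the temperature T is lowered step by step. In seeded mode the labels only initialize the cluster models; in constrained mode the labeled documents also stay fixed to their labels.

```python
import math

def log_multinomial(doc, log_theta):
    # Log-likelihood of a word-count vector, dropping the multinomial
    # coefficient (it is the same for every cluster and cancels out).
    return sum(c * lt for c, lt in zip(doc, log_theta) if c)

def m_step(docs, resp, k, smooth=1.0):
    # Re-estimate each cluster's word distribution from soft assignments,
    # with Laplace smoothing (an assumption of this sketch).
    vocab = len(docs[0])
    log_thetas = []
    for j in range(k):
        counts = [smooth] * vocab
        for doc, r in zip(docs, resp):
            for w, c in enumerate(doc):
                counts[w] += r[j] * c
        total = sum(counts)
        log_thetas.append([math.log(c / total) for c in counts])
    return log_thetas

def e_step(docs, log_thetas, temperature, seeds, constrained):
    # Tempered posteriors: P(j | d) proportional to exp(log p(d | j) / T).
    k = len(log_thetas)
    resp = []
    for i, doc in enumerate(docs):
        if constrained and i in seeds:
            # Constrained mode: labeled documents keep their given cluster.
            resp.append([1.0 if j == seeds[i] else 0.0 for j in range(k)])
            continue
        scores = [log_multinomial(doc, lt) / temperature for lt in log_thetas]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        z = sum(weights)
        resp.append([w / z for w in weights])
    return resp

def seeded_da_multinomial(docs, seeds, k, t_start=2.0, t_end=0.05,
                          cooling=0.5, inner_iters=5, constrained=False):
    # Initialize cluster models from the seed documents only.
    resp = [[1.0 if i in seeds and seeds[i] == j else 0.0 for j in range(k)]
            for i in range(len(docs))]
    log_thetas = m_step(docs, resp, k)
    t = t_start
    while t >= t_end:
        for _ in range(inner_iters):
            resp = e_step(docs, log_thetas, t, seeds, constrained)
            log_thetas = m_step(docs, resp, k)
        t *= cooling  # lower the temperature (hypothetical schedule)
    # Hard assignment: the most probable cluster for each document.
    return [max(range(k), key=lambda j: r[j]) for r in resp]

# Toy data: words 0-1 dominate one topic, words 2-3 the other.
toy_docs = [[5, 3, 0, 0], [4, 4, 0, 1], [6, 2, 1, 0],
            [0, 0, 5, 4], [1, 0, 3, 6], [0, 1, 4, 4]]
toy_seeds = {0: 0, 3: 1}  # document index -> cluster label
assignments = seeded_da_multinomial(toy_docs, toy_seeds, k=2)
```

The feedback-based variant is not covered here, since the abstract does not describe how feedback is incorporated.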