Iterative Clustering of High Dimensional Text Data Augmented by Local Search
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
A novel approach to clustering co-occurrence data poses it as an optimization problem in information theory which minimizes the resulting loss in mutual information. A divisive clustering algorithm that monotonically reduces this loss function was recently proposed. In this paper we show that sparse high-dimensional data presents special challenges which can result in the algorithm getting stuck at poor local minima. We propose two solutions to this problem: (a) a "prior" to overcome infinite relative entropy values as in the supervised Naive Bayes algorithm, and (b) local search to escape local minima. Finally, we combine these solutions to get a robust algorithm that is computationally efficient. We present experimental results to show that the proposed method is effective in clustering document collections and outperforms previous information-theoretic clustering approaches.
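
The infinite relative entropy values mentioned in point (a) can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not specify the form of the "prior", so the code below assumes simple additive (Laplace-style) pseudo-counts, as commonly used with Naive Bayes. The names `kl_divergence`, `normalize`, `alpha`, and the toy word counts are all hypothetical.

```python
import math

def kl_divergence(p, q):
    """Relative entropy KL(p || q) in bits.

    Returns math.inf when q assigns zero probability to an event
    that p considers possible.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total

def normalize(counts, alpha=0.0):
    """Convert counts to a distribution, adding pseudo-count alpha to each entry.

    alpha is an assumed smoothing parameter, not a value taken from the paper.
    """
    smoothed = [c + alpha for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]

# Toy sparse word counts over a 4-word vocabulary (made up for illustration).
doc_counts = [3, 0, 1, 0]      # one document
cluster_counts = [5, 2, 0, 1]  # a cluster that never saw the third word

p = normalize(doc_counts)
raw = kl_divergence(p, normalize(cluster_counts))
smoothed = kl_divergence(p, normalize(cluster_counts, alpha=0.5))

print("without prior:", raw)
print("with prior:   ", smoothed)
```

Without the pseudo-counts, the document is infinitely far from any cluster that lacks even one of its words, which is common in sparse high-dimensional data. With them, every distance is finite, so comparisons between candidate clusters remain meaningful.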