High-dimensional directional data is becoming increasingly important in contemporary applications such as the analysis of text and gene-expression data. A natural model for multivariate directional data is the von Mises-Fisher (vMF) distribution on the unit hypersphere, which is analogous to the multivariate Gaussian distribution in R^d. In this paper, we propose modeling complex directional data as a mixture of vMF distributions. We derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the parameters of this mixture, and we propose two clustering algorithms corresponding to these variants. An interesting aspect of our methodology is that the spherical k-means algorithm (k-means with cosine similarity) can be shown to be a special case of both algorithms. Thus, modeling text data by vMF distributions lends theoretical validity to cosine similarity, which has been widely used by the information retrieval community. As part of our experimental validation, we present results on modeling high-dimensional text and gene-expression data as mixtures of vMF distributions. The results indicate that our approach yields superior clusterings, especially for difficult clustering tasks in high-dimensional spaces.
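To make the special case concrete, here is a minimal sketch of spherical k-means, that is, k-means with cosine similarity on unit-normalized vectors. It is an illustration under stated assumptions, not the paper's implementation: the function name `spherical_kmeans`, its parameters, the random initialization from data points, and the stopping rule are all hypothetical choices, and the input rows are assumed to be non-zero. The EM variants described in the abstract additionally estimate the parameters of each vMF component, which this sketch does not attempt.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Hard-assignment clustering of the rows of X by cosine similarity.

    Illustrative sketch only; assumes every row of X is non-zero.
    """
    rng = np.random.default_rng(seed)
    # Project every point onto the unit hypersphere.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize the mean directions from k distinct data points (an assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: for unit vectors, the dot product is the cosine similarity.
        new_labels = np.argmax(X @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each mean direction is the normalized sum of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                total = members.sum(axis=0)
                norm = np.linalg.norm(total)
                if norm > 0:
                    centers[j] = total / norm
    return labels, centers
```

A call such as `labels, centers = spherical_kmeans(X, k=2)` returns one cluster index per row and one unit-length mean direction per cluster. Because only directions matter, rescaling a row of `X` does not change its assignment.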