In many emerging data mining applications, one needs to cluster complex data such as very high-dimensional sparse text documents and continuous or discrete time sequences. Probabilistic model-based clustering techniques have shown promising results in many such applications. For real-valued low-dimensional vector data, Gaussian models have been frequently used. For very high-dimensional vector and non-vector data, model-based clustering is a natural choice when it is difficult to extract good features or identify an appropriate measure of similarity between pairs of data objects. This dissertation presents a unified framework for model-based clustering based on a bipartite graph view of data and models. The framework includes an information-theoretic analysis of model-based partitional clustering from a deterministic annealing point of view and a view of model-based hierarchical clustering that leads to several useful extensions. The framework is used to develop two new variations of model-based clustering: a balanced model-based partitional clustering algorithm that produces clusters of comparable sizes, and a hybrid model-based clustering approach that combines the advantages of partitional and hierarchical model-based algorithms. I apply the framework and new clustering algorithms to cluster several distinct types of complex data, ranging from arbitrary-shaped 2-D synthetic data to high-dimensional documents, EEG time series, and gene expression time sequences. The empirical results demonstrate the usefulness of the scalable, balanced model-based clustering algorithms, as well as the benefits of the hybrid model-based clustering approach. They also showcase the generality of the proposed clustering framework.
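To make the deterministic annealing view of model-based partitional clustering more concrete, the following is a minimal sketch and not the dissertation's algorithm. It assumes 1-D data, unit-variance Gaussian cluster models, and a hand-picked temperature schedule; the function name `da_cluster` and all parameters are hypothetical. At each temperature, points are softly assigned to models in proportion to exp(log-likelihood / T), and each model is then re-estimated from its weighted points.

```python
import math


def da_cluster(points, k, temps=(8.0, 4.0, 2.0, 1.0, 0.5), iters=20):
    """Soft model-based clustering of 1-D points with temperature annealing.

    Each cluster is modelled as a unit-variance Gaussian (an assumption made
    for this sketch), so log p(x | mean) = -0.5 * (x - mean)**2 + const.
    """
    lo, hi = min(points), max(points)
    # Spread the initial means evenly over the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for temp in temps:
        for _ in range(iters):
            # Assignment step: posteriors proportional to exp(log-lik / T).
            resp = []
            for x in points:
                logits = [-0.5 * (x - m) ** 2 / temp for m in means]
                top = max(logits)  # subtract the max for numerical stability
                weights = [math.exp(v - top) for v in logits]
                total = sum(weights)
                resp.append([w / total for w in weights])
            # Re-estimation step: weighted mean of the points for each model.
            for j in range(k):
                mass = sum(r[j] for r in resp)
                if mass > 0.0:
                    means[j] = sum(r[j] * x for r, x in zip(resp, points)) / mass
    # Final hard assignment of each point to its closest (most likely) model.
    labels = [min(range(k), key=lambda j: (x - means[j]) ** 2) for x in points]
    return means, labels


points = [0.0, 0.2, -0.1, 0.1, 10.0, 10.2, 9.9, 10.1]
means, labels = da_cluster(points, k=2)
print(means, labels)
```

High temperatures give nearly uniform soft assignments, and lowering the temperature gradually sharpens them toward a hard partition. Other model families, such as models for documents or time sequences, would replace only the log-likelihood and the re-estimation step in this sketch.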