Topic extraction from a text corpus is the foundation of many topic analysis tasks, such as topic trend prediction and opinion extraction. Because topics are inherently hierarchical, a topic extraction algorithm should preferably output topic descriptions with this structure. However, the hierarchical topic structures extracted by most current topic analysis algorithms cannot provide a meaningful description for every subtopic in the tree. Here, we propose a new hierarchical topic extraction algorithm based on topic grain computation. The distribution of word document frequency is modeled as a Gaussian mixture, and an EM-like algorithm is used to determine the best number of mixture components and the mean of each component. A topic grain is then defined from the Gaussian mixture parameters, and feature words are selected for each grain. The text set is converted according to these feature words, and a clustering algorithm is applied to the converted set. By repeatedly applying the clustering algorithm to the differently converted text sets, a multi-grain hierarchical topic structure is extracted in which each subtopic is described by its own feature words. Experiments on two real-world datasets collected from a news website show that the proposed algorithm generates a more meaningful multi-grain topic structure than current hierarchical topic clustering algorithms.
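The component-estimation step can be illustrated with a minimal sketch. The abstract does not state how the EM-like algorithm chooses the number of components, so this sketch makes an assumption: it fits one-dimensional Gaussian mixtures with standard EM for several candidate sizes and keeps the size with the lowest Bayesian information criterion (BIC). The function names `fit_gmm_1d`, `bic`, and `choose_num_components`, the quantile initialization, and the variance floor are all hypothetical choices for illustration, not the paper's method.

```python
import math


def fit_gmm_1d(xs, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture to xs with plain EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(xs)
    ordered = sorted(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    var_floor = 1e-3 * var_all + 1e-12  # assumption: keeps components from collapsing
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [max(var_all, var_floor)] * k
    weights = [1.0 / k] * k

    def weighted_densities(x):
        # Mixture weight times Gaussian density, one value per component.
        return [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            d = weighted_densities(x)
            total = sum(d) or 1e-300
            resp.append([di / total for di in d])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)

    log_lik = sum(math.log(sum(weighted_densities(x)) or 1e-300) for x in xs)
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC for a 1-D mixture: k means, k variances, k - 1 free weights."""
    return (3 * k - 1) * math.log(n) - 2.0 * log_lik


def choose_num_components(xs, max_k=5):
    """Return (best_k, sorted component means) under the BIC criterion."""
    best = None
    for k in range(1, min(max_k, len(xs)) + 1):
        _, means, _, log_lik = fit_gmm_1d(xs, k)
        score = bic(log_lik, k, len(xs))
        if best is None or score < best[0]:
            best = (score, k, sorted(means))
    return best[1], best[2]
```

Given a list of per-word document frequency values `df_values`, a call such as `choose_num_components(df_values, max_k=5)` would return a candidate number of grains together with the component means. How those means are turned into topic grains and feature words is defined in the paper and is not reproduced here.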