Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) an analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; and (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
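The abstract does not spell out the evaluation metric, only that it is automated and uses no data beyond the training corpus. As an illustration of that idea (a sketch, not the paper's exact definition — the function name and scoring form here are assumptions), a coherence score can be computed from within-corpus document co-occurrence of a topic's top words: word pairs that never co-occur in training documents drag the score down, flagging flawed topics without human judges or an external reference collection.

```python
from math import log

def cooccurrence_coherence(topic_words, documents):
    """Score a topic by document co-occurrence of its top words.

    topic_words: a topic's top words, ordered by probability.
    documents:   the training corpus, as an iterable of token lists.
    Returns a log-ratio sum; higher (less negative) means more coherent.
    Assumes every topic word appears in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of training documents containing all given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i, w_i in enumerate(topic_words[1:], start=1):
        for w_j in topic_words[:i]:
            # +1 smoothing keeps never-co-occurring pairs finite
            # while still penalizing them heavily.
            score += log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j))
    return score
```

On a toy corpus, a topic whose words always appear together scores higher than one pairing words that rarely co-occur, which is exactly the kind of automatic flaw detection the abstract describes.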