We investigate whether hierarchical topic models are suitable for representing the content of Web gists. We focus on DMOZ, a popular Web directory, and propose two algorithms for inferring such a model from its manually curated hierarchy of categories. Our first approach is information-theoretic and uses an algorithm similar to recursive feature selection. Our second approach is fully Bayesian and is derived from the more general hierarchical LDA model. We evaluate both models against a flat unigram baseline and show improvements in perplexity on held-out data.
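To make the evaluation criterion concrete, the following is a minimal sketch of how held-out perplexity could be computed for a flat unigram baseline. It is an illustration only, not the implementation used in this work: the additive smoothing, the `alpha` parameter, and the function names `train_unigram` and `perplexity` are assumptions introduced here.

```python
import math
from collections import Counter


def train_unigram(train_tokens, vocab_size, alpha=1.0):
    """Return a function mapping a token to its smoothed unigram probability.

    Additive (Laplace-style) smoothing is an assumption of this sketch;
    vocab_size must count every token type that may appear at test time.
    """
    counts = Counter(train_tokens)
    total = len(train_tokens)

    def prob(token):
        # Unseen tokens have count 0 and still receive non-zero mass.
        return (counts[token] + alpha) / (total + alpha * vocab_size)

    return prob


def perplexity(prob, heldout_tokens):
    """Perplexity = exp of the negative mean log-probability per token."""
    log_sum = sum(math.log(prob(t)) for t in heldout_tokens)
    return math.exp(-log_sum / len(heldout_tokens))


# Hypothetical toy data: a two-word vocabulary.
model = train_unigram(["web", "web", "web", "page"], vocab_size=2)
score = perplexity(model, ["web", "web", "page"])
```

Lower perplexity means the model assigns higher probability to the held-out text. A model that spreads its mass uniformly over a vocabulary of size V has perplexity V, which gives a simple reference point when reading such scores.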