The Journal of Machine Learning Research
Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Personal blog retrieval using opinion features
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Aspect extraction through semi-supervised modeling
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Hi-index | 0.00 |
Faceted blog distillation aims at retrieving the blogs that are not only relevant to a query but also exhibit an interested facet. In this paper we consider personal and official facets. Personal blogs depict various topics related to the personal experiences of bloggers while official blogs deliver contents with bloggers' commercial influences. We observe that some terms, such as nouns, usually describe the topics of posts in blogs while other terms, such as pronouns and adverbs, normally reflect the facets of posts. Thus we present a model that estimates the probabilistic distributions of topics and those of facets in posts. It leverages a classifier to separate facet terms from topical terms in the posterior inference. We also observe that the posts from a blog are likely to exhibit the same facet. So we propose another model that constrains the posts from a blog to have the same facet distributions in its generative process. Experimental results using the TREC 2009-2010 queries over the TREC Blogs08 collection show the effectiveness of both models. Our results outperform the best known results for personal and official distillation.