Simultaneous joint and conditional modeling of documents tagged from two perspectives

Authors:
Pradipto Das; Rohini Srihari; Yun Fu
Affiliations:
SUNY Buffalo, Buffalo, NY, USA (all three authors)
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

This paper explores correspondence and mixture topic modeling of documents tagged from two different perspectives. There has been ongoing work on topic modeling of documents with tags (tag-topic models), in which words and tags typically reflect a single perspective, namely document content. However, the words in a document can also be tagged from other perspectives, for example a syntactic perspective, as in part-of-speech tagging, or an opinion perspective, as in sentiment tagging. The models proposed in this paper are novel in: (i) the consideration of two different tag perspectives, a document-level tag perspective that is relevant to the document as a whole and a word-level tag perspective pertaining to each word in the document; (ii) associating latent topics with word-level tags and labeling latent topics with images in the case of multimedia documents; and (iii) discovering the possible correspondence between words and document-level tags. The proposed correspondence tag-topic model shows better predictive power, i.e., higher likelihood on held-out test data, than all existing tag-topic models and even a supervised topic model. To evaluate the models in practical scenarios, quantitative measures comparing the outputs of the proposed models with ground-truth domain knowledge have been explored. Manually assigned (gold-standard) document category labels of Wikipedia pages are used to validate model-generated tag suggestions, using a measure of pairwise concept similarity within an ontological hierarchy such as WordNet. On a news corpus, automatic discovery of relationships between person names was performed and compared against a robust baseline.
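The abstract does not say which WordNet similarity measure is used to compare suggested tags against gold-standard category labels. As an illustration only, the sketch below assumes Wu and Palmer's depth-based measure, sim(a, b) = 2 · depth(lcs(a, b)) / (depth(a) + depth(b)), where lcs is the lowest common subsumer, computed over a small hand-made is-a tree standing in for WordNet. The hierarchy, the function names, and the "best match per suggested tag, then average" aggregation in `tag_suggestion_score` are all hypothetical and are not the authors' evaluation procedure.

```python
# Hypothetical toy is-a hierarchy standing in for WordNet noun synsets.
# Each concept maps to its single parent; the root maps to None.
PARENT = {
    "entity": None,
    "organism": "entity",
    "person": "organism",
    "scientist": "person",
    "physicist": "scientist",
    "chemist": "scientist",
    "artist": "person",
    "animal": "organism",
    "dog": "animal",
}


def path_to_root(concept):
    """Return [concept, parent, grandparent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    # Walking upward from b, the first node shared with a's ancestors is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    return None  # unreachable in a single-rooted tree


def wu_palmer(a, b):
    """Wu-Palmer similarity in [0, 1]; 1.0 for identical concepts."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def tag_suggestion_score(suggested, gold):
    """Hypothetical aggregation: for each suggested tag take its best
    similarity to any gold label, then average over suggested tags."""
    best = [max(wu_palmer(s, g) for g in gold) for s in suggested]
    return sum(best) / len(best)


if __name__ == "__main__":
    print(wu_palmer("physicist", "chemist"))  # shared parent "scientist"
    print(wu_palmer("physicist", "dog"))      # only share "organism"
    print(tag_suggestion_score(["chemist", "dog"], ["physicist"]))
```

For example, with a gold label of `physicist`, suggesting `chemist` scores 0.8 while suggesting `dog` scores 4/9 (about 0.44). A graded ontological measure of this kind gives near-miss tags partial credit, which exact string matching against the gold labels would not.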