Incorporating content structure into text analysis applications

Authors:
Christina Sauper; Aria Haghighi; Regina Barzilay
Affiliations:
Massachusetts Institute of Technology; Massachusetts Institute of Technology; Massachusetts Institute of Technology
Venue:
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Year:
2010

Abstract

In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework that combines a supervised text analysis application with the induction of latent content structure, and the two components are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.
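
To make the idea of inducing latent content structure with EM more concrete, the following is a minimal sketch and not the model described in the paper. It assumes a plain mixture-of-unigrams model in which each sentence has one latent topic, and it omits the coupling with the supervised task entirely. The function name `em_mixture_of_unigrams`, the smoothing constant `alpha`, the round-robin initialization, and the toy sentences are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def em_mixture_of_unigrams(sentences, num_topics, iterations=20, alpha=0.1):
    """Soft-cluster tokenized sentences into latent topics with EM.

    Illustrative only: a generic mixture of unigrams, not the paper's model.
    """
    vocab = sorted({w for s in sentences for w in s})
    counts = [Counter(s) for s in sentences]
    # Deterministic round-robin initialization (an assumption made for
    # reproducibility; random restarts are more common in practice).
    resp = [[1.0 if k == i % num_topics else 0.0 for k in range(num_topics)]
            for i in range(len(sentences))]
    for _ in range(iterations):
        # M-step: re-estimate topic priors and smoothed word distributions.
        prior = [(sum(r[k] for r in resp) + alpha)
                 / (len(sentences) + alpha * num_topics)
                 for k in range(num_topics)]
        word_prob = []
        for k in range(num_topics):
            weighted = {w: alpha for w in vocab}
            for r, c in zip(resp, counts):
                for w, n in c.items():
                    weighted[w] += r[k] * n
            total = sum(weighted.values())
            word_prob.append({w: v / total for w, v in weighted.items()})
        # E-step: posterior over topics per sentence, computed in log space.
        resp = []
        for c in counts:
            log_post = [math.log(prior[k])
                        + sum(n * math.log(word_prob[k][w]) for w, n in c.items())
                        for k in range(num_topics)]
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])
    return resp


# Toy data (hypothetical): two sentences about batteries, two about screens.
sentences = [
    ["battery", "life", "battery"],
    ["screen", "bright", "screen"],
    ["battery", "charge"],
    ["screen", "pixels"],
]
posteriors = em_mixture_of_unigrams(sentences, num_topics=2)
labels = [max(range(2), key=p.__getitem__) for p in posteriors]
```

In a joint setup of the kind the abstract describes, one could imagine the per-sentence topic posteriors informing the supervised task and the task in turn biasing the topics; this sketch shows only the unsupervised half under the stated assumptions.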