Genre identification and goal-focused summarization

Authors:
Jade Goldstein;Gary M. Ciany;Jaime G. Carbonell
Affiliations:
U.S. Department of Defense, Fort Meade, MD;Dragon Development Corporation, Columbia, MD;Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Year:
2007

Citing 5
Cited 2

Genre based Navigation on the Web

HICSS '01 Proceedings of the 34th Annual Hawaii International Conference on System Sciences ( HICSS-34)-Volume 4 - Volume 4
The Functionality Attribute of Cybergenres

HICSS '99 Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences-Volume 2 - Volume 2
Recognizing text genres with simple metrics using discriminant analysis

COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
The form is the substance: classification of genres in text

HLTKM '01 Proceedings of the workshop on Human Language Technology and Knowledge Management - Volume 2001
Recognizing contextual polarity in phrase-level sentiment analysis

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Person identification from text and speech genre samples

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Cross-lingual genre classification

EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we present a novel technique of first performing document genre identification, then utilizing the genre for producing tailored summaries based on a user's information seeking needs - genre oriented goal-focused summarization - such as a plot or opinion summary of a movie review. We create a test corpus to determine genre classification accuracy for 16 genres, and examine performance on various amounts of training data for machine learning algorithms - Random Forests, SVM light and Naïve Bayes. Results show that Random Forests outperforms SVM light and Naïve Bayes. The genre tag is used to inform a downstream summarization engine. We define types of summaries for 7 genres, create a ground truth corpus and analyze the results of genre oriented goal-focused summarization, showing that this type of user based summarization requires different algorithms than the leading sentence baseline which is known to perform well in the case of news articles.