Genre identification and goal-focused summarization

  • Authors:
  • Jade Goldstein;Gary M. Ciany;Jaime G. Carbonell

  • Affiliations:
  • U.S. Department of Defense, Fort Meade, MD;Dragon Development Corporation, Columbia, MD;Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
  • Year:
  • 2007

Abstract

In this paper, we present a novel technique, genre-oriented goal-focused summarization: we first identify a document's genre, then use that genre to produce summaries tailored to a user's information-seeking needs, such as a plot summary or an opinion summary of a movie review. We create a test corpus to measure classification accuracy over 16 genres and examine how three machine learning algorithms (Random Forests, SVM light, and Naïve Bayes) perform with varying amounts of training data. Random Forests outperforms both SVM light and Naïve Bayes. The genre tag then informs a downstream summarization engine. We define summary types for 7 genres, create a ground-truth corpus, and analyze the results of genre-oriented goal-focused summarization. The analysis shows that this kind of user-based summarization requires different algorithms than the leading-sentence baseline, which is known to perform well on news articles.
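
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' system. It assumes the genre tag has already been produced by an upstream classifier, and it compares a leading-sentence baseline with a simple goal-focused selector. The cue-word table, the genre and goal labels, and all function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical cue words per (genre, goal) pair. Illustrative only;
# these are not taken from the paper.
CUE_WORDS = {
    ("movie_review", "plot"): {"story", "plot", "character"},
    ("movie_review", "opinion"): {"great", "boring", "recommend", "disappointing"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def lead_baseline(text, n=2):
    """Leading-sentence baseline: return the first n sentences."""
    return split_sentences(text)[:n]


def goal_focused_summary(text, genre, goal, n=2):
    """Pick the n sentences with the most cue-word hits for (genre, goal).

    Falls back to the leading-sentence baseline when no cue words are
    defined for the pair. Selected sentences keep document order.
    """
    sentences = split_sentences(text)
    cues = CUE_WORDS.get((genre, goal))
    if not cues:
        return sentences[:n]
    # Score each sentence by how many cue words occur in it (substring match).
    scored = [
        (sum(cue in s.lower() for cue in cues), i, s)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:n]
    # Restore original document order for readability.
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]
```

For a movie review that opens with background and ends with a verdict, `lead_baseline` returns the opening sentences regardless of the user's goal, whereas `goal_focused_summary(text, "movie_review", "opinion")` can surface the verdict from the end of the text. This is the kind of difference the abstract points to when it says the leading-sentence baseline is not sufficient for this task.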