Selecting labels for news document clusters

  • Authors:
  • Krishnaprasad Thirunarayan;Trivikram Immaneni;Mastan Vali Shaik

  • Affiliations:
  • Department of Computer Science and Engineering, Wright State University, Dayton, Ohio;Department of Computer Science and Engineering, Wright State University, Dayton, Ohio;Department of Computer Science and Engineering, Wright State University, Dayton, Ohio

  • Venue:
  • NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
  • Year:
  • 2007

Abstract

This work addresses the determination of meaningful and terse cluster labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of documents in a document cluster (obtained as a result of an entity-event-duration query), and formalize an approach to extracting a short phrase from well-supported headlines/sentences of the cluster that can serve as the cluster label. Our technique maps a sentence into a set of significant stems to approximate its semantics for comparison. Finally, a cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, restoring the word-sequencing information lost in the formalization of semantic equivalence.
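
The pipeline outlined in the abstract (stem sets, support among headlines, contiguous label) can be illustrated with a minimal sketch. This is not the authors' algorithm: the stop-word list, the suffix-stripping `crude_stem` helper, the "share at least half of the stems" support test, and the majority rule used to pick core stems are all assumptions made for illustration, and the sample headlines are invented.

```python
import re

# Hypothetical stop-word list; the paper's actual notion of "significant" is not given here.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "at", "by", "with", "is", "as"}


def crude_stem(word):
    """Rough suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def significant_stems(sentence):
    """Map a sentence to the set of stems of its non-stop words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return {crude_stem(w) for w in words if w not in STOP_WORDS}


def support(candidate, others):
    """Count the other sentences sharing at least half of the candidate's stems (assumed rule)."""
    stems = significant_stems(candidate)
    if not stems:
        return 0
    return sum(1 for o in others if 2 * len(stems & significant_stems(o)) >= len(stems))


def select_headline(headlines):
    """Pick the best-supported headline; ties go to the earliest one."""
    best = max(
        range(len(headlines)),
        key=lambda i: support(headlines[i], headlines[:i] + headlines[i + 1:]),
    )
    return headlines[best]


def extract_label(headline, headlines):
    """Shortest contiguous word span of `headline` covering its stems found in a majority of headlines."""
    stem_sets = [significant_stems(h) for h in headlines]
    core = {
        s for s in significant_stems(headline)
        if 2 * sum(s in other for other in stem_sets) > len(stem_sets)
    }
    if not core:
        return headline
    words = headline.split()
    word_stems = [significant_stems(w) for w in words]  # empty set for stop words
    best = None
    for start in range(len(words)):
        covered = set()
        for end in range(start, len(words)):
            covered |= word_stems[end] & core
            if covered == core:
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break
    if best is None:
        return headline
    return " ".join(words[best[0]: best[1] + 1])


# Invented sample cluster, for illustration only.
headlines = [
    "Toyota recalls millions of cars over brake fault",
    "Brake fault forces Toyota to recall cars",
    "Toyota recalling cars worldwide",
    "Analysts react as Toyota recalls cars",
]
chosen = select_headline(headlines)
print(extract_label(chosen, headlines))  # Toyota to recall cars
```

Comparing stem sets ignores word order, so the label is read back from the chosen headline as a contiguous span. That is why the stop word "to" reappears in the output even though it played no part in the comparison.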