Selecting labels for news document clusters

Authors:
Krishnaprasad Thirunarayan; Trivikram Immaneni; Mastan Vali Shaik
Affiliations:
Department of Computer Science and Engineering, Wright State University, Dayton, Ohio;Department of Computer Science and Engineering, Wright State University, Dayton, Ohio;Department of Computer Science and Engineering, Wright State University, Dayton, Ohio
Venue:
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
Year:
2007

Abstract

This work addresses the determination of meaningful and terse labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of documents in a document cluster (obtained as the result of an entity-event-duration query), and formalize an approach to extracting a short phrase from well-supported headlines/sentences of the cluster that can serve as the cluster label. Our technique maps a sentence to a set of significant stems that approximates its semantics, so that sentences can be compared. Finally, the cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, restoring the word-order information lost in the formalization of semantic equivalence.
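
The pipeline outlined in the abstract (sentence to stem set, comparison across the cluster, then extraction of a contiguous phrase) can be illustrated with a minimal sketch. This is not the authors' algorithm. The stop-word list, the crude suffix-stripping stemmer, the 50% support threshold, and the function names (`crude_stem`, `significant_stems`, `extract_label`, `label_cluster`) are all hypothetical choices made for illustration, as are the sample headlines.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and",
             "with", "at", "by", "is", "as"}


def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def significant_stems(sentence):
    """Map a sentence to its set of non-stop-word stems (word order is discarded)."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return {crude_stem(w) for w in words if w not in STOPWORDS}


def extract_label(headline, core_stems):
    """Return the shortest contiguous word span of `headline` covering the core stems."""
    words = re.findall(r"[A-Za-z0-9]+", headline)
    stems = [crude_stem(w.lower()) for w in words]
    needed = core_stems & set(stems)
    if not needed:
        return ""
    best = None
    for i in range(len(words)):
        for j in range(i, len(words)):
            if needed <= set(stems[i:j + 1]):
                if best is None or (j - i) < (best[1] - best[0]):
                    best = (i, j)
                break  # longer spans starting at i cannot be shorter
    return " ".join(words[best[0]:best[1] + 1])


def label_cluster(headlines, min_fraction=0.5):
    """Pick a well-supported headline and cut a terse label out of it."""
    stem_sets = [significant_stems(h) for h in headlines]
    counts = Counter(stem for s in stem_sets for stem in s)
    # Assumption: a stem is "well supported" if it occurs in at least
    # `min_fraction` of the headlines in the cluster.
    core = {s for s, c in counts.items() if c >= min_fraction * len(headlines)}
    # Choose the headline covering the most core stems (first one wins ties).
    best = max(range(len(headlines)), key=lambda i: len(stem_sets[i] & core))
    return extract_label(headlines[best], core)


headlines = [
    "Microsoft acquires Yahoo in landmark deal",
    "Microsoft acquiring Yahoo, analysts say",
    "Yahoo shares jump as Microsoft acquires company",
]
print(label_cluster(headlines))  # prints: Microsoft acquires Yahoo
```

Comparing stem sets ignores word order, which is why the final step returns to the original headline text: taking a contiguous span of the chosen headline yields a readable phrase rather than an unordered bag of stems.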