Phrase pair classification for identifying subtopics

  • Authors: Sujatha Das; Prasenjit Mitra; C. Lee Giles

  • Affiliations: Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA; School of Information Science and Technology, The Pennsylvania State University, University Park, PA; School of Information Science and Technology, The Pennsylvania State University, University Park, PA

  • Venue: ECIR'12: Proceedings of the 34th European Conference on Advances in Information Retrieval
  • Year: 2012

Abstract

Automatic identification of subtopics for a given topic is desirable because it eliminates the need for manual construction of domain-specific topic hierarchies. In this paper, we design features based on corpus statistics and use them to build a classifier that identifies (subtopic, topic) links between phrase pairs. We combine these features with commonly used syntactic patterns to classify phrase pairs from datasets in Computer Science and WordNet. In addition, we show a novel application of our is-a-subtopic-of classifier to query expansion in Expert Search and compare it with pseudo-relevance feedback.
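
As a rough illustration of what "features based on corpus statistics" combined with "syntactic patterns" could look like for one phrase pair, here is a minimal sketch. It is not the paper's feature set: the function names, the two conditional document-frequency features (in the spirit of subsumption heuristics), the three Hearst-style patterns, and the toy corpus are all assumptions made for this example.

```python
import re

def cooccurrence_features(docs, sub, topic):
    """Hypothetical corpus-statistics features for a (subtopic, topic) pair.

    Uses document-level co-occurrence: a subtopic is expected to appear
    mostly in documents that also mention its topic, but not vice versa.
    """
    sub, topic = sub.lower(), topic.lower()
    n_sub = n_topic = n_both = 0
    for doc in docs:
        text = doc.lower()
        has_sub = sub in text
        has_topic = topic in text
        n_sub += has_sub
        n_topic += has_topic
        n_both += has_sub and has_topic
    return {
        "p_topic_given_sub": n_both / n_sub if n_sub else 0.0,
        "p_sub_given_topic": n_both / n_topic if n_topic else 0.0,
    }

def pattern_feature(docs, sub, topic):
    """Return 1 if any Hearst-style pattern links sub to topic, else 0."""
    s, t = re.escape(sub.lower()), re.escape(topic.lower())
    patterns = [
        rf"{t}(?: \w+)? such as {s}",  # "machine learning methods such as clustering"
        rf"{s} and other {t}",         # "clustering and other machine learning ..."
        rf"{s} is an? {t}",            # "clustering is a machine learning ..."
    ]
    return int(any(re.search(p, d.lower()) for d in docs for p in patterns))

def pair_features(docs, sub, topic):
    """Combine both feature groups into one dict for a downstream classifier."""
    feats = cooccurrence_features(docs, sub, topic)
    feats["hearst_pattern"] = pattern_feature(docs, sub, topic)
    return feats

# Toy corpus, invented for this example only.
docs = [
    "Machine learning methods such as clustering are widely used.",
    "Clustering groups similar items; it is a core machine learning task.",
    "Machine learning also covers classification and regression.",
    "Databases store structured data.",
]

forward = pair_features(docs, "clustering", "machine learning")
backward = pair_features(docs, "machine learning", "clustering")
print(forward)
print(backward)
```

On this toy corpus the features are asymmetric: the pattern fires only for the (clustering, machine learning) ordering, and "machine learning" appears in every document that mentions "clustering" but not the other way round. A real system would feed such feature vectors, computed over labeled phrase pairs, into a trained classifier.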