Keyword extraction from a single document using centrality measures

  • Authors:
  • Girish Keshav Palshikar

  • Affiliations:
  • Tata Research Development and Design Centre, Pune, India

  • Venue:
  • PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to extract keywords. We represent the given document as an undirected graph, whose vertices are words in the document and the edges are labeled with a dissimilarity measure between two words, derived from the frequency of their co-occurrence in the document. We propose that central vertices in this graph are candidates as keywords. We model importance of a word in terms of its centrality in this graph. Using graph-theoretical notions of vertex centrality, we suggest several algorithms to extract keywords from the given document. We demonstrate the effectiveness of the proposed algorithms on real-life documents.