Keyword extraction from a single document using centrality measures

Authors:
Girish Keshav Palshikar
Affiliations:
Tata Research Development and Design Centre, Pune, India
Venue:
PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence
Year:
2007

Citing 2
Cited 3

Introduction to algorithms

Introduction to algorithms
KeyGraph: Automatic Indexing by Co-occurrence Graph based on Building Construction Metaphor

ADL '98 Proceedings of the Advances in Digital Libraries Conference

Task 5: Single document keyphrase extraction using sentence clustering and latent Dirichlet allocation

SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Combining summaries using unsupervised rank aggregation

CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
A tweet summarization method based on a keyword graph

Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication

Quantified Score

Hi-index	0.00

Visualization

Abstract

Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to extract keywords. We represent the given document as an undirected graph, whose vertices are words in the document and the edges are labeled with a dissimilarity measure between two words, derived from the frequency of their co-occurrence in the document. We propose that central vertices in this graph are candidates as keywords. We model importance of a word in terms of its centrality in this graph. Using graph-theoretical notions of vertex centrality, we suggest several algorithms to extract keywords from the given document. We demonstrate the effectiveness of the proposed algorithms on real-life documents.