Keyword Extraction Using Word Co-occurrence

Authors:
Christian Wartena; Rogier Brussee; Wout Slakhorst
Venue:
DEXA '10 Proceedings of the 2010 Workshops on Database and Expert Systems Applications
Year:
2010

Citing 0
Cited 6

Selecting keywords for content based recommendation
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Automatic tagging and geotagging in video collections and communities
Proceedings of the 1st ACM International Conference on Multimedia Retrieval

A semi-supervised approach for key-synset extraction to be used in word sense disambiguation
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology

Using Wikipedia concepts and frequency in language to extract key terms from support documents
Expert Systems with Applications: An International Journal

Exploiting user comments for audio-visual content indexing and retrieval
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Keyword extraction for blogs based on content richness
Journal of Information Science

Abstract

A common strategy for assigning keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for selecting a word as a keyword is its relevance to the text. The tf.idf score of a term is a widely used relevance measure. While it is easy to compute and gives quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study some alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these distributions with the document and the corpus distribution. We then evaluate the keyword extraction algorithms obtained by selecting different relevance measures. For two corpora of abstracts with manually assigned keywords, we compare the manually assigned keywords with the automatically extracted ones. The results show that using word co-occurrence information can improve precision and recall over tf.idf.
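
The contrast drawn in the abstract can be illustrated with a small sketch. The abstract does not spell out the paper's exact measures, so everything below is an assumption made for illustration: the toy corpus, the function names, the way the co-occurrence distribution is built (a count-weighted average of the term distributions of the documents containing the word), and the use of Jensen-Shannon divergence to compare it with the document and corpus distributions. It shows a plain tf.idf baseline next to one possible co-occurrence-based relevance score.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, for illustration only).
docs = [
    "keyword extraction selects relevant words from text".split(),
    "word co-occurrence reveals relations between words".split(),
    "relevant keywords improve retrieval of text documents".split(),
]

def tf_idf(doc, docs):
    """Baseline: relative term frequency times log inverse document frequency."""
    tf = Counter(doc)
    n = len(docs)
    return {t: (c / len(doc)) * math.log(n / sum(1 for d in docs if t in d))
            for t, c in tf.items()}

def term_distribution(tokens):
    """Relative frequency of each term in a token list."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}

def cooccurrence_distribution(word, docs):
    """Assumed definition: average of the term distributions of the documents
    containing `word`, weighted by how often `word` occurs in each of them."""
    weights = [d.count(word) for d in docs]
    total = sum(weights)
    dist = Counter()
    for d, w in zip(docs, weights):
        if w:
            for t, p in term_distribution(d).items():
                dist[t] += (w / total) * p
    return dict(dist)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions."""
    keys = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in keys}
    def kl(a):
        return sum(pa * math.log2(pa / m[t]) for t, pa in a.items() if pa > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def cooccurrence_relevance(doc, docs):
    """Assumed score: a word ranks higher when its co-occurrence distribution is
    close to the document distribution and far from the corpus distribution."""
    doc_dist = term_distribution(doc)
    corpus_dist = term_distribution([t for d in docs for t in d])
    scores = {}
    for w in set(doc):
        co = cooccurrence_distribution(w, docs)
        scores[w] = js_divergence(co, corpus_dist) - js_divergence(co, doc_dist)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    print(sorted(tf_idf(docs[0], docs).items(), key=lambda kv: kv[1], reverse=True)[:3])
    print(cooccurrence_relevance(docs[0], docs)[:3])
```

In this sketch a word is scored through the company it keeps rather than through its own counts alone, which is the kind of relational information that tf.idf ignores. The divergence and the weighting scheme could be swapped for other choices without changing the overall structure.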