Automatic subject metadata generation for scientific documents using Wikipedia and genetic algorithms

  • Authors:
  • Arash Joorabchi;Abdulhussain E. Mahdi

  • Affiliations:
  • Department of Electronic and Computer Engineering, University of Limerick, Ireland;Department of Electronic and Computer Engineering, University of Limerick, Ireland

  • Venue:
  • EKAW'12 Proceedings of the 18th international conference on Knowledge Engineering and Knowledge Management
  • Year:
  • 2012

Abstract

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents. However, scientific documents that are manually annotated with keyphrases are in the minority. This paper describes a machine learning-based automatic keyphrase annotation method for scientific documents, which utilizes Wikipedia as a thesaurus for candidate selection from documents' content and deploys genetic algorithms to learn a model for ranking and filtering the most probable keyphrases. Reported experimental results show that the performance of our method, evaluated in terms of inter-consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised methods.
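
The abstract names two stages: selecting candidate phrases and learning a ranking model with a genetic algorithm. As a rough illustration of the second stage only, the sketch below evolves a weight vector that scores candidates by a weighted sum of features and keeps the top-ranked ones. Everything in it is an assumption made for illustration: the toy documents, the three unnamed features, the overlap-based fitness (a simple stand-in for the inter-consistency evaluation the abstract mentions), and the selection, crossover, and mutation settings. It is not the authors' implementation.

```python
import random

# Hypothetical toy data: each candidate phrase maps to an illustrative
# feature vector; "gold" holds the human-assigned keyphrases.
DOCS = [
    {
        "candidates": {
            "genetic algorithm": [0.9, 0.8, 0.7],
            "wikipedia": [0.8, 0.9, 0.6],
            "result": [0.7, 0.1, 0.1],
            "paper": [0.9, 0.2, 0.0],
        },
        "gold": {"genetic algorithm", "wikipedia"},
    },
    {
        "candidates": {
            "keyphrase extraction": [0.6, 0.7, 0.9],
            "machine learning": [0.5, 0.6, 0.8],
            "method": [0.9, 0.3, 0.1],
            "section": [0.8, 0.1, 0.0],
        },
        "gold": {"keyphrase extraction", "machine learning"},
    },
]


def rank(candidates, weights, top_n):
    """Return the top_n phrases ordered by weighted feature sum."""
    def score(phrase):
        return sum(w * f for w, f in zip(weights, candidates[phrase]))
    return sorted(candidates, key=score, reverse=True)[:top_n]


def fitness(weights, docs, top_n=2):
    """Mean fraction of the top_n picks that match the gold keyphrases."""
    total = 0.0
    for doc in docs:
        picked = set(rank(doc["candidates"], weights, top_n))
        total += len(picked & doc["gold"]) / top_n
    return total / len(docs)


def evolve(docs, n_features=3, pop_size=20, generations=30, seed=0):
    """Evolve a weight vector with truncation selection, one-point
    crossover, and Gaussian mutation (all illustrative choices)."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-1, 1) for _ in range(n_features)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, docs), reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)        # mutate one weight
            child[i] += rng.gauss(0, 0.2)
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, docs))


if __name__ == "__main__":
    best = evolve(DOCS)
    print("weights:", best)
    print("fitness:", fitness(best, DOCS))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next. A real system would replace the toy feature vectors with features computed from the document and from Wikipedia, and would evaluate on held-out documents.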