Automatic subject metadata generation for scientific documents using Wikipedia and genetic algorithms

Authors:
Arash Joorabchi;Abdulhussain E. Mahdi
Affiliations:
Department of Electronic and Computer Engineering, University of Limerick, Ireland;Department of Electronic and Computer Engineering, University of Limerick, Ireland
Venue:
EKAW'12: Proceedings of the 18th International Conference on Knowledge Engineering and Knowledge Management
Year:
2012

Abstract

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents. However, scientific documents that are manually annotated with keyphrases are in the minority. This paper describes a machine-learning-based automatic keyphrase annotation method for scientific documents, which uses Wikipedia as a thesaurus for selecting candidate keyphrases from a document's content and deploys genetic algorithms to learn a model for ranking and filtering the most probable keyphrases. The reported experimental results show that our method, evaluated in terms of inter-indexer consistency with human annotators, performs on a par with human annotators and outperforms rival supervised methods.
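Using Wikipedia as a thesaurus for candidate selection can be pictured as matching a document's word n-grams against a controlled vocabulary of article titles. The following is a minimal sketch under stated assumptions: `WIKI_TITLES` is a hypothetical, hand-made stand-in for a real title index, and matching is plain lowercase exact n-gram lookup. It does not reproduce the paper's actual matching rules (for example, any handling of redirects or disambiguation pages).

```python
import re
from collections import Counter

# Hypothetical stand-in for a lowercased index of Wikipedia article titles;
# a full index would be built from a Wikipedia dump.
WIKI_TITLES = {
    "genetic algorithm",
    "keyphrase extraction",
    "wikipedia",
    "machine learning",
    "digital library",
}

def ngrams(tokens, max_n=3):
    """Yield every contiguous word n-gram of length 1..max_n, shortest first."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def select_candidates(text, titles=WIKI_TITLES, max_n=3):
    """Keep only n-grams that exactly match a title, counting their occurrences."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(g for g in ngrams(tokens, max_n) if g in titles)

doc = ("A genetic algorithm ranks candidates drawn from Wikipedia. "
       "Keyphrase extraction with a genetic algorithm.")
print(select_candidates(doc))
```

Because matching is exact, inflected forms such as "digital libraries" miss the title "digital library"; a fuller version would normalise or stem both the n-grams and the titles before lookup.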
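Learning a ranking model with a genetic algorithm can be sketched as evolving a weight vector over candidate features, where each weight vector is scored by how consistent its top-k keyphrases are with those chosen by a human annotator. Everything here is an illustrative assumption rather than the paper's configuration: the three features, the toy training document, the population size, crossover and mutation settings, and the choice of Rolling's inter-indexer consistency, 2C/(A+B), as the fitness function (C is the number of shared keyphrases, A and B the sizes of the two sets).

```python
import random

def consistency(auto, human):
    """Rolling's inter-indexer consistency, 2C / (A + B), between two keyphrase sets."""
    auto, human = set(auto), set(human)
    if not auto and not human:
        return 1.0
    return 2 * len(auto & human) / (len(auto) + len(human))

# Hypothetical toy training document: candidate -> feature vector
# (normalised frequency, earliness of first occurrence, phrase length / 3).
CANDIDATES = {
    "genetic algorithm":    (0.9, 0.8, 0.67),
    "keyphrase extraction": (0.7, 0.9, 0.67),
    "wikipedia":            (0.6, 0.5, 0.33),
    "experiment":           (0.8, 0.1, 0.33),
    "result":               (0.9, 0.05, 0.33),
}
# Hypothetical keyphrases assigned by a human annotator to the same document.
HUMAN = {"genetic algorithm", "keyphrase extraction", "wikipedia"}

def top_k(weights, candidates, k):
    """Rank candidates by the weighted sum of their features and keep the best k."""
    def score(c):
        return sum(w * x for w, x in zip(weights, candidates[c]))
    return sorted(candidates, key=score, reverse=True)[:k]

def fitness(weights, k=3):
    """Fitness of a weight vector = consistency of its top-k picks with HUMAN."""
    return consistency(top_k(weights, CANDIDATES, k), HUMAN)

def evolve(pop_size=20, generations=30, rate=0.3, sigma=0.2, seed=0):
    rng = random.Random(seed)
    n_features = len(next(iter(CANDIDATES.values())))
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation: each gene is perturbed with probability `rate`.
            child = [w + rng.gauss(0, sigma) if rng.random() < rate else w for w in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(top_k(best, CANDIDATES, 3), fitness(best))
```

Keeping the fitter half unchanged (elitism) means the best fitness never decreases between generations; on real data the fitness would be averaged over many training documents rather than computed on a single toy one.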