Domain-independent automatic keyphrase indexing with small training sets

Authors:
Olena Medelyan;Ian H. Witten
Affiliations:
Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton 3240, New Zealand;Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton 3240, New Zealand
Venue:
Journal of the American Society for Information Science and Technology
Year:
2008

Citing 0
Cited 12

A clustering-based semi-automated technique to build cultural ontologies
Journal of the American Society for Information Science and Technology

273. Task 5. Keyphrase extraction based on core word identification and word expansion
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation

BUAP: An unsupervised approach to automatic keyphrase extraction from scientific articles
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation

HUMB: Automatic key term extraction from scientific articles in GROBID
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation

A citation-based approach to automatic topical indexing of scientific literature
Journal of Information Science

The HIVE impact: contributing to consistency via automatic indexing
Proceedings of the 2012 iConference

Micropinion generation: an unsupervised approach to generating ultra-concise summaries of opinions
Proceedings of the 21st international conference on World Wide Web

Investigating keyphrase indexing with text denoising
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

Automatic subject metadata generation for scientific documents using wikipedia and genetic algorithms
EKAW'12 Proceedings of the 18th international conference on Knowledge Engineering and Knowledge Management

DIKEA: domain-independent keyphrase extraction algorithm
AI'12 Proceedings of the 25th Australasian joint conference on Advances in Artificial Intelligence

Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms
Journal of Information Science

Ontologies and terminologies: Continuum or dichotomy?
Applied Ontology - Ontologies and Terminologies: Continuum or Dichotomy?

Abstract

Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents. © 2008 Wiley Periodicals, Inc.