Creating a testbed for the evaluation of automatically generated back-of-the-book indexes

Authors:
Andras Csomai;Rada F. Mihalcea
Affiliations:
Computer Science Department, University of North Texas;Computer Science Department, University of North Texas
Venue:
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2006

Citing 6
Cited 0

Learning Algorithms for Keyphrase Extraction

Information Retrieval
Automatic Keyword Extraction Using Domain Knowledge

CICLing '01 Proceedings of the Second International Conference on Computational Linguistics and Intelligent Text Processing
Named Entity recognition without gazetteers

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
A language model approach to keyphrase extraction

MWE '03 Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18
Domain-specific keyphrase extraction

IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
Coherent keyphrase extraction via web mining

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

The automatic generation of back-of-the book indexes seems to be out of sight of the Information Retrieval and Natural Language Processing communities, although the increasingly large number of books available in electronic format, as well as recent advances in keyphrase extraction, should motivate an increased interest in this topic. In this paper, we describe the background relevant to the process of creating back-of-the-book indexes, namely (1) a short overview of the origin and structure of back-of-the-book indexes, and (2) the correspondence that can be established between techniques for automatic index construction and keyphrase extraction. Since the development of any automatic system requires in the first place an evaluation testbed, we describe our work in building a gold standard collection of books and indexes, and we present several metrics that can be used for the evaluation of automatically generated indexes against the gold standard. Finally, we investigate the properties of the gold standard index, such as index size, length of index entries, and upper bounds on coverage as indicated by the presence of index entries in the document.