Can back-of-the-book indexes be automatically created?

Authors:
Zhaohui Wu;Zhenhui Li;Prasenjit Mitra;C. Lee Giles
Affiliations:
Computer Science and Engineering, Pennsylvania State University, State College, PA, USA;Information Sciences and Technology, Pennsylvania State University, State College, USA;Information Sciences and Technology, Pennsylvania State University, State College, USA;Information Sciences and Technology, Pennsylvania State University, State College, USA
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

Creating back-of-the-book indexes remains one of the few manual tasks in publishing. Inspired by how human indexers create back-of-the-book indexes, we present a new domain-independent, corpus-free, and training-free approach to automating the task. Given a book, index terms are selected sequentially according to an indexability score derived from the structural information in the book and from a novel context-aware term informativeness measure that draws on web knowledge bases such as Wikipedia. Through extensive experiments on books from various domains, we show that our approach is more effective and practical than previous approaches based on keyword extraction and supervised learning.
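
The abstract gives no formula for the indexability score, so the following is only a minimal sketch of the general idea of picking index terms one at a time by a combined score. Everything in it is an assumption made for illustration: the `Candidate` fields, the weighted-sum combination in `indexability`, the `weight` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: str
    # Hypothetical signal: e.g. higher when the term appears in headings.
    structure_score: float
    # Hypothetical signal: e.g. derived from a knowledge base such as Wikipedia.
    informativeness: float


def indexability(c: Candidate, weight: float = 0.5) -> float:
    """Assumed combination: a weighted sum of the two signals."""
    return weight * c.structure_score + (1.0 - weight) * c.informativeness


def select_index_terms(candidates, k, weight=0.5):
    """Pick up to k terms one at a time, always taking the best remaining one."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: indexability(c, weight))
        selected.append(best.term)
        remaining.remove(best)
    return selected


# Made-up scores, for illustration only.
candidates = [
    Candidate("entropy", 0.9, 0.8),
    Candidate("the", 0.1, 0.0),
    Candidate("mutual information", 0.6, 0.9),
]
top_terms = select_index_terms(candidates, k=2)
```

With these made-up scores, `top_terms` holds the two candidates with the highest combined score. A real system would have to compute both signals from the book itself and from an external knowledge base, which this sketch does not attempt.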