Extracting useful information from the full text of fiction

Authors:
Sharon Givon; Maria Milosavljevic
Affiliations:
University of Edinburgh, Edinburgh; Macquarie University, Sydney, NSW, Australia
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007

Abstract

In this paper, we describe experiments in large-scale Information Extraction (IE) on book texts. We investigate how well IE techniques scale to full-length books, and how useful they are for extracting information from fiction. In particular, we evaluate a variety of Named Entity Recognition (NER) techniques for identifying the central characters in works of fiction. First, we describe the creation of a gold standard for evaluation, which contains ordered lists of characters for a corpus of classic book texts from Project Gutenberg. Second, we describe several approaches to the task of character identification; our best model achieves an average coverage score of 78.4% across all central characters. Finally, we propose a number of approaches for future work.
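The evaluation idea in the abstract, ranking candidate characters and scoring them against a gold-standard list, can be illustrated with a minimal sketch. This is not the authors' method. The capitalized-token heuristic is a crude stand-in for a real NER tagger, and the `coverage` definition (the fraction of gold-standard characters found among the predictions) is an assumption, since the abstract does not define the metric. The function names, the `SENTENCE_STARTERS` list, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical filter: capitalized words that often start sentences
# and would otherwise be mistaken for names.
SENTENCE_STARTERS = {"The", "An", "He", "She", "It", "They", "But", "And",
                     "We", "You", "In", "On", "At"}


def rank_candidate_characters(text, top_k=10):
    """Rank capitalized tokens by frequency (a crude stand-in for NER)."""
    tokens = re.findall(r"\b[A-Z][a-z]+\b", text)
    counts = Counter(t for t in tokens if t not in SENTENCE_STARTERS)
    return [name for name, _ in counts.most_common(top_k)]


def coverage(predicted, gold):
    """Assumed metric: share of gold-standard characters that were predicted."""
    if not gold:
        return 0.0
    predicted_set = set(predicted)
    return sum(1 for name in gold if name in predicted_set) / len(gold)


# Made-up sample text, not taken from the paper's corpus.
sample = ("Elizabeth walked with Darcy. Darcy spoke to Elizabeth. "
          "Jane smiled at Elizabeth, and Darcy bowed to Jane.")
ranked = rank_candidate_characters(sample, top_k=3)
score = coverage(ranked, ["Elizabeth", "Darcy", "Bingley"])
print(ranked, round(score, 3))
```

A real system would need to merge aliases (for example, a first name and a title plus surname referring to the same character) before counting, which this sketch does not attempt.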