Extracting useful information from the full text of fiction

  • Authors:
  • Sharon Givon;Maria Milosavljevic

  • Affiliations:
  • University of Edinburgh, Edinburgh;Macquarie University, Sydney, NSW, Australia

  • Venue:
  • Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we describe some experiments in large-scale Information Extraction (IE) focusing on book texts. We investigate the scalability of IE techniques to full-sized books, and the utility of IE techniques in extracting useful information from fiction. In particular, we evaluate a variety of Named Entity Recognition (NER) techniques in identifying the central characters in works of fiction. First, we describe the creation of a gold standard for evaluation, which contains ordered lists of characters for a corpus of classic book texts in Project Gutenberg. Second, we describe several approaches to the task of character identification, where our best model achieves an average coverage score of 78.4% across all central characters. Finally, we propose a number of approaches for future work.