Event-centric search and exploration in document collections

  • Authors: Jannik Strötgen; Michael Gertz
  • Affiliations: Heidelberg University, Heidelberg, Germany (both authors)
  • Venue: Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries
  • Year: 2012

Abstract

Textual data, ranging from corpora of digitized historic documents to large collections of news feeds, provide a rich source of temporal and geographic information. Such information has recently attracted considerable interest in support of search and exploration tasks, e.g., organizing news along a timeline or placing the origin of documents on a map. In these settings, however, the temporal and geographic information embedded in documents is often considered in isolation. We claim that combining such information into (chronologically ordered) event-like features makes interesting and meaningful search and exploration tasks possible. In this paper, we present a framework for the extraction, exploration, and visualization of event information in document collections. To this end, temporal and geographic expressions are identified in documents and combined, thus enriching a document collection with a set of normalized events. Traditional search queries can then be extended with conditions on the events relevant to the search subject. Most important for our event-centric approach is that a search result consists of a sequence of events relevant to the search terms, not just a document hit list. Such events can originate from different documents and can be explored further; in particular, the events relevant to a search query can be ordered chronologically. We demonstrate the utility of our framework through different (multilingual) search and exploration scenarios using a Wikipedia corpus.
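
To make the event-centric result model concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that an event is a normalized date paired with a normalized place, together with the sentence it was found in; the `Event` class, the `search_events` function, and the toy data are all hypothetical names introduced here for illustration. The extraction step (temporal and geographic tagging) is assumed to have already happened.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: one temporal and one geographic expression."""
    when: date    # normalized temporal expression
    where: str    # normalized place name
    doc_id: str   # document the event was extracted from
    text: str     # sentence containing both expressions (an assumption)


def search_events(events, term, start=None, end=None):
    """Return events whose sentence mentions `term`, optionally restricted
    to a time window, ordered chronologically."""
    term = term.lower()
    hits = [
        e for e in events
        if term in e.text.lower()
        and (start is None or e.when >= start)
        and (end is None or e.when <= end)
    ]
    return sorted(hits, key=lambda e: e.when)


# Hand-made toy data standing in for the output of an extraction pipeline.
events = [
    Event(date(1969, 7, 20), "Moon", "doc2",
          "Apollo 11 landed on the Moon on July 20, 1969."),
    Event(date(1961, 4, 12), "Baikonur", "doc1",
          "Gagarin launched from Baikonur on April 12, 1961."),
    Event(date(1969, 7, 16), "Florida", "doc3",
          "Apollo 11 lifted off from Florida on July 16, 1969."),
]

# The result is a chronological sequence of events drawn from different
# documents, rather than a ranked list of documents.
timeline = search_events(events, "apollo")
```

The point of the sketch is the shape of the result: events from `doc3` and `doc2` are returned together and sorted by date, regardless of which document would rank higher in a conventional hit list.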