A weakly-supervised detection of entity central documents in a stream

  • Authors:
  • Ludovic Bonnefoy; Vincent Bouvier; Patrice Bellot

  • Affiliations:
  • University of Avignon CERI-LIA / iSmart, Avignon, France; Aix-Marseille University / LSIS CNRS / Kware, Marseille, France; Aix-Marseille University / LSIS CNRS, Marseille, France

  • Venue:
  • Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2013

Abstract

Filtering a time-ordered corpus for documents that are highly relevant to an entity is a task that has received increasing attention in recent years. One application is to reduce the delay between the moment a piece of information about an entity is first observed and the moment the entity's entry in a knowledge base is updated. Current state-of-the-art approaches are highly supervised and require training examples for each monitored entity. We propose an approach that does not require new training data when processing a new entity. To capture the intrinsic characteristics of highly relevant documents, our approach relies on three types of features: document-centric features, entity-profile-related features, and time features. Evaluated within the framework of the "Knowledge Base Acceleration" track at TREC 2012, it outperforms current state-of-the-art approaches.
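
The abstract names three feature families but does not list the individual features, so the following is only a minimal sketch of what such an entity-independent feature vector might look like. Every name in it (`Document`, `extract_features`, `profile_terms`, `mentions_per_hour`) and every concrete feature (mention count, profile term overlap, a 24-hour burst ratio) is an assumption made for illustration, not the authors' feature set. The point it tries to show is that none of the values encode the identity of the entity, which is what would allow a single classifier to be reused for a new entity without new training data.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    text: str
    timestamp_hour: int  # hypothetical: index of the hour the document appeared in the stream


def extract_features(
    doc: Document,
    entity_name: str,
    profile_terms: List[str],
    mentions_per_hour: Dict[int, int],
) -> Dict[str, float]:
    """Build an illustrative, entity-independent feature vector (all features are assumptions)."""
    lowered = doc.text.lower()
    tokens = re.findall(r"[a-z0-9]+", lowered)
    name = entity_name.lower()

    # Document-centric features: how prominently the entity appears in the document.
    mention_count = lowered.count(name)
    first_pos = lowered.find(name)
    first_mention_position = first_pos / max(len(lowered), 1) if first_pos >= 0 else 1.0

    # Entity-profile feature: share of profile terms (e.g. from a knowledge base entry)
    # that also occur in the document.
    profile = {term.lower() for term in profile_terms}
    profile_overlap = len(profile & set(tokens)) / max(len(profile), 1)

    # Time feature: mentions in the current hour relative to the mean of the previous 24 hours.
    current = mentions_per_hour.get(doc.timestamp_hour, 0)
    previous = [
        mentions_per_hour.get(hour, 0)
        for hour in range(doc.timestamp_hour - 24, doc.timestamp_hour)
    ]
    baseline = sum(previous) / len(previous)
    burstiness = current / (baseline + 1.0)

    return {
        "mention_count": float(mention_count),
        "mention_density": mention_count / max(len(tokens), 1),
        "first_mention_position": first_mention_position,
        "profile_overlap": profile_overlap,
        "burstiness": burstiness,
    }


if __name__ == "__main__":
    example = Document(
        text="Alan Turing was a mathematician. Alan Turing worked at Bletchley Park.",
        timestamp_hour=30,
    )
    features = extract_features(
        example,
        entity_name="Alan Turing",
        profile_terms=["mathematician", "cryptanalysis", "bletchley", "computer"],
        mentions_per_hour={29: 2, 30: 10},
    )
    for feature_name, value in features.items():
        print(f"{feature_name}: {value:.3f}")
```

Under these assumptions, vectors computed for many entities could be pooled to train one classifier, since the features describe the relation between a document and an entity rather than the entity itself.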