Combining lexical and formatting cues for named entity acquisition from the web

  • Authors: Christian Jacquemin; Caroline Bush

  • Affiliations: CNRS-LIMSI, Orsay Cedex, France; CNRS-LIMSI, Orsay Cedex, France

  • Venue: EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13

  • Year: 2000

Abstract

Because named entities (NEs) are constantly renewed, fresh ones must be acquired from recent text sources. We present a tool for acquiring and typing NEs from the Web that combines a harvester with three parallel shallow parsers, each dedicated to a specific structure (lists, enumerations, and anchors). The parsers combine lexical cues, such as discourse markers, with formatting instructions (HTML tags) to analyze enumerations and their associated initializers.
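
To make the idea of pairing a lexical cue with a formatting cue concrete, here is a minimal sketch in Python. It is not the authors' tool: the marker lexicon `MARKERS`, the class `ListHarvester`, and the function `harvest` are hypothetical names introduced for illustration, and the heuristic (keep an HTML list only when the text introducing it contains a discourse marker) is an assumed simplification of the list parser described above.

```python
from html.parser import HTMLParser

# Hypothetical discourse markers; the paper's actual lexicon is not given here.
MARKERS = ("such as", "following", "including")


class ListHarvester(HTMLParser):
    """Collect <li> items whose introducing text contains a discourse marker."""

    def __init__(self):
        super().__init__()
        self.preceding = ""   # text seen in the current block before a list
        self.initializer = "" # text that introduces the current list
        self.depth = 0        # nesting depth of <ul>/<ol>
        self.in_item = False
        self.item = []        # text fragments of the current <li>
        self.items = []       # items of the current top-level list
        self.results = []     # (initializer, [items]) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            if self.depth == 0:
                # Formatting cue: a list opens; remember what introduced it.
                self.initializer = " ".join(self.preceding.split())
                self.items = []
            self.depth += 1
        elif tag == "li" and self.depth > 0:
            self._close_item()
            self.in_item = True
        elif tag == "p" and self.depth == 0:
            # A new paragraph starts a fresh candidate initializer.
            self.preceding = ""

    def handle_endtag(self, tag):
        if tag == "li":
            self._close_item()
        elif tag in ("ul", "ol") and self.depth > 0:
            self._close_item()
            self.depth -= 1
            if self.depth == 0:
                # Lexical cue: keep the list only if a marker introduces it.
                text = self.initializer.lower()
                if self.items and any(m in text for m in MARKERS):
                    self.results.append((self.initializer, self.items))
                self.preceding = ""

    def handle_data(self, data):
        if self.in_item:
            self.item.append(data)
        elif self.depth == 0:
            self.preceding += data

    def _close_item(self):
        if self.in_item:
            text = " ".join("".join(self.item).split())
            if text:
                self.items.append(text)
        self.in_item = False
        self.item = []


def harvest(html):
    """Return (initializer, candidate entities) pairs found in an HTML string."""
    parser = ListHarvester()
    parser.feed(html)
    return parser.results
```

In this sketch the initializer text is kept alongside the items because it is the natural place to look for a type label (for example, "Universities such as:" suggests the items are universities). Nested lists are flattened into their top-level list, and a navigation list with no marker in its introducing text is discarded.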