Automated Information Extraction out of Classified Advertisements

  • Authors:
  • Ramón Aragüés Peleato, Jean-Cédric Chappelier, Martin Rajman

  • Venue:
  • NLDB '00 Proceedings of the 5th International Conference on Applications of Natural Language to Information Systems-Revised Papers
  • Year:
  • 2000

Abstract

This paper presents an information extraction system that processes the textual content of classified newspaper advertisements in French. The system uses both lexical information (words, regular expressions) and contextual information to structure the content of each ad according to predefined thematic forms. The paper first describes the enhanced tagging mechanism used for extraction, then provides a quantitative evaluation of the system: compared against human annotators, it achieved 99.0% precision and 99.8% recall for domain identification and 73% accuracy for information extraction.
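The abstract describes two steps: identifying an ad's domain, then combining lexical evidence (words, regular expressions) with context to fill a predefined thematic form. The paper's actual tagger is not reproduced here. What follows is a minimal Python sketch under stated assumptions. The two domains, the lexicons, the slot names, and the regular expressions are all hypothetical. The sketch picks the domain by lexicon overlap, then lets the context around a number (the unit or keyword that follows it) decide which slot of that domain's form the number fills.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical domain lexicons (an assumption for illustration, not the
# authors' resources): each domain maps to words that suggest it.
DOMAIN_LEXICON: Dict[str, Set[str]] = {
    "immobilier": {"appartement", "studio", "maison", "pièces", "loyer", "m2"},
    "automobile": {"voiture", "km", "diesel", "essence", "break", "berline"},
}

# Price in French francs: the number is a price only if a franc marker follows.
PRICE_FRF = r"(\d+)\s*(?:FF|F|frs)\b"

# Hypothetical thematic forms: one regex per slot. The unit that follows a
# bare number is the context that decides which slot the number belongs to.
FORMS: Dict[str, Dict[str, "re.Pattern[str]"]] = {
    "immobilier": {
        "rooms": re.compile(r"(\d+)\s*(?:pièces|p\.)", re.IGNORECASE),
        "surface_m2": re.compile(r"(\d+)\s*m2", re.IGNORECASE),
        "price_frf": re.compile(PRICE_FRF),
    },
    "automobile": {
        "mileage_km": re.compile(r"(\d+)\s*km\b", re.IGNORECASE),
        "price_frf": re.compile(PRICE_FRF),
    },
}


@dataclass
class ExtractedAd:
    domain: Optional[str]
    slots: Dict[str, int] = field(default_factory=dict)


def identify_domain(text: str) -> Optional[str]:
    """Lexical step: pick the domain whose lexicon matches the most tokens."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {d: sum(t in lex for t in tokens) for d, lex in DOMAIN_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # no evidence: leave unclassified


def extract(text: str) -> ExtractedAd:
    """Fill the thematic form of the identified domain, one slot per regex."""
    ad = ExtractedAd(identify_domain(text))
    if ad.domain is None:
        return ad
    for slot, pattern in FORMS[ad.domain].items():
        match = pattern.search(text)  # first (leftmost) match only
        if match:
            ad.slots[slot] = int(match.group(1))
    return ad


if __name__ == "__main__":
    ad = extract("Paris 15e, appartement 3 pièces, 65 m2, loyer 4500 F/mois.")
    print(ad.domain, ad.slots)
    # immobilier {'rooms': 3, 'surface_m2': 65, 'price_frf': 4500}
```

Keying each regex on the unit after the number is the simplest form of context: the numbers 3, 65 and 4500 are lexically identical tokens, and only their neighbours say whether each is a room count, a surface, or a price. A real system would need richer context (abbreviations, ranges, missing units), which is the kind of case an enhanced tagger is meant to handle.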