Automated Information Extraction out of Classified Advertisements

Authors:
Ramón Aragüés Peleato; Jean-Cédric Chappelier; Martin Rajman
Venue:
NLDB '00 Proceedings of the 5th International Conference on Applications of Natural Language to Information Systems-Revised Papers
Year:
2000

Abstract

This paper presents an information extraction system that processes the textual content of classified newspaper advertisements in French. The system uses both lexical (words, regular expressions) and contextual information to structure the content of the ads on the basis of predefined thematic forms. The paper first describes the enhanced tagging mechanism used for extraction. A quantitative evaluation of the system is then provided: scores of 99.0% precision/99.8% recall for domain identification and 73% accuracy for information extraction were achieved, on the basis of a comparison with human annotators.
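As a rough illustration of the lexical side of such an approach, the sketch below fills a small thematic form from a French housing ad using regular expressions. The slot names, the patterns, and the sample ad are all hypothetical and are not taken from the paper; the contextual information the system also relies on is not modeled here.

```python
import re

# Hypothetical slot patterns for a housing form (assumptions, not the paper's rules).
SLOT_PATTERNS = {
    "rooms": re.compile(r"(\d+(?:[.,]\d+)?)\s*pi[eè]ces?", re.IGNORECASE),
    "surface": re.compile(r"(\d+)\s*m2", re.IGNORECASE),
    "price": re.compile(r"Fr\.?\s*(\d+)"),
}


def fill_form(ad_text):
    """Return a dict mapping each slot to the first value matched in the ad."""
    form = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(ad_text)
        if match:
            form[slot] = match.group(1)
    return form


# Invented sample ad, for illustration only.
ad = "A louer, Lausanne, 3 pièces, 75 m2, Fr. 1200 par mois"
form = fill_form(ad)
print(form)
```

A slot whose pattern finds no match is simply left out of the form, so a downstream step could treat missing keys as unfilled fields.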