Sequence models for automatic highlighting and surface information extraction

Authors:
Massih-Reza Amini;Hugo Zaragoza;Patrick Gallinari
Affiliations:
LIP6, University of Paris 6, Paris Cedex 05, France;LIP6, University of Paris 6, Paris Cedex 05, France;LIP6, University of Paris 6, Paris Cedex 05, France
Venue:
IRSG'99 Proceedings of the 21st Annual BCS-IRSG conference on Information Retrieval Research
Year:
1999


Abstract

With the increase in textual information available electronically, we are witnessing a great diversification of the demands on Information Retrieval (IR) and Information Extraction (IE) systems. In this paper we apply Machine Learning techniques of sequence analysis to the tasks of highlighting and labeling text with respect to an information extraction task. Specifically, dynamic probability models are used. Like IR systems, they use little semantics, are fully trainable and do not require any knowledge representation of the domain. Unlike IR approaches, documents are considered as a dynamic sequence of words. Furthermore, additional word information is naturally included in the representation. Models are evaluated on a sub-task of the MUC6 Scenario Template corpus. When morpho-syntactic word information is introduced into the representation, an increase in performance is observed.
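The idea of treating a document as a sequence of words and labeling each word for highlighting can be illustrated with a small sketch. The abstract does not specify the models used, so everything below is an assumption made for illustration: a two-state hidden Markov model (states "relevant" and "irrelevant"), hand-set probabilities, a toy vocabulary, and Viterbi decoding to find the most probable label sequence. In a trainable system these parameters would be estimated from labeled text rather than written by hand.

```python
import math

# Hypothetical label set: each word is either highlighted or not.
STATES = ("relevant", "irrelevant")

# Hypothetical, hand-set parameters (not taken from the paper).
START = {"relevant": 0.2, "irrelevant": 0.8}
TRANS = {
    "relevant": {"relevant": 0.7, "irrelevant": 0.3},
    "irrelevant": {"relevant": 0.1, "irrelevant": 0.9},
}
EMIT = {
    "relevant": {"appointed": 0.4, "president": 0.4, "the": 0.1, "weather": 0.1},
    "irrelevant": {"appointed": 0.05, "president": 0.05, "the": 0.5, "weather": 0.4},
}
UNSEEN = 1e-6  # small probability assigned to out-of-vocabulary words


def emit_logp(state, word):
    """Log-probability that `state` emits `word`."""
    return math.log(EMIT[state].get(word, UNSEEN))


def viterbi(words):
    """Return the most probable state sequence for a list of words."""
    if not words:
        return []
    # scores[t][s]: best log-probability of any path ending in state s at step t
    scores = [{s: math.log(START[s]) + emit_logp(s, words[0]) for s in STATES}]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for word in words[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + emit_logp(s, word)
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    sentence = ["president", "appointed"]
    for word, label in zip(sentence, viterbi(sentence)):
        print(word, label)
```

Because the labels depend on transitions as well as on individual words, a word's label can change with its neighbours, which is what distinguishes this view from a bag-of-words representation. Additional word information, such as a morpho-syntactic tag, could be folded into the emission model by emitting (word, tag) pairs; that extension is not shown here.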