Can We Make Information Extraction More Adaptive?

Authors:
Yorick Wilks; Roberta Catizone
Venue:
Information Extraction: Towards Scalable, Adaptable Systems
Year:
1999

Abstract

It seems widely agreed that IE (Information Extraction) is now a tested language technology whose precision and recall values put it in roughly the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The problem for wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains. In this paper we discuss some methods that have been tried to ease this problem and to do better than the benchmark figure of about one month, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is the degree to which a user can be assumed to know what is wanted and to have pre-existing templates ready to hand, as opposed to a user who has only a vague idea of what is needed from a corpus. We shall discuss attempts to derive templates directly from corpora and to derive knowledge structures and lexicons directly from corpora, including the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora. An important issue is how far established Information Retrieval methods for tuning a system to a user's needs, through feedback at an interface, can be transferred to IE.
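The abstract asks how far established IR methods for tuning to a user's needs through interface feedback carry over to IE, but it does not name a specific method. Rocchio relevance feedback is one standard IR technique of this kind, so it is swapped in here as a representative example; it is not presented as the authors' approach, and the sketch stays entirely on the IR side of the question. It assumes bag-of-words term-count vectors and the commonly suggested weights alpha = 1.0, beta = 0.75, gamma = 0.15. The helper names (`to_vector`, `centroid`, `rocchio`, `cosine`) and the toy texts are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical representation: a sparse term -> weight mapping.
Vector = dict[str, float]


def to_vector(text: str) -> Vector:
    """Bag-of-words term counts for whitespace-split, lower-cased text."""
    return {term: float(n) for term, n in Counter(text.lower().split()).items()}


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of sparse vectors; empty dict for an empty list."""
    if not vectors:
        return {}
    total: Vector = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Rocchio update: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    Terms whose weight ends up zero or negative are dropped (a common convention).
    """
    rel, non = centroid(relevant), centroid(nonrelevant)
    updated = {
        t: alpha * query.get(t, 0.0) + beta * rel.get(t, 0.0) - gamma * non.get(t, 0.0)
        for t in set(query) | set(rel) | set(non)
    }
    return {t: w for t, w in updated.items() if w > 0}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


# Usage with toy texts: the user marks two documents relevant and one not.
query = to_vector("company takeover")
relevant = [to_vector("acme announced a takeover bid"),
            to_vector("takeover bid for the company")]
nonrelevant = [to_vector("the club took the league lead")]
new_query = rocchio(query, relevant, nonrelevant)

# 'bid' is added from the relevant documents; 'club' is dropped (negative weight).
unseen = to_vector("hostile takeover bid launched")
print(cosine(query, unseen), cosine(new_query, unseen))
```

Here beta controls how far the query moves toward documents the user accepts and gamma how far it moves away from rejected ones; keeping gamma well below beta is a common choice, since positive feedback is generally found to be the more useful signal.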