It seems widely agreed that Information Extraction (IE) is now a tested language technology, one whose combined precision and recall figures put it in about the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The obstacle to wider deployment of the technology is adaptability: the ability to customize an IE system rapidly to new domains. In this paper we discuss some methods that have been tried to ease this problem and to achieve something faster than the benchmark figure of one month, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is between a user who can be assumed to know what is wanted and to have pre-existing templates ready to hand, and a user who has only a vague idea of what is needed from a corpus. We shall discuss attempts to derive templates directly from corpora and to derive knowledge structures and lexicons directly from corpora, including the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora. An important issue is how far the established Information Retrieval methods of tuning to a user's needs, through feedback at an interface, can be transferred to IE.