Customizing an information extraction system to a new domain

Authors:
Mihai Surdeanu;David McClosky;Mason R. Smith;Andrey Gusev;Christopher D. Manning
Affiliations:
Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA
Venue:
RELMS '11 Proceedings of the ACL 2011 Workshop on Relational Models of Semantics
Year:
2011

Citing 16
Cited 2

Automatic labeling of semantic roles

Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Named Entity recognition without gazetteers

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Logic form transformation of WordNet and its applicability to question answering

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Accurate unlexicalized parsing

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Incorporating non-local information into information extraction systems by Gibbs sampling

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Postnominal prepositional phrase attachment in proteomics

BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Overview of BioNLP'09 shared task on event extraction

BioNLP '09 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task
Design challenges and misconceptions in named entity recognition

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
NER systems that suit user's preferences: adjusting the recall-precision trade-off for entity extraction

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Distant supervision for relation extraction without labeled data

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Simple coreference resolution with rich syntactic and semantic features

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Coupled semi-supervised learning for information extraction

Proceedings of the third ACM international conference on Web search and data mining
Joint inference for knowledge extraction from biomedical literature

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Improving mention detection robustness to noisy input

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Collective cross-document relation extraction without labelled data

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Recall-oriented learning of named entities in Arabic Wikipedia

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Multi-instance multi-label learning for relation extraction

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

We introduce several ideas that improve the performance of supervised information extraction systems with a pipeline architecture, when they are customized for new domains. We show that: (a) a combination of a sequence tagger with a rule-based approach for entity mention extraction yields better performance for both entity and relation mention extraction; (b) improving the identification of syntactic heads of entity mentions helps relation extraction; and (c) a deterministic inference engine captures some of the joint domain structure, even when introduced as a postprocessing step to a pipeline system. All in all, our contributions yield a 20% relative increase in F1 score in a domain significantly different from the domains used during the development of our information extraction system.