Customizing an information extraction system to a new domain

  • Authors:
  • Mihai Surdeanu; David McClosky; Mason R. Smith; Andrey Gusev; Christopher D. Manning

  • Affiliations:
  • Stanford University, Stanford, CA; Stanford University, Stanford, CA; Stanford University, Stanford, CA; Stanford University, Stanford, CA; Stanford University, Stanford, CA

  • Venue:
  • RELMS '11 Proceedings of the ACL 2011 Workshop on Relational Models of Semantics
  • Year:
  • 2011

Abstract

We introduce several ideas that improve the performance of supervised information extraction systems with a pipeline architecture when they are customized for new domains. We show that: (a) combining a sequence tagger with a rule-based approach for entity mention extraction yields better performance for both entity and relation mention extraction; (b) improving the identification of syntactic heads of entity mentions helps relation extraction; and (c) a deterministic inference engine captures some of the joint domain structure, even when introduced as a postprocessing step to a pipeline system. Overall, these contributions yield a 20% relative increase in F1 score in a domain significantly different from the domains used during the development of our information extraction system.
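
As a reading aid for the "20% relative increase" figure, the minimal sketch below shows how a relative gain differs from an absolute one. The function name `relative_increase` and the two F1 values are hypothetical and chosen only for illustration; they are not the scores reported in the paper, which the abstract does not state.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical F1 scores, for illustration only (not taken from the paper).
baseline_f1 = 0.50
improved_f1 = 0.60

absolute_gain = improved_f1 - baseline_f1                      # 10 F1 points
relative_gain = relative_increase(baseline_f1, improved_f1)    # 20% of the baseline

print(f"absolute gain: {absolute_gain * 100:.0f} points")
print(f"relative gain: {relative_gain:.0%}")
```

With these assumed numbers, a 10-point absolute gain corresponds to a 20% relative gain; the same relative figure would mean a smaller absolute gain if the baseline were lower.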