An experimental study of the impact of information extraction accuracy on semantic search performance

  • Authors:
  • Jennifer Chu-Carroll;John Prager

  • Affiliations:
  • IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
  • Year:
  • 2007

Abstract

Researchers have shown that various natural language processing techniques can be used in document analysis to impact search performance. For the most part, they examined how an analysis system with certain performance characteristics can be leveraged to improve document and/or passage search results. We have previously shown that semantic queries which utilize named entity and relation information extracted from the corpus can lead to significant improvement in search performance. In this paper, we extend our previous efforts and examine how search performance degrades in the face of imperfect named entity and relation extraction. Our study was carried out by developing gold standard annotated corpora and applying different error models to the gold standard annotations to simulate errors made by automatic recognizers. We identify automatic recognizer characteristics that make them more amenable to our search tasks, show that recognizer recall has a more significant impact on semantic search performance than recognizer precision, and demonstrate that significant improvement in both MAP and Exact Precision scores can be achieved by adopting automatic named entity and relation recognizers with near state-of-the-art performance.
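
The idea of degrading gold standard annotations to mimic an automatic recognizer can be illustrated with a small sketch. The abstract does not specify the error models used in the study, so the model below is an assumption chosen for simplicity: recall errors are simulated by randomly dropping gold annotations, and precision errors by adding spurious annotations drawn from a pool of incorrect candidates. The function names, the tuple layout of an annotation, and the candidate pool are all hypothetical.

```python
import random


def simulate_recognizer(gold, candidates, recall, precision, seed=0):
    """Return a noisy annotation set with roughly the requested recall and precision.

    gold:       set of correct annotations (hypothetical (doc_id, offset, type) tuples)
    candidates: pool of annotations from which false positives are drawn
    This is an assumed, simplified error model, not the one used in the paper.
    """
    rng = random.Random(seed)
    # Recall errors: keep each gold annotation with probability `recall`.
    kept = {a for a in sorted(gold) if rng.random() < recall}
    # Precision errors: add false positives so that
    # len(kept) / (len(kept) + n_fp) is approximately `precision`.
    n_fp = round(len(kept) * (1 - precision) / precision) if precision > 0 else 0
    pool = sorted(set(candidates) - set(gold))
    spurious = set(rng.sample(pool, min(n_fp, len(pool))))
    return kept | spurious


def precision_recall(predicted, gold):
    """Measure the precision and recall of `predicted` against `gold`."""
    true_positives = len(predicted & gold)
    p = true_positives / len(predicted) if predicted else 1.0
    r = true_positives / len(gold) if gold else 1.0
    return p, r


# Toy data: 10 correct PERSON annotations and 20 incorrect candidates.
gold = {("d1", i, "PERSON") for i in range(10)}
candidates = {("d1", i, "PERSON") for i in range(100, 120)}

noisy = simulate_recognizer(gold, candidates, recall=1.0, precision=0.5)
print(precision_recall(noisy, gold))
```

Indexing the corpus with `noisy` in place of `gold`, then re-running the same semantic queries at several recall and precision settings, would give one way to compare how each kind of error affects a ranking metric such as MAP.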