Cost-effective web search in bootstrapping for named entity recognition

  • Authors:
  • Hideki Kawai; Hironori Mizuguchi; Masaaki Tsuchida

  • Affiliations:
  • NEC C&C Innovation Research Laboratories, Ikoma, Nara, Japan; NEC Service Platforms Research Laboratories, Ikoma, Nara, Japan; NEC Service Platforms Research Laboratories, Ikoma, Nara, Japan

  • Venue:
  • DASFAA '08: Proceedings of the 13th International Conference on Database Systems for Advanced Applications
  • Year:
  • 2008


Abstract

In this paper, we propose a cost-effective search strategy framework for extracting keywords that belong to the same semantic class from the Web. Constructing a dictionary with the bootstrapping technique is one promising approach to harnessing knowledge scattered across the Web. Open web application programming interfaces (APIs) are powerful tools for this knowledge-gathering process. However, the cost of API calls must be taken into account: too many queries can overload the search engines, and the search engines also limit the number of API calls allowed. Our goal is to optimize a search strategy so that it collects as many new words as possible with the fewest API calls. Our results show that the optimized search strategy can extract 64,642 words in five different domains with a precision of 0.94 using only 1,000 search API calls.
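To make the cost trade-off concrete, here is a minimal sketch of bootstrapping under a fixed API-call budget. It is not the authors' optimized strategy: it uses a plain first-in, first-out query order as a baseline, and `toy_search` is a hypothetical stand-in for a web search API that returns candidate words co-occurring with the query. Real search results are noisy, so a practical system would also need the filtering that the reported precision implies; the sketch omits it.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple


def bootstrap(
    seeds: Iterable[str],
    search: Callable[[str], Iterable[str]],
    budget: int,
) -> Tuple[Set[str], int]:
    """Expand a seed set of same-class words, spending at most `budget` calls.

    `search` is a hypothetical stand-in for one web search API call.
    Query order is FIFO (a baseline, not an optimized strategy).
    """
    known: Set[str] = set(seeds)
    queue = deque(known)
    calls = 0
    while queue and calls < budget:
        query = queue.popleft()
        calls += 1  # each query consumes one API call from the budget
        for word in search(query):
            if word not in known:  # only genuinely new words extend the dictionary
                known.add(word)
                queue.append(word)  # new words become future queries
    return known, calls


# Hypothetical toy "web": maps a query word to co-occurring candidates.
TOY_INDEX = {
    "tokyo": ["osaka", "kyoto"],
    "osaka": ["kobe", "tokyo"],
    "kyoto": ["nara"],
    "kobe": [],
    "nara": ["kyoto", "nagoya"],
    "nagoya": [],
}


def toy_search(query: str) -> list:
    return TOY_INDEX.get(query, [])


words, calls = bootstrap(["tokyo"], toy_search, budget=3)
new_words = len(words) - 1  # exclude the single seed
print(sorted(words), calls, new_words / calls)
```

With a budget of three calls, this baseline finds four new words. A cost-effective strategy in the sense of the abstract would instead reorder or combine queries to raise the number of new words gained per call.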