Testbed for information extraction from deep web

  • Authors:
  • Yasuhiro Yamada; Nick Craswell; Tetsuya Nakatoh; Sachio Hirokawa

  • Affiliations:
  • Kyushu University, Fukuoka, Japan; CSIRO Mathematical and Information Sciences, Canberra, Australia; Kyushu University, Fukuoka, Japan; Kyushu University, Fukuoka, Japan

  • Venue:
  • Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
  • Year:
  • 2004

Abstract

Search results generated by searchable databases are served dynamically and are far larger in volume than the static documents on the Web. These results pages have been referred to as the Deep Web. To integrate results from different searchable databases, we need to extract the target data from their results pages. We propose a testbed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms, so these databases have a good variety. We selected 51 databases that include URLs in a results page and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.