Towards Deeper Understanding of the Search Interfaces of the Deep Web

  • Authors: Hai He; Weiyi Meng; Yiyao Lu; Clement Yu; Zonghuan Wu

  • Affiliations: Department of Computer Science, SUNY at Binghamton, Binghamton, USA 13902; Department of Computer Science, SUNY at Binghamton, Binghamton, USA 13902; Department of Computer Science, SUNY at Binghamton, Binghamton, USA 13902; Department of Computer Science, University of Illinois at Chicago, Chicago, USA 60607; Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, USA 70504

  • Venue: World Wide Web
  • Year: 2007

Abstract

Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries against the underlying databases. In general, such a Web search interface can be regarded as containing an interface schema with multiple attributes and rich semantic/meta-information; however, the schema is not formally defined in HTML. Many Web applications, such as Web database integration and deep Web crawling, require these schemas to be constructed. In this paper, we first propose a schema model for representing complex search interfaces, and then present a layout-expression-based approach to automatically extract the logical attributes from search interfaces. We also recast the identification of different types of semantic information as a classification problem, and design several Bayesian classifiers to help derive semantic information from the extracted attributes. A system, WISE-iExtractor, has been implemented to automatically construct the schema from any Web search interface. Our experimental results on real search interfaces indicate that this system is highly effective.
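
To illustrate why the schema "is not formally defined in HTML", the following minimal Python sketch collects only the raw controls of a form using the standard-library `html.parser`. It is not the paper's layout-expression approach and not part of WISE-iExtractor; the names `Control`, `FormControlCollector`, `collect_controls`, and the sample form are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Control:
    """One raw form control as it appears in the HTML (hypothetical type)."""
    tag: str                    # "input", "select", or "textarea"
    name: str                   # value of the name attribute, "" if absent
    input_type: str = ""        # e.g. "text"; left empty for non-input tags
    options: list = field(default_factory=list)  # option texts of a <select>


class FormControlCollector(HTMLParser):
    """Collects form controls in document order; does no semantic grouping."""

    def __init__(self):
        super().__init__()
        self.controls = []
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("input", "select", "textarea"):
            # HTML treats an <input> without a type attribute as "text".
            input_type = (attrs.get("type") or "text") if tag == "input" else ""
            self.controls.append(Control(tag, attrs.get("name") or "", input_type))
        elif tag == "option" and self.controls and self.controls[-1].tag == "select":
            self._in_option = True

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._in_option = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_option and text:
            self.controls[-1].options.append(text)


def collect_controls(html):
    """Return the list of raw controls found in an HTML string."""
    parser = FormControlCollector()
    parser.feed(html)
    parser.close()
    return parser.controls


# Made-up sample form, assumed for illustration only.
SAMPLE_FORM = """
<form action="/search">
  Title <input type="text" name="title">
  Price from <input type="text" name="pmin"> to <input type="text" name="pmax">
  Format <select name="fmt"><option>Any</option><option>Hardcover</option></select>
</form>
"""

if __name__ == "__main__":
    for control in collect_controls(SAMPLE_FORM):
        print(control)
```

In this made-up form, `pmin` and `pmax` would arguably belong to a single logical attribute (a price range), yet the markup exposes them only as two unrelated text boxes. Recovering that kind of grouping, along with the labels and other semantic information, is the gap the abstract describes.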