Web data retrieval and extraction

  • Authors:
  • Zoé Lacroix

  • Affiliations:
  • Department of Computer Science, Arizona State University, P.O. Box 876106, Tempe, AZ

  • Venue:
  • Data & Knowledge Engineering - Special issue: Data integration over the Web
  • Year:
  • 2003

Abstract

We present the Object-Web Mediator, an approach to querying integrated Web data sources. It is composed of a retrieval component, based on an intermediate object view mechanism and search views, and an XML engine. Search views map the source capabilities to attributes defined on object classes, and parsers process retrieved documents and cache them in XML format. The XML engine queries the cached documents, extracts data, and returns the extracted data for evaluation. The originality of this approach lies in a generic view mechanism for accessing data sources with limited data access and complex capabilities, and in an XML engine that supports data extraction and reorganization. The approach has been developed and demonstrated as part of a multi-database system supporting queries via uniform Object Protocol Model interfaces against public Web data sources of interest to biologists.
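
The flow described in the abstract (search view, then parser and XML cache, then XML extraction) can be illustrated with a minimal Python sketch. This is not the Object-Web Mediator's actual implementation: the `Gene` class, the `q` and `org` source parameters, the sample document, and all function names are hypothetical, chosen only to show how a search view might restrict queries to the attributes a source can be searched by, and how cached XML might then be queried.

```python
import xml.etree.ElementTree as ET

# Hypothetical search view: maps attributes of an object class to the
# query parameters that a (made-up) Web source accepts.
SEARCH_VIEW = {
    "class": "Gene",
    "attributes": {"name": "q", "organism": "org"},
}

def build_source_query(view, conditions):
    """Translate attribute conditions into source parameters.

    Attributes missing from the view model the source's limited access
    capabilities, so they are rejected rather than silently ignored.
    """
    params = {}
    for attr, value in conditions.items():
        if attr not in view["attributes"]:
            raise ValueError(f"source cannot be searched on {attr!r}")
        params[view["attributes"][attr]] = value
    return params

def parse_to_xml(raw_text, root_tag="Gene"):
    """Toy parser: turn 'key: value' lines of a retrieved document into XML."""
    root = ET.Element(root_tag)
    for line in raw_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that carry no 'key: value' pair
            ET.SubElement(root, key.strip()).text = value.strip()
    return ET.tostring(root, encoding="unicode")

def extract(cached_xml, path):
    """Query a cached XML document; return the text of matching elements."""
    return [el.text for el in ET.fromstring(cached_xml).findall(path)]

# Assumed example data, not taken from any real source.
params = build_source_query(SEARCH_VIEW, {"name": "BRCA1"})
cached = parse_to_xml("name: BRCA1\nchromosome: 17")
print(params, extract(cached, "chromosome"))
```

In this sketch the retrieval step itself is omitted; a real mediator would send `params` to the source and pass the returned document to the parser before caching it.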