Extraction and integration of web data by end-users

  • Authors:
  • Sudhir Agarwal;Michael Genesereth

  • Affiliations:
  • Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

For increasingly sophisticated use cases end users often need to extract, combine, and aggregate information from various (often dynamically generated) web pages from multiple websites. Current search engines do not focus on combining information from various web pages in order to answer the overall information need of the user. Semantic Web and Linked Data usually take a static view on the data and rely on providers' cooperation. In this paper, we present a novel approach that enables end users to easily extract data from web pages while they browse, store it locally in their browser as well as structure, integrate and search such data. We propose Datalog rules for integrating and searching the extracted data. We show how cleaning steps and integration rules can be reused to accelerate the cleaning and integration of extracted data. The proposed approach is implemented as a browser plugin. We present its implementation details and report on our evaluation of the plugin concerning user experience and browsing time saving.