Extraction and integration of web data by end-users

Authors:
Sudhir Agarwal;Michael Genesereth
Affiliations:
Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Citing 9
Cited 0

Wrapper generation for semi-structured Internet sources

ACM SIGMOD Record
Data integration: a theoretical perspective

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The perfect search engine is not enough: a study of orienteering behavior in directed search

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Transforming arbitrary tables into logical form with TARTAR

Data & Knowledge Engineering
Freebase: a collaboratively created graph database for structuring human knowledge

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
DBpedia: a nucleus for a web of open data

ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
No Code Required: Giving Users Tools to Transform the Web

No Code Required: Giving Users Tools to Transform the Web
Studying trailfinding algorithms for enhanced web search

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Assessing the scenic route: measuring the value of search trails in web logs

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

For increasingly sophisticated use cases end users often need to extract, combine, and aggregate information from various (often dynamically generated) web pages from multiple websites. Current search engines do not focus on combining information from various web pages in order to answer the overall information need of the user. Semantic Web and Linked Data usually take a static view on the data and rely on providers' cooperation. In this paper, we present a novel approach that enables end users to easily extract data from web pages while they browse, store it locally in their browser as well as structure, integrate and search such data. We propose Datalog rules for integrating and searching the extracted data. We show how cleaning steps and integration rules can be reused to accelerate the cleaning and integration of extracted data. The proposed approach is implemented as a browser plugin. We present its implementation details and report on our evaluation of the plugin concerning user experience and browsing time saving.