Materialization of web data sources

Authors:
Alessandro Bozzon;Stefano Ceri;Srđan Zagorac
Affiliations:
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy;Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy;Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
Venue:
Search Computing
Year:
2012

Abstract

Recent years have witnessed an exponential increase in the number of data services available on the Web. Many popular Web sites, including social networks, offer APIs for interacting with their information, and open data initiatives such as the Linked Data project promise to achieve the vision of the Web of data. Unfortunately, access to Web data is typically limited by the constraints imposed by the query interface and by technical limitations such as network latency or the number and frequency of allowed daily service invocations. Moreover, several sources may independently publish data about the same real-world objects; in such cases, combining them to assemble all available information about those objects requires duplicate removal, reconciliation, and integration. This paper describes various data materialization problems, defining properties of the materialized data such as source coverage and data alignment, and then focuses on a specific problem: the reseeding of data access methods, which uses information available from previous calls to build a materialization of maximum size.
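
The reseeding idea can be illustrated with a minimal sketch. It is not the authors' algorithm, only one plausible reading of the abstract: values found in the results of earlier calls are fed back as inputs to later calls until no new inputs remain or a call budget is exhausted. The toy source, the field names (`id`, `city`, `nearby`), and the functions `query_by_city`, `extract_inputs`, and `materialize_by_reseeding` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy source: each record exposes a "nearby" value that belongs
# to the same domain as the access method's input ("city").
TOY_SOURCE = [
    {"id": 1, "city": "Milan", "nearby": "Como"},
    {"id": 2, "city": "Milan", "nearby": "Turin"},
    {"id": 3, "city": "Como", "nearby": "Milan"},
    {"id": 4, "city": "Turin", "nearby": "Genoa"},
    {"id": 5, "city": "Naples", "nearby": "Rome"},
]


def query_by_city(city):
    """Stand-in for a remote access method that requires a city as input."""
    return [r for r in TOY_SOURCE if r["city"] == city]


def extract_inputs(record):
    """Return the values in a result that can be reused as new inputs."""
    return [record["nearby"]]


def materialize_by_reseeding(call_service, seeds, extract, max_calls):
    """Breadth-first reseeding under a budget of max_calls invocations."""
    pending = deque(seeds)   # inputs still to be submitted
    tried = set(seeds)       # inputs already queued, to avoid repeated calls
    materialized = {}        # record id -> record, which removes duplicates
    calls = 0
    while pending and calls < max_calls:
        value = pending.popleft()
        calls += 1
        for record in call_service(value):
            materialized[record["id"]] = record
            for candidate in extract(record):
                if candidate not in tried:
                    tried.add(candidate)
                    pending.append(candidate)
    return materialized, calls


records, used = materialize_by_reseeding(
    query_by_city, ["Milan"], extract_inputs, max_calls=10
)
print(sorted(records), used)
```

In this toy run, the record for Naples is never retrieved because no seed or result leads to it, which hints at why source coverage depends on the choice of seeds and on the call budget.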