Indexing relations on the web

Authors:
Sergio Luis Sardi Mergen; Juliana Freire; Carlos Alberto Heuser
Affiliations:
Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS - Brasil; School of Computing, University of Utah, Salt Lake City; Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS - Brasil
Venue:
Proceedings of the 13th International Conference on Extending Database Technology
Year:
2010

Abstract

The volume of (semi-)structured data on the Web has increased substantially. This opens new opportunities for exploring and querying these data in ways that go beyond the keyword-based queries traditionally used on the Web. However, supporting queries over a very large number of apparently disconnected Web sources is challenging. In this paper, we propose indexing methods that capture both the structure of the sources and the connections between them. The indexes are designed for data represented as relations, such as HTML tables, and support queries with predicates. We show how associations between overlapping sources are discovered, captured in the indexes, and used to derive query rewritings that join multiple sources. We demonstrate, through an experimental evaluation, that our approach scales to a large number of sources.
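
To make the idea in the abstract concrete, here is a minimal Python sketch of one possible reading of it. This is not the authors' algorithm, which the abstract does not detail. Everything in it is an assumption made for illustration: the toy sources, the function names (`build_attribute_index`, `build_association_index`, `rewrite`, `evaluate`), and the choice to treat two sources as "associated" when they share an attribute name with at least one common value.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy sources: each relation is a list of rows (dicts).
SOURCES = {
    "films": [
        {"title": "Alien", "director": "Ridley Scott"},
        {"title": "Heat", "director": "Michael Mann"},
    ],
    "awards": [
        {"title": "Alien", "award": "Best Visual Effects"},
    ],
    "cities": [
        {"city": "Porto Alegre", "country": "Brazil"},
    ],
}


def build_attribute_index(sources):
    """Structure index (assumed form): attribute name -> sources exposing it."""
    index = defaultdict(set)
    for name, rows in sources.items():
        for row in rows:
            for attr in row:
                index[attr].add(name)
    return index


def build_association_index(sources, attr_index):
    """Connection index (assumed form): source pair -> attributes with shared values."""
    assoc = defaultdict(set)
    for attr, names in attr_index.items():
        for a, b in combinations(sorted(names), 2):
            values_a = {row[attr] for row in sources[a] if attr in row}
            values_b = {row[attr] for row in sources[b] if attr in row}
            if values_a & values_b:  # the two sources overlap on this attribute
                assoc[(a, b)].add(attr)
    return assoc


def rewrite(wanted_attrs, attr_index, assoc_index):
    """List single sources, or associated source pairs, that cover wanted_attrs."""
    wanted = set(wanted_attrs)
    schema = defaultdict(set)
    for attr, names in attr_index.items():
        for name in names:
            schema[name].add(attr)
    plans = [(name,) for name in sorted(schema) if wanted <= schema[name]]
    for (a, b), join_attrs in sorted(assoc_index.items()):
        covered_alone = wanted <= schema[a] or wanted <= schema[b]
        if wanted <= schema[a] | schema[b] and not covered_alone:
            plans.append((a, b, tuple(sorted(join_attrs))))
    return plans


def evaluate(plan, sources, predicate=lambda row: True):
    """Run a plan with a nested-loop join and filter rows by the predicate."""
    if len(plan) == 1:
        return [row for row in sources[plan[0]] if predicate(row)]
    a, b, join_attrs = plan
    result = []
    for left in sources[a]:
        for right in sources[b]:
            if all(left.get(k) == right.get(k) for k in join_attrs):
                merged = {**left, **right}
                if predicate(merged):
                    result.append(merged)
    return result
```

For example, `rewrite(["director", "award"], ...)` finds no single source with both attributes, so it falls back to the `awards` and `films` pair joined on `title`, and `evaluate` can then apply a predicate such as `lambda r: r["director"] == "Ridley Scott"` to the joined rows. The pairwise value comparison used here is quadratic in the number of sources per attribute and would not scale to the setting the paper targets; it only illustrates what the indexes are meant to record.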