Integrating web query results: holistic schema matching

Authors:
Shui-Lung Chuang;Kevin Chen-Chuan Chang
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

The growing number of online data sources creates a pressing need for data integration techniques that are both more automatic and more accurate. For the data returned by querying such sources, most work focuses on extracting the embedded structured data more accurately. However, providing integrated access to these query results requires one further, essential step: combining the extracted data from different sources. A critical task in that step is finding the correspondences between the data fields of the sources, a problem well known as schema matching. Query results are a small, biased sample of the instances held by a source, so the schema information that can be obtained from them is implicit and incomplete, which often prevents existing schema matching approaches from performing effectively. In this paper, we develop a novel framework for understanding and effectively supporting schema matching on such instance-based data, especially when integrating multiple sources. We view the discovery of matchings as the construction of a more complete domain schema that best describes the input data. With this conceptual view, we can seamlessly combine various data instances and observed regularities with holistic, multiple-source schema matching to achieve more accurate matching results. Our experiments show that our framework consistently outperforms baseline pairwise and clustering-based approaches (raising F-measure from 50-89% to 89-94%) and works uniformly well across the surveyed domains.
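
The abstract reports results as F-measure. As a reading aid, the sketch below shows one common way to compute F-measure for schema matching: as the harmonic mean of precision and recall over predicted field correspondences. The abstract does not describe the paper's evaluation protocol, so this is not taken from it. The field names (`A.title`, `B.name`, and so on) and the helper names `pair` and `f_measure` are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, Set

# A correspondence is an unordered pair of field names from two sources.
Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Build an unordered correspondence between two fields."""
    return frozenset((a, b))


def f_measure(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """Harmonic mean of precision and recall over field correspondences."""
    correct = len(predicted & gold)
    if correct == 0:
        # Covers empty inputs and the case of no correct correspondence.
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and matcher output for two sources, A and B.
gold = {
    pair("A.title", "B.name"),
    pair("A.price", "B.cost"),
    pair("A.author", "B.writer"),
}
predicted = {
    pair("A.title", "B.name"),    # correct
    pair("A.price", "B.writer"),  # incorrect
}

score = f_measure(predicted, gold)
print(f"F-measure: {score:.2f}")
```

In this toy case one of two predicted correspondences is correct (precision 1/2) and one of three true correspondences is found (recall 1/3), so the F-measure is 0.40.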