HAMSTER: using search clicklogs for schema and taxonomy matching

Authors:
Arnab Nandi;Philip A. Bernstein
Affiliations:
University of Michigan, Ann Arbor;Microsoft Research
Venue:
Proceedings of the VLDB Endowment
Year:
2009


Abstract

We address the problem of unsupervised matching of schema information from a large number of data sources into the schema of a data warehouse. The matching process is the first step of a framework to integrate data feeds from third-party data providers into a structured-search engine's data warehouse. Our experiments show that traditional schema-based and instance-based schema matching methods fall short. We propose a new technique based on the search engine's clicklogs. Two schema elements are matched if the distributions of keyword queries that cause click-throughs on their instances are similar. We present experiments on large commercial datasets that show the new technique has much better accuracy than traditional techniques.
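
The matching criterion in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which similarity measure is used, so the Jensen-Shannon divergence below is a stand-in chosen for illustration, and all function names, element names, and click counts are hypothetical.

```python
from math import log2


def query_distribution(clicks):
    """Turn raw {query: click count} data into a probability distribution."""
    total = sum(clicks.values())
    if total == 0:
        return {}
    return {query: count / total for query, count in clicks.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Assumed stand-in measure: 0.0 means identical, 1.0 means disjoint.
    """
    divergence = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            divergence += 0.5 * pk * log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * log2(qk / mk)
    return divergence


def best_match(source_clicks, warehouse_clicks):
    """Return the warehouse element whose query distribution is closest."""
    p = query_distribution(source_clicks)
    scores = {
        name: js_divergence(p, query_distribution(clicks))
        for name, clicks in warehouse_clicks.items()
    }
    return min(scores, key=scores.get)


# Hypothetical clicklog aggregates: queries that led to clicks on the
# instances of each schema element.
feed_element = {"cheap laptop": 40, "notebook deals": 25, "ultrabook": 10}
warehouse = {
    "Computers/Laptops": {"cheap laptop": 300, "ultrabook": 120, "notebook deals": 90},
    "Cameras/DSLR": {"dslr camera": 200, "canon lens": 80},
}

match = best_match(feed_element, warehouse)
```

In this toy data the feed element shares all of its queries with `Computers/Laptops` and none with `Cameras/DSLR`, so the former is selected. A real system would also need a threshold below which no match is reported, which this sketch leaves out.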