A comparative analysis of methodologies for database schema integration
ACM Computing Surveys (CSUR)
Machine Learning
Reconciling schemas of disparate data sources: a machine-learning approach
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Generic Schema Matching with Cupid
Proceedings of the 27th International Conference on Very Large Data Bases
A survey of approaches to automatic schema matching
The VLDB Journal — The International Journal on Very Large Data Bases
On schema matching with opaque column names and data values
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Statistical schema matching across web query interfaces
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
An interactive clustering-based approach to integrating source query interfaces on the deep Web
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Understanding Web query interfaces: best-effort parsing with hidden syntax
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
iMAP: discovering complex semantic matches between database schemas
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Mining complex matchings across Web query interfaces
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Automatic extraction of web search interfaces for interface schema integration
Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
Structured databases on the web: observations and implications
ACM SIGMOD Record
Wise-integrator: an automatic integrator of web search interfaces for E-commerce
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Automatic complex schema matching across Web query interfaces: A correlation mining approach
ACM Transactions on Database Systems (TODS)
Why is schema matching tough and what can we do about it?
ACM SIGMOD Record
Rank Aggregation for Automatic Schema Matching
IEEE Transactions on Knowledge and Data Engineering
Managing Uncertainty in Schema Matcher Ensembles
SUM '07 Proceedings of the 1st international conference on Scalable Uncertainty Management
Advances in Web Semantics I
Association pattern mining for product specification integration
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 2
Automatic wrappers for large scale web extraction
Proceedings of the VLDB Endowment
Proceedings of the 15th annual conference companion on Genetic and evolutionary computation
Hi-index | 0.00 |
The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. As an essential task toward integrating these massive "deep Web" sources, large scale schema matching (i.e., discovering semantic correspondences of attributes across many query interfaces) has been actively studied recently. In particular, many works have emerged to address this problem by "holistically" matching many schemas at the same time and thus pursuing "mining" approaches in nature. However, while holistic schema matching has built its promise upon the large quantity of input schemas, it also suffers the robustness problem caused by noisy data quality. Such noises often inevitably arise in the automatic extraction of schema data, which is mandatory in large scale integration. For holistic matching to be viable, it is thus essential to make it robust against noisy schemas. To tackle this challenge, we propose a data-ensemble framework with sampling and voting techniques, which is inspired by bagging predictors. Specifically, our approach creates an ensemble of matchers, by randomizing input schema data into many independently downsampled trials, executing the same matcher on each trial and then aggregating their ranked results by taking majority voting. As a principled basis, we provide analytic justification of the effectiveness of this data-ensemble framework. Further, empirically, our experiments on real Web data show that the "ensemblization" indeed significantly boosts the matching accuracy under noisy schema input, and thus maintains the desired robustness of a holistic matcher.