This paper presents data mining-based techniques for enabling data integration across deep web data sources. We target query processing across inter-dependent data sources. Thus, besides input-input and output-output matching of attributes, we also need to consider input-output matching. We develop data mining techniques that discover instances for querying deep web data sources from two places: the information provided by the query interfaces themselves, and the output pages of related data sources, obtained by query probing with dynamically identified input instances. Then, using a hierarchical representation of schemas and applying clustering techniques, we generate schema matches. We show the effectiveness of our technique by integrating 24 query interfaces.
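
To make the clustering step concrete, the following is a minimal sketch, not the paper's actual algorithm. It assumes that each input or output attribute already has a set of discovered instance values, that attributes are compared by Jaccard overlap of those sets, and that matches are formed by single-link agglomerative clustering with a fixed threshold. The attribute names, the sample values, the similarity measure, and the threshold of 0.3 are all hypothetical choices made for illustration; the hierarchical schema representation mentioned above is not modeled here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two instance sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_attributes(instances, threshold=0.3):
    """Single-link agglomerative clustering of attributes by instance overlap.

    `instances` maps a hypothetical attribute id such as "A.in.gene"
    to the set of instance values discovered for that attribute.
    Returns a list of sets; each set is one group of matched attributes.
    """
    clusters = [{name} for name in instances]

    def link(c1, c2):
        # Single link: similarity of the closest attribute pair across clusters.
        return max(jaccard(instances[x], instances[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        # Pick the most similar pair of clusters (i < j by construction).
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[i], clusters[j]) < threshold:
            break  # Nothing left that is similar enough to merge.
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical instance sets: "in" marks an input attribute of a query
# interface, "out" marks an attribute extracted from an output page.
instances = {
    "A.in.gene": {"BRCA1", "TP53", "EGFR"},
    "B.out.gene_symbol": {"BRCA1", "TP53", "KRAS"},
    "B.in.snp": {"rs123", "rs456"},
    "C.out.snp_id": {"rs123", "rs456", "rs789"},
}

matches = cluster_attributes(instances)
```

In this toy data, each resulting cluster pairs an input attribute of one source with an output attribute of another, which is one way to read the input-output matching described above. A real system would likely need a more robust similarity measure than raw set overlap, since probed instance sets are usually small and noisy.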