The Web has been rapidly "deepened": myriad searchable databases are now online, with their data hidden behind query interfaces. Toward large-scale integration over this "deep Web," we face a new challenge. Because of its dynamic and ad hoc nature, such integration mandates dynamic semantics discovery: we must cope on the fly with the "semantics" of dynamically discovered sources, without pre-configured, source-specific knowledge. To tackle this challenge, our initial work hinges on the insight that the large scale is itself a unique opportunity. We observe that the desired "semantics" is often connected to surface presentation characteristics through hidden regularities that hold across many sources, and these regularities can be leveraged to enable semantics discovery. In particular, we report evidence from three initial tasks in integrating the deep Web: interface extraction, schema matching, and query translation. Generalizing from this evidence, we propose a "unified insight" of "mining" semantics for large-scale integration by exploiting hidden regularities holistically across many sources. Further, to fulfill the promise of such holistic mining, we discuss the challenges in realizing it for dynamic semantics discovery. As our initial work and several related efforts have shown, we believe that this unified insight, holistic mining for semantics discovery, is a promising methodology for enabling large-scale integration.
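To make the idea of a "hidden regularity across many sources" concrete for schema matching, here is a minimal Python sketch of one way such a regularity could be mined. It assumes, purely for illustration, that each query interface has already been extracted into a set of attribute labels. It also uses a simple co-occurrence score chosen for this example, not a measure taken from this work. The intuition is that two labels which are each common across many interfaces, yet almost never appear together on the same interface, are candidate synonyms, because a single form usually uses only one label for a given concept. The function name `candidate_synonyms`, its thresholds, and the sample book-search interfaces are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_synonyms(interfaces, min_support=0.2, min_score=0.9):
    """Flag attribute pairs that are each frequent but rarely co-occur.

    Hypothetical illustration only: the score compares observed pair
    co-occurrence with the count expected if the two attributes were
    independent (score = 1 - observed / expected). A score of 1.0 means
    the pair never appears together on the same interface.
    """
    n = len(interfaces)
    if n == 0:
        return []
    # Crude label normalization (an assumption): lowercase and strip spaces.
    sets = [{a.strip().lower() for a in iface} for iface in interfaces]
    single = Counter(a for s in sets for a in s)
    pair = Counter(frozenset(p) for s in sets for p in combinations(sorted(s), 2))
    # Keep only attributes seen on at least min_support of all interfaces.
    frequent = sorted(a for a, c in single.items() if c / n >= min_support)
    results = []
    for a, b in combinations(frequent, 2):
        expected = single[a] * single[b] / n  # > 0, since both occur at least once
        observed = pair[frozenset((a, b))]
        score = 1.0 - observed / expected
        if score >= min_score:
            results.append((a, b, round(score, 3)))
    # Strongest negative correlation first, then alphabetical for stability.
    results.sort(key=lambda t: (-t[2], t[0], t[1]))
    return results


# Hypothetical sample: attribute sets from six book-search query interfaces.
SAMPLE_INTERFACES = [
    {"title", "author", "isbn"},
    {"title", "writer", "isbn", "publisher"},
    {"title", "author", "publisher"},
    {"title", "writer", "isbn"},
    {"keyword", "author", "price"},
    {"title", "writer", "publisher"},
]

if __name__ == "__main__":
    print(candidate_synonyms(SAMPLE_INTERFACES))
    # → [('author', 'writer', 1.0)]
```

On this toy data only `author`/`writer` is flagged: each appears on half of the interfaces, but never on the same one. No single interface reveals this; it emerges only when many sources are examined together. With real interfaces, the label normalization, the two thresholds, and the choice of correlation measure would all need tuning.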