Statistical schema matching across web query interfaces

Authors:
Bin He;Kevin Chen-Chuan Chang
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Year:
2003

Citing 9
Cited 102

A comparative analysis of methodologies for database schema integration

ACM Computing Surveys (CSUR)
A Theory of Attributed Equivalence in Databases with Application to Schema Integration

IEEE Transactions on Software Engineering
Integration of heterogeneous databases without common domains using queries based on textual similarity

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Reconciling schemas of disparate data sources: a machine-learning approach

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Introduction to Algorithms

Introduction to Algorithms
Generic Schema Matching with Cupid

Proceedings of the 27th International Conference on Very Large Data Bases
A Methodology for View Integration in Logical Database Design

VLDB '82 Proceedings of the 8th International Conference on Very Large Data Bases
A survey of approaches to automatic schema matching

The VLDB Journal — The International Journal on Very Large Data Bases

An interactive clustering-based approach to integrating source query interfaces on the deep Web

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Understanding Web query interfaces: best-effort parsing with hidden syntax

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
iMAP: discovering complex semantic matches between database schemas

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Knocking the door to the deep Web: integrating Web query interfaces

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Mining complex matchings across Web query interfaces

Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Searching databases for sematically-related schemas

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Discovering complex matchings across web query interfaces: a correlation mining approach

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Abbreviation Expansion in Schema Matching and Web Integration

WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Organizing structured web sources by query schemas: a clustering approach

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Structured databases on the web: observations and implications

ACM SIGMOD Record
Introduction to the special issue on semantic integration

ACM SIGMOD Record
A holistic paradigm for large scale schema matching

ACM SIGMOD Record
Editorial: special issue on web content mining

ACM SIGKDD Explorations Newsletter
Mining structures for semantics

ACM SIGKDD Explorations Newsletter
Mining semantics for large scale integration on the web: evidences, insights, and challenges

ACM SIGKDD Explorations Newsletter
Corpus-Based Schema Matching

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Towards Building a MetaQuerier: Extracting and Matching Web Query Interfaces

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Downloading textual hidden web content through keyword queries

Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
MetaQuerier: querying structured web sources on-the-fly

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Making holistic schema matching robust: an ensemble approach

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Light-weight domain-based form assistant: querying web databases on the fly

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Tuning schema matching software using synthetic scenarios

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Mapping maintenance for data integration systems

VLDB '05 Proceedings of the 31st international conference on Very large data bases
WISE-Integrator: a system for extracting and integrating complex web search interfaces of the deep web

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Semantic-integration research in the database community

AI Magazine - Special issue on semantic integration
Merging Interface Schemas on the Deep Web via Clustering Aggregation

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Automatic complex schema matching across Web query interfaces: A correlation mining approach

ACM Transactions on Database Systems (TODS)
Automatic structured query transformation over distributed digital libraries

Proceedings of the 2006 ACM symposium on Applied computing
Principles of dataspace systems

Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Integration of XML schemas at various "severity" levels

Information Systems
Dealing with semantic heterogeneity for improving web usage

Data & Knowledge Engineering - Special issue: ER 2004
Data integration: the teenage years

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
eTuner: tuning schema matching software using synthetic scenarios

The VLDB Journal — The International Journal on Very Large Data Bases
QMatch - Using paths to match XML schemas

Data & Knowledge Engineering
Clustering e-commerce search engines based on their search interface pages using WISE-cluster

Data & Knowledge Engineering - Special issue: WIDM 2004
A composite approach to automating direct and indirect schema mappings

Information Systems
Combining classifiers to identify online databases

Proceedings of the 16th international conference on World Wide Web
An adaptive crawler for locating hidden-Web entry points

Proceedings of the 16th international conference on World Wide Web
Matching large schemas: Approaches and evaluation

Information Systems
Query relaxation using malleable schemas

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Rank Aggregation for Automatic Schema Matching

IEEE Transactions on Knowledge and Data Engineering
Towards Deeper Understanding of the Search Interfaces of the Deep Web

World Wide Web
Combining Description Logics with synopses for inferring complex knowledge patterns from XML sources

Information Systems
Wise-integrator: an automatic integrator of web search interfaces for E-commerce

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Structures, semantics and statistics

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Instance-based schema matching for web databases by domain-specific query probing

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Randomized algorithms for data reconciliation in wide area aggregate query processing

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Automatically refining the wikipedia infobox ontology

Proceedings of the 17th international conference on World Wide Web
Towards a global schema for web entities

Proceedings of the 17th international conference on World Wide Web
Bootstrapping pay-as-you-go data integration systems

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Schema Matching across Query Interfaces on the Deep Web

BNCOD '08 Proceedings of the 25th British national conference on Databases: Sharing Data, Information and Knowledge
Efficient Top-k Data Sources Ranking for Query on Deep Web

WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering
Learning to extract form labels

Proceedings of the VLDB Endowment
Integrating web query results: holistic schema matching

Proceedings of the 17th ACM conference on Information and knowledge management
Supporting the automatic construction of entity aware search engines

Proceedings of the 10th ACM workshop on Web information and data management
Querying structured information sources on the web

Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services
Web-scale extraction of structured data

ACM SIGMOD Record
Query generation for retrieving data from distributed semistructured documents using a metadata interface

Computer Languages, Systems and Structures
A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces

BNCOD 26 Proceedings of the 26th British National Conference on Databases: Dataspace: The Final Frontier
Data Modeling in Dataspace Support Platforms

Conceptual Modeling: Foundations and Applications
Site-Wide Wrapper Induction for Life Science Deep Web Databases

DILS '09 Proceedings of the 6th International Workshop on Data Integration in the Life Sciences
Deriving Customized Integrated Web Query Interfaces

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
An empirical study on using hidden markov model for search interface segmentation

Proceedings of the 18th ACM conference on Information and knowledge management
An evidential approach to query interface matching on the deep Web

Information Systems
Stop word and related problems in web interface integration

Proceedings of the VLDB Endowment
Indexing relations on the web

Proceedings of the 13th International Conference on Extending Database Technology
Parsing query interfaces of deep web: from specialization to generalization

IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
Association pattern mining for product specification integration

FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 2
Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Querying structured information sources on the Web

International Journal of Metadata, Semantics and Ontologies
Tuning the ensemble selection process of schema matchers

Information Systems
Understanding deep web search interfaces: a survey

ACM SIGMOD Record
Web database schema identification through simple query interface

RED'09 Proceedings of the 2nd international conference on Resource discovery
Instance discovery and schema matching with applications to biological deep web data integration

DILS'10 Proceedings of the 7th international conference on Data integration in the life sciences
Double-layered schema integration of heterogeneous XML sources

Journal of Systems and Software
Materializing multi-relational databases from the web using taxonomic queries

Proceedings of the fourth ACM international conference on Web search and data mining
On-line web database integration

Proceedings of the International Conference on Management of Emergent Digital EcoSystems
A query interface matching approach based on extended evidence theory for deep web

Journal of Computer Science and Technology
ETTA-IM: A deep web query interface matching approach based on evidence theory and task assignment

Expert Systems with Applications: An International Journal
Attribute domain discovery for hidden web databases

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Deep web integrated systems: current achievements and open issues

Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services
Clustering-based schema matching of web data for constructing digital library

ICCSA'05 Proceedings of the 2005 international conference on Computational Science and Its Applications - Volume Part II
Constructing interface schemas for search interfaces of web databases

WISE'05 Proceedings of the 6th international conference on Web Information Systems Engineering
Holistic schema matching for web query interfaces

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Clustering structured web sources: a schema-based, model-differentiation approach

EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
sPLMap: a probabilistic approach to schema matching

ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Automatic generation of data types for classification of deep web sources

DILS'05 Proceedings of the Second international conference on Data Integration in the Life Sciences
Automatically grounding semantically-enriched conceptual models to concrete web services

ER'05 Proceedings of the 24th international conference on Conceptual Modeling
A novel clustering-based approach to schema matching

ADVIS'06 Proceedings of the 4th international conference on Advances in Information Systems
Chapter 6: web data extraction for service creation

Search Computing
Information retrieval from distributed semistructured documents using metadata interface

KDXD'06 Proceedings of the First international conference on Knowledge Discovery from XML Documents
InfoGather: entity augmentation and attribute discovery by holistic matching with web tables

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Extracting widget descriptions from GUIs

FASE'12 Proceedings of the 15th international conference on Fundamental Approaches to Software Engineering
Optimal algorithms for crawling a hidden database in the web

Proceedings of the VLDB Endowment
Learning to discover complex mappings from web forms to ontologies

Proceedings of the 21st ACM international conference on Information and knowledge management
Identifying and weighting integration hypotheses on open data platforms

Proceedings of the First International Workshop on Open Data
E-FFC: an enhanced form-focused crawler for domain-specific deep web databases

Journal of Intelligent Information Systems
Towards a More Scalable Schema Matching: A Novel Approach

International Journal of Distributed Systems and Technologies
Deep Web Information Retrieval Process: A Technical Survey

International Journal of Information Technology and Web Engineering
Assessing relevance and trust of the deep web sources and results based on inter-source agreement

ACM Transactions on the Web (TWEB)
Publish-time data integration for open data platforms

Proceedings of the 2nd International Workshop on Open Data
Schema matching prediction with applications to data source discovery and dynamic ensembling

The VLDB Journal — The International Journal on Very Large Data Bases

Abstract

Schema matching is a critical problem in integrating heterogeneous information sources. Traditionally, matching multiple schemas has relied on finding pairwise attribute correspondences. This paper proposes a different approach, motivated by the need to integrate large numbers of data sources on the Internet. On this "deep Web," we observe two distinguishing characteristics that offer a new view of schema matching. First, as the Web scales, ample sources provide structured information in the same domains (e.g., books and automobiles). Second, while sources proliferate, their aggregate schema vocabulary tends to converge at a relatively small size. Motivated by these observations, we propose a new paradigm, statistical schema matching: unlike traditional approaches based on pairwise attribute correspondence, we take a holistic approach that matches all input schemas at once by finding an underlying generative schema model. We propose a general statistical framework, MGS, for such hidden model discovery, consisting of hypothesis modeling, generation, and selection. We then specialize the framework to develop Algorithm MGSsd, which targets synonym discovery, a canonical problem of schema matching, by designing and discovering a model that specifically captures synonym attributes. We demonstrate our approach on hundreds of real Web sources in four domains, and the results show good accuracy.
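To make the holistic idea concrete, the following is a minimal Python sketch under stated assumptions. It is not the MGS framework or Algorithm MGSsd, which the abstract describes only at a high level. It illustrates two things on invented toy data: how the aggregate vocabulary can level off as more schemas are observed, and a simple co-occurrence heuristic that looks at all schemas together. The heuristic assumes that synonym attributes are each common across sources but rarely appear in the same interface. That assumption, the function names, the `min_support` parameter, and the sample schemas are all hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

# Toy book-domain query interfaces (invented data, for illustration only).
SCHEMAS = [
    {"author", "title", "isbn"},
    {"name", "title", "isbn"},
    {"author", "title", "isbn", "subject"},
    {"name", "title", "subject"},
    {"author", "isbn", "category"},
    {"name", "title", "category"},
]


def vocabulary_growth(schemas):
    """Return the number of distinct attributes seen after each schema."""
    seen = set()
    growth = []
    for schema in schemas:
        seen.update(schema)
        growth.append(len(seen))
    return growth


def synonym_candidates(schemas, min_support=2):
    """Return attribute pairs that are each frequent but never co-occur.

    Assumption (not from the paper): synonyms seldom appear together
    in one interface, so zero co-occurrence hints at a synonym pair.
    """
    # Count in how many schemas each attribute appears.
    freq = Counter(attr for schema in schemas for attr in schema)
    # Count in how many schemas each (sorted) attribute pair co-occurs.
    cooccur = Counter()
    for schema in schemas:
        for pair in combinations(sorted(schema), 2):
            cooccur[pair] += 1
    frequent = sorted(a for a, n in freq.items() if n >= min_support)
    return [pair for pair in combinations(frequent, 2) if cooccur[pair] == 0]


if __name__ == "__main__":
    print(vocabulary_growth(SCHEMAS))
    print(synonym_candidates(SCHEMAS))
```

On this toy input the vocabulary stops growing after the fifth schema, and the heuristic flags `author`/`name` and `category`/`subject`. A purely count-based rule like this produces false positives on sparse data, which is one reason a statistical treatment with explicit hypothesis selection, as the abstract proposes, is attractive.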