Tuning the ensemble selection process of schema matchers

Authors:
Avigdor Gal; Tomer Sagi
Affiliations:
Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel (both authors)
Venue:
Information Systems
Year:
2010

Abstract

Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. It is recognized as one of the basic operations required by the process of data and schema integration, and its outcome serves many tasks such as targeted content delivery and view integration. Schema matching research has been going on for more than 25 years. An interesting research topic that has been largely left untouched is the automatic selection of schema matchers into an ensemble, that is, a set of schema matchers. To the best of our knowledge, none of the existing algorithmic solutions offers such a selection feature. In this paper we provide a thorough investigation of this research topic. We introduce a new heuristic, Schema Matcher Boosting (SMB), and show that SMB can choose among schema matchers and tune their importance. As such, SMB offers a new promise to schema matcher designers: instead of trying to design a perfect schema matcher, a designer can focus on finding better-than-random schema matchers. For the effective utilization of SMB, we propose a complementary approach to the design of new schema matchers, separating them into first-line and second-line matchers. First-line schema matchers were designed, by and large, as applications of existing work in other areas (e.g., machine learning and information retrieval) to schemata. Second-line schema matchers operate on the outcome of other schema matchers to improve their original outcome. SMB selects matcher pairs, where each pair contains a first-line matcher and a second-line matcher. We run a thorough set of experiments to analyze SMB's ability to effectively choose schema matchers and show that SMB performs better than other state-of-the-art ensemble matchers.
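The idea of boosting over (first-line, second-line) matcher pairs can be illustrated with a minimal AdaBoost-style sketch. It is not the published SMB algorithm. It assumes that each pair acts as a weak classifier that labels every candidate attribute correspondence (i, j) as match (+1) or non-match (-1), that a ground-truth matching is available for training, and that the two first-line similarity matrices (`matcher_A`, `matcher_B`) and the second-line matchers `threshold` and `row_max` are hypothetical stand-ins chosen for illustration.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Matrix = List[List[float]]    # similarity scores: rows = source attributes, cols = target attributes
Decision = List[List[int]]    # +1 = accept correspondence, -1 = reject
SecondLine = Callable[[Matrix], Decision]


def threshold(m: Matrix, t: float = 0.5) -> Decision:
    """Hypothetical second-line matcher: accept every correspondence scoring at least t."""
    return [[1 if s >= t else -1 for s in row] for row in m]


def row_max(m: Matrix) -> Decision:
    """Hypothetical second-line matcher: for each source attribute, accept the
    best-scoring target attribute(s); ties are all accepted, all-zero rows accept none."""
    out = []
    for row in m:
        best = max(row)
        out.append([1 if best > 0 and s == best else -1 for s in row])
    return out


def boost(first_line: Dict[str, Matrix],
          second_line: Dict[str, SecondLine],
          truth: Set[Tuple[int, int]],
          rounds: int = 3) -> List[Tuple[str, str, float]]:
    """AdaBoost-style selection: each (first-line, second-line) pair is a weak classifier
    over candidate correspondences (i, j). Returns the selected pairs with their weights."""
    m = next(iter(first_line.values()))
    cells = [(i, j) for i in range(len(m)) for j in range(len(m[0]))]
    y = {c: 1 if c in truth else -1 for c in cells}       # ground-truth labels
    w = {c: 1.0 / len(cells) for c in cells}              # uniform initial weights
    pool = {(f, s): second_line[s](first_line[f])         # decisions of every pair
            for f in first_line for s in second_line}
    chosen: List[Tuple[str, str, float]] = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for key, h in pool.items():
            err = sum(w[c] for c in cells if h[c[0]][c[1]] != y[c])
            if err < best_err:
                best, best_err = key, err
        if best is None or best_err >= 0.5:
            break                                          # no pair beats random guessing
        best_err = max(best_err, 1e-10)                    # guard against a perfect pair
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        chosen.append((best[0], best[1], alpha))
        h = pool[best]
        for c in cells:                                    # up-weight mistakes, down-weight hits
            w[c] *= math.exp(-alpha * y[c] * h[c[0]][c[1]])
        z = sum(w.values())
        for c in cells:
            w[c] /= z
    return chosen


def predict(chosen: List[Tuple[str, str, float]],
            first_line: Dict[str, Matrix],
            second_line: Dict[str, SecondLine]) -> Decision:
    """Ensemble decision: sign of the alpha-weighted vote of the selected pairs."""
    m = next(iter(first_line.values()))
    score = [[0.0] * len(m[0]) for _ in m]
    for f, s, alpha in chosen:
        for i, row in enumerate(second_line[s](first_line[f])):
            for j, v in enumerate(row):
                score[i][j] += alpha * v
    return [[1 if x > 0 else -1 for x in row] for row in score]


# Toy scenario with hypothetical scores; ground truth matches source i to target i.
first = {"matcher_A": [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.3, 0.3]],
         "matcher_B": [[0.6, 0.6, 0.0], [0.0, 0.7, 0.6], [0.0, 0.0, 0.9]]}
second = {"threshold": threshold, "row_max": row_max}
chosen = boost(first, second, truth={(0, 0), (1, 1), (2, 2)}, rounds=3)
print([(f, s) for f, s, _ in chosen])
# [('matcher_A', 'threshold'), ('matcher_A', 'row_max'), ('matcher_B', 'row_max')]
print(predict(chosen, first, second))
# [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
```

A pair only needs to beat random guessing on the current weights to be selected, which mirrors the point that designers can aim for better-than-random matchers. In this toy run, `row_max` recovers the (2, 2) correspondence that `threshold` misses on `matcher_A`, at the cost of a spurious (2, 1). No single pair is correct on all nine cells, but the weighted vote of the three selected pairs reproduces the ground truth.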