Boosting Schema Matchers

Authors:
Anan Marie; Avigdor Gal
Affiliations:
Technion - Israel Institute of Technology, Israel 32000 (both authors)
Venue:
OTM '08: Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008, Part I: On the Move to Meaningful Internet Systems
Year:
2008

Abstract

Schema matching is recognized as one of the basic operations required by the process of data and schema integration, and thus has a great impact on its outcome. We propose a new approach to combining matchers into ensembles, called Schema Matcher Boosting (SMB). This approach is based on a well-known machine learning technique called boosting. We present a boosting algorithm for schema matching with a unique ensembler feature, namely the ability to choose the matchers that participate in an ensemble. SMB holds a new promise for schema matcher designers: instead of trying to design a perfect schema matcher that is accurate for all schema pairs, a designer can focus on finding schema matchers that are better than random. We provide thorough comparative empirical results showing that SMB outperforms, on average, any individual matcher. In our experiments we compared SMB with more than 30 other matchers and with several ensembling approaches, including the Meta-Learner of LSD, over real-world data of 230 schemata. Moreover, our empirical analysis shows that SMB is consistently dominant, far beyond any individual matcher. Finally, we observe that SMB performs better than the Meta-Learner in terms of precision, recall, and F-Measure.
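The abstract does not spell out the SMB algorithm itself. As a rough illustration of the underlying idea (matchers used as weak learners that only need to beat random guessing), the sketch below runs standard discrete AdaBoost in the style of Freund and Schapire: each round selects the single matcher with the lowest weighted error on labelled candidate attribute pairs, and boosting stops once no matcher is better than random. This is a sketch under stated assumptions, not the published SMB algorithm; the pair representation, the 0.5 score threshold, the toy matchers, and the names `boost_matchers`, `as_weak_classifier`, and `ensemble_decision` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a candidate correspondence is a pair of attribute
# names, and a matcher maps a pair to a similarity score in [0, 1].
Pair = Tuple[str, str]
Matcher = Callable[[Pair], float]


def as_weak_classifier(matcher: Matcher, threshold: float = 0.5) -> Callable[[Pair], int]:
    """Turn a similarity score into a +1 (match) / -1 (non-match) vote."""
    return lambda pair: 1 if matcher(pair) >= threshold else -1


def boost_matchers(matchers: Dict[str, Matcher], pairs: List[Pair],
                   labels: List[int], rounds: int = 10) -> List[Tuple[str, float]]:
    """Discrete AdaBoost where every round selects one matcher as the weak learner.

    labels[i] is +1 if pairs[i] is a correct correspondence, else -1.
    Returns the selected (matcher name, weight alpha) pairs in round order.
    """
    n = len(pairs)
    weights = [1.0 / n] * n
    votes = {name: [as_weak_classifier(m)(p) for p in pairs]
             for name, m in matchers.items()}
    ensemble: List[Tuple[str, float]] = []
    for _ in range(rounds):
        # Choose the matcher with the lowest error under the current pair weights.
        best_name: Optional[str] = None
        best_err = 1.0
        for name, v in votes.items():
            err = sum(w for w, h, y in zip(weights, v, labels) if h != y)
            if err < best_err:
                best_name, best_err = name, err
        if best_name is None or best_err >= 0.5:
            break  # no matcher is better than random on the weighted pairs
        perfect = best_err == 0
        best_err = max(best_err, 1e-10)  # keep the log finite for a perfect matcher
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((best_name, alpha))
        if perfect:
            break  # a matcher with zero weighted error needs no further rounds
        # Up-weight the pairs the chosen matcher got wrong, down-weight the rest.
        v = votes[best_name]
        weights = [w * math.exp(-alpha * y * h) for w, h, y in zip(weights, v, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_decision(ensemble: List[Tuple[str, float]],
                      matchers: Dict[str, Matcher], pair: Pair) -> int:
    """Alpha-weighted vote of the selected matchers: +1 = match, -1 = non-match."""
    score = sum(alpha * as_weak_classifier(matchers[name])(pair)
                for name, alpha in ensemble)
    return 1 if score >= 0 else -1


# Toy usage with hypothetical matchers and hand-labelled attribute pairs.
matchers: Dict[str, Matcher] = {
    "exact": lambda p: 1.0 if p[0].lower() == p[1].lower() else 0.0,
    "prefix": lambda p: 1.0 if p[0][:3].lower() == p[1][:3].lower() else 0.0,
    "suffix": lambda p: 1.0 if p[0][-2:].lower() == p[1][-2:].lower() else 0.0,
    "always_match": lambda p: 1.0,  # wrong on every non-match
}
pairs = [("Name", "name"), ("cust_id", "customer_id"), ("city", "city_code"),
         ("phone", "zip"), ("zipcode", "phone_code")]
labels = [1, 1, -1, -1, -1]

chosen = boost_matchers(matchers, pairs, labels, rounds=3)
predictions = [ensemble_decision(chosen, matchers, p) for p in pairs]
print([name for name, _ in chosen])  # ['exact', 'prefix', 'suffix']
print(predictions)                   # [1, 1, -1, -1, -1]
```

In this toy data each of `exact`, `prefix`, and `suffix` is wrong on a different pair, so none of them is perfect on its own, yet the three-member weighted vote labels all five pairs correctly, and the match-everything matcher is never selected in these rounds. Whether SMB uses the same error measure, weighting, and stopping rule is not stated in the abstract.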