Machine learning models: combining evidence of similarity for XML schema matching

Authors:
Tran Hong-Minh;Dan Smith
Affiliations:
School of Computing Sciences, University of East Anglia, Norwich, UK;School of Computing Sciences, University of East Anglia, Norwich, UK
Venue:
KDXD'06 Proceedings of the First international conference on Knowledge Discovery from XML Documents
Year:
2006

References:

Automatic combination of multiple ranked retrieval systems. SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Analyses of multiple evidence combination. Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Predicting the performance of linearly combined IR systems. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Approximate String Matching. ACM Computing Surveys (CSUR)
XClust: clustering XML schemas for effective integration. Proceedings of the eleventh international conference on Information and knowledge management
Generic Schema Matching with Cupid. Proceedings of the 27th International Conference on Very Large Data Bases
A survey of approaches to automatic schema matching. The VLDB Journal — The International Journal on Very Large Data Bases
System Fusion for Improving Performance in Information Retrieval Systems. ITCC '01 Proceedings of the International Conference on Information Technology: Coding and Computing
COMA: a system for flexible combination of schema matching approaches. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Abstract

Schema matching at the element level or structural level is generally categorized as either hybrid, which uses a single algorithm, or composite, which combines evidence from several different matching algorithms into a final similarity measure. We present a composite approach for combining element-level evidence of similarity when matching XML schemas. Combining high-recall algorithms in a composite system reduces the number of real matches missed. We ran experiments on a number of machine learning models for combining the evidence and chose SMO for its high precision and recall, which increases the reliability of the final matching results. Precision improves as a result: with data sets used by Cupid and suggested by the author of LSD, our precision is on average 13.05% and 31.55% higher than that of COMA and Cupid, respectively.
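
The composite idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three string measures, the toy training pairs, and the function names are assumptions chosen for illustration, and a small logistic regression stands in for the SMO model named in the abstract so that the example needs only the Python standard library.

```python
from difflib import SequenceMatcher
import math

def edit_similarity(a, b):
    """1 minus the Levenshtein distance normalized by the longer length."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))

def bigram_jaccard(a, b):
    """Jaccard overlap of the character bigram sets of two names."""
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

def features(a, b):
    """Evidence vector: one score per element-level matcher (assumed set)."""
    a, b = a.lower(), b.lower()
    return [SequenceMatcher(None, a, b).ratio(),
            bigram_jaccard(a, b),
            edit_similarity(a, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, epochs=500, lr=0.5):
    """Fit a logistic-regression combiner by stochastic gradient descent.

    This is a stand-in for SMO, used only to show learned combination.
    """
    data = [(features(a, b), y) for a, b, y in pairs]
    w, bias = [0.0] * 3, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias

def match_score(model, a, b):
    """Combined similarity in [0, 1] for two element names."""
    w, bias = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b))) + bias)

# Hypothetical labeled element-name pairs (1 = real match, 0 = non-match).
TOY_PAIRS = [
    ("author", "authorName", 1), ("title", "bookTitle", 1), ("zip", "zipCode", 1),
    ("author", "price", 0), ("title", "isbn", 0), ("zip", "name", 0),
]

model = train(TOY_PAIRS)
print(match_score(model, "publisher", "publisherName"))
print(match_score(model, "publisher", "street"))
```

Each matcher contributes one feature, and the learned model decides how much weight each piece of evidence receives. A real system would train on far more labeled pairs and could swap in any classifier that produces a score.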