Web Service aggregation with string distance ensembles and active probe selection

Authors:
Eddie Johnston;Nicholas Kushmerick
Affiliations:
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin, Ireland;School of Computer Science and Informatics, University College Dublin, Belfield, Dublin, Ireland
Venue:
Information Fusion
Year:
2008

Abstract

The adoption of standards for exchanging information across the Web presents both new opportunities and important challenges for data integration and aggregation. Although Web Services simplify discovery of and access to information sources, the problem of semantic heterogeneity remains: how to find semantic correspondences across the data being integrated. In this paper, we explore these issues in the context of Web Services, and propose OATS, a novel algorithm for schema matching that is specifically suited to Web Service data aggregation. We show how probing Web Services with a small set of related queries yields semantically correlated data instances, which greatly simplifies the matching process, and demonstrate that an ensemble of string distance metrics matches data instances more accurately than any individual metric. We also show that the choice of probe queries has a dramatic effect on matching accuracy. Motivated by this observation, we describe and evaluate a machine learning approach to selecting probes to maximise accuracy while minimising cost.
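The idea of matching fields by comparing instances returned for the same probes can be illustrated with a minimal Python sketch. It is not the OATS algorithm as published: the three metrics, the unweighted averaging, the function names, and the sample data for two weather-like services are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> float:
    """Edit distance, normalised to [0, 1] by the longer string's length."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def jaccard(a: str, b: str) -> float:
    """One minus the Jaccard similarity of whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def sequence_distance(a: str, b: str) -> float:
    """One minus difflib's matching-subsequence ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


# Hypothetical ensemble: an unweighted mean of three metrics.
METRICS = (levenshtein, jaccard, sequence_distance)


def ensemble_distance(a: str, b: str) -> float:
    return sum(metric(a, b) for metric in METRICS) / len(METRICS)


def match_fields(service_a: dict, service_b: dict) -> dict:
    """Map each field of service_a to the closest field of service_b.

    Each service is a dict from field name to a list of values, where
    position i holds the value returned for probe query i. Because both
    services answered the same probes, values are compared pairwise.
    """
    matches = {}
    for field_a, values_a in service_a.items():
        def mean_distance(field_b: str) -> float:
            values_b = service_b[field_b]
            pairs = list(zip(values_a, values_b))
            return sum(ensemble_distance(x, y) for x, y in pairs) / len(pairs)

        matches[field_a] = min(service_b, key=mean_distance)
    return matches


# Made-up responses of two services to the same two probes.
service_a = {"city": ["Dublin", "Cork"], "temp": ["12", "14"]}
service_b = {
    "location": ["Dublin, IE", "Cork, IE"],
    "temperature_c": ["12.0", "14.5"],
}

print(match_fields(service_a, service_b))
```

Because both services are probed with the same queries, the value at each position describes the same real-world entity, so a low mean distance between two fields is evidence that they correspond. A fuller treatment would weight the metrics and choose probes deliberately rather than use a fixed pair.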