A Technique for Extracting Sub-source Similarities from Information Sources Having Different Formats

Authors:
Domenico Rosaci;Giorgio Terracina;Domenico Ursino
Affiliations:
DIMET, Università “Mediterranea” di Reggio Calabria, Via Graziella, Località Feo di Vito, 89060 Reggio Calabria, Italy;Dipartimento di Matematica, Università della Calabria, Via P. Bucci, 87036 Rende (CS), Italy;DIMET, Università “Mediterranea” di Reggio Calabria, Via Graziella, Località Feo di Vito, 89060 Reggio Calabria, Italy
Venue:
World Wide Web
Year:
2003

Abstract

In this paper we propose a semi-automatic technique for deriving the similarity degree between two portions of heterogeneous information sources (hereafter, sub-sources). The proposed technique consists of two phases: the first selects the most promising pairs of sub-sources, whereas the second computes the similarity degree of each promising pair. We show that the detection of sub-source similarities is a special case (and a very interesting one for semi-structured information sources) of the more general problem of Scheme Match. In addition, we present a real example case that clarifies the proposed technique, a set of experiments conducted to verify the quality of its results, a discussion of its computational complexity, and its classification in the context of the related literature. Finally, we discuss some possible applications that can benefit from the derived similarities.
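
The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how pairs are selected or how the similarity degree is computed. Everything below is an assumption made for illustration: representing a sub-source as a list of element names, the Jaccard filter in phase 1, the greedy one-to-one matching in phase 2, and the names `jaccard`, `select_promising_pairs`, `similarity_degree` and `exact_match`.

```python
from itertools import product


def jaccard(a, b):
    """Cheap overlap score between two collections of element names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_promising_pairs(subs1, subs2, threshold=0.3):
    """Phase 1 (assumed filter): keep the pairs of sub-sources whose
    name overlap reaches the threshold."""
    return [
        (name1, name2)
        for (name1, elems1), (name2, elems2) in product(subs1.items(), subs2.items())
        if jaccard(elems1, elems2) >= threshold
    ]


def similarity_degree(elems1, elems2, element_sim):
    """Phase 2 (assumed scoring): greedily match elements one-to-one by
    decreasing element similarity, then average over the larger side."""
    candidates = sorted(
        ((element_sim(x, y), x, y) for x in elems1 for y in elems2),
        key=lambda t: t[0],
        reverse=True,
    )
    used1, used2, total = set(), set(), 0.0
    for score, x, y in candidates:
        if x not in used1 and y not in used2:
            used1.add(x)
            used2.add(y)
            total += score
    denom = max(len(elems1), len(elems2))
    return total / denom if denom else 0.0


def exact_match(x, y):
    """Hypothetical element similarity: 1.0 for identical names, else 0.0."""
    return 1.0 if x == y else 0.0


# Hypothetical sub-sources taken from two sources A and B.
source_a = {"A.customer": ["name", "address", "phone"],
            "A.order": ["id", "date", "total"]}
source_b = {"B.client": ["name", "address", "email"],
            "B.invoice": ["number", "date", "amount"]}

for left, right in select_promising_pairs(source_a, source_b):
    degree = similarity_degree(source_a[left], source_b[right], exact_match)
    print(left, right, round(degree, 3))
```

The threshold in phase 1 keeps the more expensive phase 2 from running on every pair. A real system would probably replace `exact_match` with a lexical or structural similarity, and the greedy matching with an optimal one.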