Schema Normalization for Improving Schema Matching

Authors:
Serena Sorrentino;Sonia Bergamaschi;Maciej Gawinecki;Laura Po
Affiliations:
ICT Doctorate School, University of Modena and Reggio Emilia, Italy;DII, University of Modena and Reggio Emilia, Italy;ICT Doctorate School, University of Modena and Reggio Emilia, Italy;DII, University of Modena and Reggio Emilia, Italy
Venue:
ER '09 Proceedings of the 28th International Conference on Conceptual Modeling
Year:
2009

Abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and structure. Starting from the "hidden meaning" associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a "meaning" with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and abbreviations. In this work, we address this problem by proposing a method for schema label normalization, which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically show that our normalization method helps in identifying similarities among schema elements of different data sources, thus improving schema matching accuracy.
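
To make the idea of label normalization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small hand-written abbreviation table (`ABBREVIATIONS`) and a toy word list (`DICTIONARY`) standing in for a lexical resource; both are hypothetical, as are the function names. It only illustrates how splitting labels and expanding abbreviations can turn non-dictionary labels into comparable ones.

```python
import re

# Hypothetical abbreviation table (an assumption for illustration only);
# a real system would derive expansions from richer sources.
ABBREVIATIONS = {
    "qty": "quantity",
    "amt": "amount",
    "cust": "customer",
    "addr": "address",
}

# Toy stand-in for a lexical resource such as a thesaurus (assumption).
DICTIONARY = {"quantity", "amount", "customer", "address",
              "order", "delivery", "name", "company"}


def tokenize(label):
    """Split a schema label on underscores, hyphens, spaces, and camelCase."""
    spaced = re.sub(r"[_\-\s]+", " ", label)
    # Insert a space at each lowercase-to-uppercase boundary (camelCase).
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", spaced)
    return [token.lower() for token in spaced.split()]


def normalize(label):
    """Return the label as space-separated words with abbreviations expanded."""
    expanded = []
    for token in tokenize(label):
        if token in DICTIONARY:
            expanded.append(token)  # already a dictionary word
        else:
            # Expand if known; otherwise keep the token unchanged.
            expanded.append(ABBREVIATIONS.get(token, token))
    return " ".join(expanded)


if __name__ == "__main__":
    for raw in ["custAddr", "ORDER_QTY", "delivery-addr"]:
        print(raw, "->", normalize(raw))
```

Under these assumptions, labels such as `custAddr` and `delivery-addr` both normalize to phrases containing the dictionary word "address", so a matcher relying on lexical annotation can compare them. The sketch does not cover the semi-automatic choice among candidate expansions or the annotation of compound terms.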