Schema label normalization for improving schema matching

Authors:
Serena Sorrentino;Sonia Bergamaschi;Maciej Gawinecki;Laura Po
Venue:
Data & Knowledge Engineering
Year:
2010

Abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and structure. Starting from the "hidden meaning" associated with schema labels (i.e., class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e., annotation with respect to a thesaurus or lexical resource) helps in associating a "meaning" with schema labels. However, the performance of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns, abbreviations, and acronyms. We address this problem by proposing a schema label normalization method that increases the number of comparable labels. The method semi-automatically expands abbreviations and acronyms and annotates compound nouns with minimal manual effort. We show empirically that our normalization method helps identify similarities among schema elements of different data sources, thus improving schema matching results.
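
To make the idea of label normalization concrete, here is a minimal sketch in Python. It is not the authors' method: it only illustrates splitting a schema label into tokens and expanding abbreviations from a lookup table. The function names (tokenize, normalize), the ABBREVIATIONS dictionary, and the sample labels are all hypothetical and do not come from the paper. The paper's approach is semi-automatic and also annotates compound nouns, which this sketch does not attempt.

```python
import re

# Hypothetical abbreviation table for illustration only; a real system
# would draw expansions from dictionaries, the schema context, or a user.
ABBREVIATIONS = {
    "cust": "customer",
    "addr": "address",
    "qty": "quantity",
    "no": "number",
    "po": "purchase order",
}


def tokenize(label):
    """Split a schema label into lowercase tokens.

    Handles camelCase boundaries and any non-alphanumeric separator
    (underscore, hyphen, space, ...).
    """
    # Insert a space at each lowercase/digit -> uppercase transition,
    # e.g. "custAddr" becomes "cust Addr".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Split on runs of non-alphanumeric characters and drop empty pieces.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def normalize(label, abbreviations=ABBREVIATIONS):
    """Return the label with known abbreviations expanded.

    Tokens that are not in the table are kept unchanged.
    """
    return " ".join(abbreviations.get(t, t) for t in tokenize(label))


if __name__ == "__main__":
    for raw in ["custAddr", "CUST_NO", "PO_qty", "orderDate"]:
        print(raw, "->", normalize(raw))
```

After normalization, labels such as custAddr and CUST_NO consist of dictionary words, so they can be looked up in a lexical resource and compared with labels from other schemata. A plain lookup table cannot resolve ambiguous abbreviations, which is one reason a semi-automatic approach with some manual input is plausible here.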